Python Web Scraping for Beginners: HTML Parsing Problems and Solutions - 蒲公英云


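When learning to write web scrapers in Python, you will often run into problems parsing HTML. Below are a few typical cases, each with a solution.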

Getting the page title:
Sometimes we only want the title of a page rather than its body content.

import requests
from bs4 import BeautifulSoup
# Request the page
response = requests.get('https://example.com')
# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')
# Get the title
title = soup.find('title').text
print(title)  # Prints the page title
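
One pitfall with this approach is that `soup.find('title')` returns `None` when a page has no `<title>` tag, so accessing `.text` raises an `AttributeError`. The following is a minimal sketch of a more defensive version. The helper name `extract_title` and the two inline HTML strings are made up for illustration, and the HTML is embedded so that the example runs without a network request.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the stripped <title> text, or None if there is no usable title.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find('title')
    if tag is None:
        # The page has no <title> tag at all
        return None
    text = tag.get_text(strip=True)
    # Treat an empty <title></title> the same as a missing one
    return text or None


# Inline sample pages (assumed data, used instead of a live request)
with_title = '<html><head><title> Example Domain </title></head><body></body></html>'
without_title = '<html><body><p>No title here</p></body></html>'

print(extract_title(with_title))     # Example Domain
print(extract_title(without_title))  # None
```

With a real page, the same helper could be called as `extract_title(response.text)` after `requests.get(...)`.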

Extracting specific elements from HTML:
Sometimes we need to extract specific elements from the HTML, such as links or tables.

import requests
from bs4 import BeautifulSoup
url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Extract all links
links = soup.find_all('a')
for link in links:
    href = link.get('href')
    print(href)  # Prints each link's href (None if the <a> tag has no href attribute)
# Extract table data
tables = soup.find_all('table')
for table in tables:
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        if cols:  # Skip rows without <td> cells, such as header rows made of <th>
            col_data = [cell.text.strip() for cell in cols]  # Strip surrounding whitespace
            print(col_data)  # Prints the data of each row
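
Two follow-up problems tend to appear once links and tables are extracted: `href` values are often relative (for example `/about`), and table rows are easier to work with when each cell is paired with its column header. The sketch below shows one possible way to handle both, using `urljoin` from the standard library. The helper names `absolute_links` and `table_records`, the sample HTML, and the base URL are assumptions for illustration. It also assumes that the first row of the table holds the headers and that the table contains no nested tables.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Inline sample page (assumed data, used instead of a live request)
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="https://other.example/page">Other</a>
<a name="anchor-without-href">Skip me</a>
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td> Apple </td><td>3</td></tr>
  <tr><td>Pear</td><td>5</td></tr>
</table>
"""


def absolute_links(soup, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    # href=True filters out anchors that have no href at all
    return [urljoin(base_url, a['href']) for a in soup.find_all('a', href=True)]


def table_records(table):
    """Turn a table into a list of dicts keyed by the first row's cell text."""
    rows = table.find_all('tr')
    if not rows:
        return []
    # Assumption: the first row contains the column headers
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(['th', 'td'])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
        if cells:
            records.append(dict(zip(headers, cells)))
    return records


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
links = absolute_links(soup, 'https://example.com')
records = table_records(soup.find('table'))

print(links)
# ['https://example.com/about', 'https://other.example/page']
print(records)
# [{'Name': 'Apple', 'Price': '3'}, {'Name': 'Pear', 'Price': '5'}]
```

Returning dictionaries rather than bare lists is a design choice: later code can then refer to a column by name instead of by position.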