Python Web Scraping in Practice: A Case Study in Extracting Specific Data from a Web Page

Original post by 蔚落, 2025-02-22 14:24

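Python offers a range of libraries for scraping specific data from web pages. This post walks through a common example: scraping news content from a website.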

First, install two libraries: requests and BeautifulSoup4.

pip install requests beautifulsoup4

Next, write the code that scrapes the news content:

import requests
from bs4 import BeautifulSoup
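# URL of the site to scrape
url = 'https://example.com/news'  # replace with the actual page you want to scrape
response = requests.get(url, timeout=10)  # send a GET request; give up after 10 seconds
if response.status_code == 200:  # 200 means the request succeeded
    soup = BeautifulSoup(response.text, 'html.parser')  # parse the HTML content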
    news_list = soup.find_all('div', class_='news-item')  # find all news entries (usually identified by a class)
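    for item in news_list:
        title_tag = item.find('h2')
        content_tag = item.find('p')
        if title_tag is None or content_tag is None:
            continue  # skip entries that lack a title or a body
        title = title_tag.text.strip()  # extract the news title
        content = content_tag.text.strip()  # extract the news content
        print(f'Title: {title}')
        print(f'Content: {content}\n')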
else:  # any other status code means the request failed or the page does not exist
    print('Failed to fetch news. Check the URL and try again.')
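
The extraction loop above can also be wrapped in a function that takes an HTML string, which makes it possible to try the parsing logic without any network access. The following is a minimal sketch, assuming the same placeholder structure as the script (a `div` with class `news-item` containing an `h2` and a `p`). The names `parse_news` and `SAMPLE_HTML` are illustrative and not part of the original script, and the sample markup is made up for the demonstration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped like the structure the script above expects.
SAMPLE_HTML = """
<div class="news-item"><h2> First headline </h2><p>First summary.</p></div>
<div class="news-item"><h2>Second headline</h2></div>
<div class="news-item"><h2>Third headline</h2><p> Third summary. </p></div>
"""


def parse_news(html):
    """Return a list of {'title', 'content'} dicts, one per complete news entry."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.find_all('div', class_='news-item'):
        title_tag = item.find('h2')
        content_tag = item.find('p')
        # Skip entries that lack a title or a body instead of raising AttributeError.
        if title_tag is None or content_tag is None:
            continue
        results.append({
            'title': title_tag.get_text(strip=True),
            'content': content_tag.get_text(strip=True),
        })
    return results


for entry in parse_news(SAMPLE_HTML):
    print(f"Title: {entry['title']}")
    print(f"Content: {entry['content']}\n")
```

With a real page, the same function could be called as `parse_news(response.text)` once the request has succeeded. The tag and class names would need to be adjusted to match the target site's actual markup.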