Python Web Scraping: Fetching the Baidu News Homepage and Saving It to a CSV File

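Want to learn how to extract data from a website with Python? This tutorial walks through scraping the Baidu News homepage with Python and the Beautiful Soup library, then saving the results to a CSV file.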

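Here is the Python code that does this:

```python
import csv

import requests
from bs4 import BeautifulSoup


def scrape_baidu_news():
    url = 'https://news.baidu.com/'
    # A timeout avoids hanging forever; raise_for_status() stops on HTTP errors.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')

    # Match <li> elements that carry either of these CSS classes.
    news_list = soup.find_all('li', class_=['hdline', 'bold-item'])

    with open('baidu_news.csv', 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['title', 'link', 'date', 'category'])

        for news in news_list:
            anchor = news.find('a')
            if anchor is None:
                continue  # skip list items that contain no link
            date_tag = news.find('span', class_='date')
            category_tag = news.find('span', class_='ctype')

            title = anchor.text.strip()
            link = anchor.get('href', '')
            # Not every item has a date or category span; fall back to ''.
            date = date_tag.text.strip() if date_tag is not None else ''
            category = category_tag.text.strip() if category_tag is not None else ''

            writer.writerow([title, link, date, category])

    print('Scraping finished. Results saved to baidu_news.csv.')


scrape_baidu_news()
```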

Code walkthrough:

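Import the required libraries:
* requests: sends the HTTP request and retrieves the page content.
* BeautifulSoup: parses the HTML structure and extracts the data we need.
* csv: handles the CSV file and writes the data to it.

Define the scrape_baidu_news() function:
* Set the target URL to the Baidu News homepage.
* Fetch the page content with requests.get().
* Parse the HTML content with BeautifulSoup.
* Use find_all() to locate every list item that holds news information.
* Open a CSV file and write the header row.
* Loop over the news list items, extract the title, link, date, and category of each one, and write them to the CSV file.
* Print a completion message.

Run the code:
* Call scrape_baidu_news() to perform the scrape.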
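The extraction step can be tried without touching the network. The sketch below runs the same find_all() lookup against a small, made-up HTML fragment; the markup, the helper names text_or_empty() and extract_rows(), and the example.com links are illustrative assumptions, not the real structure of the Baidu page. It also shows one way to cope with list items that lack a link, a date, or a category.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration; not copied from the real page.
SAMPLE_HTML = """
<ul>
  <li class="hdline"><a href="https://example.com/a"> Headline A </a><span class="date">2024-01-01</span><span class="ctype">World</span></li>
  <li class="bold-item"><a href="https://example.com/b">Headline B</a></li>
  <li class="bold-item">No link here</li>
  <li class="other"><a href="https://example.com/c">Ignored</a></li>
</ul>
"""


def text_or_empty(tag):
    """Return the stripped text of a tag, or '' when the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else ''


def extract_rows(html):
    """Collect [title, link, date, category] for each matching list item."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    # A list passed to class_ matches elements carrying any of the classes.
    for item in soup.find_all('li', class_=['hdline', 'bold-item']):
        anchor = item.find('a')
        if anchor is None:
            continue  # nothing to record without a link
        rows.append([
            text_or_empty(anchor),
            anchor.get('href', ''),
            text_or_empty(item.find('span', class_='date')),
            text_or_empty(item.find('span', class_='ctype')),
        ])
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Separating parsing from downloading this way makes it easier to adjust the selectors when the live page uses different class names.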

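After running this code, you will find a file named baidu_news.csv in the current directory, with one row for each news item the selectors matched on the Baidu News homepage. If the page markup differs from what the selectors expect, the file may contain only the header row.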
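A quick way to check the result is to read the file back with csv.DictReader, which keys each row by the header row. The sketch below is a minimal example under stated assumptions: read_news_csv() is a hypothetical helper, and the demo writes a throwaway file with made-up rows so that it runs without network access. Pointing it at baidu_news.csv should work the same way.

```python
import csv
import os
import tempfile


def read_news_csv(path):
    """Load the CSV into a list of dicts keyed by the header row."""
    with open(path, newline='', encoding='utf-8') as csvfile:
        return list(csv.DictReader(csvfile))


# Demo on a temporary file with made-up rows (illustrative data only).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_news.csv')
with open(demo_path, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['title', 'link', 'date', 'category'])
    writer.writerow(['Example headline', 'https://example.com/a', '2024-01-01', 'World'])
    writer.writerow(['Headline, with a comma', 'https://example.com/b', '', ''])

records = read_news_csv(demo_path)
print(len(records), 'rows')
for record in records:
    print(record['title'], '->', record['link'])
```

Opening the file with newline='' for both writing and reading lets the csv module handle quoting and line endings itself, which is why the comma inside the second title survives the round trip.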

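Note: Before running the code, make sure the required libraries are installed. You can install them with the command pip install requests beautifulsoup4. The csv module is part of the Python standard library and needs no installation.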
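To confirm the installation before running the scraper, something like the following sketch can help. It uses importlib.util.find_spec from the standard library; missing_modules() is a hypothetical helper name. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The import names differ from the pip names: beautifulsoup4 installs bs4.
missing = missing_modules(['requests', 'bs4'])
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All required modules are importable.')
```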