Scraping Amazon Product Reviews with Python: A Detailed Guide

The basic steps for scraping Amazon product reviews with Python are as follows:

Install the required libraries

You need the requests and beautifulsoup4 libraries, both of which can be installed with pip:

pip install requests
pip install beautifulsoup4

Fetch the product page HTML

Use the requests library to send an HTTP request to Amazon and retrieve the HTML of the product page:

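import requests

url = 'https://www.amazon.com/dp/B01J94SWWU'
# A browser-like User-Agent; Amazon may reject requests without one
headers = {'User-Agent': 'Mozilla/5.0', 'Accept-Language': 'en-US,en;q=0.9'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses
html = response.text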

Here, url is the URL of the Amazon product page you want to scrape.

Parse the HTML

Use the beautifulsoup4 library to parse the HTML and locate the elements that hold the reviews:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
reviews = soup.find_all('div', {'class': 'review'})

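Here, find_all() returns every div tag whose class attribute includes 'review'.

Extract the review data

Loop over the review elements and extract the data from each one, including the review text, the rating, and the review date: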
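The class matching above can be checked on a small piece of markup. The following is a minimal sketch: the sample_html string is made up for illustration and is not real Amazon HTML. It shows that a div matches when 'review' is one of its classes, while a different class name or a different tag does not match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only (not real Amazon HTML).
sample_html = """
<div class="a-section review">First</div>
<div class="review">Second</div>
<div class="review-summary">Not a review</div>
<span class="review">Wrong tag</span>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class is a multi-valued attribute, so 'review' matches any div that
# lists it among its classes; 'review-summary' is a different class.
reviews = soup.find_all("div", {"class": "review"})
texts = [div.get_text(strip=True) for div in reviews]
print(texts)
```

Only the first two div elements are selected, so the list contains 'First' and 'Second'.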

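for review in reviews:
    # Look up each field; find() returns None when the element is missing
    body = review.find('span', {'data-hook': 'review-body'})
    star = review.find('i', {'data-hook': 'review-star-rating'})
    when = review.find('span', {'data-hook': 'review-date'})
    if body is None or star is None or when is None:
        continue  # skip blocks that lack any of the three fields
    # Extract the review text
    content = body.text.strip()
    # Extract the rating: the first token of the rating text
    rating = star.text.strip().split()[0]
    # Extract the review date
    date = when.text.strip()
    # Print the review data
    print(f'Content: {content}\nRating: {rating}\nDate: {date}\n')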

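Here, the text attribute returns the text inside a tag, and strip() removes leading and trailing whitespace, including newlines.

The complete code is shown below: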
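The extracted rating and date are still plain strings. If typed values are needed, they can be converted with the standard library. The sketch below rests on two assumptions that the page may not satisfy: that the rating text looks like "4.0 out of 5 stars", and that the date text ends with " on " followed by an English date such as "January 5, 2020". The helper names parse_rating and parse_review_date are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_rating(text: str) -> Optional[float]:
    """Return the leading number of a rating string, or None if there is none."""
    parts = text.strip().split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None


def parse_review_date(text: str) -> Optional[date]:
    """Parse the date after the last ' on ' (assumed format), or return None."""
    _, separator, tail = text.strip().rpartition(" on ")
    if not separator:
        return None
    try:
        # %B expects an English month name under the default C locale.
        return datetime.strptime(tail, "%B %d, %Y").date()
    except ValueError:
        return None


print(parse_rating("4.0 out of 5 stars"))
print(parse_review_date("Reviewed in the United States on January 5, 2020"))
```

Returning None instead of raising lets the calling loop skip or log a review whose text does not follow the assumed format.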

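import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.com/dp/B01J94SWWU'
# A browser-like User-Agent; Amazon may reject requests without one
headers = {'User-Agent': 'Mozilla/5.0', 'Accept-Language': 'en-US,en;q=0.9'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses
html = response.text

soup = BeautifulSoup(html, 'html.parser')
reviews = soup.find_all('div', {'class': 'review'})

for review in reviews:
    body = review.find('span', {'data-hook': 'review-body'})
    star = review.find('i', {'data-hook': 'review-star-rating'})
    when = review.find('span', {'data-hook': 'review-date'})
    # find() returns None for a missing element; skip incomplete blocks
    if body is None or star is None or when is None:
        continue
    content = body.text.strip()
    rating = star.text.strip().split()[0]
    date = when.text.strip()
    print(f'Content: {content}\nRating: {rating}\nDate: {date}\n')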

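Notes:

Amazon's page structure may change at any time, which can break this code. Adjust the selectors to match the markup you actually receive.
When scraping a site, follow the rules in its robots.txt file and avoid sending excessive requests, so that you do not disrupt the site's normal operation.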
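The robots.txt and rate-limiting advice can be sketched with the standard library alone. This is an illustration, not a complete crawler: SAMPLE_ROBOTS is a made-up robots.txt, "my-bot" is a hypothetical user agent, and the helper names is_allowed and polite_urls as well as the two-second default delay are assumptions chosen for the example.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_urls(urls, robots_txt, user_agent, delay_seconds=2.0):
    """Yield only the allowed URLs, sleeping between consecutive ones."""
    first = True
    for url in urls:
        if not is_allowed(robots_txt, user_agent, url):
            continue  # disallowed by robots.txt, so never request it
        if not first:
            time.sleep(delay_seconds)  # throttle to limit load on the site
        first = False
        yield url


print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/private/page"))
print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/public/page"))
```

With the sample rules, the first check prints False and the second prints True. For a live site, RobotFileParser can load the real file through its set_url() and read() methods instead of parsing a string.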