Scraping Douban Reviews of Puss in Boots 2 (《穿靴子的猫2》) with a Python Crawler

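This tutorial uses Python's Selenium library to scrape the first three pages of reviews for Puss in Boots 2 (《穿靴子的猫2》) on Douban Movies. For each review it collects the reviewer's name, the review time, and the review text, then stores the results as JSON.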

Prerequisites

Python 3.x installed
Selenium library installed
Chrome browser installed

Steps

1. Open the movie page and click the '全部影评' (All Reviews) button

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Launch the Chrome browser
browser = webdriver.Chrome()

# Open the movie's Douban page
browser.get('https://movie.douban.com/subject/25868125/')

# Wait up to 10 seconds for the '全部影评' link to become clickable, then click it
button = WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.LINK_TEXT, '全部影评'))
)
button.click()

2. Scrape the first 3 pages of review data

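import time
import json

from selenium.webdriver.common.by import By

# List that collects the scraped comments
comments = []

# Scrape the first 3 pages of comments
for i in range(3):
    # Build the URL of the current page (20 comments per page)
    url = 'https://movie.douban.com/subject/25868125/comments?start={}&limit=20&status=P&sort=new_score'.format(i * 20)

    # Open the current page
    browser.get(url)

    # Give the comments time to load
    time.sleep(5)

    # Extract every comment on the page
    comment_items = browser.find_elements(By.CSS_SELECTOR, 'div.comment-item')
    for item in comment_items:
        name = item.find_element(By.CSS_SELECTOR, 'span.comment-info > a').text
        # Named comment_time so it does not shadow the time module used above
        comment_time = item.find_element(By.CSS_SELECTOR, 'span.comment-info > span.comment-time').text
        content = item.find_element(By.CSS_SELECTOR, 'div.comment > p').text
        comments.append({'name': name, 'time': comment_time, 'content': content})

# Close the browser once scraping is done
browser.quit()

# Save the results; ensure_ascii=False keeps Chinese text readable in the file
with open('comments.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False)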
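
The `start` offset in the loop above is simply the page index multiplied by the page size. A minimal sketch that isolates this arithmetic follows. It assumes the comments URL keeps the same query parameters as in the loop, and `build_comment_urls` is a hypothetical helper name, not part of Selenium or of any Douban API.

```python
# Hypothetical helper: builds the paginated comment URLs used in step 2.
# Assumes the query parameters seen in the tutorial's URL stay valid.
BASE_URL = 'https://movie.douban.com/subject/{subject_id}/comments'


def build_comment_urls(subject_id, pages, page_size=20):
    """Return one URL per page; page i starts at offset i * page_size."""
    urls = []
    for page in range(pages):
        query = 'start={}&limit={}&status=P&sort=new_score'.format(
            page * page_size, page_size)
        urls.append(BASE_URL.format(subject_id=subject_id) + '?' + query)
    return urls


urls = build_comment_urls('25868125', pages=3)
for url in urls:
    print(url)
```

Scraping more pages then only means passing a larger `pages` value, although the site may limit how many pages it serves.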

3. Read and print the scraped results

# Load the scraped results
with open('comments.json', 'r', encoding='utf-8') as f:
    comments = json.load(f)

# Print each comment
for comment in comments:
    print(comment['name'], comment['time'], comment['content'])
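
Step 2 passes `ensure_ascii=False` to `json.dump`. The sketch below shows what that flag changes, and that the data survives a write/read round trip either way. It uses a made-up sample record rather than real scraped data, and writes to a temporary file (`sample_comments.json` is an arbitrary name) instead of `comments.json`.

```python
import json
import os
import tempfile

# Made-up sample record in the same shape as the scraped comments.
sample = [{'name': '示例用户', 'time': '2023-01-01 12:00:00', 'content': '很好看'}]

# With the default ensure_ascii=True, non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(sample)
# With ensure_ascii=False, the characters are written as-is.
readable = json.dumps(sample, ensure_ascii=False)

# Round trip through a temporary file, mirroring steps 2 and 3.
path = os.path.join(tempfile.mkdtemp(), 'sample_comments.json')
with open(path, 'w', encoding='utf-8') as f:
    json.dump(sample, f, ensure_ascii=False)
with open(path, 'r', encoding='utf-8') as f:
    loaded = json.load(f)

print(escaped)
print(readable)
print(loaded == sample)
```

Both forms decode to the same Python objects; the flag only affects how readable the file is when opened in a text editor.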

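Code walkthrough

Selenium launches Chrome and opens the movie's Douban page
WebDriverWait and expected_conditions wait until the page element has loaded
Clicking the '全部影评' button opens the review page
A loop scrapes the first 3 pages of comments
CSS selectors locate the comment elements on each page
The reviewer name, review time, and review text are extracted and appended to a list
json.dump stores the results in a JSON file
json.load reads the JSON file back, and the results are printed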

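Notes

Douban's page structure may change, so adjust the selectors in the code to match the live page
Follow the site's robots.txt rules when scraping
Avoid sending requests too frequently, so that you do not disrupt the site's servers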
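
The last two notes can be sketched with the standard library alone. The rules below are made up for illustration and are not Douban's real robots.txt, and `is_allowed` and `polite_sleep` are hypothetical helper names. The delay bounds of 3 to 6 seconds are likewise an arbitrary assumption, not a documented limit.

```python
import random
import time
from urllib import robotparser


def is_allowed(robots_lines, user_agent, url):
    """Check a URL against robots.txt rules given as a list of lines."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def polite_sleep(min_seconds=3.0, max_seconds=6.0, sleep=time.sleep):
    """Pause for a random interval between requests and return its length."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Made-up rules for illustration only; they are NOT Douban's real robots.txt.
example_rules = [
    'User-agent: *',
    'Disallow: /private/',
]

print(is_allowed(example_rules, '*', 'https://example.com/private/page'))
print(is_allowed(example_rules, '*', 'https://example.com/public/page'))
```

In a real run, the rules would come from the site itself, for example by calling `set_url()` and `read()` on the `RobotFileParser`, and `polite_sleep()` could replace the fixed `time.sleep(5)` between pages.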

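Extensions

Modify the code to scrape more pages of comments
Extract the rating attached to each comment
Analyze the scraped data, for example by counting comments or running sentiment analysis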
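
A rough sketch of two of these extensions follows, under clearly stated assumptions. It assumes the rating is encoded in an element's class attribute in the form `allstar40 rating` (meaning 4 stars) and that the `time` field starts with a `YYYY-MM-DD` date. Neither is confirmed by the tutorial's code, so check both against the live page. The helper names and the sample records are made up.

```python
import re
from collections import Counter


def parse_star_rating(class_attr):
    """Extract a 1-5 star rating from a class string such as 'allstar40 rating'.

    Assumption: the rating is encoded as 'allstarN0'; returns None otherwise.
    """
    match = re.search(r'allstar(\d)0', class_attr)
    return int(match.group(1)) if match else None


def comments_per_day(comments):
    """Count comments per calendar day from the first 10 chars of 'time'."""
    return Counter(c['time'][:10] for c in comments)


# Made-up sample records in the same shape as comments.json.
sample = [
    {'name': 'a', 'time': '2023-01-01 09:00:00', 'content': '...'},
    {'name': 'b', 'time': '2023-01-01 21:30:00', 'content': '...'},
    {'name': 'c', 'time': '2023-01-02 08:15:00', 'content': '...'},
]

print(len(sample))
print(comments_per_day(sample))
print(parse_star_rating('allstar40 rating'))
```

With Selenium, the class string would typically come from calling `get_attribute('class')` on the rating element, once a selector for that element has been verified on the page.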

I hope this tutorial helps. If you have any questions, feel free to leave a comment.