Python Web Scraping in Practice: Scraping All Douban Reviews of 《穿靴子的猫2》 (Puss in Boots: The Last Wish)

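This article uses Python's Selenium library to scrape all review data for the film 《穿靴子的猫2》 (Puss in Boots: The Last Wish) on Douban Movies, including each reviewer's name, the comment time, and the comment text.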

Step 1: Use Selenium to click through to the film's full review list

First, import the Selenium library and initialize the browser driver.

Step 2: Starting from 'https://movie.douban.com/subject/25868125/comments?start=0&limit=20&status=P&sort=new_score', scrape the reviewer name, comment time, and comment text on the first page.

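Next, use Selenium to locate and click the "All Reviews" (全部影评) button to open the comment page. Then parse the reviewer name, comment time, and comment text according to the structure of that page, and print them.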
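The Step 2 address appears to carry its paging state in the query string, so later pages can probably be addressed directly. The sketch below rebuilds that URL from its parts. It assumes that `start` is a zero-based offset and `limit` is the page size, which is how the parameter names read but is not confirmed by the site. The helper name `comments_url` is made up for this illustration.

```python
from urllib.parse import urlencode

# Base address of the comment listing, taken from the Step 2 URL.
BASE_URL = "https://movie.douban.com/subject/25868125/comments"


def comments_url(start=0, limit=20):
    """Build the listing URL for the page that begins at offset `start`."""
    # Assumption: `start` is a zero-based offset and `limit` is the page size.
    params = {"start": start, "limit": limit, "status": "P", "sort": "new_score"}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # First three pages, assuming each page advances `start` by `limit`.
    for page in range(3):
        print(comments_url(start=page * 20))
```

With the default arguments the function reproduces the Step 2 address exactly, so it can be passed to `driver.get()` in place of a hard-coded string.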

The code below is one possible Python implementation. It was not run against the live site, so the selectors may need adjusting:

# Import the Selenium modules used below
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.action_chains import ActionChains

# Initialize the browser driver (requires a local Chrome installation)
driver = webdriver.Chrome()

# Open the film's main page
url = 'https://movie.douban.com/subject/25868125/'
driver.get(url)

# Click the "All Reviews" link (selector untested against the live page)
btn_all_reviews = driver.find_element(By.CSS_SELECTOR, '.reviews.mod-hd h2 a')
btn_all_reviews.click()

# Scrape the comments on every page
while True:
    # Wait until at least one comment has loaded
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.comment-item')))

    # Parse the comments on the current page
    comments = driver.find_elements(By.CSS_SELECTOR, '.comment-item')
    for comment in comments:
        name = comment.find_element(By.CSS_SELECTOR, '.comment-info a').text
        comment_time = comment.find_element(By.CSS_SELECTOR, '.comment-info span').text
        content = comment.find_element(By.CSS_SELECTOR, '.comment p').text
        print(name, comment_time, content)
    
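    # Check for a next page; stop when the link is missing or disabled
    next_links = driver.find_elements(By.CSS_SELECTOR, '.next a')
    if not next_links or 'disabled' in (next_links[0].get_attribute('class') or ''):
        break
    next_page = next_links[0]

    # Click the "next page" link, then wait for the old page's comments to go stale
    ActionChains(driver).move_to_element(next_page).click().perform()
    WebDriverWait(driver, 10).until(EC.staleness_of(comments[0]))

# Close the browser when finished
driver.quit()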
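The loop above pages by clicking the "next" link. A different technique is to page by offset, requesting `start=0`, `start=20`, and so on until a page comes back empty. The sketch below shows only that control flow and is not tied to Selenium. `fetch_page` is a hypothetical callable that you would implement yourself, for example by loading the offset URL in the driver and parsing it. `fake_fetch` is a stand-in that serves made-up rows so the example runs offline. Treating an empty page as the end of the listing is an assumption about how the site behaves.

```python
from typing import Callable, Iterator, List, Tuple

# One scraped comment: (reviewer name, comment time, comment text).
Comment = Tuple[str, str, str]


def iter_comments(fetch_page: Callable[[int, int], List[Comment]],
                  limit: int = 20,
                  max_pages: int = 1000) -> Iterator[Comment]:
    """Yield comments page by page until a page comes back empty."""
    for page in range(max_pages):
        batch = fetch_page(page * limit, limit)
        if not batch:
            # An empty page is treated as the end of the listing (assumption).
            return
        yield from batch


def fake_fetch(start: int, limit: int) -> List[Comment]:
    """Hypothetical stand-in for a real page fetch: serves 45 made-up comments."""
    total = 45
    return [(f"user{i}", "2023-01-01", f"comment {i}")
            for i in range(start, min(start + limit, total))]


if __name__ == "__main__":
    rows = list(iter_comments(fake_fetch))
    print(len(rows))  # 45
```

The `max_pages` cap is a safety net so that a fetcher which never returns an empty page cannot loop forever.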

Notes

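When scraping website data, comply with the applicable laws and regulations and with the site's terms of use. Do not scrape maliciously or in ways that infringe on others' privacy. This code is for learning and reference only and must not be used for illegal purposes.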
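One practical way to respect a site's stated crawling rules is to read its robots.txt before fetching anything. The sketch below uses the standard library's `urllib.robotparser`. The rules in `SAMPLE_ROBOTS` are invented so the example runs offline and do not reflect any real site's policy. The helper name `build_parser` is also made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only so the example runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Return a RobotFileParser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS)
    print(rules.can_fetch("*", "https://example.com/public/page"))   # True
    print(rules.can_fetch("*", "https://example.com/private/page"))  # False
    print(rules.crawl_delay("*"))  # 5
```

Against a live site, the parser would instead be pointed at the real file with `set_url()` and `read()`, and any reported crawl delay could be honored with `time.sleep()` between page loads.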