The code is as follows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
import json

# Set up the browser
driver = webdriver.Chrome()

# Open the movie's comments page
driver.get("https://movie.douban.com/subject/25868125/comments?status=P")
sleep(1)

# Click the "All" button to expand the full list of comments.
# Selenium 4 removed find_element_by_xpath; use find_element(By.XPATH, ...)
all_btn = driver.find_element(By.XPATH, '//*[@id="comments-section"]/div[1]/h2/span/a')
all_btn.click()
sleep(2)

# Scrape the comments on the first page
comments = []
for i in range(1, 21):
    name_xpath = f'//*[@id="comments"]/div[{i}]/div[2]/h3/span[2]/a'
    time_xpath = f'//*[@id="comments"]/div[{i}]/div[2]/h3/span[2]/span[3]'
    # An XPath passed to Selenium must select an element, not a text node,
    # so the trailing /text() is dropped and .text is read instead
    content_xpath = f'//*[@id="comments"]/div[{i}]/div[2]/p/span'
    name = driver.find_element(By.XPATH, name_xpath).text
    pub_time = driver.find_element(By.XPATH, time_xpath).text  # "pub_time" avoids clashing with the time module
    content = driver.find_element(By.XPATH, content_xpath).text
    comment = {"name": name, "time": pub_time, "content": content}
    comments.append(comment)

# Scrape the comments on pages 2-3
for j in range(1, 3):
    # Locate the "next page" link by its text ("后页"); this is more robust
    # than the positional XPath //*[@id="paginator"]/a[3]
    next_page = driver.find_element(By.PARTIAL_LINK_TEXT, '后页')
    next_page.click()
    sleep(2)
    for i in range(1, 21):
        name_xpath = f'//*[@id="comments"]/div[{i}]/div[2]/h3/span[2]/a'
        time_xpath = f'//*[@id="comments"]/div[{i}]/div[2]/h3/span[2]/span[3]'
        # Trailing /text() dropped: Selenium cannot return text nodes
        content_xpath = f'//*[@id="comments"]/div[{i}]/div[2]/p/span'
        name = driver.find_element(By.XPATH, name_xpath).text
        pub_time = driver.find_element(By.XPATH, time_xpath).text
        content = driver.find_element(By.XPATH, content_xpath).text
        comment = {"name": name, "time": pub_time, "content": content}
        comments.append(comment)

# Save the comments as JSON
with open("comments.json", "w", encoding="utf-8") as f:
    json.dump(comments, f, ensure_ascii=False)

driver.quit()
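As a quick sanity check (not part of the original script), the saved JSON file can be loaded back and inspected. The snippet below first writes a small fabricated stand-in record to a separate file, `comments_sample.json`, purely so it runs on its own without the browser session and without touching the real `comments.json`:

```python
import json

# Fabricated stand-in record so this check runs standalone
sample = [{"name": "user1", "time": "2023-01-01", "content": "great movie"}]
with open("comments_sample.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Load the file back and verify its shape
with open("comments_sample.json", "r", encoding="utf-8") as f:
    comments = json.load(f)

print(len(comments))        # how many comments were saved
print(sorted(comments[0]))  # each record has keys: content, name, time
```

With the real scraper output, the same two lines confirm that all three pages (60 comments) were written and that every record carries the expected fields.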

The output is shown in the screenshot below:

[screenshot of the scraped comments: image-20210927152805992]

Note: only part of the data is shown.

Assignment: apply what you have learned to scrape information from a specified website using Python, and submit the code along with screenshots of the results. Specific requirements: 1. Use web scraping to collect the review data from every page of reviews for Puss in Boots 2 (《穿靴子的猫2》) on Douban Movies. Target URL: https://movie.douban.com/subject/25868125. Step 1: use the Selenium library to click through to the movie's full list of reviews. Step 2: starting from 'https://movie.douban.com/subject/25868125/comments?start=
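The `start=` parameter mentioned in the assignment pages through comments by offset, so instead of clicking the "next page" link the page URLs can be generated directly. A minimal sketch, assuming 20 comments per page (matching the `range(1, 21)` loops above) and reusing the `status=P` query seen in the scraped URL; the exact query-string layout is an assumption:

```python
BASE = "https://movie.douban.com/subject/25868125/comments"
PER_PAGE = 20  # the scraper above reads 20 comments per page

def page_url(page: int) -> str:
    """Return the URL for a 1-indexed comments page via the start= offset."""
    start = (page - 1) * PER_PAGE
    return f"{BASE}?start={start}&status=P"

# URLs for the first three pages, matching the pages scraped above
urls = [page_url(p) for p in range(1, 4)]
```

Each URL could then be opened with `driver.get(...)` in turn, which avoids depending on the paginator's link markup.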

Original source: https://www.cveoy.top/t/topic/g66u. Copyright belongs to the author. Please do not repost or scrape!
