Step 1:

First install the Selenium package (e.g. `pip install selenium`) and make sure Chrome is available. Example code:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Path to the Chrome binary (adjust for your machine)
chrome_path = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"

# Configure Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.binary_location = chrome_path

# Launch Chrome (Selenium 4 uses the `options` keyword;
# `chrome_options=` was removed)
driver = webdriver.Chrome(options=chrome_options)

# Open the movie page
driver.get("https://movie.douban.com/subject/25868125/")

# Click the "全部影评" (all reviews) link
# (`find_element_by_xpath` was removed in Selenium 4)
btn_all_comments = driver.find_element(By.XPATH, '//div[@id="interest_sectl"]//a[@href="#comments"]')
btn_all_comments.click()
```

Step 2:

Fetch the first page of comment data with the Requests and BeautifulSoup libraries. Example code:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/subject/25868125/comments?start=0&limit=20&status=P&sort=new_score'

# Douban rejects requests without a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

comments = []
for item in soup.select('.comment-item'):
    comment = {}
    # Commenter name, post time, and comment text
    comment['user'] = item.select('.comment-info > a')[0].text.strip()
    comment['time'] = item.select('.comment-info > span')[1].text.strip()
    comment['content'] = item.select('.comment > p')[0].text.strip()
    comments.append(comment)
```
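Before running against the live site, the CSS selectors can be sanity-checked offline against a small HTML fragment. The fragment below is a hypothetical stand-in mirroring the structure the selectors assume; real Douban markup may differ, so this only verifies the selector logic, not the site:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like one Douban comment item
sample = '''
<div class="comment-item">
  <span class="comment-info">
    <a href="https://www.douban.com/people/u1/">Alice</a>
    <span class="rating"></span>
    <span class="comment-time">2023-01-01</span>
  </span>
  <div class="comment"><p>Great movie.</p></div>
</div>
'''

soup = BeautifulSoup(sample, 'html.parser')
item = soup.select('.comment-item')[0]
# Same selectors as the scraping loop
user = item.select('.comment-info > a')[0].text.strip()
time_posted = item.select('.comment-info > span')[1].text.strip()
content = item.select('.comment > p')[0].text.strip()
print(user, time_posted, content)
```

Note that `.comment-info > span` index 1 skips the rating span, which is why the time lookup uses `[1]` rather than `[0]`.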

Step 3:

Use a loop to fetch pages 2 through 3 of the comments. Example code:

```python
import time

for page in range(2, 4):
    start = (page - 1) * 20
    url = 'https://movie.douban.com/subject/25868125/comments?start={0}&limit=20&status=P&sort=new_score'.format(start)

    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(response.text, 'html.parser')

    for item in soup.select('.comment-item'):
        comment = {}
        comment['user'] = item.select('.comment-info > a')[0].text.strip()
        comment['time'] = item.select('.comment-info > span')[1].text.strip()
        comment['content'] = item.select('.comment > p')[0].text.strip()
        comments.append(comment)

    # Pause between requests to avoid being rate-limited
    time.sleep(1)
```
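The pagination scheme above (20 comments per page, `start` offset) can be isolated into a small helper, which makes the offset arithmetic easy to check in isolation. The helper name `page_url` is illustrative, not part of any library:

```python
# URL template from the loop above; 20 comments per page
BASE = 'https://movie.douban.com/subject/25868125/comments?start={0}&limit=20&status=P&sort=new_score'

def page_url(page):
    """Return the comments URL for 1-indexed page number `page`."""
    return BASE.format((page - 1) * 20)

print(page_url(1))  # start=0
print(page_url(3))  # start=40
```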

Step 4:

Use the json library to store the data in a file. Example code:

```python
import json

# ensure_ascii=False keeps Chinese text readable in the output file
with open('comments.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False)
```
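To confirm the dump step preserves the Chinese text, the file can be read back and compared. This is a self-contained sketch with a hypothetical sample record standing in for the scraped `comments` list:

```python
import json

# Hypothetical sample data in place of the real scraped comments
comments = [{'user': '示例用户', 'time': '2023-01-01', 'content': '很好看'}]

with open('comments.json', 'w', encoding='utf-8') as f:
    json.dump(comments, f, ensure_ascii=False, indent=2)

# Read the file back to confirm the round trip is lossless
with open('comments.json', encoding='utf-8') as f:
    loaded = json.load(f)

print(loaded == comments)  # True
```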
Original task: use web scraping to collect all pages of review data for Puss in Boots 2 (《穿靴子的猫2》) on Douban Movies, at https://movie.douban.com/subject/25868125/. Step 1: use Selenium to click through to the movie's full review list. Step 2: starting from https://movie.douban.com/subject/25868125/comments?start=0&limit=20&status=P&sort=new_score, scrape the commenter names on the first page.

