Use Python and web-scraping techniques to fetch every page of user reviews for Puss in Boots: The Last Wish (《穿靴子的猫2》) on Douban. Target URL: https://movie.douban.com/subject/25868125/
The scraping code is as follows:
import time

import requests
from bs4 import BeautifulSoup

# Browser-like request headers so Douban does not reject the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Fetch page after page until a page comes back with no comments.
# (Without logging in, Douban only serves the first few hundred comments.)
start = 0
while True:
    url = f'https://movie.douban.com/subject/25868125/comments?start={start}&limit=20&sort=new_score&status=P'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    comments = soup.find_all('div', {'class': 'comment-item'})
    if not comments:  # no more pages
        break
    # Parse the fields of each review on the page
    for comment in comments:
        info = comment.find('span', {'class': 'comment-info'})
        username = info.find('a').text.strip()
        # The score is encoded in a class name such as 'allstar40'
        # (index 7 is the star count); comments posted without a
        # score have no such span, so guard against that.
        rating_span = info.find('span', class_=lambda c: c and c.startswith('allstar'))
        rating = rating_span['class'][0][7] if rating_span else 'N/A'
        content = comment.find('span', {'class': 'short'}).text.strip()
        print(f'Username: {username}  Rating: {rating}  Review: {content}')
    start += 20
    time.sleep(1)  # throttle requests to be polite to the server
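The positional class-index trick used above for the rating (taking character 7 of a class name like 'allstar40') is fragile. A more robust sketch, assuming Douban's usual 'allstarN0' encoding, pulls the digit out with a regular expression and returns None for unrated comments:

```python
import re

def parse_rating(class_list):
    """Extract the star rating from a Douban rating-span class list.

    Douban encodes the score in a class name such as 'allstar40'
    (40 -> 4 stars). Returns None when no rating class is present,
    which happens for comments posted without a score.
    """
    for cls in class_list:
        m = re.fullmatch(r'allstar(\d)0', cls)
        if m:
            return int(m.group(1))
    return None

# Example: parse_rating(['allstar40', 'rating']) -> 4
# Example: parse_rating(['comment-time']) -> None
```

This keeps working even if the order of classes on the span changes, and it degrades gracefully instead of raising an IndexError.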
This code uses the requests library to send HTTP requests and the BeautifulSoup library to parse the returned HTML. It walks through the review pages for Puss in Boots: The Last Wish on Douban 20 comments at a time and prints each reviewer's username, rating, and review text. Note that Douban caps how many comments anonymous clients can browse, so fetching the complete set requires a logged-in session cookie.
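Printing the results makes them hard to reuse. A minimal sketch of persisting the scraped fields with the standard-library csv module (the `rows` list here is hypothetical sample data standing in for the tuples collected by the scraping loop):

```python
import csv

# Hypothetical sample rows: (username, rating, content) tuples as
# collected by the scraping loop above.
rows = [('alice', '4', 'Great animation.'), ('bob', '5', 'Loved it.')]

# utf-8-sig adds a BOM so Excel opens the Chinese text correctly.
with open('comments.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['username', 'rating', 'content'])  # header row
    writer.writerows(rows)
```

Writing inside the loop (or collecting all rows first and writing once) both work; writing once at the end avoids a half-written file if the scrape is interrupted mid-page.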