Python Web Scraping in Practice: Fetching Every Page of Douban Reviews for Puss in Boots 2 (《穿靴子的猫2》)

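This article uses Python with the requests and BeautifulSoup libraries to scrape every page of Douban reviews for the film Puss in Boots 2 (《穿靴子的猫2》), collecting each reviewer's username, rating, and review text.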

Code Implementation

import requests
from bs4 import BeautifulSoup

# Set the request headers so the request looks like it comes from a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Fetch one page at a time and stop at the first page with no comments,
# rather than relying on a total page count read from the markup
start = 0
while True:
    url = f'https://movie.douban.com/subject/25868125/comments?start={start}&limit=20&sort=new_score&status=P'
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        # Stop on an error response (for example when the site refuses further pages)
        break
    soup = BeautifulSoup(response.text, 'html.parser')
    comments = soup.find_all('div', {'class': 'comment-item'})
    if not comments:
        break
    start += 20  # offset of the next page

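    # Parse the data for each review on the page
    for comment in comments:
        info = comment.find('span', {'class': 'comment-info'})
        username = info.find('a').text.strip()
        # Assumption: the star rating is encoded in a class name such as "allstar40";
        # reviewers who did not rate the film have no such span
        rating_span = info.find('span', class_=lambda c: c and c.startswith('allstar'))
        rating = rating_span['class'][0][7] if rating_span else 'N/A'
        content = comment.find('span', {'class': 'short'}).text.strip()
        print(f'Username: {username} Rating: {rating} Review: {content}')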
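
The page URLs above differ only in the `start` offset, which advances in steps of 20. As a minimal sketch, the URL construction could be pulled into a small helper. The name `build_comment_url` and its parameters are hypothetical and not part of the original script; the query parameters are copied from the URL used above.

```python
from urllib.parse import urlencode

# URL pattern taken from the script above; the movie ID is filled in per call
BASE = 'https://movie.douban.com/subject/{movie_id}/comments'


def build_comment_url(movie_id, page, page_size=20):
    """Return the comments URL for a zero-based page index (hypothetical helper)."""
    query = urlencode({
        'start': page * page_size,  # offset of the first comment on this page
        'limit': page_size,
        'sort': 'new_score',
        'status': 'P',
    })
    return f'{BASE.format(movie_id=movie_id)}?{query}'
```

Calling `build_comment_url('25868125', 2)` would give the URL whose offset is `start=40`, that is, the third page of 20 comments.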

Code Explanation

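The code first uses the requests library to send HTTP requests and retrieve the HTML of the Douban comment pages for Puss in Boots 2.
It then parses each HTML document with BeautifulSoup, which is also how it determines how far the pagination runs.
It loops over every page of comments and extracts the username, rating, and review text of each review.
Finally, it prints the parsed data to the console.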
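
The rating step is the most fragile part of the parsing, because the script reads the star count out of a CSS class name by character position. The sketch below shows one way that decoding could be isolated and made tolerant of reviews without a rating. It assumes the class names look like `allstar40` (four stars), which is what the index `[7]` in the script implies; the function name `decode_rating` is hypothetical.

```python
import re


def decode_rating(class_names):
    """Map class names such as ['allstar40', 'rating'] to a star count.

    Returns an int from 1 to 5, or None when no rating class is present
    (assumed to mean the reviewer left no rating).
    """
    for name in class_names or []:
        match = re.fullmatch(r'allstar(\d)0', name)
        if match:
            return int(match.group(1))
    return None
```

With BeautifulSoup, the list passed in would be the `class` attribute of a span inside `comment-info`, for example `decode_rating(span.get('class'))`.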

Summary

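This article has walked through a complete Python example that scrapes every page of Douban reviews for Puss in Boots 2. It should help readers learn the basic principles of web scraping in Python and how to implement them.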