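The following is a simple web scraper example:

Goal: scrape the movie title, rating, and number of ratings from Douban Movies, and save them to a file.

Steps:

  1. Import the required libraries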
import requests
from bs4 import BeautifulSoup
  2. Send a request to fetch the page content
url = 'https://movie.douban.com/top250'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
  3. Parse the page content
soup = BeautifulSoup(response.text, 'html.parser')
movie_list = soup.find('ol', class_='grid_view').find_all('li')
  4. Extract the required data
for movie in movie_list:
    # Movie title
    title = movie.find('span', class_='title').text
    # Rating
    rating = movie.find('span', class_='rating_num').text
    # Number of ratings (the fourth <span> inside the "star" block)
    num = movie.find('div', class_='star').find_all('span')[3].text
    # Append the record to the file, one movie per line
    with open('movies.txt', 'a', encoding='utf-8') as f:
        f.write('{} {} {}\n'.format(title, rating, num))
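
The extraction step above assumes every list item contains all three elements; if the page layout changes or the request is blocked, `find` returns `None` and the loop raises an `AttributeError`. The following is a minimal sketch of a more defensive version, not the original author's method. The helper name `parse_movies` and the `SAMPLE_HTML` markup are hypothetical, written only to mimic the structure the selectors expect so the function can be tried without a network request.

```python
from bs4 import BeautifulSoup


def parse_movies(html):
    """Return a list of (title, rating, votes) tuples found in the page HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    container = soup.find('ol', class_='grid_view')
    if container is None:
        # Layout changed or the request was blocked: nothing to extract.
        return []
    movies = []
    for item in container.find_all('li'):
        title = item.find('span', class_='title')
        rating = item.find('span', class_='rating_num')
        star = item.find('div', class_='star')
        spans = star.find_all('span') if star else []
        if title is None or rating is None or len(spans) < 4:
            continue  # skip entries that do not match the expected layout
        movies.append((title.text.strip(), rating.text.strip(), spans[3].text.strip()))
    return movies


# Hypothetical sample markup that mimics the structure the selectors expect.
SAMPLE_HTML = """
<ol class="grid_view">
  <li>
    <span class="title">Example Movie</span>
    <div class="star">
      <span class="rating5-t"></span>
      <span class="rating_num">9.0</span>
      <span content="10.0"></span>
      <span>12345 ratings</span>
    </div>
  </li>
  <li><span class="title">Incomplete Entry</span></li>
</ol>
"""

print(parse_movies(SAMPLE_HTML))
```

With a real response, the same function would be called as `parse_movies(response.text)`. Returning a list instead of writing inside the loop also keeps parsing and saving separate, so each can be checked on its own.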

Complete code:

import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.text, 'html.parser')
movie_list = soup.find('ol', class_='grid_view').find_all('li')

for movie in movie_list:
    # Movie title
    title = movie.find('span', class_='title').text
    # Rating
    rating = movie.find('span', class_='rating_num').text
    # Number of ratings (the fourth <span> inside the "star" block)
    num = movie.find('div', class_='star').find_all('span')[3].text
    # Append the record to the file, one movie per line
    with open('movies.txt', 'a', encoding='utf-8') as f:
        f.write('{} {} {}\n'.format(title, rating, num))

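After the script runs, the movie titles, ratings, and numbers of ratings are saved to movies.txt in the current directory, one movie per line. Because the file is opened in append mode, running the script again adds the same entries a second time.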
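
The script requests a single URL and writes space-separated text. Two possible extensions are sketched here under stated assumptions: building one URL per page of the list, and writing the records as CSV so that fields containing spaces stay unambiguous. The `start` query parameter and the page size of 25 are assumptions about how the site paginates, and the names `page_urls` and `write_movies` are hypothetical helpers, not part of the original code.

```python
import csv
import io

BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25  # assumption: each page lists 25 movies


def page_urls(total=250, page_size=PAGE_SIZE):
    """Build one URL per page using the `start` query parameter (assumed)."""
    return ['{}?start={}'.format(BASE_URL, start)
            for start in range(0, total, page_size)]


def write_movies(rows, fileobj):
    """Write (title, rating, votes) rows as CSV with a header line."""
    writer = csv.writer(fileobj)
    writer.writerow(['title', 'rating', 'votes'])
    writer.writerows(rows)


urls = page_urls()
print(len(urls), urls[0], urls[-1])

# An in-memory buffer stands in for a real file in this demonstration.
buffer = io.StringIO()
write_movies([('Example Movie', '9.0', '12345 ratings')], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `open('movies.csv', 'w', newline='', encoding='utf-8')`; write mode avoids duplicated entries on repeated runs, and `newline=''` lets the `csv` module control line endings. Pausing briefly between page requests is a reasonable courtesy to the site.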

Python web scraper example: scraping data from the Douban Movies site

Original address: https://www.cveoy.top/t/topic/n0WE Copyright belongs to the author. Do not repost or scrape!
