Below is a simple Python example that scrapes book information from the book categories on Dangdang (当当网):

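import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'http://book.dangdang.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}


def fetch_soup(page_url):
    """Download a page and parse it; raises requests.HTTPError on a bad status."""
    response = requests.get(page_url, headers=headers, timeout=10)
    response.raise_for_status()
    # Pass raw bytes so BeautifulSoup can detect the encoding the page declares
    return BeautifulSoup(response.content, 'html.parser')


def attr_or_text(node, attr=None):
    """Return a node's attribute (or its stripped text), or None if the node is missing."""
    if node is None:
        return None
    if attr and node.has_attr(attr):
        return node[attr]
    return node.get_text(strip=True)


soup = fetch_soup(url)

# Collect every category link under "Book Categories" (图书分类).
# All CSS selectors in this script depend on the site's markup and may need updating.
category_links = []
for category in soup.select('.classify_books > li > a'):
    if category.get('href'):
        category_links.append(urljoin(url, category['href']))

# Visit each category page and collect book information
for link in category_links:
    category_soup = fetch_soup(link)

    # Collect the links to every book listed in this category
    book_links = []
    for book in category_soup.select('.book_list > ul > li > p.name > a'):
        if book.get('href'):
            book_links.append(urljoin(link, book['href']))

    # Visit each book page and extract its details
    for book_link in book_links:
        book_soup = fetch_soup(book_link)

        # Extract the book details; a field is None when its selector matches nothing
        title = attr_or_text(book_soup.select_one('.name_info > h1'), 'title')
        author = attr_or_text(book_soup.select_one('.messbox_info > span:nth-of-type(1) > a'), 'title')
        publisher = attr_or_text(book_soup.select_one('.messbox_info > span:nth-of-type(3) > a'), 'title')
        publish_date = attr_or_text(book_soup.select_one('.messbox_info > span:nth-of-type(4)'))
        price = attr_or_text(book_soup.select_one('.price_n > span:nth-of-type(1)'))

        # Print the book details
        print('Title:', title)
        print('Author:', author)
        print('Publisher:', publisher)
        print('Publication date:', publish_date)
        print('Price:', price)
        print('--------------------')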

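The code first fetches the Dangdang homepage and collects every category link under "Book Categories". It then visits each category link and collects the links to all books in that category. Finally, it visits each book link, extracts the book's details, and prints them.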
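
The link-collection steps described above can run into relative hrefs, non-page pseudo-links, and the same page linked more than once. A minimal sketch of one way to clean a list of hrefs before requesting them follows. It uses only the standard library; the helper name `normalize_links` and the sample hrefs are hypothetical, made up for illustration, and are not taken from the actual site.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url, keep only http(s) URLs, drop duplicates in order."""
    seen = set()
    links = []
    for href in hrefs:
        if not href:
            continue  # skip anchors that have no href attribute
        absolute = urljoin(base_url, href.strip())
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue  # skip javascript:, mailto:, and similar pseudo-links
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


# Hypothetical hrefs for illustration only (not real Dangdang URLs).
sample_hrefs = [
    '/list/example-1.html',                          # relative path
    'http://book.dangdang.com/list/example-1.html',  # same page, absolute form
    '//category.example.com/cp01.html',              # protocol-relative link
    'javascript:void(0);',                           # not a page
    None,                                            # anchor without href
]
links = normalize_links('http://book.dangdang.com/', sample_hrefs)
print(links)
```

Keeping the first occurrence of each URL preserves the order in which links appear on the page, so each category or book page would be requested at most once per list.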

Scraping Dangdang data with Python

Original source: https://www.cveoy.top/t/topic/btst. Copyright belongs to the author. Please do not repost or scrape.
