Scraping data from Dangdang with Python
Here is a simple Python example that scrapes book information from the book-category pages of Dangdang. Note that the CSS selectors depend on the site's current markup and may need updating if the page layout changes:
import time

import requests
from bs4 import BeautifulSoup

url = 'http://book.dangdang.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

response = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect all category links under the "book categories" section
category_links = []
for category in soup.select('.classify_books > li > a'):
    category_links.append(category['href'])

# Visit each category page and collect the book links on it
for link in category_links:
    response = requests.get(link, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    book_links = []
    for book in soup.select('.book_list > ul > li > p.name > a'):
        book_links.append(book['href'])

    # Visit each book page and extract the book's details
    for book_link in book_links:
        response = requests.get(book_link, headers=headers, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')

        title = soup.select_one('.name_info > h1')['title']
        author = soup.select_one('.messbox_info > span:nth-of-type(1) > a')['title']
        publisher = soup.select_one('.messbox_info > span:nth-of-type(3) > a')['title']
        publish_date = soup.select_one('.messbox_info > span:nth-of-type(4)').text.strip()
        price = soup.select_one('.price_n > span:nth-of-type(1)').text

        print('Title:', title)
        print('Author:', author)
        print('Publisher:', publisher)
        print('Publish date:', publish_date)
        print('Price:', price)
        print('--------------------')

        time.sleep(1)  # be polite: pause between requests to avoid hammering the server
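The `select_one` calls above raise an exception as soon as a selector finds nothing, which can abort the crawl partway through if a page uses a different layout. A small defensive helper (hypothetical name, shown here on a minimal HTML snippet) returns None instead of crashing:

```python
from bs4 import BeautifulSoup

def safe_extract(soup, selector, attr=None):
    """Return an attribute (or stripped text) of the first match, or None if absent."""
    node = soup.select_one(selector)
    if node is None:
        return None
    if attr is not None:
        return node.get(attr)
    return node.get_text(strip=True)

# Minimal demo: one selector matches, one does not
html = '<div class="name_info"><h1 title="Example Book">Example Book</h1></div>'
soup = BeautifulSoup(html, 'html.parser')
print(safe_extract(soup, '.name_info > h1', 'title'))        # matches
print(safe_extract(soup, '.messbox_info > span > a'))        # no match: None
```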
The code first fetches the Dangdang homepage and collects every category link under the "book categories" section. It then visits each category page to collect the book links in that category, and finally visits each book page to extract and print that book's details.
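Printing is fine for a quick check, but the results are easier to reuse if written to a file. A minimal sketch using the standard-library `csv` module, assuming the scraped fields are collected into dictionaries (the file name and field names here are illustrative):

```python
import csv

# Stand-in for records accumulated by the scraping loop
books = [
    {"title": "Example Book", "author": "An Author",
     "publisher": "A Press", "publish_date": "2023-01", "price": "39.00"},
]

# utf-8-sig adds a BOM so spreadsheet software detects the encoding
with open("dangdang_books.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(
        f, fieldnames=["title", "author", "publisher", "publish_date", "price"])
    writer.writeheader()
    writer.writerows(books)
```

Appending each book's dictionary to a list inside the innermost loop, then writing once at the end, keeps the scraping and output steps separate.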