Python Web Scraping in Practice: Getting Book Information from Dangdang (Title, Promo Text, Author, Price, and More)
Below is a sample Python scraper that extracts book information from a Dangdang search results page:

```python
import requests
from lxml import etree
import pandas as pd


def extract(node, path):
    """Join all XPath matches under node into one stripped string ("" if none)."""
    matches = [m.strip() for m in node.xpath(path)]
    return ", ".join(m for m in matches if m)


def scrape_dangdang(keyword):
    url = "http://search.dangdang.com/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
    }
    # Pass the keyword via params so that requests URL-encodes it.
    response = requests.get(url, params={"key": keyword, "act": "input"},
                            headers=headers, timeout=10)
    response.raise_for_status()
    # Guess the encoding from the page body in case it is not served as UTF-8.
    response.encoding = response.apparent_encoding
    tree = etree.HTML(response.text)

    # Column name -> XPath relative to a single search result <li>.
    # The class names reflect the page layout this script was written for
    # and may need updating if the site changes.
    fields = {
        "Title": './/a[@class="pic"]/img/@alt',
        "Promo text": './/p[@class="detail"]/text()',
        "Author": './/p[@class="search_book_author"]/span[1]/a/text()',
        "Publication date": './/p[@class="search_book_author"]/span[2]/text()',
        "Publisher": './/p[@class="search_book_author"]/span[3]/a/text()',
        "Price": './/p[@class="price"]/span[@class="search_now_price"]/text()',
        "Rating": './/div[@class="star"]/a/text()',
        "Review count": './/div[@class="star"]/span/a/text()',
        # Same selector as "Promo text"; verify both against the live page.
        "Description": './/p[@class="detail"]/text()',
    }

    # Extract field by field within each <li> so every row has every column.
    # Collecting each field as a separate page-wide list produces lists of
    # different lengths whenever a book lacks a field, and pd.DataFrame
    # raises ValueError on such input.
    rows = [
        {name: extract(item, path) for name, path in fields.items()}
        for item in tree.xpath('//ul[@class="bigimg"]/li')
    ]

    df = pd.DataFrame(rows, columns=list(fields))
    # utf-8-sig lets Excel detect the encoding of Chinese text correctly.
    df.to_csv(f"dangdang_search_{keyword}_single_page.csv", index=False,
              encoding="utf-8-sig")
    print(df)


if __name__ == "__main__":
    keyword = "神经网络"  # "neural networks"
    scrape_dangdang(keyword)
```

Make sure the requests, lxml, and pandas libraries are installed. Running the code prints the extracted information and saves it to a CSV file.
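The price and review-count values come back from the page as raw strings. As a minimal sketch of how they might be turned into numbers, assuming prices look like `¥59.80` and review labels look like `1234条评论` (both formats are assumptions about the page, and `parse_price` and `parse_comment_count` are hypothetical helper names that are not part of the script above):

```python
import re


def parse_price(text):
    """Return the first decimal number in text as a float, or None if absent."""
    # Drop thousands separators before searching, e.g. "1,299.00" -> "1299.00".
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def parse_comment_count(text):
    """Return the first integer in text, or 0 if the label has no digits."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0


if __name__ == "__main__":
    # Sample strings are illustrative, not real scraped data.
    print(parse_price("¥59.80"))
    print(parse_comment_count("1234条评论"))
```

With pandas, either helper could be applied to the corresponding column through `Series.map` before the DataFrame is written to CSV.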
Original source: https://www.cveoy.top/t/topic/pqEm. Copyright belongs to the author. Do not repost or scrape.