Python Web Scraping in Practice: Extracting Novel Chapter Titles with requests and BeautifulSoup
The original code:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://book.zongheng.com/showchapter/1%20245125.html'
headers = {
    'uesr-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
    'cookie': 'aliyungf_tc=3262172781ae7c98c3607e8149b475d92b9350091fab4dee9de88face2943351; acw_tc=781bad3a16895079565272401e6b2d57414d3341f69820b68ca5f691628efc; ZHID=6D69BA91A5310EF5B553A524D3696E45; ver=2018; zhffr=0; zh_visitTime=1689507962983; sajssdk_2015_cross_new_user=1; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%221895e85a09646d-02d79b9c7a91fd-4c657b58-1821369-1895e85a097cf7%22%2C%22%24device_id%22%3A%221895e85a09646d-02d79b9c7a91fd-4c657b58-1821369-1895e85a097cf7%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%7D%7D; Hm_lvt_c202865d524849216eea846069349eb9=1689507963; Hm_lpvt_c202865d524849216eea846069349eb9=1689508520'
}

res = requests.get(url, headers=headers)

soup = BeautifulSoup('res.text', 'lxml')
lis = soup.find_all('li', class_='col-4')

for i in lis:
    a = i.find('a')
    print(a.text)
```

There are two errors in this code:

1. BeautifulSoup is initialized with the literal string `'res.text'`; it should be passed the variable `res.text` (the response body) instead.
2. The header key `'uesr-agent'` is misspelled; it should be `'user-agent'`.

Here is the corrected code:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://book.zongheng.com/showchapter/1%20245125.html'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
    'cookie': 'aliyungf_tc=3262172781ae7c98c3607e8149b475d92b9350091fab4dee9de88face2943351; acw_tc=781bad3a16895079565272401e6b2d57414d3341f69820b68ca5f691628efc; ZHID=6D69BA91A5310EF5B553A524D3696E45; ver=2018; zhffr=0; zh_visitTime=1689507962983; sajssdk_2015_cross_new_user=1; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%221895e85a09646d-02d79b9c7a91fd-4c657b58-1821369-1895e85a097cf7%22%2C%22%24device_id%22%3A%221895e85a09646d-02d79b9c7a91fd-4c657b58-1821369-1895e85a097cf7%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%7D%7D; Hm_lvt_c202865d524849216eea846069349eb9=1689507963; Hm_lpvt_c202865d524849216eea846069349eb9=1689508520'
}

res = requests.get(url, headers=headers)

soup = BeautifulSoup(res.text, 'lxml')
lis = soup.find_all('li', class_='col-4')

for i in lis:
    a = i.find('a')
    print(a.text)
```

Note that the `'lxml'` parser is used here, so the lxml package must be installed (e.g. `pip install lxml`).
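To check the extraction logic without hitting the live site, the same `find_all('li', class_='col-4')` pattern can be run against an inline HTML fragment. This is a sketch: the fragment and chapter titles below are made up to mimic the chapter-list markup, and the stdlib `'html.parser'` backend is used so it runs even when lxml is not installed.

```python
from bs4 import BeautifulSoup

# Made-up fragment mimicking the chapter-list markup of the target page.
html = """
<ul>
  <li class="col-4"><a href="/chapter/1">Chapter 1</a></li>
  <li class="col-4"><a href="/chapter/2">Chapter 2</a></li>
  <li class="other"><a href="/about">About</a></li>
</ul>
"""

# 'html.parser' ships with Python; swap in 'lxml' if it is installed.
soup = BeautifulSoup(html, 'html.parser')

titles = []
for li in soup.find_all('li', class_='col-4'):
    a = li.find('a')
    if a is not None:  # guard against list items that carry no link
        titles.append(a.text)

print(titles)
```

The `None` check on `li.find('a')` also hardens the original loop: on the real page, a `<li class="col-4">` without an `<a>` child would otherwise raise `AttributeError` on `a.text`.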
Original source: https://www.cveoy.top/t/topic/pPOv — copyright belongs to the author. Do not repost or scrape!