Parsing web pages with bs4
-
First, install the bs4 library (published on PyPI as beautifulsoup4) with the following command:

    pip install beautifulsoup4
-
Import the bs4 library and the requests library (used to send HTTP requests):

    from bs4 import BeautifulSoup
    import requests
-
Send an HTTP request to fetch the page content:

    url = 'https://www.example.com'
    response = requests.get(url)
    content = response.text
-
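A request can fail (bad URL, timeout, 4xx/5xx status), so it may help to wrap the fetch step in a small helper. The sketch below is one possible way to do that; the function name `fetch_html` and the 10-second default timeout are assumptions for illustration, not part of the original steps.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the response body as text, or None if the request fails.

    fetch_html is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise an exception for 4xx/5xx responses instead of parsing an error page.
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class for connection errors,
        # timeouts, invalid URLs, and HTTP errors raised above.
        print(f"Request for {url!r} failed: {exc}")
        return None
    return response.text
```

Calling `fetch_html('https://www.example.com')` would return the page text on success; checking the result for `None` before passing it to BeautifulSoup avoids parsing an empty or failed response.
-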
Parse the page content with BeautifulSoup:

    soup = BeautifulSoup(content, 'html.parser')
-
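To see what the parsed object gives you without touching the network, the same parsing call can be tried on an inline string. In this sketch the HTML is made up for illustration and stands in for `response.text`. It also shows that `.string` only returns text when a tag has a single child, while `get_text()` joins all nested text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page content.
html = """
<html>
  <head><title>Sample page</title></head>
  <body><p id="intro">Hello, <b>world</b>!</p></body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# <title> has exactly one text child, so .string returns that text.
page_title = soup.title.string

# <p id="intro"> has several children (text, <b>, text), so .string is None;
# get_text() concatenates the text of all descendants instead.
intro = soup.find("p", id="intro")
intro_string = intro.string
intro_text = intro.get_text()

print(page_title)
print(intro_string)
print(intro_text)
```
-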
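Locate elements by tag name, class name, id, and so on, then read their content or attributes:

    # Locate an element by tag name
    title = soup.title.string
    print(title)

    # Locate elements by class name
    items = soup.find_all('div', class_='item')
    for item in items:
        name = item.find('h3', class_='name').string
        price = item.find('span', class_='price').string
        print(name, price)

    # Locate an element by id
    logo = soup.find('img', id='logo')
    if logo is not None:  # find() returns None when nothing matches
        src = logo.get('src')
        print(src)
-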
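The lookups above assume every element exists. On real pages some items lack a field, and `find()` then returns `None`. The following sketch shows one way to guard against that, using CSS selectors (`select` and `select_one`) as an alternative to `find_all` and `find`. The sample HTML and the helper `text_or_none` are assumptions made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item has no price element.
html = """
<div class="item"><h3 class="name">Apple</h3><span class="price">1.20</span></div>
<div class="item"><h3 class="name">Pear</h3></div>
<img id="logo" src="/static/logo.png">
"""

soup = BeautifulSoup(html, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


products = []
for item in soup.select("div.item"):            # CSS selector: tag.class
    name = text_or_none(item.select_one("h3.name"))
    price = text_or_none(item.select_one("span.price"))
    products.append((name, price))

logo = soup.select_one("img#logo")              # CSS selector: tag#id
logo_src = logo.get("src") if logo is not None else None

print(products)
print(logo_src)
```
-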
Other common operations:

    # Get all links
    links = soup.find_all('a')
    for link in links:
        href = link.get('href')  # .get() returns None instead of raising KeyError if the attribute is missing
        text = link.string
        print(href, text)

    # Get all images
    imgs = soup.find_all('img')
    for img in imgs:
        src = img.get('src')
        alt = img.get('alt')
        print(src, alt)

    # Get table data
    table = soup.find('table', class_='table')
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        for col in cols:
            text = col.string
            print(text, end='\t')
        print()
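-
Printing is fine for a quick look, but collected links and table rows are usually easier to work with as Python lists. The sketch below is one possible approach: it keeps only anchors that carry an `href`, resolves relative links against a base URL with `urllib.parse.urljoin`, and gathers table cells row by row. The sample HTML and `base_url` are assumptions for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and markup used only for this example.
base_url = "https://www.example.com/catalog/"
html = """
<a href="/about">About</a>
<a name="top">No href here</a>
<a href="page2.html">Next</a>
<table class="table">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.80</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that have an href attribute,
# so indexing a["href"] cannot raise KeyError here.
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

rows = []
table = soup.find("table", class_="table")
if table is not None:
    for tr in table.find_all("tr"):
        # Collect header and data cells; get_text(strip=True) trims whitespace.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)

print(links)
print(rows)
```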
Original article: https://www.cveoy.top/t/topic/bnuw. Copyright belongs to the author. Please do not repost or scrape.