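1. First, install the bs4 library (published on PyPI as beautifulsoup4) together with requests, which the later steps import:

    pip install beautifulsoup4 requests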
    
  2. Import BeautifulSoup from bs4 and the requests library (used to send HTTP requests):

    from bs4 import BeautifulSoup
    import requests
    
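  3. Send an HTTP request to fetch the page content:

    url = 'https://www.example.com'
    response = requests.get(url, timeout=10)  # without a timeout the call can hang indefinitely
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    content = response.text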
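
Network requests can fail for many reasons (bad URL, timeout, HTTP error status), so it may help to wrap the fetch step in a small helper. The following is a minimal sketch, not a definitive implementation: the function name `fetch_html`, the 10-second default timeout, and the `User-Agent` string are all assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    # Hypothetical User-Agent string; some sites reject the library default.
    headers = {'User-Agent': 'bs4-tutorial-example/0.1'}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as exc:
        # RequestException is the base class for errors raised by requests.
        print(f'Request failed: {exc}')
        return None
    return response.text
```

A caller would then check for `None` before parsing, for example `content = fetch_html('https://www.example.com')`.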
    
  4. Parse the page content with BeautifulSoup:

    soup = BeautifulSoup(content, 'html.parser')
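
To experiment with the parsing step without making a network request, BeautifulSoup can be given an HTML string directly. The sketch below assumes a made-up `sample_html` snippet (not taken from any real site) and uses the built-in `html.parser`; third-party parsers such as `lxml` would need a separate install.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
sample_html = """
<html>
  <head><title>Sample Shop</title></head>
  <body>
    <p id="intro">Hello, <b>world</b>!</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

page_title = soup.title.string      # text of the <title> tag
intro = soup.find('p', id='intro')
intro_string = intro.string         # None, because <p> holds text plus a <b> child
intro_text = intro.get_text()       # concatenates all nested text

print(page_title)
print(intro_string)
print(intro_text)
```

Note that `.string` returns `None` when a tag has more than one child, so `get_text()` is often the safer choice for nested markup.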
    
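  5. Locate elements by tag name, class name, id, and so on, then read their content or attributes:

    # Locate an element by tag name
    title = soup.title.string if soup.title is not None else None
    print(title)
    
    # Locate elements by class name
    items = soup.find_all('div', class_='item')
    for item in items:
        name = item.find('h3', class_='name')
        price = item.find('span', class_='price')
        if name is not None and price is not None:  # find() returns None when nothing matches
            print(name.get_text(strip=True), price.get_text(strip=True))
    
    # Locate an element by id
    logo = soup.find('img', id='logo')
    if logo is not None:
        print(logo.get('src'))  # get() returns None if the attribute is missing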
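
Besides `find` and `find_all`, BeautifulSoup also accepts CSS selectors through `select` and `select_one`, which can express the same tag, class, and id lookups more compactly. The sketch below assumes a hypothetical product listing in `sample_html`; the class names and the `logo` id mirror the ones used in this step and are not from a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical product listing, for illustration only.
sample_html = """
<div class="item"><h3 class="name">Apple</h3><span class="price">1.20</span></div>
<div class="item"><h3 class="name">Pear</h3><span class="price">0.80</span></div>
<img id="logo" src="/static/logo.png">
"""

soup = BeautifulSoup(sample_html, 'html.parser')

products = []
for item in soup.select('div.item'):         # tag + class selector
    name = item.select_one('h3.name')        # first match, or None
    price = item.select_one('span.price')
    if name is not None and price is not None:
        products.append((name.get_text(strip=True), price.get_text(strip=True)))

logo = soup.select_one('img#logo')           # tag + id selector
logo_src = logo.get('src') if logo is not None else None

print(products)
print(logo_src)
```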
    
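  6. Other common operations:

    # Get all links
    links = soup.find_all('a', href=True)  # skip <a> tags that have no href
    for link in links:
        href = link['href']
        text = link.get_text(strip=True)
        print(href, text)
    
    # Get all images
    imgs = soup.find_all('img')
    for img in imgs:
        src = img.get('src')  # get() avoids a KeyError when the attribute is absent
        alt = img.get('alt', '')
        print(src, alt)
    
    # Get table data
    table = soup.find('table', class_='table')
    rows = table.find_all('tr') if table is not None else []
    for row in rows:
        cols = row.find_all('td')
        for col in cols:
            text = col.get_text(strip=True)
            print(text, end='\t')
        print()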
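
In practice, extracted links are often relative and table rows are easier to work with as structured records than as printed text. The following sketch shows one possible way to handle both. The helper names `extract_links` and `table_to_rows`, the sample HTML, and the base URL are hypothetical, and `table_to_rows` assumes the table has a single header row made of `<th>` cells.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(soup, base_url):
    """Return (absolute_url, link_text) pairs for every <a> that has an href."""
    return [
        (urljoin(base_url, a['href']), a.get_text(strip=True))
        for a in soup.find_all('a', href=True)
    ]


def table_to_rows(table):
    """Return data rows as dicts keyed by the text of the <th> header cells."""
    headers = [th.get_text(strip=True) for th in table.find_all('th')]
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(dict(zip(headers, cells)))
    return rows


# Hypothetical HTML used only for illustration.
sample_html = """
<a href="/about">About</a>
<a>no href</a>
<a href="https://other.example.org/x">External</a>
<table class="table">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.80</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
links = extract_links(soup, 'https://www.example.com/shop/')
rows = table_to_rows(soup.find('table', class_='table'))

print(links)
print(rows)
```

`urljoin` resolves relative paths against the page URL and leaves absolute URLs unchanged, which is why the base URL is passed in explicitly.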
    
Parsing web pages with bs4

Original source: https://www.cveoy.top/t/topic/bnuw. Copyright belongs to the author. Do not repost or scrape.
