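The following is a Python crawler that scrapes anime-style desktop wallpapers and saves them. Only images whose width and height are at least 1920x1080 are kept; anything smaller is skipped.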

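import os
from io import BytesIO
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from PIL import Image  # Pillow, used to read the real pixel dimensions

# Request headers that mimic a desktop browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Minimum accepted resolution
MIN_WIDTH, MIN_HEIGHT = 1920, 1080

# Fetch the page source
def get_html(url):
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()
    r.encoding = 'utf-8'
    return r.text

# Extract image links (lazy-loaded images usually keep the real URL in
# data-original; fall back to src otherwise)
def get_img_links(html, base_url):
    soup = BeautifulSoup(html, 'html.parser')
    img_links = []
    for img in soup.find_all('img'):
        link = img.get('data-original') or img.get('src')
        if link and link.lower().endswith('.jpg'):
            # Turn relative paths into absolute URLs
            img_links.append(urljoin(base_url, link))
    return img_links

# Download an image and save it only if it is at least MIN_WIDTH x MIN_HEIGHT
def download_img(img_link, path):
    r = requests.get(img_link, headers=headers, timeout=30)
    r.raise_for_status()
    # Content-Length is the file size in bytes, not the resolution,
    # so open the image and read its actual width and height
    with Image.open(BytesIO(r.content)) as img:
        width, height = img.size
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        print('Skipped (too small):', img_link, f'{width}x{height}')
        return
    with open(path, 'wb') as f:
        f.write(r.content)
    print('Image saved:', path)

# Main function
def main():
    url = 'https://www.acg-wall.com'
    os.makedirs('images', exist_ok=True)  # make sure the output folder exists
    html = get_html(url)
    img_links = get_img_links(html, url)
    for link in img_links:
        filename = os.path.basename(link)
        path = os.path.join('images', filename)
        try:
            download_img(link, path)
        except (requests.RequestException, OSError) as e:
            # Network errors and files Pillow cannot identify are skipped
            print('Failed:', link, e)

if __name__ == '__main__':
    main()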

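The crawler first requests https://www.acg-wall.com and fetches the page source, then parses it with the BeautifulSoup library, extracts every image link ending in .jpg, keeps only the images whose width and height are at least 1920x1080, and saves them to a local images folder.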
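Checking the resolution does not necessarily require downloading the whole wallpaper: in a JPEG, the width and height are stored in the SOF (Start Of Frame) segment near the beginning of the file. The following is a stdlib-only sketch of that idea, not part of the original crawler. It assumes the wallpapers are JPEGs, that the SOF segment falls within the first 64 KB (an assumption; large EXIF or ICC segments can push it further), and that the server honors HTTP Range requests. The function names `jpeg_size`, `meets_min_resolution` and `fetch_head_bytes` are hypothetical.

```python
import struct
import urllib.request
from typing import Optional, Tuple

# SOF (Start Of Frame) markers; their segment stores the image height and width.
# 0xC4 (DHT), 0xC8 (reserved) and 0xCC (DAC) sit in the same range but are not SOF.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) parsed from JPEG header bytes, or None if not found."""
    if data[:2] != b'\xff\xd8':  # every JPEG starts with the SOI marker
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None  # not positioned on a marker: malformed or truncated data
        marker = data[i + 1]
        if marker == 0xFF:  # padding byte before a marker
            i += 1
            continue
        if marker in (0xD9, 0xDA):  # EOI or SOS reached without an SOF segment
            return None
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:  # standalone markers, no length field
            i += 2
            continue
        # Segment length is big-endian and includes its own two bytes
        (seg_len,) = struct.unpack('>H', data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            if i + 9 > len(data):
                return None
            # SOF payload: precision (1 byte), then height and width (2 bytes each)
            height, width = struct.unpack('>HH', data[i + 5:i + 9])
            return width, height
        i += 2 + seg_len  # skip the whole segment, including APP/EXIF payloads
    return None


def meets_min_resolution(data: bytes, min_w: int = 1920, min_h: int = 1080) -> bool:
    """True only if the header reports at least min_w x min_h pixels."""
    size = jpeg_size(data)
    return size is not None and size[0] >= min_w and size[1] >= min_h


def fetch_head_bytes(url: str, n: int = 64 * 1024, timeout: float = 10) -> bytes:
    """Fetch at most the first n bytes of url via an HTTP Range request."""
    req = urllib.request.Request(url, headers={
        'User-Agent': 'Mozilla/5.0',
        'Range': f'bytes=0-{n - 1}',
    })
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # If the server ignores Range, read() still stops after n bytes
        return resp.read(n)
```

A link could then be pre-screened with `meets_min_resolution(fetch_head_bytes(link))` before the full download. Walking the segments by their length fields, instead of searching for the first `FF C0` byte pair, avoids picking up the dimensions of an EXIF thumbnail embedded in an APP1 segment. Note that `meets_min_resolution` treats "SOF not found in the bytes read" as a rejection; calling `jpeg_size` directly lets a caller fall back to a full download in that case.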


Original article: https://www.cveoy.top/t/topic/nhuj. Copyright belongs to the author. Please do not repost or scrape!
