Python Crawler: Fetching HD Anime Wallpapers (1920x1080 and Above)
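Below is a Python crawler that scrapes anime-style desktop wallpapers and saves them to disk. Only images that are 1920x1080 or larger are kept; anything smaller is skipped.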
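import os
from io import BytesIO

import requests
from bs4 import BeautifulSoup
from PIL import Image  # Pillow, used to read the real pixel dimensions

# Request headers that mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Minimum accepted resolution
MIN_WIDTH, MIN_HEIGHT = 1920, 1080


# Fetch the page source
def get_html(url):
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()
    r.encoding = 'utf-8'
    return r.text


# Collect image links from the data-original attribute of <img> tags
def get_img_links(html):
    soup = BeautifulSoup(html, 'html.parser')
    img_links = []
    for img in soup.find_all('img'):
        link = img.get('data-original')
        # Keep only .jpg links that contain the letter 'p' (site-specific filter)
        if link and link.endswith('.jpg') and 'p' in link:
            img_links.append(link)
    return img_links


# Download an image and save it only if it is at least 1920x1080
def download_img(img_link, path):
    r = requests.get(img_link, headers=headers, timeout=10)
    r.raise_for_status()
    # Read the actual pixel dimensions with Pillow; the Content-Length
    # header only reports the file size in bytes, not the resolution
    width, height = Image.open(BytesIO(r.content)).size
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        return False
    with open(path, 'wb') as f:
        f.write(r.content)
    print('Image saved:', path)
    return True


# Main function
def main():
    url = 'https://www.acg-wall.com'
    html = get_html(url)
    img_links = get_img_links(html)
    # Make sure the output folder exists before writing to it
    os.makedirs('images', exist_ok=True)
    for link in img_links:
        filename = os.path.basename(link)
        path = os.path.join('images', filename)
        download_img(link, path)


if __name__ == '__main__':
    main()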
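The listing above passes each link straight to requests and derives the file name with os.path.basename(link). That works for absolute URLs without query strings, but it may break if the page uses relative or protocol-relative links, or appends parameters such as ?size=full. A minimal sketch of a more defensive approach, using only the standard library, is shown below. The helper names resolve_link and filename_from_url and the example paths are hypothetical, not taken from the site.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def resolve_link(page_url, link):
    """Turn a relative or protocol-relative link into an absolute URL."""
    return urljoin(page_url, link)


def filename_from_url(url, default='image.jpg'):
    """Use the last path segment as the file name, ignoring query and fragment."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


# Example usage with made-up paths (assumptions, not real site URLs)
base = 'https://www.acg-wall.com'
print(resolve_link(base, '/uploads/p1.jpg'))
print(filename_from_url('https://www.acg-wall.com/uploads/p1.jpg?size=full'))
```

In main(), each link could be passed through resolve_link(url, link) before the download, and filename_from_url could stand in for os.path.basename.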
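The crawler first requests https://www.acg-wall.com and retrieves the page source. It then parses the page with BeautifulSoup, extracts every image link that ends in .jpg, keeps only the images whose width and height are at least 1920x1080, and saves them to the local images folder.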
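The resolution filter described above depends on knowing each image's pixel dimensions, which the file size alone does not reveal. As a rough sketch, assuming the files are ordinary JPEGs, the width and height can be read from the start-of-frame (SOF) segment with the standard library alone. The names jpeg_size, meets_minimum, and build_sample_header are hypothetical helpers introduced for illustration; build_sample_header only fabricates a tiny header so the example runs without network access.

```python
import struct

# Start-of-frame markers that carry the image dimensions
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data):
    """Return (width, height) from JPEG bytes, or None if not found."""
    if data[:2] != b'\xff\xd8':          # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # each segment must start with 0xFF
            return None
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte, skip it
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                     # standalone markers have no length field
            continue
        if marker in (0xD9, 0xDA):       # end of image or scan data before any SOF
            return None
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                return None
            # Layout: FF Cx | length (2) | precision (1) | height (2) | width (2)
            height, width = struct.unpack('>HH', data[pos + 5:pos + 9])
            return width, height
        # Any other segment: jump over it using its big-endian length field
        length = struct.unpack('>H', data[pos + 2:pos + 4])[0]
        pos += 2 + length
    return None


def meets_minimum(size, min_width=1920, min_height=1080):
    """True if size is known and at least min_width x min_height."""
    return size is not None and size[0] >= min_width and size[1] >= min_height


def build_sample_header(width, height):
    """Fabricate a minimal JPEG header (demonstration only, not a valid image)."""
    app0 = b'\xff\xe0' + struct.pack('>H', 16) + b'JFIF\x00' + b'\x00' * 9
    sof0 = b'\xff\xc0' + struct.pack('>HBHHB', 17, 8, height, width, 3) + b'\x00' * 9
    return b'\xff\xd8' + app0 + sof0


sample = build_sample_header(1920, 1080)
print(jpeg_size(sample), meets_minimum(jpeg_size(sample)))
```

Because the SOF segment normally sits near the start of the file, a check like this could in principle run on a partial download, though whether a given server supports partial requests is an assumption that would need to be verified.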
Original source: https://www.cveoy.top/t/topic/nhuj. Copyright belongs to the author. Do not repost or scrape.