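Python Web Scraping in Practice: Automatically Downloading Anime Wallpapers (1920x1080 and Above)
Below is a simple scraper example that fetches anime desktop wallpapers from an ACG site and saves them to a local folder. It discards wallpapers below 1920x1080 and keeps only those at or above that resolution.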
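import os
from io import BytesIO
from urllib.parse import urljoin

import requests
from PIL import Image  # third-party dependency (pip install Pillow), assumed to be available

url = 'https://www.acgarea.com/'
MIN_WIDTH, MIN_HEIGHT = 1920, 1080


def save_image(content, path):
    """Write already-downloaded image bytes to disk."""
    with open(path, 'wb') as f:
        f.write(content)


def get_image_urls():
    """Collect the value of each data-src attribute found in the page HTML."""
    response = requests.get(url, timeout=10)
    image_urls = []
    for line in response.text.splitlines():
        if 'data-src="' in line:
            # Take the text between the quotes that follow data-src=
            image_url = line.split('data-src="')[1].split('"')[0]
            image_urls.append(image_url)
    return image_urls


def main():
    os.makedirs('images', exist_ok=True)  # make sure the output folder exists
    for image_url in get_image_urls():
        # Resolve relative paths against the site URL; absolute URLs pass through
        image_url = urljoin(url, image_url)
        response = requests.get(image_url, timeout=10)
        if response.status_code != 200:
            continue
        try:
            # Read the real pixel dimensions from the image data
            width, height = Image.open(BytesIO(response.content)).size
        except OSError:
            continue  # the response body is not a readable image
        if width >= MIN_WIDTH and height >= MIN_HEIGHT:
            filename = image_url.split('/')[-1]
            path = os.path.join('images', filename)
            save_image(response.content, path)
            print(f'Saved image {filename}')


if __name__ == '__main__':
    main()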
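Splitting each line on the text data-src= in get_image_urls is fragile: it misses attributes written with single quotes and picks up at most one image per line. As a minimal sketch of a sturdier alternative, the standard library's html.parser can walk the tags instead. The names DataSrcCollector and extract_image_urls and the sample HTML are hypothetical, and restricting the search to img tags is an assumption about the page markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataSrcCollector(HTMLParser):
    """Collects data-src values from <img> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Assumption: lazy-loaded wallpapers are plain <img> elements
        if tag != 'img':
            return
        for name, value in attrs:
            if name == 'data-src' and value:
                # urljoin keeps absolute URLs and resolves relative ones
                self.image_urls.append(urljoin(self.base_url, value))


def extract_image_urls(html, base_url):
    """Return every img data-src URL in the given HTML, made absolute."""
    collector = DataSrcCollector(base_url)
    collector.feed(html)
    return collector.image_urls


# Made-up sample markup for illustration only
sample_html = '''
<div><img class="lazy" data-src="/uploads/a.jpg" src="placeholder.gif">
<img data-src='https://cdn.example.com/b.png'><img src="no-lazy.jpg"></div>
'''
print(extract_image_urls(sample_html, 'https://www.acgarea.com/'))
```

In the script, the HTML passed to extract_image_urls would be response.text, and its return value could stand in for the list built in get_image_urls.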
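The scraper first sends a GET request to fetch the site's HTML, then looks for every image element that carries a data-src attribute and collects its URL. Next, it iterates over the image URLs, downloads the wallpapers whose resolution is 1920x1080 or higher, and saves them to the local images folder.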
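An HTTP Content-Length header only reports the file size in bytes, so a resolution filter has to read the pixel dimensions from the image data itself. The sketch below shows one dependency-free way to do that for PNG files only, by reading the width and height stored in the IHDR chunk at the start of the file. The function names are hypothetical, and other formats such as JPEG would need a different parser or an imaging library such as Pillow.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_dimensions(data):
    """Return (width, height) from PNG bytes, or None if the header is not PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b'IHDR':
        return None
    # Width and height are two big-endian 32-bit integers at bytes 16..23
    width, height = struct.unpack('>II', data[16:24])
    return width, height


def is_wallpaper_sized(dimensions, min_width=1920, min_height=1080):
    """True only when both the width and the height meet the minimum."""
    if dimensions is None:
        return False
    width, height = dimensions
    return width >= min_width and height >= min_height


def fake_png_header(width, height):
    # Test data only: signature + IHDR length (13) + chunk type + size fields,
    # not a complete, valid PNG file
    return (PNG_SIGNATURE + struct.pack('>I', 13) + b'IHDR'
            + struct.pack('>II', width, height))


print(is_wallpaper_sized(png_dimensions(fake_png_header(2560, 1440))))  # True
print(is_wallpaper_sized(png_dimensions(fake_png_header(1280, 720))))   # False
```

Because only the first 24 bytes are inspected, a check like this could in principle run on a partial download, although whether a server honors partial requests is not something the script can assume.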
Original source: https://www.cveoy.top/t/topic/nhuk Copyright belongs to the author. Please do not repost or scrape!