Scraping Comic Images with Python: A Hands-on Tutorial and Code Walkthrough

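The following Python code scrapes comic images from the first 3 chapters of a given web page, taking the first 3 pages of each chapter, and saves each chapter's images into its own folder: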

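import os
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Comic index page
url = 'https://ac.qq.com/Comic/ComicInfo/id/651263'
# How many chapters, and how many pages per chapter, to download
MAX_CHAPTERS = 3
MAX_PAGES = 3
# Many sites reject the default requests User-Agent, so send a browser-like one
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def safe_name(name):
    """Replace characters that are not allowed in file or folder names."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip() or '_'


# Send the request and stop early on an HTTP error status
r = requests.get(url, headers=HEADERS, timeout=10)
r.raise_for_status()

# Parse the page
soup = BeautifulSoup(r.content, 'html.parser')

# Get the comic title (the CSS classes below reflect the page markup this
# tutorial was written against and may have changed since)
title_tag = soup.find('h1', {'class': 'works-intro-title'})
comic_name = safe_name(title_tag.text) if title_tag else 'comic'

# Collect each chapter's name and link
chapter_list = []
for c in soup.find_all('a', {'class': 'chapter-item'}):
    name_tag = c.find('div', {'class': 'works-chapter-item-name'})
    chapter_name = safe_name(name_tag.text if name_tag else c.get_text())
    # urljoin handles both relative and absolute href values
    chapter_url = urljoin(url, c['href'])
    chapter_list.append((chapter_name, chapter_url))

# Download the first MAX_PAGES images of each of the first MAX_CHAPTERS chapters
for chapter_name, chapter_url in chapter_list[:MAX_CHAPTERS]:
    # Create the folder <comic name>/<chapter name>
    folder_name = os.path.join(comic_name, chapter_name)
    os.makedirs(folder_name, exist_ok=True)
    # Fetch and parse the chapter page
    r = requests.get(chapter_url, headers=HEADERS, timeout=10)
    r.raise_for_status()
    chapter_soup = BeautifulSoup(r.content, 'html.parser')
    # Image links are read from the data-src attribute, falling back to src.
    # If the site inserts these <img> tags with JavaScript, they are missing
    # from the HTML that requests receives and this list will be empty.
    image_urls = [img.get('data-src') or img.get('src')
                  for img in chapter_soup.find_all('img', {'class': 'comic-img'})]
    # Download the images, naming each file by its page number
    for j, image_url in enumerate(image_urls[:MAX_PAGES], start=1):
        if not image_url:
            continue
        img_resp = requests.get(urljoin(chapter_url, image_url), headers=HEADERS, timeout=10)
        img_resp.raise_for_status()
        with open(os.path.join(folder_name, f'{j}.jpg'), 'wb') as f:
            f.write(img_resp.content)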

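In the code above, the requests package sends the HTTP requests and the BeautifulSoup package parses the HTML. We first use requests to send a GET request for the page, then parse the response with BeautifulSoup so that the information we need can be extracted from it: find() returns the first matching tag (or None if nothing matches), while find_all() returns a list of every matching tag.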
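To see what the find() and find_all() calls do without touching the network, the sketch below runs the same selectors on an inline HTML string. The HTML snippet, its /chapter/... links, and the parse_index helper are hypothetical and made up for illustration; the real page's markup may differ. The sketch also resolves links with urllib.parse.urljoin, swapped in for the plain string concatenation of the original version, so that both relative and absolute hrefs work.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the markup the tutorial's selectors expect;
# the real page may be structured differently.
SAMPLE_HTML = """
<h1 class="works-intro-title"> Sample Comic </h1>
<a class="chapter-item" href="/chapter/1">
  <div class="works-chapter-item-name">Chapter 1</div>
</a>
<a class="chapter-item" href="https://example.com/chapter/2">
  <div class="works-chapter-item-name">Chapter 2</div>
</a>
"""


def parse_index(html, base_url):
    """Return (comic title, [(chapter name, absolute chapter URL), ...])."""
    soup = BeautifulSoup(html, 'html.parser')
    # find() returns the first match or None, so check before reading text
    title_tag = soup.find('h1', {'class': 'works-intro-title'})
    title = title_tag.get_text(strip=True) if title_tag else ''
    chapters = []
    # find_all() returns every matching tag (an empty list if none match)
    for a in soup.find_all('a', {'class': 'chapter-item'}):
        name_tag = a.find('div', {'class': 'works-chapter-item-name'})
        name = name_tag.get_text(strip=True) if name_tag else a.get_text(strip=True)
        # urljoin resolves relative hrefs and leaves absolute ones unchanged
        chapters.append((name, urljoin(base_url, a['href'])))
    return title, chapters


if __name__ == '__main__':
    title, chapters = parse_index(SAMPLE_HTML, 'https://ac.qq.com/Comic/ComicInfo/id/651263')
    print(title)
    for name, link in chapters:
        print(name, link)
```

Because parse_index takes the HTML as a string, the same function can be given r.text from a real request. A page whose markup does not match the selectors yields an empty title and an empty chapter list instead of raising an exception.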

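We first get the comic title, then collect each chapter's name and link into the chapter_list list. Next we loop over the first 3 entries of chapter_list; for each chapter we create a folder named after the comic and the chapter and save that chapter's first 3 pages into it. The folder is created with os.makedirs, which creates every missing directory level in one call and, with exist_ok=True, does not raise an error if the folder already exists.

The images are downloaded with requests and written inside a with open(...) block in binary mode ('wb'), which closes the file automatically. Each image is named after its page number with a .jpg extension (1.jpg, 2.jpg, ...). Note that the downloaded bytes are written unchanged: the extension only names the file and does not convert the image to JPEG.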
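The folder and file handling can also be tried offline. The sketch below is a hypothetical helper, save_pages, that writes already-downloaded image bytes to <root>/<comic>/<chapter>/<page>.<ext>. Choosing the extension from the response's Content-Type header, the small EXTENSIONS table, and the safe_name character filter are assumptions added for illustration, not part of the original code.

```python
import os
import re
import tempfile

# Hypothetical mapping from Content-Type to file extension; extend as needed
EXTENSIONS = {'image/jpeg': '.jpg', 'image/png': '.png',
              'image/webp': '.webp', 'image/gif': '.gif'}


def safe_name(name):
    """Replace characters that are invalid in Windows or Unix file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip() or '_'


def save_pages(root, comic_name, chapter_name, pages):
    """Save (content_type, data) pairs as <root>/<comic>/<chapter>/<page><ext>.

    Returns the list of written paths. The bytes are written unchanged;
    the extension only labels the file, it does not convert the image.
    """
    folder = os.path.join(root, safe_name(comic_name), safe_name(chapter_name))
    # makedirs creates every missing level; exist_ok avoids an error on reruns
    os.makedirs(folder, exist_ok=True)
    paths = []
    for page_no, (content_type, data) in enumerate(pages, start=1):
        # Drop parameters such as "; charset=..." before the lookup,
        # and fall back to .jpg for unknown or missing types
        mime = content_type.split(';')[0].strip().lower()
        ext = EXTENSIONS.get(mime, '.jpg')
        path = os.path.join(folder, f'{page_no}{ext}')
        # 'wb' writes raw bytes; the with block closes the file even on errors
        with open(path, 'wb') as f:
            f.write(data)
        paths.append(path)
    return paths


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        written = save_pages(tmp, 'My Comic', 'Chapter 1: Start', [
            ('image/jpeg', b'\xff\xd8fake-jpeg'),
            ('image/png; charset=binary', b'\x89PNGfake-png'),
        ])
        for p in written:
            print(os.path.relpath(p, tmp))
```

In the scraper above, each entry of pages could be built as (img_resp.headers.get('Content-Type', ''), img_resp.content); a missing header gives an empty string and therefore the .jpg fallback.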