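Write a runnable Python program that scrapes skin images from the new-version Honor of Kings (王者荣耀) website. Main task: design a windowed application with the following features:
1. Load the libraries it needs: third-party packages such as requests, BeautifulSoup4, lxml, jieba, WordCloud and openpyxl, plus the standard-library sqlite3 module.
2. Scrape information from a website.
3. Save the information to an SQLite table or an Excel worksheet.
4. Generate charts or display the processed information.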
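The task asks for a windowed application, while the script below runs top to bottom without a window. One way to meet that requirement, sketched here under the assumption that the standard-library `tkinter` is available and that you wrap each script step in a function of your own, is to put every step behind a button and report its outcome in a status line. The helper names `run_step` and `launch_window` are hypothetical, not part of the original answer.

```python
from typing import Callable, Iterable, Optional, Tuple

# A step is a button label plus a zero-argument function that performs the work.
Step = Tuple[str, Callable[[], Optional[object]]]


def run_step(label: str, func: Callable[[], Optional[object]]) -> str:
    """Run one step and turn its outcome into a status-line message."""
    try:
        result = func()
    except Exception as exc:  # show network/parsing errors in the window instead of crashing it
        return f'{label} failed: {exc}'
    return f'{label} done' if result is None else f'{label} done: {result}'


def launch_window(steps: Iterable[Step]) -> None:
    """Open a window with one button per step and a status line underneath."""
    import tkinter as tk  # imported lazily so run_step works where tkinter is not installed

    root = tk.Tk()
    root.title('Honor of Kings skin scraper')
    status = tk.StringVar(value='Ready')
    for label, func in steps:
        # Default arguments bind the current label/func to this button's callback
        tk.Button(root, text=label, width=32,
                  command=lambda lbl=label, f=func: status.set(run_step(lbl, f))
                  ).pack(padx=12, pady=4)
    tk.Label(root, textvariable=status, anchor='w').pack(fill='x', padx=12, pady=8)
    root.mainloop()
```

Usage, with hypothetical step functions: `launch_window([('1. Scrape skins', scrape), ('2. Save to SQLite', save_db), ('3. Save to Excel', save_xlsx), ('4. Word cloud', show_wordcloud)])`. Keeping `run_step` free of GUI code means each step can still be tested or run from the command line.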
import requests
from bs4 import BeautifulSoup
import sqlite3
import openpyxl
from wordcloud import WordCloud
import jieba
import matplotlib.pyplot as plt
Scrape the skin list (names and image URLs) from the new-version Honor of Kings site
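# The page URL and CSS selectors come from the original answer and are unverified against the live site
url = 'https://pvp.qq.com/web201605/herolist.shtml'
res = requests.get(url, timeout=10)
res.raise_for_status()
res.encoding = 'gbk'  # decode the page as GBK, as in the original answer
soup = BeautifulSoup(res.text, 'html.parser')
skin_list = soup.select('.pic-pf-list > ul > li')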
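The listing above collects image URLs but never downloads the images the task asks for, and it depends on CSS selectors that may not match the live page. A route often used in community tutorials, stated here as an assumption rather than a documented API, is to read the hero index from `https://pvp.qq.com/web201605/js/herolist.json` and build each skin image URL from the hero's `ename` and the skin's position in the `|`-separated `skin_name` field. The JSON keys (`ename`, `cname`, `skin_name`), the `game.gtimg.cn` URL pattern and the numbering from 1 are all assumptions to check against the live site; the helper names are hypothetical.

```python
import json
import os
import re
from typing import Dict, List

# Assumed, unofficial endpoints; verify them before use, they may change without notice.
HERO_LIST_URL = 'https://pvp.qq.com/web201605/js/herolist.json'
SKIN_URL_TEMPLATE = ('https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'
                     '{ename}/{ename}-bigskin-{index}.jpg')


def skins_from_heroes(heroes: List[Dict]) -> List[Dict[str, str]]:
    """Turn hero entries (assumed keys: ename, cname, skin_name) into one record per skin."""
    records = []
    for hero in heroes:
        # skin_name is assumed to look like '正义爆轰|地狱岩魂'
        names = [n for n in hero.get('skin_name', '').split('|') if n]
        for index, skin in enumerate(names, start=1):  # image numbering assumed to start at 1
            records.append({
                'hero': hero['cname'],
                'skin': skin,
                'pic_url': SKIN_URL_TEMPLATE.format(ename=hero['ename'], index=index),
            })
    return records


def image_filename(record: Dict[str, str]) -> str:
    """Build a file name like '廉颇-正义爆轰.jpg', replacing characters Windows forbids."""
    return re.sub(r'[\\/:*?"<>|]', '_', f"{record['hero']}-{record['skin']}") + '.jpg'


def fetch_heroes(url: str = HERO_LIST_URL) -> List[Dict]:
    """Download the hero index (network access required)."""
    import requests  # imported here so the pure helpers above work without requests
    res = requests.get(url, timeout=10)
    res.raise_for_status()
    return json.loads(res.content.decode('utf-8-sig'))  # assumed UTF-8; tolerates a leading BOM


def download_images(records: List[Dict[str, str]], folder: str = 'skins') -> None:
    """Save each skin image into folder, skipping files that already exist."""
    import requests
    os.makedirs(folder, exist_ok=True)
    for record in records:
        path = os.path.join(folder, image_filename(record))
        if os.path.exists(path):
            continue
        res = requests.get(record['pic_url'], timeout=10)
        if res.status_code == 200:  # some numbered images may not exist; skip them
            with open(path, 'wb') as f:
                f.write(res.content)
```

Usage: `records = skins_from_heroes(fetch_heroes())` followed by `download_images(records)`. The network calls are kept separate from the parsing so the record-building logic can be checked offline.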
Save the information to an SQLite table
conn = sqlite3.connect('skin.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS skins (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, hero TEXT, skin TEXT, pic_url TEXT)''')
for skin in skin_list:
    name = skin.select_one('.pic-pf-name').text
    hero = skin.select_one('.pic-pf-title').text
    skin_name = skin.select_one('.pic-pf-name span').text
    pic_url = skin.select_one('.pic-pf-img')['src']
    c.execute("INSERT INTO skins (name, hero, skin, pic_url) VALUES (?, ?, ?, ?)",
              (name, hero, skin_name, pic_url))
conn.commit()
conn.close()
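Wrapping the same database steps in a function makes them callable from a GUI button and guarantees the connection is closed even if an insert fails. A minimal sketch that keeps the original `skins` schema but takes a list of record dicts; the `hero`/`skin`/`pic_url` keys (and the optional `name`) are an assumed shape, not variables from the original script.

```python
import sqlite3
from contextlib import closing
from typing import Dict, Iterable

SCHEMA = '''CREATE TABLE IF NOT EXISTS skins
            (id INTEGER PRIMARY KEY AUTOINCREMENT,
             name TEXT, hero TEXT, skin TEXT, pic_url TEXT)'''


def save_skins(records: Iterable[Dict[str, str]], db_path: str = 'skin.db') -> int:
    """Insert skin records and return the number of rows now in the table."""
    # Records without a separate 'name' fall back to the skin name for that column
    rows = [(r.get('name', r['skin']), r['hero'], r['skin'], r['pic_url']) for r in records]
    with closing(sqlite3.connect(db_path)) as conn:  # closes the connection on exit
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(SCHEMA)
            conn.executemany(
                'INSERT INTO skins (name, hero, skin, pic_url) VALUES (?, ?, ?, ?)', rows)
        return conn.execute('SELECT COUNT(*) FROM skins').fetchone()[0]
```

`executemany` sends all rows through one prepared statement, and the `with conn:` block replaces the manual `commit()`. Note that, like the original, repeated runs append duplicate rows; add a `UNIQUE` constraint if that matters.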
Save the information to an Excel worksheet
wb = openpyxl.Workbook()
ws = wb.active
ws.append(['名称', '英雄', '皮肤', '图片链接'])
for skin in skin_list:
    name = skin.select_one('.pic-pf-name').text
    hero = skin.select_one('.pic-pf-title').text
    skin_name = skin.select_one('.pic-pf-name span').text
    pic_url = skin.select_one('.pic-pf-img')['src']
    ws.append([name, hero, skin_name, pic_url])
wb.save('skin.xlsx')
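The Excel block above writes the rows but leaves the image URLs as plain text in default-width columns. A sketch using the same assumed record dicts (`hero`, `skin`, `pic_url`) that also freezes the header row, widens the columns and turns each URL into a clickable link; the function name and column widths are illustrative choices.

```python
from typing import Dict, List

from openpyxl import Workbook
from openpyxl.utils import get_column_letter

HEADERS = ['Hero', 'Skin', 'Image URL']


def save_skins_xlsx(records: List[Dict[str, str]], path: str = 'skin.xlsx') -> None:
    """Write one row per skin, with a frozen header row and clickable image links."""
    wb = Workbook()
    ws = wb.active
    ws.title = 'skins'
    ws.append(HEADERS)
    for r in records:
        ws.append([r['hero'], r['skin'], r['pic_url']])
        # The URL is in column 3 of the row just appended; make it a hyperlink
        ws.cell(row=ws.max_row, column=3).hyperlink = r['pic_url']
    ws.freeze_panes = 'A2'  # keep the header visible while scrolling
    for col, width in enumerate([12, 20, 80], start=1):
        ws.column_dimensions[get_column_letter(col)].width = width
    wb.save(path)
```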
Generate a word cloud
text = ''
for skin in skin_list:
    text += skin.select_one('.pic-pf-name span').text + ' '
words = ' '.join(jieba.cut(text))
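# Chinese text needs a CJK-capable font; 'simhei.ttf' is an assumed path, point it at a font file on your system
wc = WordCloud(font_path='simhei.ttf', background_color='white')
wc.generate(words)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()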
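Counting the words yourself and passing the table to `WordCloud.generate_from_frequencies` makes the counts visible (for example via `most_common`), reusable for a bar chart, and testable without rendering anything. A sketch assuming a list of skin-name strings and a CJK font file whose path you supply; the function names and the `min_len` filter are illustrative choices.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional


def skin_word_frequencies(skin_names: Iterable[str],
                          tokenize: Optional[Callable[[str], List[str]]] = None,
                          min_len: int = 2) -> Counter:
    """Count words across skin names; tokenize defaults to jieba.lcut."""
    if tokenize is None:
        import jieba  # third-party Chinese word segmenter, only needed for the default
        tokenize = jieba.lcut
    counts = Counter()
    for name in skin_names:
        for word in tokenize(name):
            word = word.strip()
            if len(word) >= min_len:  # drop single characters and whitespace
                counts[word] += 1
    return counts


def save_wordcloud(freqs: Counter, font_path: str, out_png: str = 'skin_wordcloud.png') -> None:
    """Render the frequency table as a word cloud image file."""
    from wordcloud import WordCloud
    wc = WordCloud(font_path=font_path, background_color='white', width=800, height=600)
    wc.generate_from_frequencies(freqs)
    wc.to_file(out_png)
```

Usage: `freqs = skin_word_frequencies(names)`, then `print(freqs.most_common(10))` and `save_wordcloud(freqs, 'simhei.ttf')`, where the font path is again an assumption about your system.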
Source: https://www.cveoy.top/t/topic/hnsf. Copyright belongs to the author; do not reproduce or scrape.