The steps for scraping Baidu hot-search titles from https://top.baidu.com/board?tab=realtime with Python regular expressions are as follows:
- Import the required modules:
import requests
import re
- Send the request and fetch the page content:
url = 'https://top.baidu.com/board?tab=realtime'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
response = requests.get(url, headers=headers)
html = response.text
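- Use a regular expression to match the hot-search titles in the page content (re.S lets "." match newlines as well, so a title that spans several lines is still captured):
pattern = re.compile(r'<a.*?class="title".*?>(.*?)</a>', re.S)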
titles = pattern.findall(html)
- Print the results:
for title in titles:
    print(title)
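The matching step can be tried offline before pointing it at the live page. The sketch below runs against a made-up HTML fragment (`sample_html` is hypothetical and is not Baidu's actual markup). It also assumes a slightly tighter pattern than the one in the steps: `[^>]*` instead of `.*?` inside the tag, so that a match cannot start in one tag and finish in another. The helper name `extract_titles` is introduced here for illustration; it strips surrounding whitespace and decodes HTML entities. If the live page uses different class names, the pattern would need adjusting after inspecting the page source.

```python
import html
import re

# Hypothetical fragment for illustration only; the real page markup may differ.
sample_html = '''
<div class="item">
  <a href="/s?wd=1" class="title" target="_blank">
    First &amp; foremost topic
  </a>
  <a href="/s?wd=1" class="more">details</a>
</div>
<div class="item">
  <a class="title" href="/s?wd=2">Second topic</a>
</div>
'''

# [^>]* keeps the attribute match inside a single <a ...> tag.
TITLE_PATTERN = re.compile(r'<a[^>]*class="title"[^>]*>(.*?)</a>', re.S)


def extract_titles(page_source):
    """Return the cleaned title strings found in page_source."""
    titles = []
    for raw in TITLE_PATTERN.findall(page_source):
        # Decode entities such as &amp; and drop surrounding whitespace.
        text = html.unescape(raw).strip()
        if text:
            titles.append(text)
    return titles


if __name__ == "__main__":
    for title in extract_titles(sample_html):
        print(title)
```

The parameter is called `page_source` rather than `html` so that it does not shadow the standard-library `html` module used for entity decoding.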
The complete code is as follows:
import requests
import re

# Target page and a browser-like User-Agent header
url = 'https://top.baidu.com/board?tab=realtime'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Fetch the page and keep its HTML as text
response = requests.get(url, headers=headers)
html = response.text

# Capture the text of every <a> element carrying class="title"
pattern = re.compile(r'<a.*?class="title".*?>(.*?)</a>', re.S)
titles = pattern.findall(html)

# Print one title per line
for title in titles:
    print(title)
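The script above assumes that the request always succeeds. A slightly more defensive variant might wrap the fetch in a function that sets a timeout and checks the HTTP status before the regular expression is applied. This is only a sketch: `fetch_html` and its `get` parameter are hypothetical names introduced here, and `get` exists solely so the function can be exercised without network access. By default it falls back to `requests.get`.

```python
def fetch_html(url, headers=None, get=None, timeout=10):
    """Fetch url and return the response body as text.

    get is a hypothetical injection point: any callable with the same
    call shape as requests.get. When omitted, requests.get is used.
    """
    if get is None:
        import requests  # imported lazily so offline tests do not need it
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    # Raise on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    return response.text
```

With this helper, the two fetching lines in the script could be replaced by `html = fetch_html(url, headers=headers)`; a failed request would then raise an exception rather than produce an empty list of titles.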