Using Python and pandas, scrape the data from the page http://www.sina.com.cn and display its contents. Add request headers and use UTF-8 encoding.

Below is sample code that fetches the Sina homepage and tries to read news titles and links from it:

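from io import StringIO

import pandas as pd
import requests

url = 'http://www.sina.com.cn/'
# Browser-like request headers; some sites reject the default requests User-Agent.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
response.encoding = 'utf-8'  # decode the response body as UTF-8

# read_html only parses <table> elements and raises ValueError if the page has none.
# Newer pandas versions expect a file-like object, hence StringIO.
dfs = pd.read_html(StringIO(response.text))
df = dfs[0]
# Keep the title/link columns only if this table actually has them.
cols = [c for c in ['新闻标题', '链接'] if c in df.columns]
print(df[cols] if cols else df)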

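Here, headers holds the request-header fields, and response.encoding = 'utf-8' tells requests to decode the response body as UTF-8. pd.read_html parses the <table> elements in the page and returns a list of DataFrames, one per table, so dfs[0] is the first table found. It only sees <table> markup, raises ValueError when the page contains none, and needs an HTML parser such as lxml installed. Finally, only the 新闻标题 (news title) and 链接 (link) columns are meant to be displayed; these are assumed column names and have to match the headers that the table actually contains.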
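
Because pd.read_html only looks at <table> markup, headlines that sit in ordinary <a> links will not show up in its output. As a rough sketch of one alternative, the snippet below collects link text and href values with Python's standard html.parser module. The LinkCollector class, the extract_links helper, and the sample markup are all hypothetical illustrations, not part of the original answer, and the real structure of the Sina homepage may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (title, link) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished {'title': ..., 'link': ...} records
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            if title:  # skip image-only or empty links
                self.rows.append({'title': title, 'link': self._href})
            self._href = None


def extract_links(html):
    """Return a list of {'title', 'link'} dicts found in the HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup standing in for response.text.
sample_html = '''
<ul>
  <li><a href="http://example.com/a">First headline</a></li>
  <li><a href="http://example.com/b"><b>Second</b> headline</a></li>
  <li><a href="http://example.com/c"><img src="x.png"></a></li>
</ul>
'''

rows = extract_links(sample_html)
for row in rows:
    print(row['title'], row['link'])
```

With a live page, extract_links(response.text) would take the place of the sample markup, and the resulting list of dicts can be handed to pd.DataFrame(rows) to get the title and link columns as a DataFrame.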
