Python Web Scraping in Practice: Fixing json.loads() and String-Index Errors

import requests
import os
import json
from moviepy.editor import VideoFileClip, AudioFileClip
from bs4 import BeautifulSoup

# Fetch the data
base_url = 'http://www.zkk78.com/index.php/user/ajax_ulog/?ac=set&mid=1&id=4721&sid=1&nid=1&type=4'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.55',
    'Referer': 'http://www.zkk78.com/dongmanplay/4721-1-1.html',
    'Cookie': '__tins__21589017=%7B%22sid%22%3A%201689927814622%2C%20%22vd%22%3A%207%2C%20%22expires%22%3A%201689929681755%7D; __51laig__=7;'
}

response = requests.get(base_url, headers=headers)  # first request
print(response.status_code)

# Print the raw body to check whether the response is JSON
print(response.text)

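data = None  # stays None if the body cannot be parsed, so the check below never hits an undefined name
try:
    data = response.json()
    print(data)

except json.JSONDecodeError:
    print('The response body is not valid JSON!')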

# Parse the data
if isinstance(data, dict) and 'info' in data:
    json_list = data['info']
    print(json_list)

    # Parse json_list according to the structure the server actually returns.
    # For example, if json_list is a list, iterate over it and read each element's 'title' and 'play_url'.
    # The code below is for reference only.
    # for item in json_list:
    #     video_title = item.get('title', '') + '.mp4'
    #     video_url = item.get('play_url', '')
    #     print(video_title, video_url)
    #
    #     print('Downloading:', video_title)
    #     # second request
    #     video_data = requests.get(video_url, headers=headers).content
    #     with open(r'./视频/' + video_title, 'wb') as f:
    #         f.write(video_data)
    #     print('Download complete\n')
else:
    print('Unexpected data format, cannot parse')
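The commented-out loop above assumes json_list is a list of dicts carrying 'title' and 'play_url'. As a rough sketch under that same assumption, the validation can be separated from the download so it can be checked without touching the network. The helper name plan_downloads and its out_dir parameter are hypothetical, introduced here only for illustration.

```python
import os


def plan_downloads(json_list, out_dir='./视频'):
    """Build (file_path, url) pairs from a list of dicts, skipping unusable entries."""
    plans = []
    if not isinstance(json_list, list):
        return plans  # a str or dict here means the structure is not what we assumed
    for item in json_list:
        if not isinstance(item, dict):
            continue  # skip entries that cannot be read with .get()
        title = item.get('title', '')
        url = item.get('play_url', '')
        if title and url:
            plans.append((os.path.join(out_dir, title + '.mp4'), url))
    return plans


# Made-up sample data; the real 'info' value may look different.
sample = [{'title': 'ep1', 'play_url': 'http://example.com/ep1.mp4'}, 'bad entry']
print(plan_downloads(sample))
```

Each returned pair could then be passed to requests.get and open() as in the commented-out loop, after creating the target directory with os.makedirs(out_dir, exist_ok=True).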

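Error analysis:

TypeError: string indices must be integers is raised because the code indexes a string as if it were a dictionary.

Root cause: assigning data = response.text stores the response body as a plain string, not a parsed JSON object, so data['info'] fails.
Fix: call response.json() (or json.loads(response.text)) to parse the body into Python objects before indexing it.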
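The error can be reproduced without any network access. The sketch below uses a made-up payload (the 'code', 'info', 'title', and 'play_url' values are assumptions for illustration, not the real server response) to show that the same lookup fails on the raw string and succeeds once the string has been parsed.

```python
import json

# Hypothetical payload; the real response from the site may look different.
raw_text = '{"code": 1, "info": [{"title": "ep1", "play_url": "http://example.com/ep1.mp4"}]}'

# Indexing the raw string with a key reproduces the TypeError.
try:
    raw_text['info']
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # exact wording varies slightly between Python versions

# After parsing, data is a dict, so the same lookup works.
data = json.loads(raw_text)
first_title = data['info'][0]['title']
print(first_title)
```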

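Code improvements:

Inspect the response: before parsing, print response.text to confirm that the server returned data in the expected format.
Exception handling: wrap the parsing in a try...except block that catches json.JSONDecodeError, so a non-JSON response is handled instead of crashing the script.
Type checking: before accessing data['info'], use isinstance(data, dict) and 'info' in data to confirm the data type and that the key exists.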
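The three improvements above can be combined in one small helper. This is a minimal sketch, not the only way to structure it; the function name extract_info is hypothetical, and it works on the body text so it can be tried without a live request.

```python
import json


def extract_info(text):
    """Return the 'info' value from a JSON response body, or None if it is unusable."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # the body is not JSON, for example an HTML error page
    # isinstance matters: for a str, "'info' in data" would be a substring test.
    if isinstance(data, dict) and 'info' in data:
        return data['info']
    return None


print(extract_info('{"info": [{"title": "ep1"}]}'))  # parsed list
print(extract_info('<html>login required</html>'))   # not JSON
print(extract_info('"info as a bare string"'))       # valid JSON, but not a dict
```

With requests, the helper would be called as extract_info(response.text).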

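Summary:

When handling network requests and JSON data, always check the type and format of what you receive and add the necessary error handling, so the code stays robust when the server returns something unexpected.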