Scraping Douban Movie Comments with Python: Fixing the 'NoneType' object has no attribute 'text' Error

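When scraping Douban movie comments with Python, you will often hit the error 'NoneType' object has no attribute 'text'. BeautifulSoup's find() returns None when no element on the page matches the search. This happens, for example, when a comment lacks that field or when the class name does not exist in the page's markup. Reading .text on that None then raises an AttributeError.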
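Below is a minimal reproduction of the error. It uses a small hand-written HTML fragment rather than a real Douban page, and the class names in it are only illustrative. The comment has no timestamp element, so find() returns None, and reading .text on that None fails.

```python
from bs4 import BeautifulSoup

# Hand-written fragment (illustrative): the comment has no timestamp element
html = '<div class="comment-item"><span class="short">Great film</span></div>'
soup = BeautifulSoup(html, 'html.parser')
item = soup.find('div', class_='comment-item')

# find() returns None because nothing matches
missing = item.find('span', class_='comment-time')
print(missing)  # None

try:
    missing.text
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object has no attribute 'text'
```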

The following code shows how to avoid this error, as part of a complete example:

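import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
from bs4 import BeautifulSoup

# Create the browser object
browser = webdriver.Chrome()

# Open the movie's page
browser.get('https://movie.douban.com/subject/25868125/')

# Locate the element and click it
# (Selenium 4.3+ removed find_element_by_css_selector; use find_element with By)
# Note: the requests calls below do not share this browser's cookies or session
btn = browser.find_element(By.CSS_SELECTOR, '#comments-section > div:nth-child(1) > h2 > span > a')
btn.click()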

# Request headers taken from the browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
    'Referer': 'https://movie.douban.com/subject/25868125/comments?status=P'
}

# URL of the first page of comments
url = 'https://movie.douban.com/subject/25868125/comments?start=0&limit=20&status=P&sort=new_score'

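# Loop over the pages
for i in range(3):
    # Send the request and get the page HTML
    response = requests.get(url, headers=headers)
    # Fail loudly on 4xx/5xx responses instead of silently parsing an error page
    response.raise_for_status()

    # Parse the page structure
    soup = BeautifulSoup(response.text, 'html.parser')

    # Locate each comment item (commenter name, comment time, and comment text)
    comments = soup.find_all('div', class_='comment-item')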
    data = []
    for comment in comments:
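        # Avoid "'NoneType' object has no attribute 'text'":
        # find() and select_one() return None when nothing matches
        # Commenter name: the link inside span.comment-info
        # (find('a', class_='') can also match the class-less avatar link)
        name = comment.select_one('span.comment-info > a')
        if name:
            name = name.text.strip()
        else:
            name = ''
        # Comment time ("sj"): Douban's markup used the class "comment-time"
        # when this was written; verify it in the browser's developer tools
        sj = comment.find('span', class_='comment-time')
        if sj:
            sj = sj.text.strip()
        else:
            sj = ''
        # Comment text
        content = comment.find('span', class_='short')
        if content:
            content = content.text.strip()
        else:
            content = ''
        data.append({'name': name, 'sj': sj, 'content': content})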

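    # Write this page's data to a file
    with open(f'comments_{i+1}.json', 'w', encoding='utf-8') as f:
        # Save as JSON; ensure_ascii=False keeps Chinese characters readable
        json.dump(data, f, ensure_ascii=False, indent=4)

    # Paging rule: 20 comments per page, so the next page starts at (i + 1) * 20
    url = 'https://movie.douban.com/subject/25868125/comments?start={}&limit=20&status=P&sort=new_score'.format((i + 1) * 20)

    # Wait 3 seconds before fetching the next page
    time.sleep(3)

# Close the browser
browser.quit()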
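The paging rule in the loop advances the start parameter by 20 comments per page. It can be factored into a small helper so that page i always maps to start = i * 20. The sketch below is only one way to do this. The function name comments_page_url is hypothetical. The query parameters are simply the ones that appear in the URLs above.

```python
from urllib.parse import urlencode

BASE = 'https://movie.douban.com/subject/25868125/comments'

def comments_page_url(page, limit=20):
    """Return the URL of a 0-based comment page (hypothetical helper)."""
    params = {
        'start': page * limit,  # offset of the first comment on this page
        'limit': limit,
        'status': 'P',
        'sort': 'new_score',
    }
    return BASE + '?' + urlencode(params)

for i in range(3):
    print(comments_page_url(i))
```

Building the URL from the page index at the top of each iteration means there is no separately updated url variable to keep in sync with the loop counter.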

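Code explanation:

Each field is guarded with an if statement. If the lookup returned an element, its text is extracted; otherwise the field is set to an empty string.
This avoids the 'NoneType' object has no attribute 'text' error caused by missing elements.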
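The three if/else blocks all repeat the same pattern, so they can be folded into one helper. This is a minimal sketch. The helper name text_or_default and the HTML fragment are illustrative and were not taken from a real Douban page. The class names comment-info, comment-time and short are assumptions about Douban's markup, so check them in the browser's developer tools.

```python
from bs4 import BeautifulSoup

def text_or_default(parent, selector, default=''):
    """Return the stripped text of the first CSS match under parent, or default if none."""
    element = parent.select_one(selector)
    return element.get_text().strip() if element is not None else default

# Hand-written fragment (illustrative): the second comment has no timestamp
html = '''
<div class="comment-item">
  <span class="comment-info"><a href="#">Alice</a>
    <span class="comment-time">2021-09-30</span></span>
  <span class="short">Loved it.</span>
</div>
<div class="comment-item">
  <span class="comment-info"><a href="#">Bob</a></span>
  <span class="short">Too long.</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
data = [
    {
        'name': text_or_default(c, 'span.comment-info > a'),
        'sj': text_or_default(c, 'span.comment-time'),  # '' when missing
        'content': text_or_default(c, 'span.short'),
    }
    for c in soup.find_all('div', class_='comment-item')
]
print(data)
```

A missing element now yields the default value instead of raising. The caller can also pass a different default, such as None, when it needs to tell "missing" apart from "empty".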

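Other notes:

Make sure the selenium, requests, and beautifulsoup4 libraries are installed (and Chrome is available for Selenium) before running the code. Selenium 4.3 and later no longer provide the find_element_by_* methods.
Respect the site's robots.txt rules when scraping, and do not access the site too frequently.
Keep the crawl rate reasonable to avoid putting excessive load on the server.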
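One way to follow the robots.txt advice is to use Python's standard urllib.robotparser module. It can check each URL before the request is sent, and the loop can sleep between requests. The rules below are a made-up example passed to parse() so that the snippet runs offline. In practice you would call set_url('https://movie.douban.com/robots.txt') followed by read(). Douban's actual rules may differ from this example.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; NOT Douban's real robots.txt
rules = """
User-agent: *
Disallow: /search
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

urls = [
    'https://movie.douban.com/subject/25868125/comments?start=0',
    'https://movie.douban.com/search?q=test',
]
allowed = []
for url in urls:
    if parser.can_fetch('*', url):
        allowed.append(url)
        # requests.get(url, headers=headers) would go here
        time.sleep(0.1)  # short for the demo; use a longer delay (e.g. 3 s) when crawling
print(allowed)
```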

I hope this helps you fix the error when scraping Douban movie comments. If you have any other questions, feel free to ask.
