Scraping Ctrip Reviews with Selenium and Writing Them to Excel: Code Walkthrough and Error Fix

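This article walks through a script that uses Selenium to scrape reviews from Ctrip and write them to an Excel file. It also fixes the error the script raises: AttributeError: 'WebElement' object has no attribute 'xpath'.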

Code Walkthrough

```python
from openpyxl import Workbook
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Create the browser object
options = webdriver.FirefoxOptions()
# Path to the Firefox executable (adjust for your machine)
options.binary_location = r'C:\Users\yuxin\AppData\Local\Mozilla Firefox\firefox.exe'
# options.add_argument('--headless')
# options.add_argument('--disable-gpu')
browser = webdriver.Firefox(options=options)

# Open the attraction page
browser.get('https://you.ctrip.com/sight/dayi3130/71986.html#ctm_ref=www_hp_his_lst')

# Create the workbook and write the header row
wb = Workbook()
sheet = wb.active
sheet['A1'] = 'Content'
sheet['B1'] = 'Time'

page_num = 1
while True:
    # Wait until the "next page" button is clickable
    next_button = WebDriverWait(browser, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, '.ant-pagination-next > span:nth-child(1)')))

    # Collect every comment block on the current page
    # (an absolute XPath like this breaks as soon as the page layout changes)
    comments = browser.find_elements(By.XPATH, '/html/body/div[2]/div[2]/div/div[3]/div/div[4]/div[1]/div[4]/div/div[5]/div')
    for comment in comments:
        # Locate the child element, then read its visible text via .text.
        # find_element must match an element, so the XPath cannot end in /text().
        content = comment.find_element(By.XPATH, './/div[@class="commentDetail"]').text
        contenttime = comment.find_element(By.XPATH, './/div[@class="commentTime"]').text
        sheet.append([content, contenttime])

    # Save after every page so the rows scraped so far survive a crash
    wb.save('./jc6301.xlsx')

    # Click "next page" through JavaScript
    browser.execute_script('arguments[0].click();', next_button)

    # Stop after 215 pages
    page_num += 1
    if page_num > 215:
        break

# Close the browser
browser.quit()
```

Error Fix

Error: AttributeError: 'WebElement' object has no attribute 'xpath'

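Cause: comment is a Selenium WebElement, and a WebElement has no xpath attribute or method. Calling comment.xpath(...), as you would on an lxml element or a Scrapy selector, therefore raises this AttributeError. In Selenium you locate child elements with comment.find_element(By.XPATH, ...) and read their visible text through the .text property. get_attribute('xpath') does not help: no HTML element has an attribute named xpath, so it returns None.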
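To try the "find the child element, then read its text" pattern without launching a browser, the sketch below uses Python's standard-library xml.etree.ElementTree as a stand-in for Selenium. This is an analogy, not Selenium: ElementTree supports only a small subset of XPath, and the markup is a made-up sample that borrows the commentDetail and commentTime class names from the script. The real Ctrip page structure is assumed, not reproduced, and the function name extract_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample markup (hypothetical): only the class names commentDetail and
# commentTime come from the article; the structure and text are invented.
SAMPLE = """
<div id="comments">
  <div class="commentItem">
    <div class="commentDetail">Great view from the top.</div>
    <div class="commentTime">2023-05-01</div>
  </div>
  <div class="commentItem">
    <div class="commentDetail">Crowded on weekends.</div>
    <div class="commentTime">2023-05-03</div>
  </div>
</div>
"""


def extract_rows(markup):
    """Return one [content, time] row per comment block (hypothetical helper)."""
    root = ET.fromstring(markup.strip())
    rows = []
    for comment in root.findall("./div[@class='commentItem']"):
        # Step 1: select the child *element* with a relative path ...
        detail = comment.find(".//div[@class='commentDetail']")
        posted = comment.find(".//div[@class='commentTime']")
        # Step 2: ... then read its text, the counterpart of WebElement.text
        rows.append([detail.text.strip(), posted.text.strip()])
    return rows


rows = extract_rows(SAMPLE)
print(rows)
```

Each row has the same [content, time] shape that the Selenium script passes to sheet.append.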

Fix:

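Replace the original comment.xpath(...) calls for the comment content and time with comment.find_element(By.XPATH, ...), then read the matched element's .text. The XPath must select the element itself rather than end in /text(), because find_element only returns elements and Selenium raises InvalidSelectorException when the expression selects a text node. Also use double quotes around the class names inside the XPath, so they do not close the single-quoted Python string: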

```python
content = comment.find_element(By.XPATH, './/div[@class="commentDetail"]').text
contenttime = comment.find_element(By.XPATH, './/div[@class="commentTime"]').text
```
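The quoting problem can be checked without Selenium or a browser. This small sketch passes a single-quoted and a double-quoted version of the extraction line to Python's built-in compile(), which parses source code without running it. The helper name compiles is hypothetical and exists only for this demo.

```python
# Both lines are stored as plain strings, so the broken one cannot stop this script.
# Nested single quotes: the string literal ends right before commentDetail.
BROKEN = "content = comment.find_element(By.XPATH, './/div[@class='commentDetail']/text()')"
# Double quotes inside the XPath keep the single-quoted Python string intact.
FIXED = "content = comment.find_element(By.XPATH, './/div[@class=\"commentDetail\"]').text"


def compiles(source):
    """Return True if source is valid Python syntax (the code is never executed)."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(BROKEN))  # False
print(compiles(FIXED))   # True
```

compile() only checks syntax, so the fixed line passes even though comment and By are undefined here; whether the XPath matches anything is decided at run time by Selenium.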

Summary

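Walking through the script and fixing its extraction lines covers the main steps of a Selenium scraper: waiting for elements, locating them with XPath or CSS selectors, reading their text, paging through results, and saving the data to Excel with openpyxl. I hope this article helps you learn and use Selenium more effectively.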