Scraping Taobao Laptop Listings with Python: Code Walkthrough and SEO Optimization

This Python script uses the Selenium library to drive a browser, scrape laptop listings from Taobao, and save them to a CSV file. It works as follows:

Import the required libraries and modules: selenium, csv, datetime and time (datetime is not actually used in the code below).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver import ChromeOptions
import csv
import datetime
import time

Set the URL to scrape.

url = 'https://www.taobao.com/'

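Create the CSV file with open() and write the header row. The ./csv directory must already exist; otherwise open() raises FileNotFoundError.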

with open('./csv/TB_笔记本电脑.csv', 'w', encoding='utf-8', newline='') as files:
    # Define the header columns and write them to the CSV file
    csv_obj = csv.DictWriter(files, fieldnames=['平台', '商品名称', '商品价格', '店铺名称', '店铺地址', '购买人数', '商品链接', '类别'])
    csv_obj.writeheader()
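The script assumes ./csv already exists. As a minimal sketch of the same CSV step (not the author's code), the helper below creates the parent directory with os.makedirs(..., exist_ok=True) before opening the file, then writes the header and rows with csv.DictWriter. write_rows, read_rows and the sample row are hypothetical names used only for illustration.

```python
import csv
import os
import tempfile

# Same columns as the script's header row
FIELDNAMES = ['平台', '商品名称', '商品价格', '店铺名称', '店铺地址', '购买人数', '商品链接', '类别']


def write_rows(path, rows):
    """Create the parent directory if missing, then write the header and rows to path."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no error if the directory already exists
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)  # keys missing from row are written as empty strings


def read_rows(path):
    """Read the CSV back as a list of dicts (all values come back as strings)."""
    with open(path, encoding='utf-8', newline='') as f:
        return list(csv.DictReader(f))


if __name__ == '__main__':
    # Hypothetical sample row, written under a fresh temporary directory
    demo_path = os.path.join(tempfile.mkdtemp(), 'csv', 'TB_笔记本电脑.csv')
    write_rows(demo_path, [{'平台': '淘宝', '商品名称': 'demo', '类别': 3}])
    print(read_rows(demo_path))
```

Because newline='' is passed to open(), the csv module controls line endings itself, which avoids blank lines between rows on Windows.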

Configure Chrome, launch it, and open the target URL.

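    # Build the Chrome options
    c_option = ChromeOptions()
    # Use the 'Default' profile inside a dedicated user-data directory,
    # so cookies (e.g. a Taobao login) persist between runs
    c_option.add_argument('--profile-directory=Default')
    c_option.add_argument('--user-data-dir=C:/Temp/ChromeProfile')
    # Stop Chrome from exposing navigator.webdriver = true, a common automation signal
    c_option.add_argument('--disable-blink-features=AutomationControlled')
    # Suppress Chrome's console log output
    c_option.add_experimental_option('excludeSwitches', ['enable-logging'])
    # Launch the browser (Selenium 4 takes options=; chrome_options= was removed)
    driver = webdriver.Chrome(options=c_option)
    # Implicit wait: each find_element call retries for up to 7 seconds before failing
    driver.implicitly_wait(7)
    driver.get(url)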

Locate the search box, type the keyword "笔记本电脑" ("laptop"), and click the search button.

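    # Type the keyword into the search box (id="q"), then click the search button;
    # send_keys() and click() return None, so there is nothing useful to assign
    driver.find_element(By.ID, 'q').send_keys('笔记本电脑')
    driver.find_element(By.CLASS_NAME, 'btn-search').click()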

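Define a function, get_goodsInfo(), that extracts the listings on the current page. find_elements() returns every listing card on the page; the loop then reads each item's title, price, shop name, shop location, number of buyers and link, stores them in a dict, and writes that dict to the CSV file with writerow().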

    def get_goodsInfo():
        # Get the listing cards on the current page
        goods_list = driver.find_elements(By.CLASS_NAME, 'ctx-box')

        for goods in goods_list:

            # Read each field by CSS class name; these selectors depend on Taobao's search-page markup
            goods_title = goods.find_element(By.CLASS_NAME, 'J_ClickStat').text
            goods_price = goods.find_element(By.CLASS_NAME, 'g_price-highlight').text
            goods_store = goods.find_element(By.CLASS_NAME, 'J_ShopInfo').text
            goods_location = goods.find_element(By.CLASS_NAME, 'location').text
            goods_people = goods.find_element(By.CLASS_NAME, 'deal-cnt').text
            goods_link = goods.find_element(By.CLASS_NAME, 'J_ClickStat').get_attribute('href')
            # Strip the currency sign from the price
            goods_price = goods_price.replace('¥', '')
            # Normalize the buyer count, e.g. '1万+人付款' -> '10000'; decimals such as '1.5万' are not handled
            txt = goods_people.replace('万', '0000').replace('+', '').replace('人付款', '')
            dict_goods = {
                '平台': '淘宝',
                '商品名称': goods_title,
                '商品价格': goods_price,
                '店铺名称': goods_store, 
                '店铺地址': goods_location, 
                '购买人数': txt,
                '商品链接': goods_link,
                '类别': 3
                }
            csv_obj.writerow(dict_goods)
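The string replacements above cannot correctly handle counts such as "1.5万+人付款" (1.5 × 10,000 buyers). One possible approach, sketched here with hypothetical helpers parse_price and parse_buyers, is to extract the number with a regular expression and scale it when it is followed by 万 (10,000). The text formats it accepts are assumptions based on the strings this script handles, not a documented Taobao format.

```python
import re


def parse_price(text):
    """Strip the currency sign (half- or full-width) and return the price as a float."""
    return float(text.replace('¥', '').replace('￥', '').strip())


def parse_buyers(text):
    """Turn buyer-count text such as '1.5万+人付款' or '500+人付款' into an int.

    '万' means 10,000; a trailing '+' and any suffix are ignored.
    Returns 0 when no number is found.
    """
    match = re.search(r'(\d+(?:\.\d+)?)\s*(万)?', text)
    if match is None:
        return 0
    value = float(match.group(1))
    if match.group(2):
        value *= 10000
    # round() guards against float artifacts such as 2999.9999999
    return int(round(value))
```

Inside get_goodsInfo() this would amount to writing, for example, '购买人数': parse_buyers(goods_people) instead of the replace() result.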

Loop over the first 10 result pages: on each page, call get_goodsInfo() to write its listings to the CSV file, then click the next-page button.

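    for page in range(1, 11):
        # Fixed delay so the results have time to render
        time.sleep(10)
        get_goodsInfo()
        print(f'Page {page} written to the CSV file')
        # Go to the next results page (not needed after the last one)
        if page < 10:
            driver.find_element(By.CLASS_NAME, 'icon-btn-next-2').click()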
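The loop above assumes there are always at least 10 result pages; if the next-page button is missing, find_element() raises NoSuchElementException and the script stops. A minimal sketch of a loop that ends early instead, with the Selenium calls abstracted into two callables (crawl_pages, scrape_page and go_next are hypothetical names):

```python
import time
from typing import Callable


def crawl_pages(scrape_page: Callable[[int], None],
                go_next: Callable[[], bool],
                max_pages: int = 10,
                delay: float = 0.0) -> int:
    """Scrape up to max_pages result pages and return how many were scraped.

    scrape_page(page) extracts one page; go_next() moves to the next page and
    returns False when there is none, which ends the loop early.
    """
    scraped = 0
    for page in range(1, max_pages + 1):
        time.sleep(delay)  # let the page render before reading it
        scrape_page(page)
        scraped += 1
        # Stop after the last wanted page, or when there is no next page
        if page == max_pages or not go_next():
            break
    return scraped
```

With Selenium, go_next could wrap driver.find_element(By.CLASS_NAME, 'icon-btn-next-2').click() in a try/except for selenium.common.exceptions.NoSuchElementException and return False when the button is not found, while scrape_page could be lambda page: get_goodsInfo().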

Close the browser.

    driver.quit()

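SEO optimization tips:

Title: use a more compelling title, for example "Scrape Taobao Laptop Listings with Python and Collect Large Amounts of Data Quickly".
Description: add a more detailed description covering what the code does, what it is for and its advantages, for example "Use Python and Selenium to automatically scrape Taobao laptop listings, collect key fields such as product name, price and shop name, and save the results as a CSV file for later analysis and processing."
Keywords: add more relevant keywords, for example "web scraping, data collection, e-commerce, Taobao, laptops, data analysis".
Content: add more explanation, such as a detailed description of each step, the purpose of each variable and function, and things to watch out for when running the code.
Images: add code-related images, such as screenshots of the scraped product list or of the program output.
External links: link to related sites and resources, such as the official Selenium website and Taobao.
Code blocks: format code as code blocks to make it easier to read.
Comments: encourage readers to leave comments and join the discussion.

These changes can help the article rank higher in search results, attract more readers, and reach a wider audience.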