Note: this code targets Python 3.

(1) Scraping a dynamic web page

1. Import the Selenium library

from selenium import webdriver

2. Create a browser object

browser = webdriver.Chrome()

3. Visit the target URL

url = 'https://movie.douban.com/subject/25868125/comments?start=0&limit=20&status=P&sort=new_score'
browser.get(url)

4. Locate the element and click it (use Selenium to click through to the film's full set of reviews)

from selenium.webdriver.common.by import By

# find_element_by_css_selector was removed in Selenium 4;
# use the By-based locator API instead
btn_more = browser.find_element(By.CSS_SELECTOR, '.lnk-tc')
btn_more.click()

(2) Analyzing the page and extracting data

1. Import the required libraries

import time
import json
from bs4 import BeautifulSoup

2. Request headers (note: Selenium drives a real browser that sends its own headers, so this dict is not used in the Selenium flow below; it is only needed if you fetch pages with the requests library instead)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
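If you do want Chrome itself to send this User-Agent, one option (a sketch, assuming the Chrome driver) is to pass it through ChromeOptions when creating the browser:

```python
from selenium import webdriver

# sketch: make the Chrome instance itself send the custom User-Agent
options = webdriver.ChromeOptions()
options.add_argument(
    'user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36')
browser = webdriver.Chrome(options=options)
```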

3. Obtain the rendered page source (Selenium has already loaded the page, so no new request is sent here)

html = browser.page_source

4. Parse the page structure

soup = BeautifulSoup(html, 'html.parser')

5. Extract the reviewer names

names = []
name_tags = soup.select('.comment-item .comment-info a')
for name_tag in name_tags:
    names.append(name_tag.text.strip())

6. Extract the review timestamps

times = []
time_tags = soup.select('.comment-item .comment-info span:nth-of-type(2)')
for time_tag in time_tags:
    times.append(time_tag.text.strip())

7. Extract the review content

comments = []
comment_tags = soup.select('.comment-item .comment-content span')
for comment_tag in comment_tags:
    comments.append(comment_tag.text.strip())
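The three selectors used above can be exercised offline against a minimal fixture. The markup below is a simplified, hypothetical stand-in for Douban's real comment structure, just to show what each selector picks out:

```python
from bs4 import BeautifulSoup

# hypothetical markup mimicking one Douban comment item (not the real page)
html = '''
<div class="comment-item">
  <span class="comment-info">
    <a href="#">Alice</a>
    <span class="rating"></span>
    <span class="comment-time">2023-01-01</span>
  </span>
  <p class="comment-content"><span>Great movie!</span></p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
# the <a> inside .comment-info holds the reviewer name
names = [t.text.strip() for t in soup.select('.comment-item .comment-info a')]
# the second <span> inside .comment-info holds the timestamp
times = [t.text.strip() for t in
         soup.select('.comment-item .comment-info span:nth-of-type(2)')]
# the <span> inside .comment-content holds the review text
comments = [t.text.strip() for t in
            soup.select('.comment-item .comment-content span')]
print(names, times, comments)
```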

(3) Scraping multiple pages

1. Loop over the remaining pages (pages 2 and 3 here; page 1 was scraped above)

for i in range(2, 4):

2. Build the URL for each page

    url = 'https://movie.douban.com/subject/25868125/comments?start={}&limit=20&status=P&sort=new_score'.format((i - 1) * 20)

3. Fetch and parse each page

    browser.get(url)
    time.sleep(2)
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')
    # append to the lists built in section (2); re-creating the lists here
    # would discard the pages already collected
    name_tags = soup.select('.comment-item .comment-info a')
    for name_tag in name_tags:
        names.append(name_tag.text.strip())
    time_tags = soup.select('.comment-item .comment-info span:nth-of-type(2)')
    for time_tag in time_tags:
        times.append(time_tag.text.strip())
    comment_tags = soup.select('.comment-item .comment-content span')
    for comment_tag in comment_tags:
        comments.append(comment_tag.text.strip())
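The pagination above relies only on the start query parameter, which advances by 20 per page; the offsets the loop generates can be checked without a browser:

```python
base = ('https://movie.douban.com/subject/25868125/comments'
        '?start={}&limit=20&status=P&sort=new_score')

# each page shows 20 comments, so page i begins at offset (i - 1) * 20
urls = [base.format((i - 1) * 20) for i in range(2, 4)]
for u in urls:
    print(u)
```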

(4) Saving the data

1. Write the data to a file

data = []
for i in range(len(names)):
    item = {}
    item['name'] = names[i]
    item['time'] = times[i]
    item['comment'] = comments[i]
    data.append(item)
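The index-based loop above assumes names, times, and comments have equal length. An equivalent formulation with zip (placeholder data here, not real scraped values) makes the pairing explicit and simply stops at the shortest list if the selectors ever return uneven results:

```python
# placeholder data standing in for the scraped lists
names = ['Alice', 'Bob']
times = ['2023-01-01', '2023-01-02']
comments = ['Great!', 'Not bad']

# zip pairs the three lists element-wise
data = [{'name': n, 'time': t, 'comment': c}
        for n, t, c in zip(names, times, comments)]
print(data)
```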

with open('comments.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

2. Each record in the JSON file has the following shape

{
    "name": "reviewer name",
    "time": "review time",
    "comment": "review content"
}
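As a quick sanity check, records of that shape round-trip cleanly through the json module (the values below are placeholders, not scraped data):

```python
import json

# hypothetical record matching the format above
record = {'name': 'some reviewer', 'time': '2023-01-01', 'comment': 'example text'}

# ensure_ascii=False keeps non-ASCII characters readable in the output
encoded = json.dumps([record], ensure_ascii=False, indent=4)
decoded = json.loads(encoded)
print(decoded)
```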

The complete code is as follows:

from selenium import webdriver
import time
import json
from bs4 import BeautifulSoup

# Scrape the dynamic page

browser = webdriver.Chrome()
url = 'https://movie.douban.com/subject/25868125/comments?start=0&limit=20&status=P&sort=new_score'
browser.get(url)
from selenium.webdriver.common.by import By  # would normally sit with the imports above; find_element_by_* was removed in Selenium 4
btn_more = browser.find_element(By.CSS_SELECTOR, '.lnk-tc')
btn_more.click()

# Analyze the page and extract the data

# note: Selenium sends the browser's own headers, so this dict is unused here;
# it is only needed for a requests-based variant
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')

names = []
name_tags = soup.select('.comment-item .comment-info a')
for name_tag in name_tags:
    names.append(name_tag.text.strip())

times = []
time_tags = soup.select('.comment-item .comment-info span:nth-of-type(2)')
for time_tag in time_tags:
    times.append(time_tag.text.strip())

comments = []
comment_tags = soup.select('.comment-item .comment-content span')
for comment_tag in comment_tags:
    comments.append(comment_tag.text.strip())

# Scrape the remaining pages

for i in range(2, 4):
    url = 'https://movie.douban.com/subject/25868125/comments?start={}&limit=20&status=P&sort=new_score'.format((i - 1) * 20)
    browser.get(url)
    time.sleep(2)
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')
    name_tags = soup.select('.comment-item .comment-info a')
    for name_tag in name_tags:
        names.append(name_tag.text.strip())
    time_tags = soup.select('.comment-item .comment-info span:nth-of-type(2)')
    for time_tag in time_tags:
        times.append(time_tag.text.strip())
    comment_tags = soup.select('.comment-item .comment-content span')
    for comment_tag in comment_tags:
        comments.append(comment_tag.text.strip())

# Save the data

data = []
for i in range(len(names)):
    item = {}
    item['name'] = names[i]
    item['time'] = times[i]
    item['comment'] = comments[i]
    data.append(item)

with open('comments.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

browser.quit()  # close the browser when finished

Original source: http://www.cveoy.top/t/topic/g7k6 — copyright belongs to the author. Do not repost or scrape!
