Scraping Ctrip Attraction Reviews: A Code Example for Dalian Attractions

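This example shows how to use Python with the requests and BeautifulSoup libraries to scrape reviews of Dalian attractions from the Ctrip website. The code is short, commented throughout, and suitable for beginners.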

import requests
from bs4 import BeautifulSoup

# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

# Request the attraction listing page
url = 'https://you.ctrip.com/sight/dalian4.html'
response = requests.get(url, headers=headers)

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the id and name of every attraction
sight_list = soup.select('.list_mod2 .list_mod2_box .rdetailbox h2 a')
sight_ids = [sight['href'].split('/')[-1].split('.')[0] for sight in sight_list]
sight_names = [sight.text for sight in sight_list]

# Loop over the attractions and fetch the reviews for each one
for i, sight_id in enumerate(sight_ids):
    sight_name = sight_names[i]

    # Build and send the request for the attraction page
    url = f'https://you.ctrip.com/sight/{sight_id}.html#ctm_ref=hod_sr_lst_dl_n_4_2'
    response = requests.get(url, headers=headers)

    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')

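    # Extract the review text; select_one returns None when nothing matches
    comments = soup.select('.comment_ctrip .comment_single')
    for comment in comments:
        text = comment.select_one('.text_comment')
        if text is not None:
            print(f'{sight_name}: {text.get_text(strip=True)}')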

Explanation:

First, we send a request to fetch the HTML of the attraction listing page.

url = 'https://you.ctrip.com/sight/dalian4.html'
response = requests.get(url, headers=headers)

Next, we parse the HTML and extract the id and name of each attraction from the links on the listing page.

sight_list = soup.select('.list_mod2 .list_mod2_box .rdetailbox h2 a')
sight_ids = [sight['href'].split('/')[-1].split('.')[0] for sight in sight_list]
sight_names = [sight.text for sight in sight_list]
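The id extraction relies on the last path segment of each link being a file name such as an id followed by `.html`. A minimal sketch of the same idea using the standard library is shown here; the helper name `sight_id_from_href` and the sample hrefs are hypothetical and only illustrate the parsing, not Ctrip's actual link format.

```python
import posixpath
from urllib.parse import urlparse


def sight_id_from_href(href):
    """Return the last path segment of href without its file extension."""
    path = urlparse(href).path            # drops any query string or fragment
    filename = posixpath.basename(path)   # last path segment, e.g. '12345.html'
    return posixpath.splitext(filename)[0]


# Hypothetical hrefs, for illustration only
print(sight_id_from_href('/sight/dalian4/12345.html'))
print(sight_id_from_href('https://you.ctrip.com/sight/dalian4/12345.html?from=list'))
```

Parsing the URL first means a query string or fragment appended to the link does not end up in the id.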

Then we loop over the attractions, request each attraction's page, and extract its reviews.

for i, sight_id in enumerate(sight_ids):
    sight_name = sight_names[i]
    url = f'https://you.ctrip.com/sight/{sight_id}.html#ctm_ref=hod_sr_lst_dl_n_4_2'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    comments = soup.select('.comment_ctrip .comment_single')
    for comment in comments:
        text = comment.select_one('.text_comment')
        if text is not None:
            print(f'{sight_name}: {text.get_text(strip=True)}')

Finally, we print each attraction's name together with the text of each of its reviews.
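One detail of the extraction step is worth isolating: BeautifulSoup's `select` returns a list of matching tags, while `select_one` returns the first match or `None`, and `get_text` returns the text inside a tag. The sketch below runs the same selectors against a small HTML string. The markup in `SAMPLE_HTML` and the function name `extract_comments` are assumptions made for illustration, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the selectors used in this example
SAMPLE_HTML = """
<div class="comment_ctrip">
  <div class="comment_single"><span class="text_comment"> Great sea view. </span></div>
  <div class="comment_single"><span class="author">anonymous</span></div>
  <div class="comment_single"><span class="text_comment">Crowded on weekends.</span></div>
</div>
"""


def extract_comments(html):
    """Return the stripped text of every review block that has a text node."""
    soup = BeautifulSoup(html, 'html.parser')
    texts = []
    for comment in soup.select('.comment_ctrip .comment_single'):
        node = comment.select_one('.text_comment')
        if node is not None:  # skip blocks without review text
            texts.append(node.get_text(strip=True))
    return texts


print(extract_comments(SAMPLE_HTML))
```

Testing selectors against a saved or hand-written snippet like this avoids sending repeated requests to the site while the selectors are being tuned.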

Code notes:

The code depends on the requests and BeautifulSoup libraries, which must be installed first. They can be installed with the following commands:
```
pip install requests
pip install beautifulsoup4
```
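The User-Agent header can be changed to another value to mimic a different browser.
The '.text_comment' selector, like the other selectors in the code, may need adjusting to match the page's actual structure before it returns review text.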
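Since both the User-Agent and the selectors are values that may need changing, one option is to keep them together at the top of the script. The following is a sketch under that assumption; the constant names and the `build_headers` helper are hypothetical and not part of the original code.

```python
# Values that may need adjusting, kept in one place
DEFAULT_USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
)
COMMENT_BLOCK_SELECTOR = '.comment_ctrip .comment_single'
COMMENT_TEXT_SELECTOR = '.text_comment'


def build_headers(user_agent=DEFAULT_USER_AGENT):
    """Return request headers carrying the given User-Agent string."""
    return {'User-Agent': user_agent}


headers = build_headers()
print(headers['User-Agent'])
```

The resulting `headers` dictionary can be passed to `requests.get(url, headers=headers)` exactly as in the main example.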

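Notes:

When scraping a website, follow its robots.txt file and avoid placing excessive load on the server.
When collecting data, take care not to infringe on users' privacy.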
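The robots.txt advice can be sketched with Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS` are made up for illustration and are not Ctrip's actual robots.txt; a real run would load the site's own file with `parser.set_url(...)` followed by `parser.read()`. The helper names `allowed_urls`, `polite_delay`, and `crawl_politely` are likewise hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
""".splitlines()

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS)


def allowed_urls(parser, urls, user_agent='*'):
    """Keep only the URLs that the parsed rules allow for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent='*', default=1.0):
    """Use the Crawl-delay from the rules if present, else a default pause."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl_politely(parser, urls, fetch, sleep=time.sleep):
    """Fetch each allowed URL in turn, pausing after every request."""
    delay = polite_delay(parser)
    results = []
    for url in allowed_urls(parser, urls):
        results.append(fetch(url))
        sleep(delay)
    return results
```

Here `fetch` is any callable that takes a URL, for example `lambda url: requests.get(url, headers=headers)`. Passing `sleep` as a parameter makes the pause easy to replace when testing.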

We hope this example is helpful!