Extracting Excel Data with Python, Pandas, and Regular Expressions and Saving It to a File

This code uses Python's pandas and re libraries to extract user review information from an Excel file named 汽车之家_秦plus_评论.xls.

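First, the code reads the Excel file with pd.read_excel() and stores the data in a pandas DataFrame named df.
It then iterates over the rows of the DataFrame and processes only the rows where both the 用户昵称 (user nickname) and 最满意 (most satisfied) columns are non-empty.
The 最满意 text is split into clauses on commas, either full-width or ASCII, using the regular expression r'[，,]', and the resulting list is stored under the 最满意 key of a dictionary named sentence_dict.
The 最不满意 (least satisfied) column is handled the same way; if it is empty, the 最不满意 entry is set to an empty list.
The code also reads the 智能化 (intelligent features) value from the next row and splits it with the same regular expression.
Finally, each sentence_dict is appended to a list named sentences, which is written to a text file named 打印结果.txt using a with open(...) as file: statement.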
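
To make the splitting step concrete, here is a minimal sketch of how r'[，,]' behaves on a single comment. The helper name split_clauses and the sample text are assumptions made for illustration; they are not part of the original code. Unlike the original, this sketch also drops empty fragments, which appear when a comment ends with a comma.

```python
import re

def split_clauses(text):
    """Split text on full-width or ASCII commas, dropping empty fragments."""
    parts = re.split(r'[，,]', str(text))
    # strip() removes surrounding whitespace; the filter discards empty pieces
    return [part.strip() for part in parts if part.strip()]

# Hypothetical comment text mixing both comma styles, with a trailing comma
sample = '外观好看，空间大, 油耗低，'
clauses = split_clauses(sample)
print(clauses)
```

Without the filter, the trailing comma would leave an empty string at the end of the list.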

The code is as follows:

import pandas as pd
import re

# Read the Excel file (raw string, so backslashes in the Windows path are not treated as escapes)
df = pd.read_excel(r'C:\Users\86186\Desktop\汽车之家_秦plus_评论.xls')

# Initialize the list of dictionaries
sentences = []

# Iterate over every row
for index, row in df.iterrows():
    if pd.notnull(row['用户昵称']) and pd.notnull(row['最满意']):
        sentence_dict = {}

        # Extract the 最满意 text and split it into clauses on commas
        max_satisfaction = str(row['最满意'])
        max_satisfaction = re.split(r'[，,]', max_satisfaction)
        sentence_dict['最满意'] = [sentence.strip() for sentence in max_satisfaction]

        # Extract the 最不满意 text and split it into clauses on commas
        if pd.notnull(row['最不满意']):
            min_satisfaction = str(row['最不满意'])
            min_satisfaction = re.split(r'[，,]', min_satisfaction)
            sentence_dict['最不满意'] = [sentence.strip() for sentence in min_satisfaction]
        else:
            sentence_dict['最不满意'] = []

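        # Extract the 智能化 text from the next row, if there is one.
        # Assumes the default RangeIndex, so the label `index` is also the row position.
        next_pos = index + 1
        if next_pos < len(df) and pd.notnull(df.iloc[next_pos]['智能化']):
            intelligence = str(df.iloc[next_pos]['智能化'])
            intelligence = re.split(r'[，,]', intelligence)
            sentence_dict['智能化'] = [sentence.strip() for sentence in intelligence]
        else:
            sentence_dict['智能化'] = []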

        # Append to the list of dictionaries
        sentences.append(sentence_dict)

# Save the results to a file
with open('打印结果.txt', 'w', encoding='utf-8') as file:
    for sentence_dict in sentences:
        file.write(str(sentence_dict) + '\n')
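
The next-row lookup for 智能化 can also be expressed without positional indexing. The following is a rough sketch, assuming a small stand-in DataFrame with the same column names; the column name next_智能化 and the sample values are hypothetical. It relies on Series.shift(-1), which aligns each row with the value from the row below and leaves a missing value on the last row instead of raising an IndexError.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data, using the same column names
df = pd.DataFrame({
    '用户昵称': ['user_a', None, 'user_b'],
    '最满意': ['外观好看，空间大', None, '油耗低'],
    '智能化': [None, '车机流畅,语音准', None],
})

# shift(-1) moves every value up one row, so row i sees the value of row i+1.
# The last row has no successor and receives a missing value.
df['next_智能化'] = df['智能化'].shift(-1)

next_values = df['next_智能化'].tolist()
print(next_values)
```

Inside the loop, row['next_智能化'] could then be checked with pd.notnull() in the same way as the other columns.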

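This code saves the results to a file named 打印结果.txt. Each sentence dictionary is written on its own line, and the file is saved with UTF-8 encoding. Make sure the pandas and xlrd libraries are installed (pandas uses xlrd to read .xls files), and change the file path to your actual path.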
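
Because str(sentence_dict) writes Python's dictionary representation, the output file is not straightforward to load back. One possible alternative, shown here only as a sketch, is to serialize each dictionary as one JSON object per line. The helper name to_json_lines and the sample record are assumptions for illustration; ensure_ascii=False keeps the Chinese text readable in the output.

```python
import json

# Hypothetical record in the same shape as the entries of `sentences`
sentences = [
    {'最满意': ['外观好看', '空间大'], '最不满意': [], '智能化': ['车机流畅']},
]

def to_json_lines(records):
    """Serialize each dictionary as a single JSON line, keeping non-ASCII text as-is."""
    return [json.dumps(record, ensure_ascii=False) for record in records]

lines = to_json_lines(sentences)

# Each line can be parsed back into the original dictionary
restored = [json.loads(line) for line in lines]
print(lines[0])
```

Each element of lines could be written with file.write(line + '\n') in place of str(sentence_dict).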
