Batch-counting keyword frequencies in .docx files with Python and exporting the results to Excel

This can be done with the 'docx' module (provided by the python-docx package), 'openpyxl' and 'os', plus 'collections' from the standard library.

First, install the python-docx and openpyxl packages with 'pip':

pip install python-docx openpyxl

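Next, use 'os' to list every .docx file in the folder and 'docx' to read each file's text. Then count how often each keyword occurs, collecting the totals in a 'Counter' (a dict subclass from 'collections'). Finally, write the results to an Excel file with 'openpyxl'. Splitting the text into words first (with a regular expression or 'str.split') works for English, but Chinese has no spaces between words, so a keyword such as '数据分析' usually ends up merged into a longer run of characters and is never matched on its own. Counting each keyword's occurrences in the text directly avoids this.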
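The "split into words" step is worth checking on Chinese text before writing the full script. The sketch below uses only the standard library. The sample sentence and the helper name 'count_by_substring' are made up for illustration. It compares tokenizing with r'\b\w+\b' against counting each keyword as a substring:

```python
import re
from collections import Counter

keywords = ['Python', '数据分析', '机器学习']
# Hypothetical sample text mixing Chinese and English, as a .docx might contain
text = '我用Python做数据分析，也在学习机器学习。Python很好用。'

# Approach 1: split into "words" with \w+ and keep tokens that equal a keyword.
# In Python 3, \w matches Han characters too, so each run between punctuation
# marks becomes a single token and no keyword is found (token_counts is empty).
tokens = re.findall(r'\b\w+\b', text)
token_counts = Counter(t for t in tokens if t in keywords)

# Approach 2 (hypothetical helper): count non-overlapping substring occurrences
def count_by_substring(text, keywords):
    return Counter({kw: text.count(kw) for kw in keywords})

substring_counts = count_by_substring(text, keywords)

print(tokens)
print(token_counts)
print(substring_counts)
```

Substring counting has the opposite weakness. An English keyword is also counted when it appears inside a longer word (for example, 'Python' inside 'CPython'). For mixed-language documents, this approach is therefore a trade-off, not a perfect fix.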

Here is an example:

import os
from collections import Counter
from docx import Document
from openpyxl import Workbook

# Keyword list ('数据分析' = data analysis, '机器学习' = machine learning)
keywords = ['Python', '数据分析', '机器学习']

# Collect every .docx file in the folder, skipping the "~$" lock files
# that Word creates for documents that are currently open
folder_path = r'C:\Documents'
file_list = [
    os.path.join(folder_path, name)
    for name in os.listdir(folder_path)
    if name.lower().endswith('.docx') and not name.startswith('~$')
]

# Count keyword frequencies.
# Chinese text has no spaces between words, so splitting with \w+ would merge
# a keyword such as '数据分析' into a longer run of characters; instead, count
# each keyword's non-overlapping occurrences in the text directly.
result = Counter()
for file in file_list:
    document = Document(file)
    text = '\n'.join(paragraph.text for paragraph in document.paragraphs)
    for keyword in keywords:
        result[keyword] += text.count(keyword)

# Write the results to Excel: one row per keyword, including keywords with zero hits
wb = Workbook()
ws = wb.active
ws.title = 'Result'
ws.append(['Keyword', 'Frequency'])
for keyword in keywords:
    ws.append([keyword, result[keyword]])
wb.save('result.xlsx')

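The code above counts how often each keyword in the list occurs across all .docx files in the folder. It then writes the totals to an Excel file named "result.xlsx". Adjust the keyword list and folder path as needed. Only body paragraphs are read, so text in tables, headers and footers is not counted.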
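Because the script adds all files into one total, it does not show which file each hit came from. The sketch below is one possible extension under a few assumptions. It gives one row per file, includes table-cell text, and appends a totals row. It assumes the same keyword list and folder layout as above. The function names 'count_keywords', 'docx_text', 'per_file_rows', 'totals_row' and 'write_rows' are made up for illustration, as are the sheet name 'PerFile' and the output file 'result_per_file.xlsx'. python-docx and openpyxl are imported inside the functions that need them, so the pure counting helpers can be tried without either package installed.

```python
import os
from collections import Counter


def count_keywords(text, keywords):
    """Count non-overlapping, case-sensitive substring occurrences of each keyword."""
    return Counter({kw: text.count(kw) for kw in keywords})


def docx_text(document):
    """Join the text of body paragraphs and table cells of a python-docx Document.

    Caveats: merged table cells may be returned more than once by row.cells,
    and tables nested inside cells are not visited.
    """
    parts = [p.text for p in document.paragraphs]
    for table in document.tables:
        for row in table.rows:
            parts.extend(cell.text for cell in row.cells)
    return '\n'.join(parts)


def per_file_rows(folder_path, keywords):
    """Return one row per .docx file: [file name, count of keyword 1, ...]."""
    from docx import Document  # python-docx, imported only when files are read

    rows = []
    for name in sorted(os.listdir(folder_path)):
        # Skip non-.docx files and Word's "~$" lock files
        if not name.lower().endswith('.docx') or name.startswith('~$'):
            continue
        document = Document(os.path.join(folder_path, name))
        counts = count_keywords(docx_text(document), keywords)
        rows.append([name] + [counts[kw] for kw in keywords])
    return rows


def totals_row(rows, keywords):
    """Sum each keyword column over all per-file rows."""
    return ['Total'] + [sum(row[i + 1] for row in rows) for i in range(len(keywords))]


def write_rows(rows, keywords, out_path='result_per_file.xlsx'):
    """Write the per-file table plus a totals row to a new Excel workbook."""
    from openpyxl import Workbook  # imported only when writing

    wb = Workbook()
    ws = wb.active
    ws.title = 'PerFile'
    ws.append(['File'] + list(keywords))
    for row in rows:
        ws.append(row)
    ws.append(totals_row(rows, keywords))
    wb.save(out_path)
```

Usage, with the same folder and keywords as the main script: call rows = per_file_rows(r'C:\Documents', keywords), then write_rows(rows, keywords).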