How to use Python to batch-count the frequency of a specific keyword in the txt files in a folder and export the results to Excel
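The steps are as follows:
- Import the required libraries: os, re, collections, xlwt
- Define a counting function count_word that takes a file path and a keyword and returns the keyword's frequency
- Traverse all txt files under the folder, calling count_word on each file to count the keyword
- Use the xlwt library to export the results to an Excel file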
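The counting step depends on how the keyword is matched against the text. As a minimal sketch (the helper name count_literal and the sample string are made up for illustration), escaping the keyword with re.escape makes characters such as . or + match literally instead of being read as regular-expression syntax:

```python
import re


def count_literal(text, keyword):
    """Count non-overlapping literal occurrences of keyword in text."""
    # re.escape turns regex metacharacters in the keyword into plain characters
    return len(re.findall(re.escape(keyword), text))


# Hypothetical sample text, only for demonstration
sample = "C++ and C# differ; C++ is older. a.b is not aXb."

print(count_literal(sample, "C++"))    # literal "C++"
print(count_literal(sample, "a.b"))    # literal "a.b" only
print(len(re.findall("a.b", sample)))  # unescaped: "." also matches the "X" in "aXb"
```

Without escaping, the pattern a.b matches both "a.b" and "aXb", so the count would likely be higher than intended for keywords that contain punctuation.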
The complete code is as follows:
import os
import re
import collections
import xlwt


# Count how often the keyword occurs in a single file
def count_word(file_path, keyword):
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    # Escape the keyword so it is matched literally, even if it contains
    # regex metacharacters such as '.' or '+'
    word_list = re.findall(re.escape(keyword), content)
    return collections.Counter(word_list)[keyword]


# Traverse all txt files under the folder and count the keyword in each
def traverse_folder(folder_path, keyword):
    result = []
    # os.walk also descends into subfolders
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.endswith('.txt'):
                file_path = os.path.join(root, file)
                result.append((file_path, count_word(file_path, keyword)))
    return result


# Export the results to an Excel file: column 0 is the file path, column 1 the count
def export_to_excel(result, output_path):
    workbook = xlwt.Workbook(encoding='utf-8')
    worksheet = workbook.add_sheet('Sheet1')
    for row, (file_path, count) in enumerate(result):
        worksheet.write(row, 0, file_path)
        worksheet.write(row, 1, count)
    workbook.save(output_path)


if __name__ == '__main__':
    folder_path = 'path/to/folder'
    keyword = 'some_keyword'
    result = traverse_folder(folder_path, keyword)
    export_to_excel(result, 'output.xls')
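Here, replace path/to/folder with the actual folder path, some_keyword with the actual keyword, and output.xls with the actual output file path. Note that xlwt writes the legacy .xls format, so the output file name should keep the .xls extension.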
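If several keywords need to be counted in one pass, or xlwt is not available, one possible variant uses only the standard library and writes a CSV file, which Excel can open. This is a sketch under assumptions, not part of the original code: the names count_folder and write_csv, the keywords, and the demo file are all hypothetical.

```python
import csv
import os
import tempfile


def count_folder(folder_path, keywords):
    """Return a list of (file_path, {keyword: count}) for every txt file."""
    rows = []
    for root, _dirs, files in os.walk(folder_path):
        for name in sorted(files):
            if name.endswith('.txt'):
                path = os.path.join(root, name)
                with open(path, 'r', encoding='utf-8') as f:
                    text = f.read()
                # str.count counts non-overlapping literal occurrences
                rows.append((path, {kw: text.count(kw) for kw in keywords}))
    return rows


def write_csv(rows, keywords, output_path):
    """Write a header row, then one row per file with one column per keyword."""
    # utf-8-sig adds a byte order mark, which usually helps Excel detect UTF-8
    with open(output_path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['file'] + list(keywords))
        for file_path, counts in rows:
            writer.writerow([file_path] + [counts[kw] for kw in keywords])


# Demo on a temporary folder with one made-up file
demo_keywords = ['python', 'excel']  # hypothetical keywords
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'a.txt'), 'w', encoding='utf-8') as f:
        f.write('python writes excel; python reads txt')
    demo_rows = count_folder(folder, demo_keywords)
    write_csv(demo_rows, demo_keywords, os.path.join(folder, 'output.csv'))
    for path, counts in demo_rows:
        print(os.path.basename(path), counts)
```

Compared with the xlwt version, this adds a header row and one column per keyword; the folder traversal works the same way.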
Original source: https://www.cveoy.top/t/topic/bjlK. Copyright belongs to the author. Do not repost or scrape.