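Batch-counting keyword frequencies in the txt files of a folder with Python
Below is a simple Python script that counts how often specific keywords appear across all txt files in a folder: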
import os

# Keywords to look for (matching is case-insensitive, so list them in lowercase)
keywords = ['apple', 'banana', 'orange']

# Count keyword frequencies in a single file
def count_word_freq(file_path, keywords):
    freq_dict = {}
    # Assumes the files are UTF-8 encoded; change the encoding if yours differ
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            # Tokens are split on whitespace only, so "apple," does not match "apple"
            for word in line.split():
                word = word.lower()
                if word in keywords:
                    freq_dict[word] = freq_dict.get(word, 0) + 1
    return freq_dict

# Count keyword frequencies across all txt files in a folder
def count_folder_word_freq(folder_path, keywords):
    freq_dict = {}
    for filename in os.listdir(folder_path):
        if filename.endswith('.txt'):
            file_path = os.path.join(folder_path, filename)
            file_freq_dict = count_word_freq(file_path, keywords)
            # Add this file's counts to the running totals
            for word in keywords:
                if word in file_freq_dict:
                    freq_dict[word] = freq_dict.get(word, 0) + file_freq_dict[word]
    return freq_dict

# Example: count apple, banana and orange in all txt files in the current folder
folder_path = './'
freq_dict = count_folder_word_freq(folder_path, keywords)
print(freq_dict)
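The script takes a folder path and a list of keywords as input. count_word_freq counts the keyword occurrences in a single txt file; count_folder_word_freq calls it for every txt file in the folder and returns a dictionary holding the combined count for each keyword. A keyword that never occurs is left out of the dictionary rather than reported as 0.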
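Because the script splits each line on whitespace, a keyword with punctuation attached (for example "apple," or "banana.") is not counted. The following is a minimal sketch of one possible variant, not the original author's method: it tokenizes with a regular expression and sums the counts with collections.Counter. The function names count_keywords_in_text and count_keywords_in_folder are hypothetical, and the sketch assumes UTF-8 files and keywords made of word characters, as in the English example above.

```python
import re
from collections import Counter
from pathlib import Path

# Runs of word characters; punctuation and whitespace act as separators.
WORD_RE = re.compile(r"\w+")


def count_keywords_in_text(text, keywords):
    """Count case-insensitive whole-token matches of each keyword in text."""
    wanted = {k.lower() for k in keywords}
    tokens = (m.group(0).lower() for m in WORD_RE.finditer(text))
    return Counter(t for t in tokens if t in wanted)


def count_keywords_in_folder(folder_path, keywords):
    """Sum keyword counts over every *.txt file directly inside folder_path."""
    total = Counter()
    for path in sorted(Path(folder_path).glob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        text = path.read_text(encoding="utf-8", errors="replace")
        total += count_keywords_in_text(text, keywords)
    return total
```

Calling count_keywords_in_folder('./', ['apple', 'banana', 'orange']) returns a Counter, which behaves like a dictionary and reports 0 for any keyword that was never found. Tokens are compared whole, so "pineapple" is not counted as "apple".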
Original address: https://www.cveoy.top/t/topic/mK4y. Copyright belongs to the author. Please do not repost or scrape!