Plotting the Training-Set Frequencies of the Top Ten Vocabulary Words with a Python Function

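This article shows how to write a Python function that plots, as a bar chart, how often the ten most frequent words in a vocabulary occur in a training set.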

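Assume the vocabulary is words, a pandas.Series (full type path pandas.core.series.Series) holding one word per entry, and the training set is X_train, also a pandas.Series, holding one text string per entry.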

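Function implementation:

import re

import matplotlib.pyplot as plt

def plot_top_words(words, X_train):
    # The 10 most frequent words in the vocabulary
    top_words = words.value_counts().head(10)
    # Occurrences of each word in the training set. re.escape makes the word a literal
    # pattern (str.count treats it as a regex); IGNORECASE lets 'banana' match 'Bananas'.
    word_freq = [X_train.str.count(re.escape(word), flags=re.IGNORECASE).sum()
                 for word in top_words.index]
    plt.bar(top_words.index, word_freq)  # draw the bar chart
    plt.xticks(rotation=45)  # rotate the x-axis labels by 45 degrees
    plt.show()  # display the chart

Function explanation:

Import the regex module and Matplotlib: import re, import matplotlib.pyplot as plt
Define the function: plot_top_words(words, X_train)
Get the 10 most frequent words in the vocabulary: top_words = words.value_counts().head(10)
Count each word's occurrences in the training set: word_freq = [X_train.str.count(re.escape(word), flags=re.IGNORECASE).sum() for word in top_words.index]. Matching is by substring and ignores case, so 'apple' also counts 'apples'.
Draw the bar chart: plt.bar(top_words.index, word_freq)
Rotate the x-axis labels: plt.xticks(rotation=45)
Display the chart: plt.show()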
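
The counting step is easier to inspect when it is separated from the plotting. Series.str.count interprets its argument as a regular expression and is case-sensitive by default, so the way each word is turned into a pattern decides what gets counted. The following is a minimal sketch under stated assumptions, not part of the original function: count_top_words, its n and whole_word parameters, and the sample data are all hypothetical names chosen for illustration.

```python
import re

import pandas as pd


def count_top_words(words, X_train, n=10, whole_word=False):
    """Count how often the n most frequent vocabulary words occur in X_train.

    Hypothetical helper: matching ignores case; with whole_word=True the
    word must be delimited by regex word boundaries.
    """
    top_words = words.value_counts().head(n).index
    counts = {}
    for word in top_words:
        pattern = re.escape(word)  # treat the word literally, not as a regex
        if whole_word:
            pattern = r"\b" + pattern + r"\b"
        counts[word] = int(X_train.str.count(pattern, flags=re.IGNORECASE).sum())
    return pd.Series(counts, name="count")


# Illustrative sample data (assumed, not from the article)
words = pd.Series(["apple", "banana", "apple", "orange", "apple", "banana"])
X_train = pd.Series(
    ["I like apples", "Bananas are yellow", "An apple a day", "Pineapple juice"]
)

substring_counts = count_top_words(words, X_train)
whole_word_counts = count_top_words(words, X_train, whole_word=True)
print(substring_counts)
print(whole_word_counts)
```

With substring matching, 'apple' is also counted inside 'apples' and 'Pineapple'. With whole_word=True, only the standalone 'apple' is counted. Which behavior is appropriate depends on how the vocabulary was built.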

Usage example:

import pandas as pd

words = pd.Series(['apple', 'banana', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'orange', 'apple'])
X_train = pd.Series(['I like apples', 'Bananas are yellow', 'Oranges are juicy', 'I love apples', 'Bananas are sweet'])

plot_top_words(words, X_train)
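
plt.show() needs a display to open a window. In a script running on a server or in a batch job, one alternative is to save the chart to a file instead. The following is a minimal sketch, assuming Matplotlib's non-interactive Agg backend is acceptable. The function name plot_top_words_to_file, its path and n parameters, and the output file name are hypothetical.

```python
import os
import re
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no display available, render off-screen
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_words_to_file(words, X_train, path, n=10):
    """Save the bar chart to `path` and return the plotted counts as a dict."""
    top_words = words.value_counts().head(n).index
    word_freq = [
        X_train.str.count(re.escape(word), flags=re.IGNORECASE).sum()
        for word in top_words
    ]
    fig, ax = plt.subplots()
    ax.bar(list(top_words), word_freq)
    ax.set_xlabel("Word")
    ax.set_ylabel("Occurrences in training set")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()  # keep the rotated labels inside the figure
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory
    return {word: int(freq) for word, freq in zip(top_words, word_freq)}


words = pd.Series(['apple', 'banana', 'orange', 'apple', 'banana',
                   'banana', 'orange', 'apple', 'orange', 'apple'])
X_train = pd.Series(['I like apples', 'Bananas are yellow', 'Oranges are juicy',
                     'I love apples', 'Bananas are sweet'])

out_path = os.path.join(tempfile.gettempdir(), "top_words.png")
freqs = plot_top_words_to_file(words, X_train, out_path)
print(freqs)
```

Returning the counts alongside the saved image makes it possible to check the numbers without opening the file.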

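With the code above, a single function call plots how often the most frequent vocabulary words occur in the training set. Because the sample vocabulary contains only three distinct words, the chart shows three bars rather than ten.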