Python Text Clustering Code: Using the K-Means Algorithm

Below is a simple Python example of text clustering with the K-Means algorithm:

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the text data
docs = ['This is the first document.',
        'This is the second document.',
        'And this is the third one.',
        'Is this the first document?']

# Convert the texts into TF-IDF feature vectors (common English stop words are dropped)
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(docs)

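# Cluster with K-Means; n_init is set explicitly to avoid the FutureWarning
# about its changing default in newer scikit-learn versions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)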

# Print the cluster label assigned to each document
for doc, label in zip(docs, kmeans.labels_):
    print(doc, ' - Cluster ', label)

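The output is as follows. The cluster numbers 0 and 1 are arbitrary labels, and the exact assignment may differ across scikit-learn versions: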

This is the first document.  - Cluster  0
This is the second document.  - Cluster  0
And this is the third one.  - Cluster  1
Is this the first document?  - Cluster  0

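In this example, we convert 4 short documents into TF-IDF feature vectors and use K-Means to group them into 2 clusters. In the output, documents 1, 2, and 4 fall into one cluster and document 3 into the other. Documents 1 and 4 contain exactly the same words (only word order and punctuation differ), so they receive identical vectors and always land in the same cluster.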