Write a K-means clustering method in Python

K-means clustering is a common unsupervised learning algorithm that partitions data into K clusters. Below is an example implementation in Python:

import numpy as np
import matplotlib.pyplot as plt

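def k_means(X, K, max_iters=10):
    # Initialize the centroids by picking K distinct data points at random
    centroids = X[np.random.choice(len(X), K, replace=False)].astype(float)
    for i in range(max_iters):
        # Distance from every centroid to every point, shape (K, n_samples)
        distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
        # Index of the nearest centroid for each point
        cluster_indices = np.argmin(distances, axis=0)
        # Update each centroid to the mean of its points (an empty cluster keeps its old centroid)
        new_centroids = centroids.copy()
        for j in range(K):
            members = X[cluster_indices == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stop early once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return cluster_indices, centroids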

# Generate data: 100 random 2-D points, with the first 50 shifted by +5
np.random.seed(0)
X = np.random.randn(100, 2)
X[:50] += 5

# Cluster the data into two clusters
cluster_indices, centroids = k_means(X, 2)

# Visualize the points (colored by cluster) and the centroids (red X markers)
plt.scatter(X[:,0], X[:,1], c=cluster_indices)
plt.scatter(centroids[:,0], centroids[:,1], s=100, marker='X', c='red')
plt.show()

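This code uses the numpy and matplotlib libraries to generate a two-dimensional dataset and cluster it into two clusters. The function k_means takes three parameters: the dataset X, the number of clusters K, and the maximum number of iterations max_iters (10 by default). In each iteration, it computes the distance from every point to every centroid and assigns each point to the cluster of its nearest centroid. It then updates each cluster's centroid to the mean of the points assigned to it. This process runs for up to max_iters iterations. The function returns the cluster index of each point together with the centroid of each cluster, and matplotlib is then used to plot the data and the centroids.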
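To make the assignment and update steps concrete, here is a minimal sketch that runs one iteration by hand on a tiny dataset. The arrays `points` and `centroids` are hypothetical toy values chosen for illustration and are not part of the example above; the broadcasting expression is the same one used in k_means.

```python
import numpy as np

# Hypothetical toy data: four 2-D points and two starting centroids
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Assignment step: (2, 1, 2) broadcast against (4, 2) gives differences of shape (2, 4, 2)
diffs = points - centroids[:, np.newaxis]
distances = np.sqrt((diffs ** 2).sum(axis=2))  # shape (2, 4): row = centroid, column = point
labels = np.argmin(distances, axis=0)          # index of the nearest centroid per point

# Update step: each centroid moves to the mean of the points assigned to it
new_centroids = np.array(
    [points[labels == j].mean(axis=0) for j in range(len(centroids))]
)

print(distances.shape)  # (2, 4)
print(labels)           # [0 0 1 1]
print(new_centroids)    # rows are [0, 0.5] and [10, 10.5]
```

Because the distance array has one row per centroid, argmin is taken along axis 0, which yields one label per point.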