K-Means Clustering Example: Grouping 5 Samples into Two Clusters

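This article gives a Python example of K-means clustering that groups the five sample points below into two clusters. It uses the Manhattan distance as the distance metric and takes the first and second samples as the initial cluster centers.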
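The quantities the code works with can be written out as follows. The notation ($d$, $x_i$, $\mu_k$, $C_k$) is introduced here only for illustration. The Manhattan distance between two 2-D points is

$$
d(x, y) = |x^{(1)} - y^{(1)}| + |x^{(2)} - y^{(2)}|,
$$

so, for example, $d\big((0,2),(1,0)\big) = |0-1| + |2-0| = 3$. Each sample is assigned to its nearest center (in the code, a tie goes to the second cluster), and each center is then recomputed as the mean of its cluster $C_k$:

$$
c(x_i) = \arg\min_{k \in \{1,2\}} d(x_i, \mu_k), \qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i .
$$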

Sample set:

(0,2), (0,0), (1,0), (5,0), (-1,-1)
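As a quick look at the first assignment step, the sketch below computes the Manhattan distance from each sample to the two initial centers using NumPy broadcasting. It is an illustrative addition, not the article's code. Note that `np.argmin` breaks ties toward the first center, whereas the article's loop sends ties to cluster 2. No ties occur at this step.

```python
import numpy as np

# The five samples and the two initial centers (first and second samples)
samples = np.array([(0, 2), (0, 0), (1, 0), (5, 0), (-1, -1)], dtype=float)
centers = samples[:2].copy()

# Manhattan distance from every sample to every center, shape (5, 2):
# row i holds [d(sample_i, center1), d(sample_i, center2)]
dist = np.abs(samples[:, None, :] - centers[None, :, :]).sum(axis=2)

# Index of the nearest center per sample (0 = cluster 1, 1 = cluster 2)
labels = dist.argmin(axis=1)

print(dist)
print(labels)  # [0 1 1 1 1]
```

Only (0,2) is closer to the first center, so after one step cluster 1 holds just that point.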

Code:

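import numpy as np

# Manhattan (L1) distance between two 2-D points
def manhattan_distance(x1, x2):
    return np.abs(x1[0] - x2[0]) + np.abs(x1[1] - x2[1])

# Sample set
samples = np.array([(0, 2), (0, 0), (1, 0), (5, 0), (-1, -1)], dtype=float)

# Initial centers: the first and second samples
center1 = samples[0].copy()
center2 = samples[1].copy()

# Maximum number of iterations
max_iter = 10
iter_count = 0

# Iterate
while iter_count < max_iter:
    # Assignment step: put each sample into the cluster of its nearest center
    # (a tie goes to cluster 2)
    cluster1 = []
    cluster2 = []
    for sample in samples:
        distance1 = manhattan_distance(sample, center1)
        distance2 = manhattan_distance(sample, center2)
        if distance1 < distance2:
            cluster1.append(sample)
        else:
            cluster2.append(sample)

    # Update step: each new center is the mean of its cluster
    # (an empty cluster keeps its previous center instead of producing NaN)
    new_center1 = np.mean(cluster1, axis=0) if cluster1 else center1
    new_center2 = np.mean(cluster2, axis=0) if cluster2 else center2
    iter_count += 1

    # Stop early once the centers no longer change
    if np.array_equal(new_center1, center1) and np.array_equal(new_center2, center2):
        break
    center1, center2 = new_center1, new_center2

# Print the final cluster centers
print('Cluster 1: ', center1)
print('Cluster 2: ', center2)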

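Output:

Cluster 1:  [0. 2.]
Cluster 2:  [ 1.25 -0.25]

Here "Cluster 1" and "Cluster 2" are the centers of the first and second clusters. The final partition puts (0,2) alone in cluster 1 and (0,0), (1,0), (5,0) and (-1,-1) in cluster 2.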
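The centers produced by the code can be checked by hand. In the first iteration, with $\mu_1 = (0,2)$ and $\mu_2 = (0,0)$, the five samples are at distances 0, 2, 3, 7, 4 from $\mu_1$ and 2, 0, 1, 5, 2 from $\mu_2$. Only (0,2) is strictly closer to $\mu_1$, so

$$
\mu_1 = (0, 2), \qquad
\mu_2 = \left(\frac{0 + 1 + 5 + (-1)}{4},\; \frac{0 + 0 + 0 + (-1)}{4}\right) = (1.25,\; -0.25).
$$

In the second iteration the distances to the updated centers are

$$
\begin{array}{c|cc}
x_i & d(x_i, \mu_1) & d(x_i, \mu_2) \\ \hline
(0,2) & 0 & 3.5 \\
(0,0) & 2 & 1.5 \\
(1,0) & 3 & 0.5 \\
(5,0) & 7 & 4 \\
(-1,-1) & 4 & 3
\end{array}
$$

so every sample keeps its cluster and the centers no longer change.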

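Explanation:

The code first defines the Manhattan distance and sets the two initial centers to the first and second samples. Each iteration then assigns every sample to the cluster whose center is nearest in Manhattan distance and recomputes each center as the mean of the samples in its cluster. This repeats for at most max_iter = 10 iterations. For this data the assignment no longer changes from the second iteration on, so the centers shown above are final.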
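The same assign/update loop can be written more compactly with NumPy broadcasting. The sketch below uses a hypothetical helper, `kmeans_manhattan`, which is not part of the original code. It stops as soon as no sample changes cluster. Its `update` parameter is also an illustrative addition. `"mean"` reproduces the article's update. `"median"` swaps in the coordinate-wise median, which gives the K-medians variant. The mean minimizes squared Euclidean distance within a cluster, whereas the coordinate-wise median minimizes the sum of Manhattan distances, so the median update is the more consistent choice when Manhattan distance drives the assignment.

```python
import numpy as np

def kmeans_manhattan(X, init_centers, max_iter=10, update="mean"):
    """Hypothetical helper: cluster X with Manhattan-distance assignment.

    update="mean" matches the article's center update;
    update="median" uses the coordinate-wise median (K-medians).
    Returns (labels, centers, n_iter).
    """
    if update not in ("mean", "median"):
        raise ValueError("update must be 'mean' or 'median'")
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    labels = None
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Manhattan distance from every sample to every center, shape (n, k)
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        # Nearest center; np.argmin breaks ties toward the lower index
        new_labels = dist.argmin(axis=1)
        # Converged: no sample changed cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) == 0:
                continue  # empty cluster: keep its previous center
            if update == "median":
                centers[k] = np.median(members, axis=0)
            else:
                centers[k] = members.mean(axis=0)
    return labels, centers, n_iter

samples = [(0, 2), (0, 0), (1, 0), (5, 0), (-1, -1)]
labels, centers, n_iter = kmeans_manhattan(samples, samples[:2])
print(labels, centers.tolist(), n_iter)  # [0 1 1 1 1] [[0.0, 2.0], [1.25, -0.25]] 2

labels_md, centers_md, _ = kmeans_manhattan(samples, samples[:2], update="median")
print(labels_md, centers_md.tolist())  # [0 1 1 1 1] [[0.0, 2.0], [0.5, 0.0]]
```

On this data both updates give the same partition. Only the second center differs: (1.25, -0.25) with the mean versus (0.5, 0) with the median.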

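Notes:

K-means is an iterative algorithm, and its result depends on the initial cluster centers: different starting points can end in different final clusterings. To obtain a better clustering, try several different initial centers (or run the algorithm several times from random starts) and keep the run with the smallest total within-cluster distance.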
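For a data set this small, one concrete way to follow this advice is to try every pair of distinct samples as the initial centers (there are only 10 pairs). The sketch then keeps the run with the smallest total Manhattan distance between each sample and its assigned center. It is self-contained and hypothetical: the helper names `kmeans_l1` and `total_l1_cost` and the cost criterion are illustrative choices, not part of the original article. For larger data sets one would instead draw a limited number of random initializations.

```python
from itertools import combinations

import numpy as np

def kmeans_l1(X, init_centers, max_iter=10):
    """Mean-update K-means with Manhattan assignment; returns (labels, centers)."""
    centers = np.array(init_centers, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Manhattan distance from each sample to each center, shape (n, k)
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment is stable: converged
        labels = new_labels
        for k in range(len(centers)):
            if np.any(labels == k):  # leave an empty cluster's center unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

def total_l1_cost(X, labels, centers):
    """Sum of Manhattan distances from each sample to its assigned center."""
    return float(np.abs(X - centers[labels]).sum())

X = np.array([(0, 2), (0, 0), (1, 0), (5, 0), (-1, -1)], dtype=float)

# Try every pair of distinct samples as the initial centers; keep the lowest cost
results = []
for i, j in combinations(range(len(X)), 2):
    labels, centers = kmeans_l1(X, X[[i, j]])
    results.append((total_l1_cost(X, labels, centers), (i, j), labels, centers))

best_cost, best_pair, best_labels, best_centers = min(results, key=lambda r: r[0])
print(best_cost, best_pair, best_labels, best_centers.tolist())
# 5.5 (0, 3) [0 0 0 1 0] [[0.0, 0.25], [5.0, 0.0]]
```

Under this criterion the article's starting pair (the first two samples) ends at a total distance of 9.0. Starting from the first and fourth samples instead isolates (5,0) from the other four points and reaches 5.5, which shows how much the choice of initial centers can matter.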