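Implementing DBSCAN in Python with NumPy, returning the cluster centers and the cluster label of each sample
Below is an example implementation of the DBSCAN algorithm using NumPy: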
import numpy as np


def dbscan(X, eps, min_samples):
    """
    DBSCAN clustering algorithm implementation using numpy

    Parameters:
    X: numpy array, shape (n_samples, n_features)
        The input data to be clustered
    eps: float
        The maximum distance between two samples for them to be considered
        as in the same neighborhood
    min_samples: int
        The minimum number of samples in a neighborhood (including the point
        itself) for a point to be considered as a core point

    Returns:
    labels: numpy array, shape (n_samples,)
        The cluster labels for each sample. Noise points are labeled as -1.
    centers: numpy array, shape (n_clusters, n_features)
        The cluster centers (mean of each cluster's members). Has zero rows
        if no cluster is found.
    """
    n_samples, n_features = X.shape

    # Compute pairwise distances between all samples
    dists = np.sqrt(np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis=-1))

    # Neighborhood membership; each point counts as its own neighbor
    within_eps = dists <= eps

    # Initialize cluster labels (-1 = unassigned/noise) and find core points
    labels = -np.ones(n_samples, dtype=np.int32)
    is_core = within_eps.sum(axis=1) >= min_samples

    # Grow one cluster from each core point that is still unlabeled
    curr_label = 0
    for i in range(n_samples):
        if is_core[i] and labels[i] == -1:
            labels[i] = curr_label
            stack = [i]
            while stack:
                curr = stack.pop()
                for j in np.where(within_eps[curr])[0]:
                    if labels[j] == -1:
                        # Border points join the cluster but are not expanded
                        labels[j] = curr_label
                        if is_core[j]:
                            stack.append(j)
            curr_label += 1

    # Points still labeled -1 are noise; keep them so labels stays aligned with X
    n_clusters = curr_label

    # Compute cluster centers as the mean of each cluster's members
    centers = np.zeros((n_clusters, n_features))
    for k in range(n_clusters):
        centers[k] = np.mean(X[labels == k], axis=0)

    return labels, centers
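The function takes the input data X, the maximum neighborhood distance eps, and the minimum number of samples min_samples as parameters, and returns the cluster label of each sample together with the cluster centers. The labels are a NumPy array with one entry per sample, and the centers are a NumPy array in which each row is the center of one cluster.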
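To make the core-point rule concrete, here is a minimal sketch of that step in isolation. The helper name core_point_mask is hypothetical (it is not part of the code above), and the sketch assumes, as the function does, that a point counts as its own neighbor.

```python
import numpy as np


def core_point_mask(X, eps, min_samples):
    """Return a boolean mask marking the core points of X (illustrative helper)."""
    # Pairwise Euclidean distances via broadcasting: shape (n_samples, n_samples)
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    dists = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Count neighbors within eps; each point counts itself (distance 0)
    neighbor_counts = np.sum(dists <= eps, axis=1)
    return neighbor_counts >= min_samples


# Three points close together and one far away
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
mask = core_point_mask(X_demo, eps=0.2, min_samples=3)
print(mask)  # the three nearby points are core points, the distant one is not
```

The full distance matrix needs memory proportional to the square of the number of samples, so this approach is best suited to small and medium datasets.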
Here is a usage example:
X = np.random.rand(100, 2)
labels, centers = dbscan(X, eps=0.3, min_samples=5)
print(labels)
print(centers)
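This example generates a random two-dimensional dataset, clusters it with DBSCAN, and prints the result. Because the points are drawn uniformly from the unit square and eps=0.3 is large relative to it, most or all points will likely end up in a single cluster; a smaller eps gives more varied output.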
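A raw label array can be hard to read at a glance. The sketch below summarizes one by counting the members of each cluster and the noise points. The helper name summarize_labels and the example array are made up for illustration; the only assumption taken from the text above is that noise is labeled -1.

```python
import numpy as np


def summarize_labels(labels):
    """Return ({cluster_id: size}, n_noise) for a DBSCAN-style label array."""
    labels = np.asarray(labels)
    # Noise points carry the label -1
    n_noise = int(np.sum(labels == -1))
    # Count members of every real cluster (labels >= 0)
    cluster_ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    sizes = {int(c): int(n) for c, n in zip(cluster_ids, counts)}
    return sizes, n_noise


# Hand-written labels: two clusters and two noise points
example_labels = np.array([0, 0, 1, -1, 1, 1, -1])
sizes, n_noise = summarize_labels(example_labels)
print(sizes, n_noise)
```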