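Experiment Report: Dimensionality Reduction and Clustering on the MNIST Handwritten Digit Dataset
No supporting materials were provided, so a complete experiment report cannot be written. Only example code for PCA and K-Means is given below, for reference.
PCA dimensionality reduction: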
import pandas as pd
from sklearn.decomposition import PCA
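# Load the data (assumes a 'label' column first, followed by the pixel columns)
data = pd.read_csv('MNIST_train.csv')
# Pixel features: every column except the label
pixels = data.iloc[:, 1:]
# Apply PCA, keeping enough components to retain 85% of the variance ("energy")
pca = PCA(n_components=0.85)
pixels_pca = pca.fit_transform(pixels)
print('Components kept:', pca.n_components_)
# Save the reduced data together with the labels
data_pca = pd.DataFrame(data['label'])
pc_columns = [f'pc{i + 1}' for i in range(pca.n_components_)]
data_pca = pd.concat([data_pca, pd.DataFrame(pixels_pca, columns=pc_columns)], axis=1)
data_pca.to_csv('MNIST_train_pca.csv', index=False)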
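Passing a float such as 0.85 as `n_components` tells scikit-learn's `PCA` to keep the smallest number of principal components whose cumulative explained-variance ratio exceeds that fraction. The sketch below reproduces that rule by hand with a NumPy SVD and checks it against `PCA`. It assumes scikit-learn's bundled 8x8 `load_digits` data as a small stand-in for the MNIST CSV, and `components_for_variance` is a hypothetical helper name, not part of any library.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA


def components_for_variance(X, threshold):
    """Hypothetical helper: smallest k whose cumulative variance ratio exceeds threshold."""
    X_centered = X - X.mean(axis=0)                   # PCA operates on mean-centred data
    s = np.linalg.svd(X_centered, compute_uv=False)   # singular values, largest first
    ratio = s**2 / np.sum(s**2)                       # share of total variance per component
    cumulative = np.cumsum(ratio)
    # Index of the first running total strictly greater than the threshold, plus one
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return k, cumulative


# Assumption: load_digits (1797 samples x 64 pixels) stands in for MNIST_train.csv
X = load_digits().data.astype(np.float64)
k_manual, cumulative = components_for_variance(X, 0.85)

pca = PCA(n_components=0.85, svd_solver="full").fit(X)
print("components kept by hand:", k_manual)
print("components kept by PCA: ", pca.n_components_)
print("variance retained:", round(float(pca.explained_variance_ratio_.sum()), 4))
```

On the real CSV, printing `pca.n_components_` after fitting shows how many of the 784 pixel dimensions survive the 85% threshold.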
K-Means clustering:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
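# Load the PCA-reduced data
data = pd.read_csv('MNIST_train_pca.csv')
# Features: every column except the label
pixels = data.iloc[:, 1:]
# Choose the hyperparameter k by silhouette score.
# silhouette_score is quadratic in the number of samples, so it is evaluated on a
# random subsample; sample_size=10000 is an arbitrary choice, adjust as needed.
best_k = 0
best_score = -1
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = kmeans.fit_predict(pixels)
    score = silhouette_score(pixels, labels, sample_size=10000, random_state=42)
    if score > best_score:
        best_score = score
        best_k = k
# Cluster again with the best k
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=42)
labels = kmeans.fit_predict(pixels)
# Save the cluster assignments next to the true labels
data_clustered = pd.DataFrame(data['label'])
data_clustered['cluster'] = labels
data_clustered.to_csv('MNIST_train_clustered.csv', index=False)
# Evaluate the final clustering
score = silhouette_score(pixels, labels, sample_size=10000, random_state=42)
print('Best k:', best_k)
print('Silhouette score:', score)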
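The silhouette score only measures how compact and well separated the clusters are; it never looks at the true digits, even though the script saves them next to each cluster id. One common way to use those labels is an external comparison. The sketch below computes cluster purity (each cluster is credited with its most frequent true digit), the adjusted Rand index and normalized mutual information from `sklearn.metrics`. It assumes the bundled `load_digits` data in place of the MNIST CSV and fixes k = 10 (one cluster per digit class); `cluster_purity` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def cluster_purity(y_true, y_pred):
    """Hypothetical helper: share of samples whose cluster's majority label equals their own."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = 0
    for cluster in np.unique(y_pred):
        members = y_true[y_pred == cluster]
        correct += np.bincount(members).max()   # count of the most frequent label in this cluster
    return correct / len(y_true)


# Assumption: load_digits stands in for the MNIST CSV files
digits = load_digits()
X_pca = PCA(n_components=0.85, svd_solver="full").fit_transform(digits.data)

# Assumption: k = 10, one cluster per digit class
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_pca)

purity = cluster_purity(digits.target, clusters)
ari = adjusted_rand_score(digits.target, clusters)
nmi = normalized_mutual_info_score(digits.target, clusters)
print(f"purity={purity:.3f}  ARI={ari:.3f}  NMI={nmi:.3f}")
```

With the files written above, `data_clustered['label']` and `data_clustered['cluster']` can be passed in place of `digits.target` and `clusters`. Note that purity tends to rise as k grows, so it is best compared across runs with the same k, while ARI and NMI correct for that to a greater degree.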
Original source: https://www.cveoy.top/t/topic/oqba. Copyright belongs to the author; please do not repost or scrape.