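Write an unsupervised clustering algorithm in Python to cluster genes, meeting the following requirements:
1. Read an Excel sheet in which row 0 holds the gene names and the remaining rows hold the gene expression values. Path: C:\Users\lenovo\Desktop\HIV\DNN神经网络测试\output_data1.xlsx
2. Implement the K-means algorithm.
3. Output the gene names and expression values of each cluster, and save the result as an Excel sheet at: C:\Users\lenovo\Desktop\HIV\DNN神经网络测试\output_data2.xlsx
The code is as follows: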
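import pandas as pd
import numpy as np
# Read the Excel file. Row 0 holds the gene names, so it is used as the header;
# transpose so that each row is one gene and each column is one sample.
data = pd.read_excel(r'C:\Users\lenovo\Desktop\HIV\DNN神经网络测试\output_data1.xlsx', header=0).T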
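# K-means algorithm
def kmeans(data, k, max_iter=300):
    # Randomly pick k rows as the initial centroids and label them 0..k-1
    centroids = data.sample(k).reset_index(drop=True)
    # Cluster label for every row (gene)
    clusters = pd.DataFrame(index=data.index, columns=['cluster'])
    # Iterate until the centroids stop moving (max_iter guards against endless loops)
    for _ in range(max_iter):
        # Euclidean distance from every row to every centroid
        distances = pd.DataFrame(index=data.index, columns=centroids.index, dtype=float)
        for i in centroids.index:
            distances[i] = np.sqrt(((data - centroids.loc[i]) ** 2).sum(axis=1))
        # Assign each row to its nearest centroid
        clusters['cluster'] = distances.idxmin(axis=1)
        # Recompute the centroids; a cluster with no members keeps its old centroid
        new_centroids = data.groupby(clusters['cluster']).mean()
        new_centroids = new_centroids.reindex(centroids.index).fillna(centroids)
        # Check for convergence
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return clusters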
# Clustering result (k = 3)
clusters = kmeans(data, 3)
# Collect the gene names and expression values of each cluster
results = []
for label in sorted(clusters['cluster'].unique()):
    genes = clusters[clusters['cluster'] == label].index
    # Keep the full expression values and put the gene names in the first column
    result = data.loc[genes].copy()
    result.insert(0, 'Gene', genes)
    results.append(result)
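# Save the results to an Excel file, one sheet per cluster.
# The context manager writes and closes the file on exit
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter(r'C:\Users\lenovo\Desktop\HIV\DNN神经网络测试\output_data2.xlsx') as writer:
    for i, result in enumerate(results):
        result.to_excel(writer, sheet_name='Cluster{}'.format(i), index=False)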
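To sanity-check the assign/update loop without the Excel file, the same idea can be sketched on synthetic data. This is only a minimal illustration, not the code above: the function name `kmeans_np`, the `seed` and `max_iter` parameters, and the made-up expression matrix are all assumptions introduced for the example.

```python
import numpy as np

def kmeans_np(X, k, max_iter=100, seed=0):
    """Hypothetical helper: plain K-means on the rows of X.

    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Pairwise Euclidean distances, shape (n_rows, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its old centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Made-up "expression" matrix: 3 groups of 20 genes, 5 samples each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(20, 5)) for m in (0.0, 5.0, 10.0)])

labels, centroids = kmeans_np(X, k=3)
print(np.bincount(labels, minlength=3))  # number of genes per cluster
```

Because the starting centroids are random, K-means can settle in a local optimum, so the cluster sizes are not guaranteed to match the three generated groups; fixing the seed only makes a run repeatable.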
Original source: https://www.cveoy.top/t/topic/bPvA. Copyright belongs to the author. Do not repost or scrape.