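Please write Python code that reads a CSV file and, grouping by the pubid variable, converts the ResearchAreas variable into a co-occurrence matrix. Here is part of the data for reference:
pubid ResearchAreas
che001 Microbiology
che001 Genetics & Heredity
che001 Biochemistry & Molecular Biology
che001 Microbi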
import pandas as pd

# Read the CSV file
data = pd.read_csv("filename.csv")

# Grouping by pubid, convert the ResearchAreas variable into a co-occurrence matrix
pubids = data["pubid"].unique()
research_areas = data["ResearchAreas"].unique()
co_occurrence_matrix = pd.DataFrame(0, index=research_areas, columns=research_areas)
for pubid in pubids:
    # Distinct research areas listed for this publication
    areas = data.loc[data["pubid"] == pubid, "ResearchAreas"].unique()
    for i in range(len(areas)):
        for j in range(i + 1, len(areas)):
            # Count the pair in both directions so the matrix stays symmetric
            co_occurrence_matrix.loc[areas[i], areas[j]] += 1
            co_occurrence_matrix.loc[areas[j], areas[i]] += 1

print(co_occurrence_matrix)
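The nested loop over publications can also be written as a matrix product, which tends to be faster on large files. The following is a minimal sketch, not the original answer's method: it builds a publication-by-area incidence table with `pd.crosstab` and multiplies it by its own transpose. The helper name `cooccurrence_from_long` and the `che002` rows are hypothetical, added only for illustration; the `che001` rows come from the sample in the question.

```python
import numpy as np
import pandas as pd


def cooccurrence_from_long(df, id_col="pubid", item_col="ResearchAreas"):
    """Return a symmetric co-occurrence count matrix with a zero diagonal."""
    # Drop missing values and duplicate (id, item) pairs so that each
    # area counts at most once per publication.
    pairs = df[[id_col, item_col]].dropna().drop_duplicates()
    # Incidence table: one row per publication, one column per area,
    # 1 where the publication lists that area and 0 otherwise.
    incidence = pd.crosstab(pairs[id_col], pairs[item_col])
    # The product counts, for every pair of areas, the number of
    # publications that list both.
    co = incidence.T.dot(incidence)
    # The diagonal holds each area's own frequency; zero it so the
    # result matches the loop-based version.
    values = co.to_numpy().copy()
    np.fill_diagonal(values, 0)
    return pd.DataFrame(values, index=co.index, columns=co.columns)


# che001 rows are taken from the question's sample;
# che002 rows are hypothetical and exist only to show a count above 1.
sample = pd.DataFrame(
    {
        "pubid": ["che001", "che001", "che001", "che002", "che002"],
        "ResearchAreas": [
            "Microbiology",
            "Genetics & Heredity",
            "Biochemistry & Molecular Biology",
            "Microbiology",
            "Genetics & Heredity",
        ],
    }
)
result = cooccurrence_from_long(sample)
print(result)
```

To run this on a real file, pass `pd.read_csv("filename.csv")` in place of `sample`. Rows and columns come out in sorted order rather than in order of first appearance, so the layout may differ from the loop version even though the counts agree.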