import pandas as pd
import numpy as np

Read the CSV file

data = pd.read_csv("filename.csv")

Using pubid as the grouping unit, convert the ResearchAreas variable into a co-occurrence matrix

pubids = data["pubid"].unique()
research_areas = data["ResearchAreas"].unique()
# Square matrix of zeros with one row and one column per research area
co_occurrence_matrix = pd.DataFrame(0, index=research_areas, columns=research_areas)
for pubid in pubids:
    # Distinct research areas listed for this publication
    areas = data.loc[data["pubid"] == pubid, "ResearchAreas"].unique()
    # Count every unordered pair once, in both symmetric cells
    for i in range(len(areas)):
        for j in range(i + 1, len(areas)):
            co_occurrence_matrix.loc[areas[i], areas[j]] += 1
            co_occurrence_matrix.loc[areas[j], areas[i]] += 1

print(co_occurrence_matrix)
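The nested loops above filter the whole table once per pubid, which can get slow on large files. As a rough alternative sketch (not part of the original answer), the same counts can be obtained from a publication-by-area incidence table multiplied by its own transpose. The toy data and the names `incidence` and `co_occurrence` below are illustrative assumptions, not taken from the real file.

```python
import pandas as pd

# Toy data shaped like the sample in the question (values are illustrative).
data = pd.DataFrame({
    "pubid": ["p1", "p1", "p1", "p2", "p2"],
    "ResearchAreas": ["Microbiology", "Genetics & Heredity", "Microbiology",
                      "Microbiology", "Chemistry"],
})

# Publication-by-area incidence table: 1 if the area is listed for that pubid.
# clip(upper=1) collapses duplicate rows, mirroring .unique() in the loop version.
incidence = pd.crosstab(data["pubid"], data["ResearchAreas"]).clip(upper=1)

# For each pair of areas, count the publications that list both.
co_occurrence = incidence.T.dot(incidence)

# The diagonal holds each area's own publication count; set it to zero
# so the result matches the loop version, which never touches the diagonal.
for area in co_occurrence.index:
    co_occurrence.loc[area, area] = 0

print(co_occurrence)
```

Note that `pd.crosstab` sorts the areas alphabetically and drops missing values, so row and column order may differ from the loop version even though the pair counts agree.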

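Please write a piece of Python code that reads a CSV file and, using the pubid variable in the file as the unit, converts the ResearchAreas variable into a co-occurrence matrix. I will show part of the data for your reference:

pubid	 ResearchAreas
che001	Microbiology
che001	Genetics & Heredity
che001	Biochemistry & Molecular Biology
che001	Microbi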
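The sample appears to be tab-separated, with a stray space before the ResearchAreas header. If the real file looks the same, a plain `pd.read_csv("filename.csv")` would not split the columns as expected. A minimal sketch under that assumption follows; the in-memory string `raw` is a hypothetical stand-in for the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file, mimicking the layout of the sample rows.
raw = "pubid\t ResearchAreas\nche001\tMicrobiology\nche001\tGenetics & Heredity\n"

# sep="\t" is an assumption based on the sample; use the default if the file is comma-separated.
data = pd.read_csv(io.StringIO(raw), sep="\t")

# Strip stray whitespace from the header and from the area names.
data.columns = data.columns.str.strip()
data["ResearchAreas"] = data["ResearchAreas"].str.strip()

print(data)
```

For an actual file, the `io.StringIO(raw)` argument would be replaced by the file path.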

Original URL: http://www.cveoy.top/t/topic/eBPN Copyright belongs to the author. Do not repost or scrape.
