import pandas as pd

# Read the CSV file
df = pd.read_csv('data.csv')

# Cross-tabulate pubid against ResearchAreas: each cell counts how often an area is listed for a publication
co_occurrence_matrix = pd.crosstab(df['pubid'], df['ResearchAreas'])

# Print the resulting matrix
print(co_occurrence_matrix)

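How it works:

  1. First, the pandas read_csv function reads the CSV file into a DataFrame named df.
  2. Next, the pandas crosstab function cross-tabulates the pubid and ResearchAreas columns, and the result is stored as co_occurrence_matrix. Each row is a pubid, each column is a research area, and each cell counts how many times that area is listed for that publication.
  3. Finally, print outputs the matrix.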
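
The table from step 2 has one row per publication, so it is not yet an area-by-area matrix. If the goal is to count how many publications list each pair of research areas together, one common approach is to binarize the table and multiply it by its own transpose. The sketch below is a minimal illustration under that assumption; the toy rows and the names counts, incidence, and values are hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the same column names as the original code.
df = pd.DataFrame({
    "pubid": ["p1", "p1", "p1", "p2", "p2", "p3"],
    "ResearchAreas": ["Microbiology", "Chemistry", "Microbiology",
                      "Chemistry", "Physics", "Physics"],
})

# Step 2 of the explanation: one row per pubid, one column per research area.
counts = pd.crosstab(df["pubid"], df["ResearchAreas"])

# Binarize so that a repeated (pubid, area) row counts only once.
incidence = (counts > 0).astype(int)

# Area-by-area co-occurrence: number of publications that list both areas.
co_occurrence = incidence.T.dot(incidence)

# Zero the diagonal, which otherwise holds each area's own publication count.
values = co_occurrence.to_numpy().copy()
np.fill_diagonal(values, 0)
co_occurrence = pd.DataFrame(
    values, index=co_occurrence.index, columns=co_occurrence.columns
)

print(counts)
print(co_occurrence)
```

Whether to binarize and whether to clear the diagonal are design choices: keeping the raw counts weights repeated rows more heavily, and keeping the diagonal preserves the number of publications per area.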

Sample data:

pubid	ResearchAreas
che001	Microbiology
che001	Genetics & Heredity
che001	Biochemistry & Molecular Biology
che001	Microbiology
che001	Science & Technology - Other Topics
che001	Life Sciences & Biomedicine - Other Topics
che002	Biochemistry & Molecular Biology
che002	Science & Technology - Other Topics
che002	Cell Biology
che002	Science & Technology - Other Topics
che002	Biochemistry & Molecular Biology
che002	Science & Technology - Other Topics
che002	Genetics & Heredity
che003	Reproductive Biology
che003	Pediatrics
che003	Environmental Sciences & Ecology
che003	Cardiovascular System & Cardiology
che003	Radiology, Nuclear Medicine & Medical Imaging
che003	Ethnic Studies
che005	Physics
che005	Chemistry
che005	Genetics & Heredity
che005	Biochemistry & Molecular Biology
che005	Microscopy
che006	Biochemistry & Molecular Biology
che006	Computer Science
che006	Chemistry
che006	Genetics & Heredity
che006	Microbiology
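
The sample rows above appear to be separated by tabs, and one area name ("Radiology, Nuclear Medicine & Medical Imaging") contains a comma, so a file in this layout cannot be split correctly on commas. Assuming the file on disk really is tab-separated, a sketch of loading it might look like the following. Only a few rows are reproduced here through io.StringIO, and the names sample, unique_pairs, and table are illustrative.

```python
import io

import pandas as pd

# A few rows copied from the sample data, with tab-separated columns.
sample = (
    "pubid\tResearchAreas\n"
    "che001\tMicrobiology\n"
    "che001\tGenetics & Heredity\n"
    "che001\tMicrobiology\n"
    "che002\tCell Biology\n"
    "che002\tGenetics & Heredity\n"
)

# sep="\t" tells pandas to split columns on tabs instead of commas.
df = pd.read_csv(io.StringIO(sample), sep="\t")

# Drop repeated (pubid, ResearchAreas) rows so each pair is counted once.
unique_pairs = df.drop_duplicates()

# Same cross-tabulation as in the original code, now on de-duplicated rows.
table = pd.crosstab(unique_pairs["pubid"], unique_pairs["ResearchAreas"])
print(table)
```

Without drop_duplicates, the repeated che001 / Microbiology row would produce a count of 2 in that cell rather than 1.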
Python code: using pandas to convert the research areas in a CSV file into a co-occurrence matrix

Original URL: https://www.cveoy.top/t/topic/nToK Copyright belongs to the author. Do not repost or scrape.
