Python code: converting the research areas in a CSV file into a co-occurrence matrix with pandas
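import pandas as pd
# Read the CSV file; it must contain the columns pubid and ResearchAreas
df = pd.read_csv('data.csv')
# Publication-by-area table, reduced to 0/1 so that an area repeated
# within one publication is counted only once
incidence = (pd.crosstab(df['pubid'], df['ResearchAreas']) > 0).astype(int)
# Area-by-area co-occurrence matrix: entry (a, b) is the number of
# publications that list both area a and area b
co_occurrence_matrix = incidence.T.dot(incidence)
# Print the co-occurrence matrix
print(co_occurrence_matrix)
How it works:
- First, the read_csv function from pandas reads the CSV file into a DataFrame named df.
- Next, the crosstab function from pandas tabulates pubid against ResearchAreas, giving one row per publication and one column per research area. On its own this is a publication-by-area count table rather than a co-occurrence matrix, so the counts are reduced to 0/1 and the transposed table is multiplied by itself. The result, named co_occurrence_matrix, has research areas on both axes, and its diagonal holds the number of publications per area.
- Finally, the print function outputs the co-occurrence matrix.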
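A minimal sketch of these steps on a small in-memory table, so the intermediate shapes are visible: crosstab yields one row per pubid and one column per research area, and multiplying the transposed 0/1 table by itself yields area-by-area counts. The three-publication table is made up for illustration, and the names counts, incidence and co are assumptions rather than part of the original snippet.

```python
import pandas as pd

# Toy data, made up for illustration; p1 lists "Genetics" twice on purpose
df = pd.DataFrame({
    "pubid": ["p1", "p1", "p1", "p2", "p2", "p3"],
    "ResearchAreas": ["Genetics", "Chemistry", "Genetics",
                      "Genetics", "Chemistry", "Physics"],
})

# Publication-by-area counts: the duplicate shows up as a 2 for (p1, Genetics)
counts = pd.crosstab(df["pubid"], df["ResearchAreas"])

# Reduce to presence/absence so duplicates do not inflate the result
incidence = (counts > 0).astype(int)

# Area-by-area matrix: number of publications listing both areas
co = incidence.T.dot(incidence)

print(counts)
print(co)
```

Genetics and Chemistry appear together in p1 and p2, so their shared cell is 2, while Physics shares no publication with either and gets 0. Skipping the 0/1 step would count the repeated Genetics entry in p1 twice.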
Sample data:
pubid ResearchAreas
che001 Microbiology
che001 Genetics & Heredity
che001 Biochemistry & Molecular Biology
che001 Microbiology
che001 Science & Technology - Other Topics
che001 Life Sciences & Biomedicine - Other Topics
che002 Biochemistry & Molecular Biology
che002 Science & Technology - Other Topics
che002 Cell Biology
che002 Science & Technology - Other Topics
che002 Biochemistry & Molecular Biology
che002 Science & Technology - Other Topics
che002 Genetics & Heredity
che003 Reproductive Biology
che003 Pediatrics
che003 Environmental Sciences & Ecology
che003 Cardiovascular System & Cardiology
che003 Radiology, Nuclear Medicine & Medical Imaging
che003 Ethnic Studies
che005 Physics
che005 Chemistry
che005 Genetics & Heredity
che005 Biochemistry & Molecular Biology
che005 Microscopy
che006 Biochemistry & Molecular Biology
che006 Computer Science
che006 Chemistry
che006 Genetics & Heredity
che006 Microbiology
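The rows above are shown with whitespace between the two columns, and one area name contains a comma, so pasted as-is they would not parse as comma-separated input. The sketch below assumes the data arrives in exactly this layout and splits each line at the first run of whitespace. In a real CSV file the two columns would be comma-separated and that area name would need quoting. The names SAMPLE, rows, incidence and co are assumptions for illustration.

```python
import pandas as pd

# The sample rows, copied verbatim (header line omitted)
SAMPLE = """\
che001 Microbiology
che001 Genetics & Heredity
che001 Biochemistry & Molecular Biology
che001 Microbiology
che001 Science & Technology - Other Topics
che001 Life Sciences & Biomedicine - Other Topics
che002 Biochemistry & Molecular Biology
che002 Science & Technology - Other Topics
che002 Cell Biology
che002 Science & Technology - Other Topics
che002 Biochemistry & Molecular Biology
che002 Science & Technology - Other Topics
che002 Genetics & Heredity
che003 Reproductive Biology
che003 Pediatrics
che003 Environmental Sciences & Ecology
che003 Cardiovascular System & Cardiology
che003 Radiology, Nuclear Medicine & Medical Imaging
che003 Ethnic Studies
che005 Physics
che005 Chemistry
che005 Genetics & Heredity
che005 Biochemistry & Molecular Biology
che005 Microscopy
che006 Biochemistry & Molecular Biology
che006 Computer Science
che006 Chemistry
che006 Genetics & Heredity
che006 Microbiology
"""

# The first token is the pubid; everything after it is the area name
rows = [line.split(maxsplit=1) for line in SAMPLE.splitlines() if line.strip()]
df = pd.DataFrame(rows, columns=["pubid", "ResearchAreas"])

# Presence/absence per publication, then area-by-area co-occurrence counts
incidence = (pd.crosstab(df["pubid"], df["ResearchAreas"]) > 0).astype(int)
co = incidence.T.dot(incidence)

gen, bio = "Genetics & Heredity", "Biochemistry & Molecular Biology"
print(co.loc[gen, bio])  # publications listing both areas
```

For these rows the printed value is 4, because che001, che002, che005 and che006 each list both areas. The full matrix is 16 by 16, one row and one column per distinct area.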
Original source: https://www.cveoy.top/t/topic/nToK. Copyright belongs to the author. Do not reproduce or scrape.