Building a Corpus Co-occurrence Table in Python with Pickle
First, import the pickle module and load the corpus data:
import pickle

# Load the corpus; assumed to be a list of tokenized documents
with open('corpus.pkl', 'rb') as f:
    corpus = pickle.load(f)
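The loading step assumes that corpus.pkl already exists and holds a list of tokenized documents (each document a list of word strings); the original does not state the format. As a hedged sketch, the snippet below builds a small hypothetical corpus, pickles it to a temporary file, and reads it back the same way.

```python
import os
import pickle
import tempfile

# Hypothetical sample corpus: a list of documents, each a list of tokens.
# The real structure of corpus.pkl is an assumption here.
sample_corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.pkl")
    # Write the sample corpus in the format the loading code expects
    with open(path, "wb") as f:
        pickle.dump(sample_corpus, f)
    # Load it back exactly as in the step above
    with open(path, "rb") as f:
        corpus = pickle.load(f)

print(corpus == sample_corpus)  # True
```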
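Next, use the Counter class from Python's collections module to compute the co-occurrence table. Every ordered pair of tokens at different positions in the same document is counted once, so the whole document acts as the co-occurrence window and the cost grows quadratically with document length: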
from collections import Counter

cooccur = Counter()
for doc in corpus:  # each doc is assumed to be a list of tokens
    for i, word1 in enumerate(doc):
        for j, word2 in enumerate(doc):
            # Skip pairing a token with its own position
            if i != j:
                cooccur[(word1, word2)] += 1
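To make the counting rule concrete, here is a hedged sketch that wraps the same loop in a hypothetical helper, `count_cooccurrences`. With `window=None` it reproduces the whole-document counting above. The optional `window` parameter is an addition not in the original: it is the common sliding-window variant, which only pairs tokens at most `window` positions apart.

```python
from collections import Counter


def count_cooccurrences(corpus, window=None):
    """Count ordered (word1, word2) pairs at different positions.

    Hypothetical helper. window=None counts pairs across the whole
    document (same as the loop above); an integer window restricts
    pairs to tokens at most `window` positions apart.
    """
    cooccur = Counter()
    for doc in corpus:
        n = len(doc)
        for i, word1 in enumerate(doc):
            if window is None:
                lo, hi = 0, n
            else:
                lo, hi = max(0, i - window), min(n, i + window + 1)
            for j in range(lo, hi):
                if i != j:  # never pair a position with itself
                    cooccur[(word1, doc[j])] += 1
    return cooccur


full = count_cooccurrences([["a", "b", "a"]])
print(full[("a", "a")])  # 2: the two "a" tokens pair with each other in both orders

windowed = count_cooccurrences([["a", "b", "a"]], window=1)
print(windowed[("a", "a")])  # 0: the two "a" tokens are 2 positions apart
```

Note that a repeated word co-occurs with itself under whole-document counting, which may or may not be desired for a given analysis.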
Finally, save the co-occurrence table to a pickle file:
# Write the Counter to disk; 'wb' opens the file for binary writing
with open('cooccur.pkl', 'wb') as f:
    pickle.dump(cooccur, f)
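A saved table is usually loaded again later for querying. The sketch below is a hedged illustration using a small hypothetical Counter in place of the real table and a temporary file in place of cooccur.pkl; it shows that pickle restores the Counter type, so methods such as `most_common` work directly on the loaded object.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical co-occurrence table standing in for the one built above
cooccur = Counter({("the", "sat"): 2, ("cat", "sat"): 1, ("dog", "sat"): 1})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cooccur.pkl")
    with open(path, "wb") as f:
        pickle.dump(cooccur, f)
    # Read the table back; pickle restores the Counter type
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(type(loaded).__name__)   # Counter
print(loaded.most_common(1))   # [(('the', 'sat'), 2)]
print(loaded[("cat", "dog")])  # 0 (missing pairs count as zero)
```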
Original article: https://www.cveoy.top/t/topic/jZpc. Copyright belongs to the author. Do not repost or scrape.