Python code: extract words by tag and write them to a file
# -*- coding: utf-8 -*-
import re
# Tags whose words should be extracted
tags = [
    'B-prov', 'I-prov', 'E-prov',
    'B-city', 'I-city', 'E-city',
    'B-district', 'I-district', 'E-district',
    'B-town', 'I-town', 'E-town',
    'B-community', 'I-community', 'E-community',
    'B-village_group', 'E-village_group',
    'B-road', 'I-road', 'E-road',
    'B-subpoi', 'I-subpoi', 'E-subpoi',
]
# Read the file contents
with open('train.conll', 'r', encoding='utf-8') as f:
    content = f.read()
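# Match each word and its tag with a regular expression.
# The separator is limited to spaces and tabs so that a match never spans two lines.
pattern = r'(\S+)[ \t]+(\S+)'
matches = re.findall(pattern, content)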
# Keep only the words whose tag is in the list
words = [word for word, tag in matches if tag in tags]
# Write the extracted words to the file, starting a new line after each
# address-unit suffix (province, city, district, town, village, group, road, department).
# The original `'E-...' in tags` checks were always true, so the line break
# depends only on the character itself, not on the tag it carried.
suffixes = {'省', '市', '区', '镇', '村', '组', '路', '部'}
with open('ci.txt', 'w', encoding='utf-8') as f:
    for word in words:
        if word in suffixes:
            f.write(word + '\n')
        else:
            f.write(word)
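The script above decides where to break lines by looking at the character, so a 路 in the middle of a road name would also end the line. If the intent is one address unit per line, a tag-driven variant may be closer to it. The following is only a minimal sketch under that assumption: the function name `extract_entities` and the sample lines are hypothetical, and it assumes each line holds one token and its tag separated by whitespace.

```python
def extract_entities(lines, wanted_tags):
    """Join tagged characters into entities, closing each one at its E- tag."""
    entities = []
    current = []

    def flush():
        # Store the entity collected so far, if any.
        if current:
            entities.append(''.join(current))
            current.clear()

    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            # Blank or malformed line: treat it as a boundary.
            flush()
            continue
        token, tag = parts[0], parts[-1]
        if tag not in wanted_tags:
            # A token outside the wanted tags ends any entity in progress.
            flush()
            continue
        if tag.startswith('B-'):
            # A new entity begins; close an unfinished one first.
            flush()
        current.append(token)
        if tag.startswith('E-'):
            flush()
    flush()
    return entities


# Hypothetical sample in the "token tag" layout assumed above.
sample_lines = [
    '浙 B-prov', '江 I-prov', '省 E-prov',
    '杭 B-city', '州 I-city', '市 E-city',
    '的 O',
    '',
    '文 B-road', '一 I-road', '路 E-road',
]
wanted = {
    'B-prov', 'I-prov', 'E-prov',
    'B-city', 'I-city', 'E-city',
    'B-road', 'I-road', 'E-road',
}
print('\n'.join(extract_entities(sample_lines, wanted)))
```

To use it with the files from the script, pass the open `train.conll` file object as `lines` and write `'\n'.join(...)` of the result to `ci.txt`.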