# -*- coding: utf-8 -*-

import re

# Define the tags to extract

tags = ['B-prov', 'I-prov', 'E-prov', 'B-city', 'I-city', 'E-city', 'B-district', 'I-district', 'E-district', 'B-town', 'I-town', 'E-town', 'B-community', 'I-community', 'E-community', 'B-village_group', 'E-village_group', 'B-road', 'I-road', 'E-road', 'B-subpoi', 'I-subpoi', 'E-subpoi']

# Read the file contents
with open('train.conll', 'r', encoding='utf-8') as f:
    content = f.read()

# Match (word, tag) pairs with a regular expression
pattern = r'(\S+)\s+(\S+)'
matches = re.findall(pattern, content)

# Keep the words whose tag is in the list
words = []
for word, tag in matches:
    if tag in tags:
        words.append(word)

# Write the extracted words to the output file
# Character that closes each component, keyed by its end tag
suffix_by_tag = {
    'E-prov': '省', 'E-city': '市', 'E-district': '区', 'E-town': '镇',
    'E-community': '村', 'E-village_group': '组', 'E-road': '路', 'E-subpoi': '部',
}
line_end_chars = {ch for tag, ch in suffix_by_tag.items() if tag in tags}

with open('ci.txt', 'w', encoding='utf-8') as f:
    for word in words:
        if word in line_end_chars:
            # A closing character ends the current line
            f.write(word + '\n')
        else:
            f.write(word)
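
The output step above decides where to break a line from the character itself (省, 市, 路 and so on), so one of those characters appearing in the middle of a name also ends the line. A minimal sketch of an alternative breaks on the tag instead, assuming the `E-` prefix marks the last token of each component, as the tag names suggest. The function name `group_components` and the sample pairs are hypothetical and only illustrate the idea.

```python
def group_components(pairs, tags):
    """Join tokens into components, closing one at each E- tag.

    pairs: iterable of (token, tag) tuples, e.g. the result of re.findall.
    tags:  collection of tags to keep; tokens with other tags are skipped.
    """
    wanted = set(tags)
    components = []
    current = []
    for token, tag in pairs:
        if tag not in wanted:
            continue
        current.append(token)
        if tag.startswith('E-'):
            # The end tag closes the component, whatever the character is.
            components.append(''.join(current))
            current = []
    if current:
        # Flush a trailing component that never reached an E- tag.
        components.append(''.join(current))
    return components


# Made-up sample data in the same (token, tag) shape as `matches`.
sample = [('浙', 'B-prov'), ('江', 'I-prov'), ('省', 'E-prov'),
          ('杭', 'B-city'), ('州', 'I-city'), ('市', 'E-city'),
          ('5', 'O')]
sample_tags = ['B-prov', 'I-prov', 'E-prov', 'B-city', 'I-city', 'E-city']
print(group_components(sample, sample_tags))
```

Writing the result with `f.write('\n'.join(components) + '\n')` would then give one component per line.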

Python code: extract words by tag and write them to a file
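
Matching `(\S+)\s+(\S+)` against the whole file pairs tokens across line breaks, so a line that holds a single field shifts every pair after it. As a hedged sketch, assuming each non-blank line of `train.conll` holds one token and one tag separated by whitespace, the pairs can be read line by line instead. `read_pairs` is a hypothetical helper name, and the sample text is made up.

```python
import io


def read_pairs(lines):
    """Return (token, tag) pairs, one per well-formed line.

    lines: any iterable of strings, such as an open text file.
    Blank lines and lines without exactly two fields are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        pairs.append((fields[0], fields[1]))
    return pairs


# Made-up sample: a blank line and a one-field line sit between valid rows.
sample_text = "浙 B-prov\n江 E-prov\n\n孤\n路 E-road\n"
print(read_pairs(io.StringIO(sample_text)))
```

With a real file, `read_pairs(f)` inside the `with open(...)` block would take the place of `f.read()` plus `re.findall`.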

Original source: https://www.cveoy.top/t/topic/f2JO. Copyright belongs to the author. Do not repost or scrape.
