Open the file

f = open("train.conll", "r", encoding="utf-8")

Define the tags to extract

tags = ["B-prov", "I-prov", "E-prov", "B-city", "I-city", "E-city", "B-district", "I-district", "E-district", "B-town", "I-town", "E-town", "B-community", "I-community", "E-community", "B-village_group", "E-village_group", "B-road", "I-road", "E-road", "B-subpoi", "I-subpoi", "E-subpoi"]
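
The list above follows a B-/I-/E- prefix pattern over eight entity types, so it could also be generated instead of typed by hand. The sketch below is one way to do that; `ENTITY_TYPES`, `PREFIXES`, `EXCLUDED`, and `build_tags` are hypothetical names. The hand-written list has no `I-village_group` entry, and the sketch assumes that omission is intentional, reproducing it through an explicit exclusion set.

```python
# Entity types taken from the hand-written tag list, in the same order.
ENTITY_TYPES = ["prov", "city", "district", "town",
                "community", "village_group", "road", "subpoi"]
PREFIXES = ["B", "I", "E"]  # begin / inside / end of an entity

# Assumption: the original list leaves this tag out on purpose.
EXCLUDED = {"I-village_group"}


def build_tags():
    """Return the tags in the same order as the hand-written list."""
    return [f"{prefix}-{entity_type}"
            for entity_type in ENTITY_TYPES
            for prefix in PREFIXES
            if f"{prefix}-{entity_type}" not in EXCLUDED]


generated_tags = build_tags()
# A set makes the later "tag in ..." membership test a hash lookup.
generated_tag_set = set(generated_tags)
```

If `I-village_group` does occur in the data and should be extracted, emptying `EXCLUDED` would add it.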

Define the list that stores the words

words = []

Define the tag and content of the current word

current_tag = None
current_word = ""

Read the file line by line

for line in f:
    # Strip the trailing newline
    line = line.strip()
    # Skip empty lines
    if not line:
        continue
    # Split the line on whitespace
    parts = line.split()
    # Take the tag (last column) and the word (first column)
    tag = parts[-1]
    word = parts[0]
    # The tag is one of the tags to extract
    if tag in tags:
        # No word in progress: start one with this tag and word
        if current_tag is None:
            current_tag = tag
            current_word = word
        # Same tag as the word in progress: append this word to it
        elif current_tag == tag:
            current_word += word
        # Different tag: store the word in progress and start a new one
        else:
            words.append(current_word)
            current_tag = tag
            current_word = word
    # The tag is not in the list: store the word in progress, then reset
    else:
        if current_tag is not None:
            words.append(current_word)
            current_tag = None
            current_word = ""

# Store a word that is still in progress when the file ends
if current_tag is not None:
    words.append(current_word)
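
Because the loop above compares full tags, the B-, I-, and E- tokens of one entity carry different tags and end up as separate entries in `words`. If the goal is whole entities, one option is to compare the entity type (the part after the prefix) instead. This is a minimal sketch of that alternative, not the original method; it assumes each non-empty line holds a token and its tag separated by whitespace, and `extract_entities` and the sample lines are hypothetical.

```python
def extract_entities(lines, wanted_tags):
    """Group consecutive tokens of one entity into a single string.

    A new entity starts on a B- tag or when the entity type changes;
    an E- tag closes the entity it belongs to.
    """
    entities = []
    current_type = None  # entity type being built, e.g. "prov"
    current_text = ""

    def flush():
        # Store the entity in progress (if any) and reset the state.
        nonlocal current_type, current_text
        if current_type is not None:
            entities.append(current_text)
        current_type, current_text = None, ""

    for line in lines:
        parts = line.split()
        if not parts:
            # Assumption: a blank line is a sentence boundary.
            flush()
            continue
        word, tag = parts[0], parts[-1]
        if tag not in wanted_tags:
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_text += word
        if prefix == "E":
            flush()
    flush()  # emit an entity still open at the end of the input
    return entities


# Hypothetical sample in "token tag" form.
sample = ["浙 B-prov", "江 I-prov", "省 E-prov",
          "杭 B-city", "州 E-city", "的 O"]
wanted = {"B-prov", "I-prov", "E-prov", "B-city", "I-city", "E-city"}
print(extract_entities(sample, wanted))  # prints ['浙江省', '杭州']
```

Passing the open file object as `lines` would work the same way, since iterating a text file yields one line at a time.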

Close the file

f.close()

Join the words into one space-separated string

result = " ".join(words)

Split the string on whitespace into a list

result_list = result.split()

Append a newline to every word in the list

result_list = [word + "\n" for word in result_list]

Write the words in the list to a file

f = open("ci.txt", "w", encoding="utf-8")
f.writelines(result_list)
f.close()
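
The join, split, and newline steps above can be collapsed into one write, because every token comes from `line.split()` and therefore never contains whitespace, so joining and re-splitting returns the same list. A minimal sketch, assuming `words` holds such whitespace-free strings; `write_words` is a hypothetical helper, and the `with` block closes the file even if writing raises an error.

```python
def write_words(words, path):
    """Write one word per line to `path`, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(word + "\n" for word in words)
```

With the variables used above, the call would be `write_words(words, "ci.txt")`.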


Original URL: https://www.cveoy.top/t/topic/hwAR. Copyright belongs to the author. Do not repost or scrape.
