NER model evaluation on two sentences: accuracy analysis
The two sentences: ['小明小刚都是汕尾人', '小红小方都不是深圳人,但住在汕尾']
The true labels are: [{"NAME": [[0, 2], [2, 4]], "CITY": [[6, 8]]}, {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]}]
The model outputs for the two sentences are: [["B-NAME","I-NAME","B-NAME","I-NAME","O","O","B-CITY","I-CITY","O"], ["B-NAME","I-NAME","B-NAME","I-NAME","B-NAME","I-NAME","O","O","O","O","O","O","B-CITY","O","B-CITY","I-CITY"]]
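To make the span convention concrete, the short sketch below slices each sentence with its true-label spans. It assumes the spans are half-open character offsets [start, end), which is consistent with the labels given above. The names `sentences`, `true_labels`, and `extract_mentions` are introduced here for illustration only.

```python
# Illustrative check (all names are hypothetical): treat each span as a
# half-open character range [start, end) and slice the sentence with it.
sentences = ["小明小刚都是汕尾人", "小红小方都不是深圳人,但住在汕尾"]
true_labels = [
    {"NAME": [[0, 2], [2, 4]], "CITY": [[6, 8]]},
    {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]},
]


def extract_mentions(sentence, label_dict):
    """Return {entity_type: [substring, ...]} for one sentence."""
    return {
        entity_type: [sentence[start:end] for start, end in spans]
        for entity_type, spans in label_dict.items()
    }


for sentence, label_dict in zip(sentences, true_labels):
    print(extract_mentions(sentence, label_dict))
# {'NAME': ['小明', '小刚'], 'CITY': ['汕尾']}
# {'NAME': ['小红', '小方'], 'CITY': ['深圳', '汕尾']}
```

Each span ends one position past the last character of the entity, so `sentence[start:end]` returns the entity text directly.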
a) Convert the two model outputs back into the format of the true labels and print them
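# Define the tag-to-id mapping table (not needed by the span decoding below)
tag2label = {"O": 0, "B-NAME": 1, "I-NAME": 2, "B-CITY": 3, "I-CITY": 4}
# Define the function that converts a BIO tag sequence back into entity spans
def revert_label(labels):
    entities = []
    entity_type = ""
    start_idx = None
    for i in range(len(labels)):
        if labels[i].startswith("B-"):
            # A B- tag always opens a new entity; close any open one first
            if start_idx is not None:
                entities.append([start_idx, i, entity_type])
            start_idx = i
            entity_type = labels[i][2:]
        elif labels[i].startswith("I-"):
            if start_idx is None:
                # I- tag with no open entity: treat it as the start of one
                start_idx = i
                entity_type = labels[i][2:]
            elif entity_type != labels[i][2:]:
                # Type changed mid-entity: close the old one and open a new one
                entities.append([start_idx, i, entity_type])
                start_idx = i
                entity_type = labels[i][2:]
        else:
            # An "O" tag closes any open entity
            if start_idx is not None:
                entities.append([start_idx, i, entity_type])
                start_idx = None
                entity_type = ""
    # Close an entity that runs to the end of the sentence
    if start_idx is not None:
        entities.append([start_idx, len(labels), entity_type])
    return entities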
# Decode the tags of the first sentence
predicted_labels_1 = ["B-NAME","I-NAME","B-NAME","I-NAME","O","O","B-CITY","I-CITY","O"]
entities_1 = revert_label(predicted_labels_1)
print(entities_1)  # [[0, 2, 'NAME'], [2, 4, 'NAME'], [6, 8, 'CITY']]
# Decode the tags of the second sentence
predicted_labels_2 = ["B-NAME","I-NAME","B-NAME","I-NAME","B-NAME","I-NAME","O","O","O","O","O","O","B-CITY","O","B-CITY","I-CITY"]
entities_2 = revert_label(predicted_labels_2)
print(entities_2)  # [[0, 2, 'NAME'], [2, 4, 'NAME'], [4, 6, 'NAME'], [12, 13, 'CITY'], [14, 16, 'CITY']]
The output is:
[[0, 2, 'NAME'], [2, 4, 'NAME'], [6, 8, 'CITY']]
[[0, 2, 'NAME'], [2, 4, 'NAME'], [4, 6, 'NAME'], [12, 13, 'CITY'], [14, 16, 'CITY']]
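Part a) asks for the output in the same format as the true labels, which is a dict keyed by entity type rather than a flat list. A minimal sketch of that final grouping step follows, assuming the decoded entities are [start, end, type] triples; the helper name `to_label_dict` and the variables `decoded_1` and `decoded_2` are hypothetical.

```python
# Hypothetical helper: group [start, end, type] triples by entity type so the
# result has the same shape as the true labels, e.g. {"NAME": [[0, 2]], ...}.
def to_label_dict(entities):
    label_dict = {}
    for start, end, entity_type in entities:
        label_dict.setdefault(entity_type, []).append([start, end])
    return label_dict


# Triples decoded from the two predicted BIO tag sequences
decoded_1 = [[0, 2, "NAME"], [2, 4, "NAME"], [6, 8, "CITY"]]
decoded_2 = [[0, 2, "NAME"], [2, 4, "NAME"], [4, 6, "NAME"],
             [12, 13, "CITY"], [14, 16, "CITY"]]

print(to_label_dict(decoded_1))
# {'NAME': [[0, 2], [2, 4]], 'CITY': [[6, 8]]}
print(to_label_dict(decoded_2))
# {'NAME': [[0, 2], [2, 4], [4, 6]], 'CITY': [[12, 13], [14, 16]]}
```

Keeping the flat triples for evaluation and converting to the dict only for display avoids having to compare nested structures.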
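b) Evaluate the accuracy of the model outputs in strict mode (both the span and the entity type must match exactly) and print the result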
# Define the evaluation function; both arguments are lists of [start, end, type] triples
def evaluate(predicted_entities, true_entities):
    correct = 0
    for pe in predicted_entities:
        for te in true_entities:
            # Strict match: start, end, and entity type must all be equal
            if pe[0] == te[0] and pe[1] == te[1] and pe[2] == te[2]:
                correct += 1
                break
    precision = correct / len(predicted_entities) if len(predicted_entities) > 0 else 0
    recall = correct / len(true_entities) if len(true_entities) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0
    return precision, recall, f1
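# Score the first sentence; the true-label dict is flattened into
# [start, end, type] triples so that it matches the predicted format
true_labels_1 = {"NAME": [[0, 2], [2, 4]], "CITY": [[6, 8]]}
true_entities_1 = [[s, e, t] for t, spans in true_labels_1.items() for s, e in spans]
precision_1, recall_1, f1_1 = evaluate(entities_1, true_entities_1)
print("Scores for the first sentence:")
print("precision: {:.2f}, recall: {:.2f}, f1: {:.2f}".format(precision_1, recall_1, f1_1))
# Score the second sentence
true_labels_2 = {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]}
true_entities_2 = [[s, e, t] for t, spans in true_labels_2.items() for s, e in spans]
precision_2, recall_2, f1_2 = evaluate(entities_2, true_entities_2)
print("Scores for the second sentence:")
print("precision: {:.2f}, recall: {:.2f}, f1: {:.2f}".format(precision_2, recall_2, f1_2))
The output is:
Scores for the first sentence:
precision: 1.00, recall: 1.00, f1: 1.00
Scores for the second sentence:
precision: 0.60, recall: 0.75, f1: 0.67
The first sentence scores 1.00 on precision, recall, and F1, so the model identified every entity in it exactly. For the second sentence, 3 of the 5 predicted entities are correct (precision 0.60) and 3 of the 4 true entities are found (recall 0.75), giving an F1 of 0.67. The errors are a spurious NAME at [4, 6], a spurious CITY at [12, 13], and a missed CITY at [7, 9].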
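The scores above are computed per sentence. As a complementary sketch, the same strict matching can be pooled over both sentences (micro-averaging) to give a single corpus-level score. This pooling is an addition for illustration, not part of the original exercise, and the function name `micro_prf` is hypothetical.

```python
# Hypothetical micro-averaged strict evaluation: pool the counts over all
# sentences before computing precision, recall, and F1.
def micro_prf(predicted_per_sentence, gold_per_sentence):
    correct = n_pred = n_gold = 0
    for predicted, gold in zip(predicted_per_sentence, gold_per_sentence):
        pred_set = {tuple(e) for e in predicted}
        gold_set = {tuple(e) for e in gold}
        correct += len(pred_set & gold_set)  # exact (start, end, type) matches
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predicted = [
    [(0, 2, "NAME"), (2, 4, "NAME"), (6, 8, "CITY")],
    [(0, 2, "NAME"), (2, 4, "NAME"), (4, 6, "NAME"), (12, 13, "CITY"), (14, 16, "CITY")],
]
gold = [
    [(0, 2, "NAME"), (2, 4, "NAME"), (6, 8, "CITY")],
    [(0, 2, "NAME"), (2, 4, "NAME"), (7, 9, "CITY"), (14, 16, "CITY")],
]

p, r, f1 = micro_prf(predicted, gold)
print("precision: {:.2f}, recall: {:.2f}, f1: {:.2f}".format(p, r, f1))
# precision: 0.75, recall: 0.86, f1: 0.80
```

Pooled over both sentences, 6 of 8 predicted entities are correct and 6 of 7 true entities are found, so a sentence with more entities carries proportionally more weight than in a simple average of per-sentence scores.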
Original source: https://www.cveoy.top/t/topic/oRXx Copyright belongs to the author. Do not repost or scrape.