NER Model Evaluation on Two Sentences: Accuracy Analysis

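The two sentences: ['小明小刚都是汕尾人', '小红小方都不是深圳人，但住在汕尾'] ("Xiaoming and Xiaogang are both from Shanwei" and "Neither Xiaohong nor Xiaofang is from Shenzhen, but they live in Shanwei")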

The gold labels, where each span is a [start, end] pair of character offsets with end exclusive, are: [{"NAME": [[0, 2], [2, 4]], "CITY": [[6, 8]]}, {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]}]
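
The offset convention can be checked by slicing each sentence with its gold spans. The sketch below is a minimal illustration. It assumes Python string indexing, where each Chinese character (including the full-width comma) occupies one index and `end` is exclusive. The helper name `span_texts` is hypothetical.

```python
# Sentences and gold labels copied from the problem statement
sentences = ["小明小刚都是汕尾人", "小红小方都不是深圳人，但住在汕尾"]
gold = [
    {"NAME": [[0, 2], [2, 4]], "CITY": [[6, 8]]},
    {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]},
]

# Hypothetical helper: slice each span as sentence[start:end] (end exclusive)
def span_texts(sentence, entities):
    return {etype: [sentence[s:e] for s, e in spans] for etype, spans in entities.items()}

for sentence, entities in zip(sentences, gold):
    print(span_texts(sentence, entities))
# {'NAME': ['小明', '小刚'], 'CITY': ['汕尾']}
# {'NAME': ['小红', '小方'], 'CITY': ['深圳', '汕尾']}
```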

The model outputs for the two sentences are: [["B-NAME","I-NAME","B-NAME","I-NAME","O","O","B-CITY","I-CITY","O"], ["B-NAME","I-NAME","B-NAME","I-NAME","B-NAME","I-NAME","O","O","O","O","O","O","B-CITY","O","B-CITY","I-CITY"]]
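
BIO tagging assigns exactly one tag per character, so pairing each character with its tag shows directly where an output goes wrong. The sketch below is minimal, and the helper name `align` is illustrative rather than part of the original answer.

```python
# Sentences and model outputs copied from the problem statement
sentences = ["小明小刚都是汕尾人", "小红小方都不是深圳人，但住在汕尾"]
predicted = [
    ["B-NAME", "I-NAME", "B-NAME", "I-NAME", "O", "O", "B-CITY", "I-CITY", "O"],
    ["B-NAME", "I-NAME", "B-NAME", "I-NAME", "B-NAME", "I-NAME",
     "O", "O", "O", "O", "O", "O", "B-CITY", "O", "B-CITY", "I-CITY"],
]

# Hypothetical helper: pair characters with tags, checking that the lengths match
def align(sentence, tags):
    if len(sentence) != len(tags):
        raise ValueError(f"{len(sentence)} characters but {len(tags)} tags")
    return list(zip(sentence, tags))

for sentence, tags in zip(sentences, predicted):
    # Print each character followed by its predicted tag
    print(" ".join(f"{char}/{tag}" for char, tag in align(sentence, tags)))
```

In the second line of output, 都/B-NAME 不/I-NAME and 住/B-CITY are spurious entities, while 深/O 圳/O is a missed city.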

a) Convert the two model outputs back into the gold-label format and print them

# Tag-to-id mapping (not needed by the decoding below)
tag2label = {"O": 0, "B-NAME": 1, "I-NAME": 2, "B-CITY": 3, "I-CITY": 4}

# Decode a BIO tag sequence into [start, end, type] triples (end is exclusive)
def revert_label(labels):
    entities = []
    entity_type = ""
    start_idx = None
    for i, label in enumerate(labels):
        if label.startswith("B-"):
            # B- closes any open entity and starts a new one
            if start_idx is not None:
                entities.append([start_idx, i, entity_type])
            start_idx = i
            entity_type = label[2:]
        elif label.startswith("I-"):
            if start_idx is None:
                # I- without a preceding B-: treat it as the start of an entity
                start_idx = i
                entity_type = label[2:]
            elif entity_type != label[2:]:
                # I- of a different type: close the open entity and start a new one
                entities.append([start_idx, i, entity_type])
                start_idx = i
                entity_type = label[2:]
        else:
            # O closes any open entity
            if start_idx is not None:
                entities.append([start_idx, i, entity_type])
                start_idx = None
                entity_type = ""
    # Close an entity that runs to the end of the sentence
    if start_idx is not None:
        entities.append([start_idx, len(labels), entity_type])
    return entities

# Convert [start, end, type] triples into the gold format {type: [[start, end], ...]}
def to_gold_format(entities):
    result = {}
    for start, end, entity_type in entities:
        result.setdefault(entity_type, []).append([start, end])
    return result

# Decode sentence 1
predicted_labels_1 = ["B-NAME","I-NAME","B-NAME","I-NAME","O","O","B-CITY","I-CITY","O"]
entities_1 = revert_label(predicted_labels_1)
print(to_gold_format(entities_1))  # {'NAME': [[0, 2], [2, 4]], 'CITY': [[6, 8]]}

# Decode sentence 2
predicted_labels_2 = ["B-NAME","I-NAME","B-NAME","I-NAME","B-NAME","I-NAME","O","O","O","O","O","O","B-CITY","O","B-CITY","I-CITY"]
entities_2 = revert_label(predicted_labels_2)
print(to_gold_format(entities_2))  # {'NAME': [[0, 2], [2, 4], [4, 6]], 'CITY': [[12, 13], [14, 16]]}

Output:

{'NAME': [[0, 2], [2, 4]], 'CITY': [[6, 8]]}
{'NAME': [[0, 2], [2, 4], [4, 6]], 'CITY': [[12, 13], [14, 16]]}
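
The decoding can also be checked in the opposite direction. Encoding the gold spans of sentence 2 as BIO tags and comparing them position by position with the model output locates every tag-level error. The sketch below is minimal, and `spans_to_bio` is a hypothetical helper, not part of the original answer.

```python
# Hypothetical helper: the inverse of decoding, turning gold spans
# {type: [[start, end], ...]} (end exclusive) into one BIO tag per character
def spans_to_bio(length, entities):
    tags = ["O"] * length
    for etype, spans in entities.items():
        for start, end in spans:
            tags[start] = "B-" + etype
            for i in range(start + 1, end):
                tags[i] = "I-" + etype
    return tags

# Gold labels and model output for sentence 2, copied from the problem statement
gold_2 = {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]}
predicted_2 = ["B-NAME", "I-NAME", "B-NAME", "I-NAME", "B-NAME", "I-NAME",
               "O", "O", "O", "O", "O", "O", "B-CITY", "O", "B-CITY", "I-CITY"]

expected_2 = spans_to_bio(len(predicted_2), gold_2)
# (position, gold tag, predicted tag) wherever the two disagree
diffs = [(i, g, p) for i, (g, p) in enumerate(zip(expected_2, predicted_2)) if g != p]
print(diffs)
# [(4, 'O', 'B-NAME'), (5, 'O', 'I-NAME'), (7, 'B-CITY', 'O'), (8, 'I-CITY', 'O'), (12, 'O', 'B-CITY')]
```

The five differing positions correspond to three errors. Positions 4 and 5 are the spurious NAME 都不. Positions 7 and 8 are the missed CITY 深圳. Position 12 is the spurious one-character CITY 住.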

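b) Evaluate the model outputs in strict mode (both the span and the entity type must match exactly) and print the scores (precision, recall and F1)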

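# Strict evaluation: a predicted entity counts as correct only if its start,
# end and type all match a gold entity exactly
def evaluate(predicted_entities, true_entities):
    # predicted_entities: [[start, end, type], ...] as returned by revert_label
    # true_entities: gold format {type: [[start, end], ...]}
    pred = {(start, end, entity_type) for start, end, entity_type in predicted_entities}
    gold = {(start, end, entity_type)
            for entity_type, spans in true_entities.items()
            for start, end in spans}
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0
    recall = correct / len(gold) if gold else 0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0
    return precision, recall, f1

# Scores for sentence 1 (the gold label is the dict itself, not a list wrapping it)
true_entities_1 = {"NAME": [[0, 2], [2, 4]], "CITY": [[6, 8]]}
precision_1, recall_1, f1_1 = evaluate(entities_1, true_entities_1)
print("Sentence 1:")
print("precision: {:.2f}, recall: {:.2f}, f1: {:.2f}".format(precision_1, recall_1, f1_1))

# Scores for sentence 2
true_entities_2 = {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]}
precision_2, recall_2, f1_2 = evaluate(entities_2, true_entities_2)
print("Sentence 2:")
print("precision: {:.2f}, recall: {:.2f}, f1: {:.2f}".format(precision_2, recall_2, f1_2))

Output:

Sentence 1:
precision: 1.00, recall: 1.00, f1: 1.00
Sentence 2:
precision: 0.60, recall: 0.75, f1: 0.67

Sentence 1 scores 1.00 on all three metrics because the model recovers all three entities with exact spans and types.

Sentence 2 scores lower:
- Precision: of the 5 predicted entities, 3 are exactly correct (小红 [0, 2], 小方 [2, 4] and 汕尾 [14, 16]), so precision is 3/5 = 0.60.
- Recall: of the 4 gold entities, 3 are found, so recall is 3/4 = 0.75.
- F1: 2 × 0.60 × 0.75 / (0.60 + 0.75) ≈ 0.67.

The errors are a spurious NAME 都不 at [4, 6], a spurious one-character CITY 住 at [12, 13], and the missed CITY 深圳 at [7, 9].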
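
The per-sentence scores can also be pooled into a single figure for both sentences. The question does not specify how to combine them, so this sketch assumes micro-averaging. That means summing the matched, predicted and gold entity counts over both sentences before computing precision, recall and F1. The compact decoder `bio_to_spans` is hypothetical and assumes well-formed BIO input, silently dropping an I- tag that does not continue an open entity of the same type.

```python
# Model outputs and gold labels copied from the problem statement
predicted = [
    ["B-NAME", "I-NAME", "B-NAME", "I-NAME", "O", "O", "B-CITY", "I-CITY", "O"],
    ["B-NAME", "I-NAME", "B-NAME", "I-NAME", "B-NAME", "I-NAME",
     "O", "O", "O", "O", "O", "O", "B-CITY", "O", "B-CITY", "I-CITY"],
]
gold = [
    {"NAME": [[0, 2], [2, 4]], "CITY": [[6, 8]]},
    {"NAME": [[0, 2], [2, 4]], "CITY": [[7, 9], [14, 16]]},
]

# Hypothetical compact decoder: collect (start, end, type) for each B- I-* run
def bio_to_spans(tags):
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # Close the open entity unless this tag continues it
        if start is not None and not (tag.startswith("I-") and tag[2:] == etype):
            spans.add((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

tp = n_pred = n_gold = 0
for tags, entities in zip(predicted, gold):
    pred_spans = bio_to_spans(tags)
    gold_spans = {(s, e, t) for t, spans in entities.items() for s, e in spans}
    tp += len(pred_spans & gold_spans)  # strict match: same start, end and type
    n_pred += len(pred_spans)
    n_gold += len(gold_spans)

precision = tp / n_pred if n_pred else 0.0
recall = tp / n_gold if n_gold else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print("micro precision: {:.2f}, recall: {:.2f}, f1: {:.2f}".format(precision, recall, f1))
# micro precision: 0.75, recall: 0.86, f1: 0.80
```

Micro-averaging weights every entity equally. The second sentence contains more entities, so it has more influence here than it would under a simple mean of the per-sentence scores (macro-averaging).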