Discontinuous Named Entity Recognition (NER) can be formulated as a problem of Maximal Clique Discovery (MCD). In NER, the goal is to identify and classify named entities such as person names, organization names, and location names in a given text. Discontinuous NER is the task of identifying named entities that are not contiguous in the text, i.e., entities composed of non-consecutive tokens. For example, in the phrase "muscle pain and fatigue", the entity "muscle fatigue" skips over the tokens "pain and".

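MCD is a well-known problem in graph theory. A clique is a set of vertices that are all connected to each other, and a clique is maximal if no further vertex can be added to it without breaking that property. This differs from the maximum clique, which is the single largest clique in the graph; MCD asks for all maximal cliques, whatever their size. In the case of NER, the tokens in the text (or, in segment-based formulations, contiguous spans of tokens) can be represented as vertices in a graph, and an edge between two vertices indicates that they are predicted to belong to the same entity, rather than merely co-occurring in the text. Each maximal clique in this graph then corresponds to one named entity.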
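To make the graph formulation concrete, here is a minimal sketch of how such a graph could be stored. The function names `build_token_graph` and `is_clique`, the example phrase, and the hand-written list of token pairs are assumptions for illustration; in a real system the pairs would come from a trained model, not be listed by hand.

```python
from itertools import combinations


def build_token_graph(num_tokens, same_entity_pairs):
    """Build an undirected graph as {token_index: set_of_neighbours}.

    `same_entity_pairs` is a hypothetical input: pairs of token indices
    that are assumed to belong to the same entity.
    """
    graph = {i: set() for i in range(num_tokens)}
    for u, v in same_entity_pairs:
        if u == v:
            continue  # ignore self-loops
        graph[u].add(v)
        graph[v].add(u)
    return graph


def is_clique(graph, vertices):
    """Return True if every pair of the given vertices is connected."""
    return all(v in graph[u] for u, v in combinations(vertices, 2))


tokens = ["muscle", "pain", "and", "fatigue"]
# Assumed pairs: "muscle"+"pain" and "muscle"+"fatigue" each form an entity.
pairs = [(0, 1), (0, 3)]
graph = build_token_graph(len(tokens), pairs)

print(is_clique(graph, {0, 1}))     # "muscle pain"
print(is_clique(graph, {0, 1, 3}))  # "pain" and "fatigue" are not linked
```

Here "muscle" takes part in two different cliques, which is how a shared token can be reused by two entities, while "pain" and "fatigue" are never linked to each other and so never end up in the same entity.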

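To solve the MCD problem, various algorithms can be used. The best known is the Bron-Kerbosch algorithm, a recursive backtracking algorithm that enumerates all maximal cliques in a graph. Related methods also exist, such as the Clique Percolation Method (CPM), which detects overlapping communities by merging adjacent k-cliques (cliques of size k that share k-1 vertices); note that CPM returns communities rather than the maximal cliques themselves.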
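The following is a minimal sketch of Bron-Kerbosch with pivoting, applied to a toy token graph. The graph, the token list, and the decoding step are assumptions for illustration: only tokens assumed to be part of some entity are included as vertices, since an isolated vertex would otherwise be reported as a one-element maximal clique.

```python
def bron_kerbosch(graph, r=None, p=None, x=None):
    """Yield every maximal clique of an undirected graph.

    `graph` maps each vertex to the set of its neighbours.
    r: current clique, p: candidate vertices, x: already-processed vertices.
    """
    if r is None:
        r, p, x = set(), set(graph), set()
    if not p and not x:
        # Nothing can extend r, so r is maximal.
        yield frozenset(r)
        return
    # Pivot on the vertex with the most neighbours among the candidates,
    # which prunes branches that cannot produce a new maximal clique.
    pivot = max(p | x, key=lambda v: len(graph[v] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p.remove(v)
        x.add(v)


tokens = ["muscle", "pain", "and", "fatigue"]
# Hypothetical graph: vertex 2 ("and") is left out because it is assumed
# not to belong to any entity.
graph = {0: {1, 3}, 1: {0}, 3: {0}}

# Decode each maximal clique back into text, keeping token order.
cliques = sorted(tuple(sorted(c)) for c in bron_kerbosch(graph))
entities = [" ".join(tokens[i] for i in clique) for clique in cliques]
print(entities)  # ['muscle pain', 'muscle fatigue']
```

Enumerating maximal cliques can take exponential time in the worst case, but graphs built from a single sentence are small, so this is rarely a practical concern at that scale.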

Using MCD for discontinuous NER has several advantages. First, it allows for the identification of named entities that are not contiguous in the text, because membership in a clique depends on the edges between vertices and not on their adjacency in the sentence. Second, it can handle overlapping and ambiguous cases: distinct maximal cliques may share vertices, so a token that belongs to more than one entity can be recovered in each of them. Finally, the formulation could plausibly be extended to other tasks, such as relation extraction, by looking for maximal cliques that are connected by specific types of edges.

Discontinuous Named Entity Recognition as Maximal Clique Discovery

