Discontinuous Named Entity Recognition as Maximal Clique Discovery
Discontinuous Named Entity Recognition (NER) can be formulated as a problem of Maximal Clique Discovery (MCD). In NER, the goal is to identify and classify named entities such as person names, organization names, and location names in a given text. Discontinuous NER is the task of identifying entities whose tokens are not contiguous in the text, i.e., mentions made up of non-consecutive tokens. For example, the phrase "muscle pain and fatigue" contains the contiguous mention "muscle pain" and the discontinuous mention "muscle fatigue", which skips over "pain and".
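MCD is a well-known problem in graph theory, where the goal is to enumerate the maximal cliques of a graph. A clique is a set of vertices that are all pairwise connected, and it is maximal when no further vertex can be added without breaking that property. This differs from the maximum clique, which is the single largest clique in the graph. For NER, the tokens (or spans) of the text can be represented as vertices, with an edge between two vertices when they are judged to belong to the same entity mention, for example by a trained model. Each maximal clique in this graph then corresponds to one named entity.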
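As a minimal sketch of this graph view, the following Python snippet builds an adjacency map for one sentence and checks the clique and maximal-clique properties directly. The sentence, the `links` list, and the helper names `build_graph`, `is_clique`, and `is_maximal_clique` are illustrative assumptions; in practice the links would come from a trained model rather than being written by hand.

```python
from itertools import combinations

tokens = ["The", "patient", "has", "muscle", "pain", "and", "fatigue"]

# Hypothetical "same entity" links between token indices (assumed, not predicted):
# muscle-pain and muscle-fatigue.
links = [(3, 4), (3, 6)]


def build_graph(num_nodes, edges):
    """Return an undirected graph as {node: set of neighbours}."""
    adj = {i: set() for i in range(num_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_clique(adj, nodes):
    """True if every pair of the given nodes is connected by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def is_maximal_clique(adj, nodes):
    """True if nodes form a clique that no outside node can extend."""
    nodes = set(nodes)
    if not is_clique(adj, nodes):
        return False
    # An outside node w extends the clique if it is adjacent to all members.
    return not any(nodes <= adj[w] for w in adj if w not in nodes)


graph = build_graph(len(tokens), links)
```

Here `{3, 4}` ("muscle pain") and `{3, 6}` ("muscle fatigue") are both maximal cliques, while `{4, 6}` is not a clique at all because "pain" and "fatigue" are not linked.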
To solve the MCD problem, various algorithms can be used. A standard choice is the Bron-Kerbosch algorithm, a recursive backtracking algorithm that enumerates all maximal cliques of an undirected graph. Related methods also exist, such as the Clique Percolation Method (CPM), which builds overlapping communities out of adjacent k-cliques. CPM detects communities rather than enumerating maximal cliques, so it addresses a neighbouring problem.
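The following is a minimal Python sketch of Bron-Kerbosch with pivoting, applied to a toy graph. The example sentence, the hand-written adjacency map `adj`, and the `decode` helper are assumptions made for illustration; only vertices that take part in some entity are included, since an isolated vertex would be reported as a one-element clique.

```python
def bron_kerbosch(adj, r=None, p=None, x=None):
    """Yield every maximal clique of the undirected graph `adj`.

    adj maps each node to the set of its neighbours.
    r: current clique, p: candidates that can extend it, x: already explored.
    """
    if r is None:
        r, p, x = set(), set(adj), set()
    if not p and not x:
        # Nothing can extend r, and r was not reported before: it is maximal.
        yield frozenset(r)
        return
    # Pivot on the node with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}


tokens = ["The", "patient", "has", "muscle", "pain", "and", "fatigue"]

# Assumed same-entity links: muscle-pain and muscle-fatigue.
adj = {3: {4, 6}, 4: {3}, 6: {3}}


def decode(clique):
    """Turn a clique of token indices into a surface string in text order."""
    return " ".join(tokens[i] for i in sorted(clique))


cliques = set(bron_kerbosch(adj))
entities = sorted(decode(c) for c in cliques)
```

With these assumed links the two maximal cliques decode to "muscle pain" and "muscle fatigue"; the shared token "muscle" appears in both, which is how one graph can encode entities that share tokens.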
Using MCD for discontinuous NER has several advantages. First, it allows for the identification of named entities that are not contiguous in the text, since a clique places no adjacency requirement on its tokens. Second, it can accommodate ambiguous cases such as entities that share tokens, because a vertex may belong to more than one maximal clique. Finally, it may be extended to other tasks, such as relation extraction, by looking for maximal cliques that are connected by specific types of edges.