Multi-modal entity alignment determines whether two entities in different knowledge graphs, each carrying multiple forms of information such as structural triples and image descriptions, refer to the same real-world object. However, many previous multi-modal entity alignment models simply encode each modality with its own encoder and then fuse the features into a joint entity representation. Because the multi-modal information is not fully exploited, alignment performance leaves room for improvement.
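To make this limitation concrete, below is a minimal sketch of such a per-modality-encoder-plus-fusion baseline. The class name, dimensions, and fusion scheme are illustrative assumptions, not the architecture of any specific prior model:

```python
import torch
import torch.nn as nn

class BaselineFusion(nn.Module):
    """Illustrative baseline: one encoder per modality, then late fusion.
    All names and dimensions here are hypothetical."""
    def __init__(self, struct_dim: int, image_dim: int, joint_dim: int):
        super().__init__()
        self.struct_proj = nn.Linear(struct_dim, joint_dim)  # structural features
        self.image_proj = nn.Linear(image_dim, joint_dim)    # visual features
        self.fusion = nn.Linear(2 * joint_dim, joint_dim)    # simple concat-and-mix

    def forward(self, struct_feat, image_feat):
        h = torch.cat([self.struct_proj(struct_feat),
                       self.image_proj(image_feat)], dim=-1)
        return self.fusion(h)  # joint entity representation

# Alignment then scores entity pairs by similarity of joint embeddings, e.g.:
# score = torch.cosine_similarity(joint_kg1, joint_kg2, dim=-1)
```

Such late fusion treats each modality's encoder as a black box, which is precisely where richer feature extraction can help.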

To address this issue, this paper proposes a feature-enhanced multi-modal entity alignment method that leverages pre-trained multi-modal models, OCR models, and GATv2 networks to extract richer information from entity structural triples and image descriptions, yielding more effective multi-modal representations. Experiments on cross-lingual and cross-graph multi-modal datasets show that the proposed method outperforms models that rely on traditional feature extraction.
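As a rough illustration of the structural side of such a pipeline, the sketch below encodes the entity graph derived from structural triples with GATv2 layers (via PyTorch Geometric's GATv2Conv). The layer count, dimensions, and class name are assumptions; the paper's pre-trained multi-modal and OCR components are not reproduced here:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATv2Conv  # PyTorch Geometric

class StructuralEncoder(nn.Module):
    """Hypothetical GATv2 encoder over the KG entity graph.
    edge_index holds head->tail pairs taken from structural triples."""
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int, heads: int = 4):
        super().__init__()
        self.conv1 = GATv2Conv(in_dim, hidden_dim, heads=heads)
        self.conv2 = GATv2Conv(hidden_dim * heads, out_dim, heads=1)

    def forward(self, x, edge_index):
        # x: [num_entities, in_dim] initial entity features
        # edge_index: [2, num_edges] adjacency built from structural triples
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # structure-aware entity embeddings
```

The structure-aware embeddings would then be fused with visual features from a pre-trained multi-modal model and text recovered by OCR before the final alignment scoring.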
