Feature-Enhanced Multimodal Entity Alignment with Pre-trained Models and Modal Distribution

Abstract:

Multimodal entity alignment aims to determine whether two multimodal entities from different knowledge graphs refer to the same real-world object. A multimodal entity carries information in several modalities, such as structural triplets and image descriptions. Previous methods primarily rely on modality-specific encoders to encode entity information and then fuse the resulting multimodal features; however, this approach underutilizes the rich information contained within the entities to be aligned. To address this limitation, this paper proposes a feature-enhanced multimodal entity alignment method. The method employs GATv2 networks to strengthen feature extraction from entity structural triplets, and pre-trained multimodal models together with OCR models to strengthen feature extraction from image descriptions, yielding more effective multimodal representations. In addition, the method incorporates each entity's modality distribution to improve the model's understanding of entity information. Experiments on cross-lingual and cross-graph multimodal datasets demonstrate that the proposed method outperforms models that use traditional feature extraction.
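To make the architecture described above concrete, the following is a minimal sketch (not the authors' code) of the feature-enhancement idea: a two-layer GATv2 encoder for the structural (triplet) modality plus a learned per-modality weighting for fusion. The class names `StructEncoder` and `MultimodalFusion`, the use of CLIP-style features as the visual input, and the softmax fusion scheme are all illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATv2Conv


class StructEncoder(nn.Module):
    """Two-layer GATv2 over the KG's entity graph (edges built from triplets)."""

    def __init__(self, in_dim: int, hid_dim: int, out_dim: int, heads: int = 4):
        super().__init__()
        self.gat1 = GATv2Conv(in_dim, hid_dim, heads=heads)
        self.gat2 = GATv2Conv(hid_dim * heads, out_dim, heads=1)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.gat1(x, edge_index))
        return self.gat2(h, edge_index)


class MultimodalFusion(nn.Module):
    """Weighted fusion of structural and visual entity embeddings.

    The visual embedding is assumed to come from a frozen pre-trained
    multimodal model (e.g., CLIP), optionally augmented with OCR-derived
    text features extracted from the entity images.
    """

    def __init__(self, n_modalities: int = 2):
        super().__init__()
        # One learnable logit per modality; a softmax over them gives the
        # fusion weights, loosely mirroring the "modality distribution" idea.
        self.modal_logits = nn.Parameter(torch.zeros(n_modalities))

    def forward(self, struct_emb: torch.Tensor, visual_emb: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.modal_logits, dim=0)
        return w[0] * struct_emb + w[1] * visual_emb


# Toy usage: 5 entities with random features and a small edge list.
x = torch.randn(5, 32)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
struct = StructEncoder(32, 64, 128)(x, edge_index)
visual = torch.randn(5, 128)  # stand-in for CLIP/OCR-derived features
fused = MultimodalFusion()(struct, visual)
print(fused.shape)  # torch.Size([5, 128])
```

Aligned entity pairs would then be scored by similarity (e.g., cosine distance) between their fused embeddings; that scoring step is omitted here.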

