Multimodal entity alignment is the task of determining whether two multimodal entities from different knowledge graphs refer to the same real-world object, where a multimodal entity is one that carries information in several modalities, such as structural triples and image descriptions. Most previous multimodal entity alignment models simply encode each modality with its own encoder and then fuse the resulting features into a joint multimodal representation of the entity. However, this approach does not fully exploit the information in each modality of the aligned entities. To address this issue, this paper proposes a feature-enhanced multimodal entity alignment method that uses pre-trained multimodal models, OCR models, and GATv2 networks to strengthen information extraction from entities' structural triples and image descriptions, yielding more effective multimodal representations. The method also takes each entity's modality distribution as input to improve the model's understanding of entity information. Experiments on cross-lingual and cross-graph multimodal datasets show that the proposed method achieves better alignment performance than models that rely on traditional feature extraction methods.

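Please help me translate the following text into English: Multimodal entity alignment refers to determining whether two multimodal entities in different knowledge graphs refer to the same real-world object, where a multimodal entity is an entity that contains information in multiple modalities, such as structural triples and image descriptions. When performing multimodal encoding, most previous multimodal entity alignment models only use each modality's corresponding encoder to encode entity information and then fuse the multimodal features to obtain a joint multimodal representation of the entity. However, this approach does not fully exploit the information in each modality of the aligned entities. To solve this problem, this paper proposes a feature-enhancement-based multimodal…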

Original source: https://www.cveoy.top/t/topic/br9B. Copyright belongs to the author. Do not repost or scrape.
