To further improve the model's use of each entity modality, this paper proposes an entity alignment model based on multimodal feature enhancement. Its core idea is to exploit the visual, textual, and relational information of entities as fully as possible, strengthening the corresponding modality embeddings and thereby improving the model's knowledge representation ability.

First, the model enhances the image modality: on top of encoding each entity's descriptive images with a pre-trained image encoder, it uses a pre-trained multimodal model to filter out images with low semantic relevance to the entity, extracts any text contained in the entity images with an OCR model, and encodes that text with a pre-trained language model as a supplementary representation of the image embedding. Second, the model adopts a GATv2 network in place of a conventional graph convolutional network or graph attention network to extract the entity's neighborhood structure features. Finally, the distribution of an entity's modality information, i.e. whether the entity has data in a given modality and how much of it, serves as an auxiliary attribute that strengthens the model's ability to understand and model the individual modalities. In addition, the intra-modal contrastive loss and multimodal alignment loss from the MCLEA model are introduced to better align entities across different knowledge graphs.
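The image-modality pipeline described above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the encoder functions are stand-in stubs for real pre-trained models (e.g. a ViT-style image encoder, a CLIP-style multimodal model, an OCR system plus a BERT-style text encoder), and the threshold `tau`, fusion weight `alpha`, and all function names are hypothetical.

```python
import numpy as np

DIM = 8  # toy embedding dimension; real encoders use 512 or 768


def l2_normalize(v):
    return v / (np.linalg.norm(v) + 1e-9)


# --- Stub encoders (hypothetical stand-ins for pre-trained models) ---
def image_encoder(image_id, rng):
    """Stand-in for a pre-trained image encoder."""
    return l2_normalize(rng.normal(size=DIM))


def text_encoder(text, rng):
    """Stand-in for a pre-trained language-model encoder."""
    return l2_normalize(rng.normal(size=DIM))


def clip_relevance(img_emb, name_emb):
    """CLIP-style cosine similarity between an image and the entity name."""
    return float(img_emb @ name_emb)


def visual_feature(entity_name, image_id, ocr_text, rng, tau=0.0, alpha=0.7):
    """Clean a low-relevance image, then fuse image and OCR-text embeddings.

    Steps mirror the text: (1) encode the image; (2) drop it if its
    similarity to the entity name falls below a threshold; (3) encode any
    OCR-extracted text; (4) fuse both as the visual modality feature.
    """
    name_emb = text_encoder(entity_name, rng)
    img_emb = image_encoder(image_id, rng)
    if clip_relevance(img_emb, name_emb) < tau:  # low semantic relevance
        img_emb = np.zeros(DIM)                  # discard the noisy image
    ocr_emb = text_encoder(ocr_text, rng) if ocr_text else np.zeros(DIM)
    return l2_normalize(alpha * img_emb + (1 - alpha) * ocr_emb)
```

With real encoders, `clip_relevance` would be the image–text similarity of a pre-trained multimodal model, and the OCR branch would supply text actually read from the entity image rather than a random stub vector.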
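The intra-modal contrastive loss borrowed from MCLEA can be illustrated with an InfoNCE-style formulation, where the embeddings of a pre-aligned entity pair from the two knowledge graphs are positives and all other cross-graph entities in the batch are negatives. This is a generic sketch of that loss family, not MCLEA's exact objective; the temperature value and function name are assumptions.

```python
import numpy as np


def intra_modal_contrastive_loss(emb1, emb2, tau=0.1):
    """InfoNCE-style contrastive loss for one modality.

    Row i of `emb1` (graph 1) and row i of `emb2` (graph 2) form a
    positive pair; every other row of `emb2` acts as a negative.
    """
    emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    logits = emb1 @ emb2.T / tau                  # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # -log p(positive)
```

The loss is small when aligned entities have near-identical modality embeddings and grows as positives become indistinguishable from negatives, which is what drives the cross-graph alignment during training.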