Several researchers have proposed effective models for multimodal entity alignment. The MMEA model, proposed by Chen et al. of the University of Science and Technology of China in 2020, models each modality of an entity's attributes separately and then fuses the resulting knowledge, achieving joint modeling and alignment of multimodal entities. The MSNEA model, also from the University of Science and Technology of China (2022), likewise extracts visual, relational, and attribute features separately; it integrates the visual features through a modal-enhancement mechanism to guide multimodal feature learning and adaptively assigns attention weights so as to capture the attributes most valuable for alignment. The MCLEA model, proposed by Lin et al. of Southeast University in 2022, first obtains the attribute features of each modality and then uses contrastive learning to jointly model intra-modality and inter-modality interactions, improving the model's representation capability. However, these models do not mine the input data of each modality in depth: they encode each modality directly with a pre-trained encoder, which limits how much information is extracted and leaves room for improvement. For example, the same real-world movie entity may be described by different posters in different knowledge graphs, so encoding them directly with a visual model can yield low similarity even though the posters contain similar text, such as the movie title and slogan. If this text could be extracted and used to supplement the entity's image information, alignment accuracy could be improved further.
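To make the poster example concrete, the following is a minimal sketch of combining visual-embedding similarity with OCR-extracted text similarity. It assumes pytesseract for OCR; `visual_encoder` and `text_encoder` are hypothetical placeholder callables (they are not part of MMEA, MSNEA, or MCLEA), and the weighting scheme is only illustrative.

```python
# Sketch: supplement visual similarity with text found in the posters via OCR.
# `visual_encoder` / `text_encoder` are assumed to return 1-D embedding vectors.
import numpy as np
from PIL import Image
import pytesseract


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def poster_similarity(img_path_a: str, img_path_b: str,
                      visual_encoder, text_encoder,
                      alpha: float = 0.5) -> float:
    """Blend image-embedding similarity with OCR-text similarity."""
    img_a, img_b = Image.open(img_path_a), Image.open(img_path_b)

    # Similarity of the two posters as images (pre-trained visual encoder).
    vis_sim = cosine(visual_encoder(img_a), visual_encoder(img_b))

    # Text printed on the posters (e.g. title, slogan), extracted by OCR.
    txt_a = pytesseract.image_to_string(img_a)
    txt_b = pytesseract.image_to_string(img_b)
    txt_sim = cosine(text_encoder(txt_a), text_encoder(txt_b))

    # alpha is a tunable weight between the visual and textual signals.
    return alpha * vis_sim + (1.0 - alpha) * txt_sim
```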


