However, the models above do not deeply explore the entity information within each modality. They simply encode each modality with its own pre-trained encoder, which limits their ability to mine deep features of the data. For example, the same real-world movie entity may be depicted by different posters in different knowledge graphs, and encoding these images directly with a visual model may yield low similarity. The posters, however, are likely to contain similar text, such as the movie title and slogan. If this text can be extracted to supplement the entity's image information, alignment accuracy can be further improved.
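To make the poster example concrete, the sketch below compares a visual-only similarity score with one supplemented by text extracted from the posters. The passage does not specify an OCR tool, a text-matching method, or a fusion rule. The toy embedding vectors and OCR strings are therefore hypothetical stand-ins for the outputs of a pre-trained visual encoder and an OCR engine. Word-level Jaccard overlap and the linear weight `alpha` are illustrative choices made here, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def token_jaccard(s, t):
    """Jaccard overlap of the lowercase word sets of two OCR strings."""
    a, b = set(s.lower().split()), set(t.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def fused_similarity(img_u, img_v, ocr_u, ocr_v, alpha=0.5):
    """Weighted sum of visual and OCR-text similarity (alpha is a hypothetical weight)."""
    return alpha * cosine(img_u, img_v) + (1 - alpha) * token_jaccard(ocr_u, ocr_v)


# Hypothetical embeddings of two different posters for the same movie:
# the images look different, so their vectors point in different directions.
poster_a_emb = [1.0, 0.0, 0.2]
poster_b_emb = [0.1, 1.0, 0.0]

# Hypothetical OCR output: both posters carry the same title and slogan.
poster_a_text = "THE LONG ROAD a journey home"
poster_b_text = "The Long Road a journey home in cinemas now"

visual_sim = cosine(poster_a_emb, poster_b_emb)
text_sim = token_jaccard(poster_a_text, poster_b_text)
score = fused_similarity(poster_a_emb, poster_b_emb, poster_a_text, poster_b_text)
print(f"visual={visual_sim:.3f} text={text_sim:.3f} fused={score:.3f}")
```

Here the visual similarity alone is low, while the shared title and slogan give a high text overlap, so the fused score ranks the two posters as a much closer match than the image embeddings do.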
