Multi-modal Knowledge Graphs: Integrating Visual Data to Empower Knowledge Representation and Applications

A knowledge graph is a large-scale knowledge base that stores knowledge in structured form. It represents knowledge about entities as triplets such as ..., which makes the attributes of entities and the relationships between them clear and intuitive. Large public knowledge graphs such as DBpedia and YAGO have proven to be important support for tasks such as information retrieval and relation mining. With the emergence of multi-modal data such as images and videos, researchers have recognized that such data can be richer in information and more intuitive than text. The strong performance of large-scale multi-modal pre-trained models such as UNITER and ImageBERT also suggests that training on multi-modal data improves a model's representational capacity. As a result, many multi-modal knowledge graphs such as MMKG and RichPedia have emerged. A multi-modal knowledge graph integrates visual data such as images and videos into a traditional knowledge graph, treating them either as entities or as descriptive attributes of entities, which makes the graph more complete and richer. Such graphs can be applied to multi-modal downstream tasks such as visual question answering and image-text generation, which broadens the range of tasks a knowledge graph can serve.
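
The two ways of attaching visual data described above, as a descriptive attribute or as an entity in its own right, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Triple` type, the relation names (`hasImage`, `depicts`, `mediaType`), the `img:` identifier prefix, and the file path are hypothetical and are not taken from MMKG, RichPedia, or any other specific graph.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """A (head, relation, tail) triplet; hypothetical helper type."""
    head: str
    relation: str
    tail: str


# Text-only triplets, as in a traditional knowledge graph.
text_triples = [
    Triple("Eiffel_Tower", "locatedIn", "Paris"),
    Triple("Eiffel_Tower", "instanceOf", "Tower"),
]

# Option A: the image is a descriptive attribute of an existing entity.
# The tail is a literal value (a hypothetical file path), not a graph node.
image_as_attribute = Triple(
    "Eiffel_Tower", "hasImage", "images/eiffel_tower_01.jpg"
)

# Option B: the image is an entity of its own, so it can carry
# its own relations to other entities and its own attributes.
image_as_entity = [
    Triple("img:eiffel_tower_01", "depicts", "Eiffel_Tower"),
    Triple("img:eiffel_tower_01", "mediaType", "image/jpeg"),
]


def outgoing(triples: List[Triple], entity: str) -> List[Tuple[str, str]]:
    """Return (relation, tail) pairs for triplets whose head is `entity`."""
    return [(t.relation, t.tail) for t in triples if t.head == entity]


graph = text_triples + [image_as_attribute] + image_as_entity

for relation, tail in outgoing(graph, "Eiffel_Tower"):
    print(f"Eiffel_Tower --{relation}--> {tail}")
```

In this sketch, the attribute form keeps the graph small but leaves the image as an opaque value, while the entity form lets an image participate in further triplets, for example linking one image to several depicted entities.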