With the rapid development of internet technology, multimodal data has become increasingly abundant on social media. Such data combines text, images, and video, and carries information that is valuable for sentiment analysis, personalized recommendation, and public opinion monitoring, so multimodal learning has received unprecedented attention in both industry and academia. This paper studies the problem of image-text multimodal classification on social media and, starting from the two perspectives of tensor feature fusion and neural network feature fusion, proposes two novel multimodal classification models.
First, to address the inefficiency of vector-concatenation-based multimodal classification models and their neglect of single-modal information, this paper proposes a multimodal classification model based on compact bilinear pooling and multiple losses. The model uses compact bilinear pooling for efficient multimodal feature fusion and introduces multiple loss functions during training to strengthen its handling of single-modal information. Compared with existing multimodal classification models, the proposed model achieves better performance with a smaller parameter count.
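The abstract does not give implementation details of the fusion step. As a minimal, illustrative sketch (not the thesis's actual code), compact bilinear pooling is commonly approximated by projecting each modality's feature vector with a Count Sketch and multiplying the sketches in the frequency domain; the dimensions and random seeds below are assumptions:

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project x into d dimensions via Count Sketch,
    using random index map h and random sign map s."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)  # scatter-add signed entries into buckets
    return y

def compact_bilinear_pooling(x, y, d=64, seed=0):
    """Approximate the bilinear (outer-product) feature of x and y
    in d dimensions: the circular convolution of two Count Sketches
    equals the Count Sketch of the outer product, computed via FFT."""
    rng = np.random.default_rng(seed)
    hx = rng.integers(0, d, size=x.shape[0])
    sx = rng.choice([-1.0, 1.0], size=x.shape[0])
    hy = rng.integers(0, d, size=y.shape[0])
    sy = rng.choice([-1.0, 1.0], size=y.shape[0])
    fx = np.fft.fft(count_sketch(x, hx, sx, d))
    fy = np.fft.fft(count_sketch(y, hy, sy, d))
    # element-wise product in frequency domain = circular convolution
    return np.real(np.fft.ifft(fx * fy))
```

The appeal over plain vector concatenation is that the fused vector captures multiplicative interactions between every pair of text and image feature dimensions, yet its size d is fixed rather than quadratic in the input dimensions.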
Second, to address the under-utilization of multi-level encoding information in pre-trained models and the high complexity of existing multimodal fusion modules, this paper proposes a multimodal classification model based on gated multimodal units and multi-level encoding. The model jointly models the multimodal and multi-level structure of the data: it extracts single-modal features with advanced pre-trained models while retaining their multi-level encoding information, then builds a multimodal feature fusion module and a multi-level feature fusion module on top of gated multimodal units to fuse features across modalities and across levels, respectively, yielding high-quality and more comprehensive multimodal feature representations.
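As a minimal sketch of the gating mechanism underlying this design (again not the thesis's actual code), a gated multimodal unit transforms each modality with its own projection and learns a sigmoid gate that decides, per dimension, how much each modality contributes; the weight shapes below are illustrative assumptions:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gmu(x_text, x_img, Wt, Wi, Wz):
    """Gated multimodal unit: fuse a text vector and an image vector.
    Wt, Wi project each modality into a shared hidden space;
    Wz produces a per-dimension gate from both inputs."""
    ht = np.tanh(Wt @ x_text)                      # text representation
    hi = np.tanh(Wi @ x_img)                       # image representation
    z = sigmoid(Wz @ np.concatenate([x_text, x_img]))  # gate in (0, 1)
    return z * ht + (1.0 - z) * hi                 # convex combination
```

Because the output is a convex combination of two tanh-bounded vectors, the fused representation stays in [-1, 1] per dimension, and the same unit can be reused to fuse representations from different encoder levels as well as from different modalities.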
On the two public datasets Twitter-15 and Twitter-17, comparison experiments against existing state-of-the-art multimodal classification models and ablation experiments on the core modules were conducted. The results show that both proposed models achieve significant improvements on all performance metrics, demonstrating their effectiveness on multimodal classification tasks. Comparison experiments against multiple multimodal classification models on the private dataset AIFUN further confirm the effectiveness and practicality of the proposed models.
Keywords: multimodal learning; multimodal fusion; bilinear pooling; gated multimodal unit