Organize these references into paper format: 1 Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9:1735–1780, 1997. 3 182 Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khuda
Title: A Survey of Recent Advances in Referring Expression Comprehension
Abstract: Referring expression comprehension (REC) is a task at the intersection of computer vision and natural language processing that aims to localize the object in a visual scene described by a natural language expression. Recent years have seen significant advances in REC, driven by deep learning techniques and large-scale datasets. In this paper, we provide a comprehensive survey of recent research in REC, focusing on approaches that use deep learning models. We organize the literature into several categories based on the key ideas and methods used, including recurrent neural networks, attention mechanisms, graph-based models, and pre-training techniques. We also discuss open challenges and directions for future research in this area.
Introduction: Referring expression comprehension (REC) involves understanding natural language descriptions of objects in visual scenes. Given an image and a natural language expression that refers to an object in the image, the goal of REC is to locate the referred object. REC has numerous applications, including image captioning, visual question answering, and robotics. Progress in recent years has been driven by deep learning techniques and large-scale datasets, and this survey therefore focuses on approaches that use deep learning models.
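As a rough illustration of the task definition above, the following sketch treats REC as choosing one box from a set of candidate regions and then checking how well it overlaps the annotated box. The corner-style box format, the candidate list, the matching scores, and the 0.5 overlap threshold are all assumptions made for this example (the threshold follows a commonly used localization convention); none of them is taken from a specific surveyed method.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def comprehend(candidates: Sequence[Box], scores: Sequence[float]) -> Box:
    """Return the candidate box with the highest expression-matching score."""
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


# Toy data: three candidate regions and made-up matching scores
# for a single referring expression.
candidates = [(0, 0, 10, 10), (20, 20, 40, 40), (5, 5, 15, 15)]
scores = [0.1, 0.7, 0.2]
ground_truth = (22, 22, 40, 40)

predicted = comprehend(candidates, scores)
is_correct = iou(predicted, ground_truth) >= 0.5  # assumed threshold
```

In a real system the scores would come from a learned model that compares the expression with each region; here they are fixed numbers so that only the selection and overlap check are shown.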
Related Work: There has been a significant amount of research on REC in recent years. In the early days of REC, researchers mainly used hand-crafted features and rule-based methods. With the development of deep learning techniques, researchers have shifted towards neural network models. Foundational sequence models that these approaches build on include the long short-term memory (LSTM) model proposed by Hochreiter and Schmidhuber (1997) and the recurrent neural network (RNN) based language model proposed by Mikolov et al. (2010). More recent works have explored the use of attention mechanisms (e.g., Bahdanau et al., 2015), graph-based models (e.g., Yang et al., 2019), and pre-training techniques (e.g., Chen et al., 2020).
Approaches: In this section, we organize the literature on REC into several categories based on the key ideas and methods used. We start with the early works that used hand-crafted features and rule-based methods, and then move on to the more recent deep learning-based approaches.
- Hand-crafted features and rule-based methods: Bolme et al. (2010) proposed a visual object tracking method that uses adaptive correlation filters. Henriques et al. (2014) improved upon this method by using kernelized correlation filters. Both works address visual tracking rather than REC directly.
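- Recurrent neural networks: Mikolov et al. (2010) proposed an RNN-based language model; it was not designed for REC, but it is an early example of the recurrent sequence modeling that later REC systems rely on. Mao et al. (2016) proposed a method for generating and comprehending unambiguous object descriptions using an RNN.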
- Attention mechanisms: Bahdanau et al. (2015) proposed a method for incorporating attention into neural machine translation models. Yu et al. (2018) proposed a modular attention network for REC.
- Graph-based models: Yang et al. (2019) proposed a dynamic graph attention method for REC. Su et al. (2020) proposed a pre-training method that uses a graph neural network to model relationships between objects and words.
- Pre-training techniques: Chen et al. (2020) proposed a real-time REC method that uses a single-stage grounding network pre-trained on a large-scale dataset. Zhu et al. (2020) proposed a method for incorporating BERT into neural machine translation models.
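To make the attention idea in the list above more concrete, here is a minimal sketch of additive attention in the style introduced by Bahdanau et al. (2015), transferred to a REC-like setting in which an expression vector attends over region feature vectors. This is not the modular attention network of Yu et al. (2018) or any other surveyed REC model; the function names, the tiny identity weight matrices, and the toy vectors are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W: Matrix, x: Vector) -> List[float]:
    """Matrix-vector product using plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_attention(
    query: Vector, keys: Sequence[Vector], W_q: Matrix, W_k: Matrix, v: Vector
) -> Tuple[List[float], List[float]]:
    """score_i = v . tanh(W_q query + W_k key_i); weights = softmax(scores);
    context = sum_i weights_i * key_i."""
    q_proj = matvec(W_q, query)
    scores = []
    for key in keys:
        k_proj = matvec(W_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


# Toy inputs (assumed): one expression vector and three region feature vectors.
expression = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection matrices
v = [1.0, 1.0]                       # stand-in for the learned scoring vector

weights, context = additive_attention(expression, regions, identity, identity, v)
```

The weights form a probability distribution over regions, and the context vector is the corresponding weighted average of region features. In a trained model the projection matrices and scoring vector would be learned rather than fixed.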
Challenges and Future Directions: Despite the recent progress in REC, there are still many challenges that need to be addressed. One major challenge is the lack of large-scale datasets that cover diverse visual scenes and natural language expressions. Another challenge is the difficulty of modeling complex relationships between objects and words in visual scenes. To address these challenges, future research could focus on developing more sophisticated deep learning models that can handle complex visual and linguistic structures, as well as collecting larger and more diverse datasets for training and evaluation.
Conclusion: In this paper, we provided a comprehensive survey of recent research in REC, focusing on approaches that use deep learning models. We organized the literature into several categories based on the key ideas and methods used, and discussed important challenges and directions for future research in this area. We hope that this survey will provide a useful reference for researchers working on REC and related tasks.