A Novel Pseudo-Siamese Deep CNN for Spatiotemporal Fusion of Remote Sensing Images
Hybrid methods are no longer limited to a single fusion technique; instead, they apply different techniques to handle the diverse change information in the study images [16]. Typical methods include the flexible spatiotemporal data fusion (FSDAF) method [21], improved FSDAF [22], enhanced FSDAF [23], and FSDAF 2.0 [24]. These methods can effectively balance spatial-detail preservation and spectral-change reconstruction, but their high algorithmic complexity limits wider application [25].
Learning-based methods have developed rapidly in recent years and are widely used in remote sensing image processing [25], [26]. Their core idea is to train the model's weight parameters on a large number of known data samples [27], [28], so that the network automatically learns feature information from those samples; the coarse image at the prediction time is then fed into the trained model to generate the corresponding fine image.
Learning-based methods fall into two categories: sparse representation and deep learning. The classical sparse representation method [29] assumes that the fine and coarse images share the same sparse coding; a dictionary pair relating the coarse and fine images at the reference time is learned, and the fine image at the prediction time is reconstructed with the learned dictionaries. Deep learning methods generally use a convolutional neural network (CNN) to establish a nonlinear mapping between the input and output data. Numerous studies have shown that CNNs can be effectively applied to remote sensing data fusion [11], [25], [30].
Deep learning has achieved outstanding results in remote sensing data fusion, for example, CNN-based fusion of panchromatic and multispectral images [31], [32] and CNN-based multispectral fusion [33]. Recently, CNNs have also been applied to the spatiotemporal fusion of remote sensing images. The hybrid deep CNN method (STFDCNN) [29] establishes a mapping between coarse and fine images and reconstructs the final prediction from the network's extracted features through a weighting strategy. The end-to-end network DCSTFN [34] and its improved version EDCSTFN [35] reconstruct the fine image at the prediction time from one pair and two pairs, respectively, of coarse and fine reference images. A two-stream network (StfNet) [36] fuses at the original pixel level by learning the change information between different reference times.
Although the above methods achieve spatiotemporal fusion to a certain extent, they are not without shortcomings. For example, STARFM, ESTARFM, and STFDCNN have difficulty predicting drastic land-cover change, while images produced by FSDAF, StfNet, and EDCSTFN are over-smoothed and lose spatial detail. Existing fusion models cannot effectively balance spatial-detail preservation and spectral-change reconstruction. In this article, a pseudo-Siamese deep CNN (PDCNN) is proposed to address these problems. The innovations of the proposed method are as follows.
A deep pseudo-Siamese network with two structurally identical feature extraction streams is designed and applied to spatiotemporal fusion; it processes the coarse and fine images from the two reference times and better captures the information at both. The two streams are independent end-to-end networks whose weights are not shared. The method can therefore be understood as combining the known information at the two reference times with the prediction time, predicting two candidate images at the prediction time (one per reference time), and obtaining the final prediction through a weighted combination. This design makes full use of the known information, and the end-to-end network structure also reduces the possibility of error accumulation.
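The weighted combination of the two streams' outputs can be sketched as follows. The paper only states that a weighting step merges the two candidate predictions; the specific scheme below, where each weight is inversely proportional to the coarse-image change between that reference time and the prediction time, is an assumption for illustration (a common choice in spatiotemporal fusion), and all function and variable names are hypothetical.

```python
import numpy as np

def fuse_stream_predictions(pred_t1, pred_t2, coarse_t1, coarse_t2,
                            coarse_tp, eps=1e-6):
    """Merge the two streams' fine-image predictions with temporal weights.

    pred_t1 / pred_t2 : candidate fine images predicted by the streams
    anchored at reference times t1 and t2.
    coarse_t1 / coarse_t2 / coarse_tp : coarse images at the two
    reference times and the prediction time tp.

    Assumption: each stream's weight is inversely proportional to the
    mean absolute coarse-image change between its reference time and tp,
    so the reference time whose scene changed less contributes more.
    """
    d1 = np.abs(coarse_tp - coarse_t1).mean() + eps
    d2 = np.abs(coarse_tp - coarse_t2).mean() + eps
    w1 = (1.0 / d1) / (1.0 / d1 + 1.0 / d2)  # normalized inverse-change weight
    w2 = 1.0 - w1
    return w1 * pred_t1 + w2 * pred_t2
```

When one reference time's coarse image is nearly identical to the prediction-time coarse image, its stream dominates the final result; when both references changed equally, the two candidates are averaged.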
The PDCNN adds a flexible perception design to the two independent feature extraction streams. Remote sensing images contain more information than natural images, and the surface objects they depict vary widely in size. The flexible perception design extracts the edge information of the images according to the dominant characteristics of the surface objects, so that the spatial information is effectively retained and the loss of spatial detail during prediction is alleviated.
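The paper does not specify how the flexible perception design is implemented; one common way to adapt a network's receptive field to objects of different sizes is to apply parallel kernels of several sizes and stack the responses. The sketch below illustrates that idea only, with mean (box) kernels standing in for learned filters; the function names and kernel sizes are assumptions, not the authors' design.

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D convolution via explicit loops (illustration only)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def multi_scale_features(img, sizes=(3, 5, 7)):
    """Parallel receptive fields of different sizes (assumed scheme).

    Small kernels respond to fine structures (edges of small objects),
    larger kernels to broader structures; edge padding keeps the
    spatial size so the responses can be stacked channel-wise.
    """
    feats = []
    for k in sizes:
        pad = k // 2
        padded = np.pad(img, pad, mode="edge")
        kernel = np.full((k, k), 1.0 / (k * k))  # box filter as placeholder
        feats.append(conv2d(padded, kernel))
    return np.stack(feats)  # shape: (len(sizes), H, W)
```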
The PDCNN designs two independent feature extraction streams, each of which focuses on a specific reference-to-prediction time pair. Compared with traditional fusion methods, the independent streams can extract different change information from the different reference images, yielding a more accurate prediction of temporal change. The attention mechanism and the residual connections effectively focus on critical change information and preserve it through the deep convolution operations.
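The combination of channel attention and a residual connection can be sketched as below. The excerpt does not state which attention variant PDCNN uses; the squeeze-and-excitation style gate shown here is a widely used form and is an assumption, as are the function names and weight shapes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_residual(feat, w1, w2):
    """Residual block with a squeeze-and-excitation style channel gate.

    feat : (C, H, W) feature map.
    w1   : (C_r, C) reduction weights; w2 : (C, C_r) expansion weights.

    The gate reweights channels by their global importance, and the
    residual addition lets the original features pass through unchanged,
    so critical information survives the deep convolution stack.
    """
    squeeze = feat.mean(axis=(1, 2))                      # global avg pool -> (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))    # channel weights in (0, 1)
    attended = feat * gate[:, None, None]                 # channel-wise reweighting
    return feat + attended                                # residual connection
```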
In the experimental verification, we fuse MODIS and Landsat data and compare the proposed model with recently proposed high-performing fusion algorithms under the same local experimental environment. The subjective and objective evaluation results show that the proposed model can accurately predict spectral changes while retaining spatial details, and the reconstructed images perform well on both objective indices and subjective visual quality.