Spatiotemporal Fusion Algorithm for High-Resolution Image Prediction

The high-level outline of the proposed spatiotemporal fusion algorithm has been presented in Fig. 2. In this paper, spatiotemporal fusion aims to predict a target fine image F'2 given the corresponding coarse one C'2 at date t'2, as well as two coarse and fine image pairs (F'1 and C'1, F'3 and C'3) at neighboring dates t'1 and t'3. Accordingly, we obtain high spatial resolution images with a dense time series.

To alleviate the spatial information loss problem in coarse images, the proposed StfNet first incorporates temporal information in image time series, i.e., TD and temporal consistency, to model the mapping between fine and coarse difference image pairs. Then, with the learned mapping, we predict the unknown fine difference images F'12 and F'23 from the corresponding coarse ones C'12 and C'23, with the neighboring fine images F'1 and F'3, respectively. Finally, the target image F'2 could be reconstructed.

A. StfNet Architecture

In spatiotemporal fusion, learning a mapping only from the spatial similarity between coarse and fine image pair is a severely ill-posed inverse problem, since the magnification factors are always large. To this end, we incorporate TD and temporal consistency to model the mapping between fine and coarse image pairs and propose a two-stream convolutional neural network architecture, i.e., StfNet.

The high-level idea is represented in the sequence of potential network architectures as shown in Fig. 3. We show the basic LBF model in Fig. 3(a), which attempts to learn a mapping between coarse and fine difference image pairs. In Fig. 3(b), we introduce TD into the basic learning-based model and exploit the neighboring fine images for the difference image prediction. We present the proposed StfNet in Fig. 3(c), in which two kinds of temporal information in image time series, i.e., TD and temporal consistency, are both incorporated in learning and leveraged for the mapping model. In this way, the ill-posed inverse problem, i.e., recovering details from a largely down-sampled coarse image, could be well alleviated, and one can expect to obtain a more accurate estimation of the missing fine images.

Spatiotemporal Fusion Algorithm for High-Resolution Image Prediction