Single Image Super-Resolution: Techniques and Advancements

2.1. Single Image Super-Resolution

Conventional SISR algorithms aim to reconstruct HR images as accurately as possible by minimizing pixel-level reconstruction errors such as MSE and MAE. Dong et al. [5] propose a three-layer CNN-based SISR algorithm, referred to as SRCNN. Each layer of SRCNN is closely related to sparse representation, and the network shows substantial performance improvements over conventional algorithms. Kim et al. [13, 14] propose a very deep CNN with input-output skip connections and a recursive architecture, offering stable and rapid convergence. More recently, reconstruction accuracy has been improved even further by adopting deeper networks with residual blocks and sub-pixel convolutions [18].
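
As a minimal illustration of the pixel-level errors mentioned above, the following sketch computes MSE and MAE between a super-resolved image and its ground truth. The function names and the representation of an image as a nested list of intensities in [0, 1] are assumptions made for this example only; they are not taken from any of the cited methods.

```python
def mse(sr, hr):
    """Mean squared error between two equally sized images (lists of rows)."""
    diffs = [(a - b) ** 2
             for row_s, row_h in zip(sr, hr)
             for a, b in zip(row_s, row_h)]
    return sum(diffs) / len(diffs)


def mae(sr, hr):
    """Mean absolute error between two equally sized images (lists of rows)."""
    diffs = [abs(a - b)
             for row_s, row_h in zip(sr, hr)
             for a, b in zip(row_s, row_h)]
    return sum(diffs) / len(diffs)


# Toy 2x2 example (hypothetical values).
hr = [[0.0, 0.5], [1.0, 0.25]]
sr = [[0.0, 0.0], [1.0, 0.75]]
print(mse(sr, hr), mae(sr, hr))
```

Both measures average over pixels independently, which is one way to see why minimizing them alone tends to favor smooth outputs over sharp but slightly misplaced textures.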

To overcome the major drawback of reconstruction-oriented SISR algorithms, which produce blurred and unrealistic textures [17], perceptual loss [12] has been proposed to improve the perceptual quality of the generated images by minimizing feature-level differences extracted from an ImageNet [15] pre-trained network. GANs are known to be effective at generating realistic images [8], and numerous GAN-based SISR algorithms [17, 30] have been proposed. SRGAN [17] is the first GAN-based SISR algorithm and generates more realistic SR images than conventional algorithms. However, it was also found that GAN-based approaches inevitably degrade reconstruction accuracy, because the generated realistic textures do not always correspond to the ground truth textures.
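
For reference, the contrast between a pixel-level loss and a feature-level (perceptual) loss can be written as below. This is a commonly used form rather than a formula quoted from this section, and the notation is assumed for illustration: $I^{SR}$ and $I^{HR}$ denote the super-resolved and ground truth images, $\phi_j$ the activations of the $j$-th layer of a pre-trained network, and $C_j, H_j, W_j$ the shape of that feature map.

```latex
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{HW} \left\| I^{SR} - I^{HR} \right\|_2^2,
\qquad
\mathcal{L}_{\mathrm{perc}} = \frac{1}{C_j H_j W_j} \left\| \phi_j\!\left(I^{SR}\right) - \phi_j\!\left(I^{HR}\right) \right\|_2^2
```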

2.2. Reference-based SR

Earlier works on RefSR derive from patch matching or patch synthesis schemes [2, 41]. Zheng et al. [41] propose a RefSR algorithm based on patch matching and synthesis with a deep network. Down-sampled patches are used for patch matching and for finding correspondences between input and reference images. However, those schemes have critical drawbacks in that they produce blur and grid artifacts and are unable to handle non-rigid image deformations or inter-patch misalignments. Moreover, optimization that includes patch matching is inefficient due to its high computational cost. CrossNet [42] defines RefSR as a task where the reference image shares a similar viewpoint with an LR input image, and proposes an end-to-end neural network combining a warping process and image synthesis based on optical flow [6, 10]. However, ground truth optical flow is costly to obtain, and flow estimated by other pre-trained networks is not accurate. In addition, although warping handles non-rigid deformation to some extent, it is highly vulnerable to large motions. SRNTT [40] points out the lack of robustness in CrossNet [42], arguing that severe performance degradation occurs when an unrelated reference image is paired with an input image. In SRNTT [40], a patch-wise matching scheme is adopted at the multi-scale feature level, which sacrifices computational efficiency to capture long-distance dependencies.
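
The exhaustive patch matching that these methods rely on can be sketched as follows. This is a simplified illustration and not the implementation of any cited method: it matches raw intensity patches rather than learned features, uses normalized inner product (cosine similarity) as the score, and the helper names `extract_patches` and `best_match` are hypothetical.

```python
import math


def extract_patches(image, size):
    """Slide a size x size window with stride 1 and flatten each patch."""
    h, w = len(image), len(image[0])
    patches = []
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            patches.append([image[i + di][j + dj]
                            for di in range(size)
                            for dj in range(size)])
    return patches


def normalize(patch):
    """Scale a flattened patch to unit L2 norm (all-zero patches are left as-is)."""
    norm = math.sqrt(sum(v * v for v in patch))
    return [v / norm for v in patch] if norm > 0 else list(patch)


def best_match(query, ref_patches):
    """Return (index, score) of the reference patch most similar to the query."""
    q = normalize(query)
    best_idx, best_score = -1, -math.inf
    for idx, ref in enumerate(ref_patches):
        r = normalize(ref)
        score = sum(a * b for a, b in zip(q, r))
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


# Toy reference image and one flattened 2x2 query patch (hypothetical values).
reference = [[1, 0, 0],
             [0, 1, 0],
             [0, 0, 1]]
ref_patches = extract_patches(reference, 2)
idx, score = best_match([0, 2, 0, 0], ref_patches)
print(idx, score)
```

Every query patch is compared against every reference patch, so the cost grows with the product of the two patch counts, which is consistent with the computational inefficiency noted above.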

2.3. Self-Similarity and Non-local Block in SR

In a natural image, similar patterns tend to recur within the same image. Various methods have been studied for exploiting this self-similarity in image restoration [7, 33]. Those approaches attempt to utilize the internal information as a reference to reconstruct high-quality images. Huang et al. [9] propose a model allowing geometric transformation, which handles perspective distortions and affine transformations. However, how to exploit these intrinsic properties of images in deep learning-based methods remains unclear.

To deal with this problem, non-local block [29] based approaches [20, 38] have been proposed. The non-local operation computes pixel-wise correlations to capture long-range and global dependencies. The response at each position is computed as a weighted sum of the features at all positions in the input feature maps, with the weights given by the correlations. This approach largely overcomes the locality of previous CNNs and is therefore suitable for various computer vision applications that require large receptive fields. With the help of non-local blocks, the proposed method can be used to search not only for correspondences between input and reference images but also for self-similarity within a single image.
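
A minimal sketch of the non-local operation described above is given below, under simplifying assumptions: the feature map is flattened into a list of per-position feature vectors, the correlation is a plain dot product normalized with a softmax, and the learned embeddings and residual connection of the non-local block [29] are omitted. The function name `non_local` is chosen for this example only.

```python
import math


def non_local(features):
    """Apply a simplified non-local operation.

    features: list of N position vectors, each a list of C floats.
    Returns N output vectors; each is a softmax-weighted sum over all positions.
    """
    outputs = []
    for x_i in features:
        # Pairwise correlation between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Numerically stable softmax over all positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the features at all positions.
        out = [sum(w * x_j[c] for w, x_j in zip(weights, features))
               for c in range(len(x_i))]
        outputs.append(out)
    return outputs


# Three positions with two channels each (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
out = non_local(features)
print(out)
```

Because every position attends to every other position, the output at a given location can draw on similar content anywhere in the feature map, at a cost quadratic in the number of positions.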
