Reference-based Super-Resolution (Ref-SR) [40, 39, 34, 26] has attracted substantial attention in recent years. In contrast to Single-Image Super-Resolution (SISR) [6, 13, 14, 17, 25, 4], where the only input is a single low-resolution (LR) image, Ref-SR super-resolves the LR image with the guidance of an additional high-resolution (HR) reference image. Textures from the HR reference image are transferred to provide finer details for the LR image.

The key step in texture transfer for Ref-SR is to find correspondences between the input image and the reference image. Existing methods [39, 34, 32] perform correspondence matching implicitly: correspondences are computed from content and appearance similarities and then embedded into the main framework. However, accurately computing correspondences under real-world variations is difficult due to two major challenges: 1) the transformation gap between input images and reference images; 2) the resolution gap between input images and reference images. In Ref-SR, the same objects or similar texture patterns often appear in both the input and reference images, but their appearances vary because of scale and rotation transformations. In this case, correspondences computed purely from appearance are inaccurate, leading to unsatisfactory texture transfer. As for the resolution gap, because an LR input image contains far less information than an HR reference image, the latter is often downsampled (to an LR image) to match the former in resolution. This downsampling inevitably causes information loss, hampering the search for accurate correspondences, especially in fine-texture regions.

To address the aforementioned challenges, we propose C²-Matching for robust reference-based super-resolution, in which Cross-transformation and Cross-resolution matching are explicitly performed. To handle the transformation gap, we propose a contrastive correspondence network that learns transformation-robust correspondences between input images and reference images. Specifically, we employ an additional triplet margin loss to minimize the distance between point-wise features before and after transformations while maximizing the distance to irrelevant features. The extracted feature descriptors are thus more robust to scale and rotation transformations and yield more accurate correspondences.
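The triplet objective above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of Euclidean distance, and the toy feature shapes are assumptions for exposition. The anchor is a point-wise feature from the original image, the positive is the feature of the same point after a scale/rotation transformation, and the negative is an irrelevant feature.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on (N, C) feature arrays.

    Pulls the anchor toward the positive (same point, transformed view)
    and pushes it away from the negative (irrelevant point) by at least
    `margin` in Euclidean distance.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # distance to matching feature
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # distance to irrelevant feature
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

When the positive is already much closer than the negative by more than the margin, the loss is zero; otherwise the gradient pushes the descriptors apart, which is what makes them transformation-robust.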

As for the resolution gap, inspired by knowledge distillation, we propose a teacher-student correlation distillation. We first train a teacher contrastive correspondence network for HR-HR matching. Since the teacher network takes two HR images as input, it is better at matching regions with complicated textures, and its knowledge can be distilled to guide the more ambiguous LR-HR matching of the student. The teacher-student correlation distillation enables the contrastive correspondence network to compute correspondences more accurately in texture-rich regions.
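The distillation idea can be sketched as follows. This is a simplified sketch under stated assumptions, not the paper's exact formulation: the helper names are hypothetical, features are flattened to (N, C) descriptor arrays, and an L2 penalty between correlation maps stands in for whatever distillation loss the full method uses.

```python
import numpy as np

def correlation(feat_a, feat_b):
    """Cosine-similarity correlation map between two (N, C) descriptor sets."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    return a @ b.T  # (N, M) similarity of every position pair

def distill_loss(student_corr, teacher_corr):
    """Pull the student's LR-HR correlation map toward the teacher's HR-HR map."""
    return np.mean((student_corr - teacher_corr) ** 2)
```

The teacher's HR-HR correlation map is sharper in fine-texture regions, so penalizing the discrepancy transfers that sharpness to the student's LR-HR matching.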

After obtaining correspondences, we fuse the information from reference images through a dynamic aggregation module to transfer the HR textures. With C²-Matching, we achieve an improvement of over 1 dB on the standard CUFED5 dataset. As shown in Fig. 1, compared to SRNTT [39], our C²-Matching finds more accurate correspondences (marked as red dotted lines) and thus achieves superior restoration performance.
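The correspondence-guided transfer step can be illustrated in skeleton form. This is only a conceptual sketch, not the dynamic aggregation module itself: the function name, the flat (N, C) feature layout, and the confidence-gated additive fusion are illustrative assumptions.

```python
import numpy as np

def transfer_textures(lr_feat, ref_feat, matches, weights):
    """Warp reference features to LR positions and fuse them.

    lr_feat:  (N, C) features of the LR input positions
    ref_feat: (M, C) features of the HR reference positions
    matches:  (N,) index of the best-matching reference position per LR position
    weights:  (N,) matching confidence gating how much texture is transferred
    """
    warped = ref_feat[matches]                 # gather matched reference features
    return lr_feat + weights[:, None] * warped  # confidence-weighted fusion
```

Gating by matching confidence means poorly matched regions fall back to the LR features instead of importing wrong textures.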

To facilitate the evaluation of Ref-SR in a more realistic setting, we contribute a new dataset named Webly-Referenced SR (WR-SR). In real-world applications, given an LR image, users may find similar HR reference images through web search engines. Motivated by this, for every input image in WR-SR, we search for its reference image through Google Image. The collected WR-SR dataset can serve as a benchmark for real-world scenarios.

To summarize, our main contributions are: 1) To mitigate the transformation gap, we propose the contrastive correspondence network to compute correspondences that are more robust to scale and rotation transformations. 2) To bridge the resolution gap, a teacher-student correlation distillation is employed to further boost the performance of the student LR-HR matching model under the guidance of HR-HR matching, especially in fine-texture regions. 3) We contribute a new benchmark dataset named Webly-Referenced SR (WR-SR) to encourage more practical applications in real scenarios.

C²-Matching for Robust Reference-based Super-Resolution
