GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Authors: Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that GaussianSR achieves superior ASSR performance with fewer parameters than existing methods while enjoying interpretable and content-aware feature aggregations. Experiments: Datasets. For our training, we use the DIV2K dataset (Agustsson and Timofte 2017). Sourced from the NTIRE challenge (Timofte et al. 2017), this dataset comprises 1000 diverse 2K-resolution images featuring a wide range of content, including individuals, urban scenes, flora, fauna, and natural landscapes. Within this collection, we allocate 800 images for the training set and 100 images for validation. To evaluate the generalization performance of the model, we report results on the DIV2K validation set of 100 images. Four further benchmark datasets are also utilized, namely General100 (Dong, Loy, and Tang 2016), BSD100 (Martin et al. 2001), Urban100 (Huang, Singh, and Ahuja 2015), and Manga109 (Matsui et al. 2016), which provide a comprehensive landscape for assessing cross-dataset robustness. Quantitative and Qualitative Results. To validate the efficacy of our GaussianSR, we conduct a comparative analysis against several advanced methods, including Meta-SR (Hu et al. 2019), LIIF (Chen, Liu, and Wang 2021), and its variants (ITSRN (Yang et al. 2021), A-LIIF (Li et al. 2022), DIINN (Nguyen and Beksi 2023)). All models are re-trained under a unified framework for fairness. Results in Tables 1 and 2 show that our method achieves competitive performance across five datasets for various scaling factors.
Researcher Affiliation | Collaboration | 1) Tsinghua University; 2) The Chinese University of Hong Kong; 3) Peking University; 4) The Hong Kong Polytechnic University; 5) OPPO Research Institute
Pseudocode | No | The paper describes the model architecture and processes, such as 'Overall Architecture' and 'Selective Gaussian Splatting', in natural language and with diagrams (Figure 2, Figure 3), but it does not include a formal pseudocode block or algorithm steps explicitly labeled as such.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. It mentions 'As the DIINN method has not been open-sourced, we directly use the results reported in their article.', but this refers to a third-party method, not to code for the current paper's method.
Open Datasets | Yes | For our training, we use the DIV2K dataset (Agustsson and Timofte 2017). Sourced from the NTIRE challenge (Timofte et al. 2017), this dataset comprises 1000 diverse 2K-resolution images featuring a wide range of content, including individuals, urban scenes, flora, fauna, and natural landscapes. Four further benchmark datasets are also utilized, namely General100 (Dong, Loy, and Tang 2016), BSD100 (Martin et al. 2001), Urban100 (Huang, Singh, and Ahuja 2015), and Manga109 (Matsui et al. 2016), which provide a comprehensive landscape for assessing cross-dataset robustness.
Dataset Splits | Yes | Within this collection, we allocate 800 images for the training set and 100 images for validation. To evaluate the generalization performance of the model, we report results on the DIV2K validation set of 100 images. Following the settings in LIIF and its variants (Chen, Liu, and Wang 2021; Li et al. 2022; Yang et al. 2021), we randomly crop the LR images into 48 x 48 patches and collect 2304 random pixels from the HR images.
Hardware Specification | Yes | The model is trained in parallel on 4 Tesla V100 GPUs with a mini-batch size of 16. The training process takes about 2000 epochs to converge. In addition, we report the inference time of GaussianSR and LIIF with the same input and output sizes in Table 4, providing a comprehensive evaluation of their efficiency. We report the average runtime over 100 runs on a Tesla V100 GPU.
Software Dependencies | No | The paper mentions 'Following the settings in LIIF and its variants (Chen, Liu, and Wang 2021; Li et al. 2022; Yang et al. 2021)', but it does not specify any software names with version numbers for its own implementation (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | To simulate a continuous magnification process, the downsampling factor is randomly sampled from a uniform distribution, U(1, 4). The loss function is the L1 distance between the reconstructed image and the ground-truth image. Following the settings in LIIF and its variants (Chen, Liu, and Wang 2021; Li et al. 2022; Yang et al. 2021), we randomly crop the LR images into 48 x 48 patches and collect 2304 random pixels from the HR images. The initial learning rate for all modules is set to 1e-4 and is halved every 200 epochs. The model is trained in parallel on 4 Tesla V100 GPUs with a mini-batch size of 16. The training process takes about 2000 epochs to converge.
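The quoted setup (scale drawn from U(1, 4), 48 x 48 LR patches, 2304 random HR pixels, learning rate 1e-4 halved every 200 epochs) can be sketched as follows. This is a minimal NumPy illustration of the sampling protocol as described, not the authors' code: the function names are hypothetical, and the strided subsampling stands in for whatever downsampling kernel (presumably bicubic) the paper actually uses.

```python
import numpy as np

def sample_training_example(hr_image, rng, lr_patch=48, n_pixels=2304):
    """Build one LIIF-style training example: an LR patch plus random
    HR coordinate/color pairs. Sketch only; names are illustrative."""
    # Continuous magnification: scale factor drawn from U(1, 4).
    scale = rng.uniform(1.0, 4.0)
    hr_patch = int(round(lr_patch * scale))
    h, w, _ = hr_image.shape
    # Random HR crop corresponding to one 48 x 48 LR patch.
    top = rng.integers(0, h - hr_patch + 1)
    left = rng.integers(0, w - hr_patch + 1)
    hr_crop = hr_image[top:top + hr_patch, left:left + hr_patch]
    # Stand-in downsampling: strided subsampling to lr_patch x lr_patch
    # (the paper presumably uses bicubic downsampling instead).
    ys = np.linspace(0, hr_patch - 1, lr_patch).round().astype(int)
    xs = np.linspace(0, hr_patch - 1, lr_patch).round().astype(int)
    lr = hr_crop[np.ix_(ys, xs)]
    # Sample 2304 distinct HR pixel locations and their ground-truth colors;
    # the L1 loss would compare predictions at these coordinates against gt.
    idx = rng.choice(hr_patch * hr_patch, size=n_pixels, replace=False)
    coords = np.stack([idx // hr_patch, idx % hr_patch], axis=1)
    gt = hr_crop[coords[:, 0], coords[:, 1]]
    return lr, coords, gt

def lr_at_epoch(epoch, base_lr=1e-4, halve_every=200):
    # Step schedule: halve the learning rate every 200 epochs.
    return base_lr * 0.5 ** (epoch // halve_every)
```

Note that at the minimum scale of 1 the HR patch is exactly 48 x 48 = 2304 pixels, so sampling 2304 pixels without replacement uses every pixel, while larger scales sample a subset; this keeps the per-example supervision cost constant across scales.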