Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields

Authors: Alexander Becker, Rodrigo Caye Daudt, Dominik Narnhofer, Torben Peters, Nando Metzger, Jan Dirk Wegner, Konrad Schindler

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, Thera outperforms all competing ASR methods, often by a substantial margin, and is more parameter-efficient (see Figure 2). To the best of our knowledge, Thera is also the first neural field method to allow bandwidth control at test time. ... We first evaluate the three variants of our method on the held-out DIV2K validation set, following the setup described above. Table 1 shows PSNR values for all tested methods, for both in-distribution (×2 to ×4) and out-of-distribution (×6 to ×30) scaling factors. ... In Table 3, we ablate individual components and design choices of our method to understand their contributions to overall performance.
Researcher Affiliation Academia Alexander Becker (EMAIL), Photogrammetry and Remote Sensing, ETH Zurich; Rodrigo Caye Daudt (EMAIL), Photogrammetry and Remote Sensing, ETH Zurich; Dominik Narnhofer (EMAIL), Photogrammetry and Remote Sensing, ETH Zurich; Torben Peters (EMAIL), Photogrammetry and Remote Sensing, ETH Zurich; Nando Metzger (EMAIL), Photogrammetry and Remote Sensing, ETH Zurich; Jan Dirk Wegner (EMAIL), Department of Mathematical Modeling and Machine Learning, University of Zurich; Konrad Schindler (EMAIL), Photogrammetry and Remote Sensing, ETH Zurich
Pseudocode No The paper describes the methodology, including equations (1) and (2) for the neural heat field and its thermal activation function, and the overall architecture in Figure 3. However, it does not present these or any other procedures in a structured pseudocode or algorithm block format.
Open Source Code No The project page is at https://therasr.github.io.
Open Datasets Yes our models are trained with the DIV2K (Agustsson & Timofte, 2017) training set, consisting of 800 high-resolution RGB images of diverse scenes. We report evaluation metrics on the official DIV2K validation split as well as on standard benchmark datasets: Set5 (Bevilacqua et al., 2012), Set14 (Zeyde et al., 2012), BSDS100 (Martin et al., 2001), Urban100 (Huang et al., 2015), and Manga109 (Matsui et al., 2017). ...experiments using the COZ dataset (Fu et al., 2024). ...from the fastMRI (Zbontar et al., 2018) medical imaging dataset (single-coil knee validation split, with intensities scaled to [0,1]).
Dataset Splits Yes Similar to prior work (Chen et al., 2021; Lee & Jin, 2022; Cao et al., 2023; Zhu et al., 2025), we randomly sample a scaling factor r ∼ U(1.2, 4) for each image during training, then randomly crop an area of size (48r)² pixels as the target patch, from which the source is generated by bicubic downsampling to size 48². As corresponding targets, 48² random pixels are sampled from the target patch. ...our models are trained with the DIV2K (Agustsson & Timofte, 2017) training set... We report evaluation metrics on the official DIV2K validation split as well as on standard benchmark datasets: Set5 (Bevilacqua et al., 2012), Set14 (Zeyde et al., 2012), BSDS100 (Martin et al., 2001), Urban100 (Huang et al., 2015), and Manga109 (Matsui et al., 2017).
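The quoted sampling procedure (scale r ∼ U(1.2, 4), a (48r)² crop as target, bicubic downsampling to 48² as source, and 48² random target pixels as supervision) can be sketched as below. This is a minimal illustration, not the authors' code; the function name and the pixel-center coordinate convention are assumptions.

```python
import numpy as np
import jax


def sample_training_pair(hr_image, rng, patch=48):
    """Sample one (source, coords, targets) training triple from an HR image."""
    H, W, _ = hr_image.shape
    r = rng.uniform(1.2, 4.0)                      # scale factor r ~ U(1.2, 4)
    size = int(round(patch * r))                   # target patch side length
    top = rng.integers(0, H - size + 1)            # random crop location
    left = rng.integers(0, W - size + 1)
    target_patch = hr_image[top:top + size, left:left + size]
    # Bicubic downsampling of the target patch to the 48x48 source
    source = jax.image.resize(target_patch, (patch, patch, 3), method="bicubic")
    # 48^2 random target pixels, with pixel-center coordinates in [0, 1]
    idx = rng.integers(0, size, size=(patch * patch, 2))
    coords = (idx + 0.5) / size
    targets = target_patch[idx[:, 0], idx[:, 1]]
    return source, coords, targets
```

A usage sketch: `sample_training_pair(img, np.random.default_rng(0))` on a 256×256×3 image returns a 48×48×3 source, 2304×2 coordinates, and 2304×3 target pixels.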
Hardware Specification Yes All tests were performed on an NVIDIA GeForce RTX 3090 Ti GPU.
Software Dependencies No Thera is implemented in JAX (Bradbury et al., 2018). ...We use the SSIM implementation from torchmetrics (Detlefsen et al., 2022).
Experiment Setup Yes Similar to prior work (Chen et al., 2021; Lee & Jin, 2022; Cao et al., 2023; Zhu et al., 2025), we randomly sample a scaling factor r ∼ U(1.2, 4) for each image during training, then randomly crop an area of size (48r)² pixels as the target patch, from which the source is generated by bicubic downsampling to size 48². As corresponding targets, 48² random pixels are sampled from the target patch. We train with standard augmentations (random flipping, rotation, and resizing), using the Adam optimizer (Kingma & Ba, 2015) with a batch size of 16 for 5 × 10^6 iterations, with initial learning rate 10^-4, β1 = 0.9, β2 = 0.999 and ϵ = 10^-8. The learning rate is decayed to zero according to a cosine annealing schedule (Loshchilov & Hutter, 2016). We use MAE as reconstruction loss, to which the TV loss from Eq. 5 is added with a weight of 10^-4. Like previous work (Timofte et al., 2016; Lim et al., 2017; Vasconcelos et al., 2023), we employ geometric self-ensembling (GSE) instead of the local self-ensembling introduced in LIIF (Chen et al., 2021).