Adaptive Resolution Residual Networks — Generalizing Across Resolutions Easily and Efficiently
Authors: Léa Demeule, Mahtab Sandhu, Glen Berseth
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a set of experiments showing (subsection 4.1) that our method yields stronger robustness at lower resolutions compared to mainstream methods; (subsection 4.2) that our method enables significant computational savings through adaptation; (subsection 4.3) that our method is capable of generalizing across layer types in a way that far surpasses prior adaptive-resolution architectures with variable sampling density; (subsection 4.4) that our theoretical guarantee for adaptation using perfect smoothing kernels holds empirically; (subsection 4.5) that our theoretical interpretation of the dual regularizing effect of Laplacian dropout also holds empirically. |
| Researcher Affiliation | Academia | Léa Demeule EMAIL Mila Quebec AI Institute, Université de Montréal |
| Pseudocode | No | The paper includes mathematical equations and diagrams of architecture (Figures 3 and 4), but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | We perform a set of classification tasks that require models to effectively leverage the information of low-resolution to medium-resolution images; CIFAR10 (32×32) (Krizhevsky et al., 2009), CIFAR100 (32×32) (Krizhevsky et al., 2009), Tiny ImageNet (64×64) (Le & Yang, 2015) and STL10 (96×96) (Coates et al., 2011). |
| Dataset Splits | No | The paper states that models are trained at full dataset resolution and evaluated at various resolutions, and describes data augmentation techniques, but does not provide specific percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | No | The paper mentions using "CUDA event timers and CUDA synchronization barriers" which implies the use of NVIDIA GPUs, but it does not specify any particular GPU model (e.g., RTX 3090, A100) or other hardware details like CPU, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions optimizers like AdamW and SGD, and data augmentation methods like TrivialAugmentWide, with citations to relevant papers. However, it does not specify version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used in the implementation. |
| Experiment Setup | Yes | For CIFAR10 and CIFAR100, across all methods, we use AdamW (Loshchilov & Hutter, 2019) with a learning rate of 10⁻³ and (β1, β2) = (0.9, 0.999), cosine annealing (Loshchilov & Hutter, 2022) to a minimum learning rate of 10⁻⁵ in 100 epochs, weight decay of 10⁻³, and a batch size of 128. We use a basic data augmentation consisting of normalization, random horizontal flipping with p = 0.5, and randomized cropping that applies zero-padding by 4 along each edge to raise the resolution, then crops back to the original resolution. For Tiny ImageNet and STL10, across all methods, we use SGD with a learning rate of 10⁻², cosine annealing (Loshchilov & Hutter, 2022) to a minimum learning rate of 0 in 100 epochs, weight decay of 10⁻³, and a batch size of 128. We use TrivialAugmentWide (Müller & Hutter, 2021) to augment training. |
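As a concrete illustration of the learning-rate schedule quoted above, the cosine annealing used for CIFAR10/CIFAR100 (10⁻³ annealed to 10⁻⁵ over 100 epochs) can be sketched in plain Python. This is not the authors' code; the function name and defaults are illustrative, with values taken from the setup described in the paper:

```python
import math

def cosine_annealed_lr(epoch, lr_max=1e-3, lr_min=1e-5, total_epochs=100):
    """Learning rate at a given epoch under cosine annealing
    (Loshchilov & Hutter-style schedule, no warm restarts).

    Starts at lr_max, decays along a half cosine, and reaches
    lr_min at the final epoch.
    """
    progress = min(epoch, total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Usage: query the schedule at the start, midpoint, and end of training.
for epoch in (0, 50, 100):
    print(epoch, cosine_annealed_lr(epoch))
```

For the Tiny ImageNet/STL10 setup, the same function applies with `lr_max=1e-2` and `lr_min=0`. In PyTorch this schedule corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=100` and `eta_min` set to the minimum learning rate.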