IT$^3$: Idempotent Test-Time Training
Authors: Nikita Durasov, Assaf Shocher, Doruk Oner, Gal Chechik, Alexei A Efros, Pascal Fua
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across diverse domains (including image classification, aerodynamics prediction, and aerial segmentation) and architectures (MLPs, CNNs, GNNs) show that IT$^3$ consistently outperforms existing approaches while being simpler and more widely applicable. Our results suggest that idempotence provides a universal principle for test-time adaptation that generalizes across domains and architectures. |
| Researcher Affiliation | Collaboration | $^1$CVLAB, EPFL; $^2$NVIDIA; $^3$Neura Vision Lab, Bilkent University; $^4$UC Berkeley. |
| Pseudocode | No | The paper describes the methods and algorithms through text and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | poster / code / video / web |
| Open Datasets | Yes | We conducted similar experiments using the CIFAR10 (Krizhevsky et al., 2014) dataset, selecting CIFAR-C (Hendrycks & Dietterich, 2019) as the out-of-distribution (OOD) data. [...] we use The Boston Housing dataset describes housing prices in the suburbs of Boston, Massachusetts. [...] For this experiment, we used the Deep Layer Aggregation (DLA) (Yu et al., 2018) network [...] we use the UTKFace dataset (Zhang et al., 2017) [...] road segmentation in aerial imagery using the RoadTracer dataset (Bastani et al., 2018). We train a DRU-Net (Wang et al., 2019), on the RoadTracer dataset. [...] We perform OOD experiments using Massachusetts Road dataset (Mnih, 2013) [...] we generated a dataset of 2,000 wing profiles, as depicted in Fig. 10, by sampling the widely used NACA parameters (Jacobs & Sherman, 1937). [...] we experimented with 3D car models from a subset of the ShapeNet dataset (Chang et al., 2015) [...] ImageNet-C (Hendrycks & Dietterich, 2019) consists of ImageNet (Krizhevsky et al., 2012) test images corrupted using the same transformations as CIFAR-10/100C (Sec. 4.2). |
| Dataset Splits | No | The paper refers to using training and test sets and describes how OOD data was generated or selected, but it does not provide specific percentages, sample counts, or detailed methodologies for the train/validation/test splits of the primary datasets for reproducibility. For instance, for tabular data, it states: "We take a test set and gradually apply random feature zeroing with increasing probabilities of 5%, 10%, 15%, and 20% (4 mentioned levels of OOD)." and for age prediction: "We train our model on the UTKFace training set (limited to individuals aged 20-60)." It does not explicitly state how these original training/test sets were formed or their sizes. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models, processor types, or memory amounts. Table 5 in Appendix A mentions "Memory Consumption (OOD ImageNet), GPU Gb" but does not specify the type of GPU. |
| Software Dependencies | No | The paper mentions using "Adam optimizer", "Pytorch training protocol (Paszke et al., 2017)", "XFoil simulator (Drela, 1989)", and "OpenFOAM (Jasak et al., 2007)". However, it does not specify version numbers for PyTorch or the other software packages, which is required for reproducible software dependencies. |
| Experiment Setup | Yes | To predict drag associated to a triangulated 3D car, we utilize similar model to airfoil experiments but with increased capacity. Instead of twenty five GMM layers, we use thirty five and also apply skip-connections with ELU activations. Final model is being trained for 100 epochs with Adam optimizer and $10^{-3}$ learning rate. |
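
The Dataset Splits row quotes a tabular OOD protocol in which test features are randomly zeroed with probabilities of 5%, 10%, 15%, and 20%. The snippet below is a minimal sketch of one way to read that description, not the authors' code. It assumes zeroing is applied independently to each feature value and independently at each level (the paper's "gradually" could also mean cumulative corruption); the function names, the seed, and the toy test set are hypothetical.

```python
import random

# OOD levels quoted in the Dataset Splits row: 5%, 10%, 15%, 20%.
ZEROING_PROBABILITIES = (0.05, 0.10, 0.15, 0.20)


def zero_features(rows, p, rng):
    """Return a copy of `rows` where each feature value is set to 0.0 with probability `p`."""
    return [[0.0 if rng.random() < p else value for value in row] for row in rows]


def make_ood_test_sets(rows, probabilities=ZEROING_PROBABILITIES, seed=0):
    """Build one corrupted copy of the test set per zeroing probability.

    Assumption: each level is generated independently from the clean test set.
    """
    rng = random.Random(seed)
    return {p: zero_features(rows, p, rng) for p in probabilities}


if __name__ == "__main__":
    # Hypothetical toy test set: 1000 rows with 13 non-zero features each.
    test_rows = [[1.0] * 13 for _ in range(1000)]
    for p, corrupted in make_ood_test_sets(test_rows).items():
        zeroed = sum(value == 0.0 for row in corrupted for value in row)
        print(f"p={p:.2f}: zeroed fraction = {zeroed / (1000 * 13):.3f}")
```

Fixing the seed makes the corrupted sets repeatable across runs, which matters here because the paper does not report split sizes or sampling details.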