QT-DoG: Quantization-Aware Training for Domain Generalization

Authors: Saqib Javed, Hieu Le, Mathieu Salzmann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on diverse datasets from DomainBed and WILDS (Koh et al., 2021) Benchmark. All implementation, datasets, metric details and various ablation studies are provided in the Appendix.
Researcher Affiliation | Academia | 1CVLab, EPFL, Switzerland. 2Swiss Data Science Center, Switzerland. Correspondence to: Mathieu Salzmann <EMAIL>.
Pseudocode | No | The paper includes mathematical equations (e.g., Eq. 1, 3, 5, 6) describing the quantization process and flatness calculation, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code is released at: https://saqibjaved1.github.io/QT_DoG/.
Open Datasets | Yes | We evaluate our approach on diverse datasets from DomainBed and WILDS (Koh et al., 2021) Benchmark... PACS (Li et al., 2017) is a 7 object classification challenge encompassing four domains... VLCS (Fang et al., 2013) poses a 5 object classification problem... OfficeHome (Venkateswara et al., 2017) comprises a total of 15,588 samples... Terra Incognita (Beery et al., 2018) addresses a 10 object classification challenge... DomainNet (Peng et al., 2019) provides a 345 object classification problem...
Dataset Splits | Yes | During this training phase, 20% of the samples are used for validation and model selection. We validate the model every 300 steps using held-out data from the source domains, and assess the final performance on the excluded domain (target). ... As (Cha et al., 2021), we split the in-domain datasets into training (60%), validation (20%), and test (20%) sets.
Hardware Specification | Yes | Every experiment in our work was executed on a single NVIDIA A100...
Software Dependencies | Yes | Every experiment in our work was executed on a single NVIDIA A100, Python 3.8.16, PyTorch 1.10.0, Torchvision 0.11.0, and CUDA 12.1.
Experiment Setup | Yes | We use the same training procedure as DomainBed (Gulrajani & Lopez-Paz, 2021), incorporating additional components from quantization. Specifically, we adopt the default hyperparameters from DomainBed (Gulrajani & Lopez-Paz, 2021), including a batch size of 32 (per-domain). We employ a ResNet-50 (He et al., 2016) pre-trained on ImageNet (Russakovsky et al., 2015) as initial model and use a learning rate of 5e-5 along with the Adam optimizer, and no weight decay. Following SWAD (Cha et al., 2021), the models are trained for 15,000 steps on DomainNet and 5,000 steps on the other datasets.
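The in-domain split protocol quoted in the Dataset Splits row (60% train / 20% validation / 20% test, following Cha et al., 2021) can be sketched as below. This is a minimal illustration only: the function name, seeding, and rounding behavior are assumptions, not the authors' released code.

```python
import random


def split_in_domain(samples, seed=0):
    """Shuffle a dataset and split it 60% train / 20% val / 20% test,
    mirroring the in-domain protocol quoted above.

    The name, seed argument, and int() rounding are illustrative
    assumptions; any remainder after rounding falls into the test set.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)

    n = len(samples)
    n_train = int(0.6 * n)
    n_val = int(0.2 * n)

    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test
```

Under this protocol the validation split drives model selection (the row above also quotes validating every 300 steps on held-out source-domain data), while the held-out target domain is never seen during training.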