Regularizing Energy among Training Samples for Out-of-Distribution Generalization
Authors: Yiting Chen, Qitian Wu, Junchi Yan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on long-tail datasets, subpopulation shift benchmarks, and OOD generalization benchmarks to show the effectiveness of the proposed energy regularization. |
| Researcher Affiliation | Academia | ¹School of Computer Science & School of Artificial Intelligence, Shanghai Jiao Tong University; ²Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and mathematical derivations but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a third-party Python package for influence function calculation: "We implement the calculation of the influence function based on the Python package for calculating the influence function (Lo & Bae, 2022). URL https://github.com/alstonlo/torch-influence.". However, it does not provide an explicit statement or link for the authors' own source code for the methodology described in this paper. |
| Open Datasets | Yes | We evaluate IAER on the imbalanced version of CIFAR10, CIFAR100 (Cui et al., 2019) and ImageNet-LT (Liu et al., 2019) that are artificially created with class imbalance and iNaturalist 2018 (Van Horn et al., 2018), a naturally long-tailed dataset. [...] We conduct our experiments on widely used datasets in SubpopBench, including Colored MNIST (Arjovsky et al., 2019), MetaShift cats vs. dogs (Liang & Zou, 2022), NICO++ (Zhang et al., 2023). [...] Experiments are performed on benchmarks Colored MNIST (Arjovsky et al., 2019), PACS (Li et al., 2017) and VLCS (Fang et al., 2013). |
| Dataset Splits | Yes | CIFAR10 and CIFAR100 both contain 50,000 images in training and 10,000 images in testing with 10 and 100 classes, respectively. We construct the imbalanced version of CIFAR10 and CIFAR100 by reducing the number of images for each class. [...] For a fair comparison, the validation set used to calculate the influence function is sampled from the training set, and the models are not exposed to testing data during training. [...] For ImageNet-LT and iNaturalist, we take images of Many-shot, Medium-shot, and Few-shot classes in the training set as the validation set, respectively. [...] We follow the setting in Gulrajani & Lopez-Paz (2021) to conduct experiments and use the training domain validation set to calculate the influence function. |
| Hardware Specification | Yes | We tested the time cost for calculating the influence function for ResNet-32 on CIFAR10 with an Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz and one GeForce RTX 2080Ti for 5,000 iterations, averaged over ten runs. |
| Software Dependencies | No | The paper mentions using a "Python package for calculating the influence function (Lo & Bae, 2022)", but does not provide specific version numbers for Python or any other libraries/frameworks used. |
| Experiment Setup | Yes | The model is trained for 200 epochs with the SGD optimizer where the learning rate is at 0.1, momentum at 0.9, and weight decay at 2e-4. The learning rate is decayed with factor 0.01 at the 160th and 180th epochs. For IAER, we finetune the model for 5 epochs with batch size at 128 and learning rate at 1e-4. [...] For ImageNet-LT, the classifier is finetuned for 10 epochs with batch size at 512 and learning rate at 0.2. For iNaturalist, the classifier is finetuned for 30 epochs with batch size at 512 and learning rate at 0.2. The γ in Eq. 9 is searched in {0.1, 0.5, 1, 10} for CIFAR-LT and set to be 0.5 for ImageNet-LT and iNaturalist 2018. |
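The step schedule quoted in the Experiment Setup row (base learning rate 0.1, decayed by a factor of 0.01 at epochs 160 and 180) can be sketched as a small helper. This is an illustrative reconstruction, not code from the paper; the function name and defaults are assumptions, and the behavior matches a standard multi-step decay (e.g. PyTorch's `MultiStepLR` with `gamma=0.01`).

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(160, 180), gamma=0.01):
    """Step schedule: multiply base_lr by gamma once per milestone reached.

    Illustrative sketch of the reported setup (lr 0.1, decayed by
    factor 0.01 at epochs 160 and 180); not the authors' code.
    """
    return base_lr * gamma ** sum(epoch >= m for m in milestones)
```

Under this reading, epochs 0-159 train at 0.1, epochs 160-179 at 1e-3, and epochs 180-199 at 1e-5.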