Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo
Authors: Hyunsu Kim, Giung Nam, Chulhee Yun, Hongseok Yang, Juho Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on image classification tasks, including OOD robustness, diversity, loss surface analyses, and a comparative study with Hamiltonian Monte Carlo, demonstrate the superiority of the proposed approach. In this section, we present empirical results demonstrating the effectiveness of the parameter expansion strategy proposed in Section 3 for image classification tasks. |
| Researcher Affiliation | Academia | KAIST, South Korea |
| Pseudocode | Yes | C ALGORITHMS In this section, we outline the practical implementation of the SGMCMC algorithms used in our experiments: Stochastic Gradient Langevin Dynamics (SGLD; Welling & Teh, 2011), Stochastic Gradient Hamiltonian Monte Carlo (SGHMC; Chen et al., 2014), Stochastic Gradient Nosé-Hoover Thermostat (SGNHT; Ding et al., 2014), and preconditioned SGLD (pSGLD; Li et al., 2016). Additionally, we experimented with Stochastic Gradient Riemann Hamiltonian Monte Carlo (SGRHMC; Ma et al., 2015) using diagonal empirical Fisher and RMSProp estimates for the preconditioner. However, within the hyperparameter range explored, it demonstrated significantly lower performance than SGLD, leading us to exclude it from further experiments. Algorithms 1–4 summarize our practical implementations of SGLD, pSGLD, SGHMC, and SGNHT, while Appendix B provides a detailed hyperparameter setup for each method used in our experiments. |
| Open Source Code | Yes | The code is available at https://github.com/cs-giung/px-sgmcmc. |
| Open Datasets | Yes | We present results on CIFAR-10 (Krizhevsky et al., 2009) and extensively study advanced tasks such as robustness analysis and OOD detection. Specifically, we test on natural distribution shifts using CIFAR-like test datasets, including CIFAR-10.1 (Recht et al., 2019), CIFAR-10.2 (Lu et al., 2020), and STL (Coates et al., 2011), as well as on image corruptions using CIFAR-10-C (Hendrycks & Dietterich, 2019). Dataset licenses and sources: CIFAR-10 (unknown license), https://www.cs.toronto.edu/~kriz/cifar.html; CIFAR-10.1 (MIT license), https://github.com/modestyachts/CIFAR-10.1; CIFAR-10.2 (unknown license), https://github.com/modestyachts/cifar-10.2; STL (unknown license), https://cs.stanford.edu/~acoates/stl10/; SVHN (unknown license), https://github.com/facebookresearch/odin; LSUN (unknown license), https://github.com/facebookresearch/odin; CIFAR-100 (unknown license), https://www.cs.toronto.edu/~kriz/cifar.html; Tiny ImageNet (unknown license), https://www.kaggle.com/c/tiny-imagenet |
| Dataset Splits | Yes | We utilized 40,960 training examples and 9,040 validation examples based on the HMC settings from Izmailov et al. (2021), with the final evaluation conducted on 10,000 test examples. For CIFAR-100, we used 40,960 training examples and 9,040 validation examples, consistent with CIFAR-10. For Tiny ImageNet, we employed 81,920 training examples and 18,080 validation examples. To obtain the ROC curve and associated metrics (i.e., AUROC and TNR at TPR of 95% and 99%, as shown in Table 2), we used 1,000 in-distribution (ID) examples as positives and 1,000 out-of-distribution (OOD) examples as negatives. |
| Hardware Specification | Yes | All experiments were conducted on machines equipped with an RTX 2080, RTX 3090, or RTX A6000. The code is available at https://github.com/cs-giung/px-sgmcmc. We have compiled system logs comparing PX-SGHMC with SGHMC in Table 7. Notably, the logs show that PX-SGHMC exhibits no significant difference in either sampling speed or memory consumption compared to SGHMC in practice. Space (in practice) represents the actual GPU memory allocated in our experimental setup using a single RTX A6000. |
| Software Dependencies | No | We built our experimental code using JAX (Bradbury et al., 2018), which is licensed under Apache-2.0. The paper mentions JAX but does not specify a version number for JAX itself, nor does it list other software components with their specific versions. |
| Experiment Setup | Yes | Starting from the He normal initialization (He et al., 2015), SGMCMC methods were allocated 5,000 steps per sampling cycle (approximately 31 epochs) to generate a total of 100 samples. Table 11 provides detailed hyperparameters. Table 11: Hyperparameters for CIFAR. It summarizes the hyperparameters for each method used in our main evaluation results on the CIFAR experiments (i.e., Tables 1 and 2). If a hyperparameter was manually set without tuning, it is indicated with a dash in the Search Space column. |
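For context on the pseudocode row above: the simplest of the listed samplers, SGLD (Welling & Teh, 2011), updates parameters by a half-step of stochastic gradient ascent on the log-posterior plus Gaussian noise scaled to the step size. The sketch below illustrates this update in JAX (the framework the authors report using) on a toy 1-D standard-normal target; it is our own minimal illustration, not the paper's implementation, and the names `sgld_step` and `step_size` are ours.

```python
import jax
import jax.numpy as jnp

def sgld_step(key, params, grad_log_post, step_size):
    """One SGLD update (Welling & Teh, 2011): half-step gradient ascent
    on the log-posterior plus N(0, step_size) Gaussian noise.
    In practice grad_log_post would be a minibatch estimate."""
    noise = jax.random.normal(key, jnp.shape(params))
    return params + 0.5 * step_size * grad_log_post + jnp.sqrt(step_size) * noise

# Toy usage: sample from a 1-D standard normal, whose log-density
# gradient at theta is simply -theta.
key = jax.random.PRNGKey(0)
theta = jnp.zeros(())
samples = []
for _ in range(5000):
    key, sub = jax.random.split(key)
    theta = sgld_step(sub, theta, -theta, 1e-2)
    samples.append(theta)
```

With a small fixed step size the chain's stationary distribution is close to the target, so after a burn-in the empirical mean and variance of `samples` should approach 0 and 1 respectively.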