Variational Bayesian Pseudo-Coreset
Authors: Hyungi Lee, Seungyoo Lee, Juho Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present empirical results that demonstrate the effectiveness of posterior approximation using VBPC across various datasets and scenarios. We compare VBPC with four BPC algorithms that use SGMCMC to perform Bayesian Model Averaging (BMA) with posterior samples: BPC-rKL (Kim et al., 2022), BPC-fKL (Kim et al., 2022), FBPC (Kim et al., 2023), and BPC-CD (Tiwary et al., 2024). ... we assess the performance of the resulting predictive distributions using negative log-likelihood (NLL). |
| Researcher Affiliation | Academia | Hyungi Lee KAIST EMAIL Seungyoo Lee KAIST EMAIL Juho Lee KAIST EMAIL |
| Pseudocode | Yes | B ALGORITHM FOR TRAINING AND INFERENCE In this section, we present algorithms for training and inference. In Algorithm 1, the overall training procedures are presented; note that we utilize the model pool M to prevent overfitting. We also use the Gaussian likelihood to update the weights contained in the model pool. Additionally, in Algorithm 2, we present computationally and memory-efficient variational inference and BMA methods. Here, we store Φ and (I_n̂ + (γ/ρβ_S)ΦΦᵀ)⁻¹ instead of directly computing V. |
| Open Source Code | No | Our VBPC code implementation is built on the official FRePo (Zhou et al., 2022) codebase. The implementation utilizes the following libraries, all available under the Apache-2.0 license: JAX (Bradbury et al., 2018), Flax (Babuschkin et al., 2020), Optax (Babuschkin et al., 2020), TensorFlow Datasets (Abadi et al., 2015), and Augmax. While the paper states that the implementation is built on an official codebase and lists the libraries used, it does not provide a link or an unambiguous statement about the release of *their specific VBPC code*. |
| Open Datasets | Yes | For the BMA comparison experiments, we utilize 5 different datasets: 1) MNIST (LeCun et al., 1998), 2) Fashion-MNIST (Xiao et al., 2017), 3) CIFAR10 (Krizhevsky, 2009), 4) CIFAR100 (Krizhevsky, 2009), and 5) Tiny-ImageNet (Le & Yang, 2015). For the distribution shift and OOD scenarios, we use CIFAR10-C (Hendrycks & Dietterich, 2019). |
| Dataset Splits | Yes | MNIST: The MNIST dataset contains 10 classes of handwritten digits with 60,000 training images and 10,000 test images... Fashion-MNIST: The Fashion-MNIST dataset consists of... with 60,000 training images and 10,000 test images... CIFAR-10/100: The CIFAR-10/100 dataset contains... with 50,000 training images and 10,000 test images... Tiny-ImageNet: The Tiny-ImageNet dataset contains... with 100,000 training images and 10,000 test images. CIFAR10-C: The CIFAR10-C dataset consists of 50,000 test images for each corruption type: various corruptions are applied to the 10,000 CIFAR10 test images at five severity levels, each severity containing 10,000 images. |
| Hardware Specification | Yes | All experiments, except those on the Tiny-ImageNet (Le & Yang, 2015) dataset, were performed on NVIDIA RTX 3090 GPU machines, while Tiny-ImageNet experiments were conducted on NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The implementation utilizes the following libraries, all available under the Apache-2.0 license: JAX (Bradbury et al., 2018), Flax (Babuschkin et al., 2020), Optax (Babuschkin et al., 2020), TensorFlow Datasets (Abadi et al., 2015), and Augmax. The paper lists the software libraries used for the implementation but does not provide specific version numbers for them (e.g., JAX X.Y.Z). |
| Experiment Setup | Yes | Following previous works (Kim et al., 2022; 2023; Tiwary et al., 2024), we select 1, 10, or 50 images per class for all datasets when training VBPC for evaluation. For β_S, we use n̂, which corresponds to the number of pseudo-coresets in each experiment. ...For β_D, we used 1e-8... For ρ and γ, we set the default values to ρ = 1.0 and γ = 100.0... P, to 10... T, to 100. For the model pool optimizer, we used the Adam (Kingma, 2014) optimizer with a fixed learning rate of 0.0003... For the pseudo-coreset optimizer, we also used the Adam optimizer by default, with a cosine learning rate schedule starting at 0.003... Lastly, we used a batch size of 1024 and trained for 0.5 million steps to ensure sufficient convergence. |
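The evaluation metric quoted above (NLL of a BMA predictive) is straightforward to make concrete. The sketch below is not the paper's code: it simply averages per-sample class probabilities over posterior draws and scores the true labels, which is the standard way BMA predictives are evaluated with NLL; the array shapes and the `1e-12` floor are our own assumptions.

```python
import numpy as np

def bma_nll(prob_samples, labels):
    """Mean negative log-likelihood of a Bayesian-model-averaged predictive.

    prob_samples: (S, N, C) class probabilities from S posterior samples.
    labels:       (N,) integer class labels.
    """
    # Bayesian Model Averaging: average predictive probabilities over samples.
    p_bar = prob_samples.mean(axis=0)  # (N, C)
    # NLL of the true class under the averaged predictive (small floor for log).
    return -np.mean(np.log(p_bar[np.arange(len(labels)), labels] + 1e-12))

# Toy check: every sample puts probability 0.5 on the true class (class 0).
probs = np.full((2, 4, 3), 0.25)
probs[:, :, 0] = 0.5
labels = np.zeros(4, dtype=int)
print(round(bma_nll(probs, labels), 6))  # 0.693147, i.e. -log(0.5)
```

Averaging probabilities (rather than logits) before taking the log is what distinguishes the BMA predictive from scoring each posterior sample separately.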
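The memory trick quoted from Appendix B (store Φ and the small n̂ × n̂ inverse instead of the full covariance V) is an instance of the Woodbury identity: when the pseudo-coreset size n̂ is much smaller than the feature dimension d, a d × d solve can be rewritten around an n̂ × n̂ inverse. The sketch below illustrates that general identity only; the scalar `sigma` and the sizes are hypothetical stand-ins, not the paper's exact V or its γ/ρβ_S coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hat, d, sigma = 8, 200, 0.5            # illustrative sizes, n_hat << d
Phi = rng.standard_normal((n_hat, d))    # pseudo-coreset feature matrix
x = rng.standard_normal(d)

# Store Phi and one small n_hat x n_hat inverse, never the d x d matrix.
K_inv = np.linalg.inv(sigma * np.eye(n_hat) + Phi @ Phi.T)

def solve(v):
    """Apply (sigma I_d + Phi^T Phi)^{-1} to v via the Woodbury identity:
    (sigma I + Phi^T Phi)^{-1} = (I - Phi^T K^{-1} Phi) / sigma."""
    return (v - Phi.T @ (K_inv @ (Phi @ v))) / sigma

# Check against the explicit d x d solve.
direct = np.linalg.solve(sigma * np.eye(d) + Phi.T @ Phi, x)
print(np.allclose(solve(x), direct))  # True
```

The payoff is that storage drops from O(d²) to O(n̂d + n̂²) and each application costs two thin matrix-vector products, which is what makes the variational inference and BMA steps memory-efficient.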
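The setup quotes a cosine learning-rate schedule starting at 0.003 over 0.5 million steps but does not give the exact schedule formula (the paper's stack would use Optax for this). A common decay-to-zero variant, written in plain numpy as an assumed reconstruction, looks like:

```python
import numpy as np

def cosine_lr(step, init_lr=0.003, total_steps=500_000):
    """Cosine decay from init_lr to 0 over total_steps.
    Decay-to-zero is an assumption; the paper only states a cosine
    schedule starting at 0.003 for the pseudo-coreset optimizer."""
    return 0.5 * init_lr * (1.0 + np.cos(np.pi * step / total_steps))

print(cosine_lr(0))                   # 0.003 at the start
print(round(cosine_lr(250_000), 6))   # 0.0015 at the midpoint
```

Variants with a warmup phase or a nonzero floor are equally plausible readings of "cosine learning rate schedule"; only the 0.003 peak and the 500k-step horizon are stated in the paper.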