Revisiting Unbiased Implicit Variational Inference

Authors: Tobias Pielok, Bernd Bischl, David Rügamer

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the following, we analyze the performance of our proposed methods AISIVI and BSIVI under different data scenarios. We start by comparing our two methods on well-known toy examples that serve as a first sanity check (Section 5.1). We then compare our methods with the state-of-the-art methods KSIVI and PVI on a 22-dimensional problem in the context of a Bayesian logistic regression model (Section 5.2), which serves as another common benchmark example for SIVI. Finally, we move to a 100-dimensional problem related to a conditioned diffusion process (Section 5.3).
Researcher Affiliation | Academia | 1: Department of Statistics, LMU Munich, Munich, Germany; 2: Munich Center for Machine Learning (MCML), Munich, Germany. Correspondence to: Tobias Pielok <EMAIL>.
Pseudocode | Yes | Algorithm 1 (BSIVI); Algorithm 2 (AISIVI)
Open Source Code | No | The paper does not provide a direct link to a code repository, an explicit statement of code release, or a mention of code in supplementary materials for the described methodology.
Open Datasets | Yes | Next, we perform a Bayesian logistic regression on the WAVEFORM dataset as proposed by Yin & Zhou (2018), for the target variables yi ∈ {0, 1}, i = 1, . . . , N with N = 400 and the feature vectors xi ∈ R^21. Dataset URL: https://archive.ics.uci.edu/ml/machine-learning-databases/waveform
Dataset Splits | No | The paper mentions using the WAVEFORM dataset but does not specify any training, validation, or test splits. For the other experiments (toy examples, conditioned diffusion process), the data is either defined within the paper or generated for ground-truth estimation, without explicit splits for training or evaluating the models themselves.
Hardware Specification | Yes | All experiments are performed on a Linux-based A5000 server with 2 GPUs (24 GB VRAM) and an Intel Xeon Gold 5315Y processor at 3.20 GHz.
Software Dependencies | No | We implemented AISIVI and BSIVI in PyTorch (Paszke et al., 2019). The paper mentions PyTorch but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | Toy examples: For both methods, we use the same NN architecture and train them for 4,000 iterations. For the NF of AISIVI, we use 6 conditional affine coupling layers. Logistic regression: We train AISIVI and BSIVI for 10,000 iterations and use ϵi batch sizes of 9,182 and 91,820, respectively. All methods use a batch size m = 128; the latent dimension is set to 10, i.e., ϵ ∈ R^10. For the NF of AISIVI, we use 16 conditional affine coupling layers. Conditioned diffusion: To ensure a fair comparison, we fixed the outer batch size (number of sampled z) for all SIVI methods and adjusted the inner batch size (number of sampled ϵ) until we achieved approximately the same iterations per second as AISIVI. The ϵi batch sizes for AISIVI, BSIVI, and IWHI are 256, 40,960, and 7,000, respectively. The latent dimension is 100 for all SIVI variants. For the NF of AISIVI, we use 32 conditional affine coupling layers.
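The experiment setup repeatedly refers to "conditional affine coupling layers" for AISIVI's normalizing flow (6, 16, and 32 layers across the three experiments). The paper does not spell out the layer internals, so the following is only a minimal NumPy sketch of what one such layer typically looks like: the input z is split in half, and the second half is scaled and shifted by a small MLP that sees the first half together with a conditioning vector. All architecture choices here (hidden width, tanh-bounded scale, one-hidden-layer MLP) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class ConditionalAffineCoupling:
    """Sketch of one conditional affine coupling layer (hypothetical;
    the paper's exact architecture is not specified).

    Forward:  y1 = z1,  y2 = z2 * exp(s) + t,  with (s, t) = MLP(z1, cond).
    The transform is invertible in closed form and its log-determinant
    is simply sum(s), which is what makes coupling layers attractive
    for normalizing flows.
    """

    def __init__(self, dim, cond_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2                      # size of the untouched half
        in_dim = self.d + cond_dim
        out_dim = 2 * (dim - self.d)           # scale and shift for z2
        # One-hidden-layer MLP with small random weights (illustrative).
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def _scale_shift(self, z1, cond):
        h = np.tanh(np.concatenate([z1, cond], axis=-1) @ self.W1 + self.b1)
        out = h @ self.W2 + self.b2
        s, t = np.split(out, 2, axis=-1)
        return np.tanh(s), t                   # bounded scale for stability

    def forward(self, z, cond):
        z1, z2 = z[..., :self.d], z[..., self.d:]
        s, t = self._scale_shift(z1, cond)
        y2 = z2 * np.exp(s) + t
        log_det = s.sum(axis=-1)               # log|det Jacobian|
        return np.concatenate([z1, y2], axis=-1), log_det

    def inverse(self, y, cond):
        y1, y2 = y[..., :self.d], y[..., self.d:]
        s, t = self._scale_shift(y1, cond)     # y1 == z1, so (s, t) match
        z2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, z2], axis=-1)
```

In practice one would stack several such layers (6, 16, or 32, per the quoted setup), alternating which half is transformed between layers, and the paper's implementation would use PyTorch modules with learned parameters rather than fixed random weights.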