Robust Simulation-Based Inference under Missing Data via Neural Processes

Authors: Yogesh Verma, Ayush Bharti, Vikas Garg

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical results on SBI benchmarks show that our approach provides robust inference outcomes compared to standard baselines for varying levels of missing data. Moreover, we demonstrate the merits of our imputation model on two real-world bioactivity datasets (Adrenergic and Kinase assays). Code is available at https://github.com/Aalto-QuML/RISE.
Researcher Affiliation | Collaboration | Yogesh Verma, Ayush Bharti (Department of Computer Science, Aalto University); Vikas Garg (YaiYai Ltd and Aalto University)
Pseudocode | Yes |
Algorithm 1 RISE (training)
Require: Simulator p(x | θ), prior p(θ), iterations n_iter, missingness degree ε
1: Initialize parameters ϕ, φ of RISE
2: for k = 1, . . . , n_iter do
3: Sample (x, θ) ∼ p(x | θ)p(θ)
4: Create mask s w.r.t. ε and MCAR/MAR/MNAR
5: Compute ℓ_RISE using Equation (6)
6: ϕ, φ ← optimize(ℓ_RISE; ϕ, φ)
7: end for
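The loop in Algorithm 1 can be sketched in plain Python. Everything below is illustrative: the toy Gaussian simulator, the prior, and the squared-error stand-in for ℓ_RISE are assumptions, not the paper's model; only the loop structure (sample, mask, score, update) follows the pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta):
    # Toy stand-in for p(x | theta): Gaussian noise around theta.
    return theta + rng.normal(size=theta.shape)

def sample_prior(dim=5):
    # Toy stand-in for p(theta).
    return rng.normal(size=dim)

def mcar_mask(x, eps):
    # MCAR: each entry is observed independently with prob 1 - eps.
    return rng.random(x.shape) >= eps

def train_rise_sketch(n_iter=100, eps=0.25, dim=5):
    losses = []
    for _ in range(n_iter):
        theta = sample_prior(dim)          # step 3: sample (x, theta)
        x = simulator(theta)
        s = mcar_mask(x, eps)              # step 4: mask w.r.t. eps (MCAR)
        # Step 5 placeholder: the real l_RISE (Eq. 6) couples the
        # imputation and inference networks; a squared error on the
        # observed entries keeps this sketch self-contained.
        loss = float(np.mean((x[s] - theta[s]) ** 2)) if s.any() else 0.0
        losses.append(loss)                # step 6 would update phi, varphi
    return losses

losses = train_rise_sketch()
print(len(losses))  # 100
```

In the actual method, steps 5–6 backpropagate through neural-process parameters ϕ, φ rather than computing a closed-form error.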
Open Source Code | Yes | Code is available at https://github.com/Aalto-QuML/RISE.
Open Datasets | Yes | Moreover, we demonstrate the merits of our imputation model on two real-world bioactivity datasets (Adrenergic and Kinase assays). [...] The task is to predict and impute bioactivity data on Adrenergic receptor assays (Whitehead et al., 2019) and Kinase assays (Martin et al., 2017) from the field of drug discovery.
Dataset Splits | No | The paper describes how missingness is introduced in the datasets (e.g., "We take ε ∈ {10%, 25%, 60%} to test performance from low to high missingness scenarios"). However, it does not explicitly provide train/validation/test splits for the benchmark datasets or the real-world bioactivity datasets used in the experiments. It mentions simulating data under a simulation budget, which concerns data generation rather than splitting existing datasets for evaluation.
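The quoted missingness levels correspond to a simple MCAR masking procedure, which can be sketched as follows (the dataset and mask mechanism here are illustrative stand-ins, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))  # stand-in dataset: 200 rows, 8 features

for eps in (0.10, 0.25, 0.60):
    # MCAR: each entry is dropped independently with probability eps.
    drop = rng.random(X.shape) < eps
    X_miss = np.where(drop, np.nan, X)
    frac = np.isnan(X_miss).mean()
    print(f"eps={eps:.2f}: empirical missing fraction = {frac:.3f}")
```

MAR and MNAR variants would instead condition the drop probability on observed (MAR) or on the missing values themselves (MNAR).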
Hardware Specification | Yes | Table 7 reports the time (in seconds) per epoch to train different models on a single V100 GPU.
Software Dependencies | No | RISE is implemented in PyTorch (Paszke et al., 2019) and uses the same training configuration as the competing baselines (see Appendix A.4.4 for details). Our inference model implementations are based on publicly available code from the sbi library: https://github.com/mackelab/sbi. While PyTorch and the sbi library are mentioned, specific version numbers for these software dependencies are not provided in the paper.
Experiment Setup | Yes | Throughout our experiments, we maintained a consistent batch size of 50 and a fixed learning rate of 5 × 10⁻⁴. We set a simulation budget of n = 1000 for all the SBI experiments, and take 1000 samples from the posterior distributions to compute the MMD, C2ST and NLPP.
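The reported hyperparameters can be collected into a single configuration sketch (the key names are hypothetical, not taken from the RISE repository):

```python
# Illustrative config mirroring the reported experimental setup.
config = {
    "batch_size": 50,
    "learning_rate": 5e-4,        # fixed across experiments
    "simulation_budget": 1000,    # n simulated (theta, x) pairs per SBI task
    "posterior_samples": 1000,    # samples used to compute MMD, C2ST, NLPP
}
print(config)
```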