Robust Simulation-Based Inference under Missing Data via Neural Processes
Authors: Yogesh Verma, Ayush Bharti, Vikas Garg
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results on SBI benchmarks show that our approach provides robust inference outcomes compared to standard baselines for varying levels of missing data. Moreover, we demonstrate the merits of our imputation model on two real-world bioactivity datasets (Adrenergic and Kinase assays). Code is available at https://github.com/Aalto-QuML/RISE. |
| Researcher Affiliation | Collaboration | Yogesh Verma and Ayush Bharti (Department of Computer Science, Aalto University); Vikas Garg (YaiYai Ltd and Aalto University) |
| Pseudocode | Yes | Algorithm 1 RISE (training). Require: Simulator p(x \| θ), prior p(θ), iterations n_iter, missingness degree ε. 1: Initialize parameters ϕ, φ of RISE. 2: for k = 1, . . . , n_iter do 3: Sample (x, θ) ∼ p(x \| θ)p(θ) 4: Create mask s w.r.t. ε and MCAR/MAR/MNAR 5: Compute ℓ_RISE using Equation (6) 6: ϕ, φ ← optimize(ℓ_RISE; ϕ, φ) 7: end for |
| Open Source Code | Yes | Code is available at https://github.com/Aalto-QuML/RISE. |
| Open Datasets | Yes | Moreover, we demonstrate the merits of our imputation model on two real-world bioactivity datasets (Adrenergic and Kinase assays). [...] The task is to predict and impute bioactivity data on Adrenergic receptor assays (Whitehead et al., 2019) and Kinase assays (Martin et al., 2017) from the field of drug discovery. |
| Dataset Splits | No | The paper describes how missingness is introduced in the datasets (e.g., "We take ε {10%, 25%, 60%} to test performance from low to high missingness scenarios"). However, it does not explicitly provide details about standard train/test/validation splits for the benchmark datasets or the real-world bioactivity datasets used in the experiments. It mentions simulating data and a simulation budget, which is related to data generation rather than splitting existing datasets for evaluation. |
| Hardware Specification | Yes | Table 7 describes the time (in seconds) per epoch to train different models on a single V100 GPU. |
| Software Dependencies | No | RISE is implemented in PyTorch (Paszke et al., 2019) and utilizes the same training configuration as the competing baselines (see Appendix A.4.4 for details). Our inference model implementations are based on publicly available code from the sbi library https://github.com/mackelab/sbi. While PyTorch and the sbi library are mentioned, specific version numbers for these software dependencies are not provided in the paper. |
| Experiment Setup | Yes | Throughout our experiments, we maintained a consistent batch size of 50 and a fixed learning rate of 5 × 10⁻⁴. We set a simulation budget of n = 1000 for all the SBI experiments, and take 1000 samples from the posterior distributions to compute the MMD, C2ST and NLPP. |
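The training loop in Algorithm 1 and the setup above (batch size 50, learning rate 5 × 10⁻⁴) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the simulator, prior, mask, loss (a stand-in for Equation (6)), and the `ToyRISE` model are all hypothetical placeholders.

```python
import torch

def simulator(theta):
    # Toy simulator p(x | theta): Gaussian centered at theta (placeholder).
    return theta + torch.randn_like(theta)

def sample_prior(batch_size, dim):
    # Toy prior p(theta): standard normal (placeholder).
    return torch.randn(batch_size, dim)

def make_mcar_mask(x, eps):
    # MCAR mask: each entry is observed with probability 1 - eps.
    return (torch.rand_like(x) > eps).float()

def rise_loss(model, x, s, theta):
    # Stand-in for Equation (6): masked reconstruction of x plus a
    # posterior-mean fit on theta (NOT the paper's actual objective).
    x_hat, theta_hat = model(x * s, s)
    recon = ((x_hat - x) ** 2 * s).sum() / s.sum().clamp(min=1.0)
    post = ((theta_hat - theta) ** 2).mean()
    return recon + post

class ToyRISE(torch.nn.Module):
    # Hypothetical stand-in for the imputation (phi) and inference
    # (varphi) networks of RISE.
    def __init__(self, dim):
        super().__init__()
        self.impute = torch.nn.Linear(2 * dim, dim)
        self.infer = torch.nn.Linear(dim, dim)

    def forward(self, x_obs, s):
        x_hat = self.impute(torch.cat([x_obs, s], dim=-1))
        theta_hat = self.infer(x_hat)
        return x_hat, theta_hat

def train_rise(n_iter=1000, batch=50, dim=2, eps=0.25, lr=5e-4):
    # batch=50 and lr=5e-4 follow the Experiment Setup row above.
    model = ToyRISE(dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_iter):
        theta = sample_prior(batch, dim)   # theta ~ p(theta)
        x = simulator(theta)               # x ~ p(x | theta)
        s = make_mcar_mask(x, eps)         # mask s w.r.t. eps (MCAR case)
        loss = rise_loss(model, x, s, theta)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

The MAR/MNAR cases from step 4 of the algorithm would replace `make_mcar_mask` with masks that depend on observed (MAR) or unobserved (MNAR) entries of x.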