On Consistent Bayesian Inference from Synthetic Data
Authors: Ossi Räisä, Joonas Jälkö, Antti Honkela
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that mixing posterior samples obtained separately from multiple large synthetic data sets that are sampled from a posterior predictive converges to the posterior of the downstream analysis under standard regularity conditions, when the analyst’s model is compatible with the data provider’s model. We also present several examples showing how the theory works in practice, and showing how Bayesian inference can fail when the compatibility assumption is not met, or the synthetic data set is not significantly larger than the original. Keywords: synthetic data, Bayesian inference, Bernstein-von Mises theorem, differential privacy |
| Researcher Affiliation | Academia | Ossi Räisä (EMAIL), Joonas Jälkö (EMAIL), Antti Honkela (EMAIL); Department of Computer Science, University of Helsinki, P.O. Box 68 (Pietari Kalmin katu 5), 00014 University of Helsinki, Finland |
| Pseudocode | No | The paper describes methodologies and algorithms like NAPSU-MQ and NUTS, but it does not present any structured pseudocode or algorithm blocks in the main text. |
| Open Source Code | Yes | Our code is available under an open-source license.1 1. https://github.com/DPBayes/NAPSU-MQ-bayesian-downstream-experiments |
| Open Datasets | Yes | To test our theory on real data, we used the UCI Adult data set (Kohavi and Becker, 1996) setting that was used to test NAPSU-MQ (Räisä et al., 2023). |
| Dataset Splits | No | The paper mentions using a toy data set of n_X = 2000 samples and the UCI Adult dataset with n_X = 46043 datapoints, and states 'We take bootstrap samples of the data to simulate draws from a population.' However, it does not provide specific train/test/validation splits by percentage, count, or a reference to a predefined split. |
| Hardware Specification | No | The authors wish to thank the Finnish Computing Competence Infrastructure (FCCI) for supporting this project with computational and data storage resources. This statement refers to general computational resources but does not provide specific hardware details (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using 'NUTS (Hoffman and Gelman, 2014)', 'DP-GLM from Kulkarni et al. (2021)', 'synthpop (Nowok et al., 2016)', 'DP-SGD (Rajkumar and Agarwal, 2012; Song et al., 2013; Abadi et al., 2016), specifically DP-Adam', and 'Optuna library (Akiba et al., 2019)'. While various software components are named, specific version numbers for these are not provided. |
| Experiment Setup | Yes | For NAPSU-MQ, we use the hyperparameters of Räisä et al. (2023), except we used NUTS (Hoffman and Gelman, 2014) as the posterior sampling algorithm, with 200 warmup samples and 500 kept samples per chain for ϵ ∈ {0.5, 1}, and 1500 kept samples per chain for ϵ = 0.1. The NAPSU-MQ prior is N(0, 10²I), and the summary is the single 3-way marginal query over all three variables. The hyperparameters of DP-GLM are the L2-norm upper bound R for the covariates of the logistic regression, a coefficient norm upper bound s, and the parameters of the posterior sampling algorithm DP-GLM uses. We set R = 2 so that the covariates do not get clipped, and set s = 5 after some preliminary runs. The posterior sampling algorithm is NUTS (Hoffman and Gelman, 2014) with 1000 warmup samples and 1000 kept samples from 4 parallel chains. The prior for the downstream Bayesian logistic regression is N(0, 10), i.i.d. for each coefficient. The privacy parameters are ϵ ∈ {0.25, 0.5, 1} and δ = n_X⁻² ≈ 4.7 × 10⁻¹⁰. DPVI runs DP-SGD (Rajkumar and Agarwal, 2012; Song et al., 2013; Abadi et al., 2016), specifically DP-Adam, under the hood, so it inherits the clip bound, learning rate, number of iterations, and subsampling (without replacement) ratio hyperparameters from DP-SGD. We tuned these with the Optuna library (Akiba et al., 2019), using the bounds [0.1, 50] for the clip bound, [10⁻⁴, 10⁻¹] for the learning rate, [10⁴, 10⁵] for the number of iterations, and [0.001, 1] for the subsampling ratio. |
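The paper's central claim, quoted in the Research Type row, is that mixing posterior samples drawn separately from multiple large posterior-predictive synthetic data sets converges to the original-data posterior when the analyst's model is compatible with the data provider's. A minimal pure-Python sketch of this pipeline, using a conjugate normal-mean model chosen here for illustration (the model, sample sizes, and variable names are our assumptions, not the paper's experiments):

```python
import random
import statistics

random.seed(0)

# Data provider: observes original data x_i ~ N(theta, 1) with prior theta ~ N(0, 100)
n_orig, true_theta = 2000, 1.5
x = [random.gauss(true_theta, 1) for _ in range(n_orig)]

def normal_posterior(data, prior_var=100.0):
    """Conjugate posterior for the mean of N(theta, 1) under a N(0, prior_var) prior."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n)
    post_mean = post_var * sum(data)
    return post_mean, post_var

mu_n, var_n = normal_posterior(x)  # original-data posterior N(mu_n, var_n)

# Provider releases m synthetic data sets from the posterior predictive,
# each substantially larger than the original (as the theory requires).
m, n_syn = 20, 20000
mixed_samples = []
for _ in range(m):
    theta_star = random.gauss(mu_n, var_n ** 0.5)           # draw theta* ~ posterior
    syn = [random.gauss(theta_star, 1) for _ in range(n_syn)]  # synthetic data | theta*
    # Analyst (compatible model) runs the same Bayesian analysis on each synthetic set
    mu_s, var_s = normal_posterior(syn)
    mixed_samples += [random.gauss(mu_s, var_s ** 0.5) for _ in range(500)]

# Mixing the per-data-set samples approximates the original-data posterior
print(statistics.mean(mixed_samples), mu_n)
```

Under the compatibility assumption the mixed samples concentrate around the original-data posterior mean; shrinking `n_syn` toward `n_orig` inflates the mixture's spread, illustrating the paper's caveat that the synthetic data set must be significantly larger than the original.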