Variational Bayes In Private Settings (VIPS)
Authors: Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets. |
| Researcher Affiliation | Academia | Mijung Park EMAIL Max Planck Institute for Intelligent Systems, Department of Computer Science, University of Tübingen, Max-Planck-Ring 4, 72076 Tübingen, Germany James Foulds EMAIL Department of Information Systems, University of Maryland, Baltimore County, ITE 447, 1000 Hilltop Circle, Baltimore, MD 21250, USA Kamalika Chaudhuri EMAIL Department of Computer Science, University of California, San Diego, EBU3B 4110, University of California, San Diego, CA 92093, USA Max Welling EMAIL Amsterdam Machine Learning Lab (AMLAB), Informatics Institute, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, the Netherlands |
| Pseudocode | Yes | Algorithm 1 (Stochastic) Variational Bayes for CE family distributions, Algorithm 2 Private VIPS for CE family distributions, Algorithm 3 VIPS for LDA, Algorithm 4 (ϵtot, δtot)-DP VIPS for non-CE family with binomial likelihoods, Algorithm 5 VIPS for Bayesian logistic regression, Algorithm 6 VIPS for sigmoid belief networks |
| Open Source Code | Yes | Our code is available at https://github.com/mijungi/vips_code. |
| Open Datasets | Yes | We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets. ... We downloaded a random D = 400, 000 documents from Wikipedia to test our VIPS algorithm. ... We used the Stroke dataset, which was first introduced by Letham, Rudin, Mc Cormick & Madigan (2014) ... We used the Adult data (from the UCI data repository) ... We tested our VIPS algorithm for the SBN model on the binarized17 MNIST digit dataset |
| Dataset Splits | Yes | We randomly shuffled the data to make 5 pairs of training and test sets. For each set, we used 10,069 patient records as training data and the rest as test data. ... We used 80% of the original data for training and the rest for computing the AUC on test data. ... The MNIST digit dataset contains 60,000 training images of ten handwritten digits (0 to 9)... we selected 100 randomly selected test images from 10,000 test datapoints. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) are explicitly mentioned in the paper for running experiments. The paper generally discusses 'stochastic learning' and 'big data setting' but without hardware specifications. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper mentions 'Python implementation' for perplexity approximation but does not specify its version or any other library versions. |
| Experiment Setup | Yes | In our experiments, we use N = 500. ... We set a = 0.1 in our experiments... We used 50 topics and a vocabulary set of approximately 8000 terms. The algorithm was run for one epoch in each experiment. ... in which we varied the noise level σ ∈ {1.0, 1.1, 1.24, 1.5, 2} and the minibatch size S ∈ {5,000, 10,000, 20,000}. ... In each training step, we randomly selected data samples from the training set with a sampling rate of 0.004. We ran each of these algorithms for 100 training steps with 20 different random initializations with varying levels of noise. ... The resulting ϵ values for a fixed value δ = 10⁻³ are ϵtot = {0.8, 0.05, 0.025} for σ = {1, 6, 12}, respectively. ... we considered a one-hidden-layer SBN with 50 or 100 hidden units, i.e. K = {50, 100}. We varied the mini-batch size S ∈ {400, 800, 1600, 3200}. For a fixed σ = 1, ... In all of these cases, we set δtotal = 10⁻⁴. |
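The setup above varies a noise multiplier σ and a minibatch size S, which is the standard recipe for privatizing minibatch statistics with the Gaussian mechanism. The following is a minimal, hedged sketch of that pattern — per-example clipping to bound L2 sensitivity, then Gaussian noise scaled by σ — not the paper's exact VIPS update; the function name, `clip_norm` parameter, and NumPy implementation are illustrative assumptions.

```python
import numpy as np

def noisy_sufficient_stats(X, sigma=1.0, clip_norm=1.0, rng=None):
    """Illustrative Gaussian-mechanism perturbation of minibatch statistics.

    X : (S, d) array, one row of sufficient statistics per example.
    sigma : noise multiplier (cf. the paper's sigma in {1.0, 1.1, ...}).
    clip_norm : per-example L2 clipping bound, so the minibatch sum
        has L2 sensitivity clip_norm under add/remove of one example.
    """
    rng = np.random.default_rng() if rng is None else rng
    S = X.shape[0]
    # Clip each example's statistics to bound the sum's sensitivity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_clipped = X / np.maximum(1.0, norms / clip_norm)
    # Noise the sum with std sigma * clip_norm, then average over the batch.
    noise = rng.normal(0.0, sigma * clip_norm, size=X.shape[1])
    return (X_clipped.sum(axis=0) + noise) / S
```

Converting a chosen σ and sampling rate into the reported (ϵtot, δtot) values requires a privacy accountant (the paper tracks the cumulative loss across training steps), which this sketch deliberately omits.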