Adaptive Hyperparameter Selection for Differentially Private Gradient Descent

Authors: Dominik Fay, Sindri Magnússon, Jens Sjölund, Mikael Johansson

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments show that the schedules consistently perform well across a range of datasets without manual tuning. ... We evaluate the performance of the proposed hyperparameter schedules on synthetic and real-world datasets, on convex and strongly convex loss functions. The primary purpose of our experiments is to verify whether the proposed automatic hyperparameter selection can consistently outperform hyperparameters found via exhaustive search.
Researcher Affiliation | Academia | Dominik Fay (EMAIL), Division of Decision and Control Systems, KTH Royal Institute of Technology; Sindri Magnússon (EMAIL), Department of Computer and Systems Sciences, Stockholm University; Jens Sjölund (EMAIL), Department of Information Technology, Uppsala University; Mikael Johansson (EMAIL), Division of Decision and Control Systems, KTH Royal Institute of Technology
Pseudocode | No | The paper describes the gradient descent steps as mathematical equations, e.g. 'θ_{t+1} = θ_t − η_t (∇F(θ_t) + ζ_t), ζ_t ∼ N(0, σ_t² I) (2)', but does not present a clearly labeled 'Algorithm' block or pseudocode.
Open Source Code | No | The paper does not contain an explicit statement about releasing its own source code, nor does it provide a link to a code repository. It mentions 'open-source code' in the context of related work, but not for the methodology described in this paper.
Open Datasets | Yes | We repeat our experiments on six different datasets. ... The CIFAR-10 dataset (Krizhevsky, 2009, Chapter 3) ... The MNIST dataset (LeCun et al., 1998) ... The Iris dataset (Fisher, 1936) ... The UCI ML Breast Cancer Wisconsin Diagnostic dataset (Dua & Graff, 2017) ... The KDD Cup 99 dataset (Bay et al., 2000)
Dataset Splits | No | The paper specifies the total number of examples (N) for each dataset (e.g., 'N = 10^4 examples', 'N = 60,000 images', 'N = 150 examples'), but it does not provide specific training/test/validation split percentages or sample counts, nor does it cite predefined standard splits for the experimental setup.
Hardware Specification | No | The paper does not contain any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments.
Software Dependencies | No | The paper does not explicitly list any specific software dependencies with their version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x) that were used for the implementation or experiments.
Experiment Setup | Yes | The schedule for strongly convex losses is used when λ > 0, and the schedule for convex losses when λ = 0. We use the same step size η_t = 1/(2M) in all runs. ... We compare the privacy-utility performance of our adaptive, data-independent schedules (cf. Proposition 3) to that of the constant schedule σ_t = σ for a wide range of values of σ ∈ {0.001, 0.01, 0.1, 1.0}. ... The privacy cost is computed in the same way for all gradient perturbation methods: the per-iteration costs are aggregated via zCDP composition, and then converted to (ϵ, 1/N)-differential privacy.
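Although the paper presents no pseudocode, the quoted update θ_{t+1} = θ_t − η_t (∇F(θ_t) + ζ_t) with ζ_t ∼ N(0, σ_t² I) can be sketched directly. The snippet below is an illustrative reconstruction, not the authors' implementation; `grad_F` and all names are assumptions:

```python
import numpy as np

def noisy_gd(grad_F, theta0, etas, sigmas, rng=None):
    """Gradient perturbation as in the paper's Eq. (2):
    theta_{t+1} = theta_t - eta_t * (grad_F(theta_t) + zeta_t),
    where zeta_t ~ N(0, sigma_t^2 I) is isotropic Gaussian noise.
    Illustrative sketch; names are not from the paper's code."""
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float)
    for eta, sigma in zip(etas, sigmas):
        zeta = rng.normal(0.0, sigma, size=theta.shape)  # per-step noise
        theta = theta - eta * (grad_F(theta) + zeta)
    return theta
```

With σ_t = 0 this reduces to plain gradient descent, which is a convenient sanity check.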
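The experiment setup states that per-iteration privacy costs are aggregated via zCDP composition and then converted to (ϵ, 1/N)-differential privacy. A minimal sketch of that accounting, assuming a sensitivity-1 Gaussian mechanism (ρ_t = 1/(2σ_t²)) and the standard zCDP-to-DP conversion ϵ = ρ + 2√(ρ ln(1/δ)) of Bun & Steinke (2016); this is not the authors' code:

```python
import math

def zcdp_epsilon(sigmas, delta, sensitivity=1.0):
    """Aggregate per-iteration zCDP costs and convert to (eps, delta)-DP.
    Gaussian mechanism: rho_t = sensitivity^2 / (2 * sigma_t^2);
    zCDP composes additively; conversion follows Bun & Steinke (2016):
    eps = rho + 2 * sqrt(rho * ln(1/delta)). Illustrative sketch."""
    rho = sum(sensitivity**2 / (2.0 * s**2) for s in sigmas)
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))
```

Larger noise scales σ_t yield a smaller composed ρ and hence a smaller ϵ at fixed δ, matching the privacy-utility trade-off swept over σ ∈ {0.001, 0.01, 0.1, 1.0} in the experiments.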