Improved Differentially Private Riemannian Optimization: Fast Sampling and Variance Reduction
Authors: Saiteja Utpala, Andi Han, Pratik Jawanpuria, Bamdev Mishra
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main contributions to improving the differentially private Riemannian optimization framework are summarized below. 1. Sampling. We propose generic fast sampling methods on the tangent space for various matrix manifolds of interest, making differentially private Riemannian optimization more practically appealing for real-world applications. The proposed sampling strategy is based on linear isometry between tangent spaces. We show that it is computationally efficient and orders of magnitude faster than the sampling schemes presented in (Han et al., 2022a), namely MCMC and explicit basis construction. 2. DP-RSVRG. We propose a differentially private Riemannian stochastic variance reduced gradient method (DP-RSVRG), enriching the suite of differentially private stochastic Riemannian optimization methods. Our first contribution makes differentially private Riemannian optimization algorithms scalable, since sampling is now faster; our second contribution, DP-RSVRG, combined with the faster sampling, scales to large datasets. In the experiments section, we show the empirical benefit of both of these contributions. Section 6 discusses the empirical results. |
| Researcher Affiliation | Collaboration | Saiteja Utpala (Independent), Andi Han (University of Sydney), Pratik Jawanpuria (Microsoft India), Bamdev Mishra (Microsoft India) |
| Pseudocode | Yes | Algorithm 1: Sampling from tangent Gaussian: a general algorithm Algorithm 2: Sampling from tangent Gaussian using isometric transportation Algorithm 3: Sampling on SPD with Affine-Invariant metric Algorithm 4: Sampling on SPD with Bures-Wasserstein metric Algorithm 5: Sampling on SPD with Log-Euclidean metric Algorithm 6: Sampling on Poincaré ball Algorithm 7: Sampling on Lorentz hyperboloid Algorithm 8: Sampling on sphere Algorithm 9: Sampling on Stiefel manifold Algorithm 10: Sampling on Grassmann manifold Algorithm 11: DP-RSVRG Algorithm 12: DP-RSVRG with restarts |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its own source code for the methodology described, nor does it include a link to a code repository. It mentions using existing libraries like geomstats and autodp but not providing their own implementation code. |
| Open Datasets | Yes | Private Fréchet mean on SPD manifold. We consider the problem of privately estimating the Fréchet mean of SPD matrices under the Affine-Invariant metric. We select images from the PATHMNIST medical imaging dataset (Yang et al., 2021) and pass them through the covariance descriptor pipeline, representing each image as an SPD matrix of size 11 × 11. Private principal eigenvector computation on sphere. ...We take images from two classes of MNIST and generate 784-dimensional vectors to form two sets of 6903 and 7877 images. Private Fréchet mean of the Poincaré word embeddings. We generate the hierarchy tree of the transitive closure of the mammal subtree of the WordNet dataset (Miller, 1995), a lexical database, and compute the private Fréchet mean of the Poincaré word embeddings (Nickel & Kiela, 2017). |
| Dataset Splits | No | The paper mentions using 'sets' of images and data points for different problems (e.g., 'two sets consisting of 10704 and 10356 images', 'two sets of 6903 and 7877 images'). However, it does not provide explicit details about how these sets were split into training, validation, or testing subsets, or if standard splits from referenced datasets were used without further specification. The problems described (Fréchet mean, leading eigenvector) are more about estimation on a given dataset rather than typical supervised learning with train/test splits. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running its experiments, such as specific GPU or CPU models, or details about computational resources like cloud instances. |
| Software Dependencies | No | The paper mentions using 'geomstats (Miolane et al., 2020; 2021; Myers et al., 2022)' and 'autodp library (Wang et al., 2019c)' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The parameter details for all the algorithms are in Section C. For DP-RGD, we tune the clipping parameter from the set C = {1, 0.1, 0.01} and the number of epochs from {10, 20, 30}. For DP-RSGD, the clipping parameter is chosen from C = {1, 0.1, 0.01} and the number of epochs from {n, 5n, 10n, 20n, 30n}. For DP-RSVRG, the number of epochs is chosen from {5, 10}, the frequency is set as m = n/10, the full-gradient clipping parameter C is tuned from {1, 0.1}, and the variance-reduced gradient clipping parameter C2 from {1, 0.1, 0.01}. For all three algorithms, we tune the learning rate from η ∈ {5e-5, 1e-5, 5e-4, 1e-4, . . . , 5e-1, 1e-1, 1, 2, . . . , 5}. |
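To make the sampling contribution concrete: on the unit sphere (one of the manifolds listed in the pseudocode row), a tangent Gaussian can be drawn without MCMC or explicit basis construction, because the orthogonal projection of an isotropic ambient Gaussian onto the tangent hyperplane at a point is an isotropic Gaussian on that subspace. The sketch below is illustrative only, not the paper's exact Algorithm 8; the function name is ours.

```python
import numpy as np

def sample_tangent_gaussian_sphere(x, sigma, rng):
    """Draw v ~ N(0, sigma^2 I) in the tangent space T_x S^{n-1}.

    Projecting an isotropic ambient Gaussian onto the hyperplane
    {v : <v, x> = 0} yields an isotropic Gaussian on that
    (n-1)-dimensional subspace, so no basis construction or MCMC
    is needed.
    """
    z = rng.normal(scale=sigma, size=x.shape)  # ambient Gaussian in R^n
    return z - (z @ x) * x                     # remove the normal component

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0, 1.0])                  # a point on S^2
v = sample_tangent_gaussian_sphere(x, 0.5, rng)
```

The returned vector is exactly orthogonal to `x`, i.e. it lies in the tangent space, which is the only geometric constraint the sampler must satisfy.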
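The paper's DP-RSVRG operates on the manifold, using retractions and transport between tangent spaces. As a rough illustration of the update it privatizes, here is a Euclidean simplification of one clipped variance-reduced step with Gaussian noise; `sigma` is a placeholder noise multiplier, not a calibrated privacy parameter, and the helper names are ours.

```python
import numpy as np

def clip(g, C):
    """Rescale g to have norm at most C (standard DP gradient clipping)."""
    n = np.linalg.norm(g)
    return g if n <= C else g * (C / n)

def dp_svrg_step(w, w_snap, full_grad, grad_fn, i, C2, sigma, eta, rng):
    """One Euclidean DP-SVRG step: clip the per-sample variance-reduced
    correction, add it to the snapshot's full gradient, perturb with
    Gaussian noise, and take a gradient step."""
    vr = clip(grad_fn(w, i) - grad_fn(w_snap, i), C2) + full_grad
    noise = rng.normal(scale=sigma * C2, size=w.shape)
    return w - eta * (vr + noise)
```

With the noise multiplier set to zero and a quadratic loss, the iteration reduces to plain SVRG and converges to the minimizer, which is a quick sanity check on the update direction.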
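The SPD experiments estimate Fréchet means under the Affine-Invariant metric, which requires an iterative (Karcher-type) algorithm. Under the Log-Euclidean metric, another geometry covered by the sampling algorithms above, the Fréchet mean has a closed form: the matrix exponential of the average matrix logarithm. The non-private sketch below illustrates that closed form; the helper names are ours.

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(spds):
    """Closed-form Fréchet mean under the Log-Euclidean metric:
    expm of the arithmetic mean of the matrix logarithms."""
    return spd_exp(np.mean([spd_log(X) for X in spds], axis=0))
```

Because the mean is computed in the (flat) space of matrix logarithms, no iteration is needed, which is why the Log-Euclidean geometry is often used as a cheap baseline for SPD averaging.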
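The tuning procedure described in the experiment-setup row is a plain grid search over clipping thresholds, epoch counts, and a shared learning-rate grid. A minimal sketch for the DP-RGD case (grid values taken from that row; the learning-rate grid is abbreviated where the row elides values, and all variable names are ours):

```python
from itertools import product

# Illustrative DP-RGD tuning grid: clipping threshold C and epoch count,
# crossed with the shared learning-rate grid (subset shown; the full
# eta grid in the paper includes further intermediate values).
clip_grid = [1, 0.1, 0.01]
epoch_grid = [10, 20, 30]
eta_grid = [5e-5, 1e-5, 5e-4, 1e-4, 5e-1, 1e-1, 1, 2, 3, 4, 5]

configs = [
    {"C": C, "epochs": T, "eta": eta}
    for C, T, eta in product(clip_grid, epoch_grid, eta_grid)
]
```

Each configuration would then be run once and scored, with the best configuration reported; the same pattern applies to DP-RSGD and DP-RSVRG with their own grids.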