On Perturbed Proximal Gradient Algorithms

Authors: Yves F. Atchadé, Gersende Fort, Eric Moulines

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To support our findings, we discuss the inference of a sparse generalized linear model with random effect and the problem of learning the edge structure and parameters of sparse undirected graphical models. In Section 4, these results are illustrated with the problem of estimating high-dimensional discrete graphical models. In Section 5, we consider a high-dimensional random effect logistic regression model. We report in this section some simulation results showing the performance of the stochastic proximal gradient algorithm. The relative errors are presented in Figure 1 and suggest that, when measured as a function of the resources used, Solver 1 and Solver 2 have roughly the same convergence rate.
Researcher Affiliation | Academia | Yves F. Atchadé (EMAIL), University of Michigan, 1085 South University, Ann Arbor, 48109, MI, United States; Gersende Fort (EMAIL), LTCI, CNRS, Telecom ParisTech, Université Paris-Saclay, 46, rue Barrault, 75013 Paris, France; Eric Moulines (EMAIL), CMAP, INRIA XPOP, École Polytechnique, 91128 Palaiseau, France
Pseudocode | Yes | Algorithm 1 (Proximal gradient algorithm): given θ_n, compute θ_{n+1} = Prox_{γ_{n+1},g}(θ_n − γ_{n+1} ∇f(θ_n)). Algorithm 2 (Perturbed proximal gradient algorithm): let θ_0 ∈ Θ be the initial solution and {γ_n, n ∈ ℕ} be a sequence of positive step sizes. For n ≥ 1, given (θ_0, ..., θ_n), construct an approximation H_{n+1} of ∇f(θ_n) and compute θ_{n+1} = Prox_{γ_{n+1},g}(θ_n − γ_{n+1} H_{n+1}).
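The perturbed proximal gradient update quoted above can be sketched in a few lines. The sketch below assumes g is an ℓ1 penalty λ‖·‖₁ (so the proximal map is soft-thresholding) and takes the gradient approximation H_{n+1} as a caller-supplied function; both choices are illustrative and do not reproduce the paper's full Monte Carlo construction of H_{n+1}.

```python
import numpy as np

def prox_l1(x, t):
    # Proximal map of t * ||.||_1: componentwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def perturbed_prox_gradient(theta0, grad_approx, lam, step, n_iter):
    """Algorithm 2 sketch: theta_{n+1} = prox_{gamma_{n+1}, lam*||.||_1}(theta_n - gamma_{n+1} * H_{n+1}).

    grad_approx(theta, n) returns H_{n+1}, an approximation of grad f(theta_n);
    step(n) returns the step size gamma_n.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for n in range(n_iter):
        gamma = step(n + 1)             # step size gamma_{n+1}
        H = grad_approx(theta, n + 1)   # approximation H_{n+1} of grad f(theta_n)
        theta = prox_l1(theta - gamma * H, gamma * lam)
    return theta
```

With an exact gradient (H_{n+1} = ∇f(θ_n)) this reduces to Algorithm 1; plugging in a Monte Carlo average of unbiased gradient samples gives the stochastic variant studied in the paper.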
Open Source Code | No | The paper mentions the 'BMN package' as a third-party tool used for comparison but does not provide any information or links to source code specifically for the methodology described by the authors.
Open Datasets | No | We generate the true matrix θ_true such that it has on average p non-zero elements below the diagonal, which are simulated from a uniform distribution on (−4, −1) ∪ (1, 4). We generate the N × p covariates matrix X column-wise, by sampling a stationary ℝ^N-valued autoregressive model with parameter ρ = 0.8 and Gaussian noise √(1 − ρ²) N_N(0, I). We generate the vector of regressors β_true from the uniform distribution on [1, 5] and randomly set 98% of the coefficients to zero.
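The generation recipe quoted in this row can be sketched directly. The sizes below (N = 250, p = 50) and the fixed seed are illustrative only; the actual experiments also use p = 100, 200 for the graphical model and p = 1000 for the regression vector.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative fixed seed, not from the paper
N, p, rho = 250, 50, 0.8

# True interaction matrix: on average p non-zero entries below the
# diagonal, drawn from the uniform distribution on (-4, -1) U (1, 4).
theta_true = np.zeros((p, p))
rows, cols = np.tril_indices(p, k=-1)
nonzero = rng.random(rows.size) < 2.0 / (p - 1)   # E[#nonzero] = p(p-1)/2 * 2/(p-1) = p
magnitudes = rng.uniform(1.0, 4.0, size=nonzero.sum())
signs = rng.choice([-1.0, 1.0], size=nonzero.sum())
theta_true[rows[nonzero], cols[nonzero]] = signs * magnitudes

# Covariates: columns of X follow a stationary AR(1) process across columns
# with parameter rho and innovation sqrt(1 - rho^2) * N_N(0, I), so each
# column is marginally N(0, I).
X = np.empty((N, p))
X[:, 0] = rng.standard_normal(N)
for j in range(1, p):
    X[:, j] = rho * X[:, j - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal(N)

# Regressors: uniform on [1, 5], with 98% of coefficients reset to zero.
beta_true = rng.uniform(1.0, 5.0, size=p)
beta_true[rng.random(p) < 0.98] = 0.0
```

The innovation scale √(1 − ρ²) keeps the marginal variance of each column equal to one, so the overall signal strength does not drift with the column index.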
Dataset Splits | No | The paper describes generating synthetic data for its experiments and mentions the total size of the datasets (N = 250, p ∈ {50, 100, 200} in Section 4; N = 500, p = 1000, q = 5 in Section 5.2). However, it does not specify any explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details for running its experiments.
Software Dependencies | No | The paper mentions the 'BMN package' in the context of comparing algorithms but does not specify its version or any other software dependencies with version numbers used for the authors' own implementation.
Experiment Setup | Yes | We use M = 20, B_0(x) = x, N = 250 and p ∈ {50, 100, 200}. By trial and error we set the regularization parameter to λ = 2.5√(log(p)/n) for all the simulations. We compare the following two versions of Algorithm 2: (1) Solver 1, with a fixed Monte Carlo batch size m_n = 500 and decreasing step size γ_n = 25/(n+1); (2) Solver 2, with an increasing Monte Carlo batch size m_n = 500 + n^1.2 and fixed step size γ_n = 25. We run Solver 2 for N_iter = 5p iterations, where p ∈ {50, 100, 200} is as above. We test the algorithms with N = 500, p = 1,000 and q = 5. The variance of the random effect is set to σ² = 0.1. We use the Lasso penalty (α = 1 in (19)) with λ = 30. We compare the Monte Carlo proximal gradient algorithm (i) with fixed batch size: γ_n = 0.01/√n and m_n = 275 (Algo 1); γ_n = 0.5/n and m_n = 275 (Algo 2); and (ii) with increasing batch size: γ_n = γ = 0.005, m_n = 200 + n (Algo 3); γ_n = γ = 0.001, m_n = 200 + n (Algo 4); and γ_n = 0.05/√n and m_n = 270 + n (Algo 5). Each algorithm is run for 150 iterations.
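The two step-size/batch-size regimes compared in the graphical-model experiment can be encoded as simple schedule functions. The helper names below are hypothetical, and only Solver 1 and Solver 2 from Section 4 are shown; the five Section 5 variants follow the same pattern with different constants.

```python
def solver1_schedule(n):
    """Solver 1: fixed Monte Carlo batch size m_n = 500, decreasing step gamma_n = 25/(n+1)."""
    return 500, 25.0 / (n + 1)

def solver2_schedule(n):
    """Solver 2: increasing batch size m_n = 500 + n**1.2, fixed step gamma_n = 25."""
    return 500 + int(n ** 1.2), 25.0
```

The trade-off mirrors the discussion in this row: Solver 1 keeps the per-iteration Monte Carlo cost constant and damps gradient noise through a vanishing step size, while Solver 2 pays a growing sampling cost to keep a constant step size; Figure 1 reportedly shows the two choices giving roughly the same convergence rate per unit of computational resource.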