Convergences of Regularized Algorithms and Stochastic Gradient Methods with Random Projections
Authors: Junhong Lin, Volkan Cevher
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We first investigate regularized algorithms adapted to a projection operator on a closed subspace of the Hilbert space. We prove convergence results with respect to variants of norms, under a capacity assumption on the hypothesis space and a regularity condition on the target function. As a result, we obtain optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. As a byproduct, we obtain similar results for Nyström regularized algorithms. Our results provide optimal, distribution-dependent rates that do not have any saturation effect for sketched/Nyström regularized algorithms, considering both the attainable and non-attainable cases, in the well-conditioned regimes. We then study stochastic gradient methods with projection over the subspace, allowing multi-pass over the data and minibatches, and we derive similar optimal statistical convergence results. |
| Researcher Affiliation | Academia | Junhong Lin EMAIL Center for Data Science, Zhejiang University, 310027, Hangzhou, P. R. China. Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland. Volkan Cevher EMAIL Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland. |
| Pseudocode | Yes | Algorithm 1 The projected (iterated) ridge regression algorithm of order τ over the sample z and subspace S is given by $f_z^\lambda = S_\rho \omega_z^\lambda$, where $\omega_z^\lambda = P G_\lambda(P T_{\mathbf{x}} P) P S_{\mathbf{x}}^* \mathbf{y}$, $G_\lambda(u) = \sum_{i=1}^{\tau} \lambda^{i-1}(\lambda + u)^{-i}$. Algorithm 2 The stochastic gradient method with projection is defined by $\omega_1 = 0$, $\omega_{t+1} = \omega_t - \eta \frac{1}{b}\sum_{i=b(t-1)+1}^{bt} (\langle \omega_t, x_{j_i}\rangle_H - y_{j_i}) P x_{j_i}$, $t = 1, \ldots, T$, where $\eta$ is a step-size, $j_1, j_2, \ldots, j_{bT}$ are i.i.d. random variables from the uniform distribution on $\{1, \ldots, n\}$, and $b \in \mathbb{N}_+$. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology, nor does it provide links to any code repositories. |
| Open Datasets | No | Let the input space H be a separable Hilbert space with inner product denoted by $\langle \cdot, \cdot \rangle_H$, and the output space $\mathbb{R}$. Let ρ be an unknown probability measure on $H \times \mathbb{R}$. In this paper, we study the following expected risk minimization, $\inf_{\omega \in H} \tilde{\mathcal{E}}(\omega)$, $\tilde{\mathcal{E}}(\omega) = \int_{H \times \mathbb{R}} (\langle \omega, x \rangle_H - y)^2 \, d\rho(x, y)$, (1) where the measure ρ is known only through a sample $\mathbf{z} = \{z_i = (x_i, y_i)\}_{i=1}^n$ of size $n \in \mathbb{N}$, independently and identically distributed (i.i.d.) according to ρ. |
| Dataset Splits | No | The paper is theoretical and does not use any specific real-world or benchmark datasets; therefore, no dataset splits are discussed. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. There is no mention of GPU, CPU, memory, or other hardware components. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and algorithms, not on implemented software. It does not list any software libraries, tools, or their specific version numbers. |
| Experiment Setup | No | The paper is theoretical and analyzes algorithms and convergence rates mathematically. While it discusses parameters like regularization parameter λ, step-size η, number of iterations T, and minibatch size b, these are discussed in a theoretical context for proving convergence, not as part of a concrete experimental setup with specific values for empirical evaluation. |
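To make the quoted Algorithm 1 concrete, the following is a minimal finite-dimensional sketch of the projected iterated ridge regression step (the paper works in a general Hilbert space, and the function name and matrix-based formulation here are our own illustration, not the authors' code): $T_{\mathbf{x}} = X^\top X / n$ is the empirical covariance, $S_{\mathbf{x}}^* \mathbf{y} = X^\top y / n$, and the filter $G_\lambda$ is applied spectrally to $P T_{\mathbf{x}} P$.

```python
import numpy as np

def iterated_ridge_projected(X, y, P, lam, tau=1):
    """Sketch of the projected (iterated) ridge regression of order tau.

    Finite-dimensional illustration: X is an (n, d) data matrix, y the
    (n,) targets, P a (d, d) orthogonal projection onto a subspace,
    lam the regularization parameter lambda, tau the order.
    """
    n = X.shape[0]
    T = X.T @ X / n                       # empirical covariance T_x
    rhs = P @ (X.T @ y / n)               # P S_x^* y
    M = P @ T @ P                         # projected operator P T_x P
    # G_lambda(u) = sum_{i=1}^{tau} lam^{i-1} (lam + u)^{-i},
    # applied through the eigendecomposition of the symmetric PSD matrix M.
    eigvals, eigvecs = np.linalg.eigh(M)
    g = sum(lam ** (i - 1) / (lam + eigvals) ** i for i in range(1, tau + 1))
    G = eigvecs @ np.diag(g) @ eigvecs.T
    return P @ G @ rhs                    # omega_z^lambda
```

With `tau=1` and `P` the identity, $G_\lambda(u) = (\lambda + u)^{-1}$ and the sketch reduces to ordinary ridge regression, $(T_{\mathbf{x}} + \lambda I)^{-1} X^\top y / n$, which provides a quick sanity check.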
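Similarly, the quoted Algorithm 2 can be sketched in finite dimensions (again, the function name and NumPy formulation are our own illustration under these assumptions): at each iteration a minibatch of indices is drawn i.i.d. uniformly from $\{1, \ldots, n\}$, and the averaged stochastic gradient is projected through $P$ before the update.

```python
import numpy as np

def projected_sgd(X, y, P, eta, num_iters, b=1, seed=None):
    """Sketch of minibatch SGD with projection:
    omega_{t+1} = omega_t - eta * (1/b) * sum_i (<omega_t, x_{j_i}> - y_{j_i}) * P x_{j_i},
    with omega_1 = 0 and j_i drawn i.i.d. uniformly from {1, ..., n}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omega = np.zeros(d)                    # omega_1 = 0
    for _ in range(num_iters):
        idx = rng.integers(0, n, size=b)   # i.i.d. uniform minibatch indices
        residuals = X[idx] @ omega - y[idx]
        grad = (X[idx].T @ residuals) / b  # minibatch gradient estimate
        omega = omega - eta * (P @ grad)   # projected update
    return omega
```

Note that `b = n` with i.i.d. sampling is not exactly full-batch gradient descent (indices can repeat), but for a small step-size `eta` the iterates still decrease the empirical risk in expectation, matching the multi-pass, minibatch setting the row describes.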