Convergences of Regularized Algorithms and Stochastic Gradient Methods with Random Projections

Authors: Junhong Lin, Volkan Cevher

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We first investigate regularized algorithms adapted to a projection operator on a closed subspace of the Hilbert space. We prove convergence results with respect to variants of norms, under a capacity assumption on the hypothesis space and a regularity condition on the target function. As a result, we obtain optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. As a byproduct, we obtain similar results for Nyström regularized algorithms. Our results provide optimal, distribution-dependent rates that do not have any saturation effect for sketched/Nyström regularized algorithms, considering both the attainable and non-attainable cases, in the well-conditioned regimes. We then study stochastic gradient methods with projection over the subspace, allowing multi-pass over the data and minibatches, and we derive similar optimal statistical convergence results.
Researcher Affiliation | Academia | Junhong Lin EMAIL Center for Data Science, Zhejiang University, 310027, Hangzhou, P. R. China. Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland. Volkan Cevher EMAIL Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland.
Pseudocode | Yes | Algorithm 1: The projected (iterated) ridge regression algorithm of order $\tau$ over the sample $z$ and subspace $S$ is given by $f_z^\lambda = S_\rho \omega_z^\lambda$, where $\omega_z^\lambda = P G_\lambda(P T_x P) P S_x^* y$ and $G_\lambda(u) = \sum_{i=1}^{\tau} \lambda^{i-1} (\lambda + u)^{-i}$. Algorithm 2: The stochastic gradient method with projection is defined by $\omega_1 = 0$ and $\omega_{t+1} = \omega_t - \eta \frac{1}{b} \sum_{i=b(t-1)+1}^{bt} \left( \langle \omega_t, x_{j_i} \rangle_H - y_{j_i} \right) P x_{j_i}$, for $t = 1, \dots, T$, where $\eta$ is a step-size, $j_1, j_2, \dots, j_{bT}$ are i.i.d. random variables drawn from the uniform distribution on $\{1, \dots, n\}$, and $b \in \mathbb{N}_+$.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology, nor does it provide links to any code repositories.
Open Datasets | No | Let the input space $H$ be a separable Hilbert space with inner product denoted by $\langle \cdot, \cdot \rangle_H$, and the output space $\mathbb{R}$. Let $\rho$ be an unknown probability measure on $H \times \mathbb{R}$. In this paper, we study the following expected risk minimization, $\inf_{\omega \in H} \widetilde{\mathcal{E}}(\omega)$, $\widetilde{\mathcal{E}}(\omega) = \int_{H \times \mathbb{R}} (\langle \omega, x \rangle_H - y)^2 \, d\rho(x, y)$, (1) where the measure $\rho$ is known only through a sample $z = \{z_i = (x_i, y_i)\}_{i=1}^n$ of size $n \in \mathbb{N}$, independently and identically distributed (i.i.d.) according to $\rho$.
Dataset Splits | No | The paper is theoretical and does not use any specific real-world or benchmark datasets; therefore, no dataset splits are discussed.
Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. There is no mention of GPU, CPU, memory, or other hardware components.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and algorithms, not on implemented software. It does not list any software libraries, tools, or their specific version numbers.
Experiment Setup | No | The paper is theoretical and analyzes algorithms and convergence rates mathematically. While it discusses parameters like the regularization parameter $\lambda$, step-size $\eta$, number of iterations $T$, and minibatch size $b$, these are discussed in a theoretical context for proving convergence, not as part of a concrete experimental setup with specific values for empirical evaluation.
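The expected risk in (1) is known only through the sample $z$, so in practice the integral against $\rho$ is replaced by an average over the sample. A minimal numerical sketch (the helper name `empirical_risk` and the toy data in the usage below are illustrative, not from the paper):

```python
import numpy as np

# Hedged sketch of the empirical counterpart of the expected risk (1):
# with rho unknown, the integral over H x R is replaced by an average
# over the sample z = {(x_i, y_i)}_{i=1}^n.
def empirical_risk(w, X, y):
    """(1/n) * sum_i (<w, x_i> - y_i)^2, the sample version of the risk."""
    return float(np.mean((X @ w - y) ** 2))
```

By the law of large numbers this average converges to the expected risk as $n$ grows, which is what makes minimizing it over the sample a proxy for (1).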
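The two algorithms quoted in the Pseudocode row can be sketched in finite dimensions, where the Hilbert space is $\mathbb{R}^d$ and the projection $P$ onto the subspace $S$ is an explicit symmetric matrix. This is a hedged illustration: the function names, the finite-dimensional setting, and all hyperparameter values are assumptions, not the authors' implementation.

```python
import numpy as np

def projected_iterated_ridge(X, y, P, lam, tau=1):
    """Algorithm 1 sketch: omega = P G_lam(P T_x P) P S_x^* y with
    G_lam(u) = sum_{i=1}^tau lam^{i-1} (lam + u)^{-i}; tau = 1 is plain ridge."""
    n, d = X.shape
    Xp = X @ P                      # rows are P x_i (P is symmetric)
    T = Xp.T @ Xp / n               # empirical covariance P T_x P on the subspace
    b_vec = Xp.T @ y / n            # P S_x^* y with 1/n normalization
    A = lam * np.eye(d) + T
    w = np.zeros(d)
    v = b_vec.copy()
    for i in range(1, tau + 1):
        v = np.linalg.solve(A, v)   # v = (lam I + T)^{-i} b after i solves
        w += lam ** (i - 1) * v
    return P @ w

def projected_sgd(X, y, P, eta, num_iter, batch, seed=None):
    """Algorithm 2 sketch: minibatch SGD with i.i.d. uniform sampling
    (with replacement) and gradient directions projected by P."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(num_iter):
        idx = rng.integers(0, n, size=batch)       # j_i ~ Uniform{1,...,n}
        residual = X[idx] @ w - y[idx]             # <w, x_{j_i}> - y_{j_i}
        grad = (X[idx] * residual[:, None]).mean(axis=0)
        w = w - eta * (P @ grad)                   # P is linear, so P applies to the mean
    return w
```

With $P = I$ and $\tau = 1$, `projected_iterated_ridge` reduces to ordinary ridge regression, which gives a quick sanity check against the closed-form solution.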
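The abstract ties the optimal rates to a sketch dimension proportional to the effective dimension up to a logarithmic factor. In a finite-dimensional setting the effective dimension of the empirical covariance can be computed directly; in this sketch the constant `c` and the use of $\log n$ for the logarithmic factor are illustrative assumptions, not the paper's rule.

```python
import numpy as np

def effective_dimension(X, lam):
    """N(lam) = tr(T (T + lam I)^{-1}) for the empirical covariance T."""
    n, _ = X.shape
    T = X.T @ X / n
    eigs = np.linalg.eigvalsh(T)            # eigenvalues of the covariance
    return float(np.sum(eigs / (eigs + lam)))

def sketch_dimension(X, lam, c=1.0):
    """m proportional to N(lam) * log(n), the regime the abstract describes."""
    n = X.shape[0]
    return max(1, int(np.ceil(c * effective_dimension(X, lam) * np.log(n))))
```

Note that $N(\lambda)$ decreases from the rank of the covariance toward zero as $\lambda$ grows, so smoother problems (larger $\lambda$) admit smaller sketches.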