Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning

Authors: Jiyan Yang, Yin-Lam Chow, Christopher Ré, Michael W. Mahoney

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that PWSGD converges to a medium-precision solution, e.g., ϵ = 10⁻³, more quickly. [...] In this section, we provide empirical evaluations of our main algorithm PWSGD. We evaluate its convergence rate and overall running time on both synthetic and real datasets.
Researcher Affiliation | Academia | Jiyan Yang (EMAIL), Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA; Yin-Lam Chow (EMAIL), Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA; Christopher Ré (EMAIL), Department of Computer Science, Stanford University, Stanford, CA 94305, USA; Michael W. Mahoney (EMAIL), International Computer Science Institute and Department of Statistics, University of California, Berkeley, Berkeley, CA 94720, USA
Pseudocode | Yes | Algorithm 1: PWSGD, preconditioned weighted SGD for over-determined ℓ1 and ℓ2 regression [...] Algorithm 2: Compute ϵ-coreset [...] Algorithm 3: RLA methods with algorithmic leveraging for constrained ℓp regression
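To make the structure of Algorithm 1 concrete, here is a minimal sketch of PWSGD for the unconstrained ℓ2 case. It is a hypothetical simplification, not the paper's implementation: a dense Gaussian sketch stands in for the fast randomized transforms the paper uses, and the function name and defaults are illustrative.

```python
import numpy as np

def pwsgd_l2(A, b, n_iter=4000, eta=0.02, sketch_size=None, seed=0):
    """Sketch of PWSGD (Algorithm 1) for unconstrained l2 regression.

    Hypothetical simplification: a dense Gaussian sketch replaces the
    paper's fast randomized preconditioning transforms.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    s = sketch_size or 8 * d
    # Step 1: randomized preconditioning via QR of a sketched matrix S @ A.
    S = rng.standard_normal((s, n)) / np.sqrt(s)
    _, R = np.linalg.qr(S @ A)
    R_inv = np.linalg.inv(R)   # A @ R_inv is well-conditioned
    U = A @ R_inv
    # Step 2: sampling probabilities from the preconditioned row norms
    # (these approximate the leverage scores of A).
    p = np.linalg.norm(U, axis=1) ** 2
    p /= p.sum()
    # Step 3: weighted SGD on the preconditioned problem min_y ||U y - b||^2.
    y = np.zeros(d)
    for _ in range(n_iter):
        i = rng.choice(n, p=p)
        grad = (U[i] @ y - b[i]) * U[i] / p[i]  # importance-weighted gradient
        y -= eta * grad
    return R_inv @ y  # map back to the original variables
```

The importance weight 1/p[i] keeps the sampled gradient an unbiased estimate of the full gradient, and the preconditioner R⁻¹ is what lets a single constant step size work regardless of κ(A).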
Open Source Code | No | The paper does not provide explicit statements or links to source code for the methodology described.
Open Datasets | Yes | name / #rows / #columns / κ(A): Year⁷, 5 × 10⁵ rows, 90 columns, κ(A) = 2 × 10³; Buzz⁸, 5 × 10⁵ rows, 77 columns, κ(A) = 10⁸. [...] 7. https://archive.ics.uci.edu/ml/datasets/Year Prediction MSD 8. https://archive.ics.uci.edu/ml/datasets/Buzz+in+social+media+
Dataset Splits | No | The paper describes generating synthetic datasets and using the 'Year' and 'Buzz' real datasets, but it does not specify any training/test/validation splits, percentages, or predefined splitting methodologies for these datasets. It mentions a sampling strategy for SGD within epochs but not for evaluation splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions several algorithms and methods (e.g., AdaGrad, SVRG, RLA, LSQR, interior-point method) but does not specify any software libraries, frameworks, or their version numbers used for implementing PWSGD or other comparative methods.
Experiment Setup | Yes | For each of these methods, given a target relative error ϵ = 0.1 on the objective, i.e., (f̂ − f*)/f* = 0.1, we use the optimal step size suggested in the theory. [...] For AdaGrad, we use diagonal scaling and the mirror descent update rule. For SVRG, we compute the full gradient every ⌊n/2⌋ iterations. As for implementation details, in all SGD-like algorithms, step-size tuning is done by grid search, where at each trial the algorithm is run with a candidate step size for enough iterations. The step size that yields the lowest error within 10 seconds is then used. The time/accuracy pair is recorded every 2000 iterations. [...] In our experiments, we set m = 200. [...] On the effect of the step size η: when a constant step size is used, a smaller η typically allows the algorithm to converge to a more accurate solution, at a slower convergence rate. This is verified by Figure 6(a), in which the performance of PWSGD-full with larger values of η saturates earlier at a coarser level, while η = 0.001 allows the algorithm to reach a finer solution.
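The tuning protocol quoted above can be sketched as a small helper. This is an assumed reconstruction, not the authors' code: `run_trial` is a hypothetical callback mapping a candidate step size to the error reached within the fixed time budget (10 seconds in the paper).

```python
def grid_search_stepsize(run_trial, candidates):
    """Run each candidate step size for a fixed budget and keep the one
    with the lowest error, per the quoted tuning protocol.

    run_trial: hypothetical callback, eta -> error reached within budget.
    candidates: iterable of step sizes to try.
    """
    best_eta = min(candidates, key=run_trial)
    return best_eta, run_trial(best_eta)
```

For example, if trials at η ∈ {0.1, 0.01, 0.001} end at errors {0.5, 0.05, 0.2}, the helper returns η = 0.01, reflecting the trade-off noted above: too large an η saturates at a coarse error, too small an η has not converged within the budget.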