Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning
Authors: Jiyan Yang, Yin-Lam Chow, Christopher Ré, Michael W. Mahoney
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets, and the results are consistent with our theoretical findings and demonstrate that PWSGD converges to a medium-precision solution, e.g., ϵ = 10⁻³, more quickly. [...] In this section, we provide empirical evaluations of our main algorithm PWSGD. We evaluate its convergence rate and overall running time on both synthetic and real datasets. |
| Researcher Affiliation | Academia | Jiyan Yang EMAIL Institute for Computational and Mathematical Engineering Stanford University Stanford, CA 94305, USA; Yin-Lam Chow EMAIL Institute for Computational and Mathematical Engineering Stanford University Stanford, CA 94305, USA; Christopher Ré EMAIL Department of Computer Science Stanford University Stanford, CA 94305, USA; Michael W. Mahoney EMAIL International Computer Science Institute and Department of Statistics University of California, Berkeley Berkeley, CA 94720, USA |
| Pseudocode | Yes | Algorithm 1 PWSGD preconditioned weighted SGD for over-determined ℓ1 and ℓ2 regression [...] Algorithm 2 Compute ϵ-coreset [...] Algorithm 3 RLA methods with algorithmic leveraging for constrained ℓp regression |
| Open Source Code | No | The paper does not provide explicit statements or links to source code for the methodology described. |
| Open Datasets | Yes | name / #rows / #columns / κ(A): Year⁷: 5×10⁵ rows, 90 columns, κ(A) = 2×10³; Buzz⁸: 5×10⁵ rows, 77 columns, κ(A) = 10⁸ [...] 7. https://archive.ics.uci.edu/ml/datasets/Year Prediction MSD 8. https://archive.ics.uci.edu/ml/datasets/Buzz+in+social+media+ |
| Dataset Splits | No | The paper describes generating synthetic datasets and using the 'Year' and 'Buzz' real datasets, but it does not specify any training/test/validation splits, percentages, or predefined splitting methodologies for these datasets. It mentions a sampling strategy for SGD within epochs but not for evaluation splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and methods (e.g., Ada Grad, SVRG, RLA, LSQR, interior-point method) but does not specify any software libraries, frameworks, or their version numbers used for implementing PWSGD or other comparative methods. |
| Experiment Setup | Yes | For each of these methods, given a target relative error ϵ = 0.1 on the objective, i.e., (f̂ − f\*)/f\* = 0.1, we use the optimal step-size suggested in the theory. [...] For AdaGrad, we use diagonal scaling and the mirror descent update rule. For SVRG, we compute the full gradient every ⌊n/2⌋ iterations. As for implementation details, in all SGD-like algorithms, step-size tuning is done by grid search, where at each trial the algorithm is run with a candidate step-size for enough iterations. Then the step-size that yields the lowest error within 10 seconds is used. The time/accuracy pair is recorded every 2000 iterations. [...] In our experiments, we set m = 200. [...] the effect of step-size η. When a constant step-size is used, typically, a smaller η allows the algorithm to converge to a more accurate solution with a slower convergence rate. This is verified by Figure 6(a), in which the performance of PWSGD-full with larger η's saturates earlier at a coarser level, while η = 0.001 allows the algorithm to achieve a finer solution. |
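To make the pipeline the evidence above describes concrete (randomized preconditioning, row sampling proportional to approximate leverage scores, then weighted SGD), here is a minimal sketch for the ℓ2 case. It is illustrative only, not the authors' implementation: the Gaussian sketch, the sketch size `4*d`, the default step size, and the function name `pwsgd_l2` are all assumptions made for this example.

```python
import numpy as np

def pwsgd_l2(A, b, n_iters=2000, eta=None, sketch_size=None, seed=0):
    """Minimal PWSGD-style sketch for over-determined least squares.

    Illustrative only: it follows the paper's high-level recipe
    (precondition with a randomized sketch, sample rows with
    probability proportional to approximate leverage scores, run
    importance-weighted SGD), not the authors' code.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = sketch_size or 4 * d
    # A dense Gaussian sketch stands in for the paper's randomized
    # preconditioner; QR of the sketched matrix gives the factor R.
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    _, R = np.linalg.qr(S @ A)
    AR = A @ np.linalg.inv(R)          # approximately well-conditioned basis
    # Sampling weights: squared row norms of the preconditioned matrix
    # (a proxy for leverage scores).
    w = np.sum(AR ** 2, axis=1)
    p = w / w.sum()
    if eta is None:
        eta = 0.5 / w.sum()            # assumed default: unit Kaczmarz-like step
    y = np.zeros(d)
    for _ in range(n_iters):
        i = rng.choice(n, p=p)
        resid = AR[i] @ y - b[i]
        # Importance-weighted stochastic gradient of ||AR y - b||^2.
        y -= eta * 2.0 * resid * AR[i] / p[i]
    return np.linalg.solve(R, y)       # map back: x = R^{-1} y
```

On a consistent system this converges to the least-squares solution; in practice the step size η would be grid-searched exactly as the quoted setup describes.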