L-SVRG and L-Katyusha with Adaptive Sampling

Authors: Boxin Zhao, Boxiang Lyu, Mladen Kolar

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive simulations support our theory and the practical utility of the proposed sampling scheme on real data." "Our numerical experiments support these findings." "We conduct extensive simulations to provide empirical support for various aspects of our theory and real data experiments to demonstrate the practical benefits of adaptive sampling."
Researcher Affiliation | Academia | Boxin Zhao EMAIL, Boxiang Lyu EMAIL, Mladen Kolar EMAIL — The University of Chicago Booth School of Business
Pseudocode | Yes | Algorithm 1: AS-LSVRG; Algorithm 2: AS-LKatyusha; Algorithm 3: OSMD sampler; Algorithm 4: Ada OSMD sampler; Algorithm 5: OSMD Solver
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is open-source or publicly available.
Open Datasets | Yes | "We use the w8a dataset from LibSVM classification tasks (Zeng et al., 2008; Chang & Lin, 2011)."
Dataset Splits | No | The paper mentions using the w8a dataset for real data experiments and generating synthetic data, but it does not provide specific train/test/validation split percentages, sample counts, or explicit instructions for reproducing the data partitioning in either case.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, or cloud computing specifications) used to run the experiments.
Software Dependencies | No | The paper references "LibSVM classification tasks", implying the use of the LibSVM library, but it does not specify a version number for LibSVM or for any other software components used in the experiments.
Experiment Setup | Yes | The learning rate in Ada OSMD is set to γ = α/(8 T a1), where a1 = max_{i∈[n]} f_i(x0). For all experiments in the paper, α = 0.4. The stepsize is the same for all algorithms: 0.1 when ν = 0, 0.05 when ν = 0.5, and 0.005 when ν = 1.0. The step size is set to 0.3. The stepsizes for both L-SVRG and AS-LSVRG are initially tuned over the grid {10^-2, 10^-1.5, ..., 10^2}. This initial search showed that the optimal stepsize lies in the interval (0, 1), so the stepsizes are then tuned over a grid of 20 evenly spaced points on [0.05, 1]. The two algorithms are used to train the model for 1000 iterations, repeated 10 times, and the best stepsize is the one that yields the lowest loss at the 1000-th iteration.
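
The two-stage tuning protocol described above (a coarse log-spaced grid followed by a fine grid of 20 points on [0.05, 1], with the loss at the 1000-th iteration averaged over 10 runs) can be sketched as follows. This is a minimal illustration, not the authors' code: `train_fn` is a hypothetical callback standing in for a run of L-SVRG/AS-LSVRG that returns the final loss.

```python
import numpy as np

def tune_stepsize(train_fn, coarse=False, n_iters=1000, n_reps=10, seed=0):
    """Sketch of the stepsize-tuning protocol from the experiment setup.

    `train_fn(stepsize, n_iters, rng)` is an assumed interface (not from the
    paper): it runs the optimizer for `n_iters` iterations and returns the
    loss at the final iteration.
    """
    if coarse:
        # Initial coarse grid: {10^-2, 10^-1.5, ..., 10^2}
        grid = 10.0 ** np.arange(-2.0, 2.5, 0.5)
    else:
        # Refined grid: 20 evenly spaced points on [0.05, 1]
        grid = np.linspace(0.05, 1.0, 20)
    rng = np.random.default_rng(seed)
    # Average the loss at the final iteration over n_reps repeated runs,
    # then pick the stepsize with the lowest mean loss.
    mean_losses = [
        np.mean([train_fn(s, n_iters, rng) for _ in range(n_reps)])
        for s in grid
    ]
    return grid[int(np.argmin(mean_losses))]
```

With a toy loss that is minimized at stepsize 0.3, the fine grid (which contains 0.3 exactly, since its spacing is 0.05) recovers that point.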