SGD with Coordinate Sampling: Theory and Practice

Authors: Rémi Leluc, François Portier

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large-scale problems.
Researcher Affiliation | Academia | Rémi Leluc (EMAIL), Department of Statistics, LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France; François Portier (EMAIL), Department of Statistics, CREST, Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI), 35170 Bruz, France.
Pseudocode | Yes | MUSKETEER
Require: θ0 ∈ R^p; N, T ∈ N; (γt)t≥0; (λn)n≥0; η > 0.
1. Initialize probability weights d0 = (1/p, . . . , 1/p)  // start with uniform sampling
2. Initialize cumulative gains G0 = (0, . . . , 0)
3. for n = 0, . . . , N − 1 do
4.   Initialize current gain G̃0 = (0, . . . , 0)
5.   Run Explore(T, dn)  // to compute current gain G̃T
6.   Run Exploit(Gn, G̃T, λn, η)  // to update weights dn+1
7. end for
8. Return final point θN
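The pseudocode above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: it assumes exact per-coordinate partial derivatives (the paper also covers stochastic and zeroth-order estimates), uses the magnitude of the sampled partial derivative as the coordinate gain, and applies a softmax re-weighting of the cumulative gains mixed with the uniform distribution; the exact Explore/Exploit rules in the paper may differ.

```python
import numpy as np

def musketeer(grad_coord, theta0, N, T, gamma, eta, lam, seed=0):
    """Sketch of MUSKETEER: coordinate-sampling SGD with an
    explore/exploit update of the sampling weights.

    grad_coord(theta, i) -- i-th partial derivative (assumed exact here)
    gamma(t)             -- step-size schedule, e.g. gamma/(t + t0)
    lam(n)               -- mixing weight toward uniform sampling
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    p = theta.size
    d = np.full(p, 1.0 / p)              # start with uniform sampling
    G = np.zeros(p)                      # cumulative gains
    t = 0
    for n in range(N):
        G_tilde = np.zeros(p)            # current gain of this explore phase
        for _ in range(T):               # Explore: sample coordinates from d
            i = rng.choice(p, p=d)
            g = grad_coord(theta, i)
            theta[i] -= gamma(t) * g / (p * d[i])  # importance-weighted update
            G_tilde[i] += abs(g)         # assumed gain: |partial derivative|
            t += 1
        # Exploit: fold current gains into cumulative ones, re-weight via
        # a softmax and mix with uniform to keep exploring all coordinates
        G += G_tilde / T
        w = np.exp(eta * (G - G.max())) # stabilized softmax
        d = (1 - lam(n)) * w / w.sum() + lam(n) / p
    return theta

# Usage on a toy quadratic f(theta) = 0.5 * sum_i A_i * theta_i^2,
# whose i-th partial derivative is A_i * theta_i:
A = np.array([10.0, 1.0, 0.1])
f = lambda th: 0.5 * np.sum(A * th ** 2)
theta = musketeer(lambda th, i: A[i] * th[i], np.ones(3),
                  N=30, T=100, gamma=lambda t: 0.5 / (t + 10),
                  eta=5.0, lam=lambda n: 0.2)
```

The importance-weighting factor 1/(p·d_i) keeps each coordinate update unbiased for the full gradient step regardless of how the sampling weights d evolve.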
Open Source Code | Yes | For ease of reproducibility, the code is available online: https://github.com/RemiLELUC/SCGD-Musketeer
Open Datasets | Yes | The datasets in the experiments are popular, publicly available deep learning datasets: MNIST (Deng, 2012) and Fashion-MNIST (Xiao et al., 2017). Given an image, the goal is to predict its label among ten classes.
Dataset Splits | Yes | MNIST (Deng, 2012): a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. Fashion-MNIST (Xiao et al., 2017): a dataset of Zalando's article images, composed of a training set of 60,000 examples and a test set of 10,000 examples. CIFAR-10 (Krizhevsky et al., 2009): 60,000 32×32 colour images in 10 classes, with 6,000 images per class; there are 50,000 training images and 10,000 test images.
Hardware Specification | Yes | The linear-model experiments are run on an Intel Core i7-10510U CPU @ 1.80GHz × 8; the neural networks are trained on GPUs from Google Colab (GPU: Nvidia K80 / T4; GPU memory: 12GB / 16GB; GPU memory clock: 0.82GHz / 1.59GHz; performance: 4.1 TFLOPS / 8.1 TFLOPS).
Software Dependencies | No | The paper mentions a "PyTorch optimizer" but does not provide a specific version number for PyTorch or any other software components.
Experiment Setup | Yes | In all cases, the initial parameter is set to θ0 = (0, . . . , 0) ∈ R^p and the optimal SGD learning rate of the form γk = γ/(k + k0) is used. (Zeroth-order) For the Ridge regression, γ = 3 and k0 = 10; for the logistic regression, γ = 10 and k0 = 5. Hyperparameters: when training neural networks with linear layers, batch_size = 32, input_size = 28*28, hidden_size = 32, output_size = 64, along with the parameters (zeroth-order) γ = 10 (MNIST and Fashion-MNIST), γ = 15 (KMNIST); h = 0.01; ℓ1 normalization with λn = 1/log(n); softmax normalization with λn = 0.2 and η = 5.
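The schedules in this row are small formulas that are easy to get wrong off by one, so a short sketch may help. The helper names below are hypothetical (not from the paper's code), the defaults mirror the logistic-regression setting (γ = 10, k0 = 5; softmax with λn = 0.2, η = 5; ℓ1 with λn = 1/log(n)), and natural log plus a guard for small n are assumptions:

```python
import numpy as np

def step_size(k, gamma=10.0, k0=5):
    """SGD step size gamma_k = gamma / (k + k0)."""
    return gamma / (k + k0)

def l1_weights(G, n):
    """l1 normalization of gains G, mixed with uniform via lambda_n = 1/log(n).

    The guard for n <= 1 (full uniform) is an assumption for small n,
    where 1/log(n) is undefined or exceeds 1.
    """
    p = G.size
    lam = 1.0 / np.log(n) if n > np.e else 1.0
    w = np.abs(G) / np.abs(G).sum()
    return (1 - lam) * w + lam / p

def softmax_weights(G, eta=5.0, lam=0.2):
    """Softmax normalization of gains G with temperature eta,
    mixed with uniform via a constant lambda_n = 0.2."""
    p = G.size
    z = eta * (G - G.max())          # subtract max for numerical stability
    w = np.exp(z) / np.exp(z).sum()
    return (1 - lam) * w + lam / p
```

Both normalizations return a valid probability vector, and the uniform mixing keeps every coordinate's sampling probability bounded below by λn/p, so no coordinate is ever starved.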