SGD with Coordinate Sampling: Theory and Practice
Authors: Rémi Leluc, François Portier
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large scale problems. |
| Researcher Affiliation | Academia | Rémi Leluc EMAIL Department of Statistics, LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France; François Portier EMAIL Department of Statistics, CREST, Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI), 35170 Bruz, France |
| Pseudocode | Yes | MUSKETEER. Require: θ0 ∈ ℝ^p, N, T ∈ ℕ, (γt)t≥0, (λn)n≥0, η > 0. 1. Initialize probability weights d0 = (1/p, …, 1/p) // start with uniform sampling 2. Initialize cumulative gains G0 = (0, …, 0) 3. for n = 0, …, N−1 do 4. Initialize current gain G̃0 = (0, …, 0) 5. Run Explore(T, dn) // to compute current gain G̃T 6. Run Exploit(Gn, G̃T, λn, η) // to update weights dn+1 7. end for 8. Return final point θN |
| Open Source Code | Yes | For ease of reproducibility, the code is available online. https://github.com/RemiLELUC/SCGD-Musketeer |
| Open Datasets | Yes | The datasets in the experiments are popular publicly available deep learning datasets: MNIST (Deng, 2012) and Fashion-MNIST (Xiao et al., 2017). Given an image, the goal is to predict its label among ten classes. |
| Dataset Splits | Yes | MNIST (Deng, 2012): a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. Fashion-MNIST (Xiao et al., 2017): a dataset of Zalando's article images, composed of a training set of 60,000 examples and a test set of 10,000 examples. CIFAR10 (Krizhevsky et al., 2009): The CIFAR-10 dataset consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. |
| Hardware Specification | Yes | The experiments of linear models are run using a processor Intel Core i7-10510U CPU @ 1.80GHz × 8; the neural networks are trained using GPU from Google Colab (GPU: Nvidia K80 / T4; GPU Memory: 12GB/16GB; GPU Memory Clock: 0.82GHz/1.59GHz; Performance: 4.1 TFLOPS / 8.1 TFLOPS) |
| Software Dependencies | No | The paper mentions "PyTorch optimizer" but does not provide a specific version number for PyTorch or any other software components. |
| Experiment Setup | Yes | In all cases, the initial parameter is set to θ0 = (0, …, 0) ∈ ℝ^p and the optimal SGD learning rate of the form γk = γ/(k + k0) is used. (zeroth-order) For the Ridge regression, we set γ = 3, k0 = 10 and for the logistic regression γ = 10, k0 = 5. Hyperparameters. When training neural networks with linear layers, we use: batch_size = 32; input_size = 28*28; hidden_size = 32; output_size = 64, along with the parameters (zeroth-order) γ = 10 (Mnist and Fashion-Mnist), γ = 15 (Kmnist); h = 0.01; ℓ1 normalization with λn = 1/log(n); softmax normalization with λn = 0.2 and η = 5. |
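To make the pseudocode row concrete, here is a minimal NumPy sketch of the MUSKETEER explore/exploit loop, combining the γk = γ/(k + k0) schedule, the softmax normalization (λn = 0.2, η = 5), and the Ridge constants (γ = 3, k0 = 10) from the setup row. It is an illustrative reading of the pseudocode, not the authors' implementation (see their repository for that): `musketeer` and `grad` are hypothetical names, `grad(theta)` returns a full first-order gradient for simplicity, and the gain statistic |∂i f| is one plausible choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def musketeer(grad, theta0, N=50, T=100, gamma=3.0, k0=10, lam=0.2, eta=5.0):
    """Sketch of MUSKETEER-style adaptive coordinate sampling.

    grad(theta) returns the full gradient here for simplicity; the paper's
    zeroth-order runs estimate single partial derivatives instead.
    Defaults follow the Ridge setup (gamma=3, k0=10) and the softmax
    normalization (lam=0.2, eta=5) quoted in the table above.
    """
    p = theta0.size
    theta = theta0.astype(float).copy()
    d = np.full(p, 1.0 / p)          # d0: uniform coordinate-sampling weights
    G = np.zeros(p)                  # G0: cumulative gains
    k = 0                            # global step counter for gamma_k schedule
    for _ in range(N):
        G_tilde = np.zeros(p)        # current gains for this explore phase
        # Explore(T, d): T coordinate-SGD steps with importance weighting
        for _ in range(T):
            i = rng.choice(p, p=d)
            g_i = grad(theta)[i]
            theta[i] -= (gamma / (k + k0)) * g_i / (p * d[i])  # unbiased step
            G_tilde[i] += abs(g_i)   # record the gain observed on coordinate i
            k += 1
        # Exploit(G, G_tilde, lam, eta): fold gains in, re-weight via softmax
        G += G_tilde
        w = np.exp(eta * G / (np.abs(G).sum() + 1e-12))
        d = (1.0 - lam) * w / w.sum() + lam / p  # mix with uniform to explore
    return theta
```

On a simple quadratic, e.g. `grad = lambda th: th - target`, the iterates concentrate sampling on the coordinates with the largest accumulated gradient magnitude while the `lam / p` mixing keeps every coordinate visited.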
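The "(zeroth-order)" runs in the setup row estimate partial derivatives from function values alone, using the smoothing radius h = 0.01. A minimal central-difference sketch of such an estimator, assuming a scalar objective `f` (the name `partial_fd` is hypothetical, and the paper may use a different smoothing scheme):

```python
import numpy as np

def partial_fd(f, theta, i, h=0.01):
    """Zeroth-order estimate of the i-th partial derivative of f at theta,
    via central finite differences with radius h (h = 0.01 in the setup)."""
    e = np.zeros_like(theta, dtype=float)
    e[i] = h
    return (f(theta + e) - f(theta - e)) / (2.0 * h)
```

Two evaluations of f per sampled coordinate suffice, which is what makes coordinate sampling attractive in the zeroth-order regime: the query cost per step is independent of the dimension p.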