Learning with Differentiable Perturbed Optimizers
Authors: Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally the performance of our approach on various tasks. From Section 5 (Experiments): We demonstrate the usefulness of perturbed maximizers in a supervised learning setting, as described in Section 4. We focus on a classification task and on two structured prediction tasks: label ranking and learning to predict shortest paths. |
| Researcher Affiliation | Collaboration | Quentin Berthet, Google Research, Brain Team, Paris, France; Mathieu Blondel, Google Research, Brain Team, Paris, France; Olivier Teboul, Google Research, Brain Team, Paris, France; Marco Cuturi, Google Research, Brain Team, Paris, France; Jean-Philippe Vert, Google Research, Brain Team, Paris, France; Francis Bach, INRIA, DI ENS, PSL Research University, Paris |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We will open-source a Python package allowing to turn any black-box solver into a differentiable function, in just a few lines of code. |
| Open Datasets | Yes | We use the perturbed argmax with Gaussian noise in an image classification task on the CIFAR-10 dataset. and We use the same 21 datasets as in [28, 14]. |
| Dataset Splits | Yes | Results are averaged over 10-fold CV and parameters tuned by 5-fold CV. |
| Hardware Specification | No | The paper mentions "In our experiments on GPU" but does not specify any particular hardware models or specifications. |
| Software Dependencies | No | The paper mentions "a Python package" but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We train a vanilla CNN with 10 network outputs that are the entries of θ; we minimize the Fenchel-Young loss between θ_i = g_w(x_i) and y_i, with different temperatures ε and numbers of perturbations M. We optimize over 50 epochs with batches of size 70, temperature ε = 1 and M = 1 (single perturbation). |
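The perturbed-argmax construction the experiments rely on can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released package: it estimates the perturbed maximizer y_ε(θ) = E[argmax(θ + εZ)] with Gaussian noise Z by Monte Carlo over M perturbations, yielding a smoothed, probability-vector relaxation of the hard argmax. Function and variable names here are illustrative.

```python
import numpy as np

def perturbed_argmax(theta, epsilon=1.0, M=1, seed=None):
    """Monte Carlo estimate of y_eps(theta) = E[argmax(theta + eps * Z)],
    with Z ~ N(0, I). Averaging one-hot argmax indicators over M Gaussian
    perturbations gives a smooth relaxation of the hard argmax."""
    rng = np.random.default_rng(seed)
    d = theta.shape[0]
    counts = np.zeros(d)
    for _ in range(M):
        z = rng.standard_normal(d)
        counts[np.argmax(theta + epsilon * z)] += 1.0
    return counts / M

# Illustrative scores for a 3-class problem; with many perturbations the
# estimate concentrates mass on the highest-scoring entry.
theta = np.array([0.1, 2.0, 0.5])
probs = perturbed_argmax(theta, epsilon=1.0, M=1000, seed=0)
```

With M = 1 (as in the CIFAR-10 setup above), each call returns a single one-hot vector from one noise draw; larger M reduces the variance of the estimate at proportionally higher cost per call.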