Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start
Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Experiments We design the experiments with the following goals. Firstly, we assess the difficulties of applying warm-start and the effect of different upper-level batch sizes in a classification problem involving equilibrium models and in a meta-learning problem. In both settings the lower-level problem can be divided into several smaller sub-problems. Secondly, we compare our method with others achieving near-optimal sample complexity in a data poisoning problem. |
| Researcher Affiliation | Academia | Riccardo Grazzi (EMAIL), Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy, and University College London, UK; Massimiliano Pontil (EMAIL), Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy, and University College London, UK; Saverio Salzo (EMAIL), Università La Sapienza di Roma, Italy, and Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy |
| Pseudocode | Yes | Algorithm 1 Stochastic Implicit Differentiation (SID) and Algorithm 2 Bilevel Stochastic Gradient Method (BSGM) |
| Open Source Code | Yes | We provide the code at https://github.com/CSML-IIT-UCL/bioptexps |
| Open Datasets | Yes | We perform these experiments using the whole MNIST training set, hence n = 6 × 10⁴... (Section 7.1 Equilibrium Models) and We perform a meta-learning experiment on Mini-Imagenet (Vinyals et al., 2016)... (Section 7.2 Meta-Learning) |
| Dataset Splits | Yes | Mini-Imagenet contains 100 classes from Imagenet which are split into 64, 16, 20 for the meta-train, meta-validation and meta-test sets respectively. (Section 7.2 Meta-Learning) and Specifically, we consider an image classification problem on the MNIST data set where (X, y) ∈ ℝ^{n×p} × {1, ..., c}^n and (X′, y′) ∈ ℝ^{n′×p} × {1, ..., c}^{n′} are the training and validation sets, and p = 784, c = 10, n = 45,000 and n′ = 15,000 are the number of features, classes, training examples and validation examples respectively. (Section 7.3 Data Poisoning) |
| Hardware Specification | Yes | All methods have been implemented in PyTorch (Paszke et al., 2019) and the experiments have been executed on a GTX 1080 Ti GPU with 11GB of dedicated memory. |
| Software Dependencies | No | All methods have been implemented in PyTorch (Paszke et al., 2019) and the experiments have been executed on a GTX 1080 Ti GPU with 11GB of dedicated memory. (No specific version of PyTorch or other libraries is mentioned) |
| Experiment Setup | Yes | Let λ0 = (θ0, b0, A0, B0, c0) be the hyperparameters at initialization; we set b0 = 0 and sample each coordinate of θ0, A0, B0, and c0 from a Gaussian distribution with zero mean and standard deviation 0.01. In Algorithm 2 we also set w0(λ) = 0, ts = ks = 2, and α = 0.5. (Section 7.1 Equilibrium Models) |
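The initialization quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration only: the tensor shapes (`d_theta`, `d_hidden`, `d_out`) are assumptions for demonstration, not the dimensions used in the paper's equilibrium-model experiment, and NumPy stands in for the authors' PyTorch implementation.

```python
import numpy as np

def init_hyperparams(rng, d_theta=32, d_hidden=64, d_out=10, std=0.01):
    """Sketch of the Section 7.1 initialization: b0 is set to zero,
    and each coordinate of theta0, A0, B0, c0 is drawn i.i.d. from
    a Gaussian with mean 0 and standard deviation 0.01.
    Shapes are illustrative assumptions, not the paper's."""
    theta0 = rng.normal(0.0, std, size=d_theta)
    b0 = np.zeros(d_hidden)                          # b0 = 0, as stated
    A0 = rng.normal(0.0, std, size=(d_hidden, d_hidden))
    B0 = rng.normal(0.0, std, size=(d_hidden, d_theta))
    c0 = rng.normal(0.0, std, size=d_out)
    return theta0, b0, A0, B0, c0

# Algorithm 2 additionally sets w0(λ) = 0, t_s = k_s = 2, and α = 0.5.
rng = np.random.default_rng(0)
theta0, b0, A0, B0, c0 = init_hyperparams(rng)
```

The small standard deviation keeps the initial lower-level map close to zero, which is a common way to start inside the contraction regime the paper assumes; the exact architecture details should be taken from the released code rather than this sketch.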