Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start

Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "7. Experiments. We design the experiments with the following goals. First, we assess the difficulties of applying warm-start and the effect of different upper-level batch sizes in a classification problem involving equilibrium models and in a meta-learning problem; in both settings the lower-level problem can be divided into several smaller sub-problems. Second, we compare our method with others achieving near-optimal sample complexity in a data poisoning problem."
Researcher Affiliation | Academia | Riccardo Grazzi (Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy, and University College London, UK); Massimiliano Pontil (Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy, and University College London, UK); Saverio Salzo (Università La Sapienza di Roma, Italy, and Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy)
Pseudocode | Yes | Algorithm 1 (Stochastic Implicit Differentiation, SID) and Algorithm 2 (Bilevel Stochastic Gradient Method, BSGM)
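To illustrate how the two named algorithms fit together, here is a minimal hedged sketch of a SID-style hypergradient estimator for a lower-level contraction map. All names, signatures, and derivative callables here are illustrative assumptions, not the authors' code; the only details taken from the paper's quoted setup are the cold start w₀ = 0 and the small fixed iteration counts.

```python
import numpy as np

def sid_hypergradient(lam, Phi, grad_w_E, jvp_wPhi_T, jvp_lamPhi_T, grad_lam_E,
                      t=2, k=2, dim=1):
    """Hedged sketch of stochastic implicit differentiation (SID).

    lam          : upper-level variables (hyperparameters).
    Phi(w, lam)  : lower-level contraction map; w*(lam) is its fixed point.
    grad_w_E, grad_lam_E : gradients of the upper-level objective E(w, lam).
    jvp_wPhi_T, jvp_lamPhi_T : transposed Jacobian-vector products of Phi.
    """
    # 1) Approximate the fixed point w*(lam), starting from w0 = 0
    #    (no warm-start), with t fixed-point iterations.
    w = np.zeros(dim)
    for _ in range(t):
        w = Phi(w, lam)
    # 2) Approximate v solving (I - d_w Phi^T) v = grad_w E via k
    #    fixed-point iterations v <- grad_w E + d_w Phi^T v.
    g = grad_w_E(w, lam)
    v = np.zeros_like(g)
    for _ in range(k):
        v = g + jvp_wPhi_T(w, lam, v)
    # 3) Assemble the implicit-function-theorem hypergradient estimate.
    return grad_lam_E(w, lam) + jvp_lamPhi_T(w, lam, v)
```

In a BSGM-style outer loop one would then update lam by a stochastic gradient step of size α on this estimate; with the linear contraction Phi(w, λ) = 0.5·w + λ and E(w, λ) = ½‖w‖², the estimate converges to the exact hypergradient 4λ as t and k grow.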
Open Source Code | Yes | "We provide the code at https://github.com/CSML-IIT-UCL/bioptexps"
Open Datasets | Yes | "We perform these experiments using the whole MNIST training set, hence n = 6 × 10⁴ ..." (Section 7.1, Equilibrium Models) and "We perform a meta-learning experiment on Mini-Imagenet (Vinyals et al., 2016) ..." (Section 7.2, Meta-Learning)
Dataset Splits | Yes | "Mini-Imagenet contains 100 classes from Imagenet which are split into 64, 16, 20 for the meta-train, meta-validation and meta-test sets respectively." (Section 7.2, Meta-Learning) and "Specifically, we consider an image classification problem on the MNIST data set where (X, y) ∈ ℝ^{n×p} × {1, …, c}^n and (X′, y′) ∈ ℝ^{n′×p} × {1, …, c}^{n′} are the training and validation sets, and p = 784, c = 10, n = 45,000 and n′ = 15,000 are the number of features, classes, training examples and validation examples respectively." (Section 7.3, Data Poisoning)
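The 64/16/20 class partition quoted above can be reproduced schematically as follows. This is a hedged sketch only: the actual Mini-Imagenet protocol uses a fixed, published class assignment, whereas this illustrative helper partitions class indices at random.

```python
import numpy as np

def split_classes(n_classes=100, sizes=(64, 16, 20), seed=0):
    """Partition class indices into meta-train / meta-val / meta-test subsets
    with the 64/16/20 sizes quoted from Section 7.2 (illustrative split)."""
    assert sum(sizes) == n_classes
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_classes)          # shuffle class indices
    cuts = np.cumsum(sizes)[:-1]               # boundaries at 64 and 80
    return np.split(perm, cuts)                # three disjoint index arrays
```

Each returned array holds the class indices of one meta-split, and together they cover all 100 classes exactly once.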
Hardware Specification | Yes | "All methods have been implemented in PyTorch (Paszke et al., 2019) and the experiments have been executed on a GTX 1080 Ti GPU with 11 GB of dedicated memory."
Software Dependencies | No | The paper states only that "all methods have been implemented in PyTorch (Paszke et al., 2019)"; no specific versions of PyTorch or other libraries are mentioned.
Experiment Setup | Yes | "Let λ₀ = (θ₀, b₀, A₀, B₀, c₀) be the hyperparameters at initialization; we set b₀ = 0, and we sample each coordinate of θ₀, A₀, B₀, and c₀ from a Gaussian distribution with zero mean and standard deviation 0.01. In Algorithm 2 we also set w₀(λ) = 0, tₛ = kₛ = 2, and α = 0.5." (Section 7.1, Equilibrium Models)
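The initialization quoted above can be sketched as follows. The parameter names follow the quote, but the shapes and the helper itself are illustrative assumptions, not the authors' code.

```python
import numpy as np

def init_hyperparams(shapes, std=0.01, seed=0):
    """Initialize lambda_0 = (theta_0, b_0, A_0, B_0, c_0) as quoted:
    b_0 is set to zero; every other coordinate is drawn i.i.d. from a
    zero-mean Gaussian with standard deviation `std` (0.01 in the paper).
    `shapes` maps parameter names to array shapes (illustrative)."""
    rng = np.random.default_rng(seed)
    lam = {}
    for name, shape in shapes.items():
        if name == "b":
            lam[name] = np.zeros(shape)               # b_0 = 0
        else:
            lam[name] = std * rng.standard_normal(shape)
    return lam
```

For example, `init_hyperparams({"theta": (50, 50), "b": (10,), "A": (10, 10), "B": (10, 10), "c": (10,)})` returns a zero bias and Gaussian-initialized remaining parameters.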