Transferring Optimality Across Data Distributions via Homotopy Methods
Authors: Matilde Gargiani, Andrea Zanelli, Quoc Tran-Dinh, Moritz Diehl, Frank Hutter
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on a toy regression dataset and for transferring optimized parameters from MNIST to Fashion-MNIST and CIFAR-10 show substantial improvement of the numerical performance over random initialization and pre-training. |
| Researcher Affiliation | Collaboration | Matilde Gargiani¹, Andrea Zanelli², Quoc Tran-Dinh³, Moritz Diehl²·⁴, Frank Hutter¹·⁵. ¹Department of Computer Science, University of Freiburg; ²Department of Microsystems Engineering (IMTEK), University of Freiburg; ³Department of Statistics and Operations Research, University of North Carolina; ⁴Department of Mathematics, University of Freiburg; ⁵Bosch Center for Artificial Intelligence |
| Pseudocode | Yes | Conceptually, Algorithm 1 describes the basic steps of a general homotopy algorithm. Algorithm 1 (A Conceptual Homotopy Algorithm): 1: θ₀ ← arg min_θ H(θ, 0); 2: choose γ > 0, γ ∈ ℤ; 3: λ₀ = 0, Δλ = 1/γ; 4: choose k > 0, k ∈ ℤ; 5: for i = 1, …, γ do; 6: λᵢ ← λᵢ₋₁ + Δλ; 7: θᵢ ← ITERATIVESOLVER(θᵢ₋₁, k, H(θ, λᵢ)); 8: return θ_γ |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | The paper uses MNIST, Fashion-MNIST, and CIFAR-10, all publicly available benchmark datasets, in addition to a toy regression dataset: 'Empirical evaluations on a toy regression dataset and for transferring optimized parameters from MNIST to Fashion-MNIST and CIFAR-10 show substantial improvement of the numerical performance over random initialization and pre-training.' |
| Dataset Splits | No | Section 6.1 states: 'Each considered dataset has 10000 samples split across training and testing...' While a train/test split is mentioned, no specific information about a validation split, exact percentages for splits, or cross-validation methodology is provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions using 'Adam as optimizer' and 'VGG-type network'. However, it does not specify any version numbers for these or other software components (e.g., Adam version, PyTorch version, Python version), which are necessary for reproducibility. |
| Experiment Setup | Yes | For the experiments in Figures 1a and 1b, Figures 7a and 7b in the appendix, and Figure 2a, we set α = 0.001, γ = 10, k = 200 and then performed an additional 500 epochs on the final target problem, while for the experiments in Figure 2b, we set γ = 10, k = 300 and performed an additional 600 epochs on the final target problem. In this last scenario we set α = 0.001 and then decreased it with a cosine annealing schedule to observe convergence to an optimum. |
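The conceptual homotopy algorithm summarized in the pseudocode row above can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' implementation: the homotopy `H` here is a hypothetical convex interpolation between two quadratic losses, and the inner ITERATIVESOLVER is plain gradient descent rather than the Adam optimizer used in the paper. The hyperparameter names (`gamma`, `k`, `alpha`) mirror the γ, k, and α reported in the experiment setup.

```python
import numpy as np

def homotopy_minimize(H_grad, theta0, gamma=10, k=200, alpha=0.001):
    """Conceptual homotopy method: trace a solution path from the
    source problem H(., 0) to the target problem H(., 1) in gamma
    steps, running k iterations of a first-order solver at each
    intermediate value of lambda (warm-started from the previous one)."""
    theta = theta0
    for i in range(1, gamma + 1):
        lam = i / gamma                      # lambda_i = lambda_{i-1} + 1/gamma
        for _ in range(k):                   # ITERATIVESOLVER: gradient descent
            theta = theta - alpha * H_grad(theta, lam)
    return theta                             # approximate minimizer of H(., 1)

# Toy homotopy between two quadratic losses (an illustrative stand-in for
# interpolating between source- and target-distribution objectives):
#   H(theta, lam) = (1 - lam) * ||theta - a||^2 + lam * ||theta - b||^2
a = np.array([0.0, 0.0])                     # minimizer of the source problem
b = np.array([3.0, -1.0])                    # minimizer of the target problem
H_grad = lambda th, lam: 2 * ((1 - lam) * (th - a) + lam * (th - b))

theta_star = homotopy_minimize(H_grad, theta0=a.copy(),
                               gamma=10, k=200, alpha=0.05)
print(theta_star)  # close to the target minimizer b = [3, -1]
```

Warm-starting each subproblem from the previous solution is the point of the method: each λ-step only perturbs the objective slightly, so the k inner iterations start near the new optimum instead of from a random initialization.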