MGDA Converges under Generalized Smoothness, Provably
Authors: Qi Zhang, Peiyao Xiao, Shaofeng Zou, Kaiyi Ji
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this experiment, we evaluate performance on the Cityscapes (Cordts et al., 2016) and NYUv2 (Silberman et al., 2012) datasets. |
| Researcher Affiliation | Academia | School of Electrical, Computer and Energy Engineering, Arizona State University; Department of Computer Science and Engineering, University at Buffalo |
| Pseudocode | Yes | Algorithm 1: Single-loop MGDA with and without warm start; Algorithm 2: warm-start(w0, x0, ρ); Algorithm 3: Stochastic MGDA with Double Sampling; Algorithm 4: MGDA with Fast Approximation (MGDA-FA) |
| Open Source Code | No | The code is available at https://github.com/JingzhaoZhang/why-clipping-accelerates. Since there is no weight update process, we only need to choose α = 0.0005 for both tasks. (Explanation: This code illustrates a concept from a previous paper, not the main methodology described here.) |
| Open Datasets | Yes | In this experiment, we evaluate performance on the Cityscapes (Cordts et al., 2016) and NYUv2 (Silberman et al., 2012) datasets. |
| Dataset Splits | No | Following the experiment setup in Xiao et al. (2024), we train our method for 200 epochs, using SGD optimizers for both model parameters and weights, and the batch size for Cityscapes is 8. We compute the averaged test performance over the last 10 epochs as the final performance measure. (Explanation: This text refers to the training process and test performance, but does not specify the actual dataset splits like train/validation/test percentages or counts.) |
| Hardware Specification | Yes | All experiments are run on NVIDIA RTX A6000. |
| Software Dependencies | No | Following the experiment setup in Xiao et al. (2024), we train our method for 200 epochs, using SGD optimizers for both model parameters and weights, and the batch size for Cityscapes is 8. (Explanation: The paper mentions SGD optimizers but does not provide specific software dependencies with version numbers, such as programming languages or libraries.) |
| Experiment Setup | Yes | We train our method for 200 epochs, using SGD optimizers for both model parameters and weights, and the batch size for Cityscapes is 8. We fix β = 0.5 and do a grid search over N ∈ {10, 20, 40, 50}, α ∈ {0.0001, 0.0002, 0.0005, 0.001}, and ρ ∈ {0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 1}, choosing the best result among them. The best performance is obtained with N = 40, α = 0.0005, β = 0.5, and ρ = 0.5. |
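The grid search described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's code: the `evaluate` function is a hypothetical placeholder for training 200 epochs and averaging test performance over the last 10 epochs, and the dummy score is chosen only so the loop recovers the reported best configuration.

```python
from itertools import product

# Hyperparameter grids quoted in the Experiment Setup row.
N_grid = [10, 20, 40, 50]
alpha_grid = [0.0001, 0.0002, 0.0005, 0.001]
rho_grid = [0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
beta = 0.5  # fixed throughout, per the paper

def evaluate(N, alpha, rho):
    """Placeholder (assumed interface) for a full training run.

    A real implementation would train for 200 epochs with SGD and
    return the averaged test performance over the last 10 epochs.
    Here we use a dummy score that peaks at the reported best choice.
    """
    return -abs(N - 40) - abs(alpha - 0.0005) * 1e4 - abs(rho - 0.5)

best_score, best_cfg = float("-inf"), None
for N, alpha, rho in product(N_grid, alpha_grid, rho_grid):
    score = evaluate(N, alpha, rho)
    if score > best_score:
        best_score, best_cfg = score, (N, alpha, rho)

print(best_cfg)  # with the dummy score: (40, 0.0005, 0.5)
```

With 4 × 4 × 10 = 160 configurations this exhaustive sweep is feasible only because each grid is small; each `evaluate` call in the real setup corresponds to a full 200-epoch training run.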