Injecting Imbalance Sensitivity for Multi-Task Learning
Authors: Zhipeng Zhou, Liu Liu, Peilin Zhao, Wei Gong
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of our proposed IMbalance-sensitive Gradient (IMGrad) descent method, we evaluate it on multiple mainstream MTL benchmarks, encompassing supervised learning tasks as well as reinforcement learning. The experimental results consistently demonstrate competitive performance. [...] The extensive experimental results present compelling evidence that IMGrad consistently enhances its baselines and surpasses the current advanced gradient manipulation methods in a diverse range of evaluations, e.g., supervised learning tasks and reinforcement learning benchmarks. |
| Researcher Affiliation | Collaboration | Zhipeng Zhou¹, Liu Liu², Peilin Zhao² and Wei Gong¹ — ¹University of Science and Technology of China, ²Tencent AI Lab |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations but does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs. Code is available at https://github.com/zzpustc/IMGrad. |
| Open Datasets | Yes | We conducted experiments on the Cityscapes dataset [Cordts et al., 2016] [...] NYUv2 is a widely used indoor scene understanding dataset for MTL benchmarking [...] The Cityscapes dataset is used for MTL evaluation [...] CelebA is a widely used face attributes dataset containing over 200,000 images annotated with 40 attributes. [...] we use CAGrad as the baseline and conduct experiments on the MT10 environment from the Meta-World benchmark [Yu et al., 2020b]. |
| Dataset Splits | No | The paper uses the Cityscapes, NYUv2, CelebA, and MT10 datasets, and mentions training for a certain number of epochs with specific batch sizes, but it does not explicitly state the train/validation/test split percentages or sample counts for these datasets within the provided text. It mentions using a 'validation set' for MT10, but not the specific split. |
| Hardware Specification | Yes | We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs. |
| Software Dependencies | Yes | We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs. |
| Experiment Setup | Yes | Specifically, models are trained for 200 epochs using the Adam optimizer, with an initial learning rate of 1e-4, which decays to 5e-5 after 100 epochs. [...] The model is trained using the Adam optimizer for 15 epochs, with an initial learning rate of 3.0e-4 and a batch size of 256. |
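The step-decay learning-rate schedule quoted in the Experiment Setup row (Adam, initial learning rate 1e-4, dropping to 5e-5 after 100 of 200 epochs) can be sketched in plain Python. This is a minimal illustration of the quoted schedule only; the function name and defaults are hypothetical, not taken from the paper's code.

```python
def learning_rate(epoch, initial_lr=1e-4, decayed_lr=5e-5, decay_epoch=100):
    """Step-decay schedule as described in the experiment setup:
    start at `initial_lr`, then switch to `decayed_lr` once the
    epoch index reaches `decay_epoch` (epochs are 0-indexed here)."""
    return initial_lr if epoch < decay_epoch else decayed_lr

# Sketch of how the schedule would drive a 200-epoch training loop:
for epoch in range(200):
    lr = learning_rate(epoch)  # 1e-4 for epochs 0-99, 5e-5 afterwards
```

In a PyTorch setup like the one the paper describes, the same effect is typically achieved by updating the optimizer's parameter-group learning rates (or via a built-in step scheduler) rather than recomputing the rate by hand.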