Injecting Imbalance Sensitivity for Multi-Task Learning

Authors: Zhipeng Zhou, Liu Liu, Peilin Zhao, Wei Gong

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To demonstrate the effectiveness of our proposed IMbalance-sensitive Gradient (IMGrad) descent method, we evaluate it on multiple mainstream MTL benchmarks, encompassing supervised learning tasks as well as reinforcement learning. The experimental results consistently demonstrate competitive performance. [...] The extensive experimental results present compelling evidence that IMGrad consistently enhances its baselines and surpasses current advanced gradient manipulation methods across a diverse range of evaluations, e.g., supervised learning tasks and reinforcement learning benchmarks."
Researcher Affiliation | Collaboration | Zhipeng Zhou¹, Liu Liu², Peilin Zhao², and Wei Gong¹ (¹University of Science and Technology of China; ²Tencent AI Lab)
Pseudocode | No | The paper describes its methods using mathematical formulations and textual explanations but does not contain explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs. [...] Code is available at https://github.com/zzpustc/IMGrad."
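The cvxpy dependency in the paper's stack suggests that, like other gradient manipulation methods (e.g., MGDA or CAGrad), the approach solves a small quadratic program over task-gradient weights. The sketch below is purely illustrative and is not the paper's IMGrad update: for the two-task case, the min-norm convex combination of gradients has a closed form that needs only NumPy.

```python
import numpy as np

# Illustrative only (NOT the paper's IMGrad algorithm): the min-norm convex
# combination of two task gradients, as used in MGDA-style gradient
# manipulation. For more than two tasks this becomes the kind of quadratic
# program one would hand to cvxpy.

def min_norm_two_tasks(g1: np.ndarray, g2: np.ndarray) -> float:
    """Return the weight w in [0, 1] minimizing ||w*g1 + (1-w)*g2||^2."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weight yields the same update
    w = -float(g2 @ diff) / denom  # unconstrained minimizer of the quadratic
    return min(max(w, 0.0), 1.0)   # project onto the simplex constraint

# Two orthogonal unit gradients: the min-norm combination is their midpoint.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w = min_norm_two_tasks(g1, g2)        # w == 0.5
update = w * g1 + (1.0 - w) * g2      # array([0.5, 0.5])
```

The resulting `update` direction is a descent direction for both tasks simultaneously whenever one exists, which is the basic building block that weighting schemes like IMGrad refine.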
Open Datasets | Yes | "We conducted experiments on the Cityscapes dataset [Cordts et al., 2016] [...] NYUv2 is a widely used indoor scene understanding dataset for MTL benchmarking [...] The Cityscapes dataset is used for MTL evaluation [...] CelebA is a widely used face attributes dataset containing over 200,000 images annotated with 40 attributes. [...] we use CAGrad as the baseline and conduct experiments on the MT10 environment from the Meta-World benchmark [Yu et al., 2020b]."
Dataset Splits | No | The paper uses datasets such as Cityscapes, NYUv2, CelebA, and MT10, and mentions training for a given number of epochs with specific batch sizes, but it does not state train/validation/test split percentages or sample counts. It mentions using a "validation set" for MT10 without specifying the split.
Hardware Specification | Yes | "We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs."
Software Dependencies | Yes | "We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs."
Experiment Setup | Yes | "Specifically, models are trained for 200 epochs using the Adam optimizer, with an initial learning rate of 1e-4, which decays to 5e-5 after 100 epochs. [...] The model is trained using the Adam optimizer for 15 epochs, with an initial learning rate of 3.0e-4 and a batch size of 256."
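The supervised-learning schedule quoted above (Adam, lr 1e-4 for the first 100 of 200 epochs, then 5e-5) can be restated framework-free. In PyTorch this would typically correspond to `MultiStepLR(optimizer, milestones=[100], gamma=0.5)`; the minimal sketch below is our own restatement of the reported numbers, not code from the paper's repository.

```python
# Sketch of the reported learning-rate schedule for the supervised MTL runs:
# 1e-4 for epochs 0..99, then 5e-5 for epochs 100..199.

def lr_at_epoch(epoch: int,
                base_lr: float = 1e-4,
                decay_epoch: int = 100,
                decayed_lr: float = 5e-5) -> float:
    """Return the learning rate used at a given (0-indexed) epoch."""
    return base_lr if epoch < decay_epoch else decayed_lr

schedule = [lr_at_epoch(e) for e in range(200)]  # full 200-epoch schedule
```

The same helper covers the RL-style run quoted above by passing `base_lr=3.0e-4` with a `decay_epoch` beyond its 15 epochs (i.e., no decay).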