MTL-UE: Learning to Learn Nothing for Multi-Task Learning

Authors: Yi Yu, Song Xia, Siyuan Yang, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex Kot

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that MTL-UE achieves superior attacking performance consistently across 4 MTL datasets, 3 base UE methods, 5 model backbones, and 5 MTL task-weighting strategies. Code is available at https://github.com/yuyi-sd/MTL-UE. ... 5. Experiments 5.1. Experimental Setup 5.2. Experimental Results
Researcher Affiliation | Academia | 1Rapid-Rich Object Search Lab, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore 2School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 3College of Computing and Data Science, Nanyang Technological University, Singapore 4Peng Cheng Laboratory, Shenzhen, China. Correspondence to: Yi Yu <EMAIL>, Wenhan Yang <EMAIL>.
Pseudocode | Yes | Algorithm 1: Optimization of the UE Generator in MTL-UE
Open Source Code | Yes | Code is available at https://github.com/yuyi-sd/MTL-UE.
Open Datasets | Yes | Datasets. We choose 4 popular multi-task vision datasets: CelebA (Liu et al., 2015), ChestX-ray14 (Wang et al., 2017), UTKFace (Zhang et al., 2017), and NYUv2 (Nathan Silberman & Fergus, 2012).
Dataset Splits | Yes | CelebA: 162,770 images for training and 19,962 for testing. ChestX-ray14: the official splits are used. UTKFace: split 80% for training and 20% for testing. NYUv2: 795 training and 654 testing images.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for its experiments. It mentions model architectures such as ResNet-18, but not the computational hardware.
Software Dependencies | No | The paper mentions optimizers (Adam) and schedulers (MultiStepLR, StepLR) but does not list specific software libraries or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup | Yes | A.2. Details of the models and the training: For the CelebA, ChestX-ray14, and UTKFace datasets, we train the models for 60 epochs using the Adam optimizer, starting with a learning rate of 1e-3. We apply the MultiStepLR scheduler, with milestones at epochs 36 and 48 and a gamma of 0.1, to adjust the learning rate. The batch size is set to 512 for CelebA and UTKFace, and 128 for ChestX-ray14. ... For the NYUv2 dataset, we train the models for 200 epochs, starting with a learning rate of 1e-4, and use a StepLR scheduler with a step size of 100 and a gamma of 0.1 to reduce the learning rate during training. We set the batch size to 8 for this dataset. ... For the hyperparameters in Alg. 1, ϵ is set to match the baseline methods, with the default value ϵ = 8/255 for all datasets. The weight λ1 is set to 20, and λ2 is set to 100 across all datasets.
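The learning-rate schedules quoted above can be sketched as plain functions. This is a hypothetical illustration of the reported settings, not the authors' code; it mirrors how PyTorch's MultiStepLR and StepLR schedulers decay the rate.

```python
# Sketch of the reported schedules (assumed semantics of MultiStepLR / StepLR).

def multistep_lr(epoch, base_lr=1e-3, milestones=(36, 48), gamma=0.1):
    """LR at a given epoch: multiply base_lr by gamma once per passed milestone."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

def step_lr(epoch, base_lr=1e-4, step_size=100, gamma=0.1):
    """LR at a given epoch: multiply base_lr by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# CelebA / ChestX-ray14 / UTKFace: lr 1e-3, milestones at epochs 36 and 48.
assert multistep_lr(0) == 1e-3
assert abs(multistep_lr(40) - 1e-4) < 1e-12   # after first milestone
assert abs(multistep_lr(50) - 1e-5) < 1e-12   # after both milestones

# NYUv2: lr 1e-4, step size 100 over 200 epochs.
assert step_lr(99) == 1e-4
assert abs(step_lr(150) - 1e-5) < 1e-12       # decayed once at epoch 100
```

In PyTorch the equivalent would be `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[36, 48], gamma=0.1)` and `StepLR(optimizer, step_size=100, gamma=0.1)` attached to an Adam optimizer.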