Identification of Negative Transfers in Multitask Learning Using Surrogate Models
Authors: Dongyue Li, Huy Nguyen, Hongyang Ryan Zhang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that our approach predicts negative transfers from multiple source tasks to target tasks much more accurately than existing task affinity measures. Additionally, we demonstrate that for five weak supervision datasets, our approach consistently improves upon existing optimization methods for multi-task learning. Experimental Results. We conduct extensive experiments to validate our approach in numerous data modalities and performance metrics. |
| Researcher Affiliation | Academia | Dongyue Li EMAIL Northeastern University, Boston; Huy L. Nguyen EMAIL Northeastern University, Boston; Hongyang R. Zhang EMAIL Northeastern University, Boston |
| Pseudocode | Yes | Algorithm 1 Subset Selection for Multi-Task Learning Using Relevance Scores |
| Open Source Code | Yes | The code repository for reproducing our experiments can be found at https://github.com/NEU-StatsML-Research/Task-Modeling. |
| Open Datasets | Yes | First, we apply our approach to several text classification tasks from a weak supervision dataset (Zhang et al., 2021). ... Second, we consider MTL with natural language processing tasks. We collect twenty-five datasets across a broad range of tasks, spanning sentiment classification, natural language inference, question answering, etc., from GLUE, Super GLUE, Tweet Eval, and ANLI. ... Third, we consider multi-group learning settings in which a dataset involves multiple subpopulation groups. We consider income prediction tasks based on US census data (Ding et al., 2021). |
| Dataset Splits | Yes | We include the dataset statistics in Table 1. ... We provide the statistics of the twenty-five tasks in Table 4, Appendix C.1. ... See Table 2 for dataset statistics. ... Table 1: Accuracy/F1-score from surrogate modeling followed by task selection (ours), as compared with MTL methods and weak supervision methods that use a label model to aggregate the weak labels. Dataset (Metrics) Youtube (Acc.) ... Training 1,586 Validation 120 Test 250 |
| Hardware Specification | Yes | Computational Cost. Next, we report the runtime cost collected on an NVIDIA Titan RTX card. |
| Software Dependencies | No | We use a standard approach for conducting MTL, i.e., hard parameter sharing. For text classification, we use BERT-Base as the encoder. For tabular features, we use a fully-connected layer with a hidden size of 32. |
| Experiment Setup | Yes | The surrogate modeling procedure requires three parameters: the size of a subset, the number of samples, and the loss function. We select the size from a range between 3, 5, 10, and 15. We select the number of samples from a range between 50, 200, 400, and 800, depending on k. We also collect a holdout set of size 100 for constructing the surrogate model. For classification tasks, we set the loss function as the negative classification margin, i.e., the difference between the correct-class probability and the highest incorrect-class probability. ... To set the threshold γ in our algorithm, we use grid search from -0.5 to 0.5 at an interval of 0.1. |
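The quoted setup names the key pieces of the procedure: sampled task subsets, a surrogate model fit on observed losses, per-task relevance scores, a selection threshold γ, and a negative-classification-margin loss. A minimal sketch of how these pieces could fit together is below; the linear surrogate, the sign convention for relevance scores, and the function names are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def negative_margin(probs: np.ndarray, label: int) -> float:
    # Loss quoted in the setup: the negated classification margin, i.e.
    # -(correct-class probability - highest incorrect-class probability).
    incorrect = np.delete(probs, label).max()
    return -(probs[label] - incorrect)

def fit_surrogate(subsets: np.ndarray, losses: np.ndarray) -> np.ndarray:
    # subsets: (num_samples, num_tasks) 0/1 indicator matrix recording
    # which source tasks were co-trained in each sampled run.
    # losses: observed target-task loss for each sampled subset.
    # A least-squares fit gives one coefficient per task, read here as
    # that task's relevance score (an assumed linear surrogate).
    X = np.column_stack([np.ones(len(subsets)), subsets])
    coef, *_ = np.linalg.lstsq(X, losses, rcond=None)
    return coef[1:]  # drop the intercept; keep per-task scores

def select_tasks(scores: np.ndarray, gamma: float) -> list:
    # Assumed convention: a task whose coefficient lowers the target
    # loss (score <= gamma) is kept; the rest are flagged as candidate
    # negative transfers. gamma is the grid-searched threshold.
    return [i for i, s in enumerate(scores) if s <= gamma]
```

For example, if the sampled losses follow `1 - 0.5*x0 + 0.3*x1` over two source tasks, the fit recovers scores of roughly `[-0.5, 0.3]`, and `select_tasks(scores, 0.0)` keeps only task 0, marking task 1 as a negative transfer.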