Identification of Negative Transfers in Multitask Learning Using Surrogate Models
Authors: Dongyue Li, Huy Nguyen, Hongyang Ryan Zhang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that our approach predicts negative transfers from multiple source tasks to target tasks much more accurately than existing task affinity measures. Additionally, we demonstrate that for five weak supervision datasets, our approach consistently improves upon existing optimization methods for multi-task learning. Experimental Results. We conduct extensive experiments to validate our approach in numerous data modalities and performance metrics. |
| Researcher Affiliation | Academia | Dongyue Li EMAIL Northeastern University, Boston; Huy L. Nguyen EMAIL Northeastern University, Boston; Hongyang R. Zhang EMAIL Northeastern University, Boston |
| Pseudocode | Yes | Algorithm 1 Subset Selection for Multi-Task Learning Using Relevance Scores |
| Open Source Code | Yes | The code repository for reproducing our experiments can be found at https://github.com/NEU-StatsML-Research/Task-Modeling. |
| Open Datasets | Yes | First, we apply our approach to several text classification tasks from a weak supervision dataset (Zhang et al., 2021). ... Second, we consider MTL with natural language processing tasks. We collect twenty-five datasets across a broad range of tasks, spanning sentiment classification, natural language inference, question answering, etc., from GLUE, Super GLUE, Tweet Eval, and ANLI. ... Third, we consider multi-group learning settings in which a dataset involves multiple subpopulation groups. We consider income prediction tasks based on US census data (Ding et al., 2021). |
| Dataset Splits | Yes | We include the dataset statistics in Table 1. ... We provide the statistics of the twenty-five tasks in Table 4, Appendix C.1. ... See Table 2 for dataset statistics. ... Table 1: Accuracy/F1-score from surrogate modeling followed by task selection (ours), as compared with MTL methods and weak supervision methods that use a label model to aggregate the weak labels. Dataset (Metrics) Youtube (Acc.) ... Training 1,586 Validation 120 Test 250 |
| Hardware Specification | Yes | Computational Cost. Next, we report the runtime cost collected on an NVIDIA Titan RTX card. |
| Software Dependencies | No | We use a standard approach for conducting MTL, i.e., hard parameter sharing. For text classification, we use BERT-Base as the encoder. For tabular features, we use a fully-connected layer with a hidden size of 32. |
| Experiment Setup | Yes | The surrogate modeling procedure requires three parameters: the size of a subset, the number of samples, and the loss function. We select the size from a range between 3, 5, 10, and 15. We select the number of samples from a range between 50, 200, 400, and 800, depending on k. We also collect a holdout set of size 100 for constructing the surrogate model. For classification tasks, we set the loss function as the negative classification margin, i.e., the difference between the correct-class probability and the highest incorrect-class probability. ... To set the threshold γ in our algorithm, we use grid search from -0.5 to 0.5 at an interval of 0.1. |
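The quoted setup names the key pieces of the procedure: sampled task subsets, a surrogate model fit on observed losses, per-task relevance scores, a selection threshold γ, and a negative-classification-margin loss. A minimal sketch of how these pieces could fit together is below; the linear surrogate, the sign convention for relevance scores, and the function names are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def negative_margin(probs: np.ndarray, label: int) -> float:
    # Loss quoted in the setup: the negated classification margin, i.e.
    # -(correct-class probability - highest incorrect-class probability).
    incorrect = np.delete(probs, label).max()
    return -(probs[label] - incorrect)

def fit_surrogate(subsets: np.ndarray, losses: np.ndarray) -> np.ndarray:
    # subsets: (num_samples, num_tasks) 0/1 indicator matrix recording
    # which source tasks were co-trained in each sampled run.
    # losses: observed target-task loss for each sampled subset.
    # A least-squares fit gives one coefficient per task, read here as
    # that task's relevance score (an assumed linear surrogate).
    X = np.column_stack([np.ones(len(subsets)), subsets])
    coef, *_ = np.linalg.lstsq(X, losses, rcond=None)
    return coef[1:]  # drop the intercept; keep per-task scores

def select_tasks(scores: np.ndarray, gamma: float) -> list:
    # Assumed convention: a task whose coefficient lowers the target
    # loss (score <= gamma) is kept; the rest are flagged as candidate
    # negative transfers. gamma is the grid-searched threshold.
    return [i for i, s in enumerate(scores) if s <= gamma]
```

For example, if the sampled losses follow `1 - 0.5*x0 + 0.3*x1` over two source tasks, the fit recovers scores of roughly `[-0.5, 0.3]`, and `select_tasks(scores, 0.0)` keeps only task 0, marking task 1 as a negative transfer.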