Many-Objective Multi-Solution Transport

Authors: Ziyue Li, Tian Li, Virginia Smith, Jeff Bilmes, Tianyi Zhou

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | On a range of applications in federated learning, multi-task learning, and mixture-of-prompt learning for LLMs, MosT distinctly outperforms strong baselines, delivering high-quality, diverse solutions that profile the entire Pareto frontier, thus ensuring balanced trade-offs across many objectives. ... In all applications (Section 6), MosT finds diverse high-quality solutions on the Pareto front, consistently outperforming various strong baselines in terms of average accuracy and other popular metrics on the quality of multiple solutions, without extra computation cost. ... Section 6 is titled 'MOST APPLICATIONS' and contains subsections such as 'EXPERIMENTAL SETUP', 'FEDERATED LEARNING', 'MULTI-TASK LEARNING', 'MIXTURE-OF-PROMPT LEARNING', and 'ABLATION STUDIES AND COMPARISON WITH OTHER BASELINES', all detailing empirical evaluations and comparisons.
Researcher Affiliation | Academia | University of Maryland; University of Chicago; Carnegie Mellon University; University of Washington, Seattle. All listed affiliations are universities, and the authors' email domains are .edu.
Pseudocode | Yes | Algorithm 1: Many-Objective Multi-Solution Transport.
Open Source Code | Yes | Project: https://github.com/tianyi-lab/MosT
Open Datasets | Yes | We conduct experiments on synthetic data and Federated Extended MNIST (FEMNIST) (Cohen et al., 2017; Caldas et al., 2018) ... Office-Caltech10 (Saenko et al., 2010; Griffin et al., 2007) and DomainNet (Peng et al., 2019) ... three datasets from the SuperGLUE benchmark (Wang et al., 2019) ... toy ZDT problem set (Zitzler et al., 2000) ... German credit dataset (Asuncion & Newman, 2007).
Dataset Splits | Yes | For the generated dataset, we conduct our experiments using a train-validation-test split ratio of 6:2:2. ... We randomly sample 128 instances from the training dataset and evenly partition the original validation dataset to form both the validation and test datasets.
Hardware Specification | Yes | Table 9: Runtime (sec) comparisons for all methods on federated learning datasets, performed on a single Nvidia RTX A5000 platform. ... Table 12: Runtime (sec) comparisons for all methods on ZDT datasets, performed on a single Nvidia RTX A4000 platform.
Software Dependencies | No | The paper mentions software components such as 'IPOT (Xie et al., 2020)', 'Frank-Wolfe algorithms (Fujishige, 1980)', and the 'T5-base model', but does not provide version numbers for these dependencies, which reproducibility requires.
Experiment Setup | Yes | The learning rates are swept from {0.005, 0.01, 0.05, 0.1} without decaying throughout the training process. ... We run 400 epochs in total. ... We run 400 epochs for training. Learning rates are swept from {0.08, 0.1}. ... conducting 20 epochs of training and sweeping learning rates from {0.08, 0.1}.
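The reported setup (a 6:2:2 train-validation-test split and a learning-rate sweep over {0.005, 0.01, 0.05, 0.1}) can be sketched as follows. This is a minimal illustration, not the authors' code; `split_622`, `sweep`, and the `train_fn` callback are hypothetical names introduced here.

```python
import random

def split_622(items, seed=0):
    """Shuffle and split a dataset into train/val/test with a 6:2:2 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Learning rates as reported in the paper's sweep (no decay during training).
LEARNING_RATES = [0.005, 0.01, 0.05, 0.1]

def sweep(train_fn, train, val):
    """Return the learning rate whose run scores best on the validation set.

    `train_fn(train, val, lr=...)` is a placeholder that trains a model and
    returns a validation metric (higher is better).
    """
    best_lr, best_score = None, float("-inf")
    for lr in LEARNING_RATES:
        score = train_fn(train, val, lr=lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score

train, val, test = split_622(range(100))
# train/val/test sizes: 60, 20, 20
```

A real run would plug the paper's training loop in as `train_fn` and keep the chosen learning rate fixed for all 400 epochs, matching the no-decay schedule described above.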