Soft Merging of Experts with Adaptive Routing
Authors: Mohammed Muqeeth, Haokun Liu, Colin Raffel
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper... We empirically validate that models using SMEAR outperform models that route based on metadata or learn routing through gradient estimation. Furthermore, we provide qualitative analysis demonstrating that the experts learned via SMEAR exhibit a significant amount of specialization. All of the code used in our experiments is publicly available... We perform experiments in two real-world settings that differ in model architecture and modality. |
| Researcher Affiliation | Academia | Mohammed Muqeeth EMAIL University of North Carolina at Chapel Hill; Haokun Liu EMAIL University of Toronto, Vector Institute; Colin Raffel EMAIL University of Toronto, Vector Institute |
| Pseudocode | No | The paper describes the methodology in prose and mathematical formulations (e.g., in Section 3, "More explicitly, we define SMEAR as computing the output of an expert routing block using a merged expert computed as f(u, Σᵢ R(v)ᵢ θᵢ)") but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All of the code used in our experiments is publicly available.1 1https://github.com/r-three/smear |
| Open Datasets | Yes | Specifically, we experiment with fine-tuning T5.1.1 Base (Raffel et al., 2020) on datasets from GLUE (Wang et al., 2018) (referred to as T5-GLUE) and fine-tuning a ResNet18 (He et al., 2016) on DomainNet (Peng et al., 2019) (ResNet-DomainNet). GLUE consists of nine datasets (SST-2 (Socher et al., 2013), CoLA (Warstadt et al., 2019), MNLI (Williams et al., 2017), RTE (Bentivogli et al., 2009), QQP (Shankar et al., 2017), MRPC (Dolan & Brockett, 2005), STS-B (Cer et al., 2017), QNLI (Rajpurkar et al., 2016), and WNLI (Levesque et al., 2012))... DomainNet is a collection of object recognition datasets... (Peng et al., 2019). |
| Dataset Splits | Yes | We follow the approach of Mahabadi et al. (2021) for splitting each GLUE dataset into train, eval, and test splits. |
| Hardware Specification | Yes | All models were trained on 48GB A6000s, except for the Ensemble method, which was trained on 80GB A100s. |
| Software Dependencies | No | The paper mentions specific models and optimizers (e.g., T5.1.1 Base, ResNet18, AdamW optimizer, Adam optimizer, ST-Gumbel estimator, REINFORCE estimator, DSelect-k method) but does not provide specific version numbers for underlying software libraries or programming languages (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | T5 models were trained for 600k steps using a learning rate of 3e-4, with 2k warmup steps, and batch size of 128. The AdamW optimizer was used with its default settings. We ran the ST-Gumbel estimator with a τ value of 10 and an anneal rate of 1e-6... For the REINFORCE estimator, we used the same values as in Clark et al. (2022), α = 1e-2, β = 5e-4, and γ = 1e-2... ResNet models were trained for 100k steps with batch size of 128 and a learning rate of 1e-3, with no warmup, using the Adam optimizer. We used a τ value of 10 and anneal rate of 1e-4 for the ST-Gumbel estimator... The hyperparameter that weighs entropy regularization in DSelect-k is chosen as 0.1. |
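The core operation the paper describes, f(u, Σᵢ R(v)ᵢ θᵢ), merges the experts' parameters under the routing distribution and applies the block function once. A minimal NumPy sketch of that computation is below; the function and variable names (`smear_block`, `router_weights`, the linear block function `f`) are illustrative assumptions, not identifiers from the authors' released code.

```python
import numpy as np

def smear_block(u, v, expert_params, router_weights, f):
    """Sketch of a SMEAR routing block: compute a routing distribution
    R(v) over experts, merge the expert parameters theta_i by that
    distribution, then apply the block function once, i.e.
    f(u, sum_i R(v)_i * theta_i)."""
    # Router: softmax over logits from the routing input v
    # (router_weights is a hypothetical linear router parameterization).
    logits = v @ router_weights
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()  # R(v), a distribution over experts
    # Soft-merge the expert parameters instead of picking one expert.
    merged_theta = sum(p * theta for p, theta in zip(probs, expert_params))
    # Single forward pass through the merged expert.
    return f(u, merged_theta)
```

Because the merge happens in parameter space, the block costs roughly one expert forward pass regardless of the number of experts, while remaining fully differentiable with respect to the router (no gradient estimator needed).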