Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

Authors: Zijia Zhao, Longteng Guo, Jie Cheng, Xuange Gao, Hua Huang, Jing Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on four popular baseline models demonstrate that our Ada-K routing method significantly outperforms conventional Top-K routing. Compared to Top-K, our method achieves over 25% reduction in FLOPs and more than 20% inference speedup while still improving performance across various benchmarks.
Researcher Affiliation | Academia | 1 Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; 3 School of Artificial Intelligence, Beijing Normal University
Pseudocode | No | The paper describes the method using mathematical formulas and textual explanations in Sections 3.2 and 3.3, but no explicitly labeled 'Pseudocode' or 'Algorithm' block is provided.
Open Source Code | No | The code and checkpoints will be released at https://github.com/ivattyue/Ada-K.
Open Datasets | Yes | Following previous works (Touvron et al., 2023b; Le Scao et al., 2023; Li et al., 2023; Black et al., 2022), we employ the lm-evaluation-harness (Gao et al., 2021) to evaluate our model. This tool serves as the backend for the Hugging Face Open LLM Leaderboard (Beeching et al., 2023). Our model is assessed on 6 key benchmarks aligned with Open LLM Leaderboard. [...] These tasks include AI2 Reasoning Challenge (ARC-C) (Clark et al., 2018), HellaSwag (Hella) (Zellers et al., 2019), MMLU (Hendrycks et al., 2020), TruthfulQA (Truth) (Lin et al., 2021), Winogrande (Wino) (Sakaguchi et al., 2021) and GSM8K (GSM) (Cobbe et al., 2021).
Dataset Splits | Yes | Benchmark and Evaluation Details. Following previous works (Touvron et al., 2023b; Le Scao et al., 2023; Li et al., 2023; Black et al., 2022), we employ the lm-evaluation-harness (Gao et al., 2021) to evaluate our model. This tool serves as the backend for the Hugging Face Open LLM Leaderboard (Beeching et al., 2023). Our model is assessed on 6 key benchmarks aligned with Open LLM Leaderboard. [...] Table 10: Details of benchmarks. We follow the setting of Hugging Face Open LLM Leaderboard. Benchmark: ARC-C (Clark et al., 2018); #shots: 25; #Samples: 2.59k; Details: A set of grade-school science questions.
Hardware Specification | Yes | We employ 16 NVIDIA A800 GPUs to train Mixtral 8x22B, whereas each of the other three utilizes 8 NVIDIA A800 GPUs.
Software Dependencies | No | The paper mentions using "AdamW" as an optimizer and "bf16" precision, but it does not specify version numbers for any software libraries, programming languages, or other key software components.
Experiment Setup | Yes | Training Details. We adopt AdamW (Loshchilov & Hutter, 2017) as the optimizer. All baseline models are trained for one epoch using a consistent set of 10k samples. The batch size and learning rate are set to 64 and 1e-3, respectively. We leverage 2 PPO epochs for reinforcement learning. For all four baseline models, we uniformly set λ as 3e-3. [...] Table 11: Additional training details (Configuration | Fine-tuning Warm-Start | PPO): Optimizer: AdamW / AdamW; Base LR: 1e-3 / 1e-3; Precision: bf16 / bf16; Weight Decay: 0.1 / 0.1; Batch Size: 64 / 64; LR Decay Schedule: cosine / constant; Gradient Checkpoint: True / True; Training Epochs: 1 / 1; Max Length: 2048 / 2048; Threshold p: 0.3; Regularization Coef: 3e-3; PPO Epoch: 2.
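The Top-K versus Ada-K contrast summarized in the table can be sketched in a few lines. This is a minimal illustration, not the authors' released implementation: the allocator here is a stand-in for the learned policy network the paper trains with PPO, and the only detail borrowed from the report is the threshold value 0.3 (the "Threshold p" row of Table 11). FLOPs spent in the expert layers scale with the number of activated experts, which is why lowering the average K yields the reported compute savings.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_routing(gate_logits, k):
    """Conventional Top-K: every token activates exactly k experts,
    chosen by the highest gating probabilities."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]

def ada_k_routing(gate_logits, allocator_probs, threshold=0.3):
    """Sketch of adaptive-K: a per-token allocator decides how many
    experts this token needs (here: the count of allocator outputs above
    a threshold, with at least one expert always kept). The real
    allocator in the paper is a small policy network trained with PPO."""
    k = max(1, sum(p > threshold for p in allocator_probs))
    return top_k_routing(gate_logits, k)
```

For a batch of tokens, averaging the per-token `k` gives the effective expert load, so an average of 1.5 activated experts versus a fixed Top-2 already corresponds to roughly a 25% cut in expert-layer FLOPs.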
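The evaluation rows above rely on lm-evaluation-harness, which scores multiple-choice benchmarks such as ARC-C by building a few-shot prompt (25 shots per Table 10) and picking the candidate answer the model assigns the highest loglikelihood. The sketch below mimics that flow with toy helpers; the prompt template is illustrative only and is not the harness's actual format.

```python
def build_fewshot_prompt(train_examples, question, n_shots=25):
    """Prepend up to n_shots solved examples before the target question,
    in the spirit of the Open LLM Leaderboard's 25-shot ARC-C setup.
    The exact "Question:/Answer:" template is an assumption for
    illustration, not the harness's real template."""
    shots = train_examples[:n_shots]
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

def pick_answer(loglikelihoods):
    """Multiple-choice scoring: the candidate continuation with the
    highest model loglikelihood is taken as the prediction."""
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])

def accuracy(predictions, golds):
    """Fraction of questions where the prediction matches the gold label."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)
```

In the real harness the loglikelihoods come from the model under test; here they would simply be supplied as floats, one per answer option.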
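Table 11 lists a cosine LR decay schedule for the warm-start fine-tuning stage and a constant schedule for the PPO stage, both from a base LR of 1e-3. A minimal sketch of those two schedules, assuming decay to zero with no warmup or LR floor (the report does not specify either):

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-3, schedule="cosine"):
    """LR schedules matching Table 11: cosine decay for the warm-start
    fine-tuning stage, constant for the PPO stage. Warmup and a minimum
    LR are omitted because the report does not mention them."""
    if schedule == "constant":
        return base_lr
    # Cosine decay from base_lr down to 0 over total_steps.
    progress = min(step / total_steps, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With one training epoch, `total_steps` would be the number of optimizer steps in that epoch (dataset size 10k over batch size 64).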