BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference
Authors: Zewen Jin, Shengnan Wang, Jiaan Zhu, Hongrui Zhan, Youhui Bai, Lin Zhang, Zhenyu Ming, Cheng Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that BigMac achieves comparable or even better model quality than fine-grained MoEs with the same number of experts and a similar number of total parameters. Equally importantly, BigMac reduces the end-to-end latency by up to 3.09× for training and increases the throughput by up to 3.11× for inference on state-of-the-art AI computing frameworks including Megatron, Tutel, and DeepSpeed-Inference. [...] We intensively profile the time ratios of training and inference for GPT-Fine-Grained and GPT-BigMac, based on the state-of-the-art frameworks Megatron (Shoeybi et al. 2020), Tutel (Hwang et al. 2023), and DeepSpeed-Inference (Microsoft 2024). |
| Researcher Affiliation | Collaboration | Zewen Jin (1,2,*), Shengnan Wang (2,*), Jiaan Zhu (1,3), Hongrui Zhan (1), Youhui Bai (2), Lin Zhang (2), Zhenyu Ming (2), Cheng Li (1,3); (1) University of Science and Technology of China, (2) Huawei Technologies, (3) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | No | The paper does not contain any sections explicitly labeled "Pseudocode" or "Algorithm", nor does it present any structured, code-like formatted procedures. |
| Open Source Code | No | The paper mentions using existing state-of-the-art distributed training/inference frameworks like Megatron, Tutel, and DeepSpeed-Inference, but there is no explicit statement or link indicating that the authors have released their own implementation code for the BigMac methodology. |
| Open Datasets | Yes | We use the Wikipedia dataset (Wikimedia 2024) containing 3.6B tokens to train these models on Megatron (NVIDIA 2019), one of the state-of-the-art LLM training frameworks. [...] we utilized a larger dataset named the OpenWebText2 dataset (EleutherAI 2020) with 14.8B tokens. |
| Dataset Splits | No | The paper mentions training on the Wikipedia and OpenWebText2 datasets and evaluating perplexity, but it does not specify the exact training, validation, or test splits (e.g., percentages or sample counts) for these datasets. It refers to 'validation perplexity' but not the split methodology. |
| Hardware Specification | Yes | All the experiments are conducted on a cluster of 4 machines connected with 100 Gbps InfiniBand. Each machine has the same configuration and is equipped with eight GPUs. Each GPU is connected with PCIe 4.0 x16 and has 48 GB HBM, delivering up to 149.7 TFLOPS (FP16) with 96 cores. |
| Software Dependencies | No | The paper mentions using "Megatron", "Tutel", and "DeepSpeed-Inference" frameworks, but it does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | Table 5: Hyper-parameters of pre-training to compare the validation perplexity curves in Figure 1. This table includes specific values for #Layers, #Heads, Hidden Dimension, Sequence Length, Vocabulary Size, Global Batch Size, Dropout Rate, Load Balance Type, Balance Coefficient, Optimizer (Adam), ϵ, β, Weight Decay, Learning Rate, Minimum Learning Rate, Learning Decay Steps, Learning Rate Decay Style, Warmup Steps, Gradient Clipping, and Random Seed. |
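The "Warmup Steps", "Learning Decay Steps", "Learning Rate Decay Style", and "Minimum Learning Rate" entries from the paper's Table 5 describe a standard warmup-then-decay schedule. Below is a minimal sketch of such a schedule (linear warmup followed by cosine decay to a floor); the function name and all numeric values are illustrative assumptions, not the paper's actual settings.

```python
import math

def lr_at(step, max_lr, min_lr, warmup_steps, decay_steps):
    """Illustrative warmup + cosine-decay schedule; values are placeholders."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return max_lr * step / warmup_steps
    if step >= decay_steps:
        # After the decay window, hold at the minimum learning rate.
        return min_lr
    # Cosine decay from max_lr down to min_lr over the decay window.
    progress = (step - warmup_steps) / (decay_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Because Table 5 only lists the hyper-parameter categories here, any concrete peak/minimum rates and step counts would need to be read from the paper itself.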