Mixture of Experts for Image Classification: What's the Sweet Spot?

Authors: Mathurin Videau, Alessandro Leite, Marc Schoenauer, Olivier Teytaud

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we explore the integration of MoE layers into image classification architectures using open datasets. We conduct a systematic analysis across different MoE configurations and model scales. We conduct a series of experiments considering various architecture configurations. Likewise, we investigate the impact of various components, including the number of experts and their sizes, the gate design, and the layer positions, among others.
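The design space the quoted passage describes (number of experts, expert size, gate design) can be illustrated with a minimal top-1 gated MoE layer. This is a generic NumPy sketch of the standard technique, not the authors' implementation; all dimensions and the ReLU expert MLP are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not taken from the paper.
d_model, d_hidden, n_experts = 16, 32, 4

# Per-expert two-layer MLPs (the usual expert shape in vision MoE layers).
W1 = rng.standard_normal((n_experts, d_model, d_hidden)) * 0.02
W2 = rng.standard_normal((n_experts, d_hidden, d_model)) * 0.02
# Linear gate producing one logit per expert.
Wg = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):
    """Route each token to its top-1 expert, weighted by the gate probability."""
    logits = x @ Wg                                   # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax over experts
    top1 = probs.argmax(axis=-1)                      # chosen expert per token
    out = np.zeros_like(x)
    for e in range(n_experts):
        idx = np.where(top1 == e)[0]                  # tokens routed to expert e
        if idx.size == 0:
            continue
        h = np.maximum(x[idx] @ W1[e], 0.0)           # expert MLP with ReLU
        out[idx] = (h @ W2[e]) * probs[idx, e:e + 1]  # scale by gate weight
    return out

tokens = rng.standard_normal((8, d_model))
y = moe_layer(tokens)
print(y.shape)  # (8, 16): output keeps the token representation size
```

Varying `n_experts`, `d_hidden`, the gate (top-1 vs. top-k), and where such a layer is placed in the network are exactly the axes the paper's ablations sweep.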
Researcher Affiliation | Collaboration | Mathurin Videau EMAIL Meta AI, TAU, INRIA, and LISN (CNRS & Univ. Paris-Saclay); Alessandro Leite EMAIL INSA Rouen Normandy, University of Rouen Normandy, LITIS UR 4108; Marc Schoenauer EMAIL TAU, INRIA and LISN (CNRS & Univ. Paris-Saclay); Olivier Teytaud Thales, CortAIx-Labs
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes methodologies in narrative text and uses diagrams to illustrate architectural components (e.g., Figure 1).
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is released, nor does it provide a link to a code repository. It only provides a link to its OpenReview page.
Open Datasets | Yes | In this work, we focus on leveraging the potential of MoE models for image classification on ImageNet-1k and ImageNet-21k (Russakovsky et al., 2015).
Dataset Splits | Yes | Tab. 12 presents the results obtained on the ImageNet-1k validation set by a model that has been entirely trained on ImageNet-1k, for isotropic architectures (e.g., ViT, ConvNeXt iso.) and a hierarchical architecture, namely ConvNeXt. Tab. 2 presents the results of models that are pre-trained on ImageNet-21k and tested on the same ImageNet-1k validation set as above.
Hardware Specification | Yes | Throughput is measured on V100 GPUs, following (Touvron et al., 2021).
Software Dependencies | No | The paper refers to training hyperparameters similar to other works (Touvron et al., 2022; Liu et al., 2022) but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | Furthermore, when working with the ImageNet-1k dataset, we use a strong data-augmentation pipeline, including Mixup (Zhang et al., 2018), CutMix (Yun et al., 2019), RandAugment (Cubuk et al., 2020), and Random Erasing (Zhong et al., 2020), over 300 epochs. Likewise, we utilize drop path, weight decay, and expert-specific weight decay as regularization strategies. Comprehensive details of all the hyperparameters are provided in Tab. 10 in App. A.
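Of the augmentations quoted above, Mixup is simple enough to sketch directly. This is a generic NumPy version of Mixup (Zhang et al., 2018), not the paper's actual pipeline; the `alpha=0.8` default is a common ImageNet-scale choice assumed here, not necessarily the authors' value.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(images, labels_onehot, alpha=0.8):
    """Mixup: blend each example in a batch with a randomly chosen partner.

    A single mixing coefficient lam ~ Beta(alpha, alpha) is drawn for the
    batch; inputs and one-hot targets are interpolated with the same lam.
    """
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))            # random partner for each example
    mixed_x = lam * images + (1.0 - lam) * images[perm]
    mixed_y = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_x, mixed_y

x = rng.random((4, 3, 32, 32))                     # toy batch of 4 RGB images
y = np.eye(10)[rng.integers(0, 10, size=4)]        # one-hot labels, 10 classes
mx, my = mixup(x, y)
print(mx.shape)                                    # (4, 3, 32, 32): shape preserved
```

The soft targets `my` still sum to 1 per example, so the usual cross-entropy loss applies unchanged; CutMix follows the same target-mixing idea but splices rectangular image regions instead of blending pixels.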