Towards Understanding the Mixture-of-Experts Layer in Deep Learning
Authors: Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE. This motivates us to consider a challenging classification problem with intrinsic cluster structures. ... Finally, we also conduct extensive experiments on both synthetic and real datasets to corroborate our theory. |
| Researcher Affiliation | Academia | Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu (Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA); Yuanzhi Li (Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA) |
| Pseudocode | Yes | Algorithm 1: Gradient descent with random initialization (a minimal sketch in this spirit appears after the table) |
| Open Source Code | Yes | The code and data for our experiments can be found on GitHub: https://github.com/uclaml/MoE |
| Open Datasets | Yes | We consider the CIFAR-10 dataset (Krizhevsky, 2009) |
| Dataset Splits | Yes | We generate 16,000 training examples and 16,000 test examples from the data distribution defined in Definition 3.1 |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the CNN model, we use 2 convolution layers followed by 2 fully connected layers. The input channel is 3 and the output channel is 64. The kernel size is 3 and padding is 1. We use a max pooling layer with kernel size 2 and stride 2. We set the learning rate to 0.001 and the batch size to 128. We use the Adam optimizer for all experiments. (A minimal PyTorch sketch of this setup appears after the table.) |
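
The pseudocode row refers to the paper's Algorithm 1, gradient descent with random initialization. The following is a minimal sketch in that spirit, not the authors' implementation: the softmax gating, the small experts with a cubic activation, the expert width, the number of experts, the learning rate, and the ±1-label setup are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """Small nonlinear expert; the cubic activation and width are assumptions."""

    def __init__(self, dim, width=16):
        super().__init__()
        self.fc1 = nn.Linear(dim, width)
        self.fc2 = nn.Linear(width, 1)

    def forward(self, x):
        return self.fc2(self.fc1(x) ** 3)


class SimpleMoE(nn.Module):
    """Mixture-of-experts classifier: a softmax gate mixes scalar expert logits."""

    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList([Expert(dim) for _ in range(num_experts)])

    def forward(self, x):
        pi = F.softmax(self.gate(x), dim=-1)                    # (batch, experts)
        outs = torch.cat([e(x) for e in self.experts], dim=-1)  # (batch, experts)
        return (pi * outs).sum(dim=-1)                          # (batch,)


def train_moe(X, y, epochs=200, lr=0.1, seed=0):
    """Full-batch gradient descent from a random (default PyTorch) initialization."""
    torch.manual_seed(seed)                           # random initialization
    model = SimpleMoE(X.shape[1])
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # plain gradient descent
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(X), (y > 0).float())
        loss.backward()
        opt.step()
    return model
```

For ±1 labels `y`, `train_moe(X, y)` runs plain gradient descent on the logistic loss; the paper's Algorithm 1 may include additional steps (e.g., a specific initialization scale or normalization) not reproduced here.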
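
The experiment-setup row describes the CNN used on CIFAR-10. Below is a minimal PyTorch sketch consistent with the stated hyperparameters (two conv layers with kernel size 3 and padding 1, 64 output channels, 2x2 max pooling, two fully connected layers, Adam with learning rate 0.001, batch size 128); the ReLU activations, the second conv layer's channel count, the FC hidden width of 128, and the 32x32 input resolution are assumptions where the excerpt is silent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    """Two conv layers (kernel 3, padding 1, 64 channels) + two fully connected layers."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # second conv width assumed
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # A 32x32 CIFAR-10 image becomes 64 x 8 x 8 after two pooled conv blocks.
        self.fc1 = nn.Linear(64 * 8 * 8, 128)                     # hidden width 128 assumed
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # ReLU activation assumed
        x = self.pool(F.relu(self.conv2(x)))
        x = x.flatten(start_dim=1)
        return self.fc2(F.relu(self.fc1(x)))


model = CNN()
# Optimizer settings stated in the excerpt: Adam with learning rate 0.001.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# The stated batch size of 128 would be set on the data loader, e.g.
# torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
```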