Theory on Mixture-of-Experts in Continual Learning

Authors: Hongbo Li, Sen Lin, Lingjie Duan, Yingbin Liang, Ness Shroff

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This paper provides the first theoretical results to characterize the impact of MoE in CL via the lens of overparameterized linear regression tasks... Finally, we conduct experiments on both synthetic and real datasets to extend these insights from linear models to deep neural networks (DNNs), which also shed light on the practical algorithm design for MoE in CL."
Researcher Affiliation | Academia | "1. Engineering Systems and Design Pillar, Singapore University of Technology and Design, EMAIL, EMAIL; 2. Department of Computer Science, University of Houston, EMAIL; 3. Department of ECE, The Ohio State University, EMAIL; 4. Department of CSE, The Ohio State University"
Pseudocode | Yes | "Algorithm 1: Training of the MoE model for CL"
Open Source Code | No | The paper does not explicitly state that the code is open-source or provide a link to a code repository. It mentions 'detailed experimental setups and additional results' in Appendix A but not code availability.
Open Datasets | Yes | "Finally, we conduct experiments on real datasets using DNNs to show that certain insights can extend beyond linear models... We use the CIFAR-10 (Krizhevsky et al. (2009)) dataset... We use the MNIST dataset (LeCun et al. (1989))... We use the CIFAR-100 (Krizhevsky et al. (2009)) dataset... We use the Tiny ImageNet (Le & Yang (2015)) dataset"
Dataset Splits | Yes | "We use the CIFAR-10 (Krizhevsky et al. (2009)) dataset, selecting 512 samples randomly for training and 2000 samples for testing at each training round. ... We use the MNIST dataset (LeCun et al. (1989)), selecting 100 samples randomly for training and 1000 samples for testing at each training round. ... We use the CIFAR-100 (Krizhevsky et al. (2009)) dataset, selecting 192 samples randomly for training and 600 samples for testing at each training round. ... We use the Tiny ImageNet (Le & Yang (2015)) dataset, selecting 192 samples randomly for training and 300 samples for testing at each training round."
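The per-round sampling protocol quoted above can be sketched as a small helper. Note the paper's actual data-loading code is not available; the function name, pool sizes, and seeding below are hypothetical illustrations of the reported split sizes (e.g. CIFAR-10: 512 train / 2000 test per round).

```python
import random

# Hypothetical helper mirroring the reported protocol: at each training
# round, draw a fresh random training subset and test subset from the
# full pools (sizes shown match the CIFAR-10 setting in the paper).
def sample_round(train_pool, test_pool, n_train=512, n_test=2000, seed=None):
    rng = random.Random(seed)  # seeded RNG so a round's draw is reproducible
    return rng.sample(train_pool, n_train), rng.sample(test_pool, n_test)

# Example with index pools sized like CIFAR-10's standard train/test splits.
train_ids, test_ids = sample_round(list(range(50000)), list(range(10000)), seed=0)
print(len(train_ids), len(test_ids))  # 512 2000
```

Sampling without replacement (`random.sample`) matches the "selecting N samples randomly" wording; whether the authors resample independently each round or stratify by class is not stated in the quoted text.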
Hardware Specification | Yes | "Operating system: Red Hat Enterprise Linux Server 7.9 (Maipo); Type of CPU: 2.9 GHz 48-Core Intel Xeon 8268s; Type of GPU: NVIDIA Volta V100 w/ 32 GB GPU memory"
Software Dependencies | Yes | "Operating system: Red Hat Enterprise Linux Server 7.9 (Maipo)"
Experiment Setup | Yes | "In the first experiment, we aim to check the necessity of terminating the update of Θt in Line 11 of Algorithm 1. Here we set T = 2000, N = 6, K = 3 and vary the expert number M ∈ {1, 5, 10, 20}. ... Here we set σ0 = 0.4, σt = 0.1, d = 10 and s = 6. In Figure 2, we set η = 0.5, α = 0.5 and λ = 0.3. ... We employ a non-pretrained ResNet-18 as our base model. Each task is learned using Adam with a learning rate governed by a cosine annealing schedule for 5 epochs, with a minibatch size of 32 and a weight decay of 0.005. The initial learning rate is set to 0.0005, and it is reduced to a minimum value of 10^-6 over a total of 300 rounds."
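The DNN learning-rate schedule quoted above (initial rate 0.0005 annealed to a floor of 10^-6 over 300 rounds) can be reproduced with the standard cosine-annealing formula. This is a sketch of the schedule alone, assuming the common cosine form; it is not the authors' training code, and the function name is hypothetical.

```python
import math

# Standard cosine-annealing schedule with the hyperparameters quoted above:
# the learning rate starts at 5e-4 and decays to a floor of 1e-6 over 300 rounds.
def cosine_annealing_lr(round_idx, total_rounds=300, lr_max=5e-4, lr_min=1e-6):
    # lr(t) = lr_min + (lr_max - lr_min) * (1 + cos(pi * t / T)) / 2
    cos_term = 1 + math.cos(math.pi * round_idx / total_rounds)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

print(cosine_annealing_lr(0))    # ≈ 0.0005 at the first round
print(cosine_annealing_lr(300))  # ≈ 1e-06 at the final round
```

At the halfway point (round 150) the rate sits at the midpoint of the two extremes, roughly 2.5e-4, which is the defining property of the cosine shape versus, say, step decay.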