Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach

Authors: Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on CIFAR-10 and Tiny ImageNet datasets using ResNet18 and Vision Transformer (ViT) architectures demonstrate the effectiveness of our proposed methods.
Researcher Affiliation Collaboration (1) Illinois Institute of Technology, (2) Drexel University, (3) Perplexity AI. Correspondence to: Ren Wang <EMAIL>.
Pseudocode Yes Algorithm 1 The JTDMoE algorithm
Open Source Code Yes The code is publicly available at https://github.com/TIML-Group/Robust-MoE-Dual-Model.
Open Datasets Yes Experimental results on CIFAR-10 and Tiny ImageNet datasets using ResNet18 and Vision Transformer (ViT) architectures demonstrate the effectiveness of our proposed methods.
Dataset Splits No The paper uses the CIFAR-10 and Tiny ImageNet datasets but does not explicitly state the training/validation/test splits (e.g., percentages or counts) used for these datasets.
Hardware Specification No We are thankful for the computational resources made available through NSF ACCESS and Argonne Leadership Computing Facility.
Software Dependencies No The paper does not explicitly state specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes We train ResNet18-based MoE for 130 epochs on CIFAR-10 and fine-tune pre-trained ViT-small-based MoE for 10 epochs on Tiny ImageNet. A Cyclic Learning Rate strategy (Smith, 2017), starting at 0.0001, and data augmentation (Rebuffi et al., 2021) are used to enhance performance. The hyperparameter β controls the trade-off between MoE-wide robustness and expert-specific robustness. (Table 9 shows values for β: 1, 3, 6, 9). We use PGD (Madry et al., 2017) and AutoAttack (Croce & Hein, 2020) to assess model performance under adversarial conditions, with ϵ = 8/255 for CIFAR-10 and ϵ = 2/255 for Tiny ImageNet. ... Evaluation is done using either a 50-step PGD or AutoAttack with the same step size.
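To illustrate the evaluation protocol quoted above, here is a minimal sketch of a 50-step L-infinity PGD attack with ϵ = 8/255. This is not the paper's implementation: it uses a toy linear classifier with an analytic input gradient (so no deep-learning framework is needed), and the function names `pgd_attack` and `loss_and_input_grad` are illustrative assumptions, not identifiers from the released code.

```python
import math

def loss_and_input_grad(w, b, x, y):
    """Toy linear 'model': binary cross-entropy loss and its gradient
    with respect to the input x (analytic, so no autodiff is needed)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))           # sigmoid probability of class 1
    loss = -math.log(p if y == 1 else 1.0 - p)
    # For a linear score, dL/dx = (p - y) * w
    grad = [(p - y) * wi for wi in w]
    return loss, grad

def pgd_attack(w, b, x, y, eps=8 / 255, alpha=2 / 255, steps=50):
    """L-infinity PGD (Madry et al., 2017): repeatedly take a signed-gradient
    ascent step on the loss, then project back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        _, g = loss_and_input_grad(w, b, x_adv, y)
        # Signed gradient step of size alpha
        x_adv = [xi + alpha * (1.0 if gi >= 0 else -1.0)
                 for xi, gi in zip(x_adv, g)]
        # Project the perturbation into [-eps, eps] around the clean input
        x_adv = [min(max(xa, xo - eps), xo + eps)
                 for xa, xo in zip(x_adv, x)]
        # Keep pixels in the valid [0, 1] range
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv
```

In the paper's setting the same loop would run on the MoE model's autodiff gradients, with ϵ = 8/255 for CIFAR-10 and ϵ = 2/255 for Tiny ImageNet; the projection step is what keeps the adversarial example within the stated perturbation budget.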