Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach
Authors: Xu Zhang, Kaidi Xu, Ziqing Hu, Ren Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on CIFAR-10 and Tiny ImageNet datasets using ResNet18 and Vision Transformer (ViT) architectures demonstrate the effectiveness of our proposed methods. |
| Researcher Affiliation | Collaboration | ¹Illinois Institute of Technology, ²Drexel University, ³Perplexity AI. Correspondence to: Ren Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: The JTDMoE algorithm |
| Open Source Code | Yes | The code is publicly available at https://github.com/TIML-Group/Robust-MoE-Dual-Model. |
| Open Datasets | Yes | Experimental results on CIFAR-10 and Tiny ImageNet datasets using ResNet18 and Vision Transformer (ViT) architectures demonstrate the effectiveness of our proposed methods. |
| Dataset Splits | No | The paper uses the CIFAR-10 and Tiny ImageNet datasets but does not explicitly state the training/validation/test splits (e.g., percentages or counts) used for these datasets. |
| Hardware Specification | No | We are thankful for the computational resources made available through NSF ACCESS and Argonne Leadership Computing Facility. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train a ResNet18-based MoE for 130 epochs on CIFAR-10 and fine-tune a pre-trained ViT-small-based MoE for 10 epochs on Tiny ImageNet. A Cyclic Learning Rate strategy (Smith, 2017), starting at 0.0001, and data augmentation (Rebuffi et al., 2021) are used to enhance performance. The hyperparameter β controls the trade-off between MoE-wide robustness and expert-specific robustness (Table 9 reports β values of 1, 3, 6, and 9). We use PGD (Madry et al., 2017) and AutoAttack (Croce & Hein, 2020) to assess model performance under adversarial conditions, with ϵ = 8/255 for CIFAR-10 and ϵ = 2/255 for Tiny ImageNet. ... Evaluation is done using either a 50-step PGD or AutoAttack with the same step size. |
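For context on the evaluation protocol in the setup row, the following is a minimal NumPy sketch of a 50-step L∞ PGD attack with the paper's CIFAR-10 budget ϵ = 8/255. The step size (2/255), the random start, and the toy quadratic loss are illustrative assumptions, not values taken from the paper; the authors evaluate on trained MoE models, not on this toy objective.

```python
import numpy as np

def pgd_attack(x, grad_fn, epsilon=8 / 255, step_size=2 / 255, steps=50):
    """Untargeted L-infinity PGD (Madry et al., 2017), sketched in NumPy.

    x        : clean input with values in [0, 1]
    grad_fn  : callable returning d(loss)/d(input) at a given point
    epsilon  : perturbation budget (8/255 for CIFAR-10 in the paper)
    """
    # Random start inside the epsilon-ball, clipped to the valid pixel range.
    x_adv = x + np.random.uniform(-epsilon, epsilon, size=x.shape)
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(steps):
        # Ascend the loss along the gradient sign (one FGSM-style step).
        x_adv = x_adv + step_size * np.sign(grad_fn(x_adv))
        # Project back into the epsilon-ball around x, then into [0, 1].
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Toy objective (illustrative only): loss(x) = 0.5 * ||x - t||^2,
# whose gradient w.r.t. x is simply (x - t).
t = np.full((3, 4, 4), 0.6)          # hypothetical "target" point
x = np.full((3, 4, 4), 0.5)          # hypothetical clean input
loss = lambda z: 0.5 * np.sum((z - t) ** 2)
x_adv = pgd_attack(x, grad_fn=lambda z: z - t)

# The perturbation stays within the budget and the loss increases.
assert np.all(np.abs(x_adv - x) <= 8 / 255 + 1e-8)
assert loss(x_adv) > loss(x)
```

In the paper's setting `grad_fn` would be a backward pass through the MoE (router plus experts) on the cross-entropy loss; the projection and clipping logic is the part that carries over unchanged.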