EWMoE: An Effective Model for Global Weather Forecasting with Mixture-of-Experts
Authors: Lihao Gan, Xin Man, Chenghong Zhang, Jie Shao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our evaluation on the ERA5 dataset using only two years of training data. Extensive experiments demonstrate that EWMoE outperforms current models such as FourCastNet and ClimaX at all forecast times, achieving competitive performance compared with the state-of-the-art models Pangu-Weather and GraphCast in evaluation metrics such as Anomaly Correlation Coefficient (ACC) and Root Mean Square Error (RMSE). Additionally, ablation studies indicate that applying the MoE architecture to weather forecasting offers significant advantages in improving accuracy and resource efficiency. |
| Researcher Affiliation | Academia | 1University of Electronic Science and Technology of China, Chengdu, China 2Sichuan Artificial Intelligence Research Institute, Yibin, China 3Institute of Plateau Meteorology, China Meteorological Administration, Chengdu, China EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation code is available at https://github.com/technomii/EWMoE. |
| Open Datasets | Yes | ERA5 (Hersbach et al. 2020) is a publicly available atmospheric reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). |
| Dataset Splits | Yes | In addition, to demonstrate the effectiveness of our model in the case of limited data and computing resources, we use two years of data for training (2015 and 2016), one year for validation (2017), and one year for testing (2018). |
| Hardware Specification | Yes | The training of EWMoE was completed in under 9 days on 2 Nvidia 3090 GPUs. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify versions for any key software libraries or frameworks (e.g., PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | For each input data sample from the ERA5 dataset, it can be represented as an image with 20 channels. We set the patch size as 8×8, and the EWMoE model consists of encoders with depth=6, dim=768 and decoders with depth=6, dim=512. Each encoder has a MoE layer, and each MoE layer consists of 20 independent experts. Specifically, in the gating network of each MoE layer, we use top-2 routing to select the top-2 ranked experts for forward propagation of training data. We employ the AdamW optimizer with two momentum parameters β1=0.9 and β2=0.95, and set the weight decay to 0.05. |
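The top-2 routing described in the experiment setup can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the paper's implementation; the function names, toy dimensions, and linear experts are our own assumptions.

```python
import numpy as np

def top2_moe_layer(x, gate_w, experts):
    """Route a token through the top-2 of several experts (illustrative sketch).

    x       : (dim,) token embedding
    gate_w  : (dim, n_experts) gating-network weights
    experts : list of callables, each mapping (dim,) -> (dim,)
    """
    logits = x @ gate_w                     # gating score per expert
    top2 = np.argsort(logits)[-2:]          # indices of the two best experts
    weights = np.exp(logits[top2])
    weights /= weights.sum()                # softmax over the selected pair
    # Only the two selected experts run; their outputs are gate-weighted.
    return sum(w * experts[i](x) for w, i in zip(weights, top2))

# Toy usage: 4 experts, each a fixed random linear map (hypothetical sizes).
rng = np.random.default_rng(0)
dim, n_experts = 8, 4
gate_w = rng.standard_normal((dim, n_experts))
experts = [lambda x, W=rng.standard_normal((dim, dim)): W @ x
           for _ in range(n_experts)]
y = top2_moe_layer(rng.standard_normal(dim), gate_w, experts)
print(y.shape)
```

Because only 2 of the experts (20 in the paper's configuration) are evaluated per token, the per-token compute stays close to a dense layer while the parameter count grows with the number of experts.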