Continuous Ensemble Weather Forecasting with Diffusion Models

Authors: Martin Andrae, Tomas Landelius, Joel Oskarsson, Fredrik Lindsten

ICLR 2025

Reproducibility checklist. Each entry gives the variable assessed, the result, and the supporting evidence quoted from the paper.
Research Type: Experimental. Evidence: "We evaluate our method on global weather forecasting up to 10 days at 1, 6, and 24 hour timesteps. We use the downsampled ERA5 reanalysis dataset (Hersbach et al., 2020) at 5.625° resolution and 1-hour increments provided by WeatherBench (Rasp et al., 2020). Metrics. We evaluate the skill of the forecasting models by computing the Root Mean Squared Error (RMSE) of the ensemble mean. As a probabilistic metric we also consider the Continuous Ranked Probability Score (CRPS) (Gneiting & Raftery, 2007). Qualitative results. Figure 3 shows an example forecast... Quantitative results. Table 1 and Figure 4a show metrics for a selection of lead times and variables."
Researcher Affiliation: Academia. Evidence: "1Linköping University, Sweden; 2Swedish Meteorological and Hydrological Institute." (Author email addresses redacted.)
Pseudocode: Yes. Evidence: Algorithm 1 (the Continuous Ensemble Forecasting algorithm), Algorithm 2 (the Extended Continuous Ensemble Forecasting algorithm), Algorithm 3 (ARCI: autoregressive roll-outs with continuous interpolation), and Algorithm 4 (deterministic sampling using Heun's 2nd-order method).
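The deterministic 2nd-order Heun sampler referenced in Algorithm 4 can be sketched as follows. This is a generic illustration of the standard Heun predictor-corrector scheme for diffusion sampling, not the paper's code; the `denoise(x, sigma)` interface is an assumption standing in for the learned denoiser.

```python
import numpy as np

def heun_sample(denoise, x, sigmas):
    """Deterministic 2nd-order Heun sampler over a decreasing noise schedule.

    denoise(x, sigma) -- learned denoiser returning an estimate of the clean state.
    sigmas -- noise levels in decreasing order, typically ending at 0.
    """
    for i in range(len(sigmas) - 1):
        s, s_next = sigmas[i], sigmas[i + 1]
        d = (x - denoise(x, s)) / s            # ODE derivative dx/dsigma at current level
        x_euler = x + (s_next - s) * d         # Euler predictor step
        if s_next > 0:
            # Trapezoidal (Heun) correction using the derivative at the predicted point
            d_next = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s) * 0.5 * (d + d_next)
        else:
            # Final step to sigma = 0 falls back to plain Euler
            x = x_euler
    return x
```

With a perfect denoiser that always returns the clean state, the final Euler step to sigma = 0 recovers that state exactly, which makes the sketch easy to sanity-check.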
Open Source Code: Yes. Evidence: "The code is available at https://github.com/martinandrae/Continuous-Ensemble-Forecasting"
Open Datasets: Yes. Evidence: "We use the downsampled ERA5 reanalysis dataset (Hersbach et al., 2020) at 5.625° resolution and 1-hour increments provided by WeatherBench (Rasp et al., 2020)."
Dataset Splits: Yes. Evidence: "All models are trained on the period 1979–2015, validated on 2016–2017 and tested on 2018."
Hardware Specification: Yes. Evidence: "Training is done on a 40GB NVIDIA A100 GPU and takes roughly 2 days."
Software Dependencies: No. The paper mentions PyTorch and AdamW but does not specify version numbers for these or any other software components.
Experiment Setup: Yes. Evidence: Table 4 (optimizer hyperparameters): peak LR 5e-4, weight decay 0.1, warmup steps 1e3, 300 epochs, batch size 256, dropout probability 0.1. Table 3 (parameters used for sampling and training): maximum noise level σmax = 80, minimum noise level σmin = 0.03, shape of noise distribution ρ = 7, number of noise levels N = 20.
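The sampling parameters σmin, σmax, ρ, and N match the standard EDM formulation of Karras et al. (2022), in which the N noise levels are spaced by interpolating in σ^(1/ρ). A minimal sketch of that schedule under the reported values (the function name and NumPy usage are illustrative, not from the paper):

```python
import numpy as np

def edm_noise_levels(sigma_min=0.03, sigma_max=80.0, rho=7.0, n=20):
    """EDM-style noise schedule: n levels decreasing from sigma_max to sigma_min.

    sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    """
    i = np.arange(n)
    inv_rho = 1.0 / rho
    return (sigma_max**inv_rho
            + i / (n - 1) * (sigma_min**inv_rho - sigma_max**inv_rho)) ** rho

sigmas = edm_noise_levels()  # 20 levels from 80 down to 0.03
```

Larger ρ concentrates more of the levels near σmin, spending more sampler steps in the low-noise regime where fine detail is resolved.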