reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient pooling of predictions via kernel embeddings

Authors: Sam Allen, David Ginsbourger, Johanna Ziegel

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The practical utility of the proposed approaches is illustrated in an application to weather forecasting in Section 5. ... We compare the forecasts for several competing approaches: 1) the discrete predictive distributions issued by each of the three individual weather models (COSMO-1E, COSMO-2E, and ECMWF IFS); 2) a linear pool prediction that assigns equal weight to the three forecast models, often referred to as a multi-model forecast in the weather forecasting literature (LP Equal); 3) a traditional linear pool of the three discrete predictive distributions with weights estimated by minimising a proper scoring rule, as in Example 4 (LP Discrete); 4) a linear pool of the point predictions obtained from the sample members of the three forecast models, as in Example 5 (LP Point); and 5) a linear pool of the order statistics of the three forecast models, as in Example 6 (LP Ordered).
Researcher Affiliation	Academia	Sam Allen, Seminar for Statistics, ETH Zürich; David Ginsbourger, Institute for Mathematical Statistics and Actuarial Science, University of Bern; Johanna Ziegel, Seminar for Statistics, ETH Zürich
Pseudocode	No	The paper includes propositions and mathematical derivations but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code and data used in this study are publicly available at https://github.com/sallen12/RKHSCombi.
Open Datasets	Yes	Forecasts are available daily for three years between the 2nd June 2020 and 31st May 2023, during which none of the three models undergo major changes. Some dates are missing for all models and stations, resulting in a total of 1030 × 82 = 84460 forecast-observation pairs for each of the three models. The code and data used in this study are publicly available at https://github.com/sallen12/RKHSCombi.
Dataset Splits	Yes	The first two years of data (2nd June 2020 to 1st June 2022) are used as training data on which to estimate the weights, and the resulting forecasts are then assessed out-of-sample using the remaining year of data.
Hardware Specification	No	The paper describes the forecast models (COSMO-1E, COSMO-2E, ECMWF IFS) and their characteristics, including sample members and spatial resolution, and mentions they involve "integrating an initial state of the atmosphere through time according to physical laws". However, it does not specify any hardware (CPU, GPU, memory, etc.) used by the authors to run their experiments or implement their methodology.
Software Dependencies	No	In the following, we use the kernlab package in R (Karatzoglou et al., 2004).
Experiment Setup	Yes	For the latter three methods, the weights are estimated by minimising the CRPS over a training data set, performed using the energy kernel in the convex quadratic optimisation problem outlined in Proposition 2. ... The first two years of data (2nd June 2020 to 1st June 2022) are used as training data on which to estimate the weights, and the resulting forecasts are then assessed out-of-sample using the remaining year of data. ... Many of these kernels depend on an additional lengthscale hyperparameter, which is set to one in Figure 8, though similar results are obtained for other lengthscales (see Appendix C).
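The Research Type row lists several pooling schemes. As a minimal sketch of the two simplest, LP Equal and LP Discrete, the code below builds the linear pool of discrete ensemble forecasts as a mixture and scores it with the CRPS in its expectation form, E|X - y| - 0.5 E|X - X'|. It assumes each ensemble is read as an empirical distribution with equal mass on every member. The helper names and the toy ensembles are hypothetical, not the paper's data or code; LP Point and LP Ordered (Examples 5 and 6 of the paper) are not reproduced because their exact constructions are not described here.

```python
import numpy as np


def linear_pool(ensembles, weights):
    """Linear pool sum_i w_i F_i of discrete (ensemble) forecasts.

    Each F_i is taken to be the empirical distribution of ensemble i, so every
    member of model i receives probability w_i / m_i. Returns (atoms, probs).
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    atoms = np.concatenate([np.asarray(e, dtype=float) for e in ensembles])
    probs = np.concatenate(
        [np.full(len(e), w / len(e)) for e, w in zip(ensembles, weights)]
    )
    return atoms, probs


def crps_discrete(atoms, probs, y):
    """CRPS of a discrete distribution: E|X - y| - 0.5 * E|X - X'|."""
    term1 = np.sum(probs * np.abs(atoms - y))
    pairwise = np.abs(atoms[:, None] - atoms[None, :])
    term2 = 0.5 * np.sum(np.outer(probs, probs) * pairwise)
    return term1 - term2


# Hypothetical toy ensembles for three models (not the paper's data).
ensembles = [[1.0, 2.0, 3.0], [2.5, 3.5], [0.0, 1.0, 2.0, 4.0]]
atoms, probs = linear_pool(ensembles, [1 / 3, 1 / 3, 1 / 3])  # "LP Equal"
print(crps_discrete(atoms, probs, y=2.2))
```

For LP Discrete, the same function would be called with weights estimated on the training data rather than fixed at one third each.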
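The Dataset Splits row describes a chronological, out-of-sample split. Below is a minimal standard-library sketch, assuming both boundary dates are inclusive and that each record carries its forecast date; the names split_by_date, TRAIN_START, TRAIN_END and TEST_END are hypothetical. On a complete daily calendar it yields 730 training days and 364 test days; the paper's 1030 usable dates are fewer because some dates are missing.

```python
from datetime import date, timedelta

# Boundaries as stated in the Dataset Splits row; inclusivity is an assumption.
TRAIN_START = date(2020, 6, 2)
TRAIN_END = date(2022, 6, 1)
TEST_END = date(2023, 5, 31)


def split_by_date(records):
    """Split (forecast_date, payload) records chronologically into train and test."""
    train, test = [], []
    for d, payload in records:
        if TRAIN_START <= d <= TRAIN_END:
            train.append((d, payload))
        elif TRAIN_END < d <= TEST_END:
            test.append((d, payload))
    return train, test


# Hypothetical usage: one record per calendar day, ignoring missing dates.
n_days = (TEST_END - TRAIN_START).days + 1
records = [(TRAIN_START + timedelta(days=i), None) for i in range(n_days)]
train, test = split_by_date(records)
print(len(train), len(test))  # prints: 730 364
```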
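The Experiment Setup row states that the pool weights are found by minimising the CRPS over the training data through the convex quadratic optimisation problem of Proposition 2, using the energy kernel. The sketch below shows one way such a problem can arise. It assumes the energy kernel k(x, y) = |x| + |y| - |x - y|, whose kernel score equals the CRPS, and expands the score of a mixture: for one case, CRPS(sum_i w_i F_i, y) = 0.5 * w'Aw - w'b + |y|, with A_ij = E k(X_i, X_j') and b_i = E k(X_i, y). Summing over cases gives a convex quadratic in w to minimise over the probability simplex. This is not the authors' code (their implementation uses R and kernlab) and may differ from Proposition 2 in detail; projected gradient descent is swapped in for a dedicated QP solver to keep the example NumPy-only. The names qp_terms, project_to_simplex and fit_pool_weights are hypothetical.

```python
import numpy as np


def energy_kernel(x, y):
    """Assumed energy kernel k(x, y) = |x| + |y| - |x - y|; its kernel score is the CRPS."""
    return np.abs(x) + np.abs(y) - np.abs(x - y)


def qp_terms(ensembles, y):
    """For one case, return (A, b) with CRPS(pool) = 0.5 w'Aw - w'b + |y|.

    ensembles: list of 1-D NumPy arrays, each read as an empirical distribution.
    """
    m = len(ensembles)
    A = np.empty((m, m))
    b = np.empty(m)
    for i, xi in enumerate(ensembles):
        b[i] = energy_kernel(xi, y).mean()  # E k(X_i, y)
        for j, xj in enumerate(ensembles):
            # E k(X_i, X_j') over all member pairs, including identical members
            A[i, j] = energy_kernel(xi[:, None], xj[None, :]).mean()
    return A, b


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def fit_pool_weights(cases, n_iter=5000):
    """cases: non-empty list of (ensembles, y).

    Minimises the summed CRPS of the linear pool over the simplex with
    projected gradient descent (a stand-in for a dedicated QP solver).
    """
    A, b = 0.0, 0.0
    for ensembles, y in cases:
        A_t, b_t = qp_terms([np.asarray(e, dtype=float) for e in ensembles], y)
        A, b = A + A_t, b + b_t
    m = len(b)
    w = np.full(m, 1.0 / m)  # start from the equally weighted pool
    L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient A w - b
    if L <= 0:
        return w
    for _ in range(n_iter):
        w = project_to_simplex(w - (A @ w - b) / L)
    return w


# Hypothetical toy cases: model 0 always forecasts the observation exactly,
# model 1 is always 5 units too high.
cases = [([np.array([y]), np.array([y + 5.0])], y) for y in (1.0, -2.0, 0.5)]
w = fit_pool_weights(cases)
print(w)
```

In the toy cases the first model always issues the observation exactly, so the fitted weight concentrates on it. With the real forecasts, the weights would be fitted on the two-year training period and the pooled forecasts assessed on the held-out year.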