Efficient pooling of predictions via kernel embeddings
Authors: Sam Allen, David Ginsbourger, Johanna Ziegel
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The practical utility of the proposed approaches is illustrated in an application to weather forecasting in Section 5. ... We compare the forecasts for several competing approaches: 1) the discrete predictive distributions issued by each of the three individual weather models (COSMO-1E, COSMO-2E, and ECMWF IFS); 2) a linear pool prediction that assigns equal weight to the three forecast models, often referred to as a multi-model forecast in the weather forecasting literature (LP Equal); 3) a traditional linear pool of the three discrete predictive distributions with weights estimated by minimising a proper scoring rule, as in Example 4 (LP Discrete); 4) a linear pool of the point predictions obtained from the sample members of the three forecast models, as in Example 5 (LP Point); and 5) a linear pool of the order statistics of the three forecast models, as in Example 6 (LP Ordered). |
| Researcher Affiliation | Academia | Sam Allen EMAIL Seminar for Statistics ETH Zürich David Ginsbourger EMAIL Institute for Mathematical Statistics and Actuarial Science University of Bern Johanna Ziegel EMAIL Seminar for Statistics ETH Zürich |
| Pseudocode | No | The paper includes propositions and mathematical derivations but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and data used in this study are publicly available at https://github.com/sallen12/RKHSCombi. |
| Open Datasets | Yes | Forecasts are available daily for three years between the 2nd June 2020 and 31st May 2023, during which none of the three models undergo major changes. Some dates are missing for all models and stations, resulting in a total of 1030 × 82 = 84460 forecast-observation pairs for each of the three models. The code and data used in this study are publicly available at https://github.com/sallen12/RKHSCombi. |
| Dataset Splits | Yes | The first two years of data (2nd June 2020 to 1st June 2022) are used as training data on which to estimate the weights, and the resulting forecasts are then assessed out-of-sample using the remaining year of data. |
| Hardware Specification | No | The paper describes the forecast models (COSMO-1E, COSMO-2E, ECMWF IFS) and their characteristics, including sample members and spatial resolution, and mentions they involve "integrating an initial state of the atmosphere through time according to physical laws". However, it does not specify any hardware (CPU, GPU, memory, etc.) used by the authors to run their experiments or implement their methodology. |
| Software Dependencies | No | In the following, we use the kernlab package in R (Karatzoglou et al., 2004). |
| Experiment Setup | Yes | For the latter three methods, the weights are estimated by minimising the CRPS over a training data set, performed using the energy kernel in the convex quadratic optimisation problem outlined in Proposition 2. ... The first two years of data (2nd June 2020 to 1st June 2022) are used as training data on which to estimate the weights, and the resulting forecasts are then assessed out-of-sample using the remaining year of data. ... Many of these kernels depend on an additional lengthscale hyperparameter, which is set to one in Figure 8, though similar results are obtained for other lengthscales (see Appendix C). |
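
The Experiment Setup row says the pooling weights are estimated by minimising the CRPS over a training set, which is a quadratic problem in the weights. The following is a minimal sketch of that objective only, assuming each forecast is a finite sample of values and using the standard closed form of the CRPS for a weighted discrete distribution. The function names `crps_weighted_points` and `mean_crps` are hypothetical, the toy numbers are invented for illustration, and nothing here is taken from the RKHSCombi repository or from kernlab; the constrained optimiser itself is left out.

```python
import numpy as np


def crps_weighted_points(x, w, y):
    """CRPS of the discrete forecast sum_i w[i] * delta_{x[i]} at outcome y.

    Uses CRPS(F, y) = sum_i w_i |x_i - y| - 0.5 * sum_{i,j} w_i w_j |x_i - x_j|,
    which is linear-plus-quadratic in the weights w.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    c = np.abs(x - y)                     # linear term: distance of each point to y
    D = np.abs(x[:, None] - x[None, :])   # pairwise distances between the points
    return float(c @ w - 0.5 * (w @ D @ w))


def mean_crps(X, w, y):
    """Average CRPS over n cases; X has shape (n, m), y has shape (n,)."""
    return float(np.mean([crps_weighted_points(xi, w, yi) for xi, yi in zip(X, y)]))


# Toy illustration (made-up values): three point predictions, equal weights.
x_toy = [1.0, 2.0, 4.0]
w_equal = np.full(3, 1.0 / 3.0)
score = crps_weighted_points(x_toy, w_equal, 2.0)
```

In a full workflow, `mean_crps` would be minimised over non-negative weights that sum to one using the training period only, and the fitted weights would then be scored on the held-out year.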