Fast Automatic Smoothing for Generalized Additive Models
Authors: Yousra El-Bachir, Anthony C. Davison
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical analysis shows that the resulting algorithm is numerically stable, faster than the best existing methods and achieves state-of-the-art accuracy. For illustration, we apply it to an important and challenging problem in the analysis of extremal data. Keywords: Automatic L2 Regularization, Empirical Bayes, Expectation-maximization Algorithm, Generalized Additive Model, Laplace Approximation, Marginal Maximum Likelihood. Section 3 assesses its performance with a simulation study. Section 4 provides a real data analysis on extreme temperatures, and Section 5 closes the paper with a discussion. |
| Researcher Affiliation | Academia | Yousra El-Bachir (EMAIL), Anthony C. Davison (EMAIL); EPFL-FSB-MATH-STAT, École Polytechnique Fédérale de Lausanne, Station 8, CH-1015 Lausanne, Switzerland |
| Pseudocode | No | The paper describes the steps of the EM algorithm (E-step, M-step) verbally and with mathematical derivations, but does not include a distinct pseudocode block or algorithm listing. |
| Open Source Code | Yes | The proposed method is implemented in a C++ library that uses Eigen (Guennebaud et al., 2018) for matrix decompositions, is integrated into the R package multgam through the interface RcppEigen (Bates and Eddelbuettel, 2013), and makes addition of further probability models straightforward. |
| Open Datasets | Yes | We analyze monthly maxima of the daily Central England Temperature (CET) series from January 1772 to December 2016. Footnote 1: The data can be downloaded at https://www.metoffice.gov.uk/hadobs/hadcet/data/download.html |
| Dataset Splits | No | In the simulation study, the authors generated 100 replicates of training sets but did not specify any further train/test/validation splits. For the real data analysis, the paper analyzes the entire Central England Temperature (CET) dataset, describing analysis over historical periods (e.g., 'model for 1916', 'quantiles in 2016') but does not specify formal data splits for training, validation, or testing. |
| Hardware Specification | Yes | The computations were performed on a 2.80 GHz Intel i7-7700HQ laptop using Ubuntu. |
| Software Dependencies | Yes | The corresponding R (R Core Team, 2019) package gam implements the methods... implemented in the R package mgcv gam (Wood, 2011; Wood et al., 2016)... INLA (Rue et al., 2009). The routines used from mgcv are based on Version 1.8-22, and those used from INLA are based on Version 18.07.12 run with eight threads... Stan algorithms (Carpenter et al., 2017)... through the R package brms (Bürkner, 2017) Version 2.4.0... C++ library that uses Eigen (Guennebaud et al., 2018)... integrated into the R package multgam through the interface RcppEigen (Bates and Eddelbuettel, 2013). |
| Experiment Setup | Yes | We fit the six models using cubic regression splines with evenly spaced knots in the input range values. We used ten basis functions for each of the smooth functions $f_j$... using loggamma priors and the random walk parametrization of order two... In our implementation, we set $\xi_i = 0$ whenever $\lvert \xi_i \rvert \le \varpi^{3/10}$, with $\varpi$ the machine precision. This sets the order of the threshold to $10^{-5}$... 12 basis functions from cyclic cubic regression splines for each of the location, scale and shape parameters of the GEV model; we use ten basis functions from thin plate splines (Wood, 2003) in the location for the trend visible in Figure 3. |
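
The Experiment Setup row quotes a threshold on the GEV shape parameter ξ: values whose magnitude falls below the machine precision raised to the power 3/10 are set to zero, which selects the Gumbel limit of the GEV density. The following is a minimal sketch of that rule, not the multgam implementation. It assumes IEEE double precision, assumes the comparison is non-strict, and uses hypothetical names (`XI_THRESHOLD`, `clamp_shape`, `gev_logpdf`) chosen only for illustration.

```python
import math
import sys

# Machine precision for IEEE double precision (assumed; about 2.2e-16).
MACHINE_EPS = sys.float_info.epsilon

# Threshold from the quoted setup: machine precision to the power 3/10.
# With double precision this is roughly 2e-5, i.e. of order 1e-5.
XI_THRESHOLD = MACHINE_EPS ** 0.3


def clamp_shape(xi: float, threshold: float = XI_THRESHOLD) -> float:
    """Return 0.0 when |xi| is at or below the threshold, else xi unchanged."""
    return 0.0 if abs(xi) <= threshold else xi


def gev_logpdf(y: float, mu: float, sigma: float, xi: float) -> float:
    """GEV log-density that switches to the Gumbel limit for tiny |xi|."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    xi = clamp_shape(xi)
    if xi == 0.0:
        # Gumbel limit of the GEV density as xi -> 0.
        return -math.log(sigma) - z - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0:
        # y lies outside the support of the GEV distribution.
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)


if __name__ == "__main__":
    print(f"threshold = {XI_THRESHOLD:.3e}")
    print(gev_logpdf(1.0, 0.0, 1.0, 1e-7))  # evaluated with the Gumbel branch
    print(gev_logpdf(1.0, 0.0, 1.0, 0.1))   # evaluated with the general branch
```

Switching branches near zero avoids evaluating `1.0 / xi` and `t ** (-1.0 / xi)` where they are numerically unstable; the two branches agree closely on either side of the threshold.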