Split Conformal Prediction and Non-Exchangeable Data
Authors: Roberto I. Oliveira, Paulo Orenstein, Thiago Ramos, João Vitor Romano
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments with both real and synthetic data, we show that our theoretical results translate to good empirical performance under non-exchangeability, e.g., for time series and spatiotemporal data. |
| Researcher Affiliation | Academia | Roberto I. Oliveira EMAIL IMPA, Rio de Janeiro, Brazil. Paulo Orenstein EMAIL IMPA, Rio de Janeiro, Brazil. Thiago Ramos EMAIL IMPA, Rio de Janeiro, Brazil. João Vitor Romano EMAIL IMPA, Rio de Janeiro, Brazil. |
| Pseudocode | No | The paper describes the methods in prose and mathematical formulations but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce all the figures and tables is available at https://github.com/jv-rv/split-conformal-nonexchangeable. |
| Open Datasets | Yes | measurements from the start of 1979 through the end of 2022 were retrieved via the National Oceanic and Atmospheric Administration (NOAA, 2023). NOAA, 2023. http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.temperature/.daily/, Retrieved on 2023-03-08. Minute-by-minute data is retrieved directly from Hist Data (2022) via an open-source API (Rémy, 2022). Hist Data, 2022. https://www.histdata.com/, Retrieved on 2022-01-27. |
| Dataset Splits | Yes | For example, Figure 4 shows how marginal coverage (7), calculated through 10 000 simulations, is affected by increasing levels of dependence 1 − p for three different models (boosting, neural network and random forest) and ntrain = 1000, ncal = 500 and ntest = 1. Further, Figure 6 shows the empirical coverage for a gradient boosting model over a thousand simulations with ntrain = 1000, ncal = ntest = 15 000 and δcal = δtest = 0.005. Then, we apply online conformal prediction over a sliding window of 1000 training points, 500 calibration points and a single test point for the entire year of 2021. |
| Hardware Specification | Yes | The experiments were conducted on a server with 774 GB of RAM and two Intel Xeon Platinum 8354H processors, totalling 8 physical cores and 288 threads. |
| Software Dependencies | No | The paper mentions software tools like 'MAPIE' but does not provide specific version numbers for any libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | For the experiment comparing split CP and EnbPI (cf. Figure 3), split CP's underlying random forest model comprised 100 trees; mean squared error was used as the split criterion; no maximum tree depth was set, so nodes are expanded until all leaves contain less than 2 samples; all features were considered for splitting. EnbPI's random forest was exactly the same, and the length of the blocks in EnbPI's block bootstrap procedure was set to 8 with the number of resamplings set to 30. For the AR(1) experiment (cf. Figure 1) and Examples 3 and 4, gradient boosting was set to boost 100 trees with a learning rate of 0.1 and pinball loss; trees of any depth were allowed; the minimal number of data in one leaf was 20; the minimal sum hessian in one leaf was 0.001; no minimal gain to perform a split was required; no more than 31 leaves were allowed per tree; no regularization was set. The neural network consisted of three fully connected layers with ReLU activation; the numbers of output units were 128, 64 and 2, respectively, where the final output of 2 units represents the low and high quantiles being estimated; AdamW with a learning rate of 10^−3 and weight decay of 10^−6 was used; training was over 100 epochs with batches of size 64; pinball loss was used. The random forest model (quantile regression forest) comprised 10 trees; mean squared error was used as the split criterion; no maximum tree depth was set, so nodes are expanded until all leaves contain less than 2 samples; all features were considered for splitting. |
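To make the split conformal prediction procedure referenced throughout the table concrete, the following is a minimal sketch. It is not the paper's code (which lives in the linked repository); the AR(1) data generation and the least-squares predictor are stand-ins chosen for self-containment, while the split sizes (1000 training, 500 calibration points) mirror those quoted in the Dataset Splits row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1)-style series, loosely mirroring the paper's synthetic
# setup. Split sizes follow the report; n_test is enlarged here so the
# empirical coverage estimate is less noisy.
n_train, n_cal, n_test = 1000, 500, 200
phi = 0.8
n = n_train + n_cal + n_test + 1
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()
X = y[:-1].reshape(-1, 1)  # predict y_t from y_{t-1}
Y = y[1:]

X_tr, Y_tr = X[:n_train], Y[:n_train]
X_cal, Y_cal = X[n_train:n_train + n_cal], Y[n_train:n_train + n_cal]
X_te, Y_te = X[n_train + n_cal:], Y[n_train + n_cal:]

# Fit a simple least-squares predictor (stand-in for the boosting / neural
# network / random forest models used in the paper).
A = np.hstack([X_tr, np.ones((n_train, 1))])
coef, *_ = np.linalg.lstsq(A, Y_tr, rcond=None)
predict = lambda x: np.hstack([x, np.ones((len(x), 1))]) @ coef

# Split CP: absolute calibration residuals set the interval half-width via
# the conformal quantile ceil((n_cal + 1) * (1 - alpha)).
alpha = 0.1
scores = np.abs(Y_cal - predict(X_cal))
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]

lo, hi = predict(X_te) - q, predict(X_te) + q
coverage = np.mean((Y_te >= lo) & (Y_te <= hi))
print(f"empirical coverage: {coverage:.3f} (target {1 - alpha})")
```

Under exchangeability this construction guarantees at least 1 − α marginal coverage; the paper's contribution is quantifying how close empirical coverage stays to that target when the data are dependent, as in this AR(1) example.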