Split Conformal Prediction and Non-Exchangeable Data
Authors: Roberto I. Oliveira, Paulo Orenstein, Thiago Ramos, João Vitor Romano
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments with both real and synthetic data, we show that our theoretical results translate to good empirical performance under non-exchangeability, e.g., for time series and spatiotemporal data. |
| Researcher Affiliation | Academia | Roberto I. Oliveira EMAIL IMPA, Rio de Janeiro, Brazil. Paulo Orenstein EMAIL IMPA, Rio de Janeiro, Brazil. Thiago Ramos EMAIL IMPA, Rio de Janeiro, Brazil. João Vitor Romano EMAIL IMPA, Rio de Janeiro, Brazil. |
| Pseudocode | No | The paper describes the methods in prose and mathematical formulations but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce all the figures and tables is available at https://github.com/jv-rv/split-conformal-nonexchangeable. |
| Open Datasets | Yes | measurements from the start of 1979 through the end of 2022 were retrieved via the National Oceanic and Atmospheric Administration (NOAA, 2023). NOAA, 2023. http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.temperature/.daily/, Retrieved on 2023-03-08. Minute-by-minute data is retrieved directly from Hist Data (2022) via an open-source API (Rémy, 2022). Hist Data, 2022. https://www.histdata.com/, Retrieved on 2022-01-27. |
| Dataset Splits | Yes | For example, Figure 4 shows how marginal coverage (7), calculated through 10 000 simulations, is affected by increasing levels of dependence 1 − p for three different models (boosting, neural network and random forest) and ntrain = 1000, ncal = 500 and ntest = 1. Further, Figure 6 shows the empirical coverage for a gradient boosting model over a thousand simulations with ntrain = 1000, ncal = ntest = 15 000 and δcal = δtest = 0.005. Then, we apply online conformal prediction over a sliding window of 1000 training points, 500 calibration points and a single test point for the entire year of 2021. |
| Hardware Specification | Yes | The experiments were conducted on a server with 774 GB of RAM and two Intel Xeon Platinum 8354H processors, totalling 8 physical cores and 288 threads. |
| Software Dependencies | No | The paper mentions software tools like 'MAPIE' but does not provide specific version numbers for any libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | For the experiment comparing split CP and EnbPI (cf. Figure 3), split CP's underlying random forest model comprised 100 trees; mean squared error was used as the split criterion; no maximum tree depth was set, so nodes are expanded until all leaves contain less than 2 samples; all features were considered for splitting. EnbPI's random forest was exactly the same, and the length of the blocks in EnbPI's block bootstrap procedure was set to 8 with the number of resamplings set to 30. For the AR(1) experiment (cf. Figure 1) and Examples 3 and 4, gradient boosting was set to boost 100 trees with a learning rate of 0.1 and pinball loss; trees of any depth were allowed; the minimal number of data in one leaf was 20; the minimal sum hessian in one leaf was 0.001; no minimal gain to perform a split was required; no more than 31 leaves were allowed per tree; no regularization was set. The neural network consisted of three fully connected layers with ReLU activation; the numbers of output units were 128, 64 and 2, respectively, where the final output of 2 units represents the low and high quantiles being estimated; AdamW with a learning rate of 10^−3 and weight decay of 10^−6 was used; training was over 100 epochs with batches of size 64; pinball loss was used. The random forest model (quantile regression forest) comprised 10 trees; mean squared error was used as the split criterion; no maximum tree depth was set, so nodes are expanded until all leaves contain less than 2 samples; all features were considered for splitting. |
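To make the split conformal prediction procedure referenced throughout the table concrete, the following is a minimal sketch. It is not the paper's code (which lives in the linked repository); the AR(1) data generation and the least-squares predictor are stand-ins chosen for self-containment, while the split sizes (1000 training, 500 calibration points) mirror those quoted in the Dataset Splits row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1)-style series, loosely mirroring the paper's synthetic
# setup. Split sizes follow the report; n_test is enlarged here so the
# empirical coverage estimate is less noisy.
n_train, n_cal, n_test = 1000, 500, 200
phi = 0.8
n = n_train + n_cal + n_test + 1
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()
X = y[:-1].reshape(-1, 1)  # predict y_t from y_{t-1}
Y = y[1:]

X_tr, Y_tr = X[:n_train], Y[:n_train]
X_cal, Y_cal = X[n_train:n_train + n_cal], Y[n_train:n_train + n_cal]
X_te, Y_te = X[n_train + n_cal:], Y[n_train + n_cal:]

# Fit a simple least-squares predictor (stand-in for the boosting / neural
# network / random forest models used in the paper).
A = np.hstack([X_tr, np.ones((n_train, 1))])
coef, *_ = np.linalg.lstsq(A, Y_tr, rcond=None)
predict = lambda x: np.hstack([x, np.ones((len(x), 1))]) @ coef

# Split CP: absolute calibration residuals set the interval half-width via
# the conformal quantile ceil((n_cal + 1) * (1 - alpha)).
alpha = 0.1
scores = np.abs(Y_cal - predict(X_cal))
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]

lo, hi = predict(X_te) - q, predict(X_te) + q
coverage = np.mean((Y_te >= lo) & (Y_te <= hi))
print(f"empirical coverage: {coverage:.3f} (target {1 - alpha})")
```

Under exchangeability this construction guarantees at least 1 − α marginal coverage; the paper's contribution is quantifying how close empirical coverage stays to that target when the data are dependent, as in this AR(1) example.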