Achieving Risk Control in Online Learning Settings
Authors: Shai Feldman, Liran Ringel, Stephen Bates, Yaniv Romano
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the utility of our method, we conduct experiments on real-world tabular time-series data sets showing that the proposed method rigorously controls various natural risks. Furthermore, we show how to construct valid intervals for an online image-depth estimation problem that previous sequential calibration schemes cannot handle. |
| Researcher Affiliation | Academia | Shai Feldman EMAIL, Department of Computer Science, Technion, Israel; Liran Ringel EMAIL, Department of Computer Science, Technion, Israel; Stephen Bates EMAIL, Departments of Statistics and of EECS, UC Berkeley; Yaniv Romano EMAIL, Departments of Electrical and Computer Engineering and of Computer Science, Technion, Israel |
| Pseudocode | Yes | Algorithm 1 Rolling RC. Input: data {(X_t, Y_t)}_{t=1}^T ⊆ 𝒳 × 𝒴, given as a stream, desired risk level r ∈ ℝ, a step size γ > 0, a set-constructing function f : 𝒳 × ℝ × ℳ → 2^𝒴, and an online learning model M. Process: 1: Initialize θ_1 = 0. 2: for t = 1, ..., T do 3: Construct a prediction set for the new point X_t: Ĉ_t(X_t) = f(X_t, θ_t, M_t). 4: Obtain Y_t. 5: Compute l_t = L(Y_t, Ĉ_t(X_t)). 6: Update θ_{t+1} = θ_t + γ(l_t − r). 7: Fit the model M_t on (X_t, Y_t) and obtain the updated model M_{t+1}. 8: end for. Output: uncertainty sets Ĉ_t(X_t) for each time step t ∈ {1, ..., T}. |
| Open Source Code | Yes | Software implementing the proposed framework and reproducing our experiments is available at https://github.com/Shai128/rrc. |
| Open Datasets | Yes | We test the performance of Rolling RC on five real-world benchmark data sets with a 1-dimensional Y: Power, Energy, Traffic, Wind, and Prices. We apply Rolling RC on the KITTI benchmark (Geiger et al., 2013), in which the task is to estimate a depth map given an RGB image. Power: Power consumption of Tetouan city, https://archive.ics.uci.edu/ml/datasets/Power+consumption+of+Tetouan+city (accessed April 2021). Energy: Appliances energy prediction, https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction (accessed April 2021). Traffic: Metro interstate traffic volume, https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume (accessed April 2021). Wind: Wind power in Germany, https://www.kaggle.com/datasets/l3llff/wind-power (accessed April 2021). Prices: French electricity spot prices, https://github.com/mzaffran/AdaptiveConformalPredictionsTimeSeries/blob/main/data_prices/Prices_2016_2019_extract.csv (accessed April 2021). |
| Dataset Splits | Yes | We commence by fitting an initial quantile regression model on the first 12000 data points, to obtain a reasonable predictive system. Then, past time step 12001, we start applying the calibration procedure while continuing to fit the model in an online fashion. Next, we choose the calibration's hyperparameters based on the validation set, indexed by 12001-16000. Lastly, we measure the performance of the deployed calibration method on data points corresponding to time steps 16001 to 20000. |
| Hardware Specification | Yes | The resources used for the experiments are: CPU: Intel(R) Xeon(R) E5-2650 v4. GPU: NVIDIA Titan X, 1080 Ti, 2080 Ti. OS: Ubuntu 18.04. |
| Software Dependencies | No | The model's optimizer is Adam (Kingma & Ba, 2015). We used scikit-image's implementation of optical flow. The calibration's hyperparameter configurations were tuned using the SMAC3 library (Lindauer et al., 2022). |
| Experiment Setup | Yes | The neural network architecture is composed of four parts: an MLP, an LSTM, and another two MLPs. The networks contain dropout layers with rate 0.1. The model's optimizer is Adam (Kingma & Ba, 2015) and the batch size is 512, i.e., the model is fitted on the most recent 512 samples in each time step. Before forwarding the input to the model, the feature vectors and response variables were normalized to have unit variance and zero mean using the first 8000 samples of the data stream. Table 1: Hyperparameters tested for the learning model — f1 (LSTM input layers): [32], [32, 64], [32, 64, 128]; f2 (LSTM layers): [64], [128]; f3 (LSTM output layers): [32], [64, 32]; learning rate: 10⁻⁴, 5·10⁻⁴. The updating rates we tested for the calibration schemes are γ ∈ {0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1, 2, 10}. |
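The Rolling RC loop quoted in the Pseudocode row can be sketched in a few lines. The example below is a minimal, self-contained illustration on synthetic data, not the paper's implementation: the "model" simply predicts Ŷ_t = X_t, the set constructor f stretches an interval via exp(θ), and the loss is the miscoverage indicator. Only the θ-update rule θ_{t+1} = θ_t + γ(l_t − r) is taken from Algorithm 1.

```python
import numpy as np

# Synthetic 1-D stream: Y_t is X_t plus Gaussian noise.
rng = np.random.default_rng(0)
T = 5000
X = rng.normal(size=T)
Y = X + rng.normal(scale=0.5, size=T)

r = 0.1        # desired risk level (here: miscoverage rate)
gamma = 0.05   # step size from the grid tested in the paper
theta = 0.0    # calibration parameter, initialized to 0

losses = []
for t in range(T):
    # f(X_t, theta_t, M_t): interval around the toy prediction X_t,
    # with half-width controlled by theta (illustrative stretching).
    half_width = np.exp(theta) - 0.5
    lo, hi = X[t] - half_width, X[t] + half_width

    # Observe Y_t and compute the loss l_t (1 if the interval missed).
    loss = float(not (lo <= Y[t] <= hi))
    losses.append(loss)

    # Algorithm 1, line 6: push theta up after a miss, down after a cover.
    theta = theta + gamma * (loss - r)
    # (A real run would also fit the online model M_t on (X_t, Y_t) here.)

print(f"empirical risk: {np.mean(losses):.3f}")
```

Telescoping the update gives mean(l_t) − r = θ_{T+1}/(γT), so the long-run empirical risk is pinned near r regardless of the model's quality, which is the risk-control property the paper formalizes.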