Wasserstein-Regularized Conformal Prediction under General Distribution Shift

Authors: Rui Xu, Chao Chen, Yue Sun, Parvathinathan Venkitasubramaniam, Sihong Xie

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on six datasets prove that WR-CP can reduce coverage gaps to 3.2% across different confidence levels and output prediction sets 37% smaller than the worst-case approach on average. ... Experiments were conducted on six datasets: (a) the airfoil self-noise dataset (Brooks & Marcolini, 2014); (b) Seattle-loop (Cui et al., 2019), PeMSD4, PeMSD8 (Guo et al., 2019) for traffic speed prediction; (c) Japan-Prefectures and U.S.-States (Deng et al., 2020) for epidemic spread forecasting.
Researcher Affiliation | Academia | Rui Xu, Sihong Xie: The Hong Kong University of Science and Technology (Guangzhou), EMAIL, EMAIL. Chao Chen: Harbin Institute of Technology, EMAIL. Yue Sun, Parvathinathan Venkitasubramaniam: Lehigh University, EMAIL, EMAIL.
Pseudocode | Yes | Algorithm 1: Wasserstein-regularized Conformal Prediction (WR-CP)
Open Source Code | Yes | The code of our work is released on https://github.com/rxu0112/WR-CP.
Open Datasets | Yes | Experiments were conducted on six datasets: (a) the airfoil self-noise dataset (Brooks & Marcolini, 2014); (b) Seattle-loop (Cui et al., 2019), PeMSD4, PeMSD8 (Guo et al., 2019) for traffic speed prediction; (c) Japan-Prefectures and U.S.-States (Deng et al., 2020) for epidemic spread forecasting. ... The airfoil self-noise dataset from the UCI Machine Learning Repository (Brooks & Marcolini, 2014). DOI: https://doi.org/10.24432/C5VW2C.
Dataset Splits | No | We conducted 10 sampling trials for each dataset. Within each trial, we sampled $S^{(i)}_{XY}$ from each subset $i$, for $i = 1, ..., k$. After this step, we allocated the remaining elements within each subset for calibration and testing purposes. The parts intended for calibration across all subsets were then unified to form $S^{P}_{XY}$. Lastly, to create diverse testing scenarios, we generated multiple test sets by randomly mixing the parts designated for testing from each subset with replacement.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions using an MLP model.
Software Dependencies | No | To find the optimized bandwidth value of $\hat{P}_X$ and $\hat{D}^{(i)}_X$ for $i = 1, ..., k$ on each dataset, we applied the grid search method with a bandwidth pool using scikit-learn package (Pedregosa et al., 2011).
Experiment Setup | Yes | A multi-layer perceptron (MLP) with an architecture of (input dimension, 64, 64, 1) was utilized in all experimental setups to maintain comparison fairness. ... The β values for the WR-CP method are 9, 11, 9, 10, 13, and 13, respectively. ... The β values for the WR-CP method are 4.5, 9, 9, 6, 8, and 20, respectively. ... The selected β values for the results of Figure 5 are shown in Table 2.
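
The Research Type entry reports results as coverage gaps, but the summary does not spell out how a gap is computed. The sketch below assumes the common reading: the absolute difference between empirical test coverage and the target level 1 - α. It uses vanilla split conformal prediction on synthetic nonconformity scores, not WR-CP itself; the function names `conformal_quantile` and `coverage_gap` and all data are hypothetical.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """(1 - alpha) conformal quantile of calibration scores (finite-sample corrected)."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def coverage_gap(test_scores, threshold, alpha):
    """Assumed definition: |empirical coverage - (1 - alpha)|."""
    covered = sum(s <= threshold for s in test_scores)
    return abs(covered / len(test_scores) - (1 - alpha))

rng = random.Random(0)
alpha = 0.1
# Synthetic absolute-residual scores; the shifted test set has larger noise.
calib = [abs(rng.gauss(0.0, 1.0)) for _ in range(2000)]
test_iid = [abs(rng.gauss(0.0, 1.0)) for _ in range(2000)]
test_shifted = [abs(rng.gauss(0.0, 2.0)) for _ in range(2000)]

threshold = conformal_quantile(calib, alpha)
gap_iid = coverage_gap(test_iid, threshold, alpha)
gap_shifted = coverage_gap(test_shifted, threshold, alpha)
print(f"gap without shift: {gap_iid:.3f}, gap under shift: {gap_shifted:.3f}")
```

In this toy setting the gap stays small when calibration and test scores share a distribution and grows when the test scores are shifted, which is the failure mode the paper targets.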
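
The Dataset Splits entry describes a sample, split, pool, and remix procedure without giving sizes or ratios. A minimal Python sketch of that flow follows. The function name `make_splits`, the parameters `n_train`, `calib_frac`, `n_test_sets`, and `test_size`, and the uniform choice of subset when mixing are all assumptions for illustration, not values from the paper.

```python
import random

def make_splits(subsets, n_train, calib_frac=0.5, n_test_sets=3, test_size=20, seed=0):
    """subsets: list of k lists of (x, y) pairs. All sizes and fractions are hypothetical."""
    rng = random.Random(seed)
    train_sets, calib_pool, test_parts = [], [], []
    for subset in subsets:
        items = list(subset)
        rng.shuffle(items)
        # Sample the per-subset training set S^(i)_XY.
        train_sets.append(items[:n_train])
        remaining = items[n_train:]
        # Divide the remaining elements between calibration and testing.
        cut = int(len(remaining) * calib_frac)
        calib_pool.extend(remaining[:cut])  # pooled calibration set S^P_XY
        test_parts.append(remaining[cut:])
    # Build several test sets by mixing the per-subset test parts with replacement.
    test_sets = []
    for _ in range(n_test_sets):
        mixed = []
        for _ in range(test_size):
            part = rng.choice(test_parts)  # assumed: subsets are picked uniformly
            mixed.append(rng.choice(part))
        test_sets.append(mixed)
    return train_sets, calib_pool, test_sets

# Toy data: 3 subsets of 30 (x, y) pairs with distinct x values.
subsets = [[(i + 100 * s, float(i)) for i in range(30)] for s in range(3)]
train_sets, calib_pool, test_sets = make_splits(subsets, n_train=10)
print(len(train_sets), len(calib_pool), [len(t) for t in test_sets])
```

Repeating the call with ten different `seed` values would mirror the 10 sampling trials mentioned in the entry.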
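
The Software Dependencies entry mentions a grid search over a bandwidth pool for kernel density estimates but names neither the pool nor the selection criterion. The sketch below is a pure-Python stand-in rather than scikit-learn code: it assumes a one-dimensional Gaussian kernel and picks the bandwidth with the highest cross-validated log-likelihood. The names `gaussian_kde_logpdf` and `select_bandwidth`, the pool values, and the five-fold split are hypothetical.

```python
import math
import random

def gaussian_kde_logpdf(x, samples, bandwidth):
    """Log-density at x of a 1-D Gaussian KDE fitted on `samples`."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    # Floor the kernel sum so that underflow does not produce log(0).
    return math.log(max(total, 1e-300)) - math.log(norm)

def select_bandwidth(data, bandwidth_pool, n_folds=5):
    """Return the pool entry with the highest cross-validated log-likelihood."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    best_bw, best_score = None, -math.inf
    for bw in bandwidth_pool:
        score = 0.0
        for i, held_out in enumerate(folds):
            # Fit on every fold except the held-out one.
            fit = [x for j, fold in enumerate(folds) if j != i for x in fold]
            score += sum(gaussian_kde_logpdf(x, fit, bw) for x in held_out)
        if score > best_score:
            best_bw, best_score = bw, score
    return best_bw

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
pool = [0.01, 0.05, 0.1, 0.3, 0.5, 1.0, 3.0]  # hypothetical bandwidth pool
best = select_bandwidth(data, pool)
print("selected bandwidth:", best)
```

With scikit-learn, a common equivalent is to wrap `KernelDensity` in `GridSearchCV` over its `bandwidth` parameter; the entry does not say whether the authors did exactly that.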