Parametric Scaling Law of Tuning Bias in Conformal Prediction
Authors: Hao Zeng, Kangdao Liu, Bingyi Jing, Hongxin Wei
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we empirically find that the tuning bias, the coverage gap introduced by leveraging the same dataset for tuning and calibration, is negligible for simple parameter tuning in many conformal prediction methods. In particular, we observe the scaling law of the tuning bias: this bias increases with parameter space complexity and decreases with calibration set size. ... We conduct an additional analysis from two key perspectives: (1) the number of parameters and (2) the size of the calibration set. The experiments provide deeper insights into how tuning methods influence bias formation. ... In summary, the empirical results reveal that the tuning bias is negligible for simple parameter tuning and scales up with the complexity of parameter space and down with the size of the calibration set. |
| Researcher Affiliation | Academia | 1 Department of Statistics and Data Science, Southern University of Science and Technology; 2 Department of Computer and Information Science, University of Macau. Correspondence to: Hongxin Wei <EMAIL>. |
| Pseudocode | No | The paper describes methodologies and theoretical results using mathematical equations and definitions but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Code: https://github.com/ml-stat-Sustech/Parametric-Scaling-Law-CP-Tuning. |
| Open Datasets | Yes | We conduct experiments on the CIFAR-100 dataset (Krizhevsky, 2009) and use pre-trained model ResNet-18 (He et al., 2016). ... For CIFAR-10, we use a simple CNN ... For CIFAR-100 and ImageNet, we employ ResNet-18 (He et al., 2016). ... The experiments are performed on the Protein Structure dataset obtained from the UCI repository, which comprises N = 45730 data points with feature dimension d = 9. |
| Dataset Splits | Yes | The size of the calibration set considered here is 1000. ... The calibration set size for Figure (a) is 6000. ... The calibration set size varies from 6000 to 10000 with an increment of 1000. ... For the first experiment, the size of calibration set in the Shared strategy is fixed at n = 50. ... For the Split strategy, 50 data points are used for model selection, and a separate set of 50 data points is used for calibration, amounting to 100 reserved data points in total. ... The values investigated for both strategies are n ∈ {100, 200, 300, 400, 500}. |
| Hardware Specification | No | The paper mentions using 'ResNet-18' and 'pre-trained classifiers available in TorchVision', but it does not specify any concrete hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TorchVision (Paszke et al., 2019)', 'Adam optimizer (Kingma & Ba, 2017)', and 'Python's quantile-forest package'. However, it does not provide specific version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For CIFAR-100, the network is trained for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 256. The initial learning rate is set to 0.1 and decreases by a factor of 5 at epochs 60, 120, and 160. Similarly, for CIFAR-10, the network is trained for 120 epochs using SGD with identical momentum, weight decay, and batch size settings. The initial learning rate is also set to 0.1 and decreases by a factor of 5 at epochs 30, 60, and 90. ... For tuning the C-Adapter, we use the Adam optimizer (Kingma & Ba, 2017) with a batch size of 256 and a learning rate of 0.1. The model is tuned for 10 epochs, and the only parameter, T, is set to 1 × 10⁻⁴ by default. |
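The scaling-law finding summarized in the Research Type row (tuning bias grows with parameter-space complexity and shrinks with calibration-set size) can be probed with a toy experiment. The sketch below is not the paper's code: it uses synthetic logits, the THR-style nonconformity score 1 − p_y, and a single tuned temperature parameter, and compares the Shared strategy (tune and calibrate on the same set) against the Split strategy (disjoint tuning and calibration halves). All names, sizes, and data-generating choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.1   # target miscoverage; nominal coverage is 90%
K = 10        # number of classes

def make_logits(n):
    """Synthetic logits with a +2.0 signal on the true class."""
    y = rng.integers(0, K, size=n)
    z = rng.normal(size=(n, K))
    z[np.arange(n), y] += 2.0
    return z, y

def softmax(z, temp):
    s = z / temp
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile used in split conformal prediction."""
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def tune_temperature(z, y, temps):
    """One-parameter tuning: pick the temperature with the smallest
    average prediction-set size on the given tuning data."""
    best_T, best_size = temps[0], np.inf
    for T in temps:
        p = softmax(z, T)
        scores = 1.0 - p[np.arange(len(y)), y]   # THR nonconformity score
        q = conformal_quantile(scores, ALPHA)
        size = (1.0 - p <= q).sum(axis=1).mean()
        if size < best_size:
            best_T, best_size = T, size
    return best_T

def coverage(z_cal, y_cal, z_test, y_test, T):
    """Calibrate the threshold on (z_cal, y_cal), measure test coverage."""
    p_cal = softmax(z_cal, T)
    cal_scores = 1.0 - p_cal[np.arange(len(y_cal)), y_cal]
    q = conformal_quantile(cal_scores, ALPHA)
    p_test = softmax(z_test, T)
    test_scores = 1.0 - p_test[np.arange(len(y_test)), y_test]
    return (test_scores <= q).mean()

z_cal, y_cal = make_logits(1000)
z_test, y_test = make_logits(20000)
temps = np.linspace(0.5, 2.0, 16)

# Shared strategy: tune and calibrate on the same 1000 points.
T_shared = tune_temperature(z_cal, y_cal, temps)
cov_shared = coverage(z_cal, y_cal, z_test, y_test, T_shared)

# Split strategy: 500 points for tuning, a disjoint 500 for calibration.
T_split = tune_temperature(z_cal[:500], y_cal[:500], temps)
cov_split = coverage(z_cal[500:], y_cal[500:], z_test, y_test, T_split)

print(f"shared: T={T_shared:.2f}, coverage={cov_shared:.3f}")
print(f"split:  T={T_split:.2f}, coverage={cov_split:.3f}")
```

With a single tuned parameter, both strategies should land near the nominal 90% coverage on the large test set, which is the "negligible bias for simple parameter tuning" regime the paper describes; the bias would become visible only with a much richer parameter space or a much smaller calibration set.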
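The training schedules in the Experiment Setup row (initial learning rate 0.1, divided by 5 at epochs 60/120/160 for CIFAR-100 and 30/60/90 for CIFAR-10) are standard piecewise-constant step schedules. A dependency-free sketch of that rule, with a function name of my own choosing:

```python
def step_lr(epoch, base_lr=0.1, milestones=(60, 120, 160), gamma=0.2):
    """Piecewise-constant schedule: multiply the learning rate by
    `gamma` (here 1/5) at each milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# CIFAR-100 schedule from the setup row.
lr_start = step_lr(0)            # 0.1
lr_mid = step_lr(60)             # 0.1 / 5 = 0.02
lr_late = step_lr(160)           # 0.1 / 5**3 = 0.0008

# CIFAR-10 uses the same rule with earlier milestones.
lr_c10 = step_lr(90, milestones=(30, 60, 90))
```

In a PyTorch pipeline this corresponds to a multi-step decay scheduler with gamma = 0.2 and the milestone lists above; the pure-Python version here just makes the arithmetic of the quoted schedule explicit.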