Parametric Scaling Law of Tuning Bias in Conformal Prediction
Authors: Hao Zeng, Kangdao Liu, Bingyi Jing, Hongxin Wei
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we empirically find that the tuning bias, the coverage gap introduced by leveraging the same dataset for tuning and calibration, is negligible for simple parameter tuning in many conformal prediction methods. In particular, we observe the scaling law of the tuning bias: this bias increases with parameter space complexity and decreases with calibration set size. ... We conduct an additional analysis from two key perspectives: (1) the number of parameters and (2) the size of the calibration set. The experiments provide deeper insights into how tuning methods influence bias formation. ... In summary, the empirical results reveal that the tuning bias is negligible for simple parameter tuning and scales up with the complexity of parameter space and down with the size of the calibration set. |
| Researcher Affiliation | Academia | 1 Department of Statistics and Data Science, Southern University of Science and Technology; 2 Department of Computer and Information Science, University of Macau. Correspondence to: Hongxin Wei <EMAIL>. |
| Pseudocode | No | The paper describes methodologies and theoretical results using mathematical equations and definitions but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Code: https://github.com/ml-stat-Sustech/Parametric-Scaling-Law-CP-Tuning. |
| Open Datasets | Yes | We conduct experiments on the CIFAR-100 dataset (Krizhevsky, 2009) and use pre-trained model ResNet-18 (He et al., 2016). ... For CIFAR-10, we use a simple CNN ... For CIFAR-100 and ImageNet, we employ ResNet-18 (He et al., 2016). ... The experiments are performed on the Protein Structure dataset obtained from the UCI repository, which comprises N = 45730 data points with feature dimension d = 9. |
| Dataset Splits | Yes | The size of the calibration set considered here is 1000. ... The calibration set size for Figure (a) is 6000. ... The calibration set size varies from 6000 to 10000 with an increment of 1000. ... For the first experiment, the size of calibration set in the Shared strategy is fixed at n = 50. ... For the Split strategy, 50 data points are used for model selection, and a separate set of 50 data points is used for calibration, amounting to 100 reserved data points in total. ... The values investigated for both strategies are n ∈ {100, 200, 300, 400, 500}. |
| Hardware Specification | No | The paper mentions using 'ResNet-18' and 'pre-trained classifiers available in TorchVision', but it does not specify any concrete hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TorchVision (Paszke et al., 2019)', 'Adam optimizer (Kingma & Ba, 2017)', and 'Python's quantile-forest package'. However, it does not provide specific version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For CIFAR-100, the network is trained for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 256. The initial learning rate is set to 0.1 and decreases by a factor of 5 at epochs 60, 120, and 160. Similarly, for CIFAR-10, the network is trained for 120 epochs using SGD with identical momentum, weight decay, and batch size settings. The initial learning rate is also set to 0.1 and decreases by a factor of 5 at epochs 30, 60, and 90. ... For tuning the C-Adapter, we use the Adam optimizer (Kingma & Ba, 2017) with a batch size of 256 and a learning rate of 0.1. The model is tuned for 10 epochs, and the only parameter, T, is set to 1 × 10⁻⁴ by default. |
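The scaling-law finding summarized in the Research Type row (tuning bias grows with parameter-space complexity and shrinks with calibration-set size) can be probed with a toy experiment. The sketch below is not the paper's code: it uses synthetic logits, the THR-style nonconformity score 1 − p_y, and a single tuned temperature parameter, and compares the Shared strategy (tune and calibrate on the same set) against the Split strategy (disjoint tuning and calibration halves). All names, sizes, and data-generating choices here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.1   # target miscoverage; nominal coverage is 90%
K = 10        # number of classes

def make_logits(n):
    """Synthetic logits with a +2.0 signal on the true class."""
    y = rng.integers(0, K, size=n)
    z = rng.normal(size=(n, K))
    z[np.arange(n), y] += 2.0
    return z, y

def softmax(z, temp):
    s = z / temp
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile used in split conformal prediction."""
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def tune_temperature(z, y, temps):
    """One-parameter tuning: pick the temperature with the smallest
    average prediction-set size on the given tuning data."""
    best_T, best_size = temps[0], np.inf
    for T in temps:
        p = softmax(z, T)
        scores = 1.0 - p[np.arange(len(y)), y]   # THR nonconformity score
        q = conformal_quantile(scores, ALPHA)
        size = (1.0 - p <= q).sum(axis=1).mean()
        if size < best_size:
            best_T, best_size = T, size
    return best_T

def coverage(z_cal, y_cal, z_test, y_test, T):
    """Calibrate the threshold on (z_cal, y_cal), measure test coverage."""
    p_cal = softmax(z_cal, T)
    cal_scores = 1.0 - p_cal[np.arange(len(y_cal)), y_cal]
    q = conformal_quantile(cal_scores, ALPHA)
    p_test = softmax(z_test, T)
    test_scores = 1.0 - p_test[np.arange(len(y_test)), y_test]
    return (test_scores <= q).mean()

z_cal, y_cal = make_logits(1000)
z_test, y_test = make_logits(20000)
temps = np.linspace(0.5, 2.0, 16)

# Shared strategy: tune and calibrate on the same 1000 points.
T_shared = tune_temperature(z_cal, y_cal, temps)
cov_shared = coverage(z_cal, y_cal, z_test, y_test, T_shared)

# Split strategy: 500 points for tuning, a disjoint 500 for calibration.
T_split = tune_temperature(z_cal[:500], y_cal[:500], temps)
cov_split = coverage(z_cal[500:], y_cal[500:], z_test, y_test, T_split)

print(f"shared: T={T_shared:.2f}, coverage={cov_shared:.3f}")
print(f"split:  T={T_split:.2f}, coverage={cov_split:.3f}")
```

With a single tuned parameter, both strategies should land near the nominal 90% coverage on the large test set, which is the "negligible bias for simple parameter tuning" regime the paper describes; the bias would become visible only with a much richer parameter space or a much smaller calibration set.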
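The training schedules in the Experiment Setup row (initial learning rate 0.1, divided by 5 at epochs 60/120/160 for CIFAR-100 and 30/60/90 for CIFAR-10) are standard piecewise-constant step schedules. A dependency-free sketch of that rule, with a function name of my own choosing:

```python
def step_lr(epoch, base_lr=0.1, milestones=(60, 120, 160), gamma=0.2):
    """Piecewise-constant schedule: multiply the learning rate by
    `gamma` (here 1/5) at each milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# CIFAR-100 schedule from the setup row.
lr_start = step_lr(0)            # 0.1
lr_mid = step_lr(60)             # 0.1 / 5 = 0.02
lr_late = step_lr(160)           # 0.1 / 5**3 = 0.0008

# CIFAR-10 uses the same rule with earlier milestones.
lr_c10 = step_lr(90, milestones=(30, 60, 90))
```

In a PyTorch pipeline this corresponds to a multi-step decay scheduler with gamma = 0.2 and the milestone lists above; the pure-Python version here just makes the arithmetic of the quoted schedule explicit.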