Direct Prediction Set Minimization via Bilevel Conformal Classifier Training

Authors: Yuanjie Shi, Hooman Shahrokhi, Xuesong Jia, Xiongzhi Chen, Jana Doppa, Yan Yan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline by 20.46% in prediction set size and validates our theory.
Researcher Affiliation | Academia | (1) School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA; (2) Department of Mathematics and Statistics, Washington State University, Pullman, Washington, USA.
Pseudocode | Yes | Algorithm 1: Direct Prediction Set Minimization (DPSM)
Open Source Code | Yes | The DPSM code is available at https://github.com/YuanjieSh/DPSM_code.
Open Datasets | Yes | We utilize the benchmark datasets CIFAR-100 (Krizhevsky et al., 2009), Caltech-101 (Fei-Fei et al., 2004), and iNaturalist (Van Horn et al., 2018), where all details are summarized in Table 2 of Appendix E.
Dataset Splits | Yes | Table 2 describes the datasets. The number of classes in the iNaturalist dataset depends on the taxonomy level (e.g., species, genus, family); we employ the Fungi species level, which has 341 different categories.

Data | Classes | Training | Validation | Calibration | Test
CIFAR-100 | 100 | 45000 | 5000 | 3000 | 7000
Caltech-101 | 101 | 4310 | 1256 | 1111 | 2000
iNaturalist | 341* | 15345 | 1705 | 1410 | 2000
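The four-way split above (training/validation/calibration/test) can be reproduced as a disjoint index partition. The sketch below is our own illustration of such a partition using the CIFAR-100 counts from Table 2; the fixed seed and shuffling scheme are assumptions, not details taken from the paper.

```python
import random

def split_indices(n, sizes, seed=0):
    """Partition indices 0..n-1 into disjoint named splits.

    `sizes` maps split name -> count; counts must sum to n.
    Shuffling with a fixed seed keeps the partition reproducible.
    """
    assert sum(sizes.values()) == n, "split sizes must cover the dataset"
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    splits, start = {}, 0
    for name, count in sizes.items():
        splits[name] = idx[start:start + count]
        start += count
    return splits

# CIFAR-100 row of Table 2: 45000/5000/3000/7000 over 60000 examples.
cifar_splits = split_indices(
    60000,
    {"train": 45000, "validation": 5000, "calibration": 3000, "test": 7000},
)
```

Conformal prediction requires the calibration split to be disjoint from the data used for training, which is why it is carved out separately here rather than reused from the validation split.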
Hardware Specification | No | The paper mentions deep models and neural network architectures but does not specify the hardware used to run experiments; there are no mentions of specific GPU models, CPU models, TPUs, or detailed cloud instance specifications.
Software Dependencies | No | The paper mentions specific architectures (ResNet, DenseNet) and the SGD optimizer, but it does not provide version numbers for any software libraries, frameworks (e.g., PyTorch or TensorFlow), or programming languages, which are necessary for reproducibility.
Experiment Setup | Yes | Table 3 lists the training details; the reported hyperparameters are those that give the best predictive efficiency. The SGD optimizer was used for all training unless specified otherwise.

Data | Architecture | Batch size | Epochs | η | lr schedule | Momentum | Weight decay | γ | λ
CIFAR-100 | DenseNet | 64 | 40 | 0.1 | 25 | 0.9 | 0.1 | 0.01 | 0.05
CIFAR-100 | ResNet | 128 | 40 | 0.1 | 25 | 0.9 | 0.1 | 0.01 | 0.01
Caltech-101 | DenseNet | 128 | 60 | 0.05 | 25, 40 | 0.9 | 0.1 | 0.1 | 1.0
Caltech-101 | ResNet | 128 | 60 | 0.05 | 25, 40 | 0.9 | 0.1 | 0.05 | 0.1
iNaturalist | DenseNet | 128 | 60 | 0.001 | 3 | 0.9 | 0.97 | 0.001 | 1.0
iNaturalist | ResNet | 128 | 60 | 0.001 | 3 | 0.9 | 0.97 | 0.001 | 0.5
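As a reading aid for Table 3, the "lr schedule" column can be interpreted as milestone epochs of a step schedule for the initial learning rate η. The sketch below is our own minimal illustration under that assumption; the per-milestone decay factor of 0.1 is a common convention, not a value stated in the table.

```python
def lr_at_epoch(base_lr, epoch, milestones, decay=0.1):
    """Step learning-rate schedule: multiply base_lr by `decay` once for
    each milestone epoch that has already been reached.

    Assumed reading of Table 3: the "lr schedule" column lists milestone
    epochs; `decay` (hypothetical here) is the per-milestone factor.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * decay ** passed

# Caltech-101 / DenseNet row: eta = 0.05, milestones at epochs 25 and 40.
schedule = [lr_at_epoch(0.05, epoch, [25, 40]) for epoch in range(60)]
```

Under this reading, the Caltech-101 runs train at 0.05 for the first 25 epochs, drop to 0.005 at epoch 25, and drop again at epoch 40; frameworks such as PyTorch provide the equivalent behavior via a multi-step scheduler.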