Direct Prediction Set Minimization via Bilevel Conformal Classifier Training
Authors: Yuanjie Shi, Hooman Shahrokhi, Xuesong Jia, Xiongzhi Chen, Jana Doppa, Yan Yan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline, with a 20.46% reduction in prediction set size, and validates our theory. |
| Researcher Affiliation | Academia | 1School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, USA 2Department of Mathematics and Statistics, Washington State University, Pullman, Washington, USA. |
| Pseudocode | Yes | Algorithm 1 Direct Prediction Set Minimization (DPSM) |
| Open Source Code | Yes | The DPSM code is available at https://github.com/YuanjieSh/DPSM_code. |
| Open Datasets | Yes | We utilize the benchmark datasets CIFAR-100 (Krizhevsky et al., 2009), Caltech-101 (Fei-Fei et al., 2004), and iNaturalist (Van Horn et al., 2018), where all details are summarized in Table 2 of Appendix E. |
| Dataset Splits | Yes | Table 2 of Appendix E describes the datasets. The number of classes in iNaturalist depends on the taxonomy level (e.g., species, genus, family); the authors use Fungi species, which has 341 categories. Splits (classes / train / validation / calibration / test): CIFAR-100: 100 / 45,000 / 5,000 / 3,000 / 7,000; Caltech-101: 101 / 4,310 / 1,256 / 1,111 / 2,000; iNaturalist: 341 / 15,345 / 1,705 / 1,410 / 2,000. |
| Hardware Specification | No | The paper mentions deep models and neural network architectures but does not specify the hardware used to run experiments. There are no mentions of specific GPU models, CPU models, TPUs, or detailed cloud instance specifications. |
| Software Dependencies | No | The paper mentions using specific architectures like ResNet and DenseNet, and the SGD optimizer. However, it does not provide version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages, which are necessary for reproducibility. |
| Experiment Setup | Yes | Table 3 lists the training details; the authors report the hyperparameters that give the best predictive efficiency and use the SGD optimizer throughout unless otherwise specified. Columns are batch size / epochs / η / lr schedule / momentum / weight decay / γ / λ. CIFAR-100 DenseNet: 64 / 40 / 0.1 / 25 / 0.9 / 0.1 / 0.01 / 0.05; CIFAR-100 ResNet: 128 / 40 / 0.1 / 25 / 0.9 / 0.1 / 0.01 / 0.01; Caltech-101 DenseNet: 128 / 60 / 0.05 / 25,40 / 0.9 / 0.1 / 0.1 / 1.0; Caltech-101 ResNet: 128 / 60 / 0.05 / 25,40 / 0.9 / 0.1 / 0.05 / 0.1; iNaturalist DenseNet: 128 / 60 / 0.001 / 3 / 0.9 / 0.97 / 0.001 / 1.0; iNaturalist ResNet: 128 / 60 / 0.001 / 3 / 0.9 / 0.97 / 0.001 / 0.5. |
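For context on the "prediction set size" metric that DPSM minimizes, here is a minimal split-conformal sketch on a toy problem. This is not DPSM (Algorithm 1 is not reproduced here); the toy model, seed, and all names below are hypothetical, and the nonconformity score is the standard `1 - softmax probability of the true class`. It shows how a calibration quantile turns per-class scores into a prediction set whose size is the efficiency metric reported in the review above.

```python
# Minimal split-conformal sketch (NOT the paper's DPSM method).
# A toy "model" produces softmax scores; calibration scores set a
# threshold qhat, and the prediction set collects classes clearing it.
import math
import random

random.seed(0)
NUM_CLASSES, ALPHA = 5, 0.1  # toy setting; the paper uses 100+ classes

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(true_label):
    # Toy classifier: random logits with the true class nudged upward.
    z = [random.gauss(0.0, 1.0) for _ in range(NUM_CLASSES)]
    z[true_label] += 2.0
    return softmax(z)

# Calibration: nonconformity = 1 - softmax probability of the true class.
cal_scores = []
for _ in range(1000):
    y = random.randrange(NUM_CLASSES)
    cal_scores.append(1.0 - predict(y)[y])

# Conformal quantile at rank ceil((n + 1) * (1 - alpha)).
n = len(cal_scores)
k = math.ceil((n + 1) * (1 - ALPHA))
qhat = sorted(cal_scores)[min(k, n) - 1]

# Prediction set for a fresh example: classes whose score is below qhat.
probs = predict(0)
pred_set = [c for c in range(NUM_CLASSES) if 1.0 - probs[c] <= qhat]
print("threshold:", round(qhat, 3), "| set size:", len(pred_set))
```

A smaller average set size at the same coverage level is exactly the efficiency gain (the 20.46% reduction) that the review's Research Type row reports for DPSM over prior conformal training baselines.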
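The CIFAR-100 split sizes in the Dataset Splits row (45,000 / 5,000 / 3,000 / 7,000) can be sketched as an index partition. This is a hypothetical reconstruction, assuming the standard 50k-train / 10k-test CIFAR-100 layout with validation carved from the training pool and calibration/test carved from the test pool; the paper's actual split procedure and seed are not specified.

```python
# Hypothetical index partition matching the reported CIFAR-100 split sizes.
# Assumes the standard 50,000-train / 10,000-test CIFAR-100 layout.
import random

rng = random.Random(0)  # fixed seed for a reproducible partition

train_pool = list(range(50_000))
test_pool = list(range(10_000))
rng.shuffle(train_pool)
rng.shuffle(test_pool)

splits = {
    "train": train_pool[:45_000],
    "validation": train_pool[45_000:],   # 5,000 examples
    "calibration": test_pool[:3_000],
    "test": test_pool[3_000:],           # 7,000 examples
}

for name, idx in splits.items():
    print(name, len(idx))
```

The calibration split is the piece specific to conformal methods: it is held out from both training and final evaluation so the conformal threshold remains valid on the test split.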