Optimal transport-based conformal prediction

Authors: Gauthier Thurin, Kimia Nadjahi, Claire Boyer

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we evaluate our method on practical regression and classification problems, illustrating its advantages in terms of (conditional) coverage and efficiency. [...] Numerical experiments. In what follows, we study a practical regression problem and compare several CP methods described above: OT-CP for forming prediction regions as in (8), a CP approach producing ellipses (ELL, Johnstone & Cox, 2021), and a simple method creating hyperrectangles (REC, Neeven & Smirnov, 2018), with the miscoverage level adjusted by the Bonferroni correction. We simulate univariate inputs X ∼ Unif([0, 2]) with responses Y ∈ ℝ², and we assume that we are given a pre-trained predictor f̂(x) = (2x², (x + 1)²), x ∈ ℝ. [...] We also compare the methods in terms of empirical coverage on test data (Figure 2(c)) and efficiency (volume of prediction regions, Figure 2(d))."
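The REC baseline quoted above — split conformal prediction applied per output coordinate, with the miscoverage level Bonferroni-corrected to α/d — can be sketched on the paper's simulated setup. This is a minimal illustration, not the authors' code; the Gaussian noise model and all variable names are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_hat(x):
    # Pre-trained predictor from the paper's simulation: f(x) = (2x^2, (x+1)^2)
    return np.stack([2 * x**2, (x + 1) ** 2], axis=-1)

def sample(n):
    # X ~ Unif([0, 2]); Y in R^2 is the predictor's output plus Gaussian noise
    # (the noise model is an assumption for illustration only)
    x = rng.uniform(0, 2, size=n)
    y = f_hat(x) + rng.normal(scale=0.5, size=(n, 2))
    return x, y

alpha = 0.1              # target miscoverage 10% (i.e. 90% coverage)
d = 2                    # output dimension
alpha_bonf = alpha / d   # Bonferroni correction: alpha/d per coordinate

# Calibration: per-coordinate absolute residuals on held-out data
x_cal, y_cal = sample(1000)
res = np.abs(y_cal - f_hat(x_cal))            # shape (n, d)
n = len(x_cal)
# Split-conformal quantile level with the finite-sample correction
level = np.ceil((n + 1) * (1 - alpha_bonf)) / n
q = np.quantile(res, level, axis=0)           # one half-width per coordinate

# Test: a point is covered if every coordinate lies within its interval,
# so the prediction region is a hyperrectangle around f_hat(x)
x_te, y_te = sample(5000)
inside = np.all(np.abs(y_te - f_hat(x_te)) <= q, axis=1)
coverage = inside.mean()
print(f"empirical coverage: {coverage:.3f} (target >= {1 - alpha})")
```

By the union bound, the joint coverage of the hyperrectangle is at least 1 − α, which is why REC tends to over-cover; the paper's volume comparison (Figure 2(d)) measures the resulting loss of efficiency.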
Researcher Affiliation | Academia | "¹CNRS, École Normale Supérieure, Paris, France ²Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Saclay, France, and Institut universitaire de France. Correspondence to: Gauthier Thurin <EMAIL>."
Pseudocode | No | The paper describes the methodology in prose and bullet points, but does not include any explicitly labeled pseudocode, algorithm blocks, or similarly structured step-by-step procedures.
Open Source Code | Yes | "The code used to produce the results in this paper can be accessed at this GitHub repository."
Open Datasets | Yes | "Next, we evaluate OT-CP+ on real datasets sourced from Mulan (Tsoumakas et al., 2011), with dataset statistics summarized in Table 1. [...] In Figure 8 and Figure 9, we present the results for a random forest on MNIST and Fashion-MNIST."
Dataset Splits | Yes | "We split each dataset into training, calibration, and testing subsets (50%/25%/25% ratio) and train a random forest model as the regressor. [...] We used 25,000 data points split into train/calibration/test with ratio 10%/45%/45%, since this is sufficient for the classifier to reach 90% accuracy and to ensure a reasonable size for the test data. [...] Results in Figures 15 and 16 are averaged over 10 runs, each with 10,000 randomly chosen observations split into train/calibration/test with ratio 50%/40%/10%."
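The three-way splits quoted above are plain random partitions of the data. A minimal sketch, using the 50%/25%/25% example (the helper name and seed handling are ours, not the paper's):

```python
import numpy as np

def three_way_split(n, ratios=(0.5, 0.25, 0.25), seed=0):
    """Randomly partition indices 0..n-1 into train/calibration/test subsets."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # random shuffle of all indices
    n_tr = int(ratios[0] * n)
    n_cal = int(ratios[1] * n)
    # Consecutive slices of the shuffled indices form the three subsets
    return perm[:n_tr], perm[n_tr:n_tr + n_cal], perm[n_tr + n_cal:]

train, cal, test = three_way_split(1000)
print(len(train), len(cal), len(test))  # 500 250 250
```

In split conformal prediction the calibration subset must be disjoint from the training data: the residuals used to set the quantile threshold have to come from points the model never saw.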
Hardware Specification | No | The paper does not provide specific hardware details such as CPU, GPU models, or memory used for running the experiments.
Software Dependencies | No | "In all of our experiments, optimal transport problems are solved using the network simplex method implemented in the Python Optimal Transport library (Flamary et al., 2021). [...] random forest classifier implemented with the Python library scikit-learn."
Experiment Setup | Yes | "Quantile regions for α = 0.9 are constructed using n = 1000 calibration instances. [...] Both methods use a kNN step that selects 10% of the calibration set as neighbors for each test point Xtest. [...] We start by simulating data according to a Gaussian mixture model, represented in Figure 7(a), and we consider a pretrained classifier based on Quadratic Discriminant Analysis. [...] a random forest classifier implemented with the Python library scikit-learn."
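The kNN step quoted above — selecting 10% of the calibration set as neighbors of each test point, so that conformal quantities can be computed locally — can be sketched as follows. The function name and the brute-force Euclidean distance computation are our own assumptions, not the paper's implementation:

```python
import numpy as np

def knn_calibration_neighbors(x_cal, x_test, frac=0.10):
    """For each test point, return the indices of its nearest calibration
    points, with k set to `frac` of the calibration set (10% as quoted)."""
    k = max(1, int(np.ceil(frac * len(x_cal))))
    # Pairwise Euclidean distances, shape (n_test, n_cal)
    dists = np.linalg.norm(x_test[:, None, :] - x_cal[None, :, :], axis=-1)
    # k nearest calibration indices per test point, closest first
    return np.argsort(dists, axis=1)[:, :k]

rng = np.random.default_rng(0)
x_cal = rng.normal(size=(200, 2))
x_test = rng.normal(size=(5, 2))
idx = knn_calibration_neighbors(x_cal, x_test)
print(idx.shape)  # (5, 20): 20 neighbors = 10% of 200 calibration points
```

Restricting calibration to each test point's neighborhood is what lets the method target conditional (rather than only marginal) coverage, at the cost of fewer effective calibration points per region.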