Simple Calibration via Geodesic Kernels
Authors: Jayanta Dey, Haoyin Xu, Ashwin De Silva, Joshua T Vogelstein
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on both tabular and vision benchmarks show that the proposed approaches, namely Kernel Density Forest (KDF) and Kernel Density Network (KDN), obtain well-calibrated posteriors for both ID and OOD samples, while mostly preserving the classification accuracy and extrapolating beyond the training data to handle OOD inputs appropriately. |
| Researcher Affiliation | Academia | Jayanta Dey, Haoyin Xu, Ashwin De Silva, and Joshua T. Vogelstein: Department of Biomedical Engineering, Johns Hopkins University |
| Pseudocode | Yes | Algorithm 1: Computing Geodesic Kernel |
| Open Source Code | Yes | Our code, including the package and the approach proposed in this manuscript, is available from https://github.com/neurodata/kdg. |
| Open Datasets | Yes | Our experiments on both tabular and vision benchmarks show that the proposed approaches, namely Kernel Density Forest (KDF) and Kernel Density Network (KDN), obtain well-calibrated posteriors for both ID and OOD samples, while mostly preserving the classification accuracy and extrapolating beyond the training data to handle OOD inputs appropriately. [...] We conduct experiments on a two-dimensional Gaussian XOR simulation (described in Appendix D) and the 784-dimensional Fashion-MNIST dataset (from OpenML-CC18 (Bischl et al., 2017)) using fully-connected networks of varying depth and width. [...] We experiment with the popular benchmark datasets CIFAR-10, CIFAR-100, and SVHN. |
| Dataset Splits | Yes | For each approach, 70% of the training data was used to fit the model and the rest of the data was used to calibrate the model. [...] For the simulation study on tabular data, we use 6 simulation datasets. Three of the simulations are visualized in Figure 3A; see Appendix D for additional simulations and details. We sample 10,000 training samples with half of the samples from each class. |
| Hardware Specification | Yes | All computations for producing the results in Table 1 were performed using a MacBook Pro with an Apple M1 Max chip and 64 GB of RAM. |
| Software Dependencies | Yes | Software: Python 3.8, scikit-learn 0.22.0, tensorflow-macos 2.9, tensorflow-metal 0.5.0. |
| Experiment Setup | Yes | Table 3 (hyperparameters for RF and KDF): n_estimators = 500; max_depth (value not recovered); min_samples_leaf = 1; λ = 1 × 10⁻⁶; b = exp(−10⁻⁷). Table 4 (hyperparameters for ReLU-net and KDN on tabular data): number of hidden layers = 4; nodes per hidden layer = 1000; optimizer = Adam; learning rate = 3 × 10⁻⁴; b = exp(−10⁻⁷). |
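The Dataset Splits and Experiment Setup rows above can be combined into a runnable sketch: 70% of the training data fits the model and the remaining 30% is held out for calibration, with the RF hyperparameters from Table 3 (n_estimators = 500, min_samples_leaf = 1). This is only an illustration of the reported protocol, not the authors' code; the synthetic two-class dataset and all variable names here are assumptions, standing in for the paper's simulation data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative two-class dataset: 10,000 samples, balanced classes,
# mirroring the sample count quoted in the Dataset Splits row.
X, y = make_classification(n_samples=10_000, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=0)

# 70% to fit the model, 30% held out to calibrate it.
X_fit, X_cal, y_fit, y_cal = train_test_split(
    X, y, train_size=0.7, random_state=0)

# RF hyperparameters per Table 3 (max_depth left at its default,
# since its value was not recovered from the extraction).
rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=1,
                            random_state=0)
rf.fit(X_fit, y_fit)

# Posteriors on the held-out split, the inputs a calibration step
# such as KDF would consume.
posteriors = rf.predict_proba(X_cal)
print(posteriors.shape)  # (3000, 2)
```

The split sizes (7,000 fit / 3,000 calibration) follow directly from `train_size=0.7` on 10,000 samples.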