Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Laplace Sample Information: Data Informativeness Through a Bayesian Lens

Authors: Johannes Kaiser, Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally show that LSI is effective in ordering the data with respect to typicality, detecting mislabeled samples, measuring class-wise informativeness, and assessing dataset difficulty. We demonstrate these capabilities of LSI on image and text data in supervised and unsupervised settings.
Researcher Affiliation | Academia | Johannes Kaiser, Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis; AI in Healthcare and Medicine; Munich Center for Machine Learning (MCML); Technical University of Munich, Germany; EMAIL
Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We provide the code as well as the pre-computed values of LSI for common datasets under github.com/TUM-AIMED/LSI.
Open Datasets | Yes | We demonstrate the utility of LSI with experiments on supervised image tasks using CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), a medical imaging dataset (pediatric pneumonia, i.e. lung infection in children) (Kermany et al., 2018), and two ten-class subsets of the ImageNet dataset (Deng et al., 2009), Imagewoof and Imagenette (Howard, 2019)... Beyond image classification, we show the applicability of LSI in a text sentiment analysis task on the IMDb dataset (Maas et al., 2011)... Moreover, we compute LSI in unsupervised contrastive learning using CLIP (Radford et al., 2021) between image and caption pairs of the COCO dataset (Lin et al., 2015).
Dataset Splits | Yes | To this end, we partition each dataset into label-stratified subsets of 1/3 of the full dataset size... Figure 8: Training accuracy (left) and test accuracy (right) of models trained on subsets of CIFAR-10 containing 1/3 of the complete dataset with the highest, intermediate, and lowest LSI values compared to the dummy baseline of a model predicting the majority class.
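The split described above (label-stratified thirds holding the lowest, intermediate, and highest per-sample values) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; the function name and the assumption that each class is sorted by a per-sample score (such as LSI) before splitting are ours.

```python
import numpy as np

def score_stratified_thirds(labels, scores):
    """Partition dataset indices into three label-stratified subsets
    containing the lowest, intermediate, and highest scores per class.
    Illustrative sketch; see github.com/TUM-AIMED/LSI for the real code."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    subsets = [[], [], []]  # low, intermediate, high
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        idx = idx[np.argsort(scores[idx])]  # ascending score within class
        for part, chunk in zip(subsets, np.array_split(idx, 3)):
            part.extend(chunk.tolist())
    return [np.array(s) for s in subsets]
```

Each returned subset covers roughly 1/3 of the data and preserves the class proportions of the full dataset, which keeps the comparison between low-, mid-, and high-LSI subsets fair.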
Hardware Specification | Yes | All experiments described in this paper and its appendix are performed on an NVIDIA H100 GPU (80 GB VRAM) with 2 AMD EPYC 9354 32-core CPUs.
Software Dependencies | No | The paper mentions training parameters and model architectures but does not explicitly list software libraries with specific version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | All training (hyper-)parameters are provided in Appendix N... Table 2 (training parameters for each of the experiments):
Parameter | LSI-Distribution | Sample Difficulty/Generalization | LSI under Differential Privacy
Learning rate | 0.04 | 0.04 | 0.04
Weight decay (L2) | 0.01 | 0.01 | 0.01
Nesterov momentum | 0.9 | 0.9 | 0.9
Dataset | Full | 1/3 subsets | 1/5 subsets
Epochs/steps | 1000 | 1000 | 700
Averaged across n seeds | 3 | 3 | 5
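For reference, the reported Table 2 values can be captured as a machine-readable configuration. The key names below are our own hypothetical choices; the paper's Appendix N remains the authoritative source.

```python
# Hypothetical config mirroring the reported Table 2 values
# (key names are illustrative, not from the authors' code).
TRAINING_PARAMS = {
    "lsi_distribution": {
        "learning_rate": 0.04, "weight_decay_l2": 0.01,
        "nesterov_momentum": 0.9, "dataset": "full",
        "epochs": 1000, "seeds": 3,
    },
    "sample_difficulty_generalization": {
        "learning_rate": 0.04, "weight_decay_l2": 0.01,
        "nesterov_momentum": 0.9, "dataset": "1/3 subsets",
        "epochs": 1000, "seeds": 3,
    },
    "lsi_under_differential_privacy": {
        "learning_rate": 0.04, "weight_decay_l2": 0.01,
        "nesterov_momentum": 0.9, "dataset": "1/5 subsets",
        "epochs": 700, "seeds": 5,
    },
}
```

Note that the optimizer settings are shared across all three experiments; only the data subset size, training length, and number of seeds differ.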