reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Data-centric Machine Learning Research (DMLR) - 2024


Venue	Year	Papers	Reproducibility Score	Documentation Score	% Empirical	% Industry
DMLR	2024	27	0.71	4.4	92.59%	56.0%

Reproducibility Score: based on Gundersen et al. (2025).
Documentation Score: the global mean, i.e. the average score over the seven reproducibility variables for empirical research papers.
% Empirical: percentage of papers that are empirical research rather than theoretical research.
% Industry: percentage of empirical research papers with at least one author from industry.
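The two percentages in the summary row can be turned back into paper counts with a little arithmetic. The sketch below is only a consistency check: the counts it derives (empirical papers, papers with an industry author) are inferred from the published percentages and are not stated on the page, and the variable names are illustrative.

```python
# Values copied from the DMLR 2024 summary row.
papers = 27
pct_empirical = 92.59   # % of all papers that are empirical
pct_industry = 56.0     # % of empirical papers with an industry author

# Inferred counts (assumption: each percentage is a rounded ratio of whole papers).
empirical = round(papers * pct_empirical / 100)
industry = round(empirical * pct_industry / 100)

# Recompute the percentages from the inferred counts to confirm they round-trip.
check_empirical = round(100 * empirical / papers, 2)
check_industry = round(100 * industry / empirical, 1)

print(empirical, industry)              # 25 14
print(check_empirical, check_industry)  # 92.59 56.0
```

Under that reading, 25 of the 27 papers are counted as empirical, and 14 of those 25 have at least one industry author.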


Paper	Pseudocode	Open Source Code	Open Datasets	Dataset Splits	Hardware Specification	Software Dependencies	Experiment Setup	Score (out of 7)
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications	❌	✅	✅	✅	✅	❌	✅	5
Benchmarking Edge Regression on Temporal Networks	❌	❌	✅	✅	✅	❌	✅	4
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift	❌	✅	✅	✅	❌	❌	✅	4
Building Better Datasets: Seven Recommendations for Responsible Design from Dataset Creators	❌	❌	❌	❌	❌	❌	❌	0
ComPile: A Large IR Dataset from Production Sources	❌	✅	✅	❌	❌	✅	✅	4
DMLR: Data-centric Machine Learning Research - Past, Present and Future	❌	❌	✅	❌	❌	❌	❌	1
Datasets and Benchmarks for Offline Safe Reinforcement Learning	❌	✅	✅	❌	✅	❌	✅	4
Deep Neural Network Benchmarks for Selective Classification	❌	✅	✅	✅	✅	❌	✅	5
Detecting Errors in a Numerical Response via any Regression Model	✅	✅	✅	✅	❌	❌	❌	4
Evaluating Durability: Benchmark Insights into Image and Text Watermarking	❌	✅	✅	✅	✅	❌	✅	5
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things	❌	✅	✅	✅	✅	❌	✅	5
Forecasting Electric Vehicle Charging Station Occupancy: Smarter Mobility Data Challenge	❌	✅	✅	✅	❌	❌	✅	4
GlycoNMR: Dataset and Benchmark of Carbohydrate-Specific NMR Chemical Shift for Machine Learning Research	❌	✅	✅	✅	✅	❌	✅	5
Highlighting Challenges of State-of-the-Art Semantic Segmentation with HAIR - A Dataset of Historical Aerial Images	❌	✅	✅	✅	✅	❌	✅	5
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning	❌	✅	✅	✅	✅	❌	✅	5
NAFlora-1M: Continental-Scale High-Resolution Fine-Grained Plant Classification Dataset	❌	✅	✅	✅	✅	❌	✅	5
On Catastrophic Inheritance of Large Foundation Models	❌	❌	❌	❌	❌	❌	❌	0
On Minimizing the Training Set Fill Distance in Machine Learning Regression	✅	✅	✅	✅	✅	❌	✅	6
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection	❌	✅	✅	✅	✅	❌	✅	5
Potion: Towards Poison Unlearning	✅	❌	✅	✅	✅	❌	✅	5
Properties of Alternative Data for Fairer Credit Risk Predictions	❌	❌	❌	✅	❌	✅	✅	3
Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery	❌	✅	✅	✅	✅	❌	✅	5
The Matrix Reloaded: Towards Counterfactual Group Fairness in Machine Learning	✅	❌	✅	✅	✅	❌	✅	5
The Nine Lives of ImageNet: A Sociotechnical Retrospective of a Foundation Dataset and the Limits of Automated Essentialism	❌	❌	✅	❌	❌	❌	❌	1
VALUED - Vision and Logical Understanding Evaluation Dataset	❌	✅	✅	✅	✅	❌	✅	5
When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective	✅	✅	✅	✅	✅	❌	✅	6
You can't handle the (dirty) truth: Data-centric Insights Improve Pseudo-Labeling	✅	✅	✅	✅	✅	❌	❌	5
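In every row above, the final number equals the count of ✅ marks across the seven reproducibility variables. A minimal sketch of reading such rows programmatically follows, assuming tab-separated cells in the column order of the table header. The `parse_row` helper and the `VARIABLES` list are hypothetical names introduced here for illustration and are not part of the site; the three sample rows are copied from the table.

```python
# Column order taken from the table header; the names are used as dictionary keys.
VARIABLES = [
    "Pseudocode", "Open Source Code", "Open Datasets", "Dataset Splits",
    "Hardware Specification", "Software Dependencies", "Experiment Setup",
]

def parse_row(line):
    """Split one tab-separated row into (title, per-variable flags, reported total)."""
    cells = line.rstrip("\n").split("\t")
    title, marks, total = cells[0], cells[1:8], int(cells[8])
    flags = {name: mark == "✅" for name, mark in zip(VARIABLES, marks)}
    return title, flags, total

rows = [
    "Potion: Towards Poison Unlearning\t✅\t❌\t✅\t✅\t✅\t❌\t✅\t5",
    "ComPile: A Large IR Dataset from Production Sources\t❌\t✅\t✅\t❌\t❌\t✅\t✅\t4",
    "On Catastrophic Inheritance of Large Foundation Models\t❌\t❌\t❌\t❌\t❌\t❌\t❌\t0",
]

parsed = [parse_row(r) for r in rows]
for title, flags, total in parsed:
    counted = sum(flags.values())  # number of variables marked as satisfied
    status = "ok" if counted == total else "mismatch"
    print(f"{counted}/7 ({status}): {title}")
```

Applied to the full table, the same check makes it easy to spot any row whose reported total disagrees with its marks.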