Data-centric Machine Learning Research (DMLR) - 2024

Venue: DMLR
Year: 2024
Papers: 27
Repro. Score (Reproducibility Score based on Gundersen et al. (2025)): 0.71
Doc. Mean: 4.4
Doc. Median: 5.0
Dataset Doc.: 1.76
Code Doc.: 0.76
Other Doc.: 1.88
% Empirical: 92.59%
% Industry: 56.0%

Pseudocode, Open Source Code, Open Datasets, Dataset Splits, Hardware Specification, Software Dependencies, Experiment Setup

ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications | 5
Benchmarking Edge Regression on Temporal Networks | 4
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | 4
Building Better Datasets: Seven Recommendations for Responsible Design from Dataset Creators | 0
ComPile: A Large IR Dataset from Production Sources | 4
DMLR: Data-centric Machine Learning Research - Past, Present and Future | 1
Datasets and Benchmarks for Offline Safe Reinforcement Learning | 4
Deep Neural Network Benchmarks for Selective Classification | 5
Detecting Errors in a Numerical Response via any Regression Model | 4
Evaluating Durability: Benchmark Insights into Image and Text Watermarking | 5
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things | 5
Forecasting Electric Vehicle Charging Station Occupancy: Smarter Mobility Data Challenge | 4
GlycoNMR: Dataset and Benchmark of Carbohydrate-Specific NMR Chemical Shift for Machine Learning Research | 5
Highlighting Challenges of State-of-the-Art Semantic Segmentation with HAIR - A Dataset of Historical Aerial Images | 5
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning | 5
NAFlora-1M: Continental-Scale High-Resolution Fine-Grained Plant Classification Dataset | 5
On Catastrophic Inheritance of Large Foundation Models | 0
On Minimizing the Training Set Fill Distance in Machine Learning Regression | 6
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection | 5
Potion: Towards Poison Unlearning | 5
Properties of Alternative Data for Fairer Credit Risk Predictions | 3
Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery | 5
The Matrix Reloaded: Towards Counterfactual Group Fairness in Machine Learning | 5
The Nine Lives of ImageNet: A Sociotechnical Retrospective of a Foundation Dataset and the Limits of Automated Essentialism | 1
VALUED - Vision and Logical Understanding Evaluation Dataset | 5
When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective | 6
You can't handle the (dirty) truth: Data-centric Insights Improve Pseudo-Labeling | 5