Data-centric Machine Learning Research (DMLR) - 2025

Website:

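| Venue | Year | Papers | Repro. Score | Doc. Mean | Doc. Median | Dataset Doc. | Code Doc. | Other Doc. | % Empirical | % Industry | Website |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DMLR | 2025 | 13 | 0.76 | 4.55 | 5.0 | 1.82 | 0.82 | 1.91 | 84.62% | 18.18% | |

Repro. Score: Reproducibility Score based on Gundersen et al. (2025)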
Documentation variables: Pseudocode, Open Source Code, Open Datasets, Dataset Splits, Hardware Specification, Software Dependencies, Experiment Setup
| Paper | Score |
|---|---|
| Challenge design roadmap | 4 |
| Chronicling Germany: An Annotated Historical Newspaper Dataset | 5 |
| Constructing Confidence Intervals for “the” Generalization Error – a Comprehensive Benchmark Study | 7 |
| Data Acquisition: A New Frontier in Data-centric AI | 2 |
| Deep Learning for Accurate Diagnosis of Viral Infections through scRNA-seq Analysis: A Comprehensive Benchmark Study | 1 |
| FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries | 5 |
| MONSTER: Monash Scalable Time Series Evaluation Repository | 5 |
| SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning | 5 |
| Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs | 6 |
| Text Quality-Based Pruning for Efficient Training of Language Models | 3 |
| The FIX Benchmark: Extracting Features Interpretable to eXperts | 4 |
| Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions | 0 |
| V-LoL: A Diagnostic Dataset for Visual Logical Learning | 7 |
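
The summary figures can be cross-checked against the per-paper scores listed above. The sketch below is a minimal sanity check, not the page's own method. It rests on two assumptions: the number after each title is that paper's documentation score, and the reported Doc. Mean may be taken over empirical papers only. The source does not say which papers count as empirical, so the code does not try to identify them. All variable names are illustrative.

```python
from statistics import mean, median

# Per-paper scores in listing order (assumed to be documentation scores).
scores = [4, 5, 7, 2, 1, 5, 5, 5, 6, 3, 4, 0, 7]

# Summary values copied from the venue row.
reported_papers = 13
reported_doc_mean = 4.55
reported_pct_empirical = 84.62

# Statistics over all 13 listed papers.
all_mean = round(mean(scores), 2)
all_median = median(scores)

# Number of empirical papers implied by the reported percentage.
n_empirical = round(reported_papers * reported_pct_empirical / 100)

# The three component means should add up to the reported Doc. Mean.
component_sum = round(1.82 + 0.82 + 1.91, 2)

# Assumption: if Doc. Mean covers empirical papers only, this is the
# combined score of the papers left out of the average.
implied_excluded_total = sum(scores) - round(reported_doc_mean * n_empirical)

print(all_mean, all_median, n_empirical, component_sum, implied_excluded_total)
```

Over all 13 papers the mean is 4.15, which differs from the reported 4.55, while the median matches at 5. The reported percentage implies 11 empirical papers, and the component means (Dataset Doc., Code Doc., Other Doc.) sum to 4.55. Under the empirical-only assumption, the two excluded papers would together account for a score of 4.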