Algorithms with Calibrated Machine Learning Predictions

Authors: Judy Hanwen Shen, Ellen Vitercik, Anders Wikum

ICML 2025

Reproducibility assessment: each variable below lists the assessed result, followed by the supporting LLM response quoted from the paper.
Research Type: Experimental
"Evaluations on real-world data validate our theoretical findings, highlighting the practical impact of calibration for algorithms with predictions." ... "We validate our theoretical findings with strong empirical results on real-world data, highlighting the practical benefits of our approach." ... "We now evaluate our algorithms on two real-world datasets, demonstrating the utility of using calibrated predictions."
Researcher Affiliation: Academia
"1 Department of Computer Science, Stanford University, Stanford, CA, USA; 2 Department of Management Science & Engineering, Stanford University, Stanford, CA, USA. Correspondence to: Anders Wikum <EMAIL>."
Pseudocode: Yes
"Algorithm 1 (A_k)" ... "Algorithm 2 (Sun et al., 2024): optimal ski rental with conformal predictions" ... "Algorithm 3: β-threshold rule"
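The paper's exact pseudocode for Algorithm 3 is not reproduced in this report, but the generic β-threshold rule from the ski-rental literature is easy to sketch: rent until the cumulative rental cost reaches a β fraction of the buy price, then buy. A minimal sketch under that assumption (the threshold form and names here are ours, not necessarily the paper's):

```python
import math

def beta_threshold_cost(n_days: int, buy_cost: int, beta: float) -> int:
    """Cost of the beta-threshold rule: rent through day ceil(beta * buy_cost) - 1,
    then buy at the start of day ceil(beta * buy_cost) if skiing continues."""
    threshold = math.ceil(beta * buy_cost)  # day on which we would buy
    if n_days < threshold:
        return n_days  # season ended first: rented every day, never bought
    return (threshold - 1) + buy_cost  # rented threshold-1 days, then bought
```

With β = 1 this is the classic break-even rule, which is 2-competitive: in the worst case (the season ends the day we buy) we pay (b − 1) + b < 2b, while the offline optimum pays min(n, b) = b.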
Open Source Code: Yes
"Code and data available here: https://github.com/heyyjudes/algs-cali-pred"
Open Datasets: Yes
"To model the rent-or-buy scenario in the ski rental problem, we use publicly available Citi Bike usage data." ... "Monthly usage data is publicly available at https://citibikenyc.com/system-data." ... "We use a real-world dataset for sepsis prediction to validate our theory results for scheduling with calibrated predictions. Sepsis Survival Minimal Clinical Records. This dataset contains three characteristics: age, sex, and number of sepsis episodes." ... "https://archive.ics.uci.edu/dataset/827/sepsis+survival+minimal+clinical+records"
Dataset Splits: No
The paper mentions using a "validation set" for calibration in Appendix C: "A key intervention we make for calibration is to calibrate according to balanced classes in the validation set when the label distribution is highly skewed." However, it does not specify how the training, validation, and test splits were constructed (e.g., exact percentages or sample counts), either in the main text or in the appendix.
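The balanced-class intervention quoted above is most naturally implemented by downsampling the majority class in the validation set before fitting the calibrator. A minimal sketch of that idea (the procedure is our guess, since the paper omits the split details):

```python
import numpy as np

def balance_classes(scores, labels, seed=0):
    """Downsample the majority class so the calibration set has equal class counts."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n = min(len(pos), len(neg))  # size of the smaller class
    keep = np.concatenate([rng.choice(pos, n, replace=False),
                           rng.choice(neg, n, replace=False)])
    return scores[keep], labels[keep]
```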
Hardware Specification: No
The paper does not provide hardware details such as GPU/CPU models, memory amounts, or other machine specifications used to run its experiments.
Software Dependencies: No
The paper names software components such as XGBoost, logistic regression, multi-layer perceptrons, Linear Regression, Bayesian Ridge Regression, SGD Regressor, and Elastic Net, along with calibration methods such as histogram calibration, binned calibration, and Platt scaling. However, it does not specify version numbers for any of these dependencies, which replication would require.
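Version numbers aside, the calibration methods named above are standard. For instance, histogram (binned) calibration maps each raw model score to the empirical positive rate of its score bin. A minimal NumPy sketch (the bin count and fallback rule are our own choices, not taken from the paper):

```python
import numpy as np

def fit_histogram_calibrator(scores, labels, n_bins=10):
    """Learn a binned calibration map: score bin -> empirical positive rate."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(scores, edges[1:-1]), 0, n_bins - 1)
    rates = np.empty(n_bins)
    for b in range(n_bins):
        mask = bin_idx == b
        # Fall back to the bin midpoint when a bin receives no samples.
        rates[b] = labels[mask].mean() if mask.any() else (edges[b] + edges[b + 1]) / 2
    return edges, rates

def apply_histogram_calibrator(scores, edges, rates):
    """Replace each score with the positive rate of its bin."""
    bin_idx = np.clip(np.digitize(scores, edges[1:-1]), 0, len(rates) - 1)
    return rates[bin_idx]
```

Platt scaling instead fits a logistic map sigmoid(a·s + b) to the scores, but the binned version suffices to illustrate the calibration step.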
Experiment Setup: No
The paper provides some model-architecture details, such as "a small MLP with two hidden layers of size 8 and 2," and lists the features used for training. However, it does not give concrete training hyperparameters (learning rates, batch sizes, number of epochs, or optimizer settings) for any of the machine learning models used in the experiments.
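The one architectural detail the paper does give (hidden layers of size 8 and 2) pins down the forward pass up to the choice of activations. A minimal sketch under assumed ReLU hidden units and a sigmoid output (neither is stated in the paper):

```python
import numpy as np

def init_mlp(n_features, seed=0):
    """Random weights for an MLP with hidden layers of size 8 and 2 (per the paper)."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, 8, 2, 1]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, params):
    """Assumed activations: ReLU hidden layers, sigmoid output probability."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))
```

Without the missing learning rate, batch size, epoch count, and optimizer, the training loop itself cannot be reconstructed, which is exactly the gap this row flags.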