Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Human-Aligned Calibration for AI-Assisted Decision Making

Authors: Nina L. Corvelo Benz, Manuel Gomez Rodriguez

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we validate our theoretical results using a dataset with real expert predictions in an AI-assisted decision making scenario comprising four different binary classification tasks.
Researcher Affiliation | Academia | Nina L. Corvelo Benz (Max Planck Institute for Software Systems; ETH Zürich); Manuel Gomez Rodriguez (Max Planck Institute for Software Systems)
Pseudocode | Yes | Refer to Algorithm 1 in Appendix B for a pseudocode of the algorithm.
Open Source Code | Yes | We release the code to reproduce our analysis at https://github.com/Networks-Learning/human-aligned-calibration.
Open Datasets | Yes | We experiment with the publicly available Human-AI Interactions dataset [35].
Dataset Splits | No | The paper does not provide specific percentages or counts for train/validation/test splits, nor does it cite predefined splits. It mentions using a 'calibration set' but gives no details on how it was constructed or split.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming language or library versions).
Experiment Setup | No | The paper describes how confidence values were discretized and binned for analysis, but it does not provide specific setup details such as model-training hyperparameters, optimizer settings, or other system-level training configurations.