reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Human-Aligned Calibration for AI-Assisted Decision Making

Authors: Nina L. Corvelo Benz, Manuel Gomez Rodriguez

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we validate our theoretical results using a dataset with real expert predictions in an AI-assisted decision making scenario comprising four different binary classification tasks.
Researcher Affiliation	Academia	Nina L. Corvelo Benz (Max Planck Institute for Software Systems; ETH Zürich); Manuel Gomez Rodriguez (Max Planck Institute for Software Systems)
Pseudocode	Yes	Refer to Algorithm 1 in Appendix B for a pseudocode of the algorithm.
Open Source Code	Yes	We release the code to reproduce our analysis at https://github.com/Networks-Learning/human-aligned-calibration.
Open Datasets	Yes	We experiment with the publicly available Human-AI Interactions dataset [35].
Dataset Splits	No	The paper does not provide specific percentages or counts for train/validation/test splits, nor does it reference predefined splits with citations for reproducibility. It mentions using a 'calibration set' but without detailed split information.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., specific programming language versions or library versions).
Experiment Setup	No	The paper describes how confidence values were discretized and binned for analysis, but it does not provide specific experimental setup details such as hyperparameters for model training, optimizer settings, or other system-level training configurations.