Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Human-Aligned Calibration for AI-Assisted Decision Making
Authors: Nina L. Corvelo Benz, Manuel Gomez Rodriguez
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate our theoretical results using a dataset with real expert predictions in an AI-assisted decision making scenario comprising four different binary classification tasks. |
| Researcher Affiliation | Academia | Nina L. Corvelo Benz Max Planck Institute for Software Systems ETH Zürich EMAIL Manuel Gomez Rodriguez Max Planck Institute for Software Systems EMAIL |
| Pseudocode | Yes | Refer to Algorithm 1 in Appendix B for a pseudocode of the algorithm. |
| Open Source Code | Yes | We release the code to reproduce our analysis at https://github.com/Networks-Learning/human-aligned-calibration. |
| Open Datasets | Yes | We experiment with the publicly available Human-AI Interactions dataset [35]. |
| Dataset Splits | No | The paper does not provide specific percentages or counts for train/validation/test splits, nor does it cite predefined splits that would allow reproduction. It mentions using a 'calibration set' but gives no detailed split information. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming language versions or library versions). |
| Experiment Setup | No | The paper describes how confidence values were discretized and binned for analysis, but it does not provide specific experimental setup details such as hyperparameters for model training, optimizer settings, or other system-level training configurations. |