Learning Personalized Decision Support Policies

Authors: Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar

AAAI 2025

Each entry below lists a reproducibility variable, the assessed result, and the supporting LLM response:
Research Type: Experimental
  "Our computational experiments explore the utility of personalization across multiple expertise profiles. ... To validate Modiste on real users (N = 80), we conduct human subject experiments, where we explore forms of support that include expert consensus, outputs from an LLM, or predictions from a classification model. ... we demonstrate how Modiste can be used to learn personalized decision support policies online on both vision and language tasks."
Researcher Affiliation: Academia
  New York University; The Alan Turing Institute; Carnegie Mellon University; University of Cambridge
Pseudocode: Yes
  Algorithm 1: Learning a decision support policy
  1: Input: human decision-maker h
  2: Initialization: data buffer D_0 = {}; human error estimates {b̂_{A_i,0}(x; h) = 0.5 : x ∈ X, A_i ∈ A}; initial policy π_1
  3: for t = 1, 2, ..., T do
  4:   data point (x_t, y_t) ∈ X × Y is drawn i.i.d. from P
  5:   support a_t ∈ A is selected using policy π_t
  6:   human makes the prediction ỹ_t based on x_t and a_t
  7:   human incurs the loss ℓ(y_t, ỹ_t)
  8:   update the buffer D_t ← D_{t-1} ∪ {(x_t, a_t, ℓ(y_t, ỹ_t))}
  9:   update the decision support policy:
         b̂_{A_i,t}(x; h) ← U_r(b̂_{A_i,t-1}(x; h), D_t), ∀ A_i ∈ A   (Step 1)
         π_{t+1}(x) ← U_π({b̂_{A_i,t}}_i)   (Step 2)
  10: end for
  11: Output: policy π^alg_λ ← π_{T+1}
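As a concrete illustration of the loop in Algorithm 1, here is a minimal runnable sketch. It is not the authors' implementation: the support forms in `SUPPORTS`, the function names (`learn_policy`, `human_predict`), treating each input `x` directly as a lookup key, a running-mean update standing in for U_r, and greedy min-estimated-error selection standing in for U_π are all simplifying assumptions.

```python
from collections import defaultdict

# Assumed forms of decision support (the paper's A); names are hypothetical.
SUPPORTS = ["no_support", "expert_consensus", "llm_output"]

def learn_policy(stream, human_predict, loss, T=100):
    """Online loop sketching Algorithm 1.

    stream        -- iterator yielding (x_t, y_t) pairs drawn from P
    human_predict -- callable (x, a) -> human prediction given support a
    loss          -- callable (y, y_tilde) -> incurred loss
    """
    # Initialization: every error estimate starts at 0.5, as in line 2.
    b_hat = {a: defaultdict(lambda: 0.5) for a in SUPPORTS}
    counts = {a: defaultdict(int) for a in SUPPORTS}
    buffer = []  # D_t

    for t in range(T):
        x, y = next(stream)                           # line 4
        # Stand-in for pi_t: pick the support with lowest estimated error.
        a = min(SUPPORTS, key=lambda s: b_hat[s][x])  # line 5
        y_tilde = human_predict(x, a)                 # line 6
        l = loss(y, y_tilde)                          # line 7
        buffer.append((x, a, l))                      # line 8
        # Stand-in for U_r (Step 1): incremental running mean of observed
        # losses; note the 0.5 prior is replaced on the first observation.
        counts[a][x] += 1
        b_hat[a][x] += (l - b_hat[a][x]) / counts[a][x]

    # Output policy pi_{T+1}: greedy over the final error estimates.
    return lambda x: min(SUPPORTS, key=lambda s: b_hat[s][x])
```

For example, if a simulated human only answers correctly when shown LLM output, the learned policy converges to selecting that support for the observed inputs.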
Open Source Code: Yes
  "We open-source Modiste as a tool to encourage the adoption of personalized decision support policies."
Open Datasets: Yes
  1. CIFAR-10 (Krizhevsky 2009), a 10-class image classification dataset.
  2. MMLU (Hendrycks et al. 2020), a multi-task text-based benchmark that tests for knowledge and problem-solving ability across 57 topics in both the humanities and STEM.
Dataset Splits: No
  The paper describes how the CIFAR-3A and MMLU-2A tasks were constructed, including the number of items shown to participants (100 images for CIFAR-3A, 60 questions for MMLU-2A) and how classes were corrupted, but it does not specify any training, validation, or test splits for machine learning models.
Hardware Specification: No
  The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies: No
  The paper mentions using "InstructGPT-3.5 (text-davinci-003)" for LLM support, but it does not list the software libraries, frameworks, or operating systems (with version numbers) required to replicate the experiments.
Experiment Setup: Yes
  "Via pilot studies, we found that 100 CIFAR images or 60 MMLU questions were a reasonable number of decisions to make within 20-40 minutes (a typical time limit for an online study), which we use throughout our experiments." ... Algorithm 1, line 2: "Initialization: data buffer D_0 = {}; human error estimates {b̂_{A_i,0}(x; h) = 0.5 : x ∈ X, A_i ∈ A}; initial policy π_1"