Learning Personalized Decision Support Policies
Authors: Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our computational experiments explore the utility of personalization across multiple expertise profiles. ... To validate Modiste on real users (N = 80), we conduct human subject experiments, where we explore forms of support that include expert consensus, outputs from an LLM, or predictions from a classification model. ... we demonstrate how Modiste can be used to learn personalized decision support policies online on both vision and language tasks. |
| Researcher Affiliation | Academia | ¹New York University, ²The Alan Turing Institute, ³Carnegie Mellon University, ⁴University of Cambridge |
| Pseudocode | Yes | Algorithm 1: Learning a decision support policy. 1: Input: human decision-maker h. 2: Initialization: data buffer D₀ = {}; human error estimates {b̂_{Aᵢ,0}(x; h) = 0.5 : x ∈ X, Aᵢ ∈ A}; initial policy π₁. 3: for t = 1, 2, …, T do 4: data point (xₜ, yₜ) ∈ X × Y is drawn i.i.d. from P. 5: support aₜ ∈ A is selected using policy πₜ. 6: human makes the prediction ỹₜ based on xₜ and aₜ. 7: human incurs the loss ℓ(yₜ, ỹₜ). 8: update the buffer Dₜ ← Dₜ₋₁ ∪ {(xₜ, aₜ, ℓ(yₜ, ỹₜ))}. 9: update the decision support policy: b̂_{Aᵢ,t}(x; h) ← U_b̂(b̂_{Aᵢ,t−1}(x; h), Dₜ) for all Aᵢ ∈ A (Step 1); πₜ₊₁(x) ← U_π({b̂_{Aᵢ,t}}ᵢ) (Step 2). 10: end for 11: Output: policy π^alg_λ ← π_{T+1} |
| Open Source Code | Yes | We open-source Modiste as a tool to encourage the adoption of personalized decision support policies. |
| Open Datasets | Yes | 1. CIFAR-10 (Krizhevsky 2009), a 10-class image classification dataset; 2. MMLU (Hendrycks et al. 2020), a multi-task text-based benchmark that tests for knowledge and problem-solving ability across 57 topics in both the humanities and STEM. |
| Dataset Splits | No | The paper describes how they constructed tasks for CIFAR-3A and MMLU-2A, mentioning aspects like the number of images/questions for human interaction (100 for CIFAR-3A, 60 for MMLU-2A) or how classes were corrupted, but it does not specify any training, validation, or test dataset splits for machine learning models. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Instruct GPT3.5, text-davinci-003' for the LLM support, but it does not list any specific software libraries, frameworks, or operating systems with their version numbers that would be required to replicate the experiments. |
| Experiment Setup | Yes | Via pilot studies, we found that 100 CIFAR images or 60 MMLU questions were a reasonable number of decisions to make within 20-40 minutes (a typical time limit for an online study), which we use throughout our experiments. ... Algorithm 1: ... 2: Initialization: data buffer D₀ = {}; human error estimates {b̂_{Aᵢ,0}(x; h) = 0.5 : x ∈ X, Aᵢ ∈ A}; initial policy π₁ |
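The online loop in Algorithm 1 above can be sketched in a few lines of Python. This is a minimal illustration, not the Modiste implementation: the running-average error update standing in for U_b̂, the epsilon-greedy policy standing in for U_π, and all function and variable names are assumptions chosen for clarity.

```python
import random

def learn_policy(human, sample, supports, T, prior=0.5):
    """Sketch of Algorithm 1: learn which form of decision support to show.

    human(x, a) -> the human's prediction for input x given support a
    sample()    -> draws one (x, y) pair i.i.d. from the task distribution P
    supports    -> the available forms of support A (e.g. none, expert, LLM)
    """
    # Per-support human error estimates b̂, initialized to the 0.5 prior;
    # counts start at 1 so the prior acts as one pseudo-observation.
    est = {a: prior for a in supports}
    counts = {a: 1 for a in supports}
    buffer = []  # data buffer D_t

    for t in range(T):
        x, y = sample()
        # Policy pi_t: epsilon-greedy over estimated error (illustrative U_pi).
        if random.random() < 0.1:
            a = random.choice(supports)
        else:
            a = min(supports, key=lambda s: est[s])
        y_pred = human(x, a)            # human predicts using support a
        loss = float(y_pred != y)       # 0-1 loss l(y_t, y~_t)
        buffer.append((x, a, loss))     # D_t <- D_{t-1} u {(x_t, a_t, loss)}
        # Step 1: running-average update of the error estimate (stands in for U_b).
        counts[a] += 1
        est[a] += (loss - est[a]) / counts[a]

    # Output policy pi_{T+1}: recommend the support with lowest estimated error.
    return lambda x: min(supports, key=lambda s: est[s])
```

For example, simulating a decision-maker who is accurate only when shown expert consensus drives the learned policy to select that support. Note this sketch ignores the per-support costs and the contextual (per-x) error estimates the paper's policies account for.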