Confounding-Robust Deferral Policy Learning

Authors: Ruijiang Gao, Mingzhang Yin

AAAI 2025

Reproducibility assessment (variable — result — LLM response):

- Research Type — Experimental. "The empirical and theoretical analyses demonstrate the efficacy of our approach in mitigating unobserved confounding and improving the overall performance of human-AI collaborations. ... We report empirical findings to examine the advantages of Human-AI complementary and being robust to unobserved confounding. Our first experiment demonstrates the benefit of human-AI collaboration within a controlled environment. Our subsequent experiments consider two real-world examples in financial lending and healthcare industry. ... 5.1 Synthetic Experiment ... 5.2 Real-World Examples ... 5.3 Real Human Responses ... 5.4 Ablation Studies"
- Researcher Affiliation — Academia. "1Naveen Jindal School of Management, University of Texas at Dallas, Richardson, TX 75082; 2Warrington College of Business, University of Florida, Gainesville, FL 32611; EMAIL, EMAIL"
- Pseudocode — Yes. "Algorithm 1: Confounding-Robust Deferral Collaboration (Conf HAI/Conf HAIPerson)"
- Open Source Code — Yes. "Code and appendix are available at https://github.com/ruijiang81/Confound_L2D."
- Open Datasets — Yes. "We use the Home Equity Line of Credit (HELOC) dataset which contains anonymized information about credit applications by real homeowners. ... We use the data from the International Stroke Trial (Group 1997) ... We use the scientific annotation dataset FOCUS (Rzhetsky, Shatkay, and Wilbur 2009)"
- Dataset Splits — No. The paper mentions "We train a logistic regression on 10% of the data to simulate nominal policies" for the HELOC dataset, but does not provide specific train/test/validation splits (e.g., percentages or counts) for the experiments conducted. No explicit splitting methodology or predefined splits are mentioned for any of the datasets used.
- Hardware Specification — No. The paper does not explicitly describe any specific hardware (e.g., GPU models, CPU types, memory details, or cloud instance specifications) used for running the experiments.
- Software Dependencies — No. The paper does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation or experimentation.
- Experiment Setup — Yes. "We use the logistic policies for the policy and router model classes. The baseline policy is set as the never-treat policy πc(0|x) = 1 (Kallus and Zhou 2018). ... We set log(Γ) = 2.5, C(x) = 0 and vary the log-confounding parameter in {0.01, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4}. ... We simulate three human workers with log(Γ) = 1, 2.5, 4, respectively ... We assume there are three human decision makers with log(Γ) = [0.1, 0.1, 1] ... We train a logistic regression on 10% of the data to simulate nominal policies ... For each experiment, we try three log(Γ) specifications: [0.1, 0.1, 0.1], [0.1, 0.1, 1] and [1, 1, 1] ... We use the synthetic data setup and vary the human cost from 0 to 0.3."
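The experiment-setup excerpt quoted above can be sketched in code. This is a hypothetical illustration only, not the authors' released implementation: the data dimensions, sampling, and parameterization of the logistic policy are assumptions, while the never-treat baseline πc(0|x) = 1 and the log(Γ) grid come directly from the quoted text.

```python
# Hypothetical sketch of the quoted synthetic setup; the paper's actual
# data-generating process and policy classes may differ.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5                      # assumed sample size and feature dimension
X = rng.normal(size=(n, d))        # assumed covariates


def never_treat_policy(x):
    """Baseline policy pi_c(0|x) = 1: always selects action 0."""
    return 0


def logistic_policy(x, theta):
    """Logistic policy class, used for both the policy and the router."""
    p = 1.0 / (1.0 + np.exp(-x @ theta))
    return int(rng.binomial(1, p))


# Grid of log-confounding parameters swept in the synthetic experiment.
log_gamma_grid = [0.01, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]

theta = rng.normal(size=d)         # assumed policy parameters
actions = np.array([logistic_policy(x, theta) for x in X])
baseline = np.array([never_treat_policy(x) for x in X])
```

Under this sketch, `baseline` is identically zero (the never-treat policy), and robustness would be evaluated by re-running policy learning at each value in `log_gamma_grid`.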