Confounding-Robust Deferral Policy Learning
Authors: Ruijiang Gao, Mingzhang Yin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical and theoretical analyses demonstrate the efficacy of our approach in mitigating unobserved confounding and improving the overall performance of human-AI collaborations. ... We report empirical findings to examine the advantages of Human-AI complementary and being robust to unobserved confounding. Our first experiment demonstrates the benefit of human-AI collaboration within a controlled environment. Our subsequent experiments consider two real-world examples in financial lending and healthcare industry. ... 5.1 Synthetic Experiment ... 5.2 Real-World Examples ... 5.3 Real Human Responses ... 5.4 Ablation Studies |
| Researcher Affiliation | Academia | 1Naveen Jindal School of Management, University of Texas at Dallas, Richardson, TX 75082 2Warrington College of Business, University of Florida, Gainesville, FL 32611 EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Confounding-Robust Deferral Collaboration (Conf HAI/Conf HAIPerson) |
| Open Source Code | Yes | Code and appendix are available at https://github.com/ruijiang81/Confound_L2D. |
| Open Datasets | Yes | We use the Home Equity Line of Credit(HELOC) dataset which contains anonymized information about credit applications by real homeowners. ... We use the data from the International Stroke Trial (Group 1997) ... We use the scientific annotation dataset FOCUS (Rzhetsky, Shatkay, and Wilbur 2009) |
| Dataset Splits | No | The paper mentions 'We train a logistic regression on 10% of the data to simulate nominal policies' for the HELOC dataset, but does not provide specific train/test/validation splits (e.g., percentages or counts) for the experiments conducted. No explicit splitting methodology or predefined splits are mentioned for any of the datasets used. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware (e.g., GPU models, CPU types, memory details, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation or experimentation. |
| Experiment Setup | Yes | We use the logistic policies for the policy and router model classes. The baseline policy is set as the never-treat policy πc(0|x) = 1 (Kallus and Zhou 2018). ... We set log(Γ) = 2.5, C(x) = 0 and vary the log-confounding parameter in {0.01, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4}. ... We simulate three human workers with log(Γ) = 1, 2.5, 4, respectively ... We assume there are three human decision makers with log(Γ) = [0.1, 0.1, 1] ... We train a logistic regression on 10% of the data to simulate nominal policies ... For each experiment, we try three log(Γ) specifications: [0.1, 0.1, 0.1], [0.1, 0.1, 1] and [1, 1, 1] ... We use the synthetic data setup and vary the human cost from 0 to 0.3. |
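As a rough illustration of the quoted setup (a logistic nominal policy, a never-treat baseline πc(0|x) = 1, and a grid of log-confounding parameters log(Γ)), the sketch below is a hypothetical reconstruction under the marginal-sensitivity-model convention common in this literature, not the authors' released code; the data-generating process and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic covariates and a logistic nominal policy class.
n, d = 1000, 5
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)

def logistic_policy(X, beta):
    """Probability of treating, pi(a=1|x), under a logistic policy."""
    return 1.0 / (1.0 + np.exp(-X @ beta))

pi_nominal = logistic_policy(X, beta)

# Baseline never-treat policy: pi_c(0|x) = 1, i.e. treat with probability 0.
pi_baseline = np.zeros(n)

# Log-confounding grid quoted from the paper's setup.
log_gamma_grid = [0.01, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]

# Under a marginal sensitivity model, an unobserved confounder with
# parameter Gamma can shift each inverse-propensity weight w within
# the interval [w / Gamma, w * Gamma].
bounds = []
for log_gamma in log_gamma_grid:
    gamma = np.exp(log_gamma)
    w = 1.0 / np.clip(pi_nominal, 1e-6, 1 - 1e-6)
    w_lo, w_hi = w / gamma, w * gamma  # per-sample weight bounds
    bounds.append((w_lo, w_hi))
```

Larger log(Γ) widens the weight intervals, which is what makes the learned deferral policy robust to stronger hypothesized confounding.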