Estimating Optimal Policy Value in Linear Contextual Bandits Beyond Gaussianity

Authors: Jonathan Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present promising experimental benefits on a semi-synthetic simulation using historical data on warfarin treatment dosage outcomes." (Section 1) "Figure 1: Estimation error of Algorithm 1 (Moment) is shown in red on a simulated high-dimensional CB domain with d = 300. ... Error bars represent standard error over 10 trials." "Figure 2: A comparison between our proposed test for treatment effect in Section 4.1.2 and a test based on a linear regression baseline (LR) with d = 600 and p = 2000. ... The right figure demonstrates the estimator on warfarin dosage data. ... Error bands represent standard error on 1000 replicates." Appendix H (Experiment details)
Researcher Affiliation | Collaboration | Jonathan N. Lee (EMAIL), Stanford University; Weihao Kong (EMAIL), Google Research; Aldo Pacchiano (EMAIL), Boston University; Vidya Muthukumar (EMAIL), Georgia Institute of Technology; Emma Brunskill (EMAIL), Stanford University
Pseudocode | Yes | Algorithm 1: Moment-Based Estimator; Algorithm 2: Estimator of Upper Bound on V; Algorithm 3: Treatment Effect Test; Algorithm 4: Model Selection with Gaussian Process Upper Bound
Open Source Code | No | The paper does not provide explicit statements about releasing code, nor does it include a link to a code repository for the described methodology.
Open Datasets | Yes | "The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) provides a publicly available dataset of patient covariates as well as their final dosages, which might be noisy or slightly suboptimal." (Section 4.1.2)
Dataset Splits | Yes | "The dataset D_n is split into two independent datasets D_m and D'_m of size m = n/2." (Proof of Lemma C.2) "Split dataset D evenly into {x_i, a_i, y_i}_{i∈[m]} and {x'_i, a'_i, y'_i}_{i∈[m]}." (Algorithm 2, Line 9) "We split the training set into two equal parts, randomly." (Section H.2.1) "Using an 80/20 split of the n samples into datasets D and D' of sizes |D| = n_in and |D'| = n_out, we compute..." (Section H.2.1)
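Both splitting schemes quoted above (the even 50/50 split and the 80/20 split) amount to shuffling sample indices and cutting at a fraction. A minimal sketch, assuming a NumPy-based shuffle; the helper name `split_dataset` and the seeding are my own choices, not the authors' code:

```python
import numpy as np

def split_dataset(n, frac=0.5, seed=0):
    """Shuffle indices 0..n-1 and cut at round(frac * n).

    Returns two disjoint index arrays (D, D') covering all n samples.
    """
    idx = np.random.default_rng(seed).permutation(n)
    m = int(round(frac * n))
    return idx[:m], idx[m:]

# Even split, as in Algorithm 2: m = n/2 samples per half.
d_half, d_half_prime = split_dataset(1000, frac=0.5)

# 80/20 split, as in Section H.2.1: |D| = n_in, |D'| = n_out.
d_in, d_out = split_dataset(1000, frac=0.8)
```

Indexing the stored (x_i, a_i, y_i) arrays by the two returned index sets then yields the independent datasets used by the estimators.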
Hardware Specification | Yes | "The experiments of Section 3.2 were run on a standard Amazon Web Services EC2 c5.xlarge instance. The experiments of Section 4.1.2 were conducted on a standard personal laptop with 16GB of memory and an Intel Core i7 processor." (Section H.3)
Software Dependencies | No | The paper does not explicitly mention any software dependencies with specific version numbers.
Experiment Setup | Yes | "We consider a simulated high-dimensional CB learning setting with K = 2 actions and d = 300 dimensions. ... We set the degree t = 2 and did not split the data (q = 1)." (Section H.1) "We approximated the max function with a polynomial of degree t = 2 by minimizing an ℓ1 loss over 2000 uniformly randomly generated points in [−2, 2]." (Section H.1) "For the linear regression baseline, we perform standard unregularized linear regression on the training dataset to learn both μ̂_a and θ̂_a for the candidate actions." (Section H.2.1) For our method, the decision was 0 if Û ≤ 0.2 and 1 otherwise; for the linear regression (LR) plug-in method, it was 0 if Ŵ ≤ 0.33·√(d/n) and 1 otherwise. (Section H.2.1)
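The degree-2 ℓ1 polynomial approximation described above can be sketched as follows. This is not the authors' implementation: the target max(z, 0) on [−2, 2] is an illustrative stand-in for the max function being approximated, and the IRLS solver for the ℓ1 fit is my own choice:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(-2.0, 2.0, size=2000)   # 2000 uniform random points in [-2, 2]
target = np.maximum(z, 0.0)             # stand-in for the max function
A = np.vander(z, 3)                     # design matrix with columns [z^2, z, 1]

# Fit a degree-2 polynomial under l1 loss via iteratively
# reweighted least squares (weights 1/|residual|).
coeffs = np.linalg.lstsq(A, target, rcond=None)[0]  # l2 fit as initializer
for _ in range(100):
    r = np.abs(A @ coeffs - target)
    sw = 1.0 / np.sqrt(np.maximum(r, 1e-8))         # sqrt of IRLS weights
    coeffs = np.linalg.lstsq(A * sw[:, None], target * sw, rcond=None)[0]

mean_l1 = np.abs(A @ coeffs - target).mean()        # final mean l1 error
```

A linear-programming formulation would recover the exact ℓ1 minimizer; the IRLS loop above is a lightweight approximation that needs only NumPy.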