Estimating Optimal Policy Value in Linear Contextual Bandits Beyond Gaussianity

Authors: Jonathan Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present promising experimental benefits on a semi-synthetic simulation using historical data on warfarin treatment dosage outcomes." (Section 1) "Figure 1: Estimation error of Algorithm 1 (Moment) is shown in red on a simulated high-dimensional CB domain with d = 300. ... Error bars represent standard error over 10 trials." "Figure 2: A comparison between our proposed test for treatment effect in Section 4.1.2 and a test based on a linear regression baseline (LR) with d = 600 and p = 2000. ... The right figure demonstrates the estimator on warfarin dosage data. ... Error bands represent standard error on 1000 replicates." Appendix H (Experiment details)
Researcher Affiliation | Collaboration | Jonathan N. Lee (EMAIL), Stanford University; Weihao Kong (EMAIL), Google Research; Aldo Pacchiano (EMAIL), Boston University; Vidya Muthukumar (EMAIL), Georgia Institute of Technology; Emma Brunskill (EMAIL), Stanford University
Pseudocode | Yes | Algorithm 1: Moment-Based Estimator; Algorithm 2: Estimator of Upper Bound on V; Algorithm 3: Treatment Effect Test; Algorithm 4: Model Selection with Gaussian Process Upper Bound
Open Source Code | No | The paper does not provide explicit statements about releasing code, nor does it include a link to a code repository for the described methodology.
Open Datasets | Yes | "The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) provides a publicly available dataset of patient covariates as well as their final dosages, which might be noisy or slightly suboptimal." (Section 4.1.2)
Dataset Splits | Yes | "The dataset D_n is split into two independent datasets D_m and D'_m of size m = n/2." (Proof of Lemma C.2) "Split dataset D evenly into {x_i, a_i, y_i}_{i∈[m]} and {x'_i, a'_i, y'_i}_{i∈[m]}." (Algorithm 2, Line 9) "We split the training set into two equal parts, randomly." (Section H.2.1) "Using an 80/20 split of the n samples into datasets D and D' of sizes |D| = n_in and |D'| = n_out, we compute..." (Section H.2.1)
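Both splitting schemes quoted above (the even 50/50 split and the 80/20 split) amount to shuffling sample indices and cutting at a fraction. A minimal sketch, assuming a NumPy-based shuffle; the helper name `split_dataset` and the seeding are my own choices, not the authors' code:

```python
import numpy as np

def split_dataset(n, frac=0.5, seed=0):
    """Shuffle indices 0..n-1 and cut at round(frac * n).

    Returns two disjoint index arrays (D, D') covering all n samples.
    """
    idx = np.random.default_rng(seed).permutation(n)
    m = int(round(frac * n))
    return idx[:m], idx[m:]

# Even split, as in Algorithm 2: m = n/2 samples per half.
d_half, d_half_prime = split_dataset(1000, frac=0.5)

# 80/20 split, as in Section H.2.1: |D| = n_in, |D'| = n_out.
d_in, d_out = split_dataset(1000, frac=0.8)
```

Indexing the stored (x_i, a_i, y_i) arrays by the two returned index sets then yields the independent datasets used by the estimators.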
Hardware Specification | Yes | "The experiments of Section 3.2 were run on a standard Amazon Web Services EC2 c5.xlarge instance. The experiments of Section 4.1.2 were conducted on a standard personal laptop with 16GB of memory and an Intel Core i7 processor." (Section H.3)
Software Dependencies | No | The paper does not explicitly mention any software dependencies with specific version numbers.
Experiment Setup | Yes | "We consider a simulated high-dimensional CB learning setting with K = 2 actions and d = 300 dimensions. ... We set the degree t = 2 and did not split the data (q = 1)." (Section H.1) "We approximated the max function with a polynomial of degree t = 2 by minimizing an ℓ1 loss over 2000 uniformly randomly generated points in [−2, 2]." (Section H.1) "For the linear regression baseline, we perform standard unregularized linear regression on the training dataset to learn both μ̂_a and θ̂_a for the candidate actions." (Section H.2.1) For our method, the decision was 0 if Û ≤ 0.2 and 1 otherwise; for the linear regression (LR) plug-in method, it was 0 if Ŵ ≤ 0.33·√(d/n) and 1 otherwise. (Section H.2.1)
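The degree-2 ℓ1 polynomial approximation described above can be sketched as follows. This is not the authors' implementation: the target max(z, 0) on [−2, 2] is an illustrative stand-in for the max function being approximated, and the IRLS solver for the ℓ1 fit is my own choice:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(-2.0, 2.0, size=2000)   # 2000 uniform random points in [-2, 2]
target = np.maximum(z, 0.0)             # stand-in for the max function
A = np.vander(z, 3)                     # design matrix with columns [z^2, z, 1]

# Fit a degree-2 polynomial under l1 loss via iteratively
# reweighted least squares (weights 1/|residual|).
coeffs = np.linalg.lstsq(A, target, rcond=None)[0]  # l2 fit as initializer
for _ in range(100):
    r = np.abs(A @ coeffs - target)
    sw = 1.0 / np.sqrt(np.maximum(r, 1e-8))         # sqrt of IRLS weights
    coeffs = np.linalg.lstsq(A * sw[:, None], target * sw, rcond=None)[0]

mean_l1 = np.abs(A @ coeffs - target).mean()        # final mean l1 error
```

A linear-programming formulation would recover the exact ℓ1 minimizer; the IRLS loop above is a lightweight approximation that needs only NumPy.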