A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control
Authors: Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, Martin J. Wainwright
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run extensive simulations to verify our claims, and also report results on real data collected from the New Yorker Cartoon Caption contest. |
| Researcher Affiliation | Academia | Fanny Yang, Dept. of EECS, U.C. Berkeley; Aaditya Ramdas, Dept. of EECS and Statistics, U.C. Berkeley; Kevin Jamieson, Allen School of CSE, U. of Washington; Martin Wainwright, Dept. of EECS and Statistics, U.C. Berkeley |
| Pseudocode | Yes | Procedure 1 MAB-FDR Meta algorithm skeleton. Algorithm 1 Best-arm identification with a control arm for confidence δ and precision ϵ. Procedure 2 MAB-LORD: best-arm identification with online FDR control. |
| Open Source Code | Yes | The code for reproducing all experiments and plots in this paper is publicly available at https://github.com/fanny-yang/MABFDR |
| Open Datasets | Yes | Our experiments are run on artificial data with Gaussian/Bernoulli draws and real-world Bernoulli draws from the New Yorker Cartoon Caption contest. We have access to 1000 such contests over a period of 4 years. |
| Dataset Splits | Yes | In all simulations, 60% of all the hypotheses are true nulls, and their indices are chosen uniformly. The results in Section 4 are based on two different experimental settings: (i) an independent setting where we simulate K = 50 arms for each hypothesis, where we chose 60% of hypotheses to be true nulls and for the remaining 40% (non-nulls) we chose $\mu_i$ for the best alternative randomly in [0.05, 0.2] and other alternatives randomly in [0.0, 0.1]; (ii) a dependent setting (New Yorker data) where the alternatives are not chosen independently. For all results, we average over 100 repetitions. |
| Hardware Specification | No | The paper does not specify any hardware details like CPU models, GPU models, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Unless otherwise noted, we set ϵ = 0 in all of our simulations to focus on the main ideas and keep the discussion concise. $\gamma_j = 0.07 \log(j \vee 2) / (j e^{\sqrt{\log j}})$ as in [4]. (i) an independent setting where we simulate K = 50 arms for each hypothesis, where we chose 60% of hypotheses to be true nulls and for the remaining 40% (non-nulls) we chose $\mu_i$ for the best alternative randomly in [0.05, 0.2] and other alternatives randomly in [0.0, 0.1]. For all results, we average over 100 repetitions. |
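
The independent simulation setting and the γ sequence quoted in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the authors' repository. The function names `gamma` and `draw_arm_means` are hypothetical, the γ formula is read as 0.07 log(max(j, 2)) / (j exp(sqrt(log j))), and the arm means under a true null are set to 0.0 purely as a placeholder because the table does not state them.

```python
import math
import random


def gamma(j):
    """gamma_j = 0.07 * log(max(j, 2)) / (j * exp(sqrt(log j))), for j >= 1."""
    return 0.07 * math.log(max(j, 2)) / (j * math.exp(math.sqrt(math.log(j))))


def draw_arm_means(num_hypotheses, num_arms=50, null_fraction=0.6, seed=0):
    """Draw arm means for the independent setting described in the table.

    Returns (means, null_indices), where means[h] lists the arm means of
    hypothesis h and null_indices is the set of true-null hypotheses.
    """
    rng = random.Random(seed)
    num_nulls = round(null_fraction * num_hypotheses)
    # Null indices are chosen uniformly at random, without replacement.
    null_indices = set(rng.sample(range(num_hypotheses), num_nulls))

    means = []
    for h in range(num_hypotheses):
        if h in null_indices:
            # ASSUMPTION: the table does not give null arm means; 0.0 is a placeholder.
            arms = [0.0] * num_arms
        else:
            # Other alternatives are drawn uniformly from [0.0, 0.1].
            arms = [rng.uniform(0.0, 0.1) for _ in range(num_arms)]
            # One designated arm is drawn uniformly from [0.05, 0.2]. The ranges
            # overlap, so this arm is not guaranteed to have the largest mean.
            arms[rng.randrange(num_arms)] = rng.uniform(0.05, 0.2)
        means.append(arms)
    return means, null_indices


if __name__ == "__main__":
    means, nulls = draw_arm_means(num_hypotheses=100)
    print(len(nulls), "true nulls out of", len(means), "hypotheses")
    print([round(gamma(j), 5) for j in range(1, 6)])
```

Averaging over 100 repetitions, as the table describes, would amount to calling `draw_arm_means` with 100 different seeds and aggregating whatever metric is computed on each draw.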