Kernel Estimation and Model Combination in A Bandit Problem with Covariates

Authors: Wei Qian, Yuhong Yang

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulations and a real data evaluation are conducted to illustrate the performance of the proposed allocation strategy. We show in Section 6 and Section 7 the numerical performance of the proposed allocation strategy using simulations and a web-based news article recommendation data set, respectively.
Researcher Affiliation | Academia | Wei Qian EMAIL School of Mathematical Sciences Rochester Institute of Technology Rochester, NY 14623, USA Yuhong Yang EMAIL School of Statistics University of Minnesota Minneapolis, MN 55455, USA
Pseudocode | Yes | STEP 1. Initialize with forced arm selections. Give each arm a small number of applications. STEP 2. Initialize the weights and the error variance estimates. STEP 3. Estimate the individual functions $f_i$ for $1 \le i \le l$. STEP 4. Combine the regression estimates and obtain the weighted average estimates. STEP 5. Estimate the best arm, select and pull. STEP 6. Update the weights and the error variance estimates. STEP 7. Repeat steps 3-6 for $n = n_0 l + 2, n_0 l + 3, \ldots$, and so on.
Open Source Code | No | The major source code illustrating the proposed algorithms is available upon request.
Open Datasets | Yes | In this section, we use the Yahoo! Front Page Today Module User Click Log data set (Yahoo! Academic Relations, 2011) to evaluate the proposed allocation strategy. Available from http://webscope.sandbox.yahoo.com.
Dataset Splits | Yes | Given the time horizon N = 1200, the first 90 rounds of the game are the forced sampling period. The ϵ-greedy, SIR-kernel and model combining algorithms described above all take the first 1000 time points to be the forced sampling stage and use $\pi_n = n^{-1/4}/6$. For each of the 100 runs, the algorithm starts at a position randomly chosen from the first 10,000 events of the reduced data set.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It only mentions the implementation languages: "The simulation example is implemented in MATLAB and the real data example is implemented in C++."
Software Dependencies | No | The paper mentions "LDR package (Cook et al., 2011) for SIR, and CISE package (Chen et al., 2010) for CIS-SIR" and that simulations were "implemented in MATLAB and the real data example is implemented in C++." However, it does not provide specific version numbers for these software components or packages.
Experiment Setup | Yes | Given the time horizon N = 1200, the first 90 rounds of the game are the forced sampling period. Let the inferior arm sampling probability be $\pi_n = 1/(\log_2 n)^2$, and the kernel bandwidth for arm $i$ be $h = n^{-1/(2+r_i)}$, $i = 1, 2, 3$. The ϵ-greedy, SIR-kernel and model combining algorithms described above all take the first 1000 time points to be the forced sampling stage and use $\pi_n = n^{-1/4}/6$. Also, for any given arm, the SIR-kernel method limits the history time window for reward estimation to have maximum sample size of 10,000. We set $c_0 = 0.5$, 1 or 3 and $h_n = n^{-1/10}$.
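The schedules quoted in the Experiment Setup row, together with the "estimate, then select and pull" steps of the pseudocode, can be illustrated with a small Python sketch. This is not the authors' code (the paper reports MATLAB and C++ implementations, available on request). All function names are hypothetical. The uniform kernel, the one-dimensional covariate, and the rule that each inferior arm is pulled with probability pi_n are assumptions made for illustration, and the simulation schedule is read here as 1/(log2 n)^2.

```python
import math
import random


def pi_simulation(n):
    """Inferior-arm sampling probability, read as 1 / (log2 n)^2 (assumed reading)."""
    return 1.0 / (math.log2(n) ** 2)


def pi_real_data(n):
    """Inferior-arm sampling probability quoted for the real data: n^(-1/4) / 6."""
    return n ** (-0.25) / 6.0


def bandwidth(n, r):
    """Kernel bandwidth h = n^(-1/(2 + r)); r = 8 gives the quoted n^(-1/10)."""
    return n ** (-1.0 / (2.0 + r))


def nw_estimate(x, xs, ys, h):
    """Nadaraya-Watson estimate at x with a uniform kernel (assumed kernel choice).

    xs, ys are the past covariates and rewards observed for one arm.
    Returns None when no past covariate falls within distance h of x.
    """
    window = [y for xi, y in zip(xs, ys) if abs(x - xi) <= h]
    if not window:
        return None
    return sum(window) / len(window)


def choose_arm(estimates, pi_n, rng):
    """Randomized allocation (assumed form): each inferior arm gets probability
    pi_n, and the estimated best arm gets the remaining probability."""
    l = len(estimates)
    best = max(range(l), key=lambda i: estimates[i])
    p_explore = min(1.0, (l - 1) * pi_n)  # clamp so it stays a valid probability
    if rng.random() < p_explore:
        others = [i for i in range(l) if i != best]
        return rng.choice(others)
    return best
```

A call such as `choose_arm(estimates, pi_simulation(n), random.Random(0))` would stand in for the selection step at round n. The forced sampling period, the model combination weights, and the error variance updates from the pseudocode are deliberately left out of this sketch.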