reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Kernel Estimation and Model Combination in A Bandit Problem with Covariates

Authors: Wei Qian, Yuhong Yang

JMLR 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations and a real data evaluation are conducted to illustrate the performance of the proposed allocation strategy. We show in Section 6 and Section 7 the numerical performance of the proposed allocation strategy using simulations and a web-based news article recommendation data set, respectively.
Researcher Affiliation	Academia	Wei Qian (EMAIL), School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY 14623, USA; Yuhong Yang (EMAIL), School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
Pseudocode	Yes	STEP 1. Initialize with forced arm selections. Give each arm a small number of applications. STEP 2. Initialize the weights and the error variance estimates. STEP 3. Estimate the individual functions f_i for 1 ≤ i ≤ l. STEP 4. Combine the regression estimates and obtain the weighted average estimates. STEP 5. Estimate the best arm, select and pull. STEP 6. Update the weights and the error variance estimates. STEP 7. Repeat steps 3-6 for n = n_0 l + 2, n_0 l + 3, ..., and so on.
Open Source Code	No	The major source code illustrating the proposed algorithms is available upon request.
Open Datasets	Yes	In this section, we use the Yahoo! Front Page Today Module User Click Log data set (Yahoo! Academic Relations, 2011) to evaluate the proposed allocation strategy. Available from http://webscope.sandbox.yahoo.com.
Dataset Splits	Yes	Given the time horizon N = 1200, the first 90 rounds of the game are the forced sampling period. The ϵ-greedy, SIR-kernel and model combining algorithms described above all take the first 1000 time points to be the forced sampling stage and use π_n = n^{-1/4}/6. For each of the 100 runs, the algorithm starts at a position randomly chosen from the first 10,000 events of the reduced data set.
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments. It only mentions the implementation languages: "The simulation example is implemented in MATLAB and the real data example is implemented in C++."
Software Dependencies	No	The paper mentions "LDR package (Cook et al., 2011) for SIR, and CISE package (Chen et al., 2010) for CIS-SIR" and that simulations were "implemented in MATLAB and the real data example is implemented in C++." However, it does not provide specific version numbers for these software components or packages.
Experiment Setup	Yes	Given the time horizon N = 1200, the first 90 rounds of the game are the forced sampling period. Let the inferior arm sampling probability be π_n = 1/(log_2 n)^2, and the kernel bandwidth for arm i be h = n^{-1/(2+r_i)}, i = 1, 2, 3. The ϵ-greedy, SIR-kernel and model combining algorithms described above all take the first 1000 time points to be the forced sampling stage and use π_n = n^{-1/4}/6. Also, for any given arm, the SIR-kernel method limits the history time window for reward estimation to have maximum sample size of 10,000. We set c_0 = 0.5, 1 or 3 and h_n = n^{-1/10}.
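
The seven steps quoted in the Pseudocode row can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' code (which the table notes is available only on request). Several pieces are assumptions made for the example: the scalar covariate drawn uniformly on [0, 1], the uniform-kernel local average, the use of different bandwidths as the candidate estimators, the exponential weight update, and the running variance estimate. The names `inferior_arm_prob`, `kernel_estimate` and `run_bandit` are hypothetical. Only the forced-sampling length and the form π_n = 1/(log_2 n)^2 come from the quoted setup.

```python
import math
import random


def inferior_arm_prob(n):
    """pi_n = 1 / (log2 n)^2, capped at 1 (form quoted in the Experiment Setup row)."""
    if n < 2:
        return 1.0
    return min(1.0, 1.0 / math.log2(n) ** 2)


def kernel_estimate(x, history, h):
    """Uniform-kernel local average of past rewards near x (assumed estimator)."""
    ys = [y for xi, y in history if abs(xi - x) <= h]
    return sum(ys) / len(ys) if ys else 0.0


def run_bandit(reward_fns, horizon, n0, bandwidths, noise_sd=0.1, seed=0):
    rng = random.Random(seed)
    l, m = len(reward_fns), len(bandwidths)
    history = [[] for _ in range(l)]
    # STEP 2: uniform initial weights and an initial error variance guess per arm.
    weights = [[1.0 / m] * m for _ in range(l)]
    var_est = [1.0] * l
    sq_resid = [0.0] * l
    n_updates = [0] * l
    pulls = []
    for n in range(1, horizon + 1):
        x = rng.random()  # assumption: scalar covariate, uniform on [0, 1]
        cand = combined = None
        if n <= n0 * l:
            # STEP 1: forced sampling, each arm pulled n0 times in round-robin order.
            arm = (n - 1) % l
        else:
            # STEP 3: candidate estimates of each f_i (one per bandwidth).
            cand = [[kernel_estimate(x, history[i], h) for h in bandwidths]
                    for i in range(l)]
            # STEP 4: weighted average of the candidates for each arm.
            combined = [sum(w * f for w, f in zip(weights[i], cand[i]))
                        for i in range(l)]
            # STEP 5: pick the estimated best arm, then randomize so that each
            # other arm is pulled with probability pi_n.
            arm = max(range(l), key=lambda i: combined[i])
            if l > 1 and rng.random() < min(1.0, (l - 1) * inferior_arm_prob(n)):
                arm = rng.choice([i for i in range(l) if i != arm])
        y = reward_fns[arm](x) + rng.gauss(0.0, noise_sd)
        if cand is not None:
            # STEP 6 (assumed form): exponential reweighting by squared error,
            # followed by a running update of the error variance estimate.
            new_w = [w * math.exp(-((y - f) ** 2) / (2.0 * var_est[arm]))
                     for w, f in zip(weights[arm], cand[arm])]
            total = sum(new_w)
            if total > 0.0:  # keep old weights if every term underflows
                weights[arm] = [w / total for w in new_w]
            sq_resid[arm] += (y - combined[arm]) ** 2
            n_updates[arm] += 1
            var_est[arm] = max(sq_resid[arm] / n_updates[arm], 1e-6)
        history[arm].append((x, y))
        pulls.append(arm)
    # STEP 7 is the loop itself: steps 3-6 repeat after the forced period.
    return pulls, weights
```

With three arms and `n0 = 30`, the forced period covers the first 90 rounds, which matches the quoted simulation setting; for example, `run_bandit([lambda x: x, lambda x: 1.0 - x, lambda x: 0.5], horizon=1200, n0=30, bandwidths=[0.05, 0.2, 0.5])` returns the sequence of pulled arms and the final per-arm weights. The reward functions and bandwidth values in that call are placeholders, not values from the paper.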