POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition

Authors: Yuta Saito, Jihan Yao, Thorsten Joachims

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EMPIRICAL EVALUATION: We first evaluate POTEC on synthetic data with ground-truth cluster information to compare the effectiveness of POTEC w/ and w/o true cluster information and w/ and w/o pairwise regression. We then assess the real-world applicability of POTEC on a public recommendation dataset."
Researcher Affiliation | Academia | Yuta Saito (Cornell University), Jihan Yao (University of Washington), Thorsten Joachims (Cornell University)
Pseudocode | Yes | "Algorithm 1: The POTEC Algorithm. Input: logged bandit data D, conventionally trained regression model q̂(x, a). Output: 1st-stage (policy-based) policy π_θ^1st and 2nd-stage (regression-based) policy π_ψ^2nd"
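The decomposition implied by Algorithm 1's outputs can be sketched minimally: the 1st-stage policy chooses a cluster, and the 2nd-stage policy chooses an action within that cluster greedily from a regression model's scores. In this sketch the linear scorers `theta` and `f_hat`, the modulo action-to-cluster map, and all sizes are hypothetical placeholders, not the paper's trained models or clustering.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_clusters, dim = 12, 3, 5

# Hypothetical fixed action-to-cluster map phi(a); POTEC assumes such a
# clustering is given (true clusters in the synthetic runs, learned otherwise).
cluster_of = np.arange(n_actions) % n_clusters

# Hypothetical linear scorers standing in for the learned models:
# theta parameterizes the 1st-stage (policy-based) cluster policy,
# f_hat stands in for the 2nd-stage (regression-based) action scores.
theta = rng.normal(size=(n_clusters, dim))
f_hat = rng.normal(size=(n_actions, dim))

def potec_policy(x):
    """Overall policy: pi(a|x) = sum_c pi1st(c|x) * pi2nd(a|x, c)."""
    # 1st stage: softmax policy over clusters.
    logits = theta @ x
    pi1st = np.exp(logits - logits.max())
    pi1st /= pi1st.sum()
    # 2nd stage: deterministic (greedy) choice within each cluster
    # based on the regression model's scores.
    scores = f_hat @ x
    pi = np.zeros(n_actions)
    for c in range(n_clusters):
        members = np.flatnonzero(cluster_of == c)
        best = members[np.argmax(scores[members])]
        pi[best] += pi1st[c]
    return pi

x = rng.normal(size=dim)
pi = potec_policy(x)
```

Note the resulting distribution places mass on at most one action per cluster, which is what makes the 2nd stage cheap even when the action space is large.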
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described in this paper is publicly available.
Open Datasets | Yes | "We now evaluate it on the KuaiRec dataset (Gao et al., 2022), a publicly available recommendation dataset collected on a short video platform... In addition to synthetic and real-world recommendation data, we performed OPL experiments on two extreme classification datasets provided by Bhatia et al. (2016)."
Dataset Splits | Yes | "The logged data we can use for performing OPL takes the form D := {(x_i, a_i, r_i)}_{i=1}^n, which contains n independent observations drawn from the logging policy π_0..."

Table 4: Dataset Statistics
  Dataset      n_train  n_test  |A|
  EUR-Lex 4K   15,449   3,865   3,956
  Wiki10-31K   14,146   6,616   30,938
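The quoted definition of the logged data D can be made concrete with a small synthetic sketch. The softmax logging policy, its weights `w0`, and the Bernoulli reward model below are hypothetical stand-ins for illustration, not the paper's actual data-generating process.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, n_actions = 1000, 5, 10

# Hypothetical softmax logging policy pi_0 with random weights.
w0 = rng.normal(size=(n_actions, dim))

def pi_0(x):
    logits = w0 @ x
    p = np.exp(logits - logits.max())
    return p / p.sum()

# D = {(x_i, a_i, r_i)}_{i=1}^n: contexts drawn i.i.d., actions sampled
# from pi_0(.|x), binary rewards from a toy sigmoid model (illustrative).
D = []
for _ in range(n):
    x = rng.normal(size=dim)
    a = rng.choice(n_actions, p=pi_0(x))
    r = rng.binomial(1, 1.0 / (1.0 + np.exp(-x @ w0[a])))
    D.append((x, a, r))
```

Each tuple records only the reward of the action the logging policy actually took, which is what makes the learning problem off-policy.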
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Adam' as an optimizer and 'scikit-learn' for clustering, but does not provide specific version numbers for these software dependencies or any other libraries.
Experiment Setup | Yes | "We tuned the weight decay hyperparameter, learning rate, batch size, and the number of irrelevant actions for variance reduction for the baseline methods (i.e., IPS-PG and DR-PG) using the test policy value, while we use a fixed set of hyperparameters for POTEC as shown in Table 3... For all methods, we used Adam (Kingma & Ba, 2014) as the optimizer and used neural networks with 3 hidden layers to parameterize the policy."
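The "neural networks with 3 hidden layers" parameterization in the quote can be sketched as a plain softmax policy network; the layer width, initialization scale, and ReLU activations below are assumptions (the paper's actual hyperparameters are in its Table 3), and training with Adam is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, n_actions = 5, 32, 10

# Three hidden layers plus an output layer over actions; the width 32
# is a hypothetical choice, not taken from the paper.
sizes = [dim, hidden, hidden, hidden, n_actions]
params = [(rng.normal(scale=0.1, size=(m, k)), np.zeros(m))
          for k, m in zip(sizes[:-1], sizes[1:])]

def policy(x):
    """Softmax policy pi_theta(a|x) from a 3-hidden-layer MLP."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)  # ReLU hidden layers
    W, b = params[-1]
    logits = W @ h + b
    p = np.exp(logits - logits.max())
    return p / p.sum()

pi = policy(rng.normal(size=dim))
```

In an actual run these parameters would be updated with Adam (Kingma & Ba, 2014) to maximize an off-policy estimate of the policy value, as the quoted setup describes.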