Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization

Authors: Adith Swaminathan, Thorsten Joachims

JMLR 2015

Reproducibility Variable | Result | LLM Response (evidence quoted from the paper)
Research Type: Experimental
LLM Response: "The effectiveness and efficiency of POEM is evaluated on several simulated multi-label classification problems, as well as on a real-world information retrieval problem. The empirical results show that the CRM objective implemented in POEM provides improved robustness and generalization performance compared to the state-of-the-art." (Section 6, Empirical Evaluation)
Researcher Affiliation: Academia
LLM Response: "Adith Swaminathan EMAIL, Thorsten Joachims EMAIL, Department of Computer Science, Cornell University, Ithaca, NY 14853, USA"
Pseudocode: Yes
LLM Response: "Algorithm 1: POEM pseudocode."
Open Source Code: Yes
LLM Response: "Software implementing POEM is available at http://www.cs.cornell.edu/~adith/poem/ for download, as is all the code and data needed to run each of the experiments reported in Section 6."
Open Datasets: Yes
LLM Response: "We conducted experiments on different multi-label data sets collected from the LibSVM repository, with different ranges for p (features), q (labels), and n (samples), as summarized in Table 2."
Dataset Splits: Yes
LLM Response: "Table 2: Corpus statistics for different multi-label data sets from the LibSVM repository. LYRL was post-processed so that only top-level categories were treated as labels. We keep aside 25% of D as a validation set; we use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters."
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies: No
LLM Response: "CRF is implemented by scikit-learn (Pedregosa et al., 2011). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 and step size η = 1 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence."
Experiment Setup: Yes
LLM Response: "We keep aside 25% of D as a validation set; we use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. λ = c·λ*, where λ* is the calibration factor from Section 4.4 and c ∈ {10⁻⁶, . . . , 1} in multiples of 10. The clipping constant M is similarly set to the ratio of the 90th percentile to the 10th percentile propensity score observed in the training set of D. When optimizing any objective over w, we always begin the optimization from w = 0, which is equivalent to h_w = uniform(Y). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 and step size η = 1 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence."
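The setup above relies on a clipped counterfactual (inverse-propensity-score) risk estimate and on choosing the clipping constant M from the spread of logged propensities. The sketch below is a minimal illustration of that idea, not the paper's implementation: the function names, array conventions, and the assumption that Equation (1) takes the clipped-IPS form shown here are ours.

```python
import numpy as np

def clipped_ips_estimate(losses, new_policy_probs, logged_propensities, M):
    """Clipped inverse-propensity-score risk estimate (hypothetical sketch).

    losses:              observed losses delta_i for the logged actions
    new_policy_probs:    probability the new policy assigns to each logged action
    logged_propensities: propensity p_i under the logging policy
    M:                   clipping constant bounding the importance weights
    """
    weights = np.minimum(M, np.asarray(new_policy_probs) / np.asarray(logged_propensities))
    return float(np.mean(np.asarray(losses) * weights))

def clipping_constant(logged_propensities):
    """M set to the ratio of the 90th to the 10th percentile propensity,
    mirroring the rule described in the experiment setup."""
    p = np.asarray(logged_propensities)
    return float(np.percentile(p, 90) / np.percentile(p, 10))
```

For example, with logged propensities [0.25, 0.5], new-policy probabilities [0.5, 0.5], losses [1, 1], and M = 1.5, the raw weights (2.0, 1.0) are clipped to (1.5, 1.0), giving an estimate of 1.25. Hyper-parameters such as c would then be selected by evaluating this estimate on the held-out 25% validation split.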