Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization
Authors: Adith Swaminathan, Thorsten Joachims
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and efficiency of POEM is evaluated on several simulated multi-label classification problems, as well as on a real-world information retrieval problem. The empirical results show that the CRM objective implemented in POEM provides improved robustness and generalization performance compared to the state-of-the-art. (Section 6, Empirical Evaluation) |
| Researcher Affiliation | Academia | Adith Swaminathan (EMAIL), Thorsten Joachims (EMAIL), Department of Computer Science, Cornell University, Ithaca, NY 14853, USA |
| Pseudocode | Yes | Algorithm 1 POEM pseudocode. |
| Open Source Code | Yes | Software implementing POEM is available at http://www.cs.cornell.edu/~adith/poem/ for download, as is all the code and data needed to run each of the experiments reported in Section 6. |
| Open Datasets | Yes | We conducted experiments on different multi-label data sets collected from the LibSVM repository, with different ranges for p (features), q (labels) and n (samples), as summarized in Table 2. |
| Dataset Splits | Yes | Table 2: Corpus statistics for different multi-label data sets from the LibSVM repository. LYRL was post-processed so that only top-level categories were treated as labels. We keep aside 25% of D as a validation set; we use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | CRF is implemented by scikit-learn (Pedregosa et al., 2011). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 and step size η = 1 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence. |
| Experiment Setup | Yes | We keep aside 25% of D as a validation set; we use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. λ = c·λ̄, where λ̄ is the calibration factor from Section 4.4 and c ∈ {10⁻⁶, . . . , 1} in multiples of 10. The clipping constant M is similarly set to the ratio of the 90th percentile to the 10th percentile propensity score observed in the training set of D. When optimizing any objective over w, we always begin the optimization from w = 0, which is equivalent to hw = uniform(Y). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 and step size η = 1 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence. |
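The hyper-parameter scheme quoted above is mechanical enough to sketch: the regularization strength λ is a calibration factor scaled by c ∈ {10⁻⁶, …, 1} in multiples of 10, and the clipping constant M is the ratio of the 90th to the 10th percentile of the logged propensity scores. The snippet below is a minimal illustration of those two rules only, not the authors' released POEM code; the function names and the use of NumPy percentiles are assumptions.

```python
import numpy as np

def clipping_constant(propensities):
    """Illustrative: set M to the ratio of the 90th percentile to the
    10th percentile of the propensity scores observed in training data."""
    p90 = np.percentile(propensities, 90)
    p10 = np.percentile(propensities, 10)
    return p90 / p10

def lambda_grid(calibrated_lambda):
    """Illustrative: candidate regularization strengths lambda = c * lambda_bar
    for c in {1e-6, ..., 1} in multiples of 10 (7 values total)."""
    return [calibrated_lambda * 10.0 ** (-k) for k in range(6, -1, -1)]
```

Each candidate λ would then be scored on the held-out 25% validation split using the unbiased counterfactual estimator, and the best one kept.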