Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization

Authors: Adith Swaminathan, Thorsten Joachims

JMLR 2015

Reproducibility Variable | Result | LLM Response (evidence quoted from the paper)
Research Type: Experimental
LLM Response: "The effectiveness and efficiency of POEM is evaluated on several simulated multi-label classification problems, as well as on a real-world information retrieval problem. The empirical results show that the CRM objective implemented in POEM provides improved robustness and generalization performance compared to the state-of-the-art." (Section 6, Empirical Evaluation)
Researcher Affiliation: Academia
LLM Response: "Adith Swaminathan EMAIL, Thorsten Joachims EMAIL, Department of Computer Science, Cornell University, Ithaca, NY 14853, USA"
Pseudocode: Yes
LLM Response: "Algorithm 1: POEM pseudocode."
Open Source Code: Yes
LLM Response: "Software implementing POEM is available at http://www.cs.cornell.edu/~adith/poem/ for download, as is all the code and data needed to run each of the experiments reported in Section 6."
Open Datasets: Yes
LLM Response: "We conducted experiments on different multi-label data sets collected from the LibSVM repository, with different ranges for p (features), q (labels), and n (samples), as summarized in Table 2."
Dataset Splits: Yes
LLM Response: "Table 2: Corpus statistics for different multi-label data sets from the LibSVM repository. LYRL was post-processed so that only top-level categories were treated as labels. We keep aside 25% of D as a validation set; we use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters."
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies: No
LLM Response: "CRF is implemented by scikit-learn (Pedregosa et al., 2011). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 and step size η = 1 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence."
Experiment Setup: Yes
LLM Response: "We keep aside 25% of D as a validation set; we use the unbiased counterfactual estimator from Equation (1) for selecting hyper-parameters. λ = c·λ*, where λ* is the calibration factor from Section 4.4 and c ∈ {10⁻⁶, . . . , 1} in multiples of 10. The clipping constant M is similarly set to the ratio of the 90th percentile to the 10th percentile propensity score observed in the training set of D. When optimizing any objective over w, we always begin the optimization from w = 0, which is equivalent to h_w = uniform(Y). We use mini-batch AdaGrad (Duchi et al., 2011) with batch size = 100 and step size η = 1 to adapt our learning rates for the stochastic approaches, and use progressive validation (Blum et al., 1999) and gradient norms to detect convergence."
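The setup above relies on a clipped counterfactual (inverse-propensity-score) risk estimate and on choosing the clipping constant M from the spread of logged propensities. The sketch below is a minimal illustration of that idea, not the paper's implementation: the function names, array conventions, and the assumption that Equation (1) takes the clipped-IPS form shown here are ours.

```python
import numpy as np

def clipped_ips_estimate(losses, new_policy_probs, logged_propensities, M):
    """Clipped inverse-propensity-score risk estimate (hypothetical sketch).

    losses:              observed losses delta_i for the logged actions
    new_policy_probs:    probability the new policy assigns to each logged action
    logged_propensities: propensity p_i under the logging policy
    M:                   clipping constant bounding the importance weights
    """
    weights = np.minimum(M, np.asarray(new_policy_probs) / np.asarray(logged_propensities))
    return float(np.mean(np.asarray(losses) * weights))

def clipping_constant(logged_propensities):
    """M set to the ratio of the 90th to the 10th percentile propensity,
    mirroring the rule described in the experiment setup."""
    p = np.asarray(logged_propensities)
    return float(np.percentile(p, 90) / np.percentile(p, 10))
```

For example, with logged propensities [0.25, 0.5], new-policy probabilities [0.5, 0.5], losses [1, 1], and M = 1.5, the raw weights (2.0, 1.0) are clipped to (1.5, 1.0), giving an estimate of 1.25. Hyper-parameters such as c would then be selected by evaluating this estimate on the held-out 25% validation split.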