The Self-Normalized Estimator for Counterfactual Learning

Authors: Adith Swaminathan, Thorsten Joachims

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the empirical effectiveness of Norm-POEM on several multi-label classification problems, finding that it consistently outperforms the conventional estimator.
Researcher Affiliation | Academia | Adith Swaminathan, Department of Computer Science, Cornell University; Thorsten Joachims, Department of Computer Science, Cornell University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Software implementing Norm-POEM is available at http://www.cs.cornell.edu/~adith/POEM.
Open Datasets | Yes | The experiments use supervised multi-label classification datasets from the LibSVM repository. The inputs are x ∈ R^p, and the predictions y ∈ {0, 1}^q are bit vectors indicating the labels assigned to x. The datasets cover a range of feature counts p, label counts q, and instance counts n:

Name  | p (# features) | q (# labels) | n_train | n_test
Scene | 294            | 6            | 1211    | 1196
Yeast | 103            | 14           | 1500    | 917
TMC   | 30438          | 22           | 21519   | 7077
LYRL  | 47236          | 4            | 23149   | 781265
Dataset Splits | Yes | "Hyper-parameters λ, M were calibrated as recommended and validated on a 25% hold-out of D. In summary, our experimental setup is identical to POEM [1]."
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or other computational resources.
Software Dependencies | No | The paper mentions that "CRF is implemented by scikit-learn [27]", but it does not specify the version of scikit-learn or any other software dependency.
Experiment Setup | Yes | "Hyper-parameters λ, M were calibrated as recommended and validated on a 25% hold-out of D. In summary, our experimental setup is identical to POEM [1]." Also: "To simulate a bandit feedback dataset D, we use a CRF with default hyper-parameters trained on 5% of the supervised dataset as h0, and replay the training data 4 times and collect sampled labels from h0." And: "Since the choice of optimization method could be a confounder, we use L-BFGS for all methods and experiments."
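For context, the self-normalized estimator in the paper's title contrasts with conventional inverse-propensity scoring (IPS) by dividing the weighted loss by the realized sum of importance weights rather than by the sample count. The sketch below illustrates that contrast on synthetic numbers; the variable names, data, and toy setup are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logged bandit feedback: for each context, the logging policy h0 sampled
# an action with propensity p0 and a loss delta was observed. The target
# policy assigns probability pw to that same action. All values here are
# synthetic illustrations.
n = 10_000
p0 = rng.uniform(0.1, 0.9, size=n)                   # logging propensities
pw = rng.uniform(0.1, 0.9, size=n)                   # target-policy probabilities
delta = rng.binomial(1, 0.5, size=n).astype(float)   # observed losses in [0, 1]

w = pw / p0  # importance weights

# Conventional IPS estimate of the target policy's risk: divide by n.
ips = (w * delta).mean()

# Self-normalized estimate: divide by the realized weight sum instead of n.
# Because it is a weighted average of the observed losses, it always stays
# within their range, which tames the variance of large importance weights.
snips = (w * delta).sum() / w.sum()

print(f"IPS: {ips:.4f}  SNIPS: {snips:.4f}")
```

Note that the self-normalized estimate is a convex combination of the observed losses, so it is bounded by their minimum and maximum, whereas plain IPS can leave that range when the weights are large.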