Off-policy evaluation for slate recommendation
Authors: Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance. |
| Researcher Affiliation | Collaboration | Adith Swaminathan, Microsoft Research, Redmond; Akshay Krishnamurthy, University of Massachusetts, Amherst; Alekh Agarwal, Microsoft Research, New York; Miroslav Dudík, Microsoft Research, New York; John Langford, Microsoft Research, New York; Damien Jose, Microsoft, Redmond; Imed Zitouni, Microsoft, Redmond |
| Pseudocode | No | The paper describes procedures and methods but does not include a dedicated 'Pseudocode' or 'Algorithm' section or block. |
| Open Source Code | Yes | All of our code is available online. (Footnote 3: https://github.com/adith387/slates_semisynth_expts) |
| Open Datasets | Yes | Our semi-synthetic evaluation uses labeled data from the Microsoft Learning to Rank Challenge dataset [30] (MSLR-WEB30K) to create a contextual bandit instance. [30] Tao Qin and Tie-Yan Liu. Introducing LETOR 4.0 datasets. arXiv:1306.2597, 2013. |
| Dataset Splits | Yes | We use the provided 5-fold split and always train on bandit data collected by uniform logging from four folds, while evaluating with supervised data on the fifth. (A fold-rotation sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, memory amounts, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions methods such as lasso regression, regression trees, gradient boosted regression trees, and LambdaMART, but does not name the software libraries used or their version numbers. |
| Experiment Setup | Yes | Both PI-OPT and SUP train gradient boosted regression trees (with 1000 trees, each with up to 70 leaves). |
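
For concreteness, here is a minimal sketch of the fold rotation described in the dataset-splits row: training data is drawn from four folds of MSLR-WEB30K and evaluation from the fifth. The directory layout, file names, and the `load_fold` helper are assumptions about a local copy of the dataset in the standard svmlight format, not the authors' pipeline (which lives in the linked repository).

```python
# Sketch of the 5-fold rotation from the dataset-splits row: train on four
# folds of MSLR-WEB30K, hold out the fifth for supervised evaluation.
# The directory layout and file names are assumptions, not the authors' code.
from pathlib import Path

from scipy.sparse import vstack
from sklearn.datasets import load_svmlight_file

DATA_DIR = Path("MSLR-WEB30K")  # assumed local copy of the dataset
NUM_FOLDS = 5
NUM_FEATURES = 136  # MSLR-WEB30K documents have 136 features

def load_fold(k):
    """Load one fold's features, relevance labels, and query ids."""
    return load_svmlight_file(
        str(DATA_DIR / f"Fold{k}.txt"),  # assumed file naming
        n_features=NUM_FEATURES,
        query_id=True,
    )

for heldout in range(1, NUM_FOLDS + 1):
    train_folds = [k for k in range(1, NUM_FOLDS + 1) if k != heldout]
    X_train = vstack([load_fold(k)[0] for k in train_folds])
    X_test, y_test, qid_test = load_fold(heldout)
    # Bandit data would be simulated by uniform logging on the four training
    # folds, and estimators checked against supervised labels on the fifth.
```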
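The experiment-setup row fixes the regressor configuration for PI-OPT and SUP at 1000 trees with up to 70 leaves each, but the paper does not name a library. The sketch below shows one way to match those hyperparameters with scikit-learn's `GradientBoostingRegressor`; the synthetic training data is a stand-in, so treat this as illustrative rather than the authors' implementation.

```python
# One way to match the reported configuration (1000 trees, each with up to
# 70 leaves) in scikit-learn. The paper does not name a library, so this is
# an illustrative stand-in trained on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 136))  # stand-in for MSLR's 136 features
y = rng.normal(size=500)         # stand-in regression targets

model = GradientBoostingRegressor(
    n_estimators=1000,   # 1000 trees, as reported
    max_leaf_nodes=70,   # each tree limited to at most 70 leaves
)
model.fit(X, y)
print(model.predict(X[:3]))
```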