Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits

Authors: Yuta Natsubori, Masataka Ushiku, Yuta Saito

ICLR 2025

Reproducibility assessment (variable, result, and supporting excerpt from the LLM response):
Research Type: Experimental
  "4 EMPIRICAL ANALYSIS. This section empirically demonstrates the advantages of COPE and COPE-PG against existing ideas on a real-world public dataset called KuaiRec (Gao et al., 2022), collected from a recommendation system on a video-sharing app. ... The following reports and discusses the MSE, squared bias, and variance of the OPE estimators computed over 200 sets of logged data, each replicated with different seeds."
Researcher Affiliation: Collaboration
  "Yuta Natsubori, Hakuhodo DY Holdings, Inc. (EMAIL); Masataka Ushiku, Hakuhodo DY Holdings, Inc. (EMAIL); Yuta Saito, Cornell University (EMAIL)"
Pseudocode: No
  The paper describes its methods mathematically and in natural language, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No
  "Our implementation in the experiments relies on one of the most standard methods, unconstrained Least-Squares Importance Fitting (uLSIF), to perform density ratio estimation, proposed in (Kanamori et al., 2012; Sugiyama et al., 2012). ... https://github.com/hoxo-m/densratio_py, which we relied on in our experiments, is one of the well-known public implementations of the method."
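The uLSIF step quoted above can be sketched in plain NumPy. This is an illustrative re-implementation under standard assumptions (Gaussian kernel basis, fixed bandwidth and regularizer), not the paper's code; the authors relied on the densratio_py package, and uLSIF normally selects sigma and lambda by cross-validation. All names below (ulsif_fit, phi) are hypothetical.

```python
# Minimal uLSIF sketch (unconstrained Least-Squares Importance Fitting):
# estimate w(x) ~= p_nu(x) / p_de(x) by least-squares fitting of a
# kernel model. Hyperparameters are fixed for brevity (an assumption);
# the real method tunes them by cross-validation.
import numpy as np

def ulsif_fit(x_nu, x_de, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Fit a density-ratio model with Gaussian kernels centered on a
    random subset of numerator samples; returns a callable estimator."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x_nu), min(n_centers, len(x_nu)), replace=False)
    centers = x_nu[idx]

    def phi(x):
        # Gaussian kernel design matrix, shape (len(x), len(centers))
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    Phi_de, Phi_nu = phi(x_de), phi(x_nu)
    H = Phi_de.T @ Phi_de / len(x_de)  # empirical E_de[phi phi^T]
    h = Phi_nu.mean(axis=0)            # empirical E_nu[phi]
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    # Clip negative outputs, since a density ratio is nonnegative.
    return lambda x: np.maximum(phi(x) @ theta, 0.0)

# Tiny usage check: ratio between two shifted 1-D Gaussians.
rng = np.random.default_rng(1)
x_nu = rng.normal(0.5, 1.0, size=(500, 1))
x_de = rng.normal(0.0, 1.0, size=(500, 1))
w_hat = ulsif_fit(x_nu, x_de)
ratios = w_hat(x_de)
```

The closed-form solve is what makes uLSIF attractive here: the least-squares objective reduces to one regularized linear system per choice of hyperparameters.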
Open Datasets: Yes
  "This section empirically demonstrates the advantages of COPE and COPE-PG against existing ideas on a real-world public dataset called KuaiRec (Gao et al., 2022), collected from a recommendation system on a video-sharing app."
Dataset Splits: No
  "The small matrix of the dataset consists of 1,411 users (denoted as u ∈ U), 3,327 items, and 4,676,570 interactions, with a density of 99.6%, which enables OPE/L experiments without synthetic reward functions. ... Iterating this procedure n_k times in each domain generates D_k with n_k independent copies of (u_k, x_{u_k}, a_k, r_k). ... The following reports and discusses the MSE, squared bias, and variance of the OPE estimators computed over 200 sets of logged data, each replicated with different seeds."
Hardware Specification: No
  The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies: No
  "We use Random Forest (Breiman, 2001) implemented in scikit-learn (Pedregosa et al., 2011) along with 3-fold cross-fitting (Newey & Robins, 2018) to obtain q̂_T(x, a) for DR and DM, and q̂(x, a) for DR-ALL, DM-ALL, and COPE. In addition, for COPE, we use |ϕ(T)| = 4, where we define the target cluster ϕ(T) by the set of domains for which the difference in the empirical average of the rewards, |r̄_k − r̄_T|, is small. ... https://github.com/hoxo-m/densratio_py, which we relied on in our experiments, is one of the well-known public implementations of the method."
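The scikit-learn plus 3-fold cross-fitting recipe quoted above can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's pipeline: the point is only that each sample's q̂(x, a) prediction comes from a forest trained on the other two folds, so the reward model is never evaluated on its own training data. The data-generating line and all variable names are assumptions.

```python
# Sketch of 3-fold cross-fitting for a reward model q_hat(x, a) using
# RandomForestRegressor, on synthetic data (an assumption for illustration).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                    # contexts x
a = rng.integers(0, 3, size=300)                 # discrete action ids
r = X[:, 0] + a + rng.normal(0, 0.1, size=300)   # synthetic rewards

Xa = np.column_stack([X, a])   # regress r on the (x, a) pair jointly
q_hat = np.empty(len(r))
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(Xa):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(Xa[train_idx], r[train_idx])          # fit on other folds
    q_hat[test_idx] = model.predict(Xa[test_idx])   # predict out-of-fold
```

Cross-fitting of this kind is the standard way to keep the DR/DM regression estimates independent of the data they are plugged into.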
Experiment Setup: Yes
  "We randomly select 30 actions that have at least one interaction with all users for our experiments. We use the user features and watch ratio recorded in the original data as the context x_u and expected reward q(x_u, a), respectively. ... where ϵ ∈ [0, 1] controls the quality of π, and we set ϵ = 0.2 as default. We sample the reward r_k from a normal distribution with mean q(x_u, a) and standard deviation σ = 1. ... Note that we set K = 10 for the number of domains and n_k = 100 for the logged data size of each domain as the default experimental parameters. ... We use Random Forest (Breiman, 2001) implemented in scikit-learn (Pedregosa et al., 2011) along with 3-fold cross-fitting (Newey & Robins, 2018) to obtain q̂_T(x, a) for DR and DM, and q̂(x, a) for DR-ALL, DM-ALL, and COPE. In addition, for COPE, we use |ϕ(T)| = 4."
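The evaluation protocol in the excerpt (200 seeded replications, n_k = 100, σ = 1, with MSE decomposed into squared bias and variance) can be sketched as below. The estimator here is a deliberate stand-in, a plain sample mean of noisy rewards, not COPE; the sketch only shows the replication loop and the exact decomposition MSE = bias² + variance over the replications.

```python
# Sketch of the repeated-seed evaluation: run an estimator over many
# seeded replications, then decompose its MSE into squared bias + variance.
# The placeholder estimator and true_value are assumptions for illustration.
import numpy as np

true_value = 1.0                 # stand-in for the ground-truth policy value
n_seeds, n_samples = 200, 100    # 200 replications, n_k = 100 as quoted

estimates = np.empty(n_seeds)
for seed in range(n_seeds):
    rng = np.random.default_rng(seed)
    rewards = rng.normal(true_value, 1.0, size=n_samples)  # sigma = 1
    estimates[seed] = rewards.mean()    # placeholder OPE estimator

mse = np.mean((estimates - true_value) ** 2)
sq_bias = (estimates.mean() - true_value) ** 2
variance = estimates.var()              # population variance (ddof=0)
# With ddof=0 the decomposition is exact: mse == sq_bias + variance.
```

Reporting all three quantities, as the paper does, separates estimators that are noisy from those that are systematically off-target.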