Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Counterfactual Mean Embeddings

Authors: Krikamol Muandet, Motonobu Kanagawa, Sorawit Saengkyongam, Sanparith Marukatat

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results on synthetic data and off-policy evaluation tasks demonstrate the advantages of the proposed estimator.
Researcher Affiliation Academia Krikamol Muandet (EMAIL), Max Planck Institute for Intelligent Systems, Tübingen, Germany; Motonobu Kanagawa (EMAIL), Data Science Department, EURECOM, Sophia Antipolis, France; Sorawit Saengkyongam (EMAIL), University of Copenhagen, Copenhagen, Denmark; Sanparith Marukatat (EMAIL), National Electronics and Computer Technology Center, National Science and Technology Development Agency, Pathumthani, Thailand
Pseudocode Yes Algorithm 1: Sampling from a counterfactual mean embedding estimate; Algorithm 2: Off-Policy Evaluation using the CME estimator (18)
Open Source Code Yes The codes to reproduce the experiments are available at https://github.com/sorawitj/counterfactual-mean-embedding.
Open Datasets Yes For our real data experiment, we use the data from the Microsoft Learning to Rank Challenge dataset (MSLR-WEB30K) (Qin and Liu, 2013) and treat them as an off-policy evaluation problem.
Dataset Splits Yes We set the kernel ℓ on the outcome space as the Gaussian kernel ℓ(y, y′) = exp(−‖y − y′‖²₂ / (2σ²_Y)), whose bandwidth parameter σ_Y is chosen by the median heuristic using (y_i)_{i=1}^n. We also set the kernel k on the covariate space as the Gaussian kernel k(x, x′) = exp(−‖x − x′‖²₂ / (2σ²_X)), whose parameter σ_X, as well as the regularization constant ε in the CME estimator, are chosen by 5-fold cross-validation from σ_X ∈ {0.01, 0.1, 1, 10} and ε ∈ {0.01, 0.1, 1, 10}.
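The kernel setup quoted above (a Gaussian kernel whose bandwidth is set by the median heuristic over the sample) can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code; the function names and the toy data are assumptions.

```python
import numpy as np

def median_heuristic_bandwidth(Y):
    """Median heuristic: set the bandwidth to the median pairwise distance."""
    diffs = Y[:, None, :] - Y[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Median over strictly positive distances (exclude the zero diagonal).
    return np.median(dists[dists > 0])

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 1))          # toy outcomes (y_i), 1-dimensional here
sigma_Y = median_heuristic_bandwidth(Y)
L = gaussian_kernel(Y, Y, sigma_Y)     # outcome-space Gram matrix
```

In the paper, σ_X and the regularization constant ε are instead tuned by 5-fold cross-validation over the small grids quoted above, with only σ_Y set by the median heuristic.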
Hardware Specification No The paper does not explicitly mention specific hardware details such as CPU/GPU models, memory, or cloud computing resources used for experiments.
Software Dependencies No The paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn 0.x).
Experiment Setup Yes Throughout the experiment, we set β = [0.1, 0.2, 0.3, 0.4, 0.5]⊤, α = [0.05, 0.04, 0.03, 0.02, 0.01]⊤, α₀ = 0.05, and σ²_ε = σ²_x = 0.1. We set b = 0 for Scenario I and b = 2 for Scenario II. For Scenario III, we set b = 2z − 1, where z ∈ {0, 1} is an independent Bernoulli random variable z ∼ Bernoulli(0.5) generated for every observation. We perform 5-fold CV over parameter grids, i.e., the number of hidden units n_h ∈ {50, 100, 150, 200} for the Direct and DR estimators, and the regularization parameter ε ∈ {10⁻⁸, …, 10⁰} for our CME.
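The CME estimator at the heart of these experiments is a kernel-ridge-style re-weighting: weights β = (K + nεI)⁻¹ K̃ 1_m/m transfer observed outcomes from the logged covariates to the target covariate distribution. The sketch below is a minimal illustration of that form under assumed toy data; it is not the authors' implementation (their code is in the linked repository), and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def cme_weights(X_src, X_tgt, sigma, eps):
    """Ridge weights beta = (K + n*eps*I)^{-1} K_cross 1_m / m (illustrative form)."""
    n = X_src.shape[0]
    K = gaussian_kernel(X_src, X_src, sigma)
    K_cross = gaussian_kernel(X_src, X_tgt, sigma)   # n x m cross-Gram matrix
    return np.linalg.solve(K + n * eps * np.eye(n), K_cross.mean(axis=1))

rng = np.random.default_rng(1)
X_src = rng.normal(size=(200, 3))                    # covariates under the logged policy
y_src = X_src.sum(axis=1) + 0.1 * rng.normal(size=200)
X_tgt = rng.normal(size=(150, 3))                    # covariates under the target policy
beta = cme_weights(X_src, X_tgt, sigma=1.0, eps=0.01)
value_estimate = beta @ y_src                        # plug-in estimate of the mean outcome
```

In the paper's off-policy evaluation experiments, ε is chosen from the grid {10⁻⁸, …, 10⁰} by 5-fold cross-validation rather than fixed as above.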