Optimal Policy Adaptation Under Covariate Shift

Authors: Xueqing Liu, Qinwei Yang, Zhaoqing Tian, Ruocheng Guo, Peng Wu

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the approach not only estimates the reward more accurately but also yields a policy that closely approximates the theoretically optimal policy.
Researcher Affiliation | Collaboration | 1) Beijing Technology and Business University; 2) ByteDance Research
Pseudocode | Yes | Algorithm 1: Proposed Policy Learning Approach
Open Source Code | No | The paper makes no explicit statement about code availability and provides no links to repositories for the described methodology.
Open Datasets | Yes | The Communities and Crime dataset [Redmond, 2009] comprises 1,994 records from communities in the United States... UCI Machine Learning Repository, doi.org/10.24432/C53W3X, 2009.
Dataset Splits | No | The paper describes generating a source dataset of 512 individuals and a target dataset of 2,048 individuals for the simulated experiments, and splitting the real-world Communities and Crime dataset into New Jersey (source) and the other states (target). However, it does not specify explicit training, validation, or test splits for model development and evaluation within these datasets (e.g., 70/15/15 percentages, absolute sample counts, or predefined splits with citations).
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU models, CPU types, or memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) needed to replicate the experiments.
Experiment Setup | No | The paper describes the overall experimental methodology, including the number of trials and evaluation metrics, but does not report hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or other training configurations for the models used in the experiments.
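The New Jersey (source) vs. other-states (target) partition noted under Dataset Splits could be reproduced along these lines. This is a minimal sketch: the per-record `state` field and the use of FIPS code 34 for New Jersey are assumptions about the dataset encoding, not details stated in the paper.

```python
# Sketch of the state-based source/target split described in the report.
# Assumption: each record carries a numeric state code, with 34 (the FIPS
# code) denoting New Jersey. Neither detail is given in the paper itself.
NJ_FIPS = 34

def split_by_state(records, source_state=NJ_FIPS):
    """Source = records from one state; target = records from all others."""
    source = [r for r in records if r["state"] == source_state]
    target = [r for r in records if r["state"] != source_state]
    return source, target

# Usage with synthetic rows standing in for the 1,994 UCI records.
demo = [
    {"state": 34, "crime_rate": 0.2},
    {"state": 34, "crime_rate": 0.3},
    {"state": 6,  "crime_rate": 0.5},
    {"state": 36, "crime_rate": 0.4},
    {"state": 48, "crime_rate": 0.6},
]
source_ds, target_ds = split_by_state(demo)
print(len(source_ds), len(target_ds))  # 2 3
```

Splitting on a single held-out state is what makes this a covariate-shift setting: the source and target covariate distributions differ by construction.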