Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint
Authors: Jiafei Lyu, Mengbei Yan, Zhongjian Qiao, Runze Liu, Xiaoteng Ma, Deheng Ye, Jing-Wen Yang, Zongqing Lu, Xiu Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate OTDF upon various D4RL (Fu et al., 2020) datasets with different types of dynamics shifts (e.g., gravity shift), given limited target domain data. Empirically, we demonstrate that OTDF achieves superior performance across numerous tasks and with varied source or target domain dataset qualities, often outperforming recent strong baseline methods by a large margin. To ensure that our work is reproducible, our code is available at https://github.com/dmksjfl/OTDF. |
| Researcher Affiliation | Collaboration | 1Tsinghua Shenzhen International Graduate School, Tsinghua University; 2Department of Automation, Tsinghua University; 3Tencent; 4School of Computer Science, Peking University; 5Beijing Academy of Artificial Intelligence. EMAIL, EMAIL |
| Pseudocode | Yes | The abstracted pseudocode of OTDF is presented in Algorithm 1. ... We summarize the pseudocode of OTDF+IQL in Algorithm 2. |
| Open Source Code | Yes | To ensure that our work is reproducible, our code is available at https://github.com/dmksjfl/OTDF. |
| Open Datasets | Yes | We evaluate OTDF upon various D4RL (Fu et al., 2020) datasets with different types of dynamics shifts (e.g., gravity shift), given limited target domain data. |
| Dataset Splits | Yes | To ensure that only a limited budget of target domain data can be accessed, we only collect 5 trajectories for each dataset, which amounts to about 5000 transitions. ... We strictly follow the data budget and pick 2 trajectories from the medium dataset and 3 trajectories from the expert dataset to construct the medium-expert datasets. |
| Hardware Specification | Yes | In Table 8, we list the compute infrastructure that we use to run all of the algorithms. Table 8: Compute infrastructure. CPU: AMD EPYC 7452; GPU: RTX3090 ×8; Memory: 288GB |
| Software Dependencies | No | The paper mentions several software components, including OTT-JAX, IQL, SAC, D4RL, MuJoCo, OpenAI Gym, CVAE, and Adam. However, it does not provide specific version numbers for these components in the main text, except for the 'D4RL -v2' datasets. |
| Experiment Setup | Yes | We run all algorithms for 1M gradient steps across 5 random seeds. ... For most of our experiments, we set β = 0.5. ... We summarize the detailed hyperparameter setup for all baseline methods and OTDF in Table 5. Table 5 includes specific values for learning rate, batch size, discount factor, target update rate, and various algorithm-specific coefficients. |