Planning with Consistency Models for Model-Based Offline Reinforcement Learning
Authors: Guanquan Wang, Takuya Hiraoka, Yoshimasa Tsuruoka
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on Gym tasks in the D4RL framework, demonstrating that, when compared to its diffusion model counterparts, our method achieves more than a 12-fold increase in speed without any loss in performance. Section 5 Experiment |
| Researcher Affiliation | Collaboration | Guanquan Wang EMAIL Department of Information and Communication Engineering The University of Tokyo; Takuya Hiraoka EMAIL NEC Corporation, Tokyo, Japan; Yoshimasa Tsuruoka EMAIL Department of Information and Communication Engineering The University of Tokyo |
| Pseudocode | Yes | Algorithm 1 Consistency Distillation with guidance; Algorithm 2 Planning with Consistency Model |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described in this paper, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We validate our method on Gym tasks in the D4RL framework... We evaluate Consistency Planning on D4RL benchmark tasks (Fu et al., 2020) for offline RL... The diffusion model, inverse dynamics model, and consistency model are trained using publicly available D4RL datasets... |
| Dataset Splits | No | The paper uses D4RL datasets but does not explicitly provide information on how these datasets were split into training, validation, and test sets for the experiments. |
| Hardware Specification | No | The paper states 'on our server' when discussing inference time measurements, but does not provide specific details about the hardware components (e.g., GPU model, CPU, memory) of this server. |
| Software Dependencies | No | The paper mentions using '2nd order Heun as ODE solver' and 'Adam optimizer', but it does not provide specific software dependencies like library names with version numbers (e.g., PyTorch version, TensorFlow version) used for implementation. |
| Experiment Setup | Yes | We train the diffusion model using a learning rate of 1e-4 and a batch size of 512 for 2e5 training steps with the Adam optimizer. We choose the probability p of removing the conditioning information to be 0.25. We use N = 2 for consistency inference. We use a planning horizon H of 32 and a context length C of 8 in all tasks. We use a guidance scale ωmax = 1, ωmin = 0 in guided consistency distillation. |
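The hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. This is a hypothetical reconstruction for illustration only: the authors did not release code, so all names below are assumptions, and the linear guidance-scale schedule between ωmin and ωmax is an illustrative choice, not the paper's stated schedule.

```python
# Hypothetical configuration mirroring the paper's reported hyperparameters.
# All identifiers are illustrative; the authors' actual code is unreleased.
config = {
    "learning_rate": 1e-4,    # diffusion model learning rate
    "batch_size": 512,
    "train_steps": int(2e5),
    "optimizer": "Adam",
    "p_uncond": 0.25,         # probability of dropping conditioning info
    "consistency_steps": 2,   # N used for consistency inference
    "horizon": 32,            # planning horizon H
    "context_length": 8,      # context length C
    "omega_max": 1.0,         # guidance scale bounds used in
    "omega_min": 0.0,         # guided consistency distillation
}

def guidance_scale(step: int, total_steps: int) -> float:
    """Illustrative linear ramp from omega_min to omega_max.

    The paper specifies only the bounds (ωmin = 0, ωmax = 1), not the
    schedule; a linear interpolation is assumed here as an example.
    """
    frac = step / max(total_steps - 1, 1)
    return config["omega_min"] + frac * (config["omega_max"] - config["omega_min"])
```

A schedule like this would be queried once per distillation step, e.g. `guidance_scale(0, config["train_steps"])` returns the lower bound and the final step returns the upper bound.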