Planning with Consistency Models for Model-Based Offline Reinforcement Learning

Authors: Guanquan Wang, Takuya Hiraoka, Yoshimasa Tsuruoka

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our method on Gym tasks in the D4RL framework, demonstrating that, when compared to its diffusion model counterparts, our method achieves more than a 12-fold increase in speed without any loss in performance." (Section 5, Experiment)
Researcher Affiliation | Collaboration | Guanquan Wang, Department of Information and Communication Engineering, The University of Tokyo; Takuya Hiraoka, NEC Corporation, Tokyo, Japan; Yoshimasa Tsuruoka, Department of Information and Communication Engineering, The University of Tokyo
Pseudocode | Yes | Algorithm 1: Consistency Distillation with guidance; Algorithm 2: Planning with Consistency Model
Open Source Code | No | The paper neither states that source code for the described methodology is released nor provides a link to a code repository.
Open Datasets | Yes | "We validate our method on Gym tasks in the D4RL framework... We evaluate Consistency Planning on D4RL benchmark tasks (Fu et al., 2020) for offline RL... The diffusion model, inverse dynamics model, and consistency model are trained using publicly available D4RL datasets..."
Dataset Splits | No | The paper uses D4RL datasets but does not explicitly describe how they were split into training, validation, and test sets for the experiments.
Hardware Specification | No | The paper says inference time was measured "on our server" but gives no details about the hardware (e.g., GPU model, CPU, memory).
Software Dependencies | No | The paper mentions using "2nd order Heun as ODE solver" and the "Adam optimizer", but it lists no software dependencies with version numbers (e.g., PyTorch or TensorFlow versions) used for the implementation.
Experiment Setup | Yes | "We train diffusion model using learning rate of 1e-4 and batch size of 512 for 2e5 train steps with Adam optimizer. We choose the probability p of removing the conditioning information to be 0.25. We use N = 2 for consistency inference. We use a planning horizon H of 32, context length C of 8 in all tasks. We use a guidance scale ωmax = 1, ωmin = 0 in guided consistency distillation."
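The reported experiment settings can be collected into a single configuration for reference. This is a sketch based only on the numbers quoted above; the key names are illustrative and do not come from the paper's (unreleased) code:

```python
# Hyperparameters reported in the paper; key names are illustrative.
config = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,            # diffusion model learning rate
    "batch_size": 512,
    "train_steps": int(2e5),
    "p_drop_conditioning": 0.25,      # probability p of removing conditioning info
    "consistency_inference_steps": 2, # N
    "planning_horizon": 32,           # H
    "context_length": 8,              # C
    "guidance_scale_max": 1.0,        # ω_max in guided consistency distillation
    "guidance_scale_min": 0.0,        # ω_min
}
```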
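The "2nd order Heun as ODE solver" mentioned under Software Dependencies refers to the standard explicit trapezoidal (Heun) method commonly used in diffusion-model sampling. A minimal generic sketch of that solver (not the paper's implementation) is:

```python
def heun_step(f, t, y, dt):
    """One step of the 2nd-order Heun (explicit trapezoidal) method.

    f(t, y) returns dy/dt at time t and state y.
    """
    k1 = f(t, y)                     # slope at the start of the interval
    y_pred = y + dt * k1             # Euler predictor
    k2 = f(t + dt, y_pred)           # slope at the predicted endpoint
    return y + dt * 0.5 * (k1 + k2)  # corrector: average of the two slopes

def integrate(f, t0, y0, t1, n_steps):
    """Integrate y' = f(t, y) from t0 to t1 with n_steps Heun steps."""
    t, y = t0, y0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        y = heun_step(f, t, y, dt)
        t += dt
    return y
```

For example, integrating y' = y from t = 0 to t = 1 with y(0) = 1 approximates e ≈ 2.71828 with second-order accuracy in the step size.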