DRDT3: Diffusion-Refined Decision Test-Time Training Model

Authors: Xingshuai Huang, Di Wu, Benoit Boulet

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments on multiple tasks in the D4RL benchmark, our DT3 model without diffusion refinement demonstrates improved performance over standard DT, while DRDT3 further achieves superior results compared to state-of-the-art DT-based and offline RL methods." "Experiments on extensive tasks from the D4RL benchmark (Fu et al., 2020) demonstrate the superior performance of our proposed DT3 and DRDT3 over conventional offline RL and DT-based methods."
Researcher Affiliation | Academia | Xingshuai Huang (EMAIL), Di Wu (EMAIL), and Benoit Boulet (EMAIL), Department of Electrical and Computer Engineering, McGill University
Pseudocode | Yes | Algorithm 1 (Training of DRDT3) and Algorithm 2 (Inference of DRDT3)
Open Source Code | No | The paper does not state that source code for the described methodology is publicly available, nor does it provide a link to a code repository. It refers to other papers for baselines, but not to its own implementation.
Open Datasets | Yes | "We conduct experiments to evaluate our proposed DRDT3 on the commonly used D4RL benchmark (Fu et al., 2020) using an AMD Ryzen 7 7700X 8-Core Processor with a single NVIDIA GeForce RTX 4080 GPU."
Dataset Splits | No | The paper describes the characteristics and sources of the D4RL datasets used (e.g., "Medium datasets contain one million samples collected using a behavior policy...", "Medium-Expert datasets consist of two million samples..."), but it does not specify how these datasets were further split into training, validation, or testing sets. It refers to using the D4RL benchmark datasets, but not to the specific splits used in the authors' experiments.
Hardware Specification | Yes | "We conduct experiments to evaluate our proposed DRDT3 on the commonly used D4RL benchmark (Fu et al., 2020) using an AMD Ryzen 7 7700X 8-Core Processor with a single NVIDIA GeForce RTX 4080 GPU."
Software Dependencies | No | The paper does not provide version numbers for any software dependencies, such as programming languages, libraries, or frameworks. It mentions the GPT-2 model and the Optuna hyperparameter optimization framework, both without versions.
Experiment Setup | Yes | "When implementing our proposed DRDT3, we train it for 50 epochs with 2000 gradient updates per epoch. The learning rate and batch size are designated as 0.0003 and 2048, respectively. To process historical sub-trajectories with the DT3 module, we set the context length as 6. The Attention TTT block used in the DT3 module consists of 1-layer self-attention and 1-layer TTT with embedding dimensions of 128. We empirically set ζ = 0.2 based on the results."
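For reference, the hyperparameters quoted above can be collected into a single configuration sketch. This is illustrative only: the field names are hypothetical (the authors' implementation is not released), and only the values come from the paper's reported setup.

```python
# Hypothetical configuration sketch assembling the DRDT3 hyperparameters
# reported in the paper's experiment setup. Field names are illustrative,
# not taken from the authors' (unreleased) implementation.
from dataclasses import dataclass


@dataclass
class DRDT3Config:
    epochs: int = 50                # training epochs
    updates_per_epoch: int = 2000   # gradient updates per epoch
    learning_rate: float = 3e-4     # reported as 0.0003
    batch_size: int = 2048
    context_length: int = 6         # historical sub-trajectory length for DT3
    n_attention_layers: int = 1     # self-attention layers in the Attention TTT block
    n_ttt_layers: int = 1           # TTT layers in the Attention TTT block
    embed_dim: int = 128
    zeta: float = 0.2               # coefficient reported as ζ = 0.2, set empirically

    @property
    def total_updates(self) -> int:
        """Total gradient updates over the full training run."""
        return self.epochs * self.updates_per_epoch


cfg = DRDT3Config()
print(cfg.total_updates)  # 100000
```

Collecting the values this way makes the overall training budget explicit: 50 epochs at 2000 updates each is 100,000 gradient updates in total.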