DRDT3: Diffusion-Refined Decision Test-Time Training Model
Authors: Xingshuai Huang, Di Wu, Benoit Boulet
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experiments on extensive tasks from the D4RL benchmark (Fu et al., 2020) demonstrate the superior performance of our proposed DT3 and DRDT3 over conventional offline RL and DT-based methods." The paper also reports that the DT3 model without diffusion refinement improves over standard DT, while DRDT3 further achieves superior results compared to state-of-the-art DT-based and offline RL methods. |
| Researcher Affiliation | Academia | Xingshuai Huang (EMAIL), Department of Electrical and Computer Engineering, McGill University; Di Wu (EMAIL), Department of Electrical and Computer Engineering, McGill University; Benoit Boulet (EMAIL), Department of Electrical and Computer Engineering, McGill University |
| Pseudocode | Yes | Algorithm 1 Training of DRDT3, Algorithm 2 Inference of DRDT3 |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository. It refers to other papers for baselines, but not its own implementation. |
| Open Datasets | Yes | We conduct experiments to evaluate our proposed DRDT3 on the commonly used D4RL benchmark (Fu et al., 2020) using an AMD Ryzen 7 7700X 8-Core Processor with a single NVIDIA GeForce RTX 4080 GPU. |
| Dataset Splits | No | The paper describes the characteristics and sources of the D4RL datasets used (e.g., "Medium datasets contain one million samples collected using a behavior policy...", "Medium-Expert datasets consist of two million samples..."), but it does not specify how these datasets were further split into training, validation, or testing sets for the authors' experiments. It refers to using the D4RL benchmark datasets, but not the specific splits used within their experiments. |
| Hardware Specification | Yes | We conduct experiments to evaluate our proposed DRDT3 on the commonly used D4RL benchmark (Fu et al., 2020) using an AMD Ryzen 7 7700X 8-Core Processor with a single NVIDIA GeForce RTX 4080 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks. It mentions 'GPT-2 model' but without a version, and 'Optuna hyperparameter optimization framework' also without a version. |
| Experiment Setup | Yes | When implementing our proposed DRDT3, we train it for 50 epochs with 2000 gradient updates per epoch. The learning rate and batch size are set to 0.0003 and 2048, respectively. To process historical sub-trajectories with the DT3 module, we set the context length to 6. The Attention TTT block used in the DT3 module consists of 1-layer self-attention and 1-layer TTT with an embedding dimension of 128. We empirically set ζ = 0.2 based on the results. |
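For concreteness, the hyperparameters reported in the Experiment Setup row can be collected into a small configuration sketch. This is an illustrative assumption, not the authors' actual code; all field names (e.g. `context_length`, `embed_dim`) are hypothetical.

```python
# Hedged sketch: DRDT3 training hyperparameters as reported in the paper.
# Field names are illustrative, not taken from the authors' implementation.
DRDT3_CONFIG = {
    "epochs": 50,                # 50 training epochs
    "updates_per_epoch": 2000,   # 2000 gradient updates per epoch
    "learning_rate": 3e-4,       # reported as 0.0003
    "batch_size": 2048,
    "context_length": 6,         # length of historical sub-trajectories
    "embed_dim": 128,            # Attention TTT block embedding dimension
    "n_attn_layers": 1,          # 1-layer self-attention
    "n_ttt_layers": 1,           # 1-layer TTT
    "zeta": 0.2,                 # empirically chosen ζ
}

def total_gradient_updates(cfg: dict) -> int:
    """Total gradient updates implied by the reported schedule."""
    return cfg["epochs"] * cfg["updates_per_epoch"]
```

Under this schedule the run performs 50 × 2000 = 100,000 gradient updates in total.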