Return-Aligned Decision Transformer
Authors: Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that RADT significantly reduces the discrepancies between the actual return and the target return compared to DT-based methods. |
| Researcher Affiliation | Collaboration | Tsunehiko Tanaka EMAIL Waseda University; Kenshi Abe EMAIL CyberAgent; Kaito Ariu EMAIL CyberAgent; Tetsuro Morimura EMAIL CyberAgent; Edgar Simo-Serra EMAIL Waseda University |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and textual explanations (e.g., equations 1-15 and sections 4.1-4.2), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'We use the model code for DT, StARformer, and DC from the following sources. DT: https://github.com/kzl/decision-transformer. StARformer: https://github.com/elicassion/StARformer. DC: https://openreview.net/forum?id=af2c8EaKl8.' However, this refers to the baselines' code, not the source code for the proposed RADT methodology. There is no explicit statement or link provided for the open-source code of RADT. |
| Open Datasets | Yes | We evaluate RADT on continuous (MuJoCo (Todorov et al., 2012)) and discrete (Atari (Bellemare et al., 2013)) control tasks in the same way as DT. ... We use four gym locomotion tasks from the widely-used D4RL (Fu et al., 2020) dataset: ant, hopper, halfcheetah, and walker2d. ... Similar to DT, we use 1% of all samples in the DQN-replay datasets as per Agarwal et al. (2020) for training. |
| Dataset Splits | No | The paper states: 'Similar to DT, we use 1% of all samples in the DQN-replay datasets as per Agarwal et al. (2020) for training.' This indicates the amount of data used for training but does not specify how the dataset is split into distinct training, validation, or testing sets needed for reproduction. |
| Hardware Specification | Yes | We used an Nvidia A100 GPU for training in the Atari and MuJoCo domains. |
| Software Dependencies | No | The paper states: 'Our implementation of RADT is based on the public codebase of DT.' and mentions 'official PyTorch implementations' for baselines. However, it does not provide specific version numbers for any key software components like Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | The paper provides detailed hyperparameter settings in Table 13 and Table 14 for RADT in the MuJoCo, AntMaze, and Atari domains. These include: number of blocks, number of heads, embedding dimension, batch size, nonlinearity function, context length K, dropout, learning rate, grad norm clip, weight decay, learning rate decay, encoder channels, filter size, strides, max epochs, and Adam betas. |
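The core claim assessed above is that RADT reduces the discrepancy between the return a policy is conditioned on (the target return) and the return it actually achieves. A minimal sketch of that evaluation criterion, assuming it is measured as a mean absolute error over evaluation episodes (the function name and metric choice here are illustrative, not taken from the paper):

```python
# Illustrative sketch (not the paper's code): the gap between the target
# return a return-conditioned policy is prompted with and the return it
# actually achieves -- the discrepancy RADT is reported to reduce.

def return_alignment_error(target_returns, actual_returns):
    """Mean absolute error between target and achieved episode returns."""
    if len(target_returns) != len(actual_returns):
        raise ValueError("target and actual return lists must match in length")
    n = len(target_returns)
    return sum(abs(t - a) for t, a in zip(target_returns, actual_returns)) / n

# Example: a policy asked for returns of 1000 and 2000 achieves 900 and 2100.
err = return_alignment_error([1000.0, 2000.0], [900.0, 2100.0])  # -> 100.0
```

A lower value indicates tighter alignment between commanded and realized returns; averaging over many target returns probes alignment across the whole conditioning range.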