Return-Aligned Decision Transformer

Authors: Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that RADT significantly reduces the discrepancies between the actual return and the target return compared to DT-based methods. |
| Researcher Affiliation | Collaboration | Tsunehiko Tanaka (EMAIL), Waseda University; Kenshi Abe (EMAIL), CyberAgent; Kaito Ariu (EMAIL), CyberAgent; Tetsuro Morimura (EMAIL), CyberAgent; Edgar Simo-Serra (EMAIL), Waseda University |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and textual explanations (e.g., equations 1–15 and sections 4.1–4.2), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'We use the model code for DT, StARformer, and DC from the following sources. DT: https://github.com/kzl/decision-transformer. StARformer: https://github.com/elicassion/StARformer. DC: https://openreview.net/forum?id=af2c8EaKl8.' However, this refers to the baselines' code, not the source code of the proposed RADT method. No explicit statement or link for open-source RADT code is provided. |
| Open Datasets | Yes | We evaluate RADT on continuous (MuJoCo (Todorov et al., 2012)) and discrete (Atari (Bellemare et al., 2013)) control tasks in the same way as DT. ... We use four gym locomotion tasks from the widely used D4RL (Fu et al., 2020) dataset: ant, hopper, halfcheetah, and walker2d. ... Similar to DT, we use 1% of all samples in the DQN-replay datasets as per Agarwal et al. (2020) for training. |
| Dataset Splits | No | The paper states: 'Similar to DT, we use 1% of all samples in the DQN-replay datasets as per Agarwal et al. (2020) for training.' This indicates the amount of data used for training but does not specify how the dataset is divided into distinct training, validation, or test sets needed for reproduction. |
| Hardware Specification | Yes | We used an Nvidia A100 GPU for training in the Atari and MuJoCo domains. |
| Software Dependencies | No | The paper states: 'Our implementation of RADT is based on the public codebase of DT.' and mentions 'official PyTorch implementations' for baselines. However, it does not provide version numbers for key software components such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | The paper provides detailed hyperparameter settings in Tables 13 and 14 for RADT in the MuJoCo, AntMaze, and Atari domains. These include: number of blocks, number of heads, embedding dimension, batch size, nonlinearity function, context length K, dropout, learning rate, grad norm clip, weight decay, learning rate decay, encoder channels, filter size, strides, max epochs, and Adam betas. |
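The Atari data regime quoted in the Open Datasets row — training on 1% of all samples in the DQN-replay datasets, as per Agarwal et al. (2020) — can be sketched as a uniform subsample. This is a minimal illustration, not the paper's code; the `subsample_replay` helper, the toy buffer, and the fixed seed are assumptions of this sketch:

```python
import random

def subsample_replay(transitions, fraction=0.01, seed=0):
    """Uniformly keep a fraction of replay-buffer transitions
    (sketch of the 1%-of-DQN-replay setup; not the paper's code)."""
    rng = random.Random(seed)
    n = max(1, int(len(transitions) * fraction))
    return rng.sample(transitions, n)

# Example: keep 1% of a toy buffer of 10,000 transitions.
buffer = list(range(10_000))
subset = subsample_replay(buffer)
print(len(subset))  # 100
```

Note the report's Dataset Splits finding: the 1% figure only fixes the training-set size, so any validation/test partitioning on top of this subsample would have to be chosen by the reproducer.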
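The hyperparameter fields that the Experiment Setup row lists from Tables 13 and 14 can be collected into a single config object for a reimplementation. The field names below follow the paper's list; the types, field grouping, and all concrete values are placeholder assumptions of this sketch, not the paper's settings (those are in Tables 13 and 14):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class RADTConfig:
    # Transformer architecture (field names follow Tables 13-14 of the paper).
    n_blocks: int
    n_heads: int
    embed_dim: int
    context_length: int          # "Context length K"
    nonlinearity: str
    dropout: float
    # Optimization.
    batch_size: int
    learning_rate: float
    grad_norm_clip: float
    weight_decay: float
    lr_decay: bool
    adam_betas: Tuple[float, float]
    max_epochs: int
    # Atari image encoder.
    encoder_channels: Tuple[int, ...]
    filter_size: Tuple[int, ...]
    strides: Tuple[int, ...]

# Placeholder values for illustration only -- NOT the paper's settings.
cfg = RADTConfig(
    n_blocks=3, n_heads=1, embed_dim=128, context_length=20,
    nonlinearity="relu", dropout=0.1, batch_size=64,
    learning_rate=1e-4, grad_norm_clip=0.25, weight_decay=1e-4,
    lr_decay=True, adam_betas=(0.9, 0.95), max_epochs=10,
    encoder_channels=(32, 64, 64), filter_size=(8, 4, 3), strides=(4, 2, 1),
)
```

A typed dataclass like this makes it easy to verify that every hyperparameter reported in the paper's tables has been transcribed before training.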