Return-Aligned Decision Transformer
Authors: Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that RADT significantly reduces the discrepancies between the actual return and the target return compared to DT-based methods. |
| Researcher Affiliation | Collaboration | Tsunehiko Tanaka EMAIL Waseda University; Kenshi Abe EMAIL CyberAgent; Kaito Ariu EMAIL CyberAgent; Tetsuro Morimura EMAIL CyberAgent; Edgar Simo-Serra EMAIL Waseda University |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and textual explanations (e.g., equations 1-15 and sections 4.1-4.2), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'We use the model code for DT, StARformer, and DC from the following sources. DT: https://github.com/kzl/decision-transformer. StARformer: https://github.com/elicassion/StARformer. DC: https://openreview.net/forum?id=af2c8EaKl8.' However, this refers to the baselines' code, not the source code for the proposed RADT methodology. There is no explicit statement or link provided for the open-source code of RADT. |
| Open Datasets | Yes | We evaluate RADT on continuous (MuJoCo (Todorov et al., 2012)) and discrete (Atari (Bellemare et al., 2013)) control tasks in the same way as DT. ... We use four gym locomotion tasks from the widely-used D4RL (Fu et al., 2020) dataset: ant, hopper, halfcheetah, and walker2d. ... Similar to DT, we use 1% of all samples in the DQN-replay datasets as per Agarwal et al. (2020) for training. |
| Dataset Splits | No | The paper states: 'Similar to DT, we use 1% of all samples in the DQN-replay datasets as per Agarwal et al. (2020) for training.' This indicates the amount of data used for training but does not specify how the dataset is split into distinct training, validation, or testing sets needed for reproduction. |
| Hardware Specification | Yes | We used an Nvidia A100 GPU for training in the Atari and MuJoCo domains. |
| Software Dependencies | No | The paper states: 'Our implementation of RADT is based on the public codebase of DT.' and mentions 'official PyTorch implementations' for baselines. However, it does not provide specific version numbers for any key software components like Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | The paper provides detailed hyperparameter settings in Table 13 and Table 14 for RADT in the MuJoCo, AntMaze, and Atari domains. These include: number of blocks, number of heads, embedding dimension, batch size, nonlinearity function, context length K, dropout, learning rate, grad norm clip, weight decay, learning rate decay, encoder channels, filter size, strides, max epochs, and Adam betas. |
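The core claim assessed above is that RADT reduces the discrepancy between the return a policy is conditioned on (the target return) and the return it actually achieves. A minimal sketch of that evaluation criterion, assuming it is measured as a mean absolute error over evaluation episodes (the function name and metric choice here are illustrative, not taken from the paper):

```python
# Illustrative sketch (not the paper's code): the gap between the target
# return a return-conditioned policy is prompted with and the return it
# actually achieves -- the discrepancy RADT is reported to reduce.

def return_alignment_error(target_returns, actual_returns):
    """Mean absolute error between target and achieved episode returns."""
    if len(target_returns) != len(actual_returns):
        raise ValueError("target and actual return lists must match in length")
    n = len(target_returns)
    return sum(abs(t - a) for t, a in zip(target_returns, actual_returns)) / n

# Example: a policy asked for returns of 1000 and 2000 achieves 900 and 2100.
err = return_alignment_error([1000.0, 2000.0], [900.0, 2100.0])  # -> 100.0
```

A lower value indicates tighter alignment between commanded and realized returns; averaging over many target returns probes alignment across the whole conditioning range.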