GCNT: Graph-Based Transformer Policies for Morphology-Agnostic Reinforcement Learning
Authors: Yingbo Luo, Meibao Yao, Xueming Xiao
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method can generate resilient locomotion behaviors for robots with different configurations, including zero-shot generalization to robot morphologies not seen during training. In particular, GCNT achieved the best performance on 8 tasks in the 2 standard benchmarks. We conducted experiments across 8 different scenarios on two standard benchmarks and compared with 7 baselines, ensuring thorough testing. |
| Researcher Affiliation | Academia | Yingbo Luo¹, Meibao Yao¹, and Xueming Xiao²; ¹Jilin University, ²Changchun University of Science and Technology. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology of GCNT, including its modules (Limb Observation, GCN, Weisfeiler-Lehman, Learnable Distance Embedding, Transformer) and optimization method (actor-critic, TD3/PPO algorithms). It provides figures illustrating the network architecture and GCN/Transformer blocks, but no explicitly labeled 'Pseudocode' or 'Algorithm' block with structured steps is present in the text. |
| Open Source Code | No | The paper does not include an unambiguous statement from the authors that they are releasing their code for the work described, nor does it provide a direct link to a source-code repository or mention code in supplementary materials. |
| Open Datasets | Yes | We conducted experiments across 8 different scenarios on two standard benchmarks and compared with 7 baselines, ensuring thorough testing. The benchmarks are SMPENV [Huang et al., 2020] and UNIMAL [Gupta et al., 2021]. All tests are based on MuJoCo [Todorov et al., 2012]. |
| Dataset Splits | No | For the zero-shot generalization experiments, the paper states: 'Each method was trained on the training sets of Walker++, Humanoid++, and Cheetah++ and evaluated on their corresponding test sets.' This indicates the use of training and test sets but does not provide specific details on how these splits were defined (e.g., percentages, sample counts for each split, or how the training/test sets were generated/sourced for reproduction). No other explicit dataset split information is provided for the main experiments. |
| Hardware Specification | No | The paper states 'All tests are based on MuJoCo [Todorov et al., 2012],' which is a physics engine. However, it does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the simulations or train the models. |
| Software Dependencies | No | The paper mentions using specific algorithms like TD3 [Fujimoto et al., 2018] and PPO [Schulman et al., 2017], and the MuJoCo [Todorov et al., 2012] physics engine. However, it does not provide specific version numbers for these or any other ancillary software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, Gym) that would be needed for replication. |
| Experiment Setup | No | The paper mentions using TD3 and PPO algorithms for optimization and running experiments with 3 or 4 random seeds. It also describes the MLP architecture for baselines (3 hidden layers, 256 hidden units). However, it lacks the specific hyperparameters used to train GCNT, such as learning rates, batch sizes, number of training epochs, and optimizer settings (e.g., specific Adam parameters), which are crucial for reproducing the experimental setup. |