TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Authors: Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TOP-ERL significantly outperforms state-of-the-art RL methods. Thorough ablation studies additionally show the impact of key design choices on the model performance. Our code is available here. Our experiments focus on the following questions: I) Can TOP-ERL improve sample efficiency in classical ERL tasks featuring challenging exploration problems? II) How does TOP-ERL perform in large-scale, general manipulation benchmarks? III) How do key design choices affect the performance of TOP-ERL? We compare TOP-ERL against a set of strong baselines. The results of these experiments, shown in Fig. 4, demonstrate that TOP-ERL achieved the highest final performance across all three tasks. ... Finally, we conduct a comprehensive ablation study to analyze which ingredient accounts for the strong performance of TOP-ERL. |
| Researcher Affiliation | Academia | Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann. Karlsruhe Institute of Technology, Germany. Email to <EMAIL, EMAIL> |
| Pseudocode | Yes | We summarize the key learning steps in Algorithm 1. Algorithm 1 TOP-ERL |
| Open Source Code | No | All relevant code, including the implementation of the proposed algorithms, simulation environments, and trained models, will be made available in a GitHub repository provided in the main paper. |
| Open Datasets | Yes | We conducted experiments on the Meta-World benchmark (Yu et al., 2020), a large-scale suite of general manipulation tasks. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (like percentages or absolute counts for train/test/validation sets) for reproducing the data partitioning. It focuses on environment interactions and evaluation protocols rather than static dataset splits for training/testing. |
| Hardware Specification | No | The paper mentions using "bwHPC and the HoreKa supercomputer" in the acknowledgments but does not provide specific details on the CPU models, GPU models, memory, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using implementations of PPO, SAC, and GTrXL based on other works and augmentation techniques, but it does not specify the version numbers for any key software components, libraries, or programming languages used in the authors' own implementation. |
| Experiment Setup | Yes | The detailed hyperparameters used are listed in the following tables. Table 4: Hyperparameters for the Meta-World experiments. Table 5: Hyperparameters for the Box Pushing Dense, Episode Length T = 100. Table 6: Hyperparameters for the Box Pushing Sparse, Episode Length T = 100. Table 7: Hyperparameters for the Hopper Jump, Episode Length T = 250 |