TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Authors: Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TOP-ERL significantly outperforms state-of-the-art RL methods. Thorough ablation studies additionally show the impact of key design choices on the model performance. Our code is available here. Our experiments focus on the following questions: I) Can TOP-ERL improve sample efficiency in classical ERL tasks featuring challenging exploration problems? II) How does TOP-ERL perform in large-scale, general manipulation benchmarks? III) How do key design choices affect the performance of TOP-ERL? We compare TOP-ERL against a set of strong baselines. The results of these experiments, shown in Fig. 4, demonstrate that TOP-ERL achieved the highest final performance across all three tasks. ... Finally, we conduct a comprehensive ablation study to analyze which ingredient accounts for the strong performance of TOP-ERL. |
| Researcher Affiliation | Academia | Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann. Karlsruhe Institute of Technology, Germany. Email to <EMAIL, EMAIL> |
| Pseudocode | Yes | We summarize the key learning steps in Algorithm 1. Algorithm 1 TOP-ERL |
| Open Source Code | No | All relevant code, including the implementation of the proposed algorithms, simulation environments, and trained models, will be made available in a GitHub repository provided in the main paper. |
| Open Datasets | Yes | We conducted experiments on the Meta-World benchmark (Yu et al., 2020), a large-scale suite of general manipulation tasks. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (like percentages or absolute counts for train/test/validation sets) for reproducing the data partitioning. It focuses on environment interactions and evaluation protocols rather than static dataset splits for training/testing. |
| Hardware Specification | No | The paper mentions using "bwHPC and the HoreKa supercomputer" in the acknowledgments but does not provide specific details on the CPU models, GPU models, memory, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using implementations of PPO, SAC, and GTrXL based on other works and augmentation techniques, but it does not specify the version numbers for any key software components, libraries, or programming languages used in the authors' own implementation. |
| Experiment Setup | Yes | The detailed hyperparameters used are listed in the following tables. Table 4: Hyperparameters for the Meta-World experiments. Table 5: Hyperparameters for the Box Pushing Dense, Episode Length T = 100. Table 6: Hyperparameters for the Box Pushing Sparse, Episode Length T = 100. Table 7: Hyperparameters for the Hopper Jump, Episode Length T = 250 |