RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Authors: Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Wenjun Zeng, Glen Berseth

TMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments aim to achieve two main objectives: (i) highlight how intrinsic reward methods are sensitive to implementation details, and (ii) identify the best algorithmic and design choices to ensure high performance across various sparse-reward environments to demonstrate the generality and robustness of our framework.
Researcher Affiliation | Academia | Mingqi Yuan (EMAIL), Department of Computing, The Hong Kong Polytechnic University; Roger Creus Castanyer (EMAIL), Mila Québec AI Institute & Université de Montréal; Bo Li (EMAIL), Department of Computing, The Hong Kong Polytechnic University; Xin Jin (EMAIL), Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo; Wenjun Zeng (EMAIL), Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, Fellow, IEEE and CAE; Glen Berseth (EMAIL), Mila Québec AI Institute & Université de Montréal
Pseudocode | No | The paper includes code examples in Appendix C.2, but these are actual code snippets demonstrating API usage rather than generalized, structured pseudocode or algorithm blocks for a method or procedure.
Open Source Code | Yes | Our documentation, examples, and source code are available at https://github.com/RLE-Foundation/RLeXplore.
Open Datasets | Yes | We evaluate the RLeXplore framework on multiple recognized benchmarks, which are specifically designed to evaluate the exploration capability of RL agents. We select Super Mario Bros (Kauten, 2018), MiniGrid (Chevalier-Boisvert et al., 2023), Procgen (Cobbe et al., 2020), the Arcade Learning Environment (ALE) (Bellemare et al., 2013), and Gymnasium-Robotics (de Lazcano et al., 2024) for our experiments.
Dataset Splits | No | The paper describes using various environments for reinforcement learning (Super Mario Bros, MiniGrid, Procgen, ALE-5, Ant-UMaze) and mentions training steps (e.g., 10M steps, 1M steps, 25M steps), but it does not specify traditional train/test/validation splits for static datasets. Reinforcement learning typically involves continuous interaction with an environment rather than fixed dataset partitioning.
Hardware Specification | No | The paper acknowledges its compute providers but never specifies the hardware used (e.g., GPU/CPU models or memory): "We thank the high-performance computing center at Eastern Institute of Technology and Ningbo Institute of Digital Twin for providing the computing resources. We also want to acknowledge funding support from NSERC and CIFAR, and compute support from Digital Research Alliance of Canada, Mila IDT and Nvidia."
Software Dependencies | No | The paper mentions several software components like PPO (Schulman et al., 2017), Stable-Baselines3 (Raffin et al., 2021), CleanRL (Huang et al., 2022b), and RLLTE (Yuan et al., 2023), but it does not provide specific version numbers for these or other key software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Table 8: PPO hyperparameters for Super Mario Bros, MiniGrid, and Procgen games. These remain fixed for all experiments. The table then lists specific values for observation downsampling, stacked frames, environment steps, episode steps, number of workers, environments per worker, optimizer, learning rate, GAE coefficient, action entropy coefficient, value loss coefficient, value clip range, max gradient norm, epochs per rollout, batch size, and discount factor. Additionally, Tables 2, 4, 5, 6, and 7 provide details on baseline settings and specific configurations for intrinsic rewards, including observation normalization, reward normalization, update proportion, weight initialization, and memory requirements.
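One of the implementation details audited above is intrinsic reward normalization. A common convention (popularized by RND-style methods) is to divide rewards by a running standard deviation, without mean-centering, so that reward sign is preserved while scale stays stable. The sketch below is illustrative only; the class and method names are ours, not RLeXplore's actual API.

```python
import numpy as np

class RunningMeanStd:
    """Running mean/variance tracker with parallel-variance batch merging.

    Illustrates the 'reward normalization' design choice named in the
    review table; this is a generic sketch, not RLeXplore's implementation.
    """

    def __init__(self, epsilon: float = 1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon  # tiny pseudo-count avoids division by zero

    def update(self, batch: np.ndarray) -> None:
        batch_mean = batch.mean()
        batch_var = batch.var()
        batch_count = batch.size
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Chan et al. merge of two sample sets' means and variances.
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta**2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, rewards: np.ndarray) -> np.ndarray:
        # Divide by running std only (no mean shift), so the sign of the
        # intrinsic reward is preserved.
        return rewards / np.sqrt(self.var + 1e-8)
```

In practice the tracker would be updated once per rollout with the batch of freshly computed intrinsic rewards, then used to scale them before they are mixed with the extrinsic return.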
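For readers unfamiliar with the intrinsic-reward family the paper benchmarks, the core idea of a novelty bonus can be shown in a few lines: a fixed random "target" network embeds observations, a trainable "predictor" regresses onto it, and the prediction error, which shrinks for frequently revisited observations, is the intrinsic reward. This is a deliberately tiny linear toy in the spirit of Random Network Distillation, not any method's actual implementation; all names here are ours.

```python
import numpy as np

class TinyRND:
    """Toy RND-style novelty bonus with linear target and predictor maps."""

    def __init__(self, obs_dim: int, embed_dim: int = 16,
                 lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed random target embedding; never trained.
        self.target = rng.normal(size=(obs_dim, embed_dim)) / np.sqrt(obs_dim)
        # Trainable predictor, initialized at zero.
        self.pred = np.zeros((obs_dim, embed_dim))
        self.lr = lr

    def reward(self, obs: np.ndarray) -> float:
        # Intrinsic reward = mean squared prediction error.
        err = obs @ self.pred - obs @ self.target
        return float(np.mean(err**2))

    def update(self, obs: np.ndarray) -> None:
        # One gradient step of the predictor toward the target embedding.
        err = obs @ self.pred - obs @ self.target
        grad = np.outer(obs, err) * (2.0 / err.size)
        self.pred -= self.lr * grad
```

Repeatedly updating on the same observation drives its bonus toward zero while unvisited observations keep a high bonus, which is exactly the exploration pressure the benchmarked methods exploit.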