Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning
Authors: Shangding Gu, Laixi Shi, Muning Wen, Ming Jin, Eric Mazumdar, Yuejie Chi, Adam Wierman, Costas Spanos
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive evaluation of several state-of-the-art (SOTA) baselines from standard RL, robust RL, safe RL, and multi-agent RL using representative tasks in Robust-Gymnasium. Our findings reveal that current algorithms often fall short of expectations in challenging tasks, even under single-stage disruptions, highlighting the need for new robust RL approaches. Furthermore, our experiments demonstrate the flexibility of Robust-Gymnasium by encompassing tasks with disruptions across all stages and four disturbance modes, including an adversarial model using a large language model (LLM). |
| Researcher Affiliation | Academia | 1 University of California, Berkeley 2 California Institute of Technology 3 Shanghai Jiao Tong University 4 Virginia Tech 5 Carnegie Mellon University |
| Pseudocode | Yes | The pseudocode is shown in Listing 3. Furthermore, Equation (3) is for initial noise and Equation (4) is for noise during training; we use these equations to incorporate stochastic disturbances into the Ant robot model, again including factors like gravity fluctuations and wind speed variations, with pseudocode shown in Listing 4. Apart from wind and gravity disturbances, we also investigate robot shape disturbances during policy learning, as shown in Equations (5)-(8); an example of the pseudocode is shown in Listing 5. |
| Open Source Code | Yes | The code is available at this website. ... Website with the introduction, code, and examples: https://robust-gym.github.io/ |
| Open Datasets | Yes | We introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components: the agent's observed state and reward, the agent's actions, and the environment. Offering over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL, it provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. ... Gymnasium-Box2D (three relatively simple control tasks in games). These tasks are from Gymnasium (Towers et al., 2024)... Gymnasium-MuJoCo (eleven control tasks). It includes various robot models... Robosuite (twelve tasks for various modular robot platforms). |
| Dataset Splits | No | We mainly focus on two evaluation settings: 1) In-training: the disruptor simultaneously affects the agent and environment during both training and testing at each time step. This process is typically used in robotics to address sim-to-real gaps by introducing potential noise during training; 2) Post-training: the disruptor only impacts the agent and environment during testing, mimicking scenarios where learning algorithms are unaware of testing variability. The paper describes evaluation settings related to when disruptions occur (in-training vs. post-training) but does not provide specific dataset split percentages, sample counts, or citations to predefined splits for reproducibility in terms of data partitioning. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, processor types, or memory amounts) are provided in the paper. |
| Software Dependencies | No | No specific software versions (e.g., Python, PyTorch, TensorFlow, CUDA, scikit-learn, etc.) are explicitly mentioned in the paper, beyond general framework names like 'Gymnasium'. |
| Experiment Setup | Yes | We deploy several SOTA baselines in our benchmark to evaluate their robustness across various challenging scenarios. The implementation parameters associated with these methods are provided in Tables 9-13. |
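The in-training vs. post-training disruption settings quoted in the table can be sketched with a toy wrapper. This is a hypothetical illustration, not the benchmark's actual API: `ToyEnv`, `NoisyObservationWrapper`, and the `sigma` parameter are invented here for the example; Robust-Gymnasium's own disruptor interface differs.

```python
# Hypothetical sketch of a per-step observation disruptor, assuming a
# minimal env interface (reset/step). Setting active=True at all times
# models the "in-training" setting; enabling it only at evaluation time
# models the "post-training" setting.
import random


class ToyEnv:
    """Minimal stand-in environment with a 1-D state."""

    def __init__(self):
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action
        reward = -abs(self.state)  # reward staying near the origin
        return self.state, reward


class NoisyObservationWrapper:
    """Disruptor that perturbs the observed state with Gaussian noise."""

    def __init__(self, env, sigma, active=True):
        self.env = env
        self.sigma = sigma
        self.active = active  # toggle to switch in-/post-training modes

    def reset(self):
        return self._disturb(self.env.reset())

    def step(self, action):
        obs, reward = self.env.step(action)
        return self._disturb(obs), reward

    def _disturb(self, obs):
        # Only the agent's *observation* is noised; the true state evolves
        # untouched, matching an observation-disruption mode.
        return obs + random.gauss(0.0, self.sigma) if self.active else obs


if __name__ == "__main__":
    random.seed(0)
    env = NoisyObservationWrapper(ToyEnv(), sigma=0.1, active=True)
    obs = env.reset()
    for _ in range(5):
        obs, reward = env.step(-0.5 * obs)  # naive proportional "policy"
```

The same wrapper pattern extends to action or reward disruptions by noising the value passed into, or returned from, `step` instead of the observation.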