Robust Transfer of Safety-Constrained Reinforcement Learning Agents
Authors: Markel Zubia, Thiago Simão, Nils Jansen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical evaluation shows that this method yields policies that are robust against changes in dynamics, demonstrating safety after transfer to a new environment. |
| Researcher Affiliation | Academia | ¹Ruhr University Bochum, Germany; ²Eindhoven University of Technology, The Netherlands; ³Radboud University Nijmegen, The Netherlands |
| Pseudocode | No | The paper describes the methodology in Section 5 ('ROBUST GUIDED SAFE EXPLORATION') using natural language without presenting any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available on https://github.com/ai-fm/safe-and-robust-transfer |
| Open Datasets | Yes | We evaluate our method on benchmark environments created using a framework for safe reinforcement learning called Safety-Gymnasium (Ji et al., 2023). |
| Dataset Splits | Yes | We restrict the uncertainty set to a finite subset ($U$) by discretizing the values of the parameters to $m = m_1, \dots, m_N$ and $\eta = \eta_1, \dots, \eta_N$. In our experiments, we use $N = 8$ values for each parameter by letting $m_i = \left(0.5 + \frac{i-1}{7}\right) m$ and $\eta_i = \left(0.5 + \frac{i-1}{7}\right) \eta$ for $i = 1, \dots, 8$, where $m$ and $\eta$ correspond to the dynamics in the source task. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Safety-Gymnasium' as a framework for benchmark environments but does not provide specific version numbers for it or any other software libraries or dependencies used. |
| Experiment Setup | Yes | Appendix A, Hyperparameters: The hyperparameters in our method are summarized in Table 1. All actor and critic networks are modeled by a multilayer perceptron (MLP). Table 1, the hyperparameters used in the experiments (values listed as M1 / M2 / M3): actor network size [256, 256] / [256, 256] / [256, 256]; critic network size [256, 256] / [256, 256] / [256, 256]; size of replay buffer $10^6$ / $10^6$ / $10^6$; batch size 256 / 256 / 256; steps per epoch 2000 / 2000 / 2000; number of epochs 106 / 106 / 106; actor learning rate $5 \times 10^{-6}$ / $5 \times 10^{-6}$ / $5 \times 10^{-6}$; critic learning rate $10^{-3}$ / $10^{-3}$ / $10^{-3}$; lambda learning rate $5 \times 10^{-7}$ / $5 \times 10^{-7}$ / $5 \times 10^{-7}$; safety constraint 5 / 8 / 25. |
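
The discretization quoted in the Dataset Splits row can be illustrated with a short Python sketch. It is not taken from the authors' repository. The function names `scale_factors` and `uncertainty_grid` are hypothetical, the generalization from the paper's fixed denominator 7 to `n - 1` is an assumption, and so is the choice to combine the two parameters as a Cartesian product (the quoted passage only says each parameter takes N = 8 values).

```python
import itertools

N = 8  # values per parameter, as stated in the quoted passage


def scale_factors(n=N):
    """Multipliers 0.5 + (i - 1) / (n - 1) for i = 1..n.

    For n = 8 this is the paper's 0.5 + (i - 1) / 7; other n are an assumption.
    """
    return [0.5 + (i - 1) / (n - 1) for i in range(1, n + 1)]


def uncertainty_grid(m_source, eta_source, n=N):
    """Scale the source-task values of m and eta by each multiplier.

    Pairing every m_i with every eta_j is an assumption of this sketch,
    not something the quoted text confirms.
    """
    factors = scale_factors(n)
    m_values = [f * m_source for f in factors]
    eta_values = [f * eta_source for f in factors]
    return list(itertools.product(m_values, eta_values))


if __name__ == "__main__":
    # Placeholder source-task values, chosen only for illustration.
    grid = uncertainty_grid(m_source=2.0, eta_source=1.0)
    print(len(grid), grid[0], grid[-1])
```

With N = 8 the multipliers run from 0.5 to 1.5, so each parameter ranges from half to one and a half times its source-task value.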