Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning
Authors: Yassine Chemingui, Aryan Deshwal, Honghao Wei, Alan Fern, Jana Doppa
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. |
| Researcher Affiliation | Academia | Yassine Chemingui (Washington State University), Aryan Deshwal (University of Minnesota), Honghao Wei (Washington State University), Alan Fern (Oregon State University), Jana Doppa (Washington State University) |
| Pseudocode | No | The paper describes the algorithms and methods in prose and mathematical equations in Sections 4.1 and 4.2, but it does not include a distinct pseudocode block or algorithm figure. |
| Open Source Code | Yes | The code/appendices are available at https://github.com/yassineCh/CAPS. |
| Open Datasets | Yes | We employ 38 sequential decision-making benchmarks of varying difficulty from Safety-Gymnasium (Ray, Achiam, and Amodei 2019; Ji et al. 2024), Bullet Safety-Gym (Gronauer 2022), and MetaDrive (Li et al. 2022) within the DSRL framework (Liu et al. 2024). Further details are provided in Appendix C.1. |
| Dataset Splits | No | The paper mentions using a "fixed pre-collected dataset D" and evaluating algorithms with "three random seeds, and twenty episodes," but it does not specify explicit training, validation, or test splits for this dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using existing offline RL methods like IQL and SAC+BC but does not specify software libraries, frameworks, or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We provide the details of the neural network structure used for value and Q-functions, policy heads, and hyper-parameters in the Appendix C.3. |
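The Dataset Splits row quotes an evaluation protocol of "three random seeds, and twenty episodes." A minimal sketch of that protocol is shown below; `rollout` is a hypothetical stand-in for running the trained policy in the environment (the paper's actual environments and policies are not reproduced here), and the aggregation over 3 seeds × 20 episodes mirrors the quoted setup.

```python
import random
from statistics import mean

# Sketch of the quoted evaluation protocol: average episodic return and
# cost over three random seeds and twenty episodes per seed.
SEEDS = [0, 1, 2]
EPISODES_PER_SEED = 20

def rollout(rng):
    # Placeholder: a real rollout would step the environment with the
    # trained policy and accumulate per-step rewards and costs.
    episodic_return = rng.uniform(0.0, 1.0)
    episodic_cost = rng.uniform(0.0, 10.0)
    return episodic_return, episodic_cost

def evaluate():
    returns, costs = [], []
    for seed in SEEDS:
        rng = random.Random(seed)  # one RNG per seed for reproducibility
        for _ in range(EPISODES_PER_SEED):
            ret, cost = rollout(rng)
            returns.append(ret)
            costs.append(cost)
    # Report the mean over all 3 x 20 = 60 evaluation episodes.
    return mean(returns), mean(costs)

avg_return, avg_cost = evaluate()
```

This kind of seed-and-episode averaging is the standard way offline RL benchmarks such as DSRL report results when no held-out validation split exists.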