Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Authors: Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas Henzinger
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment. |
| Researcher Affiliation | Academia | Đorđe Žikelić, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria; Mathias Lechner, Massachusetts Institute of Technology, Cambridge, MA, USA; Abhinav Verma, The Pennsylvania State University, University Park, PA, USA; Krishnendu Chatterjee, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria; Thomas A. Henzinger, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria |
| Pseudocode | Yes | The algorithm pseudocode is presented in Algorithm 1. |
| Open Source Code | Yes | Our code is available at https://github.com/mlech26l/neural_martingales |
| Open Datasets | No | The paper evaluates on a 'Stochastic Nine Rooms environment', obtained by injecting stochastic disturbances into the environment of [33], but it provides no link or access information that would make this customized environment publicly available. |
| Dataset Splits | No | The paper describes an RL environment but specifies no explicit train/validation/test splits for data or evaluation. |
| Hardware Specification | No | The paper gives no details about the hardware used, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'proximal policy optimization (PPO) [50]' but provides no version numbers for any software dependencies, such as Python, the machine learning framework (e.g., PyTorch, TensorFlow), or the PPO implementation. |
| Experiment Setup | No | The paper states that PPO was used to initialize policy parameters but does not report specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other training configuration details. |