UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning
Authors: Shicheng Liu, Minghui Zhu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use MuJoCo experiments to show that our method outperforms state-of-the-art baselines. This section provides experiment results for the proposed framework. |
| Researcher Affiliation | Academia | Shicheng Liu & Minghui Zhu Department of Electrical Engineering Pennsylvania State University University Park, PA 16802, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 Utilizing explainable reinforcement learning to improve reinforcement learning |
| Open Source Code | No | The paper does not contain any explicit statements about code availability, such as a link to a repository or a declaration that the code will be released. |
| Open Datasets | Yes | We test the algorithms on delayed MuJoCo environments (Zheng et al., 2018; Memarian et al., 2021; Oh et al., 2018)... We also conduct experiments on the original MuJoCo environments, which are widely used in RL literature (Xu & Zhu, 2023b; 2024) |
| Dataset Splits | No | The paper mentions "each episode has the length of 100 in our experiments" for the environments, but it does not specify any training/test/validation splits for a collected dataset. Reinforcement learning typically involves continuous interaction with an environment rather than predefined static dataset splits. |
| Hardware Specification | Yes | The code was running on a laptop whose CPU is Intel Core i9-12900K and GPU is NVIDIA RTX 3080. |
| Software Dependencies | No | The paper mentions the operating system as "Windows 10" but does not specify any software libraries, frameworks (like PyTorch, TensorFlow), or other dependencies with version numbers that would be necessary for reproduction. |
| Experiment Setup | Yes | The neural network has two hidden layers where each hidden layer has 64 neurons. The activation functions are respectively ReLU and Tanh. Following (Finn et al., 2017), each episode has the length of 100 in our experiments. We use soft actor-critic (SAC) (Haarnoja et al., 2018) as the baseline RL algorithm. The mean and standard deviation are computed using five random seeds. |
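The Experiment Setup row specifies only the network shape: two hidden layers of 64 neurons, with ReLU after the first and Tanh after the second. A minimal NumPy sketch of such a forward pass is shown below; the input/output dimensions (`obs_dim`, `act_dim`) and the weight initialization are illustrative assumptions, not taken from the paper, and the paper's actual implementation (SAC policy/critic heads, framework, optimizer) is not specified.

```python
import numpy as np

def init_layer(fan_in, fan_out, rng):
    # Small random weights and zero biases (illustrative initialization only).
    return rng.standard_normal((fan_in, fan_out)) * 0.1, np.zeros(fan_out)

def forward(obs, params):
    # Two hidden layers of 64 units, as stated in the Experiment Setup row.
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = np.maximum(obs @ W1 + b1, 0.0)  # first hidden layer: ReLU
    h2 = np.tanh(h1 @ W2 + b2)           # second hidden layer: Tanh
    return h2 @ W3 + b3                  # linear output head (assumed)

rng = np.random.default_rng(0)
obs_dim, act_dim = 17, 6  # hypothetical MuJoCo-like dimensions (assumed)
params = [init_layer(obs_dim, 64, rng),
          init_layer(64, 64, rng),
          init_layer(64, act_dim, rng)]
action = forward(rng.standard_normal(obs_dim), params)
print(action.shape)  # (6,)
```

The sketch only demonstrates the stated layer widths and activation order; training (SAC updates, five random seeds, episode length 100) is outside its scope.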