Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
Authors: Saket Tiwari, Omer Gottesman, George D Konidaris
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically corroborate this upper bound for four MuJoCo environments and also demonstrate the results in a toy environment with varying dimensionality. We also show the applicability of this theoretical result by introducing a local manifold learning layer to the policy and value function networks to improve the performance in control environments with very high degrees of freedom by changing one layer of the neural network to learn sparse representations. |
| Researcher Affiliation | Collaboration | Saket Tiwari Department of Computer Science Brown University Omer Gottesman Amazon Web Services George Konidaris Department of Computer Science Brown University |
| Pseudocode | No | The paper describes methods and procedures in narrative text, but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a 'PyTorch-based implementation for DDPG with modifications for the use of GELU units. The base implementation of the DDPG algorithm can be found here: https://github.com/rail-berkeley/rlkit/blob/master/examples/ddpg.py.' This link points to a base implementation the authors built on, not to their modified code implementing the paper's contributions (e.g., the local manifold learning layer or the sparse SAC modifications). There is no explicit statement that their own implementation is released. |
| Open Datasets | Yes | We empirically corroborate our main result (Theorem 1) in the MuJoCo domains provided in the OpenAI Gym (Brockman et al., 2016)... Our modified neural network works out of the box with SAC (Haarnoja et al., 2018) and we show significant improvements in high dimensional DM control environments (Tunyasuvunakool et al., 2020). |
| Dataset Splits | No | To sample data from the manifold, we record the trajectories of multiple DDPG evaluation runs across different seeds (Lillicrap et al., 2016), with two changes... We then randomly sample states from the evaluation trajectories to obtain a subsample of states, D = {s_i}_{i=1}^n. We estimate the dimensionality with 10 different subsamples of the same size to provide confidence intervals. |
| Hardware Specification | No | This research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University. |
| Software Dependencies | No | We use a PyTorch-based implementation for DDPG with modifications for the use of GELU units. The base implementation of the DDPG algorithm can be found here: https://github.com/rail-berkeley/rlkit/blob/master/examples/ddpg.py. We use the same hyperparameters for learning rates and entropy regularization for both the sparse SAC and vanilla SAC as those provided in the CleanRL library (Huang et al., 2022). |
| Experiment Setup | Yes | We use the same hyperparameters for learning rates and entropy regularization for both the sparse SAC and vanilla SAC as those provided in the CleanRL library (Huang et al., 2022). We also use wider networks of width 1024, for both the baseline and modified architecture... for a fixed embedding dimension ds we obtain neural networks sampled uniformly randomly from the family of linearised neural networks as in Definition 3, with r = 1.0, t ∈ (0, 5), n = 1024. Consequently, we obtain 1000 policies with δt = 0.01, and therefore a sample of 500000 states to estimate the intrinsic dimension of the attained set of states using the algorithm of Facco et al. (2017). We report ablation for Ant and Humanoid domains over the step size parameter in Appendix N. |
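The experiment setup above estimates the intrinsic dimension of sampled states with the Two-NN algorithm of Facco et al. (2017). As a rough illustration of that estimator (not the authors' code, which is not released), the following sketch implements the maximum-likelihood form of Two-NN: for each point, compute the ratio of its second- to first-nearest-neighbor distance, then estimate the dimension from the sum of the log ratios. The brute-force distance computation is only suitable for small subsamples.

```python
import numpy as np

def twonn_dimension(points):
    """Two-NN intrinsic dimension estimate (Facco et al., 2017), MLE form.

    For each point i, let r1 and r2 be the distances to its first and
    second nearest neighbors and mu_i = r2 / r1. Under the Two-NN model,
    mu_i follows a Pareto law with exponent equal to the intrinsic
    dimension d, giving the MLE d_hat = N / sum_i log(mu_i).
    """
    X = np.asarray(points, dtype=float)
    n = len(X)
    # Brute-force pairwise squared Euclidean distances (fine for small n).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)               # exclude self-distances
    nearest_two = np.sqrt(np.sort(sq, axis=1)[:, :2])  # (r1, r2) per point
    mu = nearest_two[:, 1] / nearest_two[:, 0]
    return n / np.log(mu).sum()

# Illustrative check: points on a 2-D plane embedded in 5-D should give
# an estimate close to 2 regardless of the ambient dimension.
rng = np.random.default_rng(0)
planar = rng.random((500, 2))                  # intrinsic dimension 2
embedding = rng.standard_normal((2, 5))        # linear embedding into 5-D
d_hat = twonn_dimension(planar @ embedding)
```

The estimator only looks at the two nearest neighbors of each point, which is why it remains usable when the ambient dimension (here, the raw state dimension of the control environment) is much larger than the manifold dimension.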