Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Authors: Saket Tiwari, Omer Gottesman, George D Konidaris

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically corroborate this upper bound for four MuJoCo environments and also demonstrate the results in a toy environment with varying dimensionality. We also show the applicability of this theoretical result by introducing a local manifold learning layer to the policy and value function networks to improve the performance in control environments with very high degrees of freedom by changing one layer of the neural network to learn sparse representations.
Researcher Affiliation | Collaboration | Saket Tiwari (Department of Computer Science, Brown University); Omer Gottesman (Amazon Web Services); George Konidaris (Department of Computer Science, Brown University)
Pseudocode | No | The paper describes methods and procedures in narrative text, but does not present them in structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using a 'PyTorch-based implementation for DDPG with modifications for the use of GeLU units. The base implementation of the DDPG algorithm can be found here: https://github.com/rail-berkeley/rlkit/blob/master/examples/ddpg.py.' This link points to the base implementation used, not explicitly to the authors' own modified code for their contributions (e.g., the local manifold learning layer or sparse SAC modifications). There is no explicit statement that the authors release their own implementation.
Open Datasets | Yes | We empirically corroborate our main result (Theorem 1) in the MuJoCo domains provided in the OpenAI Gym (Brockman et al., 2016)... Our modified neural network works out of the box with SAC (Haarnoja et al., 2018) and we show significant improvements in high dimensional DM Control environments (Tunyasuvunakool et al., 2020).
Dataset Splits | No | To sample data from the manifold, we record the trajectories of multiple DDPG evaluation runs across different seeds (Lillicrap et al., 2016), with two changes... We then randomly sample states from the evaluation trajectories to obtain a subsample of states, D = {s_i}_{i=1}^n. We estimate the dimensionality with 10 different subsamples of the same size to provide confidence intervals.
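The repeated-subsampling procedure quoted above (10 random subsamples of recorded states, each fed to a dimension estimator, with the spread reported as a confidence band) can be sketched as follows. This is a minimal illustration, not the authors' code; `estimate_with_ci` and its parameters are hypothetical names, and the estimator is passed in as a callable so any intrinsic-dimension method can be plugged in.

```python
import numpy as np

def estimate_with_ci(states, estimator, subsample_size, n_repeats=10, seed=0):
    """Run `estimator` on `n_repeats` random subsamples of the recorded
    states and return (mean, std) of the estimates as a confidence band.

    states: (n, d) array of states collected from evaluation trajectories.
    estimator: any function mapping an (m, d) array to a scalar estimate.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_repeats):
        # sample a subsample D = {s_i} without replacement, as in the paper
        idx = rng.choice(len(states), size=subsample_size, replace=False)
        estimates.append(estimator(states[idx]))
    estimates = np.asarray(estimates, dtype=float)
    return estimates.mean(), estimates.std()
```

With a deterministic estimator the spread collapses to zero; with a real intrinsic-dimension estimator the std across subsamples gives the confidence interval the paper reports.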
Hardware Specification | No | This research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.
Software Dependencies | No | We use a PyTorch-based implementation for DDPG with modifications for the use of GeLU units. The base implementation of the DDPG algorithm can be found here: https://github.com/rail-berkeley/rlkit/blob/master/examples/ddpg.py. We use the same hyperparameters for learning rates and entropy regularization for both the sparse SAC and vanilla SAC as those provided in the CleanRL library (Huang et al., 2022).
Experiment Setup | Yes | We use the same hyperparameters for learning rates and entropy regularization for both the sparse SAC and vanilla SAC as those provided in the CleanRL library (Huang et al., 2022). We also use wider networks of width 1024, for both the baseline and modified architecture... For a fixed embedding dimension d_s we obtain neural networks sampled uniformly at random from the family of linearised neural networks as in Definition 3, with r = 1.0, t ∈ (0, 5), n = 1024. Consequently, we obtain 1000 policies with δt = 0.01, and therefore a sample of 500000 states to estimate the intrinsic dimension of the attained set of states using the algorithm of Facco et al. (2017). We report an ablation for the Ant and Humanoid domains over the step size parameter in Appendix N.
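The intrinsic-dimension estimator cited above, the TwoNN method of Facco et al. (2017), can be sketched as follows. This is a hedged reconstruction of the published method (maximum-likelihood form), not the paper's own code; `twonn_dimension` is a name chosen here for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def twonn_dimension(X):
    """TwoNN intrinsic-dimension estimate (Facco et al., 2017).

    For each point, take the distances r1, r2 to its first and second
    nearest neighbors; the ratio mu = r2 / r1 follows a Pareto law with
    shape equal to the intrinsic dimension d, whose MLE is N / sum(log mu).
    """
    tree = cKDTree(X)
    # k=3: the nearest "neighbor" at distance 0 is the point itself
    dists, _ = tree.query(X, k=3)
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2 / r1
    mu = mu[mu > 1.0]  # drop degenerate ties/duplicates
    return len(mu) / np.sum(np.log(mu))
```

For example, points drawn from a 2-D Gaussian embedded in a 5-D ambient space should yield an estimate close to 2, which mirrors how the paper checks the dimension of the attained state set against its theoretical bound.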