Online Laplacian-Based Representation Learning in Reinforcement Learning

Authors: Maheed H. Ahmed, Jayanth Bhargav, Mahsa Ghasemi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive simulation studies empirically validate the convergence guarantees to the true Laplacian representation. Furthermore, we provide insights into the compatibility of different reinforcement learning algorithms with online representation learning.
Researcher Affiliation | Academia | Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. Correspondence to: Maheed H. Ahmed <EMAIL>.
Pseudocode | Yes | Algorithm 1: Online PGD of AGDO
Open Source Code | Yes | We provide an open-source implementation at https://github.com/MaheedHatem/online_laplacian_representation.
Open Datasets | No | The paper describes experiments conducted in custom grid-world environments where targets and agent locations are sampled, rather than utilizing a pre-existing, publicly available dataset.
Dataset Splits | No | The paper describes an online reinforcement learning setup in which an agent interacts with an environment and data is collected into a replay buffer. It does not define traditional training, validation, and test splits for a static dataset.
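The online setup described above (transitions collected into a replay buffer rather than fixed dataset splits) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the class name and transition layout are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state)
    transitions, as used in online RL instead of static dataset splits."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        """Store one transition collected from environment interaction."""
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Draw a uniform random minibatch for a training update."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because training minibatches are resampled from this evolving buffer at every update, there is no fixed train/validation/test partition to report.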
Hardware Specification | No | The paper does not specify any particular GPU or CPU models, memory, or cloud computing instances used for the experiments. It only details software components and experimental parameters.
Software Dependencies | No | The paper mentions software components and algorithms such as the Adam optimizer, proximal policy optimization (PPO), and deep Q-network (DQN), but does not provide specific version numbers for these or any associated libraries.
Experiment Setup | Yes | We set d = 11 and use the (x, y) coordinates as input to the encoder network, a fully connected neural network with 3 layers of size 256 each. ... We use a fixed value of 5 for the barrier coefficient. The encoder network is trained using an Adam optimizer with a learning rate of 10^-3. ... For training the agent, we use proximal policy optimization (PPO) (Schulman et al., 2017) as the training algorithm with an initial clipping parameter of 0.2 unless otherwise specified. We add an entropy regularization term to discourage deterministic policies. ... Appendix B provides Table 1, which lists comprehensive hyperparameters for AGDO, PPO, and DQN.
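The encoder architecture quoted above (inputs are (x, y) coordinates, three fully connected hidden layers of width 256, output dimension d = 11) can be sketched as below. This is a plain-NumPy sketch under those stated dimensions only; the activation choice (ReLU) and He initialization are assumptions, and the paper's actual training uses Adam with learning rate 10^-3, which is omitted here.

```python
import numpy as np


def init_encoder(rng, in_dim=2, hidden=256, d=11):
    """Initialize a 3-hidden-layer fully connected encoder mapping
    (x, y) state coordinates to a d-dimensional representation."""
    sizes = [in_dim, hidden, hidden, hidden, d]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)  # He init (assumed)
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def encode(params, xy):
    """Forward pass: ReLU hidden layers, linear output head."""
    h = xy
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b


rng = np.random.default_rng(0)
params = init_encoder(rng)
z = encode(params, np.array([[0.5, 0.5]]))  # one (x, y) input state
# z has shape (1, 11): one 11-dimensional embedding per state
```

Each row of the output is the learned d = 11 dimensional Laplacian representation of one state, which downstream PPO or DQN agents can consume as features.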