Online Laplacian-Based Representation Learning in Reinforcement Learning

Authors: Maheed H. Ahmed, Jayanth Bhargav, Mahsa Ghasemi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive simulation studies empirically validate the convergence guarantees to the true Laplacian representation. Furthermore, we provide insights into the compatibility of different reinforcement learning algorithms with online representation learning.
Researcher Affiliation | Academia | Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. Correspondence to: Maheed H. Ahmed <EMAIL>.
Pseudocode | Yes | Algorithm 1: Online PGD of AGDO
Open Source Code | Yes | We provide an open-source implementation at https://github.com/MaheedHatem/online_laplacian_representation.
Open Datasets | No | The paper describes experiments conducted in custom grid-world environments where targets and agent locations are sampled, rather than utilizing a pre-existing, publicly available dataset.
Dataset Splits | No | The paper describes an online reinforcement learning setup in which an agent interacts with an environment and data is collected into a replay buffer. It does not define traditional training, validation, and test splits for a static dataset.
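The online setup described above (transitions collected into a replay buffer rather than fixed dataset splits) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the class name and transition layout are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state)
    transitions, as used in online RL instead of static dataset splits."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        """Store one transition collected from environment interaction."""
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Draw a uniform random minibatch for a training update."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because training minibatches are resampled from this evolving buffer at every update, there is no fixed train/validation/test partition to report.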
Hardware Specification | No | The paper does not specify any particular GPU or CPU models, memory, or cloud computing instances used for the experiments. It only details software components and experimental parameters.
Software Dependencies | No | The paper mentions software components and algorithms such as the Adam optimizer, proximal policy optimization (PPO), and deep Q-network (DQN), but does not provide specific version numbers for these or any associated libraries.
Experiment Setup | Yes | We set d = 11 and use the (x, y) coordinates as input to the encoder network, a fully connected neural network with 3 layers of size 256 each. ... We use a fixed value of 5 for the barrier coefficient. The encoder network is trained using an Adam optimizer with a learning rate of 10^-3. ... For training the agent, we use proximal policy optimization (PPO) (Schulman et al., 2017) as the training algorithm with an initial clipping parameter of 0.2 unless otherwise specified. We add an entropy regularization term to discourage deterministic policies. ... Appendix B provides Table 1, which lists comprehensive hyperparameters for AGDO, PPO, and DQN.
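The encoder architecture quoted above (inputs are (x, y) coordinates, three fully connected hidden layers of width 256, output dimension d = 11) can be sketched as below. This is a plain-NumPy sketch under those stated dimensions only; the activation choice (ReLU) and He initialization are assumptions, and the paper's actual training uses Adam with learning rate 10^-3, which is omitted here.

```python
import numpy as np


def init_encoder(rng, in_dim=2, hidden=256, d=11):
    """Initialize a 3-hidden-layer fully connected encoder mapping
    (x, y) state coordinates to a d-dimensional representation."""
    sizes = [in_dim, hidden, hidden, hidden, d]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)  # He init (assumed)
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def encode(params, xy):
    """Forward pass: ReLU hidden layers, linear output head."""
    h = xy
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b


rng = np.random.default_rng(0)
params = init_encoder(rng)
z = encode(params, np.array([[0.5, 0.5]]))  # one (x, y) input state
# z has shape (1, 11): one 11-dimensional embedding per state
```

Each row of the output is the learned d = 11 dimensional Laplacian representation of one state, which downstream PPO or DQN agents can consume as features.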