Real-Time Recurrent Reinforcement Learning
Authors: Julian Lemmel, Radu Grosu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm we call real-time recurrent reinforcement learning (RTRRL) serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia. ... We evaluate the feasibility of our RTRRL approach by testing on RL benchmarks provided by the gymnax (Lange 2022), popgym (Morad et al. 2022) and brax (Freeman et al. 2021) packages. ... Figure 3: Bar-charts of combined normalized validation rewards achieved for 5 runs each on a range of different tasks. ... Ablation Experiments. |
| Researcher Affiliation | Collaboration | Julian Lemmel1,2, Radu Grosu1 1 Vienna University of Technology 2 Daten Vorsprung GmbH EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: RTRRL. Require: linear policy π_A(a\|h); linear value function v̂_C(h); recurrent layer RNN_R([o, a, r], h, Ĵ). 1: A, C, R ← initialize parameters; 2: B_A, B_C ← initialize feedback matrices; 3: h, e_A, e_C, e_R ← 0; 4: o ← reset environment; 5: h, Ĵ ← RNN_R([o, 0, 0], h, 0); 6: v ← v̂_C(h); 7: while not done do; 8: π ← π_A(h); 9: a ← sample(π); 10: o, r ← take action a; 11: h′, Ĵ′ ← RNN_R([o, a, r], h, Ĵ); 12: e_C ← γλ_C e_C + ∇_C v̂; 13: e_A ← γλ_A e_A + ∇_A log π[a]; 14: g_C ← B_C 1; 15: g_A ← B_A ∇_π log π[a]; 16: e_R ← γλ_R e_R + Ĵ(g_C + g_A); 17: v′ ← v̂_C(h′); 18: δ ← r + γv′ − v; 19: C ← C + η_C δ e_C; 20: A ← A + η_A δ e_A; 21: R ← R + η_R δ e_R; 22: v ← v′, h ← h′, Ĵ ← Ĵ′; 23: end while |
| Open Source Code | Yes | Code https://github.com/FranzKnut/RTRRL-AAAI25 |
| Open Datasets | Yes | We evaluate the feasibility of our RTRRL approach by testing on RL benchmarks provided by the gymnax (Lange 2022), popgym (Morad et al. 2022) and brax (Freeman et al. 2021) packages. |
| Dataset Splits | No | The paper evaluates on RL benchmarks/environments (gymnax, popgym, brax) which do not typically involve predefined train/test/validation dataset splits in the traditional sense. It mentions conducting experiments with '5 runs each' and '10 runs', but does not specify how a static dataset would be partitioned for training, validation, or testing for reproducibility. The context is continuous interaction with environments, not static dataset splitting. |
| Hardware Specification | No | Computational results have been achieved in part using the Vienna Scientific Cluster (VSC). No specific hardware details (e.g., GPU/CPU models, memory) of the cluster are provided. |
| Software Dependencies | No | Our implementation of PPO is based on purejaxrl (Lu et al. 2022). ... gymnax (Lange 2022), popgym (Morad et al. 2022) and brax (Freeman et al. 2021) packages. ... the adam (Kingma and Ba 2015) optimizer. The paper mentions software packages and an optimizer, along with citations, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For each environment, we trained a network with 32 neurons for either a maximum of 50 million steps or until 20 subsequent epochs showed no improvement. The same set of hyperparameters, given in the Appendix, was used for all the RTRRL experiments if not stated otherwise. Importantly, a batch size of 1 was used to ensure biological plausibility. All λs and γ were kept at 0.99, H was set to 10⁻⁵, and the Adam (Kingma and Ba 2015) optimizer with a learning rate of 10⁻³ was used. |
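The pseudocode in the table above can be sketched as a single online loop: a linear softmax actor and linear critic read the hidden state of a small tanh RNN whose parameter Jacobian Ĵ = ∂h/∂R is carried forward RTRL-style, while fixed random feedback matrices (B_C, B_A) replace exact backpropagated gradients for the recurrent eligibility trace. The following is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the toy bandit-like environment, the dimensions, the learning rates, and the exact form of the feedback term are all illustrative (the paper uses 32 neurons, γ = λ = 0.99, and a learning rate of 10⁻³).

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_act, n_hid = 4, 2, 8
n_in = n_hid + n_obs + n_act + 1                 # RNN input: [h, o, a_onehot, r]

A = 0.1 * rng.standard_normal((n_act, n_hid))    # linear actor
C = 0.1 * rng.standard_normal(n_hid)             # linear critic
R = 0.1 * rng.standard_normal((n_hid, n_in))     # recurrent weights
B_C = 0.1 * rng.standard_normal(n_hid)           # fixed feedback, critic path
B_A = 0.1 * rng.standard_normal((n_hid, n_act))  # fixed feedback, actor path

gamma, lam, lr = 0.99, 0.99, 1e-3

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def rnn_step(x, h, J):
    """Vanilla tanh RNN step plus exact RTRL update of J = dh/dR."""
    z = np.concatenate([h, x])
    h_new = np.tanh(R @ z)
    d = 1.0 - h_new ** 2                                  # tanh'
    rec = np.einsum('im,mjk->ijk', R[:, :n_hid], J)       # carried sensitivity
    direct = np.zeros_like(J)
    direct[np.arange(n_hid), np.arange(n_hid), :] = z     # immediate dependence
    return h_new, d[:, None, None] * (direct + rec)

def env_step(a):
    """Toy environment (illustrative): random observations, action 0 rewarded."""
    return rng.standard_normal(n_obs), float(a == 0)

h = np.zeros(n_hid)
J = np.zeros((n_hid, n_hid, n_in))
e_C, e_A, e_R = np.zeros_like(C), np.zeros_like(A), np.zeros_like(R)
o = rng.standard_normal(n_obs)

h, J = rnn_step(np.concatenate([o, np.zeros(n_act), [0.0]]), h, J)
v = C @ h
for _ in range(2000):
    pi = softmax(A @ h)
    a = rng.choice(n_act, p=pi)
    o, r = env_step(a)
    onehot = np.eye(n_act)[a]
    h_new, J_new = rnn_step(np.concatenate([o, onehot, [r]]), h, J)
    e_C = gamma * lam * e_C + h                           # grad_C v = h
    e_A = gamma * lam * e_A + np.outer(onehot - pi, h)    # grad_A log pi[a]
    g = B_C + B_A @ (onehot - pi)                         # feedback, not true grads
    e_R = gamma * lam * e_R + np.einsum('ijk,i->jk', J, g)
    v_new = C @ h_new
    delta = r + gamma * v_new - v                         # TD error
    C += lr * delta * e_C
    A += lr * delta * e_A
    R += lr * delta * e_R
    v, h, J = v_new, h_new, J_new

print(softmax(A @ h))                                     # final action probabilities
```

Note the biologically motivated constraints mirrored here: batch size 1 (one environment step per update), no backpropagation through time (the Jacobian is propagated forward), and random feedback in place of exact actor/critic gradients when building the recurrent eligibility trace.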