Deep Reinforcement Learning with Time-Scale Invariant Memory
Authors: Md Rysul Kabir, James Mochizuki-Freeman, Zoran Tiganj
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we integrate a computational neuroscience model of scale invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTM. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics, mirroring some of the core properties of human learning. |
| Researcher Affiliation | Academia | Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not contain explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Supplemental information and code are available at https://github.com/cogneuroai/RL-with-scale-invariant-memory. |
| Open Datasets | Yes | We used an existing delayed-match-to-sample environment from NeuroGym (Molano-Mazon et al. 2022). |
| Dataset Splits | No | The paper describes training and evaluation across different temporal scales and durations (e.g., "trained on scale 1 and evaluated on scales 1, 2 and 4"), but it does not specify traditional dataset splits with exact percentages or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions algorithms and frameworks like A3C, REINFORCE, Adam optimizer, and PyBullet, but it does not provide specific version numbers for any of these software components or other libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Our setup for the 3D interval timing environment used the following three convolutional layers: 32 kernels of size 8×8 (stride 2), 16 kernels of size 4×4 (stride 1), and 32 kernels of size 8×8 (stride 2). The fully connected layer after the convolutional layers has 64 nodes. Outputs of all layers had ReLU activation. Following the encoder, the RL agent had either an LSTM network with 256 hidden units, or the CogRNN architecture with 8 log-spaced units having τ_min = 1, τ_max = 1000, and k = 8. We also introduced two-layered multi-head attention (Vaswani et al. 2017) with 8 heads and d_model = 128. Parameters related to the RL algorithm were a discount factor (γ) of 0.98 and a decay factor (λ) of 0.95. We used the Adam optimizer with β1 = 0.9, β2 = 0.999, and ε = 1e-8. We also trained the RL agents with varying learning rates, including 0.001, 0.0001, and 0.00001, and selected the best-performing learning rate for each agent. ... For simple environments, we used LSTM networks with a hidden size of 128 and the same CogRNN network without the attention network. |
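The reported setup can be sanity-checked with a short sketch: the feature-map sizes produced by the three convolutional layers, and the 8 log-spaced time constants between τ_min = 1 and τ_max = 1000. The 84×84 input resolution and RGB observations are assumptions for illustration; the paper does not state them.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a 2D convolution (valid padding by default)."""
    return (size - kernel + 2 * padding) // stride + 1

# Conv stack reported in the paper: 32@8x8/2 -> 16@4x4/1 -> 32@8x8/2
layers = [(32, 8, 2), (16, 4, 1), (32, 8, 2)]

size = 84      # assumed square input resolution (not stated in the paper)
channels = 3   # assumed RGB observations (not stated in the paper)
for out_channels, kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    channels = out_channels

# Flattened feature count fed to the 64-unit fully connected layer.
flat_features = channels * size * size

# 8 log-spaced (geometric) time constants between tau_min=1 and tau_max=1000,
# one plausible reading of "8 log-spaced units" for the CogRNN memory.
n_units, tau_min, tau_max = 8, 1.0, 1000.0
taus = [tau_min * (tau_max / tau_min) ** (i / (n_units - 1))
        for i in range(n_units)]

print(size, flat_features)
print([round(t, 2) for t in taus])
```

Under these assumptions the encoder output is a 15×15×32 map (7200 features) before the 64-node layer, and the time constants run geometrically from 1 to 1000, giving uniform coverage per decade of temporal scale.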