Deep Reinforcement Learning with Time-Scale Invariant Memory
Authors: Md Rysul Kabir, James Mochizuki-Freeman, Zoran Tiganj
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we integrate a computational neuroscience model of scale invariant memory into deep reinforcement learning (RL) agents. We first provide a theoretical analysis and then demonstrate through experiments that such agents can learn robustly across a wide range of temporal scales, unlike agents built with commonly used recurrent memory architectures such as LSTM. This result illustrates that incorporating computational principles from neuroscience and cognitive science into deep neural networks can enhance adaptability to complex temporal dynamics, mirroring some of the core properties of human learning. |
| Researcher Affiliation | Academia | Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not contain explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Supplemental information and code are available at https://github.com/cogneuroai/RL-with-scale-invariant-memory. |
| Open Datasets | Yes | We used an existing delayed-match-to-sample environment from NeuroGym (Molano-Mazon et al. 2022). |
| Dataset Splits | No | The paper describes training and evaluation across different temporal scales and durations (e.g., "trained on scale 1 and evaluated on scales 1, 2 and 4"), but it does not specify traditional dataset splits with exact percentages or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions algorithms and frameworks like A3C, REINFORCE, Adam optimizer, and PyBullet, but it does not provide specific version numbers for any of these software components or other libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Our setup for the 3D interval timing environment used the following three convolutional layers: 32 kernels of size 8×8 (stride 2), 16 kernels of size 4×4 (stride 1), and 32 kernels of size 8×8 (stride 2). The fully connected layer after the convolutional layers has 64 nodes. Outputs of all layers had ReLU activation. Following the encoder, the RL agent had either an LSTM network with 256 hidden units, or the CogRNN architecture with 8 log-spaced units having τ_min = 1, τ_max = 1000, and k = 8. We also introduced two-layered multi-head attention (Vaswani et al. 2017) with 8 heads and d_model = 128. Parameters related to the RL algorithm were a discount factor (γ) of 0.98 and a decay factor (λ) of 0.95. We used the Adam optimizer with β1 = 0.9, β2 = 0.999, and ε = 1e-8. We also trained the RL agents with varying learning rates, including 0.001, 0.0001, and 0.00001, and selected the best-performing learning rate for each agent. ... For simple environments, we used LSTM networks with a hidden size of 128 and the same CogRNN network without the attention network. |
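The reported setup can be sanity-checked with a short sketch: the feature-map sizes produced by the three convolutional layers, and the 8 log-spaced time constants between τ_min = 1 and τ_max = 1000. The 84×84 input resolution and RGB observations are assumptions for illustration; the paper does not state them.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a 2D convolution (valid padding by default)."""
    return (size - kernel + 2 * padding) // stride + 1

# Conv stack reported in the paper: 32@8x8/2 -> 16@4x4/1 -> 32@8x8/2
layers = [(32, 8, 2), (16, 4, 1), (32, 8, 2)]

size = 84      # assumed square input resolution (not stated in the paper)
channels = 3   # assumed RGB observations (not stated in the paper)
for out_channels, kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    channels = out_channels

# Flattened feature count fed to the 64-unit fully connected layer.
flat_features = channels * size * size

# 8 log-spaced (geometric) time constants between tau_min=1 and tau_max=1000,
# one plausible reading of "8 log-spaced units" for the CogRNN memory.
n_units, tau_min, tau_max = 8, 1.0, 1000.0
taus = [tau_min * (tau_max / tau_min) ** (i / (n_units - 1))
        for i in range(n_units)]

print(size, flat_features)
print([round(t, 2) for t in taus])
```

Under these assumptions the encoder output is a 15×15×32 map (7200 features) before the 64-node layer, and the time constants run geometrically from 1 to 1000, giving uniform coverage per decade of temporal scale.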