Learning Successor Features with Distributed Hebbian Temporal Memory
Authors: Evgenii Dzhivelikian, Petr Kuderov, Aleksandr Panov
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that DHTM outperforms LSTM, RWKV, and a biologically inspired HMM-like algorithm, CSCG, on non-stationary datasets. |
| Researcher Affiliation | Academia | Evgenii Dzhivelikian (MIPT, AIRI, Moscow, Russia); Petr Kuderov (AIRI, MIPT, Moscow, Russia); Aleksandr Panov (AIRI, MIPT, Moscow, Russia) |
| Pseudocode | Yes | Algorithm 1 General agent training procedure Algorithm 2 Episodic memory learning Algorithm 3 SF formation for EC |
| Open Source Code | Yes | The source code of the experiments and algorithms is at https://github.com/Cognitive-AI-Systems/him-agent/tree/iclr25. |
| Open Datasets | Yes | The proposed TM is tested as an episodic memory for an RL agent architecture navigating in a Gridworld environment and a more challenging Animal AI testbed (Crosby et al., 2020). |
| Dataset Splits | Yes | For both Gridworld and Animal AI tasks, LSTM and RWKV with a hidden state size of 100 are trained on a buffer of 1000 trajectories with learning rate 0.002 on 50 randomly sampled batches of size 50 every 10 episodes. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | Yes | The current public RWKV implementation is a fast-evolving framework (Bo, 2021), and for increased performance it is tightly bound to the offline batch training common for transformer architectures. In our case we needed a so-called sequential mode for online learning similar to LSTM. Thus, we adapted another public implementation mentioned in the official documentation (RWKV in 150 lines of code). |
| Experiment Setup | Yes | The agent uses softmax exploration with temperature equal to 0.04, except for the EC agent, for which we set the softmax temperature to 0.01, since its policy is unstable at higher temperatures. For SF formation the agent uses γ = 0.8 and γ = 0.9 for the Gridworld and Animal AI experiments respectively. The maximum SF horizon is set to 50, but rollouts are usually shorter since we employ an early-stop condition based on the feature-variable distribution: if the prediction is close to uniform, or the probability of a state with an associated positive reward meets a threshold, planning is interrupted. We use probability thresholds equal to 0.05 and 0.01 for Gridworld and Animal AI respectively. |
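The two mechanisms quoted in the Experiment Setup row, temperature-scaled softmax exploration and the early-stop test for the SF rollout, can be sketched as below. This is a minimal illustration, not the paper's implementation; the function names, the `uniform_eps` tolerance, and the exact form of the uniformity test are assumptions, while the temperatures (0.04, 0.01 for EC) and probability thresholds (0.05, 0.01) come from the quoted text.

```python
import numpy as np

def softmax_probs(values, temperature):
    """Boltzmann (softmax) distribution over action values.
    Lower temperature -> greedier policy (the paper reports 0.04,
    or 0.01 for the EC agent)."""
    logits = np.asarray(values, dtype=float) / temperature
    logits -= logits.max()                 # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def early_stop(feature_probs, reward_state_prob, prob_threshold, uniform_eps=1e-2):
    """Hypothetical early-stop check for the SF rollout: interrupt planning
    when the predicted feature distribution is near uniform (the memory is
    uninformative) or a rewarded state's probability meets the threshold
    (0.05 for Gridworld, 0.01 for Animal AI in the quoted setup)."""
    feature_probs = np.asarray(feature_probs, dtype=float)
    uniform = 1.0 / feature_probs.size
    near_uniform = np.abs(feature_probs - uniform).max() < uniform_eps
    return near_uniform or reward_state_prob >= prob_threshold
```

At temperature 0.04 the softmax is sharply peaked on the best value, which matches the paper's use of a low temperature for near-greedy behavior; sampling an action would then be `np.random.choice(len(p), p=p)`.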