Learning Successor Features with Distributed Hebbian Temporal Memory
Authors: Evgenii Dzhivelikian, Petr Kuderov, Aleksandr Panov
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that DHTM outperforms LSTM, RWKV, and a biologically inspired HMM-like algorithm, CSCG, on non-stationary datasets. |
| Researcher Affiliation | Academia | Evgenii Dzhivelikian (MIPT, AIRI, Moscow, Russia); Petr Kuderov (AIRI, MIPT, Moscow, Russia); Aleksandr Panov (AIRI, MIPT, Moscow, Russia) |
| Pseudocode | Yes | Algorithm 1 General agent training procedure Algorithm 2 Episodic memory learning Algorithm 3 SF formation for EC |
| Open Source Code | Yes | The source code of the experiments and algorithms is at https://github.com/Cognitive-AI-Systems/him-agent/tree/iclr25. |
| Open Datasets | Yes | The proposed TM is tested as an episodic memory for an RL agent architecture navigating in a Gridworld environment and a more challenging Animal AI testbed (Crosby et al., 2020). |
| Dataset Splits | Yes | For both Gridworld and Animal AI tasks, LSTM and RWKV with a hidden state size of 100 are trained on a buffer of 1000 trajectories with learning rate 0.002 on 50 randomly sampled batches of size 50 every 10 episodes. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | Yes | The current public RWKV implementation is a fast-evolving framework (Bo, 2021), and for increased performance it is tightly bound to the offline batch training common for transformer architectures. In our case we needed a so-called sequential mode for online learning similar to LSTM. Thus, we adapted another public implementation mentioned in the official documentation (RWKV in 150 lines of code). |
| Experiment Setup | Yes | The agent uses softmax exploration with temperature equal to 0.04, except for the EC agent, for which we set the softmax temperature to 0.01, since its policy is unstable at higher temperatures. For SF formation the agent uses γ = 0.8 and γ = 0.9 for the Gridworld and Animal AI experiments respectively. The maximum SF horizon is set to 50, but rollouts are usually shorter since we employ an early-stop condition based on the feature-variable distribution: if the prediction is close to uniform, or the probability of a state with an associated positive reward meets a threshold, planning is interrupted. We use probability thresholds equal to 0.05 and 0.01 for Gridworld and Animal AI respectively. |
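The two mechanisms quoted in the Experiment Setup row, temperature-scaled softmax exploration and the early-stop test for the SF rollout, can be sketched as below. This is a minimal illustration, not the paper's implementation; the function names, the `uniform_eps` tolerance, and the exact form of the uniformity test are assumptions, while the temperatures (0.04, 0.01 for EC) and probability thresholds (0.05, 0.01) come from the quoted text.

```python
import numpy as np

def softmax_probs(values, temperature):
    """Boltzmann (softmax) distribution over action values.
    Lower temperature -> greedier policy (the paper reports 0.04,
    or 0.01 for the EC agent)."""
    logits = np.asarray(values, dtype=float) / temperature
    logits -= logits.max()                 # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def early_stop(feature_probs, reward_state_prob, prob_threshold, uniform_eps=1e-2):
    """Hypothetical early-stop check for the SF rollout: interrupt planning
    when the predicted feature distribution is near uniform (the memory is
    uninformative) or a rewarded state's probability meets the threshold
    (0.05 for Gridworld, 0.01 for Animal AI in the quoted setup)."""
    feature_probs = np.asarray(feature_probs, dtype=float)
    uniform = 1.0 / feature_probs.size
    near_uniform = np.abs(feature_probs - uniform).max() < uniform_eps
    return near_uniform or reward_state_prob >= prob_threshold
```

At temperature 0.04 the softmax is sharply peaked on the best value, which matches the paper's use of a low temperature for near-greedy behavior; sampling an action would then be `np.random.choice(len(p), p=p)`.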