GLAM: Global-Local Variation Awareness in Mamba-based World Model
Authors: Qian He, Wenqi Liang, Chunhui Hao, Gan Sun, Jiandong Tian
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method outperforms existing methods in normalized human scores on the Atari 100k benchmark. We evaluate GLAM on a subset of 26 games from Atari 100k (Bellemare et al. 2013), a benchmark that is widely used for testing reinforcement learning algorithms. ... Ablation Studies Inference module in GLAM We conduct ablation studies on the design of the inference modules on Pong, Boxing, Kung Fu Master and Battle Zone. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences. 2Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences. 3University of Chinese Academy of Sciences. 4Shenyang University of Chemical Technology. 5School of Automation Science and Engineering, South China University of Technology. |
| Pseudocode | Yes | Algorithm 1: Parallel inference in GLAM. Initialize: input sequence length l = t; length of the concatenated short sequences s = 4. Input: observation sequence o_{0:t}, action sequence a_{0:t}. Encode feature sequence: Z_{0:t} = q_φ(o_{0:t}); e_{0:t} = f_φ(z_{0:t}, a_{0:t}). Global variation inference: d_{0:t-1} = e_{1:t} − e_{0:t-1}; u^g_{1:t} = LMamba(d_{0:t-1}); u^g_{1:t} = LayerNorm(u^g_{1:t}); u^g_{1:t} = SiLU(u^g_{1:t}). Local variation inference: concatenate e_{0:t} into short sequence blocks E_s = {{e_{i+1−s}, …, e_i}}_{i=3}^{t}; u^l_{3:t} = LMamba(E_s). Predict unknown information and next state: Ẑ_{4:t+1} = g^D_φ(u^g_{3:t}, u^l_{3:t}); r̂_{3:t} = g^R_φ(u^g_{3:t}, u^l_{3:t}); ĉ_{3:t} = g^C_φ(u^g_{3:t}, u^l_{3:t}). Return: the unknown-information sequences r̂_{3:t}, ĉ_{3:t} and the distribution Ẑ_{4:t+1} of the state sequence. |
| Open Source Code | Yes | Code https://github.com/GLAM2025/glam |
| Open Datasets | Yes | We evaluate GLAM on a subset of 26 games from Atari 100k (Bellemare et al. 2013), a benchmark that is widely used for testing reinforcement learning algorithms. |
| Dataset Splits | No | GLAM uses 100k samples in each game. Considering a skip step of 4 frames, the samples correspond to 400k actual game frames, which is about 1.85 hours (Zhang et al. 2024) of real-time game time. The final results of the agent are quantified using the human-normalized score: S_Norm = (S_Agent − S_Random)/(S_Human − S_Random). To evaluate the agent, we perform 20 evaluations of the final checkpoints and compute the average scores as the results. This text describes the total samples and evaluation methodology, but not how the dataset was split into training, validation, or test sets for the world model itself. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided in the paper. |
| Experiment Setup | Yes | In this work, we use fixed lengths l = 16 and s = 4. The specific formula for calculating the variable number of interactions is as follows: n_t = min[n_0 + Int(t/n_f) · n_i, n_max], (11) where n_t represents the variable number of interactions, t denotes the current training step, n_f represents the change frequency of n_t, n_i corresponds to the increment of the number at each update, and n_max represents the maximum number of interactions during imagined training. For other parts of the agent's training, we refer to the method in STORM. ... We design three different interaction parameters, n_max = 16, 24, 32, labeled N16, N24, N32 respectively. The variable parameter n_t in training is calculated by Eq. (11), and the parameters n_0 = 16 and n_i = 8 are fixed. |
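The two formulas quoted in the table — the human-normalized score and the interaction schedule of Eq. (11) — can be sketched in Python. This is an illustrative sketch, not the authors' released implementation: the function and parameter names simply mirror the paper's symbols, and since the paper excerpt does not state a value for n_f, the value used in the usage example is hypothetical.

```python
def human_normalized_score(s_agent: float, s_random: float, s_human: float) -> float:
    """S_Norm = (S_Agent - S_Random) / (S_Human - S_Random)."""
    return (s_agent - s_random) / (s_human - s_random)


def interaction_number(t: int, n_0: int, n_f: int, n_i: int, n_max: int) -> int:
    """Eq. (11): n_t = min[n_0 + Int(t / n_f) * n_i, n_max].

    t     -- current training step
    n_f   -- change frequency of n_t (steps between increments)
    n_i   -- increment applied at each update
    n_max -- cap on the interaction number during imagined training
    """
    return min(n_0 + (t // n_f) * n_i, n_max)


# Usage with the paper's fixed n_0 = 16, n_i = 8 and the N32 setting
# (n_max = 32); n_f = 1000 is a hypothetical placeholder value.
print(interaction_number(t=0, n_0=16, n_f=1000, n_i=8, n_max=32))     # 16
print(interaction_number(t=1000, n_0=16, n_f=1000, n_i=8, n_max=32))  # 24
print(interaction_number(t=5000, n_0=16, n_f=1000, n_i=8, n_max=32))  # 32 (capped)
print(human_normalized_score(s_agent=50.0, s_random=0.0, s_human=100.0))  # 0.5
```

The schedule starts at n_0, grows in steps of n_i every n_f training steps, and saturates at n_max — matching the paper's description of a progressively lengthening imagined-training horizon.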