GLAM: Global-Local Variation Awareness in Mamba-based World Model
Authors: Qian He, Wenqi Liang, Chunhui Hao, Gan Sun, Jiandong Tian
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method outperforms existing methods in normalized human scores on the Atari 100k benchmark. We evaluate GLAM on a subset of 26 games from Atari 100k (Bellemare et al. 2013), a benchmark that is widely used for testing reinforcement learning algorithms. ... Ablation Studies Inference module in GLAM We conduct ablation studies on the design of the inference modules on Pong, Boxing, Kung Fu Master and Battle Zone. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences. 2Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences. 3University of Chinese Academy of Sciences. 4Shenyang University of Chemical Technology. 5School of Automation Science and Engineering, South China University of Technology. |
| Pseudocode | Yes | Algorithm 1: Parallel inference in GLAM. Initialize: input sequence length l = t; length of the concatenated short sequences s = 4. Input: observation sequence o_{0:t}, action sequence a_{0:t}. Encode feature sequence: Z_{0:t} = q_φ(o_{0:t}); e_{0:t} = f_φ(z_{0:t}, a_{0:t}). Global variation inference: d_{0:t-1} = e_{1:t} − e_{0:t-1}; u^g_{1:t} = LMamba(d_{0:t-1}); u^g_{1:t} = LayerNorm(u^g_{1:t}); u^g_{1:t} = SiLU(u^g_{1:t}). Local variation inference: concatenate e_{0:t} into short sequence blocks E_s = {{e_{i+1−s}, …, e_i}}_{i=3}^{t}; u^l_{3:t} = LMamba(E_s). Predict unknown information and next state: Ẑ_{4:t+1} = g^D_φ(u^g_{3:t}, u^l_{3:t}); r̂_{3:t} = g^R_φ(u^g_{3:t}, u^l_{3:t}); ĉ_{3:t} = g^C_φ(u^g_{3:t}, u^l_{3:t}). Return: the unknown-information sequences r̂_{3:t}, ĉ_{3:t} and the distribution Ẑ_{4:t+1} of the state sequence. |
| Open Source Code | Yes | Code https://github.com/GLAM2025/glam |
| Open Datasets | Yes | We evaluate GLAM on a subset of 26 games from Atari 100k (Bellemare et al. 2013), a benchmark that is widely used for testing reinforcement learning algorithms. |
| Dataset Splits | No | GLAM uses 100k samples in each game. Considering a skip step of 4 frames, the samples correspond to 400k actual game frames, which is about 1.85 hours (Zhang et al. 2024) of real-time game time. The final results of the agent are quantified using the human-normalized score: S_Norm = (S_Agent − S_Random)/(S_Human − S_Random). To evaluate the agent, we perform 20 evaluations of the final checkpoints and compute the average scores as the results. This text describes the total samples and evaluation methodology, but not how the dataset was split into training, validation, or test sets for the world model itself. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided in the paper. |
| Experiment Setup | Yes | In this work, we use fixed lengths l = 16 and s = 4. The specific formula for calculating the variable number of interactions is as follows: n_t = min[n_0 + Int(t/n_f) · n_i, n_max], (11) where n_t represents the variable number of interactions, t denotes the current training step, n_f represents the change frequency of n_t, n_i corresponds to the increment of the number at each update, and n_max represents the maximum number of interactions during imagined training. For other parts of the agent's training, we refer to the method in STORM. ... We design three different interaction parameters, n_max = 16, 24, 32, labeled N16, N24, N32 respectively. The variable parameter n_t in training is calculated by Eq. (11), and the parameters n_0 = 16 and n_i = 8 are fixed. |
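The two formulas quoted in the table — the human-normalized score and the interaction schedule of Eq. (11) — can be sketched in Python. This is an illustrative sketch, not the authors' released implementation: the function and parameter names simply mirror the paper's symbols, and since the paper excerpt does not state a value for n_f, the value used in the usage example is hypothetical.

```python
def human_normalized_score(s_agent: float, s_random: float, s_human: float) -> float:
    """S_Norm = (S_Agent - S_Random) / (S_Human - S_Random)."""
    return (s_agent - s_random) / (s_human - s_random)


def interaction_number(t: int, n_0: int, n_f: int, n_i: int, n_max: int) -> int:
    """Eq. (11): n_t = min[n_0 + Int(t / n_f) * n_i, n_max].

    t     -- current training step
    n_f   -- change frequency of n_t (steps between increments)
    n_i   -- increment applied at each update
    n_max -- cap on the interaction number during imagined training
    """
    return min(n_0 + (t // n_f) * n_i, n_max)


# Usage with the paper's fixed n_0 = 16, n_i = 8 and the N32 setting
# (n_max = 32); n_f = 1000 is a hypothetical placeholder value.
print(interaction_number(t=0, n_0=16, n_f=1000, n_i=8, n_max=32))     # 16
print(interaction_number(t=1000, n_0=16, n_f=1000, n_i=8, n_max=32))  # 24
print(interaction_number(t=5000, n_0=16, n_f=1000, n_i=8, n_max=32))  # 32 (capped)
print(human_normalized_score(s_agent=50.0, s_random=0.0, s_human=100.0))  # 0.5
```

The schedule starts at n_0, grows in steps of n_i every n_f training steps, and saturates at n_max — matching the paper's description of a progressively lengthening imagined-training horizon.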