Continual Reinforcement Learning by Planning with Online World Models

Authors: Zichen Liu, Guoji Fu, Chao Du, Wee Sun Lee, Min Lin

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | To assess OA, we further design Continual Bench, a dedicated environment for CRL, and compare with several strong baselines under the same model-planning algorithmic framework. The empirical results show that OA learns continuously to solve new tasks while not forgetting old skills, outperforming agents built on deep world models with various continual learning techniques.
Researcher Affiliation | Collaboration | ¹Sea AI Lab, ²National University of Singapore. Correspondence to: Min Lin <EMAIL>.
Pseudocode | Yes | Figure 7: OA learning and acting loop.
Require: zero-initialized agent memories A^(0), B^(0) and world model weights W^(0); sparse encoder ϕ: R^(S+A) → R^D; initial index s = [D]; planner CEM; sequence of tasks (R_τ)_{τ=1}; initial state s_1 ∼ ρ_0; time step t = 1.
1: loop  ▷ OA runs forever, updates per step
2:   if task changes then
3:     µ_t ← init values ∈ R^(A×H)
4:   else
5:     µ_t ← shifted µ_{t−1} (fill the last column with init values)
6:   end if
7:   W_s^(t) ← W_s^(t−1) + (A_ss^(t−1) + (1/λ)I)^(−1) (B_s^(t−1) − A_ss^(t−1) W_s^(t−1))
8:   a_t, µ_{t+1} ← CEM(s_t, W^(t), µ_t, R_τ)  ▷ see Appendix A.4 for details
9:   s_{t+1} ← environment(s_t, a_t)
10:  x_t ← [s_t, a_t]; y_t ← s_{t+1} − s_t
11:  s ← nonzero_index(ϕ(x_t))
12:  A_ss^(t) ← A_ss^(t−1) + ϕ_s(x_t) ϕ_s(x_t)^T
13:  B_s^(t) ← B_s^(t−1) + ϕ_s(x_t) y_t^T
14:  t ← t + 1
15: end loop
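The model update in the loop above amounts to a regularized online least-squares fit touched only on the coordinates activated by the sparse encoder. Below is a minimal NumPy sketch of that update under dense storage; the function name `ftl_update`, the dense `A`/`B` matrices, and the argument layout are illustrative assumptions, not the authors' interface:

```python
import numpy as np

def ftl_update(A, B, W, phi_x, y, inv_lam):
    """One online update of a sparse linear world model (illustrative).

    A: (D, D) accumulated feature second moments, sum of phi phi^T
    B: (D, S) accumulated feature-target products, sum of phi y^T
    W: (D, S) model weights mapping features to state deltas
    phi_x: (D,) sparse feature vector of the input x_t = [s_t, a_t]
    y: (S,) regression target y_t = s_{t+1} - s_t
    inv_lam: regularization strength 1/lambda
    """
    s = np.flatnonzero(phi_x)                 # active coordinates only
    A_ss = A[np.ix_(s, s)]
    # residual-correction solve of the regularized least squares,
    # restricted to the active block (cf. line 7 of the loop)
    reg = A_ss + inv_lam * np.eye(len(s))
    W[s] += np.linalg.solve(reg, B[s] - A_ss @ W[s])
    # accumulate sufficient statistics (cf. lines 12-13 of the loop)
    A[np.ix_(s, s)] = A_ss + np.outer(phi_x[s], phi_x[s])
    B[s] += np.outer(phi_x[s], y)
    return A, B, W
```

Because only the rows and columns indexed by the nonzero features are read or written, the per-step cost scales with the number of active features rather than with the full dictionary size D.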
Open Source Code | Yes | We open source the code of Continual Bench [1] and hope this realistic but lightweight environment can accelerate the progress of CRL research. Please see Appendix B for detailed environment specifications. [1] https://github.com/sail-sg/Continual Bench
Open Datasets | Yes | In this work, we propose a lightweight but realistic benchmark environment, Continual Bench, that has a consistent state space and takes both forgetting and transfer into consideration. We open source the code of Continual Bench [1] and hope this realistic but lightweight environment can accelerate the progress of CRL research. [1] https://github.com/sail-sg/Continual Bench
Dataset Splits | No | The paper describes a continual reinforcement learning setting where an agent continuously solves a sequence of tasks. It mentions a fixed order of 6 tasks: (pick-place, button-press, door-open, peg-unplug, window-close, faucet-close). However, it does not specify traditional dataset splits (e.g., train/test/validation percentages or counts) for a static dataset, as is common in supervised learning. Instead, the tasks are presented sequentially for continuous learning and evaluation.
Hardware Specification | Yes | All experiments in the paper are run on the internal cluster, with each job consuming one A100 GPU and 16 CPUs.
Software Dependencies | Yes | We develop our agent and the baselines using the MBRL Library [2] (Pineda et al., 2021) by Meta (MIT license, v0.2.0). [2] https://github.com/facebookresearch/mbrl-lib
Experiment Setup | Yes | To build the sparse world model, we use 300 2-dimensional Losse features with Λ = 9. We find 1/λ = 0.005 to be a good regularization strength without any tuning. For the planner, we use a candidate sampling size of N = 150 with planning horizon H = 15 and K = 3 iterations. The ratio of elite candidates is 0.1. For all the above baselines, we use a 4-layer MLP with hidden size 200 and ReLU activation for world modeling. We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 4e-4 and a minibatch size of 256. ... the deep models are trained until convergence over all available data (stopped when the validation loss on 5% holdout data does not improve over 5 consecutive epochs).
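The planner hyperparameters quoted above (N = 150 candidates, horizon H = 15, K = 3 iterations, 10% elites) can be illustrated with a generic cross-entropy-method planner. This is a sketch under assumptions: `cem_plan`, the `dynamics`/`reward` callables, and the diagonal-Gaussian action distribution are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def cem_plan(s0, dynamics, reward, mu, sigma,
             n_candidates=150, n_iters=3, elite_frac=0.1, rng=None):
    """Cross-entropy method planning sketch.

    dynamics(s, a) -> batch of next states; reward(s, a) -> batch of rewards.
    mu, sigma: (H, A) mean/std of the Gaussian over action sequences.
    Returns the first action of the refined mean and the full mean.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_elites = max(1, int(elite_frac * n_candidates))
    H, A = mu.shape
    for _ in range(n_iters):
        # sample N candidate action sequences around the current mean
        acts = mu + sigma * rng.standard_normal((n_candidates, H, A))
        returns = np.zeros(n_candidates)
        s = np.repeat(s0[None], n_candidates, axis=0)
        for h in range(H):                     # roll candidates through the model
            returns += reward(s, acts[:, h])
            s = dynamics(s, acts[:, h])
        # refit the sampling distribution to the top elite_frac candidates
        elites = acts[np.argsort(returns)[-n_elites:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0)
    return mu[0], mu
```

Shifting the returned mean forward by one step and reusing it as the next call's initialization matches the warm-start behavior described in lines 2-6 of the Figure 7 loop.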