Continual Reinforcement Learning by Planning with Online World Models
Authors: Zichen Liu, Guoji Fu, Chao Du, Wee Sun Lee, Min Lin
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess OA, we further design Continual Bench, a dedicated environment for CRL, and compare with several strong baselines under the same model-planning algorithmic framework. The empirical results show that OA learns continuously to solve new tasks while not forgetting old skills, outperforming agents built on deep world models with various continual learning techniques. |
| Researcher Affiliation | Collaboration | 1Sea AI Lab 2National University of Singapore. Correspondence to: Min Lin <EMAIL>. |
| Pseudocode | Yes | Figure 7: OA learning and acting loop.<br>**Require:** zero-initialized agent memories A⁽⁰⁾, B⁽⁰⁾ and world model weights W⁽⁰⁾; sparse encoder φ : ℝ^(S+A) → ℝ^D; initial index s = [D]; planner CEM; sequence of tasks (R_τ)_(τ=1); initial state s₁ ∼ ρ₀; time step t = 1.<br>1: **loop** ▷ OA runs forever, updates per step<br>2: &nbsp;&nbsp;**if** task changes **then**<br>3: &nbsp;&nbsp;&nbsp;&nbsp;μ_t ← init values ∈ ℝ^(A×H)<br>4: &nbsp;&nbsp;**else**<br>5: &nbsp;&nbsp;&nbsp;&nbsp;μ_t ← shifted μ_(t−1) (fill the last column with init values)<br>6: &nbsp;&nbsp;**end if**<br>7: &nbsp;&nbsp;W_s⁽ᵗ⁾ ← W_s⁽ᵗ⁻¹⁾ + (A_ss⁽ᵗ⁻¹⁾ + (1/λ)I)⁻¹(B_s⁽ᵗ⁻¹⁾ − A_ss⁽ᵗ⁻¹⁾ W_s⁽ᵗ⁻¹⁾)<br>8: &nbsp;&nbsp;a_t, μ_(t+1) ← CEM(s_t, W⁽ᵗ⁾, μ_t, R_τ) ▷ Appendix A.4 for details<br>9: &nbsp;&nbsp;s_(t+1) ← environment(s_t, a_t)<br>10: &nbsp;&nbsp;x_t ← [s_t, a_t]; y_t ← s_(t+1) − s_t<br>11: &nbsp;&nbsp;s ← nonzero_index(φ(x_t))<br>12: &nbsp;&nbsp;A_ss⁽ᵗ⁾ ← A_ss⁽ᵗ⁻¹⁾ + φ_s(x_t) φ_s(x_t)ᵀ<br>13: &nbsp;&nbsp;B_s⁽ᵗ⁾ ← B_s⁽ᵗ⁻¹⁾ + φ_s(x_t) y_tᵀ<br>14: &nbsp;&nbsp;t ← t + 1<br>15: **end loop** |
| Open Source Code | Yes | We open source the code of Continual Bench1 and hope this realistic but lightweight environment can accelerate the progress of CRL research. Please see Appendix B for detailed environment specifications. 1https://github.com/sail-sg/Continual Bench |
| Open Datasets | Yes | In this work, we propose a lightweight but realistic benchmark environment, Continual Bench, that has a consistent state space and takes both forgetting and transfer into consideration. We open source the code of Continual Bench1 and hope this realistic but lightweight environment can accelerate the progress of CRL research. 1https://github.com/sail-sg/Continual Bench |
| Dataset Splits | No | The paper describes a continual reinforcement learning setting where an agent continuously solves a sequence of tasks. It mentions a fixed order of 6 tasks: (pick-place, button-press, door-open, peg-unplug, window-close, faucet-close). However, it does not specify traditional dataset splits (e.g., train/test/validation percentages or counts) for a static dataset, which is common in supervised learning. Instead, the tasks are presented sequentially for continuous learning and evaluation. |
| Hardware Specification | Yes | All experiments in the paper are run on the internal cluster, with each job consuming one A100 GPU and 16 CPUs. |
| Software Dependencies | Yes | We develop our agent and the baselines using the MBRL Library2 (Pineda et al., 2021) by Meta (MIT license, v0.2.0). 2https://github.com/facebookresearch/mbrl-lib |
| Experiment Setup | Yes | To build the sparse world model, we use 300 2-dimensional Losse features with Λ = 9. We find 1/λ = 0.005 as a good regularization strength without any tuning. For the planner, we use a candidate sampling size of N = 150 with planning horizon H = 15 and K = 3 iterations. The ratio of elite candidates is 0.1. In all above baselines, we use a 4-layer MLP with hidden size 200 and ReLU activation for world modeling. We use the Adam optimizer (Kingma & Ba, 2015) with learning rate of 4e-4 and minibatch size of 256. ... the deep models are trained until convergence over all available data (stopped when the validation loss on 5% holdout data does not improve over 5 consecutive epochs). |
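The per-step model update quoted in the Pseudocode row (accumulate sufficient statistics A, B over sparse features, then refresh only the active weight block via a ridge solve) can be sketched in numpy. This is a minimal illustration, not the authors' implementation: the encoder `phi` (a random projection with top-k sparsification), the dimensions, and the helper names are all assumptions; only the update structure and the regularization strength 1/λ = 0.005 come from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

D, X, Y = 32, 6, 4            # feature dim, input dim (state+action), target dim
P = rng.normal(size=(D, X))   # stand-in encoder weights (illustrative only)
inv_lam = 0.005               # regularization strength 1/lambda (from the paper)

def phi(x, k=4):
    """Stand-in sparse encoder: keep only the k largest-magnitude activations."""
    z = P @ x
    out = np.zeros(D)
    idx = np.argsort(-np.abs(z))[:k]
    out[idx] = z[idx]
    return out

# Agent memories (sufficient statistics) and model weights, zero-initialized.
A = np.zeros((D, D))   # accumulates phi(x) phi(x)^T
B = np.zeros((D, Y))   # accumulates phi(x) y^T
W = np.zeros((D, Y))   # linear world-model weights

def update(x, y):
    """One OA-style step: refresh the active weight block, then add statistics."""
    f = phi(x)
    s = np.nonzero(f)[0]                     # active (nonzero) feature indices
    A_ss = A[np.ix_(s, s)]
    # Incremental ridge solve restricted to the active coordinates:
    W[s] = W[s] + np.linalg.solve(A_ss + inv_lam * np.eye(len(s)),
                                  B[s] - A_ss @ W[s])
    # Accumulate sufficient statistics on the active block only.
    A[np.ix_(s, s)] += np.outer(f[s], f[s])
    B[s] += np.outer(f[s], y)

# Demo: repeatedly observing one transition drives the model to fit it.
x_demo = rng.normal(size=X)
y_demo = rng.normal(size=Y)
for _ in range(50):
    update(x_demo, y_demo)
pred = phi(x_demo) @ W
```

Because only the coordinates touched by the current sparse features are updated, weights for features unused by the current task are left untouched, which is the mechanism the paper relies on to avoid forgetting.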
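The planner hyperparameters quoted above (candidate sampling size N = 150, planning horizon H = 15, K = 3 iterations, elite ratio 0.1) describe a standard cross-entropy-method loop. Below is a hedged sketch of such a loop, including the warm-start shift of the action-sequence mean mentioned in the pseudocode; the toy dynamics and reward are illustrative stand-ins, and `cem_plan`, `rollout_return` are hypothetical names rather than the paper's API.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, K, ELITE = 150, 15, 3, 0.1   # hyperparameters reported in the paper
A_DIM = 2
n_elite = int(N * ELITE)           # 15 elite candidates per iteration

def rollout_return(s0, actions, dynamics, reward):
    """Sum of rewards along an imagined rollout of one action sequence."""
    s, total = s0, 0.0
    for a in actions:               # actions has shape (H, A_DIM)
        s = dynamics(s, a)
        total += reward(s, a)
    return total

def cem_plan(s0, dynamics, reward, mu=None, std=1.0):
    """CEM: refit a Gaussian over action sequences to the elites K times.
    Returns the first action and the shifted mean for warm-starting."""
    if mu is None:
        mu = np.zeros((H, A_DIM))
    sigma = np.full((H, A_DIM), std)
    for _ in range(K):
        cand = mu + sigma * rng.normal(size=(N, H, A_DIM))
        scores = np.array([rollout_return(s0, c, dynamics, reward)
                           for c in cand])
        elites = cand[np.argsort(-scores)[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    # Warm start for the next step: shift left, refill the last column.
    next_mu = np.concatenate([mu[1:], np.zeros((1, A_DIM))])
    return mu[0], next_mu

# Toy problem: drive a 2-D point toward the origin.
dyn = lambda s, a: s + 0.1 * np.clip(a, -1, 1)
rew = lambda s, a: -np.sum(s ** 2)
a0, _ = cem_plan(np.array([1.0, -1.0]), dyn, rew)
```

In this toy setup the first planned action should push each coordinate of the state toward zero, which is what the elite-refitting loop converges to within K = 3 iterations.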