Knowledge Retention in Continual Model-Based Reinforcement Learning
Authors: Haotian Fu, Yixiang Sun, Michael Littman, George Konidaris
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios. We evaluated DRAGO on three continual learning domains. In Figure 4, we also visualize the prediction accuracy of the learned world models across the whole gridworld... As shown in Figure 5, we find that the proposed method DRAGO achieves the best overall performance compared to all the other approaches across three domains. This section evaluates the essentiality of DRAGO's components. Specifically, we evaluate DRAGO's performance without Synthetic Experience Rehearsal and Regaining Memories Through Exploration (reviewer) separately in four transfer tasks of Cheetah and MiniGrid. As we show in Figure 7... |
| Researcher Affiliation | Academia | Brown University. Correspondence to: Haotian Fu <EMAIL>, Yixiang Sun <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 DRAGO (Training process for each task) Algorithm 2 update learner and reviewer Algorithm 3 update vae Algorithm 4 update transition from synthetic data |
| Open Source Code | Yes | Code is available at https://github.com/YixiangSun/drago. |
| Open Datasets | Yes | We evaluated the performance of DRAGO in the Mini Grid (Chevalier-Boisvert et al., 2023) domain using a sequence of four tasks... We also evaluated the performance of DRAGO in the Cheetah and Walker domains from the Deepmind Control Suite (Tassa et al., 2018). |
| Dataset Splits | Yes | Task names in Blue denote the continual training tasks; Task names in Red denote the test tasks. We train and test all the tasks in the order of left to right as in the figure. For example, we train the cheetah agent in the order run, jump, and backward, and after training on jump, we test on jump2run and jump&run. The evaluation was conducted on four new tasks that require the agent to move between different rooms (e.g., start in room 1 and move to the goal position in room 2). The evaluation was conducted on several new tasks that require the agent to quickly change to a different locomotion mode from another mode (jump, run, etc.). |
| Hardware Specification | No | This work was conducted using computational resources and services at the Center for Computation and Visualization, Brown University. |
| Software Dependencies | No | We implement DRAGO on top of TDMPC (Hansen et al., 2022) and the overall algorithm is described in Algorithm 1. |
| Experiment Setup | Yes | Table 2. Here we list the hyperparameters used for MiniGrid, DM-Control cheetah, and DM-Control walker; where three comma-separated values are given, they correspond to those three domains in that order. Unlisted hyperparameters are all identical to the default parameters in TD-MPC. action repeat 1, 4, 2; discount factor 0.99; batch size 512; maximum steps 100, 1000, 1000; planning horizon 10, (25, 15), 15; policy fraction 0.05; temperature 0.5; momentum 0.1; reward loss coef 0.5; value coef 0.1; consistency loss coef 2; VAE recon loss coef 1; VAE KL loss coef 0.02; temporal loss discount (ρ) 0.5; learning rate 1e-3; sampling technique PER(0.6, 0.4); target networks update freq 40, 2, 2; temperature (τ) 0.01; cost coef for reviewer reward (α) 0.5; VAE latent dim 64, 256, 256; VAE encoding dim 128; MLP latent dim 512; Gumbel softmax temp 1.0; steps per synthetic data rehearsal 10, 20 |
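To make the flattened hyperparameter listing above easier to reuse, here is a minimal sketch that organizes it as a per-domain config. It assumes that entries with three comma-separated values map to MiniGrid, cheetah, and walker in that order (matching the order the domains are named), and that single values are shared across all three domains. The key names and `per_domain` helper are illustrative, not the authors' actual config schema, and only a representative subset of the listed hyperparameters is included.

```python
# Illustrative per-domain hyperparameter config for the DRAGO experiments,
# reconstructed from the reported Table 2 values. Not the authors' code.
DOMAINS = ("minigrid", "cheetah", "walker")

def per_domain(*values):
    """Map one value per domain, broadcasting a single shared value."""
    if len(values) == 1:
        values = values * len(DOMAINS)
    return dict(zip(DOMAINS, values))

HYPERPARAMS = {
    "action_repeat":      per_domain(1, 4, 2),
    "discount":           per_domain(0.99),
    "batch_size":         per_domain(512),
    "max_steps":          per_domain(100, 1000, 1000),
    "planning_horizon":   per_domain(10, (25, 15), 15),
    "learning_rate":      per_domain(1e-3),
    "vae_latent_dim":     per_domain(64, 256, 256),
    "target_update_freq": per_domain(40, 2, 2),
}

# Look up the settings for one domain, e.g. DM-Control cheetah:
cheetah_config = {name: vals["cheetah"] for name, vals in HYPERPARAMS.items()}
```

This layout makes the report's three-value convention explicit: any value that differs per domain is spelled out once, and shared values (discount, batch size, learning rate) are broadcast rather than repeated.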