Uncertainty-Based Experience Replay for Task-Agnostic Continual Reinforcement Learning

Authors: Adrian Remonda, Cole Corbitt Terrell, Eduardo E. Veas, Marc Masana

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that the combination of the proposed strategies leads to reduced training times, smaller replay buffer size, and less catastrophic forgetting, all while maintaining performance. [...] Evaluation of generalization and catastrophic forgetting in a continual learning setting.
Researcher Affiliation | Collaboration | Adrian Remonda, Graz University of Technology and Know-Center GmbH; Cole Terrell, Graz University of Technology and Know-Center GmbH; Eduardo Veas, Graz University of Technology and Know-Center GmbH; Marc Masana, Graz University of Technology and SAL Dependable Embedded Systems
Pseudocode | Yes | Algorithm 1 MBRL; Algorithm 2 Get Optimal Trajectory Planning; Algorithm 3 UBER; Algorithm 4 Get Uncertainty Score
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate the methods in the Cart Pole and Reacher environment provided by the MuJoCo (Todorov et al., 2012) physics engine. Additionally, we introduce our own proposed environments related to racing, including Masspoint and a Non-linear Bicycle model. [...] We also included an extended version of the Masspoint environment proposed by Thananjeyan et al. (2020).
Dataset Splits | Yes | Each task is trained for 30 episodes in each task and then tested in the test tasks for a single episode. [...] The models are trained on tasks T1 to T14. After completing each training task, the model is tested across all tasks encountered up to that point.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions MuJoCo as a physics engine but does not specify a version number. It also includes pseudocode with parameters but no versioned software dependencies.
Experiment Setup | Yes | Table 2: Hyperparameters used for UBER implementation (three columns, values listed left to right):
    Look-Ahead: 1 / 1 / 1
    β: 0.005 / 0.004 / 1.5
    Training episodes: 100 / 100 / 30 per task
    CEM population: 400 / 400 / 400
    CEM # elites: 40 / 40 / 40
    CEM # iterations: 5 / 5 / 5
    CEM α: 0.1 / 0.1 / 0.1
    MPD: 1 / 10 / 1
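Algorithm 2 (Get Optimal Trajectory Planning) is configured by the CEM rows of Table 2: population 400, 40 elites, 5 iterations, α 0.1. The sketch below is a generic cross-entropy-method planner that uses those values as defaults; it is not the authors' code. Assumptions: actions are unbounded reals drawn from a diagonal Gaussian, α is the weight kept on the previous distribution when smoothing (a convention used in some PETS-style implementations), `horizon` is a generic planning-horizon parameter, and `cost_fn` is a hypothetical callable that in practice would roll candidate actions through the learned dynamics model.

```python
import random
import statistics


def cem_plan(cost_fn, action_dim, horizon=1, population=400, n_elites=40,
             n_iters=5, alpha=0.1, init_std=1.0, seed=0):
    """Generic cross-entropy-method planner (sketch, not the authors' code).

    cost_fn: hypothetical callable mapping a flat list of horizon * action_dim
    actions to a scalar cost (lower is better).
    Returns the mean of the final sampling distribution.
    """
    rng = random.Random(seed)
    dim = horizon * action_dim
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(n_iters):
        # Sample candidate action sequences from a diagonal Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the n_elites lowest-cost candidates.
        elites = sorted(samples, key=cost_fn)[:n_elites]
        for d in range(dim):
            column = [e[d] for e in elites]
            # Assumed convention: alpha is the weight kept on the old distribution.
            mean[d] = alpha * mean[d] + (1 - alpha) * statistics.fmean(column)
            std[d] = alpha * std[d] + (1 - alpha) * statistics.pstdev(column)
    return mean


# Usage: a toy quadratic cost whose optimum is the action [0.5, -0.3].
target = [0.5, -0.3]
plan = cem_plan(lambda a: sum((x - t) ** 2 for x, t in zip(a, target)),
                action_dim=2)
```

With this toy cost the returned mean lands close to `[0.5, -0.3]` after the five iterations; in a model-based agent only the first action of the plan would typically be executed before replanning.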
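The protocol quoted under Dataset Splits (train on T1 to T14 in sequence, 30 episodes per task, then run one test episode on every task seen so far) yields a lower-triangular score matrix. A minimal sketch, assuming higher scores are better and that `train_fn` and `eval_fn` are hypothetical hooks for one training episode and one test episode. The forgetting measure shown (best earlier score minus final score, averaged over earlier tasks) is one common definition, not necessarily the one the paper reports.

```python
def continual_evaluation(tasks, train_fn, eval_fn, train_episodes=30):
    """Sequential training with evaluation on all tasks seen so far.

    train_fn(task) runs one training episode; eval_fn(task) runs one test
    episode and returns a score (higher is better). Both are hypothetical hooks.
    results[i][j] is the score on tasks[j] after finishing training on tasks[i].
    """
    results = []
    for i, task in enumerate(tasks):
        for _ in range(train_episodes):
            train_fn(task)
        # One test episode on every task encountered so far (j <= i).
        results.append([eval_fn(tasks[j]) for j in range(i + 1)])
    return results


def average_forgetting(results):
    """Mean over all but the last task of (best earlier score - final score)."""
    final = results[-1]
    n = len(final)
    if n < 2:
        return 0.0
    drops = []
    for j in range(n - 1):
        # Best score on task j measured before the final evaluation round.
        best_before = max(results[i][j] for i in range(j, n - 1))
        drops.append(best_before - final[j])
    return sum(drops) / len(drops)


class LastTaskOnlyAgent:
    """Toy agent that scores 1.0 only on the task it trained on most recently."""

    def __init__(self):
        self.last = None
        self.train_calls = 0

    def train(self, task):
        self.last = task
        self.train_calls += 1

    def evaluate(self, task):
        return 1.0 if task == self.last else 0.0


agent = LastTaskOnlyAgent()
tasks = [f"T{k}" for k in range(1, 15)]  # T1 ... T14, as in the report
results = continual_evaluation(tasks, agent.train, agent.evaluate)
```

With this toy agent every earlier task drops from 1.0 to 0.0, so `average_forgetting(results)` returns 1.0, the worst case; a method that mitigates catastrophic forgetting would push this value toward 0.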
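Algorithms 3 and 4 (UBER, Get Uncertainty Score) are only named in this report, so the following is a hedged illustration rather than the paper's method. It assumes that uncertainty is measured as disagreement (variance) among an ensemble of learned dynamics models and that a transition enters the replay buffer only when that score exceeds a threshold, here called `beta` by analogy with the β row of Table 2. The functions `uncertainty_score`, `maybe_store`, and `make_model` are hypothetical names.

```python
import statistics


def uncertainty_score(ensemble, state, action):
    """Hypothetical score: mean per-dimension variance of the ensemble's
    next-state predictions for (state, action)."""
    preds = [model(state, action) for model in ensemble]
    return statistics.fmean(
        statistics.pvariance([p[d] for p in preds]) for d in range(len(preds[0]))
    )


def maybe_store(buffer, transition, ensemble, beta):
    """Assumed gating rule: keep a transition only if the score exceeds beta."""
    state, action, _next_state = transition
    if uncertainty_score(ensemble, state, action) > beta:
        buffer.append(transition)
        return True
    return False


def make_model(offset):
    """Toy 1-D dynamics model: next_state = state + action + offset."""
    return lambda s, a: [s[0] + a[0] + offset]


agreeing = [make_model(0.0) for _ in range(3)]
disagreeing = [make_model(o) for o in (-1.0, 0.0, 1.0)]
buffer = []
transition = ([0.0], [1.0], [1.0])
stored_familiar = maybe_store(buffer, transition, agreeing, beta=0.005)
stored_novel = maybe_store(buffer, transition, disagreeing, beta=0.005)
```

Under this assumption, transitions the ensemble already agrees on are skipped and only novel ones are kept, which is one way a gating rule could keep the replay buffer small.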