Uncertainty-Based Experience Replay for Task-Agnostic Continual Reinforcement Learning
Authors: Adrian Remonda, Cole Corbitt Terrell, Eduardo E. Veas, Marc Masana
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the combination of the proposed strategies leads to reduced training times, smaller replay buffer size, and less catastrophic forgetting, all while maintaining performance. [...] Evaluation of generalization and catastrophic forgetting in a continual learning setting. |
| Researcher Affiliation | Collaboration | Adrian Remonda EMAIL Graz University of Technology and Know-Center GmbH; Cole Terrell EMAIL Graz University of Technology and Know-Center GmbH; Eduardo Veas EMAIL Graz University of Technology and Know-Center GmbH; Marc Masana EMAIL Graz University of Technology and SAL Dependable Embedded Systems |
| Pseudocode | Yes | Algorithm 1 MBRL; Algorithm 2 Get Optimal Trajectory Planning; Algorithm 3 UBER; Algorithm 4 Get Uncertainty Score |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate the methods in the Cart Pole and Reacher environments provided by the MuJoCo (Todorov et al., 2012) physics engine. Additionally, we introduce our own proposed environments related to racing, including Masspoint and a Non-linear Bicycle model. [...] We also included an extended version of the Masspoint environment proposed by Thananjeyan et al. (2020). |
| Dataset Splits | Yes | Each task is trained for 30 episodes and then tested on the test tasks for a single episode. [...] The models are trained on tasks T1 to T14. After completing each training task, the model is tested across all tasks encountered up to that point. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions MuJoCo as a physics engine but does not specify a version number. It also includes pseudocode with parameters but no versioned software dependencies. |
| Experiment Setup | Yes | Table 2: Hyperparameters used for the UBER implementation (three values per setting, one per environment column in the table): Look-Ahead: 1 / 1 / 1; β: 0.005 / 0.004 / 1.5; Training episodes: 100 / 100 / 30 per task; CEM population: 400 / 400 / 400; CEM # elites: 40 / 40 / 40; CEM # iterations: 5 / 5 / 5; CEM α: 0.1 / 0.1 / 0.1; MPD: 1 / 10 / 1 |
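The pseudocode names above (CEM-based trajectory planning, an uncertainty score) together with the Table 2 CEM settings (population 400, 40 elites, 5 iterations, α = 0.1) can be sketched as follows. This is a minimal illustration under those hyperparameters, not the authors' implementation: the function names, the placeholder `cost_fn` (standing in for rollouts through a learned dynamics model), and the ensemble-disagreement form of the uncertainty score are assumptions.

```python
import numpy as np


def cem_plan(cost_fn, horizon, action_dim, pop_size=400, n_elites=40,
             n_iters=5, alpha=0.1, rng=None):
    """Cross-Entropy Method trajectory optimization (sketch).

    Samples action sequences from a Gaussian, refits the mean/std to the
    lowest-cost "elite" samples, and smooths the update with factor alpha.
    Defaults mirror the Table 2 CEM hyperparameters. `cost_fn` is a
    hypothetical stand-in for evaluating a sequence with a dynamics model.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample a population of candidate action sequences.
        samples = rng.normal(mean, std, size=(pop_size, horizon, action_dim))
        costs = np.array([cost_fn(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elites]]
        # Smoothed refit: keep alpha of the old distribution.
        mean = alpha * mean + (1 - alpha) * elites.mean(axis=0)
        std = alpha * std + (1 - alpha) * elites.std(axis=0)
    return mean  # planned sequence; in MPC only the first action is executed


def uncertainty_score(ensemble_preds):
    """Epistemic uncertainty as ensemble disagreement (assumed form):
    std across ensemble members, averaged over horizon and state dims.
    `ensemble_preds` has shape (n_models, horizon, state_dim)."""
    return float(np.std(ensemble_preds, axis=0).mean())
```

In an uncertainty-based replay scheme, a score like this would decide which transitions are worth keeping in the (smaller) replay buffer; the exact scoring rule used by UBER is given in the paper's Algorithm 4.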