Uncertainty-Based Experience Replay for Task-Agnostic Continual Reinforcement Learning
Authors: Adrian Remonda, Cole Corbitt Terrell, Eduardo E. Veas, Marc Masana
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the combination of the proposed strategies leads to reduced training times, smaller replay buffer size, and less catastrophic forgetting, all while maintaining performance. [...] Evaluation of generalization and catastrophic forgetting in a continual learning setting. |
| Researcher Affiliation | Collaboration | Adrian Remonda EMAIL Graz University of Technology and Know-Center GmbH; Cole Terrell EMAIL Graz University of Technology and Know-Center GmbH; Eduardo Veas EMAIL Graz University of Technology and Know-Center GmbH; Marc Masana EMAIL Graz University of Technology and SAL Dependable Embedded Systems |
| Pseudocode | Yes | Algorithm 1 MBRL; Algorithm 2 Get Optimal Trajectory Planning; Algorithm 3 UBER; Algorithm 4 Get Uncertainty Score |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate the methods in the Cart Pole and Reacher environments provided by the MuJoCo (Todorov et al., 2012) physics engine. Additionally, we introduce our own proposed environments related to racing, including Masspoint and a Non-linear Bicycle model. [...] We also included an extended version of the Masspoint environment proposed by Thananjeyan et al. (2020). |
| Dataset Splits | Yes | Each task is trained for 30 episodes and then tested on the test tasks for a single episode. [...] The models are trained on tasks T1 to T14. After completing each training task, the model is tested across all tasks encountered up to that point. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions MuJoCo as a physics engine but does not specify a version number. It also includes pseudocode with parameters but no versioned software dependencies. |
| Experiment Setup | Yes | Table 2: Hyperparameters used for the UBER implementation (three values per setting, one per environment column in the table): Look-Ahead: 1 / 1 / 1; β: 0.005 / 0.004 / 1.5; Training episodes: 100 / 100 / 30 per task; CEM population: 400 / 400 / 400; CEM # elites: 40 / 40 / 40; CEM # iterations: 5 / 5 / 5; CEM α: 0.1 / 0.1 / 0.1; MPD: 1 / 10 / 1 |
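The pseudocode names above (CEM-based trajectory planning, an uncertainty score) together with the Table 2 CEM settings (population 400, 40 elites, 5 iterations, α = 0.1) can be sketched as follows. This is a minimal illustration under those hyperparameters, not the authors' implementation: the function names, the placeholder `cost_fn` (standing in for rollouts through a learned dynamics model), and the ensemble-disagreement form of the uncertainty score are assumptions.

```python
import numpy as np


def cem_plan(cost_fn, horizon, action_dim, pop_size=400, n_elites=40,
             n_iters=5, alpha=0.1, rng=None):
    """Cross-Entropy Method trajectory optimization (sketch).

    Samples action sequences from a Gaussian, refits the mean/std to the
    lowest-cost "elite" samples, and smooths the update with factor alpha.
    Defaults mirror the Table 2 CEM hyperparameters. `cost_fn` is a
    hypothetical stand-in for evaluating a sequence with a dynamics model.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample a population of candidate action sequences.
        samples = rng.normal(mean, std, size=(pop_size, horizon, action_dim))
        costs = np.array([cost_fn(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elites]]
        # Smoothed refit: keep alpha of the old distribution.
        mean = alpha * mean + (1 - alpha) * elites.mean(axis=0)
        std = alpha * std + (1 - alpha) * elites.std(axis=0)
    return mean  # planned sequence; in MPC only the first action is executed


def uncertainty_score(ensemble_preds):
    """Epistemic uncertainty as ensemble disagreement (assumed form):
    std across ensemble members, averaged over horizon and state dims.
    `ensemble_preds` has shape (n_models, horizon, state_dim)."""
    return float(np.std(ensemble_preds, axis=0).mean())
```

In an uncertainty-based replay scheme, a score like this would decide which transitions are worth keeping in the (smaller) replay buffer; the exact scoring rule used by UBER is given in the paper's Algorithm 4.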