Replay-enhanced Continual Reinforcement Learning

Authors: Tiantian Zhang, Kevin Zehua Shen, Zichuan Lin, Bo Yuan, Xueqian Wang, Xiu Li, Deheng Ye

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the Continual World benchmark show that RECALL performs significantly better than purely perfect memory replay, and achieves comparable or better overall performance against state-of-the-art continual learning methods. We conduct comprehensive experiments on a suite of realistic robotic manipulation tasks from the Continual World benchmark (De Lange et al., 2021).
Researcher Affiliation | Collaboration | Tiantian Zhang EMAIL Tsinghua University; Kevin Z. Shen EMAIL The University of British Columbia; Zichuan Lin EMAIL Tencent; Bo Yuan EMAIL Tsinghua University; Xueqian Wang EMAIL Tsinghua University; Xiu Li EMAIL Tsinghua University; Deheng Ye EMAIL Tencent
Pseudocode | Yes | Algorithm 1: Replay-Enhanced ContinuAL RL (RECALL)
Open Source Code | No | The paper does not contain any explicit statement about providing source code, or a link to a code repository for the methodology described.
Open Datasets | Yes | We conduct comprehensive experiments on a suite of realistic robotic manipulation tasks from the Continual World benchmark (De Lange et al., 2021) designed as a testbed for evaluating RL agents with respect to challenges incurred by the continual learning paradigm.
Dataset Splits | No | The paper describes the arrangement of tasks into sequences (e.g., CW3, CW10, CW20) and the training duration for each task (1M steps), followed by evaluation. However, it does not provide explicit training/validation/test splits of the data *within* each task's dataset, which is common in reinforcement learning, where data is collected through interaction rather than pre-split.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models, or memory specifications.
Software Dependencies | No | The paper mentions using 'the soft actor-critic (SAC)' algorithm and refers to implementations 'based on (Wołczyk et al., 2021)' and 'from (Wołczyk et al., 2022)'. It also states that 'the maximum entropy coefficient α is tuned automatically according to the adjustment rule provided in (Haarnoja et al., 2018b)'. However, it does not explicitly list specific software dependencies (e.g., Python, PyTorch, CUDA) with their version numbers for the current work.
Experiment Setup | Yes | We use an implementation of the underlying RL algorithm SAC (Haarnoja et al., 2018a;b; Zhou et al., 2022) based on (Wołczyk et al., 2021), in which the maximum entropy coefficient α is tuned automatically according to the adjustment rule provided in (Haarnoja et al., 2018b). We follow exactly the same experimental setup (including network structure and hyperparameters) from (Wołczyk et al., 2022) for all baselines and the common settings for RECALL, ensuring fair comparison. The actor and critic are implemented as two separate MLP networks, each with 4 hidden layers of 256 units... For each task sequence, we search method-specific regularization coefficient λ for policy distillation of RECALL in {0.01, 0.1, 1, 10, 100}, and the final selected value is 10. Replay buffer size is set to be consistent with that in Perfect Memory and batch size is 128.

Table 4: Core hyperparameters used for the underlying SAC algorithm.
- optimizer: Adam
- learning rate: 1e-3
- batch size: 128
- discount factor (γ): 0.99
- target smoothing coefficient (τ): 0.005
- target update interval: 1
- target output std (σt): 0.089
- replay buffer size: 10^6
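Since the paper releases no code, the reported setup can only be sketched. Below is a minimal Python sketch, assuming the Table 4 values and the stated actor/critic architecture (two separate MLPs, 4 hidden layers of 256 units); the names `sac_config` and `mlp_layer_sizes`, and the example observation/action dimensions, are our own illustrative choices, not from the paper.

```python
# Hedged sketch of the SAC setup reported for RECALL. All values below are
# copied from the paper's Table 4 and experiment-setup text; the container
# names and example dimensions are hypothetical.

sac_config = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size": 128,
    "discount_factor": 0.99,         # gamma
    "target_smoothing_coef": 0.005,  # tau
    "target_update_interval": 1,
    "target_output_std": 0.089,      # sigma_t
    "replay_buffer_size": 10**6,     # matches Perfect Memory's buffer size
    "distillation_coef": 10,         # lambda, searched over {0.01, 0.1, 1, 10, 100}
}

def mlp_layer_sizes(input_dim, output_dim, hidden=256, depth=4):
    """Layer widths for the actor/critic MLPs: 4 hidden layers of 256 units."""
    return [input_dim] + [hidden] * depth + [output_dim]

# Example: a Q-network over a hypothetical 39-dim observation + 4-dim action.
print(mlp_layer_sizes(39 + 4, 1))  # [43, 256, 256, 256, 256, 1]
```

The actor would use the same widths with an output head sized to the action distribution parameters; this sketch only mirrors the stated shapes and hyperparameters, not the training loop.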