Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation

Authors: Jingbo Sun, Songjun Tu, Qichao Zhang, Haoran Li, Xin Liu, Yaran Chen, Ke Chen, Dongbin Zhao

ICLR 2025

Reproducibility Assessment: Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, DVFB demonstrates both superior zero-shot generalization (outperforming on all 12 tasks) and fine-tuning adaptation (leading on 10 of 12 tasks), surpassing state-of-the-art (SOTA) URL methods. Our code is available at https://github.com/bofusun/DVFB.
Researcher Affiliation | Academia | 1 Institute of Automation, Chinese Academy of Sciences; 2 Pengcheng Laboratory; 3 University of Chinese Academy of Sciences; 4 Xi'an Jiaotong-Liverpool University
Pseudocode | Yes | We provide the complete pseudocode for DVFB: Algorithm 1 (unsupervised pre-training phase), Algorithm 2 (downstream task fine-tuning phase), and Algorithm 3 (reward mapping mechanism).
Open Source Code | Yes | Our code is available at https://github.com/bofusun/DVFB.
Open Datasets | Yes | Following the latest advancements (Yang et al., 2023; Bai et al., 2024), we evaluate task generalization performance using 12 downstream tasks across 3 domains in URLB (Laskin et al., 2021) and the DeepMind Control Suite (DMC) (Tassa et al., 2018).
Dataset Splits | No | The paper specifies interaction budgets for its online reinforcement learning phases (2 million steps of pre-training and 10,000 steps of skill inference). These define how data is generated and consumed in each phase, but the paper does not report train/validation/test splits for a fixed, pre-existing dataset, as would typically be given in supervised learning settings.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as specific GPU or CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper mentions the 'RL backbone algorithm DDPG' in Table 4, but it does not provide version numbers for DDPG or for any other software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | Table 4 (hyper-parameter settings) lists pre-training frames, fine-tuning frames, zero-shot selection frames, RL replay buffer size, batch size, optimizer, learning rate, network architectures, and the coefficients α, β, and η.
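To illustrate how the hyper-parameter categories reported in Table 4 could be organized for a reproduction attempt, here is a minimal configuration sketch. The field names follow the categories named above; the values are placeholders except where the report itself states them (2 million pre-training steps, 10,000 skill-inference steps), and the class name `DVFBConfig` is our own invention, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class DVFBConfig:
    """Sketch of Table 4's hyper-parameter categories; values are illustrative."""
    pretraining_frames: int = 2_000_000       # stated: 2 million pre-training steps
    finetuning_frames: int = 100_000          # placeholder value
    zero_shot_selection_frames: int = 10_000  # stated: 10,000 skill-inference steps
    replay_buffer_size: int = 1_000_000       # placeholder value
    batch_size: int = 256                     # placeholder value
    optimizer: str = "Adam"                   # placeholder choice
    learning_rate: float = 1e-4               # placeholder value
    alpha: float = 1.0                        # coefficient α (placeholder)
    beta: float = 1.0                         # coefficient β (placeholder)
    eta: float = 1.0                          # coefficient η (placeholder)

cfg = DVFBConfig()
print(cfg.pretraining_frames, cfg.zero_shot_selection_frames)
```

A structured config like this makes it easy to diff a reproduction's settings against the paper's Table 4 entry by entry.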