Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

When Should We Prefer State-to-Visual DAgger over Visual Reinforcement Learning?

Authors: Tongzhou Mu, Zhaoyang Li, Stanisław Wiktor Strzelecki, Xiu Yuan, Yunchao Yao, Litian Liang, Hao Su

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This study conducts an empirical comparison of State-to-Visual DAgger, a two-stage framework that initially trains a state policy before adopting online imitation to learn a visual policy, and Visual RL across a diverse set of tasks. We evaluate both methods across 16 tasks from three benchmarks, focusing on their asymptotic performance, sample efficiency, and computational costs.
Researcher Affiliation | Academia | University of California San Diego EMAIL
Pseudocode | No | For a detailed description of our State-to-Visual DAgger implementation algorithm and related details, please refer to Appendix C in the extended version of our paper.
Open Source Code | Yes | Code: https://github.com/tongzhoumu/s2v-dagger
Open Datasets | Yes | We selected 16 tasks from three benchmarks: ManiSkill (Gu et al. 2023), DMControl (Tassa et al. 2018), and Adroit (Rajeswaran et al. 2017).
Dataset Splits | No | The paper uses the ManiSkill, DMControl, and Adroit benchmarks and, for DMControl, states that experiments follow 'standard protocols (Kostrikov, Yarats, and Fergus 2020; Laskin et al. 2020; Hafner et al. 2019c)', but it does not give the specific dataset splits (e.g., percentages or sample counts) used for training, validation, or testing.
Hardware Specification | Yes | Our hardware setting: 32 CPU cores (Intel Xeon 2.1 GHz) and 1 GPU (NVIDIA GeForce RTX 2080 Ti, 11 GB).
Software Dependencies | No | The paper names specific algorithms, such as Soft Actor-Critic (SAC) and Asymmetric Actor Critic, but provides no version numbers for any software libraries, frameworks, or environments used (e.g., PyTorch, TensorFlow, Gym, MuJoCo).
Experiment Setup | No | The paper describes high-level training procedures, such as stopping Stage 1 upon convergence, early-stopping Stage 2 at a predefined imitation-loss threshold, and categorizing task difficulty by environment steps (4 million). However, it lacks concrete hyperparameters such as learning rates, batch sizes, optimizers, and detailed network architectures, which are crucial for a reproducible experimental setup.
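To make the two-stage framework in the Research Type row concrete, here is a minimal, hedged sketch of Stage 2 (online imitation, DAgger-style): the student visual policy acts, a teacher state policy relabels every visited state, and the student is refit on the aggregated dataset. Everything below is a hypothetical toy stand-in, not the authors' implementation: states are 1-D floats, `render` fakes image observations as noisy state copies, `teacher_policy` plays the Stage 1 oracle, and the student is a one-parameter linear model fit by least squares.

```python
import random

def teacher_policy(state):
    """Oracle state-based policy (stand-in for the Stage 1 result)."""
    return -0.5 * state

def render(state):
    """Stand-in for image rendering: observation = state plus noise."""
    return state + random.gauss(0.0, 0.01)

def fit_linear(dataset):
    """Least-squares fit of action = w * obs over the aggregated data."""
    num = sum(o * a for o, a in dataset)
    den = sum(o * o for o, _ in dataset)
    return num / den if den else 0.0

def state_to_visual_dagger(n_iters=10, rollout_len=50):
    dataset, w = [], 0.0
    for _ in range(n_iters):
        state = random.uniform(-1.0, 1.0)
        for _ in range(rollout_len):
            obs = render(state)
            # DAgger step: student's visited state, teacher's action label.
            dataset.append((obs, teacher_policy(state)))
            state += w * obs          # environment step under the student
        w = fit_linear(dataset)       # retrain student on the aggregate
    return w

random.seed(0)
w = state_to_visual_dagger()
```

With enough iterations the student weight converges toward the teacher's gain, illustrating why the paper can report sample efficiency per environment step: labels come from the already-trained teacher, not from reward exploration.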