Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

When Should We Prefer State-to-Visual DAgger over Visual Reinforcement Learning?

Authors: Tongzhou Mu, Zhaoyang Li, Stanisław Wiktor Strzelecki, Xiu Yuan, Yunchao Yao, Litian Liang, Hao Su

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This study conducts an empirical comparison of State-to-Visual DAgger, a two-stage framework that initially trains a state policy before adopting online imitation to learn a visual policy, and Visual RL across a diverse set of tasks. We evaluate both methods across 16 tasks from three benchmarks, focusing on their asymptotic performance, sample efficiency, and computational costs.
Researcher Affiliation | Academia | University of California San Diego EMAIL
Pseudocode | No | For a detailed description of our State-to-Visual DAgger implementation algorithm and related details, please refer to Appendix C in the extended version of our paper.
Open Source Code | Yes | Code: https://github.com/tongzhoumu/s2v-dagger
Open Datasets | Yes | We selected 16 tasks from three benchmarks: ManiSkill (Gu et al. 2023), DMControl (Tassa et al. 2018), and Adroit (Rajeswaran et al. 2017).
Dataset Splits | No | The paper uses the ManiSkill, DMControl, and Adroit benchmarks and, for DMControl, states that experiments follow 'standard protocols (Kostrikov, Yarats, and Fergus 2020; Laskin et al. 2020; Hafner et al. 2019c)', but it does not give the specific dataset splits (e.g., percentages or sample counts) used for training, validation, or testing.
Hardware Specification | Yes | Our hardware setting: 32 CPU cores (Intel Xeon 2.1 GHz) and 1 GPU (NVIDIA GeForce RTX 2080 Ti, 11 GB).
Software Dependencies | No | The paper names specific algorithms, such as Soft Actor-Critic (SAC) and Asymmetric Actor Critic, but provides no version numbers for any software libraries, frameworks, or environments used (e.g., PyTorch, TensorFlow, Gym, MuJoCo).
Experiment Setup | No | The paper describes high-level training procedures, such as stopping Stage 1 upon convergence, early-stopping Stage 2 at a predefined imitation-loss threshold, and categorizing task difficulty by environment steps (4 million). However, it lacks concrete hyperparameters such as learning rates, batch sizes, optimizers, and detailed network architectures, which are crucial for a reproducible experimental setup.
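To make the two-stage framework in the Research Type row concrete, here is a minimal, hedged sketch of Stage 2 (online imitation, DAgger-style): the student visual policy acts, a teacher state policy relabels every visited state, and the student is refit on the aggregated dataset. Everything below is a hypothetical toy stand-in, not the authors' implementation: states are 1-D floats, `render` fakes image observations as noisy state copies, `teacher_policy` plays the Stage 1 oracle, and the student is a one-parameter linear model fit by least squares.

```python
import random

def teacher_policy(state):
    """Oracle state-based policy (stand-in for the Stage 1 result)."""
    return -0.5 * state

def render(state):
    """Stand-in for image rendering: observation = state plus noise."""
    return state + random.gauss(0.0, 0.01)

def fit_linear(dataset):
    """Least-squares fit of action = w * obs over the aggregated data."""
    num = sum(o * a for o, a in dataset)
    den = sum(o * o for o, _ in dataset)
    return num / den if den else 0.0

def state_to_visual_dagger(n_iters=10, rollout_len=50):
    dataset, w = [], 0.0
    for _ in range(n_iters):
        state = random.uniform(-1.0, 1.0)
        for _ in range(rollout_len):
            obs = render(state)
            # DAgger step: student's visited state, teacher's action label.
            dataset.append((obs, teacher_policy(state)))
            state += w * obs          # environment step under the student
        w = fit_linear(dataset)       # retrain student on the aggregate
    return w

random.seed(0)
w = state_to_visual_dagger()
```

With enough iterations the student weight converges toward the teacher's gain, illustrating why the paper can report sample efficiency per environment step: labels come from the already-trained teacher, not from reward exploration.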