Distributional Successor Features Enable Zero-Shot Policy Optimization

Authors: Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta

NeurIPS 2024

Reproducibility Assessment

Each reproducibility variable below is listed with its assessed result, followed by the LLM-extracted evidence or justification from the paper.

Research Type: Experimental
"We present a practical instantiation of DiSPOs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems. Videos and code are available at https://weirdlabuw.github.io/dispo/."

Researcher Affiliation: Academia
Chuning Zhu, University of Washington (EMAIL); Xinqi Wang, University of Washington (EMAIL); Tyler Han, University of Washington (EMAIL); Simon Shaolei Du, University of Washington (EMAIL); Abhishek Gupta, University of Washington (EMAIL)

Pseudocode: Yes
"Appendix F: Algorithm Pseudocode"

Open Source Code: Yes
"Videos and code are available at https://weirdlabuw.github.io/dispo/."

Open Datasets: Yes
"We use the D4RL dataset for pretraining and dense rewards described in Appendix D for adaptation. ... We use the offline dataset from [9] for pretraining and shaped rewards for adaptation. ... D4RL: Datasets for deep data-driven reinforcement learning. https://arxiv.org/abs/2004.07219, 2020."

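For context, D4RL datasets of the kind used for pretraining are typically loaded as flat transition buffers. A minimal loading sketch, assuming the standard d4rl package (the task name is illustrative, not taken from the paper):

```python
# Minimal sketch of loading a D4RL offline dataset (illustrative, not the paper's code).
# Requires: pip install gym d4rl
import gym
import d4rl  # registers D4RL environments with gym

env = gym.make("antmaze-large-diverse-v2")  # hypothetical task choice
dataset = d4rl.qlearning_dataset(env)       # dict of flat transition arrays

# All arrays share the first (transition) dimension.
observations = dataset["observations"]            # (N, obs_dim)
actions = dataset["actions"]                      # (N, act_dim)
rewards = dataset["rewards"]                      # (N,)
next_observations = dataset["next_observations"]  # (N, obs_dim)
terminals = dataset["terminals"]                  # (N,)
```
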
Dataset Splits: No
The paper explicitly mentions using an "offline dataset for pretraining" and adapting to "test-time rewards", but it does not specify a distinct validation set, split ratios, or example counts for hyperparameter tuning or model selection.

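To make the missing detail concrete, an explicit held-out validation split over transitions could look like the following. This is purely hypothetical (the paper reports no such split, and the 95/5 ratio is illustrative); it continues from the D4RL loading sketch above.

```python
# Hypothetical 95/5 train/validation split over transitions (NOT reported in the paper).
import numpy as np

rng = np.random.default_rng(seed=0)
n = len(dataset["observations"])
perm = rng.permutation(n)          # shuffle transition indices
val_size = int(0.05 * n)           # illustrative ratio

val_idx, train_idx = perm[:val_size], perm[val_size:]
train_set = {k: v[train_idx] for k, v in dataset.items()}
val_set = {k: v[val_idx] for k, v in dataset.items()}
```
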
Hardware Specification: Yes
"Each experiment (pretraining + adaptation) takes 3 hours on a single Nvidia L40 GPU."

Software Dependencies: No
The paper mentions using the "AdamW optimizer [35]" and "conditional DDIMs [47]" but does not provide version numbers for programming languages, libraries, or other software dependencies.

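For reference, conditional DDIM sampling of the kind cited is commonly built on an off-the-shelf scheduler. A minimal sketch using HuggingFace diffusers' DDIMScheduler, with a placeholder noise-prediction network standing in for the paper's conditional 1-D UNet (the paper does not state which library it uses):

```python
# Minimal conditional DDIM sampling loop (illustrative; uses HuggingFace diffusers).
import torch
from diffusers import DDIMScheduler

d = 128  # outcome feature size, matching the paper's d = 128

class EpsModel(torch.nn.Module):
    """Placeholder noise predictor; the paper uses a conditional 1-D UNet."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(d, d)
    def forward(self, x, t, cond):
        return self.net(x)  # stand-in; a real model would use t and cond

eps_model = EpsModel()
cond = torch.zeros(16, d)  # placeholder conditioning (e.g., the current state)

scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)  # fewer inference steps than training timesteps

sample = torch.randn(16, d)  # start from Gaussian noise
for t in scheduler.timesteps:
    with torch.no_grad():
        eps = eps_model(sample, t, cond)               # predicted noise
    sample = scheduler.step(eps, t, sample).prev_sample  # DDIM update
```
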
Experiment Setup: Yes
"We set d = 128 for all of our experiments. ... The noise prediction network is implemented as a 1-D UNet with down dimensions [256, 512, 1024]. ... We train our models on the offline dataset for 100,000 gradient steps using the AdamW optimizer [35] with batch size 2048. The learning rates for the outcome model and the policy are set to 3e-4 and adjusted according to a cosine learning rate schedule with 500 warmup steps."

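The reported optimization setup maps directly onto standard PyTorch components. A sketch wiring together the stated hyperparameters (AdamW, learning rate 3e-4, batch size 2048, 100,000 gradient steps, cosine schedule with 500 warmup steps); the model and data are placeholders, not the authors' code:

```python
# Sketch of the reported optimization setup (hyperparameters from the paper;
# the model and batches are placeholders).
import math
import torch

model = torch.nn.Linear(128, 128)  # stand-in for the 1-D UNet outcome model

total_steps, warmup_steps = 100_000, 500
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def lr_lambda(step):
    # Linear warmup for 500 steps, then cosine decay over the remaining steps.
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    batch = torch.randn(2048, 128)  # placeholder for a batch of size 2048
    loss = ((model(batch) - batch) ** 2).mean()  # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```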