Improving Generalization with Approximate Factored Value Functions
Authors: Shagun Sodhani, Sergey Levine, Amy Zhang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify the effectiveness of our approach in terms of faster training (better sample complexity) and robust zero-shot transfer (better generalization) on the Procgen benchmark and the MiniGrid environments. |
| Researcher Affiliation | Collaboration | Shagun Sodhani (Meta AI), Sergey Levine (University of California, Berkeley), Amy Zhang (Meta AI / UT Austin) |
| Pseudocode | Yes | Algorithm 1: the AFaR algorithm. |
| Open Source Code | No | The paper does not explicitly state that the source code for AFaR is being released or provide a direct link to a code repository for AFaR. It mentions a project website and lists implementations for the baseline algorithms used (RIDE, DrAC, IDAAC), but not for the proposed AFaR method itself. |
| Open Datasets | Yes | We use the Procgen benchmark (Cobbe et al., 2020) and the MiniGrid environments (Chevalier-Boisvert et al., 2018) to evaluate the effectiveness of the proposed AFaR algorithm. |
| Dataset Splits | Yes | Following the setup in Raileanu et al. (2020), we train the agent on a fixed set of 200 levels while testing on the full distribution of levels. In practice, this is simulated by sampling levels at random during evaluation. ... We also perform an ablation where we train the systems using just 10 levels (instead of 200 levels)... In the first case, we train and evaluate the agents on a given environment. ... In the second case, we train the agent on one environment and evaluate it on a different environment in a zero-shot manner. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. It mentions that more details about the experimental setup can be found in Appendix E, but Appendix E does not contain hardware specifications. |
| Software Dependencies | No | The paper lists several open-source libraries in Appendix C.1 (PyTorch, Hydra, NumPy, pandas, and the RIDE, DrAC, and IDAAC implementations) but does not specify version numbers for these dependencies, which is required for a reproducible description. |
| Experiment Setup | No | The main text of the paper states: "More details about our experimental setup and hyperparameters can be found in Appendix E." Specific hyperparameter values and training configurations, such as learning rates or batch sizes, are not provided in the main body of the paper and are instead relegated to tables in the appendices. |
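The dataset-split protocol quoted above (train on a fixed set of 200 levels, evaluate on the full distribution by sampling levels at random) can be sketched with a minimal stdlib-only simulation. The constant and helper names below are illustrative assumptions, not taken from the paper's code:

```python
import random

# Fixed training set of level seeds, as in the Raileanu et al. (2020) setup
NUM_TRAIN_LEVELS = 200

def sample_level(training: bool, rng: random.Random) -> int:
    """Return a level seed: from the fixed training set during training,
    or from the (effectively unbounded) full distribution during evaluation."""
    if training:
        return rng.randrange(NUM_TRAIN_LEVELS)  # seeds 0..199 only
    return rng.randrange(2**31 - 1)             # simulates the full level distribution

rng = random.Random(0)
train_seeds = {sample_level(True, rng) for _ in range(1000)}
# Training never leaves the fixed 200-level set
assert train_seeds <= set(range(NUM_TRAIN_LEVELS))
```

In the actual Procgen benchmark this split is expressed through the environment's `num_levels` and `start_level` options rather than by sampling seeds manually; the sketch only illustrates why zero-shot test levels are almost surely unseen during training.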