Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning

Authors: Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster

NeurIPS 2024

Reproducibility assessment. Each entry lists the variable, the result, and the supporting LLM response.
Research Type: Experimental
LLM Response: We demonstrate the effectiveness of both the in-context and in-weights models by showing sustained generational performance gains on several tasks requiring exploration under partial observability. On each task, we find that accumulating agents outperform those that learn for a single lifetime of the same total experience budget.
Researcher Affiliation: Collaboration
LLM Response: Jonathan Cook (FLAIR, University of Oxford), Chris Lu (FLAIR, University of Oxford), Edward Hughes (Google DeepMind), Joel Z. Leibo (Google DeepMind), Jakob Foerster (FLAIR, University of Oxford)
Pseudocode: Yes
LLM Response: Algorithm 1, Training Loop for In-Context Accumulation (changes to RL² in red); Algorithm 2, In-Context Accumulation During Evaluation
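The generational handover that Algorithm 2 describes (each new agent conditioning on the previous generation's behaviour) can be illustrated with a deliberately toy sketch. Everything here (the guess-the-goal task, `run_generation`, and the context handover) is a hypothetical stand-in, not the authors' implementation:

```python
import random

def run_generation(policy_context, num_steps=20, seed=0):
    """One lifetime in a toy 'find the hidden goal' task. The agent starts
    from whatever knowledge was handed down in context; otherwise it
    explores by guessing randomly."""
    rng = random.Random(seed)
    goal = 7                  # hidden goal the lineage is trying to discover
    best = policy_context     # knowledge inherited from the previous generation
    reward = 0
    for _ in range(num_steps):
        guess = best if best is not None else rng.randint(0, 9)
        if guess == goal:
            reward += 1
            best = guess      # retain the discovery for the rest of the lifetime
        else:
            best = None       # a failed guess yields no reusable knowledge
    return reward, best

def in_context_accumulation(num_generations=5):
    """Each generation observes the previous one's final behaviour,
    loosely analogous to the social-context handover in Algorithm 2."""
    context, rewards = None, []
    for g in range(num_generations):
        reward, context = run_generation(context, seed=g)
        rewards.append(reward)
    return rewards
```

Once any generation discovers the goal, every later generation exploits it from its first step, so per-generation reward is non-decreasing across the lineage, mirroring the qualitative claim that accumulating agents outperform a single lifetime of exploration.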
Open Source Code: Yes
LLM Response: Code can be found at https://github.com/FLAIROx/cultural-accumulation.
Open Datasets: No
LLM Response: The paper introduces custom environments (Memory Sequence, Goal Sequence, Travelling Salesperson), released as part of the open-source code, but does not provide concrete access information for a pre-existing, static public dataset.
Dataset Splits: No
LLM Response: The paper describes training and testing on environment instances but does not explicitly describe a separate validation split.
Hardware Specification: Yes
LLM Response: Memory Sequence and TSP experiments were run on a single NVIDIA RTX A40 GPU (40GB memory)... Training of in-context learners in Goal Sequence was run in under 8 minutes on 4 A40s... In-weights accumulation in Goal Sequence was run in 30 minutes on 4 A40s.
Software Dependencies: No
LLM Response: The paper mentions software components such as the PureJaxRL codebase, PPO, and S5, but does not provide specific version numbers for these dependencies.
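Since the paper does not pin versions, anyone attempting a reproduction may want to record the versions actually installed in their environment. A minimal sketch using the standard library; the package list passed in at the end is an illustrative guess at the JAX stack such a codebase might use, not a list taken from the paper:

```python
import importlib.metadata as md

def report_versions(packages):
    """Map each distribution name to its installed version, or a marker
    string when the package is absent from the environment."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            versions[pkg] = "not installed"
    return versions

# Hypothetical package list for a JAX-based RL codebase
print(report_versions(["jax", "flax", "optax"]))
```

Logging this dictionary alongside experiment results makes the software environment reconstructible even when upstream code leaves dependencies unpinned.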
Experiment Setup: Yes
LLM Response: Appendix F (Hyperparameters) specifies population size, learning rate, batch size, rollout length, update epochs, minibatches, γ, λ_GAE, clipping ϵ, entropy coefficient, value coefficient, max gradient norm, and learning-rate annealing for the Memory Sequence, TSP, and Goal Sequence experiments.
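The knobs listed above map naturally onto a PPO configuration object. A sketch of what such a config might look like; the field names mirror the Appendix F list, but the default values below are common PPO choices, not the paper's reported settings:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PPOConfig:
    """Illustrative PPO hyperparameter container; defaults are
    conventional placeholders, NOT the values from Appendix F."""
    population_size: int = 8
    learning_rate: float = 3e-4
    batch_size: int = 1024
    rollout_length: int = 128
    update_epochs: int = 4
    num_minibatches: int = 4
    gamma: float = 0.99          # discount factor γ
    gae_lambda: float = 0.95     # λ_GAE for generalized advantage estimation
    clip_eps: float = 0.2        # PPO clipping parameter ϵ
    entropy_coef: float = 0.01
    value_coef: float = 0.5
    max_grad_norm: float = 0.5
    anneal_lr: bool = True

# Per-task overrides, e.g. a hypothetical Goal Sequence run:
goal_sequence_config = PPOConfig(rollout_length=256, anneal_lr=False)
```

A frozen dataclass like this keeps every reported hyperparameter in one auditable place, which is exactly what makes the paper's Appendix F table reproducible.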