Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Authors: Michael Matthews, Michael Beukman, Chris Lu, Jakob Foerster
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a general reinforcement learning (RL) agent for physical control. ... Our trained agent exhibits strong physical reasoning capabilities in 2D space, being able to zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest shows significantly stronger performance than training an RL agent tabula rasa. This includes solving some environments that standard RL training completely fails at. |
| Researcher Affiliation | Academia | Michael Matthews Michael Beukman Chris Lu Jakob Foerster FLAIR, University of Oxford |
| Pseudocode | Yes | Algorithm 1 Jax2D main engine loop. 1: while true do 2: Apply gravity 3: Calculate collision manifolds (Appendices A.3.1, A.3.2, A.3.3 and A.3.4) 4: Apply motors (Appendix A.5) 5: Apply thrusters (Appendix A.6) 6: if warm starting then 7: Apply warm starting collision impulses (Appendix A.7) 8: Apply warm starting joint impulses (Appendix A.7) 9: end if 10: for i = 1 to num solver steps do 11: Apply joint constraints (Appendices A.2 and A.4) 12: Apply collision constraints (Appendices A.2 and A.3.5) 13: end for 14: Euler step position and rotation 15: end while |
| Open Source Code | Yes | We provide full code and models at https://kinetix-env.github.io. https://github.com/MichaelTMatthews/Jax2D |
| Open Datasets | Yes | We provide the capability to sample random levels from the vast space of possible physics tasks, as well as providing a large set of 74 interpretable handmade levels. |
| Dataset Splits | Yes | We train on programmatically generated Kinetix levels drawn from the statically defined distribution. We refer to training on sampled levels from this distribution as DR. Our main metric of assessment is the solve rate on the set of handmade holdout levels. The agent does not train on these levels but they do exist inside the support of the training distribution. |
| Hardware Specification | Yes | For all comparisons we use a single NVIDIA L40S GPU, on a server with two AMD EPYC 9554 64-Core CPUs. |
| Software Dependencies | No | The paper mentions software like JAX (Bradbury et al., 2018) and PureJaxRL-style training (Lu et al., 2022), and algorithms like PPO (Schulman et al., 2017). However, it does not provide specific version numbers for these or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | Hyperparameters are detailed in Appendix H. Table 7: Learning Hyperparameters. Env frame skip: 2; PPO γ: 0.995; λ_GAE: 0.9; PPO number of steps: 256; PPO epochs: 8; PPO minibatches per epoch: 32; PPO clip range: 0.02; PPO # parallel environments: 2048; Adam learning rate: 5e-5; anneal LR: no; PPO max gradient norm: 0.5; PPO value clipping: yes; return normalisation: no; value loss coefficient: 0.5; entropy coefficient: 0.01. Model: fully-connected dimension size: 128; fully-connected layers: 5; Transformer layers: 2; Transformer encoder size: 128; Transformer size: 16; number of heads: 8. SFL: batch size N: 12288; rollout length L: 512; update period T: 128; buffer size K: 1024; sample ratio ρ: 0.5. |
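The Jax2D engine loop quoted under "Pseudocode" (Algorithm 1) can be sketched in plain Python. This is a structural illustration only: the function and class names below (`BodyState`, `apply_gravity`, `engine_step`, etc.) are hypothetical placeholders, not the actual Jax2D API, and the collision, motor, thruster, and constraint stages are stubbed out so the control flow stays visible.

```python
from dataclasses import dataclass, replace

# Toy single-body state; real Jax2D states hold many bodies, joints, etc.
@dataclass(frozen=True)
class BodyState:
    y: float   # vertical position
    vy: float  # vertical velocity

GRAVITY = -9.8
DT = 1.0 / 60.0

def apply_gravity(s: BodyState) -> BodyState:
    # Step 2 of Algorithm 1: integrate gravity into velocity.
    return replace(s, vy=s.vy + GRAVITY * DT)

def euler_step(s: BodyState) -> BodyState:
    # Step 14 of Algorithm 1: Euler-integrate position from velocity.
    return replace(s, y=s.y + s.vy * DT)

def engine_step(state: BodyState, num_solver_steps: int = 4,
                warm_starting: bool = True) -> BodyState:
    state = apply_gravity(state)
    # Collision-manifold calculation, motors, and thrusters (steps 3-5)
    # are omitted in this single-body toy example.
    if warm_starting:
        pass  # steps 6-9: warm-start collision/joint impulses
    for _ in range(num_solver_steps):
        pass  # steps 10-13: iterative joint/collision constraint solves
    return euler_step(state)

# One step of the loop: a body starting at rest begins to fall.
state = engine_step(BodyState(y=1.0, vy=0.0))
```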
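For replication purposes, the Table 7 learning hyperparameters quoted above can be collected into a single config mapping. The values are transcribed from the excerpt; the dict keys are illustrative names of my own choosing, not identifiers from the paper's codebase.

```python
# PPO learning hyperparameters from Table 7 of the paper.
PPO_CONFIG = {
    "env_frame_skip": 2,
    "gamma": 0.995,               # PPO discount γ
    "gae_lambda": 0.9,            # λ_GAE
    "num_steps": 256,
    "epochs": 8,
    "minibatches_per_epoch": 32,
    "clip_range": 0.02,
    "num_parallel_envs": 2048,
    "learning_rate": 5e-5,        # Adam
    "anneal_lr": False,
    "max_grad_norm": 0.5,
    "value_clipping": True,
    "return_normalisation": False,
    "value_loss_coef": 0.5,
    "entropy_coef": 0.01,
}
```

A per-update batch under this config spans `num_parallel_envs * num_steps` = 2048 × 256 transitions, split into 32 minibatches for each of 8 epochs.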