Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Learning to Imitate from a Single Video Demonstration
Authors: Glen Berseth, Florian Golemo, Christopher Pal
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and quadruped and humanoid agents in 3D. We show that our method outperforms current state-of-the-art techniques and can learn to imitate behaviours from a single video demonstration. ... We perform experiments for multiple simulated robots in both 2D and 3D, including simulations for training quadruped robots and a humanoid with 38 degrees of freedom (DoF). |
| Researcher Affiliation | Collaboration | Glen Berseth Université de Montréal, Mila Quebec AI Institute, Institut Courtois, and Canada CIFAR AI Chair EMAIL Florian Golemo Mila Quebec AI Institute EMAIL Christopher Pal Polytechnique Montréal, Mila Quebec AI Institute, ServiceNow Research, and Canada CIFAR AI Chair EMAIL |
| Pseudocode | Yes | Algorithm 1 VIRL |
| Open Source Code | No | Additionally, the dog, raptor, zombie walk, run and jumping policy can be found on the project website: https://sites.google.com/view/virl1. ... The resulting behaviours learned in simulation are available at: https://sites.google.com/view/virl1. (The paper only mentions that videos of learned behaviors and policies are available on the project website, not the source code itself.) |
| Open Datasets | Yes | We are using the mocap data from the CMU Graphics Lab Motion Capture Database from 2002 (http://mocap.cs.cmu.edu/). |
| Dataset Splits | No | We collect 2048 samples between training rounds. The batch size for TRPO is 2048. ... The data used to train the Siamese network is a combination of observation trajectories O = {o_0, ..., o_T} generated from simulating the agent in the environment and the demonstration. (The paper describes data collection for RL training and for training the Siamese network, but it does not specify explicit training, validation, or test splits for any dataset.) |
| Hardware Specification | Yes | It takes 5–7 days to train each policy in these results on a 16-core machine with an Nvidia GTX 1080 GPU. |
| Software Dependencies | No | We train the agent's policy using the trust-region policy optimization (TRPO) algorithm (Schulman et al., 2015). ... The image encoder convnet is φ, the image decoder ψ, the recurrent encoder ω, and the recurrent decoder ρ. The weights for λ are found by empirically evaluating VIRL over all environments from section 5. Additional details on the hyperparameter search can be found in subsection 7.8. (The paper mentions the TRPO algorithm and uses neural network components, but does not specify software dependencies with version numbers such as Python, PyTorch, TensorFlow, or specific library versions.) |
| Experiment Setup | Yes | Where the relative weights of the different terms are λ_{1:4} = {0.7, 0.1, 0.1, 0.1}... The batch size for TRPO is 2048. The KL term is 0.5. ... The margin ρ is set to 1... We normalize the distance metric outputs using r = exp(r² w_d), where w_d = 5.0 scales the filtering width. |
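The quoted distance normalization appears garbled by extraction; a minimal sketch of the likely intent is below, *assuming* the formula is the common exponentiated negative squared distance, r = exp(−w_d · d²), with the paper's reported width w_d = 5.0. The function name and sign convention are this sketch's assumptions, not the paper's exact notation.

```python
import math

def distance_to_reward(d: float, w_d: float = 5.0) -> float:
    """Map a distance-metric output d to a bounded imitation reward.

    Assumption: the garbled quote "r = exp(r2 wd)" is read as the
    standard exponentiated negative squared distance,
    r = exp(-w_d * d**2), where w_d scales the filtering width
    (w_d = 5.0 per the quoted experiment setup).
    """
    return math.exp(-w_d * d ** 2)

# Identical observations (d = 0) yield the maximum reward of 1.0;
# the reward decays smoothly toward 0 as the distance grows.
print(distance_to_reward(0.0))  # -> 1.0
print(distance_to_reward(1.0))  # -> exp(-5) ~ 0.0067
```

Under this reading, w_d acts as a bandwidth: larger values make the reward more sharply peaked around exact matches between agent and demonstration observations.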