Imitation Learning from a Single Temporally Misaligned Video

Authors: William Huey, Huaxiaoyue Wang, Anne Wu, Yoav Artzi, Sanjiban Choudhury

ICML 2025

Reproducibility assessment (Variable | Result | LLM Response):
Research Type | Experimental | Experiments show that the ORCA reward can effectively and efficiently train RL agents, achieving a 4.5x improvement (0.11 → 0.50 average normalized return) on Meta-world tasks and a 6.6x improvement (6.55 → 43.3 average return) on Humanoid-v4 tasks over the best frame-level matching approach.
Researcher Affiliation | Academia | Cornell University. Correspondence to: William Huey <EMAIL>, Huaxiaoyue (Yuki) Wang <EMAIL>.
Pseudocode | Yes | Algorithm 1: ORCA Rewards.
Open Source Code | No | The project website is at https://portal-cornell.github.io/orca/
Open Datasets | Yes | Meta-World (Yu et al., 2020): following Fu et al. (2024c), we use ten tasks from the Meta-world environment to evaluate the effectiveness of the ORCA reward in the robotic manipulation domain. Humanoid: we define four tasks in the MuJoCo Humanoid-v4 environment (Todorov et al., 2012) to examine how well ORCA works with precise motion.
Dataset Splits | Yes | For Meta-world, we follow the RL setup in Fu et al. (2024c): we train DrQ-v2 (Yarats et al., 2021) with state-based input for 1M steps and evaluate the policy every 10k steps on 10 randomly seeded environments. For the Humanoid environment, we train SAC (Haarnoja et al., 2018) for 2M steps and evaluate the policy every 20k steps on 8 environments.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU or CPU models, memory) used to run the experiments; it focuses on the software environments and models.
Software Dependencies | No | The paper mentions several software components and environments, such as DrQ-v2 (Yarats et al., 2021), SAC (Haarnoja et al., 2018), Meta-World (Yu et al., 2020), the MuJoCo Humanoid-v4 environment (Todorov et al., 2012), ResNet50 (He et al., 2016), ImageNet-1K (Deng et al., 2009), LIV (Ma et al., 2023), and DINOv2 (Oquab et al., 2023), but it provides no specific version numbers for any of these software dependencies, programming languages, or libraries.
Experiment Setup | Yes | Table 3. Training hyperparameters used for experiments on both environments:

  Parameter                 | Meta-world (DrQ-v2) | Humanoid (SAC)
  Total environment steps   | 1,000,000           | 2,000,000
  Learning rate             | 1e-4                | 1e-3
  Batch size                | 512                 | 256
  Gamma (γ)                 | 0.9                 | 0.99
  Learning starts           | 500                 | 6000
  Soft update coefficient   | 5e-3                | 5e-3
  Actor/Critic architecture | (256, 256)          | (256, 256)
  Episode length            | 125 or 175          | 120
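The Pseudocode row references Algorithm 1 (ORCA Rewards) without reproducing it. As a rough illustration only — a minimal sketch of an ordered-coverage-style reward, assuming the reward scores how well the agent's trajectory so far covers the demonstration frames *in order*; this is a hypothetical reconstruction, not the authors' exact algorithm:

```python
def ordered_coverage_reward(sim):
    """Hypothetical ordered-coverage reward sketch (illustrative, not the
    paper's Algorithm 1). sim[t][j] is a similarity in [0, 1] (e.g. cosine)
    between agent frame t and demonstration frame j. For each timestep t,
    returns the best score of an order-preserving matching of a prefix of
    demo frames to distinct agent frames 0..t, normalized by the number of
    demo frames D. Dynamic program runs in O(T * D)."""
    T, D = len(sim), len(sim[0])
    NEG = float("-inf")
    # dp[j]: best similarity sum matching demo frames 0..j, in order,
    # to distinct agent frames seen so far
    dp = [NEG] * D
    rewards = []
    for t in range(T):
        new = dp[:]
        for j in range(D):
            # demo frame j can be matched to agent frame t only if
            # frames 0..j-1 were matched to earlier agent frames
            prev = dp[j - 1] if j > 0 else 0.0
            cand = prev + sim[t][j]
            if cand > new[j]:
                new[j] = cand
        dp = new
        # best (possibly partial) in-order coverage achieved so far
        rewards.append(max(0.0, max(dp)) / D)
    return rewards
```

Because the coverage table only ever improves, the resulting reward is non-decreasing over the episode, which matches the intuition of progressively covering a misaligned demonstration.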
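The 4.5x and 6.6x figures quoted in the Research Type row follow directly from the reported returns:

```python
# Sanity-check the improvement factors reported above
# (best frame-level matching baseline -> ORCA).
metaworld_gain = 0.50 / 0.11   # Meta-world average normalized return
humanoid_gain = 43.3 / 6.55    # Humanoid-v4 average return

print(round(metaworld_gain, 1))  # 4.5
print(round(humanoid_gain, 1))   # 6.6
```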
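For reference, the Table 3 hyperparameters can be transcribed into a plain config dict — the layout and key names here are illustrative, not the authors' actual configuration format:

```python
# Training hyperparameters transcribed from Table 3 of the paper.
# Dict structure and key names are our own choice, for illustration.
HPARAMS = {
    "meta_world_drqv2": {
        "total_env_steps": 1_000_000,
        "learning_rate": 1e-4,
        "batch_size": 512,
        "gamma": 0.9,
        "learning_starts": 500,
        "soft_update_coef": 5e-3,
        "actor_critic_arch": (256, 256),
        "episode_length": (125, 175),  # task-dependent: 125 or 175
    },
    "humanoid_sac": {
        "total_env_steps": 2_000_000,
        "learning_rate": 1e-3,
        "batch_size": 256,
        "gamma": 0.99,
        "learning_starts": 6_000,
        "soft_update_coef": 5e-3,
        "actor_critic_arch": (256, 256),
        "episode_length": 120,
    },
}
```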