6D Object Pose Tracking in Internet Videos for Robotic Manipulation

Authors: Georgy Ponimatkin, Martin Cífka, Tomáš Souček, Médéric Fourmy, Yann Labbé, Vladimir Petrik, Josef Sivic

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
    "Third, we thoroughly evaluate and ablate our 6D pose estimation method on YCB-V and HOPE-Video datasets as well as a new dataset of instructional videos manually annotated with approximate 6D object trajectories. We demonstrate significant improvements over existing state-of-the-art RGB 6D pose estimation methods."

Researcher Affiliation | Collaboration
    1 Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague; 2 Faculty of Electrical Engineering, Czech Technical University in Prague; 3 H Company

Pseudocode | No
    The paper describes its methods in natural language and mathematical equations (e.g., Equations 1 and 3) but does not contain explicitly labeled pseudocode or algorithm blocks.

Open Source Code | No
    The paper mentions using publicly available third-party software (e.g., the example-robot-data package and the Pinocchio library) but does not provide access information for, or an explicit statement about releasing, its own implementation code.

Open Datasets | Yes
    "We thoroughly evaluate and ablate our 6D pose estimation method on YCB-V and HOPE-Video datasets as well as a new dataset of instructional videos manually annotated with approximate 6D object trajectories."

Dataset Splits | No
    The paper evaluates on the YCB-V and HOPE-Video datasets following a "BOP-inspired protocol," which implies standard benchmark splits. However, it does not explicitly state the specific splits used, nor does it provide split details for the new dataset of instructional videos it introduces.

Hardware Specification | Yes
    "All experiments were carried out on a cluster featuring nodes with 8x NVIDIA A100 (40 GB of VRAM per GPU), 2x 64-core AMD EPYC 7763 CPU, 1024 GB RAM."

Software Dependencies | No
    The paper mentions software such as the Pinocchio library (Carpentier et al., 2019) and the Aligator trajectory optimization package (Jallet et al., 2023), but it does not specify version numbers for these or any other key software components used in the experiments.

Experiment Setup | No
    The paper describes the overall methodology and some parameter choices, such as M = 600 rendered views and 30 frames for retrieval and scale estimation, and it presents an optimization problem with weights (w_d, w_q, w_τ) in Equation (3). However, it does not provide the numerical values of these weights or other typical hyperparameters (e.g., learning rates, batch sizes, number of iterations) in the main text.