6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Authors: Georgy Ponimatkin, Martin Cífka, Tomáš Souček, Médéric Fourmy, Yann Labbé, Vladimír Petrík, Josef Sivic
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Third, we thoroughly evaluate and ablate our 6D pose estimation method on YCB-V and HOPE-Video datasets as well as a new dataset of instructional videos manually annotated with approximate 6D object trajectories. We demonstrate significant improvements over existing state-of-the-art RGB 6D pose estimation methods. |
| Researcher Affiliation | Collaboration | (1) Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague; (2) Faculty of Electrical Engineering, Czech Technical University in Prague; (3) H Company |
| Pseudocode | No | The paper describes methods using natural language and mathematical equations (e.g., Equation 1, 3), but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using publicly available third-party software (e.g., the example-robot-data package and the Pinocchio library), but it does not provide access information for, or an explicit statement about releasing, its own implementation code. |
| Open Datasets | Yes | We thoroughly evaluate and ablate our 6D pose estimation method on YCB-V and HOPE-Video datasets as well as a new dataset of instructional videos manually annotated with approximate 6D object trajectories. |
| Dataset Splits | No | The paper mentions evaluating on YCB-V and HOPE-Video datasets following a "BOP-inspired protocol," which implies standard splits for these benchmarks. However, it does not explicitly state the specific dataset splits used for reproducibility, nor does it provide split details for the "new dataset of instructional videos" it introduced. |
| Hardware Specification | Yes | All experiments were carried out on a cluster featuring nodes with 8x NVIDIA A100 (40 GB of VRAM per GPU), 2x 64-core AMD EPYC 7763 CPU, 1024 GB RAM. |
| Software Dependencies | No | The paper mentions software like the Pinocchio library (Carpentier et al., 2019) and the Aligator trajectory optimization package (Jallet et al., 2023), but it does not specify version numbers for these or any other key software components used in the experiments. |
| Experiment Setup | No | The paper describes the overall methodology and some parameter choices, such as "M = 600 views" for rendering and "30 frames" for retrieval and scale estimation. It also presents an optimization problem with weights (wd, wq, wτ) in Equation (3). However, the main text does not provide the specific numerical values for these optimization weights or other typical hyperparameters (e.g., learning rates, batch sizes, number of iterations). |