Universal Features Guided Zero-Shot Category-Level Object Pose Estimation

Authors: Wentian Qu, Chenyu Meng, Heng Li, Jian Cheng, Cuixia Ma, Hongan Wang, Xiao Zhou, Xiaoming Deng, Ping Tan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method outperforms previous methods on the REAL275 and Wild6D benchmarks for unseen categories. Experiments on the REAL275 (Wang et al. 2019b) and Wild6D (Ze and Wang 2022) benchmarks demonstrate that our method establishes robust correspondences based on pretrained 2D/3D universal features, resulting in accurate pose estimation based on coarse-to-fine optimization. We show comparison results on REAL275 and Wild6D in Tab. 1 and Fig. 5.
Researcher Affiliation | Academia | 1 Institute of Software, Chinese Academy of Sciences; 2 University of Chinese Academy of Sciences; 3 Hong Kong University of Science and Technology; 4 Aerospace Information Research Institute, Chinese Academy of Sciences
Pseudocode | No | The paper describes its method in paragraph text and mathematical equations, but contains no clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code | No | Project Page: https://iscas3dv.github.io/universal6dpose/. This link points to a project page rather than a code repository, and the text contains no unambiguous statement of code release.
Open Datasets | Yes | We select Wild6D (Ze and Wang 2022) and REAL275 (Wang et al. 2019a) for category-level object pose estimation. Wild6D provides 5,166 videos across 1,722 different objects with five categories (including bottles, bowls, cameras, laptops and mugs). REAL275 contains six testing scenes with six categories (including bottles, bowls, cameras, cans, laptops and mugs).
Dataset Splits | Yes | We follow REAL275 (Wang et al. 2019a) and Wild6D (Ze and Wang 2022) to divide the training set and the test set. We adopt the leave-1 strategy for supervised methods, which selects one category as the test set and uses the remaining categories to train the model. We conduct leave-1 experiments for each category and finally take the average of them. We directly use the official pre-trained models to conduct the leave-p experiments for self-supervised methods, as they have the property of per-subject-per-train.
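The leave-1 protocol described above can be sketched as follows; the helper function name is illustrative (not from the paper), and the category list is REAL275's six categories:

```python
# Leave-1 category split: hold out one category for testing, train on the rest.
# REAL275's six categories; the helper name `leave_one_out_splits` is illustrative.
CATEGORIES = ["bottle", "bowl", "camera", "can", "laptop", "mug"]

def leave_one_out_splits(categories):
    """Yield (train_categories, test_category) pairs for the leave-1 protocol."""
    for held_out in categories:
        train = [c for c in categories if c != held_out]
        yield train, held_out

# Per the paper, the metric is computed for each held-out category
# and the final number is the average across all leave-1 runs.
```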
Hardware Specification | Yes | We test our method on a single GeForce RTX 4090, using 11.7 GB of memory in the coarse pose estimation stage and 5.5 GB in the pose refinement stage for each instance.
Software Dependencies | No | The paper mentions using 'Adam (Kingma and Ba 2014) as the optimizer' and 'PyTorch3D (Ravi et al. 2020)', but provides no version numbers for these or any other software dependencies.
Experiment Setup | Yes | We set αD1, αD2, αSD to 0, 0.7, and 0.3 in Eq. 1, and set αm, αg, αc to 1, 1, and 0.1, respectively. We iteratively update the object pose twice to obtain the coarse object pose, running RANSAC up to 1,000 times per iteration to handle outliers. In the pose refinement stage, we use Adam (Kingma and Ba 2014) as the optimizer to minimize the loss function.
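The coarse stage fits a rigid pose to feature correspondences while rejecting outliers with RANSAC. A minimal sketch of that fitting step, assuming 3D-3D correspondences; the function names, iteration count, and inlier threshold here are illustrative and not taken from the paper:

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def ransac_pose(src, dst, iters=1000, thresh=0.02, seed=None):
    """Coarse pose from noisy 3D-3D correspondences, rejecting outliers via RANSAC."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(iters):
        # Minimal sample: 3 correspondences determine a rigid transform.
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        resid = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the full inlier set of the best hypothesis.
    return kabsch(src[best_inliers], dst[best_inliers])
```

The refitted pose would then seed the refinement stage, where the paper minimizes a weighted loss with Adam.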