LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models

Authors: Ziqi Lu, Heng Yang, Danfei Xu, Boyi Li, Boris Ivanovic, Marco Pavone, Yue Wang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluated our method on more than 160 scenes from the Replica, TUM and Waymo Open datasets, achieving up to 88% performance improvement on 3D reconstruction, multi-view pose estimation and novel-view rendering.
Researcher Affiliation Collaboration Ziqi Lu (1,2), Heng Yang (1,3), Danfei Xu (1,4), Boyi Li (1,5), Boris Ivanovic (1), Marco Pavone (1,6), Yue Wang (1,7). Affiliations: 1 NVIDIA Research, 2 Massachusetts Institute of Technology, 3 Harvard University, 4 Georgia Institute of Technology, 5 University of California, Berkeley, 6 Stanford University, 7 University of Southern California. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes methods using mathematical formulations (e.g., equations 1-15) and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No For more details, please visit our project page. This statement directs readers to a project page for more details but does not explicitly confirm the release of source code or provide a direct link to a code repository within the paper itself. The reproducibility statement also does not mention code being released.
Open Datasets Yes We tested our method on all available test scenes from the Replica (Straub et al., 2019) and Waymo Open Dataset (Sun et al., 2020), as well as on three test scenes from the TUM RGBD dataset (Schubert et al., 2018) that are most frequently tested in literature.
Dataset Splits Yes For each scene, the first 1000 RGB images serve as the calibration split and the remaining as the test split. ... We randomly sample 10 images from the calibration split as the calibration images. ... In each segment, only forward-looking camera images are adopted, where the first 100 form the calibration split and the remaining 100 images belong to the test split. We sample 10 images from the calibration split for self-calibration. ... In each scene, the first 500 RGB images are reserved as the calibration split, and the remaining 92/2897/2015 images form the test split.
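The split protocol quoted above (a fixed-size calibration prefix, the remainder as test, and a small random sample of calibration images for self-calibration) can be sketched as follows. This is a minimal illustration, not the authors' code: `frames`, the function name, and the fixed seed are all assumptions.

```python
import random

def make_splits(frames, calib_size=1000, n_calib_images=10, seed=0):
    """Split an ordered frame list per the quoted Replica protocol:
    first `calib_size` frames -> calibration split, rest -> test split,
    then randomly sample `n_calib_images` calibration images.
    The seed is an assumption; the paper does not specify one."""
    calib_split = frames[:calib_size]
    test_split = frames[calib_size:]
    rng = random.Random(seed)
    calib_images = rng.sample(calib_split, n_calib_images)
    return calib_split, test_split, calib_images
```

For the Waymo and TUM protocols quoted above, the same function applies with `calib_size=100` or `calib_size=500`.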
Hardware Specification Yes Our pipeline is implemented with PyTorch and all our experiments are conducted on an NVIDIA 3090 GPU.
Software Dependencies No Our pipeline is implemented with PyTorch. The paper mentions the use of PyTorch but does not provide a specific version number.
Experiment Setup Yes For robust global point map alignment, we set the regularization coefficient µ to 0.01. We minimize the optimization loss by running 300 steps of gradient descent using the Adam optimizer with a learning rate of 0.01, applying the closed-form weight update Eq. 8 every 10th gradient descent step. Additionally, we exclude points with prediction confidence below 0.5 by setting their weights to zero, preventing them from participating in the optimization process. For confidence-based pseudo labeling, we use a confidence threshold of 1.5 for all test scenes. For LoRA fine-tuning, we resize all calibration images to the pre-training resolution of (512, 384). During fine-tuning, we optimize the LoRA weights over 10 epochs (without warmup) using the AdamW optimizer with a batch size of 2. A cosine decay learning rate scheduler is employed, with a base learning rate of 0.001 and a minimum learning rate of 0.00001 for most test cases.
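The fine-tuning setup quoted above employs a cosine decay learning-rate scheduler running from a base rate of 0.001 down to 0.00001. A minimal sketch of one standard cosine-decay formulation is below; the paper does not specify whether decay is applied per step or per epoch, so the step-wise granularity, function name, and constants' names here are assumptions, not the authors' implementation.

```python
import math

# Base and minimum learning rates quoted in the experiment setup.
BASE_LR = 0.001
MIN_LR = 0.00001

def cosine_lr(step: int, total_steps: int) -> float:
    """Standard cosine decay from BASE_LR at step 0 to MIN_LR at the
    final step. Assumes no warmup, consistent with the quoted setup."""
    progress = step / max(total_steps - 1, 1)  # in [0, 1]
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch pipeline this schedule corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=1e-5`, though the paper does not name the specific scheduler class used.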