Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Authors: Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan Richter, Vladlen Koltun

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions. Section 4 (Experiments): This section summarizes the key results. Additional details and experiments are reported in the appendices, including details on datasets, hyperparameters, experimental protocols, and the comparison of runtimes, which is summarized in Fig. 2.
Researcher Affiliation Industry We release code & weights at https://github.com/apple/ml-depth-pro
Pseudocode No The paper describes methods and objectives in prose and mathematical notation within Section 3 'METHOD' but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes We release code & weights at https://github.com/apple/ml-depth-pro
Open Datasets Yes Table 15 (Datasets used in this work) lists dataset, URL, license, and usage, e.g.: 3D Ken Burns (Niklaus et al., 2019), https://github.com/sniklaus/3d-ken-burns, CC-BY-NC-SA 4.0, Train; AM-2K (Li et al., 2022a), https://github.com/JizhiziLi/GFM, Custom, Testing; Apolloscape (Huang et al., 2020), https://apolloscape.auto/, Custom, Val; ARKitScenes (Dehghan et al., 2021), https://github.com/apple/ARKitScenes, Custom, Train.
Dataset Splits Yes To do this, we test generalization accuracy by training on a train split of some datasets and testing on a val or test split of other datasets, following the Stage 1 protocol for all models in accordance with Tab. 16 and Tab. 17. Table 18 (Dataset evaluation setup): For each metric depth dataset in our evaluation, we report the range of valid depth values, number of samples, and resolution of ground truth depth maps. Due to the large size of the validation set (approximately 35K samples), we used a randomly sampled subset of NuScenes.
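The quoted protocol evaluates on a randomly sampled subset of the ~35K-sample NuScenes validation set. The paper excerpt does not state the subset size or seed, so the values below are placeholders; this is only a sketch of how such a subset can be drawn reproducibly, not the authors' procedure:

```python
import random

def sample_subset(n_total: int, n_subset: int, seed: int = 0) -> list:
    """Draw a reproducible random subset of dataset indices.
    n_subset and seed are illustrative assumptions, not values from the paper."""
    rng = random.Random(seed)  # fixed seed -> same subset on every run
    return sorted(rng.sample(range(n_total), n_subset))

# e.g. a 1,000-sample subset of a ~35K-sample validation set
indices = sample_subset(35_000, 1_000)
```

Fixing the seed matters for reproducibility reviews: without it, reported numbers on a random subset cannot be re-derived exactly.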
Hardware Specification Yes producing a 2.25-megapixel depth map in 0.3 seconds on a V100 GPU. Table 5: Model performance, measured on a V100-32G GPU.
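As a quick sanity check on the quoted figure, the "2.25-megapixel" output matches the paper's native 1536×1536 inference resolution when a megapixel is counted as 1024×1024 pixels; a minimal arithmetic sketch:

```python
# 1536 x 1536 depth map vs. the quoted "2.25-megapixel ... in 0.3 seconds on a V100".
width = height = 1536
pixels = width * height              # 2,359,296 pixels
mp_binary = pixels / 2**20           # 2.25 "binary" megapixels (1 MP = 1024 * 1024)

seconds = 0.3
pixels_per_second = pixels / seconds # implied throughput on a V100-32G
```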
Software Dependencies No The paper mentions the use of 'TIMM library (Wightman, 2019)' and 'fvcore library (Facebook Research, 2022)' but does not provide specific version numbers for these or other software components like Python, PyTorch, or CUDA.
Experiment Setup Yes Appendix C.4 (Training Hyperparameters): We specify the training hyperparameters in Tab. 16 and Tab. 17. Table 16 (Training hyperparameters): Epochs 250; Epoch length 72,000; Schedule 1% warmup, 80% constant LR, 19% at 0.1 LR; LR for Encoder 1.28e-5; LR for Decoder 1.28e-4; Batch size 128; Optimizer Adam; Weight decay 0; Clip gradient norm 0.2; Pretrained Layer Norm frozen; Random color change probability 75%; Random blur probability 30%; Center crop probability for FOV-augmentation 50%; Metric depth normalization CSTM-label (Yin et al., 2023); Number of channels for Decoder 256; Resolution 1536×1536. Depth Pro model structure: Image-Encoder resolution 384×384; Patch-Encoder resolution 384×384; Number of 384×384 patches in Depth Pro 35; Intersection of 384×384 patches in Depth Pro 25%. Table 17: Training loss functions for different datasets and stages.
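The quoted schedule (1% warmup, 80% constant LR, 19% at 0.1 LR) can be read as a per-step multiplier on the base learning rates. The sketch below is an illustrative interpretation of the table, not the authors' code; in particular, the linear warmup shape and the hard step-down to 0.1× are assumptions:

```python
def lr_multiplier(step: int, total_steps: int) -> float:
    """Illustrative multiplier for the schedule quoted in Table 16:
    1% warmup, 80% constant at 1.0, final 19% at 0.1x.
    Warmup shape and decay transition are assumptions, not from the paper."""
    frac = step / total_steps
    if frac < 0.01:          # warmup phase (linear ramp is an assumption)
        return frac / 0.01
    if frac < 0.81:          # constant phase
        return 1.0
    return 0.1               # final 19% of training at 0.1x

# Applied to the quoted base rates (encoder 1.28e-5, decoder 1.28e-4):
encoder_lr = 1.28e-5 * lr_multiplier(500, 1000)  # constant phase
decoder_lr = 1.28e-4 * lr_multiplier(900, 1000)  # decayed phase
```

Note the 10× gap between encoder and decoder base rates: during the final decayed phase, the decoder trains at roughly the encoder's constant-phase rate.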