Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Authors: Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan Richter, Vladlen Koltun
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions." Section 4 (Experiments): "This section summarizes the key results. Additional details and experiments are reported in the appendices, including details on datasets, hyperparameters, experimental protocols, and the comparison of runtimes, which is summarized in Fig. 2." |
| Researcher Affiliation | Industry | We release code & weights at https://github.com/apple/ml-depth-pro |
| Pseudocode | No | The paper describes methods and objectives in prose and mathematical notation within Section 3 'METHOD' but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release code & weights at https://github.com/apple/ml-depth-pro |
| Open Datasets | Yes | Table 15: Datasets used in this work (excerpt; columns: Dataset, URL, License, Usage). 3D Ken Burns (Niklaus et al., 2019), https://github.com/sniklaus/3d-ken-burns, CC-BY-NC-SA 4.0, Train; AM-2K (Li et al., 2022a), https://github.com/JizhiziLi/GFM, Custom, Testing; Apolloscape (Huang et al., 2020), https://apolloscape.auto/, Custom, Val; ARKitScenes (Dehghan et al., 2021), https://github.com/apple/ARKitScenes, Custom, Train |
| Dataset Splits | Yes | "To do this, we test generalization accuracy by training on a train split of some datasets and testing on a val or test split of other datasets, following the Stage 1 protocol for all models in accordance with Tab. 16 and Tab. 17." Table 18: Dataset evaluation setup. "For each metric depth dataset in our evaluation, we report the range of valid depth values, number of samples, and resolution of ground truth depth maps. Due to the large size of the validation set (approximately 35K samples), we used a randomly sampled subset of nuScenes." |
| Hardware Specification | Yes | producing a 2.25-megapixel depth map in 0.3 seconds on a V100 GPU. Table 5: Model performance, measured on a V100-32G GPU. |
| Software Dependencies | No | The paper mentions the use of 'TIMM library (Wightman, 2019)' and 'fvcore library (Facebook Research, 2022)' but does not provide specific version numbers for these or other software components like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Appendix C.4, Training Hyperparameters: "We specify the training hyperparameters in Tab. 16 and Tab. 17." Table 16: Training hyperparameters. Epochs: 250; Epoch length: 72,000; Schedule: 1% warmup, 80% constant LR, 19% at 0.1× LR; Encoder LR: 1.28e-5; Decoder LR: 1.28e-4; Batch size: 128; Optimizer: Adam; Weight decay: 0; Gradient-norm clipping: 0.2; Pretrained LayerNorm: frozen; Random color-change probability: 75%; Random blur probability: 30%; Center-crop probability for FOV augmentation: 50%; Metric depth normalization: CSTM-label (Yin et al., 2023); Decoder channels: 256; Resolution: 1536×1536. Depth Pro model structure: Image-encoder resolution: 384×384; Patch-encoder resolution: 384×384; Number of 384×384 patches: 35; Intersection of 384×384 patches: 25%. Table 17: Training loss functions for different datasets and stages. |
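For reference, the Table 16 hyperparameters quoted in the Experiment Setup row can be collected into a single configuration dictionary. This is a minimal sketch: the key names below are hypothetical and are not the released repository's actual config keys; only the values come from the paper's Table 16.

```python
# Hypothetical config-key names; values are from Depth Pro's Table 16.
DEPTH_PRO_TRAIN_CONFIG = {
    "epochs": 250,
    "epoch_length": 72_000,
    # Schedule fractions: 1% warmup, 80% constant LR, 19% at 0.1x LR.
    "lr_schedule": {"warmup": 0.01, "constant": 0.80, "decayed_0.1x": 0.19},
    "lr_encoder": 1.28e-5,
    "lr_decoder": 1.28e-4,
    "batch_size": 128,
    "optimizer": "Adam",
    "weight_decay": 0.0,
    "clip_grad_norm": 0.2,
    "pretrained_layer_norm": "frozen",
    "p_random_color_change": 0.75,
    "p_random_blur": 0.30,
    "p_center_crop_fov_aug": 0.50,
    "metric_depth_normalization": "CSTM-label",  # Yin et al., 2023
    "decoder_channels": 256,
    "input_resolution": (1536, 1536),
    "image_encoder_resolution": (384, 384),
    "patch_encoder_resolution": (384, 384),
    "num_patches": 35,
    "patch_intersection": 0.25,
}

# Sanity checks on the quoted numbers:
# the schedule fractions cover the whole run, and the decoder LR
# is ten times the encoder LR.
assert abs(sum(DEPTH_PRO_TRAIN_CONFIG["lr_schedule"].values()) - 1.0) < 1e-9
lr_ratio = DEPTH_PRO_TRAIN_CONFIG["lr_decoder"] / DEPTH_PRO_TRAIN_CONFIG["lr_encoder"]
assert abs(lr_ratio - 10.0) < 1e-9
```

Collecting the values this way makes the internal consistency of the reported setup easy to check, e.g. that the three learning-rate-schedule phases sum to 100% of training.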