DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching
Authors: Ming Gui, Johannes Schusterbauer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train our depth estimation model on two synthetic datasets, Hypersim (Roberts et al. 2021) and Virtual KITTI (Cabon, Murray, and Humenberger 2020) to cover both indoor and outdoor scenes. We perform zero-shot evaluations on established real-world depth estimation benchmarks NYUv2 (Nathan Silberman and Fergus 2012), KITTI (Behley et al. 2019), ETH3D (Schops et al. 2017), ScanNet (Dai et al. 2017), and DIODE (Vasiljevic et al. 2019). Table 2 compares our model quantitatively with state-of-the-art depth estimation methods. |
| Researcher Affiliation | Academia | CompVis @ LMU Munich, Munich Center for Machine Learning |
| Pseudocode | No | The paper describes methods using mathematical equations and textual explanations, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/CompVis/depth-fm |
| Open Datasets | Yes | We train our depth estimation model on two synthetic datasets, Hypersim (Roberts et al. 2021) and Virtual KITTI (Cabon, Murray, and Humenberger 2020) to cover both indoor and outdoor scenes. We leverage Metric3D v2 (Hu et al. 2024a) as our teacher model. We perform zero-shot evaluations on established real-world depth estimation benchmarks NYUv2 (Nathan Silberman and Fergus 2012), KITTI (Behley et al. 2019), ETH3D (Schops et al. 2017), ScanNet (Dai et al. 2017), and DIODE (Vasiljevic et al. 2019). On the high-resolution Middlebury-2014 dataset (Scharstein et al. 2014)... |
| Dataset Splits | Yes | Following (Ke et al. 2024) we take 54K training samples from Hypersim and 20K training samples from Virtual KITTI. By training only on 74K synthetic samples and an additional 7.4K samples from a discriminative depth estimation method... we fine-tune our DepthFM to complete depth maps where only 2% of the ground truth pixels are available |
| Hardware Specification | No | The authors gratefully acknowledge the Gauss Center for Supercomputing for providing compute through the NIC on JUWELS at JSC and the HPC resources supplied by the Erlangen National High Performance Computing Center (NHR@FAU funded by DFG). While these are specific supercomputing centers, the paper does not specify exact GPU/CPU models, processor types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | Unless otherwise specified, we evaluate our model using an ensemble size of 10 and 4 Euler steps, and scale and shift our predictions to match the ground truth depth in log space. For LoRA, we use rank 8 and keep the rest of the training details the same. Through empirical analysis in Table 9, we determine that a noise augmentation level of t_s = 0.4 is optimal. |
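The scale-and-shift alignment quoted in the Experiment Setup row is a standard affine-invariant evaluation step: before computing metrics, a least-squares scale and shift are fitted between predicted and ground-truth depth in log space. The sketch below is a minimal illustration of that protocol, not the authors' released code; the helper names (`align_log_depth`, `delta1`) and the `eps` guard are our own assumptions.

```python
import numpy as np

def align_log_depth(pred, gt, eps=1e-6):
    """Fit scale s and shift t minimizing ||s*log(pred) + t - log(gt)||^2,
    then return the prediction mapped back to linear depth.
    Hypothetical helper; the paper only states that predictions are
    scale/shift-aligned to ground truth in log space before evaluation."""
    lp = np.log(pred + eps).ravel()
    lg = np.log(gt + eps).ravel()
    A = np.stack([lp, np.ones_like(lp)], axis=1)       # design matrix [log pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, lg, rcond=None)    # closed-form least squares
    return np.exp(s * np.log(pred + eps) + t)          # aligned depth map

def delta1(pred, gt):
    """Common depth-accuracy metric: fraction of pixels whose
    max(pred/gt, gt/pred) ratio is below 1.25."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float((ratio < 1.25).mean())
```

With this alignment, a prediction that is correct only up to a global scale (e.g. `pred = 3 * gt`) is mapped exactly onto the ground truth, which is why affine-invariant models can be compared fairly on metric benchmarks.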