Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining

Authors: Wonhyeok Choi, Kyumin Hwang, Wei Peng, Minwoo Choi, Sunghoon Im

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluation results on multiple datasets demonstrate that our method effectively enhances depth quality on reflective surfaces and outperforms state-of-the-art SSMDE baselines."
Researcher Affiliation | Academia | Wonhyeok Choi¹, Kyumin Hwang¹, Wei Peng², Minwoo Choi¹, Sunghoon Im¹; ¹Electrical Engineering and Computer Science, Daegu Gyeongbuk Institute of Science and Technology, South Korea (EMAIL1); ²Psychiatry and Behavioral Sciences, Stanford University, USA (wepeng@stanford.edu)
Pseudocode | No | The paper describes methods and strategies but does not include any explicit pseudocode blocks or algorithms.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, a direct link to a repository, or any mention of code in supplementary materials for the described methodology.
Open Datasets | Yes | "Datasets. ScanNet (v2) (Dai et al., 2017) is a comprehensive indoor RGB-D video dataset... KITTI (Geiger et al., 2013) captures autonomous driving information... NYU-v2 (Silberman et al., 2012) serves as one of the most established and widely used benchmarks... 7-Scenes (Shotton et al., 2013) is a challenging RGB-D dataset... Booster (Ramirez et al., 2023) includes a variety of non-Lambertian objects..."
Dataset Splits | Yes | "This subdivision results in a ScanNet-Reflection dataset consisting of 45,539 training, 439 validation, and 121 testing samples. Additionally, a ScanNet No-Reflection validation set comprising 1,012 samples evaluates the model's generalization when trained in reflective environments. Aligning with these methodologies, the training process leverages the ScanNet-Reflection train set to simulate real-world scenarios involving reflective surfaces. For the KITTI and NYU-v2 experimental setups, we follow the training protocol of Godard et al. (2019), incorporating our reflection-aware triplet loss and distillation training procedure."
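The paper's reflection-aware triplet loss is not spelled out in this report; for orientation, the standard triplet margin loss that such formulations build on can be sketched in plain Python. The Euclidean distance, the margin value, and the function name are illustrative assumptions, not the paper's exact loss:

```python
def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull the anchor toward the positive
    and push it at least `margin` away from the negative.
    Inputs are feature vectors (lists of floats); the Euclidean distance
    and margin here are illustrative, not the paper's reflection-aware
    formulation."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# Anchor near the positive and far from the negative: loss clamps to 0.
print(triplet_margin_loss([0.0, 0.0], [0.1, 0.0], [3.0, 4.0]))  # 0.0
```

Mining then amounts to choosing which (anchor, positive, negative) pixel or patch triplets feed this loss, e.g. contrasting reflective and non-reflective regions.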
Hardware Specification | Yes | "All training times were measured using a single RTX A6000 GPU, as detailed in Table 9."
Software Dependencies | No | The paper mentions "implemented in PyTorch" but does not specify a version number or list other software dependencies with their versions.
Experiment Setup | Yes | "We train all models with the reflection triplet split proposed by 3D Distillation for 41 epochs through the Adam optimizer (Kingma & Ba, 2014) with an image resolution of 384 × 288, implemented in PyTorch. The training batch sizes of Monodepth2 (Godard et al., 2019), HRDepth (Lyu et al., 2021), and MonoViT (Zhao et al., 2022) are {12, 12, 8}, respectively. The initial learning rate is 10^-4, and we adopt the multi-step learning rate scheduler that decays the learning rate by γ = 0.1 once the number of epochs reaches one of the milestones [26, 36]... the minimum and maximum depths used for training and evaluation are 0.1m and 10m."
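The multi-step schedule quoted above can be sketched in plain Python; the base rate, milestones, and γ come from the quoted setup, while the helper name is ours:

```python
def lr_at_epoch(epoch, base_lr=1e-4, milestones=(26, 36), gamma=0.1):
    """Multi-step decay: multiply the learning rate by gamma each time
    the epoch count reaches a milestone (values from the quoted setup:
    base 1e-4, milestones [26, 36], gamma = 0.1)."""
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** decays

# Over the 41-epoch run: 1e-4 until epoch 26, 1e-5 until 36, then 1e-6.
for e in (0, 26, 36):
    print(e, lr_at_epoch(e))
```

In PyTorch this corresponds to wrapping the Adam optimizer in `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[26, 36], gamma=0.1)`.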