Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
Authors: Wonhyeok Choi, Kyumin Hwang, Wei Peng, Minwoo Choi, Sunghoon Im
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation results on multiple datasets demonstrate that our method effectively enhances depth quality on reflective surfaces and outperforms state-of-the-art SSMDE baselines. |
| Researcher Affiliation | Academia | Wonhyeok Choi (1), Kyumin Hwang (1), Wei Peng (2), Minwoo Choi (1), Sunghoon Im (1). (1) Electrical Engineering and Computer Science, Daegu Gyeongbuk Institute of Science and Technology, South Korea, EMAIL1; (2) Psychiatry and Behavioral Sciences, Stanford University, USA, wepeng@stanford.edu |
| Pseudocode | No | The paper describes methods and strategies but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, a direct link to a repository, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | Yes | Datasets. ScanNet (v2) (Dai et al., 2017) is a comprehensive indoor RGB-D video dataset... KITTI (Geiger et al., 2013) captures autonomous driving information... NYU-v2 (Silberman et al., 2012) serves as one of the most established and widely used benchmarks... 7-Scenes (Shotton et al., 2013) is a challenging RGB-D dataset... Booster (Ramirez et al., 2023) includes a variety of non-Lambertian objects... |
| Dataset Splits | Yes | This subdivision results in a ScanNet-Reflection dataset consisting of 45,539 training, 439 validation, and 121 testing samples. Additionally, a ScanNet No-Reflection validation set comprising 1,012 samples evaluates the model's generalization when trained in reflective environments. Aligning with these methodologies, the training process leverages the ScanNet-Reflection train set to simulate real-world scenarios involving reflective surfaces. For the KITTI and NYU-v2 experimental setups, we follow the training protocol of Godard et al. (2019), incorporating our reflection-aware triplet loss and distillation training procedure. |
| Hardware Specification | Yes | All training times were measured using a single RTX A6000 GPU, as detailed in Table 9. |
| Software Dependencies | No | The paper mentions "implemented in PyTorch" but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | We train all models with the reflection triplet split proposed by 3D Distillation for 41 epochs through the Adam optimizer (Kingma & Ba, 2014) with an image resolution of 384 × 288, implemented in PyTorch. The training batch sizes of Monodepth2 (Godard et al., 2019), HRDepth (Lyu et al., 2021), and MonoViT (Zhao et al., 2022) are {12, 12, 8}, respectively. The initial learning rate is 10^-4, and we adopt the multi-step learning rate scheduler that decays the learning rate by γ = 0.1 once the number of epochs reaches one of the milestones [26, 36]... the minimum and maximum depths used for training and evaluation are 0.1m and 10m. |
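The reported optimization schedule (Adam, initial LR 10^-4, MultiStep decay by γ = 0.1 at epochs 26 and 36, 41 epochs total) can be reconstructed in PyTorch as a minimal sketch; the depth network here is a placeholder module, since the paper's architectures (Monodepth2, HRDepth, MonoViT) are external baselines, and the training-loop body is elided.

```python
# Sketch of the reported training schedule, assuming standard PyTorch
# components. `model` is a stand-in for the actual depth network.
import torch

model = torch.nn.Conv2d(3, 1, kernel_size=3)  # placeholder depth network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial LR 10^-4
# Decay LR by gamma = 0.1 once the epoch count reaches 26, then again at 36.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[26, 36], gamma=0.1)

for epoch in range(41):  # 41 epochs, per the paper
    # ... one training epoch over 384x288 images would run here ...
    scheduler.step()
```

After both milestones have passed, the learning rate ends at 10^-4 × 0.1 × 0.1 = 10^-6.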