Predicting High-precision Depth on Low-Precision Devices Using 2D Hilbert Curves

Authors: Mykhail Uss, Ruslan Yermolenko, Oleksii Shashko, Olena Kolodiazhna, Ivan Safonov, Volodymyr Savin, Yoonjae Yeo, Seowon Ji, Jaeyun Jeong

ICML 2025

Reproducibility checklist (Variable | Result | LLM Response):
Research Type | Experimental | Extensive experiments demonstrate that our method increases the bit precision of predicted depth by up to three bits with little computational overhead. We also observed a positive side effect of quantization error reduction by up to 4.6 times. Our method enables effective and accurate depth prediction with DNN weights and activations quantized to eight-bit precision. [...] Two models are selected for the stereo matching experiment: DispNet with the original architecture proposed in (Mayer et al., 2016) and Dense Prediction Transformer (DPT) (Ranftl et al., 2021) with MobileViTv3-S (Wadekar & Chaurasia, 2022) as an encoder.
Researcher Affiliation | Collaboration | (1) Samsung R&D Institute Ukraine, Kyiv 01032, Ukraine; (2) Department of Information-Communication Technologies, National Aerospace University, Kharkiv 61070, Ukraine; (3) Faculty of Physics, Taras Shevchenko National University of Kyiv, Kyiv 01032, Ukraine; (4) Department of Artificial Intelligence, Kharkiv National University of Radio Electronics, Kharkiv 61070, Ukraine; (5) Institute of Physics and Technology, NTUU Igor Sikorsky Kyiv Polytechnic Institute, Kyiv 01032, Ukraine; (6) Samsung Research, Seoul 06765, Republic of Korea; (7) Department of Computer Science and Engineering, Konkuk University, Seoul, Republic of Korea.
Pseudocode | No | The paper describes methods in prose and diagrams (Figures 3 and 6) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that source code for the described methodology is released, nor does it link to a code repository.
Open Datasets | Yes | For stereo matching models training we adapted the ScanNet v2 (Dai et al., 2017) dataset in the following way. [...] We evaluate our approach on the KITTI 2012 (Geiger et al., 2012) dataset. [...] Training dataset is composed of ScanNet and Virtual KITTI 2 (Cabon et al., 2020) datasets with 25/75% balancing. [...] For training, the MS COCO (Lin et al., 2014) dataset with human pose keypoints labeling was used.
Dataset Splits | Yes | The split of the dataset into training and test parts corresponds to the official ScanNet v2 split. Training dataset is composed of ScanNet and Virtual KITTI 2 (Cabon et al., 2020) datasets with 25/75% balancing. The DispNet and h2DispNet models were trained in 256 by 1152 px input resolution and 128 by 576 px output disparity resolution. Our evaluation is based on 194 images in the training part of KITTI 2012.
Hardware Specification | Yes | All models were quantized using SNPE SDK v.2.24 and tested on a Samsung S24+ device with a Qualcomm Snapdragon 8 Gen 3 processor and Hexagon DSP. Power consumption is measured with a Monsoon Solutions FTA22D Power Monitor in power save mode.
Software Dependencies | Yes | All models were quantized using SNPE SDK v.2.24 and tested on a Samsung S24+ device with a Qualcomm Snapdragon 8 Gen 3 processor and Hexagon DSP. [...] Training, validation, and test data are rendered from meshes provided for each ScanNet v2 scene using the PyRender v.0.1.45 library. [...] For mesh fusion we utilize the truncated signed distance function (TSDF) (Curless & Levoy, 1996) approach as implemented in the Python library Open3D (Zhou et al., 2018).
Experiment Setup | Yes | In all experiments, the models' input size is 384×512 pixels and the output size is 192×256 pixels. [...] Experimental results for modified models are presented with a Gaussian noise layer with SD equal to 0.02. [...] We set α = 1 and β = 25 in our experiments. [...] For the stereo matching, we experimentally found that p = 2, 3 are suitable choices, providing curve lengths of 4 and 7.2 and bit-width increases of 2 and 2.85 bits. [...] The DispNet and h2DispNet models were trained in 256 by 1152 px input resolution and 128 by 576 px output disparity resolution. Training settings are the same as for the ScanNet experiment.
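Per the title and the quoted abstract, the method represents each scalar depth value as a 2D point on a Hilbert curve, so two low-precision output channels jointly carry more bits than one. As an illustration only, here is the classic iterative Hilbert index-to-coordinate mapping and its inverse in Python; this is a textbook sketch, not the authors' implementation, with the curve order parameter named `p` to match the p = 2, 3 values in the experiment-setup row above.

```python
def _rot(s, x, y, rx, ry):
    """Rotate/flip a quadrant so each sub-curve has the correct orientation."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y

def d2xy(p, d):
    """Map index d along a Hilbert curve of order p to (x, y) on a 2^p x 2^p grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << p):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def xy2d(p, x, y):
    """Inverse mapping: (x, y) back to the index along the curve (decoding step)."""
    d = 0
    s = (1 << p) // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(s, x, y, rx, ry)
        s //= 2
    return d

# Round trip: every index on the order-3 curve (64 cells) survives encode/decode.
assert all(xy2d(3, *d2xy(3, d)) == d for d in range(64))
```

Because the curve is continuous and locality-preserving, a small perturbation of the predicted (x, y) point (e.g. from 8-bit quantization) maps back to a nearby index, which is what makes this encoding usable under low-precision inference.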
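The "bit-width increases of 2 and 2.85" quoted above are consistent with log2 of the curve lengths 4 and 7.2: decoding a scalar along a curve whose length is L times the range of a single output channel shrinks the effective quantization step by L, adding about log2(L) bits. A quick check of that arithmetic (my reading of the quoted numbers, not a formula stated in this excerpt):

```python
import math

# Relative curve lengths quoted for Hilbert orders p = 2 and p = 3.
for L in (4.0, 7.2):
    print(f"curve length {L}: +{math.log2(L):.2f} bits")
# curve length 4.0: +2.00 bits
# curve length 7.2: +2.85 bits
```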