DepthART: Monocular Depth Estimation as Autoregressive Refinement Task

Authors: Bulat Gabdullin, Nina Konovalova, Nikolay Patakin, Dmitry Senushkin, Anton Konushin

IJCAI 2025

Reproducibility assessment (each row gives the variable, the assessed result, and the LLM response):
Research Type: Experimental. "Our experimental results demonstrate that the proposed training approach significantly enhances the performance of VAR in depth estimation tasks. When trained on the Hypersim dataset using our approach, the model achieves superior results across multiple unseen benchmarks compared to existing generative and discriminative baselines."
Researcher Affiliation: Academia. "Bulat Gabdullin 1,2, Nina Konovalova 1, Nikolay Patakin 1, Dmitry Senushkin 1, and Anton Konushin 1; 1 AIRI, Moscow, Russia; 2 HSE University."
Pseudocode: No. The paper describes its methods in prose and mathematical equations but contains no clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper provides a footnote ('https://bulatko.github.io/depthart-pp/') alongside the introduction of the work. This URL is a project demonstration page on GitHub Pages, not a direct link to a source-code repository.
Open Datasets: Yes. "Due to the requirement of dense ground-truth depth maps for variational autoencoders, we utilize the highly realistic synthetic Hypersim dataset [Roberts et al., 2021], which includes 461 diverse indoor scenes. Evaluation is performed on four datasets unseen during training: NYUv2 [Silberman et al., 2012] and iBims [Koch et al., 2019], capturing indoor environments; TUM [Li et al., 2019], capturing dynamic humans in an indoor environment; and ETH3D [Schops et al., 2017], providing high-quality depth maps for outdoor environments."
Dataset Splits: No. The paper mentions training on the Hypersim dataset and evaluating on several others, but it does not specify any train/validation/test splits (e.g., percentages, sample counts, or split methodology) needed for reproduction.
Hardware Specification: Yes. "Training of our model takes 17 hours using 4 NVIDIA H100 GPUs."
Software Dependencies: No. The paper mentions using an AdamW optimizer and a StepLR scheduler, but does not provide version numbers for any software libraries, programming languages, or frameworks (e.g., Python, PyTorch, CUDA).
Experiment Setup: Yes. "The Visual Autoregressive Transformer is trained with DepthART using the AdamW [Loshchilov and Hutter, 2019] optimizer with a learning rate of 1e-4, a weight decay of 1e-2, and a batch size of 4. Additionally, we decrease the learning rate during training with a StepLR scheduler with a step size of 10,000 and a gamma of 0.8. Training of our model takes 17 hours using 4 NVIDIA H100 GPUs. ... we train all models at this resolution [256×256]."
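As a reading aid, the learning-rate schedule quoted above (StepLR: multiply the rate by gamma every step_size steps, matching the semantics of PyTorch's torch.optim.lr_scheduler.StepLR) can be sketched in plain Python; the function name is illustrative, and only the numeric values (1e-4, 10,000, 0.8) come from the paper:

```python
def steplr_rate(base_lr: float, step_size: int, gamma: float, step: int) -> float:
    """Learning rate after `step` optimizer steps under a StepLR schedule:
    the base rate is multiplied by `gamma` once every `step_size` steps."""
    return base_lr * gamma ** (step // step_size)

# Values quoted from the paper: lr 1e-4, step size 10,000, gamma 0.8.
for step in (0, 10_000, 50_000):
    print(step, steplr_rate(1e-4, 10_000, 0.8, step))
```

Under this schedule the rate stays at 1e-4 for the first 10,000 steps, then decays geometrically by a factor of 0.8 per 10,000 steps.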