All-in-One: Transferring Vision Foundation Models into Stereo Matching
Authors: Jingyi Zhou, Haoyu Zhang, Jiakang Yuan, Peng Ye, Tao Chen, Hao Jiang, Meiya Chen, Yangyang Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that AIO-Stereo achieves state-of-the-art performance on multiple datasets, ranks 1st on the Middlebury benchmark, and outperforms all published work on the ETH3D benchmark. |
| Researcher Affiliation | Collaboration | Jingyi Zhou¹*, Haoyu Zhang¹*, Jiakang Yuan¹*, Peng Ye¹˒³˒⁴, Tao Chen¹, Hao Jiang², Meiya Chen², Yangyang Zhang². ¹School of Information Science and Technology, Fudan University, China; ²Xiaomi Inc., Beijing, China; ³Shanghai AI Laboratory, Shanghai, China; ⁴The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes the methodology in the 'Method' section, but it does not include any explicitly labeled pseudocode or algorithm blocks. The steps are explained in narrative form. |
| Open Source Code | No | The paper does not provide an explicit statement about the availability of source code, nor does it include any links to code repositories or mention code in supplementary materials. |
| Open Datasets | Yes | Following Selective-Stereo (Wang et al. 2024), we verify the effectiveness of AIO-Stereo on four widely used datasets including Scene Flow (Mayer et al. 2016), Middlebury 2014 (Scharstein et al. 2014), KITTI-2015 (Menze and Geiger 2015) and ETH3D (Schops et al. 2017). ... For the Middlebury dataset, following (Wang et al. 2024), we first finetune our pre-trained model for 200k steps on the mixed Tartan Air (Wang et al. 2020), CREStereo Dataset (Li et al. 2022), Scene Flow, Falling Things (Tremblay, To, and Birchfield 2018), InStereo2k (Bao et al. 2020), CARLA HR-VS (Yang et al. 2019), and Middlebury datasets... |
| Dataset Splits | Yes | Scene Flow (Mayer et al. 2016) contains more than 39,000 synthetic stereo frames which are divided into training and testing sets. Middlebury 2014 (Scharstein et al. 2014) provides a training set with images of 23 indoor scenes and a testing set with images of 10 indoor scenes, and both sets are available at three resolutions. KITTI-2015 (Menze and Geiger 2015) contains 200 training pairs and 200 testing pairs with sparse disparity maps collected in real-world driving scenes. |
| Hardware Specification | Yes | We implement our AIO-Stereo with the PyTorch framework and perform our experiments on NVIDIA A100 GPUs using the AdamW optimizer. |
| Software Dependencies | No | The paper mentions the 'PyTorch framework' for implementation but does not specify its version or any other software dependencies with their respective version numbers. |
| Experiment Setup | Yes | We implement our AIO-Stereo with the PyTorch framework and perform our experiments on NVIDIA A100 GPUs using the AdamW optimizer. For pre-training, we trained our model on the augmented Scene Flow training set (i.e., both cleanpass and finalpass) for 200k steps with a batch size of 8 and a random crop size of 320 × 720. We use a one-cycle learning rate schedule with a warm-up strategy: the learning rate gradually increases to 0.0002 over the first 1% of steps and gradually decreases thereafter. For finetuning, the learning rate linearly decays from 0.0003 to 0. |
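Since the paper releases no code, the learning-rate schedules described in the setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple linear-warmup/linear-decay variant of the one-cycle schedule (warm up to 2e-4 over the first 1% of 200k steps, then decay toward 0) and a linear finetuning decay from 3e-4 to 0; the function names are hypothetical.

```python
# Sketch of the schedules reported in the paper (assumptions: linear warm-up
# and linear decay; the paper does not specify the exact curve shapes).

TOTAL_STEPS = 200_000   # pre-training steps reported in the paper
PEAK_LR = 2e-4          # peak learning rate reached after warm-up
WARMUP_FRAC = 0.01      # "first 1% of steps"

def one_cycle_lr(step, total_steps=TOTAL_STEPS, peak=PEAK_LR,
                 warmup_frac=WARMUP_FRAC):
    """Pre-training: linear warm-up to `peak`, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak * step / warmup_steps
    return peak * (total_steps - step) / (total_steps - warmup_steps)

def finetune_lr(step, total_steps, start_lr=3e-4):
    """Finetuning: learning rate linearly decays from 3e-4 to 0."""
    return start_lr * (1.0 - step / total_steps)
```

In an actual PyTorch training loop, the same shape could be obtained with `torch.optim.lr_scheduler.OneCycleLR` (e.g. `max_lr=2e-4`, `total_steps=200_000`, `pct_start=0.01`) wrapped around an `AdamW` optimizer, matching the optimizer named in the paper.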