SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing
Authors: Chen Chen, Liangjin Zhao, Yuanchun He, Yingxuan Long, Kaiqiang Chen, Zhirui Wang, Yanfeng Hu, Xian Sun
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the US3D and WHU datasets demonstrate that our method achieves state-of-the-art performance for both semantic segmentation and stereo matching. Further ablation studies highlight the significance of modeling the connections between semantic categories and disparities. |
| Researcher Affiliation | Academia | ¹Key Laboratory of Target Cognition and Application Technology, Aerospace Information Research Institute, Chinese Academy of Sciences; ²School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences. EMAIL, EMAIL |
| Pseudocode | No | The paper includes mathematical equations (1) through (9) but does not present any structured pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | US3D (Bosch et al. 2019) contains 2,139 pairs of satellite stereo images from Jacksonville and 2,153 from Omaha, each with corresponding semantic labels. WHU (Liu and Ji 2020) is an aerial dataset with 8,316 real aerial images in the training set and 2,618 in the test set, covering an area of 6.7 × 2.2 km² over Meitan County, China, with a ground resolution of approximately 0.1 meters. |
| Dataset Splits | Yes | US3D (Bosch et al. 2019) contains 2,139 pairs of satellite stereo images from Jacksonville and 2,153 from Omaha, each with corresponding semantic labels. We randomly select 1,500 pairs from Jacksonville for training, 139 for validation, and 500 for testing, and use Omaha for generalization verification. WHU (Liu and Ji 2020) is an aerial dataset with 8,316 real aerial images in the training set and 2,618 in the test set, covering an area of 6.7 × 2.2 km² over Meitan County, China, with a ground resolution of approximately 0.1 meters. We randomly select 50 and 500 pairs from Omaha for fine-tuning over 12 and 48 epochs, respectively, using 1,500 pairs for validation. |
| Hardware Specification | Yes | We implement SemStereo using PyTorch and conduct our experiments on two NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | The hyperparameters for the loss function are set as follows: λ0 = 1, λ1 = 0.6, λ2 = 0.5, λ3 = 0.3, and α = β = 1. We compare our approach with a range of state-of-the-art methods from both the computer vision and remote sensing communities. For fairness, we standardize the configuration parameters: the optimizer is set to Adam with β1 = 0.9 and β2 = 0.999, the batch size is 4, and we use the original resolution without any augmentation techniques. We train each stage for 48 epochs, starting with an initial learning rate (lr0) of 0.001, which decays by half after epochs 12, 22, 30, 38, and 44. The disparity range varies by dataset: US3D is set to [−64, 64) and WHU is set to [0, 128), following the settings used in previous work (He et al. 2021). |
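The step-decay schedule quoted above (lr0 = 0.001, halved after epochs 12, 22, 30, 38, and 44) can be sketched as a small helper for anyone re-implementing the setup. This is an illustrative sketch, not code from the paper: the function name `lr_at_epoch` is invented here, and it assumes "decays after epoch m" means the halved rate applies from epoch m onward (in PyTorch this would correspond to `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.5`).

```python
def lr_at_epoch(epoch, lr0=0.001, milestones=(12, 22, 30, 38, 44), gamma=0.5):
    """Learning rate under step decay: multiply lr0 by gamma once per
    milestone that the current epoch has reached (assumed inclusive)."""
    lr = lr0
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Example: over the 48 training epochs the rate drops from 0.001
# to 0.001 * 0.5**5 after the last milestone at epoch 44.
```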