Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer
Authors: Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Yiqiang Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple UHR image segmentation benchmarks demonstrate that our BPT outperforms previous state-of-the-art methods without introducing extra computational overhead. |
| Researcher Affiliation | Collaboration | 1 Beijing Key Lab. of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences 2 University of Chinese Academy of Sciences 3 Peng Cheng Laboratory 4 The Chinese University of Hong Kong 5 The University of Hong Kong 6 SenseTime Research and Tetras.AI |
| Pseudocode | No | The paper describes the methods in detailed paragraphs and uses figures to illustrate the architecture (Figure 2), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To validate our method's effectiveness, we conducted experiments on five UHR image datasets: DeepGlobe (Demir et al. 2018), Inria Aerial (Maggiori et al. 2017), Cityscapes (Cordts et al. 2016), ISIC (Tschandl, Rosendahl, and Kittler 2018), and CRAG (Graham et al. 2019). |
| Dataset Splits | Yes | The DeepGlobe dataset comprises 803 UHR images, split into 455/207/142 for training, validation, and testing, respectively. Each image is 2448×2448 pixels, with annotations for seven landscape classes. The Inria Aerial dataset includes 180 UHR images (5000×5000 pixels) with binary masks for building/non-building areas, divided into 126/27/27 for training, validation, and testing. The Cityscapes dataset contains 5000 images with 19 semantic classes, split into 2979/500/1525 for training, validation, and testing. The ISIC dataset consists of 2596 UHR images, divided into 2077/260/259 for training, validation, and testing. The CRAG dataset comprises 213 images with glandular morphology annotations, split into 173 for training and 40 for testing, with an average size of 1512×1516. |
| Hardware Specification | Yes | For segmentation training, we train models on MMSegmentation codebase with GTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using "AdamW" as an optimizer and the "MMSegmentation codebase" for training, but it does not specify version numbers for these or any other software libraries or frameworks. |
| Experiment Setup | Yes | We pre-train the model on the ImageNet-1K dataset using AdamW with a momentum of 0.9 and a weight decay of 5×10⁻². The initial learning rate is 1×10⁻³, and the learning rate follows the cosine schedule. Models are pre-trained for 300 epochs. For segmentation training, we optimize models using AdamW with an initial learning rate of 1×10⁻⁴, decayed using a polynomial schedule with a power of 0.9. Hyperparameters are set as follows: α1 = 0.6, β1 = 0.4, α2 = 0.3, β2 = 0.7, α3 = 0.5, β3 = 0.5, λ1 = 0.3, λ2 = 0.3, λ3 = 0.4. Following common practices (Ji, Zhao, and Lu 2023), maximum training iterations are set to 40k, 80k, 160k, 80k and 80k for Inria Aerial, DeepGlobe, Cityscapes, ISIC and CRAG, respectively. |
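The reported optimization settings (cosine pre-training schedule, polynomial segmentation schedule, and λ-weighted loss combination) can be sketched as follows. This is a minimal illustration of the schedules described in the paper, not the authors' implementation: the function names and the individual loss terms are placeholder assumptions, and in practice these schedules would come from the MMSegmentation configuration system rather than hand-rolled code.

```python
import math

# Pre-training: cosine decay from an initial lr of 1e-3 over 300 epochs.
def cosine_lr(epoch, base_lr=1e-3, total_epochs=300):
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# Segmentation fine-tuning: polynomial decay with power 0.9 from lr 1e-4.
# max_iters varies per dataset: 40k (Inria Aerial), 80k (DeepGlobe),
# 160k (Cityscapes), 80k (ISIC), 80k (CRAG).
def poly_lr(iteration, base_lr=1e-4, max_iters=80_000, power=0.9):
    return base_lr * (1 - iteration / max_iters) ** power

# Total loss as a weighted sum with the reported lambda weights; the three
# loss terms l1, l2, l3 are hypothetical stand-ins for the paper's losses.
def total_loss(l1, l2, l3, lams=(0.3, 0.3, 0.4)):
    return lams[0] * l1 + lams[1] * l2 + lams[2] * l3
```

Note that the α and β hyperparameters reported in the row above presumably weight terms inside the individual losses; since the paper's loss definitions are not quoted here, they are omitted from the sketch.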