Unleashing the Power of Visual Foundation Models for Generalizable Semantic Segmentation
Authors: PeiYuan Tang, Xiaodong Zhang, Chunze Yang, Haoran Yuan, Jun Sun, Danfeng Shan, Zijiang James Yang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our method, outperforming state-of-the-art methods by 3.3% on the average mIoU in synthetic-to-real domain generalization. |
| Researcher Affiliation | Collaboration | 1 School of Computer Science and Technology, Xi'an Jiaotong University; 2 School of Computer Science and Technology, Xidian University; 3 Shaanxi Key Laboratory of Network and System Security, Xidian University; 4 Synkrotron, Inc.; 5 Singapore Management University; 6 University of Science and Technology of China |
| Pseudocode | No | The paper describes the method using textual explanations and network architecture diagrams (Figure 2, Figure 3), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/tpy001/VFMSeg |
| Open Datasets | Yes | Datasets. Following previous studies (Wei et al. 2024), we evaluate our method on both synthetic and real-world datasets. The synthetic dataset is GTAV (Richter et al. 2016), which contains 24,966 street-view images rendered by a computer game engine with the resolution of 1914x1052. For real-world datasets, we use Cityscapes (Cordts et al. 2016)... BDD100K (Yu et al. 2020) is another real-world dataset... The last real-world dataset we use is Mapillary (Neuhold et al. 2017)... |
| Dataset Splits | Yes | Cityscapes (Cordts et al. 2016), a large-scale semantic segmentation dataset for autonomous driving, with 2,975 training images and 500 validation images, all with a resolution of 2048x1024. BDD100K (Yu et al. 2020) is another real-world dataset that contains diverse urban driving scene images with the resolution of 1280x720. The last real-world dataset we use is Mapillary (Neuhold et al. 2017), which consists of high-resolution images with a minimum resolution of 1920x1080 collected from around the world. BDD100K and Mapillary provide 1,000 and 2,000 validation images, respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers; it states only: "Our implementation is based on the MMSegmentation framework." |
| Experiment Setup | Yes | Implementation Details. Our implementation is based on the MMSegmentation framework. We use the AdamW optimizer with learning rates of 1e-5 for the backbone and 1e-4 for all decode heads. Training is conducted for 40,000 iterations with a batch size of 2 and crop size of 512x512. We employ basic data augmentation techniques including random cropping, random horizontal flipping, photometric transformation and rare class sampling (Hoyer, Dai, and Van Gool 2022). During training, we set λ = 1.0, r = α = 32, and p = 0.2. During inference, we use a sliding window approach with a window size of 512x512 and a stride of 320. The θ and Cτ are set to 0.968 and 0.8 respectively. |
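The sliding-window inference quoted in the Experiment Setup row (512x512 window, stride 320) can be sketched as follows. This is a generic illustration of the technique, not the authors' MMSegmentation code; the names `sliding_windows`, `aggregate`, and `logit_fn` are hypothetical.

```python
import numpy as np

def sliding_windows(h, w, crop=512, stride=320):
    """Return (y0, x0, y1, x1) crops tiling an h x w image (assumes h, w >= crop)."""
    ys = list(range(0, h - crop + 1, stride))
    xs = list(range(0, w - crop + 1, stride))
    # Add a final window flush with the border so the whole image is covered.
    if ys[-1] + crop < h:
        ys.append(h - crop)
    if xs[-1] + crop < w:
        xs.append(w - crop)
    return [(y, x, y + crop, x + crop) for y in ys for x in xs]

def aggregate(logit_fn, image, num_classes, crop=512, stride=320):
    """Average per-class logits over overlapping windows.

    logit_fn: maps an (h, w, C) crop to a (num_classes, h, w) logit map
    (a stand-in for the segmentation network's forward pass).
    """
    h, w = image.shape[:2]
    logits = np.zeros((num_classes, h, w))
    counts = np.zeros((h, w))
    for y0, x0, y1, x1 in sliding_windows(h, w, crop, stride):
        logits[:, y0:y1, x0:x1] += logit_fn(image[y0:y1, x0:x1])
        counts[y0:y1, x0:x1] += 1
    return logits / counts  # every pixel is covered at least once
```

The final prediction is the argmax over the class axis of the averaged logits. For a 2048x1024 Cityscapes image this yields a 6x3 grid of overlapping windows, with overlap regions averaged rather than overwritten.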