Decoupled Finetuning for Domain Generalizable Semantic Segmentation
Authors: Jaehyun Pahk, Donghyeon Kwon, Seong Joon Oh, Suha Kwak
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method was evaluated on five different datasets, Cityscapes (Cordts et al., 2016), BDD100K (Yu et al., 2020), Mapillary (Neuhold et al., 2017), GTAV (Richter et al., 2016), and SYNTHIA (Ros et al., 2016), and it demonstrated superior performance to previous work in every experiment. |
| Researcher Affiliation | Academia | Jaehyun Pahk1 Donghyeon Kwon1 Seong Joon Oh3 Suha Kwak1,2 — Dept. of CSE, POSTECH1; GSAI, POSTECH2; Tübingen AI Center, Universität Tübingen3 |
| Pseudocode | Yes | In this section, we present PyTorch-like pseudocodes for each stage of DeFT. Algorithm 1 describes the training procedure for decoder warm-up, and Algorithm 2 describes the training procedure for decoupled finetuning and the configuration of the final model for inference. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Datasets. We used three real-world datasets, Cityscapes (Cordts et al., 2016), BDD-100K (Yu et al., 2020), and Mapillary (Neuhold et al., 2017), and two synthetic datasets, GTAV (Richter et al., 2016) and SYNTHIA (Ros et al., 2016) for the experiment. |
| Dataset Splits | Yes | Cityscapes is a real-world urban driving scene dataset, comprising 2,985 images for training and 500 for validation. BDD-100K is another real-world urban driving scene dataset, and we used the 1,000 validation images for evaluation. Mapillary consists of 25,000 images collected from various worldwide locations, and we used 2,000 validation images for evaluation. GTAV contains 24,966 images generated from the Grand Theft Auto V (GTAV) game engine, split into 12,403 images for training and 6,382 for validation. SYNTHIA is a photo-realistic synthetic urban scene dataset, consisting of 9,400 images. We used 6,382 validation images for evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions 'PyTorch-like pseudocodes' and 'trained with a batch size of 4 through SGD', but does not specify version numbers for PyTorch or any other software libraries, compilers, or operating systems. |
| Experiment Setup | Yes | The model was trained with a batch size of 4 through SGD with a momentum of 0.9. For the warm-up stage, the model was trained for 2K iterations for Cityscapes and 8K iterations for GTAV, with a learning rate of 1e-2 and a weight decay of 5e-3. During the decoupled finetuning, the model was trained for 40K iterations with a learning rate of 1e-2 and a weight decay of 5e-4. We employed a polynomial learning rate decay schedule with a power of 0.9. For data augmentation, we adopted color jittering, Gaussian blurring, random horizontal flipping with a probability of 0.5, random scaling in the range [0.5, 2.0], and random cropping with a size of 768×768. The weight update ratio β was set to 0.9999. |
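The optimization details quoted above are concrete enough to sketch in code. The following is a minimal, hedged sketch of the finetuning-stage setup (SGD with momentum 0.9, learning rate 1e-2, weight decay 5e-4, polynomial decay with power 0.9 over 40K iterations). The β = 0.9999 "weight update ratio" is interpreted here as an EMA-style parameter update, which is an assumption on our part; the paper's Algorithm 2 gives the exact rule. The placeholder model and function names are illustrative, not the authors' code.

```python
import copy

import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

# Placeholder for the segmentation network (assumption: any nn.Module).
model = nn.Conv2d(3, 19, kernel_size=1)

# SGD with momentum 0.9; finetuning stage: lr 1e-2, weight decay 5e-4.
optimizer = SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=5e-4)

# Polynomial learning-rate decay with power 0.9 over 40K iterations.
total_iters = 40_000
scheduler = LambdaLR(optimizer, lambda it: (1 - it / total_iters) ** 0.9)

@torch.no_grad()
def ema_update(ema_model: nn.Module, model: nn.Module, beta: float = 0.9999):
    """EMA-style parameter update with ratio beta (our reading of the
    paper's 'weight update ratio β = 0.9999')."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(beta).add_(p, alpha=1 - beta)

# Usage sketch: keep an EMA copy alongside the trained model.
ema_model = copy.deepcopy(model)
```

Per training step, one would call `optimizer.step()`, `scheduler.step()`, and `ema_update(ema_model, model)`; the warm-up stage differs only in iteration count (2K/8K) and weight decay (5e-3).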