Towards Unbiased Learning in Semi-Supervised Semantic Segmentation
Authors: Rui Sun, Huayu Mai, Wangkai Li, Tianzhu Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results across multiple benchmarks, especially in the most limited label scenarios with the most serious class imbalance issues, demonstrate that DiffMatch performs favorably against state-of-the-art methods. |
| Researcher Affiliation | Academia | Rui Sun¹, Huayu Mai¹,², Wangkai Li¹,², Tianzhu Zhang¹,²; ¹University of Science and Technology of China; ²National Key Laboratory of Deep Space Exploration, Deep Space Exploration Laboratory. EMAIL, EMAIL |
| Pseudocode | Yes | In Algorithm 1, we present the pseudo algorithm of DiffMatch to clearly summarize our method. At this point, we have explored the integration of the diffusion process and the teacher-student paradigm to alleviate the class imbalance issue from a generative perspective. |
| Open Source Code | Yes | Code is available at https://github.com/yuisuen/DiffMatch. |
| Open Datasets | Yes | We conduct experiments on three datasets with severe class-imbalanced issues. (1) PASCAL VOC 2012 (Everingham et al., 2010) contains 21 classes with 1,464 and 1,449 finely annotated images for training and validation, respectively. We augment the original training set (i.e., classic) with an additional 9,118 coarsely annotated images in SBD (Hariharan et al., 2011) to get a blender training set following other works (Chen et al., 2021b; Hu et al., 2021). ... (2) Cityscapes (Cordts et al., 2016) consists of 2,975 images for training and 500 images for validation with 19 classes. ... (3) COCO (Lin et al., 2014), composed of 118k/5k training/validation images, is a more severely class-imbalanced dataset, containing 81 classes to predict, with a head-to-tail ratio of over 10,000. |
| Dataset Splits | Yes | We conduct experiments on three datasets with severe class-imbalanced issues. (1) PASCAL VOC 2012 (Everingham et al., 2010) contains 1,464 and 1,449 finely annotated images for training and validation, respectively. ... (2) Cityscapes (Cordts et al., 2016) consists of 2,975 images for training and 500 images for validation with 19 classes. ... (3) COCO (Lin et al., 2014), composed of 118k/5k training/validation images... The partition protocol (e.g., 1/16) indicates the ratio of labeled data used in training to the entire training set. |
| Hardware Specification | Yes | All experiments are conducted on 8 RTX 3090 GPUs (memory is 24G/GPU). |
| Software Dependencies | No | The paper mentions using 'DeepLabv3+' as the decoder and 'ResNet-50/101' as backbones. While these are specific models/architectures, the paper does not list specific software libraries or frameworks with version numbers (e.g., PyTorch 1.9, Python 3.8) that would constitute reproducible software dependencies. |
| Experiment Setup | Yes | We set the sampling step as 3 at inference, the number of layers in the mask denoiser L as 4, and the scaling factor b as 0.1 for all experiments. During training, we randomly crop images to 513 × 513 for the PASCAL and COCO datasets and train for 80 and 30 epochs, respectively. For Cityscapes, the crop size is set to 801 × 801 and the number of training epochs is 240. The batch size for all three datasets is set to 8. A polynomial decay learning rate policy is applied throughout training. The strong augmentation contains feature dropout, random color jitter, grayscale and Gaussian blur. The weak augmentation consists of random crop, resize and horizontal flip. |
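The partition protocol quoted in the Dataset Splits row (e.g., 1/16 of the labeled training set) can be made concrete by computing the implied labeled-set sizes. A minimal Python sketch, assuming the common nearest-integer rounding convention (the paper reports fixed splits rather than this formula, so the helper name and rounding rule are illustrative assumptions):

```python
def labeled_count(train_size: int, ratio_denom: int) -> int:
    """Number of labeled images implied by a 1/ratio_denom partition protocol."""
    return round(train_size / ratio_denom)

# Sizes quoted in the report: PASCAL VOC "classic" has 1,464 training images,
# Cityscapes has 2,975.
print(labeled_count(1464, 16))  # PASCAL VOC classic under 1/16 -> 92
print(labeled_count(2975, 16))  # Cityscapes under 1/16 -> 186
```

These match the labeled-set sizes conventionally used in semi-supervised segmentation benchmarks for the 1/16 protocol.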
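The polynomial decay learning rate policy named in the Experiment Setup row is standard in semantic segmentation. A minimal sketch, assuming the commonly used exponent of 0.9; the paper does not state the exponent, base learning rate, or iteration budget, so all three values below are illustrative:

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial decay: lr shrinks from base_lr at iteration 0 to 0 at max_iter."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Illustrative schedule (base_lr and max_iter are assumptions, not from the paper).
for it in (0, 5000, 10000):
    print(f"iter {it}: lr = {poly_lr(1e-3, it, 10000):.6f}")
```

In practice this is applied per iteration rather than per epoch, so `max_iter` would be the total number of optimizer steps across the stated epoch budget.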