Towards Unbiased Learning in Semi-Supervised Semantic Segmentation
Authors: Rui Sun, Huayu Mai, Wangkai Li, Tianzhu Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results across multiple benchmarks, especially in the most limited label scenarios with the most serious class imbalance issues, demonstrate that DiffMatch performs favorably against state-of-the-art methods. |
| Researcher Affiliation | Academia | Rui Sun¹, Huayu Mai¹,², Wangkai Li¹,², Tianzhu Zhang¹,²; ¹University of Science and Technology of China; ²National Key Laboratory of Deep Space Exploration, Deep Space Exploration Laboratory. EMAIL, EMAIL |
| Pseudocode | Yes | In Algorithm 1, we present the pseudo algorithm of DiffMatch to clearly summarize our method. At this point, we have explored the integration of the diffusion process and the teacher-student paradigm to alleviate the class imbalance issue from a generative perspective. |
| Open Source Code | Yes | Code is available at https://github.com/yuisuen/DiffMatch. |
| Open Datasets | Yes | We conduct experiments on three datasets with severe class-imbalanced issues. (1) PASCAL VOC 2012 (Everingham et al., 2010) contains 21 classes with 1,464 and 1,449 finely annotated images for training and validation, respectively. We augment the original training set (i.e., classic) with an additional 9,118 coarsely annotated images in SBD (Hariharan et al., 2011) to get a blender training set following other works (Chen et al., 2021b; Hu et al., 2021). ... (2) Cityscapes (Cordts et al., 2016) consists of 2,975 images for training and 500 images for validation with 19 classes. ... (3) COCO (Lin et al., 2014), composed of 118k/5k training/validation images, is a more severely class-imbalanced dataset, containing 81 classes to predict, with a head-to-tail ratio of over 10,000. |
| Dataset Splits | Yes | We conduct experiments on three datasets with severe class-imbalanced issues. (1) PASCAL VOC 2012 (Everingham et al., 2010) contains 1,464 and 1,449 finely annotated images for training and validation, respectively. ... (2) Cityscapes (Cordts et al., 2016) consists of 2,975 images for training and 500 images for validation with 19 classes. ... (3) COCO (Lin et al., 2014), composed of 118k/5k training/validation images... The partition protocol (e.g., 1/16) indicates the ratio of labeled data used in training to the entire training set. |
| Hardware Specification | Yes | All experiments are conducted on 8 RTX 3090 GPUs (memory is 24G/GPU). |
| Software Dependencies | No | The paper mentions using 'DeepLabv3+' as the decoder and 'ResNet-50/101' as backbones. While these are specific models/architectures, the paper does not list specific software libraries or frameworks with version numbers (e.g., PyTorch 1.9, Python 3.8) that would constitute reproducible software dependencies. |
| Experiment Setup | Yes | We set the sampling step as 3 at inference, the number of layers in the mask denoiser L as 4, and the scaling factor b as 0.1 for all experiments. During training, we randomly crop images to 513 × 513 for the PASCAL and COCO datasets and train for 80 and 30 epochs, respectively. For Cityscapes, the crop size is set to 801 × 801 and the number of training epochs is 240. The batch size for all three datasets is set to 8. A polynomial decay learning rate policy is applied throughout training. The strong augmentation contains feature dropout, random color jitter, grayscale and Gaussian blur. The weak augmentation consists of random crop, resize and horizontal flip. |
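The partition protocol quoted in the Dataset Splits row (e.g., 1/16 of the labeled training set) can be made concrete by computing the implied labeled-set sizes. A minimal Python sketch, assuming the common nearest-integer rounding convention (the paper reports fixed splits rather than this formula, so the helper name and rounding rule are illustrative assumptions):

```python
def labeled_count(train_size: int, ratio_denom: int) -> int:
    """Number of labeled images implied by a 1/ratio_denom partition protocol."""
    return round(train_size / ratio_denom)

# Sizes quoted in the report: PASCAL VOC "classic" has 1,464 training images,
# Cityscapes has 2,975.
print(labeled_count(1464, 16))  # PASCAL VOC classic under 1/16 -> 92
print(labeled_count(2975, 16))  # Cityscapes under 1/16 -> 186
```

These match the labeled-set sizes conventionally used in semi-supervised segmentation benchmarks for the 1/16 protocol.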
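The polynomial decay learning rate policy named in the Experiment Setup row is standard in semantic segmentation. A minimal sketch, assuming the commonly used exponent of 0.9; the paper does not state the exponent, base learning rate, or iteration budget, so all three values below are illustrative:

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial decay: lr shrinks from base_lr at iteration 0 to 0 at max_iter."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Illustrative schedule (base_lr and max_iter are assumptions, not from the paper).
for it in (0, 5000, 10000):
    print(f"iter {it}: lr = {poly_lr(1e-3, it, 10000):.6f}")
```

In practice this is applied per iteration rather than per epoch, so `max_iter` would be the total number of optimizer steps across the stated epoch budget.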