Semi-Supervised Semantic Segmentation via Marginal Contextual Information

Authors: Moshe Kimhi, Shai Kimhi, Evgenii Zheltonozhskii, Or Litany, Chaim Baskin

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type Experimental Through extensive experiments on standard benchmarks, we demonstrate that S4MC outperforms existing state-of-the-art semi-supervised learning approaches, offering a promising solution for reducing the cost of acquiring dense annotations. For example, S4MC achieves a 1.39 mIoU improvement over the prior art on PASCAL VOC 12 with 366 annotated images. [...] 4 Experiments This section presents our experimental results. The setup for the different datasets and partition protocols is detailed in Section 4.1. Section 4.2 compares our method against existing approaches and Section 4.3 provides the ablation study.
Researcher Affiliation Collaboration Moshe Kimhi (EMAIL), Computer Science Department, Technion; Shai Kimhi (EMAIL), Computer Science Department, Technion; Evgenii Zheltonozhskii (EMAIL), Physics Department, Technion; Or Litany (EMAIL), Computer Science Department, Technion and NVIDIA; Chaim Baskin (EMAIL), Computer Science Department, Technion
Pseudocode Yes Algorithm 1: Pseudocode: Pseudo-label refinement of S4MC, PyTorch-like style.
Open Source Code Yes The code to reproduce our experiments is available at https://s4mcontext.github.io/.
Open Datasets Yes Datasets In our experiments, we use PASCAL VOC 12 (Everingham et al., 2010), Cityscapes (Cordts et al., 2016), and MS COCO (Lin et al., 2014) datasets.
Dataset Splits Yes Evaluation We compare S4MC with state-of-the-art methods and baselines under the standard partition protocols using 1/2, 1/4, 1/8, and 1/16 of the training set as labeled data. For the classic setting of the PASCAL experiment, we additionally use all the finely annotated images. We follow standard protocols and use mean Intersection over Union (mIoU) as our evaluation metric. We use the data split published by Wang et al. (2022) when available to ensure a fair comparison. For the ablation studies, we use PASCAL VOC 12 val with the 1/4 partition.
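The review quotes mIoU as the evaluation metric but the paper excerpt does not define it. A minimal sketch of the standard computation (the function name mean_iou and the toy arrays are ours, not from the paper; classes absent from both prediction and target are skipped, as is common practice):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes.

    pred, target: integer label maps of identical shape.
    Classes absent from both maps are excluded from the mean.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class c appears in neither map
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy example: class 0 has IoU 1, classes 1 and 2 each have IoU 1/2.
pred = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
print(mean_iou(pred, target, num_classes=3))  # 2/3
```

Benchmark implementations typically accumulate per-class intersections and unions over the whole validation set before dividing, rather than averaging per-image scores.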
Hardware Specification Yes To verify that, we conducted a training time analysis comparing FixMatch and FixMatch + S4MC over PASCAL with 366 labeled examples, using distributed training with 8 Nvidia RTX 3090 GPUs. [...] All experiments are conducted on a machine with 8 Nvidia RTX A5000 GPUs.
Software Dependencies No No specific version numbers are provided for key software components such as PyTorch or other libraries. The text only mentions "PyTorch-like pseudo-code" and the use of specific architectures (DeepLabv3+, ResNet-101, Xception-65) and optimizers (SGD) without versioning.
Experiment Setup Yes All experiments were conducted for 80 training epochs with the stochastic gradient descent (SGD) optimizer with a momentum of 0.9 and the learning rate policy lr = lr_base · (1 − iter/total_iter)^power. [...] For PASCAL VOC 12, lr_base = 0.001 (lr_base = 0.01 for the decoder only), the weight decay is set to 0.0001, all images are cropped to 513×513, and Bℓ = Bu = 3. For Cityscapes, all parameters use lr_base = 0.01, and the weight decay is set to 0.0005. The learning rate decay parameter is set to power = 0.9. Due to memory constraints, all images are cropped to 769×769 and Bℓ = Bu = 2.
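The quoted learning rate policy is the standard polynomial ("poly") decay used in semantic segmentation. A minimal sketch of the stated schedule (the function name poly_lr is ours; the paper sets power = 0.9 and per-dataset base rates as quoted above):

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial decay: lr = base_lr * (1 - step/total_steps) ** power."""
    return base_lr * (1.0 - step / total_steps) ** power

# The rate starts at base_lr and decays to 0 at the final step.
print(poly_lr(0.001, 0, 1000))     # 0.001
print(poly_lr(0.001, 1000, 1000))  # 0.0
```

In PyTorch this schedule is also available out of the box as torch.optim.lr_scheduler.PolynomialLR.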