Decoder Denoising Pretraining for Semantic Segmentation

Authors: Emmanuel Asiedu Brempong, Simon Kornblith, Ting Chen, Niki Parmar, Matthias Minderer, Mohammad Norouzi

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a decoder pretraining approach based on denoising, which can be combined with supervised pretraining of the encoder. We find that decoder denoising pretraining on the ImageNet dataset strongly outperforms encoder-only supervised pretraining. Despite its simplicity, decoder denoising pretraining achieves state-of-the-art results on label-efficient semantic segmentation and offers considerable gains on the Cityscapes, Pascal Context, and ADE20K datasets.
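The denoising objective summarized above (corrupt an image with Gaussian noise, then train the decoder to predict the corruption) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_decoder`, the noise level `sigma`, and the choice of NumPy are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, sigma=0.2):
    """Corrupt an image with Gaussian noise; the denoising target is the noise itself.
    sigma is a hypothetical noise scale chosen for illustration."""
    noise = rng.normal(0.0, sigma, size=image.shape)
    return image + noise, noise

def toy_decoder(noisy_image):
    """Placeholder for the segmentation decoder being pretrained.
    The real model conditions on encoder features; here we just return zeros."""
    return np.zeros_like(noisy_image)

# Pretraining runs at 224x224 resolution per the paper.
image = rng.uniform(0.0, 1.0, size=(224, 224, 3))
noisy, noise = add_noise(image)
pred = toy_decoder(noisy)
loss = np.mean((pred - noise) ** 2)  # MSE denoising loss on the predicted noise
```

In the paper's setup, gradients from this loss update the decoder while the encoder is initialized from supervised pretraining; the toy version only shows the shape of the objective.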
Researcher Affiliation | Industry | Emmanuel Asiedu Brempong (Google Research), Simon Kornblith (Google Research), Ting Chen (Google Research), Niki Parmar (Google Research), Matthias Minderer (Google Research), Mohammad Norouzi (Google Research)
Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but it does not include any distinct pseudocode blocks or algorithms.
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released, nor does it provide a link to a code repository. The 'Reviewed on OpenReview' link is for the review process, not code.
Open Datasets | Yes | The encoder is pre-trained on ImageNet-21k (Deng et al., 2009) classification... After pretraining, the model is fine-tuned on the Cityscapes, Pascal Context, or ADE20K semantic segmentation datasets (Cordts et al., 2016; Mottaghi et al., 2014; Zhou et al., 2018).
Dataset Splits | Yes | Right: Mean IoU on the Pascal Context dataset as a function of the fraction of labeled training images available. Decoder denoising pretraining is particularly effective when a small number of labeled images is available, but continues to outperform supervised pretraining even on the full dataset. For the 100% setting, we report the means of 10 runs on all of the datasets. On Pascal Context and ADE20K, we also report the mean of 10 runs (with different subsets) for the 1%, 5% and 10% label fractions and 5 runs for the 20% setting. On Cityscapes, we report the mean of 10 runs for the 1/30 setting, 6 runs for the 1/8 setting and 4 runs for the 1/4 setting.
Hardware Specification | Yes | Indeed, training DDeP costs 117.6 PFLOPs compared to 48.3 PFLOPs for the supervised baseline on 32 TPU-v4 chips.
Software Dependencies | No | The paper mentions using the Adam optimizer, but does not provide specific version numbers for any software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used for implementation.
Experiment Setup | Yes | For downstream fine-tuning of the pretrained models for the semantic segmentation task, we use the standard pixel-wise cross-entropy loss. We use the Adam (Kingma & Ba, 2015) optimizer with a cosine learning rate decay schedule. For Decoder Denoising Pretraining (DDeP), we use a batch size of 512 and train for 100 epochs. The learning rate is 6e-5 for the 1× and 3× width decoders, and 1e-4 for the 2× width decoder. When fine-tuning the pretrained models on the target semantic segmentation task, we sweep over weight decay and learning rate values between [1e-5, 3e-4] and choose the best combination for each task. During training, random cropping and random left-right flipping are applied to the images and their corresponding segmentation masks. We randomly crop the images to a fixed size of 1024×1024 for Cityscapes and 512×512 for ADE20K and Pascal Context. All of the decoder denoising pretraining runs are conducted at a 224×224 resolution.
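The cosine learning rate decay quoted above can be written as a small schedule function. This is a sketch of a standard cosine decay under assumptions the quote does not pin down (no warmup, decay to zero); the base rate 6e-5 matches the value reported for the 1× and 3× width decoders.

```python
import math

def cosine_lr(step, total_steps, base_lr=6e-5):
    """Cosine learning-rate decay from base_lr down to 0 over total_steps.
    Assumes no warmup and a final rate of 0, which the paper does not specify."""
    progress = step / max(1, total_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, the rate starts at `base_lr`, reaches half of it at the schedule midpoint, and decays to zero at the final step.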