reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Single Image Test-Time Adaptation for Segmentation

Authors: Klara Janouskova, Tamir Shor, Chaim Baskin, Jiri Matas

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We propose two new segmentation TTA methods and compare them to established baselines and recent state-of-the-art. The methods are first validated on synthetic domain shifts and then tested on real-world datasets.
Researcher Affiliation	Academia	Klara Janouskova EMAIL Visual Recognition Group, Faculty of Electrical Engineering Czech Technical University in Prague Tamir Shor EMAIL Technion Israel Institute of Technology, Haifa, Israel Chaim Baskin EMAIL Technion Israel Institute of Technology, Haifa, Israel Jiri Matas EMAIL Visual Recognition Group, Faculty of Electrical Engineering Czech Technical University in Prague
Pseudocode	No	The paper describes the methodology using mathematical formulas and descriptive text. No explicit pseudocode or algorithm blocks are present.
Open Source Code	Yes	Code and data: https://klarajanouskova.github.io/sitta-seg/
Open Datasets	Yes	The TTA methods are evaluated on two semantic segmentation models pretrained on the GTA5 Richter et al. (2016) and COCO Lin et al. (2014) datasets. After selecting the best hyper-parameters for each method on the SITTA training set, the methods are evaluated on 5 test datasets: ACDC-Rain, ACDC-Fog, ACDC-Night, ACDC-Snow, and Cityscapes. In this experiment, the performance of TTA methods is studied on a model trained on the COCO dataset and evaluated on the VOC dataset.
Dataset Splits	Yes	The SITTA training set for each model is derived from a set of 40 images from the segmentation model's training dataset extended with a set of 9 synthetic corruptions at three severity levels from Hendrycks & Dietterich (2019)... Since the original images without corruption are also included, each SITTA training dataset consists of 1200 images (40 images, 9 + 1 corruption, three corruption levels). Each of the test sets consists of 500 images.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions the use of 'Timm library Wightman (2019)' but does not specify a version number for it or other key software components.
Experiment Setup	Yes	SITTA hyper-parameters. For each TTA method, optimizing all the network parameters or normalization parameters is only considered... The learning rate and number of TTA iterations are considered from learning hyper-parameters. The maximum possible number of iterations is 10 to limit the computational requirements. Reasonable learning rate values are found via a grid search and then extended with other promising values based on the initial results. Shared implementation details... It is trained with the AdamW Loshchilov & Hutter (2017) optimizer with a learning rate of 1e-3 and the Cross-Entropy (CE) loss. The SGD optimizer is used for the TTA since early experiments with AdamW showed a high divergence rate.
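
The size of the SITTA training set quoted in the Dataset Splits row can be checked with a short sketch. It enumerates every (image, corruption, severity) combination using the counts given in the quoted text: 40 images, 9 corruptions plus the uncorrupted original, and three severity levels. The function name `build_sitta_index`, the corruption labels, and the numeric severity labels are illustrative assumptions; this is not the authors' data-loading code, and the excerpt does not say how the uncorrupted copies are stored.

```python
from itertools import product

N_BASE_IMAGES = 40            # images from the model's training set (from the quoted text)
N_CORRUPTIONS = 9             # synthetic corruption types (from the quoted text)
SEVERITY_LEVELS = (1, 2, 3)   # three levels; the numeric labels are placeholders


def build_sitta_index(n_images=N_BASE_IMAGES,
                      n_corruptions=N_CORRUPTIONS,
                      severities=SEVERITY_LEVELS):
    """Enumerate (image_id, corruption, severity) triples (hypothetical helper)."""
    # "none" stands for the uncorrupted original; the other names are
    # placeholders, not the actual corruption names used in the paper.
    corruptions = ["none"] + [f"corruption_{i}" for i in range(1, n_corruptions + 1)]
    return list(product(range(n_images), corruptions, severities))


index = build_sitta_index()
print(len(index))  # 40 * (9 + 1) * 3 = 1200
```

The count reproduces the quoted figure only if the uncorrupted image is counted once per severity level, which is how the parenthetical "40 images, 9 + 1 corruption, three corruption levels" reads.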