Quantifying Context Bias in Domain Adaptation for Object Detection

Authors: Hojun Son, Asma A. Almutairi, Arpan Kusari

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through systematic experiments involving background masking, feature-level perturbations, and CAM, we reveal that convolution-based object detection models encode a foreground-background (FG-BG) association. This association substantially impacts detection performance, particularly under domain shifts where background information diverges significantly. Our results demonstrate that context bias not only exists but also causally undermines the generalization of object detection models across domains. Furthermore, we validate these findings across multiple models and datasets, including state-of-the-art architectures such as ALDI++.
Researcher Affiliation | Academia | Hojun Son (EMAIL), Asma Almutairi (EMAIL), Arpan Kusari (EMAIL) — University of Michigan Transportation Research Institute, University of Michigan.
Pseudocode | Yes | Algorithm 1: Class-wise Background Removal Experiments in Image Space; Algorithm 2: Feature-wise Background Removal Experiments in Feature Space; Algorithm 3: Causality Analysis via Smooth-Grad CAM++ Mask Thresholding; Algorithm 4: Feature Extraction from CAM and Ground Truth Instance Mask.
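The image-space background removal of Algorithm 1 can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' released code: the function name `mask_background` and the array conventions are assumptions, and in the paper's setup the substitute background would be drawn from BG-20K.

```python
import numpy as np

def mask_background(image, instance_mask, background):
    """Swap every background pixel for the corresponding pixel of a
    substitute background image, keeping foreground pixels intact.

    image, background: H x W x 3 uint8 arrays
    instance_mask:     H x W boolean array, True on foreground pixels
    """
    # Broadcast the 2-D mask across the colour channels, then select
    # foreground pixels from `image` and the rest from `background`.
    return np.where(instance_mask[..., None], image, background).astype(image.dtype)
```

Detection performance on such masked images, compared against the originals, is what isolates how much the model relies on background context.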
Open Source Code | No | The paper mentions using the 'pre-trained ALDI++ model from the official repository' and the 'Ultralytics framework' for YOLOv11, both of which are third-party tools. However, no explicit statement or link is provided for the authors' own implementation of the methodology described in the paper.
Open Datasets | Yes | We use multiple datasets for training and evaluation, including Cityscapes, KITTI Semantic, and various subsets of Virtual KITTI. Additionally, BG-20K, a collection of 20,000 images containing non-salient objects, is used to generate randomized background images. The Cityscapes and KITTI Semantic Train (KST) sets share 8 foreground and 11 background object categories, while the Virtual KITTI subsets contain 3 foreground and 10 background object classes.
Dataset Splits | Yes | Cityscapes: 2,950 training images, 500 validation images, 1,500 foggy validation images, and 1,188 rainy validation images. KITTI Semantic Train: 200 images. Virtual KITTI Semantic: 2,126 images across 6 simulated weather conditions.
Hardware Specification | Yes | Training is conducted on an NVIDIA RTX A4500 GPU.
Software Dependencies | No | We employ ResNet-50 (Res) and EfficientNet-B0 (Eff) as backbones for FPN models implemented in Detectron2, as well as YOLOv11 (Yo) (Khanam & Hussain, 2024), an anchor-free detection model. ... All models except Yo are trained using the standard loss functions provided by Detectron2, while Ultralytics is used for Yo. The paper names 'Detectron2' and 'Ultralytics' but does not specify their version numbers.
Experiment Setup | Yes | For training, we use a learning rate of 0.02 for Res, with input resolutions of 1024×2048 for Cityscapes and 375×1242 for KITTI-related datasets. For Eff, we use a resolution of 1024×1024 for Cityscapes and the same KITTI resolution, with a learning rate of 0.01. All models use identical data augmentation: resizing and cropping, color jitter, and horizontal flipping. Each model is trained with a batch size of 8. Training runs for approximately 100 epochs for ALDI++, Res, and Yo, and 200 epochs for Eff.
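The reported hyperparameters can be collected into a single summary for reference. This is a hand-assembled sketch of the values quoted above, not the authors' actual configuration files; the dictionary keys and model labels are illustrative.

```python
# Hypothetical summary of the training setups reported in the paper.
# Resolutions are (height, width); KITTI-related datasets share 375x1242.
TRAIN_CONFIGS = {
    "ResNet-50 FPN (Res)": {
        "lr": 0.02,
        "resolution_cityscapes": (1024, 2048),
        "resolution_kitti": (375, 1242),
        "batch_size": 8,
        "epochs": 100,
    },
    "EfficientNet-B0 FPN (Eff)": {
        "lr": 0.01,
        "resolution_cityscapes": (1024, 1024),
        "resolution_kitti": (375, 1242),
        "batch_size": 8,
        "epochs": 200,
    },
}

# Identical augmentation pipeline across all models.
AUGMENTATIONS = ["resize_and_crop", "color_jitter", "horizontal_flip"]
```

ALDI++ and YOLOv11 (Yo) follow the ~100-epoch schedule as well, but their framework-specific settings (Detectron2 vs. Ultralytics) are not fully specified in the paper.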