Quantifying Context Bias in Domain Adaptation for Object Detection

Authors: Hojun Son, Asma A. Almutairi, Arpan Kusari

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through systematic experiments involving background masking, feature-level perturbations, and CAM, we reveal that convolution-based object detection models encode a foreground-background (FG-BG) association. This association substantially impacts detection performance, particularly under domain shifts where background information diverges significantly. Our results demonstrate that context bias not only exists but also causally undermines the generalization of object detection models across domains. Furthermore, we validate these findings across multiple models and datasets, including state-of-the-art architectures such as ALDI++.
Researcher Affiliation | Academia | Hojun Son (EMAIL), Asma Almutairi (EMAIL), Arpan Kusari (EMAIL) — University of Michigan Transportation Research Institute, University of Michigan.
Pseudocode | Yes | Algorithm 1: Class-wise Background Removal Experiments in Image Space; Algorithm 2: Feature-wise Background Removal Experiments in Feature Space; Algorithm 3: Causality Analysis via Smooth-Grad CAM++ Mask Thresholding; Algorithm 4: Feature Extraction from CAM and Ground Truth Instance Mask.
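The image-space background removal of Algorithm 1 can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' released code: the function name `mask_background` and the array conventions are assumptions, and in the paper's setup the substitute background would be drawn from BG-20K.

```python
import numpy as np

def mask_background(image, instance_mask, background):
    """Swap every background pixel for the corresponding pixel of a
    substitute background image, keeping foreground pixels intact.

    image, background: H x W x 3 uint8 arrays
    instance_mask:     H x W boolean array, True on foreground pixels
    """
    # Broadcast the 2-D mask across the colour channels, then select
    # foreground pixels from `image` and the rest from `background`.
    return np.where(instance_mask[..., None], image, background).astype(image.dtype)
```

Detection performance on such masked images, compared against the originals, is what isolates how much the model relies on background context.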
Open Source Code | No | The paper mentions using the 'pre-trained ALDI++ model from the official repository' and the 'Ultralytics framework' for YOLOv11, both of which are third-party tools. However, no explicit statement or link is provided for the authors' own implementation of the methodology described in the paper.
Open Datasets | Yes | We use multiple datasets for training and evaluation, including Cityscapes, KITTI Semantic, and various subsets of Virtual KITTI. Additionally, BG-20K, a collection of 20,000 images containing non-salient objects, is used to generate randomized background images. The Cityscapes and KITTI Semantic Train (KST) sets share 8 foreground and 11 background object categories, while the Virtual KITTI subsets contain 3 foreground and 10 background object classes.
Dataset Splits | Yes | Cityscapes: 2,950 training images, 500 validation images, 1,500 foggy validation images, and 1,188 rainy validation images. KITTI Semantic Train: 200 images. Virtual KITTI Semantic: 2,126 images across 6 simulated weather conditions.
Hardware Specification | Yes | Training is conducted on an NVIDIA RTX A4500 GPU.
Software Dependencies | No | We employ ResNet-50 (Res) and EfficientNet-B0 (Eff) as backbones for FPN models implemented in Detectron2, as well as YOLOv11 (Yo) (Khanam & Hussain, 2024), an anchor-free detection model. ... All models except Yo are trained using the standard loss functions provided by Detectron2, while Ultralytics is used for Yo. The paper names 'Detectron2' and 'Ultralytics' but does not specify their version numbers.
Experiment Setup | Yes | For training, we use a learning rate of 0.02 for Res, with input resolutions of 1024×2048 for Cityscapes and 375×1242 for KITTI-related datasets. For Eff, we use a resolution of 1024×1024 for Cityscapes and the same KITTI resolution, with a learning rate of 0.01. All models use identical data augmentation: resizing and cropping, color jitter, and horizontal flipping. Each model is trained with a batch size of 8. Training runs for approximately 100 epochs for ALDI++, Res, and Yo, and 200 epochs for Eff.
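The reported hyperparameters can be collected into a single summary for reference. This is a hand-assembled sketch of the values quoted above, not the authors' actual configuration files; the dictionary keys and model labels are illustrative.

```python
# Hypothetical summary of the training setups reported in the paper.
# Resolutions are (height, width); KITTI-related datasets share 375x1242.
TRAIN_CONFIGS = {
    "ResNet-50 FPN (Res)": {
        "lr": 0.02,
        "resolution_cityscapes": (1024, 2048),
        "resolution_kitti": (375, 1242),
        "batch_size": 8,
        "epochs": 100,
    },
    "EfficientNet-B0 FPN (Eff)": {
        "lr": 0.01,
        "resolution_cityscapes": (1024, 1024),
        "resolution_kitti": (375, 1242),
        "batch_size": 8,
        "epochs": 200,
    },
}

# Identical augmentation pipeline across all models.
AUGMENTATIONS = ["resize_and_crop", "color_jitter", "horizontal_flip"]
```

ALDI++ and YOLOv11 (Yo) follow the ~100-epoch schedule as well, but their framework-specific settings (Detectron2 vs. Ultralytics) are not fully specified in the paper.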