Union-over-Intersections: Object Detection beyond Winner-Takes-All

Authors: Aritra Bhowmik, Pascal Mettes, Martin R. Oswald, Cees G. M. Snoek

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct evaluations on COCO (Lin et al., 2014), covering two tasks: object detection and instance segmentation. The MS-COCO 2017 dataset is a key benchmark for object detection and instance segmentation, comprising 80 categories with 118k training and 5k evaluation images. It features a maximum of 93 object instances per image, with an average of 7 objects. Further results on PASCAL VOC (Everingham et al., 2010) are provided in the supplemental. For Pascal VOC, we leverage the 2007 and 2012 editions. It spans 20 object categories, with 5,011 training and 4,952 testing images in VOC2007, and an additional 11,540 training images in VOC2012.
Researcher Affiliation | Academia | Aritra Bhowmik, Pascal Mettes, Martin R. Oswald, Cees G. M. Snoek. Atlas Lab, University of Amsterdam. EMAIL
Pseudocode | Yes | Figure 2: Pseudo code demonstrating our minimal changes in the object detection pipeline. During regression, we adjust the target of the proposals from the entire ground truth to only the intersection with ground truth. In post-processing, we group boxes by proposal rather than regressed outcomes and merge regressed intersections, avoiding the discard of non-maximum boxes.
Open Source Code | Yes | Code is provided at https://github.com/aritrabhowmik/UoI.
Open Datasets | Yes | We conduct evaluations on COCO (Lin et al., 2014), covering two tasks: object detection and instance segmentation. The MS-COCO 2017 dataset is a key benchmark for object detection and instance segmentation, comprising 80 categories with 118k training and 5k evaluation images. Further results on PASCAL VOC (Everingham et al., 2010) are provided in the supplemental.
Dataset Splits | Yes | The MS-COCO 2017 dataset is a key benchmark for object detection and instance segmentation, comprising 80 categories with 118k training and 5k evaluation images. For Pascal VOC, we leverage the 2007 and 2012 editions. It spans 20 object categories, with 5,011 training and 4,952 testing images in VOC2007, and an additional 11,540 training images in VOC2012.
Hardware Specification | No | No specific hardware details (such as GPU models, CPU types, or memory) are provided in the paper. The text only mentions performance metrics like '14.1 fps' and training times such as '23h for 50 epochs' without specifying the underlying hardware.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1).
Experiment Setup | Yes | We trained Faster R-CNN, Mask R-CNN, and Cascade R-CNN on COCO using standard configurations such as random horizontal flip, 512 proposals, and SGD optimization over 12 epochs, with a learning rate decay at epochs 8 and 11. YOLOv3 is trained with the Darknet-53 backbone for 273 epochs using common augmentation strategies and SGD. Deformable DETR is trained for 50 epochs with the AdamW optimizer.
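
To make the two changes quoted in the Pseudocode row more concrete (regressing each proposal to its intersection with the ground truth, then merging boxes grouped by proposal instead of discarding non-maximum ones), here is a minimal pure-Python sketch. It is not the authors' implementation, which lives in the linked repository. The function names, the greedy grouping rule, the 0.5 IoU threshold, and the choice to keep the top score of each group are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Overlap region of two boxes, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    if inter is None:
        return 0.0
    ia = area(inter)
    return ia / (area(a) + area(b) - ia)


def union(boxes: Sequence[Box]) -> Box:
    """Smallest axis-aligned box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def merge_by_proposal(proposals: List[Box], regressed: List[Box],
                      scores: List[float],
                      iou_thresh: float = 0.5) -> List[Tuple[Box, float]]:
    """Group by overlap between PROPOSALS, then take the union of the
    REGRESSED boxes in each group. Greedy grouping, the threshold, and
    reusing the top score are illustrative assumptions."""
    order = sorted(range(len(proposals)), key=lambda i: scores[i], reverse=True)
    used = set()
    merged = []
    for i in order:
        if i in used:
            continue
        group = [j for j in order
                 if j not in used and iou(proposals[i], proposals[j]) >= iou_thresh]
        used.update(group)
        merged.append((union([regressed[j] for j in group]), scores[i]))
    return merged


# Training-time target: each proposal regresses only to its overlap with the GT.
gt = (10.0, 10.0, 60.0, 60.0)
proposals = [(0.0, 10.0, 50.0, 60.0), (15.0, 10.0, 65.0, 60.0)]
targets = [intersection(p, gt) for p in proposals]

# Post-processing: assuming perfect regression, the union of the two
# intersections covers the full ground-truth box.
result = merge_by_proposal(proposals, targets, scores=[0.9, 0.8])
```

In this toy case neither proposal alone covers the object, but the union of their two intersection targets does. That is the behaviour the quoted caption describes as avoiding the discard of non-maximum boxes.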