Union-over-Intersections: Object Detection beyond Winner-Takes-All
Authors: Aritra Bhowmik, Pascal Mettes, Martin R. Oswald, Cees G Snoek
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct evaluations on COCO (Lin et al., 2014), covering two tasks: object detection and instance segmentation. The MS-COCO 2017 dataset is a key benchmark for object detection and instance segmentation, comprising 80 categories with 118k training and 5k evaluation images. It features a maximum of 93 object instances per image, with an average of 7 objects. Further results on PASCAL VOC (Everingham et al., 2010) are provided in the supplemental. For Pascal VOC, we leverage the 2007 and 2012 editions. It spans 20 object categories, with 5,011 training and 4,952 testing images in VOC2007, and an additional 11,540 training images in VOC2012. |
| Researcher Affiliation | Academia | Aritra Bhowmik, Pascal Mettes, Martin R. Oswald, Cees G. M. Snoek; Atlas Lab, University of Amsterdam |
| Pseudocode | Yes | Figure 2: Pseudo code demonstrating our minimal changes in the object detection pipeline. During regression, we adjust the target of the proposals from the entire ground truth to only the intersection with ground truth. In post-processing, we group boxes by proposal rather than regressed outcomes and merge regressed intersections, avoiding the discard of non-maximum boxes. |
| Open Source Code | Yes | Code is provided at https://github.com/aritrabhowmik/UoI. |
| Open Datasets | Yes | We conduct evaluations on COCO (Lin et al., 2014), covering two tasks: object detection and instance segmentation. The MS-COCO 2017 dataset is a key benchmark for object detection and instance segmentation, comprising 80 categories with 118k training and 5k evaluation images. Further results on PASCAL VOC (Everingham et al., 2010) are provided in the supplemental. |
| Dataset Splits | Yes | The MS-COCO 2017 dataset is a key benchmark for object detection and instance segmentation, comprising 80 categories with 118k training and 5k evaluation images. For Pascal VOC, we leverage the 2007 and 2012 editions. It spans 20 object categories, with 5,011 training and 4,952 testing images in VOC2007, and an additional 11,540 training images in VOC2012. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory) are provided in the paper. The text only mentions performance metrics like '14.1 fps' and training times such as '23h for 50 epochs' without specifying the underlying hardware. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | We trained Faster R-CNN, Mask R-CNN, and Cascade R-CNN on COCO using standard configurations such as random horizontal flip, 512 proposals, and SGD optimization over 12 epochs, with a learning rate decay at epochs 8 and 11. YOLOv3 is trained with the Darknet-53 backbone for 273 epochs using common augmentation strategies and SGD. Deformable DETR is trained for 50 epochs with the AdamW optimizer. |
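
The two changes described in the Pseudocode row (regressing each proposal to its intersection with the ground truth, then merging the regressed intersections of a group of proposals) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `regression_target` and `merge_group` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, the grouping of proposals is assumed to be given, and the "union" is approximated here by the smallest enclosing box.

```python
from typing import List, Optional, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def regression_target(proposal: Box, gt: Box) -> Optional[Box]:
    """Return the intersection of a proposal with a ground-truth box.

    This stands in for the modified regression target: the proposal is
    asked to regress only to the part of the object it already covers.
    Returns None when the two boxes do not overlap.
    """
    x1 = max(proposal[0], gt[0])
    y1 = max(proposal[1], gt[1])
    x2 = min(proposal[2], gt[2])
    y2 = min(proposal[3], gt[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def merge_group(regressed: List[Box]) -> Box:
    """Merge the regressed intersections of one proposal group.

    Assumption: the union is taken as the smallest box enclosing all
    regressed intersections, so no box in the group is discarded.
    """
    if not regressed:
        raise ValueError("cannot merge an empty group")
    return (
        min(b[0] for b in regressed),
        min(b[1] for b in regressed),
        max(b[2] for b in regressed),
        max(b[3] for b in regressed),
    )


if __name__ == "__main__":
    gt = (5.0, 5.0, 20.0, 20.0)
    proposals = [(0.0, 0.0, 10.0, 10.0), (8.0, 8.0, 25.0, 25.0)]
    # Training side: each proposal gets its own intersection target.
    targets = [regression_target(p, gt) for p in proposals]
    # Post-processing side: the group's intersections are merged.
    merged = merge_group([t for t in targets if t is not None])
    print(targets, merged)
```

In this toy case the two proposals each cover a different part of the object, and merging their intersection targets recovers the full ground-truth box, which is the intuition behind replacing winner-takes-all suppression with a merge step.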