Towards Region-Adaptive Feature Disentanglement and Enhancement for Small Object Detection

Authors: Yanchao Bi, Yang Ning, Xiushan Nie, Xiankai Lu, Yongshun Gong, Leida Li

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several public datasets demonstrate that the RAFDE strategy is highly effective and outperforms state-of-the-art methods. The code is available at https://github.com/b-yanchao/RAFDE.git. 4 Experiments: We have integrated our RAFDE module with the latest YOLO model and conducted experiments on two widely used drone image benchmarks: the VisDrone dataset [Du et al., 2019] and the Drone-vs-Bird dataset [Coluccia et al., 2021].
Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China; 2 School of Software, Shandong University, Jinan, China; 3 School of Artificial Intelligence, Xidian University, Xi'an, China
Pseudocode | No | The paper describes the methods through mathematical definitions and textual explanations (e.g., Sections 3.1, 3.2, 3.3) and does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/b-yanchao/RAFDE.git.
Open Datasets | Yes | We have integrated our RAFDE module with the latest YOLO model and conducted experiments on two widely used drone image benchmarks: the VisDrone dataset [Du et al., 2019] and the Drone-vs-Bird dataset [Coluccia et al., 2021].
Dataset Splits | Yes | The VisDrone dataset consists of 7,019 high-resolution images (2000×1500) containing 10 classes of small, densely packed objects. Of these, 6,471 images are used for training, 548 for validation, and 1,610 for testing. The Drone-vs-Bird dataset includes 1,387 training images and 434 test images, featuring both UAV and environmental data.
Hardware Specification | Yes | Training and testing were conducted on a single RTX A6000 GPU, with batch sizes of 8 and 2 for input resolutions of 640×640 and 1280×1280, respectively.
Software Dependencies | No | We implemented our RAFDE strategy using PyTorch [Paszke et al., 2019]. The paper mentions using PyTorch but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | All models were trained for 150 epochs, with YOLOv11m serving as the baseline. Our approach employs the same loss function as YOLOv11 [Khanam R, 2024], which includes both object classification loss and bounding box regression loss. For the classification loss, we combine BCELoss [Zheng et al., 2020] and Focal Loss [Li et al., 2020], while for the regression loss, we use CIoULoss [Wang et al., 2023]. The input resolutions were set to 640×640 and 1280×1280 for the VisDrone dataset, and 640×640 for the Drone-vs-Bird dataset. All models were trained using the Adam optimizer with an initial learning rate of 0.01 and a decay rate of 1e-5. Training and testing were conducted on a single RTX A6000 GPU, with batch sizes of 8 and 2 for input resolutions of 640×640 and 1280×1280, respectively.
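The reported setup can be collected into a single configuration for a reproduction attempt. This is a minimal sketch, assuming the hyperparameters quoted above; `make_config` and its keys are illustrative names, not the authors' actual code, and "decay rate of 1e-5" is interpreted here as weight decay (the paper does not say which).

```python
def make_config(resolution=640):
    """Assemble a hypothetical training config from the paper's reported setup."""
    # Batch size depends on input resolution, per the reported setup:
    # 8 for 640x640, 2 for 1280x1280.
    batch_size = {640: 8, 1280: 2}[resolution]
    return {
        "model": "YOLOv11m",                  # baseline model
        "epochs": 150,
        "optimizer": "Adam",
        "lr": 0.01,                           # initial learning rate
        "weight_decay": 1e-5,                 # "decay rate" (assumed weight decay)
        "imgsz": resolution,
        "batch": batch_size,
        "cls_loss": ["BCELoss", "FocalLoss"], # classification losses
        "box_loss": "CIoULoss",               # bounding box regression loss
    }

# Example: the 1280x1280 VisDrone setting uses a batch size of 2.
print(make_config(1280)["batch"])  # 2
```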