RemDet: Rethinking Efficient Model Design for UAV Object Detection
Authors: Chen Li, Rui Zhao, Zeyu Wang, Huiying Xu, Xinzhong Zhu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on large UAV datasets, VisDrone and UAVDT, validate the real-time efficiency and superior performance of our methods. On the challenging UAV dataset VisDrone, our methods not only provided state-of-the-art results, improving detection by more than 3.4%, but also achieve 110 FPS on a single 4090. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Technology, Zhejiang Normal University, Zhejiang, 321004, China; (2) Research Institute of Hangzhou Artificial Intelligence, Zhejiang Normal University, Zhejiang, 311231, China |
| Pseudocode | No | The paper describes the methods and modules verbally and with diagrams (Figure 2, Figure 5) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'PyTorch and MMDetection' but does not provide any explicit statement about releasing the authors' own source code, nor does it include any links to a code repository or supplementary materials containing code. |
| Open Datasets | Yes | To evaluate our method, we conduct UAV detection experiment on the VisDrone (Zhu et al. 2021) and UAVDT (Du et al. 2018), and also included the MSCOCO (Lin et al. 2014) dataset as an additional benchmark. |
| Dataset Splits | Yes | VisDrone comprises 8,599 aerial images across 10 categories, with 6,471 images for training and 548 images for validation, all at a resolution of 2,000 x 1,500 pixels... UAVDT includes 23,258 training images and 15,069 testing images, with a resolution of 1,024 x 540 pixels across 3 classes. |
| Hardware Specification | Yes | All experiments were conducted on 8 NVIDIA RTX 4090 GPUs, with inference performed on a single 4090 GPU. |
| Software Dependencies | No | Using PyTorch and MMDetection, we trained one-stage models from scratch... |
| Experiment Setup | Yes | Using PyTorch and MMDetection, we trained one-stage models from scratch on the VisDrone and UAVDT datasets for 300 epochs, with a learning rate of 1e-2, and applied data augmentation techniques such as mixup and Mosaic. For two-stage models, we utilized pretrained backbone networks. On MSCOCO, we kept the same parameters, except for a momentum of 0.937, a weight decay of 5e-4, and a learning rate decay of 1e-2 every 10 epochs. The input size for the YOLO series models was 640 x 640, while for other models it was 1,333 x 800. |
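
The reported dataset splits and experiment setup can be collected into one record for anyone attempting a reproduction. The following is a minimal sketch, not the authors' configuration: the `TrainSetup` class and all field names are hypothetical, it is not an MMDetection config, and values the paper does not report for VisDrone and UAVDT (momentum, weight decay, learning rate schedule) are left as `None` rather than guessed. It covers the YOLO-series input size only.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrainSetup:
    """Hypothetical container for the hyperparameters quoted in the table."""
    dataset: str
    epochs: int = 300                                   # reported
    learning_rate: float = 1e-2                         # reported
    augmentations: Tuple[str, ...] = ("mixup", "mosaic")  # reported
    input_size: Tuple[int, int] = (640, 640)            # YOLO-series models
    momentum: Optional[float] = None                    # not reported for UAV sets
    weight_decay: Optional[float] = None                # not reported for UAV sets
    lr_decay: Optional[float] = None                    # not reported for UAV sets
    lr_decay_every: Optional[int] = None                # in epochs


# One-stage models trained from scratch on the two UAV datasets.
visdrone = TrainSetup(dataset="VisDrone")
uavdt = replace(visdrone, dataset="UAVDT")

# MSCOCO keeps the same parameters except for the stated overrides.
mscoco = replace(
    visdrone,
    dataset="MSCOCO",
    momentum=0.937,
    weight_decay=5e-4,
    lr_decay=1e-2,
    lr_decay_every=10,
)

# Split arithmetic from the "Dataset Splits" row.
VISDRONE_TOTAL, VISDRONE_TRAIN, VISDRONE_VAL = 8599, 6471, 548
visdrone_remaining = VISDRONE_TOTAL - VISDRONE_TRAIN - VISDRONE_VAL

UAVDT_TRAIN, UAVDT_TEST = 23258, 15069
uavdt_total = UAVDT_TRAIN + UAVDT_TEST

if __name__ == "__main__":
    print(mscoco)
    print("VisDrone images outside train/val:", visdrone_remaining)
    print("UAVDT images in total:", uavdt_total)
```

The split arithmetic shows that 1,580 of the 8,599 VisDrone images fall outside the quoted training and validation sets; the quoted passage does not say how those images are used.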