Multi-Perspective Data Augmentation for Few-shot Object Detection
Authors: Anh-Khoa Nguyen Vu, Quoc Truong Truong, Vinh-Tiep Nguyen, Thanh Ngo, Thanh-Toan Do, Tam Nguyen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple FSOD benchmarks demonstrate the effectiveness of our approach. Our framework significantly outperforms traditional methods, achieving an average increase of 17.5% in nAP50 over the baseline on PASCAL VOC. |
| Researcher Affiliation | Academia | University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam; Department of Data Science and AI, Monash University, Australia; University of Dayton, Dayton, OH 45469, United States |
| Pseudocode | No | The paper describes methods and processes verbally and mathematically (e.g., Equation 1, 3, 4, 5, 6), and illustrates components in Figure 2, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at github.com/nvakhoa/MPAD. |
| Open Datasets | Yes | we assess our MPAD method in the FSOD setting of PASCAL VOC (Everingham et al., 2010; 2015) and MS COCO (Lin et al., 2014). |
| Dataset Splits | Yes | For PASCAL VOC, 20 classes are separated into three sets. In each set, five classes are designated as novel classes Cnovel, and the remaining fifteen classes are used as the base classes Cbase. There are K samples for each novel class (K ∈ {1, 2, 3, 5, 10}). Regarding MS COCO, the dataset serves as a challenging benchmark for FSOD. 80 classes are split into 60 base classes and 20 novel classes (identical to the 20 PASCAL VOC classes). We select a value of K from the set {1, 2, 3, 5} for each novel and base class to fine-tune detectors. |
| Hardware Specification | Yes | During the fine-tuning stage, we utilize both real novel data and synthetic data to train models on a single NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using specific models like PowerPaint (Zhuang et al., 2023), the CLIP text encoder (Radford et al., 2021), and a pre-trained ViT model (Dosovitskiy et al., 2021). However, it does not provide specific version numbers for these software components or underlying frameworks (e.g., PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In both the base training and the fine-tuning stage, we use the same hyper-parameters as DeFRCN (Qiao et al., 2021). [...] We set w = 0.7, m = 0.8 and N̂_aug = 300. The number of inference steps is fixed at T = 80. |
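The split protocol quoted in the Dataset Splits row (three PASCAL VOC sets, each with 5 novel and 15 base classes, and K annotated instances per novel class) can be sketched as below. This is a hypothetical illustration, not code from the MPAD repository: the novel-class sets shown are the ones conventionally used in the FSOD literature, and the annotation format of `sample_k_shot` is assumed.

```python
import random

# Standard 20 PASCAL VOC categories.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Novel-class sets as conventionally used in FSOD benchmarks
# (assumed here; the MPAD code should be consulted for the exact splits).
NOVEL_SETS = {
    1: ["bird", "bus", "cow", "motorbike", "sofa"],
    2: ["aeroplane", "bottle", "cow", "horse", "sofa"],
    3: ["boat", "cat", "motorbike", "sheep", "sofa"],
}

def make_split(set_id: int):
    """Return (base_classes, novel_classes) for one VOC split:
    5 novel classes, the remaining 15 as base classes."""
    novel = NOVEL_SETS[set_id]
    base = [c for c in VOC_CLASSES if c not in novel]
    return base, novel

def sample_k_shot(annotations, novel_classes, k, seed=0):
    """Sample exactly K annotated instances per novel class.

    `annotations` maps class name -> list of instance ids
    (a hypothetical format for illustration only).
    """
    rng = random.Random(seed)
    return {c: rng.sample(annotations[c], k) for c in novel_classes}
```

For MS COCO the same idea applies with 60 base and 20 novel classes (the 20 VOC categories) and K ∈ {1, 2, 3, 5}.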
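The Experiment Setup row reports four explicit hyper-parameters (w = 0.7, m = 0.8, N̂_aug = 300, T = 80) on top of the DeFRCN defaults. A minimal config sketch collecting them might look like the following; the field names and docstrings are illustrative, and the roles of w and m are not specified in the quoted excerpt.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MPADFinetuneConfig:
    """Hypothetical container for the fine-tuning hyper-parameters
    quoted from the paper; remaining settings follow DeFRCN."""
    w: float = 0.7             # reported in the paper (role not given in this excerpt)
    m: float = 0.8             # reported in the paper (role not given in this excerpt)
    n_aug: int = 300           # \hat{N}_aug, number of augmented samples
    inference_steps: int = 80  # T, fixed number of inference steps

cfg = MPADFinetuneConfig()
```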