EFDTR: Learnable Elliptical Fourier Descriptor Transformer for Instance Segmentation
Authors: Jiawei Cao, Chaochen Gu, Hao Cheng, Xiaofeng Zhang, Kaijie Wu, Changsheng Lu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the COCO dataset show that EFDTR outperforms existing polygon-based methods, offering a promising alternative to pixel-based approaches. Code is available at https://github.com/chrisclear3/EFDTR. [...] Table 1. Quantitative Results on MS COCO. We compare our EFDTR with state-of-the-art models on val2017. [...] 4.4. Ablation Study In this section, we conduct ablation studies to evaluate the key components of our proposed EFDTR method and their impact on performance, validated on the COCO val2017 dataset. |
| Researcher Affiliation | Academia | 1Department of Automation, Shanghai Jiao Tong University, Shanghai, China 2Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China 3School of Computing, The Australian National University, Canberra, Australia 4Australian Institute for Machine Learning, University of Adelaide. Correspondence to: Kaijie Wu & Changsheng Lu <EMAIL, EMAIL>. |
| Pseudocode | No | The paper describes the methodology using mathematical equations and block diagrams (Figure 4) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/chrisclear3/EFDTR. |
| Open Datasets | Yes | The COCO dataset (Lin et al., 2014) is a widely used benchmark in computer vision, supporting tasks like object detection, segmentation, and captioning. It contains over 330,000 images across 80 categories with detailed annotations reflecting complex real-world object interactions. |
| Dataset Splits | Yes | The COCO dataset (Lin et al., 2014) is a widely used benchmark in computer vision... [...] Table 1. Quantitative Results on MS COCO. We compare our EFDTR with state-of-the-art models on val2017. [...] 4.4. Ablation Study In this section, we conduct ablation studies to evaluate the key components of our proposed EFDTR method and their impact on performance, validated on the COCO val2017 dataset. For fairness and efficiency, all experiments are trained for 12 epochs. [...] During inference, the input image scale is fixed at 800 × 800. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions training with the 'AdamW optimizer' and uses a 'Pyramid Attention Network (PAN)', but does not specify software versions (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The query number in the EFD decoder is set to 300, with adjacent 4 points grouped together. The EFDTR model is trained using the AdamW optimizer, with different learning rates for each model component and a multi-step learning rate scheduler. Additionally, Exponential Moving Average (EMA) is employed during training to stabilize the process. Data augmentation includes Random Flip, Random IoUCrop, and multi-scale training. During inference, the input image scale is fixed at 800 × 800. [...] For fairness and efficiency, all experiments are trained for 12 epochs. [...] The overall loss is as follows: $\mathcal{L}_{\text{overall}} = \mathcal{L}_{\text{cls}} + \alpha \mathcal{L}_{\text{efd}} + \beta \mathcal{L}_{\text{polygon}}$ (20), where $\alpha = 6$ and $\beta = 10$. |
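The weighted loss from the quoted setup (Eq. 20, with α=6 and β=10) can be sketched as a simple combination function. This is a minimal illustration only: the function name and the scalar placeholder inputs are assumptions, and the paper's actual classification, EFD-coefficient, and polygon loss definitions are not reproduced here.

```python
def overall_loss(l_cls: float, l_efd: float, l_polygon: float,
                 alpha: float = 6.0, beta: float = 10.0) -> float:
    """Weighted sum of the three EFDTR loss components (Eq. 20 in the paper).

    l_cls, l_efd, l_polygon stand in for the classification, elliptical
    Fourier descriptor, and polygon losses; in practice these would be
    tensors produced by the respective loss modules.
    """
    return l_cls + alpha * l_efd + beta * l_polygon
```

With the paper's weights, a polygon-loss error contributes 10× as much to the gradient signal as an equal classification error, e.g. `overall_loss(1.0, 0.5, 0.2)` evaluates to `1.0 + 6*0.5 + 10*0.2 = 6.0`.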