EFDTR: Learnable Elliptical Fourier Descriptor Transformer for Instance Segmentation

Authors: Jiawei Cao, Chaochen Gu, Hao Cheng, Xiaofeng Zhang, Kaijie Wu, Changsheng Lu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the COCO dataset show that EFDTR outperforms existing polygon-based methods, offering a promising alternative to pixel-based approaches. Code is available at https://github.com/chrisclear3/EFDTR. [...] Table 1. Quantitative Results on MS COCO. We compare our EFDTR with state-of-the-art models on val2017. [...] 4.4. Ablation Study: In this section, we conduct ablation studies to evaluate the key components of our proposed EFDTR method and their impact on performance, validated on the COCO val2017 dataset.
Researcher Affiliation | Academia | 1) Department of Automation, Shanghai Jiao Tong University, Shanghai, China; 2) Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China; 3) School of Computing, The Australian National University, Canberra, Australia; 4) Australian Institute for Machine Learning, University of Adelaide. Correspondence to: Kaijie Wu & Changsheng Lu <EMAIL, EMAIL>.
Pseudocode | No | The paper describes the methodology using mathematical equations and block diagrams (Figure 4) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/chrisclear3/EFDTR.
Open Datasets | Yes | The COCO dataset (Lin et al., 2014) is a widely used benchmark in computer vision, supporting tasks like object detection, segmentation, and captioning. It contains over 330,000 images across 80 categories with detailed annotations reflecting complex real-world object interactions.
Dataset Splits | Yes | The COCO dataset (Lin et al., 2014) is a widely used benchmark in computer vision... [...] Table 1. Quantitative Results on MS COCO. We compare our EFDTR with state-of-the-art models on val2017. [...] 4.4. Ablation Study: In this section, we conduct ablation studies to evaluate the key components of our proposed EFDTR method and their impact on performance, validated on the COCO val2017 dataset. For fairness and efficiency, all experiments are trained for 12 epochs. [...] During inference, the input image scale is fixed at 800 × 800.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions training with the 'AdamW optimizer' and uses a 'Pyramid Attention Network (PAN)', but does not specify software versions (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | The query number in the EFD decoder is set to 300, with adjacent 4 points grouped together. The EFDTR model is trained using the AdamW optimizer, with different learning rates for each model component and a multi-step learning rate scheduler. Additionally, Exponential Moving Average (EMA) is employed during training to stabilize the process. Data augmentation includes RandomFlip, RandomIoUCrop, and multi-scale training. During inference, the input image scale is fixed at 800 × 800. [...] For fairness and efficiency, all experiments are trained for 12 epochs. [...] The overall loss is as follows: L_overall = L_cls + α·L_efd + β·L_polygon (Eq. 20), where α = 6 and β = 10.
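The weighted loss in Eq. (20) can be sketched in plain Python. This is a minimal illustration only: the function name and the dummy per-term loss values are assumptions, and in the actual model each term would be computed from classification logits, predicted EFD coefficients, and sampled polygon vertices.

```python
# Sketch of the overall training loss from Eq. (20):
#   L_overall = L_cls + alpha * L_efd + beta * L_polygon
# with alpha = 6 and beta = 10 as reported in the paper.
# NOTE: `overall_loss` and the example values are illustrative,
# not the authors' implementation.

def overall_loss(l_cls: float, l_efd: float, l_polygon: float,
                 alpha: float = 6.0, beta: float = 10.0) -> float:
    """Combine the three loss terms with the paper's reported weights."""
    return l_cls + alpha * l_efd + beta * l_polygon

# Example with dummy per-term losses:
# 0.5 + 6 * 0.1 + 10 * 0.05 = 1.6
loss = overall_loss(l_cls=0.5, l_efd=0.1, l_polygon=0.05)
```

In a training loop the same weighting would be applied to tensor-valued losses before backpropagation; the large β reflects the paper's emphasis on the polygon term relative to classification.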