ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation

Authors: Ze Yang, Shichao Dong, Ruibo Li, Nan Song, Guosheng Lin

ICLR 2025

Reproducibility assessment (variable, result, and supporting excerpt from the paper):
Research Type: Experimental
  Excerpt: "Our method achieves state-of-the-art performance on ADE20K and COCO benchmarks. ... Extensive experiments on ADE20k and COCO benchmarks showcase that our method outperforms state-of-the-art approaches in CPS."
Researcher Affiliation: Academia
  Excerpt: "Ze Yang, Shichao Dong, Ruibo Li, Nan Song & Guosheng Lin, College of Computing and Data Science, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, EMAIL, EMAIL"
Pseudocode: Yes
  Excerpt: "Algorithm 1: Panoptic Inference with Mask2Former"
Open Source Code: Yes
  Excerpt: "Code is available at https://github.com/Ze-Yang/ADAPT."
Open Datasets: Yes
  Excerpt: "We validate the effectiveness of our approach on ADE20K (Zhou et al., 2017) and COCO (Lin et al., 2014) benchmarks."
Dataset Splits: Yes
  Excerpt: "We adhere to the continual panoptic segmentation protocol from ECLIPSE (Kim et al., 2024), evaluating both 100-n (n = 5, 10, 50) and 50-n (n = 10, 20, 50) scenarios. For instance, 100-10 refers to base training with 100 classes, followed by 5 incremental steps, each introducing 10 new classes. ... We train the network for 160k iterations during base training, and for 400 iterations per class in all subsequent steps. The batch size is consistently set to 16 across all settings. For all settings, we report Panoptic Quality (PQ) results on the standard validation set."
Hardware Specification: Yes
  Excerpt: "Experiments are conducted using two NVIDIA RTX 6000 Ada GPUs on ADE20K and four on COCO."
Software Dependencies: No
  Assessment: The paper mentions using Mask2Former, a ResNet-50 backbone, and the AdamW optimizer, but does not provide specific version numbers for any software libraries or dependencies such as Python, PyTorch, or CUDA.
Experiment Setup: Yes
  Excerpt: "The initial learning rate is set to 10^-4 for all steps in the 100-n settings. In the 50-n settings, we use 2 * 10^-4 for the incremental steps (t > 0), while maintaining 10^-4 during base training (t = 0). We train the network for 160k iterations during base training, and for 400 iterations per class in all subsequent steps. The batch size is consistently set to 16 across all settings. We use the AdamW optimizer (Loshchilov & Hutter, 2018) with the same weight decay values as in Cheng et al. (2022)."
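The continual training schedule described above (base training followed by incremental steps, with 400 iterations per newly added class and a setting-dependent learning rate) can be made concrete with a small sketch. This is not the authors' code; it only enumerates the schedule implied by the reported hyperparameters, assuming ADE20K's 150 classes and the helper name `cps_schedule` for illustration.

```python
# Hedged sketch (not the authors' implementation): enumerate the training
# schedule of the continual panoptic segmentation protocol, assuming
# ADE20K's 150 classes and the hyperparameters reported in the paper.

def cps_schedule(total_classes=150, base=100, step_size=10,
                 base_iters=160_000, iters_per_class=400):
    """Return a list of (step, class_range, iterations, lr) tuples."""
    steps = []
    # Step 0: base training on the first `base` classes at lr 1e-4.
    steps.append((0, range(0, base), base_iters, 1e-4))
    # Incremental steps add `step_size` classes each, trained for
    # 400 iterations per class. Per the paper, the 50-n settings use
    # lr 2e-4 for t > 0, while the 100-n settings keep lr 1e-4.
    lr_inc = 2e-4 if base == 50 else 1e-4
    t = 1
    for start in range(base, total_classes, step_size):
        end = min(start + step_size, total_classes)
        steps.append((t, range(start, end),
                      (end - start) * iters_per_class, lr_inc))
        t += 1
    return steps

# Example: the 100-10 setting yields a base step plus 5 incremental
# steps of 10 classes each (100 + 5 * 10 = 150 classes in total).
for step, classes, iters, lr in cps_schedule(base=100, step_size=10):
    print(f"step {step}: classes {classes.start}-{classes.stop - 1}, "
          f"{iters} iters, lr {lr:g}")
```

Under these assumptions, the 100-10 setting trains each incremental step for 10 * 400 = 4,000 iterations, matching the paper's "400 iterations per class" rule.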