SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection
Authors: Weiqi Yan, Lvhai Chen, Shengchuan Zhang, Yan Zhang, Liujuan Cao
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed method surpasses previous semi-supervised methods in the COD field and achieves state-of-the-art performance. Our comprehensive evaluations on this dataset demonstrate substantial enhancements over existing semi-supervised COD models. In particular, our method improves the Mean Absolute Error (MAE) by 52.0% and the S-Measure by 19.1% compared to previous SOTA semi-supervised COD methods. Additionally, our approach outperforms some supervised COD methods, highlighting its greater practical applicability. Our main contributions are summarized as follows: We proposed an innovative semi-supervised COD model, SCOUT, and extensive experiments have demonstrated its high performance and effectiveness. |
| Researcher Affiliation | Academia | Weiqi Yan1 , Lvhai Chen1 , Shengchuan Zhang1 , Yan Zhang1 and Liujuan Cao1 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China. weiqi EMAIL, EMAIL, zsc EMAIL |
| Pseudocode | No | The paper describes methods using mathematical equations and structured prose, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Our code will be released at https://github.com/Heartfirey/UCOD-DPL. |
| Open Datasets | Yes | To adapt to this work, we build a new dataset, namely RefTextCOD. The motivation behind this proposed dataset is that existing datasets do not have image-level referring text. To achieve a fair and effective comparison with existing methods, we expect to be able to construct text-based referring COD experiments in settings that are as similar as possible. Referring to [Chen et al., 2022; Fan et al., 2022], we used the mainstream COD datasets: CHAMELEON ([Wu et al., 2019]), CAMO ([Yan et al., 2021; Le et al., 2019]), COD10K ([Fan et al., 2020]), NC4K ([Le et al., 2019]) as the base image data. |
| Dataset Splits | Yes | Training Set. To compare with the existing works, following [Luo et al., 2023; Fan et al., 2020], we use 1000 images from the CAMO trainset and 3040 images from the COD10K trainset as the training set for our experiments. During the training process, we follow the data partition ratios from previous semi-supervised COD results [Lai et al., 2024], training the model with 1%, 5%, and 10% of labeled data. Testing Sets. We test the model's performance on four mainstream COD benchmark testing sets: CHAMELEON with 76 test images, CAMO with 250 test images, COD10K with 2026 test images, and NC4K with 4121 test images. |
| Hardware Specification | Yes | All experiments are implemented with PyTorch 2.1 and a machine with Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz, 256 GiB RAM, and 8 NVIDIA Titan A800 80G GPUs. |
| Software Dependencies | Yes | All experiments are implemented with PyTorch 2.1 and a machine with Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz, 256 GiB RAM, and 8 NVIDIA Titan A800 80G GPUs. |
| Experiment Setup | Yes | All images are resized to 640×640 for training and testing. We employ the ImageNet-pretrained Swin-base [Liu et al., 2021] as our image encoder, use the recently developed BiRefBlock [Zheng et al., 2024] from the High-Resolution Dichotomous Image Segmentation (HRDIS) field to build the decoder, and utilize CLIP-ViT-Large as our text encoder. The parameters of the CLIP text encoder are frozen during the training process, while all others are trainable. The batch size is set to 6 per GPU during training, Adam is used as the optimizer, and the learning rate is initialized to 1e-4 with a multi-step decay strategy over 30 training epochs. |
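The labeled/unlabeled partition and the optimizer schedule quoted in the table can be sketched in plain Python. This is a minimal illustration, not the authors' code: the dataset sizes (1000 CAMO + 3040 COD10K training images), the 1%/5%/10% ratios, the 1e-4 initial learning rate, and the 30-epoch multi-step decay come from the table, while the random seed, the decay milestones, and the decay factor `gamma` are assumptions, since the paper excerpt does not specify them.

```python
import random

# Total training pool quoted in the table: CAMO trainset + COD10K trainset.
TRAIN_IMAGES = 1000 + 3040

def split_labeled(n_images, labeled_ratio, seed=0):
    """Randomly assign a fraction of images as labeled; the rest are unlabeled.

    The seed is an assumption for reproducibility of this sketch only.
    """
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)
    n_labeled = round(n_images * labeled_ratio)
    return sorted(indices[:n_labeled]), sorted(indices[n_labeled:])

def multistep_lr(epoch, base_lr=1e-4, milestones=(20, 25), gamma=0.1):
    """Multi-step decay: multiply the LR by gamma at each milestone epoch.

    base_lr and the 30-epoch horizon are from the paper; the milestone
    epochs and gamma are illustrative assumptions.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

if __name__ == "__main__":
    for ratio in (0.01, 0.05, 0.10):
        labeled, unlabeled = split_labeled(TRAIN_IMAGES, ratio)
        print(f"{ratio:.0%}: {len(labeled)} labeled / {len(unlabeled)} unlabeled")
    print("LR at epoch 0:", multistep_lr(0), "| LR at epoch 29:", multistep_lr(29))
```

With 4040 training images, the 10% setting yields 404 labeled and 3636 unlabeled images; the 1% and 5% settings follow the same rounding.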