Visual Perturbation for Text-Based Person Search
Authors: Pengcheng Zhang, Xiaohan Yu, Xiao Bai, Jin Zheng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed method clearly surpasses previous TBPS methods on the PRW-TBPS and CUHK-SYSU-TBPS datasets. Code: https://github.com/PatrickZad/ViPer |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, State Key Laboratory of Complex & Critical Software Environment, Jiangxi Research Institute, Beihang University, Beijing, China; (2) School of Computing, Macquarie University, Sydney, Australia. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Attentive ViPer |
| Open Source Code | Yes | Code: https://github.com/PatrickZad/ViPer |
| Open Datasets | Yes | PRW-TBPS is collected based on the image-based person search (IBPS) dataset PRW (Zheng et al. 2017). CUHK-SYSU-TBPS is based on the IBPS dataset CUHK-SYSU (Xiao et al. 2017) and the text-based person ReID dataset CUHK-PEDES (Li et al. 2017). |
| Dataset Splits | Yes | The training set contains 5,704 scene images...The query set presents independently annotated sentences matched with 2,057 query person boxes...For evaluation, a total of 6,112 scene images...CUHK-SYSU-TBPS is based on the IBPS dataset CUHK-SYSU (Xiao et al. 2017) and the text-based person ReID dataset CUHK-PEDES (Li et al. 2017). 11,206 training scene images with 15,080 person boxes of 5,532 persons and 6,978 gallery images of 2,900 persons are presented. |
| Hardware Specification | Yes | All experiments are conducted on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using CLIP, ResNet50, Faster R-CNN, BERT, and Adam optimizer but does not specify version numbers for general software libraries or programming languages. |
| Experiment Setup | Yes | For training, we set the batch size to be 8 and employ a multi-scale training strategy...The model is optimized with the Adam optimizer and an initial learning rate of 1e-5 which is linearly warmed up during the first two epochs. We train the model for 20 epochs and decrease the learning rate by a factor of 10 at the 12-th epoch for CUHK-SYSU-TBPS. For PRW-TBPS, the model is trained for 25 epochs and the learning rate is decayed by a factor of 10 at the 12-th epoch. In the OIM loss, the circular queue sizes are 5000 for CUHK-SYSU-TBPS and 500 for PRW-TBPS. The temperature σ for the OIM losses is set to 1/30 and the momentum coefficient is 0.5...λ1, λ2 and n for Spatial ViPer are set to 0.4, 0.2 and 3, respectively. The min/max removable and exchangeable tokens m_r^0/m_r^1 and m_e^0/m_e^1 are 1/4 and 4/8, respectively. The masking ratio r in Fine-grained ViPer is 0.5. |
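
The learning-rate schedule quoted above (linear warmup over the first two epochs, then a single step decay of 10x at epoch 12) can be sketched as a plain function. This is a minimal illustration, not the authors' code: the function name and the choice of per-epoch (rather than per-iteration) warmup granularity are assumptions.

```python
def learning_rate(epoch, base_lr=1e-5, warmup_epochs=2,
                  decay_epoch=12, decay_factor=0.1):
    """Return the learning rate for a given (0-indexed) epoch.

    Linear warmup to base_lr over the first `warmup_epochs` epochs,
    then a single 10x step decay at `decay_epoch`.
    """
    if epoch < warmup_epochs:
        # Warmup: ramp linearly from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    if epoch >= decay_epoch:
        # After the decay epoch, run at base_lr * decay_factor.
        return base_lr * decay_factor
    return base_lr
```

With the reported settings, epochs 0-1 ramp up to 1e-5, epochs 2-11 run at 1e-5, and epochs 12 onward run at 1e-6 (for both 20-epoch CUHK-SYSU-TBPS and 25-epoch PRW-TBPS training).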