Visual Perturbation for Text-Based Person Search

Authors: Pengcheng Zhang, Xiaohan Yu, Xiao Bai, Jin Zheng

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed method clearly surpasses previous TBPS methods on the PRW-TBPS and CUHK-SYSU-TBPS datasets.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, State Key Laboratory of Complex & Critical Software Environment, Jiangxi Research Institute, Beihang University, Beijing, China; 2 School of Computing, Macquarie University, Sydney, Australia. EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Attentive ViPer
Open Source Code | Yes | Code: https://github.com/PatrickZad/ViPer
Open Datasets | Yes | PRW-TBPS is collected based on the IBPS dataset PRW (Zheng et al. 2017). CUHK-SYSU-TBPS is based on the IBPS dataset CUHK-SYSU (Xiao et al. 2017) and the text-based person ReID dataset CUHK-PEDES (Li et al. 2017).
Dataset Splits | Yes | The training set contains 5,704 scene images...The query set presents independently annotated sentences matched with 2,057 query person boxes...For evaluation, a total of 6,112 scene images...CUHK-SYSU-TBPS is based on the IBPS dataset CUHK-SYSU (Xiao et al. 2017) and the text-based person ReID dataset CUHK-PEDES (Li et al. 2017). 11,206 training scene images with 15,080 person boxes of 5,532 persons and 6,978 gallery images of 2,900 persons are presented.
Hardware Specification | Yes | All experiments are conducted on a single RTX 3090 GPU.
Software Dependencies | No | The paper mentions using CLIP, ResNet50, Faster R-CNN, BERT, and the Adam optimizer, but does not specify version numbers for general software libraries or programming languages.
Experiment Setup | Yes | For training, we set the batch size to be 8 and employ a multi-scale training strategy...The model is optimized with the Adam optimizer and an initial learning rate of 1e-5, which is linearly warmed up during the first two epochs. We train the model for 20 epochs and decrease the learning rate by 10 at the 12-th epoch for CUHK-SYSU-TBPS. For PRW-TBPS, the model is trained for 25 epochs and the learning rate is decayed by 10 at the 12-th epoch. In the OIM loss, the circular queue sizes are 5000 for CUHK-SYSU-TBPS and 500 for PRW-TBPS. The temperature σ for the OIM losses is set to 1/30 and the momentum coefficient is 0.5...λ1, λ2 and n for Spatial ViPer are set to 0.4, 0.2 and 3, respectively. The min/max removable and exchangeable tokens m0_r/m1_r and m0_e/m1_e are 1/4 and 4/8, respectively. The masking ratio r in Fine-grained ViPer is 0.5.
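The quoted schedule (linear warmup over the first two epochs, then a 10x step decay at the 12-th epoch) can be sketched as a plain per-epoch function. This is a minimal illustration, not the authors' code; the function name, 0-indexed epochs, and warmup starting from a fraction of the base rate are assumptions.

```python
def learning_rate(epoch, base_lr=1e-5, warmup_epochs=2,
                  decay_epoch=12, decay_factor=0.1):
    """Per-epoch LR: linear warmup for the first `warmup_epochs`
    epochs, then a step decay by `decay_factor` at `decay_epoch`.
    Epochs are 0-indexed; exact warmup shape is an assumption."""
    if epoch < warmup_epochs:
        # linear warmup toward base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    if epoch >= decay_epoch:
        # "decrease the learning rate by 10" at the 12-th epoch
        return base_lr * decay_factor
    return base_lr
```

In a PyTorch setup this would typically be expressed with `torch.optim.lr_scheduler` (e.g. `LinearLR` followed by `StepLR`), but the standalone function makes the schedule explicit.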