CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification

Authors: Jiyang Xu, Qi Wang, Xin Xiong, Di Gai, Ruihua Zhou, Dong Wang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments verify that the proposed method is capable of effectively obtaining cross-modal description ability from multiple views. Extensive experimental results on the VeRi-776 and VehicleID datasets demonstrate that our method outperforms other related ones in terms of view variations and cross-modal performance.
Researcher Affiliation Academia 1. School of Mathematics and Computer Sciences, Nanchang University; 2. Department of Information, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University; 3. School of Software, Nanchang University
Pseudocode No The paper describes the methods in detailed textual explanations and uses figures to illustrate the framework, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository.
Open Datasets Yes Datasets and Evaluation Protocols VeRi-776 (Liu et al. 2016b) is the benchmark dataset for the vehicle Re-ID task... VehicleID (Liu et al. 2016a) is a large-scale dataset for vehicle Re-ID
Dataset Splits Yes VeRi-776 (Liu et al. 2016b) is the benchmark dataset for the vehicle Re-ID task, which includes 776 unique vehicles captured by 20 cameras. The dataset is divided into a training set of 37,778 images from 576 vehicles and a testing set of 11,579 images from 200 vehicles. VehicleID (Liu et al. 2016a) is a large-scale dataset for vehicle Re-ID, which includes 221,763 images of 26,267 vehicles. To accommodate varying testing requirements across scales, the test set is subdivided into three subsets (Test800, Test1600, and Test2400) of different sizes.
Hardware Specification Yes All experiments are conducted on two NVIDIA RTX 3090 GPUs.
Software Dependencies No The paper mentions using a pre-trained CLIP-B/16 as a backbone and the Adam optimizer, but does not specify version numbers for any software libraries or dependencies.
Experiment Setup Yes We employ a pre-trained CLIP-B/16 as our backbone. During the training phase, we train with a batch size of 64 for 50 epochs, each epoch consisting of 600 iterations. We employ the Adam optimizer to update model weights, with the learning rate decaying by a factor of 10 every 20 epochs. We utilize random erasing, random cropping, and random horizontal flipping, each with a 0.5 probability, as data augmentation techniques to enhance the dataset.
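The schedule described above (learning rate divided by 10 every 20 epochs over a 50-epoch run) can be sketched as a small step-decay function. This is a minimal illustration, not the authors' code; the paper excerpt quoted here does not state the initial learning rate, so BASE_LR below is a placeholder assumption.

```python
# Step-decay learning-rate schedule matching the reported setup:
# 50 epochs total, LR divided by 10 every 20 epochs.
BASE_LR = 3.5e-4  # placeholder assumption; the initial LR is not quoted in this report


def lr_at_epoch(epoch: int, base_lr: float = BASE_LR) -> float:
    """Return the learning rate for a given epoch under 10x decay every 20 epochs."""
    return base_lr * (0.1 ** (epoch // 20))


# Epochs 0-19 use base_lr, epochs 20-39 use base_lr/10,
# and epochs 40-49 use base_lr/100.
```

In a typical PyTorch setup this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)` wrapped around an Adam optimizer.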