CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification
Authors: Jiyang Xu, Qi Wang, Xin Xiong, Di Gai, Ruihua Zhou, Dong Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments verify that the proposed method is capable of effectively obtaining cross-modal description ability from multiple views. Extensive experimental results on the VeRi-776 and VehicleID datasets demonstrate that our method outperforms other related ones in terms of view variations and cross-modal performance. |
| Researcher Affiliation | Academia | 1School of Mathematics and Computer Sciences, Nanchang University; 2Department of Information, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University; 3School of Software, Nanchang University |
| Pseudocode | No | The paper describes the methods in detailed textual explanations and uses figures to illustrate the framework, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Datasets and Evaluation Protocols VeRi-776 (Liu et al. 2016b) is the benchmark dataset for vehicle Re-ID task... VehicleID (Liu et al. 2016a) is a large-scale dataset for vehicle Re-ID |
| Dataset Splits | Yes | VeRi-776 (Liu et al. 2016b) is the benchmark dataset for vehicle Re-ID task, which includes 776 unique vehicles captured by 20 cameras. The entire dataset is divided into a training set consisting of 37,778 images from 576 vehicles and a testing set consisting of 11,579 images from 200 vehicles. VehicleID (Liu et al. 2016a) is a large-scale dataset for vehicle Re-ID, which includes 221,763 images of 26,267 vehicles. To accommodate the varying testing requirements across scales, the test set is subdivided into three subsets (Test800, Test1600, and Test2400) with separate sizes. |
| Hardware Specification | Yes | All experiments are conducted on two NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using a pre-trained CLIP-B/16 as a backbone and the Adam optimizer, but does not specify version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | We employ a pre-trained CLIP-B/16 as our backbone. During the training phase, we train with batches of 64 and 50 epochs, each epoch consisting of 600 iterations. We employ the Adam optimizer to update model weights, setting the initial learning rate to decay by 10 times every 20 epochs. We utilize random erasure, random cropping, and random horizontal flipping, each with a 0.5 probability, as data augmentation techniques to enhance the dataset. |
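The training schedule quoted in the Experiment Setup row (50 epochs of 600 iterations, learning rate decayed by 10× every 20 epochs) can be sketched as a simple step-decay function. This is a minimal illustration, not the authors' code; the excerpt does not state the initial learning rate, so `BASE_LR` below is a placeholder value.

```python
def step_decay_lr(initial_lr: float, epoch: int,
                  decay_every: int = 20, decay_factor: float = 0.1) -> float:
    """Learning rate after `epoch` completed epochs: decayed by 10x
    every `decay_every` epochs, as described in the paper's setup."""
    return initial_lr * decay_factor ** (epoch // decay_every)

# Placeholder initial LR -- the excerpt does not give the actual value.
BASE_LR = 1e-4
EPOCHS, ITERS_PER_EPOCH = 50, 600

schedule = [step_decay_lr(BASE_LR, e) for e in range(EPOCHS)]
total_iterations = EPOCHS * ITERS_PER_EPOCH  # 30,000 optimizer steps
```

With these numbers the rate drops twice over training: epochs 0–19 run at `BASE_LR`, epochs 20–39 at one tenth of it, and epochs 40–49 at one hundredth.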