Diffusion-based Synthetic Data Generation for Visible-Infrared Person Re-Identification
Authors: Wenbo Dai, Lijing Lu, Zhihang Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that VI-ReID models trained on synthetic data produced by DiVE consistently exhibit notable enhancements. In particular, the state-of-the-art method, CAJ, trained with synthetic images, achieves an improvement of about 9% in mAP over the baseline on the LLCM dataset. Extensive ablation studies have validated the effectiveness of each proposed module. |
| Researcher Affiliation | Academia | 1 Nanjing Tech University, Nanjing, China; 2 Peking University, Beijing, China; 3 Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper describes the methodology conceptually with mathematical formulations and high-level steps but does not contain a dedicated pseudocode block or algorithm section. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on two VI-ReID datasets, namely SYSU-MM01 (Wu et al. 2020) and LLCM (Zhang and Wang 2023), as well as two RGB person re-identification datasets, Market-1501 (Zheng et al. 2015) and CUHK03-NP (Zhong et al. 2017; Li et al. 2014). |
| Dataset Splits | No | The paper mentions using established datasets like SYSU-MM01, LLCM, Market-1501, and CUHK03-NP and states, "Following common practices, we adopt the cumulative matching characteristics (CMC) and mean average precision (mAP) as evaluation metrics. Additionally, all the reported results are the average of 10 trials." However, it does not explicitly provide the specific training/validation/test splits, percentages, or methodology used for these datasets beyond referring to common practices. |
| Hardware Specification | No | The paper mentions using "Stable Diffusion 1.5 as the base model" but does not specify the hardware (e.g., GPU model, CPU type, memory) used for training or inference. |
| Software Dependencies | No | The paper mentions using "Stable Diffusion 1.5 as the base model" and "DPM-Solver++ (Lu et al. 2022) as the sampling scheduler." However, it does not provide specific version numbers for other key software dependencies or libraries (e.g., PyTorch, TensorFlow, Python version) that would be needed for reproducibility. |
| Experiment Setup | Yes | Our method uses Stable Diffusion 1.5 as the base model, fine-tuning only the LoRA weights and textual embeddings. The rank of LoRA is set to 128, and each modality is assigned a unique 8-character identifier (e.g., b8zBXKoH). During the training phase, all input images are resized to 512×256 pixels and augmented with horizontal flips to enhance model robustness. The learning rate is set to 5×10⁻⁵. The batch size is configured to 16, and the total number of training steps is set to 400,000. For image generation, we utilize 25 timesteps and adopt DPM-Solver++ (Lu et al. 2022) as the sampling scheduler. |
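
The hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal illustration only: the dictionary keys, names, and structure are assumptions for readability, while the values come from the paper's stated setup.

```python
# Hypothetical configuration sketch of the DiVE training/sampling settings
# reported in the paper (key names are illustrative, not from the paper).

TRAIN_CONFIG = {
    "base_model": "stable-diffusion-1.5",
    "trainable": ["lora_weights", "textual_embeddings"],  # rest of the model is frozen
    "lora_rank": 128,
    "identifier_length": 8,          # e.g. "b8zBXKoH", one per modality
    "image_size": (512, 256),        # resize target in pixels
    "augmentations": ["horizontal_flip"],
    "learning_rate": 5e-5,
    "batch_size": 16,
    "max_train_steps": 400_000,
}

SAMPLING_CONFIG = {
    "scheduler": "DPM-Solver++",     # Lu et al. 2022
    "num_inference_steps": 25,
}
```

Laying the settings out this way makes it easy to spot which reproducibility details the paper does report (optimizer hyperparameters, sampler, step counts) versus those it omits (hardware, library versions).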