Diffusion-based Synthetic Data Generation for Visible-Infrared Person Re-Identification

Authors: Wenbo Dai, Lijing Lu, Zhihang Li

AAAI 2025

Reproducibility (Variable — Result — LLM Response)
Research Type — Experimental — Experiments demonstrate that VI-ReID models trained on synthetic data produced by DiVE consistently exhibit notable improvements. In particular, the state-of-the-art method CAJ, trained with synthetic images, achieves an improvement of about 9% in mAP over the baseline on the LLCM dataset. Extensive ablation studies validate the effectiveness of each proposed module.
Researcher Affiliation — Academia — ¹Nanjing Tech University, Nanjing, China; ²Peking University, Beijing, China; ³Chinese Academy of Sciences, Beijing, China
Pseudocode — No — The paper describes the methodology conceptually with mathematical formulations and high-level steps but does not contain a dedicated pseudocode block or algorithm section.
Open Source Code — No — The paper does not contain an explicit statement about releasing the source code or a link to a code repository.
Open Datasets — Yes — We evaluate our method on two VI-ReID datasets, namely SYSU-MM01 (Wu et al. 2020) and LLCM (Zhang and Wang 2023), as well as two RGB person re-identification datasets, Market-1501 (Zheng et al. 2015) and CUHK03-NP (Zhong et al. 2017; Li et al. 2014).
Dataset Splits — No — The paper mentions using established datasets (SYSU-MM01, LLCM, Market-1501, and CUHK03-NP) and states, "Following common practices, we adopt the cumulative matching characteristics (CMC) and mean average precision (mAP) as evaluation metrics. Additionally, all the reported results are the average of 10 trials." However, it does not explicitly provide the specific training/validation/test splits, percentages, or methodology used for these datasets beyond referring to common practices.
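For reference, the CMC and mAP metrics mentioned above can be computed from a query–gallery distance matrix as in the minimal sketch below. This is an illustrative implementation only: it omits benchmark-specific filtering (e.g., same-camera exclusions used in SYSU-MM01 and LLCM evaluation protocols), and the function and argument names are ours.

```python
def evaluate_ranking(dist, query_ids, gallery_ids, topk=10):
    """Compute mAP and the CMC curve from a query x gallery distance matrix.

    dist[qi][gi] is the distance between query qi and gallery item gi;
    query_ids / gallery_ids are the ground-truth identity labels.
    """
    num_q = len(query_ids)
    cmc = [0.0] * topk
    aps = []
    for qi in range(num_q):
        # Rank gallery items by ascending distance to this query.
        order = sorted(range(len(gallery_ids)), key=lambda gi: dist[qi][gi])
        matches = [int(gallery_ids[gi] == query_ids[qi]) for gi in order]
        if sum(matches) == 0:
            continue  # this query has no true match in the gallery
        # CMC: a query counts as a hit at rank k if the first correct
        # match appears within the top-k results.
        first_hit = matches.index(1)
        for k in range(first_hit, topk):
            cmc[k] += 1.0
        # AP: mean of the precision values at each correct-match rank.
        hits, precisions = 0, []
        for rank, m in enumerate(matches, start=1):
            if m:
                hits += 1
                precisions.append(hits / rank)
        aps.append(sum(precisions) / len(precisions))
    cmc = [c / num_q for c in cmc]
    mAP = sum(aps) / len(aps)
    return mAP, cmc
```

The reported "average of 10 trials" would then correspond to running this evaluation over ten random query/gallery draws and averaging the resulting mAP and CMC values.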
Hardware Specification — No — The paper mentions using "Stable Diffusion 1.5 as the base model" but does not specify the hardware (e.g., GPU model, CPU type, memory) used for training or inference.
Software Dependencies — No — The paper mentions using "Stable Diffusion 1.5 as the base model" and "DPMSolver++ (Lu et al. 2022) as the sampling scheduler." However, it does not provide specific version numbers for other key software dependencies or libraries (e.g., PyTorch, TensorFlow, Python version) that would be needed for reproducibility.
Experiment Setup — Yes — Our method uses Stable Diffusion 1.5 as the base model, fine-tuning only the LoRA weights and textual embeddings. The rank of LoRA is set to 128, and each modality is assigned a unique 8-character identifier (e.g., b8zBXKoH). During the training phase, all input images are resized to 512×256 pixels and augmented with horizontal flips to enhance model robustness. The learning rate is set to 5×10⁻⁵. The batch size is configured to 16, and the total number of training steps is set to 400,000. For image generation, we use 25 sampling timesteps and adopt DPMSolver++ (Lu et al. 2022) as the sampling scheduler.
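The hyperparameters reported in this row can be gathered into a single configuration sketch, which is useful when attempting a reproduction. The key names below are ours (the paper defines no config schema), and the values simply transcribe the stated settings:

```python
# Reported DiVE training/sampling settings collected into one place.
# Key names are illustrative assumptions, not from the paper.
dive_config = {
    "base_model": "Stable Diffusion 1.5",
    "trainable": ["lora_weights", "textual_embeddings"],  # all else frozen
    "lora_rank": 128,
    "modality_identifier_length": 8,   # e.g. a unique token like "b8zBXKoH"
    "image_size": (512, 256),          # height x width in pixels
    "augmentation": ["horizontal_flip"],
    "learning_rate": 5e-5,
    "batch_size": 16,
    "train_steps": 400_000,
    "sampling": {
        "scheduler": "DPMSolver++",    # Lu et al. 2022
        "num_inference_steps": 25,
    },
}
```

In a diffusers-based reproduction, these values would map onto the LoRA fine-tuning script arguments and the `DPMSolverMultistepScheduler` used at inference time.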