DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval
Authors: Yating Liu, Zimo Liu, Xiangyuan Lan, Wenming Yang, Yaowei Li, Qingmin Liao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our DM-Adapter achieves state-of-the-art performance, outperforming previous methods by a significant margin. The paper includes sections like "Experimental Results", "Ablation Study", and "Hyper-parameter Analysis". |
| Researcher Affiliation | Academia | The authors are affiliated with "Shenzhen International Graduate School, Tsinghua University, China", "Pengcheng Laboratory, China", "School of ECE, Peking University, China", and "Pazhou Laboratory (Huangpu), China". Their email addresses use domains such as "mails.tsinghua.edu.cn", "pcl.ac.cn", "sz.tsinghua.edu.cn", "stu.pku.edu.cn", and "tsinghua.edu.cn", all indicating academic or public research institutions. |
| Pseudocode | No | The paper describes the methodology using mathematical formulations (e.g., Equation (1) to (9)) and architectural diagrams (Figure 3 and 4), but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Code: https://github.com/Liu-Yating/DM-Adapter |
| Open Datasets | Yes | The paper utilizes well-known and cited public datasets: "CUHK-PEDES (Li et al. 2017)", "ICFG-PEDES (Ding et al. 2021)", and "RSTPReid (Zhu et al. 2021)". |
| Dataset Splits | Yes | For CUHK-PEDES: "The training set consists of 11,003 identities with 34,054 images and 68,126 texts. Both the validation set and test set have 1,000 identities." For ICFG-PEDES: "The training and test sets contain 3,102 identities and 1,000 identities respectively." For RSTPReid: "The training, validation and test sets contain 3,701 identities with 18,505 images, 200 identities with 1,000 images, and 200 identities with 1,000 images respectively." |
| Hardware Specification | Yes | We perform experiments on a single NVIDIA 4090 24GB GPU. |
| Software Dependencies | No | The paper mentions using a pre-trained CLIP ViT-B/16 as the image encoder, the CLIP text Transformer as the text encoder, and the Adam optimizer, but does not provide specific version numbers for any software libraries or dependencies such as PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | The image is resized to 384 × 128, and the length of the textual token sequence is 77. The model is trained with the Adam optimizer for 60 epochs, with a batch size of 128 and an initial learning rate of 3 × 10⁻⁴. The reduction parameter, which sets the bottleneck dimension in the adapter, is 8, following CSKT (Liu et al. 2024a). Top-K is set to 2, and the number of experts is 6. The hyperparameter α weighting the auxiliary loss is set to 0.5. |
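The reported setup (6 experts, Top-K = 2, bottleneck reduction 8, auxiliary-loss weight α = 0.5) can be illustrated with a minimal sketch of a top-k mixture-of-adapters layer. This is a hypothetical PyTorch reconstruction based only on the hyperparameters quoted above, not the authors' released code; the class and variable names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoAdapterSketch(nn.Module):
    """Hypothetical top-k mixture-of-adapters layer (not the authors' code).

    Each expert is a bottleneck adapter (dim -> dim/reduction -> dim); a
    linear router picks the top-k experts per token, and a simple
    load-balancing auxiliary loss (weighted by alpha) discourages
    expert collapse.
    """

    def __init__(self, dim=512, num_experts=6, top_k=2, reduction=8, alpha=0.5):
        super().__init__()
        hidden = dim // reduction  # bottleneck dimension (reduction = 8 in the paper)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)
        self.top_k, self.alpha = top_k, alpha

    def forward(self, x):
        # x: (num_tokens, dim)
        probs = self.router(x).softmax(dim=-1)        # (tokens, experts)
        top_p, top_i = probs.topk(self.top_k, dim=-1)  # routing weights / indices
        out = torch.zeros_like(x)
        for slot in range(self.top_k):  # weighted sum over the k chosen experts
            for e, expert in enumerate(self.experts):
                mask = top_i[:, slot] == e
                if mask.any():
                    out[mask] += top_p[mask, slot, None] * expert(x[mask])
        # Auxiliary load-balancing term: mean router prob times mean expert usage.
        usage = F.one_hot(top_i, len(self.experts)).float().mean(dim=(0, 1))
        aux_loss = self.alpha * len(self.experts) * (probs.mean(0) * usage).sum()
        return x + out, aux_loss  # residual adapter output plus auxiliary loss
```

In use, `aux_loss` would be added to the retrieval objective so the router spreads tokens across experts rather than routing everything to one adapter.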