MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt
Authors: Yuhao Wang, Xuehu Liu, Tianyu Yan, Yang Liu, Aihua Zheng, Pingping Zhang, Huchuan Lu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three multi-modal object Re-ID benchmarks (i.e., RGBNT201, RGBNT100 and MSVR310) validate the effectiveness of our proposed methods. |
| Researcher Affiliation | Academia | (1) School of Future Technology, School of Artificial Intelligence, Dalian University of Technology; (2) School of Computer Science and Artificial Intelligence, Wuhan University of Technology; (3) School of Artificial Intelligence, Anhui University; (4) Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/924973292/MambaPro. |
| Open Datasets | Yes | To fully evaluate the performance of our method, we conduct experiments on three multi-modal object Re-ID benchmarks. Specifically, RGBNT201 (Zheng et al. 2021) is a multi-modal person Re-ID dataset comprising RGB, NIR and TIR images. RGBNT100 (Li et al. 2020) is a large-scale multi-modal vehicle Re-ID dataset with diverse visual challenges, such as abnormal lighting, glaring and occlusion. MSVR310 (Zheng et al. 2022) is a small-scale multi-modal vehicle Re-ID dataset with more challenges. |
| Dataset Splits | Yes | To fully evaluate the performance of our method, we conduct experiments on three multi-modal object Re-ID benchmarks. Specifically, RGBNT201 (Zheng et al. 2021)... RGBNT100 (Li et al. 2020)... MSVR310 (Zheng et al. 2022)... For small-scale datasets (i.e., RGBNT201 and MSVR310), the mini-batch size is set to 64, with 4 images sampled for each identity and 16 identities sampled in a batch. For the large-scale dataset, i.e., RGBNT100, the mini-batch size is set to 128, with 16 images sampled for each identity. |
| Hardware Specification | Yes | Our model is implemented by using the PyTorch toolbox with one NVIDIA A100 GPU. |
| Software Dependencies | No | Our model is implemented by using the PyTorch toolbox with one NVIDIA A100 GPU. The paper mentions PyTorch but does not provide a specific version number, nor does it list other software dependencies with version numbers. |
| Experiment Setup | Yes | Implementation Details. Our model is implemented by using the PyTorch toolbox with one NVIDIA A100 GPU. We employ the pre-trained image encoder of CLIP (Radford et al. 2021) as the backbone. For the input resolution, images are resized to 256×128 for RGBNT201 and 128×256 for RGBNT100/MSVR310. For data augmentation, we employ random horizontal flipping, cropping and erasing (Zhong et al. 2020). For small-scale datasets (i.e., RGBNT201 and MSVR310), the mini-batch size is set to 64, with 4 images sampled for each identity and 16 identities sampled in a batch. For the large-scale dataset, i.e., RGBNT100, the mini-batch size is set to 128, with 16 images sampled for each identity. We set λ1 to 0.25 and λ2 to 1.0. We use the Adam optimizer to fine-tune the model with a learning rate of 3.5e-4. The warmup strategy with a cosine decay is used for learning rate scheduling. We set the total number of training epochs to 60 for RGBNT201/MSVR310 and 30 for RGBNT100, respectively. |
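The learning-rate schedule quoted in the Experiment Setup row (Adam with a base rate of 3.5e-4, linear warmup followed by cosine decay, 60 or 30 total epochs) can be sketched as a standalone function. This is a minimal illustration, not the authors' code: the warmup length (`warmup_epochs`) and the floor learning rate (`min_lr`) are assumptions, since the paper does not state them.

```python
import math

def lr_at_epoch(epoch, total_epochs=60, base_lr=3.5e-4,
                warmup_epochs=5, min_lr=1e-6):
    """Warmup + cosine-decay schedule; warmup_epochs and min_lr are assumed."""
    if epoch < warmup_epochs:
        # Linear warmup: ramp from 0 up to base_lr over the first epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay: anneal from base_lr down to min_lr over remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In an actual PyTorch training loop, the same shape is usually obtained by composing a warmup scheduler with `torch.optim.lr_scheduler.CosineAnnealingLR` on top of `torch.optim.Adam`.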