DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Authors: Yuhao Wang, Yang Liu, Aihua Zheng, Pingping Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three object ReID benchmarks verify the effectiveness of our methods.
Researcher Affiliation | Academia | (1) School of Future Technology, School of Artificial Intelligence, Dalian University of Technology; (2) School of Artificial Intelligence, Anhui University; (3) Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in prose and uses diagrams (e.g., Figure 2) to illustrate the framework, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The detailed configurations and results are available at https://github.com/924973292/DeMo.
Open Datasets | Yes | We evaluate the proposed method on three multi-modal object ReID benchmarks. To be specific, RGBNT201 (Zheng et al. 2021) is a multi-modal person ReID dataset, consisting of 4,787 aligned RGB, NIR and TIR images from 201 identities. RGBNT100 (Li et al. 2020) is a large-scale multi-modal vehicle ReID dataset with 17,250 image triples, covering a wide range of challenging visual conditions. MSVR310 (Zheng et al. 2022) is a small-scale multi-modal vehicle ReID dataset with 2,087 image triples, featuring high-quality images captured across diverse environments and time spans.
Dataset Splits | Yes | We evaluate the proposed method on three multi-modal object ReID benchmarks. To be specific, RGBNT201 (Zheng et al. 2021) is a multi-modal person ReID dataset, consisting of 4,787 aligned RGB, NIR and TIR images from 201 identities. RGBNT100 (Li et al. 2020) is a large-scale multi-modal vehicle ReID dataset with 17,250 image triples, covering a wide range of challenging visual conditions. MSVR310 (Zheng et al. 2022) is a small-scale multi-modal vehicle ReID dataset with 2,087 image triples, featuring high-quality images captured across diverse environments and time spans.
Hardware Specification | Yes | Our model is implemented using PyTorch with an NVIDIA A100 GPU.
Software Dependencies | No | Our model is implemented using PyTorch with an NVIDIA A100 GPU.
Experiment Setup | Yes | Images in triples are resized to 256×128 for RGBNT201 and 128×256 for RGBNT100/MSVR310. For data augmentation, we apply random horizontal flipping, cropping and erasing (Zhong et al. 2020). For RGBNT201 and MSVR310, the mini-batch size is set to 64, sampling 8 images per identity. For RGBNT100, the mini-batch size is 128 with 16 images per identity. We fine-tune the proposed modules using the Adam optimizer with a learning rate of 3.5e-4 and a smaller learning rate of 5e-6 for the visual encoder. The total number of training epochs is 50. The number of experts nd is set to 7.
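
The batch settings in the Experiment Setup row imply identity-balanced sampling: 64 images at 8 per identity gives 8 identities per batch, and 128 images at 16 per identity also gives 8. The following is a minimal sketch of such a sampler in plain Python. The function name `pk_batches`, the toy label list, and the choice to sample with replacement when an identity has too few images are illustrative assumptions and are not taken from the DeMo code.

```python
import random
from collections import defaultdict


def pk_batches(labels, batch_size, images_per_id, seed=0):
    """Yield lists of sample indices with `images_per_id` images per identity.

    Hypothetical helper: a sketch of identity-balanced sampling,
    not the sampler used in the DeMo repository.
    """
    if batch_size % images_per_id != 0:
        raise ValueError("batch_size must be a multiple of images_per_id")
    ids_per_batch = batch_size // images_per_id
    rng = random.Random(seed)

    # Group sample indices by identity label.
    by_id = defaultdict(list)
    for index, label in enumerate(labels):
        by_id[label].append(index)

    identities = list(by_id)
    rng.shuffle(identities)

    # Walk over the shuffled identities in groups of `ids_per_batch`;
    # a trailing incomplete group is dropped.
    for start in range(0, len(identities) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for identity in identities[start:start + ids_per_batch]:
            pool = by_id[identity]
            if len(pool) >= images_per_id:
                batch.extend(rng.sample(pool, images_per_id))
            else:
                # Assumption: sample with replacement when too few images exist.
                batch.extend(rng.choices(pool, k=images_per_id))
        yield batch


# Toy data: 16 identities with 10 images each (not a real dataset).
toy_labels = [identity for identity in range(16) for _ in range(10)]
batches = list(pk_batches(toy_labels, batch_size=64, images_per_id=8))
print(len(batches), len(batches[0]))
```

On the toy data, each batch holds 64 indices drawn from 8 distinct identities. Passing `batch_size=128, images_per_id=16` would mirror the RGBNT100 setting in the same way.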