Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification

Authors: Zhen Sun, Lei Tan, Yunhang Shen, Chengmao Cai, Xing Sun, Pingyang Dai, Liujuan Cao, Rongrong Ji

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that FlexiReID achieves state-of-the-art performance and offers strong generalization in complex scenarios. To facilitate comprehensive evaluation, we construct CIRS-PEDES, a unified dataset extending four popular Re-ID datasets to include all four modalities. Experimental results on the CIRS-PEDES show that the proposed FlexiReID outperforms several other state-of-the-art methods, demonstrating its flexibility and effectiveness in complex scenes.
Researcher Affiliation Collaboration 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China. 2National University of Singapore. 3Tencent YouTu Lab. 4Institute of Artificial Intelligence, Xiamen University, Xiamen, China. Correspondence to: Pingyang Dai <EMAIL>.
Pseudocode No The paper describes the methodology using mathematical equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not provide any explicit statement about releasing the source code or a link to a code repository for the described methodology.
Open Datasets Yes To facilitate comprehensive evaluation, we construct CIRS-PEDES, a unified dataset extending four popular Re-ID datasets to include all four modalities. We introduce four datasets: CUHK-PEDES, ICFG-PEDES, RSTPReid, and SYSU-MM01. An overview of training and test set partitioning for each dataset can be found in the existing work (Ding et al., 2021; Zhu et al., 2021; Wu et al., 2020).
Dataset Splits Yes An overview of training and test set partitioning for each dataset can be found in the existing work (Ding et al., 2021; Zhu et al., 2021; Wu et al., 2020). Following existing cross-modality Re-ID settings (Chen et al., 2022a; Ye et al., 2021b;c), we use the Rank-k matching accuracy, mean Average Precision (mAP), and mean Inverse Negative Penalty (mINP) (Ye et al., 2021c) metrics for performance evaluation in our FlexiReID.
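The Rank-k, mAP, and mINP metrics quoted above can be sketched for a single query as follows. This is a minimal illustration of the standard definitions (mINP per Ye et al., 2021c), not the authors' evaluation code; the `matches` input is an assumed boolean ranking of gallery items, best match first:

```python
def eval_single_query(matches, k=1):
    """Evaluate one Re-ID query against its ranked gallery.

    matches: list of bools, True where the gallery item at that rank
             shares the query identity (ranked best-first).
    Returns (rank-k hit, average precision, inverse negative penalty).
    """
    num_pos = sum(matches)
    assert num_pos > 0, "query needs at least one positive in the gallery"

    # Rank-k: is there a correct match within the top-k results?
    rank_k_hit = any(matches[:k])

    # Average precision: mean of precision@rank taken at each positive's rank.
    hits, precisions = 0, []
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    ap = sum(precisions) / num_pos

    # Inverse negative penalty: number of positives divided by the rank
    # of the hardest (last-retrieved) positive.
    hardest_rank = max(r for r, m in enumerate(matches, start=1) if m)
    inp = num_pos / hardest_rank

    return rank_k_hit, ap, inp
```

Averaging `ap` and `inp` over all queries yields mAP and mINP respectively; averaging `rank_k_hit` yields the Rank-k accuracy.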
Hardware Specification Yes We perform experiments on a single NVIDIA 3090 24GB GPU.
Software Dependencies No We employ the Vision Transformer (Dosovitskiy, 2020) as the visual modalities feature learning backbone, and the Transformer model (Vaswani, 2017) as the textual modality feature learning backbone. Both backbones have pre-trained parameters derived from CLIP (Radford et al., 2021). During the training process, all parameters of the backbone networks are frozen. ... We train our FlexiReID model with the Adam optimizer for 60 epochs.
Experiment Setup Yes In a batch, we randomly select 64 identities, each containing a sketch, an infrared, a text, and an RGB sample. The image is resized to 384 x 128, and the length of the textual token sequence is 77. We train our FlexiReID model with the Adam optimizer for 60 epochs. The initial learning rate is set to 1e-5 and decayed by a cosine schedule. The threshold confidence level is set to 0.6, and the number of experts is 6. The hyperparameter λ weighting the adaptive loss is set to 0.5.
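The cosine-decayed learning-rate schedule described in the setup can be written as a small helper. This is a sketch of the standard cosine-annealing formula under the quoted hyperparameters (60 epochs, base rate 1e-5), not the authors' implementation:

```python
import math

def cosine_lr(epoch, total_epochs=60, base_lr=1e-5):
    # Standard cosine annealing: decays from base_lr at epoch 0
    # down to 0 at total_epochs.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

With these values, the rate starts at 1e-5, reaches 5e-6 at the halfway point (epoch 30), and decays to 0 by epoch 60.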