RealisID: Scale-Robust and Fine-Controllable Identity Customization via Local and Global Complementation

Authors: Zhaoyang Sun, Fei Du, Weihua Chen, Fan Wang, Yaxiong Chen, Yi Rong, Shengwu Xiong

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablation studies indicate the effectiveness of RealisID and verify its ability to fulfill all the requirements mentioned above. |
| Researcher Affiliation | Collaboration | Wuhan University of Technology; DAMO Academy, Alibaba Group; Hupan Laboratory; Shanghai AI Laboratory; Interdisciplinary Artificial Intelligence Research Institute, Wuhan College |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper neither states that its source code will be released nor links to a code repository. |
| Open Datasets | Yes | The trainable parameters in our RealisID model are learned from the publicly available CosmicMan dataset (Li et al. 2024a), which comprises 2 million image-text pairs of single individuals. Our evaluation data consists of 40 unseen identities obtained from another CelebA-HQ (Karras et al. 2017) dataset. |
| Dataset Splits | No | The paper states that the CosmicMan dataset (Li et al. 2024a) is used for training and that 40 unseen identities from the CelebA-HQ dataset (Karras et al. 2017) are used for evaluation. However, it does not give explicit split details, such as percentages or sample counts for training, validation, and testing, or the methodology for selecting the unseen identities, which limits reproducibility. |
| Hardware Specification | Yes | The framework is optimized on 8 NVIDIA H20 GPUs, using the Adam optimizer with a batch size of 16, a learning rate of 1e-5, and a weight decay of 1e-2. |
| Software Dependencies | No | The paper mentions using MTCNN, BiSeNet, MediaPipe, SDXL-1.0, and IP-Adapter. However, it does not provide version numbers for these software dependencies or for any other libraries needed to replicate the experiments. |
| Experiment Setup | Yes | During the training phase, we follow the learning strategy of IP-Adapter (Ye et al. 2023) that randomly drops either the image prompt (i.e., ID embedding) or the text prompt or both of them with a probability of 0.05. The hyperparameter λ in Eq. (7) is set to 1.0. The framework is optimized on 8 NVIDIA H20 GPUs, using the Adam optimizer with a batch size of 16, a learning rate of 1e-5, and a weight decay of 1e-2. For inference, we adopt the same delayed subject conditioning technique as in (Xiao et al. 2023). We set λ_t = 7.5 and λ_i = 5.0 in Eq. (8), and use a 30-step DDIM (Song, Meng, and Ermon 2020) sampler to generate the target images. |
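The Experiment Setup row lists concrete hyperparameters (0.05 conditioning-dropout probability, λ_t = 7.5, λ_i = 5.0 for guidance in Eq. (8)). The sketch below is a minimal, framework-free illustration of how those two pieces could be wired up; the function names, the independent-dropout approximation, and the particular dual-scale guidance decomposition are assumptions for illustration, not the paper's released implementation.

```python
import random

# Hyperparameters reported in the paper's experiment setup.
DROP_PROB = 0.05   # conditioning-dropout probability (IP-Adapter strategy)
LAMBDA_T = 7.5     # text guidance scale, lambda_t in Eq. (8)
LAMBDA_I = 5.0     # image (ID) guidance scale, lambda_i in Eq. (8)


def drop_prompts(id_embed, text_embed, p=DROP_PROB, rng=random):
    """Conditioning dropout for classifier-free guidance training.

    Zeroes out the image prompt (ID embedding) and/or the text prompt,
    each with probability p. Note: the paper drops either prompt or
    both with probability 0.05; independent drops are an approximation.
    """
    if rng.random() < p:
        id_embed = [0.0] * len(id_embed)
    if rng.random() < p:
        text_embed = [0.0] * len(text_embed)
    return id_embed, text_embed


def guided_noise(eps_uncond, eps_text, eps_full,
                 lambda_t=LAMBDA_T, lambda_i=LAMBDA_I):
    """One plausible dual-scale guidance combination for Eq. (8).

    Blends the unconditional, text-conditioned, and fully conditioned
    noise predictions with separate text and image guidance scales.
    """
    return [eu + lambda_t * (et - eu) + lambda_i * (ef - et)
            for eu, et, ef in zip(eps_uncond, eps_text, eps_full)]
```

In a real pipeline these lists would be tensors of noise predictions from the denoising UNet, and `guided_noise` would be called once per DDIM step (30 steps in the paper's setup).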