UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
Authors: Qin Guo, Ailing Zeng, Dongxu Yue, Ceyuan Yang, Yang Cao, Hanzhong Guo, Fei Shen, Wei Liu, Xihui Liu, Dan Xu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the high quality of HAIG-2.9M and the effectiveness of UNIMC, particularly in heavy occlusions and multi-class scenarios. 5. Experiments Implementation Details. ... For evaluation, we utilize the testing set of HAIG-2.9M. ... Evaluation Metrics. We adopt commonly-used metrics for comprehensive comparisons from five perspectives: 1) Image Quality. FID (Heusel et al., 2017) and KID (Bińkowski et al., 2018) reflect quality and diversity; 2) Text-Image Alignment. CLIP (Radford et al., 2021) text-image similarity is reported; 3) Class Accuracy. ... 4) Pose Accuracy. ... 5) Human Subjective Evaluation. |
| Researcher Affiliation | Collaboration | 1The Hong Kong University of Science and Technology 2Tencent 3Peking University 4The Chinese University of Hong Kong 5The University of Hong Kong 6National University of Singapore. Correspondence to: Dan Xu <EMAIL>. |
| Pseudocode | No | The paper describes the UNIMC framework and its components (unified keypoint encoder, timestep-aware keypoint modulator) in Section 3 and Figure 3, but does not present a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We propose HAIG-2.9M, a large-scale, high-quality, and diverse dataset designed for keypoint-guided human and animal image generation. Our License: Creative Common CC-BY 4.0 license. In A.1. Licenses section, it lists multiple image websites and datasets with their URLs and licenses, such as Pexels (Pexels, 2024): Creative Commons CC0 license. https://www.pexels.com/ and SA-1B8 (Kirillov et al., 2023): SA-1B Dataset Research License. https://ai.meta.com/datasets/segment-anything/. |
| Dataset Splits | Yes | Dataset Split. Detailed statistics for each subset of the dataset are provided in Tab. 2. First, for the testing set, we select 40 images for each class, ensuring that each class of images contains multiple classes. Then, we split the remaining images into training and validation sets at an approximately 20 : 1 ratio. We adopt a class-level partition to ensure the class proportions are balanced between the training and validation sets. The training set comprises 745K images and 2.7M instances, while the validation set consists of 39K images and 145K instances. Table 2. Split of HAIG-2.9M. Training Set 745,828 2,725,484... Validation Set 39,342 145,504... Testing Set 1,224 3,785... |
| Hardware Specification | Yes | We train at 1024×1024 resolution for 8K steps with a batch size of 256 using 8 A800 GPUs. |
| Software Dependencies | No | The paper mentions PIXART-α-1024px as the backbone model and the AdamW optimizer, but does not provide specific version numbers for any software libraries or programming languages used in the implementation. |
| Experiment Setup | Yes | Implementation Details. We use PIXART-α-1024px (Chen et al., 2024c) as backbone. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 0.03 and a fixed learning rate of 2e-5; we only train the unified keypoint encoder and the timestep-aware keypoint modulator. We train at 1024×1024 resolution for 8K steps with a batch size of 256 using 8 A800 GPUs. During training, we drop the bounding box condition with 50% probability, the keypoint condition with 15% probability, and the prompt with 10% probability. |
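
The condition-dropping schedule quoted above (bounding box 50%, keypoints 15%, prompt 10%) is a standard classifier-free-guidance-style training trick. A minimal sketch of how such independent per-condition dropping might look is below; all names (`DROP_PROBS`, `drop_conditions`) are illustrative assumptions, not taken from the paper's code, and a real implementation would typically substitute a learned null embedding rather than `None`.

```python
import random

# Drop probabilities as reported in the paper's implementation details.
DROP_PROBS = {
    "bbox": 0.50,       # bounding-box condition dropped 50% of the time
    "keypoints": 0.15,  # keypoint condition dropped 15% of the time
    "prompt": 0.10,     # text prompt dropped 10% of the time
}

def drop_conditions(sample: dict, rng: random.Random) -> dict:
    """Return a copy of `sample` with each condition independently nulled out."""
    out = dict(sample)
    for key, p in DROP_PROBS.items():
        if key in out and rng.random() < p:
            out[key] = None  # placeholder; real code might use a learned null embedding
    return out

# Sanity check: simulate many draws and measure empirical drop rates.
rng = random.Random(0)
n = 100_000
dropped = {k: 0 for k in DROP_PROBS}
for _ in range(n):
    s = drop_conditions({"bbox": 1, "keypoints": 1, "prompt": "a dog"}, rng)
    for k in DROP_PROBS:
        if s[k] is None:
            dropped[k] += 1
rates = {k: dropped[k] / n for k in DROP_PROBS}
```

Because each condition is dropped independently, the model sees every subset of conditions during training, which is what enables guidance with any combination of box, keypoint, and text inputs at inference time.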