DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Authors: Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive qualitative and quantitative experiments demonstrate the superiority of DisPose compared to current methods. Project page: https://github.com/lihxxx/DisPose. |
| Researcher Affiliation | Academia | Peking University; University of Science and Technology of China; Tsinghua University; Hong Kong University of Science and Technology |
| Pseudocode | No | The paper describes methods and procedures in prose, but does not contain any clearly labeled 'Pseudocode', 'Algorithm', or code-like formatted blocks. |
| Open Source Code | Yes | Project page: https://github.com/lihxxx/DisPose. |
| Open Datasets | Yes | Following previous works (Zhang et al., 2024; Wang et al., 2024b), we use sequences 335 to 340 from the TikTok (Jafarian & Park, 2021) dataset for testing. |
| Dataset Splits | Yes | We collected 3k human videos from the internet to train our model. For MusePose (Tong et al., 2024), we used stable-diffusion-v1-5 to initialize our hybrid ControlNet. We sampled 16 frames from each video and center cropped to a resolution of 512×512. Training was conducted for 20,000 steps with a batch size of 32. The learning rate was set to 1e-5. For MimicMotion (Zhang et al., 2024), we initialized our hybrid ControlNet using stable-video-diffusion-img2vid-xt. We sampled 16 frames from each video and center cropped to a resolution of 768×1024. Training was conducted for 10,000 steps with a batch size of 8. The learning rate was set to 2e-5. Following previous works (Zhang et al., 2024; Wang et al., 2024b), we use sequences 335 to 340 from the TikTok (Jafarian & Park, 2021) dataset for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions general training parameters. |
| Software Dependencies | No | The paper mentions several models and tools like DWPose, OpenPose, stable-diffusion-v1-5, and stable-video-diffusion-img2vid-xt, but does not specify software environment details or version numbers for libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For MusePose (Tong et al., 2024)... Training was conducted for 20,000 steps with a batch size of 32. The learning rate was set to 1e-5. For MimicMotion (Zhang et al., 2024)... Training was conducted for 10,000 steps with a batch size of 8. The learning rate was set to 2e-5. |