DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Authors: Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive qualitative and quantitative experiments demonstrate the superiority of DisPose compared to current methods. Project page: https://github.com/lihxxx/DisPose. |
| Researcher Affiliation | Academia | Peking University; University of Science and Technology of China; Tsinghua University; Hong Kong University of Science and Technology |
| Pseudocode | No | The paper describes methods and procedures in prose, but does not contain any clearly labeled 'Pseudocode', 'Algorithm', or code-like formatted blocks. |
| Open Source Code | Yes | Project page: https://github.com/lihxxx/DisPose. |
| Open Datasets | Yes | Following previous works (Zhang et al., 2024; Wang et al., 2024b), we use sequences 335 to 340 from the TikTok (Jafarian & Park, 2021) dataset for testing. |
| Dataset Splits | Yes | We collected 3k human videos from the internet to train our model. For MusePose (Tong et al., 2024), we used stable-diffusion-v1-5 to initialize our hybrid ControlNet. We sampled 16 frames from each video and center cropped to a resolution of 512×512. Training was conducted for 20,000 steps with a batch size of 32. The learning rate was set to 1e-5. For MimicMotion (Zhang et al., 2024), we initialized our hybrid ControlNet using stable-video-diffusion-img2vid-xt. We sampled 16 frames from each video and center cropped to a resolution of 768×1024. Training was conducted for 10,000 steps with a batch size of 8. The learning rate was set to 2e-5. Following previous works (Zhang et al., 2024; Wang et al., 2024b), we use sequences 335 to 340 from the TikTok (Jafarian & Park, 2021) dataset for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions general training parameters. |
| Software Dependencies | No | The paper mentions several models and tools like DWPose, OpenPose, stable-diffusion-v1-5, and stable-video-diffusion-img2vid-xt, but does not specify software environment details or version numbers for libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For MusePose (Tong et al., 2024)... Training was conducted for 20,000 steps with a batch size of 32. The learning rate was set to 1e-5. For MimicMotion (Zhang et al., 2024)... Training was conducted for 10,000 steps with a batch size of 8. The learning rate was set to 2e-5. |