DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Authors: Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive qualitative and quantitative experiments demonstrate the superiority of DisPose compared to current methods. Project page: https://github.com/lihxxx/DisPose. |
| Researcher Affiliation | Academia | 1 Peking University, 2 University of Science and Technology of China, 3 Tsinghua University, 4 Hong Kong University of Science and Technology |
| Pseudocode | No | The paper describes methods and procedures in prose, but does not contain any clearly labeled 'Pseudocode', 'Algorithm', or code-like formatted blocks. |
| Open Source Code | Yes | Project page: https://github.com/lihxxx/DisPose. |
| Open Datasets | Yes | Following previous works (Zhang et al., 2024; Wang et al., 2024b), we use sequences 335 to 340 from the TikTok (Jafarian & Park, 2021) dataset for testing. |
| Dataset Splits | Yes | We collected 3k human videos from the internet to train our model. For MusePose (Tong et al., 2024), we used stable-diffusion-v1-5 to initialize our hybrid ControlNet. We sampled 16 frames from each video and center cropped to a resolution of 512×512. Training was conducted for 20,000 steps with a batch size of 32. The learning rate was set to 1e-5. For MimicMotion (Zhang et al., 2024), we initialized our hybrid ControlNet using stable-video-diffusion-img2vid-xt. We sampled 16 frames from each video and center cropped to a resolution of 768×1024. Training was conducted for 10,000 steps with a batch size of 8. The learning rate was set to 2e-5. Following previous works (Zhang et al., 2024; Wang et al., 2024b), we use sequences 335 to 340 from the TikTok (Jafarian & Park, 2021) dataset for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions general training parameters. |
| Software Dependencies | No | The paper mentions several models and tools like DWPose, OpenPose, stable-diffusion-v1-5, and stable-video-diffusion-img2vid-xt, but does not specify software environment details or version numbers for libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For MusePose (Tong et al., 2024)... Training was conducted for 20,000 steps with a batch size of 32. The learning rate was set to 1e-5. For MimicMotion (Zhang et al., 2024)... Training was conducted for 10,000 steps with a batch size of 8. The learning rate was set to 2e-5. |