Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net
Authors: Puyuan Guo, Tuo Hao, Wenxin Fu, Yingming Gao, Ya Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments and compared our method with existing works on the widely used AIST++ dataset, demonstrating that our approach has certain advantages and controllability. |
| Researcher Affiliation | Academia | Puyuan Guo, Tuo Hao, Wenxin Fu, Yingming Gao, Ya Li* Beijing University of Posts and Telecommunications, Beijing, China |
| Pseudocode | No | The paper describes the proposed method in the 'Methodology' section using textual explanations and figures, but no structured pseudocode or algorithm blocks are explicitly provided. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | Yes | We use Phantom Dance dataset (Li et al. 2022a) to train the U-Net. ... For the controllable generation framework, we use AIST++ dataset (Li et al. 2021) to train the models. The primary reason for using this dataset is that it provides 2D keypoint data corresponding to 3D motions. ... Additionally, the 2D keypoints are stored in COCO format (Lin et al. 2014). |
| Dataset Splits | No | The paper mentions training on the 'train set of AIST++ dataset' but does not specify the exact percentages or methodology for the training/validation/test splits within the paper itself. |
| Hardware Specification | Yes | The training is conducted on two NVIDIA GeForce RTX 4090 GPUs, and it takes approximately one day to complete. |
| Software Dependencies | No | The paper mentions 'Adan optimizer (Xie et al. 2024)' but does not specify version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | Both models are trained using the Adan optimizer (Xie et al. 2024), with a learning rate of 4 × 10⁻⁴ and a weight decay of 0.02. ... we set w = 2 during inference. ... we segment the motion and corresponding music data into 5-second clips, with each segment overlapping every 0.5 seconds. ... We normalize each dimension of the 3D motion data separately, which requires determining the minimum and maximum values for each dimension. |
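The quoted preprocessing (5-second clips overlapping every 0.5 seconds, per-dimension min-max normalization) can be sketched as follows. This is a minimal illustration, not the authors' code: the frame rate, array shapes, and function names are assumptions for the example.

```python
import numpy as np

def segment_clips(motion, fps=60, clip_sec=5.0, hop_sec=0.5):
    """Slice a (T, D) motion array into overlapping fixed-length clips.

    fps is an assumed frame rate; the paper does not state one here.
    """
    clip_len = int(clip_sec * fps)   # frames per 5-second clip
    hop = int(hop_sec * fps)         # new clip every 0.5 seconds
    clips = [motion[s:s + clip_len]
             for s in range(0, motion.shape[0] - clip_len + 1, hop)]
    return np.stack(clips)           # shape (num_clips, clip_len, D)

def minmax_normalize(motion, eps=1e-8):
    """Normalize each motion dimension independently to [0, 1],
    using per-dimension minima and maxima as the table describes."""
    mn = motion.min(axis=0, keepdims=True)
    mx = motion.max(axis=0, keepdims=True)
    return (motion - mn) / (mx - mn + eps), (mn, mx)
```

Storing the per-dimension `(mn, mx)` pair allows generated motion to be mapped back to the original coordinate ranges after sampling.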
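The "w = 2 during inference" quoted above is a guidance weight. A common way such a weight is applied in diffusion sampling is classifier-free guidance, combining conditional and unconditional noise predictions; the sketch below shows one standard formulation under that assumption (the paper's exact formula is not reproduced in the table).

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, w=2.0):
    """One common classifier-free guidance rule: extrapolate from the
    unconditional prediction toward the conditional one by factor w.
    With w = 2, the conditioning signal is amplified."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

At each denoising step the model would be evaluated twice (with and without the conditioning input) and the two outputs combined this way before the update.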