Aligning Human Motion Generation with Human Perceptions

Authors: Haoru Wang, Wentao Zhu, Luyi Miao, Yishu Xu, Feng Gao, Qi Tian, Yizhou Wang

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our approach in both evaluating and improving the quality of generated human motions by aligning with human perceptions. |
| Researcher Affiliation | Collaboration | 1. Center on Frontiers of Computing Studies, School of Computer Science, Peking University; 2. School of Arts, Peking University; 3. Inst. for Artificial Intelligence, Peking University; 4. Huawei Technologies, Ltd.; 5. Nat'l Eng. Research Center of Visual Technology, Peking University; 6. State Key Laboratory of General Artificial Intelligence, Peking University |
| Pseudocode | Yes | Algorithm 1: Fine-tuning Motion Generation with Motion Critic |
| Open Source Code | Yes | Code and data are publicly available at https://motioncritic.github.io/. |
| Open Datasets | Yes | We contribute Motion Percept, a large-scale motion perceptual evaluation dataset with manual annotations. Code and data are publicly available at https://motioncritic.github.io/. |
| Dataset Splits | Yes | We train our critic model using the MDM subset in Motion Percept. We convert each multiple-choice question into three ordered preference pairs, which results in 46,740 pairs for training and 5,823 pairs for testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU models, CPU models, or memory details. |
| Software Dependencies | No | The paper mentions using DSTformer, MDM, and adapting NumPy to PyTorch, but does not provide specific version numbers for these or other software dependencies such as Python or CUDA. |
| Experiment Setup | Yes | We train the critic model for 150 epochs with a batch size of 64 and a learning rate starting at 2e-3, decreasing with a 0.995 exponential learning-rate decay. We fine-tune for 800 iterations with a batch size of 64 and a learning rate of 1e-5. We fine-tune with critic clipping threshold τ = 12.0, critic re-weight scale λ = 1e-3, and KL loss re-weight scale µ = 1.0. We set the step sampling range [T1, T2] = [700, 900]. |
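The Dataset Splits row says each multiple-choice question is converted into three ordered preference pairs. A minimal sketch of that conversion, assuming each question asks annotators to pick the best of four candidate motions (so the chosen motion is preferred over each of the other three; the function name and motion identifiers are illustrative):

```python
# Hedged sketch: turn one multiple-choice annotation into ordered
# (better, worse) preference pairs. Assumes four options per question
# with a single annotator-chosen best, yielding three pairs.

def question_to_pairs(options, chosen_idx):
    """Return (better, worse) preference pairs from one annotation."""
    better = options[chosen_idx]
    return [(better, worse)
            for i, worse in enumerate(options)
            if i != chosen_idx]

pairs = question_to_pairs(["m0", "m1", "m2", "m3"], chosen_idx=2)
# -> [("m2", "m0"), ("m2", "m1"), ("m2", "m3")]
```

Three pairs per question is consistent with the reported split sizes (46,740 training and 5,823 test pairs derived from the multiple-choice annotations).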
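The critic-training hyperparameters in the Experiment Setup row (150 epochs, batch size 64, lr 2e-3 with 0.995 exponential decay) can be sketched in PyTorch. The critic architecture and the pairwise loss below are illustrative assumptions, not the paper's implementation: a placeholder linear critic and a Bradley-Terry style preference loss on (better, worse) score pairs.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the reported schedule: 150 epochs, batch size 64,
# lr starting at 2e-3 with a 0.995 exponential decay per epoch.
# The critic and features are stand-ins for the real model and data.
critic = torch.nn.Linear(128, 1)
optimizer = torch.optim.Adam(critic.parameters(), lr=2e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.995)

for epoch in range(150):
    better = torch.randn(64, 128)  # placeholder motion features
    worse = torch.randn(64, 128)
    # Preference loss: the better motion should score higher than the worse.
    loss = -F.logsigmoid(critic(better) - critic(worse)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # lr *= 0.995 each epoch
```

After 150 epochs this schedule leaves the learning rate at 2e-3 × 0.995¹⁵⁰ ≈ 9.4e-4, roughly half the initial value.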