Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Authors: Zhi Cen, Huaijin Pi, Sida Peng, Qing Shuai, Yujun Shen, Hujun Bao, Xiaowei Zhou, Ruizhen Hu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments using the challenging boxing task. Experimental results demonstrate that our method outperforms existing baselines and can generate extended motion sequences. We conducted experiments to validate the effectiveness of our approach on our self-collected boxing dataset DuoBox. |
| Researcher Affiliation | Collaboration | Zhi Cen1, Huaijin Pi2, Sida Peng1, Qing Shuai1, Yujun Shen3, Hujun Bao1, Xiaowei Zhou1, Ruizhen Hu4 1State Key Lab of CAD&CG, Zhejiang University, 2The University of Hong Kong, 3Ant Group, 4Shenzhen University |
| Pseudocode | Yes | Algorithm 1: Online Two-Character Motion Generation with Reaction Policy |
| Open Source Code | No | Code and data will be made publicly available at https://zju3dv.github.io/ready_to_react/. |
| Open Datasets | No | We conducted experiments to validate the effectiveness of our approach on our self-collected boxing dataset DuoBox. Code and data will be made publicly available at https://zju3dv.github.io/ready_to_react/. |
| Dataset Splits | Yes | For our experiments, we split the dataset into training (80%) and testing (20%) subsets, and downsample the original data to 30 FPS for training purposes. |
| Hardware Specification | Yes | All models are trained using the AdamW optimizer (Kingma & Ba, 2014) with a learning rate of 0.0001 on a single Nvidia RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions the 'AdamW optimizer (Kingma & Ba, 2014)' and 'DDIM (Song et al., 2020)', which are algorithms rather than software, but it does not specify the software libraries or frameworks (with version numbers) needed to reproduce the experiments. |
| Experiment Setup | Yes | The training process is divided into two stages: (1) pre-training the VQ-VAE model and (2) jointly training the next latent predictor model and the online motion decoder. ... Stage 1. We pre-train the VQ-VAE model ... for 40k iterations, using motion sequences cropped to 64 frames. The batch size is set to 128, with a codebook size of 512, a codebook feature dimension of 512, and a downsampling rate of d = 4. ... Stage 2. Next, we train the next latent predictor and online motion decoder jointly for 40k iterations ... motion sequences are cropped to W = 60 frames (2 seconds) for training. The batch size is set to 32, with time step T = 1000, and we employ DDIM (Song et al., 2020) to sample only 50 steps during inference. The loss is defined as: L = L_diffusion + β‖A − Â‖₂² + γ‖R − R̂‖₂², where β = 1.0, γ = 1.0. ... All models are trained using the AdamW optimizer (Kingma & Ba, 2014) with a learning rate of 0.0001... |
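The data preparation quoted above (80/20 train/test split, downsampling to 30 FPS) can be sketched as follows. This is a minimal illustration, not the authors' code: the report states only the split ratio and the target frame rate, so the source frame rate (`src_fps=90`), the random shuffle, the seed, and all function names are assumptions.

```python
import random

def downsample(frames, src_fps=90, dst_fps=30):
    """Downsample a frame sequence by keeping every (src_fps // dst_fps)-th
    frame. src_fps=90 is an assumed capture rate; the paper only states
    that training data is downsampled to 30 FPS."""
    step = src_fps // dst_fps
    return frames[::step]

def split_dataset(sequences, train_ratio=0.8, seed=0):
    """Shuffle motion sequences and split them into training (80%) and
    testing (20%) subsets, as described in the reproducibility report."""
    rng = random.Random(seed)
    idx = list(range(len(sequences)))
    rng.shuffle(idx)
    cut = int(len(idx) * train_ratio)
    train = [sequences[i] for i in idx[:cut]]
    test = [sequences[i] for i in idx[cut:]]
    return train, test

# Example: 100 dummy sequences -> 80 train / 20 test; 90 frames -> 30 frames.
seqs = [list(range(90)) for _ in range(100)]
train, test = split_dataset(seqs)
clip = downsample(train[0])
```

Whether the paper splits at the sequence or the clip level is not stated in the report; splitting whole sequences (as here) avoids leaking near-duplicate frames between train and test.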