MoReact: Generating Reactive Motion from Textual Descriptions

Authors: Xiyan Xu, Sirui Xu, Yu-Xiong Wang, Liangyan Gui

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach for this novel task, which is capable of producing realistic, diverse, and controllable reactions that not only closely match the movements of the counterpart but also adhere to the textual guidance. Please find our webpage at https://xiyan-xu.github.io/MoReactWebPage/. In this section, we begin by introducing the experimental setup of our work, which includes evaluation metrics, baseline settings, and implementation details in Sec. 4.1. Subsequently, we present the quantitative results of our method in Sec. 4.2, followed by the qualitative results in Sec. 4.3. Finally, in Sec. 4.4, we discuss the ablation study conducted on our model.
Researcher Affiliation | Academia | Xiyan Xu (EMAIL), University of Illinois Urbana-Champaign; Sirui Xu (EMAIL), University of Illinois Urbana-Champaign; Yu-Xiong Wang (EMAIL), University of Illinois Urbana-Champaign; Liang-Yan Gui (EMAIL), University of Illinois Urbana-Champaign
Pseudocode | No | The paper describes the methodology and loss formulations using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a webpage link: "Please find our webpage at https://xiyan-xu.github.io/MoReactWebPage/". This is a project demonstration page, not a specific code repository, and the paper does not contain an explicit statement about releasing the source code.
Open Datasets | Yes | Dataset. We conduct our evaluation on the InterHuman (Liang et al., 2024) and CHI3D (Fieraru et al., 2020) datasets.
Dataset Splits | Yes | We use the official training and testing split as specified in InterHuman. To demonstrate the generalizability of MoReact, we also evaluate its performance on the action-driven reaction generation task using the CHI3D dataset, which contains 376 interaction sequences with action labels. We split this dataset into training and testing sets at a 2:1 ratio.
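The quoted 2:1 split of CHI3D's 376 sequences can be sketched as follows. The paper does not state how the split is drawn, so the seeded shuffle and the `split_two_to_one` helper are illustrative assumptions, not the authors' procedure:

```python
import random

def split_two_to_one(sequences, seed=0):
    """Shuffle a list of sequences and split it into train/test at a 2:1 ratio.

    The random seed and shuffling strategy are assumptions; the paper only
    specifies the 2:1 ratio, not the selection mechanism.
    """
    rng = random.Random(seed)
    idx = list(range(len(sequences)))
    rng.shuffle(idx)
    cut = round(len(idx) * 2 / 3)  # two thirds for training
    train = [sequences[i] for i in idx[:cut]]
    test = [sequences[i] for i in idx[cut:]]
    return train, test

# CHI3D contains 376 interaction sequences with action labels
train, test = split_two_to_one(list(range(376)))
```

With 376 sequences this yields 251 training and 125 testing sequences, i.e. the stated 2:1 ratio up to rounding.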
Hardware Specification | Yes | Our models are implemented in PyTorch and trained on two NVIDIA A40 GPUs.
Software Dependencies | No | Our models are implemented in PyTorch and trained on two NVIDIA A40 GPUs. For text processing, we utilize a frozen CLIP-ViT-L-14 model to encode the text prompts into text features... We follow the official implementation of ST-GCN (Yan et al., 2018) to build our evaluator... The paper mentions software tools like PyTorch, CLIP, and ST-GCN, but it does not specify version numbers for any of these components.
Experiment Setup | Yes | The trajectory generation model is trained for 1,200 epochs, and the full-body motion generation model is trained for 2,000 epochs. We train both models using a learning rate of lr = 1e-4 and the AdamW optimizer, with a batch size of 32. Our models are implemented in PyTorch and trained on two NVIDIA A40 GPUs. During training, we use a 1,000-step diffusion process and adopt a classifier-free technique (Ho & Salimans, 2022) that randomly masks 10% of the text conditions, 10% of the actor's motion conditions, and 10% of the global trajectory condition independently. During inference, we use the DDIM (Song et al., 2020) sampling strategy with 50 time steps and η = 0, and set the classifier-free guidance coefficient s = 3.5. For the hyperparameters used in the training of the revised model, we set (λ_R, λ_K, λ_I, λ_K^foot, λ_K^vel, λ_K^rot, λ_K^traj, L_I^p, L_I^v) to (7.0, 1.0, 1.0, 300.0, 110.0, 1.5, 10, 5.0, 25.0), respectively. In addition, we set the threshold t for applying the kinematic loss L_K and the interaction loss L_I as 700.
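The classifier-free training and guidance scheme quoted above can be sketched in a few lines. The dict-based conditioning, the `mask_conditions` and `guided_noise` names, and the use of `None` as the null condition are illustrative assumptions; only the 10% independent masking probability and the guidance coefficient s = 3.5 come from the paper:

```python
import random

P_DROP = 0.10      # per-condition masking probability from the paper
GUIDANCE_S = 3.5   # classifier-free guidance coefficient from the paper

def mask_conditions(cond, rng=random):
    """Independently drop each of the three conditions with probability P_DROP.

    The paper masks the text, actor-motion, and global-trajectory conditions
    independently during training; the dict layout here is an assumption.
    """
    masked = dict(cond)
    for key in ("text", "actor_motion", "trajectory"):
        if rng.random() < P_DROP:
            masked[key] = None  # model sees the null (unconditional) input
    return masked

def guided_noise(eps_cond, eps_uncond, s=GUIDANCE_S):
    """Standard classifier-free guidance combination at inference:
    eps = eps_uncond + s * (eps_cond - eps_uncond), applied elementwise."""
    return [u + s * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

At inference the model is run twice per DDIM step (conditioned and unconditioned) and the two noise predictions are combined with `guided_noise`; with s = 3.5 the conditioned prediction is extrapolated well past the unconditional one.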