Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
Authors: Wenhui Tan, Boyuan Li, Chuhao Jin, Wenbing Huang, Xiting Wang, Ruihua Song
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that TTR outperforms existing baselines, achieving significant improvements in evaluation metrics, such as reducing FID from 3.988 to 1.942. ...We evaluate our proposed method with strong baselines and further analyze contributions of different components, and the impact of key parameters. ...We conduct an experiment to change the downsampling parameter frame rate and calculate the difference between taking ground-truth action and random action as the input of M, in terms of summed ranking scores (Top-1, Top-2, Top-3 and Acc.). |
| Researcher Affiliation | Academia | Wenhui Tan, Boyuan Li, Chuhao Jin, Wenbing Huang, Xiting Wang & Ruihua Song Gaoling School of Artificial Intelligence Renmin University of China Beijing, China EMAIL |
| Pseudocode | No | The paper describes methods and processes in paragraph form and through diagrams (Figure 1 and Figure 2), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://Think-Then-React.github.io/. |
| Open Datasets | Yes | Dataset. We evaluate all the methods on the Inter-X dataset, which consists of about 9K training samples and 1,708 test samples. Each sample is an action-reaction sequence with three corresponding textual descriptions. As supplementation, we mix our pre-training data with the single-person motion-text dataset HumanML3D (Guo et al., 2022a), which consists of more than 23K annotated motion sequences. |
| Dataset Splits | Yes | Dataset. We evaluate all the methods on the Inter-X dataset, which consists of about 9K training samples and 1,708 test samples. ...We evaluate each method 20 times with different seeds to calculate the final results at a 95% confidence interval. |
| Hardware Specification | Yes | Both the pre-training and fine-tuning phases are trained on a single machine with 8 Tesla V100 GPUs. ...The motion VQ-VAE is trained for 150K steps with batch size set to 256 and learning rate fixed at 1e-4 on a single Tesla V100 GPU. |
| Software Dependencies | No | For the LLM, we adopt Flan-T5-base (Chung et al., 2024; Raffel et al., 2020) as our base model, with extended vocabulary. ...We use the text embedding layer from clip-vit-large-patch14 (Radford et al., 2021), which is frozen during training. |
| Experiment Setup | Yes | We warm up the learning rate for 1,000 steps, peaking at 1e-4 for the pre-training phase, and use the same learning rate for fine-tuning. Both the pre-training and fine-tuning phases are trained on a single machine with 8 Tesla V100 GPUs. The training batch size is set to 32 for the LLM, and we monitor the validation loss and reaction generation metrics for early stopping, resulting in about 100K pre-training steps and 40K fine-tuning steps. We set the re-thinking interval Nr to 4 tokens and divide each space signal into Nb = 10 bins. ...The motion VQ-VAE is trained for 150K steps with batch size set to 256 and learning rate fixed at 1e-4 on a single Tesla V100 GPU. ...We train the model on both the Inter-X and HumanML3D datasets for 200,000 steps, with batch size set to 256 and learning rate set to 1e-4. We apply an L1 loss on both pose-feature and velocity reconstruction, and a commitment loss for the embedding process. The weight for the velocity loss is 0.5 and for the commitment loss 0.02. |
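The reported setup warms the learning rate up over 1,000 steps to a peak of 1e-4. A minimal sketch of such a schedule, assuming linear warm-up and (since the paper does not specify a decay) a constant rate afterwards; the function name is illustrative, not from the authors' code:

```python
def warmup_lr(step: int, peak_lr: float = 1e-4, warmup_steps: int = 1000) -> float:
    """Linearly ramp the learning rate to peak_lr over warmup_steps, then hold it."""
    if step < warmup_steps:
        # Fraction of warm-up completed; step is 0-indexed.
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr
```

In practice this could be wrapped in a framework scheduler (e.g. a per-step LR lambda); the sketch only fixes the shape implied by the quoted hyperparameters.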
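The VQ-VAE objective quoted above combines L1 reconstruction losses on pose features and velocities with a commitment loss, weighted 0.5 and 0.02 respectively. A minimal sketch of that combination, assuming the weights apply as stated; all names here are hypothetical, not the authors' implementation:

```python
def l1(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def vqvae_loss(pred_pose, gt_pose, pred_vel, gt_vel, commit_loss,
               w_vel=0.5, w_commit=0.02):
    """Total loss = L1(pose) + 0.5 * L1(velocity) + 0.02 * commitment."""
    return (l1(pred_pose, gt_pose)
            + w_vel * l1(pred_vel, gt_vel)
            + w_commit * commit_loss)
```

The commitment term is taken as precomputed (as in a standard VQ-VAE, it would be the distance between encoder outputs and their assigned codebook entries).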