InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Authors: Muhammad Gohar Javed, Chuan Guo, Li Cheng, Xingyu Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | InterMask achieves state-of-the-art results, producing high-fidelity and diverse human interactions. It outperforms previous methods, achieving an FID of 5.154 (vs. 5.535 for in2IN) on the InterHuman dataset and 0.399 (vs. 5.207 for InterGen) on the InterX dataset.
Researcher Affiliation | Collaboration | 1University of Alberta, 2Snap Inc. EMAIL, EMAIL
Pseudocode | No | The paper provides detailed descriptions of the methodology and inference processes, including figures (Figures 2, 3, 7, 8, 10, 13) illustrating the architecture and steps. However, it does not contain explicit sections or blocks labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code Implementation: We provide the open-source code implementation of our method to ensure reproducibility. github.com/gohar-malik/intermask
Open Datasets | Yes | Datasets: We adopt two datasets to evaluate InterMask for the text-conditioned human interaction generation task: InterHuman (Liang et al., 2024) and InterX (Xu et al., 2024a).
Dataset Splits | No | The paper refers to the 'InterHuman test set' and 'InterX test set' in Table 1 and in Section 4, implying the existence of splits. However, it does not explicitly provide details on how these splits (e.g., training, validation, test) were created, their percentages, or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. It only mentions that models are 'implemented using PyTorch' and trained.
Software Dependencies | No | The paper states, 'Our models are implemented using PyTorch'. While PyTorch is mentioned, a specific version number is not provided, nor are any other software dependencies with their versions.
Experiment Setup | Yes | The Motion VQ-VAE is trained for 50 epochs with a batch size of 512. The learning rate is initialized at 0.0002 and decays via a multistep learning rate schedule, reducing by a factor of 0.1 after 70% and 85% of the iterations. The Inter-M transformer is trained for 500 epochs with a batch size of 52, following a similar multistep learning rate decay but with a decay factor of 1/3 after 50%, 70%, and 85% of the iterations. During inference, the number of iterations I is set to 20 for interaction generation and 12 for reaction generation. A classifier-free guidance (CFG) scale of 2 is applied, and the temperature is set to 1.
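The multistep schedules described above can be sketched in plain Python. This is a hedged reconstruction from the reported hyperparameters, not the authors' code: the milestone fractions and decay factors come from the paper, while the total iteration count and the transformer's initial learning rate (assumed here to match the VQ-VAE's 2e-4, since the paper only says "similar") are illustrative assumptions.

```python
def multistep_lr(step, total_iters, base_lr, milestones, gamma):
    """Piecewise-constant schedule: multiply base_lr by gamma once for
    each milestone (given as a fraction of total_iters) already passed."""
    lr = base_lr
    for m in milestones:
        if step >= int(m * total_iters):
            lr *= gamma
    return lr

# Motion VQ-VAE: lr 2e-4, decayed by 0.1 at 70% and 85% of iterations (from the paper).
def vqvae_lr(step, total):
    return multistep_lr(step, total, 2e-4, (0.70, 0.85), 0.1)

# Inter-M transformer: decayed by 1/3 at 50%, 70%, 85% of iterations;
# initial lr of 2e-4 is an assumption, not stated in the paper.
def transformer_lr(step, total):
    return multistep_lr(step, total, 2e-4, (0.50, 0.70, 0.85), 1 / 3)

total = 10_000  # hypothetical total iteration count
print(vqvae_lr(0, total))        # 0.0002
print(vqvae_lr(7_500, total))    # 2e-05 (past the 70% milestone)
print(transformer_lr(9_000, total))  # 2e-4 / 27, all three milestones passed
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with the milestone steps and `gamma` set accordingly.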