InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling
Authors: Muhammad Gohar Javed, Chuan Guo, Li Cheng, Xingyu Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | InterMask achieves state-of-the-art results, producing high-fidelity and diverse human interactions. It outperforms previous methods, achieving an FID of 5.154 (vs 5.535 of in2IN) on the InterHuman dataset and 0.399 (vs 5.207 of InterGen) on the InterX dataset. |
| Researcher Affiliation | Collaboration | ¹University of Alberta, ²Snap Inc. |
| Pseudocode | No | The paper provides detailed descriptions of the methodology and inference processes, including figures (Figure 2, 3, 7, 8, 10, 13) illustrating the architecture and steps. However, it does not contain explicit sections or blocks labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code Implementation: We provide the open-source code implementation of our method to ensure reproducibility. github.com/gohar-malik/intermask |
| Open Datasets | Yes | Datasets: We adopt two datasets to evaluate InterMask for the text-conditioned human interaction generation task: InterHuman (Liang et al., 2024) and InterX (Xu et al., 2024a). |
| Dataset Splits | No | The paper refers to the 'InterHuman test set' and 'InterX test set' in Table 1 and in Section 4, implying the existence of splits. However, it does not explicitly provide details on how these splits (e.g., training, validation, test) were created, their percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. It only mentions that models are 'implemented using PyTorch' and trained. |
| Software Dependencies | No | The paper states, 'Our models are implemented using PyTorch'. While PyTorch is mentioned, a specific version number is not provided, nor are any other software dependencies with their versions. |
| Experiment Setup | Yes | The Motion VQ-VAE is trained for 50 epochs with a batch size of 512. The learning rate is initialized at 0.0002 and decays via a multistep learning rate schedule, reducing by a factor of 0.1 after 70% and 85% of the iterations. The Inter-M transformer is trained for 500 epochs with a batch size of 52, following a similar multistep learning rate decay but with a decay factor of 1/3 after 50%, 70%, and 85% of the iterations. During inference, the number of iterations I is set to 20 for interaction generation and 12 for reaction generation. A classifier-free guidance (CFG) scale of 2 is applied, and the temperature is set to 1. |
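The multistep learning rate schedule described in the setup row (decay by a fixed factor once a given fraction of the total iterations has elapsed) can be sketched in plain Python. This is a hedged illustration, not the authors' code: the function name `multistep_lr` and the `total_steps` value are hypothetical; only the base learning rate, milestone fractions, and decay factors come from the paper.

```python
def multistep_lr(step, total_steps, base_lr, milestone_fracs, gamma):
    """Return the learning rate at `step`, applying a multiplicative
    decay `gamma` once per milestone (given as fractions of the total
    iteration count) that has already been passed."""
    lr = base_lr
    for frac in milestone_fracs:
        if step >= frac * total_steps:
            lr *= gamma
    return lr

total = 10_000  # hypothetical total iteration count

# Motion VQ-VAE schedule: lr 2e-4, decayed by 0.1 at 70% and 85% of iterations
vqvae_lr_start = multistep_lr(0, total, 2e-4, [0.70, 0.85], 0.1)
vqvae_lr_late = multistep_lr(9_000, total, 2e-4, [0.70, 0.85], 0.1)

# Inter-M transformer schedule: decay factor 1/3 at 50%, 70%, and 85%
transformer_lr_late = multistep_lr(9_000, total, 2e-4, [0.50, 0.70, 0.85], 1 / 3)

print(vqvae_lr_start, vqvae_lr_late, transformer_lr_late)
```

In PyTorch itself, the same behavior is typically obtained with `torch.optim.lr_scheduler.MultiStepLR`, passing the milestone fractions converted to absolute iteration indices.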