InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Authors: Muhammad Gohar Javed, Chuan Guo, Li Cheng, Xingyu Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | InterMask achieves state-of-the-art results, producing high-fidelity and diverse human interactions. It outperforms previous methods, achieving an FID of 5.154 (vs. 5.535 for in2IN) on the InterHuman dataset and 0.399 (vs. 5.207 for InterGen) on the InterX dataset.
Researcher Affiliation | Collaboration | 1University of Alberta, 2Snap Inc. EMAIL, EMAIL
Pseudocode | No | The paper provides detailed descriptions of the methodology and inference processes, including figures (Figures 2, 3, 7, 8, 10, 13) illustrating the architecture and steps. However, it does not contain explicit sections or blocks labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code Implementation: We provide the open-source code implementation of our method to ensure reproducibility. github.com/gohar-malik/intermask
Open Datasets | Yes | Datasets: We adopt two datasets to evaluate InterMask for the text-conditioned human interaction generation task: InterHuman (Liang et al., 2024) and InterX (Xu et al., 2024a).
Dataset Splits | No | The paper refers to the 'InterHuman test set' and 'InterX test set' in Table 1 and in Section 4, implying the existence of splits. However, it does not explicitly provide details on how these splits (e.g., training, validation, test) were created, their percentages, or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. It only mentions that models are 'implemented using PyTorch' and trained.
Software Dependencies | No | The paper states, 'Our models are implemented using PyTorch'. While PyTorch is mentioned, a specific version number is not provided, nor are any other software dependencies with their versions.
Experiment Setup | Yes | The Motion VQ-VAE is trained for 50 epochs with a batch size of 512. The learning rate is initialized at 0.0002 and decays via a multistep learning rate schedule, reducing by a factor of 0.1 after 70% and 85% of the iterations. The Inter-M transformer is trained for 500 epochs with a batch size of 52, following a similar multistep learning rate decay but with a decay factor of 1/3 after 50%, 70%, and 85% of the iterations. During inference, the number of iterations I is set to 20 for interaction generation and 12 for reaction generation. A classifier-free guidance (CFG) scale of 2 is applied, and the temperature is set to 1.
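The multistep schedules described above can be sketched in plain Python. This is a hedged reconstruction from the reported hyperparameters, not the authors' code: the milestone fractions and decay factors come from the paper, while the total iteration count and the transformer's initial learning rate (assumed here to match the VQ-VAE's 2e-4, since the paper only says "similar") are illustrative assumptions.

```python
def multistep_lr(step, total_iters, base_lr, milestones, gamma):
    """Piecewise-constant schedule: multiply base_lr by gamma once for
    each milestone (given as a fraction of total_iters) already passed."""
    lr = base_lr
    for m in milestones:
        if step >= int(m * total_iters):
            lr *= gamma
    return lr

# Motion VQ-VAE: lr 2e-4, decayed by 0.1 at 70% and 85% of iterations (from the paper).
def vqvae_lr(step, total):
    return multistep_lr(step, total, 2e-4, (0.70, 0.85), 0.1)

# Inter-M transformer: decayed by 1/3 at 50%, 70%, 85% of iterations;
# initial lr of 2e-4 is an assumption, not stated in the paper.
def transformer_lr(step, total):
    return multistep_lr(step, total, 2e-4, (0.50, 0.70, 0.85), 1 / 3)

total = 10_000  # hypothetical total iteration count
print(vqvae_lr(0, total))        # 0.0002
print(vqvae_lr(7_500, total))    # 2e-05 (past the 70% milestone)
print(transformer_lr(9_000, total))  # 2e-4 / 27, all three milestones passed
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with the milestone steps and `gamma` set accordingly.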