Auto-Regressive Diffusion for Generating 3D Human-Object Interactions

Authors: Zichen Geng, Zeeshan Hayder, Wei Liu, Ajmal Saeed Mian

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our model has been evaluated on the OMOMO and BEHAVE datasets, where it outperforms existing state-of-the-art methods in terms of both performance and inference speed. This makes ARDHOI a robust and efficient solution for text-driven HOI tasks."
Researcher Affiliation | Academia | "1The University of Western Australia, 35 Stirling Highway, Perth, WA 6009, Australia; 2Commonwealth Scientific and Industrial Research Organisation, Synergy Building, Black Mountain, Canberra, ACT 2601, Australia. EMAIL, EMAIL, EMAIL, EMAIL"
Pseudocode | No | The paper describes the model architecture and training process in prose and uses diagrams (Figure 2) to illustrate the components, but no structured pseudocode or algorithm blocks are present.
Open Source Code | Yes | Code: https://github.com/gengzichen/ARDHOI
Open Datasets | Yes | "Our experiments are conducted on the OMOMO (Li, Wu, and Liu 2023) and BEHAVE (Bhatnagar et al. 2022) datasets."
Dataset Splits | Yes | "For the OMOMO dataset, we trim the sequence to a minimum length of 60 and a maximum length of 240 frames. For the BEHAVE dataset, we follow the annotation and sequence splitting by (Peng et al. 2023)."
Hardware Specification | No | The paper discusses inference speed in terms of FLOPs and AITS, but does not specify the exact hardware (e.g., GPU model, CPU type) used for these measurements or for training.
Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "In cVAE, the encoder is a three-block MLP, each consisting of one fully connected layer, a SiLU activation layer, a fully connected layer, and a layer norm... The channel size of the input is 1024, and the encoded token size is 512. The ARDM consists of 27 Mamba2 layers with a hidden dimension of 512. We use an expansion factor of 2 and the state number is 32. The MLP denoiser has the same setting as the cVAE encoder."
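The sequence trimming described in the Dataset Splits row (minimum 60 frames, maximum 240) could be implemented along these lines. This is a hypothetical sketch: the paper states only the length bounds, so whether over-length sequences are cropped, windowed, or subsampled is an assumption, and the function name is ours.

```python
def trim_sequences(sequences, min_len=60, max_len=240):
    """Drop sequences shorter than min_len; crop longer ones to max_len.

    Hypothetical helper: the paper gives only the length bounds, not the
    exact trimming strategy (cropping is assumed here).
    """
    out = []
    for seq in sequences:
        if len(seq) < min_len:
            continue  # too short to keep
        out.append(seq[:max_len])  # crop to the maximum length
    return out

# Example: motion sequences of 30, 100, and 300 frames
seqs = [list(range(n)) for n in (30, 100, 300)]
trimmed = trim_sequences(seqs)
print([len(s) for s in trimmed])  # [100, 240]
```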
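The cVAE encoder quoted in the Experiment Setup row (three blocks of Linear → SiLU → Linear → LayerNorm, mapping 1024-channel inputs to 512-dimensional tokens) can be sketched as follows. This is a NumPy shape-level sketch, not the authors' implementation: weight initialization, the placement of the 1024→512 projection in the first block, and the absence of residual connections are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def make_block(d_in, d_out):
    # One encoder block: Linear -> SiLU -> Linear -> LayerNorm
    w1 = rng.normal(0.0, 0.02, (d_in, d_out))
    b1 = np.zeros(d_out)
    w2 = rng.normal(0.0, 0.02, (d_out, d_out))
    b2 = np.zeros(d_out)
    def block(x):
        h = silu(x @ w1 + b1)
        return layer_norm(h @ w2 + b2)
    return block

# Three-block MLP encoder: 1024-channel input -> 512-dim encoded token
# (assumed: the first block performs the down-projection)
blocks = [make_block(1024, 512), make_block(512, 512), make_block(512, 512)]

x = rng.normal(size=(4, 1024))  # batch of 4 input tokens
h = x
for blk in blocks:
    h = blk(h)
print(h.shape)  # (4, 512)
```

The LayerNorm at the end of each block keeps activations on a stable scale, which is why the output features have near-zero mean per token regardless of the random weights used here.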