Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Authors: Xiaoyang Liu, Boran Wen, Xinpeng Liu, Zizheng Zhou, Hongwei Fan, Cewu Lu, Lizhuang Ma, Yulong Chen, Yong-Lu Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method demonstrates significant superiority in extensive experiments compared to current baselines. Section 5 is titled "Experiments" and includes sub-sections such as "Setting", "Implementation Details", "Baselines", "Results", "Visualization", and "Ablation Study", all of which describe empirical evaluation. |
| Researcher Affiliation | Academia | Authors are affiliated with Shanghai Jiao Tong University, Shanghai Innovation Institute, and Peking University. Shanghai Jiao Tong University and Peking University are well-known academic institutions. Shanghai Innovation Institute, in this context, is considered a public research institution, aligning all affiliations with academia. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. Methodologies are described in paragraph text and supported by figures. |
| Open Source Code | Yes | Code: https://github.com/DirtyHarryLYL/HAKE-AVA |
| Open Datasets | No | The paper introduces a new benchmark called GIO, which is built upon the AVA dataset. While AVA (Gu et al. 2018) is a public dataset and cited, the paper does not provide explicit concrete access information (e.g., a URL or DOI) for the newly introduced GIO dataset itself, which the authors created with new annotations. |
| Dataset Splits | Yes | After filtering, 107,663 of 126,700 key-frames are attached with a 4D HOI layout (85,370 for training, 22,293 for inference). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions several models and frameworks, such as SlowFast, SAM, Grounding DINO, PHALP, SMPL, and ZoeDepth, but it does not specify version numbers for these components or for other ancillary software libraries. |
| Experiment Setup | Yes | An Adam optimizer, an initial learning rate of 1e-3, a cosine learning rate schedule, and a batch size of 16 are adopted. When training the 2D decoder, the learning rate for the parameters of SlowFast and Grounding DINO is 1e-5. N_3D is set to 256, N_o to 256, and N_q to 24 for alignment. A weighted BCE loss is used, where the loss coefficient for true positions is ten times that of false positions. |
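The weighted BCE loss in the setup row above can be sketched as follows. The paper only states that true positions are weighted ten times more than false positions; the per-element formulation and the helper name `weighted_bce` are assumptions for illustration, not the authors' implementation:

```python
import math

def weighted_bce(preds, targets, pos_weight=10.0, neg_weight=1.0):
    """Weighted binary cross-entropy (illustrative sketch).

    Positive (true) positions contribute pos_weight times more to the
    loss than negative (false) positions -- a 10:1 ratio by default,
    matching the ratio reported in the paper.
    """
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp for numerical stability
        total += -(pos_weight * t * math.log(p)
                   + neg_weight * (1.0 - t) * math.log(1.0 - p))
    return total / len(preds)
```

With a prediction of 0.5, a positive target incurs ten times the loss of a negative one, reflecting the stated 10x coefficient for true positions.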