Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation

Authors: Tao Liu, Rongjie Li, Chongyu Wang, Xuming He

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on the Visual Genome and Open Images v6 datasets demonstrate that our framework consistently achieves state-of-the-art performance, demonstrating its effectiveness in addressing the challenges of open-vocabulary scene graph generation."
Researcher Affiliation | Academia | "1 ShanghaiTech University, Shanghai, China; 2 Shanghai Engineering Research Center of Intelligent Vision and Imaging. EMAIL, EMAIL, EMAIL"
Pseudocode | No | The paper describes methods using natural language and figures, but no explicit pseudocode or algorithm blocks are present.
Open Source Code | No | The paper does not contain any explicit statements about releasing code or links to a code repository.
Open Datasets | Yes | "To evaluate the SGG task, we adopt two benchmarks: the VG150 version of the Visual Genome (VG) dataset (Krishna et al. 2017) and the Open Image v6 (OIV6) dataset (Kuznetsova et al. 2020)."
Dataset Splits | Yes | "In the VG dataset's PredCLS setting, we follow Epic's predicate split, selecting 70% of the categories as base predicates and the remaining 30% as novel predicates. In the SGDet setting, we follow the OvSGTR predicate split. For the OIV6 dataset, we use the predicate split from PGSG."
Hardware Specification | Yes | "All experiments are implemented in PyTorch and trained on 4 NVIDIA A40 GPUs."
Software Dependencies | Yes | "We employ GPT-3.5-turbo as our LLM. We adopt CLIP (Radford et al. 2021) (ViT-B/32) as our VLM backbone. ... All experiments are implemented in PyTorch."
Experiment Setup | Yes | "We set k = 3 to dynamically select and set α = 0.25 to balance the weights of the two text prompts. For training losses, the weight of the entity detector is λ1 = 2, the weight for predicate prediction is λ2 = 1, and the weight for distillation loss is λ3 = 20."
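The quoted setup implies a total training objective that sums three weighted terms. A minimal sketch of that combination follows; only the weights (λ1 = 2, λ2 = 1, λ3 = 20) come from the paper, while the function name and the individual loss terms are hypothetical placeholders, not the authors' actual implementation:

```python
# Weights quoted from the paper's experiment setup.
LAMBDA_DET = 2.0      # entity-detector loss weight (λ1)
LAMBDA_PRED = 1.0     # predicate-prediction loss weight (λ2)
LAMBDA_DISTILL = 20.0 # distillation loss weight (λ3)

def total_loss(l_det: float, l_pred: float, l_distill: float) -> float:
    """Combine the three training losses with the paper's stated weights.

    The arguments stand in for the scalar loss values a training step
    would produce; their computation is not specified here.
    """
    return (LAMBDA_DET * l_det
            + LAMBDA_PRED * l_pred
            + LAMBDA_DISTILL * l_distill)
```

For example, if each term evaluated to 1.0 in a step, the combined loss would be 2 + 1 + 20 = 23, showing how strongly the distillation term dominates the objective at these settings.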