Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation
Authors: Tao Liu, Rongjie Li, Chongyu Wang, Xuming He
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Visual Genome and Open Images v6 datasets demonstrate that our framework consistently achieves state-of-the-art performance, demonstrating its effectiveness in addressing the challenges of open-vocabulary scene graph generation. |
| Researcher Affiliation | Academia | ¹ShanghaiTech University, Shanghai, China; ²Shanghai Engineering Research Center of Intelligent Vision and Imaging |
| Pseudocode | No | The paper describes methods using natural language and figures, but no explicit pseudocode or algorithm blocks are present. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code or links to a code repository. |
| Open Datasets | Yes | To evaluate the SGG task, we adopt two benchmarks: the VG150 version of the Visual Genome (VG) dataset (Krishna et al. 2017) and the Open Image v6 (OIV6) dataset (Kuznetsova et al. 2020). |
| Dataset Splits | Yes | In the VG dataset's PredCLS setting, we follow Epic's predicate split, selecting 70% of the categories as base predicates and the remaining 30% as novel predicates. In the SGDet setting, we follow the OvSGTR predicate split. For the OIV6 dataset, we use the predicate split from PGSG. |
| Hardware Specification | Yes | All experiments are implemented in PyTorch and trained on 4 NVIDIA A40 GPUs. |
| Software Dependencies | Yes | We employ the GPT-3.5-turbo, as our LLM. We adopt CLIP (Radford et al. 2021) (ViT-B/32) as our VLM backbone. ... All experiments are implemented in PyTorch |
| Experiment Setup | Yes | We set k = 3 to dynamically select and set α = 0.25 to balance the weights of the two text prompts. For training losses, the weight of the entity detector is λ1 = 2, the weight for predicate prediction is λ2 = 1, and the weight for distillation loss is λ3 = 20. |
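
The loss weights quoted in the Experiment Setup row can be read as coefficients of a weighted sum. The sketch below is only an illustration of that reading: the paper excerpt gives the values λ1 = 2, λ2 = 1, and λ3 = 20 but does not state the exact form of the objective, so the additive combination, the `LossWeights` class, and the `total_loss` function are assumptions introduced here, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Loss weights as quoted in the table (names are hypothetical)."""
    entity: float = 2.0      # lambda_1: entity detector loss weight
    predicate: float = 1.0   # lambda_2: predicate prediction loss weight
    distill: float = 20.0    # lambda_3: distillation loss weight


def total_loss(l_entity: float, l_predicate: float, l_distill: float,
               w: LossWeights = LossWeights()) -> float:
    """Assumed form: a plain weighted sum of the three loss terms."""
    return (w.entity * l_entity
            + w.predicate * l_predicate
            + w.distill * l_distill)


if __name__ == "__main__":
    # Placeholder loss values, chosen only to show the weighting.
    print(total_loss(0.5, 2.0, 0.1))
```

Under this assumed form, the distillation term is weighted ten times more heavily than the entity detection term, so even a small distillation loss contributes noticeably to the total.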