Semi-Supervised Clustering Framework for Fine-grained Scene Graph Generation

Authors: Jiarui Yang, Chuan Wang, Jun Zhang, Shuyi Wu, Jinjing Zhao, Zeming Liu, Liang Yang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on various SGG models and achieve substantial overall performance improvements, demonstrating the effectiveness of SSC-SGG." The evaluation (Experimental Setup and Performance Comparisons sections) covers the Visual Genome (VG) (Krishna et al. 2017) and Open Images (OI) V6 (Kuznetsova et al. 2020) datasets.
Researcher Affiliation | Academia | 1. Shanghai Key Lab of Intell. Info. Processing, School of Computer Science, Fudan University; 2. Shanghai Collaborative Innovation Center on Intelligent Visual Computing; 3. Institute of Information Engineering, CAS; 4. School of Computer Science and Technology, Beijing Jiaotong University; 5. Guangdong Provincial Key Lab of Intell. Info. Processing & Shenzhen Key Lab of Media Security, Shenzhen University; 6. Information Research Center of Military Science, PLA Academy of Military Science; 7. National Key Laboratory of Science and Technology on Information System Security, China; 8. School of Computer Science and Engineering, Beihang University; 9. School of Artificial Intelligence, Hebei University of Technology
Pseudocode | Yes | "Algorithm 1: Dynamic Label Assignment Algorithm. Input: X ∈ R^{N×C}: predicted logits; t: iteration times. Output: Y ∈ R^{N×C}: assigned soft pseudo-labels."
Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository.
Open Datasets | Yes | "We evaluate our proposed SSC-SGG framework on two commonly used SGG datasets, i.e., Visual Genome (VG) (Krishna et al. 2017) and Open Images (OI) V6 (Kuznetsova et al. 2020)."
Dataset Splits | Yes | "Following previous work (Zellers et al. 2018; Tang et al. 2019), we adopt the most popular pre-processed VG150, including 108k images, the most frequent 150 object classes, and 50 predicate categories. For OI V6, we follow the same data pre-processing and evaluation protocols utilized in (Li et al. 2021; Lin et al. 2022a), including 602 object classes and 30 predicate categories."
Hardware Specification | Yes | "All our experiments are conducted using an RTX A5000 GPU."
Software Dependencies | No | The paper mentions using a pre-trained Faster R-CNN with a ResNeXt-101-FPN backbone but does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | "In the dynamic pseudo-label assignment algorithm, we set the regularization coefficient λ = 0.05, the number of iterations t = 3, and the smoothing factor σ = 0.5. We then pseudo-label the top-5 samples (n = 5) with the highest assignment confidence scores per image on average. The model is trained with an SGD optimizer for 60k iterations. The initial learning rate is 1.0 × 10^-3, decayed by a factor of 10 at 28k and 48k iterations. Pre-training runs for the first 50k iterations using the proposed multi-view prototype-based clustering framework without pseudo-labels, followed by semi-supervised training for the remaining 10k iterations. The batch size is set to 8."
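The Algorithm 1 interface reported above (logits X ∈ R^{N×C}, t iterations, soft pseudo-labels Y) together with the regularization coefficient λ = 0.05 matches the shape of Sinkhorn-style iterative normalization. The following is a hypothetical sketch of that general pattern, not the paper's exact update rule; the function name and the stability shift are our own assumptions.

```python
import numpy as np

def dynamic_label_assignment(X, t=3, lam=0.05):
    """Hypothetical Sinkhorn-style soft pseudo-label assignment.

    X: (N, C) predicted logits; t: number of normalization iterations;
    lam: regularization coefficient (smaller -> sharper assignments).
    Returns Y: (N, C) soft pseudo-labels whose rows sum to 1.
    NOTE: a sketch of the general technique, not the paper's algorithm.
    """
    # Exponentiate regularized scores; subtract the max for stability.
    Q = np.exp((X - X.max()) / lam)
    for _ in range(t):
        # Row step: each sample's assignment distribution sums to 1.
        Q /= Q.sum(axis=1, keepdims=True)
        # Column step: balance how often each class is used.
        Q /= Q.sum(axis=0, keepdims=True)
    # Final row normalization so each row is a valid distribution.
    return Q / Q.sum(axis=1, keepdims=True)
```

The column-normalization step is what distinguishes this from a plain softmax: it discourages the degenerate solution where every sample is assigned to the same dominant class.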
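The reported training schedule (initial learning rate 1.0 × 10^-3, divided by 10 at 28k and 48k of 60k total iterations) is a standard step decay. A minimal sketch of that schedule as a plain function (the helper name is ours; in PyTorch the equivalent would be `torch.optim.lr_scheduler.MultiStepLR`):

```python
def learning_rate(it, base_lr=1e-3, milestones=(28000, 48000), gamma=0.1):
    """Step-decay schedule: start at base_lr and multiply by gamma
    at each milestone iteration. Defaults mirror the paper's setup."""
    lr = base_lr
    for m in milestones:
        if it >= m:
            lr *= gamma
    return lr
```

So iterations 0–27999 train at 1e-3, 28000–47999 at 1e-4, and 48000–59999 at 1e-5.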