Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation
Authors: Thong Thanh Nguyen, Xiaobao Wu, Yi Bin, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our motion-aware contrastive framework significantly improves state-of-the-art methods on both video and 4D datasets. We conduct comprehensive experiments to evaluate the effectiveness of our motion-aware contrastive framework. We first describe the experiment settings, covering the evaluation datasets, evaluation metrics, baseline methods, and implementation details. Next, we present quantitative results of our method, then provide ablation study and careful analysis to explore properties of our motion-aware contrastive framework. Eventually, we conduct qualitative analysis to concretely examine its behavior. |
| Researcher Affiliation | Academia | (1) Institute of Data Science (IDS), National University of Singapore, Singapore; (2) Nanyang Technological University (NTU), Singapore; (3) Tongji University, China. All listed institutions are academic universities. |
| Pseudocode | Yes | Algorithm 1: Computing the optimal transport distance |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it include a link to a code repository for the methodology described. |
| Open Datasets | Yes | We assess the effectiveness of our method on natural and 4D video inputs. The corresponding dataset to each input type is as follows: Open-domain Panoptic video scene graph generation (Open PVSG) (Yang et al. 2023): Open PVSG consists of scene graphs and associated segmentation masks with respect to subject and object nodes in the scene graph. Panoptic scene graph generation for 4D (PSG4D) (Yang et al. 2024): The PSG4D dataset is divided into two groups, i.e. PSG4D-GTA and PSG4D-HOI. |
| Dataset Splits | No | The paper mentions training, fine-tuning, and validation processes, but does not provide specific percentages or counts for training, validation, and test splits for the datasets used. It refers to established datasets like Open PVSG and PSG4D, which likely have standard splits, but these are not explicitly detailed in the paper. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using optimizers such as AdamW and Adam, and models such as Mask2Former, Video K-Net, the UniTrack tracker, ResNet-101, and DKNet. However, it does not specify version numbers for these software components or for any programming languages/libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For fair comparison, we evaluate our contrastive framework with both IPS+T and VPS as the segmentation module for panoptic video scene graph generation. In the former case, we leverage the UniTrack tracker (Wang et al. 2021) combined with the Mask2Former model (Cheng et al. 2022), which is initialized from the best-performing COCO-pretrained weights and fine-tuned for 8 epochs using the AdamW optimizer with a batch size of 32, a learning rate of 0.0001, weight decay of 0.05, and gradient clipping with a max L2 norm of 0.01. In the latter case, we utilize Video K-Net (Li et al. 2022), also initialized from COCO-pretrained weights and fine-tuned with the same strategy as IPS+T. In the relation classification step, we conduct fine-tuning with a batch size of 32, employing the Adam optimizer with a learning rate of 0.001. Based on validation, we adopt a threshold γ = 9.0 and a margin α = 10.0. We set the maximum number of iterations Niter to 1,000. |
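The pseudocode entry above ("Algorithm 1: Computing the optimal transport distance"), together with the reported iteration cap Niter = 1,000, suggests an iterative solver. As a hedged illustration only (the paper's exact formulation is not quoted here), a standard entropy-regularized Sinkhorn solver for an optimal transport distance looks like the following; the regularization strength `eps` is a hypothetical parameter, not a value from the paper:

```python
import numpy as np

def sinkhorn_ot_distance(a, b, C, eps=0.1, n_iter=1000):
    """Entropy-regularized optimal transport distance via Sinkhorn iterations.

    a, b   : source/target marginals (1-D arrays, each summing to 1)
    C      : cost matrix of shape (len(a), len(b))
    eps    : entropic regularization strength (illustrative default)
    n_iter : number of Sinkhorn iterations (the paper caps this at 1,000)
    """
    K = np.exp(-C / eps)               # Gibbs kernel derived from the cost
    u = np.ones_like(a, dtype=float)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # scale columns to match marginal b
        u = a / (K @ v)                # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]    # resulting transport plan
    return float(np.sum(P * C))        # transport cost <P, C>
```

With matching marginals and a zero-diagonal cost matrix, the computed distance approaches zero; moving all mass across a unit-cost edge yields a distance near the transported mass times that cost.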
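The fine-tuning recipe in the setup row (AdamW, batch size 32, learning rate 1e-4, weight decay 0.05, gradient clipping at max L2 norm 0.01, 8 epochs) can be sketched as a PyTorch training-loop skeleton. This is a minimal reconstruction from the stated hyperparameters, not the authors' code; `model`, `loader`, and `loss_fn` are hypothetical placeholders:

```python
import torch
from torch.nn.utils import clip_grad_norm_

def finetune(model, loader, loss_fn, epochs=8, device="cpu"):
    # Hyperparameters as reported for the IPS+T / VPS segmentation fine-tuning.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
    model.to(device).train()
    for _ in range(epochs):
        for inputs, targets in loader:  # batch size 32 is set on the DataLoader
            optimizer.zero_grad()
            loss = loss_fn(model(inputs.to(device)), targets.to(device))
            loss.backward()
            # Clip gradients to the reported max L2 norm before the step.
            clip_grad_norm_(model.parameters(), max_norm=0.01)
            optimizer.step()
    return model
```

The relation classification step described in the row would follow the same skeleton with plain Adam and a learning rate of 0.001.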