Joint Generative Modeling of Grounded Scene Graphs and Images via Diffusion Models

Authors: Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model outperforms existing methods in grounded scene graph generation on the Visual Genome and COCO-Stuff datasets, excelling in both standard and newly introduced metrics that more accurately capture the task's complexity. Furthermore, we demonstrate the broader applicability of DiffuseSG in two important downstream tasks: (1) achieving superior results in a range of grounded scene graph completion tasks, and (2) enhancing grounded scene graph detection models by leveraging additional training samples generated by DiffuseSG.
Researcher Affiliation | Academia | Bicheng Xu (EMAIL), University of British Columbia; Vector Institute for AI. Qi Yan (EMAIL), University of British Columbia; Vector Institute for AI. Renjie Liao (EMAIL), University of British Columbia; Vector Institute for AI; Canada CIFAR AI Chair. Lele Wang (EMAIL), University of British Columbia. Leonid Sigal (EMAIL), University of British Columbia; Vector Institute for AI; Canada CIFAR AI Chair.
Pseudocode | Yes | Algorithm 1: DiffuseSG Training Process. Algorithm 2: DiffuseSG Sampler.
Open Source Code | Yes | Code is available at https://github.com/ubc-vision/DiffuseSG.
Open Datasets | Yes | We conduct all experiments on the Visual Genome (Krishna et al., 2017) and COCO-Stuff (Caesar et al., 2018) datasets.
Dataset Splits | Yes | This pre-processed dataset contains 57,723 training and 5,000 validation grounded scene graphs with 150 object and 50 relation categories. ... resulting in 118,262 training and 4,999 validation grounded scene graphs.
Hardware Specification | No | The paper acknowledges general providers of computational resources (the Province of Ontario, the Government of Canada through CIFAR, the Digital Research Alliance of Canada, companies sponsoring the Vector Institute, Advanced Research Computing at the University of British Columbia, a John R. Evans Leaders Fund CFI grant, and Compute Canada), but it does not specify the exact hardware (e.g., specific GPU or CPU models) used to run the experiments.
Software Dependencies | No | The paper mentions using 'Stable Diffusion V1.5' as a base model and the 'Adam optimizer' for training, but it does not specify version numbers for the programming language (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries critical for replication.
Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.0002. The EMA coefficients used for evaluation are 0.9999 and 0.999 on the Visual Genome and COCO-Stuff datasets, respectively. We use the Adam optimizer with β1 = 0.9, β2 = 0.999, and weight decay 0.01; a constant learning rate of 0.00001 is used to train the models. Both models are trained for 200 epochs with a batch size of 120.
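The EMA coefficients reported above govern how slowly the evaluation weights track the training weights. A minimal, dependency-free sketch of that update rule (this is illustrative only, not the authors' code; the class name and scalar-parameter interface are assumptions):

```python
class EMA:
    """Exponential moving average of parameters, as used for evaluation
    weights in diffusion-model training (reported decays: 0.9999 on
    Visual Genome, 0.999 on COCO-Stuff)."""

    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = list(params)  # EMA copy, initialized to current values

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * param
        self.shadow = [self.decay * s + (1.0 - self.decay) * p
                       for s, p in zip(self.shadow, params)]

# Example: with decay 0.999, the EMA needs thousands of steps to
# converge toward a parameter held at a fixed value.
ema = EMA([0.0], decay=0.999)
for _ in range(10000):
    ema.update([1.0])
print(ema.shadow[0])  # close to 1.0 after ~10k updates
```

A higher decay (0.9999 vs. 0.999) averages over a longer effective window, which is why the choice can differ per dataset depending on training length and noise.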