Joint Generative Modeling of Grounded Scene Graphs and Images via Diffusion Models
Authors: Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model outperforms existing methods in grounded scene graph generation on the Visual Genome and COCO-Stuff datasets, excelling in both standard and newly introduced metrics that more accurately capture the task's complexity. Furthermore, we demonstrate the broader applicability of DiffuseSG in two important downstream tasks: (1) achieving superior results in a range of grounded scene graph completion tasks, and (2) enhancing grounded scene graph detection models by leveraging additional training samples generated by DiffuseSG. |
| Researcher Affiliation | Academia | Bicheng Xu (EMAIL), University of British Columbia; Vector Institute for AI. Qi Yan (EMAIL), University of British Columbia; Vector Institute for AI. Renjie Liao (EMAIL), University of British Columbia; Vector Institute for AI; Canada CIFAR AI Chair. Lele Wang (EMAIL), University of British Columbia. Leonid Sigal (EMAIL), University of British Columbia; Vector Institute for AI; Canada CIFAR AI Chair. |
| Pseudocode | Yes | Algorithm 1: DiffuseSG Training Process. Algorithm 2: DiffuseSG Sampler. |
| Open Source Code | Yes | Code is available at https://github.com/ubc-vision/DiffuseSG. |
| Open Datasets | Yes | We conduct all experiments on the Visual Genome (Krishna et al., 2017) and COCO-Stuff (Caesar et al., 2018) datasets. |
| Dataset Splits | Yes | The pre-processed Visual Genome dataset contains 57,723 training and 5,000 validation grounded scene graphs with 150 object and 50 relation categories; pre-processing COCO-Stuff results in 118,262 training and 4,999 validation grounded scene graphs. |
| Hardware Specification | No | The paper acknowledges general providers of computational resources like the Province of Ontario, the Government of Canada through CIFAR, the Digital Research Alliance of Canada, companies sponsoring the Vector Institute, Advanced Research Computing at the University of British Columbia, John R. Evans Leaders Fund CFI grant and Compute Canada, but does not specify exact hardware models (e.g., specific GPUs, CPUs) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Stable Diffusion V1.5' as a base model and 'Adam optimizer' for training, but it does not specify version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries critical for replication. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.0002. The EMA coefficients used for evaluation are 0.9999 and 0.999 on the Visual Genome and COCO-Stuff datasets respectively. We use the Adam optimizer with β1 = 0.9, β2 = 0.999, and weight decay 0.01; a constant learning rate of 0.00001 is used to train the models. Both models are trained for 200 epochs with a batch size of 120. |
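The reported hyperparameters (Adam with β1 = 0.9, β2 = 0.999, weight decay 0.01, constant learning rate 1e-5, and an EMA coefficient of 0.9999 on Visual Genome) can be illustrated with a minimal sketch. This is not the authors' code: the scalar-parameter `adam_step` and `ema_update` helpers below are hypothetical, pure-Python stand-ins showing only how those numbers enter the update rules; the actual models use a deep-learning framework.

```python
# Hedged sketch of the reported training hyperparameters, not DiffuseSG itself.
# Weight decay is folded into the gradient here (classic Adam-with-L2 style);
# whether the paper's framework decouples it (AdamW-style) is an assumption.

def adam_step(theta, grad, m, v, t,
              lr=1e-5, beta1=0.9, beta2=0.999, weight_decay=0.01, eps=1e-8):
    """One Adam update on a scalar parameter, with bias correction at step t."""
    grad = grad + weight_decay * theta          # L2-style weight decay
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

def ema_update(ema_theta, theta, coeff=0.9999):
    """Exponential moving average of weights; 0.9999 is the Visual Genome
    coefficient reported for evaluation (0.999 for COCO-Stuff)."""
    return coeff * ema_theta + (1 - coeff) * theta
```

With these defaults, a single step moves a parameter by roughly the learning rate once the bias-corrected moments stabilize, and the EMA copy trails the raw weights very slowly, which is why it is the copy used for evaluation.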