Noise-Guided Predicate Representation Extraction and Diffusion-Enhanced Discretization for Scene Graph Generation
Authors: Guoqing Zhang, Shichao Kan, Fanghui Zhang, Wanru Xu, Yue Zhang, Yigang Cen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments on datasets such as VG (Krishna et al., 2017), GQA (Hudson & Manning, 2019a) and Open Images V6 (Kuznetsova et al., 2020), achieving excellent performance, which demonstrates that our method effectively performs feature reconstruction and mitigates the biased predictions caused by long-tail distribution. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University, Beijing, China 2School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China 3Visual Intelligence +X International Cooperation Joint Laboratory of MOE, Beijing Jiaotong University, Beijing, China 4School of Computer Science and Technology, Central South University, Hunan, China 5School of Artificial Intelligence, Henan University, Henan, China 6College of Computer and Information Engineering, Henan Normal University, Henan, China. Correspondence to: Wanru Xu <EMAIL>, Yigang Cen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Feature Reconstruction Training Process Based on Diffusion — 1: T, n {T: total number of iteration steps, n: randomly selected step sizes} 2: m = ω(Cp), v = ω(Cp) {Init feature distribution} 3: G ← m + v · N(0, 1) {Init noise input} 4: Gp ← G 5: for t in random(T, n) do 6: Et ← Embedding(t) {Initialize time embedding} 7: N ← Attn(Gp, Et, Et) {Conditional diffusion} 8: N ← Attn(N, Cp, Cp) {Conditional diffusion} 9: N ← Attn(N, Tp, Tp) {Conditional diffusion} 10: N ← ω(N) · φ(ω(Cp)) + ω(Cp) {Noise prediction} 11: Gp ← (1/√γt) · (G − ((1 − γt)/√(1 − γ̄t)) · N) {Single-step denoising} 12: end for 13: if is training then 14: loss ← MSELoss(Gp, Tp) 15: return Gp, loss 16: end if 17: return Gp |
| Open Source Code | Yes | We have uploaded the code to GitHub: https://github.com/gavin-gqzhang/NoDIS. |
| Open Datasets | Yes | We use the Visual Genome (VG) (Krishna et al., 2017) and GQA (Hudson & Manning, 2019a) datasets for model training and evaluation. Additionally, we employed the Open Images (Kuznetsova et al., 2020) dataset to further evaluate the generalization capability of our method. |
| Dataset Splits | Yes | Both VG and GQA datasets are split using the same method: 70% of the samples are used for training, 30% for testing, with 5,000 samples selected from the training set for validation. Additionally, we employed the Open Images (Kuznetsova et al., 2020) dataset... we used 126,368 images for training, 1,813 for validation, and 5,322 for testing. |
| Hardware Specification | Yes | All experiments are conducted using four NVIDIA 3090 GPUs, each with 24GB of memory. |
| Software Dependencies | No | We use a pre-trained Faster RCNN (Tang et al., 2020; Ren et al., 2015) for object detection, with the detector frozen during all three tasks. The paper mentions a software component (Faster RCNN) but does not provide specific version numbers for it or any other software libraries/frameworks. |
| Experiment Setup | Yes | The training process is divided into two phases. First, the basic scene graph generation model (Zellers et al., 2018; Vaswani et al., 2017; Tang et al., 2019) provides coarse-grained contextual information, which is used for pretraining the Noise-Guided Predicate Representation Extraction module. ... During training, the learning rate is set to 0.001. In the pre-training phase, the batch size is set to 8, and the number of iterations is 60,000. In the feature enhancement phase, the batch size is set to 8, and the number of iterations is 40,000. |
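The quoted Algorithm 1 follows a familiar conditional-diffusion pattern: initialize a noisy feature from the context statistics, predict the noise with stacked cross-attention over the time embedding, context features Cp, and target features Tp, then apply a DDPM-style single-step denoising update. The sketch below illustrates that loop with numpy. It is not the authors' implementation: the projections `omega`/`phi`, the sinusoidal time embedding, and the geometric noise schedule `gamma` are all stand-in assumptions for the learned components in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def attn(q, k, v):
    # single-head scaled dot-product attention
    scores = q @ k.T / np.sqrt(k.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def time_embedding(t, dim):
    # sinusoidal step embedding, shape (1, dim); dim assumed even
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    ang = t * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)])[None, :]

def denoise(C_p, T_p, T=50, n=10, gamma=0.98):
    """Sketch of Algorithm 1 (noise-guided feature reconstruction).
    C_p: coarse context features (L, d); T_p: target predicate features (L, d).
    omega/phi stand in for the paper's learned projections."""
    omega = lambda x: x          # hypothetical projection (identity here)
    phi = lambda x: np.tanh(x)   # hypothetical gating nonlinearity
    L, d = C_p.shape
    m, v = omega(C_p), omega(C_p)                  # init feature distribution
    G = m + v * rng.standard_normal(C_p.shape)     # init noise input
    G_p = G
    gammas = gamma ** np.arange(1, T + 1)          # assumed noise schedule
    gbar = np.cumprod(gammas)
    # n randomly selected steps, visited from large t to small t
    for t in sorted(rng.choice(T, size=n, replace=False), reverse=True):
        E_t = np.repeat(time_embedding(t, d), L, axis=0)
        N = attn(G_p, E_t, E_t)                    # condition on time step
        N = attn(N, C_p, C_p)                      # condition on context
        N = attn(N, T_p, T_p)                      # condition on targets
        N = omega(N) * phi(omega(C_p)) + omega(C_p)    # noise prediction
        # DDPM-style single-step denoising update
        G_p = (G - (1 - gammas[t]) / np.sqrt(1 - gbar[t]) * N) / np.sqrt(gammas[t])
    loss = np.mean((G_p - T_p) ** 2)               # MSE training loss
    return G_p, loss
```

Running `denoise(C_p, T_p)` on two (L, d) feature arrays returns the reconstructed features and the scalar MSE loss used during the pretraining phase described above.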