Prompt-Free Conditional Diffusion for Multi-object Image Augmentation

Authors: Haoyu Wang, Lei Zhang, Wei Wei, Chen Ding, Yanning Zhang

IJCAI 2025

Reproducibility Assessment: Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the superiority of the proposed method over several representative state-of-the-art baselines and showcase strong downstream task gain and out-of-domain generalization capabilities. We validate the proposed framework and comparison methods on the MS-COCO [Lin et al., 2014] dataset, a relatively complex object detection dataset containing 80 categories, with an average of 7.7 objects per image. We use train2017 containing 118K images to train the proposed method and generate images for downstream task evaluation, and use the COCO validation set val2017 consisting of 5K images for generation quality evaluation. We use mAP (mean Average Precision) and AP50 to evaluate the generated data, and for generation quality evaluation, we use the widely used Fréchet Inception Distance (FID) [Heusel et al., 2017] to evaluate the fidelity of the generated images. In addition, to evaluate the diversity of the generated images, we calculate the diversity score (DS) by comparing the LPIPS [Zhang et al., 2018] metric of paired images. Finally, to evaluate the object amounts of the generated images, we designed an instance quantity score (IQS) that detects the instance quantity of each category under multiple confidence settings using the pre-trained YOLOv8m [Jocher et al., 2023] and compares it with the original images.
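The diversity score (DS) quoted above averages an LPIPS distance over paired generated images. A minimal sketch of that pairwise aggregation, with a stand-in `perceptual_distance` function in place of a real LPIPS network (the function name and the mean-absolute-difference stand-in are assumptions, not the paper's implementation):

```python
from itertools import combinations

def perceptual_distance(img_a, img_b):
    # Stand-in for LPIPS: mean absolute element-wise difference between
    # two flattened images. A real DS computation would run an LPIPS
    # network on the image pair instead.
    return sum(abs(a - b) for a, b in zip(img_a, img_b)) / len(img_a)

def diversity_score(images):
    """Average pairwise distance over all image pairs (higher = more diverse)."""
    pairs = list(combinations(images, 2))
    return sum(perceptual_distance(a, b) for a, b in pairs) / len(pairs)
```

For three toy "images" `[0.0, 0.0]`, `[1.0, 1.0]`, and `[0.0, 1.0]`, the three pairwise distances are 1.0, 0.5, and 0.5, giving a DS of 2/3.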
Researcher Affiliation | Academia | Haoyu Wang¹, Lei Zhang¹, Wei Wei¹, Chen Ding² and Yanning Zhang¹; ¹Northwestern Polytechnical University, ²Xi'an University of Posts & Telecommunications
Pseudocode | Yes | Algorithm 1 (Counting Loss). Input: denoised image x_i, open-vocabulary object detector D_OV, number of categories N_c^i, text prompt S_i, class count list L_count^i, class index list L_index^i, counting loss step γ, counting loss threshold τ
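Algorithm 1's interface suggests a loss that compares per-class instance counts from the detector against target counts. A hypothetical Python sketch under that reading (here `detections` stands in for the output of the open-vocabulary detector D_OV, and the hinge on the threshold τ is an assumption about how the tolerance is applied):

```python
def counting_loss(detections, class_index_list, class_count_list, tau=0.0):
    """Penalize deviation between detected and target instance counts.

    detections: list of predicted class indices for one denoised image
                (stand-in for the detector D_OV's output).
    class_index_list / class_count_list: target classes and their counts
                (L_index^i / L_count^i in Algorithm 1).
    tau: tolerance threshold; deviations within tau incur no penalty.
    """
    loss = 0.0
    for cls, target in zip(class_index_list, class_count_list):
        detected = sum(1 for d in detections if d == cls)
        gap = abs(detected - target)
        loss += max(0.0, gap - tau)  # hinge: only penalize beyond tau
    return loss
```

For example, with detections `[0, 0, 1]`, target classes `[0, 1]`, and target counts `[2, 2]`, class 0 matches exactly while class 1 is one instance short, so the loss is 1.0 at `tau=0.0`.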
Open Source Code | No | The paper states "Code is available at here." without providing a concrete link or repository, which is insufficient for accessing the source code.
Open Datasets | Yes | We validate the proposed framework and comparison methods on the MS-COCO [Lin et al., 2014] dataset, a relatively complex object detection dataset containing 80 categories, with an average of 7.7 objects per image.
Dataset Splits | Yes | We use train2017 containing 118K images to train the proposed method and generate images for downstream task evaluation, and use the COCO validation set val2017 consisting of 5K images for generation quality evaluation.
Hardware Specification | Yes | We fine-tune the model using LoRA [Hu et al., 2021] at 512×512 resolution; we set the learning rate to 1e-4, total batch size to 32, and train on two RTX 3090 GPUs using the AdamW [Loshchilov and Hutter, 2019] optimizer with a constant scheduler.
Software Dependencies | No | The paper mentions models (Stable Diffusion XL, Grounding DINO), methods (LoRA), optimizers (AdamW), and schedulers (Euler), but does not provide specific version numbers for underlying software libraries or programming languages (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We fine-tune the model using LoRA [Hu et al., 2021] at 512×512 resolution; we set the learning rate to 1e-4, total batch size to 32, and train on two RTX 3090 GPUs using the AdamW [Loshchilov and Hutter, 2019] optimizer with a constant scheduler. In the inference stage, we use the Euler scheduler with 50 steps for generation.
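The training and inference hyperparameters quoted above can be collected into a single configuration sketch (the dictionary key names are illustrative, not taken from the paper's code):

```python
# Training hyperparameters as reported in the paper.
# Key names are illustrative assumptions, not the authors' config schema.
TRAIN_CONFIG = {
    "finetune_method": "LoRA",       # Hu et al., 2021
    "resolution": (512, 512),
    "learning_rate": 1e-4,
    "total_batch_size": 32,
    "optimizer": "AdamW",            # Loshchilov and Hutter, 2019
    "lr_scheduler": "constant",
    "hardware": "2x RTX 3090",
}

# Inference-stage settings as reported in the paper.
INFER_CONFIG = {
    "scheduler": "Euler",
    "num_inference_steps": 50,
}
```

Collecting the reported values this way makes it easy to spot what is still missing for reproduction, e.g. library versions, which the Software Dependencies row flags as unreported.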