Event-Customized Image Generation

Authors: Zhen Wang, Yilei Jiang, Dong Zheng, Jun Xiao, Long Chen

ICML 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | Extensive experiments have demonstrated the effectiveness of FreeEvent. Moreover, as a pioneering effort in this direction, we also collected two evaluation benchmarks from existing datasets (i.e., SWiG (Pratt et al., 2020) and HICO-DET (Chao et al., 2015)) and the internet for event-customized image generation, dubbed SWiG-Event and Real-Event, respectively.
Researcher Affiliation | Academia | 1) Zhejiang University, Hangzhou, China; 2) The Hong Kong University of Science and Technology, Hong Kong, China. Work was done when Zhen Wang visited HKUST. Correspondence to: Long Chen <EMAIL>. All listed affiliations are academic institutions (universities).
Pseudocode | No | The paper describes the proposed method using descriptive text and architectural diagrams (Figures 2 and 3), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Moreover, as a pioneering effort in this direction, we also collected two evaluation benchmarks from existing datasets (i.e., SWiG (Pratt et al., 2020) and HICO-DET (Chao et al., 2015)) and the internet for event-customized image generation, dubbed SWiG-Event and Real-Event, respectively.
Dataset Splits | No | For quantitative evaluation, we present SWiG-Event, a benchmark derived from the SWiG (Pratt et al., 2020) dataset, which comprises 5,000 samples with various events and entities, i.e., 50 kinds of different actions, poses, and interactions, where each kind of event has 100 reference images, and each reference image contains 1 to 4 entities with labeled bounding boxes and nouns. The paper describes the structure of the evaluation benchmarks but does not specify train/validation/test splits, as the proposed FreeEvent method is training-free and thus does not require such splits for model training.
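The benchmark structure described in this row can be captured as a simple record schema. The sketch below is purely illustrative: the class names, field names, and example values are assumptions, not taken from the SWiG-Event release.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One labeled entity in a reference image (hypothetical schema)."""
    noun: str     # entity noun label, e.g. "person"
    bbox: tuple   # (x_min, y_min, x_max, y_max) bounding box in pixels

@dataclass
class EventSample:
    """One hypothetical SWiG-Event reference-image record."""
    image_path: str
    event: str                                     # one of the 50 event kinds
    entities: list = field(default_factory=list)   # 1 to 4 entities per image

# Example record: one reference image for the (hypothetical) event "kicking"
sample = EventSample(
    image_path="swig_event/kicking/0001.jpg",
    event="kicking",
    entities=[Entity("person", (10, 20, 120, 300)),
              Entity("ball", (130, 250, 170, 290))],
)
assert 1 <= len(sample.entities) <= 4  # the paper states 1-4 entities per image
```

Under this schema, the full benchmark would be 50 event kinds x 100 reference images, i.e. 5,000 such records.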
Hardware Specification | Yes | Images are generated at a resolution of 512x512 on an NVIDIA A100 GPU.
Software Dependencies | Yes | We use Stable Diffusion v2-1-base as the base model for all methods.
Experiment Setup | Yes | The denoising process uses 50 steps. For the entity-switching path, cross-attention guidance is applied to all blocks and layers containing the cross-attention module during the first 10 steps, and cross-attention regulation is applied during all 50 steps. For the event-transferring path, spatial feature injection is performed at {decoder block 1: [layer 1]} during all 50 steps, and self-attention injection at {decoder block 1: [layer 1, 2], decoder block 2: [layer 0, 1, 2], decoder block 3: [layer 0, 1, 2]} during the first 25 steps. The classifier-free guidance scale is set to 15.0.
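The per-step schedule quoted above can be written down as a small configuration sketch. The dictionary layout and helper function below are assumptions for illustration, not the authors' code; only the step counts, block/layer indices, and guidance scale come from the reported setup.

```python
# Illustrative schedule for the guidance/injection controls described above.
# All identifiers are hypothetical; the numbers follow the paper's setup.
TOTAL_STEPS = 50
CFG_SCALE = 15.0  # classifier-free guidance scale

SCHEDULE = {
    # control name: (active denoising steps, target blocks/layers)
    "cross_attention_guidance":   (range(0, 10), "all cross-attention layers"),
    "cross_attention_regulation": (range(0, 50), "all cross-attention layers"),
    "spatial_feature_injection":  (range(0, 50), {"decoder_block_1": [1]}),
    "self_attention_injection":   (range(0, 25), {"decoder_block_1": [1, 2],
                                                  "decoder_block_2": [0, 1, 2],
                                                  "decoder_block_3": [0, 1, 2]}),
}

def is_active(control: str, step: int) -> bool:
    """Return True if the given control is applied at this denoising step."""
    active_steps, _targets = SCHEDULE[control]
    return step in active_steps

# Cross-attention guidance runs only for the first 10 of the 50 steps.
assert is_active("cross_attention_guidance", 9)
assert not is_active("cross_attention_guidance", 10)
# Self-attention injection stops after the first 25 steps.
assert is_active("self_attention_injection", 24)
assert not is_active("self_attention_injection", 25)
```

A denoising loop would consult `is_active(name, t)` at each step `t` to decide which attention controls to apply, with the two paths (entity switching and event transferring) reading different entries of the schedule.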