Storybooth: Training-Free Multi-Subject Consistency for Improved Visual Storytelling

Authors: Jaskirat Singh, Junshen K Chen, Jonas Kohler, Michael Cohen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through both qualitative and quantitative results we find that the proposed approach surpasses prior state-of-the-art, exhibiting improved consistency across both multiple-characters and fine-grain subject details. ... Experimental analysis reveals (Sec. 5) that the proposed approach allows for improved character consistency and text-to-image alignment while exhibiting 30× faster inference time than optimization-based methods (Ruiz et al., 2022).
Researcher Affiliation | Collaboration | Jaskirat Singh1,2, Junshen K. Chen1, Jonas Kohler1, Michael Cohen1 — 1Meta Gen AI, 2Australian National University
Pseudocode | No | The paper describes the method using equations and textual descriptions of layers and processes, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository.
Open Datasets | Yes | For consistency, the storyboard prompt dataset from Tewel et al. (2024) is used for evaluating single-subject generation. We also construct an analogous multi-subject dataset (refer appendix) placing two randomly selected subjects in different settings.
Dataset Splits | No | The paper mentions using a 'storyboard prompt dataset from Tewel et al. (2024)' and constructing a 'multi-subject dataset' but does not specify any training, validation, or test splits for these datasets.
Hardware Specification | Yes | For a fair comparison, all methods are benchmarked on a single Nvidia-H100 GPU, using the same base model as (Zhou et al., 2024) for generation.
Software Dependencies | No | The paper mentions various models and methods like 'Textual-inversion', 'DB-LoRA', 'IP-Adapter', 'BLIP-Diffusion', 'Storygen', 'Consistory', and 'Storydiffusion', but does not provide specific version numbers for software libraries or dependencies used in their implementation.
Experiment Setup | Yes | Our key insight here is to introduce an additional dropout term (see Eq. 3) which randomly allows each token to also pay attention to other global level tokens (e.g., for background) with a small dropout-probability ϑd. ... Since early parts of the reverse diffusion process are primarily responsible for positional or layout consolidation, we use the above insight to increase pose-variance by using a negative ϖ = −0.5 during the initial timesteps t ∈ [1000, 950]. A positive ϖ = 0.4 is then used for t ∈ [950, 600] in order to improve visual consistency.
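The quoted setup can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): `dropout_attention_mask` mimics the dropout term that lets each token occasionally attend to global tokens with small probability ϑd, and `consistency_weight` encodes the timestep schedule for ϖ quoted above. The function names, the ϑd default, and the exact interval-boundary handling are assumptions.

```python
import numpy as np

def consistency_weight(t, w_early=-0.5, w_mid=0.4):
    """Hypothetical schedule for the consistency weight ϖ over diffusion timesteps.

    Per the quoted setup: a negative weight for t in [1000, 950] to increase
    pose variance, a positive weight for t in [950, 600] to improve visual
    consistency; later timesteps are assumed to use no extra weighting.
    """
    if t > 950:
        return w_early
    if t > 600:
        return w_mid
    return 0.0

def dropout_attention_mask(subject_mask, theta_d=0.1, rng=None):
    """Build a boolean (query x key) attention mask.

    Baseline: every query may attend to keys inside the subject region
    (subject_mask). With small dropout probability theta_d, a query is also
    allowed to attend to a global (non-subject) key, mimicking the paper's
    dropout term. theta_d = 0.1 is an assumed default, not from the paper.
    """
    rng = np.random.default_rng(rng)
    n = subject_mask.shape[0]
    # baseline: attention restricted to subject tokens (same columns for all queries)
    allow = np.broadcast_to(subject_mask[None, :], (n, n)).copy()
    # randomly open attention to global tokens with probability theta_d
    dropout_open = rng.random((n, n)) < theta_d
    allow |= dropout_open & ~subject_mask[None, :]
    return allow
```

With `theta_d=0.0` the mask reduces to pure subject-restricted attention; increasing `theta_d` progressively re-admits background tokens, which is the mechanism the quote credits for preserving global context.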