ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context
Authors: Sixiao Zheng, Yanwei Fu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on Pororo SV and Flintstones SV datasets demonstrate that ContextualStory significantly outperforms existing SOTA methods in both story visualization and continuation. |
| Researcher Affiliation | Academia | 1 Fudan University, 2 Shanghai Innovation Institute |
| Pseudocode | No | The paper describes methods in text and uses architectural diagrams (Figure 2, Figure 4) but does not present structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We employ two popular benchmark datasets, Pororo SV (Li et al. 2019) and Flintstones SV (Gupta et al. 2018), to evaluate the performance of our model on story visualization and story continuation tasks. |
| Dataset Splits | Yes | Pororo SV contains 10,191, 2,334, and 2,208 stories within the train, validation, and test splits, respectively, featuring 9 main characters. Flintstones SV contains 20,132, 2,071, and 2,309 stories within the train, validation, and test splits, respectively, featuring 7 main characters and 323 backgrounds. |
| Hardware Specification | Yes | Training is performed on 4 NVIDIA A800 GPUs with a batch size of 12, a learning rate of 5×10⁻⁵ and 40,000 iterations for Pororo SV and 80,000 iterations for Flintstones SV. ... The experiment is conducted on an A800 GPU with 50 DDIM steps to ensure a fair comparison. |
| Software Dependencies | Yes | We initialize ContextualStory with the pre-trained Stable Diffusion 2.1-base and fine-tune only the UNet parameters with the AdamW optimizer. |
| Experiment Setup | Yes | Training is performed on 4 NVIDIA A800 GPUs with a batch size of 12, a learning rate of 5×10⁻⁵ and 40,000 iterations for Pororo SV and 80,000 iterations for Flintstones SV. The SETA window size is k = 3, and the SC layer count is 4. During training, we apply classifier-free guidance by randomly dropping input storylines with a 0.1 probability and use the PYoCo mixed noise prior for noise initialization. For inference, we use the DDIM sampler with 50 steps and a guidance scale of 7.5 to generate 256×256 images. |
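The training and inference settings quoted in the last row can be sketched as a minimal, framework-agnostic snippet. Only the 0.1 storyline drop probability, the 7.5 guidance scale, and the use of a shared-plus-independent (PYoCo-style) mixed noise prior come from the paper; the function names, the empty-string null conditioning, and the α = 1 noise parameterization are illustrative assumptions, not the authors' implementation.

```python
import math
import random

GUIDANCE_SCALE = 7.5  # DDIM inference guidance scale (from the paper)
DROP_PROB = 0.1       # storyline drop probability during training (from the paper)


def maybe_drop_storyline(storyline, null_token="", rng=random):
    """Classifier-free guidance training: with probability DROP_PROB,
    replace the storyline with null conditioning so the model also
    learns the unconditional distribution."""
    return null_token if rng.random() < DROP_PROB else storyline


def cfg_combine(eps_uncond, eps_cond, scale=GUIDANCE_SCALE):
    """Inference-time CFG: extrapolate from the unconditional noise
    prediction toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def mixed_noise(num_frames, dim, alpha=1.0, rng=random):
    """PYoCo-style mixed noise prior (hypothetical parameterization):
    one noise vector shared across all frames plus an independent
    per-frame vector, rescaled so each frame's noise is unit-variance."""
    shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(1.0 + alpha ** 2)
    return [
        [(alpha * s + rng.gauss(0.0, 1.0)) / norm for s in shared]
        for _ in range(num_frames)
    ]
```

The shared component correlates the initial noise across the story's frames, which is the stated purpose of the mixed noise prior in video and story diffusion models.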