ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

Authors: Eslam Abdelrahman, Liangbing Zhao, Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on diverse datasets, including LSUN-Churches, ImageNet, CelebA-HQ, and LAION-Art, demonstrate the efficacy of our approach, consistently outperforming state-of-the-art methods. For instance, ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating 2× faster with a 3× smaller architecture.
Researcher Affiliation | Collaboration | Eslam Abdelrahman (1), Liangbing Zhao (1), Vincent Tao Hu (2), Matthieu Cord (3), Patrick Perez (4), Mohamed Elhoseiny (1). Affiliations: (1) KAUST, (2) LMU, (3) Valeo AI, (4) Kyutai.
Pseudocode | Yes | Algorithm 1: Training Pipeline; Algorithm 2: Sampling Pipeline
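The paper's Algorithms 1 and 2 are not reproduced in this report. As a purely illustrative sketch of the cascaded, stage-wise structure being assessed (not the paper's actual Schrödinger-bridge formulation), the two-stage idea can be caricatured with a deterministic toy transport step; `bridge_step`, the targets, and the step counts below are all hypothetical:

```python
import numpy as np

def bridge_step(x, target, t, T):
    # Move x a fraction of the way toward the target: a crude,
    # deterministic stand-in for one bridge/denoising step.
    return x + (target - x) / (T - t)

def run_stage(x, target, T):
    # Iteratively transport x toward this stage's target over T steps.
    for t in range(T):
        x = bridge_step(x, target, t, T)
    return x

rng = np.random.default_rng(0)
noise = rng.normal(size=(8, 8))

# Hypothetical stage targets: a binary "sketch", then a full "image".
sketch_target = (rng.random((8, 8)) > 0.5).astype(float)
image_target = rng.random((8, 8))

# Stage 1: noise -> sketch (in the paper, a very small model with few steps).
sketch = run_stage(noise, sketch_target, T=10)

# Stage 2: sketch -> image, i.e. the second stage starts from the
# structured stage-1 output instead of pure noise.
image = run_stage(sketch, image_target, T=10)

print(np.allclose(image, image_target))  # → True
```

This only mirrors the report's claim that each stage transports the previous stage's output toward a new target; the real method learns stochastic transitions rather than this closed-form interpolation.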
Open Source Code | No | The project website is available at: https://toddlerdiffusion.github.io/website/ The provided URL points to a project website, which is not an explicit code repository link or a statement of code release. The criteria specify that a project demonstration page or high-level project overview page is insufficient unless it directly hosts the source code or explicitly links to a repository for the methodology.
Open Datasets | Yes | Extensive experiments on datasets such as LSUN-Churches (Yu et al., 2015), ImageNet, and CelebA-HQ (Karras et al., 2017) demonstrate the effectiveness of this approach, consistently outperforming existing methods. ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating 2× faster with a 3× smaller architecture. The project website is available at: https://toddlerdiffusion.github.io/website/
Dataset Splits | No | The paper mentions specific datasets (e.g., LSUN-Churches, CelebA-HQ, ImageNet-100) and training durations (e.g., 600, 250, and 350 epochs), but it does not explicitly provide training/validation/test split percentages, absolute sample counts per split, or the methodology used to partition the datasets. While standard splits may be implied for these well-known datasets, the paper neither states them explicitly nor cites resources for the exact splits used in its experiments.
Hardware Specification | Yes | The training time is reported until convergence, i.e., 600 epochs, using 4 NVIDIA A100 GPUs. The training time per epoch is calculated using 4 NVIDIA RTX A6000 GPUs. The sampling time per frame is calculated using one NVIDIA RTX A6000 with a batch size of 32.
Software Dependencies | No | The paper mentions several tools and models, such as VQGAN, UNet, PiDiNet, EDTER, Canny, Laplacian, FastSAM, the Inception model, DDPM-inversion, ControlNet, SDEdit, LoRA, and DDIM. However, it does not specify any programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or other software components with their corresponding version numbers, which are necessary for full reproducibility of the software environment.
Experiment Setup | Yes | For instance, LDM is trained using 1k steps; in contrast, ToddlerDiffusion can be trained using only 10 steps with minimal impact on generation fidelity.

Dataset      Method  Epochs  Reported metrics (column labels lost in extraction)
(unlabeled)  LDM     600     8.15   0.013  0.52  0.41
(unlabeled)  Ours    600     7.10   0.009  0.61  0.47
Churches     LDM     250     7.30   0.009  0.59  0.39
Churches     Ours    250     6.19   0.005  0.71  0.44
ImageNet     LDM     350     8.55   0.015  0.51  0.32
ImageNet     Ours    350     7.8    0.010  0.58  0.40

Starting from the converged model trained on the CelebA-HQ dataset for 600 epochs, we train our method and ControlNet for an additional 50 epochs with the sketch as a condition. The 1st stage is trained for 200 steps, so s = 200 means we omit the sketch and feed pure noise. We have trained the 1st stage for 1K epochs, as it is very small (only 5M parameters) and the dataset scale is very small; thus the 1K epochs take less than 12 hours using a single A100 GPU. Then we train the 2nd stage (141 million parameters) for only 200 epochs. We train the model starting from the SDv1.5 weights for only five epochs.
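The staged schedule quoted above can be consolidated into a single structure for reference. The field names below are invented for illustration; only the numeric values come from the quoted text:

```python
# Hypothetical consolidation of the training schedule reported above;
# key names are made up, values are taken verbatim from the quoted text.
schedule = {
    "stage1": {                  # sketch stage: very small model
        "params_millions": 5,
        "epochs": 1000,          # "1K epochs ... less than 12 hours on a single A100"
        "diffusion_steps": 200,  # s = 200 means the sketch is omitted (pure noise)
    },
    "stage2": {                  # image stage
        "params_millions": 141,
        "epochs": 200,
    },
    "controlnet_comparison": {
        "init": "model converged on CelebA-HQ at 600 epochs",
        "extra_epochs_with_sketch_condition": 50,
    },
}

total_epochs = schedule["stage1"]["epochs"] + schedule["stage2"]["epochs"]
print(total_epochs)  # → 1200
```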