EMControl: Adding Conditional Control to Text-to-Image Diffusion Models via Expectation-Maximization
Authors: He Wang, Longquan Dai, Jinhui Tang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments We implement EMControl across various conditions, including canny edge (Canny 1986), depth map (Yang et al. 2024b), normal map (Vasiljevic et al. 2019), M-LSD lines (Gu et al. 2022), HED edge (Xie and Tu 2015), semantic segmentation (Cheng et al. 2022), skeleton (Cao et al. 2017), sketch, object location (Redmon et al. 2016) and style guidance (Radford et al. 2021). In this section, we present the generated results and provide a comparison with existing methods to demonstrate the effectiveness of our approach. |
| Researcher Affiliation | Academia | He Wang, Longquan Dai*, Jinhui Tang Nanjing University of Science and Technology, China |
| Pseudocode | Yes | Algorithm 1: EMControl Sampling |
| Open Source Code | No | The paper does not provide any explicit statements about the availability of open-source code or links to a code repository. |
| Open Datasets | Yes | We trained our model on approximately 156,000 images from COCO2017 (Lin et al. 2014), covering a range of tasks. [...] For the aspect of style guidance, we further integrated approximately 81,000 images from Wiki-Art (Tan et al. 2019). |
| Dataset Splits | Yes | To evaluate the performance of different methods, we used the COCO2017 validation set comprising 5,000 image-text pairs. |
| Hardware Specification | Yes | During training, the model commenced with the SDv1.5 checkpoint and was trained for 20 hours on a single NVIDIA RTX3090 GPU. |
| Software Dependencies | No | A batch size of 1 was utilized alongside the AdamW optimizer at a learning rate of 1e-5, where the inputs, including images and condition, were scaled down to 512×512 pixels. For EMControl sampling, we employed the DDPM (Ho, Jain, and Abbeel 2020) scheduler across 20 time steps. This text mentions specific optimizers and schedulers, but not their software versions or the versions of underlying libraries such as PyTorch/TensorFlow, Python, or CUDA. |
| Experiment Setup | Yes | Experimental Setup Our model for the latent forward network Aθ(zt, t) is based on U-Net (Ronneberger, Fischer, and Brox 2015). During training, the model commenced with the SDv1.5 checkpoint and was trained for 20 hours on a single NVIDIA RTX3090 GPU. A batch size of 1 was utilized alongside the AdamW optimizer at a learning rate of 1e-5, where the inputs, including images and condition, were scaled down to 512×512 pixels. For EMControl sampling, we employed the DDPM (Ho, Jain, and Abbeel 2020) scheduler across 20 time steps. |
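The training configuration quoted above (batch size 1, AdamW at lr 1e-5, 512×512 inputs) can be sketched as a minimal PyTorch training step. This is an illustrative approximation, not the authors' code: the tiny `Conv2d` stands in for the paper's U-Net-based latent network Aθ(zt, t), which in the real setup is initialized from the SDv1.5 checkpoint, and the 64×64×4 latent shape assumes the standard Stable Diffusion VAE downsampling factor of 8.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the U-Net-based latent forward network A_theta(z_t, t);
# the paper initializes the real model from the SDv1.5 checkpoint.
model = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

# Hyperparameters as reported: batch size 1, AdamW optimizer, learning rate 1e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A 512x512 input image corresponds to a 64x64x4 latent under the SD VAE
# (downsampling factor 8); the target here is a dummy noise tensor.
latent = torch.randn(1, 4, 64, 64)
target = torch.randn_like(latent)

# One training step: predict, compute MSE loss, update weights.
loss = F.mse_loss(model(latent), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The reported sampler (a DDPM scheduler run for 20 steps) would sit outside this training loop; its exact integration with the EMControl sampling algorithm is specific to the paper and is not reproduced here.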