TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

Authors: Gihyun Kwon, Jong Chul Ye

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results demonstrate that TweedieMix can generate high-quality multi-concept generation results in both the image and video domains. More results can be found in the experiment section. Section 5: EXPERIMENTAL RESULTS. Table 1: Quantitative Evaluation of Multi-Concept Image Generation. Figure 5: Qualitative Evaluation of Multi-Concept Image Generation. Table 2: Ablation Study on Image Generation (quantitative evaluation of the ablation study).
Researcher Affiliation | Collaboration | Gihyun Kwon, KRAFTON (EMAIL). Jong Chul Ye, Kim Jaechul Graduate School of AI, KAIST (EMAIL).
Pseudocode | No | The paper describes its methods in prose and mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | "Results and source code are in our project page." https://github.com/KwonGihyun/TweedieMix
Open Datasets | Yes | For the evaluation dataset, we utilized the dataset proposed in the prior work, drawing from various data sources for both quantitative and qualitative analyses. For the quantitative evaluation, we selected 32 distinct concepts from the Custom Concept 101 dataset (Kumari et al., 2023), organized into 10 unique combinations.
Dataset Splits | No | The paper mentions selecting 32 distinct concepts from the Custom Concept 101 dataset for quantitative evaluation and expanding the concept pool for qualitative analysis. It also states "All the dataset contains 5–8 images per each concept." However, it does not provide specific training, validation, or test splits (e.g., percentages or counts for different sets) for the datasets used in its experiments, nor does it refer to standard predefined splits for its evaluation setup.
Hardware Specification | Yes | In terms of sampling time, it takes approximately 30 seconds using a single NVIDIA RTX 3090 GPU. This process took approximately 50 seconds on a single RTX 3090 GPU.
Software Dependencies | No | The paper mentions using "Stable Diffusion 2.1 or higher" as the backbone model and refers to specific models like the "langsam (Medeiros, 2023) package, which combines Grounding DINO (Liu et al., 2023b) and Segment-Anything models (Kirillov et al., 2023)" and "I2VGen-XL (Zhang et al., 2023b)". However, it does not provide specific version numbers for general software dependencies like programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries (e.g., CUDA) that are typically needed for reproducibility.
Experiment Setup | Yes | Regarding sampling hyperparameters, we set the reference timestep t_con for content-aware sampling to 0.8T, and we found that values between 0.8T and 0.7T did not significantly affect output quality. The total timestep is set to T = 50, and we used an image resolution of 768x768. For resampling, we used P = 10... For the video model, we used the recently proposed image-to-video model, I2VGen-XL (Zhang et al., 2023b). For video sampling, we set T = 50. The total number of frames was 16, with a resolution of 512x512. For the lowest-resolution blocks, we set η = 1, and for the first upsampling block, we set η = 0.3.
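The reported hyperparameters can be collected into a single configuration for anyone attempting a reproduction. The sketch below is illustrative only: the dictionary keys and the helper function are hypothetical names, not identifiers from the authors' released code; only the numeric values come from the paper.

```python
# Hypothetical configuration mirroring the sampling hyperparameters
# reported in the paper (key names are illustrative, not from the
# authors' repository).
IMAGE_CONFIG = {
    "total_timesteps": 50,     # T = 50
    "t_con_fraction": 0.8,     # content-aware reference timestep t_con = 0.8T
    "resolution": (768, 768),  # image resolution
    "resample_steps": 10,      # P = 10
}

VIDEO_CONFIG = {
    "backbone": "I2VGen-XL",   # image-to-video backbone (Zhang et al., 2023b)
    "total_timesteps": 50,     # T = 50
    "num_frames": 16,
    "resolution": (512, 512),
    # eta per decoder block: 1.0 at the lowest-resolution blocks,
    # 0.3 at the first upsampling block.
    "eta_lowest_res": 1.0,
    "eta_first_upsample": 0.3,
}

def content_aware_threshold(cfg: dict) -> int:
    """Integer timestep corresponding to t_con = fraction * T."""
    return int(cfg["t_con_fraction"] * cfg["total_timesteps"])
```

With these values, `content_aware_threshold(IMAGE_CONFIG)` gives 40, i.e. t_con = 0.8 * 50; the paper notes that fractions between 0.7 and 0.8 behave similarly.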