Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization

Authors: Yeji Song, Jimyeong Kim, Wonhark Park, Wonsik Shin, Wonjong Rhee, Nojun Kwak

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments Datasets. Since prevailing benchmark datasets (Ruiz et al. 2023; Kumari et al. 2023) primarily use generation prompts related to changing the texture or introducing new objects, they often fall short in effectively evaluating the crucial aspect of modifying subject poses. To address this limitation, we have constructed a new dataset, the Deformable Subject Set (DS set), to effectively assess the model's capability to modify a subject's pose. The DS set comprises 38 live animals from DreamBooth (Ruiz et al. 2023) and Custom Diffusion (Kumari et al. 2023), along with 11 prompts specifically designed to focus on the deformation of the subjects' poses. Furthermore, we also used the DreamBooth dataset (DB set) (Ruiz et al. 2023) to evaluate the model's capacity in typical scenarios. Metrics. Following DreamBooth (Ruiz et al. 2023), we measured subject fidelity using CLIP-I and DINO-I, and measured text alignment using CLIP-T. Qualitative Results. We present a comparative analysis of our method against the baselines in Figure 4. Quantitative Results. Table 1 and Figure 5 show quantitative analyses. User Study. We further evaluate our method through a user study conducted on Amazon Mechanical Turk. Ablations. Our orchestration eliminates the conflicting elements in the visual embedding that interfere with the textual embedding.
Researcher Affiliation Academia Seoul National University
Pseudocode No The paper describes its methodology using mathematical equations (Eq. 3, 4, 5) and textual explanations, but does not include any clearly labeled pseudocode blocks or algorithms.
Open Source Code No The paper mentions an 'Extended version https://arxiv.org/abs/2403.14155' which points to an arXiv preprint, but there is no explicit statement about open-sourcing the code for the described methodology or a link to a code repository.
Open Datasets Yes The DS set comprises 38 live animals from DreamBooth (Ruiz et al. 2023) and Custom Diffusion (Kumari et al. 2023), along with 11 prompts specifically designed to focus on the deformation of the subjects' poses. Furthermore, we also used the DreamBooth dataset (DB set) (Ruiz et al. 2023) to evaluate the model's capacity in typical scenarios.
Dataset Splits No The paper describes the composition and sources of the datasets used (DS set and DB set) but does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification No The paper does not provide any specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper does not list specific versions of software dependencies, libraries, or programming languages used in the experiments.
Experiment Setup No The paper describes the proposed methods, qualitative and quantitative results, and ablation studies, but it does not specify concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, or training configurations for their models.
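For context on the metrics cited in the Research Type row: CLIP-I, DINO-I, and CLIP-T all reduce to cosine similarities over encoder embeddings. Below is a minimal sketch of how such scores are typically computed, assuming the image and text embeddings have already been produced by the respective encoders (CLIP or DINO); the function names and inputs here are illustrative, not the paper's actual evaluation code.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def image_fidelity(gen_embs, ref_embs):
    # CLIP-I / DINO-I style score: mean pairwise cosine similarity
    # between generated-image and reference-image embeddings.
    return float(np.mean([[cosine_sim(g, r) for r in ref_embs]
                          for g in gen_embs]))

def text_alignment(gen_embs, text_emb):
    # CLIP-T style score: mean cosine similarity between each
    # generated-image embedding and the prompt's text embedding.
    return float(np.mean([cosine_sim(g, text_emb) for g in gen_embs]))
```

In practice the embeddings would come from the CLIP image/text encoders (for CLIP-I and CLIP-T) or a DINO image backbone (for DINO-I), with scores averaged over all generated samples per subject-prompt pair.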