DialogDraw: Image Generation and Editing System Based on Multi-Turn Dialogue

Authors: Shichao Ma, Xinfeng Zhang, Zeng Zhao, Bai Liu, Changjie Fan, Zhipeng Hu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental To evaluate DialogDraw's functionality, we propose DrawnConvos, a dataset rich in drawing functions and command dialogue data collected from the open-source community. Our evaluation demonstrates that DialogDraw excels in command compliance, identifying and adapting to user drawing intentions, thereby proving the effectiveness of our method. Moreover, we employ SFT and RLHF to iterate the Intention Recognition and Parameter Extraction Model (IRPEM). (Section 4: Experiments)
Researcher Affiliation Collaboration Fuxi AI Lab, NetEase Inc.
Pseudocode No The paper describes methodologies and processes but does not include any structured pseudocode or algorithm blocks.
Open Source Code No The paper discusses integrating with 'numerous open-source drawing workflows and models' and mentions 'open-source community' platforms like Civitai and OpenArt, but it does not provide an explicit statement or a link to the source code for the DialogDraw system itself.
Open Datasets No We create a new dataset named DrawnConvos, a dataset of multi-round dialogues including image generation and editing, which incorporates numerous open-source workflows and models. Using this dataset, we apply SFT and RLHF methods for IRPEM training. While the paper describes the creation of the 'DrawnConvos' dataset, it does not provide concrete access information such as a specific link, DOI, or repository for its public availability.
Dataset Splits Yes We then randomly divided DrawnConvos into DrawnConvos(SFT), DrawnConvos(HF), and DrawnConvos(TEST) in a 6:3:1 ratio.
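The 6:3:1 random split described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and seed handling are hypothetical assumptions.

```python
import random

def split_dataset(items, ratios=(6, 3, 1), seed=0):
    """Randomly partition items into subsets by the given ratio (e.g. 6:3:1).

    Hypothetical helper; the paper does not publish its splitting code.
    """
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    # Cumulative boundaries: for 6:3:1 these land at 60%, 90%, 100%.
    bounds, acc = [], 0
    for r in ratios:
        acc += r
        bounds.append(round(n * acc / total))
    parts, start = [], 0
    for b in bounds:
        parts.append(shuffled[start:b])
        start = b
    return parts

# With 100 examples, a 6:3:1 ratio yields subsets of 60, 30, and 10.
sft_set, hf_set, test_set = split_dataset(list(range(100)))
```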
Hardware Specification Yes All our experiments are performed on four NVIDIA A100 GPUs using the PyTorch framework.
Software Dependencies No The paper mentions the 'PyTorch framework' and initializing the model with 'a pre-trained Qwen-VL (Bai et al. 2023) model', but it does not provide specific version numbers for PyTorch or any other software dependencies crucial for replication.
Experiment Setup Yes In the first phase, we trained the model for 50 epochs using DrawnConvos(SFT) to obtain IRPEM(SFT). The second phase was built upon the first, where we further trained the model for another 50 epochs using DrawnConvos(RLHF) to achieve IRPEM(RLHF). Both phases utilized the AdamW optimizer with weight decay set to 0.1 and 0.05, respectively. The initial learning rate for both stages is initialized as 1e-5.
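The two-phase setup quoted above can be captured as a configuration sketch. These dictionaries are hypothetical (the paper provides no configuration files); only the epoch counts, optimizer, weight decays, and learning rate come from the quoted text.

```python
# Hypothetical configuration of the two reported training phases.
PHASES = [
    {"name": "SFT",  "dataset": "DrawnConvos(SFT)",  "epochs": 50,
     "optimizer": "AdamW", "weight_decay": 0.10, "lr": 1e-5},
    {"name": "RLHF", "dataset": "DrawnConvos(RLHF)", "epochs": 50,
     "optimizer": "AdamW", "weight_decay": 0.05, "lr": 1e-5},
]

def total_epochs(phases):
    """Total training epochs across all phases (50 + 50 = 100 here)."""
    return sum(p["epochs"] for p in phases)
```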