Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty

Authors: Meera Hahn, Wenjun Zeng, Nithish Kannen, Rich Galt, Kartikeya Badola, Been Kim, Zi Wang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments over the three datasets demonstrate the proposed T2I agents' ability to ask informative questions and elicit crucial information to achieve successful alignment with at least 2 times higher VQAScore (Lin et al., 2024) than the standard T2I generation. Moreover, we conducted human studies and observed that at least 90% of human subjects found these agents and their belief graphs helpful for their T2I workflow, highlighting the effectiveness of our approach.
Researcher Affiliation | Industry | Google DeepMind. Correspondence to: Zi Wang <EMAIL>.
Pseudocode | Yes | Algorithm 1: Belief parsing and interaction
Open Source Code | Yes | Code and Design Bench can be found at https://github.com/google-deepmind/proactive_t2i_agents
Open Datasets | Yes | We experiment over three image-text datasets: Image In Words (Garg et al., 2024), COCO (Lin et al., 2014), and Design Bench, a benchmark we curated with strong artistic and design elements. Code and Design Bench can be found at https://github.com/google-deepmind/proactive_t2i_agents. Design Bench: https://huggingface.co/datasets/meerahahn/DesignBench
Dataset Splits | Yes | We evaluate over the COCO-Captions dataset validation split (Chen et al., 2015).
Hardware Specification | No | We implement the agent belief parsing and interaction in Algorithm 1 on top of Gemini 1.5 (Gemini Team Google, 2024) using the default temperature and a 32K context length. For T2I generation, we use Imagen 3 (Baldridge et al., 2024) across all baselines given its recency and prompt-following capabilities. Both models were served publicly via the Vertex API (https://cloud.google.com/vertex-ai).
Software Dependencies | No | We implement the agent belief parsing and interaction in Algorithm 1 on top of Gemini 1.5 (Gemini Team Google, 2024) using the default temperature and a 32K context length. For T2I generation, we use Imagen 3 (Baldridge et al., 2024) across all baselines given its recency and prompt-following capabilities. Both models were served publicly via the Vertex API (https://cloud.google.com/vertex-ai).
Experiment Setup | Yes | We implement the agent belief parsing and interaction in Algorithm 1 on top of Gemini 1.5 (Gemini Team Google, 2024) using the default temperature and a 32K context length.
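The belief-parsing-and-interaction procedure cited above (Algorithm 1) can be sketched roughly as follows. This is a hedged, minimal illustration, not the authors' implementation: every name here (`Belief`, `most_uncertain`, `interact`, `answer_fn`) is hypothetical, and a real agent would use an LLM (the paper uses Gemini 1.5) to parse beliefs and phrase questions, plus a T2I model (Imagen 3) to render the final prompt.

```python
# Hypothetical sketch of a belief-driven clarification loop for a
# proactive T2I agent: maintain beliefs over uncertain prompt attributes,
# ask the user about the most uncertain one, then build an enriched prompt.
from dataclasses import dataclass


@dataclass
class Belief:
    """Belief about one attribute: candidate values with probabilities."""
    attribute: str
    candidates: dict  # value -> probability


def uncertainty(belief: Belief) -> float:
    # Simple proxy for uncertainty: 1 minus the top candidate's probability.
    return 1.0 - max(belief.candidates.values())


def most_uncertain(beliefs, threshold=0.3):
    """Pick the attribute worth asking about, if any is uncertain enough."""
    b = max(beliefs, key=uncertainty)
    return b if uncertainty(b) > threshold else None


def interact(prompt, beliefs, answer_fn, max_turns=3):
    """Ask clarifying questions until beliefs are confident, then build a prompt."""
    for _ in range(max_turns):
        target = most_uncertain(beliefs)
        if target is None:
            break
        question = f"For '{prompt}': what {target.attribute} do you want?"
        # Collapse the belief onto the user's answer (a real agent would
        # update the belief graph with an LLM instead).
        target.candidates = {answer_fn(question): 1.0}
    details = ", ".join(max(b.candidates, key=b.candidates.get) for b in beliefs)
    return f"{prompt}, {details}"


beliefs = [
    Belief("color", {"red": 0.5, "blue": 0.5}),
    Belief("background", {"brick wall": 0.9, "field": 0.1}),
]
final_prompt = interact("a bicycle", beliefs, answer_fn=lambda q: "red")
# final_prompt == "a bicycle, red, brick wall"
```

The loop asks only about the ambiguous color (uncertainty 0.5) and skips the already-confident background, mirroring the idea of eliciting crucial information with few questions.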
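The evaluation quoted in the Research Type row relies on VQAScore (Lin et al., 2024), which rates image-text alignment as the probability that a VQA model answers "Yes" to a question templated from the prompt. The sketch below illustrates that evaluation shape only; `vqa_yes_prob` is a toy word-overlap stand-in, not the real VQA model.

```python
# Hedged sketch of a VQAScore-style alignment comparison. The real metric
# feeds the image and a question like "Does this figure show '{prompt}'?"
# to a generative VQA model and reads off P("Yes"); here a toy stub
# scores overlap between prompt words and an image description.

def vqa_yes_prob(image_description: str, text: str) -> float:
    """Toy stand-in for P('Yes' | image, templated question)."""
    words = text.lower().split()
    hits = sum(w in image_description.lower().split() for w in words)
    return hits / max(len(words), 1)


def vqascore(image_description: str, prompt: str) -> float:
    # In the actual metric the image itself, not a description, is scored.
    return vqa_yes_prob(image_description, prompt)


# An image matching an agent-elicited prompt should outscore a generic one.
elicited = vqascore("a red vintage bicycle against a brick wall",
                    "red bicycle against a brick wall")
generic = vqascore("a blue car on a road",
                   "red bicycle against a brick wall")
assert elicited > generic
```

Comparing scores this way is how the reported "at least 2 times higher VQAScore" claim would be measured: score the agent's final image and the standard T2I baseline's image against the same ground-truth prompt.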