TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models

Authors: Leigang Qu, Haochuan Li, Tan Wang, Wenjie Wang, Yongqi Li, Liqiang Nie, Tat-Seng Chua

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on TIGeR-Bench and two retrieval benchmarks, i.e., Flickr30K and MS-COCO, demonstrate the superiority of our proposed framework. The code, models, and benchmark are available at https://tiger-t2i.github.io.
Researcher Affiliation | Academia | 1 National University of Singapore, 2 Nanyang Technological University, 3 University of Science and Technology of China, 4 Hong Kong Polytechnic University, 5 Harbin Institute of Technology (Shenzhen)
Pseudocode | No | The paper describes methods in prose and does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code, models, and benchmark are available at https://tiger-t2i.github.io.
Open Datasets | Yes | To standardize the evaluation of unified text-to-image generation and retrieval, we construct TIGeR-Bench, a benchmark spanning both creative and knowledge-intensive domains. Extensive experiments on TIGeR-Bench and two retrieval benchmarks, i.e., Flickr30K and MS-COCO, demonstrate the superiority of our proposed framework. The code, models, and benchmark are available at https://tiger-t2i.github.io.
Dataset Splits | Yes | To evaluate text-to-image generation and retrieval, we prioritize selecting the original test split of each dataset to construct TIGeR-Bench. In cases where only a validation set is provided, we default to utilizing the validation set. ... We keep the ratio of 1:1 for creative and knowledge domains and collect 6,000 high-quality text-image pairs in total.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software such as SEED-LLaMA, LaVIT, SDXL, CLIP, and LLMs but does not provide specific version numbers for these or other key software components.
Experiment Setup | Yes | We utilize the 8B version of SEED-LLaMA and load the parameters of supervised fine-tuning. For LaVIT, we employ the 11B model with SDXL as the pixel decoder. ... The beam size for retrieval is set to 800, and the timestep for generation is 25.
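The reported setup can be summarized as a small configuration sketch. This is illustrative only: the field names below are hypothetical, and only the values (8B SEED-LLaMA, 11B LaVIT with SDXL, beam size 800, 25 timesteps) come from the quoted passage.

```python
# Hypothetical config mirroring the experiment setup quoted above.
# Field names are illustrative; values are taken from the paper's text.
from dataclasses import dataclass

@dataclass
class TigerEvalConfig:
    seed_llama_size: str = "8B"      # SEED-LLaMA variant, supervised fine-tuned weights
    lavit_size: str = "11B"          # LaVIT variant
    pixel_decoder: str = "SDXL"      # pixel decoder paired with LaVIT
    retrieval_beam_size: int = 800   # beam size used for retrieval
    generation_timesteps: int = 25   # diffusion timesteps used for generation

cfg = TigerEvalConfig()
print(cfg.retrieval_beam_size)  # 800
```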