LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Authors: Amaia Cardiel, Eloi Zablocki, Elias Ramzi, Oriane Siméoni, Matthieu Cord
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LLM-wrapper on multiple datasets using different VLMs and LLMs, demonstrating significant performance improvements and highlighting the versatility of our method. |
| Researcher Affiliation | Collaboration | Amaia Cardiel (1,2), Eloi Zablocki (1), Elias Ramzi (1), Oriane Siméoni (1), Matthieu Cord (1,3); (1) Valeo.ai; (2) APTIKAL, LIG, Université Grenoble Alpes; (3) Sorbonne Université. EMAIL |
| Pseudocode | No | The paper describes the method using natural language and figures, but does not contain a dedicated pseudocode block or algorithm section. |
| Open Source Code | Yes | The code and the checkpoints are available at https://github.com/valeoai/LLM_wrapper. |
| Open Datasets | Yes | We experiment with LLM-wrapper on three classic REC datasets, RefCOCO, RefCOCO+ (Kazemzadeh et al., 2014), and RefCOCOg (Mao et al., 2016), and on Talk2Car (Deruyttere et al., 2019). Additionally, we evaluate LLM-wrapper on the recent and challenging HC-RefLoCo (Wei et al., 2024) benchmark. |
| Dataset Splits | Yes | Dataset statistics are given in Table 2 (dataset, split, sizes, # words per query): RefCOCO (unc): 120,624 / 10,834 / 10,752, 3.5 words/query; RefCOCO+ (unc): 120,191 / 10,758 / 10,615, 3.5 words/query; RefCOCOg (umd): 80,512 / 4,896 / 9,602, 8.3 words/query; Talk2Car: 8,348 / 1,163 / 2,447, 11.0 words/query; HC-RefLoCo: 13,360 / 31,378, 84.6 words/query. |
| Hardware Specification | Yes | This approach makes the training efficient in terms of compute and very simple to implement in practice. ... trainable on a single 40GB-A100 GPU in less than 7 hours. |
| Software Dependencies | No | The paper mentions methods and tools like LoRA, Flash Attention, 4-bit quantization, Adam, and Hugging Face's supervised fine-tuning pipeline, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We train LLM-wrapper with Adam (Kingma, 2014), with a batch-size of four, until convergence. ... Unless stated otherwise, we use a learning rate of 1e-5 and a rank of r = 128 for LoRA. |
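The setup row above reports LoRA fine-tuning with rank r = 128, Adam, batch size 4, and learning rate 1e-5. A minimal numpy sketch of what LoRA's low-rank update means in terms of parameter counts is shown below; the layer dimensions and the alpha scaling are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hyperparameters reported in the paper's experiment setup.
config = {"optimizer": "Adam", "batch_size": 4, "learning_rate": 1e-5, "lora_rank": 128}

d_out, d_in = 4096, 4096   # hypothetical layer size, chosen only for illustration
r = config["lora_rank"]
alpha = 2 * r              # common convention (alpha = 2r); the paper does not report alpha

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight (not trained)
A = rng.standard_normal((r, d_in)) * 0.01    # trainable low-rank factor
B = np.zeros((d_out, r))                     # zero-initialized so the update starts at 0

# Effective weight after merging the LoRA update: W + (alpha / r) * B @ A
W_adapted = W + (alpha / r) * (B @ A)

trainable = A.size + B.size   # r * (d_in + d_out) parameters per adapted layer
full = W.size                 # d_out * d_in parameters for full fine-tuning
print(f"trainable: {trainable:,} vs full fine-tuning: {full:,}")
```

With these illustrative dimensions, the adapter trains roughly 1M parameters per layer instead of ~16.8M, which is consistent with the paper's claim that training fits on a single 40GB A100.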