Are Large Vision Language Models Good Game Players?
Authors: Xinyu Wang, Bohan Zhuang, Qi Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on this framework, we conduct extensive experiments that explore the limitations of current LVLMs, such as handling long structured outputs and perceiving detailed and dense elements. Code and data are publicly available at https://github.com/xinkewang/LVLM-Playground. |
| Researcher Affiliation | Academia | ¹The University of Adelaide, Australia; ²Zhejiang University, China |
| Pseudocode | No | The paper describes methodologies and calculations using formulas but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code and data are publicly available at https://github.com/xinkewang/LVLM-Playground. |
| Open Datasets | Yes | Code and data are publicly available at https://github.com/xinkewang/LVLM-Playground. |
| Dataset Splits | Yes | For Perceiving, Question Answering, and Rule-Following tasks, we utilized the simulator to generate 2,000 samples for each, followed by offline evaluation. For the End-to-End playing task, we conducted online evaluations, running 100 gameplays per model. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. It only lists the models that were evaluated. |
| Software Dependencies | No | The paper mentions support for 'commercial models, such as the OpenAI API, and open-source models, like those from the Hugging Face Transformers library' and uses the Stockfish engine as the Chess AI, but it does not provide specific version numbers for these or other software components. |
| Experiment Setup | Yes | All models were evaluated under the same conditions, including identical settings for maximum new tokens and task prompts. |