Are Large Vision Language Models Good Game Players?

Authors: Xinyu Wang, Bohan Zhuang, Qi Wu

ICLR 2025

Reproducibility assessment (variable: result, with the supporting LLM response):

Research Type: Experimental
  LLM response: "Based on this framework, we conduct extensive experiments that explore the limitations of current LVLMs, such as handling long structured outputs and perceiving detailed and dense elements. Code and data are publicly available at https://github.com/xinkewang/LVLM-Playground."

Researcher Affiliation: Academia
  LLM response: "(1) The University of Adelaide, Australia; (2) Zhejiang University, China"

Pseudocode: No
  LLM response: "The paper describes methodologies and calculations using formulas but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks."

Open Source Code: Yes
  LLM response: "Code and data are publicly available at https://github.com/xinkewang/LVLM-Playground."

Open Datasets: Yes
  LLM response: "Code and data are publicly available at https://github.com/xinkewang/LVLM-Playground."

Dataset Splits: Yes
  LLM response: "For the Perceiving, Question Answering, and Rule-Following tasks, we utilized the simulator to generate 2,000 samples for each, followed by offline evaluation. For the End-to-End playing task, we conducted online evaluations, running 100 gameplays per model."

Hardware Specification: No
  LLM response: "The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. It only lists the models that were evaluated."

Software Dependencies: No
  LLM response: "The paper mentions support for 'commercial models, such as the OpenAI API, and open-source models, like those from the Hugging Face Transformers library' and uses the Stockfish engine for the chess AI, but it does not provide specific version numbers for these or other software components."

Experiment Setup: Yes
  LLM response: "All models were evaluated under the same conditions, including identical settings for maximum new tokens and task prompts."
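The experiment-setup finding above (identical maximum-new-tokens and task-prompt settings across all evaluated models) can be illustrated with a minimal sketch. This is not the paper's code: the config values, model names, prompt template, and the `build_eval_requests` helper are all assumptions for illustration, showing one way to guarantee every model sees the same prompt and decoding settings.

```python
# Illustrative sketch only: one shared decoding config applied to every
# (model, sample) pair, so no model is evaluated under different settings.
# All names and values below are hypothetical, not taken from the paper.

SHARED_GENERATION_CONFIG = {"max_new_tokens": 512, "temperature": 0.0}

def build_eval_requests(models, task_prompt, samples):
    """Pair every model with every sample, using the same prompt
    template and the same shared decoding configuration."""
    return [
        {
            "model": model,
            "prompt": task_prompt.format(sample=sample),
            **SHARED_GENERATION_CONFIG,
        }
        for model in models
        for sample in samples
    ]

# Example: 2 hypothetical models x 3 hypothetical board samples -> 6 requests,
# all carrying identical max_new_tokens and temperature settings.
requests = build_eval_requests(
    models=["model-a", "model-b"],
    task_prompt="Describe the board state: {sample}",
    samples=["board_001", "board_002", "board_003"],
)
```

Centralizing the decoding settings in one dictionary makes the "same conditions" claim auditable: a reviewer can check a single definition rather than each model's call site.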