Impossible Videos
Authors: Zechen Bai, Hai Ci, Mike Zheng Shou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations reveal limitations and insights for future directions of video models... Based on this benchmark, we conduct comprehensive evaluations for mainstream video understanding models and generation models... Table 2: Evaluation Results of IPV-TXT Across Dimensions, comparing state-of-the-art video generation models using the IPV-TXT benchmark as text prompts in the T2V setting. Table 3: Evaluation Results for Impossible Video Understanding, comparing state-of-the-art Video LLMs on the IPV-Vid benchmark. |
| Researcher Affiliation | Academia | Zechen Bai, Hai Ci, Mike Zheng Shou — Show Lab, National University of Singapore, Singapore. |
| Pseudocode | No | The paper describes the methodology and evaluation process in prose, without presenting any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://showlab.github.io/Impossible-Videos/. The paper states 'We will make the data public to inspire future research,' but does not explicitly mention the release of source code for the methodology or provide a direct link to a code repository. |
| Open Datasets | Yes | To this end, we introduce IPV-BENCH, a novel benchmark designed to evaluate and foster progress in video understanding and generation. IPV-BENCH is underpinned by a comprehensive taxonomy... Based on the taxonomy, a prompt suite is constructed to evaluate video generation models, challenging their prompt following and creativity capabilities. In addition, a video benchmark is curated to assess Video LLMs on their ability of understanding impossible videos... We will make the data public to inspire future research. Project page: https://showlab.github.io/Impossible-Videos/. |
| Dataset Splits | Yes | To ensure a balanced evaluation, the dataset maintains a 1:1 ratio of synthetic to real-world videos. This task is framed as a binary classification problem and evaluated using average Accuracy and F1-score. |
| Hardware Specification | No | The paper evaluates existing video generation and understanding models using a newly introduced benchmark. It does not provide specific hardware details used for running these evaluations or for generating the benchmark videos. |
| Software Dependencies | No | The paper evaluates existing video generation and understanding models and uses tools like GPT-4o and CLIP for certain tasks, but it does not specify software dependencies with version numbers for its own experimental setup or methodology. |
| Experiment Setup | Yes | Specifically, we combine the six factors Subject Consistency, Background Consistency, Motion Smoothness, Aesthetic Quality, Imaging Quality, and Dynamic Degree from VBench to form our final metric... The weights we use for each factor are: 2.0, 2.0, 0.2, 0.2, 2.0, 1.0. |
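The reported experiment setup combines six VBench factors with explicit per-factor weights. As a minimal sketch, the combination can be read as a weight-normalized average; the paper lists the factors and weights but not the exact aggregation rule, so the formula below is an assumption, and the dictionary keys are illustrative names rather than identifiers from the paper's code.

```python
# Sketch: combining the six VBench factor scores into one metric,
# assuming a weight-normalized average (the aggregation rule is an
# assumption; the paper only reports the factors and their weights).

WEIGHTS = {
    "subject_consistency": 2.0,
    "background_consistency": 2.0,
    "motion_smoothness": 0.2,
    "aesthetic_quality": 0.2,
    "imaging_quality": 2.0,
    "dynamic_degree": 1.0,
}

def combined_score(factor_scores: dict) -> float:
    """Weight-normalized average of per-factor scores, each in [0, 1]."""
    total = sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())

# A video scoring 1.0 on every factor yields an overall score of 1.0.
print(combined_score({name: 1.0 for name in WEIGHTS}))  # 1.0
```

Under this reading, Subject Consistency, Background Consistency, and Imaging Quality (weight 2.0 each) dominate the metric, while Motion Smoothness and Aesthetic Quality (0.2 each) contribute little.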