GameGen-X: Interactive Open-world Game Video Generation
Authors: Haoxuan Che, Xuanhua He, Quande Liu, Cheng Jin, Hao CHEN
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS 4.1 QUANTITATIVE RESULTS Metrics. To comprehensively evaluate the performance of GameGen-X, we utilize a suite of metrics that capture various aspects of video generation quality and interactive control, following Huang et al. (2024b) and Yang et al. (2024). These metrics include Fréchet Inception Distance (FID), Fréchet Video Distance (FVD), Text-Video Alignment (TVA), User Preference (UP), Motion Smoothness (MS), Dynamic Degrees (DD), Subject Consistency (SC), and Imaging Quality (IQ). ... Table 2: Generation Performance Evaluation ... Table 3: Control Performance Evaluation ... Table 4: Ablation Study for Generation Ability ... Table 5: Ablation Study for Control Ability. |
| Researcher Affiliation | Academia | (1) The Hong Kong University of Science and Technology; (2) University of Science and Technology of China; (3) Hefei Institute of Physical Science, Chinese Academy of Sciences; (4) The Chinese University of Hong Kong |
| Pseudocode | Yes | The pseudo-codes of our feature processing pipeline and the Masked Temporal Transformer block are shown in the following. `class BaseModel:` `initialize(config):` ... `class TemporalTransformerBlock:` `initialize(hidden_size, num_heads):` |
| Open Source Code | Yes | The project will be available at https://github.com/GameGen-X/GameGen-X. |
| Open Datasets | Yes | To realize this vision, we first collected and built an Open-World Video Game Dataset (OGameData) from scratch. It is the first and largest dataset for open-world game video generation and control, which comprises over one million diverse gameplay video clips with informative captions. ... B.1 DATA AVAILABILITY STATEMENT AND CLARIFICATION We are committed to maintaining transparency and compliance in our data collection and sharing methods. Please note the following: Publicly Available Data: The data utilized in our studies is publicly available. ... Data License: The dataset is made available under the Creative Commons Attribution 4.0 International License (CC BY 4.0). |
| Dataset Splits | No | For the OGameEval-Gen dataset contains 50 text-video pairs sampled from the OGameData-GEN dataset, ensuring that these samples were not used during training. For the OGameEval-Ins dataset, we sampled the last frame of ten videos from the OGameData-INS eval dataset, which were also unused during training. ... Specifically, we sample 20k samples from OGameData-GEN to train the generation ability and 10k samples from OGameData-INS to train the control ability. This resulted in two datasets, OGameData-GEN-Abl and OGameData-INS-Abl. This text describes specific subsets for evaluation and ablation studies, but does not provide a general training/validation/test split for the entire OGameData in percentages or absolute counts. |
| Hardware Specification | Yes | Regarding computational resources, our training infrastructure consisted of 24 NVIDIA H800 GPUs distributed across three servers, with each server hosting 8 GPUs equipped with 80GB of memory per unit. |
| Software Dependencies | Yes | We conducted 30 open-domain generation inferences on a single A800 and a single H800 GPU, with the CUDA environment set to 12.1. |
| Experiment Setup | Yes | We adopted a two-phase training strategy to build our model. In the first phase, our goal was to train a foundation model capable of both video continuation and generation. To achieve this, we allocated 75% of the training probability to text-to-video generation tasks and 25% to video extension tasks. ... The Adam optimizer with a fixed learning rate of 5e-4 was applied for 20 epochs. Additionally, we followed common practices in diffusion models by randomly dropping text inputs with a 25% probability to strengthen the model's generative capabilities (Ho & Salimans, 2021). |
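
The first-phase sampling described in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code. The two probabilities (75% text-to-video, 25% text dropping) come from the quoted text. Everything else is an assumption made for illustration: the function names `sample_training_step` and `empirical_rates`, the use of an empty string to represent a dropped caption, and the choice to drop text independently of the sampled task.

```python
import random

P_TEXT_TO_VIDEO = 0.75  # from the quoted setup: share of text-to-video steps
P_TEXT_DROP = 0.25      # from the quoted setup: probability of dropping the text input


def sample_training_step(caption, rng):
    """Pick a task for one step and decide whether to blank the caption.

    Hypothetical helper: the empty-string convention and the independence
    of the two random draws are assumptions, not details from the paper.
    """
    task = "text_to_video" if rng.random() < P_TEXT_TO_VIDEO else "video_extension"
    # Dropping the caption trains the unconditional branch that
    # classifier-free guidance relies on (Ho & Salimans, 2021).
    text = "" if rng.random() < P_TEXT_DROP else caption
    return task, text


def empirical_rates(num_steps=100_000, seed=0):
    """Return the observed fraction of text-to-video steps and of dropped captions."""
    rng = random.Random(seed)
    t2v = dropped = 0
    for _ in range(num_steps):
        task, text = sample_training_step("a knight rides through a forest", rng)
        t2v += task == "text_to_video"
        dropped += text == ""
    return t2v / num_steps, dropped / num_steps


if __name__ == "__main__":
    t2v_rate, drop_rate = empirical_rates()
    print(f"text-to-video rate: {t2v_rate:.3f}, text-drop rate: {drop_rate:.3f}")
```

Over many steps the observed rates should approach 0.75 and 0.25. In a real training loop the returned task and text would select the conditioning passed to the model; that wiring is omitted here because the source does not describe it.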