reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

GameGen-X: Interactive Open-world Game Video Generation

Authors: Haoxuan Che, Xuanhua He, Quande Liu, Cheng Jin, Hao CHEN

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 EXPERIMENTS 4.1 QUANTITATIVE RESULTS Metrics. To comprehensively evaluate the performance of GameGen-X, we utilize a suite of metrics that capture various aspects of video generation quality and interactive control, following Huang et al. (2024b) and Yang et al. (2024). These metrics include Fréchet Inception Distance (FID), Fréchet Video Distance (FVD), Text-Video Alignment (TVA), User Preference (UP), Motion Smoothness (MS), Dynamic Degrees (DD), Subject Consistency (SC), and Imaging Quality (IQ). ... Table 2: Generation Performance Evaluation ... Table 3: Control Performance Evaluation ... Table 4: Ablation Study for Generation Ability ... Table 5: Ablation Study for Control Ability.
Researcher Affiliation	Academia	(1) The Hong Kong University of Science and Technology; (2) University of Science and Technology of China; (3) Hefei Institute of Physical Science, Chinese Academy of Sciences; (4) The Chinese University of Hong Kong
Pseudocode	Yes	The pseudo-codes of our feature processing pipeline and the Masked Temporal Transformer block are shown in the following. class BaseModel: initialize(config): ... class TemporalTransformerBlock: initialize(hidden_size, num_heads):
Open Source Code	Yes	The project will be available at https://github.com/GameGen-X/GameGen-X.
Open Datasets	Yes	To realize this vision, we first collected and built an Open-World Video Game Dataset (OGameData) from scratch. It is the first and largest dataset for open-world game video generation and control, which comprises over one million diverse gameplay video clips with informative captions. ... B.1 DATA AVAILABILITY STATEMENT AND CLARIFICATION We are committed to maintaining transparency and compliance in our data collection and sharing methods. Please note the following: Publicly Available Data: The data utilized in our studies is publicly available. ... Data License: The dataset is made available under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Dataset Splits	No	The OGameEval-Gen dataset contains 50 text-video pairs sampled from the OGameData-GEN dataset, ensuring that these samples were not used during training. For the OGameEval-Ins dataset, we sampled the last frame of ten videos from the OGameData-INS eval dataset, which were also unused during training. ... Specifically, we sample 20k samples from OGameData-GEN to train the generation ability and 10k samples from OGameData-INS to train the control ability. This resulted in two datasets, OGameData-GEN-Abl and OGameData-INS-Abl. This text describes specific subsets for evaluation and ablation studies, but does not provide a general training/validation/test split for the entire OGameData in percentages or absolute counts.
Hardware Specification	Yes	Regarding computational resources, our training infrastructure consisted of 24 NVIDIA H800 GPUs distributed across three servers, with each server hosting 8 GPUs equipped with 80GB of memory per unit.
Software Dependencies	Yes	We conducted 30 open-domain generation inferences on a single A800 and a single H800 GPU, with the CUDA environment set to 12.1.
Experiment Setup	Yes	We adopted a two-phase training strategy to build our model. In the first phase, our goal was to train a foundation model capable of both video continuation and generation. To achieve this, we allocated 75% of the training probability to text-to-video generation tasks and 25% to video extension tasks. ... The Adam optimizer with a fixed learning rate of 5e-4 was applied for 20 epochs. Additionally, we followed common practices in diffusion models by randomly dropping text inputs with a 25% probability to strengthen the model's generative capabilities (Ho & Salimans, 2021).
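
The probabilities quoted in the Experiment Setup row can be illustrated with a small sampling sketch. This is not the authors' code: the function names, the use of an empty string as the dropped-text condition, and the assumption that the task choice and the text drop are drawn independently at every step are all hypothetical. Only the 75%/25% task allocation and the 25% text-drop rate come from the quoted passage.

```python
import random

# Values taken from the quoted Experiment Setup passage.
T2V_PROB = 0.75        # share of training steps used for text-to-video generation
TEXT_DROP_PROB = 0.25  # probability of dropping the text input

def sample_training_step(caption, rng):
    """Pick a task and decide whether the caption is kept for one step.

    Hypothetical helper: the two draws are assumed independent, and an
    empty string stands in for whatever null condition the paper uses.
    """
    task = "text_to_video" if rng.random() < T2V_PROB else "video_extension"
    text = "" if rng.random() < TEXT_DROP_PROB else caption
    return task, text

def empirical_rates(n_steps, seed=0):
    """Return the observed text-to-video rate and text-drop rate."""
    rng = random.Random(seed)
    t2v_count = 0
    drop_count = 0
    for _ in range(n_steps):
        task, text = sample_training_step("placeholder caption", rng)
        t2v_count += task == "text_to_video"
        drop_count += text == ""
    return t2v_count / n_steps, drop_count / n_steps

if __name__ == "__main__":
    t2v_rate, drop_rate = empirical_rates(20000)
    print(f"text-to-video share: {t2v_rate:.3f}, text dropped: {drop_rate:.3f}")
```

Over many steps the observed rates should approach 0.75 and 0.25. Whether the paper applies text dropping to video extension steps as well is not stated in the quoted passage, so the sketch applies it to both tasks purely for simplicity.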