Effective and Efficient Masked Image Generation Models
Authors: Zebin You, Jingyang Ou, Xiaolu Zhang, Jun Hu, Jun Zhou, Chongxuan Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fréchet Inception Distance (FID). In particular, on ImageNet 256×256, with a similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as NFE and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion model REPA while requiring less than 45% of the NFE. Additionally, on ImageNet 512×512, eMIGM outperforms the strong continuous diffusion model EDM2. |
| Researcher Affiliation | Collaboration | (1) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (2) Beijing Key Laboratory of Research on Large Models and Intelligent Governance; (3) Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE; (4) Ant Group. |
| Pseudocode | No | The paper describes mathematical formulations and processes but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ML-GSAI/eMIGM. |
| Open Datasets | Yes | Building on our training and sampling improvements, we develop eMIGM and evaluate it on ImageNet (Deng et al., 2009) at 256×256 and 512×512 resolutions. |
| Dataset Splits | No | The paper evaluates on ImageNet (Deng et al., 2009) at 256×256 and 512×512 resolutions, but does not explicitly specify training, validation, or test splits. It mentions using FID for benchmarking, which implies evaluation against a standard reference set, but no explicit split information is given. |
| Hardware Specification | Yes | The speed is measured using a single A100 GPU with a batch size of 256. ... we conducted additional experiments to compare sampling speeds on a single A100 GPU (batch size 256) |
| Software Dependencies | No | The paper states: "We implement eMIGM upon the official code of MAR (Li et al., 2024), DC-AE (Chen et al., 2024), DPM-Solver (Lu et al., 2022a;b), whose code links and licenses are presented in Tab. 5." However, it does not specify version numbers for these components or for underlying libraries such as Python or PyTorch. |
| Experiment Setup | Yes | Table 6. Training configurations of models on ImageNet 256×256. Table 7. Training configurations of models on ImageNet 512×512. These tables list specific hyperparameters such as 'Epochs', 'Learning rate', 'Batch size', 'Adam β1', and 'Adam β2'. |