Masked Generative Nested Transformers with Decode Time Scaling
Authors: Sahil Goyal, Debapriya Tula, Gagan Jain, Pradeep Shenoy, Prateek Jain, Sujoy Paul
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously experiment with ImageNet 256×256, ImageNet 128×128, UCF101, and Kinetics600 to showcase the efficacy of the proposed method for image/video generation and frame prediction. Our experiments show that with almost 3× less compute than the baseline, our model obtains competitive performance. Section 5. Experiments and Results |
| Researcher Affiliation | Collaboration | ¹Google DeepMind, ²University of California, Los Angeles. Correspondence to: Sahil Goyal <EMAIL>, Sujoy Paul <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: MaGNeTS Decoding Algorithm |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing their code for the methodology described in this paper, nor does it provide a link to a code repository. It mentions using pretrained tokenizers from other works but not their own code. |
| Open Datasets | Yes | Datasets. We evaluate our model on ImageNet 256×256 and ImageNet 128×128 (Deng et al., 2009) for image generation, UCF101 (Soomro et al., 2012) for video generation, and Kinetics600 (Carreira et al., 2018) for frame prediction (5-frame condition). |
| Dataset Splits | No | The paper mentions evaluating on well-known datasets like ImageNet, UCF101, and Kinetics600, which have standard splits. However, it does not explicitly state the training/test/validation splits used, nor does it specify if standard splits were followed with explicit percentages or counts. The text only says: 'We train our model for 270 epochs for all the experiments.' and 'We drop input class condition labels for 10% of the training batches in image generation'. |
| Hardware Specification | Yes | All experiments are run on a single A100 GPU. ... We implement MaGNeTS on a single TPUv5 chip |
| Software Dependencies | No | The paper mentions using a 'BERT model (Devlin et al., 2019) as a transformer backbone' and 'pretrained tokenizers from MaskGIT (Chang et al., 2022) ... and MAGVIT (Yu et al., 2023a)'. However, it does not provide specific version numbers for these or any other software libraries or frameworks used. |
| Experiment Setup | Yes | We train our model for 270 epochs for all the experiments. ... We drop input class condition labels for 10% of the training batches in image generation... We mention the details of sampling hyperparameters in Appendix B. ... Table 9: Best Sampling Hyperparameters... We use bias=0.5 and scale=0.8 for all experiments. |
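The Pseudocode row above notes that the paper ships Algorithm 1, the MaGNeTS decoding algorithm, which builds on MaskGIT-style masked parallel decoding. As a rough, non-authoritative illustration of that decoding family, below is a minimal sketch: tokens start fully masked, and at each step the highest-confidence predictions are committed while the rest are re-masked per a cosine schedule. The toy random "model", the `model_scale` variable standing in for decode-time scaling with nested sub-models, and all function names are assumptions for illustration, not the paper's actual algorithm.

```python
import math
import random

def cosine_mask_schedule(step, total_steps, num_tokens):
    """Cosine schedule (as in MaskGIT): tokens still masked after `step`."""
    frac = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return int(math.floor(frac * num_tokens))

def decode(num_tokens=16, total_steps=4, vocab_size=8, seed=0):
    """Illustrative MaskGIT-style iterative parallel decoding.

    A toy "model" draws random tokens and confidences; the real method
    would call a transformer. `model_scale` only sketches the idea of
    decode-time scaling: early steps could use a smaller nested
    sub-model, later steps the full model (hypothetical schedule).
    """
    rng = random.Random(seed)
    MASK = -1
    tokens = [MASK] * num_tokens
    for step in range(total_steps):
        # Hypothetical nested-model capacity schedule: 0.25 -> 1.0.
        model_scale = (step + 1) / total_steps
        # Toy (token, confidence) prediction for every masked position.
        preds = {
            i: (rng.randrange(vocab_size), rng.random() * model_scale)
            for i, t in enumerate(tokens) if t == MASK
        }
        # How many positions must remain masked after this step.
        keep_masked = cosine_mask_schedule(step, total_steps, num_tokens)
        # Commit the highest-confidence predictions; leave the rest masked.
        ranked = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)
        n_commit = len(preds) - keep_masked
        for i, (tok, _conf) in ranked[:n_commit]:
            tokens[i] = tok
    return tokens
```

After the final step the schedule reaches zero, so every position holds a committed token; the per-step commit counts (2, 3, 5, 6 with the defaults above) show the characteristic few-early/many-late pattern of confidence-based parallel decoding.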