Masked Generative Nested Transformers with Decode Time Scaling

Authors: Sahil Goyal, Debapriya Tula, Gagan Jain, Pradeep Shenoy, Prateek Jain, Sujoy Paul

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously experiment with ImageNet 256x256, ImageNet 128x128, UCF101, and Kinetics600 to showcase the efficacy of the proposed method for image/video generation and frame prediction. Our experiments show that with almost 3x less compute than the baseline, our model obtains competitive performance. (Section 5, Experiments and Results)
Researcher Affiliation | Collaboration | 1Google DeepMind, 2University of California, Los Angeles. Correspondence to: Sahil Goyal <EMAIL>, Sujoy Paul <EMAIL>.
Pseudocode | Yes | Algorithm 1: MaGNeTS Decoding Algorithm
Open Source Code | No | The paper does not explicitly state that the authors are releasing their code for the methodology described in this paper, nor does it provide a link to a code repository. It mentions using pretrained tokenizers from other works but not their own code.
Open Datasets | Yes | Datasets. We evaluate our model on ImageNet 256x256 and ImageNet 128x128 (Deng et al., 2009) for image generation, UCF101 (Soomro et al., 2012) for video generation, and Kinetics600 (Carreira et al., 2018) for frame prediction (5-frame conditioning).
Dataset Splits | No | The paper evaluates on well-known datasets (ImageNet, UCF101, Kinetics600) that have standard splits, but it does not explicitly state the training/validation/test splits used, nor whether the standard splits were followed, with explicit percentages or counts. The text only says: 'We train our model for 270 epochs for all the experiments.' and 'We drop input class condition labels for 10% of the training batches in image generation'.
Hardware Specification | Yes | All experiments are run on a single A100 GPU. ... We implement MaGNeTS on a single TPUv5 chip.
Software Dependencies | No | The paper mentions using a 'BERT model (Devlin et al., 2019) as a transformer backbone' and 'pretrained tokenizers from MaskGIT (Chang et al., 2022) ... and MAGVIT (Yu et al., 2023a)'. However, it does not provide specific version numbers for these or any other software libraries or frameworks used.
Experiment Setup | Yes | We train our model for 270 epochs for all the experiments. ... We drop input class condition labels for 10% of the training batches in image generation. ... We mention the details of sampling hyperparameters in Appendix B. ... Table 9: Best Sampling Hyperparameters. ... We use bias=0.5 and scale=0.8 for all experiments.
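The decoding procedure referenced above (Algorithm 1) builds on MaskGIT-style iterative parallel decoding, with the paper's "decode time scaling" idea of spending less compute on early steps. The sketch below is a minimal illustration of that general pattern, not a reproduction of the paper's Algorithm 1: the `predict` interface, the `capacities` schedule selecting a nested sub-model per step, the cosine unmasking schedule, and the `MASK` sentinel are all simplifying assumptions made here for demonstration.

```python
import math

MASK = -1  # hypothetical sentinel for a masked token position


def cosine_schedule(step, total_steps, num_tokens):
    """MaskGIT-style cosine schedule: tokens still masked after this step."""
    frac = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return int(num_tokens * frac)


def nested_decode(predict, num_tokens=16, total_steps=4,
                  capacities=(0.25, 0.5, 0.75, 1.0)):
    """Iterative parallel decoding with a growing (nested) model capacity.

    `predict(tokens, capacity)` is assumed to return a (token, confidence)
    pair for every position; `capacity` notionally selects which nested
    sub-model runs that step (small models early, full model late).
    """
    tokens = [MASK] * num_tokens
    for step in range(total_steps):
        capacity = capacities[min(step, len(capacities) - 1)]
        preds = predict(tokens, capacity)
        # Keep already-committed tokens (infinite confidence), fill the rest.
        cand = [(p if t == MASK else (t, float("inf")))
                for t, p in zip(tokens, preds)]
        # Re-mask the least confident positions for the next iteration.
        keep_masked = cosine_schedule(step, total_steps, num_tokens)
        order = sorted(range(num_tokens), key=lambda i: cand[i][1])
        tokens = [cand[i][0] for i in range(num_tokens)]
        for i in order[:keep_masked]:
            tokens[i] = MASK
    return tokens  # fully unmasked: cosine schedule hits 0 on the last step


def dummy_predict(tokens, capacity):
    """Stand-in model: always predicts token 7 with position-varying confidence."""
    return [(7, (i * 31 % 17) / 17.0) for i in range(len(tokens))]


decoded = nested_decode(dummy_predict)
```

On the final step the cosine schedule returns zero masked tokens, so the loop always terminates with a complete sequence regardless of the capacity schedule chosen.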