Efficient Object-Centric Representation Learning using Masked Generative Modeling

Authors: Akihiro Nakano, Masahiro Suzuki, Yutaka Matsuo

TMLR 2025

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM response: "Experiments show that MOGENT significantly improves computational efficiency, accelerating the generation process by up to 67x and 17x compared to autoregressive and diffusion-based models, respectively. Importantly, this efficiency is attained while maintaining strong or competitive performance on object segmentation and compositional generation tasks."
Researcher Affiliation: Academia
LLM response: "Akihiro Nakano EMAIL, Graduate School of Engineering, The University of Tokyo"
Pseudocode: Yes
LLM response: "For inference, we use the iterative parallel decoding scheme of MaskGIT. We start with a blank canvas with all tokens masked out and iterate the following procedure for T steps: (1) Predict the probabilities for all masked tokens at step t, z̄_t = z ⊙ m_t. (2) Sample a token based on the predicted probabilities. (3) Compute the number of tokens to mask using the mask scheduler function. (4) Decide which tokens to unmask for the next iteration, z_t, using the schedule from (3) and the log probabilities from (1) as confidence scores."
Open Source Code: No
LLM response: "The paper does not contain an explicit statement about releasing the source code for MOGENT, nor does it provide a direct link to a code repository for its method."
Open Datasets: Yes
LLM response: "We evaluate on four datasets with distinct characteristics: 3D Shapes dataset (Burgess & Kim, 2018), CLEVR dataset (Johnson et al., 2017), CLEVRTex dataset (Karazija et al., 2021), and CelebA dataset (Liu et al., 2015)."
Dataset Splits: No
LLM response: "The paper mentions that the '3D Shapes dataset consists of 400K training images' but does not provide explicit training/validation/test splits for all datasets, nor does it cite predefined splits for all of them."
Hardware Specification: Yes
LLM response: "All metrics were computed on a single NVIDIA Tesla V100 GPU, with a batch size of 64 for training and 1 for testing."
Software Dependencies: No
LLM response: "The paper mentions using the 'Adam optimizer (Kingma, 2015)' but does not specify software dependencies such as Python, PyTorch, or TensorFlow with version numbers."
Experiment Setup: Yes
LLM response: "The hyperparameters used for our experiments are reported in Table 8 and Table 9. We used a fixed learning rate of 3e-4 for the DVAE and a learning rate of 1e-4 with linear warmup for stable learning."
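
The MaskGIT-style iterative parallel decoding quoted under Pseudocode above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the model interface `predict_probs`, the vocabulary size, the placeholder mask token id 0, and the cosine mask scheduler are assumptions (cosine is MaskGIT's default scheduler, but the report does not state which scheduler MOGENT uses).

```python
import math
import numpy as np

def cosine_schedule(r: float) -> float:
    # Fraction of tokens still masked at decoding progress r in [0, 1].
    return math.cos(math.pi / 2 * r)

def iterative_parallel_decode(predict_probs, num_tokens, T=8, rng=None):
    """Sketch of iterative parallel decoding over a 1-D token canvas.

    `predict_probs(tokens, mask)` stands in for the transformer forward
    pass: it returns a (num_tokens, vocab_size) array of per-token
    probability distributions.
    """
    rng = rng or np.random.default_rng(0)
    tokens = np.zeros(num_tokens, dtype=np.int64)    # 0 = placeholder for masked
    mask = np.ones(num_tokens, dtype=bool)           # blank canvas: all masked
    for t in range(T):
        probs = predict_probs(tokens, mask)                        # (1) predict
        sampled = np.array([rng.choice(probs.shape[1], p=p)
                            for p in probs])                       # (2) sample
        conf = np.log(probs[np.arange(num_tokens), sampled])       # log-prob confidence
        conf[~mask] = np.inf            # already-decoded tokens are never re-masked
        n_mask = int(cosine_schedule((t + 1) / T) * num_tokens)    # (3) schedule
        order = np.argsort(conf)        # least-confident tokens stay masked
        new_mask = np.zeros(num_tokens, dtype=bool)
        new_mask[order[:n_mask]] = True
        unmask_now = mask & ~new_mask                              # (4) unmask
        tokens[unmask_now] = sampled[unmask_now]
        mask = new_mask
    return tokens
```

Because the scheduler reaches zero at the final step, every position is decoded after T iterations, which is what yields the constant number of forward passes independent of sequence length.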
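
The quoted setup pairs a fixed learning rate for the DVAE with linear warmup elsewhere. A minimal sketch of a linear warmup schedule, assuming a hypothetical warmup length (the report quotes the peak rates but not the number of warmup steps):

```python
def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 10_000) -> float:
    """Linearly ramp the learning rate from 0 to base_lr over warmup_steps,
    then hold it constant. warmup_steps is an assumed placeholder value."""
    return base_lr * min(1.0, step / warmup_steps)
```

Halfway through warmup the rate is half the peak, e.g. `warmup_lr(5_000)` returns 5e-5 under these assumed defaults; ramping up from zero like this is a common way to stabilize early transformer training, matching the report's "linear warmup for stable learning".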