Fractal Generative Models
Authors: Tianhong Li, Qinyi Sun, Lijie Fan, Kaiming He
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show strong performance in both likelihood estimation and generation quality. We conduct extensive experiments on the ImageNet dataset (Deng et al., 2009) with resolutions at 64×64 and 256×256. Our evaluation includes both unconditional and class-conditional image generation, covering various aspects of the model such as likelihood estimation, fidelity, diversity, and generation quality. Accordingly, we report the negative log-likelihood (NLL), Fréchet Inception Distance (FID) (Heusel et al., 2017), Inception Score (IS) (Salimans et al., 2016), Precision and Recall (Dhariwal & Nichol, 2021a), and visualization results for a comprehensive assessment of our fractal framework. |
| Researcher Affiliation | Collaboration | Tianhong Li (MIT), Qinyi Sun (MIT), Lijie Fan (Google DeepMind), Kaiming He (MIT) |
| Pseudocode | No | The paper describes implementation details and processes in prose within Appendix A, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We also provide our source code in the supplementary materials. All codes and models will be made publicly available. |
| Open Datasets | Yes | We conduct extensive experiments on the ImageNet dataset (Deng et al., 2009) with resolutions at 64×64 and 256×256. |
| Dataset Splits | Yes | We conduct extensive experiments on the ImageNet dataset (Deng et al., 2009) with resolutions at 64×64 and 256×256. More fractal levels achieve better likelihood estimation performance with lower computational costs, measured on the unconditional ImageNet 64×64 test set. |
| Hardware Specification | Yes | The training time is measured per training iteration on 1 H100 GPU with batch size 8. FractalMAR-H achieves an FID of 6.15 and an Inception Score of 348.9, with an average throughput of 1.29 seconds per image (evaluated at a batch size of 1,024 on a single Nvidia H100 PCIe GPU). The 64×64 model takes 3.5 days, and the 256×256 FractalMAR-L model takes 7.6 days on 32 H100 GPUs. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and various model architectures, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train our fractal model end-to-end directly on raw image pixels following a breadth-first manner through the fractal architecture. ... The models are trained using the AdamW optimizer (Loshchilov & Hutter, 2019) for 800 epochs (the FractalMAR-H model is trained for 600 epochs). The weight decay and momenta for AdamW are 0.05 and (0.9, 0.95). We use a batch size of 2048 for ImageNet 64×64 and 1024 for ImageNet 256×256, and a base learning rate (lr) of 5e-5 (scaled by batch size divided by 256). The model is trained with 40 epochs linear lr warmup (Goyal et al., 2017), followed by a cosine lr schedule. |
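The learning-rate schedule quoted in the Experiment Setup row (batch-size-scaled base lr, 40-epoch linear warmup, then cosine decay) can be sketched as follows. This is a minimal illustration assuming a decay to zero at the final epoch; the function name `lr_at_epoch` and its defaults are ours, not taken from the authors' released code.

```python
import math

def lr_at_epoch(epoch, total_epochs=800, warmup_epochs=40,
                base_lr=5e-5, batch_size=2048):
    """Return the learning rate at a given (possibly fractional) epoch."""
    # Linear scaling rule: base lr is scaled by batch size / 256.
    scaled_lr = base_lr * batch_size / 256
    if epoch < warmup_epochs:
        # Linear warmup from 0 to the scaled lr over the first 40 epochs.
        return scaled_lr * epoch / warmup_epochs
    # Cosine decay from the scaled lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return scaled_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With the ImageNet 64×64 settings (batch size 2048), the peak lr is 5e-5 × 2048 / 256 = 4e-4, reached at the end of warmup.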