BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
Authors: Xingyu Zheng, Xianglong Liu, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jinyang Guo, Michele Magno
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. With 1-bit weight and 4-bit activation (W1A4), BinaryDM achieves as low as 7.74 FID and saves the performance from collapse (baseline FID 10.87). |
| Researcher Affiliation | Academia | 1Beihang University, 2ETH Zürich, 3Nanyang Technological University, 4Zhongguancun Laboratory, 5Xi'an Jiaotong University |
| Pseudocode | No | The paper describes the methods Evolvable-Basis Binarizer and Low-rank Representation Mimicking using mathematical formulations and textual descriptions, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository, nor does it explicitly state that the code for their methodology is released or available in supplementary materials. It mentions using 'Larq[1]' but that refers to a third-party library, not their own implementation. |
| Open Datasets | Yes | We conduct experiments on various datasets, including CIFAR-10 32×32 (Krizhevsky et al., 2009), LSUN-Bedrooms 256×256 (Yu et al., 2015), LSUN-Churches 256×256 (Yu et al., 2015), FFHQ 256×256 (Karras et al., 2019) and ImageNet 256×256 (Deng et al., 2009) |
| Dataset Splits | No | The paper mentions generating 50,000 samples for evaluation metrics and using reference batches containing all corresponding datasets, but it does not explicitly specify training, validation, or test splits for the models themselves; it relies on standard benchmark usage without stating split sizes or percentages. |
| Hardware Specification | Yes | All our experiments were conducted on a server with Intel Xeon Gold 6336Y CPU @ 2.40GHz and NVIDIA A100 40GB GPU. |
| Software Dependencies | No | The paper mentions 'ADM's TensorFlow evaluation suite' for evaluation metrics and 'Larq' as a deployment library, but does not provide specific version numbers for these or any other key software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Relative to the training hyperparameters of the full-precision model, we adjust the learning rate, reducing it to one-tenth to one-hundredth of the corresponding rate in the original full-precision training script, especially on certain datasets, such as CIFAR-10 and ImageNet. For DDIM training, we set the batch size to 64, while for LDM training, the batch size is configured as 4. Typically, models are trained for around 200K iterations. |
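
Since the paper includes no pseudocode for its Evolvable-Basis Binarizer, the following is a minimal NumPy sketch of generic *residual (multi-basis) weight binarization*, the family of techniques EBB belongs to. The function name, the mean-absolute-value scale estimate, and the fixed basis count are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def residual_binarize(w, num_bases=2):
    """Approximate real-valued weights w as a sum of scaled sign bases:
    w ~ s1*sign(w) + s2*sign(w - s1*sign(w)) + ...
    NOTE: a generic residual-binarization sketch, not the paper's EBB.
    """
    approx = np.zeros_like(w)
    for _ in range(num_bases):
        residual = w - approx                 # what the current bases miss
        scale = np.abs(residual).mean()       # assumed scale estimator
        approx = approx + scale * np.sign(residual)
    return approx

# Each extra basis refines the previous approximation, so the
# reconstruction error is non-increasing in num_bases.
w = np.array([0.8, -0.3, 0.1, -0.9])
err1 = np.linalg.norm(w - residual_binarize(w, num_bases=1))
err2 = np.linalg.norm(w - residual_binarize(w, num_bases=2))
```

At inference time each basis is a 1-bit tensor plus one float scale, which is what makes this representation cheap to store and deploy with libraries such as Larq.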