VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Authors: Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that VQ4DiT establishes a new state-of-the-art in model size and performance trade-offs, quantizing weights to 2-bit precision while retaining acceptable image generation quality. The paper includes sections like 'Experiments', 'Experimental Settings', 'Main Results', and 'Ablation Study' and presents performance metrics in tables. |
| Researcher Affiliation | Collaboration | Juncan Deng¹*, Shuaiting Li¹*, Zeyu Wang¹, Hong Gu², Kedong Xu², Kejie Huang¹ (¹Zhejiang University; ²vivo Mobile Communication Co., Ltd.) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methodology is described in prose and visualized in a pipeline diagram (Figure 1). |
| Open Source Code | No | The paper does not provide an explicit statement about the availability of source code, nor does it include any links to a code repository. |
| Open Datasets | Yes | Training DiTs typically relies on the ImageNet dataset (Russakovsky et al. 2015). Our method achieves competitive evaluation results compared to full-precision models on the ImageNet (Russakovsky et al. 2015) benchmark. |
| Dataset Splits | No | The paper states: 'We select the pre-trained DiT XL/2 model as the floating-point reference model, which has two versions for generating images with resolutions of 256×256 and 512×512, respectively.' and 'The validation setup is generally consistent with the settings used in the original DiT paper (Peebles and Xie 2023).' While it references the ImageNet dataset and mentions sampling 10k images for evaluation (these are generated images, not dataset splits), it does not explicitly provide the training/validation/test splits of ImageNet used to train or evaluate the DiT models. It instead describes a 'zero-data and block-wise calibration method' for its own calibration step. |
| Hardware Specification | Yes | VQ4DiT quantizes a DiT XL/2 model on a single NVIDIA A100 GPU within 20 minutes to 5 hours, depending on the quantization settings. |
| Software Dependencies | No | The paper mentions using 'RMSprop optimizer' but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow) with their version numbers that would be necessary to replicate the experiment. |
| Experiment Setup | Yes | We calibrate all quantized models using the RMSprop optimizer, with a constant learning rate of 5×10⁻² for ratios of candidate assignments and 1×10⁻⁴ for other parameters. The batch size and iteration count are set to 16 and 500, respectively. We employ a DDPM scheduler with sampling timesteps of 50, 100, and 250. The classifier-free guidance (CFG) scale is set to 1.5. |
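The calibration setup quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' released code (none is available): it assumes PyTorch, and the tensors `assignment_ratios` and `other_params` are hypothetical stand-ins for the two parameter groups the paper distinguishes.

```python
import torch

# Hypothetical stand-ins for the two groups of calibrated parameters:
# ratios of candidate codeword assignments, and all remaining parameters
# (e.g. codebook entries). Shapes are illustrative only.
assignment_ratios = torch.nn.Parameter(torch.rand(64, 4))
other_params = torch.nn.Parameter(torch.randn(256, 8))

# RMSprop with per-group constant learning rates, as reported in the paper:
# 5e-2 for assignment ratios, 1e-4 for everything else.
optimizer = torch.optim.RMSprop(
    [
        {"params": [assignment_ratios], "lr": 5e-2},
        {"params": [other_params], "lr": 1e-4},
    ]
)

batch_size, num_iterations = 16, 500  # calibration settings from the paper
```

A real calibration loop would then run `num_iterations` steps of a block-wise reconstruction loss over batches of size `batch_size`, which the paper describes only in prose.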