Layer-wise Quantization for Quantized Optimistic Dual Averaging

Authors: Anh Duc Nguyen, Ilia Markov, Zhengqing Wu, Ali Ramezani-Kebrya, Kimon Antonakopoulos, Dan Alistarh, Volkan Cevher

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that QODA achieves up to a 150% speedup over the baselines in end-to-end training time for training Wasserstein GAN on 12+ GPUs. ... In Section 7, we provide empirical studies on GANs and Transformer-XL."
Researcher Affiliation | Collaboration | (1) National University of Singapore (NUS)... (2) Neural Magic; (3) Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL); (4) University of Oslo (UiO)... (7) Institute of Science and Technology Austria (ISTA).
Pseudocode | Yes | Algorithm 1: Quantized Optimistic Dual Averaging (QODA)
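For intuition on the method named in Algorithm 1: optimistic dual averaging accumulates past gradients in a dual variable and reuses the most recent gradient as an optimistic guess for the next one. Below is a minimal single-node Euclidean sketch only; the paper's Algorithm 1 additionally quantizes the gradients communicated between workers, and the names `grad_fn`, `lr`, and `steps` are hypothetical, not taken from the paper.

```python
import numpy as np

def optimistic_dual_averaging(grad_fn, x0, steps=100, lr=0.1):
    """Sketch of optimistic dual averaging (Euclidean case, single node).

    grad_fn(x) returns a (stochastic) gradient or operator value at x.
    Illustrative simplification of the idea behind QODA's Algorithm 1:
    the actual method also applies layer-wise quantization to gradients.
    """
    x = x0.copy()
    g_sum = np.zeros_like(x0)   # dual accumulator of all past gradients
    g_prev = np.zeros_like(x0)  # last gradient, reused as the optimistic guess
    for _ in range(steps):
        # optimistic (leading) point: pretend g_prev will be seen once more
        x_lead = x0 - lr * (g_sum + g_prev)
        g = grad_fn(x_lead)
        g_sum += g
        g_prev = g
        # base iterate from the accumulated dual variable
        x = x0 - lr * g_sum
    return x
```

On a simple quadratic with gradient `2 * x`, the iterates contract toward the minimizer at the origin, illustrating the averaging-plus-optimism structure.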
Open Source Code | Yes | "We used the implementation of (Markov et al., 2024) and provide our code in the supplementary material."
Open Datasets | Yes | "We have implemented QODA in Algorithm 1... and train WGAN (Arjovsky et al., 2017) on CIFAR10 and CIFAR100 (Krizhevsky, 2009). ... training Transformer-XL on WikiText-103."
Dataset Splits | No | The paper mentions using well-known datasets like CIFAR10, CIFAR100, and WikiText-103 but does not explicitly state the training, validation, or test splits used for these datasets. It refers to a "training recipe" and "hyperparameters as in the original codebase" but lacks specific split percentages or methodologies.
Hardware Specification | Yes | "In our experiments, we use 4 to 16 nodes, each with a single NVIDIA RTX 3090 GPU, in a multi-node Genesis Cloud environment... We used 8 NVIDIA GH200 120GB GPUs for the experiments here."
Software Dependencies | No | "We use the torch_cgx PyTorch extension (Markov et al., 2022). Moreover, we adapt compression choices layer-wise, following the L-GreCo (Markov et al., 2024) algorithm. For the communication backend, we pick the best option for quantized and full-precision regimes: Open MPI (ope, 2023) and NCCL (ncc, 2023), respectively." The paper mentions software like PyTorch, Open MPI, and NCCL but does not provide specific version numbers for these components.
Experiment Setup | Yes | "We follow the training recipe of Q-GenX (Ramezani-Kebrya et al., 2023), where authors set large batch size (1024) and keep all other hyperparameters as in the original codebase of (Gidel et al., 2018). For global and layer-wise compression, we use 5 bits (with bucket size 128), and run the L-GreCo adaptive compression algorithm every 10K optimization steps for both the generator and discriminator models."
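The "5 bits (with bucket size 128)" setting refers to bucketed stochastic uniform quantization of gradients. The following is a QSGD-style sketch under those parameters, not the paper's exact quantizer (which L-GreCo tunes per layer); the function and argument names are hypothetical.

```python
import numpy as np

def quantize_bucketed(v, bits=5, bucket_size=128, rng=None):
    """Stochastic uniform quantization with per-bucket scaling (sketch).

    Each bucket of `bucket_size` entries is normalized by its max-abs
    value, then rounded stochastically (unbiasedly) to one of
    2**bits - 1 uniform levels. Illustrative only: the paper's
    layer-wise scheme adapts the bit width per layer via L-GreCo.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    levels = 2 ** bits - 1
    out = np.empty_like(v, dtype=np.float64)
    for start in range(0, len(v), bucket_size):
        b = v[start:start + bucket_size].astype(np.float64)
        scale = np.max(np.abs(b))
        if scale == 0.0:
            out[start:start + len(b)] = 0.0
            continue
        u = np.abs(b) / scale * levels          # in [0, levels]
        low = np.floor(u)
        # round up with probability equal to the fractional part (unbiased)
        q = low + (rng.random(b.shape) < (u - low))
        out[start:start + len(b)] = np.sign(b) * q / levels * scale
    return out
```

With 5 bits the per-element error is bounded by the bucket's max-abs value divided by 31 levels, which is why small buckets (here 128) keep the quantization error tight.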