Differentiable Model Compression via Pseudo Quantization Noise
Authors: Alexandre Défossez, Yossi Adi, Gabriel Synnaeve
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally verify that our method is competitive with STE-based quantization techniques on several benchmarks and architectures for image classification, language modeling, and audio source separation. For instance, on the ImageNet dataset, DiffQ compresses a 12-layer transformer-based model by more than a factor of 8 (lower than 4 bits of precision per weight on average), with a loss of 0.3% in model accuracy. Code is available at github.com/facebookresearch/diffq. |
| Researcher Affiliation | Industry | Alexandre Défossez EMAIL Meta AI, FAIR Team, Paris, France Yossi Adi EMAIL Meta AI, FAIR Team, Tel-Aviv, Israel Gabriel Synnaeve EMAIL Meta AI, FAIR Team, Paris, France |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Both experimental code and a generic framework usable with any architecture in just a few lines are available on our GitHub: github.com/facebookresearch/diffq. |
| Open Datasets | Yes | For instance, on the ImageNet dataset (Deng et al., 2009)... We trained a 16-layer transformer (Vaswani et al., 2017) based language model on the Wikitext-103 text corpus (Merity et al., 2016)... The model is trained on the standard MusDB benchmark (Rafii et al., 2017)... We evaluated three image classification benchmarks: ImageNet (Deng et al., 2009), CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | We trained a 16-layer transformer (Vaswani et al., 2017) based language model on the Wikitext-103 text corpus (Merity et al., 2016)... The model is trained on the standard MusDB benchmark (Rafii et al., 2017)... ImageNet results are reported using EfficientNet-B3 (Tan & Le, 2019) and DeiT-B (Touvron et al., 2020) models. |
| Hardware Specification | Yes | At evaluation time, decompressing the Demucs model from its variable-bitwidth compact representation takes around 2.81 seconds on a MacBook Pro with a 2.4 GHz 8-core Intel i9 processor. |
| Software Dependencies | No | The paper mentions using PyTorch native support (Paszke et al., 2019), the Fairseq framework (Ott et al., 2019), and the zlib library. However, it does not specify version numbers for any of these software components. |
| Experiment Setup | Yes | All hyper-parameters for optimization and model definition are detailed in the Appendix. The trainable parameter l is initialized so that b = binit. We set binit = 8. We compare to the Quant-Noise method by Fan et al. (2021), but use a reduced layer-drop (Fan et al., 2019) of 0.1 instead of 0.2. We use the Demucs architecture by Défossez et al. (2019) with 64 initial hidden channels. The model is trained on the standard MusDB benchmark (Rafii et al., 2017) for 180 epochs. DiffQ (λ=5, g=16), DiffQ (λ=10, g=16), DiffQ (λ=3e-4), DiffQ (λ=1e-2), and DiffQ (λ=0.1) are specific hyperparameter settings. We additionally evaluate the effect of the group-size, g, on model size and accuracy, by optimizing DiffQ models using g ∈ {1, 4, 8, ∞}. |
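The setup row above quotes DiffQ's bitwidth and penalty hyperparameters without showing the mechanism they control. As a rough, hedged illustration (not the authors' implementation — see their GitHub repository for the real, PyTorch-based one), the core idea of pseudo quantization noise is to replace hard rounding during training with additive uniform noise whose magnitude matches the quantization step implied by a bitwidth `b`, so that `b` can be treated as a continuous, trainable parameter. The function name and NumPy formulation below are illustrative assumptions:

```python
import numpy as np

def pseudo_quant_noise(w, bits, rng):
    """Illustrative sketch: perturb weights `w` with uniform noise
    matching the rounding error of `bits`-bit uniform quantization."""
    # Quantization step over the weight range for 2**bits levels.
    scale = (w.max() - w.min()) / (2 ** bits - 1)
    # Uniform noise in [-scale/2, scale/2] mimics the rounding error
    # of true quantization while remaining a smooth function of `bits`.
    noise = rng.uniform(-0.5, 0.5, size=w.shape) * scale
    return w + noise

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
w8 = pseudo_quant_noise(w, 8, rng)  # small perturbation at 8 bits
w2 = pseudo_quant_noise(w, 2, rng)  # much larger perturbation at 2 bits
```

Because the noise scale is differentiable with respect to the bitwidth, a model-size penalty weighted by λ (as in the table's settings) can trade accuracy against average bits per weight during training, per group of `g` weights.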