BiT: Robustly Binarized Multi-distilled Transformer

Authors: Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We follow recent work (Bai et al., 2021; Qin et al., 2021) in adopting the experimental setting of Devlin et al. (2019), and use the pre-trained BERT-base as our full-precision baseline. We evaluate on GLUE (Wang et al., 2019), a varied set of language understanding tasks (see Section A.5 for a full list), as well as SQuAD (v1.1) (Rajpurkar et al., 2016), a popular machine reading comprehension dataset."
Researcher Affiliation | Collaboration | Zechun Liu (Reality Labs, Meta Inc.), Barlas Oguz (Meta AI), Aasish Pappu (Meta AI), Lin Xiao (Meta AI), Scott Yih (Meta AI), Meng Li (Peking University), Raghuraman Krishnamoorthi (Reality Labs, Meta Inc.), Yashar Mehdad (Meta AI)
Pseudocode | Yes | "Algorithm 1: BiT multi-distillation algorithm"
Open Source Code | Yes | "Code and models are available at: https://github.com/facebookresearch/bit."
Open Datasets | Yes | "We evaluate on GLUE (Wang et al., 2019), a varied set of language understanding tasks (see Section A.5 for a full list), as well as SQuAD (v1.1) (Rajpurkar et al., 2016), a popular machine reading comprehension dataset."
Dataset Splits | Yes | "Table 1: Comparison of BERT quantization methods on the GLUE dev set. [...] Table 3: Comparison of BERT quantization methods on SQuAD v1.1 dev set."
Hardware Specification | No | The paper does not name specific GPU or CPU models, memory sizes, or cloud provider instance types.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., PyTorch, TensorFlow, or CUDA versions).
Experiment Setup | No | The paper defers to prior work for its experimental settings (e.g., "following the exact setup in Zhang et al. (2020)") and does not state hyperparameters such as learning rate, batch size, or number of epochs, nor other system-level training configurations, in the main text.
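For context on what "binarized" means in the paper's title, the following is a minimal illustrative sketch of 1-bit weight quantization, not BiT's actual method (BiT uses a more elaborate elastic binarization with learned parameters, plus multi-step distillation). It shows the classic forward pass: weights are mapped to {-alpha, +alpha}, where the per-tensor scale alpha = mean(|W|) minimizes the L2 reconstruction error. The function name `binarize_weights` is our own, hypothetical:

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Binarize a real-valued weight tensor to {-alpha, +alpha}.

    alpha = mean(|w|) is the per-tensor scale that minimizes
    ||w - alpha * sign(w)||^2 (as in XNOR-Net-style binarization).
    Note: np.sign maps exact zeros to 0; real implementations
    typically break ties to +1 or -1.
    """
    alpha = float(np.abs(w).mean())
    return alpha * np.sign(w), alpha

# Example: a toy 2x2 weight matrix.
w = np.array([[0.5, -1.5],
              [1.0, -1.0]])
w_bin, alpha = binarize_weights(w)
# alpha = (0.5 + 1.5 + 1.0 + 1.0) / 4 = 1.0
# w_bin = [[ 1.0, -1.0],
#          [ 1.0, -1.0]]
```

During training, such binarizers are paired with a straight-through estimator: the backward pass treats the non-differentiable sign function as (a clipped) identity so gradients can flow to the latent full-precision weights. Only the forward pass is sketched here.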