Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization

Authors: Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder

NeurIPS 2019 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | demonstrate its performance on CIFAR-10 and ImageNet. |
| Researcher Affiliation | Collaboration | ¹Plumerai Research (EMAIL); ²Hong Kong University of Science and Technology (EMAIL, EMAIL) |
| Pseudocode | Yes | Algorithm 1: Training procedure for BNNs using latent weights. Algorithm 2: Bop, an optimizer for BNNs. |
| Open Source Code | Yes | Code is available at: https://github.com/plumerai/rethinking-bnn-optimization. |
| Open Datasets | Yes | CIFAR-10 and ImageNet |
| Dataset Splits | No | The paper uses standard datasets (CIFAR-10 and ImageNet) with predefined splits, but it does not explicitly state the split percentages or sample counts used for training, validation, or testing. |
| Hardware Specification | Yes | The experiments were conducted using TensorFlow [27] and NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The experiments were conducted using TensorFlow, but no version number for TensorFlow or any other software dependency is provided. |
| Experiment Setup | Yes | To benchmark Bop, we train for 500 epochs with threshold τ = 10⁻⁸, adaptivity rate γ = 10⁻⁴ decayed by 0.1 every 100 epochs, batch size 50, and use Adam with the recommended defaults for β1, β2, ϵ [14] and an initial learning rate of α = 10⁻² to update the real-valued variables in the Batch Normalization layers [28]. We train BinaryNet and Bi-Real Net for 150 epochs and XNOR-Net for 100 epochs. We use a batch size of 1024 and standard preprocessing with random flip and resize but no further augmentation. For all three networks we use the same optimizer hyperparameters. We set the threshold to 1×10⁻⁸ and decay the adaptivity rate linearly from 1×10⁻⁴ to 1×10⁻⁶. For the real-valued variables, we use Adam with a linearly decaying learning rate from 2.5×10⁻³ to 5×10⁻⁶ and otherwise default settings (β1 = 0.9, β2 = 0.999, ϵ = 1×10⁻⁷). |
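
The Pseudocode and Experiment Setup rows above refer to Bop, the binary optimizer of Algorithm 2, and its two hyperparameters: the flipping threshold τ and the adaptivity rate γ. The snippet below is a minimal NumPy sketch of our reading of that update rule, using the CIFAR-10 values quoted above; the function and variable names are illustrative assumptions, not the authors' implementation (their code is linked in the Open Source Code row).

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row (CIFAR-10 benchmark).
TAU = 1e-8    # flipping threshold tau
GAMMA = 1e-4  # adaptivity rate gamma (decayed by 0.1 every 100 epochs in the paper)

def bop_step(w, grad, m, gamma=GAMMA, tau=TAU):
    """One Bop update on binary weights w in {-1, +1}.

    m is an exponential moving average of the gradient; a weight is flipped
    when the averaged gradient exceeds the threshold and agrees in sign with
    the current weight (our reading of Algorithm 2).
    """
    m = (1.0 - gamma) * m + gamma * grad  # smooth the gradient signal
    flip = (np.abs(m) > tau) & (np.sign(m) == np.sign(w))
    w = np.where(flip, -w, w)             # flip the selected weights
    return w, m

# Toy usage: one layer of five binary weights and a stand-in gradient.
w = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
m = np.zeros_like(w)
grad = np.array([0.3, -0.2, -0.1, 0.4, 0.05])
w, m = bop_step(w, grad, m)
```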