Fast and Slow Gradient Approximation for Binary Neural Network Optimization

Authors: Xinquan Chen, Junqi Gao, Biqing Qi, Dong Li, Yiang Luo, Fangyuan Li, Pengfei Li

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive comparative experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that our method achieves faster convergence and lower loss values, outperforming existing baselines. [...] We conduct comprehensive experiments on the CIFAR-10 and CIFAR-100 datasets. The results demonstrate that our method outperforms various competitive benchmarks.
Researcher Affiliation | Academia | 1Harbin Institute of Technology, Harbin, P.R. China; 2Shanghai Artificial Intelligence Laboratory, Shanghai, P.R. China
Pseudocode | No | The paper describes the methodology using equations and textual descriptions, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | https://github.com/two-tiger/FSG
Open Datasets | Yes | In this work, we use CIFAR-10 and CIFAR-100 as benchmark datasets. CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CIFAR-100 (Krizhevsky, Hinton et al. 2009) are commonly used datasets for image classification, consisting of 50000 training images and 10000 test images, with 10 and 100 categories respectively.
Dataset Splits | Yes | CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CIFAR-100 (Krizhevsky, Hinton et al. 2009) are commonly used datasets for image classification, consisting of 50000 training images and 10000 test images, with 10 and 100 categories respectively.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU models, CPU types) used for its experiments; it reports that experiments were conducted but gives no hardware details.
Software Dependencies | No | The paper mentions models/architectures like Mamba and LSTM and optimizers like SGD and Adam, but does not provide version numbers for any software libraries or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | In this work, we use CIFAR-10 and CIFAR-100 as benchmark datasets. [...] For CIFAR-10, we conduct experiments using ResNet-20/32/44 architectures, and for CIFAR-100, we conduct experiments using ResNet-56/110 architectures. For fair comparison, all layers except the first convolutional layer and the last fully connected layer are binarized in this experiment. See the Appendix for more experimental settings and details. [...] Our experiments are conducted on both SGD and Adam optimizers. [...] Influence of β. In these experiments, we explore the sensitivity of the FSG method to the combination parameter β, selecting values of 0.9, 0.7, 0.5, 0.3, and 0.1 for evaluation on the CIFAR-100 dataset. [...] Influence of l. For l, we select 3, 4, 5, 6, and 7 to conduct experiments on CIFAR-100, using ResNet-56 as the network architecture and DoReFa as the quantization function.
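The setup above binarizes every layer except the first convolution and last fully connected layer, with DoReFa as the quantization function. The sketch below shows the generic pattern such binarization follows: a sign-based forward pass and a straight-through estimator (STE) for the backward pass. This is a minimal NumPy illustration of the standard technique, not the paper's FSG method; the mean-|w| scaling and the clipping threshold of 1.0 are common conventions assumed here, not details confirmed by this paper.

```python
import numpy as np

def binarize_forward(w):
    """1-bit weight binarization: sign(w) scaled by the mean absolute weight.

    The scaling factor alpha reduces the quantization error of plain sign().
    """
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def ste_backward(grad_out, w, clip=1.0):
    """Straight-through estimator for the non-differentiable sign().

    Gradients pass through unchanged where |w| <= clip and are zeroed
    elsewhere, which keeps already-saturated weights from drifting further.
    """
    return grad_out * (np.abs(w) <= clip)

w = np.array([0.5, -0.3, 2.0, -1.2])
w_bin = binarize_forward(w)            # alpha = 1.0, so [1., -1., 1., -1.]
grad = ste_backward(np.ones(4), w)     # mask |w| <= 1: [1., 1., 0., 0.]
```

In a full training loop, `binarize_forward` would be applied to each binarized layer's weights before the convolution, while the optimizer (SGD or Adam, as in the experiments above) updates the latent full-precision weights using the STE-filtered gradients.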