Fast and Slow Gradient Approximation for Binary Neural Network Optimization
Authors: Xinquan Chen, Junqi Gao, Biqing Qi, Dong Li, Yiang Luo, Fangyuan Li, Pengfei Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive comparative experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that our method achieves faster convergence and lower loss values, outperforming existing baselines. [...] We conduct comprehensive experiments on the CIFAR-10 and CIFAR-100 datasets. The results demonstrate that our method outperforms various competitive benchmarks. |
| Researcher Affiliation | Academia | 1Harbin Institute of Technology, Harbin, P.R.China, 2Shanghai Artificial Intelligence Laboratory, Shanghai, P.R.China EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using equations and textual descriptions, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code https://github.com/two-tiger/FSG |
| Open Datasets | Yes | In this work, we use CIFAR-10 and CIFAR-100 as benchmark datasets. CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CIFAR-100 (Krizhevsky, Hinton et al. 2009) are commonly used datasets for image classification, consisting of 50000 training images and 10000 test images, with 10 and 100 categories respectively. |
| Dataset Splits | Yes | CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CIFAR-100 (Krizhevsky, Hinton et al. 2009) are commonly used datasets for image classification, consisting of 50000 training images and 10000 test images, with 10 and 100 categories respectively. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. It mentions experiments were conducted but no hardware details. |
| Software Dependencies | No | The paper mentions models/architectures like Mamba and LSTM and optimizers like SGD and Adam, but does not provide specific version numbers for any software libraries or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Experiment Setup In this work, we use CIFAR-10 and CIFAR-100 as benchmark datasets. [...] For CIFAR-10, we conduct experiments using the ResNet-20/32/44 architectures, and for CIFAR-100, we conduct experiments using the ResNet-56/110 architectures. For fair comparison, except for the first convolutional layer and the last fully connected layer, all other layers are binarized in this experiment. See the Appendix for more experimental settings and details. [...] Our experiments are conducted with both the SGD and Adam optimizers. [...] Influence of β. In this experiment, we explore the sensitivity of the FSG method to the combination parameter β, selecting values of 0.9, 0.7, 0.5, 0.3, and 0.1 for evaluation on the CIFAR-100 dataset. [...] Influence of l. For l, we select 3, 4, 5, 6, and 7 to conduct experiments on CIFAR-100, using ResNet-56 as the network architecture and DoReFa as the quantization function. |
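The Experiment Setup row quotes two hyperparameters, a combination parameter β and a window size l, which suggest a weighted blend of a "fast" (most recent) and "slow" (windowed-average) gradient signal. The exact FSG update rule is defined in the paper; the sketch below is a hypothetical illustration of how such a β/l combination could look, with `fsg_combine` and its scalar-gradient simplification being our own assumptions, not the authors' implementation.

```python
def fsg_combine(grad_history, beta=0.9, l=5):
    """Hypothetical fast/slow gradient blend (not the paper's exact rule).

    grad_history: per-step gradient values, simplified here to floats.
    The "fast" signal is the latest gradient; the "slow" signal is the
    mean over the last `l` gradients. beta weights fast vs. slow,
    matching the beta in {0.1, ..., 0.9} and l in {3, ..., 7} sweeps
    quoted above.
    """
    fast = grad_history[-1]
    window = grad_history[-l:]
    slow = sum(window) / len(window)
    return beta * fast + (1 - beta) * slow
```

With β = 0.5 and l = 3 on the history `[1.0, 2.0, 3.0]`, the fast signal is 3.0, the slow signal is 2.0, and the blended estimate is 2.5.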