Stepping Forward on the Last Mile
Authors: Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Andrew Zou Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we investigate the feasibility of on-device training using fixed-point forward gradients, by conducting comprehensive experiments across a variety of deep learning benchmark tasks in both vision and audio domains. |
| Researcher Affiliation | Collaboration | Chen Feng (Qualcomm AI Research, Qualcomm Canada ULC); Shaojie Zhuo (Qualcomm AI Research, Qualcomm Canada ULC); Xiaopeng Zhang (Qualcomm AI Research, Qualcomm Canada ULC); Ramchalam Kinattinkara Ramakrishnan (Qualcomm AI Research, Qualcomm Canada ULC); Zhaocong Yuan (Qualcomm AI Research, Qualcomm Canada ULC); Andrew Zou Li (University of Toronto) |
| Pseudocode | Yes | Algorithm 1 QZO-FF: Quantized Zero-order Forward Gradient Learning (quantized, fp16) |
| Open Source Code | No | However, we cannot open source the code. |
| Open Datasets | Yes | Vision Benchmark. Image classification models are compared across 5 commonly used few-shot learning benchmark datasets (Table 1). Training methods are evaluated on 3 network backbones (modified ResNet12 Ye et al. [2020], ResNet18 He et al. [2015] and ViT tiny Dosovitskiy et al. [2020]), with ProtoNets Snell et al. [2017] as few-shot classifier. Table 1: Vision datasets used for few-shot learning (Name / Setting / No. Classes (train/val/test) / No. Samples / Resolution): CUB, Bird species, 200 (140/30/30), 11,788, 84×84; Omniglot, Handwritten characters, 1623 (1000/200/423), 32,460, 28×28; Cifar100_fs, Color images, 100 (64/16/20), 60,000, 32×32; miniImageNet, Natural images, 100 (64/16/20), 60,000, 84×84; tieredImageNet, Natural images, 608 (351/97/160), 779,165, 84×84 |
| Dataset Splits | Yes | Table 1: Vision datasets used for few-shot learning (Name / Setting / No. Classes (train/val/test) / No. Samples / Resolution): CUB, Bird species, 200 (140/30/30), 11,788, 84×84; Omniglot, Handwritten characters, 1623 (1000/200/423), 32,460, 28×28; Cifar100_fs, Color images, 100 (64/16/20), 60,000, 32×32; miniImageNet, Natural images, 100 (64/16/20), 60,000, 84×84; tieredImageNet, Natural images, 608 (351/97/160), 779,165, 84×84 |
| Hardware Specification | Yes | All our experiments are run on a single Nvidia Tesla V100 GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Table 6: The hyper-parameters used in our few-shot learning experiments for vision tasks. For fair comparison, FF and BP use the same hyper-parameters. Model architectures of ResNet18, modified ResNet12 and ViT tiny are based on [14], [43], and [39]. Pre-trained models used for zero-shot evaluation can be found at [33], [34] and [38]. Different learning-rate grids are explored, and the best accuracy is reported. Hyper-parameters: n_way = 5; n_shot = 5; ϵ = 1e-3; Epochs = 40; Optimizer = SGD; Learning rate ∈ {1e-3, 1e-4, 1e-5}; Val/test tasks = 100/100 |
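The pseudocode row names Algorithm 1 (QZO-FF), a quantized zeroth-order forward-gradient learning rule. The paper's quantized fp16 variant is not reproduced here, but the underlying idea — estimating a directional derivative from forward passes alone, with no backpropagation — can be sketched in plain floating point. This is a minimal illustration, not the authors' implementation: the function name `zo_forward_gradient_step`, the toy quadratic loss, and the learning rate are assumptions for the example; only ϵ = 1e-3 matches the hyper-parameter table above.

```python
import numpy as np

def zo_forward_gradient_step(w, loss_fn, lr=1e-2, eps=1e-3, rng=None):
    """One zeroth-order forward-gradient step (SPSA-style sketch).

    Samples a random direction v, estimates the directional derivative
    of loss_fn at w along v via a central difference of two forward
    passes, and moves w against that estimate. No backward pass needed.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    v = rng.standard_normal(w.shape)  # random perturbation direction
    # Central-difference estimate of the directional derivative (uses
    # the paper's epsilon = 1e-3 as the perturbation scale).
    g = (loss_fn(w + eps * v) - loss_fn(w - eps * v)) / (2.0 * eps)
    return w - lr * g * v  # forward-gradient update

# Usage on a toy quadratic loss: the iterate drifts toward the minimum
# using only forward evaluations of the loss.
loss = lambda w: float(np.sum(w ** 2))
rng = np.random.default_rng(42)
w = np.array([2.0, -3.0])
for _ in range(500):
    w = zo_forward_gradient_step(w, loss, lr=1e-2, eps=1e-3, rng=rng)
```

Because the update direction is a random vector scaled by a scalar difference of losses, memory and compute stay close to inference cost, which is the property that makes this family of methods attractive for on-device training.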