FLOPS: Forward Learning with OPtimal Sampling

Authors: Tao Ren, Zishi Zhang, Jinyang Jiang, Guanghao Li, Zeliang Zhang, Mingqian Feng, Yijie Peng

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We conduct extensive experiments for finetuning Vision Transformers on various datasets and further deploy the allocator to two black-box applications: prompt tuning and multimodal alignment for foundation models. All findings demonstrate that our proposed allocator significantly enhances the scalability of forward-learning algorithms, paving the way for real-world applications. The implementation is available at https://github.com/RTkenny/FLOPS-Forward-Learning-with-OPtimal-Sampling."

Researcher Affiliation | Academia | 1. Guanghua School of Management, Peking University; 2. Xiangjiang Laboratory; 3. Tsinghua Shenzhen International Graduate School, Tsinghua University; 4. Huazhong University of Science and Technology; 5. Johns Hopkins University

Pseudocode | Yes | Algorithm 1: "Perturbation-based training via optimal allocation"

Open Source Code | Yes | "The implementation is available at https://github.com/RTkenny/FLOPS-Forward-Learning-with-OPtimal-Sampling."

Open Datasets | Yes | "Experimental setting: We evaluate our model's performance using a diverse set of widely used benchmark datasets, each chosen for its unique characteristics and specific challenges, contributing to a comprehensive analysis of our method's generalization across different domains: ImageNet (Deng et al., 2009), Caltech101 (Fei-Fei et al., 2004), Food101 (Bossard et al., 2014), Flowers102 (Nilsback & Zisserman, 2008), CIFAR10/100 (Krizhevsky et al., 2009), and EuroSAT (Helber et al., 2019)."

Dataset Splits | Yes | "All methods use the same 16-shot split for training and are evaluated on the full test sets."

Hardware Specification | Yes | "Platform: All the experiments are conducted on a machine with 8 NVIDIA A800 GPUs. Each A800 GPU has 80GB of memory."

Software Dependencies | No | The paper mentions software components such as the Adam optimizer and specific models (CLIP, Vicuna v1.5 7B), but does not provide version numbers for general software dependencies such as Python, PyTorch, or CUDA.

Experiment Setup | Yes | "We train the network for 10 epochs with a learning rate of 1 × 10⁻⁴ and the Adam optimizer. The batch size is 128. All the methods in our experiments use the same query budgets, except for MeZO, which uses only 2 queries per data point in accordance with its original memory-efficient settings. ... The intrinsic dimension, d_I + d_T, is 1000. The visual prompt length is 10 and the text prompt length is 12. The batch size is 64. We use 60 queries per data point to tune the intrinsic dimension. We use the Adam optimizer with a learning rate of 1 × 10⁻⁴ and train for 50 epochs with early stopping."
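The "queries per data point" budget in the setup above refers to forward-only (zeroth-order) gradient estimation: the loss is queried at randomly perturbed parameters instead of being backpropagated. The sketch below is only a generic illustration of that idea with a uniform query allocation, not the paper's optimal-sampling allocator; the function `zo_grad` and all names in it are hypothetical.

```python
import random

def zo_grad(loss_fn, theta, num_queries=60, sigma=1e-2, rng=None):
    """Antithetic zeroth-order gradient estimate:
    g ≈ E[ (L(θ+σu) − L(θ−σu)) / (2σ) · u ],  u ~ N(0, I).
    Each perturbation pair costs two forward passes (loss queries)."""
    rng = rng or random.Random(0)
    n = len(theta)
    g = [0.0] * n
    pairs = num_queries // 2  # two queries per antithetic pair
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        loss_plus = loss_fn([t + sigma * ui for t, ui in zip(theta, u)])
        loss_minus = loss_fn([t - sigma * ui for t, ui in zip(theta, u)])
        scale = (loss_plus - loss_minus) / (2.0 * sigma)
        for i in range(n):
            g[i] += scale * u[i]
    return [gi / pairs for gi in g]

# Toy sanity check: the gradient of 0.5·||θ||² is θ itself,
# so the estimate should land near θ for a large enough budget.
theta = [1.0, -2.0, 0.5]
loss = lambda x: 0.5 * sum(v * v for v in x)
g = zo_grad(loss, theta, num_queries=2000)
```

In a full training loop, this estimate would replace the backward pass, with the resulting `g` fed to an optimizer such as Adam; the paper's contribution is allocating the query budget non-uniformly across samples rather than the fixed per-sample budget shown here.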