Quantum-PEFT: Ultra parameter-efficient fine-tuning

Authors: Toshiaki Koike-Akino, Francesco Tonin, Yongtao Wu, Frank Zhengqing Wu, Leyla Naz Candogan, Volkan Cevher

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply Quantum-PEFT to several transfer learning benchmarks in language and vision, demonstrating significant advantages in parameter efficiency. [...] Through extensive experiments on language and vision tasks, we show Quantum-PEFT's significant advantage in parameter efficiency, achieving 5- to 25-fold reduction in trainable parameters compared to LoRA, yet maintaining competitive performance. [...] 5 EXPERIMENTS We evaluate our Quantum-PEFT for DeBERTaV3 (He et al., 2023), GPT-2 (Radford et al., 2019), ViT (Dosovitskiy et al., 2020), and Mistral-7B (Zhang et al., 2023) on diverse fine-tuning.
Researcher Affiliation | Collaboration | 1 Mitsubishi Electric Research Laboratories (MERL), 201 Broadway, Cambridge, MA, USA; 2 LIONS, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Pseudocode | No | The paper describes methodologies mathematically and textually, and includes figures illustrating quantum circuit ansätze and tensor diagrams, but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured code-like steps.
Open Source Code | No | The paper states: 'We train the baselines using the code provided by the respective authors or using the peft library from Hugging Face.' This refers to the code used for baselines, not the authors' own implementation of Quantum-PEFT. There is no explicit statement or link in the paper indicating that the source code for Quantum-PEFT is released or publicly available.
Open Datasets | Yes | We fine-tune (1) DeBERTaV3 and Mistral-7B on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2019); (2) GPT-2 Medium on the E2E Challenge following the original LoRA paper (Hu et al., 2021); and (3) ViT on CIFAR10 (Krizhevsky et al., 2009). Our experiments are not to claim that Quantum-PEFT always improves accuracy w.r.t. LoRA, but to show that Quantum-PEFT can maintain a competitive level of accuracy with orders-of-magnitude fewer parameters. [...] ViT model (google/vit-base-patch16-224) pre-trained on ImageNet-21k (Deng et al., 2009)
Dataset Splits | Yes | The E2E benchmark consists of 42,200 samples for training, 4,600 for validation, and 4,600 for testing. [...] CIFAR10 is an image classification dataset having 10 classes of 32×32 colored images with 50k training samples and 10k test samples. [...] SST-2: stands for The Stanford Sentiment Treebank, a dataset on sentiment analysis tasks with two labels. The size of the training set is 67k, and the size of the test set is 1.8k.
Hardware Specification | Yes | For fair comparison, we use the same training settings and hardware, i.e., 4 NVIDIA A100 GPUs, for all methods. [...] Fig. 6 shows the comparison of different unitary mapping methods over different matrix size N for a rank of K = 4 on an NVIDIA RTX6000 GPU 24GB. [...] The required run-time on GPU A40 40GB was about 3.37 seconds per iteration, and 5284.16 seconds per epoch.
Software Dependencies | No | The paper mentions 'PyTorch's autograd' and the 'peft library from Hugging Face' for training baselines, but does not provide specific version numbers for these or any other software components used for the Quantum-PEFT implementation. Without version numbers, the description is not reproducible.
Experiment Setup | Yes | Table 12: Hyperparameter configurations for Quantum-PEFT on the GLUE benchmark. [...] Table 13: CIFAR-10 transfer learning for ViT. [...] Table 14: E2E benchmark for GPT2 Medium. These tables list specific hyperparameters including Optimizer, Learning Rate Schedule, Weight Decay, Batch Size, Epochs, Warmup ratio/Steps, Max sequence length, Rank K, α, and Learning Rate.
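To put the reported 5- to 25-fold parameter reduction in perspective, a back-of-envelope sketch helps. LoRA adds a rank-r down-projection and up-projection to each adapted weight, so its trainable-parameter count is standard; the specific layer width and rank below are illustrative assumptions, not values taken from the paper:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight:
    a (d_in x rank) down-projection plus a (rank x d_out) up-projection."""
    return d_in * rank + rank * d_out

# Example: one 768x768 attention projection at rank 8 (illustrative only).
lora = lora_params(768, 768, 8)       # 12,288 trainable parameters
# At the upper end of the reported range, a 25-fold reduction would leave:
quantum_peft_budget = lora // 25      # ~491 parameters for the same layer
```

This is only arithmetic on the report's headline ratio; the paper's actual per-layer budgets depend on its Pauli parametrization and the chosen rank K.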
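Since the paper says baselines were trained with the Hugging Face peft library, a minimal sketch of how such a LoRA baseline is configured may be useful context. The rank, alpha, and target-module names below are illustrative assumptions, not the paper's settings:

```python
from peft import LoraConfig

# Hypothetical LoRA baseline configuration; all values are placeholders,
# not hyperparameters reported in the paper.
lora_config = LoraConfig(
    r=8,                                          # LoRA rank
    lora_alpha=16,                                # scaling factor
    target_modules=["query_proj", "value_proj"],  # assumed attention projections
    lora_dropout=0.0,
    task_type="SEQ_CLS",                          # e.g. a GLUE classification task
)
# A base model would then be wrapped via peft.get_peft_model(model, lora_config).
```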
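The hyperparameter categories the report lists (optimizer, learning-rate schedule, warmup ratio, weight decay) map onto a standard fine-tuning loop. A minimal PyTorch sketch of a linear warmup-then-decay schedule, with placeholder values rather than the paper's actual settings:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(16, 2)             # stand-in for the fine-tuned model
total_steps, warmup_ratio = 1000, 0.06     # placeholder values, not from the paper

optimizer = AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)

def linear_warmup_decay(step: int) -> float:
    """LR multiplier: linear warmup over the first warmup_ratio of steps,
    then linear decay to zero at total_steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda=linear_warmup_decay)
```

Calling `scheduler.step()` after each optimizer step applies the multiplier; the same pattern covers the warmup-ratio and warmup-steps variants the tables mention.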