Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning

Authors: Chongjie Si, Xuehui Wang, Xue Yang, Zhengqin Xu, Qingyun Li, Jifeng Dai, Yu Qiao, Xiaokang Yang, Wei Shen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on computer vision, natural language processing and multi-modal tasks validate the effectiveness of our method." (Abstract) ... "We conduct comprehensive experiments across computer vision (CV), natural language processing (NLP) and multi-modal (MM) tasks." (Section 4.1) ... "The results are shown in Tables 1-4 and Tables 7-8 in the Appendix, and the normalized performance is illustrated in Fig. 2."
Researcher Affiliation | Academia | 1) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; 2) OpenGVLab, Shanghai AI Laboratory
Pseudocode | No | The paper describes the FLoRA method mathematically in Section 3, along with its manifestations for convolutional and linear layers, but it does not present any structured pseudocode or algorithm blocks.
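Since the paper gives only the mathematical form, a minimal sketch of the linear-layer update it describes (a low-rank core space, roughly W' = W + s · B G A) might look as follows. The class name, parameter names, and initialization choices are illustrative assumptions, not taken from an official implementation:

```python
import torch
import torch.nn as nn

class FLoRALinearSketch(nn.Module):
    """Sketch of a low-rank core-space update for a linear layer,
    following the form W' = W + s * B @ G @ A described in Section 3.
    Names and init choices are assumptions, not the authors' code."""
    def __init__(self, base: nn.Linear, r: int = 8, s: float = 0.4):
        super().__init__()
        self.base = base
        out_f, in_f = base.weight.shape
        self.B = nn.Parameter(torch.zeros(out_f, r))      # zero-init so the update starts at 0
        self.G = nn.Parameter(torch.eye(r))               # r x r core matrix
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.s = s
        for p in self.base.parameters():                  # only the low-rank factors are trained
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = self.s * self.B @ self.G @ self.A       # (out_f, in_f) low-rank update
        return self.base(x) + x @ delta_w.T
```

Because B is zero-initialized, the wrapped layer initially reproduces the frozen base layer exactly; the convolutional case in the paper generalizes this to a Tucker-style core and is not sketched here.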
Open Source Code | No | The paper states: "We use publicly available PyTorch Paszke et al. (2019) implementation to execute all the baseline comparisons, and all the experiments are conducted on NVIDIA A100 GPUs." This refers to a third-party framework used for the baselines, not to the authors' own implementation of FLoRA. There is no explicit statement or link indicating that the source code for the proposed FLoRA method is available.
Open Datasets | Yes | "Specifically, for CV tasks, we employ FLoRA to fine-tune the ConvNeXt-V2-L Woo et al. (2023), evaluating it on MS COCO Lin et al. (2014) by using Mask R-CNN He et al. (2017)... and on remote sensing image datasets DOTA Xia et al. (2018)... and on the ADE20K Zhou et al. (2017) dataset... For NLP tasks, we evaluate the DeBERTaV3-base He et al. (2021b) with FLoRA on the General Language Understanding Evaluation (GLUE) Wang et al. (2018) benchmark... For multi-modal tasks, we employ FLoRA to fine-tune LLaVA-1.5-7B Liu et al. (2024a)... on visual instruction tuning tasks, which include seven vision-language benchmarks: VQAv2 Goyal et al. (2017), GQA Hudson & Manning (2019), VizWiz Gurari et al. (2018), SQA Lu et al. (2022), VQAT Singh et al. (2019), POPE Li et al. (2023), and MMBench Liu et al. (2023)."
Dataset Splits | Yes | "More details on GLUE dataset can be found in Table 10 in the Appendix." (Table 10 explicitly lists # Train, # Dev, # Test for each GLUE dataset) ... "For multi-modal tasks, we employ FLoRA to fine-tune LLaVA-1.5-7B Liu et al. (2024a), which is composed of a language model, Vicuna-1.5-7B Peng et al. (2023), and a vision encoder, CLIP ViT-L/336px Radford et al. (2021), on visual instruction tuning tasks, which include seven vision-language benchmarks..." ... "Each dataset consists of 800 training samples and 200 validation samples." (VTAB-1K benchmark, Appendix A.3)
Hardware Specification | Yes | "We use publicly available PyTorch Paszke et al. (2019) implementation to execute all the baseline comparisons, and all the experiments are conducted on NVIDIA A100 GPUs."
Software Dependencies | No | The paper mentions using the "publicly available PyTorch Paszke et al. (2019) implementation" and toolkits such as MMDetection, MMRotate, and MMSegmentation, as well as "diffusers supported by Hugging Face". However, it does not provide version numbers for any of these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | "The hidden dimension of Adapters is chosen from {32, 64}, the budget of AdaLoRA is set as {144, 288, 576} and the rank r of LoRA and DoRA is selected from {2, 4, 8, 16, 32}. Other hyper-parameters are initialized according to their original papers. Additionally, we simply set r = r1 = r2 for FLoRA. For CV tasks, when fine-tuning ConvNeXt-V2-L, we set r3 = 2 and s = 4. ... For different methods, we fine-tune all the convolutional layers for ConvNeXt-V2-L (and SDXL). When fine-tuning InternViT-6B, FLoRA's r is set to {16, 32}, s = 0.04, and we fine-tune all the linear layers for different methods. For NLP tasks, FLoRA's r is set to {2, 8}, and s = 0.4. ... For MM tasks, we set r = 128, s = 1.5 and fine-tune all linear layers." ... Table 9: Hyper-parameter setup for object detection and segmentation (includes LR, BS, optimizer, weight decay). Table 11: Hyper-parameter setup for the GLUE benchmark (includes learning rate, batch size, #epochs). Table 12: Hyper-parameter configurations of FLoRA for fine-tuning LLaVA-1.5-7B (includes rank r, s, dropout, optimizer, LR, LR scheduler, batch size, warmup ratio, epochs, target module).
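For quick reference, the per-task rank and scale settings quoted above can be collected in one place. The dictionary keys and structure below are illustrative only, not taken from the paper or its code; values are copied verbatim from the quoted setup:

```python
# Hedged summary of the FLoRA hyper-parameters quoted in the evidence above.
# Key names are illustrative assumptions; the r and s values come from the paper's Section 4.1.
flora_hparams = {
    "cv_convnext_v2_l": {"r3": 2, "s": 4,    "target": "all conv layers"},
    "cv_internvit_6b":  {"r": [16, 32], "s": 0.04, "target": "all linear layers"},
    "nlp_glue":         {"r": [2, 8],   "s": 0.4},
    "mm_llava_1_5_7b":  {"r": 128,      "s": 1.5,  "target": "all linear layers"},
}

def setting(task: str, key: str):
    """Look up one reported hyper-parameter for a task; returns None if not reported."""
    return flora_hparams.get(task, {}).get(key)
```

This is only a convenience view of the quoted numbers; learning rates, batch sizes, and schedulers live in Tables 9, 11, and 12 of the paper and are not reproduced here.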