Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning

Authors: Yeoreum Lee, Jinwook Jung, Sungyong Baik

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental and theoretical results showcase the effectiveness and orthogonality of our proposed approach, improving performance upon various merging and fine-tuning methods. Our extensive experimental results demonstrate that our proposal greatly improves the overall performance of a merged model. In this section, we experimentally validate our argument by showing that our proposal, SAFT, leads to better weight disentanglement (Figure 1 and Figure 2), better cross-task linearity (Figure 3), and better joint-task loss linearity (Figure 4 and Figure 5)...
Researcher Affiliation | Academia | 1 Dept. of Artificial Intelligence, 2 Dept. of Data Science, Hanyang University
Pseudocode | No | The paper describes its methods through mathematical formulations (e.g., Equations 2 and 7) and textual descriptions, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
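Since the paper presents its sharpness-aware fine-tuning only through equations, the sketch below shows a generic (non-adaptive) SAM update on a single scalar parameter. This is an illustrative assumption, not a transcription of the paper's Equations 2 and 7 or of ASAM's adaptive scaling:

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update on a scalar parameter (illustrative only).

    SAM first perturbs the weight in the direction of steepest ascent
    (scaled to norm rho), then descends using the gradient evaluated
    at that perturbed point, biasing training toward flat minima.
    """
    g = grad_fn(w)
    eps = rho if g >= 0 else -rho      # rho * g / |g| in the scalar case
    g_adv = grad_fn(w + eps)           # gradient at the perturbed weight
    return w - lr * g_adv

# Minimizing f(w) = w^2 (gradient 2w) drives w toward the minimum at 0.
w = 1.0
for _ in range(100):
    w = sam_step(w, lambda v: 2.0 * v)
```

ASAM replaces the fixed-norm perturbation with a parameter-scale-adaptive one; the two-gradient-evaluation structure is the same.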
Open Source Code | Yes | Our code is available at https://github.com/baiklab/SAFT-Merge.
Open Datasets | Yes | Our experiments are conducted across eight diverse datasets: (1) Cars (Krause et al., 2013), (2) DTD (Cimpoi et al., 2014), (3) EuroSAT (Helber et al., 2019), (4) GTSRB (Stallkamp et al., 2011), (5) MNIST (Deng, 2012), (6) RESISC45 (Cheng et al., 2017), (7) SUN397 (Xiao et al., 2016), (8) SVHN (Netzer et al., 2011).
Dataset Splits | Yes | These best models are selected based on their performance on a validation set, which is split from the training set at a 0.1 ratio, as specified in Ilharco et al. (2023).
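The 0.1 validation split described above can be sketched as follows. `train_val_split`, its seed, and its shuffling strategy are hypothetical illustrations, not code from the SAFT repository or from Ilharco et al. (2023):

```python
import random

def train_val_split(indices, val_ratio=0.1, seed=0):
    """Hold out a fraction of the training indices for validation.

    Hypothetical helper illustrating a 0.1 train/validation split;
    the name, seed, and shuffle are assumptions, not the authors' code.
    """
    shuffled = indices[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]

# 1000 examples -> 900 training indices, 100 validation indices
train_idx, val_idx = train_val_split(list(range(1000)))
```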
Hardware Specification | Yes | Additionally, all training is conducted using NVIDIA Quadro RTX 8000 GPUs.
Software Dependencies | No | The paper mentions optimizers like AdamW (Loshchilov & Hutter, 2019) but does not provide specific version numbers for programming languages, libraries, or frameworks (e.g., Python version, PyTorch version).
Experiment Setup | Yes | We fine-tune each model for 8000 iterations with a batch size of 128 and a learning rate of 10⁻⁵ for all backbones and all fine-tuning methods. The learning rate schedule follows a cosine annealing approach with 500 warm-up steps, and optimization is performed using AdamW (Loshchilov & Hutter, 2019). ... We set the ρ value of ASAM to 0.5, following the default setup outlined in ASAM, along with all other ASAM hyperparameters.
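The reported schedule (base learning rate 10⁻⁵, 500 warm-up steps, cosine annealing over 8000 iterations) can be sketched as a step-to-learning-rate function. The exact warm-up and decay formulas below are common conventions and an assumption, not taken from the paper:

```python
import math

TOTAL_STEPS = 8000   # fine-tuning iterations quoted above
WARMUP_STEPS = 500   # warm-up steps quoted above
BASE_LR = 1e-5       # learning rate quoted above

def lr_at(step):
    """Linear warm-up to BASE_LR, then cosine annealing toward zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The schedule peaks at BASE_LR when warm-up ends (step 500) and decays smoothly to near zero by step 8000.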