Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Authors: Yeoreum Lee, Jinwook Jung, Sungyong Baik
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental and theoretical results showcase the effectiveness and orthogonality of our proposed approach, improving performance upon various merging and fine-tuning methods. Our extensive experimental results demonstrate that our proposal greatly improves the overall performance of a merged model. In this section, we experimentally validate our argument by showing that our proposal, SAFT, leads to better weight disentanglement (Figure 1 and Figure 2), better cross-task linearity (Figure 3), and better joint-task loss linearity (Figure 4 and Figure 5)... |
| Researcher Affiliation | Academia | Dept. of Artificial Intelligence and Dept. of Data Science, Hanyang University |
| Pseudocode | No | The paper describes methods through mathematical formulations (e.g., Equation 2, 7) and textual descriptions, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | Our code is available at https://github.com/baiklab/SAFT-Merge. |
| Open Datasets | Yes | Our experiments are conducted across eight diverse datasets: (1) Cars (Krause et al., 2013), (2) DTD (Cimpoi et al., 2014), (3) EuroSAT (Helber et al., 2019), (4) GTSRB (Stallkamp et al., 2011), (5) MNIST (Deng, 2012), (6) RESISC45 (Cheng et al., 2017), (7) SUN397 (Xiao et al., 2016), (8) SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | These best models are selected based on their performance on a validation set split, which is split from the training set at a 0.1 ratio, as specified in Ilharco et al. (2023). |
| Hardware Specification | Yes | Additionally, all training is conducted using NVIDIA Quadro RTX 8000 GPUs. |
| Software Dependencies | No | The paper mentions optimizers like AdamW (Loshchilov & Hutter, 2019) but does not provide specific version numbers for programming languages, libraries, or frameworks (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | We fine-tune each model for 8000 iterations with a batch size of 128 and a learning rate of 10⁻⁵ for all backbones and all fine-tuning methods. The learning rate schedule follows a cosine annealing approach with 500 warm-up steps, and optimization is performed using the AdamW optimizer (Loshchilov & Hutter, 2019). ... We set the ρ value of ASAM to 0.5, following the default setup outlined in ASAM, along with all other ASAM hyperparameters. |
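The quoted fine-tuning schedule (8000 iterations, 500 warm-up steps, cosine annealing from a peak learning rate of 10⁻⁵) can be sketched as a plain-Python schedule function. This is a minimal sketch, not the authors' code: the linear warm-up shape and the decay-to-zero floor are assumptions, since the report quotes only the schedule type and step counts.

```python
import math

TOTAL_STEPS = 8000   # fine-tuning iterations quoted in the report
WARMUP_STEPS = 500   # warm-up steps quoted in the report
PEAK_LR = 1e-5       # peak learning rate quoted in the report


def lr_at(step: int) -> float:
    """Cosine annealing with warm-up (linear warm-up and zero floor assumed)."""
    if step < WARMUP_STEPS:
        # linearly ramp from ~0 up to the peak learning rate
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # cosine decay from the peak toward zero over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

With these constants the learning rate rises during the first 500 steps, peaks at 10⁻⁵, and decays smoothly over the remaining 7500 iterations.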
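The validation split described in the table (held out from the training set at a 0.1 ratio, following Ilharco et al. (2023)) can be illustrated with a short helper. This is a hypothetical sketch: the seeding and shuffling details are assumptions, as the report only quotes the split ratio.

```python
import random


def train_val_split(indices, val_ratio=0.1, seed=0):
    """Hold out `val_ratio` of the training indices as a validation split.

    Sketch only: the shuffle and seed are assumptions, not the
    procedure from Ilharco et al. (2023).
    """
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_ratio)
    return idx[n_val:], idx[:n_val]  # (train indices, val indices)
```

For example, splitting 1000 training examples at a 0.1 ratio yields 900 training and 100 validation indices with no overlap.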