Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging
Authors: Weiyu Chen, James Kwok
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed Pareto Merging produces diverse trade-off models and achieves higher test accuracy compared to state-of-the-art merging baselines. In this section, we perform experiments on both a toy problem (Section 4.1) and real-world datasets (Section 4.2). An ablation study is provided in Section 4.3. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, The Hong Kong University of Science and Technology. Correspondence to: Weiyu Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Pareto Merging (PM). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the described methodology. It only mentions using official implementations for baselines. |
| Open Datasets | Yes | Following (Ilharco et al., 2023; Yang et al., 2024b): SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Netzer et al., 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), and DTD (Cimpoi et al., 2014). All datasets are publicly available. The Cars dataset has a custom license restricted to non-commercial use. The DTD dataset has a custom license restricted to research-only use. EuroSAT is under the MIT license. The licenses for the remaining datasets are unknown. |
| Dataset Splits | No | The paper mentions using models fine-tuned on datasets 'as in (Ilharco et al., 2023; Yang et al., 2024b)' but does not explicitly state the dataset split percentages, counts, or detailed splitting methodology within this paper. It implicitly defers to the cited works for this information. |
| Hardware Specification | Yes | For experiments on the ViT-B/32 (resp. ViT-L/14) model, we use a single NVIDIA A6000 (resp. H800) GPU with 48GB (resp. 80GB) memory. |
| Software Dependencies | Yes | We use Ubuntu 22.04.1 with PyTorch 1.12. |
| Experiment Setup | Yes | Following (Yang et al., 2024b), we initialize all {λ_k}_{k=1}^{K} in (8) to 0.3. We employ the Adam optimizer (Kingma & Ba, 2014), with learning rate 1×10⁻³ and momentum parameters β₁, β₂ set to 0.9 and 0.999, respectively. The batch size is 32. We set the number of optimization steps to 2000 when merging 2 models and 4000 when merging 8 models. We initialize G in (8) to the zero tensor, and initialize A and B using the Kaiming uniform distribution (He et al., 2015). We set rank r to 16, and both the regularization coefficient β and distribution parameter p to 0.1. |
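For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a hypothetical illustration (the paper releases no code): the dictionary keys and the `kaiming_uniform_bound` helper are our own naming, and the helper mirrors the standard Kaiming-uniform initialization formula rather than the authors' actual implementation.

```python
import math

# Hyperparameters as reported in the paper's Experiment Setup (names are ours).
PM_CONFIG = {
    "lambda_init": 0.3,            # initial value of each λ_k in Eq. (8)
    "optimizer": "Adam",           # Kingma & Ba, 2014
    "lr": 1e-3,                    # learning rate
    "betas": (0.9, 0.999),         # Adam momentum parameters β₁, β₂
    "batch_size": 32,
    "steps": {2: 2000, 8: 4000},   # optimization steps, keyed by #merged models
    "rank_r": 16,                  # rank of the low-rank factors A, B
    "beta_reg": 0.1,               # regularization coefficient β
    "p": 0.1,                      # distribution parameter p
    # G is initialized to the zero tensor; A, B use Kaiming uniform init.
}

def kaiming_uniform_bound(fan_in: int, a: float = math.sqrt(5)) -> float:
    """Half-width of the Kaiming-uniform range U(-bound, bound) (He et al., 2015),
    using the leaky-ReLU gain with negative slope `a`, as in common defaults."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)
```

With `fan_in = 16` (the reported rank r), the bound evaluates to exactly 0.25, since sqrt(1/3) · sqrt(3/16) = 1/4.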