Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging
Authors: Weiyu Chen, James Kwok
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed Pareto Merging produces diverse trade-off models and achieves higher test accuracy compared to state-of-the-art merging baselines. In this section, we perform experiments on both a toy problem (Section 4.1) and real-world datasets (Section 4.2). An ablation study is provided in Section 4.3. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, The Hong Kong University of Science and Technology. Correspondence to: Weiyu Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Pareto Merging (PM). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the described methodology. It only mentions using official implementations for baselines. |
| Open Datasets | Yes | Following (Ilharco et al., 2023; Yang et al., 2024b): SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Netzer et al., 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), and DTD (Cimpoi et al., 2014). All datasets are publicly available. The Cars dataset has a custom license restricted to non-commercial use. The DTD dataset has a custom license restricted to research-only use. EuroSAT is under the MIT license. The licenses for the remaining datasets are unknown. |
| Dataset Splits | No | The paper mentions using models fine-tuned on datasets 'as in (Ilharco et al., 2023; Yang et al., 2024b)' but does not explicitly state the dataset split percentages, counts, or detailed splitting methodology within this paper. It implicitly defers to the cited works for this information. |
| Hardware Specification | Yes | For experiments on the ViT-B/32 (resp. ViT-L/14) model, we use a single NVIDIA A6000 (resp. H800) GPU with 48GB (resp. 80GB) memory. |
| Software Dependencies | Yes | We use Ubuntu 22.04.1 with PyTorch 1.12. |
| Experiment Setup | Yes | Following (Yang et al., 2024b), we initialize all {λ_k}_{k=1}^{K} in (8) to 0.3. We employ the Adam optimizer (Kingma & Ba, 2014), with learning rate 1×10⁻³ and momentum parameters β₁, β₂ set to 0.9 and 0.999, respectively. The batch size is 32. We set the number of optimization steps to 2000 when merging 2 models and 4000 when merging 8 models. We initialize G in (8) to the zero tensor, and initialize A and B using the Kaiming uniform distribution (He et al., 2015). We set rank r to 16, and both the regularization coefficient β and distribution parameter p to 0.1. |
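For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a hypothetical illustration (the paper releases no code): the dictionary keys and the `kaiming_uniform_bound` helper are our own naming, and the helper mirrors the standard Kaiming-uniform initialization formula rather than the authors' actual implementation.

```python
import math

# Hyperparameters as reported in the paper's Experiment Setup (names are ours).
PM_CONFIG = {
    "lambda_init": 0.3,            # initial value of each λ_k in Eq. (8)
    "optimizer": "Adam",           # Kingma & Ba, 2014
    "lr": 1e-3,                    # learning rate
    "betas": (0.9, 0.999),         # Adam momentum parameters β₁, β₂
    "batch_size": 32,
    "steps": {2: 2000, 8: 4000},   # optimization steps, keyed by #merged models
    "rank_r": 16,                  # rank of the low-rank factors A, B
    "beta_reg": 0.1,               # regularization coefficient β
    "p": 0.1,                      # distribution parameter p
    # G is initialized to the zero tensor; A, B use Kaiming uniform init.
}

def kaiming_uniform_bound(fan_in: int, a: float = math.sqrt(5)) -> float:
    """Half-width of the Kaiming-uniform range U(-bound, bound) (He et al., 2015),
    using the leaky-ReLU gain with negative slope `a`, as in common defaults."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)
```

With `fan_in = 16` (the reported rank r), the bound evaluates to exactly 0.25, since sqrt(1/3) · sqrt(3/16) = 1/4.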