Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Authors: Yeoreum Lee, Jinwook Jung, Sungyong Baik
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental and theoretical results showcase the effectiveness and orthogonality of our proposed approach, improving performance upon various merging and fine-tuning methods. Our extensive experimental results demonstrate that our proposal greatly improves the overall performance of a merged model. In this section, we experimentally validate our argument by showing that our proposal, SAFT, leads to better weight disentanglement (Figure 1 and Figure 2), better cross-task linearity (Figure 3), and better joint-task loss linearity (Figure 4 and Figure 5)... |
| Researcher Affiliation | Academia | Dept. of Artificial Intelligence and Dept. of Data Science, Hanyang University |
| Pseudocode | No | The paper describes methods through mathematical formulations (e.g., Equation 2, 7) and textual descriptions, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | Our code is available at https://github.com/baiklab/SAFT-Merge. |
| Open Datasets | Yes | Our experiments are conducted across eight diverse datasets: (1) Cars (Krause et al., 2013), (2) DTD (Cimpoi et al., 2014), (3) EuroSAT (Helber et al., 2019), (4) GTSRB (Stallkamp et al., 2011), (5) MNIST (Deng, 2012), (6) RESISC45 (Cheng et al., 2017), (7) SUN397 (Xiao et al., 2016), (8) SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | These best models are selected based on their performance on a validation set split, which is split from the training set at a 0.1 ratio, as specified in Ilharco et al. (2023). |
| Hardware Specification | Yes | Additionally, all training is conducted using NVIDIA Quadro RTX 8000 GPUs. |
| Software Dependencies | No | The paper mentions optimizers like AdamW (Loshchilov & Hutter, 2019) but does not provide specific version numbers for programming languages, libraries, or frameworks (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | We fine-tune each model for 8000 iterations with a batch size of 128 and a learning rate of 10⁻⁵ for all backbones and all fine-tuning methods. The learning rate schedule follows a cosine annealing approach with 500 warm-up steps, and optimization is performed using the AdamW optimizer (Loshchilov & Hutter, 2019). ... We set the ρ value of ASAM to 0.5, following the default setup outlined in ASAM, along with all other ASAM hyperparameters. |
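The quoted fine-tuning schedule (8000 iterations, 500 warm-up steps, cosine annealing from a peak learning rate of 10⁻⁵) can be sketched as a plain-Python schedule function. This is a minimal sketch, not the authors' code: the linear warm-up shape and the decay-to-zero floor are assumptions, since the report quotes only the schedule type and step counts.

```python
import math

TOTAL_STEPS = 8000   # fine-tuning iterations quoted in the report
WARMUP_STEPS = 500   # warm-up steps quoted in the report
PEAK_LR = 1e-5       # peak learning rate quoted in the report


def lr_at(step: int) -> float:
    """Cosine annealing with warm-up (linear warm-up and zero floor assumed)."""
    if step < WARMUP_STEPS:
        # linearly ramp from ~0 up to the peak learning rate
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # cosine decay from the peak toward zero over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

With these constants the learning rate rises during the first 500 steps, peaks at 10⁻⁵, and decays smoothly over the remaining 7500 iterations.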
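The validation split described in the table (held out from the training set at a 0.1 ratio, following Ilharco et al. (2023)) can be illustrated with a short helper. This is a hypothetical sketch: the seeding and shuffling details are assumptions, as the report only quotes the split ratio.

```python
import random


def train_val_split(indices, val_ratio=0.1, seed=0):
    """Hold out `val_ratio` of the training indices as a validation split.

    Sketch only: the shuffle and seed are assumptions, not the
    procedure from Ilharco et al. (2023).
    """
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_ratio)
    return idx[n_val:], idx[:n_val]  # (train indices, val indices)
```

For example, splitting 1000 training examples at a 0.1 ratio yields 900 training and 100 validation indices with no overlap.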