Representation Surgery in Model Merging with Probabilistic Modeling

Authors: Qi Wei, Shuo He, Enneng Yang, Tingcong Liu, Haobo Wang, Lei Feng, Bo An

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the effectiveness of ProbSurgery while maintaining generalization capabilities in real-world scenarios. The code is now available at this url. ... 5. Experiments ... 5.2. Main Results ... 5.3. Performance in One-to-All Setting ... 5.4. More Analysis ... 5.5. Ablation Study
Researcher Affiliation | Collaboration | ¹Nanyang Technological University, Singapore; ²Zhejiang University, China; ³Southeast University, China; ⁴Skywork AI, Singapore. Correspondence to: Lei Feng <EMAIL>.
Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but no distinct pseudocode or algorithm blocks are provided.
Open Source Code | No | "The code is now available at this url." Although availability is stated, no actual URL is provided in the text, so the source code cannot be accessed from the paper alone.
Open Datasets | Yes | Following prior studies on model merging, such as Task Arithmetic (Ilharco et al., 2023), Ties-Merging (Yadav et al., 2023), AdaMerging (Yang et al., 2024c), and Surgery (Yang et al., 2024a), we merge models trained on the following eight vision datasets and five NLP datasets: SUN397 (Xiao et al., 2016) ... Cars (Krause et al., 2013) ... RESISC45 (Cheng et al., 2017) ... EuroSAT (Helber et al., 2019) ... SVHN (Yuval, 2011) ... GTSRB (Stallkamp et al., 2011) ... MNIST (LeCun, 1998) ... DTD (Cimpoi et al., 2014) ... AGNews (Del Corso et al., 2005) ... Yelp (Zhang et al., 2015) ... Amazon (Zhang et al., 2015) ... DBPedia (Zhang et al., 2015) ... Yahoo (Auer et al., 2007)
Dataset Splits | Yes | MNIST (LeCun, 1998): One of the most renowned datasets in machine learning: 70,000 (60k training + 10k testing) grayscale images of handwritten digits in 10 classes, each sized 28×28.
Hardware Specification | Yes | All experiments are implemented with the PyTorch library and conducted on a single NVIDIA RTX A6000 GPU.
Software Dependencies | No | All experiments are implemented with the PyTorch library, and Adam is adopted as the optimizer, but specific version numbers for these software components are not provided.
Experiment Setup | Yes | Note that λ is a hyperparameter set to 1e-4 ... We adopt Adam as the optimizer with a learning rate of 1e-3 for all training iterations. We train the ProbSurgery module for a total of 5,000 iterations with a batch size of 16. During the training phase, we utilize the stochastic sampling strategy, i.e., the reparameterization trick, to obtain the generated representation bias. For inference and testing, we use the mean value µ output by ProbSurgery as the representation bias. ... h2 is the width of the hidden layer, a hyperparameter in this paper. We set h2 = 128 for all experiments. ... r denotes the sampling number, which is set to 5.
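The quoted setup (reparameterization trick during training, mean µ at inference, hidden width h2 = 128, r = 5 samples) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the two-layer shape, the ReLU activation, and the diagonal-Gaussian parameterization of the bias are assumptions, and the class name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class ProbSurgerySketch:
    """Hypothetical probabilistic surgery head (illustrative only).

    Maps a merged-model representation x to a generated representation
    bias. Widths follow the quoted setup: hidden layer h2 = 128.
    """

    def __init__(self, d, h2=128):
        # Small random weights for a d -> h2 -> d mapping (assumed shapes).
        self.W1 = rng.normal(0.0, 0.02, (d, h2))
        self.W_mu = rng.normal(0.0, 0.02, (h2, d))
        self.W_logvar = rng.normal(0.0, 0.02, (h2, d))

    def forward(self, x, training=True, r=5):
        h = np.maximum(x @ self.W1, 0.0)   # hidden layer of width h2
        mu = h @ self.W_mu                 # mean µ of the Gaussian bias
        if not training:
            return mu                      # inference: use the mean µ
        logvar = h @ self.W_logvar
        sigma = np.exp(0.5 * logvar)
        # Reparameterization trick: z = µ + σ·ε with ε ~ N(0, I),
        # averaged over r stochastic samples (r = 5 in the paper).
        zs = [mu + sigma * rng.standard_normal(mu.shape) for _ in range(r)]
        return np.mean(zs, axis=0)

x = rng.standard_normal((16, 512))         # batch of 16 feature vectors
module = ProbSurgerySketch(d=512)
bias_train = module.forward(x, training=True, r=5)
bias_test = module.forward(x, training=False)
print(bias_train.shape, bias_test.shape)
```

In a real run, this module would be wrapped in PyTorch and trained with Adam (learning rate 1e-3, batch size 16, 5,000 iterations) as described above; sampling through µ + σ·ε keeps the stochastic draw differentiable with respect to the network parameters, which is the point of the reparameterization trick.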