Representation Surgery in Model Merging with Probabilistic Modeling

Authors: Qi Wei, Shuo He, Enneng Yang, Tingcong Liu, Haobo Wang, Lei Feng, Bo An

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the effectiveness of ProbSurgery while maintaining generalization capabilities in real-world scenarios. The code is now available at this url. ... 5. Experiments ... 5.2. Main Results ... 5.3. Performance in One-to-All Setting ... 5.4. More Analysis ... 5.5. Ablation Study
Researcher Affiliation | Collaboration | ¹Nanyang Technological University, Singapore; ²Zhejiang University, China; ³Southeast University, China; ⁴Skywork AI, Singapore. Correspondence to: Lei Feng <EMAIL>.
Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but no distinct pseudocode or algorithm blocks are provided.
Open Source Code | No | "The code is now available at this url." Although availability is stated, no actual URL is provided in the text, so the source code cannot be accessed from the paper alone.
Open Datasets | Yes | Following prior studies on model merging, such as Task Arithmetic (Ilharco et al., 2023), Ties-Merging (Yadav et al., 2023), AdaMerging (Yang et al., 2024c), and Surgery (Yang et al., 2024a), we merge models trained on the following eight vision datasets and five NLP datasets: SUN397 (Xiao et al., 2016) ... Cars (Krause et al., 2013) ... RESISC45 (Cheng et al., 2017) ... EuroSAT (Helber et al., 2019) ... SVHN (Yuval, 2011) ... GTSRB (Stallkamp et al., 2011) ... MNIST (LeCun, 1998) ... DTD (Cimpoi et al., 2014) ... AGNews (Del Corso et al., 2005) ... Yelp (Zhang et al., 2015) ... Amazon (Zhang et al., 2015) ... DBPedia (Zhang et al., 2015) ... Yahoo (Auer et al., 2007)
Dataset Splits | Yes | MNIST (LeCun, 1998): One of the most renowned datasets in machine learning: 70,000 (60k training + 10k testing) grayscale images of handwritten digits in 10 classes, each sized 28×28.
Hardware Specification | Yes | All experiments are implemented with the PyTorch library and conducted on a single NVIDIA RTX A6000 GPU.
Software Dependencies | No | All experiments are implemented with the PyTorch library, and Adam is adopted as the optimizer, but specific version numbers for these software components are not provided.
Experiment Setup | Yes | Note that λ is a hyperparameter set to 1e-4 ... We adopt Adam as the optimizer with a learning rate of 1e-3 for all training iterations. We train the ProbSurgery module for a total of 5,000 iterations with a batch size of 16. During the training phase, we utilize the stochastic sampling strategy, i.e., the reparameterization trick, to obtain the generated representation bias. For inference and testing, we use the mean value µ output by ProbSurgery as the representation bias. ... h2 is the width of the hidden layer, a hyperparameter in this paper. We set h2 = 128 for all experiments. ... r denotes the sampling number, which is set to 5.
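The quoted setup (reparameterization trick during training, mean µ at inference, hidden width h2 = 128, r = 5 samples) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the two-layer shape, the ReLU activation, and the diagonal-Gaussian parameterization of the bias are assumptions, and the class name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class ProbSurgerySketch:
    """Hypothetical probabilistic surgery head (illustrative only).

    Maps a merged-model representation x to a generated representation
    bias. Widths follow the quoted setup: hidden layer h2 = 128.
    """

    def __init__(self, d, h2=128):
        # Small random weights for a d -> h2 -> d mapping (assumed shapes).
        self.W1 = rng.normal(0.0, 0.02, (d, h2))
        self.W_mu = rng.normal(0.0, 0.02, (h2, d))
        self.W_logvar = rng.normal(0.0, 0.02, (h2, d))

    def forward(self, x, training=True, r=5):
        h = np.maximum(x @ self.W1, 0.0)   # hidden layer of width h2
        mu = h @ self.W_mu                 # mean µ of the Gaussian bias
        if not training:
            return mu                      # inference: use the mean µ
        logvar = h @ self.W_logvar
        sigma = np.exp(0.5 * logvar)
        # Reparameterization trick: z = µ + σ·ε with ε ~ N(0, I),
        # averaged over r stochastic samples (r = 5 in the paper).
        zs = [mu + sigma * rng.standard_normal(mu.shape) for _ in range(r)]
        return np.mean(zs, axis=0)

x = rng.standard_normal((16, 512))         # batch of 16 feature vectors
module = ProbSurgerySketch(d=512)
bias_train = module.forward(x, training=True, r=5)
bias_test = module.forward(x, training=False)
print(bias_train.shape, bias_test.shape)
```

In a real run, this module would be wrapped in PyTorch and trained with Adam (learning rate 1e-3, batch size 16, 5,000 iterations) as described above; sampling through µ + σ·ε keeps the stochastic draw differentiable with respect to the network parameters, which is the point of the reparameterization trick.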