Fine-Tuning Language Models with Collaborative and Semantic Experts
Authors: Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Lei Zhang, Qiang Qu, Junyang Lin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on comprehensive benchmarks across MMLU, HumanEval, GSM8K, MT-Bench, and AlpacaEval confirm CoE's efficacy, demonstrating improved performance and expert collaboration in diverse tasks, significantly outperforming traditional SFT methods. |
| Researcher Affiliation | Collaboration | Jiaxi Yang1,2,*, Binyuan Hui4, Min Yang1,3, Jian Yang4, Lei Zhang1,2, Qiang Qu1, Junyang Lin4, 1 Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences 2 University of Chinese Academy of Sciences 3 Shenzhen University of Advanced Technology 4 Alibaba Group |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or formatted as such in the paper. The methodology is described in text and mathematical formulas. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Our initial methodology involved utilizing a large-scale dataset derived from TULU-v2 (Wang et al. 2023; Ivison et al. 2023), a comprehensive collection of instruction tuning datasets. We extracted samples from ShareGPT (Chiang et al. 2023), WizardLM (Xu et al. 2023a), CoT (Chung et al. 2024), FLAN (Chung et al. 2024), Open-Orca (Mukherjee et al. 2023; Lian et al. 2023), GPT4-Alpaca (Peng et al. 2023), and OpenAssistant 1 (Köpf et al. 2024). Each sample was labeled to categorize it into capability groups: General, Coding, or Math. To enhance the coding and math datasets, we incorporated additional samples from CodeAlpaca (Chaudhary 2023) and OSS-Instruct (Wei et al. 2023b) for coding, and the CoT partition from MAmmoTH (Yue et al. 2023) for math. |
| Dataset Splits | No | The paper describes using a labeled large-scale SFT dataset (Dgeneral, Dmath, Dcoding) for training but does not provide specific details on how this dataset was split into training, validation, or test sets for reproducibility. |
| Hardware Specification | Yes | We utilized LLaMA2-7B-Base (Touvron et al. 2023) for our experiments on 8 NVIDIA A100 GPUs |
| Software Dependencies | No | The paper mentions using LLaMA2-7B-Base and the AdamW optimizer, but it does not provide specific version numbers for software libraries like PyTorch, TensorFlow, or Python, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We utilized LLaMA2-7B-Base (Touvron et al. 2023) for our experiments on 8 NVIDIA A100 GPUs, with training sequences limited to 2048 tokens using the ChatML formatting template (OpenAI 2022). Batch sizes were standardized at 8 per device to maintain consistency. Optimization was handled with the AdamW optimizer, starting with a learning rate warmup to 1×10⁻⁵, and then adjusted down to 10% of its maximum via a cosine scheduler. |
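The learning-rate schedule quoted in the Experiment Setup row (linear warmup to a peak of 1×10⁻⁵, then cosine decay down to 10% of the peak) can be sketched as a standalone function. This is a minimal illustration of the schedule's shape, not the paper's code: the warmup length and total step count are assumptions, since the paper does not report them.

```python
import math

def lr_at_step(step, total_steps, warmup_steps,
               max_lr=1e-5, min_lr_ratio=0.1):
    """Warmup-then-cosine schedule as described in the setup:
    - linear warmup from 0 to max_lr over `warmup_steps`
    - cosine decay from max_lr down to min_lr_ratio * max_lr
    `total_steps` and `warmup_steps` are hypothetical; the paper
    only specifies the peak LR (1e-5) and the 10% floor.
    """
    if step < warmup_steps:
        # Linear ramp: (step+1)/warmup_steps fraction of the peak.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Interpolate between max_lr and its 10% floor.
    return max_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)
```

In a PyTorch training loop this would typically be wired up via `torch.optim.lr_scheduler.LambdaLR` over an `AdamW` optimizer; the function above just makes the schedule's endpoints checkable (peak of 1e-5 at the end of warmup, 1e-6 at the final step).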