Large Continual Instruction Assistant

Authors: Jingyang Qiao, Zhizhong Zhang, Xin Tan, Yanyun Qu, Shouhong Ding, Yuan Xie

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across multiple continual instruction tuning benchmarks demonstrate that our approach not only enhances anti-forgetting capabilities but also significantly improves overall continual tuning performance. Our code is available at https://github.com/JingyangQiao/CoIN. Sections 5.1 (Experimental Setup), 5.2 (Continual Instruction Tuning Results), 5.3 (Robust Performance), 5.4 (Analysis of Examples), and 5.5 (Ablation Study) all indicate empirical studies and data analysis.
Researcher Affiliation | Collaboration | 1 East China Normal University, 2 Shanghai Innovation Institute, 3 Shanghai AI Laboratory, 4 Xiamen University, 5 Tencent Youtu Lab. The affiliations include universities (East China Normal University, Xiamen University) and an industry lab (Tencent Youtu Lab), indicating a collaboration.
Pseudocode | Yes | Algorithm 1: Dynamical EMA Updating and Instruction Grouping. Input: pre-trained LFMs f_lfm, number of datasets D, number of iterations T, training set {{x_i^t, I_i^t, y_i^t}_{i=1}^{n_t}}_{t=1}^{T}, learning rate η, loss function L_x, matching threshold ε. Output: training parameters pool {f_trn^i}_{i=1}^{n}, instruction codebook {I^i}_{i=1}^{n}. Initialize: {f_trn^i}_{i=1}^{n}, {I^i}_{i=1}^{n}.
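Read as code, Algorithm 1 amounts to two core operations: an exponential-moving-average merge of parameters and a similarity lookup into the instruction codebook. A minimal sketch of both, assuming cosine similarity over instruction embeddings; the function names, the fixed coefficient alpha, and the embedding representation are our illustrative choices, and the paper's dynamical schedule for the EMA coefficient is omitted:

```python
import numpy as np

def ema_update(theta_ema, theta, alpha):
    """EMA parameter merge: theta_ema <- alpha * theta_ema + (1 - alpha) * theta.
    Parameters are given as dicts mapping names to arrays/scalars."""
    return {k: alpha * theta_ema[k] + (1.0 - alpha) * theta[k] for k in theta_ema}

def group_instruction(instruction_vec, codebook, eps):
    """Match an instruction embedding against the codebook.

    Returns the index of the most similar codebook entry whose cosine
    similarity exceeds the matching threshold eps, or -1 when no entry
    matches (i.e., the instruction starts a new group)."""
    best_idx, best_sim = -1, eps
    for i, code in enumerate(codebook):
        sim = float(np.dot(instruction_vec, code)
                    / (np.linalg.norm(instruction_vec) * np.linalg.norm(code) + 1e-12))
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

With this reading, each new dataset either reuses and EMA-updates the parameters of a matched group or initializes a fresh entry in the pool and the codebook.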
Open Source Code | Yes | Our code is available at https://github.com/JingyangQiao/CoIN.
Open Datasets | Yes | We continually fine-tune on multimodal instruction datasets and verify the performance of the tuned model. Furthermore, we also implement our method with the LM-adapted version of T5-small on the NLP continual instruction tuning tasks (Raffel et al., 2020; Zhang et al., 2023). We follow the datasets and tuning orders of the CoIN benchmark (Chen et al., 2024a), including ScienceQA (Lu et al., 2022), TextVQA (Singh et al., 2019), ImageNet (Deng et al., 2009), GQA (Hudson & Manning, 2019), VizWiz (Gurari et al., 2018), Grounding (Kazemzadeh et al., 2014; Mao et al., 2016), VQAv2 (Goyal et al., 2017), and OCR-VQA (Mishra et al., 2019). We use the InstrDialogStream dataset from (Zhang et al., 2023).
Dataset Splits | No | The paper states, "We follow the datasets and tuning orders of the CoIN benchmark (Chen et al., 2024a)" and mentions that each dataset t_j ∈ T_seq has a natural language instruction I_{t_j}, a training set D_{t_j}^{train}, and a test set D_{t_j}^{test}. However, it does not provide specific percentages, sample counts, or explicit details of how the data was split into training, validation, and test sets within the paper itself; it defers to the benchmark for these details.
Hardware Specification | No | No specific hardware details (GPU or CPU models, memory amounts, or other machine specifications) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions that "the codebase is based on CoIN (Chen et al., 2024a)", that "We use the instruction-tuned T5 model from (Zhang et al., 2023)", that it utilizes the ZeRO stage 0 mode of DeepSpeed for training, and that "we utilize the TfidfVectorizer class in the sklearn library". However, it does not provide version numbers for any of these software components or libraries.
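The paper confirms only that sklearn's TfidfVectorizer is used; one plausible way it could serve the instruction-matching step looks roughly like this, where the function name, the cosine-similarity choice, and the threshold value are all our assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_instruction(new_instruction, codebook, threshold=0.6):
    """TF-IDF cosine match of a new instruction against known ones.

    Returns the index of the best match whose similarity reaches
    `threshold` (a hypothetical value), or -1 when the instruction
    should start a new group."""
    if not codebook:
        return -1
    vec = TfidfVectorizer().fit(codebook + [new_instruction])
    sims = cosine_similarity(vec.transform([new_instruction]),
                             vec.transform(codebook))[0]
    best = int(sims.argmax())
    return best if sims[best] >= threshold else -1
```

Refitting the vectorizer on every call keeps the sketch simple; a real pipeline would likely fit once and cache the transformed codebook.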
Experiment Setup | Yes | The inserted LoRA in each module layer of the LLM has a rank of 128. For each fine-tuning dataset, the training epoch is set to 1, and the initial learning rate and weight decay are configured at 2e-4 and 0, respectively. The max length of the input text is set to 2048. Additionally, we adopt a gradient checkpointing strategy and mixed precision with TF32 and BF16. Furthermore, we also utilize the ZeRO stage 0 mode of DeepSpeed for training.
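For concreteness, the reported hyperparameters can be collected into a single configuration; the field names below are our own, and the DeepSpeed dict is a minimal sketch covering only the settings the paper states (ZeRO stage 0, BF16), not the authors' actual config file:

```python
from dataclasses import dataclass

@dataclass
class TuningConfig:
    """Hyperparameters as reported in the paper; field names are ours."""
    lora_rank: int = 128
    num_epochs: int = 1
    learning_rate: float = 2e-4
    weight_decay: float = 0.0
    max_input_length: int = 2048
    gradient_checkpointing: bool = True  # TF32 would be enabled separately via torch

def deepspeed_config(cfg: TuningConfig) -> dict:
    """Minimal DeepSpeed config matching the stated setup: ZeRO stage 0
    with BF16 mixed precision (keys follow the DeepSpeed JSON schema)."""
    return {
        "zero_optimization": {"stage": 0},
        "bf16": {"enabled": True},
        "train_micro_batch_size_per_gpu": "auto",
    }
```

Having the values in one dataclass makes it easy to check a run against the paper's reported setup.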