Large Continual Instruction Assistant

Authors: Jingyang Qiao, Zhizhong Zhang, Xin Tan, Yanyun Qu, Shouhong Ding, Yuan Xie

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across multiple continual instruction tuning benchmarks demonstrate that our approach not only enhances anti-forgetting capabilities but also significantly improves overall continual tuning performance. Our code is available at https://github.com/JingyangQiao/CoIN. Sections 5.1 (Experimental Setup), 5.2 (Continual Instruction Tuning Results), 5.3 (Robust Performance), 5.4 (Analysis of Examples), and 5.5 (Ablation Study) all indicate empirical studies and data analysis.
Researcher Affiliation | Collaboration | 1 East China Normal University, 2 Shanghai Innovation Institute, 3 Shanghai AI Laboratory, 4 Xiamen University, 5 Tencent Youtu Lab. The affiliations include universities (East China Normal University, Xiamen University) and an industry lab (Tencent Youtu Lab), indicating a collaboration.
Pseudocode | Yes | Algorithm 1: Dynamical EMA Updating and Instruction Grouping. Input: pre-trained LFMs f_lfm, number of datasets D, number of iterations T, training set {{x_i^t, I_i^t, y_i^t}_{i=1}^{n_t}}_{t=1}^{T}, learning rate η, loss function L_x, matching threshold ε. Output: training parameters pool {f_trn^i}_{i=1}^{n}, instruction codebook {I^i}_{i=1}^{n}. Initialize: {f_trn^i}_{i=1}^{n}, {I^i}_{i=1}^{n}.
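Read as code, Algorithm 1 amounts to two core operations: an exponential-moving-average merge of parameters and a similarity lookup into the instruction codebook. A minimal sketch of both, assuming cosine similarity over instruction embeddings; the function names, the fixed coefficient alpha, and the embedding representation are our illustrative choices, and the paper's dynamical schedule for the EMA coefficient is omitted:

```python
import numpy as np

def ema_update(theta_ema, theta, alpha):
    """EMA parameter merge: theta_ema <- alpha * theta_ema + (1 - alpha) * theta.
    Parameters are given as dicts mapping names to arrays/scalars."""
    return {k: alpha * theta_ema[k] + (1.0 - alpha) * theta[k] for k in theta_ema}

def group_instruction(instruction_vec, codebook, eps):
    """Match an instruction embedding against the codebook.

    Returns the index of the most similar codebook entry whose cosine
    similarity exceeds the matching threshold eps, or -1 when no entry
    matches (i.e., the instruction starts a new group)."""
    best_idx, best_sim = -1, eps
    for i, code in enumerate(codebook):
        sim = float(np.dot(instruction_vec, code)
                    / (np.linalg.norm(instruction_vec) * np.linalg.norm(code) + 1e-12))
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

With this reading, each new dataset either reuses and EMA-updates the parameters of a matched group or initializes a fresh entry in the pool and the codebook.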
Open Source Code | Yes | Our code is available at https://github.com/JingyangQiao/CoIN.
Open Datasets | Yes | We continually fine-tune on multimodal instruction datasets and verify the performance of the tuned model. Furthermore, we also implement our method with the LM-adapted version of T5-small on the NLP continual instruction tuning tasks (Raffel et al., 2020; Zhang et al., 2023). We follow the datasets and tuning orders of the CoIN benchmark (Chen et al., 2024a), including ScienceQA (Lu et al., 2022), TextVQA (Singh et al., 2019), ImageNet (Deng et al., 2009), GQA (Hudson & Manning, 2019), VizWiz (Gurari et al., 2018), Grounding (Kazemzadeh et al., 2014; Mao et al., 2016), VQAv2 (Goyal et al., 2017), and OCR-VQA (Mishra et al., 2019). We use the InstrDialogStream dataset from (Zhang et al., 2023).
Dataset Splits | No | The paper states, "We follow the datasets and tuning orders of the CoIN benchmark (Chen et al., 2024a)" and mentions that each dataset t_j ∈ T_seq has a natural language instruction I_{t_j}, a training set D_{t_j}^{train}, and a test set D_{t_j}^{test}. However, it does not provide specific percentages, sample counts, or explicit details of how the data was split into training, validation, and test sets within the paper itself; it defers to the benchmark for these details.
Hardware Specification | No | No specific hardware details (GPU or CPU models, memory amounts, or other machine specifications) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions that "the codebase is based on CoIN (Chen et al., 2024a)", that "We use the instruction-tuned T5 model from (Zhang et al., 2023)", that it utilizes the ZeRO stage 0 mode of DeepSpeed for training, and that "we utilize the TfidfVectorizer class in the sklearn library". However, it does not provide version numbers for any of these software components or libraries.
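The paper confirms only that sklearn's TfidfVectorizer is used; one plausible way it could serve the instruction-matching step looks roughly like this, where the function name, the cosine-similarity choice, and the threshold value are all our assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_instruction(new_instruction, codebook, threshold=0.6):
    """TF-IDF cosine match of a new instruction against known ones.

    Returns the index of the best match whose similarity reaches
    `threshold` (a hypothetical value), or -1 when the instruction
    should start a new group."""
    if not codebook:
        return -1
    vec = TfidfVectorizer().fit(codebook + [new_instruction])
    sims = cosine_similarity(vec.transform([new_instruction]),
                             vec.transform(codebook))[0]
    best = int(sims.argmax())
    return best if sims[best] >= threshold else -1
```

Refitting the vectorizer on every call keeps the sketch simple; a real pipeline would likely fit once and cache the transformed codebook.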
Experiment Setup | Yes | The inserted LoRA in each module layer of the LLM has a rank of 128. For each fine-tuning dataset, the training epoch is set to 1, and the initial learning rate and weight decay are configured at 2e-4 and 0, respectively. The max length of the input text is set to 2048. Additionally, we adopt a gradient checkpointing strategy and mixed precision with TF32 and BF16. Furthermore, we also utilize the ZeRO stage 0 mode of DeepSpeed for training.
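For concreteness, the reported hyperparameters can be collected into a single configuration; the field names below are our own, and the DeepSpeed dict is a minimal sketch covering only the settings the paper states (ZeRO stage 0, BF16), not the authors' actual config file:

```python
from dataclasses import dataclass

@dataclass
class TuningConfig:
    """Hyperparameters as reported in the paper; field names are ours."""
    lora_rank: int = 128
    num_epochs: int = 1
    learning_rate: float = 2e-4
    weight_decay: float = 0.0
    max_input_length: int = 2048
    gradient_checkpointing: bool = True  # TF32 would be enabled separately via torch

def deepspeed_config(cfg: TuningConfig) -> dict:
    """Minimal DeepSpeed config matching the stated setup: ZeRO stage 0
    with BF16 mixed precision (keys follow the DeepSpeed JSON schema)."""
    return {
        "zero_optimization": {"stage": 0},
        "bf16": {"enabled": True},
        "train_micro_batch_size_per_gpu": "auto",
    }
```

Having the values in one dataclass makes it easy to check a run against the paper's reported setup.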