InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Authors: Yutong Wu, Di Huang, Wenxuan Shi, Wei Wang, Yewen Pu, Lingzhe Gao, Shihao Liu, Ziyuan Nan, Kaizhao Yuan, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Dawei Yin, Xing Hu, Yunji Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate Inverse-Instruct on a range of open-source code models (e.g., Code Llama-Python and DeepSeek-Coder) and benchmarks (e.g., HumanEval(+), MBPP(+), DS-1000 and MultiPL-E), showing it consistently improves the base models. We evaluated InverseCoder on a wide range of benchmarks (Section 6), including HumanEval(+) (Chen et al. 2021; Liu et al. 2023), MBPP(+) (Austin et al. 2021; Liu et al. 2023), MultiPL-E (Cassano et al. 2023), and DS-1000 (Lai et al. 2023).
Researcher Affiliation | Collaboration | 1SKL of Processors, Institute of Computing Technology, CAS; 2University of Chinese Academy of Sciences; 3Baidu Inc., Beijing, China; 4Autodesk Research
Pseudocode | No | The paper describes the method Inverse-Instruct in Section 4, detailing 'Code Preprocessing', 'Code Summarization', and 'Self-evaluation and Data Selection'. It also includes 'Figure 1: The overview of Inverse-Instruct', which is a flowchart-like diagram. However, it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
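Although the paper provides no pseudocode, the three stages it names can be sketched as a simple data-generation loop. This is a hypothetical illustration, not the authors' implementation: all function names are stand-ins, the "model" is a stub for a real code LLM, and the preprocessing and scoring rules are placeholders.

```python
# Hypothetical sketch of the Inverse-Instruct loop described in Section 4:
# clean code responses, summarize them back into candidate instructions,
# then let the model score and select the best (instruction, code) pairs.

def preprocess_code(response: str) -> str:
    """Code preprocessing (placeholder): strip comment lines, keep the code."""
    lines = [ln for ln in response.splitlines() if not ln.startswith("#")]
    return "\n".join(lines).strip()

def summarize_code(model, code: str) -> list[str]:
    """Code summarization: ask the model for candidate instructions."""
    return [model(f"Summarize this code as an instruction:\n{code}")]

def self_evaluate(model, instruction: str, code: str) -> float:
    """Self-evaluation (placeholder scoring): word overlap as a crude proxy."""
    return float(len(set(instruction.split()) & set(code.split())))

def inverse_instruct(model, responses: list[str]) -> list[tuple[str, str]]:
    """Build an instruction-tuning dataset from code responses alone."""
    dataset = []
    for resp in responses:
        code = preprocess_code(resp)
        candidates = summarize_code(model, code)
        best = max(candidates, key=lambda ins: self_evaluate(model, ins, code))
        dataset.append((best, code))
    return dataset
```

In the paper, summarization and self-evaluation are performed by the code LLM itself; here a single callable stands in for both roles.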
Open Source Code | Yes | Code: https://github.com/wyt2000/InverseCoder
Open Datasets | Yes | In this work, we mainly use evol-codealpaca-v1 as our original instruction tuning dataset {(xi, yi)}, which is widely used for instruction tuning of code LLMs (Wei et al. 2023; Yu et al. 2023; Song et al. 2024). It contains 111183 instruction-response pairs generated by Evol-Instruct using GPT-4. ... We evaluated InverseCoder on a wide range of benchmarks (Section 6), including HumanEval(+) (Chen et al. 2021; Liu et al. 2023), MBPP(+) (Austin et al. 2021; Liu et al. 2023), MultiPL-E (Cassano et al. 2023), and DS-1000 (Lai et al. 2023). ... theblackcat102. 2023. The evolved code alpaca dataset. https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1.
Dataset Splits | Yes | Following Magicoder (Wei et al. 2023), evol-codealpaca-v1 is decontaminated by removing data that contain docstrings or solutions from HumanEval (Chen et al. 2021), MBPP (Austin et al. 2021), MultiPL-E (Cassano et al. 2023), and DS-1000 (Lai et al. 2023), which are used to evaluate InverseCoder.
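The decontamination step quoted above amounts to a substring-overlap filter against the evaluation sets. A minimal sketch, assuming a simple "snippet appears in the response" rule; the benchmark snippets below are placeholders, not real HumanEval or MBPP content:

```python
# Minimal sketch of benchmark decontamination: drop any training pair whose
# response contains a docstring or solution snippet from an evaluation set.

def decontaminate(train_pairs, benchmark_snippets):
    """Keep only (instruction, response) pairs with no benchmark overlap."""
    def contaminated(response: str) -> bool:
        return any(snippet in response for snippet in benchmark_snippets)
    return [(ins, resp) for ins, resp in train_pairs if not contaminated(resp)]
```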
Hardware Specification | Yes | To obtain the beginning code LLM M (hereinafter called WizardCoder-GPT4), we fine-tune the base models on evol-codealpaca-v1 for 2 epochs using 8 NVIDIA A100-40GB SXM GPUs.
Software Dependencies | No | The paper mentions using 'the vLLM inference framework (Kwon et al. 2023)' but does not provide a specific version number for it or any other software components.
Experiment Setup | Yes | To obtain the beginning code LLM M (hereinafter called WizardCoder-GPT4), we fine-tune the base models on evol-codealpaca-v1 for 2 epochs using 8 NVIDIA A100-40GB SXM GPUs. We set the initial learning rate at 5e-5 with 15 warmup steps and a linear learning rate scheduler. We use Adafactor (Shazeer and Stern 2018) as our optimizer and choose a batch size of 512 with a sequence truncation length of 1024.
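The learning-rate schedule described in that setup (linear warmup over 15 steps to a 5e-5 peak, then linear decay) can be written out in a few lines. The total step count here is illustrative, not taken from the paper:

```python
# Sketch of a linear warmup + linear decay schedule matching the reported
# settings: peak learning rate 5e-5 reached after 15 warmup steps, then a
# linear decay to zero. total_steps is an assumed value for illustration.

def linear_schedule(step: int, peak_lr: float = 5e-5,
                    warmup_steps: int = 15, total_steps: int = 1000) -> float:
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # linear warmup
    frac = (total_steps - step) / (total_steps - warmup_steps)
    return peak_lr * max(0.0, frac)                   # linear decay to zero
```

In practice this corresponds to the standard linear scheduler with warmup found in common fine-tuning frameworks; the closed form above just makes the step-by-step behavior explicit.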