EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning
Authors: Dong Huang, Guangtao Zeng, Jianbo Dai, Meng Luo, Han Weng, Yuhao Qing, Heming Cui, Zhijiang Guo, Jie Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate significant improvements when fine-tuning with EFFIINSTRUCT. |
| Researcher Affiliation | Academia | 1 University of Hong Kong, 2 National University of Singapore, 3 Singapore University of Technology and Design, 4 University of Edinburgh, 5 Beijing University of Posts and Telecommunications, 6 University of Cambridge, 7 King's College London. Correspondence to: Zhijiang Guo <EMAIL>. |
| Pseudocode | No | The paper describes the methodology for constructing the EFFIINSTRUCT dataset and fine-tuning LLMs through narrative text and diagrams (Figure 1), but it does not include a formal pseudocode block or algorithm section with structured, code-like steps for any of the main processes. |
| Open Source Code | Yes | Dataset and Code are available at https://github.com/huangd1999/EffiCoder. |
| Open Datasets | Yes | We construct EFFIINSTRUCT, which is, to the best of our knowledge, the first instruction-tuning dataset designed to improve the efficiency of LLM-generated code, facilitating fine-tuning for more efficient code generation. ... Dataset and Code are available at https://github.com/huangd1999/EffiCoder. ... We collect the candidate tasks from the open-source code LLM training sets, which include SelfCodeAlign (Wei et al., 2024a), CodeFeedback-Filtered-Instruction (CodeFeed; MAP, 2023), Tested-143k-Python-Alpaca (Alpaca; Vezora, 2023), Glaive-Code-Assistant (Glaive; Computer, 2023), Magicoder-Evol-Instruct-110K (Evol-Ins; UIUC, 2023a), Dolphin-Coder (Dolphin; Computations, 2023), Magicoder-OSS-Instruct-75K (Oss-Ins; UIUC, 2023b), Self-OSS-Instruct-SC2-Exec-Filter-50K (Self-Oss; BigCode, 2023), and Apps (Hendrycks et al., 2021). |
| Dataset Splits | No | The paper mentions collecting candidate tasks from various open-source datasets and filtering them, resulting in a total of 65k tasks. It also refers to evaluating on existing benchmarks like EffiBench and HumanEval Plus. Footnote 2 states: 'Analysis shows no exact duplicates between training and evaluation sets, with only 0.20% of evaluation samples having minimal vocabulary overlap (5-10%).' However, specific percentages, absolute counts, or detailed methodologies for splitting the EFFIINSTRUCT dataset itself into training, validation, and test sets are not explicitly provided in the main text. |
| Hardware Specification | Yes | Firstly, we have evaluated the effectiveness of Effi-Code on seven different software-hardware setups, as shown in Rebuttal Table 2. The results demonstrate that Effi-Code fine-tuned LLMs achieve higher efficiency than the original LLMs across all setups. For example, in the environment of Python 3.11.10 on an Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz, the average execution time decreases from 0.59s to 0.40s when using Effi-Code to fine-tune Qwen2.5-Coder-7B, reducing the average execution time by 32%. |
| Software Dependencies | Yes | We use Llama-factory (Zheng et al., 2024) to fine-tune LLMs with fully supervised fine-tuning with the same setup and train the models using EFFIINSTRUCT. ... Python 3.11.10, Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz |
| Experiment Setup | Yes | The maximum sequence length is set to 2048 tokens. We use a batch size of 128 and set the learning rate to 5e-6 with a cosine learning rate scheduler and a warmup ratio of 0.03. We fine-tune all LLMs for 4 epochs under the bf16 data type. |
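The contamination footnote quoted in the Dataset Splits row (no exact duplicates, 0.20% of evaluation samples with 5-10% vocabulary overlap) implies a train-eval overlap check. The paper does not spell out its methodology, so the following is only a minimal sketch of one plausible implementation: whitespace tokenization and Jaccard overlap are assumptions, and `contamination_report` and its thresholds are hypothetical names, not the authors' code.

```python
def vocab_overlap(train_text: str, eval_text: str) -> float:
    """Jaccard overlap between the word vocabularies of two task texts."""
    train_vocab = set(train_text.lower().split())
    eval_vocab = set(eval_text.lower().split())
    if not train_vocab or not eval_vocab:
        return 0.0
    return len(train_vocab & eval_vocab) / len(train_vocab | eval_vocab)

def contamination_report(train_tasks, eval_tasks, low=0.05, high=0.10):
    """Count exact duplicates, plus eval tasks whose best overlap
    with any training task falls inside the 'minimal' band [low, high]."""
    train_set = set(train_tasks)
    exact = sum(1 for e in eval_tasks if e in train_set)
    minimal = 0
    for e in eval_tasks:
        best = max(vocab_overlap(t, e) for t in train_tasks)
        if low <= best <= high:
            minimal += 1
    return exact, minimal

# Toy illustration (not real EFFIINSTRUCT / benchmark data).
train = ["sort a list of integers ascending", "compute fibonacci numbers"]
evals = ["reverse a linked list in place", "sort a list of integers ascending"]
exact, minimal = contamination_report(train, evals)
```

On this toy data the duplicated task is flagged as an exact match, while the unrelated task's overlap falls outside the reported 5-10% band.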
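The efficiency numbers quoted in the Hardware Specification row (average execution time falling from 0.59s to 0.40s, a 32% reduction) rest on timing generated solutions under a fixed software-hardware setup. A minimal sketch of such a measurement using only the standard library; the two solution functions are hypothetical stand-ins, not code generated by the models in the paper:

```python
import timeit

# Hypothetical "before" and "after" solutions to the same task
# (illustrative only; not from the EffiCoder experiments).
def baseline_sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def optimized_sum_of_squares(n):
    # Closed form for the sum of squares 0..n-1.
    return (n - 1) * n * (2 * n - 1) // 6

def execution_time(func, arg, repeats=5, number=100):
    """Best-of-repeats wall-clock time per call, in seconds."""
    timer = timeit.Timer(lambda: func(arg))
    runs = timer.repeat(repeat=repeats, number=number)
    return min(runs) / number  # min over repeats reduces scheduler noise

t_base = execution_time(baseline_sum_of_squares, 10_000)
t_opt = execution_time(optimized_sum_of_squares, 10_000)
print(f"baseline: {t_base:.2e}s  optimized: {t_opt:.2e}s  "
      f"reduction: {100 * (1 - t_opt / t_base):.0f}%")
```

Taking the minimum over repeats rather than the mean is a common choice for micro-benchmarks, since background load only ever inflates timings; absolute numbers will of course differ across the seven setups the rebuttal describes.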
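The hyperparameters in the Experiment Setup row map directly onto a trainer configuration. As a compact summary, here is the quoted setup as a plain config dict; the key names are generic illustrations, not LLaMA-Factory's exact schema:

```python
# Fine-tuning settings as reported in the paper; key names are
# illustrative, not tied to a specific trainer's configuration schema.
finetune_config = {
    "max_seq_length": 2048,        # maximum sequence length in tokens
    "global_batch_size": 128,
    "learning_rate": 5e-6,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "num_train_epochs": 4,
    "dtype": "bf16",
}
```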