EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning

Authors: Dong Huang, Guangtao Zeng, Jianbo Dai, Meng Luo, Han Weng, Yuhao Qing, Heming Cui, Zhijiang Guo, Jie Zhang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate significant improvements when fine-tuning with EFFIINSTRUCT.
Researcher Affiliation Academia 1 University of Hong Kong, 2 National University of Singapore, 3 Singapore University of Technology and Design, 4 University of Edinburgh, 5 Beijing University of Posts and Telecommunications, 6 University of Cambridge, 7 King's College London. Correspondence to: Zhijiang Guo <EMAIL>.
Pseudocode No The paper describes the methodology for constructing the EFFIINSTRUCT dataset and fine-tuning LLMs through narrative text and diagrams (Figure 1), but it does not include a formal pseudocode block or algorithm section with structured, code-like steps for any of the main processes.
Open Source Code Yes Dataset and Code are available at https://github.com/huangd1999/EffiCoder.
Open Datasets Yes We construct EFFIINSTRUCT; to the best of our knowledge, it is the first instruction-tuning dataset designed to improve the efficiency of LLM-generated code, facilitating fine-tuning for more efficient code generation. ... Dataset and Code are available at https://github.com/huangd1999/EffiCoder. ... We collect the candidate tasks from the open-source code LLM training sets, which include SelfCodeAlign (SelfCodeAlign; Wei et al., 2024a), CodeFeedback-Filtered-Instruction (CodeFeed; MAP, 2023), Tested-143k-Python-Alpaca (Alpaca; Vezora, 2023), Glaive-Code-Assistant (Glaive; Computer, 2023), Magicoder-Evol-Instruct-110K (Evol-Ins; UIUC, 2023a), Dolphin-Coder (Dolphin; Computations, 2023), Magicoder-OSS-Instruct-75K (Oss-Ins; UIUC, 2023b), Self-OSS-Instruct-SC2-Exec-Filter-50K (Self-Oss; BigCode, 2023), and Apps (Hendrycks et al., 2021).
Dataset Splits No The paper mentions collecting candidate tasks from various open-source datasets and filtering them, resulting in a total of 65k tasks. It also refers to evaluating on existing benchmarks like EffiBench and HumanEvalPlus. Footnote 2 states: 'Analysis shows no exact duplicates between training and evaluation sets, with only 0.20% of evaluation samples having minimal vocabulary overlap (5-10%).' However, specific percentages, absolute counts, or detailed methodologies for splitting the EFFIINSTRUCT dataset itself into training, validation, and test sets are not explicitly provided in the main text.
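The contamination check quoted above (no exact duplicates, 0.20% of evaluation samples with 5-10% vocabulary overlap) can be approximated as follows. This is a minimal sketch, not the paper's actual analysis code; the function names and the Jaccard-over-word-sets definition of "vocabulary overlap" are assumptions for illustration.

```python
def vocab_overlap(train_text: str, eval_text: str) -> float:
    """Jaccard overlap between the word vocabularies of two samples
    (an assumed definition of 'vocabulary overlap')."""
    a, b = set(train_text.lower().split()), set(eval_text.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def contamination_report(train_set, eval_set, low=0.05, high=0.10):
    """Count exact duplicates, plus eval samples whose best overlap with
    any training sample falls in the 'minimal' [low, high] band."""
    exact, minimal = 0, 0
    for ev in eval_set:
        best = max(vocab_overlap(tr, ev) for tr in train_set)
        if ev in train_set:
            exact += 1
        elif low <= best <= high:
            minimal += 1
    return {"exact_duplicates": exact,
            "minimal_overlap_pct": 100.0 * minimal / len(eval_set)}
```

On the real 65k-task training set one would run this pairwise over tokenized task descriptions; the report would then support (or refute) the footnote's 0.20% figure.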
Hardware Specification Yes Firstly, we have evaluated the effectiveness of Effi-Code on seven different software-hardware setups, as shown in Rebuttal Table 2. The results demonstrate that Effi-Code fine-tuned LLMs achieve higher efficiency than the original LLMs across all setups. For example, in the environment of Python 3.11.10 on an Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz, the average execution time decreases from 0.59s to 0.40s when using Effi-Code to fine-tune Qwen2.5-Coder-7B, reducing the average execution time by 32%.
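The average-execution-time metric behind the 0.59s-to-0.40s comparison can be sketched with a simple wall-clock harness. This is an illustrative stand-in, not the paper's measurement code; `avg_execution_time` and the two Fibonacci variants are hypothetical, and the Fibonacci pair merely exemplifies the kind of efficiency gap (exponential vs. linear) that efficiency-aware fine-tuning targets.

```python
import statistics
import time


def avg_execution_time(solution, test_inputs, repeats: int = 10) -> float:
    """Average wall-clock seconds to run `solution` over all test inputs,
    repeated `repeats` times to smooth out timer noise."""
    runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        for args in test_inputs:
            solution(*args)
        runs.append(time.perf_counter() - start)
    return statistics.mean(runs)


# Two functionally equivalent solutions with very different efficiency:
def fib_slow(n):  # O(2^n) naive recursion
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)


def fib_fast(n):  # O(n) iteration
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

For example, `avg_execution_time(fib_fast, [(20,)])` comes out far below `avg_execution_time(fib_slow, [(20,)])` even though both return the same values; the paper's setup reports the same metric per benchmark task on fixed hardware.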
Software Dependencies Yes We use Llama-factory (Zheng et al., 2024) to fine-tune LLMs with fully supervised fine-tuning with the same setup and train the models using EFFIINSTRUCT. ... Python 3.11.10 Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz
Experiment Setup Yes The maximum sequence length is set to 2048 tokens. We use a batch size of 128 and set the learning rate to 5e-6 with a cosine learning rate scheduler and a warmup ratio of 0.03. We fine-tune all LLMs for 4 epochs under the bf16 data type.
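The learning-rate trajectory implied by these hyperparameters (peak 5e-6, warmup ratio 0.03, cosine decay) can be written out explicitly. This is a sketch of the standard linear-warmup-plus-cosine schedule; the exact step rounding in Llama-factory's scheduler may differ.

```python
import math


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 5e-6, warmup_ratio: float = 0.03) -> float:
    """LR at `step`: linear warmup to `peak_lr` over the first
    `warmup_ratio` fraction of training, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With 1000 optimizer steps, the LR ramps from 0 to 5e-6 over the first 30 steps (3%), then decays along a half cosine back to 0 by the final step.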