Function-to-Style Guidance of LLMs for Code Translation

Authors: Longhui Zhang, Bin Wang, Jiahao Wang, Xiaofeng Zhao, Min Zhang, Hao Yang, Meishan Zhang, Yu Li, Jing Li, Jun Yu, Min Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both our new benchmark and existing datasets demonstrate that our approach significantly improves code translation performance. Notably, our approach enables Qwen-1.5B to outperform prompt-enhanced Qwen-32B and GPT-4 on average across 20 diverse code translation scenarios.
Researcher Affiliation | Collaboration | 1) Harbin Institute of Technology, Shenzhen, China; 2) Huawei Translation Services Center, Beijing, China; 3) Zhejiang University, Hangzhou, China.
Pseudocode | No | The paper describes its methodology in natural language and block diagrams (Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a source-code repository for the described methodology.
Open Datasets | Yes | Experiments on both our new benchmark and existing datasets demonstrate that our approach significantly improves code translation performance. ... We further evaluate F2STRANS on xCodeEval (Khan et al., 2024), as shown in Table 5. ... The latest data for the CodeNet benchmark comes from 2020 (Puri et al., 2021).
Dataset Splits | No | In the function-oriented training, we construct approximately 5,000 code pairs for each translation scenario, such as translating from C++ to Python, with a corresponding scale of 10,000 in the style-oriented training. ... The paper does not provide specific train/test/validation splits for the datasets used in evaluation.
Hardware Specification | Yes | All our experiments are carried out on a machine equipped with eight NVIDIA A800-SXM4-80GB GPUs.
Software Dependencies | No | The paper mentions using LLMs (Qwen, GPT-4) and general concepts like instruction fine-tuning, but does not provide specific version numbers for any software libraries, frameworks, or environments used for implementation or experimentation (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | In the function-oriented guidance, we set the maximum algorithmic consistency label K in Eq. 1 to 5. In the style-oriented guidance, we set both the numbers of positive translations T⁺ and negative translations T⁻, namely m and n, to 10, with the value of α in negative translation collection construction set to 0.8 and the trade-off hyperparameter β in Eq. 5 fixed at 0.6. ... Throughout both training stages, we maintain consistent hyperparameters, employing 2 epochs and a learning rate of 1 × 10⁻⁵. During inference, we set the temperature of the LLMs to 0.7.
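For readers attempting reproduction, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The key names below are our own invention (the paper releases no code), but every value is taken directly from the quoted text:

```python
# Hedged sketch: hyperparameters reported in the paper's experiment setup.
# Key names are hypothetical; values are quoted from the paper.
F2STRANS_CONFIG = {
    # Function-oriented guidance
    "max_consistency_label_K": 5,       # K in Eq. 1
    # Style-oriented guidance
    "num_positive_translations_m": 10,  # |T+| = m
    "num_negative_translations_n": 10,  # |T-| = n
    "alpha_negative_collection": 0.8,   # alpha in negative-collection construction
    "beta_tradeoff": 0.6,               # beta in Eq. 5
    # Training (kept identical across both stages)
    "epochs": 2,
    "learning_rate": 1e-5,
    # Inference
    "temperature": 0.7,
}
```

Note that the paper does not report batch size, optimizer, or random seeds, so this dictionary is necessarily incomplete for a full reproduction.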