Function-to-Style Guidance of LLMs for Code Translation
Authors: Longhui Zhang, Bin Wang, Jiahao Wang, Xiaofeng Zhao, Min Zhang, Hao Yang, Meishan Zhang, Yu Li, Jing Li, Jun Yu, Min Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both our new benchmark and existing datasets demonstrate that our approach significantly improves code translation performance. Notably, our approach enables Qwen1.5B to outperform prompt-enhanced Qwen32B and GPT-4 on average across 20 diverse code translation scenarios. |
| Researcher Affiliation | Collaboration | 1Harbin Institute of Technology, Shenzhen, China. 2Huawei Translation Services Center, Beijing, China. 3Zhejiang University, Hangzhou, China. |
| Pseudocode | No | The paper describes its methodology in natural language and block diagrams (Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a source-code repository for the described methodology. |
| Open Datasets | Yes | Experiments on both our new benchmark and existing datasets demonstrate that our approach significantly improves code translation performance. ... We further evaluate F2STRANS on xCodeEval (Khan et al., 2024), as shown in Table 5. ... The latest data for the CodeNet benchmark comes from 2020 (Puri et al., 2021). |
| Dataset Splits | No | In the function-oriented training, we construct approximately 5,000 code pairs for each translation scenario, such as translating from C++ to Python, with a corresponding scale of 10,000 in the style-oriented training. ... The paper does not provide specific train/test/validation splits for the datasets used in evaluation. |
| Hardware Specification | Yes | All our experiments are carried out on a machine equipped with eight NVIDIA A800-SXM4-80GB GPUs. |
| Software Dependencies | No | The paper mentions using LLMs (Qwen, GPT-4) and general concepts like Instruction Fine-tuning, but does not provide specific version numbers for any software libraries, frameworks, or environments used for implementation or experimentation (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | In the function-oriented guidance, we set the maximum algorithmic consistency label K in Eq. 1 to 5. In the style-oriented guidance, we set both the numbers of positive translations T+ and negative translations T−, namely m and n, to 10, with the value of α in negative translation collection construction set to 0.8 and the trade-off hyperparameter β in Eq. 5 fixed at 0.6. ... Throughout both training stages, we maintain consistent hyperparameters, employing 2 epochs and a learning rate of 1×10⁻⁵. During inference, we set the temperature of the LLMs to 0.7. |
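For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The key names below are our own labels for readability, not identifiers from the paper; only the values come from the quoted text.

```python
# Hypothetical consolidation of the reported F2STRANS hyperparameters.
# Key names are illustrative; values are taken from the paper's quoted setup.
F2STRANS_CONFIG = {
    # Function-oriented guidance
    "max_consistency_label_K": 5,       # K in Eq. 1
    # Style-oriented guidance
    "num_positive_translations_m": 10,  # size of T+
    "num_negative_translations_n": 10,  # size of T-
    "alpha_negative_collection": 0.8,   # α in negative-collection construction
    "beta_tradeoff": 0.6,               # β in Eq. 5
    # Shared across both training stages
    "epochs": 2,
    "learning_rate": 1e-5,
    # Inference
    "temperature": 0.7,
}
```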