Can We Translate Code Better with LLMs and Call Graph Analysis?

Authors: Yang Luo

Venue: IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple mainstream datasets demonstrate that, compared to existing code translation methods and LLMs, the method achieves a significant improvement in translation accuracy. Extensive experiments on multiple commonly used datasets confirm that TransGraph significantly improves the success rate of code translation over existing baseline methods, with an average increase of 15.7%.
Researcher Affiliation | Academia | Yang Luo (1) National Engineering Research Center for Software Engineering, Peking University, Beijing, China; (2) School of Software and Microelectronics, Peking University, Beijing, China; (3) PKU-OCTA Laboratory for Blockchain and Privacy Computing, Peking University, Beijing, China
Pseudocode | Yes | The paper provides Algorithm 1 (Code Translation) and Algorithm 2 (Recursive Binary Search Debugging).
Open Source Code | No | The paper contains no explicit statement about releasing code and no links to code repositories for the described methodology.
Open Datasets | Yes | The datasets used in the paper are CodeNet [Puri et al., 2021], Avatar [Ahmad et al., 2021b], EvalPlus [Liu et al., 2024], Apache Commons CLI [apa, 2023] (Java), Click [pyt, 2023] (Python), and HumanEval-X [Zheng et al., 2023].
Dataset Splits | No | The paper does not specify dataset splits (e.g., percentages or sample counts for training, validation, or test sets).
Hardware Specification | No | The paper states that the current TransGraph implementation runs primarily on a single-machine multi-core CPU, and that with sufficient computing power, GPUs or multi-machine distributed environments could significantly improve the scale and efficiency of parallel translation. This describes a general setup and future possibilities rather than the hardware actually used; no specific CPU or GPU models or other detailed specifications are given.
Software Dependencies | No | The compilation environments include gcc, OpenJDK, and CPython; the debugging tools include gdb, jdb, and pdb. TransGraph integrates with VS Code via the Language Server Protocol to extract call graphs for these languages, and controls language-specific debuggers through the Debug Adapter Protocol. No version numbers are provided for any of these dependencies.
Experiment Setup | Yes | The LLM prompt used in the experiments is: "Translate the following code from {source_language} to {target_language}. The translated code should be correct, efficient, and idiomatic." followed by {source_code}. To prevent prompts from becoming too long, a maximum length threshold (e.g., 32k tokens) is set; when it is exceeded, low-priority context (such as comments) is truncated until the limit is met. Thirty experiments were conducted for each translation.
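The paper names Algorithm 2 "Recursive Binary Search Debugging" but its body is not reproduced in this report. A minimal sketch of the bisection idea, under the assumption that translated units can be tested in isolation via a hypothetical `passes` oracle (not from the paper):

```python
def bisect_fault(units, passes):
    """Recursively narrow down the faulty unit(s) among `units`.

    `passes(subset)` is a hypothetical oracle returning True when the
    program assembled from `subset` (with remaining units stubbed by
    reference implementations) behaves correctly. Returns the smallest
    subset identified as faulty.
    """
    if len(units) == 1:
        return units
    mid = len(units) // 2
    left, right = units[:mid], units[mid:]
    if not passes(left):
        return bisect_fault(left, passes)
    if not passes(right):
        return bisect_fault(right, passes)
    # The failure only manifests when both halves interact; report all.
    return units
```

With one faulty unit among n, this localizes it in O(log n) oracle calls instead of testing each unit individually.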
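TransGraph reportedly extracts call graphs through the Language Server Protocol across several languages; that machinery is not shown here. As an illustration only (not the paper's implementation), the same caller-to-callee mapping can be built for a single Python file with the stdlib `ast` module:

```python
import ast

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the set of names it calls directly.

    Illustrative single-file sketch; a real multi-language pipeline
    (as the paper describes via LSP) must also resolve imports,
    methods, and cross-file references.
    """
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                # Only simple `name(...)` calls; attribute calls skipped.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            graph[node.name] = callees
    return graph
```

Such a graph lets a translator order functions so that callees are translated before their callers, giving the LLM already-translated dependencies as context.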
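The prompt-budget rule above (truncate low-priority context such as comments once a threshold like 32k tokens is exceeded) can be sketched as follows. The `contexts` priority list and the whitespace-split token proxy are assumptions; the paper specifies neither the tokenizer nor the context ordering:

```python
def build_prompt(source_language, target_language, source_code,
                 contexts, max_tokens=32_000):
    """Assemble the translation prompt under a token budget.

    `contexts` is a hypothetical list of (priority, text) pairs; the
    paper only says that low-priority context (e.g., comments) is
    truncated first. Token counting is a crude whitespace-split proxy.
    """
    header = (f"Translate the following code from {source_language} to "
              f"{target_language}. The translated code should be correct, "
              f"efficient, and idiomatic.")

    def n_tokens(parts):
        return sum(len(p.split()) for p in parts)

    kept = sorted(contexts, key=lambda c: c[0], reverse=True)  # high first
    # Drop the lowest-priority context until the prompt fits the budget.
    while kept and n_tokens([header, source_code] + [t for _, t in kept]) > max_tokens:
        kept.pop()
    return "\n\n".join([header, source_code] + [t for _, t in kept])
```

The source code itself is never truncated here; only auxiliary context is sacrificed, which matches the paper's stated policy of dropping comments first.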