Coda: An End-to-End Neural Program Decompiler

Authors: Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess Coda's performance with extensive experiments on various benchmarks. Evaluation results show that Coda achieves an average of 82% program recovery accuracy on unseen binary samples, where the state-of-the-art decompilers yield 0% accuracy. Furthermore, Coda outperforms the sequence-to-sequence model with attention by a margin of 70% program accuracy.
Researcher Affiliation | Collaboration | Cheng Fu, Huili Chen, Haolan Liu (UC San Diego); Xinyun Chen (UC Berkeley); Yuandong Tian (Facebook); Farinaz Koushanfar, Jishen Zhao (UC San Diego)
Pseudocode | Yes | Algorithm 1: Workflow of iterative EC Machine. (A hedged sketch of this loop appears after the table.)
Open Source Code | No | The paper mentions using open-source disassemblers (mipt-mips, REDasm) but does not state that the code for Coda itself is open source, nor does it provide a link.
Open Datasets | No | To build the training dataset for stage 1, we randomly generate 50,000 pairs of high-level programs with the corresponding assembly code for each task. The training dataset for the error correction stage is constructed by injecting various types of errors into the high-level code. The paper generated its own dataset and does not provide public access information.
Dataset Splits | No | The paper does not provide explicit details about a validation dataset split (e.g., percentages or counts).
Hardware Specification | No | The paper mentions "limited GPU memory" as a challenge for long programs but does not specify any particular GPU model, CPU, or other hardware used for the experiments.
Software Dependencies | No | The paper mentions using `clang` for compilation and `mipt-mips` and `REDasm` for disassembling, but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | We set Smax = 30 and cmax = 10 for the EC machine in Algorithm 1. In our experiments, we inject 10–20% token errors whose locations are sampled from a uniform random distribution. To address the class imbalance problem during EP training, we mask 35% of the tokens with error status 0 (i.e., no error occurs) when computing the loss. The program is compiled using clang with configuration -O0, which disables all optimizations. (The error-injection and loss-masking scheme is sketched below.)
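
Below is a minimal Python sketch of the iterative error-correction loop that Algorithm 1 describes, assuming Smax bounds the number of EC iterations and cmax bounds the candidate edits tried per iteration. The helpers `compile_fn`, `error_predictor`, `propose_fixes`, and `asm_distance` are illustrative placeholders, not the authors' API.

```python
# Hedged sketch of Algorithm 1 (iterative EC machine). All helper
# names are hypothetical stand-ins for components the paper describes.

S_MAX = 30   # max EC iterations (paper: Smax = 30)
C_MAX = 10   # max candidate corrections per iteration (paper: cmax = 10)

def error_correct(program_tokens, target_asm, compile_fn,
                  error_predictor, propose_fixes, asm_distance):
    """Iteratively repair a decompiled token sequence until its
    recompiled assembly matches the input assembly."""
    best = list(program_tokens)
    best_dist = asm_distance(compile_fn(best), target_asm)
    for _ in range(S_MAX):
        if best_dist == 0:                  # exact recovery: stop early
            break
        # The error predictor (EP) scores each token with an error probability.
        error_probs = error_predictor(best, target_asm)
        # Greedily try up to C_MAX single-edit candidates, keeping improvements.
        for candidate in propose_fixes(best, error_probs, limit=C_MAX):
            dist = asm_distance(compile_fn(candidate), target_asm)
            if dist < best_dist:
                best, best_dist = candidate, dist
                break                       # rescore from the improved program
    return best, best_dist
```

The greedy accept-if-closer policy is one plausible reading of the workflow; the paper's actual acceptance criterion may differ.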
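Similarly, the error-injection and loss-masking setup quoted in the Experiment Setup row can be sketched as follows; the toy vocabulary and the uniform substitution policy are assumptions made only for illustration.

```python
import random

# Hypothetical token vocabulary for illustration only.
VOCAB = ["int", "float", "+", "-", "*", "(", ")", "x", "y", "0", "1"]

def inject_token_errors(tokens, lo=0.10, hi=0.20):
    """Corrupt 10-20% of token positions, sampled uniformly at random;
    return the corrupted sequence and a 0/1 error label per token."""
    n_errors = max(1, int(len(tokens) * random.uniform(lo, hi)))
    positions = random.sample(range(len(tokens)), n_errors)
    corrupted, labels = list(tokens), [0] * len(tokens)
    for i in positions:
        corrupted[i] = random.choice([t for t in VOCAB if t != tokens[i]])
        labels[i] = 1
    return corrupted, labels

def loss_mask(labels, drop_frac=0.35):
    """Per-token loss weights for EP training: keep every error token,
    randomly drop 35% of no-error tokens to offset class imbalance."""
    return [0.0 if (y == 0 and random.random() < drop_frac) else 1.0
            for y in labels]
```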