Leveraging Automated Unit Tests for Unsupervised Code Translation
Authors: Baptiste Rozière, Jie Zhang, François Charton, Mark Harman, Gabriel Synnaeve, Guillaume Lample
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS |
| Researcher Affiliation | Collaboration | Baptiste Rozière Facebook AI Research Paris-Dauphine University EMAIL Jie M. Zhang University College London EMAIL François Charton Facebook AI Research EMAIL Mark Harman Facebook EMAIL Gabriel Synnaeve Facebook AI Research EMAIL Guillaume Lample Facebook AI Research EMAIL |
| Pseudocode | No | The paper describes its methods in narrative text and with diagrams but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We submit our code with this submission, along with a README file detailing clear steps to reproduce our results, including a script to set up a suitable environment. We will open-source our code and release our trained models. |
| Open Datasets | Yes | Datasets. As TransCoder and DOBF, we use the GitHub public dataset available on Google BigQuery, filtered to keep only projects with open-source licenses. |
| Dataset Splits | Yes | We evaluate our models on the full validation and test sets of TransCoder. |
| Hardware Specification | Yes | Our models were trained using standard hardware (Tesla V100 GPUs) and libraries (e.g. PyTorch, CUDA) for machine-learning research. |
| Software Dependencies | No | The paper mentions 'PyTorch, CUDA' as libraries used but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For the online version, we set a cache warm-up parameter to ensure that we always generate new parallel examples if there are fewer than 500 examples in the cache for any language pair. Otherwise, we sample from the cache with probability 0.5, or create new parallel functions to add to the cache. Sampled elements are removed from the cache with probability 0.3, so that each element we create is trained on about 4 times on average before being removed from the cache. We initialize the cache with parallel examples created offline. During beam decoding, we compute the score of generated sequences by dividing the sum of token log-probabilities by l^α, where l is the sequence length. We found that taking α = 0.5 (and penalizing long generations) leads to the best performance on the validation set. |
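The experiment-setup row describes two concrete mechanisms: a self-training cache (warm-up threshold of 500, sample-from-cache probability 0.5, removal probability 0.3) and a length-penalized beam score (sum of token log-probabilities divided by l^α with α = 0.5). The sketch below is a minimal illustration of that description, not the authors' released code; the function names, cache representation, and `generate_pair` callback are assumptions.

```python
import random

# Constants taken from the setup described above.
CACHE_WARMUP = 500         # always generate new pairs while the cache holds fewer than this
P_SAMPLE_FROM_CACHE = 0.5  # otherwise, sample from the cache with this probability
P_REMOVE = 0.3             # a sampled element is dropped with this probability

def next_training_example(cache, generate_pair, rng=random):
    """Return one parallel example, maintaining the cache as described.

    `cache` is a mutable list of parallel examples; `generate_pair` is a
    hypothetical callback that creates a new test-validated parallel pair.
    """
    if len(cache) < CACHE_WARMUP or rng.random() >= P_SAMPLE_FROM_CACHE:
        pair = generate_pair()           # create a new parallel example
        cache.append(pair)               # and add it to the cache
        return pair
    idx = rng.randrange(len(cache))      # sample an existing cached example
    pair = cache[idx]
    if rng.random() < P_REMOVE:          # remove it with probability 0.3, so each
        cache.pop(idx)                   # element is trained on ~4 times on average
    return pair

def beam_score(token_logprobs, alpha=0.5):
    """Length-penalized beam score: sum of log-probs divided by l**alpha."""
    l = len(token_logprobs)
    return sum(token_logprobs) / (l ** alpha)
```

With α = 0.5 the divisor grows sub-linearly in the sequence length, so long generations are penalized less aggressively than plain per-token averaging (α = 1) but more than unnormalized scoring (α = 0).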