DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
Authors: Yi Liu, Changran Xu, Yunhao Zhou, Zeju Li, Qiang Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also introduce the first benchmark for Verilog understanding and take the initiative to apply embedding similarity and GPT Score to evaluate the models' understanding capabilities. These metrics capture semantic similarity more accurately than traditional methods like BLEU and ROUGE, which are limited to surface-level n-gram overlaps. By adapting curriculum learning to train DeepRTL, we enable it to significantly outperform GPT-4 in Verilog understanding tasks, while achieving performance on par with OpenAI's o1-preview model in Verilog generation tasks. Section 5 presents 'EXPERIMENTAL RESULTS' including detailed tables and analysis for 'VERILOG UNDERSTANDING' and 'VERILOG GENERATION'. |
| Researcher Affiliation | Academia | Yi Liu (1,2), Changran Xu (1,2), Yunhao Zhou (1,2), Zeju Li (1,2), Qiang Xu (1,2); 1: The Chinese University of Hong Kong; 2: National Technology Innovation Center for EDA |
| Pseudocode | No | The paper describes methods and processes (e.g., data annotation in Figure 1, curriculum learning in Section 4.3) but does not present any structured pseudocode or algorithm blocks with formal steps. |
| Open Source Code | Yes | Our code and datasets are available at https://github.com/PeterLau61/DeepRTL. |
| Open Datasets | Yes | Our code and datasets are available at https://github.com/PeterLau61/DeepRTL. |
| Dataset Splits | Yes | As the first work to consider the task of Verilog understanding, we introduce a pioneering benchmark to evaluate LLMs' capabilities in interpreting Verilog code. This benchmark consists of 100 high-quality Verilog modules... Note that we exclude the cases in the benchmarks from our training dataset. |
| Hardware Specification | Yes | We utilize the distributed framework, DeepSpeed, to efficiently fine-tune the model across a cluster equipped with eight NVIDIA A800 GPUs, each with 80GB of memory. |
| Software Dependencies | No | In our work, we have chosen to fine-tune CodeT5+ (Wang et al., 2023a)... We primarily follow the instruction tuning script of CodeT5+ in the fine-tuning process, with a modification to expand the input context length to the maximum of 2048 tokens. We utilize the distributed framework, DeepSpeed... Although CodeT5+ and DeepSpeed are mentioned, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | During inference, we adjust the temperature to 0.8 for understanding tasks and to 0.5 for generation tasks, while other hyperparameters remain at their default settings to ensure optimal performance. |
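The embedding-similarity metric mentioned in the Research Type row compares a model's Verilog summary against a reference by the cosine of their embedding vectors. A minimal sketch of the idea follows; the `embed` function here is a toy bag-of-words stand-in, not the sentence-embedding model the paper uses.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding": a hypothetical stand-in for a real
    # sentence-embedding model (the paper's actual embedder is not this).
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

reference = "a 4-bit counter that increments on each rising clock edge"
candidate = "counter module that increments a 4-bit register on the rising clock edge"
score = cosine_similarity(embed(reference), embed(candidate))
```

Unlike BLEU or ROUGE, which count exact n-gram overlaps, this score is computed in a vector space, so with a real semantic embedder paraphrases of the same behavior land close together.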
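The temperatures quoted in the Experiment Setup row (0.8 for understanding, 0.5 for generation) control how sharply the model's next-token distribution is peaked. Assuming standard temperature sampling, a minimal sketch:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before normalizing: a lower
    # temperature sharpens the distribution, a higher one flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative next-token logits
p_gen = softmax_with_temperature(logits, 0.5)  # sharper, for generation
p_und = softmax_with_temperature(logits, 0.8)  # softer, for understanding
```

The lower generation temperature concentrates probability on the top token, which suits syntax-sensitive Verilog output; the higher understanding temperature allows more varied natural-language phrasing.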