DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
Authors: Yi Liu, Changran Xu, Yunhao Zhou, Zeju Li, Qiang Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also introduce the first benchmark for Verilog understanding and take the initiative to apply embedding similarity and GPT Score to evaluate the models' understanding capabilities. These metrics capture semantic similarity more accurately than traditional methods like BLEU and ROUGE, which are limited to surface-level n-gram overlaps. By adapting curriculum learning to train DeepRTL, we enable it to significantly outperform GPT-4 in Verilog understanding tasks, while achieving performance on par with OpenAI's o1-preview model in Verilog generation tasks. Section 5 presents 'EXPERIMENTAL RESULTS' including detailed tables and analysis for 'VERILOG UNDERSTANDING' and 'VERILOG GENERATION'. |
| Researcher Affiliation | Academia | Yi Liu (1,2), Changran Xu (1,2), Yunhao Zhou (1,2), Zeju Li (1,2), Qiang Xu (1,2); 1: The Chinese University of Hong Kong; 2: National Technology Innovation Center for EDA |
| Pseudocode | No | The paper describes methods and processes (e.g., data annotation in Figure 1, curriculum learning in Section 4.3) but does not present any structured pseudocode or algorithm blocks with formal steps. |
| Open Source Code | Yes | Our code and datasets are available at https://github.com/PeterLau61/DeepRTL. |
| Open Datasets | Yes | Our code and datasets are available at https://github.com/PeterLau61/DeepRTL. |
| Dataset Splits | Yes | As the first work to consider the task of Verilog understanding, we introduce a pioneering benchmark to evaluate LLMs' capabilities in interpreting Verilog code. This benchmark consists of 100 high-quality Verilog modules... Note that we exclude the cases in the benchmarks from our training dataset. |
| Hardware Specification | Yes | We utilize the distributed framework, DeepSpeed, to efficiently fine-tune the model across a cluster equipped with eight NVIDIA A800 GPUs, each with 80GB of memory. |
| Software Dependencies | No | In our work, we have chosen to fine-tune CodeT5+ (Wang et al., 2023a)... We primarily follow the instruction tuning script of CodeT5+ in the fine-tuning process, with a modification to expand the input context length to the maximum of 2048 tokens. We utilize the distributed framework, DeepSpeed... Although CodeT5+ and DeepSpeed are mentioned, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | During inference, we adjust the temperature to 0.8 for understanding tasks and to 0.5 for generation tasks, while other hyperparameters remain at their default settings to ensure optimal performance. |
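The embedding-similarity metric mentioned in the Research Type row compares a model's Verilog summary against a reference by the cosine of their embedding vectors. A minimal sketch of the idea follows; the `embed` function here is a toy bag-of-words stand-in, not the sentence-embedding model the paper uses.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding": a hypothetical stand-in for a real
    # sentence-embedding model (the paper's actual embedder is not this).
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

reference = "a 4-bit counter that increments on each rising clock edge"
candidate = "counter module that increments a 4-bit register on the rising clock edge"
score = cosine_similarity(embed(reference), embed(candidate))
```

Unlike BLEU or ROUGE, which count exact n-gram overlaps, this score is computed in a vector space, so with a real semantic embedder paraphrases of the same behavior land close together.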
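The temperatures quoted in the Experiment Setup row (0.8 for understanding, 0.5 for generation) control how sharply the model's next-token distribution is peaked. Assuming standard temperature sampling, a minimal sketch:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before normalizing: a lower
    # temperature sharpens the distribution, a higher one flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative next-token logits
p_gen = softmax_with_temperature(logits, 0.5)  # sharper, for generation
p_und = softmax_with_temperature(logits, 0.8)  # softer, for understanding
```

The lower generation temperature concentrates probability on the top token, which suits syntax-sensitive Verilog output; the higher understanding temperature allows more varied natural-language phrasing.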