OpenRTLSet: A Fully Open-Source Dataset for Large Language Model-based Verilog Module Design
Quick Answer
OpenRTLSet is the largest open-source dataset for hardware design, featuring over 131,000 Verilog code samples.
Quick Take
By pairing more than 131,000 Verilog samples with natural language descriptions, OpenRTLSet enables fine-tuning of language models such as Qwen and Granite for Verilog code generation. The authors report that open-source approaches can achieve superior performance in hardware design tasks.
Key Points
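- Dataset includes 102k Verilog modules from GitHub repositories and 29k translated modules (5k from VHDL, 24k from synthesizable C/C++).
- Each code sample is paired with a natural language description generated by the DeepSeek-R1 reasoning model.
- Compares INT4 quantization with BF16 precision and examines model sizes from 7B to 32B parameters.
- Supports fine-tuning of several language model families (e.g., Qwen and Granite).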
- Establishes a foundation for accessible research and commercial use.
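The description-to-code pairing listed above could be turned into a supervised fine-tuning record roughly as follows. This is a minimal sketch under stated assumptions: the field names (`prompt`, `completion`), the prompt wording, and the sample multiplexer are all illustrative and are not taken from the dataset's actual schema.

```python
import json


def make_record(description: str, verilog: str) -> dict:
    """Build one prompt/completion pair (hypothetical schema, not the dataset's)."""
    prompt = (
        "Write a Verilog module that matches this description.\n\n"
        f"Description: {description.strip()}\n"
    )
    return {"prompt": prompt, "completion": verilog.strip() + "\n"}


# Hypothetical sample written for illustration; it is not from OpenRTLSet.
description = "A 2-to-1 multiplexer with 1-bit inputs a and b, select sel, and output y."
verilog = """
module mux2 (input wire a, input wire b, input wire sel, output wire y);
  assign y = sel ? b : a;
endmodule
"""

record = make_record(description, verilog)
print(json.dumps(record))  # one row of a JSON Lines training file
```

Writing one such row per sample produces a JSON Lines file, a layout many fine-tuning tools accept. The real dataset may store its pairs differently.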
Article Excerpt
arXiv:2606.10285v1. Abstract: OpenRTLSet introduces the largest fully open-source dataset for hardware design, offering over 131,000 diverse Verilog code samples to the research community and industry. Our dataset uniquely combines Verilog code from GitHub repositories (102k modules), VHDL translations (5k modules), and synthesizable C/C++ translations (24k modules), all freely accessible without proprietary restrictions.
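As a quick consistency check, the three source counts quoted in the abstract can be tallied. The figures below are the rounded values from the text (102k, 5k, 24k), so the total and percentages are approximate. The variable and function names are illustrative.

```python
# Rounded module counts as quoted in the abstract (approximate, not exact).
sources = {
    "GitHub Verilog": 102_000,
    "VHDL translations": 5_000,
    "C/C++ translations": 24_000,
}

total = sum(sources.values())


def share(count: int) -> float:
    """Percentage of the rounded total, to one decimal place."""
    return round(100 * count / total, 1)


for name, count in sources.items():
    print(f"{name}: {count:,} modules ({share(count)}%)")
print(f"Total: {total:,} modules")
```

The rounded counts sum to 131,000, which matches the "over 131,000" headline figure. The two translated sources together account for the 29k modules cited in the key points.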
Using the reasoning model DeepSeek-R1, we generated paired natural language descriptions for each code sample, enabling fine-tuning of various language model families (e.g., Qwen and Granite) for Verilog code generation. Our dataset explores multiple options, including Verilator-generated C++ files as additional context during labeling, quantization techniques (INT4 vs. BF16), and performance differences across model sizes (7B-32B parameters).
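To give a sense of what the INT4 vs. BF16 comparison means in storage terms, the sketch below estimates weight memory at the two ends of the 7B-32B range. It relies only on the bit widths of the two formats (16 bits and 4 bits per weight). It ignores quantization scales, activations, and the KV cache, and it assumes nothing about the specific quantization scheme the authors used, which the abstract does not describe.

```python
# Bits per stored weight: BF16 uses 16 bits, INT4 uses 4 bits.
BITS_PER_WEIGHT = {"BF16": 16, "INT4": 4}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    total_bytes = params_billion * 1e9 * BITS_PER_WEIGHT[fmt] / 8
    return total_bytes / 1e9


for size in (7, 32):  # the two ends of the 7B-32B range
    for fmt in ("BF16", "INT4"):
        print(f"{size}B {fmt}: ~{weight_gb(size, fmt):.1f} GB")
```

Under these assumptions, INT4 weights take a quarter of the BF16 footprint at any model size. Real INT4 checkpoints are somewhat larger because they also store per-group scaling metadata.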
OpenRTLSet demonstrates that open-source approaches can achieve superior performance in hardware design tasks, establishing a new foundation for accessible research and commercial use in this domain.