RTLMarker: Protecting LLM-Generated RTL Copyright via a Hardware Watermarking Framework

Kun Wang, Kaiyan Chang, Mengdi Wang, Xingqi Zou, Haobo Xu, Yinhe Han, Ying Wang*

Outline
- Introduction
- Background
- RTLMarker
- Evaluation
- Conclusion

Introduction

Large Language Model
- Applications: translation, code generation, chat, summary, search, reasoning.

LLM for hardware design
- Electronic Design Automation (EDA) flow: specification -> expert-written high-level code -> RTL code -> logic synthesis -> netlist.
- LLM-based flow: natural-language specification -> large language model -> RTL code -> logic synthesis -> netlist.

Risks of LLMs
- Fake news
- Malicious/vulnerable code
- Sensitive content
- Private data leaks
- Fraud
- Security vulnerability

We need to embed watermarks into RTL code generated by LLMs!

Background

LLM Watermark
- Text watermark: WLLM [1]
- Code watermark: SWEET [2]

[1] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A watermark for large language models. In International Conference on Machine Learning. PMLR, 17061-17084.
[2] Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. 2023. Who wrote this code? Watermarking for code generation. arXiv preprint arXiv:2305.15060 (2023).

LLM Watermark
A watermark has to trade off three properties:
- Effectiveness: the watermark should be effectively embedded and detectable.
- Transparency: the watermark should preserve the code quality and remain inconspicuous.
- Robustness: the watermark should be resilient to common attack methods, such as string replacement attacks.

LLM Watermark
Embedding watermarks into LLM-generated RTL code presents the following challenges:
- Existing methods cannot guarantee the correctness of the watermarked RTL code.
- There is a trade-off between the transparency and the effectiveness of watermarks.
- Watermarks at the Register Transfer Level (RTL) are difficult to carry over into the synthesized netlist.

RTLMarker

RTLMarker: Rule-Based Verilog Code Transformations
- Transformations are split into token-level and statement-level transformations.
- 15 code transformations are implemented with Pyverilog.

RTLMarker Framework

Watermark Embedding: Learning-Based Watermark Embedding
- The embedding network outputs the selected transformation set based on the LLM-generated code.
- An AST-based approach is used to apply the corresponding transformations, generating the watermarked code.

Transformation set:
- Token-level
  - R1: State Variables Encoding
  - R2: Parameterized Module
- Statement-level
  - R8: State Transition Path
  - R9: Combinational Logic Operation
  - ...

Embedding network (figure): the LLM-generated code (x) passes through Encoder layers followed by MLP layers, which output the selected transformation set (T). AST-based transformations (Pyverilog) then turn x into the watermarked code (xw).

LLM-generated code (x):

    parameter S_IDLE=2'b00;
    parameter S_1=2'b01;
    always @(posedge CLK or posedge RST) begin
      if(!rst_nc) begin
        count=2'b0;
        acc_data=10'b0;

Selected transformation set (T): R1, R5, R6, R12

Watermarked code (xw):

    parameter [3:0] S_IDLE=4'b0001;
    parameter [3:0] S_1=4'b0010;
    always @(posedge clk_nc, posedge rst_nc) begin
      if(!rst_nc) begin
        acc_data=10'b00000_00000;
        count=2'b0;

Watermark Embedding: Embedding the Watermark into the Netlist
- Flow: RTL code -> logic synthesis (Yosys) -> netlist.
- High-level semantic information is lost during synthesis.

Feature Representation
- Input: LLM-generated code (x) and selected transformation set (T).
- Output: transformed code (xa).

Feature representation network (figure): Encoder layers encode the information in x and T, and Decoder layers produce the transformed code (xa).

LLM-generated code (x):

    parameter S_IDLE=2'b00;
    parameter S_1=2'b01;
    always @(posedge CLK or posedge RST) begin
      if(!rst_nc) begin
        count=2'b0;
        acc_data=10'b0;

Selected transformation set (T): R1, R5, R6, R12

Transformed code (xa):

    parameter [3:0] S_IDLE=4'b0001;
    parameter [3:0] S_1=4'b0010;
    always @(posedge clk_nc, posedge rst_nc) begin
      if(!rst_nc) begin
        acdc_dada=10'b00000_00000;
        count=2'b0;

Watermark Detection
- Watermark detection at the Register Transfer Level (RTL).
- Watermark detection at the netlist level:
  - Use the synthesis tool Yosys to synthesize the code into a netlist.
  - Parse the embedded watermark out of the netlist.

Detection network (figure): the transformed code (xa) passes through Encoder layers followed by MLP layers, which output a watermark confidence P (e.g., 0.79).
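The change from `S_IDLE=2'b00` to `S_IDLE=4'b0001` in the embedding example is consistent with a state-encoding transformation such as R1 that re-encodes binary state constants as one-hot. The following is only a minimal sketch of that idea: it uses a regular expression instead of the Pyverilog AST that the slides describe, the function name `to_one_hot` is hypothetical, and it assumes that state parameters are named with an `S_` prefix and that the caller supplies the target width.

```python
import re

# Assumption: state constants are declared as `parameter S_<name>=<w>'b<bits>;`,
# optionally with a [msb:lsb] range. A real tool would walk the AST instead.
STATE_PARAM_RE = re.compile(
    r"parameter\s+(?:\[\d+:\d+\]\s*)?(S_\w+)\s*=\s*\d+'b[01_]+\s*;"
)

def to_one_hot(verilog: str, width: int) -> str:
    """Re-encode each matched state parameter as a one-hot constant.

    States are numbered in declaration order, so the first state becomes
    bit 0, the second bit 1, and so on.
    """
    names = STATE_PARAM_RE.findall(verilog)
    if len(set(names)) > width:
        raise ValueError("one-hot encoding needs at least one bit per state")
    index = {}
    for name in names:
        index.setdefault(name, len(index))

    def rewrite(match: re.Match) -> str:
        name = match.group(1)
        bits = format(1 << index[name], f"0{width}b")
        return f"parameter [{width - 1}:0] {name}={width}'b{bits};"

    return STATE_PARAM_RE.sub(rewrite, verilog)

if __name__ == "__main__":
    source = "parameter S_IDLE=2'b00;\nparameter S_1=2'b01;\n"
    print(to_one_hot(source, width=4))
```

A regex rewrite like this cannot check that every use of the state register is widened consistently, which is one reason an AST-based approach is the safer choice for preserving correctness.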
Evaluation

Experiment Setup
- Benchmarks
  - RTLLM: 30 Verilog problems with varying levels of complexity.
  - VerilogEval: 156 Verilog problems sourced from the HDLBits website.
- Target models: RTLCoder, GPT-4, ChipGPT-FT
- Baselines: WLLM and SWEET
- Metrics
  - ACC = (TP + TN) / (TP + TN + FP + FN)
  - TPR = TP / (TP + FN)
  - FPR = FP / (FP + TN)
  - TP: true positives; TN: true negatives; FP: false positives; FN: false negatives

Evaluation: Effectiveness
- RTLMarker achieves an accuracy of over 95% on the RTLLM benchmark, while SWEET and WLLM achieve only 71.67% and 83.33%, respectively.
- RTLMarker achieves an accuracy of over 92% on the VerilogEval benchmark, while SWEET and WLLM achieve only 58.33% and 63.14%, respectively.
- The accuracy of watermark embedding and detection is higher on the RTLLM benchmark than on the VerilogEval benchmark.

Evaluation: Robustness
- Variable name replacement attack: we renamed 25%, 50%, 75%, and 100% of the variables in the watermarked code.
- RTLMarker is only slightly affected by variable name replacement attacks.

Evaluation: Transparency
- We use the number of code transformations to measure the transparency of the watermark.
- The average number of applicable code transformations in the RTLLM benchmark is 6.42, while RTLMarker uses 4.25 on average, which improves the transparency of the watermark.

Conclusion
- To our knowledge, this is the first work to introduce a practical and efficient watermarking framework for protecting the copyright of RTL generated by large language models.
- We propose a suite of Verilog-specific code transformations and build a Pyverilog-based tool to apply them.
- We introduce a framework for embedding and detecting hardware watermarks that works at both the Register Transfer Level (RTL) and the logic netlist level.

Thanks
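The ACC, TPR, and FPR metrics from the experiment setup can be computed from detector outputs as in the sketch below. It assumes the detector returns a confidence per code sample and that a sample counts as watermarked when the confidence reaches a threshold. The 0.5 threshold, the function names, and the sample values are illustrative assumptions, not settings or results from the slides.

```python
def confusion_counts(labels, confidences, threshold=0.5):
    """Count TP, TN, FP, FN for binary watermark detection.

    labels: True if the sample really carries a watermark.
    confidences: detector confidence P for each sample.
    threshold: hypothetical decision threshold (not from the slides).
    """
    tp = tn = fp = fn = 0
    for is_watermarked, p in zip(labels, confidences):
        predicted = p >= threshold
        if predicted and is_watermarked:
            tp += 1
        elif predicted and not is_watermarked:
            fp += 1
        elif not predicted and is_watermarked:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def detection_metrics(tp, tn, fp, fn):
    """ACC, TPR, and FPR as defined in the experiment setup."""
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
    }

if __name__ == "__main__":
    # Made-up example data: three watermarked and three clean samples.
    labels = [True, True, True, False, False, False]
    confidences = [0.79, 0.90, 0.30, 0.20, 0.60, 0.10]
    print(detection_metrics(*confusion_counts(labels, confidences)))
```

Raising the threshold lowers FPR at the cost of TPR, so reporting both alongside ACC shows whether a detector's accuracy comes from catching watermarks or from rarely flagging clean code.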