LLSM: LLM-enhanced Logic Synthesis Model with EDA-guided CoT Prompting, Hybrid Embedding and AIG-tailored Acceleration
Shan Huang*, Jinhao Li*, Zhen Yu, Jiancai Ye, Jiaming Xu, Ningyi Xu, Guohao Dai (*equal contribution)
Shanghai Jiao Tong University; correspondence to Guohao Dai. ASP-DAC 2025.

Outline (Page 2)
- Backgrounds and Motivations
- Related Works
- Challenges and Techniques Overview: EDA-guided CoT Prompting, Text-Circuit Hybrid Embedding, EDA-Tailored Acceleration
- Experiment Results
- Extension Works

Electronic Design Automation (EDA) (Page 3)
EDA refers to the use of EDA software tools to complete the functional design, synthesis, verification, and physical design of VLSI chips. Key objective: optimize the Power, Performance, and Area (PPA) of the chip.
Flow: Spec/Architecture Design (RTL code written by engineers) -> Logic Design (logic optimization and mapping to a netlist) -> Physical Design (placement and routing) -> Sign-off (verify functionality and manufacturability) -> Tapeout.

Importance of Logic Synthesis (Page 4)
Logic synthesis is time-consuming (50%) and has a high capital cost (55%) in the EDA process [1].
(time proportion, cost proportion) per stage:
- Qualification of IP: (26%, 45%)
- Logic Synthesis (RTL -> netlist): (50%, 55%)
- Physical Design (netlist -> tapeout): (21%, 56%)
[1] https:/
Logic Synthesis (Page 5)
Logic synthesis is iterative in chip design; predicting synthesis results can reduce the iteration overhead.
Traditional logic synthesis flow (RTL code in, PPA result out; slow iteration):
1. Translation: fast; includes syntax parsing, design checking, etc. (15%)
2. Logic optimization: slow; extensive heuristic processes that reduce circuit depth (50%)
3. Process mapping: slow; further optimization after importing process library files (35%)
AI-assisted logic synthesis flow: an AI model (Graph Neural Network (GNN) or Transformer) predicts the PPA result directly from the RTL code, so each iteration is fast.

Outline (Page 6): Related Works
GNN-based Methods for Logic Synthesis (Page 7)
GNNs model circuits as graphs (e.g., a directed acyclic graph with primary inputs PIa..PId, AND nodes, and a primary output PO) and extract graph-level features for predicting PPA, but they face two inherent problems:
- Over-smoothing [2]: as GNN layers are stacked, node features become increasingly similar and accuracy drops.
- Over-squashing [1]: long-distance node pairs are only weakly connected, so information is lost along the path.
[1] Akansha S. Over-squashing in graph neural networks: A comprehensive survey. arXiv:2308.15568, 2023.
[2] Rusch T K, Bronstein M M, Mishra S. A survey on oversmoothing in graph neural networks. arXiv:2303.10993, 2023.
Transformer-based Methods for Logic Synthesis (Page 8)
Transformers flatten the circuit into a sequence (PIa, PIb, PIc, PId, AND, AND, AND, PO) and compute attention between every node pair [1], but this sequential modeling faces scalability problems and cannot be applied to large graphs:
- The attention matrix has O(N^2) compute and storage complexity (a rough sizing sketch follows below).
- Circuits are large: N = 2.08x10^12 for an NVIDIA B200 [2] and N = 2.8x10^11 for an Apple M4 [3]; even the datasets used in academia contain circuits with N ~ 10^5.
[1] Xu, Ceyu, Chris Kjellqvist, and Lisa Wu Wills. SNS's not a synthesizer: a deep-learning-based synthesis predictor. Proceedings of the 49th Annual International Symposium on Computer Architecture. 2022.
[2] https:/
[3] https:/
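To make the O(N^2) claim concrete, here is a back-of-the-envelope sketch; the fp32 and single-head/single-layer assumptions are ours, not the slides':

```python
# Rough memory for one dense attention matrix over a flattened circuit.
# Assumption (ours): fp32 scores, a single head in a single layer.
def attention_matrix_bytes(num_nodes: int, bytes_per_score: int = 4) -> int:
    return num_nodes * num_nodes * bytes_per_score

# An academic-scale circuit with N = 1e5 nodes already needs ~40 GB
# for one attention matrix, before any activations or weights.
print(attention_matrix_bytes(100_000) / 1e9)  # -> 40.0 (GB)
```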
Outline (Page 9): Challenges and Techniques Overview

Overview (Page 10)
LLSM extends the previous GNN-only flow (RTL code -> translation -> logic optimization -> process mapping -> downstream GNN) with three techniques:
- Technique 1, EDA-guided CoT prompting: an LLM converts the RTL code into a circuit summary plus a logic synthesis CoT.
- Technique 2, text-circuit hybrid embedding: an LM encoder turns the circuit summary into a text embedding, which is fused with the graph representation before the downstream GNN predicts the PPA results.
- Technique 3, EDA-tailored acceleration: an AIG-tailored SpMM kernel and a state cache accelerate inference.
Technique 1: EDA-guided CoT Prompting (Page 11)
Challenge: LLMs lack the knowledge to analyze RTL code, and it is expensive to train or fine-tune them.
- Lack of RTL code data [1].
- High training cost [2]: domain-adaptive pretraining turns foundation models (LLaMA2 7B/13B/70B) into EDA-domain foundation models at a cost of thousands of GPU hours, and model alignment then turns these into EDA-domain chat models at a cost of hundreds more.
[1] Chang, Kaiyan, et al. Data is all you need: Finetuning LLMs for chip design via an automated design-data augmentation framework. Proceedings of the 61st ACM/IEEE Design Automation Conference. 2024.
[2] Liu, Mingjie, et al. ChipNeMo: Domain-adapted LLMs for chip design. arXiv:2311.00176 (2023).
Technique 1: EDA-guided CoT Prompting (Page 12)
Approach: a training-free CoT method guides the LLM to summarize the size and gate-level information of the RTL code.
- Naive method (role, prompt, example): the output describes only the I/O and the function of each module, without gate-level information, e.g. "Inputs: inData, clk. Outputs: outData. Function of the circuit: the circuit implements ..."
- EDA-guided chain-of-thought method (role, prompt, example, plus a logic synthesis CoT): the LLM analyzes the RTL I/O, analyzes the modules and layers, estimates the gate count, and estimates the layer count, so that the information of the netlist after logic synthesis is deduced by logical analysis (a prompt sketch follows below). Output with CoT, now with gate-level information: "Scale: the overall structure involves a total of 10 delay stages due to ... Estimation: each multiplier is estimated to be composed of 32 AND gates and 31 OR gates, while each adder consists of 32 full adders; each full adder is estimated to be 3 gates. Multiplier gates: 32 AND + 31 OR = 63 gates each, 63 x 12 = 756 gates. Adder gates: 32 x 3 gates x 10 = 960 gates."
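As a concrete illustration, a minimal prompt-builder sketch; the wording, step list, and the build_prompt helper are our own illustration, not the paper's exact prompt:

```python
# Hypothetical prompt builder for the EDA-guided CoT described above.
EDA_COT_STEPS = (
    "1. List the inputs and outputs of the RTL module.\n"
    "2. Identify sub-modules and the number of pipeline/delay stages.\n"
    "3. Estimate the gate count of each sub-module (e.g., a 32-bit adder "
    "is ~32 full adders, ~3 gates each).\n"
    "4. Estimate the logic depth (number of levels) of the netlist.\n"
)

def build_prompt(rtl_code: str) -> str:
    """Compose role, CoT steps, and the RTL code into one query."""
    return (
        "You are a logic synthesis expert. Deduce the post-synthesis "
        "netlist statistics of the following RTL code step by step.\n"
        + EDA_COT_STEPS
        + "RTL code:\n" + rtl_code
    )
```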
Technique 2: Text-Circuit Hybrid Embedding (Page 13)
Challenge: a closed LLM makes it impossible to extract feature embeddings, and the circuit summary cannot be directly input into downstream models.
- The circuit summary (text modality) and the circuit graph (circuit modality: PIa..PId, AND nodes, PO) cannot both be fed to the downstream model at the same time.
- Embeddings cannot be obtained from closed-source LLMs.

Technique 2: Text-Circuit Hybrid Embedding (Page 14)
Approach: use a small Language Model (LM) as a text encoder to generate the text embedding.
- The closed LLM still produces the circuit summary from the RTL code; the LM encodes that summary, while the downstream GNN encodes the graph modality.
- The lightweight trainable model trains both the GNN and LM weights to improve prediction accuracy: a weighted sum fuses the graph embedding and the text embedding into a fused embedding, from which the predicted PPA is produced (a fusion sketch follows below).
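A minimal PyTorch sketch of the weighted-sum fusion; the sigmoid-parameterized scalar weight is our assumption, since the slides only specify a weighted sum:

```python
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    """Fuse the GNN graph embedding and LM text embedding by weighted sum."""
    def __init__(self):
        super().__init__()
        # Learnable mixing weight, squashed into (0, 1) by a sigmoid.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, graph_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.alpha)
        return a * graph_emb + (1 - a) * text_emb

fused = HybridFusion()(torch.randn(8, 256), torch.randn(8, 256))  # (batch, dim)
```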
Technique 3: EDA-Tailored Acceleration (Page 15)
Background: the bottleneck of the GNN is message propagation on edges (aggregating embeddings from source nodes), which can be abstracted as an SpMM operator: the sparse N x N adjacency matrix times the dense input node-feature matrix yields the output node features (an SpMM sketch follows below).
Common sparse formats for an adjacency matrix with non-zeros a..g:
- COO (not efficient): rowInd = [0 0 1 2 2 2 3], colInd = [1 2 0 1 2 3 2], value = [a b c d e f g]
- CSR (efficient): rowPtr = [0 2 3 6 7], colInd = [1 2 0 1 2 3 2], value = [a b c d e f g]
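To make the aggregation concrete, a minimal NumPy sketch of SpMM over the CSR example above (our illustration):

```python
import numpy as np

def spmm_csr(row_ptr, col_ind, val, x):
    """y[i] = sum over in-neighbors j of val * x[j] -- message aggregation."""
    y = np.zeros((len(row_ptr) - 1, x.shape[1]))
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col_ind[k]]
    return y

# The 4-node example from the slide, with a..g set to 1 (unweighted).
row_ptr = [0, 2, 3, 6, 7]
col_ind = [1, 2, 0, 1, 2, 3, 2]
val = np.ones(7)
y = spmm_csr(row_ptr, col_ind, val, np.random.rand(4, 8))
```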
Technique 3: EDA-Tailored Acceleration (Page 16)
Challenge: introducing the LM slows down inference, and circuit graphs are sparser than typical GNN workloads.
- Cost of the LM: the circuit summary is ~1000 tokens versus ~20 tokens for the synthesis flow, making the LM two orders of magnitude slower than the GNN.
- Cost of format conversion: the And-Inverter Graph (AIG) adjacency matrix is too sparse, and converting the edge index (COO) to CSR before calling cuSPARSE [1] dominates the runtime (in normalized time, the conversion costs roughly 10x the cuSPARSE computation, which is a single API call).
[1] NVIDIA sparse computing library, https:/

Technique 3: EDA-Tailored Acceleration (Page 17)
Insight: across iterations only the logic synthesis flow changes, and the AIG has structural features.
- Redundant computing of the circuit summary: the fixed ~1000-token summary is re-encoded every run even though only the variable ~20-token synthesis flow differs.
- AIG structural features: an AND node has 2 inputs, a NOT node has 1 input, and a primary input has 0 inputs; in-degree 2 is dominant (in-degree 0: 9.38%, in-degree 1: 7.80%, in-degree 2: 82.82%).
Technique 3: EDA-Tailored Acceleration (Page 18)
Approach: use ELLPACK [2] for efficient memory access, fuse conversion and computing on the GPU, and cache the LM state (an ELLPACK sketch follows below).
- ELLPACK with padding stores the AIG adjacency matrix as fixed-width value/index arrays, giving higher access efficiency and leveraging the AIG structural features (in-degree at most 2) to improve parallelism.
- Fuse format conversion and computation into one kernel, avoiding the separate COO-to-CSR pass.
- State cache: in an offline stage, the LM encodes the ~1000-token circuit summary once and stores the state/embedding in a cache; in the online stage the embedding is loaded and inference runs with only the ~20-token synthesis flow.
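A minimal NumPy sketch of a width-2 ELLPACK layout for an AIG and aggregation over it; the padding convention and helper names are our assumptions, and the real kernel runs fused on the GPU:

```python
import numpy as np

def aig_to_ellpack(num_nodes, edges, width=2):
    """Pack (dst, src) edges into fixed-width index rows.
    Width 2 suffices because AIG in-degree is at most 2 (AND=2, NOT=1, PI=0)."""
    index = np.zeros((num_nodes, width), dtype=np.int64)
    valid = np.zeros((num_nodes, width), dtype=bool)  # marks real (non-padding) slots
    fill = np.zeros(num_nodes, dtype=np.int64)
    for dst, src in edges:
        index[dst, fill[dst]] = src
        valid[dst, fill[dst]] = True
        fill[dst] += 1
    return index, valid

def ellpack_aggregate(index, valid, x):
    """Sum source-node features into each destination (regular, padded access)."""
    gathered = x[index] * valid[..., None]  # zero out padded slots
    return gathered.sum(axis=1)

# Tiny AIG: node 2 = AND(0, 1), node 3 = NOT(2)
index, valid = aig_to_ellpack(4, [(2, 0), (2, 1), (3, 2)])
out = ellpack_aggregate(index, valid, np.random.rand(4, 8))
```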
Outline (Page 19): Experiment Results

Experiment Setup (Page 20)
- GPU: A100; nvcc 11.8; PyTorch 2.0.1; PyG v2.5.3
- Dataset: OpenABC [1], 23 IPs (communication/bus protocol, controller, crypto, DSP, processor), 1500 logic synthesis flows
- Baselines: OpenABC, LOSTIN
- LM models: Mamba-130m, DeBERTa-base
- Training: 20 epochs; learning rate 0.1 for the LM, 0.01 for the GNN
[1] Chowdhury A B, Tan B, Karri R, et al. OpenABC-D: A large-scale dataset for machine learning guided integrated circuit synthesis. arXiv:2110.11292, 2021.

Evaluation Result (Page 21)
- Area prediction: 3.49% and 1.19% average MAPE reduction.
- Delay prediction: 5.76% and 6.80% average MAPE reduction.
(Bar chart: MAPE (%) reduction for area and delay, comparing OpenABC vs Ours+OpenABC and LOSTIN vs Ours+LOSTIN.)
Speedup Result (Page 22)
- The AIG-tailored SpMM kernel achieves an average 1.74x speedup over cuSPARSE.
- End-to-end, LLSM achieves an average 1.37x speedup over PyG.
(Per-benchmark chart: SpMM speedup vs cuSPARSE and end-to-end speedup vs PyG across the 23 OpenABC IPs, from ac97_ctrl to wb_dma.)

Outline (Page 23): Extension Works
Extension: AIG-based GAT Acceleration (Page 24)
Thread workload reallocation and skipping redundant computation (see the sketch below):
- The baseline Fused-GAT kernel [1] assigns a 32-thread warp per node, so with AIG in-degrees of 1 or 2 up to 93.75% of the threads are wasted and warp synchronization adds overhead.
- The AIG-GAT kernel reallocates threads so one warp aggregates multiple nodes at once (no thread waste, no sync overhead) and skips the redundant softmax for in-degree-1 nodes, whose single attention weight is always 1.
- Result: 1.54x average speedup and 46.8% memory usage reduction over PyG.
[1] Zhang, Hengrui, et al. Understanding GNN computational graph: A coordinated computation, IO, and memory perspective. Proceedings of Machine Learning and Systems 4 (2022): 467-484.
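A minimal NumPy sketch of the softmax-skipping idea; this is our illustration of the control flow, while the paper implements it inside a fused CUDA kernel:

```python
import numpy as np

def edge_softmax_aig(node_scores):
    """node_scores: one array of incoming attention logits per node
    (length 0, 1, or 2 for an AIG). The softmax of a single logit is
    exactly 1, so in-degree-1 nodes skip the exp/normalize work."""
    out = []
    for s in node_scores:
        if len(s) <= 1:
            out.append(np.ones_like(s))  # skip redundant softmax
        else:
            e = np.exp(s - s.max())      # numerically stable 2-way softmax
            out.append(e / e.sum())
    return out

# AIG example: PI (0 in-edges), NOT (1 in-edge), AND (2 in-edges)
weights = edge_softmax_aig([np.array([]), np.array([0.7]), np.array([0.2, 1.3])])
```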
(Page 25)
LLSM: LLM-enhanced Logic Synthesis Model with EDA-guided CoT Prompting, Hybrid Embedding and AIG-tailored Acceleration
Shan Huang, supervised by Prof. Guohao Dai