
LLM-Based Static Defect Checking (基于大模型的缺陷靜態檢查.pdf)

Uploaded by: 張** | ID: 175747 | 2024-09-09 | PDF | 41 pages | 5.08 MB


Yiling Lou (婁一翎), School of Computer Science and Technology, Fudan University

LLM-based Static Bug Detection (基于大模型的缺陷靜態檢測)

Static bug detection
- Statically analyzing the code of software to identify bugs/vulnerabilities
- Static analysis tools
- Data-driven and learning-based approaches: buggy/correct code instances, ML/DL models

Recent trend: LLM-based bug detection
Researchers are very interested in "how effectively do LLMs detect bugs?" It seems quite promising that LLMs can sometimes identify bugs, but their precision and recall are still not always satisfactory in practice.

Recent trend: advanced prompt strategies in LLM-based bug detection
- Advanced prompting strategies: crafted instructions, project information, CWE general knowledge, CoT reasoning, AST/CFG in the prompt
- Fine-tuning
Many studies have emerged that explore how different prompting strategies can help LLMs in bug detection [1][2][3].

[1] Zhang C, Liu H, Zeng J, et al. Prompt-enhanced software vulnerability detection using ChatGPT. ICSE 2024 Poster.
[2] Purba, Moumita Das, et al. Software vulnerability detection using large language models. ISSREW 2023.
[3] Fu, Michael, et al. ChatGPT for vulnerability detection, classification, and repair: How far are we? APSEC 2023.

It still remains unexplored: how do LLMs perform compared to traditional techniques (i.e., those based on static analysis)?

Limitations of traditional techniques
- Boundary of knowledge: e.g., the specifications of APIs are not comprehensively included
- Scalability issue of the analysis mechanism: e.g., path explosion in inter-procedural analysis
- Generality issue for specific domains: e.g., checking rules for business-related bugs have to be implemented manually

How LLMs address the limitations of traditional techniques
- LLMs are good at summarizing the intention of code
- LLMs can avoid diving into some procedures based on API intention
- LLMs can detect buggy behaviors based on natural language descriptions

This talk is about
1. Synergy of LLMs and static analysis: using LLMs to refine sources/sinks and reachability analysis
2. Enhancing LLMs with a bug knowledge base: using LLMs to build and use a bug-specific knowledge base

Case study on resource leak detection (background)
- RAR pair: a Resource Acquisition API method paired with its Resource Release API method, e.g., LockManager.acquireLock() and LockManager.releaseLock()
- Resource reachability validation: e.g., if (LockManager != null). An unreachable resource would not cause leaks even without the release.

Case study on resource leak detection

Detection workflow (figure):
1. Construct control-flow paths
2. Identify paths related to the resource (resource acquire, resource release)
3. Check the resource reachability (unreachable resources are skipped)
4. Check if the resource is released
The workflow depends on an accurate RAR pair pool and on precise resource reachability validation.

Key challenges: existing static analysis tools
- They predefine a set of RAR pairs and perform string matching, which leads to incomplete or incorrect RAR pairs.
- They predefine several rules (e.g., res != null), which miss unreachable paths; precise context-sensitive and intuitive reasoning is challenging.
- The result is false positives and false negatives.

Limitation 1: incomplete/incorrect RAR pair pool
- False negatives (low recall): it is infeasible to detect resource leaks related to RAR pairs that are undefined in the initial RAR pair pool.
- Open-source projects contain a huge number of RAR pairs, e.g., 738 RAR pairs related to the Lock resource.
Challenge 1: how to build a general resource leak detection tool that could cover a wide range of RAR pairs in diverse projects?

Limitation 2: mechanical resource reachability validation
- False positives (low precision): an unreachable resource without a release would be considered a resource leak.
- Predefined rules miss potential reachability validation checks, e.g., !res.Disabled().
Challenge 2: how to build a general resource leak detection tool that could precisely identify the resource reachability validation in diverse projects?
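The workflow and its two limitations can be illustrated with a minimal Python sketch. Everything here is a simplifying assumption rather than any real tool's implementation: paths are modeled as lists of string events, `RAR_POOL` and `REACHABILITY_RULES` are hypothetical predefined tables, and `Pool.borrow` is an invented API name.

```python
from typing import Dict, List, Set

# Hypothetical predefined RAR pool: acquisition API -> release API.
RAR_POOL: Dict[str, str] = {"LockManager.acquireLock": "LockManager.releaseLock"}

# Hypothetical predefined reachability rules, matched as plain strings.
REACHABILITY_RULES: Set[str] = {"res != null"}


def leaks_on_path(path: List[str], rar_pool: Dict[str, str], rules: Set[str]) -> List[str]:
    """Return acquisitions still open at the end of one control-flow path.

    Each event is "call:<api>" or "guard-false:<condition>"; the latter means
    the path follows the branch on which the resource check failed.
    """
    open_acquires: List[str] = []
    for event in path:
        kind, _, payload = event.partition(":")
        if kind == "call" and payload in rar_pool:
            open_acquires.append(payload)
        elif kind == "call":
            # A release closes the matching open acquisition, if any.
            open_acquires = [a for a in open_acquires if rar_pool[a] != payload]
        elif kind == "guard-false" and payload in rules:
            # A known rule says the resource is unreachable here: nothing can leak.
            return []
    return open_acquires


released = leaks_on_path(
    ["call:LockManager.acquireLock", "call:LockManager.releaseLock"],
    RAR_POOL, REACHABILITY_RULES)
leaked = leaks_on_path(["call:LockManager.acquireLock"], RAR_POOL, REACHABILITY_RULES)
# Limitation 1: an acquisition API missing from the pool is never tracked.
missed = leaks_on_path(["call:Pool.borrow"], RAR_POOL, REACHABILITY_RULES)
# Limitation 2: a guard outside the predefined rules is ignored, so the
# unreachable resource is still reported.
false_alarm = leaks_on_path(
    ["call:LockManager.acquireLock", "guard-false:!res.Disabled()"],
    RAR_POOL, REACHABILITY_RULES)
known_guard = leaks_on_path(
    ["call:LockManager.acquireLock", "guard-false:res != null"],
    RAR_POOL, REACHABILITY_RULES)
```

In this toy model, `missed` comes back empty even though a resource was acquired (a false negative), while `false_alarm` reports a leak on a path where the resource is unreachable (a false positive).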

[Figure labels: Resource Acquire, Resource Release, Unreachable, False Alarms]

Motivation: to improve existing static analysis approaches
Challenge 1: how to build a general resource leak detection tool that could cover a wide range of RAR pairs in diverse projects?
- Mining resource-related knowledge from the massive corpus in open-source software.
- Enhancing existing analysis-based approaches with the mined knowledge for a better understanding of the code intention.
- Predefined rules are replaced by a resource management knowledge base (e.g., RAR pairs, reachability checking operations).
Challenge 2: how to build a general resource leak detection tool that could precisely perform the resource reachability validation in diverse projects?
- Domain knowledge and models.

Overview of MIROK
Mining resource-related knowledge from the massive corpus in open-source software to improve resource leak detection.

Evaluation: the improvement over basic static analysis
- MIROK mines 1,313 new Abs-RAR pairs from 1,454,224 Java methods (89.2% are valid).
- MIROK instantiates 6,314 RAR pairs in 2,261 Maven libraries (93.3% are valid).
- Our mined RAR pairs are released for the community and could be integrated into existing resource leak detection tools.

On Stack Overflow code snippets:
- Benchmark: 46,389 Java code snippets from Stack Overflow
- Our method: rule-based matching based on 1,197 valid Abs-RAR pairs
- Baseline: rule-based matching based on 26 seed Abs-RAR pairs
- Results: MIROK detects 761 leaks vs. 168 for the baseline; 73.4% (188) are manually checked as true positives

On GitHub projects:
- Benchmark: 10 compilable Java projects from GitHub
- Our method: Findbugs* = Findbugs + 6,314 RAR pairs mined by MIROK
- Baseline: original Findbugs
- Results: Findbugs* produces 15 reports, 7 of which are true bugs (PR was accepted); Findbugs produces 9 reports, 4 of which are true bugs
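As a quick reading aid for the GitHub comparison, the reported counts can be turned into precision figures. This is only arithmetic on the numbers quoted above; the helper name `precision` is an illustrative choice.

```python
def precision(true_bugs: int, reports: int) -> float:
    """Fraction of reported leaks that are true bugs."""
    return true_bugs / reports


findbugs_star = precision(7, 15)   # Findbugs plus the RAR pairs mined by MIROK
findbugs = precision(4, 9)         # original Findbugs
extra_true_bugs = 7 - 4

print(f"Findbugs*: {findbugs_star:.1%}, Findbugs: {findbugs:.1%}, "
      f"additional true bugs: {extra_true_bugs}")
```

Read this way, the mined pairs surface three more true bugs while precision stays at a comparable level.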

Overview of INFERROI
- Step 1: use an LLM to identify resource-oriented code
- Step 2: provide static analysis with the identified resource-oriented code for resource leak detection

INFERROI: LLM-based intention inference
[Figure: the prompt template in INFERROI and the answer returned by GPT-4, formalized into resource-oriented intentions such as (client, 167), (client, 186), (client, 185)]

INFERROI: enhancing static analysis with identified intention
Alternatively, the inferred intention can be represented in the format accepted by existing static analysis tools (e.g., as a source/sink specification query in CodeQL).

Evaluation: on existing resource leak detection datasets
- INFERROI achieves the best trade-off between detection rate and false alarms.
- INFERROI covers a wide range of resource types.

Evaluation: detecting unknown resource leaks in open-source projects
In the 100 methods sampled from open-source projects on GitHub, INFERROI reports 16 resource leaks, 12 of which are annotated as true bugs (7 bugs are confirmed by developers; accepted PRs).

Evaluation: compared to basic GPT-4
Directly applying GPT-4 without combining it with analysis techniques produces a very high number of false positives.
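The two INFERROI steps can be sketched in Python as follows. This is a toy model, not the tool's actual algorithm: the slide only shows pairs such as (client, 167), so the intention labels ACQUIRE/VALIDATE/RELEASE, their assignment to lines 167/185/186, and the path encoding are all assumptions made for illustration. The LLM call itself is replaced by a hard-coded list of formalized intentions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Intention:
    kind: str       # hypothetical labels: "ACQUIRE", "VALIDATE", "RELEASE"
    resource: str
    line: int


# Step 1 (stubbed): intentions an LLM might return, already formalized.
# Which line carries which intention is an assumption.
intentions = [
    Intention("ACQUIRE", "client", 167),
    Intention("VALIDATE", "client", 185),
    Intention("RELEASE", "client", 186),
]

# A path step is (line, outcome); outcome is None for plain statements and
# True/False for the result of a validation check on that line.
Path = List[Tuple[int, Optional[bool]]]


def leaked_resources(path: Path, found: List[Intention]) -> Set[str]:
    """Step 2: walk one control-flow path and return resources left open."""
    by_line = {i.line: i for i in found}
    held: Set[str] = set()
    for line, outcome in path:
        intent = by_line.get(line)
        if intent is None:
            continue
        if intent.kind == "ACQUIRE":
            held.add(intent.resource)
        elif intent.kind == "RELEASE":
            held.discard(intent.resource)
        elif intent.kind == "VALIDATE" and outcome is False:
            # Validation failed: the resource is unreachable on this path.
            held.discard(intent.resource)
    return held


normal_path = leaked_resources([(167, None), (185, True), (186, None)], intentions)
unreachable_path = leaked_resources([(167, None), (185, False)], intentions)
early_exit_path = leaked_resources([(167, None), (170, None)], intentions)
```

Only `early_exit_path` reports `client` as leaked; the path on which validation fails is not flagged, which is the kind of false alarm the reachability check is meant to avoid.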

2. Enhance LLMs with Bug Knowledge Base

Using existing bugs to boost LLM-based bug detection
- Similar bugs recur during software evolution or among similar software: similar code context, similar root cause, similar fixing solutions.
- Provide relevant existing bugs in the input context of LLMs (using the in-context learning capabilities of models).

Using existing bugs to boost LLM-based bug detection via RAG
A classic pipeline of RAG (Retrieval-Augmented Generation) [1]:
- Step 1: retrieving the relevant information from the knowledge base
- Step 2: putting the retrieved information in the input prompt
RAG has shown promising effectiveness in many software engineering tasks: assertion generation, code completion, program repair.

[1] A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

Challenges: using existing bugs to boost LLM-based bug detection via RAG
Key components in RAG:
- Knowledge base: how to represent existing bugs in the knowledge base? Just code snippets?
- Retrieval mechanism: how to find the most relevant bugs? Code similarity?
- Inference mechanism: how to use the retrieved bugs to prompt LLMs? Directly append them to the input?

Motivating example
When retrieving based only on code similarity, it is very likely to get semantically or functionally different bugs. A better retrieval strategy is required.
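The classic two-step pipeline, in its simplest code-level form, can be sketched as below. This is the naive baseline the slides question, not the approach the talk proposes: retrieval uses plain token overlap (Jaccard similarity) as a stand-in for code similarity, the knowledge base holds raw snippets, and the prompt wording is invented.

```python
import re
from typing import List, Set


def tokens(code: str) -> Set[str]:
    """Identifier-like tokens of a code snippet."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, knowledge_base: List[str], k: int = 1) -> List[str]:
    """Step 1: return the k snippets most lexically similar to the query."""
    q = tokens(query)
    ranked = sorted(knowledge_base, key=lambda bug: jaccard(q, tokens(bug)), reverse=True)
    return ranked[:k]


def build_prompt(code: str, retrieved: List[str]) -> str:
    """Step 2: append the retrieved buggy code directly to the input prompt."""
    examples = "\n\n".join(f"Known buggy code:\n{bug}" for bug in retrieved)
    return f"{examples}\n\nIs the following code buggy?\n{code}"


knowledge_base = [
    "lock.acquire(); use(x);",
    "f = open(p); read(f);",
]
query = "f = open(path); data = read(f);"
top = retrieve(query, knowledge_base)
prompt = build_prompt(query, top)
```

Because the ranking depends only on shared tokens, two snippets with the same root cause but different identifiers would score low, which is the weakness the motivating example points out.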

Motivating example
When only the relevant buggy code is put in the prompt, it is hard for LLMs to grasp the correlation between the retrieved bug and the given code. A better in-context prompting strategy is required.

Our insight
Instead of straightforward code snippets, existing bugs are further represented (represent, summarize, cluster) as high-level knowledge in natural language descriptions: functionality, root cause, fixing solution.
1. To retrieve lexically different but semantically similar bugs
2. To facilitate the comprehension capabilities of LLMs for the input

Vul-RAG approach pipeline: knowledge-level RAG for vulnerability detection
- Step 1: constructing a knowledge base of existing CVEs (offline)
- Step 2: retrieving the related vulnerability knowledge for the given code
- Step 3: reasoning whether the given code is vulnerable based on the retrieved knowledge

Step 2: retrieving relevant vulnerability knowledge
- Query generation: the abstract purpose, the detailed behavior, and the code itself
- Candidate knowledge retrieval: three-dimension similarity
- Candidate knowledge re-ranking: re-rank candidate knowledge items with Reciprocal Rank Fusion
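The re-ranking step can be illustrated with a short Reciprocal Rank Fusion sketch. The standard RRF score of an item is the sum over all rankings of 1 / (k + rank); the constant k = 60 is the value commonly used in the RRF literature and is an assumption here, since the slides do not state it. The knowledge item IDs and the three input rankings are invented.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists; items ranked high in many lists come first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical candidate rankings, one per query dimension.
by_purpose = ["K2", "K1", "K3"]
by_behavior = ["K2", "K3", "K1"]
by_code = ["K1", "K2", "K3"]

fused = reciprocal_rank_fusion([by_purpose, by_behavior, by_code])
print(fused)
```

RRF uses only rank positions, so similarity scores from the three dimensions do not need to be on a common scale before they are combined.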

Step 3: detection reasoning
Prompt for LLMs: if the given code (i) has similar vulnerability causes and (ii) lacks the relevant fixing operations, it will be considered vulnerable.

Evaluation: benchmark PairVul
- We first construct a new benchmark, PairVul, by mining CVEs of the Linux kernel.
- PairVul exclusively includes pairs of vulnerable code and the relevant patched code.
[Table: detailed statistics of the benchmark PairVul]

Evaluation: compared with SOTA vulnerability detection techniques
- Compared with existing fine-tuning-based techniques
- Compared with basic GPT-4 and code-level RAG

Pain points in industry
- In industry, there are many customized code checking rules related to the business or to specific domains.
- It takes a lot of effort to implement them manually in static analysis checking tools.
- Most rules are described in natural language, which can be leveraged to construct the knowledge base and incorporated into our LLM-based detection framework.

Discussion and future improvements
The strengths of knowledge-level RAG for bug/vulnerability detection:
- Domain-specific: the knowledge-level RAG framework is suitable for business-related bugs or specific bug types.
- Extensibility: the knowledge base can be continuously extended with newly released bugs.
- Interpretability of the detection results: we performed a user study to confirm that the knowledge helps developers confirm the detection results quickly and precisely.
- It can be practically deployed for industrial code review (especially for domain-specific bugs).
The limitations of knowledge-level RAG for bug/vulnerability detection:
- False negatives: inaccurate vulnerability knowledge descriptions; unretrieved relevant vulnerability knowledge; non-existent relevant vulnerability knowledge in the knowledge base
- False positives: mismatched fixing solutions; irrelevant vulnerability knowledge retrieval

Summary
Check out our papers for more details!
- Mining Resource-Operation Knowledge to Support Resource Leak Detection. FSE 2023.
- Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference. arXiv 2024.
- Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG. arXiv 2024.

Open challenges
- How to better extract input contexts for LLMs?
- How to better identify dynamically-loaded constraints via LLMs? (configuration / external API query)


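Summary: This document discusses the application of large language models (LLMs) to static software defect detection. The author notes that although LLMs can identify defects effectively in some cases, their precision and recall still fall short in practice. Two LLM-based directions are presented: first, combining LLMs with static analysis by using LLMs to refine source/sink and reachability analysis; second, enhancing LLMs with a bug knowledge base that LLMs help build and use. The talk introduces MIROK, which mines resource-related knowledge from a massive open-source corpus to improve resource leak detection, and INFERROI, which uses an LLM to identify resource-oriented code and combines it with static analysis to detect resource leaks. It also discusses how existing bug information can be used to improve LLM performance in defect detection. The evaluation shows that these approaches achieve notable results in detection rate and in the range of supported resource types, though room for improvement remains.

Related questions: "How can the effectiveness of LLMs in static defect detection be improved?" "How do LLMs work together with static analysis tools?" "How can a general resource leak detection tool that applies to different projects be built?"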
