Using NLP to Improve Generative AI
Presented at: Data Summit 2024, May 9, 2024

Agenda
- Using Generative AI in business research
- Options for creating Generative AI solutions
- Hallucination and Retrieval Augmented Generation
- The importance of context windows
- Strategies for overcoming limitations of context windows
- Using NLP in Generative AI solutions

SinglePoint integrates all the content relevant to the client regardless of its origin
[Architecture diagram: Northern Light SinglePoint sits between the client and its content sources, including internal repositories, external content providers, and primary market research & CI from custom research suppliers. Content collections cover syndicated research, business news, journal articles and journal abstracts, and best practices. The platform layers integrated search, auto tagging and taxonomies, machine learning & Gen AI, insight distribution tools, and a primary research manager workflow system under a user interface that delivers search results, content, and insights.]

Generative AI for market research and competitive intelligence is a powerful new tool
[Product screenshot: a user question and the Gen AI response. A citation and link appear when the mouse hovers over a chicklet, and multiple sources for an observation are all cited.]

A study by Harvard Business School with 758 consultants at BCG found that Generative and Conversational AI had a dramatic impact on business strategy work
- Consultants were divided into groups that either used ChatGPT-4 or did not
- They were given a series of business strategy research tasks to perform
- Consultants previously judged to be below average improved their performance by 43%
- Consultants previously judged to be above average increased their performance by 17%
- The group using Gen AI finished tasks 40% faster with 25% higher quality
- Output was measured on both quantity and quality

A survey of 30,000 LinkedIn members by Microsoft, just published this week, found that employees are way ahead of employers on adoption of AI
- 75% of business professionals are using AI at work
- 78% of those (59 of the 75 percentage points) are bringing their own AI tools to work rather than waiting for their companies to provide them
- Users say AI helps them save time (90%), focus on their most important work (85%), be more creative (84%), and enjoy their work more (83%)
- All generations, from Boomers (73%) to Gen Z (85%), were heavy users of AI at work
- A new acronym has been coined: "BYOAI," for Bring Your Own AI

Three options for creating a Generative AI solution
- Rely on a pre-trained model
- Fine-tune a pre-trained model
- Use Retrieval Augmented Generation
The problem of hallucination: LLMs are probabilistic text predictors that in chat applications often rely on their training data
- LLM training data: dogs chase (frisbees 100, cars 50, cats 10)
- User input: A dog is chasing Fluffy
- User question: What is Fluffy?
- The LLM formulates the question as: what word is most likely to complete "Dogs chase ..."?
- It consults its training data to find the most probable answer, which is "Dogs chase frisbees."
- Generative AI answer: Fluffy is a frisbee

Avoiding hallucination: "Retrieval Augmented Generation"
- Generate a list of relevant documents from vetted content
- Send the document text to the API of the LLM along with the user's question
- Prompt the model to answer the question only from the submitted content
Workflow: the user asks the question using a search index of high-quality, vetted content → search results → gather text from the most relevant documents → send the text to the LLM API along with the user's question → Generative AI answers and summaries

Use RAG to avoid hallucination and ensure accuracy (a prompt-assembly sketch follows below)
- LLM training data: dogs chase (frisbees 100, cars 50, cats 10)
- User input: A dog is chasing Fluffy
- User question: What is Fluffy?
- The search process retrieves a set of documents that contain the word "Fluffy"
- One of the retrieved documents has this sentence: "Fluffy, despite being a cat, loves to chase frisbees."
- The process prompts the LLM to use only text from the retrieved documents to answer the question
- Generative AI formulates the question as: what word is most likely to complete "Fluffy is a ..."?
- Generative AI answer: Fluffy is a cat
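To make the RAG pattern concrete, here is a minimal sketch of the prompt assembly described above: retrieved document text is packed into the prompt together with an instruction to answer only from that text. The top_documents helper, the document limit, and the instruction wording are illustrative assumptions rather than Northern Light's implementation; the API call uses the openai-python chat completions interface.

```python
# Minimal RAG sketch: answer only from retrieved, vetted documents.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer_with_rag(question: str, top_documents) -> str:
    """`top_documents` is a hypothetical search-index helper returning (title, text) pairs."""
    docs = top_documents(question, limit=5)
    context = "\n\n".join(f"[{i + 1}] {title}\n{text}" for i, (title, text) in enumerate(docs))
    system = (
        "Answer the user's question using ONLY the documents provided, and cite the "
        "document numbers you used. If the documents do not contain the answer, say so."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # 16K context window
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```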
Retrieval Augmented Generation is becoming a de facto standard
- "Retrieval-augmented generation is a technique that can provide more accurate results to queries than a generative large language model on its own because RAG uses knowledge external to data already contained in the LLM." - Oracle
- "Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response." - Amazon
- "Current models have made significant progress on the shortcomings of models that rely on memorized information by enhancing the solution platforms with a retrieval-augmented generation (RAG) front-end to allow for extracting information external to the model." - Intel
- "RAG is an AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) on the most accurate, up-to-date information." - IBM
- "Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources." - Nvidia
- "However, when it comes to using LLMs in a real-world production scenario, they have some limitations, mainly due to the fact that they can answer questions related only to the data they were trained on. This means that they do not know facts that happened after their date of training, and they do not have access to data protected by firewalls. Retrieval Augmented Generation (RAG) is a pattern designed to overcome the limitations of LLMs mentioned above by providing the LLM with the relevant and freshest data to answer a user question, injecting the information through the prompt." - Microsoft

How much is enough? The business problem
- Large Language Models (LLMs) have context windows expressed as token limits (one token averages 0.75 words)
- GPT-3.5 Turbo last summer had a 4K context window; GPT-3.5 Turbo today has a 16K context window; GPT-4 Turbo has a 128K context window
- The context window constrains the sum of the input and output text
- How much text can be sent is where the context window comes into play (a token-counting sketch follows below)
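Because the window is measured in tokens rather than words or characters, it helps to count documents the same way the model does. A small sketch using the tiktoken tokenizer that OpenAI publishes for its models; the sample string is illustrative.

```python
# Count tokens the way the model will, so input + output stays inside the context window.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")


def token_count(text: str) -> int:
    return len(enc.encode(text))


document = "IBM acquired Red Hat today for $30 billion."
tokens = token_count(document)
print(tokens)                # tokens this text consumes from the window
print(round(tokens * 0.75))  # rough word-count equivalent (one token averages 0.75 words)
```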
Use Retrieval Augmented Generation to avoid hallucination
Workflow: the user asks the question using a search index of high-quality, vetted content → search results → gather text from the most relevant documents → send the text to the LLM API along with the user's question → Generative AI answers

How much context is enough? (A worked check follows the table.)

Content type                      Tokens/document,   Tokens/document,   75th percentile
                                  50th percentile    75th percentile    x20
Life sciences journal abstracts   362                452                9,040
News articles                     890                1,484              29,680
Engineering journal articles      6,624              10,510             210,200
Syndicated market research        4,123              12,520             250,400
Primary market research           6,275              23,658             473,160
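To put the x20 column against the window sizes quoted earlier, the check below assumes a RAG request that packs 20 documents of the 75th-percentile size plus roughly 2,000 tokens of headroom for the question and the answer; the headroom figure is an assumption for illustration.

```python
# Which content types can send 20 typical (75th-percentile) documents in one request?
P75_TOKENS = {
    "Life sciences journal abstracts": 452,
    "News articles": 1_484,
    "Engineering journal articles": 10_510,
    "Syndicated market research": 12_520,
    "Primary market research": 23_658,
}
RESERVE = 2_000  # assumed headroom for the question and the generated answer

for window in (16_000, 128_000):
    print(f"{window:,}-token window:")
    for content_type, per_doc in P75_TOKENS.items():
        needed = 20 * per_doc + RESERVE
        verdict = "fits" if needed <= window else "does not fit"
        print(f"  {content_type}: {needed:,} tokens -> {verdict}")
```

Under these assumptions, only the journal abstracts fit the 16K window, news articles need the 128K window, and the other three content types overflow even 128K, which is what motivates the strategies on the next slide.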
Strategies for fitting into context windows
- Use the LLMs with larger context windows
- Chunk the documents and send just the relevant chunks
- Summarize the documents using the LLM and operate on the summaries
- Send separate transactions for each document; make two passes
- Use NLP to eliminate text that is not meaningful

Using a larger model has an economic problem
- GPT-3.5 Turbo has a 16K context window at $0.50 per million tokens
- GPT-4 Turbo has a 128K context window at $10.00 per million tokens
- Using the larger context window costs 20x more per token (see the cost comparison below)
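The per-token gap compounds with the amount of text a RAG request sends. A back-of-the-envelope comparison, using the list prices above and an assumed 50,000-token payload per question (a size only the 128K model could accept in one request):

```python
# Cost per question at the quoted per-million-token prices (assumed payload size).
PRICE_PER_MILLION = {"GPT-3.5 Turbo (16K)": 0.50, "GPT-4 Turbo (128K)": 10.00}
TOKENS_PER_QUESTION = 50_000  # assumed RAG payload; too large for the 16K window as-is

for model, price in PRICE_PER_MILLION.items():
    per_question = TOKENS_PER_QUESTION / 1_000_000 * price
    print(f"{model}: ${per_question:.3f} per question, "
          f"${per_question * 10_000:,.0f} per 10,000 questions")
```

Shrinking the payload enough to fit the 16K model, rather than paying for the 128K window, is the economic case for the NLP approach later in the deck.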
Chunk the documents, operate on the chunks
- This was talked about more in antiquity (2023), when the context size was 4K
- Break each document down into paragraph-sized chunks
- Use embeddings (a form of vector search) for retrieval of the chunks (see the sketch below)
- Send the most relevant chunks and ask for the Gen AI response
- But chunking the documents risks a loss of accuracy, because relevant context may sit in different chunks that do not get retrieved
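A rough sketch of that chunk-and-retrieve step: split documents into paragraph-sized chunks, embed them, and keep only the chunks most similar to the question. The embed() function is a placeholder for whatever embedding model is used (an OpenAI embeddings call, a sentence-transformers model, etc.); the cosine-similarity ranking is done with NumPy.

```python
# Chunk-level retrieval: send only the most relevant paragraphs to the LLM.
import numpy as np


def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per input text."""
    raise NotImplementedError


def paragraph_chunks(document: str) -> list[str]:
    # Paragraph-sized chunks; a real system would also cap chunk length in tokens.
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def top_chunks(question: str, documents: list[str], k: int = 8) -> list[str]:
    chunks = [c for doc in documents for c in paragraph_chunks(doc)]
    chunk_vecs = embed(chunks)
    q_vec = embed([question])[0]
    # Cosine similarity between the question and every chunk.
    sims = (chunk_vecs @ q_vec) / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-10
    )
    best = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in best]
```

The failure mode the slide warns about is visible here: if the sentence that resolves a pronoun or supplies a caveat lands in a chunk that scores below the cutoff, the model never sees it.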
Summarize the documents with the LLM, only send the summaries in the RAG solution
- Will lose a lot of information that doesn't make it into the summaries
- Have to process the entire corpus when only a small portion will be used
- News example: 15 million news articles in the corpus, but only 1 million of them will show up in search results for the users of a particular client in any given year. Why pay to summarize all 15 million?

Second pass to provide overall summary
- First pass summarizes a document in the context of the user's question
- One transaction per document; send as many as you want
- Second pass summarizes the output from the first pass into an overall summary
- But hard to support conversational interaction
Workflow: the user asks the question using a search index of high-quality, vetted content → search results → gather text from the most relevant documents → send the text to the LLM API along with the user's question → Generative AI answers from each document → send the answers to the LLM → Generative AI overall summary

Send separate transactions for each document with a second pass (a sketch follows below)
- Works for many content types
- Won't fit into the most cost-efficient model for some secondary market research and for a large portion of primary market research

Content type                      Tokens/document, 75th percentile
Life sciences journal abstracts   452
News articles                     1,484
Engineering journal articles      10,510
Syndicated market research        12,520
Primary market research           23,658
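A minimal sketch of the two-pass flow above: one API call per document answers the question from that document alone, and a second call folds the per-document answers into an overall summary. The ask_llm() helper is a stand-in for a single chat-completion call such as the one sketched earlier; the prompt wording is illustrative.

```python
# Two-pass RAG: per-document answers first, then one overall summary.

def ask_llm(prompt: str) -> str:
    """Stand-in for a single chat-completion call (see the RAG sketch above)."""
    raise NotImplementedError


def two_pass_answer(question: str, documents: list[str]) -> str:
    # First pass: one transaction per document, each comfortably inside the context window.
    per_doc_answers = [
        ask_llm(f"Using only the document below, answer: {question}\n\nDocument:\n{doc}")
        for doc in documents
    ]
    # Second pass: combine the per-document answers into one overall summary.
    combined = "\n\n".join(f"Answer {i + 1}: {a}" for i, a in enumerate(per_doc_answers))
    return ask_llm(
        "Combine the following per-document answers into one overall summary for the "
        f"question: {question}\n\n{combined}"
    )
```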
Use NLP to eliminate text that is not useful
- Reduce the document to its "Summary Worthy Sentences"
- Summary Worthy Sentences are declarative, which can be determined from the parsing tree of the sentence
- Summary Worthy Sentences express an interesting idea
- Summary Worthy Sentence: "IBM acquired Red Hat today for $30 billion."
- Not Summary Worthy Sentences: "What did IBM do?"; "Follow us on social media."; "This document contains forward looking statements ..."; "[Publisher name] provides market research for ..."; "Microsoft new production similarities"

NLP is used to condense documents into just their Summary Worthy Sentences
Pipeline: ingest content → text & metadata extraction → create search index → use SyntaxNet to part-of-speech tag every word → use Parsey to diagram every sentence → apply the Summary Worthy Sentence rules → create an "NLP Text" version of each document that contains only the sentences that express ideas, commentary, analysis, and facts → search results → gather text from the most relevant documents → send the text to the LLM API along with the user's question → Generative AI answers
(Diagram color key: blue, Northern Light proprietary; red, Google TensorFlow libraries; green, OpenAI API / Azure AI; gold, outputs to users)

- Declarative sentences are the unit of ideas and insights.
- Declarative sentences have a noun subject, a verb predicate in the root clause, and a direct object.
- We can use the parsed sentence diagram to evaluate whether a sentence expresses a relevant and pithy idea.
- In this case the machine learns that the sentence is about IBM (the noun subject) and that IBM acquired (the root verb) Red Hat (the direct object).
- Northern Light computes the parsing tree for three million sentences a day.
- SyntaxNet and Parsey work together to produce diagrammed sentences that can be interpreted. (A simplified sketch of this rule follows below.)
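Northern Light's production pipeline uses Google's SyntaxNet and Parsey for tagging and parsing; the sketch below shows the same idea with spaCy's dependency parser, purely because it is easy to demonstrate in a few lines. A sentence is kept when its root clause has a verb with a nominal subject and a direct object (the "IBM acquired Red Hat" pattern). The rule set here is an assumed simplification of the real Summary Worthy Sentence rules, which also test whether the sentence expresses an interesting idea.

```python
# Keep only declarative sentences: a root verb with a noun subject and a direct object.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a dependency parser


def is_summary_worthy(sentence) -> bool:
    if sentence.text.strip().endswith("?"):
        return False                              # interrogative, not declarative
    root = sentence.root
    if root.pos_ not in ("VERB", "AUX"):
        return False                              # no verb predicate in the root clause
    child_deps = {child.dep_ for child in root.children}
    return "nsubj" in child_deps and "dobj" in child_deps  # noun subject + direct object


def nlp_text(document: str) -> str:
    """Return the 'NLP Text' version: only the sentences that pass the rules."""
    kept = [sent.text for sent in nlp(document).sents if is_summary_worthy(sent)]
    return " ".join(kept)


sample = ("IBM acquired Red Hat today for $30 billion. "
          "What did IBM do? Follow us on social media.")
print(nlp_text(sample))  # expected: "IBM acquired Red Hat today for $30 billion."
```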
Using NLP to focus on Summary Worthy Sentences reduces the document text by 55% on average
- Even the largest documents in business organizations can fit within the 16K model
- Reduces the API costs by 95%
- For any given model, reduces the API costs of a Generative AI solution by 55%
(Tokens per document at the 75th percentile are as shown in the table above.)

Parting shots
- Generative AI has changed the search paradigm, and the genie can't be put back into the bottle
- High rewards will accrue to those organizations that get a head start
- Generative AI dramatically reduces the time to accomplish tasks and improves the quality of work for business analysis
- Retrieval Augmented Generation is becoming a de facto standard for Generative AI
- Context windows impose serious limitations on the design and operation of RAG solutions
- There are many strategies for overcoming these limitations, and using NLP to reduce document text to only the meaningful sentences is often useful

Thanks!
C. David Seuss, CEO, Northern Light
D 1-617-515-5771
This presentation was written entirely by a human being, and GPT-3.5 Turbo accepts no responsibility for errors made by the author.