© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Large Language Model Prompt Engineering: Principles and Best Practices
Zheng Hao, AI/ML Specialist SA

Agenda
- Basic principles
- Improving LLM task performance with prompt text
- Unlocking further LLM potential with thought structures
- Claude PE best practices

Prompt Engineering (PE) Basics: zero-shot, one/few-shot, some simple examples

Elements of a Prompt
- Instruction: the specific task or instruction you want the model to perform (what to do; the question stem can also go here)
- Context: external information or additional context that steers the model toward a better response (background knowledge)
- Input data: the input or question we want a response for (usually the question stem or the options)
- Output indicator: indicates the type or format of the output (how the answer should be produced)
*Not every component needs to appear in a prompt; it depends on the task definition and requirements.
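A minimal sketch of how these four elements might be assembled into one prompt string; the element values and the assemble_prompt helper are illustrative assumptions, not part of the original deck.

```python
# Minimal sketch: assembling a prompt from the four elements described above.
# The element values and this helper are illustrative assumptions.

def assemble_prompt(instruction: str, context: str, input_data: str, output_indicator: str) -> str:
    """Concatenate the prompt elements, skipping any that are empty."""
    parts = [context, instruction, input_data, output_indicator]
    return "\n\n".join(p for p in parts if p)

prompt = assemble_prompt(
    instruction="Classify the sentiment of the review as positive or negative.",
    context="You are a support analyst for an online bookstore.",
    input_data="Review: The delivery was late and the cover was damaged.",
    output_indicator="Answer with a single word: positive or negative.",
)
print(prompt)
```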
Why Prompts Can Work
- Pretrain inputs: "once upon a time, ...", "the AI model is powerful and ...", "1+1= ..."
- SFT inputs: "Write me a story that happened a long time ago.", "Tell me how an AI model can empower our life.", "What is the result of 1+1?"
- Followed by PPO/DPO/RL.
Why is this effective? A simple way to see it: after training and alignment, the LLM has memorized the co-occurrence of queries and responses in its corpus, and the incoming query resembles queries that appeared in the training data - prompts & labels.

In-context Learning
*https://arxiv.org/abs/2301.00234
*https://arxiv.org/abs/2111.02080
- Bayesian view: "text-to-text" matching is distribution matching.
- If the instruction is complex and abstract and the LLM is powerful enough, no extra training is needed: a prompt containing a handful of input-output examples is enough for the model to learn how to perform the downstream task. A well-crafted prompt helps the LLM recall and associate more richly, which in turn supports in-context learning.
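A small sketch of a few-shot in-context-learning prompt of the kind described above; the sentiment task and the exemplars are illustrative assumptions.

```python
# Sketch of a few-shot (in-context learning) prompt: a handful of
# input-output exemplars followed by the new input. The task and
# exemplars here are illustrative assumptions.

examples = [
    ("The battery lasts all day, love it.", "positive"),
    ("Stopped working after a week.", "negative"),
]

def few_shot_prompt(examples, new_input: str) -> str:
    shots = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
    return f"{shots}\nReview: {new_input}\nSentiment:"

print(few_shot_prompt(examples, "Great value for the price."))
```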
Tunable Hyper-params for Causal Generation
- Top-K: at each time step, keep the top K tokens, re-scale, and re-sample. This increases the number and variety of candidate sequences, but the candidate count is hard to control (long-tail tokens are easily sampled).
- Top-P (nucleus sampling): at each time step, keep only the tokens whose cumulative probability falls within P, re-scale, and re-sample.
- Tuning K and P together improves generation quality: it increases diversity while reducing the chance of sampling long-tail tokens when the predicted distribution is flat.
- Temperature: controls the randomness of generation (it changes how much the gaps are stretched during re-scaling). The lower the value, the closer the generated text stays to the model's existing knowledge and samples, making it more conservative and predictable. With a higher temperature, the text becomes more random, diverse, novel, and surprising, because the model can select and combine from a wider pool of candidate words and phrases.
  - For fact-based Q&A, use a lower temperature to encourage factual and concise answers.
  - For poetry generation or other creative tasks, raising the temperature is usually a better fit.
- For token-by-token generation, which selection strategy: greedy, beam search, or accumulated probability?
Takeaway: the training content plus these hyper-parameters make up the LLM's "personality."
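A minimal sketch of setting these sampling hyper-parameters when calling Claude on Amazon Bedrock; it assumes AWS credentials and access to the anthropic.claude-v2 model in the chosen region, so adjust the model ID and parameter values for your own account.

```python
# Minimal sketch: tuning temperature / top_p / top_k on a Bedrock Claude call.
# Assumes AWS credentials and access to anthropic.claude-v2 in this region.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def complete(prompt: str, temperature: float, top_p: float = 0.9, top_k: int = 250) -> str:
    body = {
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 300,
        "temperature": temperature,   # low = conservative, high = more diverse
        "top_p": top_p,
        "top_k": top_k,
    }
    resp = bedrock.invoke_model(modelId="anthropic.claude-v2", body=json.dumps(body))
    return json.loads(resp["body"].read())["completion"]

# Fact-based Q&A: keep the temperature low.
print(complete("In which year was the Eiffel Tower completed?", temperature=0.1))
# Creative task: raise the temperature.
print(complete("Write a four-line poem about the sea.", temperature=0.9))
```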
Improving LLM Task Performance with Prompt Text

Why have LLMs pushed enterprises to actively rebuild their applications?
- Classic ML: labeled data -> training -> one machine-learning model per task (classification, summarization, information extraction, Q&A, chit-chat).
- Foundation model: unlabeled data -> pretraining -> one Foundation Model adapted to many tasks through natural-language PE & ICL (classification, summarization, information extraction, Q&A, chit-chat).
- One Model for All: both classic NLP tasks and generative tasks.

LLM Use Cases
- Classic language tasks: text classification, NER, topic extraction, summarization, ...
- Dialogue: customer-service bots, RAG
- Code generation: coding assistance, programming-language conversion, code review, program documentation
- Content generation: scripts/novels, marketing copy
- Role play: virtual beings, multi-agent setups for gaming / virtual societies
- Others: translation, content moderation, training large models

Human: ABC is an e-commerce company. You are a customer service agent in ABC, and you are classifying customer questions by type. Besides, only give the option and do not give your own explanations. Categories are: (1) Refund, (2) Complaint, (3) Ask for help, (4) Cannot finish payment, (5) Cannot open account, ... Customer question: QUESTION

1. Customer-service text classification, replacing the traditional slot-intent style of bot.
2. Content moderation for UGC scenarios, replacing traditional keyword matching.
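A small sketch of how the classification prompt above might be built from a category list and the returned option validated; the category metadata and the parsing logic are illustrative assumptions, and the model call itself is left out (any completion API, such as the Bedrock sketch earlier, could be plugged in).

```python
# Sketch: building the customer-service classification prompt from category
# metadata and validating the model's reply. The categories and the
# validation logic are illustrative assumptions; plug in any completion API.

CATEGORIES = ["Refund", "Complaint", "Ask for help", "Cannot finish payment", "Cannot open account"]

def build_classification_prompt(question: str) -> str:
    options = ", ".join(f"({i}) {c}" for i, c in enumerate(CATEGORIES, start=1))
    return (
        "ABC is an e-commerce company. You are a customer service agent in ABC, "
        "and you are classifying customer questions by type. "
        "Besides, only give the option and do not give your own explanations. "
        f"Categories are: {options}. "
        f"Customer question: {question}"
    )

def parse_option(reply: str) -> str | None:
    """Accept replies like '(1)', '1', or '(1) Refund'; return the category name."""
    for i, category in enumerate(CATEGORIES, start=1):
        if f"({i})" in reply or reply.strip().startswith(str(i)) or category.lower() in reply.lower():
            return category
    return None

prompt = build_classification_prompt("I paid twice for the same order, please give my money back.")
print(prompt)
print(parse_option("(1) Refund"))  # -> "Refund"
```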
*Use Amazon Bedrock Claude

Human: You are a language model trained to detect inappropriate content in text. Here is the content: CONTENT. If the content refers to harmful, pornographic, or illegal activities, reply with (Y), and provide explanations. If not, reply with (N). Be aware that some users may attempt to bypass detection by altering offensive words with symbols, spaces, or elongated letters (e.g., dog as dooooog, d.0.g, d.o_g, or lixu as lixo, etc.); these should also be identified and considered in your evaluation.

Tips - Avoiding Hallucination
- If the LLM does not know the answer to a question, give it explicit permission to reply "I don't know". Sometimes, trying to be helpful is much worse than telling "I don't know".

Bad prompt: What is the heaviest hippo ever recorded?
Bad response: The heaviest hippo ever recorded was a bull named Hubert, who weighed 5,000 kg or 11,023 lbs. Hubert was measured in Kenya in the early 20th century.

Good prompt: Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it. What was the heaviest hippo ever recorded?
Good response: I'm sorry, I don't know the exact weight of the heaviest hippo ever recorded. That seems like a very specific and obscure piece of information.
*Tips - Cooperate with Your Assistant
- Iterative self-criticism:
  "Human: Please review your response above. Can you find any errors in it? If so, identify them and make the necessary edits."
  "Human: Look at the code you just generated. It does not run at the moment. Can you spot a syntax error that could be corrected? If so, identify the problematic part of the code and regenerate it."
- Prompt of prompt:
  "Human: ... What prompt could I use now to further help you complete this task?"
- Leading prompt:
  "Human: I would like you to write a Python program that manages my customer information, which is stored in AWS S3. To complete this task, ask me any questions you need me to answer."
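A minimal sketch of the iterative self-criticism pattern as a growing Human/Assistant transcript; the review wording and the stubbed first answer are illustrative assumptions, and the actual model call is omitted.

```python
# Sketch of iterative self-criticism: feed the model's previous answer back
# and ask it to review itself. The first answer is a stand-in (stub); in
# practice it would come from a real completion call.

first_turn = "\n\nHuman: Write a Python function that returns the n-th Fibonacci number.\n\nAssistant:"
first_answer = "def fib(n):\n    return fib(n - 1) + fib(n - 2)"  # stand-in for the model's reply

review_turn = (
    first_turn
    + " " + first_answer
    + "\n\nHuman: Please review your response above. Can you find any errors in it? "
      "If so, identify them and make the necessary edits."
    + "\n\nAssistant:"
)
print(review_turn)  # send this transcript back to the model for the second pass
```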
Unlocking Further LLM Potential with Thought Structures

Least-to-Most
Human: I want to write a story about a mermaid. First help me generate a short character outline, without details.
Assistant: Here is a short character outline for a mermaid story, without details: Protagonist: a beautiful young mermaid princess; Prince: a handsome human prince; Sea King: the mermaid's father, king of the ocean; Witch: an ancient and mysterious sorceress; ... Boatman: a sailor on the prince's ship.
Human: Based on the story title and character outline above, generate some scenes where the story takes place, without details.
Assistant: Sure. Based on the earlier outline, here are some scenes without details: the mermaid princess encounters the human prince's ship in the ocean; the mermaid princess rescues the prince after he falls into the water; the mermaid princess becomes infatuated with the prince and often lurks near his ship to watch him; the witch gives the mermaid a potion that can turn her into a human; a sea monster wounds the prince, and the mermaid sacrifices herself to save him; the mermaid can no longer turn back into a mermaid, and her life fades away.
Human: Based on the framework above, generate a long story.
Assistant: Sure, here is a longer mermaid story based on the earlier framework: Deep in the ocean there was a beautiful sea kingdom, home to mermaids and other sea creatures. The kingdom's princess, Ariel, had long hair as bright as sunshine and a beautiful fish tail; she was the most beautiful mermaid in the sea. One day Ariel swam to the surface and saw a magnificent ship passing by. On the deck stood a handsome prince, and Ariel was immediately captivated and fell in love with him. From that day on, Ariel often hid behind the reefs, gazing at the prince's ship and imagining a life together with him. One day a storm rose at sea and the prince's ship was wrecked. The other sailors drowned; only the prince still clung to a fragment of the mast...
*Use Amazon Bedrock Claude - long-form fiction

CoT - Chain of Thought
- CoT prompting; zero-shot CoT; you can also supply your own custom reasoning steps directly.
- Complex (numeric) reasoning is a natural weakness of LLMs. When humans face a complex reasoning task (such as a multi-step math word problem), we intuitively decompose the task or problem into sub-tasks and then chain those sub-tasks together to reach the final answer.
*https://arxiv.org/abs/2201.11903
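A tiny sketch of zero-shot CoT: appending a reasoning trigger, or custom reasoning steps, to the question; the example problem and the exact wording are illustrative assumptions.

```python
# Sketch of zero-shot CoT: append a reasoning trigger (or your own custom
# reasoning steps) to the question. The example problem is illustrative.

question = "A shop sells pens at 3 dollars each. If I buy 4 pens and pay with a 20-dollar bill, how much change do I get?"

zero_shot_cot = f"{question}\nLet's think step by step."

custom_steps = (
    f"{question}\n"
    "Reason in these steps: (1) compute the total cost, "
    "(2) subtract it from the amount paid, (3) state the change on the last line as 'Answer: <number>'."
)
print(zero_shot_cot)
print(custom_steps)
```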
Self-consistency CoT
*https://arxiv.org/abs/2203.11171
- Trigger the model's CoT, generate multiple times, and collect the results of multiple reasoning paths. This resembles natural human reasoning: if several different lines of thought all point to the same answer, that answer is more likely to be correct.
- To obtain diverse reasoning paths, the paper uses T=0.5 & k=40 for UL2-20B and LaMDA-137B, and T=0.7 & k=40 for PaLM-540B, with 10 runs and 40 samples in total.
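A minimal sketch of the self-consistency voting step: sample several CoT completions at a non-zero temperature, extract each final answer, and take the majority. The canned completions and the "Answer:" extraction pattern are illustrative assumptions standing in for real sampled outputs.

```python
# Sketch of self-consistency: majority-vote over the final answers of several
# sampled reasoning paths. The completions below are canned stand-ins for
# outputs you would sample from the model at temperature > 0.
import re
from collections import Counter

sampled_completions = [
    "First, 4 pens cost 12 dollars. 20 - 12 = 8. Answer: 8",
    "Total cost is 3 * 4 = 12, change is 20 - 12 = 8. Answer: 8",
    "3 + 4 = 7, so 20 - 7 = 13. Answer: 13",   # an inconsistent path gets outvoted
]

def final_answer(text: str) -> str | None:
    match = re.search(r"Answer:\s*([^\n]+)", text)
    return match.group(1).strip() if match else None

votes = Counter(a for a in map(final_answer, sampled_completions) if a is not None)
print(votes.most_common(1)[0])  # -> ('8', 2)
```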
ToT - Tree of Thought
*https://arxiv.org/abs/2305.10601
- Unlike self-consistency CoT, ToT generates multiple reasoning paths/plans, has the LLM vote on them, and then uses the best plan for further generation.

ReAct - Combine Reasoning and Action-taking
*https://arxiv.org/abs/2210.03629
- Reasoning lets the model induce, track, and update an action plan, while Action lets the model obtain additional information from external sources. ReAct (Reasoning + Action) has therefore been applied to a range of natural-language and decision-making tasks, showing performance superior to the SOTA while improving human interpretability and trustworthiness.

Iterating further toward Agents - equipping the LLM with tools and peripherals.
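A compact sketch of the ReAct loop described above: the model alternates Thought/Action lines, an external tool supplies Observations, and the loop stops at a Final Answer. The scripted model outputs and the calculator tool are illustrative stand-ins; a real implementation would call an LLM at each step.

```python
# Sketch of a ReAct loop: Thought -> Action -> Observation, repeated until a
# final answer. The "model" below is scripted for illustration; in practice
# each step would be a real LLM completion that sees the growing transcript.
import re

def toy_calculator(expression: str) -> str:
    """Illustrative tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))  # demo only, not for untrusted input

SCRIPTED_STEPS = [  # stand-ins for LLM outputs
    "Thought: I need to compute the total cost first.\nAction: calculator[3 * 4]",
    "Thought: Now subtract from the amount paid.\nAction: calculator[20 - 12]",
    "Final Answer: 8 dollars in change.",
]

def react_loop(question: str) -> str:
    transcript = f"Question: {question}\n"
    for step in SCRIPTED_STEPS:          # real version: step = llm(transcript)
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step
        action = re.search(r"Action: calculator\[(.+)\]", step)
        if action:
            observation = toy_calculator(action.group(1))
            transcript += f"Observation: {observation}\n"
    return transcript

print(react_loop("I buy 4 pens at 3 dollars each and pay 20 dollars. How much change?"))
```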
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Amazon Bedrock: the easiest way to build and scale generative AI applications with foundation models
- Choice of models, model customization, agents that execute tasks, security and privacy, responsible AI; available in overseas (non-China) AWS Regions.

A broad choice of models:
- Anthropic: Claude 2, Claude 2.1, Claude Instant
- AI21 Labs: Jurassic-2 Ultra, Jurassic-2 Mid
- Amazon: Titan Text Embeddings, Titan Multimodal Embeddings, Titan Text Lite, Titan Text Express, Titan Image Generator
- Meta: Llama 2 13B, Llama 2 70B
- Cohere: Command + Embed, Cohere Command Light, Cohere Embed English, Cohere Embed Multilingual
- Stability AI: Stable Diffusion XL 1.0

Claude 2 / Claude Instant
- Claude is designed for dialogue, summarization, Q&A, workflow automation, coding, and more. It supports complex conversations, creative content generation, and detailed instruction following. The max-100K-token version's larger context length reduces the need to fine-tune the foundation model.
- Token length: max 100K. Supported languages: multilingual.

Claude 2 highlights
- Accepts up to 100K tokens per conversation task, meaning hundreds of pages of text, or even an entire book, can be processed at once; useful for large volumes of production data, papers, and so on; more context improves output quality.
- Improvements in coding, math, and reasoning: 71.2% on the CodexHumanEval Python test (Claude V1 scored 56%); 90th percentile on the GRE reading and writing sections.
- Improved safety: outputs are more harmless, avoiding offensive or dangerous content as much as possible (Constitutional AI); in red-teaming evaluations it performs 2x better than Claude V1.3.

Claude 2.1
- Industry-leading 200K-token context window, 2x larger than Claude 2.0.
- 25% lower cost per prompt completion on Bedrock.
- Significantly better accuracy in long-context use cases; more input supports processing large amounts of information such as entire codebases, financial statements, and internal datasets.
- 50% lower hallucination rate: in open-domain dialogue, false statements are cut in half; in document-grounded Q&A (RAG), the hallucination rate drops 3-4x. Claude is increasingly aware of what it does not know, so it may answer "I don't know" or push back instead of hallucinating.

Claude 2.1 - System prompts provide customized, stable output
- Give Claude custom instructions, such as task descriptions, output style, generation rules, or a role to play, to improve performance in customer scenarios. Key advantages include: staying in character over longer conversations, better adherence to rules and instructions, and more creative, natural behavior.
Claude Prompt Engineering Best Practices - Guidance (Text): Anthropic Claude

"Human:" / "Assistant:" format
- Claude is trained on alternating "Human:" / "Assistant:" dialogue:
  Human: Instructions
  Assistant: Claude's response
- For any API prompt, you must start with "Human:" and end with "Assistant:". Also include two new lines before each.
Example:
  Human: Why is the sky blue?
  Assistant:
  Human: Why are sunsets orange?
  Assistant:
*Symbols above shown for illustration.
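A tiny helper illustrating the required "\n\nHuman: ... \n\nAssistant:" wrapping for Claude text-completion prompts; the helper name and the validation check are illustrative assumptions.

```python
# Sketch: wrapping a message into the "\n\nHuman: ... \n\nAssistant:" format
# that Claude's text-completion prompts require. Helper name and check are
# illustrative assumptions.

def to_claude_prompt(user_message: str) -> str:
    prompt = f"\n\nHuman: {user_message}\n\nAssistant:"
    # Every turn is preceded by two newlines; the prompt must end with "Assistant:".
    assert prompt.startswith("\n\nHuman:") and prompt.endswith("\n\nAssistant:")
    return prompt

print(repr(to_claude_prompt("Why is the sky blue?")))
```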
Be clear and direct
- Claude responds best to clear and direct instructions.
- When in doubt, follow the Golden Rule of Clear Prompting: show your prompt to a friend and ask them if they can follow the instructions themselves and produce the exact result you're looking for.
  Human: Write a haiku about robots
  Assistant: Here is a haiku about robots: Metal bodies move / Circuits calculate tasks / Machines mimic life
Example:
  Human: Write a haiku about robots. Skip the preamble; go straight into the poem.
  Assistant: Metal bodies move / Circuits calculate tasks / Machines mimic life

Assign a role (Role Prompting)
- Claude sometimes needs context about what role it should inhabit. Assigning roles changes Claude's response in two ways: improved accuracy in certain situations (such as mathematics), and a changed tone and demeanor to match the specified role.
  Human: How is the U.S. President elected?
  Assistant: The president is not voted on directly by the people. Instead, each state gets a certain number of electors based on its population. The states with more people get more electors. When people vote on Election Day in November, they are actually voting for which candidate they want their state's electors to...
Example:
  Human: You are a kindergarten teacher. How is the U.S. President elected?
  Assistant: The president is the leader of our country. Every four years, we have a presidential election to choose a new president or re-elect the current president. During the election, people called candidates try to become president by getting votes. On Election Day in November, people go to voting places...
Use XML tags
- Disorganized prompts are hard for Claude to understand. Just as section headings and topics help people keep track of information, XML tags such as <tag></tag> help Claude understand the structure of a prompt.
  Human: Hey Claude. Show up at 6AM because I say so. Make this email more polite.
  Assistant: Dear Claude, I hope this message finds you well...
Example (with the email wrapped in XML tags):
  Human: <email>Hey Claude. Show up at 6AM because I say so.</email> Make this email more polite.
  Assistant: Good morning team, I hope you all had a restful weekend...
- We recommend using XML tags because Claude has been specifically trained on XML tags.
PE Best Practices - Case 1 (Text): Translation

PE Best Practices - Case 2 (Text): Comment Classification
Baseline prompt (ACC: 65%):
  Human: Assume you are an analyst in an e-commerce company. Your work is to analyze customer messages and classify each message into one category. Here are all possible categories: ... Here are descriptions to each category: ... Here is a customer message: ... Put your predicted category. \n\nAssistant:
Improved prompt (ACC: 73%):
  Human: Assume you are an analyst in an e-commerce company. Your work is to analyze customer messages and classify each message into one category. Here are all possible categories: labels. Here are descriptions to each category: desc. Here is a customer message: text. Put your predicted category in the tag. \n\nAssistant: Based on the customer message and category descriptions, I would classify this message into the
Further optimized prompt (ACC: 82%):
  Human: Assume you are an analyst in an e-commerce company. Your work is to analyze customer messages and classify each message into one category. Here are all possible categories: labels. Here are descriptions to each category: desc. Here is a customer message: text. Put your predicted category in the tag. \n\nAssistant: Based on the customer message and category descriptions, I would classify this message into the
Successive improvements (XML tags, category descriptions, prompt optimization) raise accuracy: ACC 65% -> 73% -> 82%.
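A minimal sketch of the XML-tagged classification pattern from the case above, with a prefilled Assistant turn and extraction of the predicted category from the reply; the tag names (<labels>, <desc>, <text>, <category>) and the parsing regex are illustrative assumptions, since the slide's exact tags were not preserved in the extraction.

```python
# Sketch of the XML-tagged classification prompt from the case study, with a
# prefilled Assistant turn. Tag names (<labels>, <desc>, <text>, <category>)
# are illustrative assumptions; the slide's exact tags were not preserved.
import re

labels = ["Refund", "Complaint", "Ask for help"]
desc = "Refund: wants money back. Complaint: unhappy with service. Ask for help: needs guidance."
text = "My parcel never arrived and nobody answers my emails."

prompt = (
    "\n\nHuman: Assume you are an analyst in an e-commerce company. "
    "Your work is to analyze customer messages and classify each message into one category.\n"
    f"Here are all possible categories: <labels>{', '.join(labels)}</labels>\n"
    f"Here are descriptions to each category: <desc>{desc}</desc>\n"
    f"Here is a customer message: <text>{text}</text>\n"
    "Put your predicted category in <category></category> tags."
    "\n\nAssistant: Based on the customer message and category descriptions, "
    "I would classify this message into the"
)

def extract_category(reply: str) -> str | None:
    match = re.search(r"<category>(.*?)</category>", reply, re.DOTALL)
    return match.group(1).strip() if match else None

print(prompt)
print(extract_category(" <category>Complaint</category>"))  # -> "Complaint"
```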
Separating data from instructions
- Including input data directly in the prompt can make the prompt long and hard to troubleshoot. Separating the prompt structure from the input data gives you: a prompt that is easier to edit, and faster processing of multiple datasets.
Example:
  Human: I will tell you the name of an animal. Please respond with the noise that animal makes. ANIMAL
  Assistant:
Tip: although not always required, we recommend using XML tags to separate the dataset so that Claude can parse it more easily.
Input data: Cow, Dog, Seal
Prompt template: Please respond with the noise that animal makes. ANIMAL
Complete prompts:
  Please respond with the noise that animal makes. Cow
  Please respond with the noise that animal makes. Dog
  Please respond with the noise that animal makes. Seal
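A short sketch of the template/data separation described above, filling one prompt template with each item in the dataset; the placeholder syntax and the XML wrapper are illustrative assumptions.

```python
# Sketch of separating data from instructions: one template, many inputs.
# The {animal} placeholder and the <animal> XML wrapper are illustrative.

TEMPLATE = "Please respond with the noise that animal makes. <animal>{animal}</animal>"

input_data = ["Cow", "Dog", "Seal"]

complete_prompts = [TEMPLATE.format(animal=a) for a in input_data]
for p in complete_prompts:
    print(p)
```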
PE Best Practices - Case 3 (Text): RAG with Reference
(Prompt components: Instruction, Context, Input Data, Output Indicator)

Human: I'm going to give you a document. Then I'm going to ask you a question about it. I'd like you to first write down exact quotes of parts of the document that would help answer the question.
Here is the document: context
Here is the first question: question
First, answer the question. Do not include or reference quoted content verbatim in the answer. Don't say "According to Quote [1]" when answering. Instead, make references to quotes relevant to each section of the answer solely by adding their bracketed numbers at the end of the relevant sentences.
Then, find the quotes from the document that are most relevant to answering the question, and then print them in numbered order. Please add a \n for each quote. Quotes should be extracted from the original context and you cannot modify them. If there are no relevant quotes, write "No relevant quotes" instead.
Thus the format of your overall output should look like what's shown between the example tags, and should be included in tags. Make sure to follow the formatting and spacing exactly.
  Company X earned $12 million in 2021. [1] Almost 90% of it was from widget sales. [2]
  Relevant quotes:
  [1] Company X reported revenue of $12 million in 2021.
  [2] Almost 90% of revenue came from widget sales, with gadget sales making up the remaining 10%.
If the question cannot be answered by the document, say so. Answer the question immediately without preamble.
Once again, my question is: question
Assistant:
Formatted output & speaking for Claude
You can get Claude to say exactly what you need by:
- Specifying the exact output format you want.
- Speaking for Claude: explicitly writing the beginning of Claude's reply in the prompt (after "Assistant:").
Example:
  Prompt:
    Human: Please write a haiku about a cat. Put it in tags.
    Assistant:
  Claude's response:
    Curious little cat
    Pouncing on strings and lasers
    Naps in sunny spots
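A minimal sketch of "speaking for Claude": prefilling the start of the Assistant turn and asking for the output inside a tag so it can be pulled out afterwards; the <haiku> tag name, the prefilled text, and the stand-in reply are illustrative assumptions.

```python
# Sketch of "speaking for Claude": prefill the start of the Assistant turn and
# ask for the output inside a tag so it is easy to extract afterwards.
# The <haiku> tag and the prefilled text are illustrative assumptions.

prompt = (
    "\n\nHuman: Please write a haiku about a cat. Put it in <haiku> tags."
    "\n\nAssistant: <haiku>"  # Claude continues directly from this prefill
)

# Stand-in for the model's continuation of the prefilled turn:
reply = "\nCurious little cat\nPouncing on strings and lasers\nNaps in sunny spots\n</haiku>"

haiku = reply.split("</haiku>")[0].strip()
print(haiku)
```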
Using examples
- Examples are the most effective tool for getting Claude to behave well.
- Make sure to give Claude examples of common cases.
- Generally, more examples = more reliable responses (the side effect is higher latency and more token consumption).
Example:
  Human: Quote: "When the reasoning mind is forced to confront the impossible again and again, it has no choice but to adapt." - N.K. Jemisin, The Fifth Season
  Author: N.K. Jemisin
  Quote: "Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard." - Ted Chiang, Exhalation
  Author:
  Assistant: Ted Chiang
Sometimes, examples are so powerful that they don't even need to be accompanied by task instructions!

Cooperate & co-work with your LLM assistant
Recommendations
- Provide valuable few-shot samples.
- Experiment and iterate; hold out a test set to validate the results.
- Combine different thought structures to unlock further potential (chaining, bootstrapping, boosting, bagging, stacking, ...).
- Divide and conquer: break complex tasks into subtasks.
- Point out problems in earlier outputs and let the LLM iterate further.
Limitations
- Hallucination cannot be avoided absolutely (it can be mitigated by designing prompts and providing context).
- Because of LLM training costs, the content an LLM is trained on naturally lags behind (new content can be injected via RAG).
- An LLM cannot execute actions, but it can plan them (use tools and chains to turn the LLM into an LLM agent, e.g., Amazon Bedrock Agents); equip the LLM with tools & chains.

Summary: Good Prompting Matters