Capability Alignment and Beyond for LLM-based Agents (大模型智能體能力對齊與超越)
Tao Gui, Fudan University, 2024/10/15

What is An Agent?
"If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation." (Denis Diderot, 1875)

Agent in Philosophy
- Agency: individuality, asymmetry, normativity.
- Generally speaking: entities with the capacity to act.
- Narrowly speaking: entities possessing desires, beliefs, intentions, and the ability to take actions.

OpenAI's Mission & Goal
https:/
"... thus building a living metric which measures how well an agent can achieve its user's intended goal in a wide range of environments."

What is an AI Agent?
Agents: artificial entities that are capable of perceiving their surroundings using sensors, making decisions, and then taking actions in response using actuators [1, 2].
- Perceiving surroundings
- Making decisions
- Taking actions
[1] Russell, S.J. Artificial Intelligence: A Modern Approach. Pearson Education, Inc., 2010.
[2] Wooldridge, M.J., N.R. Jennings. Intelligent agents: theory and practice. Knowl. Eng. Rev., 10(2):115-152, 1995.
Whom does the agent serve?

What is Alignment? (from "Training language models to follow instructions with human feedback")
- Helpful: follow instructions; ask relevant follow-up questions and obtain necessary details; redirect ill-informed requests.
- Honest: know who it is, and what it can and cannot do or know.
- Harmless: refuse inappropriate requests.

Two Steps of RLHF Alignment (from "Training language models to follow instructions with human feedback")
1. Preference modeling
2. Alignment training

Why alignment training is hard
- The language environment; reward design; the optimization algorithm.

PPO-Max for Stable Training
1. Evaluation metrics to monitor the training process
2. Implementation details in PPO
3. The PPO-max setup
Technical report: "Secrets of RLHF in Large Language Models Part I: PPO".
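The report's focus is what keeps PPO stable in the language setting. As a minimal sketch of two standard ingredients involved, a KL-penalized and clipped reward plus a clipped policy update with advantage whitening, here is an illustrative PyTorch fragment; the tensor shapes and coefficients are assumptions, not the report's exact PPO-max recipe.

```python
# Minimal sketch: clipped PPO surrogate with advantage whitening, plus a
# KL-penalized, clipped reward. Shapes and coefficients are illustrative only.
import torch

def ppo_policy_loss(logprobs, old_logprobs, advantages, clip_ratio=0.2):
    # Advantage whitening: normalize per batch to stabilize the update scale.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    ratio = torch.exp(logprobs - old_logprobs)           # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio) * advantages
    return -torch.min(unclipped, clipped).mean()         # maximize the surrogate

def shaped_reward(rm_score, kl_to_sft, kl_coef=0.1, reward_clip=5.0):
    # Reward-model score minus a KL penalty to the SFT policy, then clipped
    # so a miscalibrated reward model cannot blow up the update.
    return torch.clamp(rm_score - kl_coef * kl_to_sft, -reward_clip, reward_clip)

# Toy usage with random tensors standing in for per-sequence statistics.
new_lp = torch.randn(8, requires_grad=True)
old_lp = new_lp.detach() + 0.05 * torch.randn(8)
adv = shaped_reward(torch.randn(8), torch.rand(8))       # crude stand-in
ppo_policy_loss(new_lp, old_lp, adv).backward()
```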
LLMs with Competitive Self-Play
- Competitive self-play | OpenAI
- "Toward Optimal LLM Alignments Using Two-Player Games"

Characteristics of the O1 model series
- Plan first: generate a plan, then answer.
- O1 results: test-time scaling, XOT.
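In the two-player formulation, an adversarial prompter and a defensive responder improve against each other. Here is a minimal, fully stubbed sketch of that loop; the three callables stand in for the prompter, the responder, and the reward model, and none of this is the paper's actual setup.

```python
# Fully stubbed two-player loop: a prompter proposes hard prompts, a responder
# answers, and a judge scores the exchange. The zero-sum signal trains both.
import random

def prompter(_history):                 # stub adversary
    return random.choice(["easy question", "tricky jailbreak attempt"])

def responder(prompt):                  # stub defender
    return "refusal" if "jailbreak" in prompt else "helpful answer"

def judge(prompt, response):            # stub reward model: high = safe + helpful
    return 1.0 if response in ("refusal", "helpful answer") else -1.0

for step in range(4):
    p = prompter([])
    r = responder(p)
    score = judge(p, r)
    # The responder maximizes `score`; the prompter maximizes `-score`, i.e.,
    # it is rewarded for exposing failures. RL updates would go here.
    print(step, p, "->", r, "| responder reward:", score)
```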
Process Supervision
- "Improving mathematical reasoning with process supervision" | OpenAI

LLMs with Process Supervision
- https:/ "... from a Single Demonstration"
- Reversed curriculum makes learning easier.
- R3: Reinforcement Learning for Reasoning with Reversed Curriculum, ICML 2024.

LLMs with Environment Feedback
- StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback, ACL 2024.
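StepCoder's training signal comes from actually executing generated code. As a sketch in that spirit, here is a generic pass/fail unit-test reward; it is not the paper's curriculum of code completion subtasks or its fine-grained optimization, just the execution-feedback idea in miniature.

```python
# Minimal sketch of an executor-feedback reward for code RL: run the candidate
# against tests in a subprocess and map the outcome to a scalar reward.
import os, subprocess, sys, tempfile

def execution_reward(candidate_code: str, test_code: str, timeout: float = 5.0) -> float:
    """+1 if the candidate passes the tests, 0 on test failure,
    -1 on timeout or a program that does not even parse."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return -1.0
    finally:
        os.remove(path)
    if proc.returncode == 0:
        return 1.0
    return -1.0 if b"SyntaxError" in proc.stderr else 0.0

print(execution_reward("def add(a, b):\n    return a + b\n",
                       "assert add(2, 3) == 5\n"))   # -> 1.0
```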
LLaVA (Visual Instruction Tuning)

This symbolic representation allows us to encode the image as an LLM-recognizable sequence. We use COCO images [28] and generate three types of instruction-following data. One example per type is shown in the bottom block of Table 1. For each type, we first manually design a few examples. They are the only human annotations we have during data collection, and are used as seed examples in in-context learning to query GPT-4.

- Conversation. We design a conversation between the assistant and a person asking questions about this photo. The answers are in a tone as if the assistant is seeing the image and answering the question. A diverse set of questions is asked about the visual content of the image, including the object types, counting the objects, object actions, object locations, and relative positions between objects. Only questions that have definite answers are considered. Please see Table 10 for the detailed prompt.
- Detailed description. To include a rich and comprehensive description for an image, we create a list of questions with such an intent. We prompt GPT-4 then curate the list, which is shown in Table 9 in the Appendix. For each image, we randomly sample one question from the list to ask GPT-4 to generate the detailed description.
- Complex reasoning. The above two types focus on the visual content itself, based on which we further create in-depth reasoning questions. The answers typically require a step-by-step reasoning process by following rigorous logic.

We collect 158K unique language-image instruction-following samples in total, including 58K in conversations, 23K in detailed description, and 77K in complex reasoning, respectively. We ablated the use of ChatGPT and GPT-4 in our early experiments, and found that GPT-4 can consistently provide higher-quality instruction-following data, such as spatial reasoning.

4 Visual Instruction Tuning
4.1 Architecture
The primary goal is to effectively leverage the capabilities of both the pre-trained LLM and visual model. The network architecture is illustrated in Figure 1. We choose LLaMA as our LLM f_φ(·), parameterized by φ, as its effectiveness has been demonstrated in several open-source language-only instruction-tuning works [43, 45, 34].

[Figure 1: LLaVA network architecture. An input image X_v passes through the vision encoder to give Z_v, then the projection W to give H_v; the language instruction X_q is embedded as H_q; the language model produces the response X_a.]

For an input image X_v, we consider the pre-trained CLIP visual encoder ViT-L/14 [36], which provides the visual feature Z_v = g(X_v). The grid features before and after the last Transformer layer are considered in our experiments. We consider a simple linear layer to connect image features into the word embedding space. Specifically, we apply a trainable projection matrix W to convert Z_v into language embedding tokens H_v, which have the same dimensionality as the word embedding space in the language model:

    H_v = W · Z_v, with Z_v = g(X_v)    (1)

Thus we have a sequence of visual tokens H_v. Note that our simple projection scheme is lightweight and cost-effective, which allows us to iterate data-centric experiments quickly. More sophisticated (but expensive) schemes to connect the image and language representations can also be considered, such as gated cross-attention in Flamingo [2] and the Q-Former in BLIP-2 [25], or other vision encoders such as SAM [21] that provide object-level features. We leave exploring possibly more effective and sophisticated architecture designs for LLaVA as future work.

Multimodal
- Vocabulary (token embedding alignment)
- PT (W): CC3M, 595K image-text pairs
- SFT (ViT): 150K
- CLIP ViT-L/14; LLaMA-13B; Vicuna-13B
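A minimal sketch of the connector in Eq. (1): a single trainable linear layer W mapping ViT grid features into the LLM word-embedding space. The dimensions are illustrative but plausible (ViT-L/14 features are 1024-d; LLaMA-13B embeddings are 5120-d).

```python
# Minimal sketch of the LLaVA-style linear connector: H_v = W * Z_v.
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    def __init__(self, vision_dim=1024, lm_dim=5120):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_dim, bias=False)  # the matrix W

    def forward(self, z_v: torch.Tensor) -> torch.Tensor:
        # z_v: (batch, num_patches, vision_dim) grid features from the ViT
        return self.proj(z_v)  # H_v, same width as the LM's word embeddings

z_v = torch.randn(1, 256, 1024)   # e.g., a 16x16 patch grid
h_v = VisionProjector()(z_v)
print(h_v.shape)                  # torch.Size([1, 256, 5120])
```

The design point the paper makes is that this connector is deliberately cheap, so data-centric experiments iterate quickly; heavier alternatives (gated cross-attention, a Q-Former) trade that speed for expressiveness.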
Insufficient spatial capability

What's "up" with vision-language models? Investigating their struggle with spatial reasoning
Amita Kamath (1), Jack Hessel (2), Kai-Wei Chang (1)
(1) University of California, Los Angeles; (2) Allen Institute for AI
{kamatha,kwchang}@cs.ucla.edu, jackh@allenai.org

Abstract
Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"? We curate three new corpora to quantify model comprehension of such basic spatial relations. These tests isolate spatial reasoning more precisely than existing datasets like VQAv2, e.g., our What'sUp benchmark contains sets of photographs varying only the spatial relations of objects, keeping their identity fixed (see Figure 1: models must comprehend not only the usual case of a dog under a table, but also the same dog on top of the same table). We evaluate 18 VL models, finding that all perform poorly, e.g., BLIP fine-tuned on VQAv2, which nears human parity on VQAv2, achieves 56% accuracy on our benchmarks vs. humans at 99%. We conclude by studying causes of this surprising behavior, finding: 1) that popular vision-language pretraining corpora like LAION-2B contain little reliable data for learning spatial relationships; and 2) that basic modeling interventions like up-weighting preposition-containing instances or fine-tuning on our corpora are not sufficient to address the challenges our benchmarks pose. We are hopeful that these corpora will facilitate further research, and we release our data and code at https:/

[Figure 1: Three tightly controlled benchmarks assess model capacity for fine-grained spatial reasoning; popular vision-language models fall far behind human performance when asked to select the correct spatial relation between two objects in an image (real examples shown, with caption options such as "A dog on a table", "A dog under a table", "A dog left of a table", "A dog right of a table").]

1 Introduction
Pre-trained vision-language models perform well on complex tasks such as VQAv2 (Goyal et al., 2016) and NoCaps (Agrawal et al., 2019), even in the zero-shot setting (Li et al., 2023). However, recent work has re-surfaced a concern that has long plagued vision-language models (Yatskar et al., 2016; Johnson et al., 2017): new multimodal models still exhibit poor behavior on simple tasks like attribute attachment, counting, etc. (Yamada et al., 2022; Thrush et al., 2022; Yuksekgonul et al., 2023; Parcalabescu et al., 2021). Despite improvements, models still fail to reliably capture even basic spatial factors of images, a prerequisite for more precise and complex reasoning benchmarks. But why? In this work, we study vision-language models' performance on basic spatial relations, such as "left of" and "right of". Existing benchmarks which aim to operationalize spatial understanding, such as VQAv2 and GQA (Hudson and Manning, 2019), often conflate the evaluation of spatial reasoning with other types of reasoning, such as in the GQA question "Is there a woman to the left of the person that is wearing a wetsuit?".

arXiv:2310.19785 [cs.CL], 30 Oct 2023
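A hedged sketch of the kind of controlled probe What'sUp performs: score one image against captions that differ only in the spatial preposition, and check whether the model ranks the correct one highest. The checkpoint name is a common public CLIP model used for illustration, and the blank PIL image stands in for a real benchmark photo.

```python
# Minimal sketch: rank preposition-only caption variants with CLIP.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))   # stand-in for a benchmark photo
captions = ["A dog under a table", "A dog on a table",
            "A dog left of a table", "A dog right of a table"]

inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # shape (1, num_captions)
pred = captions[logits.argmax(-1).item()]
# Benchmark accuracy = fraction of images whose correct caption ranks first.
print("model's choice:", pred)
```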
Insufficient fine-grained capability

Table 1: Comparison of visual tokenizers of ViT-B with different pretraining strategies. The best result is bold while the second best is underlined (in the original).

| Joint tuning | Supervision | Visual tokenizer | #Pretraining images | VQA Acc | Captioning CIDEr | Captioning SPICE | OC Acc | MCI Acc | Avg |
|---|---|---|---|---|---|---|---|---|---|
| w/o | Fully | DeiT [16] | 1.28M | 48.3 | 65.8 | 15.9 | 37.5 | 83.6 | 58.8 |
| w/o | Self | DINO [19] | 1.28M | 50.1 | 45.0 | 13.5 | 46.5 | 80.8 | 55.6 |
| w/o | Self | MAE [18] | 1.28M | 48.4 | 37.3 | 11.8 | 47.5 | 82.7 | 53.4 |
| w/o | Self | DINOv2 [20] | 142M | 51.3 | 67.9 | 16.1 | 47.0 | 86.0 | 63.1 |
| w/o | Weakly | CLIP [17] | 400M | 52.2 | 69.3 | 16.6 | 42.5 | 86.0 | 62.5 |
| w/ | Fully | DeiT [16] | 1.28M | 50.7 | 38.4 | 10.0 | 41.0 | 86.9 | 54.3 |
| w/ | Self | DINO [19] | 1.28M | 47.3 | 54.1 | 14.5 | 44.5 | 86.6 | 58.1 |
| w/ | Self | MAE [18] | 1.28M | 48.9 | 48.0 | 14.2 | 47.5 | 88.7 | 58.2 |
| w/ | Self | DINOv2 [20] | 142M | 50.5 | 49.6 | 13.0 | 43.5 | 84.1 | 56.9 |
| w/ | Weakly | CLIP [17] | 400M | 47.7 | 64.2 | 15.4 | 45.5 | 88.0 | 61.4 |
2.2 Comparing Visual Tokenizers

On GVTBench, we evaluate visual tokenizers with the same architecture (ViT-B [34]) but different pretraining strategies, including fully supervised (DeiT [16]), self-supervised (DINO [19], DINOv2 [20], MAE [18]), and text-guided weakly supervised (CLIP [17]) pretraining. Based on the results in Table 1, we arrive at the following conclusions.

Fully/weakly supervised models capture more semantics than self-supervised ones, but the gap is narrowed by scaling up the pretraining dataset. With tokenizers pretrained on a relatively small-scale dataset (i.e., ImageNet-1k [35] with 1.28M images), DeiT demonstrates better image captioning performance (65.8 CIDEr) than the self-supervised models DINO (45.0) and MAE (37.3), without jointly tuning the visual tokenizer. However, with 142M images for pretraining, the self-supervised model DINOv2 outperforms the supervised DeiT on image captioning (67.9) and VQA (51.3), and is only inferior to CLIP, which is pretrained with weak supervision from a large-scale dataset with 400M image-text pairs. This indicates that supervision is beneficial for semantic representation capability, but this can also emerge from large-scale pretraining with self-supervision.

Self-supervised models are better at fine-grained perception, where patch-level supervision is particularly effective. On fine-grained visual understanding tasks, i.e., OC and MCI, self-supervised models demonstrate consistently better performance than those with supervision. When they are jointly tuned on the instruction dataset, their OC and MCI performance is mostly boosted, indicating that their fine-grained visual perception capability gets improved. Among all the self-supervised models, MAE achieves the best performance, indicating that patch-based supervision is particularly effective for improving fine-grained visual understanding.

Tuning a semantic-rich visual tokenizer leads to semantic loss on a small-scale instruction tuning dataset. When the tokenizer is jointly optimized on the instruction tuning dataset, the rich semantics obtained from large-scale pretraining in CLIP and DINOv2 drop noticeably (e.g., CLIP VQA 52.2 -> 47.7 and DINOv2 captioning 67.9 -> 49.6). We conjecture this is due to the relatively small scale of our instruction dataset (5M vs. 142M). As such, for modern MLLMs that are often tuned on small-scale and high-quality instruction datasets [7; 8], jointly tuning the visual tokenizer may not be a good option.

3 Unifying Semantic and Fine-grained Visual Understanding
3.1 CLIP with Region-based Training
The generalist MLLMs call for a versatile visual tokenizer that could properly represent an image's content at multiple levels. However, based on the results in Table 1, none of the existing pretraining methods leads to a good visual tokenizer that excels at both semantic and fine-grained visual perception.
Fudan MouSi (復旦眸思)
Different experts each have their own strengths: can the experts collaborate?
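One way to read "can the experts collaborate" is to fuse several vision encoders into a single token stream for the LLM. Below is a minimal sketch of that idea, with stubbed expert features and fusion-by-concatenation chosen purely for illustration; it is not necessarily MouSi's actual design.

```python
# Minimal sketch: project features from multiple vision "experts" into the
# LLM embedding space and concatenate them along the token dimension.
import torch
import torch.nn as nn

class PolyExpertFusion(nn.Module):
    def __init__(self, expert_dims, lm_dim=4096):
        super().__init__()
        # One projector per expert, so differently sized features share a space.
        self.projs = nn.ModuleList([nn.Linear(d, lm_dim) for d in expert_dims])

    def forward(self, expert_feats):
        # expert_feats: list of (batch, tokens_i, dim_i), one per expert
        fused = [p(f) for p, f in zip(self.projs, expert_feats)]
        return torch.cat(fused, dim=1)   # (batch, sum of tokens_i, lm_dim)

feats = [torch.randn(1, 256, 1024),      # e.g., a CLIP-like expert
         torch.randn(1, 196, 768)]       # e.g., a DINO-like expert
tokens = PolyExpertFusion([1024, 768])(feats)
print(tokens.shape)                       # torch.Size([1, 452, 4096])
```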
AnyGPT: Overview
- Multimodal tokenizers: image: SEED; speech: SpeechTokenizer; music: Encodec.
- Autoregressive LM.
- De-tokenizers: image: diffusion model; speech: SoundStorm + SpeechTokenizer decoder; music: Encodec.

Multimodal Alignment Pretraining
- Image-to-text, speech-to-text, music-to-text.
- Text-to-image, text-to-speech, text-to-music.
- Image-text interleaved data.
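For a single autoregressive LM to cover all of these modalities, each tokenizer's discrete codes must live in one shared vocabulary. A minimal sketch of the usual offset scheme follows, with illustrative codebook sizes rather than AnyGPT's actual configuration.

```python
# Minimal sketch: map modality-local token IDs into one shared LM vocabulary
# by giving each modality its own offset. Sizes are illustrative only.
TEXT_VOCAB = 32000     # base text vocabulary (assumed)
IMAGE_CODES = 8192     # e.g., an image codebook size (assumed)
SPEECH_CODES = 1024    # e.g., a speech codebook size (assumed)

OFFSETS = {"text": 0, "image": TEXT_VOCAB, "speech": TEXT_VOCAB + IMAGE_CODES}

def to_unified(modality: str, ids: list[int]) -> list[int]:
    """Shift modality-local IDs into the shared vocabulary."""
    return [OFFSETS[modality] + i for i in ids]

def from_unified(ids: list[int]) -> list[tuple[str, int]]:
    """Map shared-vocabulary IDs back to (modality, local ID) pairs."""
    out = []
    for i in ids:
        if i < OFFSETS["image"]:
            out.append(("text", i))
        elif i < OFFSETS["speech"]:
            out.append(("image", i - OFFSETS["image"]))
        else:
            out.append(("speech", i - OFFSETS["speech"]))
    return out

seq = to_unified("text", [5, 17]) + to_unified("image", [3, 8191])
print(seq, from_unified(seq))
```

With this mapping, the LM sees one flat token stream; the de-tokenizers then reconstruct each modality from its local IDs.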
Multimodal Large Models

A growing demand for information accessibility
- By the end of 2023, visual impairment was the most common disability in China, affecting 28.565 million people.
- In 2023, China's population aged 60 and above reached roughly 300 million, and it will keep growing for some time to come.
- Key user groups: visually impaired people and the elderly.

An LLM-based agent for phone control

The "Hear the World" (聽見世界) Agent

LLMs with Various Feedback
[Figure: methods arranged by model capability (up to and beyond human level) versus the amount of human supervision required: SFT, RLHF, collaboration, weak-to-strong, environment feedback.]
The tension: model capability keeps growing while human supervision keeps shrinking.

LLMs with Environment Feedback
- Imitation learning; exploration learning.
- "Evolving Large Language Model based Agents across Diverse Environments"
LLMs with Environment Feedback
Three pillars to achieve our goal (a self-evolving loop sketch follows this list):
1. Interactive training platform: diverse environments and tasks that allow the agents to evolve dynamically and comprehensively, rather than being confined to an isolated world, which may limit generalization.
2. Base agent with basic abilities and prior knowledge: we need a trajectory set of an appropriate size to train a base agent with preliminary instruction-following abilities and knowledge. This facilitates further exploration, since in diverse, complex environments it would be extremely inefficient for an agent to learn everything from scratch through trial and error.
3. Self-evolving algorithm: an effective and flexible evolution method that can adapt to environments of varying difficulty and elicit the generalizing ability of LLM-based agents. This involves how the agent interacts with the environment and how it utilizes the feedback.
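A minimal sketch of the explore-then-learn loop these pillars support: behavioral cloning on a seed trajectory set, then rounds of sampling in the environment, keeping rewarded trajectories, and fine-tuning on the survivors. Everything below is stubbed; it names the shape of the loop, not AgentEvol's exact objective.

```python
# Fully stubbed self-evolution loop: rollout -> filter by reward -> imitate.
import random

def rollout(policy, env_task):                 # stub environment interaction
    actions = [policy(env_task) for _ in range(3)]
    reward = 1.0 if "search" in actions else 0.0
    return {"task": env_task, "actions": actions, "reward": reward}

def fine_tune(policy, trajectories):           # stub SFT update
    good = [t for t in trajectories if t["reward"] > 0]
    if not good:
        return policy                          # nothing worth imitating yet
    return lambda task: "search"               # "imitate" the successes

def policy(task):                              # stub base agent
    return random.choice(["search", "click", "noop"])

for _round in range(2):                        # self-evolution iterations
    trajs = [rollout(policy, f"task-{i}") for i in range(4)]
    policy = fine_tune(policy, trajs)
print(rollout(policy, "webshop: find a pillow")["reward"])
```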
LLMs with Environment Feedback
AgentGym: Online Interactive Training and Evaluation Platform

[Figure: the AgentGym pipeline. 1. Data preparation; 2. Behavioral cloning from a trajectory set to obtain a base imitation agent; 3. Exploring & learning (AgentEvol: evolve through exploration and feedback); 4. Multi-task evaluation (single-task and multi-task performance). Environment servers are exposed over HTTP to environment clients, covering Web (WebShop, WebArena), Embodied (AlfWorld, ScienceWorld, BabyAI), Game (TextCraft, MAZE/Wordle), Tool (Weather, Todo), Academia (Movie, Sheet), and Code (BIRD-SQL).]

Trajectory formats mix reasoning-and-acting data with general data, e.g.:
- Reasoning and acting: Instruction: "Find me a pillow with blue and ..." / Thought: "I think I should search for pillows" / Action: search[pillow] / Observation: "Results: ... Item 1 ..."
- General domain chat: Instruction: "Hello! Can you translate this into Chinese for me?" / Response: "Sure! Here's the translation ..."
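A minimal sketch of how one such reasoning-and-acting trajectory might be stored and flattened into SFT text; the field names are illustrative, not AgentGym's actual schema.

```python
# Minimal sketch: a ReAct-style trajectory record and its SFT serialization.
import json

trajectory = {
    "env": "WebShop",
    "instruction": "Find me a pillow with blue and ...",  # truncated as in the slide
    "turns": [
        {"thought": "I think I should search for pillows",
         "action": "search[pillow]",
         "observation": "Results: ... Item 1 ..."},
    ],
    "reward": 1.0,
}

def to_training_text(traj: dict) -> str:
    """Flatten one trajectory into the plain-text form used for cloning."""
    lines = [f"Instruction: {traj['instruction']}"]
    for t in traj["turns"]:
        lines += [f"Thought: {t['thought']}",
                  f"Action: {t['action']}",
                  f"Observation: {t['observation']}"]
    return "\n".join(lines)

print(to_training_text(trajectory))
print(json.dumps(trajectory)[:60], "...")
```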
Summary
[Closing diagram: an LLM connecting reasoning/planning, reality, and alignment, surrounded by the works covered in this talk: Stable PPO, GPO, R3, StepCoder, SPA-VL, AgentGym, ToolSword, MouSi (眸思), and Hear the World (聽世界).]

Thanks
FudanNLP