Hangyu Mao (毛航宇), RLChina23, Sunday morning: From Reinforcement Learning (Multi-)Agents to Large Language Model (Multi-)Agents (PDF)



From Reinforcement Learning (Multi-)Agents to Large Language Model (Multi-)Agents
Hangyu Mao (毛航宇), SenseTime
RLChina 2023, "Large Models and AI Agents"

Agenda
Single-agent
  DRL: SEIHAI (NeurIPS20, DAI21)
  TRL: TIT/PDiT (submitted to AAMAS24; arXiv 22.12)
  LLM-based Agent: TPTU (NeurIPS24-FMDM), TPTU-V2 (arXiv 23.11)
Multi-agent
  DRL: Gated-ACML (AAAI20), NCC-MARL (AAAI20)
  TRL: STEER (submitted to AAAI24; arXiv 23.05)
  LLM-based Agent: LLaMAC (arXiv 23.11)

SEIHAI
SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition
Motivation
Verifying that agents can keep learning in open-ended environments has become an important direction for AI.
Minecraft is a natural "proving ground".
SEIHAI is the first fully learning-based agent to reach the "Iron Age" in the NeurIPS MineRL Competition.
Why Minecraft is hard: item dependencies, sparse rewards combined with long episodes, and no semantic information.
Training the scheduler boils down to a classification task.

Gated-ACML
Learning Agent Communication under Limited Bandwidth by Message Pruning
Motivation
Multi-agent communication is a long-standing research topic that studies how, what, and to whom to communicate.
In practical problems, however, communication bandwidth is limited. How should agents communicate under limited bandwidth?
The limited-bandwidth problem is recast as message pruning.
Message pruning is in turn recast as binary classification.
How to set the threshold T: dynamically (as shown in the slide's figure) or statically (?).
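The reduction described on the Gated-ACML slides (limited bandwidth, then message pruning, then binary classification with a threshold T) can be illustrated with a small sketch. The slides do not say what quantity is thresholded or how a dynamic T is chosen, so the per-message "benefit" score, the `keep_fraction` rule, and every function name below are assumptions made for illustration only, not the authors' method.

```python
from typing import List


def prune_labels(benefits: List[float], threshold: float) -> List[int]:
    """Binary label per message: 1 = send it, 0 = prune it.

    `benefits` is a hypothetical per-message score (assumption: higher
    means the message helps the receivers more).
    """
    return [1 if b > threshold else 0 for b in benefits]


def dynamic_threshold(benefits: List[float], keep_fraction: float) -> float:
    """Pick T so that about `keep_fraction` of messages score above it.

    This percentile-style rule is an illustrative assumption; a static
    threshold would simply be a fixed constant instead.
    """
    if not benefits:
        raise ValueError("benefits must be non-empty")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    ordered = sorted(benefits, reverse=True)
    k = int(len(ordered) * keep_fraction)
    if k >= len(ordered):
        return float("-inf")  # every message exceeds the threshold
    return ordered[k]  # only scores strictly above this value are kept


# Toy scores for four candidate messages (made-up numbers).
BENEFITS = [0.9, 0.1, 0.5, 0.3]
STATIC_LABELS = prune_labels(BENEFITS, threshold=0.4)
DYNAMIC_T = dynamic_threshold(BENEFITS, keep_fraction=0.5)
DYNAMIC_LABELS = prune_labels(BENEFITS, DYNAMIC_T)
```

With a static T the number of messages sent varies with the scores, whereas the dynamic rule fixes the share of messages that fit the bandwidth budget. In a learned gate, labels like these would serve as classification targets.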

NCC-MARL
Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Motivation
How can multiple agents cooperate as well as humans do? What characterizes human cooperation? Cognitive consistency!
Slide labels: consistency; approximation; variational inference.

TIT/PDiT
Transformer in Transformer as Backbone for Deep Reinforcement Learning
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
Motivation
In the DRL era, network architecture was crucial: Dueling Network and Value Iteration Network won NIPS/ICML best paper awards.
In the TRL era, is network architecture equally crucial? The goal is to find a Transformer architecture suited to RL (perception-decision)!
Methods
From DT to vanilla-PDiT to PDiT (perception-decision interleaving Transformer).

STEER
Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems
Motivation
The multi-agent decision-making paradigm: synchronous or asynchronous?

TPTU
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Motivation
LLMs already show preliminary general knowledge and can be regarded as a general-purpose "world model".
How can LLMs empower agents (from an RL researcher's perspective)? Where are the key points?
Task Planning (analogous to long-term decision-making in RL)!
Tool Usage (analogous to the external environment in RL)!
Experiments
3.2 Evaluation on Task Planning Ability
3.2.1 TPTU-OA: Tool Order Planning
3.2.2 TPTU-OA: Tool Order Planning and Subtask Description Generation
3.2.3 TPTU-OA: The Planning of Tool-Subtask Pair
3.2.4 TPTU-OA: The Planning of Tool-Subtask Pair with Unrelated Tools
3.2.5 TPTU-SA: The Planning of Tool-Subtask Pair Generation
3.3 Evaluation on Tool Usage Ability
3.3.1 The Effectiveness of Single Tool Usage
3.3.2 TPTU-OA and TPTU-SA: Tool Usage for Multiple Tools
3.4 Insightful Observations

TPTU-V2
Motivation
The TPTU motivation carries over, with two challenges specific to real-world systems:
Task Planning (analogous to long-term decision-making in RL): tasks in real systems are too complex.
Tool Usage (analogous to the external environment in RL): real systems have many tools, and tool functionality overlaps semantically.
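The experiment headings on the TPTU slides mention planning "tool-subtask pairs", including the case where unrelated tools are present. A minimal sketch of what such a plan might look like as a data structure is given below. In an LLM-based agent the plan would be generated by the model; here it is hard-coded. The tool names (`sql`, `python`, `search`), the stand-in tool functions, and the helpers `unknown_tools` and `run_plan` are all hypothetical and are not taken from the TPTU papers.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Plan = List[Tuple[str, str]]  # ordered (tool name, subtask description) pairs


def unknown_tools(plan: Plan, tools: Dict[str, Tool]) -> List[str]:
    """Return tool names used by the plan that are not in the registry."""
    return [name for name, _ in plan if name not in tools]


def run_plan(plan: Plan, tools: Dict[str, Tool]) -> List[str]:
    """Execute each subtask with its paired tool, in plan order."""
    missing = unknown_tools(plan, tools)
    if missing:
        raise KeyError(f"plan references unknown tools: {missing}")
    return [tools[name](subtask) for name, subtask in plan]


# Stand-in tools that only tag their input (hypothetical; real tools
# would run a query, execute code, call an API, and so on).
TOOLS: Dict[str, Tool] = {
    "sql": lambda subtask: f"[sql] {subtask}",
    "python": lambda subtask: f"[python] {subtask}",
}

# A hard-coded plan standing in for what a planner model might emit.
PLAN: Plan = [
    ("sql", "fetch last month's orders"),
    ("python", "sum the order totals"),
]

RESULTS = run_plan(PLAN, TOOLS)
```

Checking the plan against the tool registry before execution is one simple guard against a planner that selects a tool it was never given.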

LLaMAC
Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach
Motivation
LLMs already show preliminary general knowledge and can be regarded as a general-purpose "world model".
How can LLMs empower multiple (large-scale) agents (from a MARL researcher's perspective)? Where are the key points?
Exploration and Exploitation
LLM Hallucination
Token Efficiency
Key Design: TripletCritic
1. Move exploration and exploitation (EE) into the centralized critic, turning N-agent EE into 2-agent EE. This lowers the cost of exploration and reduces non-stationarity.
2. Use the accessor to balance exploration and exploitation and to repair problems caused by hallucinations in critic 1/2.
3. Memory Redundant Information Filtering and Internal/External Feedback make the information more concise and accurate, reducing the number of iterations and therefore the number of tokens.

Summary
This talk covered the paradigm shift from reinforcement learning (multi-)agents to large language model (multi-)agents, along with some reflections on key techniques.
The closing slides extend the agenda (Single-Agent, Multi-Agent) with a Large-scale category, listing MF-MARL (ICML20), Debate, and Role Playing (arXiv?), and note that the paradigm is becoming increasingly fixed.

Conclusion
Research feasibility - future potential
Direction of technical development: the paradigm is becoming increasingly fixed.

References
SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition. DAI21.
Learning Agent Communication under Limited Bandwidth by Message Pruning. AAAI20.
Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning. AAAI20.
Transformer in Transformer as Backbone for Deep Reinforcement Learning. Arxiv22.
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning. Arxiv22.
Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems. Arxiv23.
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage. Arxiv23.
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems. Arxiv23.
Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach. Arxiv23.
