OpenRL: A Unified Reinforcement Learning Framework
Huang Shiyu (黃世宇), 4Paradigm

Speaker: Huang Shiyu, Reinforcement Learning Scientist at 4Paradigm and head of the open-source OpenRL Lab. He received both his bachelor's and PhD degrees from the Department of Computer Science at Tsinghua University, advised by Professors Jun Zhu and Ting Chen, and studied at CMU as an undergraduate exchange student under Professor Deva Ramanan. His main research interests are reinforcement learning, multi-agent reinforcement learning, and distributed reinforcement learning. He has published papers at conferences and in journals including ICLR, CVPR, AAAI, NeurIPS, Nature Machine Intelligence, ICML, AAMAS, and Pattern Recognition. TiZero, the Google Research Football agent developed under his leadership, ranked first on the Jidi (及第) platform. He has also worked at Tencent AI Lab, Huawei Noah's Ark Lab, SenseTime, and RealAI.

Contents
1. Background on Reinforcement Learning
2. Introduction to OpenRL
3. Future Development of OpenRL
4. Introduction to OpenPlugin

PART 01: Introduction & Motivation

What is Reinforcement Learning?
Goal of RL: Artificial General Intelligence (AGI)
Example: reinforcement learning in dog training.

What else? Robotics, Autonomous Driving (OpenAI 2019, CARLA 2017)
What else? Industrial Design, Quantitative Trading (PrefixRL 2022, FinRL 2020)
What else? Chat Bots
What else? Multi-agent RL, Competitive RL (TiZero 2023, Honor of Kings Arena 2022)

Do RL in a Unified Framework
- Various RL algorithms
- Various environments
- Multi-agent & self-play
- Offline RL
PART 02: OpenRL, An Open-Source RL Framework

Main Features of OpenRL: Friendly to beginners
Install with:
    pip install openrl
or:
    docker pull openrllab/openrl

Train CartPole from the command line:
    openrl --mode train --env CartPole-v1

Documentation (English and Chinese) and tutorials are available.
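As a sketch of the beginner-level Python API (following the quickstart in OpenRL's own documentation; the exact module paths may differ between OpenRL versions), training CartPole takes only a few lines:

    # Minimal OpenRL quickstart, a sketch based on the project's documentation.
    from openrl.envs.common import make
    from openrl.modules.common import PPONet as Net
    from openrl.runners.common import PPOAgent as Agent

    env = make("CartPole-v1", env_num=9)  # vectorized environment with 9 copies
    net = Net(env)                        # PPO network built from the env's spaces
    agent = Agent(net)                    # PPO agent wrapping the network
    agent.train(total_time_steps=20000)   # finishes in seconds on a laptop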
Main Features of OpenRL: Customizable capabilities for professionals
Configure everything via YAML:
    python train_ppo.py --config mpe_ppo.yaml
or pass hyperparameters directly on the command line:
    python train_ppo.py --seed 1 --lr 5e-4
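For illustration, a hypothetical mpe_ppo.yaml might collect the same hyperparameters shown on the command line above; the two keys below simply mirror those CLI flags, and the real file would follow OpenRL's config schema.

    # mpe_ppo.yaml, a hypothetical config sketch; keys mirror the CLI flags above
    seed: 1          # random seed (same as --seed 1)
    lr: 5.0e-4       # learning rate (same as --lr 5e-4)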
Track your experiments via Wandb.
Track your experiments via Tensorboard.
Customize the Wandb output: https:/

Abstract & modularized design:
- Reward Module
- Policy Module
- Value Module
- Algorithm
Customize the Reward Model
(Chen, Wenze, et al. "DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization." arXiv preprint arXiv:2207.05631 (2022).)

Example reward components for text generation, combined as in the sketch below:
- Intent Reward: when the text generated by the agent is close to the expected intent, the agent receives a higher reward.
- METEOR Metric Reward: METEOR is a metric for evaluating text-generation quality; it measures how similar the generated text is to the expected text. We use this metric as reward feedback to optimize the agent's text-generation performance.
- KL Divergence Reward: this reward limits how far the text generated by the agent deviates from the pre-trained model, preventing reward hacking.
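The three components above can be combined into a single scalar reward. Below is a minimal, hypothetical sketch of such a composition; the class, its interface, and the intent/METEOR scoring callables are illustrative stand-ins, not OpenRL's actual reward-module API.

    class CompositeTextReward:
        """Hypothetical composite reward for text generation; an illustrative
        sketch, not OpenRL's actual reward-module interface."""

        def __init__(self, intent_fn, meteor_fn, kl_coef=0.1):
            self.intent_fn = intent_fn  # stand-in: intent-match score in [0, 1]
            self.meteor_fn = meteor_fn  # stand-in: METEOR similarity in [0, 1]
            self.kl_coef = kl_coef      # weight of the KL penalty

        def __call__(self, text, reference, policy_logps, pretrained_logps):
            intent_r = self.intent_fn(text, reference)
            meteor_r = self.meteor_fn(text, reference)
            # The per-token log-prob gap approximates KL(policy || pretrained);
            # penalizing it keeps generations close to the pre-trained model
            # and guards against reward hacking.
            kl = sum(p - q for p, q in zip(policy_logps, pretrained_logps))
            return intent_r + meteor_r - self.kl_coef * kl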
Main Features of OpenRL: Support for Offline RL
- Learn from interaction
- Learn from expert data

Main Features of OpenRL: Customizable capabilities for professionals
- Dictionary observation space support
- Serial or parallel environment training
- Support for models such as LSTM, GRU, and Transformer
- Automatic mixed precision (AMP) training
- Data collection with a half-precision policy network (see the sketch below)
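To illustrate the last two items, here is a generic PyTorch sketch of AMP training plus a half-precision rollout copy of the policy; this follows the standard torch.cuda.amp recipe and is not OpenRL's internal implementation.

    import copy
    import torch
    import torch.nn.functional as F

    policy = torch.nn.Linear(8, 2).cuda()   # stand-in policy network
    optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)
    scaler = torch.cuda.amp.GradScaler()    # rescales the loss so fp16 gradients don't underflow

    def train_step(obs, target):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():     # run the forward pass in mixed precision
            loss = F.mse_loss(policy(obs), target)
        scaler.scale(loss).backward()       # backward on the scaled loss
        scaler.step(optimizer)
        scaler.update()

    @torch.no_grad()
    def make_rollout_policy():
        # Separate half-precision copy used only for data collection;
        # refresh it from the fp32 policy after each update.
        return copy.deepcopy(policy).half()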
Main Features of OpenRL: Build on top of others
- Datasets
- Models

Main Features of OpenRL: Gallery

Main Features of OpenRL: High performance
- Training CartPole on a laptop takes only a few seconds.
- +17% speedup for language-model training.
- Ranked 1st on Google Research Football.
- +43% performance improvement on LLM training compared with RL4LMs.

TiZero
(Lin, Fanqi, et al. "TiZero: Mastering Multi-Agent Football with Curriculum Learning and Self-Play." arXiv preprint arXiv:2302.07515 (2023).)
PART 03: Future Release

Large-Scale RL
- Large models
- Large clusters
- Large populations
(Yang, Xinyi, et al. "Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation." arXiv preprint arXiv:2302.04094 (2023).)

Open RL via Sharing
- Share models
- Share code
- Share results

Scan the QR code to try OpenRL!
Visit:

PART 04: OpenPlugin, a Plugin Store for LLMs
Why?
- Think about pip for Python packages (and apt/yum/brew/dnf/npm/...)!
- Think about the App Store.
- Standardize plugins.
- Provide a simple way to use and share LLM plugins.

Main Features of OpenPlugin: Installation
    pip install openplugin-py

Main Features of OpenPlugin: Usage
- Install a plugin: op install
- Install locally: op install ./
- Reinstall a plugin: op reinstall
- Uninstall a plugin: op uninstall
- Run a plugin: op run
- List installed plugins: op list

op is all you need!
Main Features of OpenPlugin: Usage
Provide a config API for the SageGPT/ChatGPT platforms:
- The JSON manifest can be fetched via: server_host/ai-plugin.json
- The YAML API spec can be fetched via: server_host/openapi.yaml
A sketch of such a manifest follows.
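For orientation, a ChatGPT-style ai-plugin.json manifest generally looks like the sketch below; all field values here are hypothetical placeholders, and the exact file OpenPlugin serves may differ.

    {
      "schema_version": "v1",
      "name_for_human": "QRcode Plugin",
      "name_for_model": "qrcode",
      "description_for_human": "Generate QR codes from text.",
      "description_for_model": "Generate a QR code image for a given string.",
      "auth": { "type": "none" },
      "api": {
        "type": "openapi",
        "url": "http://server_host/openapi.yaml"
      },
      "logo_url": "http://server_host/logo.png",
      "contact_email": "owner@example.com",
      "legal_info_url": "http://server_host/legal"
    }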
Main Features of OpenPlugin: Build on top of others
You can share your plugin with others! https:/

Main Features of OpenPlugin: Plugin Store
- ikun_plugin
- todo_plugin
- QRcode_plugin
- ...

Main Features of OpenPlugin: QRcode_plugin
- Support for placeholders
- Plugin structure

Main Features of OpenPlugin: How to use QRcode_plugin
- Step 0: Find a server.
- Step 1: pip install openplugin-py
- Step 2: op install QRcode_plugin
- Step 3: op run QRcode
- Step 4: Get the JSON and YAML files.
- Step 5: Register the plugin on the SageGPT or ChatGPT website.
- Step 6: Finished! Have fun!

QRcode_plugin Demo

Try OpenPlugin, Click Star! Visit: https:/

Thank you for listening!