1-3 When Reinforcement Learning Meets High-Freedom Action Games: Problem Research and Applied Practice.pdf



When RL Meets Highly Free Action Game: Research and Case Study
2022/09/24
胡裕靖 (Yujing Hu)

Outline
1. Overview: Intro of Fuxi & Naraka: Bladepoint
2. Navigation: How we solve the navigation problem in Naraka: Bladepoint
3. Melee Combat: How we solve the melee combat problem in Naraka: Bladepoint
4. Future: What we want to do next

Overview: Intro of Fuxi & Naraka: Bladepoint

NetEase Fuxi: Business and Research Interests
Fuxi is founded on the principle of bridging artificial intelligence and video games.
- Reinforcement Learning
- Computer Vision
- Natural Language Processing
- User Persona
- Virtual Human
- Robotics

NetEase Fuxi RL Group: Business and Research Interests
Typical applications of RL in games (game AI bots):
- Card game: Revelation Mobile
- MMORPG: Justice 6-vs-6
- Sports game: Fever Basketball 3-vs-3
- ACT game: Naraka: Bladepoint

Naraka: Bladepoint (永劫無間)
An action-adventure battle royale game developed by 24 Entertainment and published by NetEase Games Montreal.
- 60-player PVP mythical action combat
- Melee combat
- Gravity-defying mobility
- Vast arsenals of melee and ranged weapons
- Legendary customizable heroes with epic abilities

Reinforcement Learning Applications in Naraka: Navigation and Melee Combat
Two major problems in Naraka (player-versus-AI mode) that we want to solve:
1. Navigation: navigation in very complex terrains
2. Melee combat: melee combat bots with a high skill level
[Figure: Navigation Task; Melee Combat Task]

Navigation

Problems for AI in Naraka: Bladepoint: Problems for pathfinding
- Complex three-dimensional terrains: mountains, trees, rivers, temples, tall buildings (too many disconnected areas)
  [Figure: NavMesh; typical terrains in Naraka: Bladepoint]
- Dynamic environment (e.g., poison circle, bombing zone, traps)
  [Figure: Bombing Zone; Poison Circle; Traps]
- Multiple game mechanisms for moving (e.g., grappling hooks, scale rush, sliding jump, charge-to-dodge)
  [Figure: Grappling Hook; Scale Rush; Sliding Jump & Charge-to-dodge]
- Demand for human-likeness

Navigation: 3D perception with DRL: Problems and methods
Problems:
- Complex three-dimensional terrains
- Dynamic environment
- Disconnected areas
- Multiple game mechanisms for moving
- Human-like moving operations
Methods:
- Three-dimensional real-time perception
- Human-like policy output design
- Deep reinforcement learning
- Techniques such as automated reward shaping and curriculum learning

Navigation: 3D perception with DRL: 3D real-time perception in the game
[Figure: Radar; Depth Map]
Policy inputs:
- 3D features
- Scalar features
- Time-series features
Policy outputs:
- W/A/S/D (Forward, Back, Left, Right)
- Hook/Crouch/Dodge/Jump/...
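The input groups and key-press outputs listed on the perception slide can be pictured with a small sketch. This is only an illustration under stated assumptions: the slides name the three feature groups and the keys, but the split into two discrete output heads, the "press nothing" option at index 0, and every identifier below (`Observation`, `MOVE_HEAD`, `SKILL_HEAD`, `decode_action`, `num_joint_actions`) are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Index 0 of each head means "press nothing" (an assumption of this sketch).
MOVE_HEAD: Tuple[Optional[str], ...] = (None, "W", "A", "S", "D")
SKILL_HEAD: Tuple[Optional[str], ...] = (None, "Hook", "Crouch", "Dodge", "Jump")


@dataclass
class Observation:
    """The three input groups named on the slide; all shapes are hypothetical."""

    features_3d: List[List[float]]   # e.g. a depth map, one inner list per row
    scalar_features: List[float]     # one flat vector of scalar values
    time_series: List[List[float]]   # one feature vector per past time step


def decode_action(move_idx: int, skill_idx: int) -> List[str]:
    """Map one index per output head to the keys pressed this step."""
    keys = [MOVE_HEAD[move_idx], SKILL_HEAD[skill_idx]]
    return [key for key in keys if key is not None]


def num_joint_actions() -> int:
    """Number of key combinations when each head picks one option."""
    return len(MOVE_HEAD) * len(SKILL_HEAD)


obs = Observation(
    features_3d=[[0.0, 1.0], [1.0, 0.0]],
    scalar_features=[0.5],
    time_series=[[0.0], [1.0]],
)
print(decode_action(1, 4))   # ['W', 'Jump']
print(num_joint_actions())   # 25
```

Factoring the output into a movement head and a skill head keeps each head small while still allowing combined presses such as moving forward and jumping in the same step.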

Navigation: 3D perception with DRL: Neural Network Structure
[Figure: neural network structure]

Navigation: 3D perception with DRL: Agent can get stuck and lacks human-likeness
- Agent gets stuck in corners
- Agent keeps jumping

Navigation: 3D perception with DRL: Automated Reward Shaping
Reward shaping needs tedious tuning work to get appropriate weight hyperparameters.
[Figure: Optimal Policy vs. Suboptimal Policy; True Reward vs. Shaping Reward]
IRAT: Li Wang, Yupeng Zhang, Yujing Hu, et al. Individual Reward Assisted Multi-Agent Reinforcement Learning. ICML 2022.

Navigation: 3D perception with DRL: Automated Reward Shaping: Updating the Shaping Policy
For each shaping policy and the target policy:
- When the two policies are consistent, the shaping policy should learn quickly.

- When the two policies conflict too much, the shaping policy should update carefully.
The slide then gives four formulas whose symbols did not survive text extraction (see Wang et al., ICML 2022):
- Combine with its original optimization objective: [... 1 max(...) + 1 min(...) ...]
- An increasing-effect KL regularizer is introduced to distill target policy knowledge: [formula not preserved]
- A new objective is: [... clip(..., 1 - ..., 1 + ...) ...]
- The similarity between the shaping policy and the target policy is defined as: [formula not preserved]

Navigation: 3D perception with DRL: Automated Reward Shaping: Updating the Target Policy
- The target policy uses a learning objective corrected by importance sampling: [formula not preserved]
- A decreasing-effect KL regularizer is used to ensure effective updates.
- The total learning objective of the team policy is: [... min(..., clip(..., 1 - ..., 1 + ...)) ...], where the coefficient on the regularizer is a decreasing coefficient.

Navigation: 3D perception with DRL: Curriculum Learning
Curriculum learning: choose the start point in specific areas first, then choose it randomly from the full map, and lastly choose stuck points.

Area Name          NavMesh arrival rate   Our method arrival rate   Increase ratio
Full Map           63.40%                 81.50%                    28.54%
Celestra           32.70%                 88.00%                    169.11%
Stilltide Temple   27.90%                 74.70%                    167.74%
Wreckage Plains    35.90%                 85.50%                    138.16%
Shadow Jade Mine   24.80%                 81.50%                    228.63%
Sun Wings Rest     41.40%                 73.30%                    77.05%
Average            37.70%                 80.75%                    114.19%

Comparison of the arrival rate between NavMesh and our method in different areas.

Navigation in complex terrains: High arrival rate in complex terrains
[Video: Shadow Jade Mine; RL Navigation Agent vs. Rule-based Agent]

Melee Combat

Problems for AI in Naraka: Bladepoint: Problems for melee combat
- Rock-paper-scissors combat system
  [Figure: cycle among Focus Strikes, Common Attack, and Counterstrikes]
- Thirteen heroes (more in the future) with different hero skills
  [Figure: skills of different heroes in Naraka: Bladepoint]
- Various melee weapons with different mechanisms
  [Figure: Spear; Nunchuk; a playing demo of Naraka showing rich attack modes]

Combat Bot with High Skill Level: Problems and methods
Problems:
- Various melee weapons with different mechanisms
- Thirteen heroes (more in the future) with different hero skills
- The rock-paper-scissors combat system requires players to guess or predict, and then counteract, the opponent's strategies
Methods:
- Policy distillation: knowledge transfer
- Opponent modeling: observing opponents' historical behaviors to predict opponents' next moves

Combat Bot with High Skill Level: An all-rounder AI to master every kind of weapon: knowledge transfer
Policy distillation can improve the student agent's performance effectively by transferring knowledge from multiple teachers.
- Stage One: train the teachers, each proficient in one weapon
- Stage Two: distill their knowledge into one student
Since weapon combos are more complicated than hero skills, we only use distillation to handle weapons.

Combat Bot with High Skill Level: Predict opponents' next moves
How to predict opponents' next moves: observe and encode their historical behaviors.
Behavior types: Common Attack (White), Focus Strikes (Blue), Counterstrikes (Red)
Encoded features for three different historical behaviors:
- 0.4, 0.05, 0.45
- 0.4, 0.55, 0.05
- 0.8, 0.15, 0.05

[Video: An all-rounder AI to master every kind of weapon]
[Video: Predict opponents' next moves (PVE)]

Future work: What we want to do next
Remaining problems:
- Navigation in rooms and high buildings with big height differences
- The timing for switching between melee and ranged weapons
- Other sub-goals in a battle royale game, e.g., resource collection and team cooperation

Q&A
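The two-stage knowledge transfer described in the Melee Combat section (one teacher per weapon, then one student for all weapons) can be illustrated with a toy distillation loss. This is a minimal sketch, not the speakers' implementation: the slides do not state the loss, so matching the student's action distribution to each teacher's with a KL divergence is assumed here as one common choice. The function names, the three-action logits, and the numbers are all hypothetical; only the weapon names Spear and Nunchuk come from the slides.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over action logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(
    teacher_logits: Dict[str, Sequence[float]],
    student_logits: Dict[str, Sequence[float]],
) -> float:
    """Mean KL(teacher || student), averaged over weapons.

    Each dict maps a weapon name to the action logits produced on one
    state sampled while wielding that weapon (a simplifying assumption).
    """
    losses = [
        kl_divergence(softmax(teacher_logits[w]), softmax(student_logits[w]))
        for w in teacher_logits
    ]
    return sum(losses) / len(losses)


# Hypothetical logits over three actions for two weapons named on the slides.
teachers = {"Spear": [2.0, 0.5, -1.0], "Nunchuk": [-0.5, 1.5, 0.0]}
untrained_student = {"Spear": [0.0, 0.0, 0.0], "Nunchuk": [0.0, 0.0, 0.0]}

print(distillation_loss(teachers, untrained_student))  # positive
print(distillation_loss(teachers, teachers))           # 0.0
```

In a training loop the student would be a single network conditioned on the equipped weapon, and this loss would be minimized over states collected for every weapon, so that one set of weights reproduces all teachers.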
