AIGC and Large Models Empowering Intelligent Robot Control
Yao Mu (穆堯), PhD student, The University of Hong Kong
DataFunSummit #2023

Contents
01 Introduction of AIGC
02 AdaptDiffuser
03 Introduction of Embodied AI
04 EmbodiedGPT

01 Introduction of AIGC
1.1 Diffusion Models Are Powerful Generative Models
Example: DALL·E.
[Figure: forward diffusion gradually corrupts an image into a noisy image and finally into pure noise.]

1.2 From VAE to Diffusion Model
VAE and multi-layer (hierarchical) VAE. With two latent layers z_1, z_2, the inference and generative models factorize as

q(z_1, z_2 \mid x) = q(z_1 \mid x)\, q(z_2 \mid z_1)
p_\theta(x, z_1, z_2) = p_\theta(x \mid z_1)\, p_\theta(z_1 \mid z_2)\, p(z_2)

and training maximizes the evidence lower bound

\log p(x) \ge \mathbb{E}_{q(z_1, z_2 \mid x)}\big[\log p_\theta(x, z_1, z_2) - \log q(z_1, z_2 \mid x)\big].

1.2 From VAE to Diffusion Model (continued)
Normalizing flow: a chain of reversible functions with a tractable change-of-variables likelihood,

\log p(x) = \log p(z) + \log \left| \det \frac{\partial z}{\partial x} \right|,

but limited expressive power.
RL application: Flow-based Recurrent Belief State Learning for POMDPs (ICML 2022).
Generative Adversarial Network (GAN): needs to learn a discriminator, and the adversarial training process is unstable.
RL application: Generative Adversarial Imitation Learning.
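For comparison, a compact worked form of the single-layer VAE bound that the hierarchical version above generalizes; this is the standard ELBO decomposition, stated here for reference rather than taken from the slides:

\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

Diffusion models can be read as a very deep hierarchical VAE in which every inference step is fixed to a simple Gaussian noising kernel, which is exactly what the next slides formalize.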
1.3 Forward Diffusion Process
The formal definition of the forward process in T steps (fixed, data -> noise):

q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big)

The \beta_t values (i.e., the noise schedule) are designed such that \bar{\alpha}_T \to 0 and q(x_T \mid x_0) \approx \mathcal{N}(x_T; \mathbf{0}, \mathbf{I}).
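Using the standard closed form q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\, x_0, (1-\bar\alpha_t)\mathbf{I}) with \bar\alpha_t = \prod_{s \le t}(1-\beta_s), the forward process can be sampled in one shot. A minimal NumPy sketch; the linear schedule and its endpoints are illustrative assumptions, not values from the slides:

import numpy as np

def make_noise_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear noise schedule; alpha_bar_T ~ 0 so that q(x_T | x_0) ~ N(0, I).
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    # Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise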
1.3 Reverse Denoising Process
Formal definition of the forward and reverse processes in T steps (generative, noise -> data):

p(x_T) = \mathcal{N}(x_T; \mathbf{0}, \mathbf{I})
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\big)

where \mu_\theta(x_t, t) is a trainable network.
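A minimal sketch of ancestral sampling from this reverse chain. Parameterizing the mean through a noise-prediction network eps_model is the common DDPM choice and an assumption here (the slide only states that \mu_\theta is learned); \sigma_t^2 = \beta_t is likewise one standard option:

import numpy as np

def p_sample_loop(eps_model, shape, betas, alpha_bars, rng):
    # Start from pure noise x_T ~ N(0, I) and apply p_theta(x_{t-1} | x_t) T times.
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        # DDPM-style mean: mu = (x - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        eps = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x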
02 AdaptDiffuser

2.1 Converting RL Problems into Trajectory Generation with a Generative Model
Planning as generative modeling: offload as much of model-based RL as possible onto contemporary generative modeling, i.e., replace prediction and planning with one big generative model.

Algorithm 1: Model-based RL (idealized)
Inputs: dataset of transitions D = {(s_t, a_t, s_{t+1})}, reward function r(·, ·), current state s_0.
1: Train a predictive model f:  minimize_f  E_{(s_t, a_t, s_{t+1}) ~ D} || s_{t+1} - f(s_t, a_t) ||^2
2: Use the model to evaluate potential plans a_{0:T}, selecting the best one (a random-shooting sketch of this step follows):
   maximize_{a_{0:T}}  r(s_0, a_0) + r(s_1, a_1) + r(s_2, a_2) + …
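A minimal sketch of this idealized loop with the plan search in step 2 done by random shooting; the learned model f and reward r are passed in, and the horizon, candidate count, and action bounds are illustrative assumptions:

import numpy as np

def plan_random_shooting(f, r, s0, horizon=16, n_candidates=256, action_dim=2):
    # Step 2 of Algorithm 1: roll candidate action sequences through the learned
    # model f and keep the sequence with the highest predicted return.
    rng = np.random.default_rng(0)
    best_plan, best_return = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, ret = s0, 0.0
        for a in actions:
            ret += r(s, a)
            s = f(s, a)  # model-predicted next state
        if ret > best_return:
            best_plan, best_return = actions, ret
    return best_plan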
2.2 Diffusion Model for Trajectory Generation
Diffuser: a diffusion model that generates whole trajectories.

2.3 Diffusion Model for Trajectory Generation
A generative model of trajectories:
- Represent trajectories as single-channel images: one axis is the planning horizon, the other stacks the states and actions of each timestep.
- Train a diffusion model to iteratively denoise the entire trajectory at once.
- Use (one-dimensional) convolutions along time; their local receptive field gives temporal equivariance and horizon-independence (sketched below).
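A minimal PyTorch sketch of this representation and of one temporal convolution block of the kind such a denoiser stacks; the channel counts, kernel size, and activation are illustrative assumptions rather than Diffuser's exact architecture:

import torch
import torch.nn as nn

# A trajectory with horizon H, state dim S, and action dim A is stored as a
# single-channel "image" of shape (S + A, H): one column per timestep.
class TemporalBlock(nn.Module):
    # 1D-convolutional denoiser block. Conv1d slides along the time axis, so
    # the block is temporally equivariant and independent of the horizon H.
    def __init__(self, transition_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(transition_dim, hidden_dim, kernel_size=5, padding=2),
            nn.Mish(),
            nn.Conv1d(hidden_dim, transition_dim, kernel_size=5, padding=2),
        )

    def forward(self, traj):  # traj: (batch, S + A, H)
        return self.net(traj)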
2.4 Transferring the Unconditional Trajectory Model into a Conditional Policy

\tilde{p}_\theta(\tau) \propto p_\theta(\tau)\, h(\tau)

Use a value function to bias the trajectory model toward a particular task: samples come from the product of the behavior (diffusion) model p_\theta(\tau) and a guidance function h(\tau) that scores a trajectory by the values of its states, e.g. \log h(\tau) = \mathcal{V}(s_0) + \mathcal{V}(s_1) + \dots + \mathcal{V}(s_T).
- Guidance functions transform an unconditional trajectory model into a conditional policy for diverse tasks.
- A single Diffuser model can thus be used for multiple different tasks.

2.4 Goal-conditioned Guidance
Specify a guidance function over the final, explicit goal state of a trajectory, and construct a goal-seeking policy through guidance (a combined sketch follows).
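A schematic sketch of one guided reverse step combining both ideas above: classifier-style guidance shifts the predicted mean along \nabla_\tau \log h(\tau), and start/goal conditioning clamps the corresponding state entries of the sampled trajectory (the inpainting trick). The trajectory layout (S + A, H) with states in the first S rows, the guidance scale, and the function names are illustrative assumptions:

import numpy as np

def guided_step(x, t, mean_fn, sigma_t, grad_log_h, s0=None, goal=None, scale=1.0):
    # One reverse step of p_theta(x_{t-1} | x_t), biased toward high h(tau).
    rng = np.random.default_rng()
    mean = mean_fn(x, t) + scale * sigma_t**2 * grad_log_h(x, t)  # guidance
    x_prev = mean + sigma_t * rng.standard_normal(x.shape)
    # Goal-conditioned guidance via inpainting: fix the first state column to s0
    # and the last state column to the goal after every denoising step.
    if s0 is not None:
        x_prev[: len(s0), 0] = s0
    if goal is not None:
        x_prev[: len(goal), -1] = goal
    return x_prev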
2.5 Challenges of Diffusion Planning
Diffusion planning is limited by the distribution of the offline training dataset, and it is challenging to adapt to new tasks/goals and new environments (train vs. test mismatch).
Solutions:
- Learn a self-evolving diffuser for diverse tasks/goals without real interaction with the environment.
- Learn a meta diffuser to generalize across tasks.
3.1 Framework of AdaptDiffuser
[Figure: the AdaptDiffuser pipeline. Diverse task generation proposes different start and goal points and different goals of manipulation tasks. Starting from \tau_T initialized as noise, a U-Net denoising process guided by the gradient of a task reward function produces candidate trajectories; a goal discriminator accepts high-quality trajectories into a data pool and drops the rest, and the pool is used to update (fine-tune) the diffusion model.]
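A high-level sketch of this self-evolution loop; generate_tasks, guided_sample, discriminator, and finetune are stand-ins for the components named in the figure, not APIs from the AdaptDiffuser code base:

def self_evolve(diffusion, generate_tasks, discriminator, finetune, n_rounds=3):
    # AdaptDiffuser loop: synthesize trajectories for diverse generated tasks via
    # reward-gradient-guided denoising, keep only those the discriminator accepts,
    # and fine-tune the diffusion model on the growing pool, all without real
    # environment interaction.
    data_pool = []
    for _ in range(n_rounds):
        for task in generate_tasks():                    # new start/goal pairs
            traj = diffusion.guided_sample(task.reward)  # denoise from noise
            if discriminator(traj, task):                # accept ...
                data_pool.append(traj)                   # ... or drop
        diffusion = finetune(diffusion, data_pool)       # update the model
    return diffusion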
3.1 Diverse Task Generation with ChatGPT
[Figure: (a) existing maze; (b) mazes generated with ChatGPT, including more complex layouts with multiple dead ends and loops.]

Prompt: "Now, can you summarize the rules of generating the maze?"
ChatGPT: "Sure, here are the rules for generating a maze with the desired characteristics:
1. The maze must be rectangular, with a fixed size of 9 rows and 12 columns.
2. Each row of the maze must have the same length, with exactly 12 obstacles or empty spaces in each row.
3. The maze must be surrounded by a wall of obstacles.
4. The maze must have a feasible path between the start and the goal, moving horizontally or vertically between adjacent empty spaces.
5. The maze can have a complex layout with dead ends, loops, and other features, as long as it meets the above criteria.
By following these rules, you can generate a maze that meets the desired specifications."
Figure 11. Rules for generating maze layouts summarized by ChatGPT.
(A validator for these rules is sketched below.)
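These rules translate directly into a checker. A small sketch, assuming a grid of strings with '#' for obstacles and ' ' for empty cells, and BFS for the reachability requirement in rule 4:

from collections import deque

def is_valid_maze(grid, start, goal):
    # Rules 1-2: a 9 x 12 rectangle.
    if len(grid) != 9 or any(len(row) != 12 for row in grid):
        return False
    # Rule 3: surrounded by a wall of obstacles.
    if any(c != '#' for c in grid[0]) or any(c != '#' for c in grid[-1]):
        return False
    if any(row[0] != '#' or row[-1] != '#' for row in grid):
        return False
    # Rule 4: a path from start to goal through horizontally/vertically
    # adjacent empty spaces. Rule 5 (dead ends, loops) needs no extra check.
    frontier, seen = deque([start]), {start}
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < 9 and 0 <= nc < 12 and grid[nr][nc] == ' ' and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return False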
3.1 Experimental Results
Table 3. Adaptation performance on the pick-and-place task.

Environment              | Diffuser    | AdaptDiffuser
Pick and Place, setup 1  | 28.16 ± 2.0 | 36.03 ± 2.1
Pick and Place, setup 2  | 35.25 ± 1.4 | 39.00 ± 1.3
Average                  | 31.71       | 37.52

[Figure 5. Visualization of the KUKA pick-and-place task: (a) Start, (b) Place Block 1, (c) Place Block 2, (d) Place Block 3, (e) Place Block 4, (f) Finish.]

3.1 Experimental Results
[Figure: Maze2D planning results. Original setting and Hard Case 1 (Maze2D-Medium): (a) Diffuser (collision) vs. (b) AdaptDiffuser. Hard Case 2 (Maze2D-Large, with a gold coin on (4, 2)): (c) Diffuser (failed, collision) vs. (d) AdaptDiffuser (no collision).]

03 Introduction of Embodied AI
Challenges of Embodied AI
1. Building embodied cognitive systems from a first-person perspective.
2. Achieving highly autonomous decision-making and planning capabilities.
3. Achieving goal-conditioned physical interaction with the world.

Introduction of Embodied AI
Embodied AI refers to the integration of artificial intelligence systems with physical bodies or robotic platforms, allowing them to perceive and interact with the physical world in a manner similar to humans.

04 EmbodiedGPT

EmbodiedGPT
1) Building EgoCOT, a human manipulation video-text dataset with a multimodal cognitive chain that associates visual information with sub-goals in manipulation tasks. Two example chains:
Open the drawer
  task: open a drawer
  plans: grasp the handle with the gripper and pull the handle
  actions: 1. grasp(handle, gripper)  2. pull(handle)

Pick up the cup
  task: pick up a cup on the table
  plans: grasp the handle of the cup with the gripper and lift it up
  actions: 1. grasp(handle of the cup, gripper)  2. lift up(cup)
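The action lines above follow a verb(arguments) pattern that a downstream controller can consume as structured calls. A tiny illustrative parser; the regex and the Action type are assumptions for this sketch, not part of EmbodiedGPT:

import re
from typing import NamedTuple

class Action(NamedTuple):
    verb: str
    args: tuple

def parse_action(text):
    # "grasp(handle of the cup, gripper)" -> Action("grasp", ("handle of the cup", "gripper"))
    m = re.fullmatch(r"\s*([\w ]+?)\s*\(([^)]*)\)\s*", text)
    if m is None:
        raise ValueError(f"not an action string: {text!r}")
    verb = m.group(1).strip()
    args = tuple(a.strip() for a in m.group(2).split(",") if a.strip())
    return Action(verb, args)

print(parse_action("lift up(cup)"))  # Action(verb='lift up', args=('cup',))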
EmbodiedGPT
2) Introducing a vision-language pretraining method based on the multimodal cognitive chain.
[Figure: EmbodiedGPT architecture. A Vision Transformer encodes the observation; the Embodied-Former attends over it with embodied queries and text queries, and a language-mapping layer feeds the result, together with the text prompt (questions, task description, examples, dialogue memory), into LLaMA for embodied planning. For control, instance information from the Embodied-Former is concatenated with global information from a CNN with global average pooling and passed to the policy. Supported tasks: embodied planning, video Q&A, physical manipulation, and multi-round dialogue.]

EmbodiedGPT
3) Extracting features of the current visual observation that are highly relevant to the plan's specific sub-goals using a self-attention mechanism, enabling the model to learn low-level control from minimal demonstration data.
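A schematic PyTorch sketch of the control path just described: learned queries attend over visual tokens, the attended instance features are concatenated with globally pooled features, and a policy head maps them to an action. All module choices and sizes are illustrative assumptions, not the released EmbodiedGPT implementation:

import torch
import torch.nn as nn

class QueryConditionedPolicy(nn.Module):
    # Learned queries attend over visual tokens (Embodied-Former style); the
    # result is fused with a global feature and mapped to a low-level action.
    def __init__(self, token_dim=256, n_queries=8, action_dim=7):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, token_dim))
        self.attn = nn.MultiheadAttention(token_dim, num_heads=4, batch_first=True)
        self.policy = nn.Sequential(
            nn.Linear(2 * token_dim, 256), nn.ReLU(), nn.Linear(256, action_dim),
        )

    def forward(self, vis_tokens):  # vis_tokens: (batch, n_tokens, token_dim)
        q = self.queries.expand(vis_tokens.shape[0], -1, -1)
        instance, _ = self.attn(q, vis_tokens, vis_tokens)  # task-relevant features
        instance = instance.mean(dim=1)       # pool over queries
        global_feat = vis_tokens.mean(dim=1)  # stand-in for global average pooling
        return self.policy(torch.cat([instance, global_feat], dim=-1))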
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo
1 The University of Hong Kong  2 OpenGVLab, Shanghai AI Laboratory  3 Huawei Noah's Ark Lab  4 Tsinghua University
Paper: https://arxiv.org/pdf/2305.15021.pdf
Github: https:/