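Development and Applications of Virtual Human Motion Generation Technology
Track: Frontier AIGC Technology
Speaker: Chen Xin (陳欣), Researcher, QQ Imaging Center, Tencent Technology
Event: Xitu Juejin (稀土掘金), 2023

About the speaker
Chen Xin joined Tencent's QQ Imaging Center in 2021 and is responsible for AIGC research on virtual human apparel and animation for QQ Show, with a focus on researching and deploying AIGC and multimodal large models. Chen holds a PhD from the University of Chinese Academy of Sciences and works on generative AI, including virtual human generation, motion generation, and 3D object generation, with more than 20 papers published at top international venues such as CVPR, ICCV, and SIGGRAPH.
Homepage: https://chenxin.tech/

Contents
01 The development of motion generation technology
02 Diffusion models and motion generation
03 Language models and motion generation

Resources: data downloads and open-source code
- Motion Background
- Motion-Latent-Diffusion
- MotionGPT
Example prompt: "Can you generate a motion that a person kneeling on the ground gets up?"

What is Human Motion?
- Skeleton
- Skeleton and Skin
References shown on the slide:
[1] OSSO: Obtaining Skeletal ... from Outside
[2] A Skinned Multi...
[3] Expressive Body Capture ...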

Why Industry Needs Motion Synthesis?
Film, Game, AR, Metaverse

Animation: hand-made virtual human motion
- Crafted manually
- Time-consuming
- Unnatural process

MoCap: motion capture technology
- High cost
- External equipment required
- Human driven: needs professional performers

Why Academia Needs Motion Synthesis?
[Figure: motion linked to related research topics. Labels: Motion Capture, Image, Action/Behavior Recognition, Motion, Language, Gesture Understanding, Speech Generation, Animation Generation, Text-driven Generation]

Timeline: research on motion generation and motion capture (2018, 2020, 2021, 2022, 2023)
3D Motion Capture
- HMR (Human Mesh Recovery): 1.4k GitHub stars, 1k citations
- VIBE (Video Inference for Body Pose and Shape Estimation): 2.6k GitHub stars, 636 citations
- 4DHumans: recent work
3D Motion Synthesis
- MDM (Motion Diffusion Models): 2.2k GitHub stars, 100 citations
- MLD (Motion-Latent-Diffusion): 300+ GitHub stars
- MotionGPT: recent work

02 Motion Latent Diffusion Models
Diffusion models and motion generation

CIVITAI

Diffusion in Image Synthesis
DALL-E 2, Disco-Diffusion, GLIDE.
Latent Diffusion
- Diffusion in VAE latent space
- Image 512x512 -> VAE latent 64x64x4
- Lower computational cost
- Supported inputs: segmentations, images, texts
Imagen
- Larger text encoder rather than CLIP
- Text encoder: T5-XXL (4.6B)
- Diffusion decoder at 64 (2B)
- Diffusion upsampler 64 -> 256
- Efficient U-Net, 2-3x faster
[Figure: latent diffusion architecture. Labels: Pixel Space, Latent Space, Diffusion Process, Denoising U-Net, Conditioning (Semantic Map, Text, Representations, Images), cross-attention, skip connection, concat, denoising step]

Generative Models
- GAN: adversarial training, with a generator $G(z)$ and a discriminator $D(x)$
- VAE: maximize the variational lower bound, with an encoder $q_\phi(z \mid x)$ and a decoder $p_\theta(x \mid z)$
- Flow-based models: invertible transform of distributions, with a flow $f(x)$ and its inverse $f^{-1}(z)$
- Diffusion models: gradually add Gaussian noise and then reverse the process

Diffusion Models: understanding image generation
1. A generative model is essentially a sampling process. The challenge: images are very high-dimensional, so building the data distribution $p_{data}$ directly is difficult.
2. Diffusion Probabilistic Models (2020) simplify this complex sampling process into step-by-step denoising that starts from a pure two-dimensional Gaussian noise distribution, with forward steps $q(x_t \mid x_{t-1})$ and reverse steps $p_\theta(x_{t-1} \mid x_t)$.

Diffusion Models
- Data preparation stage: the forward noising process (Fixed Forward Diffusion Process, data -> noise)
- Network training and prediction: the reverse noise-prediction process (Generative Reverse Denoising Process, noise -> data)

Prior Work: MDM (Text-to-Motion)
- Strength: realistic human motion
- Weakness: poor condition matching
Example prompt: "A person is crouched down and walking around sneakily."
[1] Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Amit H. Bermano, and Daniel Cohen-Or. Human motion diffusion model. arXiv preprint arXiv:2209.14916, 2022.

Motion Latent Diffusion Models: MLD (Text-to-Motion, CVPR 2023)
- Strength: faster inference time
- Weakness: limited task capacity
[Figure: FID plotted against Average Inference Time per Sentence (AITS, in seconds), with the diffusion models annotated]

Methods         AITS (s)   FID
TEMOS           0.017      3.734
T2M             0.038      1.067
MotionDiffuse   14.74      0.630
MDM             24.74      0.544
Ours-1          0.217      0.473

Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, and Gang Yu. Executing your commands via motion diffusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023.
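
The diffusion slides in this section describe a fixed forward noising process and a learned reverse denoising process. The forward half can be sketched in plain Python. This is a minimal sketch that assumes the standard DDPM closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with a linear variance schedule. The schedule values, the function names, and the toy three-element vector are illustrative assumptions; none of them come from the talk or from the MDM or MLD code.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, noise)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return x_t, noise


def recover_x0(x_t, t, alpha_bars, predicted_noise):
    """Invert the forward step given a noise estimate.

    In training, a network is asked to produce predicted_noise; here the
    true noise is passed in, so the original data is recovered exactly.
    """
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for x, e in zip(x_t, predicted_noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -1.0, 2.0]  # toy stand-in for an image or a motion latent
    x_t, eps = forward_noise(x0, 999, alpha_bars, rng)
    print("fraction of signal left at the last step:", math.sqrt(alpha_bars[999]))
    print("x0 recovered from the true noise:", recover_x0(x_t, 999, alpha_bars, eps))
```

Because the forward process is fixed, the only learned part is the noise predictor; a latent-diffusion variant would run the same steps on a compressed latent rather than on raw pixels or poses.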

03 MotionGPT: Human Motion as a Foreign Language
Language models and motion generation

Large Language Model
[Figure: large language models (100B parameters) from organizations such as AI2, OpenAI, and Google]

What Makes Motion Synthesis Challenging?
1. Text and motion modality
   - Distributions vary greatly
   - Small datasets
   - Limited paired data
2. Motion diversity
   - Diverse motions
   - Diverse descriptions
[Figure: example motions labeled "waving" and "Drink"]

Challenges in motion generation
- Modeling the language-motion relation
  [Figure: example sentence "A person jumps forwards and turns right", shown with a "left" variant and a reordered version]
- Uniform multi-task framework
  Tasks compared: Text-to-Motion, Motion-to-Text, Motion Prediction, Motion In-between, Random Motion, Random Description
  Methods compared: TM2T [1], T2M-GPT [2], MLD [3], MDM [4], MotionDiffuse [5], MotionGPT (Ours)
  Each prior method lacks at least one of these tasks; MotionGPT (Ours) covers all six.

Prior Work: T2M-GPT (Text-to-Motion)
- Strength: realistic human motion
- Weakness: limited to a single task
Example prompts: "A person is crouched down and walking around sneakily." and "he is flying kick with his left leg"
[Figure labels: CLIP, Transformer Layers]
Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, and Xi Shen. T2M-GPT: Generating human motion from textual descriptions with discrete representations. Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

A Unified Framework: MotionGPT
[Figure: MotionGPT as a single model for Text-to-Motion, Motion-to-Text, Text-to-Text, and Motion-to-Motion, with inputs and outputs such as text, motion, and length]

Demo: MotionGPT
Text-to-Motion

Pipeline: the MotionGPT framework
Input text: "Can you give me a motion that a person kneeling on the ground gets up?"
- The text is converted to text tokens using a text codebook.
- An optional input motion passes through a motion encoder and is converted to motion tokens using a motion codebook.
- Text tokens and motion tokens are combined into mixed input tokens for the language model, and the output tokens are decoded back into text or motion.

Instructed Dataset: the MotionGPT dataset
Tasks with motion as output:
- Text-to-Motion
- Text-to-Motion w/ length
- Length-to-Motion
- Random Motion
Tasks with text or length as output:
- Motion-to-Text
- Motion-to-Text w/ length
- Motion-to-Length
- Caption-to-Length
- Length-to-Caption
- Random Caption

Training: how MotionGPT is trained
Step 1: Training of Motion Tokenizer
A motion sequence is sampled from a 3D motion dataset. The motion tokenizer learns a motion representation. The motion codebook is used to represent human motion as discrete tokens.
Step 2: Motion-language Pretraining
A motion and a language description are sampled (for example, "A person catches and throws a ball"). The motion is mapped to discrete motion indices and mixed with words. This data is used to pre-train the motion language model.
Step 3: Instruction Tuning
Question-answer pairs are sampled from prompt templates (for example, "Can you show me that a person does three right jumping jacks out to the side?"). The prompts are used to fine-tune the model on diverse motion tasks.

Training: results by model size and instruction tuning

Size    Instruction | Text-to-Motion          | Motion-to-Text         | Motion Prediction | Motion In-between
        Tuning      | R TOP3↑  FID↓   DIV    | MMDist  Bleu4  Cider↑  | FID↓    DIV       | FID↓    DIV
Real    -           | 0.797    0.002  9.503  | 2.901   -      -       | 0.002   9.503     | 0.002   9.503
Small   no          | 0.706    0.727  9.264  | 2.748   12.02  24.9    | -       -         | -       -
Small   yes         | 0.663    0.336  9.239  | 2.931   10.54  24.3    | 0.954   8.727     | 0.326   9.618
Base    no          | 0.722    0.365  9.407  | 2.821   12.47  29.2    | -       -         | -       -
Base    yes         | 0.700    0.160  9.411  | 3.019   11.42  28.2    | 0.905   8.972     | 0.214   9.560
Large   no          | 0.694    0.234  9.310  | 2.776   12.44  29.1    | -       -         | -       -
Large   yes         | 0.708    0.159  9.301  | 3.011   11.71  28.5    | 0.556   8.975     | 0.223   9.358
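
The pipeline and Step 1 of training both depend on turning continuous motion into discrete tokens that can sit in the same sequence as words. The sketch below illustrates that idea under stated assumptions: motion frames are taken to be already-encoded feature vectors, quantization is a plain nearest-neighbor lookup in a fixed toy codebook (the usual vector-quantization step, not a trained tokenizer), and the `<motion_id_k>` token spelling and the prompt template are hypothetical, chosen only for illustration rather than taken from the MotionGPT code.

```python
def nearest_code(frame, codebook):
    """Index of the codebook vector closest to one motion feature vector."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(frame, code))  # squared Euclidean
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def tokenize_motion(frames, codebook):
    """Map each encoded frame to a discrete motion index."""
    return [nearest_code(frame, codebook) for frame in frames]


def detokenize_motion(indices, codebook):
    """Map motion indices back to codebook vectors (input to a decoder)."""
    return [list(codebook[i]) for i in indices]


def motion_tokens_to_text(indices):
    """Spell indices as special tokens so they can be mixed with words.

    The token names are hypothetical placeholders.
    """
    body = "".join(f"<motion_id_{i}>" for i in indices)
    return f"<motion_start>{body}<motion_end>"


def build_prompt(template, **fields):
    """Fill an instruction template with text and/or motion-token strings."""
    return template.format(**fields)


if __name__ == "__main__":
    # Toy 2-D codebook with four entries; real codebooks are learned.
    codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    frames = [(0.1, -0.1), (0.9, 0.2), (0.8, 0.9)]

    indices = tokenize_motion(frames, codebook)
    motion_text = motion_tokens_to_text(indices)
    prompt = build_prompt("Describe the motion {motion} in words.", motion=motion_text)

    print(indices)
    print(prompt)
    print(detokenize_motion(indices, codebook))
```

Once motion is expressed as tokens, tasks such as Text-to-Motion and Motion-to-Text differ only in which side of the prompt carries the motion string, which is what allows one language model to be tuned on all of them.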

MotionGPT results: Text-to-Motion Comparison
Prompt: "A person is crouched down and walking around sneakily."
[Figure: generated motions from GT, T2M-GPT, MDM, and Ours]

MotionGPT results: Motion-to-Text Comparison
- GT: "A man starts to walk straight then walks to the right."
- TM2T: "A person slowly walk in a counterclockwise circle."
- Ours: "A person walks in a semi-circular pattern, tip-toeing."

MotionGPT results: Text-to-Text Results
- Prompt: "Depict a motion as like you have seen it."
  Output: "A man is standing still swaying and then walks slowly towards the 1 o'clock."
- Prompt: "Random say something about describing a human motion."
  Output: "A person is standing up left while putting their hands together in a praying motion."
- Prompt: "Describe the motion of someone as you will."
  Output: "A standing person holds their hands in front of their chest and claps three times."

Next Steps
1. Multi-modal large model: diverse modal generations
2. Image and motion: understanding motions at different levels
3. Large human motion datasets

Projects: MotionGPT, Motion-Latent-Diffusion
Example captions from the demo:
- "the person rises from a laying position and walks in a clockwise circle and then lays back down the ground"
- "A person kicks ... his left ... then kicks forward ..."
Code: ...com/openMotionLab/MotionGPT