Qualcomm: White Paper on the Potential and Applications of Generative AI in Efficient Image and Video Generation, 2024 (English version, 44 pages)


Efficient generative AI for images and video
Amirhossein Habibian, Director of Engineering, Qualcomm Technologies Netherlands
Jan 25, 2024
Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

Agenda
- The potential impact of efficient generative vision
- Efficient image generation
- Efficient video generation
- Efficient 3D generation
- Applications: automotive
- Q&A

Generative AI today
- Text generation (ChatGPT, Bard, Llama, etc.). Input prompt: "Write a lullaby about cats and dogs to help a child fall asleep, include a golden shepherd"; a great lullaby is created in seconds. Real-life applications: communications, journalism, publishing, creative writing, writing assistance.
- Image generation (Stable Diffusion, MidJourney, etc.). Input prompt: "Super cute fluffy cat warrior in armor". Real-life applications: advertisements, published illustrations, corporate visuals, novel image generation.
- Code generation (Codex, etc.). Input prompt: "Create code for a pool cleaning website with tabs for cleaning, repairs, and testimonials"; a beautiful website is created in seconds. Real-life applications: web design, software development, coding, technology.

What is generative AI?
AI models that create new and original content like text, images, video, audio, or other data. Generative AI, foundational models, and large language models are sometimes used interchangeably.

Why is generative AI for computer vision important?
- Generating 3D content: generative models create 3D meshes and assets based on a textual description or a handful of images, with minimal manual effort. Examples: DreamFusion, Magic3D.
- Editing images and videos: generative models change aspects of images and videos, such as swapping the background, changing the style, or editing an object's attributes and appearance. Examples: SDEdit, PnP, Pix2Pix.
- Generating images and videos: generative models create images and videos from scratch; original, life-like visuals are generated from textual and/or image prompts (in the case of image/text-to-image or image/text-to-video). Examples: Stable Diffusion, ControlNet.
Example prompts: "Super cute fluffy cat warrior in armor, photorealistic, 4K, ultra detailed, vray rendering, unreal engine"; "a plush dragon toy"

World's fastest AI text-to-image generation on a phone (Snapdragon Summit 2023)
- Fast Stable Diffusion takes less than 0.6 seconds to generate a 512x512 image from a text prompt.
- Built on an efficient UNet architecture, guidance conditioning, and step distillation.
- Full-stack AI optimization was used to achieve this improvement.

What are the challenges to overcome for generative AI images and videos?
- Data inefficiency: models require billions of training samples, which makes it hard to adapt them to new domains.
- Memory costs: models demand a lot of memory to perform well and sometimes need to run concurrently.
- High computation and latency: generative AI requires immense computational power and infrastructure.

What is diffusion?
- Forward diffusion: add noise to a sample step by step.
- Reverse diffusion: subtract noise (denoise) step by step to produce the output image.
(VAE: Variational Auto-Encoder; CLIP: Contrastive Language-Image Pre-Training)
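
Conceptually, the forward process mixes the sample with Gaussian noise according to a variance schedule, and the reverse process undoes one noise step at a time using the model's noise prediction. Below is a minimal DDPM-style sketch in PyTorch; the linear schedule, step count, and latent shape are illustrative assumptions, not values taken from this deck.

```python
import torch

# Minimal DDPM-style diffusion sketch (illustrative only; Stable Diffusion
# runs the same loop in VAE latent space with a learned UNet predicting eps).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)     # cumulative product: alpha-bar_t

def forward_diffuse(x0, t):
    """Forward diffusion: noise a clean sample x0 to step t in closed form."""
    noise = torch.randn_like(x0)
    ab = alpha_bar[t]
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise
    return x_t, noise

def reverse_step(x_t, t, eps_pred):
    """One reverse (denoising) step given the model's noise prediction."""
    beta, alpha, ab = betas[t], alphas[t], alpha_bar[t]
    x_prev = (x_t - beta / (1 - ab).sqrt() * eps_pred) / alpha.sqrt()
    if t > 0:
        x_prev = x_prev + beta.sqrt() * torch.randn_like(x_t)  # inject noise
    return x_prev

x0 = torch.randn(1, 4, 64, 64)               # e.g., a 64x64x4 latent
x_t, eps = forward_diffuse(x0, t=500)
x_prev = reverse_step(x_t, t=500, eps_pred=eps)  # a perfect eps recovers the mean
```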

Image generation: the Stable Diffusion architecture
- The UNet is the biggest component model of Stable Diffusion.
- Many steps, often 20 or more, are used to generate high-quality images.
- Significant compute is required.
Pipeline: input prompt goes to Stable Diffusion (1B+ parameters), consisting of a CLIP text encoder (123M parameters), a scheduler that iterates the UNet (860M parameters) once per step, and a VAE decoder (49M parameters) that produces the output image.
Example prompts: "Vase in Greek style with intricate patterns and design"; "Panoramic view of mountains of Vestrahorn and perfect reflection in shallow water, soon after sunrise, Stokksnes, South Iceland, Polar Regions, natural lighting, cinematic wallpaper"
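
As a concrete reference, this component flow can be reproduced with the public Hugging Face diffusers API. The condensed sketch below is not Qualcomm's optimized on-device stack; the model ID is illustrative and classifier-free guidance is omitted for brevity.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load once; the pipeline exposes the components named on the slide.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
tok, text_enc, unet, vae, sched = (
    pipe.tokenizer, pipe.text_encoder, pipe.unet, pipe.vae, pipe.scheduler)

prompt = "Vase in Greek style with intricate patterns and design"
ids = tok(prompt, padding="max_length", max_length=tok.model_max_length,
          return_tensors="pt").input_ids
emb = text_enc(ids)[0]                        # CLIP text encoder (123M params)

# 512x512 image corresponds to a 64x64 latent (VAE downsamples 8x).
latents = torch.randn(1, unet.config.in_channels, 64, 64)
latents = latents * sched.init_noise_sigma
sched.set_timesteps(20)                       # "often 20 or more" steps
for t in sched.timesteps:
    eps = unet(latents, t, encoder_hidden_states=emb).sample   # UNet (860M)
    latents = sched.step(eps, t, latents).prev_sample          # scheduler update

image = vae.decode(latents / vae.config.scaling_factor).sample  # VAE decoder (49M)
```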

Key concept: model distillation
- Teach the student model to achieve what the teacher achieves at each step.
- Teacher: UNet. Student: small UNet. Loss: MSE (see the sketch below).
(BK-SDM: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation, arXiv 2023)
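
A schematic of the output-level distillation term, assuming placeholder teacher_unet and student_unet callables that predict noise; BK-SDM additionally uses feature-level and task losses, which are omitted here.

```python
import torch
import torch.nn.functional as F

def model_distillation_loss(teacher_unet, student_unet, x_t, t, text_emb):
    """Small student UNet mimics the large teacher's noise prediction
    at the SAME denoising step (output-level term only)."""
    with torch.no_grad():
        eps_teacher = teacher_unet(x_t, t, text_emb)   # frozen large UNet
    eps_student = student_unet(x_t, t, text_emb)       # trainable small UNet
    return F.mse_loss(eps_student, eps_teacher)
```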

Key concept: step distillation
- Teach the student model to achieve in one step what the teacher achieves in multiple steps.
- Teacher: 2 UNet evaluations. Student: 1 UNet evaluation. Loss: MSE (see the sketch below).
(Progressive Distillation for Fast Sampling of Diffusion Models, ICLR 2022)
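
A schematic of the one-against-two-steps objective. Here scheduler_step is a hypothetical helper standing in for one solver update (with stride=2 covering two noise levels at once), not an API from a specific library.

```python
import torch
import torch.nn.functional as F

def step_distillation_loss(teacher, student, scheduler_step, x_t, t, text_emb):
    """Student matches in ONE step what the teacher produces in TWO."""
    with torch.no_grad():
        # Teacher: two consecutive denoising steps, t -> t-1 -> t-2.
        x_tm1 = scheduler_step(x_t, t, teacher(x_t, t, text_emb))
        x_tm2 = scheduler_step(x_tm1, t - 1, teacher(x_tm1, t - 1, text_emb))
    # Student: a single, double-stride step from t that should land on x_tm2.
    x_student = scheduler_step(x_t, t, student(x_t, t, text_emb), stride=2)
    return F.mse_loss(x_student, x_tm2)
```

Applied progressively, this halves the number of UNet forward passes each round, which is how sampling gets down to a handful of steps.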

Clockwork Diffusion: Efficient Generation with Model-Step Distillation (submitted to CVPR 2024)
- High-resolution representations in the UNet carry high-frequency content (e.g., textures).
- Low-resolution representations in the UNet carry high-level structure (e.g., scene layout).
[Figure: Stable Diffusion UNet block diagram; stacks of Conv2d, ResNet, and Transformer blocks, with DownSample and UpSample stages between resolution levels]

Perturbation analysis. Prompt: "image of an astronaut riding a horse on mars". Low-res, mid-res, and high-res features are perturbed starting from steps 0, 1, 2, 3, 4, 5, 10, and 15. Low-resolution features can be perturbed without a noticeable change in the final output, whereas small perturbations on the high-resolution features degrade the image generation.
How can we leverage this perturbation robustness to save computation?

Clockwork architecture
- An efficient approximation of the low-res features, adapted from previous steps (see the sketch below).
- Training the adaptor: distillation from a full UNet over all denoising steps.
[Figure: alternating full-UNet and adaptor-UNet denoising steps; the lightweight adaptor replaces the low-res path, taking the previous step's low-res features as input, while the high-res path always runs]
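
A schematic of how such an architecture might be wired, assuming placeholder hires_enc, lowres_core, hires_dec, and adaptor modules; the actual Clockwork splitting and adaptor design follow the paper, not this sketch.

```python
import torch
import torch.nn as nn

class ClockworkUNet(nn.Module):
    """On every `clock`-th step the full (expensive) low-res core runs;
    in between, a cheap adaptor predicts the low-res features from the
    previous step's cached features. High-res layers always run."""
    def __init__(self, hires_enc, lowres_core, hires_dec, adaptor, clock=2):
        super().__init__()
        self.hires_enc, self.lowres_core, self.hires_dec = (
            hires_enc, lowres_core, hires_dec)
        self.adaptor, self.clock = adaptor, clock
        self._cached_lowres = None

    def forward(self, x, t, step_idx):
        h = self.hires_enc(x, t)                 # high-res path: always computed
        if step_idx % self.clock == 0 or self._cached_lowres is None:
            z = self.lowres_core(h, t)           # full low-res pass
        else:
            z = self.adaptor(self._cached_lowres)  # cheap approximation
        self._cached_lowres = z                  # cache for the next step
        return self.hires_dec(h, z, t)
```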

Clockwork improves any diffusion model
Text-to-image generation on MS-COCO 2017-5K (FID and FLOPs: lower is better; CLIP: higher is better):

Model                    | FID   | CLIP  | FLOPs
Stable Diffusion UNet    | 24.64 | 0.300 | 10.8
+ Clockwork              | 24.11 | 0.295 | 7.3 (1.4x)
Efficient UNet           | 24.22 | 0.302 | 9.5
+ Clockwork              | 23.21 | 0.296 | 5.9 (1.6x)
Distilled Efficient UNet | 25.75 | 0.297 | 4.7
+ Clockwork              | 24.45 | 0.295 | 2.9 (1.6x)

(FID = Frechet Inception Distance; CLIP = Contrastive Language-Image Pre-training)

Clockwork generates high-quality images faster than the state of the art
Text-to-image generation on MS-COCO 2017-5K:

Model                     | FID   | CLIP  | FLOPs
InstaFlow (1 step) [1]    | 29.30 | 0.283 | 0.8
Model Distillation [2]    | 31.48 | 0.268 | 7.8
Guidance Distillation [3] | 26.90 | 0.300 | 6.4
SnapFusion [4]            | 24.20 | 0.300 | 4.0
Clockwork                 | 24.45 | 0.295 | 2.9

[1] InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation, arXiv.
[2] On architectural compression of text-to-image diffusion models, arXiv 2023.
[3] On distillation of guided diffusion models, CVPR 2023.
[4] SnapFusion: Text-to-image diffusion model on mobile devices within two seconds, NeurIPS 2023.

Clockwork reduces total latency by 1.2x while improving quality compared to Fast Stable Diffusion
- Clockwork distillation: reduces the cost of the UNet for some forward passes.
- Step distillation: reduces the number of UNet forward passes to fewer than 20.
- Guidance conditioning: combines conditional and unconditional generation (see the sketch below).
- Efficient UNet: reduces compute (FLOPs), model size, and peak memory usage.
- e-to-v: reparameterization from epsilon to velocity space for robust distillation.

Baseline Stable Diffusion | FID   | CLIP  | Diffusion latency | Total latency
Fast Stable Diffusion     | 26.04 | 0.297 | 0.40 seconds      | 0.65 seconds
Faster Stable Diffusion   | 25.21 | 0.292 | 0.27 seconds      | 0.53 seconds (1.2x speedup)
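
To make the guidance-conditioning bullet concrete: vanilla classifier-free guidance evaluates the UNet twice per step, while guidance conditioning/distillation trains a single network that takes the guidance scale as an extra input and matches the guided prediction in one pass. The sketch below assumes placeholder UNet callables.

```python
import torch
import torch.nn.functional as F

def cfg_eps(unet, x_t, t, cond_emb, uncond_emb, scale=7.5):
    """Vanilla classifier-free guidance: TWO UNet evaluations per step."""
    eps_u = unet(x_t, t, uncond_emb)      # unconditional prediction
    eps_c = unet(x_t, t, cond_emb)        # conditional prediction
    return eps_u + scale * (eps_c - eps_u)

def guidance_distillation_loss(student, teacher, x_t, t, cond, uncond, w):
    """Schematic: the student takes the scale w as conditioning and
    reproduces the guided prediction in a SINGLE pass."""
    with torch.no_grad():
        target = cfg_eps(teacher, x_t, t, cond, uncond, scale=w)
    return F.mse_loss(student(x_t, t, cond, w), target)
```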

Results: state-of-the-art efficient image generation by Clockwork
Compared systems: Stable Diffusion v1.5, Stable Diffusion v1.5 + Clockwork, Fast Stable Diffusion, Fast Stable Diffusion + Clockwork.
Prompts: "large white bear standing near a rock", "the vegetables are cooking in the skillet on the stove.", "bright kitchen with tulips on the table and plants by the window", "red clouds as sun sets over the ocean", "a picnic table with pizza on two trays", "a couple of sandwich slices with lettuce sitting next to condiments."

Results: Clockwork for image editing
Comparison: input; edited by PnP; edited by PnP + Clockwork.
(PnP: Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation, CVPR 2023)

The potential of generative video editing
- Given an input video and a text prompt describing the edit, generate a new video.
- The edit usually changes the appearance or shape of a particular object.
- Key challenges: 1. temporal consistency; 2. high computational cost.
Example prompt: "pink flamingo walking" (input video vs. edited video).

Why is video editing so slow? Diffusion inversion
- Essential to preserve temporal consistency and details in the source video.
- Comes at a high memory cost to store attention maps and features.
[Figure: inversion takes the source prompt and input video to a source inverted latent; generation takes the target prompt, together with the stored features and attention, to the edited video]

Why is video editing so slow? Temporal attention
- Comes at a high computational cost due to the quadratic cost with respect to video length (see the snippet below).
- Self-attention: Attention(Q, K, V) = Softmax(Q K^T / sqrt(d)) V.
- Spatial attention computes self-attention individually per frame; for F frames of N tokens each, the cost is F * O(N^2).
- Temporal attention computes self-attention over all frames jointly, so the cost grows to O((F * N)^2).
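
The cost gap is easy to see by shaping the same video tokens both ways; this snippet only contrasts tensor shapes and asymptotic cost, with illustrative sizes.

```python
import torch
import torch.nn.functional as F

# A video of F_frames frames, N tokens per frame, embedding dim d.
F_frames, N, d = 8, 1024, 64
x = torch.randn(F_frames, N, d)          # per-frame token sequences

# Spatial attention: attend within each frame independently
# -> F_frames attention maps of size N x N, cost ~ F_frames * N^2.
spatial = F.scaled_dot_product_attention(x, x, x)

# Temporal attention: flatten all frames into one long sequence
# -> a single (F_frames*N) x (F_frames*N) map, cost ~ (F_frames * N)^2,
# i.e., quadratic in video length.
seq = x.reshape(1, F_frames * N, d)
temporal = F.scaled_dot_product_attention(seq, seq, seq)
```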

Object-Centric Diffusion for Efficient Video Editing (submitted to CVPR 2024)
Token merging addresses these two challenges: the costly attention is computed over only a fraction of the tokens (centroids).
1. Merge the redundant tokens.
2. Perform the computation on the clusters.
3. Copy the output back into the merged tokens.

Object-centric 3D token merging
- Token merging: two tokens are merged if their similarity exceeds a threshold.
- Introduce different thresholds for background vs. edited regions.
- A lower threshold on background regions encourages merging more tokens on the background and leaves more unmerged tokens on the foreground; increasing the foreground threshold leaves more and more foreground tokens unmerged.
- A sketch of the merge/compute/unmerge flow follows below.
[Figure: merged vs. unmerged token maps on "Jeep", "Porsche", "Swan", and "Flamingo" clips]
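
A deliberately simplified merge-compute-unmerge sketch using greedy cosine-similarity clustering with per-region thresholds; the actual method builds on ToMe-style bipartite matching inside attention layers, so treat this only as an illustration of the flow.

```python
import torch

def merge_compute_unmerge(tokens, fg_mask, compute, thr_bg=0.6, thr_fg=0.9):
    """tokens: (N, d); fg_mask: (N,) bool, True on edited/foreground tokens.
    A lower background threshold merges more background tokens."""
    thr = torch.where(fg_mask, torch.tensor(thr_fg), torch.tensor(thr_bg))
    sim = torch.nn.functional.cosine_similarity(
        tokens[:, None, :], tokens[None, :, :], dim=-1)
    assign = torch.arange(tokens.size(0))
    for i in range(tokens.size(0)):          # greedy clustering (sketch only)
        for j in range(i):
            if assign[j] == j and sim[i, j] > thr[i]:
                assign[i] = j                # merge token i into centroid j
                break
    centroids = torch.unique(assign)         # surviving centroid indices
    merged = torch.stack([tokens[assign == c].mean(0) for c in centroids])
    out = compute(merged)                    # costly op on few centroids only
    inv = torch.searchsorted(centroids, assign)
    return out[inv]                          # copy outputs back to all tokens

# Usage sketch: identity "compute" just to show shapes round-trip.
y = merge_compute_unmerge(torch.randn(16, 64),
                          torch.zeros(16, dtype=torch.bool), lambda m: m)
```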

Further acceleration by object-centric sampling
We perform a different number of sampling steps on the edited and background regions (see the sketch below):
- Edited regions are usually small, but require most of the synthesis (more sampling steps).
- Background regions are usually large, and don't require much synthesis (fewer sampling steps).
The latents are scattered into the two streams and gathered back: edited regions are generated with more sampling steps, background regions with fewer, and blending steps avoid inconsistencies.
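
A control-flow sketch of the idea, with a placeholder step function standing in for one UNet-plus-scheduler update; it deliberately ignores the noise-level bookkeeping that the real method handles with its blending steps.

```python
import torch

def object_centric_sample(latents, box, step, timesteps, bg_stride=4):
    """Denoise a small edited crop at every timestep; denoise the large
    background on a sparse subset of steps; scatter the crop back so the
    two streams stay aligned. `box` is the edited-region bounding box."""
    y0, y1, x0, x1 = box
    fg = latents[..., y0:y1, x0:x1].clone()   # gather the edited region
    bg = latents
    for i, t in enumerate(timesteps):
        fg = step(fg, t)                      # full-rate, small crop
        if i % bg_stride == 0:
            bg = step(bg, t)                  # low-rate, whole frame
            bg[..., y0:y1, x0:x1] = fg        # scatter back (blend point)
    out = bg
    out[..., y0:y1, x0:x1] = fg               # final gather of edited region
    return out
```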

6-10x speedup with negligible drop in quality
We applied our acceleration to two recent video generation frameworks, FateZero and ControlVideo. Our acceleration includes object-centric 3D token merging and object-centric sampling.

DAVIS benchmark, generating 8 frames on a server GPU (latency in seconds):

Model              | Temporal Cons | CLIP  | Inversion    | Generation  | Overall
FateZero           | 0.961         | 0.344 | 135.80       | 41.34       | 177.14
+ Our acceleration | 0.967         | 0.331 | 8.22 (16.5x) | 9.29 (4.4x) | 17.51 (10x)

CV benchmark, generating 15 frames on a server GPU:

Model              | Temporal Cons | CLIP  | Latency (s)
ControlVideo       | 0.972         | 0.318 | 152.64
+ Our acceleration | 0.977         | 0.313 | 25.21 (6.0x)

(FateZero: Fusing Attentions for Zero-shot Text-based Video Editing, ICCV 2023. ControlVideo: Training-free Controllable Text-to-Video Generation, arXiv 2023.)

10x faster at comparable editing quality
FateZero vs. FateZero + Object-Centric Diffusion on shape, attribute, and style editing: "watercolor painting", "Porsche car", "pink flamingo walking", "Swarovski crystal", "cartoon photo", "white duck", "Makoto Shinkai style", "Pokemon cartoon", "Ukiyo-e style".
Research continues on further optimizations to enable on-target deployment of video generation models.

How does 3D generation work?
- Generating a 3D mesh from a text prompt.
- Crucial for many tasks, e.g., XR and graphics.
- Manual creation of 3D assets is costly.
Example prompts: "A plush dragon toy", "A DSLR photo of a hippo wearing a sweater", "A DSLR photo of a train engine made out of clay", "A beautiful rainbow fish".

Optimization-based approach
- Costly optimization to fit mesh parameters for each object.
- Takes 20+ minutes to model a new object/scene.
- Leverages a pretrained image generator to improve the optimization, i.e., score distillation sampling [1].
Feed-forward approach
- Generates mesh parameters directly, without any optimization at inference.
- Takes seconds to model a new object/scene.
- Learned from scratch on the limited 3D data available.
Can pretrained image generators, e.g., Stable Diffusion, improve feed-forward 3D generation? The goal is to transfer the huge diversity of 2D image datasets into 3D tasks.
[1] DreamFusion: Text-to-3D using 2D Diffusion, arXiv 2022.

HexaGen3D
Like other latent diffusion models, we follow a two-stage training:
1. A Variational Auto-Encoder (VAE) reconstructs meshes from point clouds, with latents defined in a triplane space (three axis-aligned feature planes).
2. Triplanar latents are generated conditionally from text, by adapting a pretrained Stable Diffusion model.
[Figure: point cloud passes through PointNet and a UNet encoder to triplane latents; a latent decoder (MLP) predicts color and SDF, and DMTet extracts the textured mesh; the text prompt "A green and white robot with arms and legs" conditions the modified, finetuned Stable Diffusion model]

We generate triplanar latents using a pretrained Stable Diffusion model in two steps.
Step 1: generate hexaview guidance.
- Tile the "front", "rear", "right", "left", "top", and "bottom" views into a large hexaview image (see the tiling sketch below).
- As an intermediate generation step, this guides Stable Diffusion toward generating the triplanar latents.
Step 2: convert the hexaviews into triplanar latents.
- Split and align the views, followed by a ConvNet.
- Uses the same UNet (parameter efficiency) with a different prompt and a 3D embedding.
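
The tiling itself is simple tensor plumbing; the 2x3 layout and per-view resolution below are assumptions for illustration, not the paper's exact configuration.

```python
import torch

# Six orthogonal views tiled into one image grid so a 2D diffusion UNet
# can generate them jointly, then split back before triplane conversion.
order = ["front", "rear", "right", "left", "top", "bottom"]
views = {name: torch.randn(3, 128, 128) for name in order}  # (C, H, W) each

rows = [torch.cat([views[n] for n in order[i:i + 3]], dim=-1)  # 3 views per row
        for i in (0, 3)]
hexaview = torch.cat(rows, dim=-2)       # (3, 256, 384): one 2x3 tiled image

# Split back into the six aligned views (row-major order matches `order`).
recovered = [hexaview[:, r * 128:(r + 1) * 128, c * 128:(c + 1) * 128]
             for r in range(2) for c in range(3)]
```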

We generate high-quality meshes much faster than optimization-based methods: 7 seconds vs. 22+ minutes
Compared systems: HexaGen3D-SDXL (7 sec), DreamFusion-SDv2.1 (22 mins), MVDream-SDv2.1 (194 mins), TextMesh-SDv2.1 (23 mins).
Example prompts: "a bald eagle carved out of wood", "A DSLR photo of a frog wearing a sweater", "a brightly colored mushroom growing on a log", "a DSLR photo of a mug of hot chocolate with whipped cream and marshmallows", "a DSLR photo of a hippo wearing a sweater", "a beautiful dress made out of fruit, on a mannequin. Studio lighting, high quality, high resolution", "a blue motorcycle".

Model       | Latency  | CLIP  | User preference
MVDream     | 194 mins | 30.35 | 0.97
TextMesh    | 23 mins  | 25.06 | 0.12
DreamFusion | 22 mins  | 28.91 | 0.59
HexaGen3D   | 7 secs   | 29.58 | 0.73

(MVDream: Multi-view Diffusion for 3D Generation, arXiv 2023. TextMesh: Generation of Realistic 3D Meshes From Text Prompts, arXiv 2023. DreamFusion: Text-to-3D using 2D Diffusion, arXiv 2022. Shap-E: Generating Conditional 3D Implicit Functions, arXiv 2023.)

VaLID: Variable-Length Input Diffusion for Novel View Synthesis (submitted to CVPR 2024)
How can we enable zero-shot models to handle multiple views without increasing compute?
- Novel View Synthesis (NVS) generates a novel view of an object from a target pose. Input: an image and a camera pose.
- Optimization-based models, i.e., NeRF: slow, but high quality when multiple views are available.
- Recently, Stable Diffusion has been adopted for zero-shot NVS (Zero-1-to-3: Zero-shot One Image to 3D Object, ICCV 2023): fast, but quality is limited since it consumes a single view only.
[Figure: a source image and source camera pose, plus a target camera pose, go through novel view synthesis to produce the target image]

VaLID: Variable-Length Input Diffusion
Tokenize each view, aggregate over views, and use the multi-view tokens as cross-attention conditioning (see the sketch below).
- A pretrained Masked AutoEncoder (MAE) tokenizes the views; it provides a better representation than CLIP.
- Learnable tokens generate a fixed-length output, avoiding an increase in computation when fusing more views.
[Figure: each source view and its pose (R1,T1), (R2,T2), ..., (Rn,Tn) passes through a view tokenizer; token aggregation applies N layers of cross-attention (Q from the learnable tokens; K,V from the view tokens) and a feed-forward layer; the resulting multi-view tokens condition the UNet]
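
A sketch of fixed-length aggregation with learnable query tokens cross-attending over a variable number of view tokens; the dimensions, layer count, and the omitted feed-forward sublayer are illustrative rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class TokenAggregator(nn.Module):
    """A fixed set of learnable tokens attends over however many view
    tokens are available, so the UNet's conditioning size stays constant
    no matter how many views are fused."""
    def __init__(self, dim=768, n_out=77, n_layers=2, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_out, dim))  # learnable tokens
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, n_heads, batch_first=True)
            for _ in range(n_layers))

    def forward(self, view_tokens):              # (B, n_views * n_tok, dim)
        q = self.queries.expand(view_tokens.size(0), -1, -1)
        for attn in self.layers:
            q, _ = attn(q, view_tokens, view_tokens)  # cross-attn: K,V = views
        return q                                  # (B, n_out, dim): fixed length

agg = TokenAggregator()
tokens_2views = torch.randn(1, 2 * 196, 768)      # e.g., two MAE-tokenized views
cond = agg(tokens_2views)                         # same output size for any #views
```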

Our method outperforms existing SOTA methods in quality
Google Scanned Object dataset. At a negligible computational cost, VaLID processes multiple views to generate more accurate views.

Model           | PSNR  | LPIPS | GFLOPs
DietNeRF [1]    | 8.93  | 0.412 | High
SJC-I [2]       | 5.91  | 0.545 | High
IV [3]          | 6.57  | 0.484 | High
Zero123 [4]     | 19.0  | 0.115 | Similar to ours
VaLID (1 view)  | 20.03 | 0.091 | 87.2
VaLID (2 views) | 20.41 | 0.085 | 87.8
VaLID (3 views) | 21.05 | 0.073 | 88.8
VaLID (4 views) | 21.30 | 0.069 | 91.4

(PSNR = Peak Signal-to-Noise Ratio, higher is better; LPIPS = Learned Perceptual Image Patch Similarity, lower is better.)
[1] Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis, CVPR 2021.
[2] Score Jacobian Chaining: Lifting pretrained 2D diffusion models for 3D generation, CVPR 2023.
[3] Stable Diffusion Image Variation, arXiv 2023.
[4] Zero-1-to-3: Zero-shot One Image to 3D Object, ICCV 2023.
[Figure: qualitative comparison of Zero123, VaLID (1 view), and VaLID (4 views) against ground truth, with source and target views shown per example]

We adapt the generative model to a new domain: automotive
Generative models improve graphics simulators by being:
- Realistic, by being trained on real images and videos.
- Scalable, by sampling examples instead of manually crafting the assets, objects, and scenarios.
Use cases: generate training data for long-tailed object classes (i.e., animals and emergency vehicles); scale up the test set by diversifying (i.e., weather, object appearance, and geometries); generate safety-critical test scenarios (i.e., crashes and pedestrians on the road).

Animal detection:

Training data           | mAP
Real images             | 50.2
Real + generated images | 57.7

Inpainting for animal detection
- Adapting the generative model to new domains, i.e., automotive scenes.
- High-fidelity generation in tight bounding boxes.
- Putting animals at the right geometry: location and scale.
- Generated samples are added to the training set to improve the animal detector.

High-fidelity object editing
- Using generative editing models to change the appearance of vehicles.
- Diversify the test data toward less common vehicle types, like classics.
- Avoid unintended changes in the appearance and geometry of the vehicle and its background.

Takeaways
- Generative vision has great potential for image and video generation across enterprise, entertainment, XR, and automotive.
- Efficient generative vision is important for achieving scale, in the cloud and on device.
- Qualcomm AI Research has achieved state-of-the-art results in image and video generation with novel techniques.

Thank you
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
Connect with us: @QCOMResearch
For more information, follow us and visit us at: https:/

Nothing in these materials is an offer to sell any of the components or devices referenced herein.
(c) 2018-2024 Qualcomm Technologies, Inc. and/or its affiliated companies. All rights reserved.
Qualcomm is a trademark or registered trademark of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to "Qualcomm" may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business. Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

Qualcomm patented technologies are licensed by Qualcomm Incorporated.

Efficient generative vision: processing on the edge enables scale across devices
- Recording high-quality images and videos.
- On-device content creation and editing.
- Interacting with 3D objects and scenes.

Appendix: higher-quality 3D with HexaGen3D
We generate more diverse results (random seeds) than DreamFusion-SDv2.1 and MVDream-SDv2.1, e.g., for "a brightly colored mushroom growing on a log", "a squirrel dressed like Henry VIII King of England", and "a hippo wearing a sweater".
Ablations (CLIP score):
- Generating hexaviews is much more effective than directly generating the triplanar latents: 24.02 with hexaview generation vs. 18.47 without.
- Using the same UNet for generating and converting hexaviews is more effective: 24.02 with weight sharing vs. 23.43 without.
