Hammer!: A Unified Framework of Model Compression and NAS
Speakers: Xiufeng Xie, Hongmin Xu (AI Platform, Seattle AI Lab, and FeDA Lab, Kwai)
Key contributors: Hongmin Xu, Xiufeng Xie, Jianchao Tan, Yi Guo, Huan Yuan, Jixiang Li, Xiangru Lian, and Ji Liu

Contributors
- Key contributors

Why Do We Need DNN Model Compression?
- Speech recognition
- Beauty photos
- Face unlock
- Online recommendation (inference)

Goals in Model Compression
- Accuracy
- Computation efficiency
- Energy efficiency

Energy & Computing Cost a Lot
- Example: running an AI application that costs 100M FLOPs
- On server: about $1,000 per day; on phone: about $1.8M per day
- Source: [1], assuming 300M DAU and 100 videos per user

However, Model Size Is Not the Whole Story
- Model size / FLOPs, energy, and latency all matter, and energy and latency depend on the hardware
Model Compression Tools
- Dimensions compared: tool name, tool author, unified framework, model pruning, model quantization, NAS for compression, hardware awareness
- Tools compared: Hammer (Kwai), PocketFlow (Tencent), NNI (Microsoft), Distiller (Intel), PaddlePaddle (Baidu), TensorFlow Lite (Google), AIMET (Qualcomm), Condensa (Nvidia)
- Hammer: joint pruning + quantization + NAS; the others: pruning and quantization

Ubiquitous AI Optimization with an All-in-One Framework
- Maximize accuracy, subject to: latency <= latency tolerance (application requirement) and resource <= resource budget (hardware resource budget)
- Add as many constraints as the user wants

Hammer! Supported Compression Strategies
- Pruning
- Quantization
- Neural architecture search (NAS)
Hammer! Is GPU-Hardware-Aware
Search for a DNN structure that best fits the GPU hardware:
- Modular booster: the number of input channels and the number of output channels are each a multiple of N (N = 8)
- Latency: optimize the DNN's running latency on the target GPU (latency profiling -> optimizer -> model)
- Energy: optimize the DNN's energy cost on the target GPU (energy profiling -> optimizer -> model)
- Memory: optimize the DNN's memory consumption within the hardware's quota (memory profiling -> optimizer -> model)

Hammer! Supported Network Architectures
- CNN with pruned channels (e.g., an image classifier outputting Dog 0.01, Cat 0.01, Ship 0.92, Plane 0.06)
- RNN (turning an image model into a video model)
- Networks with skip connections

All-In-One Framework: DAG Representation
- A NN is a directed acyclic graph (DAG)
- Model compression is equivalent to cutting off the less important edges in the DAG, under certain constraints
- Notation: e_ij is the edge indicator for the edge from computing node i to node j; the graph also has input and output nodes, and separate edge indicators for NAS (see the sketch below)
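To make the DAG view concrete, here is a minimal sketch (not Hammer's implementation) that attaches a binary indicator to each edge of a toy graph and drops the low-importance edges; the node names, importance scores, and threshold are made up for illustration.

```python
# Minimal sketch of the DAG view of compression (illustrative only, not Hammer's API).
# Each edge (i, j) carries a binary indicator e[(i, j)]; compression keeps only edges with e == 1.

# Tiny DAG: input -> conv1 -> conv2 -> output, plus a skip edge input -> conv2
edges = {("input", "conv1"), ("conv1", "conv2"), ("input", "conv2"), ("conv2", "output")}

# Hypothetical importance scores (in practice these would come from training the indicators)
importance = {("input", "conv1"): 0.9, ("conv1", "conv2"): 0.8,
              ("input", "conv2"): 0.05, ("conv2", "output"): 1.0}

threshold = 0.1
e = {edge: int(importance[edge] >= threshold) for edge in edges}   # edge indicators in {0, 1}

pruned_dag = {edge for edge in edges if e[edge] == 1}              # cut off less important edges
print(pruned_dag)  # the low-importance skip edge ("input", "conv2") is removed
```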
All-In-One Framework: Unified Compression Strategies

    min_{w, e} C(w, e)        (w: NN weights, C: NN loss function)

subject to:
- Pruning: e_ij ∈ {0, 1} for every edge (i, j)
- NAS: Σ_i e_ik = 1 for every node k ∈ O_NAS (O_NAS: the set of nodes where NAS paths converge)
- Quantization: w ∈ M_Q (M_Q: the quantized weight space)
(a small illustrative sketch of these three constraint families follows)
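As a rough illustration of how the three constraint families act on the indicators and weights, the PyTorch snippet below sketches a binary pruning gate, a one-hot NAS choice at a converging node, and a uniform 8-bit grid standing in for the quantized weight space M_Q; none of this is Hammer's actual code.

```python
import torch

# Pruning: each edge/channel indicator is binary, e_ij in {0, 1}.
e_prune = (torch.rand(8) > 0.5).float()

# NAS: at a node where several candidate paths converge, exactly one edge survives,
# i.e. sum_i e_ik = 1 -> a one-hot choice over the candidates.
logits = torch.randn(3)                                           # 3 candidate incoming edges
e_nas = torch.nn.functional.one_hot(logits.argmax(), num_classes=3).float()
assert e_nas.sum() == 1

# Quantization: weights are restricted to a quantized set M_Q,
# here a uniform 8-bit grid as a stand-in for the real quantizer.
w = torch.randn(8)
scale = w.abs().max() / 127
w_q = torch.clamp((w / scale).round(), -128, 127) * scale          # w projected onto M_Q
```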
All-In-One Framework: Hardware-Aware Model Compression

    min_{w, e} C(w, e)

subject to the hardware-aware constraints:
- Energy: E_CPU(e) ≤ E_budget
- Latency: T_GPU(e) ≤ T_budget
- Memory: M_GPU(e) ≤ M_budget
- FLOPs: C(e) ≤ C_budget
- Parameter number: N(e) ≤ N_budget
- Sparsity: S(e) ≤ S_budget

All-In-One Framework: Hardware-Aware Model Compression (Modular Booster)

    min_{w, e} C(w, e)        (w: NN weights, C: NN loss function)

subject to:
- Modular booster: Σ_i e_ik mod N = 0 for every node k, i.e., the surviving input and output channel counts are multiples of N (N = 8); see the sketch below
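A small sketch of what the hardware-aware budgets and the modular booster amount to in practice; the metric names, numbers, and helper functions below are hypothetical, not Hammer's API.

```python
# Sketch of hardware-aware budget checks and the modular booster (illustrative only).
N = 8  # channel counts are forced to multiples of N so GPU kernels stay efficient

def boost_channels(c: int, n: int = N) -> int:
    """Round a searched channel count up to the nearest multiple of n (sum_j e_jk mod n == 0)."""
    return ((c + n - 1) // n) * n

def within_budgets(metrics: dict, budgets: dict) -> bool:
    """Accept a candidate structure only if every profiled metric is within its budget."""
    return all(metrics[k] <= budgets[k] for k in budgets)

# Hypothetical profiled numbers for one candidate structure
metrics = {"energy_mj": 35.0, "latency_ms": 8.2, "memory_mb": 410.0, "flops_m": 720.0}
budgets = {"energy_mj": 40.0, "latency_ms": 10.0, "memory_mb": 512.0, "flops_m": 800.0}

print(boost_channels(37))                # -> 40
print(within_budgets(metrics, budgets))  # -> True
```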
All-In-One Framework: Complicated Network Support

    min_{w, e} C(w, e)

subject to:
- Skip connection: e_ik = e_jk for every node k ∈ D_ADD (D_ADD: the set of nodes where outputs from multiple previous nodes add together), so the branches entering an add share one mask (see the sketch below)
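The skip-connection constraint can be pictured as the two branches feeding an add node sharing a single channel mask, as in this illustrative snippet (not Hammer's code); shapes and the keep ratio are made up.

```python
import torch

# Skip-connection constraint e_ik = e_jk: when the outputs of node i and node j are added
# together (a node in D_ADD), their surviving channels must match, so both branches
# share one channel mask.
shared_mask = (torch.rand(64) > 0.3).float()   # one indicator per channel, values in {0, 1}

branch_a = torch.randn(1, 64, 16, 16)          # output of node i
branch_b = torch.randn(1, 64, 16, 16)          # output of node j (skip branch)

# Applying the same mask to both branches keeps the element-wise add well defined.
mask = shared_mask.view(1, -1, 1, 1)
out = branch_a * mask + branch_b * mask
```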
All-In-One Framework: Summary

    min_{w, e} C(w, e)

subject to:
- Pruning: e_ij ∈ {0, 1}
- NAS: Σ_i e_ik = 1 for k ∈ O_NAS
- Quantization: w ∈ M_Q
- Skip connection: e_ik = e_jk for k ∈ D_ADD
- Hardware-aware resource constraints: E(e) ≤ E_budget, T(e) ≤ T_budget, C(e) ≤ C_budget, M(e) ≤ M_budget, N(e) ≤ N_budget, S(e) ≤ S_budget
- Modular booster: Σ_j e_jk mod N = 0 (N = 8)

How to use these constraints? All the details are already implemented in Hammer; users simply turn them on/off and set the resource budgets (a hypothetical configuration sketch follows).
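A purely hypothetical sketch of what "turn constraints on/off and set resource budgets" could look like as user-facing configuration; every key and value below is invented for illustration and is not Hammer's real interface.

```python
# Hypothetical configuration sketch (invented names, not Hammer's API):
compression_config = {
    "pruning": True,
    "quantization": True,
    "nas": False,
    "constraints": {           # enable only the budgets you care about
        "latency_ms": 10.0,
        "flops_m": 800.0,
        "memory_mb": 512.0,
    },
    "modular_booster_n": 8,    # keep channel counts as multiples of 8
}
```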
Hammer! Comparison with SOTA Algorithms

SOTA Benchmark Comparison: FLOPs & Accuracy
- Evaluation setup: ResNet56 on CIFAR10
- [Plot: Hammer compression vs. the uncompressed baseline; delta top-1 accuracy (%) vs. FLOPs compression ratio w.r.t. the original DNN model (%)]

SOTA Benchmark Comparison: FLOPs & Accuracy
- Evaluation setup: MobileNetV2 on CIFAR10
- [Plot: Hammer compression for MobileNetV2 on CIFAR10 vs. the uncompressed baseline; delta top-1 accuracy (%) vs. FLOPs compression ratio w.r.t. the original DNN model (%)]

SOTA Benchmark Comparison: BitOPs & Accuracy
- Evaluation setup: ResNet56 on CIFAR10
- [Plot: Hammer compression vs. the uncompressed baseline; delta top-1 accuracy (%) vs. BitOPs compression ratio w.r.t. the original DNN model (%)]
Hammer! Fits Various Application Scenarios
Applications to showcase:
- Frame-by-frame video enhancement: de-artifacts, video super-resolution, video cartoonization
- Real-time visual detection and recognition: face landmarks detection, hand segmentation, 3D hand pose detection
- Sequential model compression: video style transfer, 3D human mesh reconstruction
- Recommendation system: recall model, ranking model

Hammer! Frame-by-Frame Video Enhancement

Hammer! in De-Artifacts
- De-Artifacts (DeArts): video makers upload compressed video, and video compression causes artifacts; a complicated DNN removes the artifacts
- [Demo: original vs. DeArts output]
Hammer! in De-Artifacts Compared to SOTA
- Hammer! boosts the efficiency of De-Artifacts
- [Plot: DeArts compression, accuracy vs. latency (ms); methods compared: First_order, Meta_Pruning, Group_Sparsity, DMCP, NetAdapt, AMC, Hammer (FLOPs), Hammer (Latency)]

Hammer! in Low-FLOPs Video Super-Resolution (x4)
- [Demo: input video; original model (7 frames/s) vs. pruned model (40 frames/s)]

Hammer! in Low-FLOPs Video Cartoonization
- [Demo: input video; original model (4G FLOPs) vs. pruned model (755M FLOPs, runs on mid-range phones)]

Hammer! Real-Time Visual Detection and Recognition

Hammer! in Low-Latency Face Landmarks Detection
- [Plot: model pruning comparison, accuracy vs. latency (ms); baseline vs. pruned model guided by latency vs. pruned model guided by FLOPs, with model sizes ranging from 8.5M to 18.2M parameters]

Hammer! in 3D Hand Pose Estimation
- Latency reduced to 60% of the original (manually optimized) model, with no accuracy drop

Hammer! Sequential Model Compression
Single-Frame Model to Compressed Recurrent Model
- The user provides a DNN model with image input to Hammer
- Hammer turns the per-frame model Out_t = F(Input_t) into a recurrent model Out_t = F(Input_t, Out_{t-1})
- Training combines the original train loss with a temporal loss (sketched below)
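The conversion can be pictured roughly as below: a per-frame model is wrapped so that it also consumes its previous output, and training adds a temporal term that penalizes frame-to-frame jitter. This is an illustrative PyTorch sketch with made-up layer sizes and loss weights, not Hammer's actual conversion.

```python
import torch
import torch.nn as nn

# Turn a per-frame model Out_t = F(Input_t) into a recurrent one Out_t = F(Input_t, Out_{t-1}).
class RecurrentWrapper(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        # frame model that also sees the previous output, concatenated on the channel axis
        self.frame_model = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, frames):                        # frames: (T, B, C, H, W)
        prev = torch.zeros_like(frames[0])
        outputs = []
        for frame in frames:
            out = self.frame_model(torch.cat([frame, prev], dim=1))
            outputs.append(out)
            prev = out
        return torch.stack(outputs)

model = RecurrentWrapper()
frames = torch.randn(4, 1, 3, 32, 32)                 # 4 frames of a toy video
targets = torch.randn(4, 1, 3, 32, 32)
outputs = model(frames)

train_loss = nn.functional.mse_loss(outputs, targets)           # per-frame task loss
temporal_loss = (outputs[1:] - outputs[:-1]).pow(2).mean()      # penalize jitter between frames
loss = train_loss + 0.1 * temporal_loss
loss.backward()
```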
Video Style Transfer (Frame-by-Frame vs. Sequential Model)
- Original model (frame-by-frame)
- Pruned model (frame-by-frame, 30% latency): severe jitters
- Hammer! pruned sequential model (30% latency): smooth tracking under the same latency

Hammer! in 3D Human Mesh Reconstruction
- Original model (frame-by-frame, 390M FLOPs)
- Pruned model (frame-by-frame, 250M FLOPs): severe jitters
- Hammer! pruned sequential model (250M FLOPs): smooth tracking under the same FLOPs
Hammer! Recommendation System

Hammer! in Recommendation System
- Online inference: prune the two-tower recall model
- User features and item features each pass through an MLP tower that outputs a 64-dimensional embedding; the dot product of the user-embedding batch and the item-embedding matrix gives a batch x item-size score matrix, the top-k indices are returned, and training uses a cross-entropy loss over the score matrix (a toy sketch follows)
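For reference, a toy two-tower recall model of this shape might look like the following; the 64-d embedding follows the slide, while the input feature dimension (48, borrowed from the next slide's table), hidden sizes, batch size, and k are illustrative assumptions, not the production model.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """One tower of a two-tower recall model: features -> MLP -> 64-d embedding."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 64),                       # 64-d embedding per user / item
        )

    def forward(self, x):
        return self.mlp(x)

user_tower, item_tower = Tower(in_dim=48), Tower(in_dim=48)

users = torch.randn(32, 48)                           # a batch of user feature vectors
items = torch.randn(1000, 48)                         # candidate item feature vectors

scores = user_tower(users) @ item_tower(items).T      # batch x item-size score matrix (dot product)
topk = scores.topk(k=10, dim=1).indices               # top-k item indices per user (recall)

# Training would use a cross-entropy loss over the score matrix (as in the slide);
# compression then prunes/quantizes the towers to raise online inference QPS.
```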
Hammer! in Recommendation System
- Goal: increase online inference QPS without hurting performance
- The user/item embedding feature size and data type influence online inference QPS:

  Model                            | User/item feature size | Feature data type | Supported item size
  Original recall model            | 48                     | Float32           | 1.5 million
  Pruned & quantized recall model  | 32                     | Int8              | 3 million

- The pruned & quantized recall model brings roughly +300k yuan per day and more user clicks

Hammer! in Recommendation System
- Baseline vs. different model compression methods over the DLRM benchmark (DLRM: Facebook's open-source recommendation framework)
- [Plot: accuracy vs. compression ratio (CR); methods include the dense model, handcrafted compression, and MPTP]

Hammer! Awards
- The 2019 CCF Science and Technology Award (Outstanding Prize for Scientific and Technological Progress) was granted to Beijing Dajia Internet Information Technology Co., Ltd. (北京達佳互聯信息技術有限公司) for its real-time AI platform project; the main contributors include 鄭文, 劉, 王華意, 張國鑫, 師小燕, 邊紅昌, et al.

THANKS