Processing Raw Images Efficiently with the MAX78000 AI Neural Network Accelerator
Mehmet Gorkem Ulkar
Principal Engineer, Machine Learning
Analog Devices
© 2023 Analog Devices

Agenda
1. Challenges of AI at the edge
2. MAX78000 overview
3. MAX78000 sample applications
4. Energy requirements for data manipulation
5. Proposal: CNN based de-bayerization
6. Results
7. Q&A

Mehmet Gorkem Ulkar, PhD
Dallas, TX
Principal ML Engineer

Keep Your Data Close: The Physics of Data
[Figure: energy per 64-bit transfer (J per 64b) versus distance (m), logarithmic axes, with a marker at 1000 km (623 mi). Credit: Cadence]
Sources:
- Rick Zarr, TI, 2008, The True Cost of an Internet "Click" - estimate of transfer cost for 30KB page from server http:/
- J Kunkel et al, University of Hamburg, 2010, Collecting Energy Consumption of Scientific Data
- Horowitz, ISSCC 2014, 1300-2600 pJ per 64b access
- Chris Rowen, Cadence Design Systems, January 2016, Get Real! Neural Network Technology for Embedded Systems

Software Inference: Slow and Power Hungry
- In inference, computational effort is in forward propagation
- On classic hardware, almost all spent in a triple nested matrix multiplication loop
- O(n^3) to O(n^2.8)*
- Very energy intensive even with fast matrix multiply using integer math on DSP or GPU: large number of memory accesses
* Strassen's algorithm

CNN Accelerator: MAX78000/MAX78002
- The conv operation is parallelizable in the channel dimension
- 64 processors in total; more channels are processed in a multi-pass fashion
- Proper architecture that minimizes data movement provides energy efficiency
- Each input channel is processed in parallel using different processors to minimize data movement
- Each processor uses dedicated memory

MAX78000 AI Micro-System-on-Chip

Model, Training, Deployment: Development Flow

MAX78000 Benchmarks
[Charts: Inference Time (ms) and Inference Energy (mJ) for MAX78000, MAX32650, and STM32F7]

Network   MACs         MAX78000                  MAX32650 (2)               STM32F7 (2)
                       CNN at 50 MHz (1), 1.2V   Cortex-M4, 120 MHz, 1.2V   Cortex-M7, 216 MHz, 2.1V
KWS20     13,801,088   2.0 ms, 0.14 mJ           350 ms, 8.37 mJ            125 ms, 30.1 mJ (3)
FaceID    55,234,560   13.89 ms (4), 0.40 mJ     1760 ms (5), 42.1 mJ       714 ms (5), 153 mJ + 59 mJ (6)

(1) 28 billion operations/second
(2) ARM DSP with CMSIS-NN, running exact same INT8 network as MAX78000
(3) STM32F722ZE, internal memory
(4) Includes time to load input
(5) Does not include time to load input
(6) STM32F746NG + external 3.3V SDRAM IS42S32400F-6BL + SDRAM controller

Battery Life Leader in Independent Benchmarks
[Figure: benchmark comparison, with the leading result labeled "Best"]
Source: A Battery-Free Long-Range Wireless Smart Camera for Face Detection: An accurate benchmark of novel Edge AI platforms and milliwatt microcontrollers. Michele Magno, Head of the Project-based Learning Center, ETH Zurich, D-ITET. EMEA TinyML Talks, June 2021.

Thinking About Edge AI Use Cases
If my application ___ then do ___
  sees / hears / senses
  object / sound / event / situation / action
- If my camera sees a bear, then take a high-resolution picture and send over cell network
- If my thermostat hears glass break, then send a text message to the owner
- If my factory robot sees a person nearby, then shut down until they leave
- If my pet door sees a cat with a mouse in its mouth, then lock the pet door and send me a text message

Action Recognition
- Embeddings saved in memory on a rolling basis
- No redundant calculations

Dataset                            Validation Acc.   Parameters
Kinetics-400 (4 classes + other)   79.8%             379k

People Tracking
https:/

Camera

System Energy: From Traditional Systems to MAX78000
- Accelerator drastically lowers CNN energy
- Input and data manipulation become much
  larger relative contributors to energy
- MAX78000 improves data loading; better algorithms can help with data manipulation, e.g. better ways of handling raw images
[Figure: energy split into data input, data manipulation, CNN operation, and output, compared across: traditional micro; accelerator only; MAX78000 express loader; CNN hardware with algo improvement]

Data Manipulation: Debayerization
In order to obtain an RGB format, the raw image must be debayerized.
There are several debayerization methods*:
- Bilinear Interpolation
- Sequential Demosaicing
- Iterative Demosaicing
- Machine Learning Methods
- Adaptive Color Plane Interpolation
Figure 1. Bayer Filter (Nkansah et al., 2022)
- Outside the CNN accelerator
- Increased system energy consumption
* Dammer, K., Grosz, R. (2017). Demosaising using a Convolutional Neural Network approach. Lund University, Lund, Sweden.

CNN based Debayerization
Approach 1: Learning the manipulation & interpolation by a CNN model and embedding this network into an accelerator
- Efficient way of debayerization
Figure 3. The Network of B2RGBNet (Syu et al., 2018)
#parameters: 124715

CNN based Debayerization
Approach 2: Using folding and fixed 1x1 kernels
Step 1: Folding the pixels into channels
Step 2: Convolution with the fixed kernel to obtain RGB

Accuracy Results
[Figure: Mean Squared Reconstruction Error on ImageNet (axis from 0 to 0.007) for: Bilinear Interpolation; b2rgb; conv w/fold + transconv + conv; conv w/fold + b2rgb]

Conclusion
- MAX78000 enables battery-powered smart applications at the edge
- Effective data manipulation and preprocessing are much more important when using highly-efficient NN inference engines
- Two methods proposed to perform interpolation inside the CNN accelerator, MAX78000
- Results show better accuracies compared to simple conventional interpolation; the work is ongoing

Resources
We are waiting for you at the ADI booth!
- Upper-level AI repo: https:/
- Open-source training repo: https:/
- synthesis repo: https:/
- Data-folding paper: L3U-net: Low-Latency Lightweight U-net Based Image Segmentation Model for Parallel CNN Processors, https://arxiv.org/pdf/2203.16528.pdf
- B2RGBNet paper: Learning Deep Convolutional Networks for Demosaicing, https://arxiv.org/pdf/1802.03769.pdf
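
Illustrative sketch for Approach 2 (folding and fixed 1x1 kernels)
The two steps named for Approach 2 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation from the talk: the RGGB tile layout, the kernel weights in FIXED_KERNEL, and the function names fold_bayer and conv1x1 are all hypothetical choices made for this example. Folding turns every 2x2 Bayer tile into one pixel with 4 channels, and the 1x1 convolution is then just a fixed per-pixel channel mix that yields R, G, and B.

```python
# Assumption: RGGB layout, i.e. each 2x2 tile of the mosaic is [[R, G], [G, B]].
# Assumption: these kernel weights are illustrative, not taken from the slides.
# Rows are output channels (R, G, B); columns are folded channels (R, G1, G2, B).
FIXED_KERNEL = [
    [1.0, 0.0, 0.0, 0.0],  # R <- R sample
    [0.0, 0.5, 0.5, 0.0],  # G <- mean of the two green samples
    [0.0, 0.0, 0.0, 1.0],  # B <- B sample
]


def fold_bayer(raw):
    """Step 1: fold an H x W mosaic into an (H/2) x (W/2) grid of 4-channel pixels."""
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("mosaic height and width must be even")
    return [
        [
            [raw[y][x], raw[y][x + 1], raw[y + 1][x], raw[y + 1][x + 1]]
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def conv1x1(folded, kernel):
    """Step 2: apply a 1x1 convolution, a weighted sum over channels at each pixel."""
    return [
        [
            [sum(w * c for w, c in zip(weights, pixel)) for weights in kernel]
            for pixel in row
        ]
        for row in folded
    ]


# A 2 x 4 mosaic holds two RGGB tiles side by side.
raw = [
    [10, 20, 12, 22],
    [30, 40, 32, 42],
]
rgb = conv1x1(fold_bayer(raw), FIXED_KERNEL)
print(rgb)  # [[[10.0, 25.0, 40.0], [12.0, 27.0, 42.0]]]
```

The output of this sketch has half the height and width of the mosaic. The label "conv w/fold + transconv + conv" in the accuracy chart suggests that a transposed convolution restores the resolution afterwards, but that stage is not shown here.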