AIGC 浪潮下 WebNN 的演進與實踐 (The Evolution and Practice of WebNN in the AIGC Wave)
Speaker: 付俊偉 (Junwei Fu)

- 胡寧馨 (Ningxin Hu): Intel Principal Engineer, drafter and main editor of the W3C Web Neural Network (WebNN) specification, Chromium committer, and main owner of the Chromium WebNN component.
- 張敏: technical manager of the Intel WebNN team, developer on Chromium and the ONNX Runtime WebNN EP, and author of the WebNN developer preview.
- 付俊偉 (Junwei Fu): Intel Senior Software Engineer, Chromium committer, designer of the Chromium WebNN infrastructure, and main developer of the Chromium Shape Detection API.
Table of Contents (目錄)
01 Background of WebNN
02 WebNN architecture design
03 How to use WebNN
04 WebNN performance comparison

Demo: Stable Diffusion (https://microsoft.github.io/webnn-developer-preview/)
WebNN Execution Provider of ONNX Runtime Web with GPU acceleration from DirectML, running on an Intel Core Ultra 7 processor 155H with integrated Arc GPU.
Prompt: "A cat under the snow". Pipeline: Text Encoder → Image Generation (UNet, steps 1 to 4) → Image Decoder.
WebNN operations are mapped to the native ML APIs, for example:

WebNN operation | DirectML | TFLite | CoreML
matMul | GEMM | BATCH_MATMUL | matmul
gather | GATHER | GATHER | gather_along_axis
sigmoid | LOGISTIC | ACTIVATION_SIGMOID | sigmoid
softmax | SOFTMAX | ACTIVATION_SOFTMAX | softmax
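As a rough illustration of how these operations are expressed on the Web side, the sketch below invokes them through MLGraphBuilder (the WebNN graph builder described in the architecture section below); the shapes, the gather axis, and the softmax axis are assumptions added for the example, and the browser lowers each call to the native operation listed above for the chosen backend.

```javascript
// Illustrative sketch only: calling the four table operations via MLGraphBuilder.
// Shapes and axes are made up for the example; argument details follow recent spec drafts.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

const a = builder.input('a', { dataType: 'float32', dimensions: [2, 4] });
const b = builder.input('b', { dataType: 'float32', dimensions: [4, 3] });
const indices = builder.input('indices', { dataType: 'int32', dimensions: [2] });

const mm = builder.matmul(a, b);                     // DirectML GEMM, TFLite BATCH_MATMUL, CoreML matmul
const g = builder.gather(mm, indices, { axis: 0 });  // GATHER, GATHER, gather_along_axis
const s = builder.sigmoid(g);                        // LOGISTIC, ACTIVATION_SIGMOID, sigmoid
const out = builder.softmax(s, 1);                   // SOFTMAX, ACTIVATION_SOFTMAX, softmax
```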
The Web ML stack:
- Use cases (運用場景): Noise Suppression, Image Classification, Background Segmentation, Object Detection, Natural Language, Windows Studio Effects
- Frameworks (框架): TensorFlow.js, ONNX Runtime Web, MediaPipe Web, Transformers.js
- Web APIs and Web engine (Web API / Web 引擎): WebNN, WebAssembly, WebGPU and other API extensions, hosted in a Web Browser (e.g., Chrome/Edge) or a JavaScript Runtime (e.g., Electron/Node.js)
- OS ML APIs (系統 ML APIs): DirectML, CoreML, TFLite, other ML OS APIs
- Hardware (硬件): CPU, GPU, NPU
The WebNN programming model:
- MLContext: created with a device type (cpu/gpu/npu) and a power preference (high-perf/low-power).
- MLGraphBuilder: describes the computational graph on the Web side, e.g. input, filter and bias flowing through conv2d → tmp → add → tmp → relu → output.
- MLGraph: the compiled graph on the native side, where the backend can fuse conv2d, add and relu into a single fused conv2d; compute() runs it against input buffers and output buffers (CPU/GPU).

WebNN brings a unified abstraction of neural networks to the Web.
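A minimal JavaScript sketch of this flow, assuming the draft of the API in which operand descriptors use a dimensions field and MLContext exposes compute(); the tensor shapes and buffer sizes are placeholders:

```javascript
// Create a context: device type and power preference correspond to the options above.
const context = await navigator.ml.createContext({
  deviceType: 'gpu',                    // 'cpu' | 'gpu' | 'npu'
  powerPreference: 'high-performance'   // or 'low-power'
});

// Describe the Web-side computational graph: conv2d -> add -> relu.
const builder = new MLGraphBuilder(context);
const input = builder.input('input', { dataType: 'float32', dimensions: [1, 3, 224, 224] });
const filter = builder.constant(
  { dataType: 'float32', dimensions: [16, 3, 3, 3] },
  new Float32Array(16 * 3 * 3 * 3)      // placeholder weights
);
const bias = builder.constant(
  { dataType: 'float32', dimensions: [1, 16, 1, 1] },
  new Float32Array(16)                  // placeholder bias
);
const tmp1 = builder.conv2d(input, filter);
const tmp2 = builder.add(tmp1, bias);
const output = builder.relu(tmp2);

// build() compiles the graph; the native backend may fuse it into a single conv2d.
const graph = await builder.build({ output });

// compute() executes the compiled graph against pre-allocated input/output buffers.
const results = await context.compute(
  graph,
  { input: new Float32Array(1 * 3 * 224 * 224) },
  { output: new Float32Array(1 * 16 * 222 * 222) }
);
```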
WebNN API call flow and dataflow in Chromium:
- Apps/Frameworks: the Web Application and JS ML frameworks call WebNN alongside other Web APIs.
- Renderer process: MLContext, MLGraphBuilder and MLGraph are backed by a WebNN Mojo client.
- GPU/Utility process: the WebNN Mojo server, reached over IPC, dispatches to a backend: the DirectML backend on Windows (MCDM), the CoreML backend on macOS (BNNS/MPS), or the TFLite backend with the XNNPACK delegate on Android/ChromeOS/Linux.
- The native ML APIs and OS drivers then execute on the hardware: CPU, GPU and NPU.
Framework integration status:
- TensorFlow Lite Web: prototype available.
- ONNX Runtime Web: shipped in the 1.18 release.
In both cases the Web Application calls into the framework, which executes through Wasm kernels, WebGL kernels, WebGPU kernels, or a WebNN graph in browsers with WebNN support; the WebNN graph is backed by native CPU, GPU and NPU kernels.
Graph partitioning: pre-processing and post-processing stay on the Wasm kernels, while the supported subgraph (input → Conv2d → intermediate → MatMul with weights and bias → intermediate) is offloaded as a WebNN graph.
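A hedged sketch of what enabling this looks like from an application, assuming onnxruntime-web 1.18+ and its 'webnn' execution provider options; the model path, input name and shape are placeholders:

```javascript
import * as ort from 'onnxruntime-web';

// Ask ONNX Runtime Web to place supported nodes on the WebNN EP; unsupported
// nodes fall back to the default Wasm kernels, as in the partitioning above.
const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: [
    { name: 'webnn', deviceType: 'gpu', powerPreference: 'default' }
  ]
});

// 'input' is a placeholder feed name; use the actual input names of the model.
const input = new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]);
const results = await session.run({ input });
console.log(Object.keys(results));
```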
Demos (https://microsoft.github.io/webnn-developer-preview/):
- WebNN Execution Provider of ONNX Runtime Web with GPU acceleration from DirectML, running on an Intel Core Ultra 7 processor 155H with integrated Arc GPU.
- VanillaJS (plain JavaScript) use of the WebNN API, with NPU acceleration from DirectML, running on an Intel Core Ultra 7 processor 155H with integrated Intel AI Boost NPU.
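For a plain-JavaScript page like the NPU demo, availability of a given device type has to be probed at runtime. A hypothetical helper (not part of the demo) that prefers the NPU and falls back to GPU, then CPU, might look like this:

```javascript
// Hypothetical helper: probe WebNN device types in order of preference.
async function createPreferredContext(deviceTypes = ['npu', 'gpu', 'cpu']) {
  if (!('ml' in navigator)) {
    throw new Error('WebNN is not available in this browser');
  }
  for (const deviceType of deviceTypes) {
    try {
      return await navigator.ml.createContext({ deviceType });
    } catch (e) {
      // This device type is unsupported or failed to initialize; try the next one.
    }
  }
  throw new Error('No usable WebNN device found');
}

const context = await createPreferredContext(); // picks the Intel AI Boost NPU when exposed
```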
Performance: WebNN vs. native on CPU (XNNPack)
Configuration: Browser Chrome Canary 118.0.5943.0; DUT Dell/Linux/i7-1260P, single P-core; workloads: MediaPipe solution models (FP32, batch=1).
[Chart: "MediaPipe Models Inference Performance (Normalized / Higher is Better)"; series: Wasm SIMD (baseline 1.0), WebNN XNNPack, Native XNNPack; secondary axis: WebNN vs. Native ratio (%). WebNN XNNPack reaches roughly 1.8x to 4.4x over Wasm SIMD, close to native XNNPack (roughly 2.2x to 4.5x).]
Performance: WebNN DirectML vs. native DirectML on GPU
Configuration: Browser Chrome Canary 126.0.6459.0; OS Windows 11 Pro 23H2; DUT Asus Zenbook; CPU Intel(R) Core(TM) Ultra 7 155H 3.80 GHz; GPU Intel(R) Arc(TM) Graphics; GPU driver 31.0.101.5512.
[Chart: "WebNN DirectML vs. Native DirectML"; series: WebNN GPU, Native DirectML, and WebNN GPU vs. Native DirectML ratio; axes: inference time (ms, log scale) and percentage (%). The per-model WebNN-to-native ratios fall roughly between 71% and 96%.]
Performance: WebNN DirectML vs. native on the MTL NPU
Configuration: Browser Chrome Canary 126.0.6459.0; OS Windows 11 Pro 23H2; DUT Asus Zenbook; CPU Intel(R) Core(TM) Ultra 7 155H 3.80 GHz; NPU Intel(R) AI Boost; NPU driver 32.0.100.2381.
[Chart: "WebNN DirectML vs Native on MTL NPU"; models: MobileNetV2, SqueezeNet 1.0, ResNet50 v1, EfficientNet Lite 4; series: WebNN DirectML NPU, Native NPU, WebNN NPU vs. Native; axes: inference time (ms) and WebNN vs. Native (%); per-model ratios: 62.7%, 95.8%, 73.4%, 86.1%.]
The average performance of the listed 4 models on WebNN DirectML is about 80% of native DML on the MTL NPU.
Demo: Speech to Text PoC for Khan Academy Khanmigo. WebNN Execution Provider of ONNX Runtime Web with NPU acceleration from DirectML, running on an Intel Core Ultra 7 processor 155H with integrated Intel AI Boost NPU.

THANKS
Large Language Model Is Redefining The Software