Beyond GPUs: Powering the Next Wave of AI.pdf

PDF, 17 pages, 1.72 MB


Beyond GPUs: Powering the Next Wave of AI
v1.0
Anton McGonnell, VP of Product
Sept 10, 2024
Copyright 2024 SambaNova Systems Inc. | Confidential & Proprietary | Internal Use Only

The Need for Speed
Speed and latency are important criteria for Gen AI developers (Artificial Analysis: 65%).
Building agents requires many models and faster real-time inference.
Fast tokens: the faster, the better.

The Fastest AI Inference on the Best Model

405B Is the Best Open-Source Model

Faster on All Scales
Per-user output speed (tokens/second/user), 16-bit precision:

Model             SambaNova RDUs    Nvidia GPUs
Llama 3.1 8B      1066              93
Llama 3.1 70B     570               32
Llama 3.1 405B    132               14

10X faster than GPUs.
No number of GPUs can achieve RDU performance.
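The table's numbers can be turned into per-model speedups and response times. Below is a minimal Python sketch, assuming steady-state streaming at the listed per-user rates (time to first token is ignored); the 500-token response length and all names are hypothetical, chosen only for illustration.

```python
# Per-user output speeds from the "Faster on All Scales" slide
# (tokens/second/user, 16-bit weights): (SambaNova RDUs, Nvidia GPUs).
BENCHMARK = {
    "Llama 3.1 8B": (1066, 93),
    "Llama 3.1 70B": (570, 32),
    "Llama 3.1 405B": (132, 14),
}

RESPONSE_TOKENS = 500  # hypothetical response length, for illustration only


def speedup(rdu_tps: float, gpu_tps: float) -> float:
    """How many times faster the RDU streams tokens to a single user."""
    return rdu_tps / gpu_tps


def generation_seconds(tokens: int, tps: float) -> float:
    """Time to stream `tokens` at a steady rate (ignores time to first token)."""
    return tokens / tps


if __name__ == "__main__":
    for model, (rdu, gpu) in BENCHMARK.items():
        print(
            f"{model}: {speedup(rdu, gpu):.1f}x faster; "
            f"{RESPONSE_TOKENS} tokens in {generation_seconds(RESPONSE_TOKENS, rdu):.2f} s "
            f"vs {generation_seconds(RESPONSE_TOKENS, gpu):.2f} s"
        )
```

The ratios work out to roughly 11.5×, 17.8×, and 9.4×, so the "10X" headline is a rounded summary rather than a constant factor across model sizes.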

A Fundamental Shift in Model Deployment at Scale
Traditional GPU systems: individual model endpoints
All models in memory: super-low-latency model switching

SN40L: The Best Chip Designed for AI
Cerulean SN40L RDU: a Reconfigurable Dataflow Unit based on the "Cerulean" architecture, built for generative AI training and inference
- TSMC 5nm process
- 102B transistors
- 1,040 RDU cores
- 638 TFLOPS (bf16)
- 3-tier dataflow memory: 520 MB on-chip memory, 64 GB high-bandwidth memory, 1.5 TB high-capacity memory

SN40L: SambaNova's LLM-Optimized RDU
3-tier memory system with SRAM, HBM, and DDR:
- On-chip SRAM: 8 GB, PBs per second
- RDU high-bandwidth memory (HBM): 1 TB, 25.6 TB/s
- RDU high-capacity DDR memory: 24 TB, 1600 GB/s
Capabilities:
- Dataflow enabled by large on-chip memory
- High-throughput inference with caching
- Low-latency model switching (e.g., 0.01 s for Llama 3.1 8B)

The Spatial Dataflow Advantages on RDUs
GPU: kernel-by-kernel execution
- Bottlenecked by memory bandwidth: 544 GB of intermediate representations have to be written out and read back in
RDU: spatial execution
- Automatic kernel fusion eliminates unnecessary I/O reads and writes
- Eliminates memory traffic and overhead: zero intermediate representations need to be written out

Example: 13B GPT at 32k sequence length (BS = batch size, SS = sequence length)
- Number of layers = 40, number of heads = 40, embedding dimension = 5120
- Size of input/output tensors = BS × 5120 × 32768
- Between two matrix multiplications, the size expands to BS × 40 × 32768 × 32768
- For BS = 2, the tensor between two matmuls is 136 GB (16-bit representation)
On a GPU, the chain matmul → Mask → Softmax → Dropout → matmul produces four such 136 GB intermediates (4 × 136 GB = 544 GB); on an RDU, the same chain executes spatially.
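The capacity, bandwidth, and tensor-size figures on the SN40L memory and spatial dataflow slides can be cross-checked with simple arithmetic. A minimal Python sketch follows. It assumes decimal units, 16-bit (2-byte) weights and activations, a nominal 8e9 parameters for Llama 3.1 8B, and a hypothetical 16 chips per system: the slides never state a chip count, and 16 is simply the multiplier that turns the per-chip figures into the system totals. All function names are illustrative.

```python
# Back-of-envelope checks for the SN40L memory and dataflow slides.
# Assumptions (not stated on the slides): decimal units, 16 chips per system,
# nominal 8e9 parameters for Llama 3.1 8B, 2 bytes per parameter/element (16-bit).

GB = 10**9
TB = 10**12

CHIPS_PER_SYSTEM = 16  # hypothetical: chosen because it reproduces the slide's tier totals
PER_CHIP_BYTES = {"sram": 520 * 10**6, "hbm": 64 * GB, "ddr": 1.5 * TB}


def system_bytes(per_chip: float, chips: int = CHIPS_PER_SYSTEM) -> float:
    """Aggregate capacity of one memory tier across all chips in a system."""
    return per_chip * chips


def copy_time_s(n_params: float, bytes_per_param: int, bandwidth: float) -> float:
    """Seconds to move a model's weights at `bandwidth` bytes/s."""
    return n_params * bytes_per_param / bandwidth


def io_tensor_bytes(batch: int, d_model: int, seq_len: int, elem_bytes: int = 2) -> int:
    """Size of a BS x d_model x SS activation tensor."""
    return batch * d_model * seq_len * elem_bytes


def attn_intermediate_bytes(batch: int, heads: int, seq_len: int, elem_bytes: int = 2) -> int:
    """Size of one BS x heads x SS x SS tensor between the two attention matmuls."""
    return batch * heads * seq_len * seq_len * elem_bytes


def unfused_footprint(per_intermediate: int, n_intermediates: int = 4) -> int:
    """Total size of the intermediates that kernel-by-kernel execution of
    matmul -> Mask -> Softmax -> Dropout -> matmul writes out (each is read back later)."""
    return per_intermediate * n_intermediates


if __name__ == "__main__":
    for tier, per_chip in PER_CHIP_BYTES.items():
        print(f"{tier}: {system_bytes(per_chip) / GB:,.0f} GB per system")
    print(f"8B weights at 1600 GB/s: {copy_time_s(8e9, 2, 1600 * GB):.3f} s")
    print(f"I/O tensor: {io_tensor_bytes(2, 5120, 32768) / GB:.2f} GB")
    per = attn_intermediate_bytes(2, 40, 32768)
    print(f"Attention intermediate: {per / GB:.1f} GB, x4 = {unfused_footprint(per) / GB:.1f} GB")
    print(f"Slide's own figures: 4 x 136 GB = {unfused_footprint(136 * GB) // GB} GB")
```

Under these assumptions, 16 × 520 MB ≈ 8 GB, 16 × 64 GB ≈ 1 TB, and 16 × 1.5 TB = 24 TB, and moving 16 GB of 8B weights at the 1600 GB/s figure quoted on the memory slide takes 0.01 s, in line with the model-switching example. Plugging the example's stated dimensions into its own formula gives about 171.8 GB (160 GiB) per attention intermediate rather than the 136 GB shown; the slide does not explain how its figure was derived, but either way each intermediate is hundreds of times larger than the roughly 0.67 GB input/output tensor.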

Unrivaled Chip Efficiency
Hardware configuration to run 70B:
- 9 racks, 8 usable nodes per rack (1 spare), 8 LPU chips per node (9 × 8 × 8 = 576 LPU chips)
- 1 rack, 8 trays per rack
- 4 racks, 1 WSE per rack

The Only Enterprise Platform that Scales in the Cloud or On-Prem
- SambaNovaCloud Free
- SambaNovaCloud Developer (coming soon)
- SambaNovaCloud Enterprise
- SambaNova Suite Dedicated

Start Building Today: cloud.sambanova.ai

Workshop: Get Started Developing on the Fastest AI Platform
Vasanth Mohan, Director, Technical Product Marketing & Developer Relations
Varun Krishna, Sr. Principal AI Engineer
Dive deeper into SambaNova with our 1-hour workshop on September 11th:
- Learn how speed enables agentic AI applications
- Get set up on SambaNovaCloud
- Build your first Hello World application
- Quickly set up Enterprise RAG search with our AI Starter Kits

Questions?
