Beyond GPUs: Powering the Next Wave of AI
Anton McGonnell, VP of Product
Sept 10, 2024
v 1.0
Copyright 2024 SambaNova Systems Inc. | Confidential & Proprietary | Internal Use Only

The Need for Speed
Speed and latency are important criteria for Gen AI developers (Artificial Analysis: 65%)
Building agents requires many models and faster real-time inference
Fast tokens: the faster, the better

The Fastest AI Inference on the Best Model

405B is the Best Open-Source Model

Faster On All Scales
Tokens/second/user:

Model (16-bit)      SambaNova RDUs    Nvidia GPUs
Llama 3.1 8B        1066              93
Llama 3.1 70B       570               32
Llama 3.1 405B      132               14

10X Faster Than GPUs
No Number of GPUs Can Achieve RDU Performance

A Fundamental Shift in Model Deployment at Scale
Traditional GPU Systems
All models in memory (super low latency model switching)
Individual model endpoints

SN40L: The Best Chip Designed for AI
"Cerulean" Architecture-based Reconfigurable Dataflow Unit
Cerulean SN40L RDU: Generative AI Training and Inference
5nm TSMC
102B Transistors
1,040 RDU Cores
638 TFLOPS (bf16)
3-tier Dataflow Memory
520 MB On-Chip Memory
64 GB High Bandwidth Memory
1.5 TB High Capacity Memory

SN40L: SambaNova's LLM-optimized RDU
3-tier Memory System with SRAM, HBM, and DDR
On-Chip SRAM: 8 GB, PBs per sec
RDU High Bandwidth Memory: 1 TB
RDU High Capacity DDR Memory: 24 TB
Bandwidth figures shown: 1600 GB/s, 25.6 TB/s
Dataflow enabled by large On-Chip Memory
High throughput inference with caching
Low Latency Model Switching (e.g., 0.01s for llama3.1 8B)

The Spatial Dataflow Advantages on RDUs
GPU: kernel-by-kernel execution
Bottlenecked by memory bandwidth (544 GB of intermediate representations have to be written out and read back in)
Diagram: X, Mask, Softmax, Dropout, X, with 136 GB written between each pair of stages
RDU: Spatial Execution
Eliminates memory traffic and overhead (zero intermediate representations need to be written out)
Diagram: X, Mask, Softmax, Dropout, X
Automatic kernel fusion eliminates unnecessary I/O reads and writes
Example: 13B GPT at 32k sequence length (SS)
Number of layers = 40, number of heads = 40, embedding dimension = 5120
Size of input/output tensors = BS x 5120 x 32768
Between two matrix multiplications the size expands to BS x 40 x 32768 x 32768
For BS = 2, the tensor size between two matmuls is 136 GB (for 16-bit representation)

Unrivaled Chip Efficiency
Hardware Configuration to run 70B
9 racks, 8 usable nodes per rack (1 spare), 8 LPU chips per node
1 rack, 8 trays per rack
4 racks, 1 WSE per rack

The Only Enterprise Platform that Scales in the Cloud or On Prem
SambaNovaCloud Free
SambaNovaCloud Developer (coming soon)
SambaNovaCloud Enterprise
SambaNova Suite Dedicated

Start Building Today
cloud.sambanova.ai

Workshop: Get Started Developing on the Fastest AI Platform
Vasanth Mohan, Director Technical Product Marketing & Developer Relations
Varun Krishna, Sr Principal AI Engineer
Dive deeper into SambaNova with our 1-hour workshop, September 11th
Learn how speed enables Agentic AI applications
Get set up on the SambaNovaCloud
Build your first Hello World application
Quickly set up Enterprise RAG search with our AI Starter Kits

QUESTIONS?
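
The 13B GPT example on the spatial dataflow slide can be checked with a short calculation. The sketch below is not from the deck. It assumes the intermediate tensor is the attention-score tensor of shape BS x heads x S x S stored at 2 bytes per element, and the function names are hypothetical. With the slide's dimensions taken at face value, the product comes to about 172 GB (160 GiB) per intermediate tensor, which is larger than the 136 GB the slide quotes. The slide does not show its arithmetic, so treat this as a check of the formula's scaling, not of the exact figure.

```python
# Sketch under stated assumptions: sizes follow directly from the tensor
# shapes quoted on the slide, at 2 bytes per element (16-bit).

def io_tensor_bytes(batch_size: int, embed_dim: int, seq_len: int,
                    bytes_per_element: int = 2) -> int:
    """Size of an input/output activation of shape BS x embed_dim x S."""
    return batch_size * embed_dim * seq_len * bytes_per_element


def score_tensor_bytes(batch_size: int, num_heads: int, seq_len: int,
                       bytes_per_element: int = 2) -> int:
    """Size of the intermediate of shape BS x heads x S x S."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element


# Dimensions quoted on the slide.
BATCH_SIZE = 2
NUM_HEADS = 40
EMBED_DIM = 5120
SEQ_LEN = 32768

io_bytes = io_tensor_bytes(BATCH_SIZE, EMBED_DIM, SEQ_LEN)
score_bytes = score_tensor_bytes(BATCH_SIZE, NUM_HEADS, SEQ_LEN)

print(f"input/output tensor: {io_bytes / 2**30:.3f} GiB")
print(f"intermediate tensor: {score_bytes / 2**30:.1f} GiB "
      f"({score_bytes / 1e9:.1f} GB)")
print(f"expansion factor:    {score_bytes // io_bytes}x")
```

The expansion factor depends only on heads x S / embedding dimension, so it grows linearly with sequence length. That is the reason writing each intermediate out to memory between kernels becomes the bottleneck at 32k tokens, and the reason fusing the Mask, Softmax, and Dropout stages avoids most of the traffic.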