Photonic Fabric Based Scale-Up Network for Chip-to-Chip and Chip-to-Memory Connectivity.pdf


© 2023 Celestial AI. All Rights Reserved. Celestial AI Confidential

Photonic Fabric™ based Scale-Up Network for Chip-to-Chip & Chip-to-Memory Connectivity
Preet Virk, Co-Founder & COO, Celestial AI
AI Hardware Summit, San Jose, 11th Sep 2024
© 2024 Celestial AI Inc., All Rights Reserved. Celestial AI, the C logo, and Photonic Fabric are trademarks or registered trademarks of Celestial AI Inc. in the United States and other countries.

AI: The Largest Technological Wave We Have Ever Seen
1960-1980 ($200Bn): Mainframe, 100s of 1,000s of users
1980-2000 ($600Bn): Client/Server, millions of users
2000-2020 ($1.2 Trillion): Cloud/Mobile, billions of users
2020 and beyond ($20 Trillion): AI/Large Language Models, 10s of billions of connected people, devices and applications
[Figure: a "Pioneers" panel accompanies each era; its contents are not recoverable]
Notes: 1. Source: The economic potential of generative AI (McKinsey, June 2023); total AI economic potential estimated to range from $17.1Tn to $25.6Tn.
All are building their own AI Processors and Data Center Architectures
Four Hyperscalers Represent 70% of Data Center Market
Unprecedented Growth & Market Concentration
Source: Morgan Stanley
Artificial Intelligence is Changing Everything!

AI Driving a New Generation of Optical Compute Interconnect
HBM3/HBM4 Equivalent Bandwidth: The Minimum Requirement for Accelerated Computing
Optical Interconnect Scale-Up Network for XPU-XPU Connectivity: Enabling Cluster Scale Processing of AI models
Celestial AI's Photonic Fabric: Optical Interconnect for Accelerated Computing
[Chart: Off Package Bandwidth. Vertical axis: Total Off-Package Bandwidth (Tbps), log scale from 1 to 10,000. Horizontal axis: years 2016 to 2034. Legend: Package Bottom, MCM Package (Chiplet), Interposer. Data labels: 10, 20, 40, 80, 160, 320, 640.]
17.8 Tbps CPO (note 1)
Gen1 Chiplet (2024, note 2): 86.4 Tbps
Gen2 Chiplet (2025-2026, note 2): 172.8 Tbps
Gen1 IP/Interposer (2024-2026): 390 Tbps
Gen2 IP/Interposer (2026-): 650 Tbps
Photonic Fabric Advantage: Gen1 Photonic Fabric, Gen 2 Photonic Fabric
Notes: 1. Industry Trend based on data by John Wilson, Nvidia Research: "High Bandwidth Density, Energy Efficient Short Reach Signaling that Enables Massively Scalable Parallelism". 2. Assuming 6 chiplets per XPU.

Revolutionary Integration of Silicon Photonics for Accelerated Computing
Compact, Thermally Stable Optical Modulator: Chip-to-chip packaging with XPUs dissipating 100s of watts
No DSP Linear Drive Optics: High SNR, low BER, close proximity of optics to electronics eliminates the need for DSP
Multiple Packaging Options Tailored for Customer Applications: Integration of Advanced CMOS with Si Photonics
Full Stack E-O-E Link Optimization: Protocol adaptive Network Convergence Layer (NCL); full Electrical to Optical to Electrical (EOE) link management with FEC, CRC, FLIT Replay
[Diagram labels: Photonic Fabric IP, Optical Waveguide, FAU, (CHIPLET/ASIC/XPU/HBM), Optical Waveguides, Substrate, Grating Coupler, Photonic Interposer, OMIB]

Why Photonic Fabric™ vs. Copper?
Compute-Compute & Compute-Memory Photonic Connectivity for Accelerated Computing
Higher Off-Package Bandwidth: 100s of Tbps BW, unrestricted by beachfront
Efficient Data Movement: Zero mass photons vs I²R losses from electrons
Lower Latency & Power: Eliminate DSPs, Deep FEC and re-timers
Fewer System Connections: Higher overall system reliability
Longer Reach: Connect multiple racks with less power; more efficient remote memory transactions (RDMA)
10 pJ/bit (incl. 2.4 pJ/bit for optics) for up to 50 m
60 pJ/bit for copper for up to 1 m

Photonic Fabric™ Link: Module & Appliance
Shattering the Memory & Interconnect Bandwidth Wall for Accelerated Computing
Photonic Fabric Module
Celestial AI Designed System in Package (SIP): Memory Controllers + Photonic Fabric Link + Network Switch
2.07 TB Memory Capacity at 7.2 Tbps Bandwidth with 100 ns Latency
HBM3E Operates as Write-through Cache for DDR
Bandwidth & Latency of HBM3E with Capacity & Cost of DDR5
[Diagram labels, side view: Photonic Fabric Module, DDR5 DIMM, HBM 3E, FAU, Photonic Fabric ASIC, PIC]
Photonic Fabric Appliance
16x Photonic Fabric Memory Modules in a 2U Appliance
33 TB Memory Capacity
115 Tbps Network Switch Enabling a Backend/Scale-Up AI Fabric

Photonic Fabric™ Link Implementation: The Chiplet Approach
Full Stack High-Bandwidth Optical Interconnect Solution for Accelerated Computing
Customer AI Processor
Standard D2D: UCIe-A or MAX PHY
2.4 pJ/bit photonic link power (Gen1)
Photonic Fabric IP Macros
Protocol Adaptive Layer
14.4 Tbps (Gen1) Per Chiplet
28.8 Tbps (Gen2) Per Chiplet
Full HBM3E bandwidth
3.6X more bandwidth than CPO chiplet
10 x Gen1 chiplets per XPU package = 144 Tbps
10 x Gen2 chiplets per XPU package = 288 Tbps
Protocol Adaptive: AXI, HBM/DDR, UAL, CXL etc.
[Diagram labels: Fiber Array Unit (FAU), Microcontroller, AMS, AI XPU, 1.4 mm, UCIe-A (repeated for each interface), Protocol Adaptive Layer]

Photonic Fabric Delivers Superior Off-Package Bandwidth
Photonic Fabric Based Chiplets Offer More Package Bandwidth With Full Link Management
Current SoTA: 4 Tbps per chiplet; 40 Tbps per package; 4x reticle package with 10 chiplets
Photonic Fabric: 14.4 Tbps (Gen1) / 28.8 Tbps (Gen2) per chiplet; 144 Tbps (Gen1) / 288 Tbps (Gen2) per package; 4x reticle package with 10 chiplets

Photonic Fabric™ 14.4 Tbps Photonic Interface Chiplet
Seamless Integration with Existing AI Accelerators & XPUs
Compute-to-Compute: Scale-up/backend networks
Compute-to-Memory: Photonically scalable disaggregated memory; full HBM3E bandwidth
Unlocks Photonic Connectivity (E-O-E) for AI XPUs
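
The per-chiplet and per-package bandwidth figures quoted above, and the pJ/bit figures from the copper comparison, can be cross-checked with simple arithmetic. The sketch below is only a sanity check of the published numbers, not Celestial AI's methodology. The function names are hypothetical, and applying the 60 pJ/bit copper figure to a 7.2 Tbps link is an assumption made here purely for comparison (the slides quote copper only for reaches up to 1 m).

```python
# Figures quoted in the slides.
GEN1_TBPS_PER_CHIPLET = 14.4
GEN2_TBPS_PER_CHIPLET = 28.8
CPO_TBPS_PER_CHIPLET = 4.0      # "Current SoTA" per-chiplet figure
CHIPLETS_PER_PACKAGE = 10       # 4x reticle package with 10 chiplets


def package_bandwidth_tbps(per_chiplet_tbps, chiplets=CHIPLETS_PER_PACKAGE):
    """Aggregate off-package bandwidth: per-chiplet bandwidth times chiplet count."""
    return per_chiplet_tbps * chiplets


def link_power_watts(bandwidth_tbps, energy_pj_per_bit):
    """Power needed to move data at a given rate.

    Tbps * pJ/bit = (1e12 bit/s) * (1e-12 J/bit) = J/s = W,
    so the unit prefixes cancel and the product is already in watts.
    """
    return bandwidth_tbps * energy_pj_per_bit


# Per-package totals for 10 chiplets, and the 6-chiplet case from the chart note.
print(f"Gen1, 10 chiplets: {package_bandwidth_tbps(GEN1_TBPS_PER_CHIPLET):.1f} Tbps")
print(f"Gen2, 10 chiplets: {package_bandwidth_tbps(GEN2_TBPS_PER_CHIPLET):.1f} Tbps")
print(f"Gen1, 6 chiplets:  {package_bandwidth_tbps(GEN1_TBPS_PER_CHIPLET, chiplets=6):.1f} Tbps")

# Ratio behind the "3.6X more bandwidth than CPO chiplet" claim.
print(f"Gen1 vs CPO chiplet: {GEN1_TBPS_PER_CHIPLET / CPO_TBPS_PER_CHIPLET:.1f}x")

# Assumed comparison: one 7.2 Tbps link at the two quoted energy costs.
print(f"Photonic link at 10 pJ/bit: {link_power_watts(7.2, 10.0):.0f} W")
print(f"Copper at 60 pJ/bit:        {link_power_watts(7.2, 60.0):.0f} W")
```

The pJ/bit-to-watts conversion is the useful part: because the tera and pico prefixes cancel, any bandwidth in Tbps multiplied by an energy cost in pJ/bit directly gives the link power in watts.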

Protocol Adaptive
Standard Protocols: AXI, HBM/DDR, CXL
Emerging Protocols: UAL
Proprietary Protocols
Standard D2D Interface: UCIe, MAX PHY
Standard 2.5D packaging flows from multiple large OSATs
Automated high-volume, high-throughput fiber-attach process
Less beachfront than 1 HBM stack

Compute-to-Compute & Compute-to-Memory Photonic Connectivity
Photonic Fabric™ Technology Platform Enables Cluster Scale AI Processing
115 Tbps Switch for backend/scale-up network
All-to-All Connectivity for efficient Collective Comms
Photonic Fabric™ Appliance: 33 TB Memory
Broadcast & Reduce across all connected XPUs
33 TB Unified/Shared Memory Space
Photonic Fabric™ Link

Photonic Fabric Link & Appliance: AI Efficiency & Performance
Photonic Fabric Link & Appliance Delivers Compelling Benefits for AI Workloads
Conventional: 56 conventional XPUs with 192 GB each of HBM3 required to process a 10T parameter DLRM model
Each conventional XPU holds 1/56th (1.79%) of the 10T model in its HBM memory (192 GB)
Photonic Fabric: 16 XPU-Optical connected to a Photonic Fabric Appliance to process a 10T parameter DLRM model
Appliance: 33 TB Memory Capacity, Full HBM3 Bandwidth, 115 Tbps Switch
Each XPU-Optical has access to the full model stored in the PFMA
Up to 71% XPU CapEx and Power Reduction
Higher Compute Density
Memory Resources Scalable Independent of Compute
12.5X DLRM Performance Speed-up

Photonic Memory Fabric: Delivers Higher Throughput on GPT4
Conventional 16-XPU/2-Server Rack
2 Servers: 3 TB of HBM
Scale-up/Backend Network Constrained
Memory Restricted: GPT4 (1.8T) MoE models
Memory split: 3 TB total = 1.8 TB GPT4-1.8T MoE model weights (fp8) + 1.2 TB available for inferencing (KV cache, context lengths, batch size)
16-XPU/2-Server + Photonic Fabric Appliance
36 TB (12x) More Fast Memory + 7.2 Tbps Scale-Up Network per XPU (Backend Network)
Easily serve GPT4 (1.8T) MoE models with large context length and batch sizes
Memory split: 36 TB total = 1.8 TB GPT4-1.8T MoE model weights (fp8) + 34.2 TB available for inferencing (KV cache, context lengths, batch size)
Appliance: 33 TB Memory Capacity, Full HBM3 Bandwidth, 115 Tbps Switch
[Diagram labels: CPU, PCIe Switch, NIC, Ethernet (400/800 Gbps)]

128 XPU Cluster Scale Photonic Fabric for Accelerated Computing
Illustrative picture; not all photonic links shown
Appliance 1, Appliance 2, ..., Appliance 15, Appliance 16
128 XPU Back End/Scale-Up Fabric (7.2 Tbps per XPU)
Front-End Network (Ethernet: 400/800 Gbps)
Up to 792 TB System Memory Capacity
Enabling Very Large Clusters With Significantly Lower Carbon/Energy (TCO2) Impact
[Diagram labels: CPU, PCIe Switch, NIC]

Celestial AI Photonic Fabric™ vs. Current State-of-the-Art
Photonic Fabric Delivers Disruptive Performance, Lowest Latency, Improved Energy Efficiency for Data Movement & Compute and Lower TCO
Photonic Fabric offers 16x off-package bandwidth
Photonic Fabric offers 5X better latency
Photonic Fabric offers 8X better power efficiency
Photonic Fabric offers 26X better cost efficiency
Charts, State-of-the-Art vs. Photonic Fabric (Gen1):
Bandwidth Per Optical Chiplet (note 1): 3.6X Better; labels: HBM3E Bandwidth (8 Tbps), Competitive Solution, Photonic Fabric Chiplet (Gen1)
RDMA Latency (note 2): 5X Lower Latency
Power Efficiency (pJ/b) (note 3): 8X Lower Power
Cost ($/GB) (note 4): 26X Lower Cost
Notes: 1. SOTA Benchmarks: Ayar Labs TeraPHY. 2. SOTA Benchmark: DGX H200 Remote Direct Memory Access (RDMA). 3. SOTA Benchmark: RDMA Power for Fourth-generation NVLink. 4. SOTA Benchmark: NVIDIA DGX H200.

The Photonic Fabric™ Designed for Volume Deployment
The Right Technology at the Right Time
Full-Stack Optical Interconnectivity Platform
Volume Manufacturing Si Photonics
Driven by Data Communications
Silicon Photonics
Advanced CMOS
Silicon Photonics Control Circuitry, SERDES, Router/Switch
Opto-Electronic Systems-in-Package
Wafer-Scale Assembly & Test
Leveraging Established Supply Chains

How Does the Photonic Fabric Transform AI?
A New Vision of AI infrastructure Unconstrained By Fast Memory Capacity & Scale-Up Bandwidth
Accelerates Multi-Modality: Text-To-Image, Text-To-Video (Open AI Sora, RunwayML)
Simplify AI Software Stack: All-to-All/Broadcast Interconnect simplifies collective comms; reduces need for sharding
Mitigate AI Carbon Impact: Helps meet carbon neutrality targets even with increasing AI usage
Support Larger Context Lengths: Larger Context Lengths activate new use cases
Efficiently Process Very Large LLMs: Train and Serve 1T+ Models like Open AI GPT4-1.8T MoE, Anthropic Claude Opus (2T)
Democratizes AI: Lowers cost of AI

info@celestial.ai
www.celestial.ai
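
The memory-capacity claims in the GPT4 and DLRM slides follow from a few multiplications, which the sketch below reproduces. It is a reader's cross-check under stated assumptions, not the presenter's own calculation: it assumes decimal units (1 TB = 1000 GB), uses 1 byte per parameter for fp8 weights, and reads the "up to 71%" XPU reduction as 1 - 16/56, which is one plausible interpretation. All names are hypothetical.

```python
def model_weights_tb(params_trillions, bytes_per_param):
    """Weight footprint in TB: (1e12 params * bytes/param) / (1e12 bytes/TB)."""
    return params_trillions * bytes_per_param


def inference_headroom_tb(total_memory_tb, weights_tb):
    """Memory left for KV cache, context and batch after loading the weights."""
    return total_memory_tb - weights_tb


# GPT4-1.8T MoE stored in fp8 (1 byte per parameter).
gpt4_weights_tb = model_weights_tb(1.8, 1.0)

# Conventional 2-server rack vs. the same rack plus one appliance.
conventional_hbm_tb = 3.0
appliance_memory_tb = 33.0
fabric_total_tb = conventional_hbm_tb + appliance_memory_tb
conventional_headroom_tb = inference_headroom_tb(conventional_hbm_tb, gpt4_weights_tb)
fabric_headroom_tb = inference_headroom_tb(fabric_total_tb, gpt4_weights_tb)
fast_memory_ratio = fabric_total_tb / conventional_hbm_tb

# Appliance built from 16 modules of 2.07 TB and 7.2 Tbps each.
modules_per_appliance = 16
appliance_capacity_tb = modules_per_appliance * 2.07
appliance_switch_tbps = modules_per_appliance * 7.2

# DLRM comparison: 56 conventional XPUs (192 GB HBM each) vs. 16 XPU-Optical.
conventional_xpus, optical_xpus = 56, 16
total_hbm_tb = conventional_xpus * 192 / 1000   # decimal GB -> TB (assumption)
share_per_xpu = 1 / conventional_xpus
xpu_reduction = 1 - optical_xpus / conventional_xpus  # assumed reading of "71%"

print(f"GPT4 weights: {gpt4_weights_tb:.1f} TB")
print(f"Headroom, conventional: {conventional_headroom_tb:.1f} TB")
print(f"Headroom, with appliance: {fabric_headroom_tb:.1f} TB")
print(f"Fast memory ratio: {fast_memory_ratio:.0f}x")
print(f"Appliance capacity: {appliance_capacity_tb:.2f} TB")
print(f"Appliance switch: {appliance_switch_tbps:.1f} Tbps")
print(f"Total HBM across 56 XPUs: {total_hbm_tb:.3f} TB")
print(f"Model share per XPU: {share_per_xpu:.2%}")
print(f"XPU count reduction: {xpu_reduction:.1%}")
```

Under these assumptions the results appear consistent with the quoted figures: the 16-module totals round to the 33 TB and 115 Tbps appliance specifications, and 56 XPUs at 192 GB each give roughly the capacity a 10T parameter model would need at about one byte per parameter.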
