Photonic Fabric(TM)-Based Scale-Up Network for Chip-to-Chip & Chip-to-Memory Connectivity
Preet Virk, Co-Founder & COO, Celestial AI
AI Hardware Summit, San Jose, 11 September 2024
(c) 2024 Celestial AI Inc. All rights reserved. Celestial AI, the C logo, and Photonic Fabric are trademarks or registered trademarks of Celestial AI Inc. in the United States and other countries.

AI: The Largest Technological Wave We Have Ever Seen
- Mainframe (1960-1980, ~$200Bn): hundreds of thousands of users
- Client/Server (1980-2000, ~$600Bn): millions of users
- Cloud/Mobile (2000-2020, ~$1.2Tn): billions of users
- AI / Large Language Models (2020 and beyond, ~$20Tn): tens of billions of connected people, devices, and applications
Source: The economic potential of generative AI (McKinsey, June 2023); total AI economic potential estimated to range from $17.1Tn to $25.6Tn.

Unprecedented Growth & Market Concentration
- Four hyperscalers represent 70% of the data center market (source: Morgan Stanley)
- All are building their own AI processors and data center architectures

AI Driving a New Generation of Optical Compute Interconnect
- HBM3/HBM4-equivalent bandwidth: the minimum requirement for accelerated computing
- Optical interconnect scale-up network for XPU-to-XPU connectivity: enabling cluster-scale processing of AI models
- Celestial AI's Photonic Fabric: optical interconnect for accelerated computing
[Figure: total off-package bandwidth (Tbps, log scale from 10 to 10,000) vs. year (2016-2034); industry trend based on data by John Wilson, Nvidia Research, "High Bandwidth Density, Energy Efficient Short Reach Signaling that Enables Massively Scalable Parallelism"; CPO reaches 17.8 Tbps.]
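The chart's per-XPU roadmap points can be cross-checked against the per-chiplet rates quoted later in the deck. A minimal sketch, using only the chart's figures and its footnoted assumption of 6 chiplets per XPU:

```python
# Off-package bandwidth roadmap points from the chart (Tbps per XPU),
# quoted assuming 6 photonic chiplets per XPU (chart footnote 2).
per_xpu_tbps = {
    "Gen1 chiplet (2024)": 86.4,
    "Gen2 chiplet (2025-2026)": 172.8,
}
CHIPLETS_PER_XPU = 6

for gen, total_tbps in per_xpu_tbps.items():
    per_chiplet = total_tbps / CHIPLETS_PER_XPU
    print(f"{gen}: {total_tbps} Tbps per XPU -> {per_chiplet:.1f} Tbps per chiplet")
# 86.4 / 6 = 14.4 Tbps (Gen1) and 172.8 / 6 = 28.8 Tbps (Gen2) per chiplet,
# matching the 14.4 / 28.8 Tbps chiplet figures quoted later in the deck.
```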
[Figure, continued: package options shown are package bottom, MCM package (chiplet), and interposer; Photonic Fabric advantage points: Gen1 chiplet (2024) 86.4 Tbps and Gen2 chiplet (2025-2026) 172.8 Tbps, assuming 6 chiplets per XPU; Gen1 IP/interposer (2024-2026) 390 Tbps; Gen2 IP/interposer (2026-) 650 Tbps.]

Revolutionary Integration of Silicon Photonics for Accelerated Computing
- Compact, thermally stable optical modulator: enables chip-to-chip packaging with XPUs dissipating hundreds of watts
- Photonic Fabric IP: optical waveguides, grating couplers, and fiber array units (FAUs) on a photonic interposer (OMIB), connecting chiplets/ASICs/XPUs/HBM over the substrate
- No-DSP linear-drive optics: high SNR, low BER, and close proximity of optics to electronics eliminate the need for a DSP
- Multiple packaging options tailored for customer applications
- Integration of advanced CMOS with Si photonics
- Full-stack E-O-E link optimization: protocol-adaptive Network Convergence Layer (NCL); full electrical-optical-electrical (E-O-E) link management with FEC, CRC, and FLIT replay

Why Photonic Fabric(TM) vs. Copper?
Photonic connectivity for accelerated computing, compute-to-compute and compute-to-memory:
- Higher off-package bandwidth: hundreds of Tbps, unrestricted by beachfront
- Efficient data movement: zero-mass photons vs. I2R losses from electrons
- Lower latency and power: eliminates DSPs, deep FEC, and re-timers
- Fewer system connections: higher overall system reliability
- Longer reach: connect multiple racks with less power; more efficient remote memory transactions (RDMA)
- 10 pJ/bit (including 2.4 pJ/bit for optics) for up to 50 m, vs. 60 pJ/bit for copper for up to 1 m

Photonic Fabric(TM) Link: Module & Appliance
Shattering the memory and interconnect bandwidth wall for accelerated computing:
- Celestial AI-designed system-in-package (SiP): memory controllers + Photonic Fabric Link + network switch
- 2.07 TB memory capacity at 7.2 Tbps bandwidth with 100 ns latency
- HBM3E operates as a write-through cache for DDR: the bandwidth and latency of HBM3E with the capacity and cost of DDR5
- Module (side view): DDR5 DIMMs and HBM3E stacks around the Photonic Fabric ASIC, PIC, and fiber array unit (FAU)
- Photonic Fabric Appliance: 16 Photonic Fabric memory modules in a 2U appliance; 33 TB memory capacity; 115 Tbps network switch enabling a backend/scale-up AI fabric

Photonic Fabric(TM) Link Implementation: The Chiplet Approach
Full-stack high-bandwidth optical interconnect solution for accelerated computing:
[Figure: chiplet floorplan, 1.4 mm tall, with microcontroller, AMS blocks, fiber array unit (FAU), protocol-adaptive layer, and an array of UCIe-A interfaces facing the customer AI processor.]
- Standard die-to-die interface: UCIe-A or MAX PHY
- 2.4 pJ/bit photonic link power (Gen1); Photonic Fabric IP macros with a protocol-adaptive layer
- 14.4 Tbps per chiplet (Gen1) and 28.8 Tbps per chiplet (Gen2): full HBM3E bandwidth, 3.6x more bandwidth than a CPO chiplet
- 10 Gen1 chiplets per XPU package = 144 Tbps; 10 Gen2 chiplets per XPU package = 288 Tbps
- Protocol adaptive: AXI, HBM/DDR, UAL, CXL, etc.

Photonic Fabric Delivers Superior Off-Package Bandwidth
Photonic Fabric-based chiplets offer more package bandwidth, with full link management:
- Current SoTA: 4 Tbps per chiplet, 40 Tbps per package (4x-reticle package with 10 chiplets)
- Photonic Fabric: 14.4 Tbps (Gen1) / 28.8 Tbps (Gen2) per chiplet, 144 Tbps (Gen1) / 288 Tbps (Gen2) per package (4x-reticle package with 10 chiplets)

Photonic Fabric(TM) 14.4 Tbps Photonic Interface Chiplet
Seamless integration with existing AI accelerators and XPUs:
- Compute-to-compute: scale-up/backend networks
- Compute-to-memory: photonically scalable disaggregated memory at full HBM3E bandwidth
- Unlocks photonic connectivity (E-O-E) for AI XPUs
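The deck's link-efficiency comparison (10 pJ/bit optical for up to 50 m vs. 60 pJ/bit copper for up to 1 m) translates directly into joules per volume of data moved. A minimal sketch using only the quoted pJ/bit figures; the 1 TB transfer size is an illustrative choice, not from the deck:

```python
# Energy to move data off-package at the quoted link efficiencies.
PJ = 1e-12  # one picojoule, in joules

def transfer_energy_joules(num_bytes: float, pj_per_bit: float) -> float:
    """Energy for a transfer at a given link efficiency (pJ/bit)."""
    return num_bytes * 8 * pj_per_bit * PJ

ONE_TB = 1e12  # bytes
optical = transfer_energy_joules(ONE_TB, 10.0)  # Photonic Fabric, up to 50 m
copper = transfer_energy_joules(ONE_TB, 60.0)   # copper, up to 1 m

print(f"optical: {optical:.0f} J/TB, copper: {copper:.0f} J/TB, "
      f"ratio: {copper / optical:.0f}x")
# -> optical: 80 J/TB, copper: 480 J/TB, ratio: 6x
```

At these figures the optical link moves a terabyte for a sixth of the energy, while also reaching 50x the distance.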
- Protocol adaptive: standard protocols (AXI, HBM/DDR, CXL), emerging protocols (UAL), and proprietary protocols
- Standard D2D interface: UCIe / MAX PHY
- Standard 2.5D packaging flows from multiple large OSATs; automated high-volume, high-throughput fiber-attach process
- Less beachfront than one HBM stack

Compute-to-Compute & Compute-to-Memory Photonic Connectivity
The Photonic Fabric(TM) technology platform enables cluster-scale AI processing:
- Photonic Fabric(TM) Link: 115 Tbps switch for the backend/scale-up network; all-to-all connectivity for efficient collective comms
- Photonic Fabric(TM) Appliance: 33 TB memory; broadcast and reduce across all connected XPUs; 33 TB unified/shared memory space

Photonic Fabric Link & Appliance: AI Efficiency & Performance
Photonic Fabric Link and Appliance deliver compelling benefits for AI workloads:
- 56 conventional XPUs with 192 GB of HBM3 each are required to process a 10T-parameter DLRM model; each conventional XPU holds 1/56th (1.79%) of the model in its HBM (192 GB)
- 16 optically connected XPUs attached to a Photonic Fabric Appliance (33 TB memory capacity, full HBM3 bandwidth, 115 Tbps switch) process the same 10T-parameter DLRM model; each XPU has access to the full model stored in the PFMA
- Up to 71% XPU CapEx and power reduction; higher compute density; memory resources scalable independently of compute; 12.5x DLRM performance speed-up

Photonic Memory Fabric: Delivers Higher Throughput on GPT4
- Conventional 16-XPU / 2-server rack: front-end Ethernet (400/800 Gbps) via CPUs, PCIe switches, and NICs; scale-up/backend network constrained; the 2 servers hold 3 TB of HBM, so GPT4 (1.8T-parameter) MoE models are memory-restricted: 1.8 TB of model weights (fp8) leaves 1.2 TB of the 3 TB for inferencing (KV cache, context lengths, batch size)
- 16-XPU / 2-server rack + Photonic Fabric Appliance (33 TB memory capacity, full HBM3 bandwidth, 115 Tbps switch): 36 TB (12x) more fast memory plus a 7.2 Tbps scale-up (backend) network per XPU; 1.8 TB of model weights (fp8) leaves 34.2 TB of the 36 TB for inferencing, easily serving GPT4 (1.8T) MoE models with large context lengths and batch sizes

128 XPU Cluster-Scale Photonic Fabric for Accelerated Computing
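The memory figures behind the preceding appliance, DLRM, and GPT4 comparisons reduce to simple arithmetic. A sketch using only numbers quoted in the deck (fp8 taken as 1 byte per parameter):

```python
# Appliance capacity: 16 modules of 2.07 TB each in a 2U appliance.
appliance_tb = 16 * 2.07
print(f"appliance capacity: {appliance_tb:.2f} TB")  # ~33 TB as quoted

# DLRM: a 10T-parameter model sharded evenly over 56 conventional XPUs.
per_xpu_share = 1 / 56
print(f"per-XPU model share: {per_xpu_share:.2%}")   # 1.79% as quoted

# GPT4 (1.8T parameters) at fp8 = 1 byte/parameter -> 1.8 TB of weights.
weights_tb = 1.8e12 * 1 / 1e12
for label, fast_mem_tb in [("conventional 2-server HBM (3 TB)", 3.0),
                           ("with Photonic Fabric Appliance (36 TB)", 36.0)]:
    free_tb = fast_mem_tb - weights_tb
    print(f"{label}: {free_tb:.1f} TB left for KV cache / context / batch")
# -> 1.2 TB and 34.2 TB respectively, matching the slide.
```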
(Illustrative picture; not all photonic links shown.)
- 128-XPU backend/scale-up fabric at 7.2 Tbps per XPU, built from 16 Photonic Fabric Appliances
- Front-end network: Ethernet (400/800 Gbps) via CPUs, PCIe switches, and NICs
- Up to 792 TB system memory capacity
- Enables very large clusters with a significantly lower carbon/energy (TCO2) impact

Celestial AI Photonic Fabric(TM) vs. Current State-of-the-Art
Photonic Fabric delivers disruptive performance, the lowest latency, improved energy efficiency for data movement and compute, and lower TCO:
- 3.6x more bandwidth per optical chiplet(1) (Gen1 chiplet vs. a competitive solution, against the HBM3E reference of 8 Tbps); 16x more off-package bandwidth
- 5x lower RDMA latency(2)
- 8x better power efficiency (pJ/b)(3)
- 26x lower cost ($/GB)(4)
SOTA benchmarks: (1) Ayar Labs TeraPHY; (2) DGX H200 remote direct memory access (RDMA); (3) RDMA power for fourth-generation NVLink; (4) NVIDIA DGX H200.

The Photonic Fabric(TM): Designed for Volume Deployment
The right technology at the right time, a full-stack optical interconnectivity platform:
- Volume-manufactured silicon photonics, driven by data communications
- Silicon photonics with advanced CMOS: control circuitry, SerDes, router/switch
- Opto-electronic systems-in-package; wafer-scale assembly and test
- Leveraging established supply chains

How Does the Photonic Fabric Transform AI?
A new vision of AI infrastructure, unconstrained by fast-memory capacity and scale-up bandwidth:
- Efficiently process very large LLMs: train and serve 1T+ models such as OpenAI GPT4-1.8T MoE and Anthropic Claude Opus (2T)
- Accelerates multi-modality: text-to-image and text-to-video (OpenAI Sora, RunwayML)
- Supports larger context lengths: larger context lengths activate new use cases
- Simplifies the AI software stack: all-to-all/broadcast interconnect simplifies collective comms and reduces the need for sharding
- Democratizes AI: lowers the cost of AI
- Mitigates AI carbon impact: helps meet carbon-neutrality targets even with increasing AI usage

info@celestial.ai | www.celestial.ai