Chiplet and Network Acceleration: Two Major Driving Forces of the Interconnect Era (slide deck, 30 pages)
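Kiwimoore
Vision: beyond what chips have yet seen

Agenda
- Kiwi SoChiplet: high-performance interconnect platform

Moore's Law
- Increase transistor density
- Increase chip area
- Improve intra-chip transfer efficiency
- Improve compute efficiency
- Break through the memory wall
- Increase compute scale
- Improve intra-cluster transfer efficiency
- Improve inter-cluster transfer efficiency
- Reduce transfer load

New bottleneck: interconnect
New approaches: Chiplet, network acceleration

High-performance computing enters the Chiplet era: three mainstream forms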
Chiplet products shown: Grace Hopper, Lakefield, Ponte Vecchio, Meteor Lake, Sapphire Rapids, Zen3+, Zen2, Zen, MI200, MI300, RX7000, AWS Graviton3, M1 Ultra, Dojo, BR100, Ascend 910 (昇騰910)

Server CPUs shown: AMD Genoa, Intel Emerald Rapids, Ampere Siryn, AMD Bergamo, Intel Sapphire Rapids, AWS Graviton 4

Intel Sapphire Rapids

Package
- Intel 7 process with 2.5D advanced packaging
- Core count: 56
- Thread count: 112
- Total LLC: 105 MB
- PCIe 5.0 / CXL 1.1 lanes: up to 128
- Memory bandwidth: 307 GB/s
- SPECrate2017_int_base (2P): 990

Compute tile
- Intel Golden Cove core
- LLC: 1.875 MB per core
- Mesh NoC architecture
- 15 cores per tile, 14 enabled

Accelerators
- 400 Gb/s symmetric crypto
- 160 Gb/s compression + 160 Gb/s decompression
- 400M load-balancing decisions per second

Die-to-die (D2D) link
- High-speed D2D over 2.5D EMIB and Si interposer
- D2D bandwidth: 500 GB/s
- Data rate: 5 GT/s
- Bump pitch: 55 µm
- Energy efficiency: 0.5 pJ/bit
- PHY latency, end-to-end TX + RX: 2.4 ns
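The Sapphire Rapids figures quoted on this slide can be sanity-checked with a little arithmetic. The sketch below is only an illustration, not something from the deck: it assumes the 1.875 MB LLC figure is per core, that the 500 GB/s D2D bandwidth is an aggregate that is fully utilized, and that 0.5 pJ/bit applies to every transferred bit. The function names are hypothetical.

```python
# Figures quoted on the Sapphire Rapids slide.
CORES = 56
LLC_PER_CORE_MB = 1.875          # assumption: the slide's LLC figure is per core
D2D_BANDWIDTH_GB_PER_S = 500     # assumption: aggregate, fully utilized
D2D_ENERGY_PJ_PER_BIT = 0.5


def total_llc_mb(cores: int, llc_per_core_mb: float) -> float:
    """Total last-level cache if every enabled core contributes one slice."""
    return cores * llc_per_core_mb


def link_power_watts(bandwidth_gb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power drawn by a link moving `bandwidth_gb_per_s` gigabytes per second.

    bytes/s * 8 = bits/s; bits/s * joules/bit = watts.
    """
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * energy_pj_per_bit * 1e-12


if __name__ == "__main__":
    print("Total LLC (MB):", total_llc_mb(CORES, LLC_PER_CORE_MB))
    print("D2D power (W):", link_power_watts(D2D_BANDWIDTH_GB_PER_S, D2D_ENERGY_PJ_PER_BIT))
```

Under these assumptions the per-core slices add up to the 105 MB total on the slide, and the D2D links cost roughly 2 W at full bandwidth, which is why sub-pJ/bit energy matters for tiled designs.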
Sapphire Rapids high-speed I/O
- 56 high-speed I/O lanes: 32 for PCIe 5 (CXL 1.1), 24 for UPI
- Lane rate: 2.5 to 32 Gbps
- Energy efficiency: 6.48 pJ/bit

AMD Genoa

Package
- Core count: 96
- Thread count: 192
- Total L3 cache: 384 MB
- Memory bandwidth: 460.8 GB/s
- IFOP (GMI3) links: 12
- PCIe 5.0 lanes: 128
- DDR5 memory channels: 12
- SPECrate2017_int_base (2P): 1950

[Block diagram: twelve Zen4 CCDs around a central I/O die, with DDR5 channels (2DPC), PCIe, CXL, and PCIe 3 (8 lanes) interfaces]

CCD (compute die)
- Core count per CCD: 8
- TSMC N5, 55 mm², 6.5B transistors
- 32 MB shared L3 cache

I/O die
- IO Root Hub with 12 GMI3 links and 8 SerDes blocks
- TSMC 6 nm, 386.88 mm², 11B transistors

SerDes
- 16 lanes, 32 Gbps
- Combination SerDes: xGMI (C2C link), CXL, PCIe, SATA

[Diagram: lane bifurcation of each x16 SerDes block into x16, x8, x4, x2, and x1 links, labeled SATA, CXL, xGMI, PCIe]
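The Genoa package totals follow from the per-die figures, which a short calculation makes explicit. This is a minimal sketch under stated assumptions, none of which appear in the deck: one CCD per GMI3 link (12 CCDs), two threads per core, a DDR5 transfer rate of 4800 MT/s, and 8 bytes of data per transfer per channel. The function names are hypothetical.

```python
# Per-die figures quoted on the Genoa slides.
CORES_PER_CCD = 8
L3_PER_CCD_MB = 32
GMI3_LINKS = 12
DDR5_CHANNELS = 12

# Assumptions, not stated in the deck.
ASSUMED_CCDS = GMI3_LINKS            # one CCD per GMI3 link
ASSUMED_THREADS_PER_CORE = 2         # SMT2
ASSUMED_DDR5_MT_PER_S = 4800         # DDR5-4800
ASSUMED_BYTES_PER_TRANSFER = 8       # 64-bit data path per channel


def package_totals(ccds: int, cores_per_ccd: int, l3_per_ccd_mb: int,
                   threads_per_core: int) -> dict:
    """Aggregate core, thread, and L3 counts over all CCDs."""
    cores = ccds * cores_per_ccd
    return {
        "cores": cores,
        "threads": cores * threads_per_core,
        "l3_mb": ccds * l3_per_ccd_mb,
    }


def peak_memory_bandwidth_gb_per_s(channels: int, mt_per_s: int,
                                   bytes_per_transfer: int) -> float:
    """Theoretical peak: channels * transfers/s * bytes/transfer, in GB/s."""
    return channels * mt_per_s * bytes_per_transfer / 1000


if __name__ == "__main__":
    print(package_totals(ASSUMED_CCDS, CORES_PER_CCD, L3_PER_CCD_MB,
                         ASSUMED_THREADS_PER_CORE))
    print(peak_memory_bandwidth_gb_per_s(DDR5_CHANNELS, ASSUMED_DDR5_MT_PER_S,
                                         ASSUMED_BYTES_PER_TRANSFER))
```

With these assumptions the totals reproduce the slide's 96 cores, 192 threads, 384 MB of L3, and 460.8 GB/s of memory bandwidth.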
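AWS Graviton3

Package
- 1 compute die (N5) + 2 PCIe 5 controller dies + 4 DDR5 controller dies
- Core count: 64
- Thread count: 64
- Memory bandwidth: 300 GB/s
- Total L3 cache: 32 MB
- SPECrate2017_int_base (3P): 630
- Dimensions shown: 43.6 mm × 21.7 mm

Compute die
- Arm V1 core
- L2 cache: 1 MB
- L1 instruction cache: 64 KB
- L1 data cache: 64 KB
- 282 mm
- CMN 650 fabric
- 8 × 8 mesh: 32 nodes for memory, 32 nodes for CPU

System
- One Nitro card serves three Graviton3 sockets and manages all three simultaneously
- Performance and energy efficiency are optimized for the system as a whole

IO Die Central versus Multi-Die architecture
- The IO Die Central architecture delivers better performance, smaller area, and lower power.
- The IO Die Central architecture delivers better multi-chiplet interconnect performance, more balanced latency, and greater flexibility.
- In the IO Die Central architecture, compute chiplets and interconnect chiplets use different process nodes, which lowers mass-production cost.
- The Multi-Die architecture is simpler to design and is more cost-effective in 2-die products.

3DIC
[Figure: cache access in a planar SoC versus a 3DIC with cache under the top die. Labels as extracted: SoC, Cache, 3050nS, 26 mm; 3DIC, Top die, Cache, 1nS, 26 mm, 13 mm, 0.1 mm. Examples: MI300 (source: AMD), Ponte Vecchio (source: Intel)]
- Tbps-class bandwidth and ns-class latency meet the needs of communication between heterogeneous cores.
- Modular composition: general-purpose processor + dedicated DSA, to suit different applications.
- Modular upgrades keep pace with continuously iterating algorithms.

Chiplet: the ideal partner for heterogeneous computing
- The heterogeneous compute units Grace + Hopper deliver a 5x performance improvement.
- GH200: 4 PetaFLOPS TE | 72 Arm CPUs | 96 GB HBM3 | 576 GB GPU memory
[Chart (source: Nvidia): relative performance on Vector DB 400GB, DLRM 500GB, and LLM 65GB for x86 CPU, x86+H100, and GH200. Value labels as extracted: 1.0X / 1.7X / 9.3X; 1.0X / 2.4X / 12.0X; 1.0X / 122X / 284X]

Kiwimoore

Agenda
- Challenges and key technologies of the interconnect era

Chiplet Solution and Chiplet Turn-Key Design
- Products: 3D Base Die, Die2Die IP, 2.5D IO Die, Network DSA
- Software: Workload & Storage Acc., Programmable SW Arch, RDMA
- Chiplet HW Arch: D2D Interface, Fabric over Chiplet
- Goals: Low Manu. Cost, Ultra-High Performance, Low R&D Cost, Fast TTM, Test & Production

Die2Die IP
- UCIe compatible
- High bandwidth, low latency, low power: 32 Gbps, 0.5 pJ, 5 ns
- Covers 2.x/2.5/3D and other Chiplet packaging forms
- International standards support: member of the international Chiplet alliance, UCIe standard

2.5D IO Die
- Ultra-high-speed D2D interface: Die2Die 2.5D, multi-channel, UCIe
- Fully integrated high-speed interfaces: DDR and PCIe MC/PHY, multi-channel
- Ultra-high-speed interconnect network: Coherent OCI
[Diagram: IO Die containing Kiwi Fabric, D2D, Memory Access, PCIe, SerDes, and IO Hub blocks, connected to four xCDs]

3D Base Die
- High-current IVR: distributed, high-speed, high-current power delivery network for hundreds of watts
- Distributed 3D near-memory: hundreds of MB of cache
- 3D high bandwidth and low latency: high-density 3D stacking achieves 200% integration density
- Ultra-high-speed inter-die interconnect: Coherent OCI
- Fully integrated high-speed interfaces: high-speed memory and PCIe MC/PHY
[Diagram: Top Dies stacked on the Base Die]

Network DSA
- High-performance RDMA
- Integrates multiple hardware acceleration units
- Based on a Chiplet architecture
- Programmable DP Engine
[Diagram: Base NIC, Programmable DP Engine, DSA Accelerator Engine]

Full coverage of 2.x/2.5/3D Chiplet forms
- High-speed interconnect chiplet
- 3D near-memory ultra-high-speed interconnect chiplet
- Data scheduling and network acceleration chiplet

Turnkey service
- Full-flow turnkey service: design, packaging, test, mass production
- Strategic interposer partnership to resolve mass-production issues
- Extensive Chiplet project design experience: 10+, global
[Diagram: substrate, 2.5D interposer, four logic dies, two HBM stacks]

Kiwimoore and Faraday Technology jointly release a 2.5D/3DIC total solution

On November 11, 2023, Faraday Technology (智原科技) and Kiwimoore (奇異摩爾) announced the joint launch of a 2.5D interposer and 3DIC total solution. Building on a wafer-to-wafer 3DIC stacking and packaging platform, the two companies will offer the industry 2.5D interposer and 3DIC "full-chain services from design, packaging, and test through mass production."

Kiwimoore CEO Tian Mochen (田陌晨) said: "We are honored to enter a strategic partnership with Faraday Technology, and we are committed to further expanding and deepening it. With this cooperation we can better provide customers with a full-chain solution, from chiplet products through design, packaging, and test to mass production. We hope both sides will explore Chiplet and interconnect innovations in greater depth and help the Chiplet ecosystem mature and reach commercial deployment."

Faraday Technology COO Lin Shih-Chin (林世欽) said: "We are very pleased to join Kiwimoore in releasing the 2.5D/3DIC total solution. With our expertise in SoC design and our cooperation with leading foundry and packaging-and-test companies, we can provide comprehensive support for 3DIC advanced packaging services. This cooperation means we can more fully realize the potential of cutting-edge applications in chiplet integration and meet customer needs."

UCIe D2D interface IP

Features
- Compatible with the UCIe 1.1 standard
- Supports multiple package types: standard and advanced
- Supports 4, 8, 12, 16, 24, and 32 GT/s data rates
- Supports PCIe, CXL, and streaming protocols
- Supports single and multiple PHY modules
- Supports link training, repair, and redundancy
- Link state management

Layer stack
- Protocol Layer
- Flit-aware D2D Interface (FDI)
- Die-to-Die Adapter: ARB/MUX (when applicable), CRC/Retry (when applicable), link state management, parameter negotiation
- Raw D2D Interface (RDI)
- Physical Layer: link training, lane repair (when applicable), lane reversal (when applicable), scrambling/de-scrambling, sideband initialization and transfers, analog front end, clock forwarding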
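The supported data rates translate into raw link bandwidth once a lane count is fixed. The sketch below is illustrative only: the lane counts per module (16 for a standard package, 64 for an advanced package) come from the UCIe specification rather than from this deck, the result is per direction for a single module, and protocol overhead (CRC, flit framing) is ignored. The function and constant names are hypothetical.

```python
# Data rates listed on the slide, in GT/s per lane.
SUPPORTED_RATES_GT_PER_S = (4, 8, 12, 16, 24, 32)

# Assumption (from the UCIe specification, not from the deck):
# data lanes per module and per direction for each package type.
ASSUMED_LANES_PER_MODULE = {"standard": 16, "advanced": 64}


def raw_module_bandwidth_gb_per_s(package: str, rate_gt_per_s: int) -> float:
    """Raw one-direction bandwidth of a single module, in GB/s.

    Each lane carries one bit per transfer, so lanes * GT/s gives Gb/s;
    dividing by 8 converts to GB/s. Protocol overhead is not modeled.
    """
    if rate_gt_per_s not in SUPPORTED_RATES_GT_PER_S:
        raise ValueError(f"unsupported data rate: {rate_gt_per_s} GT/s")
    if package not in ASSUMED_LANES_PER_MODULE:
        raise ValueError(f"unknown package type: {package!r}")
    lanes = ASSUMED_LANES_PER_MODULE[package]
    return lanes * rate_gt_per_s / 8


if __name__ == "__main__":
    for package in ASSUMED_LANES_PER_MODULE:
        for rate in SUPPORTED_RATES_GT_PER_S:
            bandwidth = raw_module_bandwidth_gb_per_s(package, rate)
            print(f"{package:>8} package @ {rate:>2} GT/s: {bandwidth:6.1f} GB/s")
```

Multiple PHY modules scale this figure linearly, which is how a multi-module IO die reaches the Tbps-class bandwidth mentioned earlier in the deck.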