Breaking Through the Memory Wall with CXL
Presented by Ahmed Medhioub, Product Manager, CXL Smart Memory Controllers
2024 SNIA. All Rights Reserved.

Agenda
Memory Wall
Breaking Through the Memory Wall
Memory Bound Use Cases
CXL for Modular Shared Infrastructure
Ecosystem Enablement
Calls to Action

The Memory Wall
Challenges with Previous Attempts
1. Limited scalability of memory BW and capacity
2. Significant memory latency delta vs local memory
3. Proprietary system configuration and deployment
4. Complex software integration with popular apps
Breaking Through the Memory Wall with CXL
1. Increase server memory BW and capacity by 50%
2. Reduce latency by 25%
3. Standard DRAM for flexible supply chain and cost
4. Seamlessly expand memory for existing and new applications
Diagram labels: 12 Memory Channels with Two Leos; Local CPU Memory Channels; DDR5 5600
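
The "12 Memory Channels with Two Leos" label and the 50% figure are consistent with simple channel arithmetic. The sketch below is illustrative only. It assumes 8 local DDR5 channels and 2 DDR5 channels behind each Leo controller (the benchmark configurations later in the deck list 8 local DIMMs and 2x 64GB per Leo). The round-robin address mapping and the 256-byte granularity are hypothetical and are not the documented hardware interleaving scheme.

```python
# Hypothetical model of the "12 memory channels with two Leos" configuration.
# Assumptions (not stated on the slide): 8 local DDR5 channels per socket,
# 2 DDR5 channels behind each Leo controller, and round-robin interleaving
# at a fixed 256-byte granularity.

LOCAL_CHANNELS = 8
LEO_CONTROLLERS = 2
CHANNELS_PER_LEO = 2
GRANULARITY_BYTES = 256


def total_channels() -> int:
    """Local channels plus CXL-attached channels."""
    return LOCAL_CHANNELS + LEO_CONTROLLERS * CHANNELS_PER_LEO


def channel_gain() -> float:
    """Fractional increase in channel count over local-only memory."""
    return total_channels() / LOCAL_CHANNELS - 1.0


def channel_for_address(addr: int) -> tuple[str, int]:
    """Map a physical address to (kind, index) under round-robin interleaving."""
    slot = (addr // GRANULARITY_BYTES) % total_channels()
    if slot < LOCAL_CHANNELS:
        return ("local", slot)
    return ("cxl", slot - LOCAL_CHANNELS)


if __name__ == "__main__":
    print(total_channels())                             # 12
    print(f"{channel_gain():.0%}")                      # 50%
    print(channel_for_address(8 * GRANULARITY_BYTES))   # ('cxl', 0)
```

With every channel populated by the same DIMM size and speed, the same ratio applies to capacity, which matches the 512GB + 256GB configuration used in the benchmarks.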
Diagram labels: x16, x16; HW Interleaving; CXL-Attached Memory Channels

Breaking Through the Memory Wall
eCommerce & Business Intelligence: Online Transaction Processing, Online Analytics Processing
OLTP: What is happening?
OLAP: What has happened?
Opportunity for CXL to Boost MySQL Database Performance
AI Inferencing: Recommendation Engines, Semantic Cache
Opportunity for CXL to Boost Vector Database Performance
Diagram labels: VectorDB, Vector Database, Inference Server
Diagram labels: REST, Models, Query, Inference, Users, Query/Store, Inference

OLTP & OLAP Results

OLAP: Cut Big Query Times in Half with CXL Memory
System Under Test Configuration
CPU: 5th Gen Intel Xeon Scalable Processor (Single-Socket)
Storage: 4x NVMe PCIe 4.0 SSDs
Local: 512GB (8x 64GB DDR5-5600)
Local+CXL: 512GB (8x 64GB DDR5-5600) + 256GB (4x 64GB DDR5-5600)
CXL Mode: 12-Way Heterogeneous Interleaving
Benchmark: TPC-H (1000 scale factor)
Chart: TPC-H Query Times. Average Query Times (Normalized) for Q1 through Q22, DRAM+CXL vs DRAM-Only.
Up to 50% lower query times with CXL!

OLTP: 150% More TPS with 15% Better CPU Utilization
System Under Test Configuration
CPU: 4th Gen Intel Xeon Scalable Processor (Single-Socket)
Storage: 2x NVMe PCIe 4.0 SSDs
Local: 128GB (8x 16GB DDR5-4800)
Local+CXL: 128GB (8x 16GB DDR5-4800) + 128GB (2x 64GB DDR5-5600)
CXL Mode: Memory Tiering (MemVerge Memory Machine)
Benchmark: Sysbench (Percona Labs TPC-C Scripts)
Chart: Transactions per Second (TPS). TPS (Normalized) vs Clients (0 to 1000), DRAM vs DRAM+CXL.
Chart: CPU Utilization. CPU Utilization (Normalized) vs Clients (0 to 1000), DRAM vs DRAM+CXL.

Broad Applicability of CXL Interleaving
Workloads: CFD, WRF, EDA, ROMS, RecSys
System Under Test Configuration
CPU: Single 5th Gen Intel Xeon Scalable CPU
Storage: 4x NVMe PCIe 4.0 SSDs
Local: 512GB (8x 64GB DDR5-5600)
Local+CXL: 512GB (8x 64GB DDR5-5600) + 256GB (2 Leos, 2x 64GB DDR5-5600 per Leo)
CXL Mode: 12-Way Heterogeneous Interleaving
CXL Interleaving: Up to 50%+ Performance Improvement
Chart: CXL Interleaving Benchmark Results. Normalized Benchmark Score, Local vs Local+CXL, for AI Caching, Computational Fluid Dynamics, Weather Research and Forecasting Model, Computational Electromagnetics, and Regional Ocean Modeling System.

Breaking the Memory Wall for Databases
Interleaving across CXL-Attached Memory
2.33x memory capacity and 1.66x memory bandwidth per socket with CXL
Lower TCO for memory-intensive applications
Without CXL: Popular Certified & Supported SAP HANA Hardware. 48 DIMMs with Two 2-Socket Systems. High kW, High TCO.
With CXL: Optimized Hardware for In-Memory Databases. 56 DIMMs with One 2-Socket System. Lower kW, Lower TCO.
Diagram labels: DDR5, CPU, x16
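
The 2.33x capacity figure can be reproduced from the DIMM counts on this slide, assuming equal DIMM sizes in both setups. The 1.66x bandwidth figure is reproduced below only under an assumed channel layout that the slide does not state: 12 local DDR5 channels per socket plus 8 CXL-attached channels per socket (for example, four x16 controllers with two channels each). Treat the bandwidth half as a plausibility check rather than the presenter's derivation.

```python
# DIMM counts come from the slide; the channel counts are assumptions.

def dimms_per_socket(total_dimms: int, systems: int, sockets_per_system: int = 2) -> float:
    """Average number of DIMMs attached to each CPU socket."""
    return total_dimms / (systems * sockets_per_system)


without_cxl = dimms_per_socket(48, systems=2)  # two 2-socket systems
with_cxl = dimms_per_socket(56, systems=1)     # one 2-socket system

# Capacity ratio per socket, assuming every DIMM has the same size.
capacity_ratio = with_cxl / without_cxl

# Hypothetical channel layout per socket (not given on the slide).
ASSUMED_LOCAL_CHANNELS = 12
ASSUMED_CXL_CHANNELS = 8
bandwidth_ratio = (ASSUMED_LOCAL_CHANNELS + ASSUMED_CXL_CHANNELS) / ASSUMED_LOCAL_CHANNELS

if __name__ == "__main__":
    print(f"{without_cxl:.0f} -> {with_cxl:.0f} DIMMs per socket")  # 12 -> 28
    print(f"{capacity_ratio:.2f}x capacity")                        # 2.33x
    print(f"{bandwidth_ratio:.2f}x bandwidth")                      # 1.67x
```

Under these assumptions the bandwidth ratio is 20/12, which rounds to 1.67x; the slide's 1.66x looks like the same value truncated.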

CXL Memory Expansion for Hyperscalers
2 PCIe 5.0 x16 add-in cards for memory expansion per M-DNO (DeNsity Optimized) HPM
Each PCIe 5.0 x16 card with an MXIO cable connector
Target Use Case
Mode: Low-latency memory expansion
Config: 2DPC SW-tiering or 1DPC HW interleaving
Apps: In-memory databases, semantic cache
Component Form Factor
Design: Coplanar add-in card
Connector: PCIe 5.0 x16 (SFF-TA-1033)
Memory: 2-4 DIMMs per card
System Form Factor
Platform: Modular Hardware System (DC-MHS)
Interface: 2x PCIe 5.0 x8 MCIO connectors
MB: Density Optimized HPM (M-DNO)
Coplanar High-Density Memory Expansion with Cold-Swap Support
Diagram labels: DC-SIF Modular Shared InFrastructure; CPU; Airflow; Front; Back; Leo; Leo; CPU

CXL Memory Expansion for Hyperscalers
2 PCIe 5.0 x16 add-in cards for memory expansion per M-DNO (DeNsity Optimized) HPM
Each PCIe 5.0 x16 card with an edge connector
Target Use Case
Mode: Low-latency memory expansion
Config: 2DPC SW-tiering or 1DPC HW interleaving
Apps: In-memory databases, semantic cache
Component Form Factor
Design: Coplanar add-in card
Connector: PCIe 5.0 x16 (SFF-TA-1033)
Memory: 2-4 DIMMs per card
System Form Factor
Platform: Modular Hardware System (DC-MHS)
Interface: 1x PCIe 5.0 x16 SFF-TA-1034
MB: Density Optimized HPM (M-DNO)
Coplanar High-Density Memory Expansion with Cold-Swap Support
Diagram labels: DC-SIF Modular Shared InFrastructure; Airflow; Front; Back; Leo; Leo; CPU; CPU

CXL Memory Expansion for Hyperscalers
2 PCIe 5.0 x16 add-in cards for memory expansion per M-DNO (DeNsity Optimized) HPM
Each PCIe 5.0 x16 card with an MXIO cable connector
Component Form Factor
Design: Coplanar add-in card
Connector: 2x PCIe 5.0 x8 (SFF-TA-1016)
Memory: 2-4 DIMMs per board
Host Processor Module Form Factor
Interface: 2x PCIe 5.0 x8 (SFF-TA-1016)
Power: DC-MHS PIC Power (2x3, 72A+6)
Mechanical: 175 x 74 x 35.10 mm (L x H x W)
System Form Factor
Platform: DC-MHS 7OU Chassis
Blades: 8 M-DNO HPM
Expansion: Up to 8 DIMMs per Blade
Coplanar High-Density Memory Expansion with Cold-Swap Support
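
The per-blade and per-chassis expansion limits above compose as follows. This is a small sketch using only figures from these slides (2 add-in cards per M-DNO HPM, 2-4 DIMMs per card, 8 HPM blades per 7OU chassis). The 64GB DIMM size is borrowed from the benchmark configurations earlier in the deck and is an assumption for this platform.

```python
# Counts from the slides; DIMM size is an assumption for illustration.
CARDS_PER_HPM = 2
DIMMS_PER_CARD_RANGE = (2, 4)
BLADES_PER_CHASSIS = 8
ASSUMED_DIMM_GB = 64  # hypothetical; the slides do not fix a DIMM size here


def expansion_dimms_per_blade(dimms_per_card: int) -> int:
    """CXL expansion DIMMs on one M-DNO HPM blade."""
    return CARDS_PER_HPM * dimms_per_card


def expansion_dimms_per_chassis(dimms_per_card: int) -> int:
    """CXL expansion DIMMs across all blades in one 7OU chassis."""
    return BLADES_PER_CHASSIS * expansion_dimms_per_blade(dimms_per_card)


if __name__ == "__main__":
    for n in DIMMS_PER_CARD_RANGE:
        per_blade = expansion_dimms_per_blade(n)
        per_chassis = expansion_dimms_per_chassis(n)
        print(n, per_blade, per_chassis, per_chassis * ASSUMED_DIMM_GB)
```

At 4 DIMMs per card the per-blade total is 8, matching the "Up to 8 DIMMs per Blade" line.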

Modular Shared Infrastructure (M-SIF)
OCP Alignment with DC-MHS:
Flexible CXL Expansion Options (M-DNO)
Shared Elements with CXL Support (M-SIF)
Standardized DIMM Support
Memory Expansion for High-Density Systems
High Power Connector (200W-600W)
Challenges:
Signal Integrity
Link Bifurcation & Configuration
Latency/Performance
DIMM Interoperability
Core Element: Host Processor Module (HPM), 16-24 DIMMs per M-DNO
Shared Element: Disaggregated Resources, 8-16 DIMMs per M-SIF
Diagram labels: PWR/PCIe/CXL MXIO; CXL Controllers; PCIe/CXL Retimers; CXL.MEM SW; CXL.MEM Add-in Cards; Host Interface Board; CPU; JBOF; JBOG; JBOM; Node 1; Node 2; Node 3; Node 4; JBOM; CXL.MEM

Enabling CXL Connectivity for M-SIF
Use Case: Memory Expansion. HPC and AI Inferencing. MB/PCIe/MXIO Connectivity.
Use Case: Shared/Pooled Memory. Recommendation and Semantic Search. PCIe Cabling Connectivity.
Use Case: JBOM Enablement. Enterprise In-Memory Database. MXIO or Backplane Connectivity.
Unlocking More Capabilities
Diagram labels: CPU; Local CXL-Attached (Direct); Short Reach, CXL-Attached (Backplane); Long Reach, CXL-Attached (PCIe/CXL Cabling)

Extending Reach for PCIe/CXL Memory
Chart: Relative Performance. Relative Bandwidth (higher is better) and Relative Latency (lower is better) for CXL, CXL + 1 Retimer, and CXL + 2 Retimers.
Diagram labels: CPU; CXL Smart Memory Controller; PCIe/CXL Smart DSP Retimer; Reach 1, 2, 3
System Under Test Configuration
CPU: 5th Gen Intel Xeon Scalable CPU
CXL-Attached Memory: 128GB (2x 64GB DDR5-5600)
OS: Linux Ubuntu 22.04
Benchmark: Intel MLC (Memory Latency Checker)
Optimizing High Performance & Latency Sensitive Applications through a Total Solution

Ecosystem Enablement
Breaking Through the Memory Wall:
CXL Resource Management
High Performance HW Interleaving
High-Capacity Memory Density Tiering
Host & DDRx Interop:
CXL Discovery and Allocation
DIMM Stability & Performance
OS Development & Feature Testing
Cloud-Scale Fleet Management:
COSMOS & DMTF Redfish Support
CXL 2.0 RAS & Telemetry
SW Integration & Orchestration

Cloud-Scale Interop Lab Sample Report for Leo
Example Tests
CXL Compliance Tests:
PCIe Electrical Testing
Transaction Layer Testing
Arbitrator and Multiplexer
Power Management Tests
Reset and Initialization Tests
System & Memory Tests:
DDR Tests
Stress Tests
Traffic Tests
Security Tests
RAS
Working Closely with DDR Vendors to Improve Performance & Stability

Calls to Action
Learn More
CXL Products and Specifications
Leo CXL Memory Controller: Product Page
OCP CXL Tiered Memory Expander: v1.0
OCP DC-MHS/M-SIF Base Spec: v0.5
Visit us at this upcoming event
Stay up to date with our PCIe and CXL bulletins: https:/
Get Engaged
CXL Management Collaboration
OCP: CMM Proposal
Linux: https://pmem.io/ndctl/collab
Ecosystem Alliance Contact: