Hot_Chips_2022_CXL_Memory_Expander_final.pdf

Scaling of Memory Performance and Capacity with CXL Memory Expander
August, 2022 | Samsung Electronics Co., Ltd.
S.J. Park, K.-S. Kim, H. Kim, J. So, J. Ahn, J. Jung, I. Yun, S. Ryu, W.-J. Lee, J.-G. Lee, H.-Y. Ryu, C.Y. Lee, J. Prout, K.-C. Ryoo, S.-J. Han, M.-K. Kook, J.S. Choi, J. Gim, Y.S. Ki, S. Ryu, C. Park, D.-G. Lee, J. Cho, H. Song, and J.Y. Lee

Agenda
- Industry Trends and Challenges
- Introduction of CXL (Compute Express Link)
- CXL Memory Expander Features
- SMDK: Unified Software Solution for CXL
- Application Benchmark Test Results
- Summary and Future Plan

Industry Trends and Challenges
Artificial Intelligence, Big Data, Edge, Cloud, 5G
- Massive demand for data-centric technologies and applications
- Memory bandwidth and density not keeping up with increasing CPU core count
- Need a next-gen interconnect for heterogeneous computing and server disaggregation

The Fast-Growing Computing Workloads
[Figure: timeline of models from 1960 to 2020, split into a First Era and a Modern Era: Perceptron, NETtalk, ALVINN, RNN for Speech, TD-Gammon v2.1, LeNet-5, BiLSTM for Speech, Deep Belief Nets & Layer-wise pretraining, AlexNet, ResNets, GPT-3, LaMDA]
- Large-scale adoption of AI and ML
- Smarter devices
- Hyper-connected networks
- Super-intelligent services
- Digital transformation
- Pandemic

Evolution of Hyperscale Computing Environment
From Converged to Composable Architecture
- Converged Architecture: TOR-based rack-scalable architecture (servers with CPU, DRAM, GPU, SSD; storage; network & storage switch)
- Hyper-Converged Architecture: server & storage combined architecture (SmartNIC: MS Catapult, AWS Nitro)
- Disaggregated/Composable Architecture: pooled architecture for memory, compute, storage
Challenges named on the slide: Network Challenge, Interconnect Challenge, Divergence Challenge

The Rising Need for Better Connectivity
- SoC Interconnect: die/package
- Processor Interconnect: node
- Data Center Interconnect: data center
- Customer Interconnect: mobile/broadband
A new class of interconnect for device connectivity in the era of AI
Can be tailored and optimized for various AI applications

CXL: Solution for the Era of HPC
Key Features of CXL Interface
- Cache Coherence
- Connectivity
- Byte Addressable
- Low Latency
CXL as the core of composable computing infrastructure

CXL Features
CXL is a high-performance, low-latency protocol that leverages the PCIe physical layer
[Diagram: processor, PCIe connector, PCIe channel, PCIe card, CXL card]
- High-speed and low-latency interconnect
- Leverages PCIe physical layer (PCIe 5.0, PCIe 6.0)
- Supports various types of memories (volatile, non-volatile)
- CPU and CXL device memory coherency
- Supports switching and memory pooling
- Supports link-level integrity and data encryption
- Open standard (non-proprietary)
- Broad industry support in CXL consortium
- Regular specification updates (CXL 1.1, CXL 2.0, CXL 3.0)

CXL Use Cases (1/2)
Capacity and Bandwidth Expansion
[Diagram, Capacity Expansion: IMDB servers with x TB DRAM per CPU, compared with a server with y TB DRAM per CPU plus z TB of CXL memory per CPU]
[Diagram, Bandwidth Expansion: IMC servers with x GB DRAM per CPU, compared with servers with y GB DRAM per CPU plus z GB of CXL memory per CPU]

CXL Use Cases (2/2)
Tiering and Pooling
[Diagram, Memory Tiering: IMC servers with x TB DRAM per CPU, compared with a server with y TB DRAM per CPU plus z TB of CXL memory per CPU]
[Diagram, Memory Pooling: IMC servers with x TB DRAM per CPU sharing a memory box of y TB CXL devices]

CXL Memory Expansion Solution
- Conventional: DDRx 512GB modules, 8x 2DPC (DIMM/channels), max. 8TB for 1 CPU
- With CXL: the same 8x 2DPC DDRx 512GB modules plus Mem Ex 512GB modules on 8x CXL links, max. 16TB for 1 CPU
- Double the capacity of conventional memory
Note: Max capacity varies with system configurations

CXL Memory Expander
New Solution for Memory Dominant Applications
- Data Acceleration
- High Capacity/Bandwidth
- Enhanced Security/RAS
[Diagram: CPU, GPU, and SmartNIC with DDR attach over the CXL interface to the CXL controller and DDR inside the CXL Memory Expander]

CXL Memory Expander Line-up
Built with FPGA and ASIC Controller (as of August, 2022)

                     FPGA                ASIC
  Year (as labeled)  21                  22
  Host (Controller)  PCIe 3.0 (x16)      PCIe 5.0 (x8)
  Media (DRAM)       DDR4 3200, 128GB    DDR5 4800+, 512GB

CXL Memory Expander (1/3)
Solution Overview
[Diagram labels: Enclosure (2T), SPI, PMICs*, DDR5 DRAMs, CXL Controller, E3.S form factor, PCIe x8, debug ports. *Bottom-side]

CXL Memory Expander (2/3)
Product Features
- Form Factor: EDSFF (E3.S)
- Media: DDR5 4800
- Module Capacity: Max 512 GB
- CXL Link Width: x8
- Maximum CXL Bandwidth: 32GB/s (PCIe 5.0)
- Other Features: RAS, Interleaving, Diagnostics, etc.
- Availability: Q3'22 for evaluation/testing

CXL Memory Expander (3/3)
Supported Features
- CXL 2.0
- Device Type: Type 3
- Support viral and data poisoning
- Memory error injection
- Multi-symbol ECC
- Media scrubbing
- Post package repairs (hard/soft)
[Image: CXL 2.0 Switching Benefits. Image source: CXL Consortium]

SMDK*, Unified Interface for Memory
*Scalable Memory Development Kit
SW development kit to enable a Software-Defined Memory system on heterogeneous memories
[Diagram, software: HPC applications (ML/AI, IMDB, Bigdata, etc.); SMDK Plugin; SMDK Allocator with Compatible Path and Optimization Path; SMDK Kernel with Intelligent Tiering Engine, Memory Pool Management, DRAM Partition, CXL Partition]
[Diagram, hardware: CPU, DRAM, server main board, CXL Memory Expander]
- Intelligent Tiering Engine supports memory tiering scenarios with priority, capacity, bandwidth, and so on
- Memory Pool Management supports scalability reflecting memory request status and system resource
- Memory Partitioning allows logical memory views for heterogeneous physical DRAM and CXL memory
- Two selectable paths, Compatible and Optimization Path, without or with modification of application SW

Benefits of SMDK
- Client experience: transparent as well as optimized memory uses
- Differentiated cloud performance
- CXL ecosystem: OSS for the CXL industry and research field

SMDK
Unified Interface for Memory
- Main Memory: DDR5, LPDDR5, etc.
- CXL Type 2: Accelerator + Mem. Expander
- CXL Type 3: Memory Expander
Unified SW Solution: full-stack SW all about heterogeneous memory system
SMDK is available as open source on GitHub: https:/

Setup
Configuration of Test Bed
[Diagram, SW: benchmarks (ML/AI: BERT, NASNet; IMDB: Memcached, Redis; MLC, STREAM), SMDK, container]
[Diagram, HW: CXL CRB with DDR5 4800, CXL DRAM (FPGA), CXL DRAM (ASIC); Memory Expander w/ EDSFF riser card]

Memory Benchmark Test Results
Comparable Performance with DDR Memory
[Chart: Normalized Bandwidth for MLC 1:1 R/W and STREAM Copy. In extraction order: DDR 1.0 and 1.0, CXL-FPGA 0.19 and 0.19, CXL-ASIC 0.88 and 0.92; callouts 4.6x and 4.7x]
[Diagram: ML/AI processes #1 to #N run their test sets from DDR until DDR bandwidth saturates; processes #N+1 to #N+K run from CXL DRAM until CXL bandwidth saturates]

System Test Results (ML/AI)
ML/AI Applications (BERT* & NASNet*)
*BERT: Bidirectional Encoder Representations from Transformers
*NASNet: Neural Architecture Search Network
[Chart: Inferences per Minute (Normalized), BERT and NASNet panels, CXL-FPGA and CXL-ASIC series, x-axis DDR, 1-CXL, 2-CXL, 4-CXL, k-CXL (theoretical max.). Values in extraction order: 1.00 1.12 1.16 1.11 / 1.00 1.17 1.30 1.45 1.79 / 1.00 1.26 1.39 1.33 / 1.01 1.35 1.64 2.00 2.89; callouts +100% and +45%]
(See appendix for detailed test conditions)

System Test Results (IMDB)
IMDB Redis* Memory Usage (Scale-up vs Scale-out)
*Remote dictionary server
- Single-Node (DRAM + CXL): System #1 runs Redis on DDR5 (32GB) plus CXL Mem (64GB) over a CXL link; the client issues Set 60GB and Get 60GB
- 2-Node Cluster (DRAM): System #1 and System #2 are connected over Ethernet and run Redis on DDR5 (32GB) x 3 as a cluster; the client issues Set 60GB and Get 60GB
[Chart: Performance MB/s, SET and GET at 128B, 4KB, 1MB, Single Node (DDR + CXL FPGA) vs 2-Node Cluster (DDR x 3). Bar values are fused in the extraction: 30455699274966594917218666173189; callouts 2.86x and 2.64x]
(See appendix for detailed test conditions)

A Proven Memory Expansion Solution
- Increasing System Memory Capacity
- Widening Memory Bandwidth
- Supporting RAS/Security based on Memory Controller
Callouts on the slide: 83% Increase, 2X Increase, RAS, Security

Summary and Future Plan
- AI and the pandemic drive demand for memory bandwidth and capacity, and the new interconnect standard CXL allows expansion of memory
- Samsung developed the industry's first ASIC-based 512GB CXL memory expander, which will be available for early evaluation this quarter
- Memory-intensive applications such as IMDB and AI/ML have been tested on the CXL memory expander with an open-source software, SMDK
- Samsung to cooperate further on CXL 3.0 and beyond, and provide next-gen memory solutions like memory disaggregation, SDM*, and more
*Software-defined memory
[Diagram labels: Enhanced Data Service; AI/ML NLP, Recommendation; Edge Computing; Industry First CXL(TM) Memory Expanders]

Appendix
Test Condition (ML/AI and IMDB)

ML/AI
For BERT and NASNet
- TensorFlow (CPU) = 1.11.0+, Python 3.7, Numpy 1.20.0
For BERT
- Multi-process, 3 cores/process, batch-size: 128, max_seq_num: 256, num-test-data/process: 512
- dataset=CoLA
- do_train=true, do_eval=true, data_dir=$GLUE_DIR/CoLA
- vocab_file=vocab.txt
- init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt
- max_seq_length=128, train_batch_size=32
- learning_rate=2e-5, num_train_epochs=3.0
For NASNet
- Multi-process, 3 cores/process, batch-size: 100, eval_image_size: 236, num-test-data/process: 200
- dataset_name=imagenet, num_preprocessing_threads=4
- labels_offset=0, model_name=inception_v3
- preprocessing_name=inception_v3
- moving_average_decay=None, quantize=False, use_grayscale=False

IMDB
For scale-up vs scale-out

Redis-server: master
    cluster-enabled yes
    cluster-node-timeout 300000
    save
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    rdb-del-sync-files no
    repl-diskless-sync no
    rdb-del-sync-files no
    replica-serve-stale-data yes
    replica-read-only yes
    repl-diskless-sync-delay 5
    repl-diskless-load disabled
    repl-disable-tcp-nodelay no
    replica-priority 100
    client-output-buffer-limit replica 0 0 0
    maxclients 1000000
    maxmemory-policy noeviction
    maxmemory-samples 10
    maxmemory-eviction-tenacity 100
    repl-diskless-sync no    # master-replica disk-based sync
    rdb-del-sync-files no
    replica-serve-stale-data yes
    replica-read-only yes
    replica-priority 100
    client-output-buffer-limit replica 0 0 0
    io-threads 4
    io-threads-do-reads yes

Redis-server: replica
    save
    port 6380
    replicaof 127.0.0.1 6379
    replica-read-only yes
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    rdb-del-sync-files no
    repl-diskless-sync no
    rdb-del-sync-files no
    replica-serve-stale-data yes
    replica-read-only yes
    repl-diskless-sync-delay 5
    repl-diskless-load disabled
    repl-disable-tcp-nodelay no
    replica-priority 100
    client-output-buffer-limit replica 0 0 0
    maxclients 1000000
    maxmemory-policy noeviction
    maxmemory-samples 10
    maxmemory-eviction-tenacity 100
    replica-lazy-flush no
    lazyfree-lazy-user-del no
    lazyfree-lazy-user-flush no
    oom-score-adj no
    oom-score-adj-values 0 200 800
    disable-thp yes
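
Worked check of two headline figures
Two numbers quoted in the slides, the 32GB/s maximum bandwidth of the x8 PCIe 5.0 link and the 8TB conventional DDR ceiling per CPU, can be reproduced with simple arithmetic. The sketch below is not from the deck. It assumes the published PCIe 5.0 signaling rate of 32 GT/s per lane with 128b/130b encoding, and it reads "8x 2DPC" as 8 memory channels with 2 DIMMs per channel. The function names are hypothetical and chosen only for illustration.

```python
# Assumptions (not stated in the slides):
#   - PCIe 5.0 signals at 32 GT/s per lane, one bit per transfer per lane.
#   - PCIe 5.0 uses 128b/130b line encoding.
#   - "8x 2DPC" means 8 memory channels with 2 DIMMs per channel.
PCIE5_GT_PER_S = 32
ENCODING_128B_130B = 128 / 130


def link_bandwidth_gb_s(lanes, gt_per_s=PCIE5_GT_PER_S, efficiency=1.0):
    """Per-direction link bandwidth in GB/s (8 bits per byte)."""
    return lanes * gt_per_s * efficiency / 8


def ddr_capacity_tb(channels, dimms_per_channel, dimm_gb):
    """Total DIMM capacity in TB, using 1 TB = 1024 GB."""
    return channels * dimms_per_channel * dimm_gb / 1024


raw = link_bandwidth_gb_s(lanes=8)
after_encoding = link_bandwidth_gb_s(lanes=8, efficiency=ENCODING_128B_130B)
ddr_only = ddr_capacity_tb(channels=8, dimms_per_channel=2, dimm_gb=512)

print(f"x8 PCIe 5.0 raw bandwidth:      {raw:.1f} GB/s")
print(f"x8 PCIe 5.0 after 128b/130b:    {after_encoding:.1f} GB/s")
print(f"DDR-only capacity per CPU:      {ddr_only:.1f} TB")
```

Under these assumptions the raw x8 figure matches the 32GB/s on the Product Features slide, and line encoding alone brings it to about 31.5 GB/s before any protocol overhead. The DDR-only result matches the "Max. 8TB for 1 CPU" figure. The 16TB figure with CXL is not recomputed here because the slide notes that maximum capacity varies with system configuration.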
