Panel Discussion: Data-Centric Computing, Present and Future (15 pages)
OCP Global Summit
October 18, 2023 | San Jose, CA

Bumjun (BJay) Kim, Samsung, and Dongup Kwon, MangoBoost

A Next-Generation DPU-Accelerated Petabyte-Scale Storage Solution to Build Future Data-Centric Datacenters

Samsung's memory and storage technology will enable highly competitive storage sub-systems and run like an SSD at data center scale (PBSSD).

PBSSD: Next Storage System
- Ultra high capacity: PB in 1U
- High performance: up to 400GbE with NVMe-oF
- Ultimate manageability: NAND health monitoring, auto failure recovery
- Safe security: encryption-at-rest, ransomware detection

PBSSD E3.S Reference System
- AMD EPYC (Genoa), single socket
- PCIe Gen5, 128 I/O lanes
- Balanced storage and network I/O; lane splits shown: x64/x64, or x80/x48
- Front: E3.S SSD, E3.S CXL
- Rear: OCP 3.0 NIC, PCIe device

Drive configurations shown:
- 32x E3.S 1T (x2)
- 24x E3.S 1T (x2) + 8x U.2 (PCIe x4)
- 8x E3.S 1T (x4) + 4x E3.S 2T (x8), combining E3.S SSD (x4) and E3.S CXL (x8)
- 16x E3.S 1T (x4)

Storage types:
- Type 1: Petabyte Scale Massive Storage
- Type 2: High Performance & Density Storage
- Type 3: Ultra High Performance Storage
- Type 4: High Performance & Memory Hybrid Storage

MangoBoost Provides DPU Solutions to Boost Your Datacenter
- Founded: 2022.2.7, based on 8 years of R&D outcomes at SNU (since 2014)
- Total employees (Sep 2023): 57 (R&D 54, Management 3)
- Target applications: Server/Datacenter, AI/HPC, Cloud, Storage, Security
- Key products: DPU (Data Processing Unit), DPU IP, DPU Software, DPU SoC, DPU Card
- Locations: Seoul, Korea, and Seattle, USA; MangoBoost Inc. is headquartered in Seattle, WA

[Figure: the MangoBoost DPU sits between applications (Cloud Apps, Big Data Apps, AI Apps) and devices (NIC, SSD, GPU, NPU), taking over the CPU's virtualization, network, storage, resource management, and security work; labeled benefits are performance, scalability, and TCO.]

We provide full-stack DPU HW/SW solutions!

MangoBoost Track Record of DPU Innovations

MangoBoost's top-tier papers:
1. DCS: A Fast and Scalable Device-Centric Server Architecture (MICRO, 2015.12)
2. DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture (ISCA, 2018.06)
3. CIDR: A Cost-Effective In-line Data Reduction System for Terabit-per-Second Scale SSD Arrays (HPCA, 2019.02)
4. FIDR: A Scalable Storage System for Fine-Grain Inline Data Reduction with Efficient Memory Handling (MICRO, 2019.10)
5. Scalable Multi-FPGA Acceleration for Large RNNs with Full Parallelism Levels (DAC, 2020.07)
6. TrainBox: An Extreme-Scale Neural Network Training Server Architecture by Systematically Balancing Operations (MICRO, 2020.10)
7. FVM: FPGA-assisted Virtual Device Emulation for Fast, Scalable, and Flexible Storage Virtualization (OSDI, 2020.11)
8. A Fast and Flexible Hardware-based Virtualization Mechanism for Computational Storage Devices (ATC, 2021.07)
9. NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models (PACT, 2021.09)
10. DLS: A Fast and Flexible Neural Network Training System with Fine-grained Heterogeneous Device Orchestration (TPDS, 2022.01)
11. SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices (TOS, 2022.04)
12. A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks (TACO, 2023.02)
13. F4T: A Fast and Flexible FPGA-based Full-stack TCP Acceleration Framework (ISCA, 2023.06)

MangoBoost Provides Composable Customer-Optimized DPUs

Extensive DPU IP Portfolio + Customization Framework = Optimized DPU Solutions (on off-the-shelf FPGA HW)

DPU IP portfolio:
- Virtualization: SR-IOV, ATS/ATC; Vhost-NVMe acceleration; VirtIO acceleration; vDPA support
- Network: SDN acceleration; P4 support; TCP full acceleration; RDMA (RoCEv2)
- AI: large-scale DNN training; device orchestration; pre/post-processing acceleration; MPI acceleration
- Disaggregation: NVMe over Fabrics; NVMe/TCP; GPU over Fabric
- Security: Root of Trust; crypto acceleration; network security
- Storage: data deduplication; in-line compression; thin provisioning; LVM/RAID

Optimized DPU solutions: Big Data Server DPU, AI Server DPU, Cloud Server DPU, other custom DPUs.

We are the only DPU solution provider to satisfy varying customer needs!

A Case Study: Petabyte-Scale SSD = 1600 Gbps
- PBSSD is a HIGH-performance and HIGH-density NVMe storage server
- 1x PM1743 provides 13 GB/s for READ and 6.6 GB/s for WRITE
- 16x PM1743 provide 208 GB/s for READ and 105 GB/s for WRITE!
- Serving 1600 Gbit/s of throughput from a single storage server over the network
- I/O processing is BURNING all CPU cores!

MangoDPU Offloads I/O to Save CPU Cycles
- Traditional infrastructure: all I/O is processed in the CPU
- DPU-enhanced infrastructure: TCP + NVMe-oF initiator offloading

[Figure: in the traditional infrastructure, the host OS runs virtualization, NVMe-oF, and TCP for its VMs on top of a plain NIC and SSDs; in the DPU-enhanced infrastructure, the MangoDPU runs virtualization, NVMe-oF, TCP, and user-defined features, leaving the host OS to run VMs.]

NVMe over Fabrics: TCP or RDMA?

Comparison of TCP and RDMA (InfiniBand and RoCE):

                   TCP   InfiniBand   RoCE
Performance        X     O            O
Easy Management    O     X            X
Cost-Efficiency    O     X            O

MangoBoost DPU offers NVMe/TCP at the performance of NVMe/RDMA!

PBSSD System Under Test
- Initiators: two Dell R750 hosts (NVMf initiator), each with a Mango DPU on a Xilinx U55C FPGA performing NVMe/TCP offloading
- Network: 400G (200G x2), NVMe-over-TCP
- Target: Samsung PBSSD (NVMf target) with Mango DPU on Xilinx U55C FPGA x2 performing TCP offloading, and Samsung PM1743 PCIe Gen5 x16
- CPU: single AMD EPYC Zen4 SP5 processor (Genoa, 32-core)
- Memory: 24x DIMM slots, 2DPC ECC DDR5 (384GB)
- Storage: 16x hot-swap EDSFF E3.S (1T) NVMe slots (Samsung PM1743 PCIe Gen5)

NVMe-oF Target Evaluation
- SW cannot achieve the line rate with 36 cores
- MBDPU achieves 400G with 32* CPU cores
- MBDPU beats TCP and RDMA in throughput

Setup:
- SW: NVIDIA CX6 + software TCP stack
- MBDPU: Mango DPU + HW offloading
- NUMA disabled
- fio load=randrw, bs=8k, iodepth=64

[Figure: bandwidth (Gbit/s, 0 to 400) versus number of polling cores (4 to 36) for TCP, RDMA, and MBDPU.]

* 32 cores are dedicated to I/O request processing. This requirement will also be eliminated with MBDPU target offloading in the future.

TPC-C Evaluation
- TPC-C is an online transaction processing (OLTP) benchmark
- Mango DPU reduces mean and 99.9th-percentile latencies by 48.8% and 74.9%, respectively

Setup:
- Warehouse scale: 10,000
- Number of connections: 64
- SW: NVIDIA CX6 + software TCP stack
- MBDPU: Mango DPU + HW offloading

[Figure: CDF of latency (ms, 0 to 100) for TCP, RDMA, and MBDPU.]
[Figure: median, mean, 90th, 95th, 99th, and 99.9th-percentile latency (ms, 0 to 100) for TCP, RDMA, and MBDPU.]

MangoBoost offers 5M+* IOPS and 400G duplex NVMf/TCP on off-the-shelf FPGA
- NVMe/TCP at RDMA performance, first time ever
- Demo available upon request: contactmangoboost.io

* fio jobs=32, iodepth=64, block size=8 KB
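
The throughput figures quoted in the case study and on the closing slide can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the presentation: the function names are hypothetical, decimal units (1 GB/s = 8 Gbit/s) are assumed, and the 8 KB block size is assumed to mean 8192 bytes.

```python
# Sanity check of the slide figures. All helper names are illustrative;
# units are assumed decimal (1 GB = 1e9 bytes) except the fio block size,
# which is assumed to be 8 KiB (8192 bytes).

READ_GB_PER_DRIVE = 13.0   # GB/s, one PM1743, read (from the slides)
WRITE_GB_PER_DRIVE = 6.6   # GB/s, one PM1743, write (from the slides)
DRIVES = 16                # PM1743 drives in the PBSSD under test


def aggregate_gbytes(per_drive_gbytes: float, drives: int) -> float:
    """Ideal aggregate bandwidth in GB/s, assuming perfect scaling."""
    return per_drive_gbytes * drives


def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert GB/s to Gbit/s."""
    return gbytes_per_s * 8


def iops_to_gbits(iops: float, block_bytes: int) -> float:
    """Bandwidth in Gbit/s implied by an IOPS rate at a fixed block size."""
    return iops * block_bytes * 8 / 1e9


read_total = aggregate_gbytes(READ_GB_PER_DRIVE, DRIVES)
write_total = aggregate_gbytes(WRITE_GB_PER_DRIVE, DRIVES)
read_gbits = gbytes_to_gbits(read_total)
iops_gbits = iops_to_gbits(5_000_000, 8 * 1024)

print(f"aggregate read : {read_total:.1f} GB/s = {read_gbits:.0f} Gbit/s")
print(f"aggregate write: {write_total:.1f} GB/s")
print(f"5M IOPS at 8 KiB: {iops_gbits:.2f} Gbit/s")
```

Under these assumptions, 16 drives at 13 GB/s give 208 GB/s of ideal read bandwidth, or 1664 Gbit/s, which the slides round to 1600 Gbit/s. Write bandwidth comes to 105.6 GB/s, quoted as 105 GB/s. Five million 8 KiB operations per second correspond to roughly 328 Gbit/s, the same order as the 400G link.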