Ethernet SSD Testing White Paper
Number: ODCC-2021-05008
Open Data Center Committee
Issued 2021-09-15

Contents
Preface
Copyright
Ethernet SSD Testing White Paper
1. Introduction
2. Terms
3. Overview of Ethernet SSD
  3.1. Basic concepts
  3.2. Ethernet SSD features
    3.2.1. NVMe features
    3.2.2. RDMA features
    3.2.3. Ethernet features
    3.2.4. Network services
    3.2.5. Peripherals
    3.2.6. Enterprise-class data protection
4. Ethernet Switch Introduction
5. Application Scenarios
  5.1. Object storage based on big data
  5.2. CDN edge caching
6. Test Environment and Configuration
  6.1. System environment setup
  6.2. Ethernet switch configuration
  6.3. Tuning on host server
  6.4. Ethernet SSD pre-conditioning and test items
    6.4.1. Ethernet SSD pre-conditioning
    6.4.2. Test items
7. Performance Test Result
  7.1. Throughput test result
  7.2. IOPS test result
8. Conclusion

Preface
Thanks to the following drafting companies (in no particular order): Samsung R&D Institute China Xian, Broadcom Inc., China Academy of Information and Communications Technology (Institute of Cloud Computing and Big Data).
Drafters (in no particular order): Xing He, Kun Dou, Yi Hou, Ziyan Zhao, Siyu Mou, Lining Dou, Zongying He, Maggie Gao, Shaopeng Wang, Changkui Zheng, Shuai Lu.

Copyright
All the achievements published by ODCC (Open Data Center Committee) are protected by the Copyright Law, and the copyright is shared with the developers. When reprinting, extracting, or otherwise using words or ideas from ODCC results, the source shall be identified as the Open Data Center Committee. ODCC and related units will hold accountable those who engage in infringement such as plagiarism, copying, modification, sale, adaptation, compilation, translation, or publication without the written consent of the copyright owners. Thanks for your cooperation and support.

Ethernet SSD Testing White Paper

1. Introduction
In the digital era, data is not only a record
of information but also a resource from which information is acquired. Driven by big data and AI training, IT technology for data processing is shifting from data management to data operation, and data-driven technologies are emerging every day. With the promotion of 5G and China's new infrastructure initiative, data volume has been growing explosively. The traditional storage and computing architecture faces multiple challenges, such as uneven resource utilization, high storage cost, and difficulty in resource sharing. Improving storage and computing resource utilization, enabling a flexible ratio of storage to computing in deployments, and scheduling computing resources on demand have become primary demands. The term "storage and computing separation" is back in the limelight.

Figure 1. Separation of storage and computing

The separation of computing and storage is not a new idea. Network Attached Storage (NAS) existed twenty years ago; it is essentially an Ethernet file server using the TCP/IP protocol. At that time, if mass storage was required, the server would save data to the NAS. For storage-oriented private networks, protocols such as iSCSI were derived from the SCSI storage protocol, and they significantly improve CPU usage and performance. However, the iSCSI protocol relies on traditional IP networks and is limited by the performance and overhead of the TCP protocol. iSCSI Extensions for RDMA (iSER) extends iSCSI to optimize its performance. iSER is a storage protocol built on an RDMA transport layer protocol. It inherits the efficiency of RDMA, so it offers better performance, better CPU utilization, and lower latency than iSCSI. However, since it is still based on the SCSI storage protocol, many aspects, such as the number and depth of queues, are subject to the shortcomings of the SCSI protocol. These shortcomings were not the primary problem in the HDD era. But with the popularity of flash technology and SSDs, the flaws of the SCSI protocol itself are becoming increasingly obvious. In particular, once the protocol is encapsulated by the network, the IO performance over the network falls far short of the server's data read and write requirements, because network latency and protocol processing add delay to read and write operations after the storage device joins the network. In the HDD era, compared with millisecond latency and low IOPS, network overhead was not the primary issue. For SSDs, however, the latency overhead of network and protocol processing becomes more apparent, and after several generations of storage-network convergence, the NVMe over Fabrics (NVMe-oF) protocol was created to meet future NVMe storage requirements. Storage devices that support NVMe-oF are starting to get noticed.

Figure 2. Network latency as a percentage of storage system latency

In high-performance application scenarios, JBOF can play its role effectively. To tie NVMe SSDs together and connect them to Ethernet, one must either use a PCIe switch and an external PCIe data cable to connect to a nearby server, or use one or two x86 server processors, several PCIe switches, and several RDMA Ethernet cards to support NVMe-oF. This is so cumbersome that it adds performance bottlenecks and user TCO outside the core functionality. Since 2018, vendors such as Marvell have released NVMe-Ethernet converters that allow an NVMe SSD to connect directly to an Ethernet network and, in turn, be accessed via NVMe-oF. Vendors such as Samsung have also demonstrated SSDs that integrate an Ethernet port. With SSDs that integrate the Ethernet interface, users can reduce the TCO of the storage server by 65%, and the CPU of the storage server as well as the Ethernet port is no longer the overall bottleneck.

Figure 3. Advantages of Ethernet SSD over NVMe SSD applications

This white paper describes Samsung's Ethernet SSD, together with performance information and functional verification of application scenarios and network features, and introduces the system configuration used to test the performance. The performance characteristics depend on a variety of factors, and all performance measurements noted in this document were performed on fully pre-conditioned SSDs.

2. Terms
IP       Internet Protocol
MAC      Media Access Control
NIC      Network Interface Card
NVMe-oF  NVMe over Fabrics
QD       Queue Depth
QP       Queue Pair
RDMA     Remote Direct Memory Access
RNIC     RDMA-capable NIC
SQ       Submission Queue

3. Overview of Ethernet SSD
The Samsung Ethernet SSD is the first dual-port 25GbE Ethernet SSD with Samsung NVMe-oF and NVMe technology and an SFF-8639/9639 connector in a 2.5-inch SSD form factor. The form factor is consistent with conventional U.2 SSDs, compatible with existing server chassis and mounting slots, and capable of supporting existing U.2 (SFF-8639/9639) NVMe-based storage systems.

Figure 4. Form factor of the Ethernet SSD

The interface borrows the SFF-8639/9639 connector from the U.2 NVMe SSD, modifying the pin definition without changing the design of the vendor's existing chassis backplane.

3.1. Basic concepts
Conventional NVMe JBOF utilizes existing storage and server architectures to disaggregate NVMe, but it also faces the issues of poor scalability, poor operational energy consumption and heat dissipation, and bandwidth performance limited by the CPU, PCIe, and the network. Compared with the conventional EBOF architecture, the Ethernet SSD moves the NVMe-oF functional modules from the storage system down into the SSD itself. The storage device control module inside the EBOF is eliminated, and the interface inherits the high-bandwidth and scalability advantages of Ethernet.

Figure 5. Ethernet SSD EBOF application architecture diagram

Meanwhile, the whole device is not limited by the performance bottleneck of the CPU and memory, and it can provide higher bandwidth, better stability, and lower power consumption.

Figure 6. Ethernet SSD test environment setup
3.2. Ethernet SSD features

3.2.1. NVMe features
- Supports NVMe v1.2
- Supports NVMe-oF v1.0
- Support for up to 1024 NVMe queue pairs (QPs)
- Support for up to 128 namespaces per drive
- Limited to 16 namespaces per drive if reservations are managed by the bridge
- Up to 64-entry submission queue (SQ) depth
- Up to 8K work queue entries; the number of QPs multiplied by the SQ depth must be less than 8K (a small check of this constraint is sketched below)
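Because the per-QP and total-WQE limits interact, it can help to verify a planned host-side queue configuration before connecting. The sketch below is not from the original document; it simply illustrates how the 8K work-queue-entry budget constrains the values one might pass to nvme-cli's --nr-io-queues and --queue-size connect options. The example values are placeholders.

#!/bin/bash
# Illustrative check against the limits listed above
# (up to 1024 QPs, up to 64-entry SQs, fewer than 8K work queue entries in total).
nr_io_queues=64    # example value intended for "nvme connect --nr-io-queues"
queue_size=64      # example value intended for "nvme connect --queue-size"

total_wqes=$((nr_io_queues * queue_size))
if [ "$total_wqes" -ge 8192 ]; then
    echo "invalid: ${nr_io_queues} QPs x ${queue_size} SQ entries = ${total_wqes} WQEs (must be < 8192)"
else
    echo "ok: ${nr_io_queues} QPs x ${queue_size} SQ entries = ${total_wqes} WQEs"
fi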
3.2.2. RDMA features
- iWARP and RoCE (v1, v2) in hardware
- Support for up to 1024 iWARP or RoCE QPs
- TCP Offload Engine in hardware (iWARP)

3.2.3. Ethernet features
- Dual-port 25GbE; the two ports run at the same speed
- Up to 8 MAC addresses
- 4K VLAN addresses per port
- Up to 1024 source IP addresses (IPv4/IPv6)
- 8 destination IP addresses (IPv4/IPv6)
- Hardware link aggregation

3.2.4. Network services
- ICMP
- ARP
- SNTP
- LLDP

3.2.5. Peripherals
- SPI flash interface for firmware, with 3- and 4-byte addressing
- 2 I2C/MDIO ports for managing Ethernet link modules or connecting to a BMC
- 1 general-purpose I2C port
- 8 GPIO pins for configurability and feature set control

3.2.6. Enterprise-class data protection
- All data paths have overlapping parity protection
- All memories are ECC protected
- All errors are logged to an internal log buffer
- The internal log buffer is periodically written to external flash
- An immediate write to flash is performed in the event of a non-recoverable error
- Dual images are stored in external flash
- Images are protected by CRC

4. Ethernet Switch Introduction
Broadcom is the leading supplier of advanced Ethernet switch products serving the data center, service provider, and enterprise markets. Broadcom switches address a diverse set of requirements in these markets, such as delivering the highest bandwidth, the highest non-blocking performance, switch programmability, advanced policy-based traffic forwarding, advanced congestion management for lossless storage traffic, low-latency modes, the highest scale of forwarding tables and network virtualization, advanced packet buffering to handle traffic bursts, and advanced telemetry for visibility and AI-based analytics. The Tomahawk, Trident, and Jericho switch families from Broadcom are widely used by customers, with a strong presence in both OEM and white-box switches, and are supported in a wide variety of configurations with many different network operating systems (NOS), including the fast-growing open-source SONiC NOS.

The Tomahawk3 switch used in the Samsung E-SSD test environment is a highly successful switch, used not only in data centers in typical ToR and spine configurations but also to handle emerging workloads such as deep learning and NVMe-oF storage disaggregation. Tomahawk3 supports many concurrent port speeds, such as 10/25/40/50/100/200/400GbE, using 56G-PAM4 SerDes that can also run in NRZ mode for 10/25GbE connectivity. The shared-buffer architecture of Tomahawk3 offers 4X higher burst absorption and improved performance for RoCEv2 workloads as used in NVMe-oF traffic. Tomahawk3 introduces BroadView Gen3, which includes the latest cloud-scale instrumentation and telemetry features, such as in-band telemetry, latency distribution histograms, and more.

5. Application Scenarios

5.1. Object storage based on big data
An EBOF consisting of Ethernet SSDs can be applied to the high-performance storage scenarios where JBOF is applicable, using its high interface bandwidth to provide more performance resources for distributed computing, virtualization, and other scenarios. For example, when applied to large-scale, reliable, unified object storage, it can replace traditional storage nodes with an Ethernet SSD/EBOF running the OSD daemon; additionally, an Ethernet accelerator can be adopted for RAID, deduplication, replication, and compression to increase the useful value, as shown in the following figure.

Figure 7. Ethernet SSD application scenario for Ceph OSD object storage

In the above diagram, if the EBOF uses NVMe disks, the overall performance will depend closely on the server configuration of the JBOF, and this may create a bottleneck in the traffic on the server's uplink NIC port. If Ethernet SSDs are used, the EBOF saves the CPU, DRAM, and other server components required for high performance, and the uplink is provided directly by the Ethernet switch through a large-bandwidth interface. Therefore, users get high performance while saving the CPU configurations that affect overall performance, and white-box switch vendors can develop switch features that are closer to the storage application.

5.2. CDN edge caching

Figure 8. Ethernet SSD application scenario for CDN edge caching
CDN stands for Content Delivery Network. The basic idea is to avoid, as much as possible, the bottlenecks and links on the Internet that may affect the speed and stability of data transmission, so that content is delivered faster and more reliably. The basic working principle is to set up a layer of intelligent virtual network on top of the existing Internet, place node servers in various parts of the network, and then redirect user requests in real time to the nearest service node according to network traffic, comprehensive information about each node's connections and load status, and the distance and response time to the user. This not only allows users to obtain the required content from nodes close to them and relieves Internet congestion, but also improves the response time when users access websites. Clearly, CDN nodes solve the problems of cross-territory and cross-operator access and also reduce access latency. In addition, most requests are completed at the CDN edge nodes, so the CDN also plays a role in diverting traffic and reducing the load on the origin site.

If the CDN edge nodes are used only for data storage and data caching, Ethernet SSDs can replace traditional edge servers in this scenario, because Ethernet SSDs can be more sensitive to network traffic, and the additional caching decision logic designed into the Ethernet SSD can also handle the
CDN edge caching mechanism well. This not only saves the cost and power of the edge infrastructure, but also saves edge infrastructure space.

6. Test Environment and Configuration

6.1. System environment setup
a) The hardware configuration is as follows:
- 2 x86 servers
- 1 Ethernet SSD
- 1 Mellanox ConnectX-5 100Gb high-performance network card
- 1 Broadcom switch

b) The host configuration is as follows:
- Host OS: Ubuntu 18.04
- Host Linux kernel version: 5.2.14-050214-generic

c) The system configuration steps are as follows:
After the host installs the MLNX_OFED_LINUX-5.0-1.0.0.0 version of the driver (driver download URL: https:/ ), execute the following commands to install and update the driver:
$ sudo ./mlnxofedinstall --add-kernel-support --with-nvmf
$ sudo /etc/init.d/openibd restart

Configure an IP address for the Mellanox ConnectX-5 100Gb network card with the following command:
$ sudo ifconfig enp4s0f0 x.x.x.x/x (IP/netmask) up

Load the nvme_rdma driver and disable the register_always mode:
$ sudo modprobe nvme_rdma register_always=0

Use the host for discovery and connection with the following commands:
$ sudo nvme discover -t rdma -a x.x.x.x (IP) -s 4420
$ sudo nvme connect -t rdma -a x.x.x.x (IP) -s 4420 -n testsubsystem

Query whether the SSD mapping is successful:
$ sudo nvme list

Close the connection:
$ sudo nvme disconnect -n testsubsystem
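For convenience, the host-side steps above can be collected into a single script. The following is only a sketch of the procedure in this section; the interface name, addresses, and subsystem name are placeholders taken from the examples above and must be adjusted to the actual environment.

#!/bin/bash
# Sketch of the host-side setup in section 6.1; values below are examples only.
set -e

IFACE=enp4s0f0            # RNIC interface used in the example above
HOST_IP=192.168.10.2/24   # example host IP/netmask on the storage network
TARGET_IP=192.168.10.100  # example Ethernet SSD IP address
SUBSYS=testsubsystem      # NVMe-oF subsystem name used in this document

# Bring up the RNIC port with the chosen address
sudo ifconfig "$IFACE" "$HOST_IP" up

# Load the NVMe over RDMA initiator with register_always disabled
sudo modprobe nvme_rdma register_always=0

# Discover and connect to the Ethernet SSD, then verify the mapping
sudo nvme discover -t rdma -a "$TARGET_IP" -s 4420
sudo nvme connect -t rdma -a "$TARGET_IP" -s 4420 -n "$SUBSYS"
sudo nvme list

When testing is finished, the connection can be torn down with the nvme disconnect command shown above.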
6.2. Ethernet switch configuration
For a white-box switch based on the TH3 platform, some configuration items need to be modified relative to the default settings of the NIC and the Ethernet SSD:

a) FEC configuration
The FEC mode of the Broadcom switch defaults to none; it needs to be configured to RS-FEC mode (IEEE Std 802.3-2012 Clause 91, Reed-Solomon FEC, for 25G). The Ethernet SSD defaults to FC-FEC mode, so its FEC mode must be changed from FC-FEC to RS-FEC.

b) Communication rate configuration
The Broadcom switch ports connected to the Ethernet SSD are configured to operate at 25 Gbps. The Broadcom switch ports connected to the NIC are configured to operate at 100 Gbps with the NIC's auto-negotiation turned on.
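The exact switch commands depend on the network operating system running on the white-box switch and are not given in this document. Purely as an illustration, on a SONiC-based NOS (mentioned in section 4) the port speed and FEC settings above might be applied roughly as follows; the port names Ethernet0 and Ethernet4 are assumptions, and the Ethernet SSD side FEC change is vendor-specific and is made through the drive's own management interface rather than on the switch.

# Assuming a SONiC-based NOS; port names are examples, not from the original document.
# 25G port facing the Ethernet SSD: 25 Gbps, RS-FEC
sudo config interface speed Ethernet0 25000
sudo config interface fec Ethernet0 rs

# 100G port facing the ConnectX-5 NIC: 100 Gbps, auto-negotiation enabled
sudo config interface speed Ethernet4 100000
sudo config interface autoneg Ethernet4 enabled

sudo config save -y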
6.3. Tuning on host server

a) Disable adaptive receive coalescing on the network interface
Disabling adaptive receive coalescing on the network interface used by the RNIC for RDMA operations has been observed to improve bandwidth and IOPS. The following is an example for a Mellanox RNIC (assuming that the RNIC network interface name is "enp1s0f1"):
$ sudo ethtool -C enp1s0f1 adaptive-rx off

The command to check whether the above operation has taken effect is as follows:
$ sudo ethtool -c enp1s0f1

After testing, it is recommended to re-enable adaptive receive coalescing on the network interface:
$ sudo ethtool -C enp1s0f1 adaptive-rx on

b) Map RDMA interrupts to a single CPU core
Bandwidth and IOPS improvements have been observed by disabling the default kernel IRQ balancer and mapping all RDMA interrupts of the RNIC to a single CPU core. The FIO workload should be mapped to the same CPU core as well (an example of how to do this is shown after the script below).

The following command disables the default kernel IRQ balancer:
$ sudo service irqbalance stop

The following script maps all interrupts associated with the Mellanox RNIC to CPU core 1. Execute this script as root before starting the bandwidth and IOPS tests:
$ cat mlnx_irq_balance_latency.py
#!/bin/bash
num_cpus=$(grep -c processor /proc/cpuinfo)
echo "num_cpus=$num_cpus"
j=1
for i in /sys/bus/pci/drivers/mlx*_core/*/msi_irqs/*
do
    irq=${i##*/}
    echo "setting IRQ $irq affinity to core $j"
    echo $j > /proc/irq/$irq/smp_affinity_list
done
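The document does not show how to pin the FIO workload to that same core. Two common ways to do this (examples, not from the original text) are to launch FIO through taskset or to use FIO's own CPU affinity option, for instance:
$ taskset -c 1 fio <fio options>
$ fio -cpus_allowed=1 <other fio options>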
60、ave”to“performance”#!/bin/bash num_cpus=$(grep-c processor/proc/cpuinfo)echo cpu_nums=$num_cpus ODCC-2021-05008 Ethernet SSD 測試白皮書 14 for(i=0;i/sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor done 6.4.Ethernet SSD Pre-condition and test item 6.4.1.Ethernet SSD Pre-condition a)Before testing th
61、e sequential read/write performance of the Ethernet SSD,the Ethernet SSD will be sequentially read/written twice with a block size of 128KB,with 1 thread and 8 IO queue depth:$fio-ioengine=libaio-direct=1-thread-norandommap-filename=/dev/nvme0n1-name=init_seq-output=init_seq.log-rw=write-bs=128k-num
62、jobs=1-iodepth=8-loops=2 b)Before testing the random read/write performance of the Ethernet SSD,the Ethernet SSD will be used with a 4KB block size,and the Ethernet SSD will be randomly read/written to a full disk twice with 8 threads and 256 IO queue depth,and the command will be executed as follow
63、s:$fio-ioengine=libaio-direct=1-thread-norandommap-filename=/dev/nvme0n1-name=init_rand-output=init_rand.log-rw=randwrite-bs=4k-numjobs=8-iodepth=256-ramp_time=60-runtime=14400-time_based-group_reporting 6.4.2.Test item a)128K Sequential write performance test b)128K Sequential read performance test
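The document lists only the pre-conditioning commands. Based on the 1 job / 64 QD configuration noted in the result tables of section 7, the measurement runs for the four test items could look roughly like the following sketch; the run time, output file names, and the exact read-case commands are assumptions, not the original test scripts.

# Illustrative only: 128K sequential write / read throughput at 1 job, QD 64
$ fio -ioengine=libaio -direct=1 -thread -norandommap -filename=/dev/nvme0n1 -name=seq_write -output=seq_write.log -rw=write -bs=128k -numjobs=1 -iodepth=64 -ramp_time=60 -runtime=300 -time_based -group_reporting
$ fio -ioengine=libaio -direct=1 -thread -norandommap -filename=/dev/nvme0n1 -name=seq_read -output=seq_read.log -rw=read -bs=128k -numjobs=1 -iodepth=64 -ramp_time=60 -runtime=300 -time_based -group_reporting

# Illustrative only: 4K random write / read IOPS at 1 job, QD 64
$ fio -ioengine=libaio -direct=1 -thread -norandommap -filename=/dev/nvme0n1 -name=rand_write -output=rand_write.log -rw=randwrite -bs=4k -numjobs=1 -iodepth=64 -ramp_time=60 -runtime=300 -time_based -group_reporting
$ fio -ioengine=libaio -direct=1 -thread -norandommap -filename=/dev/nvme0n1 -name=rand_read -output=rand_read.log -rw=randread -bs=4k -numjobs=1 -iodepth=64 -ramp_time=60 -runtime=300 -time_based -group_reporting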
7. Performance Test Result

7.1. Throughput test result
This section shows the throughput test results of the Ethernet SSD for 128K sequential write and read. We tested 128K sequential read and 128K sequential write on the Ethernet SSD with FIO, and the results are as follows:

Table 1. 128K sequential read and write test results

  Test item                      Ethernet SSD   Remarks
  128K Sequential Write (MB/s)   2333           1 job / 64 QD
  128K Sequential Read (MB/s)    2717           1 job / 64 QD

7.2. IOPS test result
This section shows the IOPS test results of the Ethernet SSD for 4K random read and write. We tested 4K random read and write on the Ethernet SSD with FIO. The test results are as follows:

Table 2. 4K random read and write test results

  Test item                Ethernet SSD   Remarks
  4K Random Write (IOPS)   139k           1 job / 64 QD
  4K Random Read (IOPS)    669k           1 job / 64 QD
8. Conclusion
In large data centers, where the storage and computing separation architecture is becoming mainstream, there are more and more application scenarios for high-performance, low-overhead storage connectivity such as NVMe-oF. SSDs with natively integrated NVMe-oF over Ethernet enable storage systems with better connectivity, scalability, performance, and cost, which makes them preferable when building networked, switch-attached architectures, and saves additional capacity and node equipment costs as well. With the development of NVMe-oF Ethernet SSDs, an increasing number of applications will emerge in the future.

Open Data Center Committee (Secretariat)
Address: No. 52 Huayuan North Road, Haidian District, Beijing
Tel: 010-62300095
Email: ODCC