Matias Bjørling, Distinguished Engineer
Towards Large-scale Deployments with Zoned Namespace SSDs
9/20-2023
2023 Western Digital Corporation or its affiliates. All rights reserved.

Storage at Scale
Hyperscalers and Cloud Service Providers (CSPs)
- Constantly challenged with large volumes of data and increasing customer demand for cost-effective storage and high performance*
  - Performance: IOPS/TB, Throughput, Latency/QoS
  - TCO Impacts: Capacity vs. Performance
  - Lifetime/DWPD: 1 DWPD
- Maintain servers online after the typical 5-year service time*
  - Reduce carbon emissions as well as overall fleet cost
  - Server repair costs first
increase significantly at year 10 and onwards.
  - Servers are retired prematurely
- Microsoft server fleet lifetime increased from 5 to 7 years*.
- Meta is rapidly increasing its fleet lifetime as well. Now 5 years*.

Metrics*         Typical
Performance      IOPS/TB, Throughput, Latency/QoS
Cost Impact      Capacity/Performance
Lifetime/DWPD    5-7 years (Req. 1 DWPD)

* Lee Prewitt, Microsoft - How Facebook & Microsoft Leverage NVMe Cloud Storage. https:/
* et al., Myths and Misconceptions Around Reducing Carbon Embedded in Cloud Platforms, HotCarbon, 2023
* Rich Miller - Meta Will Run its Servers For Up to 5 Years, 2023 https:/

Storage at Scale
Hyperscalers and Cloud Service Providers (CSPs)
- Conventional SSDs are not able to serve storage at scale
  - Typical lifetime is 3-5 years. 7+ years wanted.
  - Either high cost (TLC) and/or low DWPD (QLC)
- Need SSDs that eliminate write amplification to fulfill DWPD, lifetime, and performance requirements
- SSDs with Zoned Namespace (ZNS) support solve these challenges

Metric comparison. Column order: Conventional SSD TLC, Conventional SSD QLC, ZNS TLC (Performance), ZNS QLC (Capacity)
- IOPS/TB: +
- Throughput: + (Read/Write), + (Read), + (Read/Write), + (Read)
- Latency/QoS: +
- Lifetime: + (Typ. 1 DWPD), + (Typ. 0.3-0.5 DWPD), + (Typ. 3.5 DWPD), + (Typ. 1 DWPD)
- Cost (TB/$): +

Conventional SSD (28% OP) vs. ZNS SSD (0% OP)
- TLC, conventional: Over-provisioning needed for drive FTL. Drive runs at reduced capacity to improve performance, latency & endurance.
- TLC, ZNS: 0% OP for drive FTL (extra capacity for host). Drive can use full capacity with full performance, low latency & high DWPD.
- QLC, conventional: Over-provisioning needed for drive FTL. Drive runs at reduced capacity to improve performance, latency & endurance. Increased capacity per die.
- QLC, ZNS: 0% OP for drive FTL (extra capacity for host). Drive can use full capacity with full performance, low latency & high endurance. Increased capacity per die.
- 15% Cost Saving ($/GB) - QLC
- 30% Cost Saving ($/GB) - ZNS

SSDs with Zoned Namespaces (ZNS)?
Performance is Expensive
“To achieve these levels of device-level write amplification (1.1x & 1.4x), flash is typically overprovisioned by 50% (...) but reducing flash overprovisioning while maintaining the current level of performance is an open challenge at Facebook.”
Source: The CacheLib Caching Engine: Design and Experiences at Scale. USENIX OSDI 2020

                          General                CacheLib (7.68TB workload)
                          SSD      SSD/w ZNS     SSD       SSD/w ZNS
SSD Capacity              7.68T    8T            15.36T    8T
NAND Usable               $584     $584          $584      $584
NAND Over-Provisioning    $39      $0            $661      $0
DRAM                      $40      $40           $80       $40
Controller                $6       $6            $6        $6
Other                     $10      $10           $10       $10
Total Drive Cost          $679     $640          $1341     $640

Caching Use-Case: Performance Parity, 2x Cost!
Source: https:/

SSDs with Zoned Namespaces (ZNS)?
High DWPD and High Performance
- Eliminates SSD write amplification
- ZNS solves the mismatch between the storage block interface and the characteristics of NAND flash
- Major impact on performance, lifetime, and behavior of any SSD
[Figures: Throughput (3x) and Latency]
Source: Bjørling et al., ZNS: Avoiding the Block Interface Tax for Flash-based SSDs, USENIX ATC 2021

Solving the Storage at Scale Challenges
How did we get here?
[Timeline labels: SSDs/w ZNS Available; Zoned Storage Ecosystem (Linux) Evolving]
- The NVMe ZNS Group
was formed at the end of 2018 to create the Zoned Namespace Command Set specification. The initial specification was ratified in June 2020.
- ZNS support was added to the Linux software eco-system in June 2020, followed by SPDK support in April 2021.
- SSDs with Zoned Namespace support were announced in Q3 2020.
- Follow-on work in the software eco-system to enable databases, file-systems and cloud use-cases.

[Timeline, 2015 to 2023: Denali; ZNS Standardization; Open Channel; General Linux Support; UFS Standardization; ZNS Ratified; Kubernetes (Longhorn, CSAL, Mayastor); SPDK 21.04+; MySQL & RocksDB; Ceph]

Linux is the registered trademark
of Linus Torvalds in the U.S. and other countries.

What is a Zoned Namespace?
Overview
- An NVMe namespace that adds the abstraction of zones
- Logical blocks are divided into fixed-size zones, which the host software then uses for data placement
- Devices can simultaneously support both conventional and zoned namespaces
- Mimics the ZAC/ZBC models for host-managed SMR HDDs to take advantage of their existing software eco-system

[Diagram: an NVMe Namespace (NVM Command Set) spans LBAs 0, 1, 2, ..., X-1. A Zoned Namespace (NVM and the Zoned Namespace Command Set) spans LBAs 0, 1, 2, ..., X-1, divided into Zone 0, Zone 1, ..., Zone Y.]
The NVMe word mark is a trademark of NVM Express, Inc.
[Diagram: within a zone, write operations advance the write pointer position; a Zone Reset rewinds the write pointer.]

Raw SMR HDD and NAND media both require sequential writes within zones.
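The write pointer behavior described for zones can be sketched in a few lines of Python. This is a toy model for illustration only, not an NVMe implementation: the class and method names (`Zone`, `ZonedNamespace`, `ZoneError`) are hypothetical, writes are one logical block at a time, and every zone's writable capacity is assumed to equal the zone size.

```python
from dataclasses import dataclass, field


class ZoneError(Exception):
    """Raised when a write violates the sequential-write rule of a zone."""


@dataclass
class Zone:
    start_lba: int          # first LBA covered by this zone
    capacity: int           # writable logical blocks (assumed equal to zone size)
    write_pointer: int = 0  # offset of the next writable block within the zone
    blocks: dict = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        # A write is only accepted at the current write pointer position.
        if self.write_pointer >= self.capacity:
            raise ZoneError("zone is full")
        if lba != self.start_lba + self.write_pointer:
            raise ZoneError("write is not at the write pointer")
        self.blocks[lba] = data
        self.write_pointer += 1  # each write advances the write pointer

    def reset(self) -> None:
        # A zone reset rewinds the write pointer and discards the zone's data.
        self.blocks.clear()
        self.write_pointer = 0


@dataclass
class ZonedNamespace:
    zone_size: int
    zones: list

    @classmethod
    def create(cls, num_zones: int, zone_size: int) -> "ZonedNamespace":
        # Logical blocks are divided into fixed-size zones.
        zones = [Zone(start_lba=i * zone_size, capacity=zone_size)
                 for i in range(num_zones)]
        return cls(zone_size=zone_size, zones=zones)

    def zone_of(self, lba: int) -> Zone:
        return self.zones[lba // self.zone_size]

    def write(self, lba: int, data: bytes) -> None:
        self.zone_of(lba).write(lba, data)
```

In this sketch, writing LBA 0 and then LBA 1 succeeds, while writing LBA 3 next raises `ZoneError` because it skips the write pointer. Calling `reset()` on the zone makes LBA 0 writable again.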
Linux Eco-System
[Diagram labels: Zoned Storage Hardware; Distributions; Libraries; Local File-Systems (f2fs, btrfs); Tools (libzbd, libnvme, fio, qemu); Cloud Orchestration; Databases; Caching; Distributed Storage; CSAL]
- Development since 2016
  - Zoned API available since kernel version 4.10 (Feb 2017)
  - ZNS support added in kernel version 5.9 (Oct 2020)
- 5+ Linux Distributions with Zoned Storage Support
  - RHEL 9+, CentOS 7+, Fedora 33+, Debian 11+, and Ubuntu 21.04+
- Local File-systems
  - f2fs (client - UFS) and btrfs (enterprise - ZNS/SMR)
- Storage Systems
  - Ceph, OpenEBS, Mayastor, SPDK's CSAL, and others
- Library/Tools support
  - libzbd, libnvme, SPDK, fio, qemu, blkzone, blktests, and others
- End-to-end Application Enablements
  - Cloud Orchestration Platforms
  - Databases, Caching
- Mature, robust, and used in production by some of the biggest consumers of storage

Deployments
Typical Approaches
- Storage Array: Storage Array/w ZNS SSDs; NVMe over Fabrics (Conventional and/or ZNS); Great Scalability and Capacity Utilization
- Local Storage: Any Application (Great Performance) through Local File-System btrfs/f2fs; End-to-End (Highest Performance)
[Diagram labels: Distributions; Zoned Storage Devices]

Storage Array
For Performance and Capacity
- Storage box, such as an All-Flash Array (AFA)
  - Storage is accessed through a common network protocol such as NVMeoF, NFS, SMB, and others
  - The storage box runs software that supports zoned storage and exposes it as conventional storage
- Use-Cases
  - Performance: Very high-performance storage system for AI/ML, streaming and databases
  - Capacity: Replace HDDs with QLC SSDs (DWPD 1)
- Example: Alibaba replaced HDDs with QLC SSDs in their 3rd generation big data local disk ECS instances to double the performance and density vs. the 2nd generation while holding the price to their customers constant.
[Diagram labels: SPDK; Storage Array; NVMe over Fabrics (Conventional and/or ZNS); Great Scalability and Capacity Utilization; Storage Array/w ZNS SSDs]

Turnkey Storage Array
CSAL together with Reference Storage Platform (RSPs)
- What is CSAL?
  - Open-source cloud-scale shared-nothing Flash Translation Layer (FTL bdev) in the Storage Performance Development Kit (SPDK)
  - Ultra fast cache and write shaping tier to improve performance and endurance to scale QLC value
  - Flexible scaling of NAND performance and capacity to the user/workload needs
  - Used and deployed by Alibaba to adopt QLC SSDs into their data centers*
- Reference Storage Platforms partners
  - Turnkey solution that enables quick and easy deployment
  - WD collaboration with Solidigm and the CSAL community for broadening its adoption
  - ZNS support being upstreamed. Available upon request
* https:/
[Diagram labels: SPDK; bdev; Persistent Write Buffer; FTL Core; L2P Table; NVMeoF (TCP, RDMA, and others); High Capacity/Performance Storage; Application Reads/Write]
RSP Partners: StarWind, Intel, Solidigm, SPDK, Western Digital

Conventional Storage for Any Applications
Local Storage
- Conventional storage is often required by the end-user
  - Incremental roll out of software that specifically takes advantage of zoned storage, or;
  - Applications that are not performance sensitive
- Conventional storage access through local file systems with zoned storage support
  - No changes necessary to applications as zoned storage support
is baked into local Linux file systems (f2fs: Linux kernel 5.10+; btrfs: Linux kernel 5.12+)
- Outperforms conventional SSDs
  - No OP: 7%/28% additional storage
  - Better performance than conventional
- Works natively with hint-based placement (e.g., streams)
  - No software modifications required

[Figure: Key-Value Store on top of local file-system, F2FS (Conv.) vs. F2FS (ZNS). Panels: Random key lookups; Concurrent writes (20MB/s) /w lookups; Concurrent writes (no limit) /w lookups. Axes: Read (1000 Ops/s); P99 and P99.99 Latency (ms). Annotations: 4x; +5-10%.]

End-to-End Application Integrations
Local Storage
- Highest performance and optimized per use-case
  - I/O intensive applications
  - Large-scale storage systems that use local storage
- Applications are tightly integrated
with the storage stack
  - Aware of the underlying storage type and perform intelligent data placement
  - Typical candidates are distributed file-systems, databases, caching
- Integrations:
  - Databases: Percona MySQL, RocksDB, TerarkDB, CacheLib
  - File-Systems: Ceph, f2fs, btrfs
  - Cloud Integrations: Longhorn, Mayastor/OpenEBS, CSAL
[Diagram labels: Distributions; Zoned Storage Devices; End-to-End (Highest Performance and Benefits)]

Percona MySQL
Improving OLTP workloads
- Zoned storage support integrated into RocksDB, which Percona MySQL can use as its storage backend
  - Developed the ZenFS storage backend, and added support upstream
- Collaboration with Percona
  - Support upstream in the general cloud container used for cloud deployments
- When using ZenFS:
  - Up to 80% higher throughput
  - Lower tail latency by up to 10x
[Figures: throughput (higher is better) and tail latency (lower is better)]
Benchmark: Write-heavy OLTP workloads using sysbench
- XFS: 1TB SN540 (Conventional Namespace)
- ZenFS: 1TB ZN540 (Zoned Namespace)
- ZenFS (GC): ZenFS with garbage collection enabled

Ceph Crimson
Using Zoned Storage
- Very popular distributed file-system used in the industry and HPC space: CERN,
Intel, Google, Blizzard, and others
  - At least 1.1 EiB deployed (opt-in telemetry)
- Next release (Crimson) will have native support for zoned storage
  - Applicable to both SMR HDDs and ZNS SSDs
- 30% higher throughput for both reads and writes once garbage collection kicks in
[Figures: Write Throughput and Read Throughput (MB/s) over Time. Rados block device, fio 80/20% RW workload. Higher is better.]

Cloud Integrations
Transparent use of Zoned Storage in Cloud Applications
- Kubernetes is the main cloud orchestration platform for deploying applications
- We made zoned storage natively integrated without any end-user modifications
  - Exposes conventional storage to containers/VMs
  - Integrated into Longhorn, OpenEBS, and CSAL
[Diagram labels: Example workload; POSIX Filesystem on Conv Block Device; Backed by Zoned Storage; Zoned Block Device]

Eco-System
Growing Number of Vendors and Use-Cases
- Zoned Namespace Command Set support has been announced or added to products across a broad set of vendors
- Solid support in the Linux software eco-system
  - Built on the existing foundation of SMR HDDs, enabling rapid support and development
  - Major achievements include local file-systems and relational and key-value database systems
- While broad industry support has been achieved, successful large-scale deployment of ZNS SSDs also requires
  1. Multi-sourcing through standardized device models and reference platforms
  2. Large-scale deployments through cloud orchestration platforms as well as distributed file-systems
[Diagram labels: Zoned Storage; Distributions; Libraries; Local File-Systems (f2fs, btrfs); Tools (libzbd, libnvme, fio, qemu); Cloud Orchestration; Databases; Caching; Distributed Storage; CSAL]
Vendors: InnoGrit, Kioxia, Teledyne LeCroy, Marvell, MicroChip, Micron, Silicon Motion, Samsung, Radian, SK Hynix, Western Digital

Standardized Device Models
SNIA Zoned Storage Technical Workgroup
- While there is a thriving eco-system for zoned storage, it depends on SSDs implementing the Zoned Namespace Command Set specification, in addition to conventional attributes, in a similar way
  - SSD vendors initially developed ZNS SSDs with somewhat different attributes, leading to confusion among adopters about what software changes were needed to support zoned storage
- To unify industry offerings, improve multi-sourcing, and grow software interoperability, a set of SNIA organization members formed the Zoned Storage Technical Workgroup, which recently released the Zoned Storage Models v1.0 specification
  - Common requirements for zoned storage devices, aiding common software development
  - Two common device models that each inherit the common requirements
Source: SNIA Zoned Storage Models Version 1.0, July 2023, https://www.snia.org/standards/technology-standards-software/standards-portfolio/zoned-storage-models

Common Requirements of a Zoned Storage Device: Fixed Zone Capacity, No Zone Excursions, Reliability Expectations, and others
- Model A: High-Performance Use-Cases (Streaming, Databases, and AFAs)
- Model B: High-Capacity Use-Cases (Capacity Optimized, Tiered Systems)
- Enhanced Multi-Sourcing; Common Software Requirements

Zoned Storage Device Models
SNIA Zoned Storage Technical Workgroup
- Standardized document that describes the common requirements that host software can expect a zoned storage device to support
- Defines how a zoned device is expected to behave towards the host. This includes stating:
  - A device always manages reliability (e.g., wear-leveling)
  - A zone's writeable capacity is constant and not variable
  - No Active Zone Excursions, i.e., zones are guaranteed to always have their writeable capacity available
  - How a device behaves at end-of-life: read-only mode, etc.
- Devices that take these definitions into account are then able to reap the benefits of the existing zoned storage software eco-system and be sure that there is a common understanding of the device's behavior.
Source: SNIA Zoned Storage Models Version 1.0, July 2023, https://www.snia.org/standards/technology-standards-software/standards-portfolio/zoned-storage-models

Zoned Storage for Embedded Devices
Google Pushing Zoned Storage to Mobile
- Towards large-scale deployments in mobile
  - Driven by Google and targeting the Android hardware eco-system
  - Bart Van Assche from Google discussed their roadmap for adopting Zoned UFS into Android at Flash Memory Summit 23
  - JEDEC's Zoned UFS specification was completed in July
  - Android vendors are targeting productization in their next-generation mobile products
- With Zoned UFS in place, the zoned storage interface is now ubiquitous across all major storage devices (HDDs, SSDs, Embedded/Mobile)
  - All utilize the same software eco-system
- Adoption of Zoned UFS is expected to be quick, as file-system support (f2fs) is complete and Android has already switched to f2fs for many mobile platforms (e.g., Pixel).
Source: Bart Van Assche, Google, “Zoned Storage for UFS”, Flash Memory Summit 23

Summary
- SSDs with Zoned Namespace support enable hyperscalers and CSPs to
  - Meet their increasing customer demand for cost-effective storage as well as high performance
  - Extend the lifetime of their server fleet beyond their typical five-year service time
- Mature eco-system that enables a broad range of use-cases
  - Storage Arrays: Turnkey Solution, ZNS+QLC
  - No changes necessary to conventional applications to take advantage of SSDs with Zoned Namespace support
  - High-performance end-to-end integrations benefiting distributed storage systems, databases and caching applications
- Good vendor eco-system
  - Standardized SNIA zoned storage device models
  - Broad set of vendors
- Path to large-scale deployment into embedded/mobile
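
The cost-effectiveness claim rests on the drive cost table from the CacheLib caching use-case (General vs. CacheLib, 7.68TB workload). As a rough check, the totals and the "2x Cost" figure can be re-derived by summing the listed components. The sketch below uses only the dollar values from that table; the dictionary layout and function names are illustrative assumptions, not part of the original slides.

```python
# Component costs in USD, copied from the drive cost table.
# Keys and structure are hypothetical, chosen for this sketch.
DRIVES = {
    "general_ssd":  {"nand_usable": 584, "nand_op": 39,  "dram": 40, "controller": 6, "other": 10},
    "general_zns":  {"nand_usable": 584, "nand_op": 0,   "dram": 40, "controller": 6, "other": 10},
    "cachelib_ssd": {"nand_usable": 584, "nand_op": 661, "dram": 80, "controller": 6, "other": 10},
    "cachelib_zns": {"nand_usable": 584, "nand_op": 0,   "dram": 40, "controller": 6, "other": 10},
}


def total_cost(components: dict) -> int:
    """Total drive cost is the sum of its component costs."""
    return sum(components.values())


def cost_ratio(conventional: str, zns: str) -> float:
    """Cost of the conventional drive relative to the ZNS drive."""
    return total_cost(DRIVES[conventional]) / total_cost(DRIVES[zns])


if __name__ == "__main__":
    for name, parts in DRIVES.items():
        print(f"{name}: ${total_cost(parts)}")
    print(f"CacheLib, conventional vs. ZNS: {cost_ratio('cachelib_ssd', 'cachelib_zns'):.2f}x")
```

The sums reproduce the table's totals ($679, $640, $1341, $640). The CacheLib ratio comes out at about 2.1x, driven almost entirely by the NAND over-provisioning line, and the general-purpose ratio at about 1.06x.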