Data Center Life Cycle Manageability at Scale

Life cycle management for hyperscale introduces unique challenges for manageability. This presentation showcases a blueprint for a data model that can be aggregated across various levels and stages of hyperscale evolution.

Data Center Life Cycle Management Scale
Nirav Shah, Cloud Software Architect, Intel
Jim Harford, System Architect, Broadcom
Scott Ramsey, Technologist, Dell Technologies

SUSTAINABLE SCALABLE COMPUTATIONAL INFRASTRUCTURE
HW MGMT

CSM WG Overview
Purpose
- Define a comprehensive at-scale remote service model (inclusive of different sizes)
- Standardize interfaces for the at-scale remote service model and enable added services
- Deliver services across the boundaries of ownership
- Integrate vendor tools using a common framework
- Influence spans multiple OCP disciplines

Life Cycle Opportunities Recap
Life cycle stages: 1. Plan/Design, 2. Procure/Deploy, 3. Operate, 4. Decommission
Related areas: Sustainability, Modular Data Center, Ready Facility Recognition
Disciplines: Networking, Security, Storage, Server, Power, F/W, Hardware Management, Fault Management

Data Center Scale
Tens of thousands of geo-located and/or distributed servers! Racks! Racks!
[Figure: Rack 1 ... Rack N (geo-located) and Rack Z (distributed)]

A universal data model across components, systems, and physical/virtual aggregates.
A use-case-driven, top-down approach to managing a hyperscale data center.
[Figure: physical aggregate and virtual aggregate]

Life Cycle Interconnect
[Figure: flowchart spanning Plan/Design (1), Procure/Deploy (2), Operate (3), and Decommission (4), with the steps Define, Acquire, Configure, Validate, Utilize, and Decommission and the decision points "Target Performance?", "Healthy?", "Config issue?", and "Bad Parts?"]
The life cycle cannot be a waterfall: it must be possible to (re)plan, (re)procure/deploy, and decommission at any stage.

Utilization
Utilization Use Cases
- Define the power budget and the optimal utilization/power ratio.
- Identify system/aggregate configurations that meet power and utilization budgets.
- Identify component vendors supporting those configurations and budgets.
- Measure utilization for debuggability, efficiency, and/or metering.
- Measure average performance/power consumption for TCO.
- Measure conformance to spec'd performance targets.
- Detect thermal margining overloads.
- Detect power margining overloads.
- Compare the target utilization to actual operation.

Utilization Requirements Example
- Utilization: snapshot of component/system/aggregate utilization.
- Power utilization: snapshot of the power used by the component/system/aggregate.
- Thermal utilization: snapshot of the operating temperature of the component/system/aggregate.
- Uptime: time in seconds since the last reset/power cycle.
- Average utilization: utilization averaged over uptime (set to 0 on reset/power cycle).
- Average power utilization: power utilization averaged over uptime (set to 0 on reset/power cycle).
- Average thermal utilization: thermal output averaged over uptime (set to 0 on reset/power cycle).
- Average uptime: uptime in seconds averaged over the number of resets.
- Power threshold: highest power utilization above which the component/system/aggregate “may” be throttled/reset.
- Thermal threshold: highest temperature above which the component/system/aggregate “may” be power-cycled.
- Utilization threshold: lowest utilization at which the component/system/aggregate “may” stop operating or sleep.

Health Status
Health Status Use Cases
- Define availability/redundancy targets.
- Define fault tolerance targets.
- Define fault recovery/sparing targets.
- Identify suppliers/configurations that meet availability, fault tolerance, and recovery targets.
- Measure correctable and uncorrectable faults against thresholds.
- Track health status to prevent failures.
- Spare failed or about-to-fail hardware.
- Identify patterns of failures and proactively address them.
- Back up and offline faulty hardware.
- Perform postmortem analysis.
- Sustainably decommission faulty hardware.

Health Requirements Example
Fault/Error
- Failure ID: 256-bit unique ID allocated to any new failure with distinct symptoms.
- Type: correctable, uncorrectable but not fatal, or fatal.
- Count: number of error occurrences since the component/system/aggregate was installed.
- Fix status: Unknown/Fixed/Not available/Available/Corrected.
- Telemetry signature: log of a unique set of events leading up to the error.
Health
- Correctable error #: count of all corrected errors since the component/system/aggregate was installed.
- Fatal error #: count of all fatal errors since the component/system/aggregate was installed.
- Uncorrectable non-fatal error #: count of all uncorrectable non-fatal errors since the component/system/aggregate was installed.
- Health score: a positive, decreasing number representing the current health of the component/system/aggregate.
- Health warning threshold: threshold below which the component/system/aggregate operates sub-optimally.
- Health critical threshold: threshold below which the component/system/aggregate “may” break down.

Configuration
Network Configuration Use Cases Example
- Define required NIC port speeds and the partitioning of PCIe PFs among NIC ports.
- Define TX traffic shaping characteristics for multiple classes of service to be used on NICs.
- Upgrade NIC firmware, driver(s), and/or SW tool(s).
- Configure the network and link parameters needed for basic connectivity to the network.
- Configure TX traffic shaping characteristics for multiple classes of service to be used on NICs.
- Optimize performance via adjustments to BIOS settings, NIC resource allocation, flow steering, and pinning of interrupts to specific CPUs.

Configuration Requirement Example
- Initiation and monitoring of upgrades to NIC firmware, driver(s), and SW tool(s)
- Configuration of NIC hardware parameters
- Configuration of server BIOS settings
- Download of scripts to local storage used by the OS running on the server
- Execution of configuration scripts by the OS running on the server
- Discovery of results from other (opaque to OCP) configuration mechanisms
  o For example: switch-NIC neighbor configuration via LLDP

Performance
Performance Use Cases
- Determine benchmarks for off-line performance testing of components/systems/aggregates:
  o “micro-benchmarks” (perf, all-gather, fio, SPEChpc)
  o synthetic workloads that resemble the demands of production workloads
- Run off-line performance benchmarks on properly configured elements.
- Measure and record performance statistics while running production workloads.
- Compare the performance results of off-line benchmarks with those of production workloads.

Performance Requirement Example
- Availability of performance “micro-benchmarks”
- Availability of synthetic workloads relevant to a given data center or cloud environment
- Availability of aggregate-level benchmarks to simulate a data center environment
- Installation, initiation, and monitoring of benchmark programs on server(s)
- Configuration of any HW parameters that affect performance
- Download of test scripts to local storage used by the OS running on the server
- Execution of test scripts by the OS running on the server
- Access to log files or other artifacts containing measured performance statistics
- Integration of the logs into system-level telemetry for a customizable view
- Configurability of the telemetry scale at the component/system/aggregate level

Summary
- Data centers are growing at an exponential scale.
- Data centers already at hyperscale introduce unique challenges.
- Not just the scale but the required sophistication creates the need for a top-down model.

Call to Action
- Identify opportunities for additional needed standardization.
- Walk through scenarios to present unique use cases and requirements.
- Invite industry to help define a framework that meets these requirements.
- Publish a white paper with requirements for hyperscale life cycle manageability.
- Engage with the OCP Hardware Management Project “OCP-Cloud Service Model” workstream to participate in defining “Data Center Life Cycle Management Scale” requirements and interfaces:
  o https://ocp-all.groups.io/g/Cloud-Service-Model/calendar
  o https://www.opencompute.org/projects/hardware-management

Thank you!
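The Utilization Requirements Example describes snapshots, an uptime counter, averages over uptime that return to 0 on a reset/power cycle, an average uptime over the number of resets, and three thresholds. The Python sketch below shows one way a single component, system, or aggregate could track these. It assumes readings arrive with a known interval and are time-weighted; the names `UtilizationRecord`, `sample`, `reset`, `threshold_flags`, and `aggregate_snapshot` are hypothetical and not taken from any OCP or vendor schema, and the roll-up rules in `aggregate_snapshot` (mean utilization, summed power, maximum temperature) are assumptions rather than something the presentation specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UtilizationRecord:
    """Hypothetical utilization record for one component, system, or aggregate.

    Field and method names are illustrative assumptions, not an OCP schema.
    """
    power_threshold_w: float      # above this, the element "may" be throttled/reset
    thermal_threshold_c: float    # above this, the element "may" be power-cycled
    utilization_threshold: float  # below this, the element "may" stop operating or sleep

    # Latest snapshot values.
    utilization: float = 0.0      # fraction in [0, 1] (assumed unit)
    power_w: float = 0.0
    temperature_c: float = 0.0
    uptime_s: float = 0.0         # seconds since the last reset/power cycle

    # Time-weighted sums over the current uptime; zeroed on reset.
    _util_sum: float = field(default=0.0, init=False, repr=False)
    _power_sum: float = field(default=0.0, init=False, repr=False)
    _temp_sum: float = field(default=0.0, init=False, repr=False)
    # Uptime of each completed power cycle, one entry per reset.
    _past_uptimes: List[float] = field(default_factory=list, init=False, repr=False)

    def sample(self, utilization: float, power_w: float, temperature_c: float,
               interval_s: float) -> None:
        """Record a reading assumed to hold for the last interval_s seconds."""
        self.utilization, self.power_w, self.temperature_c = utilization, power_w, temperature_c
        self.uptime_s += interval_s
        self._util_sum += utilization * interval_s
        self._power_sum += power_w * interval_s
        self._temp_sum += temperature_c * interval_s

    def _over_uptime(self, weighted_sum: float) -> float:
        # Average over the current uptime; 0 right after a reset/power cycle.
        return weighted_sum / self.uptime_s if self.uptime_s > 0 else 0.0

    @property
    def average_utilization(self) -> float:
        return self._over_uptime(self._util_sum)

    @property
    def average_power_w(self) -> float:
        return self._over_uptime(self._power_sum)

    @property
    def average_temperature_c(self) -> float:
        return self._over_uptime(self._temp_sum)

    @property
    def average_uptime_s(self) -> float:
        # Interpretation (assumed): mean uptime of completed cycles; current uptime if never reset.
        if not self._past_uptimes:
            return self.uptime_s
        return sum(self._past_uptimes) / len(self._past_uptimes)

    def reset(self) -> None:
        """Reset/power cycle: keep the finished uptime, then zero uptime and averages."""
        self._past_uptimes.append(self.uptime_s)
        self.uptime_s = 0.0
        self._util_sum = self._power_sum = self._temp_sum = 0.0

    def threshold_flags(self) -> Dict[str, bool]:
        """Compare the latest snapshot against the three thresholds."""
        return {
            "may_throttle_or_reset": self.power_w > self.power_threshold_w,
            "may_power_cycle": self.temperature_c > self.thermal_threshold_c,
            "may_sleep": self.utilization < self.utilization_threshold,
        }


def aggregate_snapshot(children: List[UtilizationRecord]) -> Dict[str, float]:
    """Roll child snapshots up one level (assumed rules: mean utilization,
    total power, hottest temperature)."""
    if not children:
        return {"utilization": 0.0, "power_w": 0.0, "temperature_c": 0.0}
    return {
        "utilization": sum(c.utilization for c in children) / len(children),
        "power_w": sum(c.power_w for c in children),
        "temperature_c": max(c.temperature_c for c in children),
    }
```

In use, each element would get its own record: call `sample()` on every telemetry poll and `reset()` when the element is reset or power-cycled, then pass the records of a rack's servers to `aggregate_snapshot()` for a rack-level view. Averaging (rather than summing) utilization keeps the aggregate on the same 0 to 1 scale as its children.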
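The Health Requirements Example pairs per-failure Fault/Error fields with per-element error counters and a health score checked against warning and critical thresholds. The following Python sketch models those fields under two loudly labeled assumptions: the 256-bit Failure ID is derived here by hashing the error type and telemetry signature with SHA-256 (the presentation says a unique 256-bit ID is allocated per distinct failure, not how), and the health score is supplied by the caller because no formula for it is given. `ErrorType`, `FixStatus`, `make_failure_id`, `FaultRecord`, and `HealthRecord` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ErrorType(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE_NON_FATAL = "uncorrectable non-fatal"
    FATAL = "fatal"


class FixStatus(Enum):
    UNKNOWN = "Unknown"
    FIXED = "Fixed"
    NOT_AVAILABLE = "Not available"
    AVAILABLE = "Available"
    CORRECTED = "Corrected"


def make_failure_id(error_type: ErrorType, telemetry_signature: List[str]) -> str:
    """Assumed ID scheme (hypothetical): SHA-256 over the error type and the
    ordered event signature, giving a 256-bit ID as 64 hex characters.
    Identical symptoms therefore map to the same ID."""
    payload = "\n".join([error_type.value, *telemetry_signature]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class FaultRecord:
    failure_id: str                  # 256-bit unique ID for a distinct set of symptoms
    error_type: ErrorType
    telemetry_signature: List[str]   # events leading up to the error
    count: int = 0                   # occurrences since installation
    fix_status: FixStatus = FixStatus.UNKNOWN


@dataclass
class HealthRecord:
    health_score: float        # positive; supplied by the caller (formula not specified)
    warning_threshold: float   # below this: operating sub-optimally
    critical_threshold: float  # below this: "may" break down
    faults: Dict[str, FaultRecord] = field(default_factory=dict)

    def record_error(self, error_type: ErrorType, telemetry_signature: List[str]) -> FaultRecord:
        """Count one error; allocate a new FaultRecord the first time a symptom set is seen."""
        fid = make_failure_id(error_type, telemetry_signature)
        fault = self.faults.get(fid)
        if fault is None:
            fault = FaultRecord(fid, error_type, list(telemetry_signature))
            self.faults[fid] = fault
        fault.count += 1
        return fault

    def error_count(self, error_type: ErrorType) -> int:
        """Correctable, uncorrectable non-fatal, or fatal error # since installation."""
        return sum(f.count for f in self.faults.values() if f.error_type is error_type)

    def status(self) -> str:
        """Classify the current health score against the two thresholds."""
        if self.health_score < self.critical_threshold:
            return "critical"
        if self.health_score < self.warning_threshold:
            return "warning"
        return "ok"
```

Hashing the symptom set means repeated occurrences of the same failure collapse into one record whose `count` grows, which fits the "distinct symptoms" wording; a central registry that hands out random 256-bit IDs would be an equally valid reading of the requirement.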
