Simplify SmartNIC System Testing with Open APIs and Data-center-in-a-box
Razvan Stan, Sr Engineering Manager, Keysight Technologies
San Jose, CA, April 26-28, 2022

Slide 2: SmartNIC testing at different stages
- B2B: The first version of the NIC is ready. Start running B2B using traffic generators like iperf to evaluate bandwidth.
- Dev/Test B2B: Start building CI/CD around this setup.
- System Test: At some stage, start running system tests with a small spine/leaf setup. Start using some distributed apps for realism.
- Hyperscale Simulation: Run simulation at scale to prove out superior behavior to hyperscalers.

Slide 3: SmartNIC testing at different stages
- Building physical setups is hard!
- Realistic and repeatable data center conditions are even harder!
- Distributed apps for realism: where do I get them?

Slide 4: SmartNIC testing at different stages
- How can I trust the results are real?
- How do I prove out the fidelity of simulation models?
- If I'm making design changes, how can I automate the fidelity calibration?

Slide 5: SmartNIC testing at different stages
- Why use different workload generators at different stages?
- Different tools = different scripts
- Vendor lock-in?

Slide 6: If I had a wish list
- NO to operating a cluster of switches, learning NOSes, and researching data center distributed apps
- Easy to deploy and use a data center environment with realistic fabric conditions and data flows
- Same tool in all stages of testing
- Open standard with a model-based, declarative API, vendor-agnostic, community ecosystem
- Simplified calibration of simulation models for fidelity
I would want to focus on innovating and not on how to operate test equipment.

Slide 7: Traditional Distributed System test environment
- Lots of switches and cables
- Diagram layers: Data Flow; Fabric; Servers, NICs
- Workloads? Choices?
  - Install actual apps (which ones? how many?)
  - Build in-house tools and start maintaining them
  - Hunt for what's available out there

Slides 8-9: Data-center-in-a-box: Fabric and Data Flow
- Data Center Fabric in a Box: flexible topology at the click of a button
- Data Flow Generator
- Diagram layers: Servers, NICs; Data Flow; Fabric
- Data Flow and Fabric are independent

Slide 10: Automated, Repeatable Experiments with Open API
- Data Center Fabric in a Box: flexible topology at the click of a button
- Data Flow Generator
- Models and APIs for: Fabric; Servers, NICs; Data Flow; Chaos, Impair

Slide 11: Data Center Fabric in a Box
- Diagram: Workload 1 and Workload 2, each with a Data Flow Generator
- Experiment with: ECMP, QoS, ECN, PFC, buffers
- DC conditions: congestion, packet loss
- Goodbye re-cabling: flip easily through different fabric configs

Slides 12-15: Distributed Data Flows Generator
- Generates complex distributed data flow patterns
  Goal: Emulate any kind of data center East-West traffic
- Mix different workloads on a timeline
  Reality check: Flat-line traffic is not realistic. Actual DC traffic has bursts, peaks and valleys.
- Provides metrics for Job Completion Time and individual Flow Completion Time, plus transport stats
  Insights: Are all flows treated fairly? Is the bottleneck the server or the network? How much is the CPU actually offloaded? Do congested flows impact other flows on the NIC?
- Define data flow using higher-level constructs (e.g. scatter/gather/all reduce)
  Usability: Simple constructs create complex workloads (collective communications)
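
To make the "simple constructs create complex workloads" point concrete, here is a minimal Python sketch of how scatter, gather, and a ring-style all reduce might expand into individual point-to-point flows, and how Job Completion Time could be derived from per-flow timestamps. Everything here is an assumption for illustration: the `Flow` type, the function names, the ring algorithm, and the use of Jain's fairness index as one possible answer to "are all flows treated fairly?" are hypothetical and are not taken from the presenter's tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Flow:
    """One point-to-point transfer (hypothetical type, for illustration only)."""
    src: str
    dst: str
    size_bytes: int


def scatter(root: str, nodes: List[str], size_bytes: int) -> List[Flow]:
    """Root sends one chunk of size_bytes to every other node."""
    return [Flow(root, n, size_bytes) for n in nodes if n != root]


def gather(root: str, nodes: List[str], size_bytes: int) -> List[Flow]:
    """Every other node sends one chunk of size_bytes to the root."""
    return [Flow(n, root, size_bytes) for n in nodes if n != root]


def ring_all_reduce(nodes: List[str], size_bytes: int) -> List[Flow]:
    """Ring all reduce: 2*(n-1) steps; in each step every node sends
    size_bytes/n to its right-hand neighbor (assumed algorithm)."""
    n = len(nodes)
    if n < 2:
        return []
    chunk = size_bytes // n
    flows = []
    for _step in range(2 * (n - 1)):
        for i, node in enumerate(nodes):
            flows.append(Flow(node, nodes[(i + 1) % n], chunk))
    return flows


def flow_completion_times(starts: List[float], ends: List[float]) -> List[float]:
    """Per-flow completion time: end timestamp minus start timestamp."""
    return [e - s for s, e in zip(starts, ends)]


def job_completion_time(starts: List[float], ends: List[float]) -> float:
    """Job completion time: earliest flow start to latest flow finish."""
    return max(ends) - min(starts)


def jain_fairness(values: List[float]) -> float:
    """Jain's fairness index: 1.0 when all values are equal, 1/n at worst."""
    total = sum(values)
    return (total * total) / (len(values) * sum(v * v for v in values))
```

A three-line call such as `ring_all_reduce(["a", "b", "c", "d"], 4000)` already yields 24 flows, which is the sense in which a single high-level construct describes a complex workload. Applying `jain_fairness` to the list of flow completion times gives one rough signal of whether some flows are being starved.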

Slides 16-17: Distributed Data Flows Generator (continued)
- Run over different stacks, MPI libraries or proprietary solutions highly tuned for repeatability of results
  Validate: Validate with real collective communication libraries used in multi-node systems for deep learning training
- Configure the transport layer (for example, TCP vs RoCE), flow steering, GRO, LRO
  Calibrate: Experiment with knobs for optimal throughput

Slide 18: Open Network Experiments
- Declarative models, lean APIs
- Diagram labels: APIs; Models; Single source of truth; OpenAPI Models & API
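
As an illustration of "declarative models, lean APIs" with the model as the single source of truth, the following Python sketch describes an experiment purely as data and pushes it through a two-method API. All names here (`FabricModel`, `DataFlowModel`, `ExperimentModel`, `Backend`, `InMemoryBackend`, `push_and_read_back`) and all fields are hypothetical, chosen for this example; they are not the schema or API shown in the talk.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Protocol


@dataclass
class FabricModel:
    """What the fabric should look like, not how to build it (hypothetical fields)."""
    spines: int
    leaves: int
    ecn_enabled: bool = True


@dataclass
class DataFlowModel:
    """One declared workload (hypothetical fields)."""
    name: str
    pattern: str      # e.g. "scatter", "gather", "all_reduce"
    transport: str    # e.g. "tcp" or "roce"
    size_bytes: int


@dataclass
class ExperimentModel:
    """The whole experiment as one serializable document."""
    fabric: FabricModel
    flows: List[DataFlowModel] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class Backend(Protocol):
    """A deliberately lean API: push the model, read it back."""
    def set_config(self, config_json: str) -> None: ...
    def get_config(self) -> str: ...


class InMemoryBackend:
    """Toy stand-in for any implementation that accepts the model."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._config = "{}"

    def set_config(self, config_json: str) -> None:
        json.loads(config_json)  # raises ValueError on malformed input
        self._config = config_json

    def get_config(self) -> str:
        return self._config


def push_and_read_back(model: ExperimentModel, backend: Backend) -> dict:
    """Apply the declarative model and return what the backend now holds."""
    backend.set_config(model.to_json())
    return json.loads(backend.get_config())
```

Because the script only ever talks to `set_config` and `get_config`, swapping `InMemoryBackend("simulation")` for `InMemoryBackend("emulation")` changes nothing else in the script; the model document stays the only place where the experiment is defined.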

Slide 19: Enabling coexisting implementations
- Implementations: Simulation, Emulation, Test Pods
- OpenAPI Models & API: model data, REST, SDK
- Open source, vendor-agnostic API, rich tooling support
- Vendor-specific implementations
- Same script works with any implementation

Slide 20: Demo Setup
- Tools used in the demo: Data Flow software, Fabric NOS, Compute nodes, Fabric Emulator

Slide 21: DEMO Time!

Slide 27: Q&A

Razvan Stan, Sr Engineering Manager, Keysight Technologies
Thank you!