High-Performance Networking Accelerates Intelligent Recommender Systems.pdf



NVIDIA HIGH PERFORMANCE E2E ETHERNET SOLUTION ACCELERATE RECOMMENDER SYSTEM
GTC China, Oct 2020

Recommendation Pipelines: Example
[Diagram. Legible labels: Experimentation; Data lake; Train data; TBs to PBs; GBs to TBs; Feature engineering; Data pre-processing; Model(s) training; Production Inference; Production Re-training; Candidate generation; Recommender System; O(10); "Improved accuracy?"; weekly.]

Recommendation Pipelines: Challenges
Data (ETL)
- Feature exploration: Huge data sets of TBs, PBs or more. Complex data preprocessing and feature engineering pipelines. Many iterations required.
Training
- Data loading: Data loading can be 50% of total training time. Tabular data loading scales poorly with an item-by-item approach.
- Huge embedding tables: Large embedding tables exceed single GPU memory. Sub-optimal lookup ops implementation.
- Performance & Accuracy: Hard to achieve high scaling efficiency with both model and data parallelism. Longer iteration cycles reduce the ability to reach higher accuracies quickly.
Inference
- Throughput & Latency: Difficult to have high throughput and low latency when ranking a huge number of items.

Nvidia Ethernet Switch address the challenges
- Speed, Feed and Latency: fast interconnect, fast access to the dataset
- RDMA and RoCE: low latency access to GPU memory, low latency access to external datasets
- Monitoring and Management

SPEED AND FEED - THE NEED OF BANDWIDTH
[Diagram panels: Intra-layer model parallel; Data parallel.]
- Intra-layer model parallel leaves collectives exposed.
- Communication speedup must match math speedup, otherwise we achieve little E2E speedup.
- Accelerating math without accelerating communication suffers from the basic Amdahl's law problem.
- Typically collectives span the NVLink domain only.
- Allreduce spans both NVLink and networking domains: bandwidth must be available in each.

NVIDIA'S MULTI-GPU, MULTI-NODE NETWORKING AND STORAGE IO OPTIMIZATION STACK
Build larger & lower latency resource pool
- Magnum IO: UCX, NCCL, OpenMPI
- NVLINK Fabric, GPUDirect P2P, GPUDirect RDMA, GPUDirect Storage
- Layers: Interconnect, Topology, Storage, Transport
- Transports: InfiniBand, NVLINK, RoCE, GPUDirect
[Diagram. Legible labels: XBAR (on chip); NVLINK (GPUs); NVLINK Switch (GPUs); over RoCE or IB (nodes).]

NCCL Ring on multi node
[Diagram. Legible labels: IB domain (across nodes); NVLINK domain (within node); Network domain (across nodes).]

NETWORK DESIGN FOR AI CLUSTER
Separate networks per service, to keep interference between applications low and to preserve scalability:
- Compute network: single-plane or multi-plane, depending on the server design.
- Storage network: the technology depends on the dataset and the storage software; single-plane or multi-plane.
- Management network: if the deployment is large-scale [rest of this item is not legible in the source].
[Diagram. Legible labels: Storage; Compute; Management; GPU.]

ETHERNET AI SWITCHES
Purpose-built for Rack Level & Multi-Rack Deep Learning Solutions

NVIDIA Certified Performance
- Best in class latency & throughput
- RoCE acceleration for GPU-Direct & storage
- Adaptive routing
DeepOps Integration
- Automated network configuration
- Automatic advanced network monitoring
- Auto-verification of the health of the DGX POD deployment
Native 200 Gigabit Ethernet, without splitters
- 32 or 64 port switches: SN3700 (32 x 200GbE), SN4600 (64 x 200GbE)
- Just like DGX A100
[Image label: DGX POD.]

ENABLING WORLD-CLASS AI SOLUTIONS
Fast Interconnect, Fast Compute, Fast Storage
[Partner logos: NetApp, Pure Storage, Excelero, Nutanix, NVIDIA, WEKA.io.]

BROAD ETHERNET SWITCH PORTFOLIO
Categories: AI Switches, Edge/ESF Switches, Leaf Switches, Spine Switches. The mapping of models to categories is not recoverable from the source.
SN2010: 18x25G + 4x100G
SN2100: 16x100G + 16x100G
SN2410: 48x25G + 8x100G
SN2700: 32x100G
SN3420: 48x25G + 12x100G
SN3510: 48x50G + 6x400G
SN3700-C: 32x100G
SN3700-V: 32x200G
SN4410: 48x100G + 8x400G
SN4600-C: 64x100G
SN4600-V: 64x200G
SN4700: 32x400G
SN4800: 128x100G
Best-in-class ASICs (Spectrum, Spectrum-2, Spectrum-3)
- Power consumption
- Throughput & latency
- Fair traffic distribution
- Security
- Virtualization scale

SPECTRUM SWITCH ADVANTAGES
- Congestion Management
- Avoidable Packet Loss
- Fairness & QoS
- Microburst Absorption Capability
- Lowest Latency
[Charts comparing Spectrum with the competition. Legible labels: MB; Packet Size (64B, 512B, 1.5KB, 9KB); 50%; Tolly.]

LARGER FABRICS WITH SPECTRUM
Up to 65,000 non-blocking 100GbE ports in a leaf/spine network.

RDMA AND ROCE

ROCE ACCELERATED AI SOLUTIONS
RDMA Supercharges Leading AI Frameworks
[Framework logos: Microsoft, Caffe2, PaddlePaddle, TensorFlow.]
Figures quoted: up to 60%, up to 2.5X, 95%, 50%. Labels: Higher ROI, Better Performance, Scaling Efficiency, Savings on Capital & Operation Cost. The pairing of figures with labels is not recoverable from the source.
[Customer logos: Tencent, Baidu, Oracle Cloud, Alibaba Group, ByteDance, Sony, JD, NVIDIA.]

GPU DIRECT RDMA ROCE TECHNOLOGY
- GPU-Direct with RoCE/RDMA: moves RoCE/RDMA type traffic directly between the network and GPU memory, bypassing the CPU and CPU memory.
- GPU-Direct with Ethernet UDP/IP: moves Ethernet UDP/MC type traffic directly between the network and GPU memory, bypassing the CPU and CPU memory.
[Diagram panels: Without GPU Direct; With GPU Direct.]
GPUDirect powered by RoCE delivers 10X better network performance.

RoCE ACCELERATES SCALE-OUT STORAGE
[Charts: Windows Server 2016 S2D - Write IOPS; Windows Server 2016 S2D - Read IOPS. One legend entry reads "SATSSDs(100s)".]
Storage is getting a lot faster!

ROCE IN A NUTSHELL
What is RoCE (RDMA over Converged Ethernet)?
Wikipedia: RoCE is a network protocol that allows RDMA over an Ethernet network. It does this by encapsulating an InfiniBand transport packet over Ethernet. There are two RoCE versions, RoCE v1 and RoCE v2. RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is an internet layer protocol, which means that RoCE v2 packets can be routed.
RoCE is a standard for RDMA over Ethernet defined by the IBTA (InfiniBand Trade Association).
How to RoCE?
- Quality of Service (QoS)

- Congestion Management (ECN and DCQCN algorithm)
- Flow Control (L2 PCP / L3 DSCP)
- Advanced Algorithms (part of ZTR development)

WHAT MAKES A GREAT ROCE SWITCH
Simple Configuration
- 1 command CLI config
- 1 click GUI config
High Performance
- High PPS & low latency
- Fair & predictable performance
Advanced Congestion Control
- Early detection and prevention
- RoCE over VXLAN
Extensive Visibility
- Single pane-of-glass
- Real time RoCE telemetry

NVIDIA'S ENTIRE ETHERNET SWITCH PORTFOLIO SUPPORTS ROCE
One-command RDMA deployment
- Supports Lossless, Semi-Lossless and Lossy RDMA deployment modes
- Supports mixed RDMA and non-RDMA deployment
- A single CLI "RoCE" command vs 26+ commands in other NOS
- Supports RoCE over VXLAN
Best hardware design for RDMA
- Supports Fast ECN
- Low forwarding latency and an excellent shared-buffer design
End-to-end management with the NEO network management software
[Diagram: RoCE packet layout and the switch QoS settings for the Lossy and Lossless modes. Legible labels: port trust mode L3; DSCP in the IP header; switch-priority to TC mapping; UDP dport; BTH; TC6 (CNP), strict; TC3 (RoCE); TC0 (other traffic), WRR 50%; Port ETS; Port ECN; Enable PFC.]

MONITOR & MANAGEMENT

WJH: LIGHTWEIGHT, DEPLOYABLE, EVENT-DRIVEN TELEMETRY
1. SDK generates WJH messages: packets, 12-tuple + metadata, very detailed description (Network OS, SDK/SAI).
2. Agent collects data: streams to a database.
3. Presentation layer: shows What Just Happened (Grafana, Kibana, NetQ, NEO, Apstra).
The Important Questions
- WHO is being impacted
- WHEN it happened
- WHAT is causing the problem
- WHERE is the problem
- WHY it is happening
- Root cause + how to fix it

WJH(TM) Accelerates the Time to Root-Cause
[Chart comparing SNMP, SYSLOGs and sFlow with Mellanox What Just Happened.]

WJH(TM): What Does it Monitor?
Packet Drop
- L1: bad CRC, flaky cable
- L2/L3: VLAN
- Buffer: incast, rate limit
- ACLs: deny based on IP, deny based on VLAN
No Packet Drop
- Congestion: incast, busy storage device
- Latency: pause frames, congestion latency
- Route Validation: packet doesn't reach the firewall, packet goes through a suboptimal path
- Load Balance Validation: suboptimal ECMP, suboptimal LAG

HISTOGRAMS: A POWERFUL BUFFER STATISTICS TOOL
- Sampling at 64ns granularity
- 10 sample buckets (10 buckets per histogram)
- Port histograms: RX/TX bandwidth, PFC duration
- Queue histograms: fill level, latency
- Micro-burst detection
- Watermarks (capture highest value)
- Thresholds trigger events (hysteresis)
[Chart: Queue Histogram - Egress Queue Utilization. Legible labels: utilization buckets from 0-10% to 90%; counts of 304 times, 195 times and 207 times.]

CUMULUS NETQ
Modern toolset that accelerates the network transformation
- CI/CD
- Fabric wide visibility
- Containers & Microservices
- Analytics & Diagnostics
- Streaming Telemetry

NVIDIA FOCUSES ON THE REQUIREMENTS OF HIGH-PERFORMANCE INTERCONNECT APPLICATIONS
Many-to-one traffic and microbursts
- An optimized buffer design absorbs bursts and reduces packet loss.
- Many-to-one traffic is the normal case for applications; it causes packet loss, which in turn increases latency and degrades performance.
- Microbursts cause packet loss that is hard to observe, which leads to performance problems.
Requirements imposed by distance
- An optimized buffer design accommodates long-distance RTT directly and lowers deployment cost.
Performance: latency, bandwidth and PPS
- Low forwarding latency and high bandwidth, a better fit for high-performance systems.
Integration with the NIC
- RoCE: application acceleration
- Container applications: iptables acceleration
Form factor and service deployment
- Half-width platforms suit integrated delivery, edge computing and mobile computing.
- Large table sizes easily meet the requirements of container applications.

NVIDIA
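
The Amdahl's law point on the SPEED AND FEED slide can be made concrete with a small calculation. The sketch below is not from the deck: the function name, the 30% communication share and the 4x speedup figures are illustrative assumptions, chosen only to show why accelerating math alone yields little end-to-end speedup.

```python
def e2e_speedup(comm_fraction, math_speedup, comm_speedup=1.0):
    """Amdahl-style end-to-end speedup of one training step.

    comm_fraction: share of the baseline step time spent in exposed
    communication (hypothetical input, between 0 and 1).
    math_speedup / comm_speedup: how much faster each part becomes.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    if math_speedup <= 0 or comm_speedup <= 0:
        raise ValueError("speedups must be positive")
    math_fraction = 1.0 - comm_fraction
    # New step time, relative to a baseline step time of 1.0.
    new_time = math_fraction / math_speedup + comm_fraction / comm_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Assumed example: 30% of the step is exposed communication.
    print(e2e_speedup(0.3, math_speedup=4.0))                    # math only
    print(e2e_speedup(0.3, math_speedup=4.0, comm_speedup=4.0))  # both
```

Under these assumed numbers, a 4x math speedup alone gives roughly a 2.1x end-to-end gain, while matching it with a 4x communication speedup recovers the full 4x.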

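
The "up to 65,000 non-blocking 100GbE ports" figure on the LARGER FABRICS WITH SPECTRUM slide is consistent with the standard folded-Clos (fat-tree) sizing formula. The sketch below assumes a three-tier fabric built from 64-port 100GbE switches (such as the SN4600-C listed in the portfolio); the deck does not legibly state how the figure was derived, so both the tier count and the radix are assumptions, and the function name is hypothetical.

```python
def nonblocking_host_ports(radix, tiers):
    """Maximum host-facing ports of a non-blocking folded-Clos fabric.

    Every switch has `radix` ports of equal speed. Switches below the top
    tier use half of their ports downward and half upward, which gives
    2 * (radix / 2) ** tiers host ports.
    """
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be an even number >= 2")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return 2 * (radix // 2) ** tiers


if __name__ == "__main__":
    # Assumed: 64-port 100GbE switches in two-tier and three-tier fabrics.
    print(nonblocking_host_ports(64, 2))  # leaf/spine
    print(nonblocking_host_ports(64, 3))  # leaf/spine/super-spine
```

With a radix of 64, two tiers give 2,048 ports and three tiers give 65,536, which rounds to the slide's 65,000.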