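Predictable Datacenter Network
Wei Zhou, Senior Staff Technical Expert, Alibaba Cloud
Yongqing Xi, Senior Technical Expert, Alibaba Cloud

Contents
01 Predictable high-performance data center
02 Predictable network quality

01 Predictable High-Performance Data Center
Building a datacenter network with predictable high performance

Data center network
Network services: efficient, stable connectivity
Computing IO services: low-latency, high-bandwidth IO

Trends in data center development
AI/BD workload growth: demand for AI compute keeps growing and depends on the network to unlock more compute by removing the limits imposed by communication scale and latency.
Heterogeneous and high-performance storage: ARM + RISC-V + GPU and accelerators; new compute models and storage media.
Open interconnect to break the closed system: in-box open interconnects such as CXL give rise to new open multi-host interconnects; heterogeneous and pooled resources.

Trends in data center network development
DC as a Computer, Network as IO
Bandwidth keeps increasing; extreme low-latency techniques
In-box PCIe + out-of-box Ethernet: tiering and pooling at in-box, in-rack, and in-building scope
End-network cooperation; integrated software and hardware

Trends in data center development: DC as a Computer
Dimensions: scale up, scale out, energy
Moore's Law is slowing
Blast radius
Large-scale distributed systems
Best-effort IO interconnect
Distributed computing efficiency
High bandwidth vs. low latency
Energy cost of data transfer
Energy cost of effective compute
High performance must come with predictable results.

Status of High-Performance Networks in Data Centers
[Figure: end-to-end latency breakdown, static vs. dynamic (congestion). Surviving labels: 1 us, N us, ms, 0.5 us/100 m, queuing & scheduling.]

Key factors for static latency and throughput
Zero copy, data DMA
Protocol and congestion control (CC) hardened in hardware
Low per-hop latency and fewer hops
High bandwidth
Shorter path distance

Key factors for dynamic latency and throughput
Network semantics and operation scheduling
Efficient, precise flow-control algorithms
Better, finer-grained congestion feedback
Balanced load and optimized paths

[Figure labels: APP, protocol/CC, mem copy; NIC hardware offload; congestion control algorithm; physical network architecture; end-network cooperation; interconnect protocol; congestion scheduling.]
Typical distance 150 m; static latency about 20 us.

Traditional approach vs. end-network cooperation
[Figure: hosts (APP, Proto) connected through the network, with congestion and a link failure on the network path.]
Traditional approach: the end-side network stack is unaware of the state of the physical fabric. No navigation, blind driving.
End-network cooperation: end + network + full view. Navigation feedback, City Brain.
Congestion-level awareness (road congestion)
Fault-state awareness (road repair)
Path-information awareness (route selection in navigation)

Alibaba's data center network with end-network cooperation

Alibaba's high-performance predictable data center network practice
Real-time status awareness
Topological abstraction for a specific IO pattern
HPCC to optimize congestion control
RDMA

DC as a Computer, Network as IO
Low latency: full-stack optimization and offload; end-network cooperative flow control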
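
To make the "end-network cooperative flow control" item concrete, here is a minimal Python sketch of a telemetry-driven window update, loosely following the publicly described HPCC design. It is an illustration under stated assumptions, not Alibaba's implementation: the names `LinkSample`, `link_utilization`, and `next_window` and the defaults `eta=0.95` and `w_ai=1000.0` are hypothetical, and the sketch omits the smoothing, the per-RTT reference window, and the staged additive increase of the published algorithm.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinkSample:
    """One hop's telemetry record (field names are illustrative assumptions)."""
    tx_bytes: int      # cumulative bytes sent on the egress port
    qlen: int          # egress queue length, in bytes
    ts: float          # switch timestamp, in seconds
    bandwidth: float   # link capacity, in bytes per second


def link_utilization(prev: LinkSample, cur: LinkSample, base_rtt: float) -> float:
    """Estimate normalized in-flight bytes on one link from two samples."""
    tx_rate = (cur.tx_bytes - prev.tx_bytes) / (cur.ts - prev.ts)
    # Queued bytes are normalized by the bandwidth-delay product of the link.
    queue_term = min(prev.qlen, cur.qlen) / (cur.bandwidth * base_rtt)
    return queue_term + tx_rate / cur.bandwidth


def next_window(window: float,
                prev_path: Sequence[LinkSample],
                cur_path: Sequence[LinkSample],
                base_rtt: float,
                eta: float = 0.95,      # target utilization (assumed value)
                w_ai: float = 1000.0    # additive increase in bytes (assumed value)
                ) -> float:
    """Return the sender's next congestion window, in bytes."""
    # The most heavily loaded hop on the path decides the reaction.
    u = max(link_utilization(p, c, base_rtt)
            for p, c in zip(prev_path, cur_path))
    if u >= eta:
        # Over target: scale the window down in proportion to the overload.
        return window / (u / eta) + w_ai
    # Under target: probe upward additively.
    return window + w_ai


if __name__ == "__main__":
    prev = [LinkSample(tx_bytes=0, qlen=8000, ts=0.0, bandwidth=1.25e9)]
    cur = [LinkSample(tx_bytes=12_500, qlen=7500, ts=10e-6, bandwidth=1.25e9)]
    print(next_window(100_000.0, prev, cur, base_rtt=10e-6))
```

The point of the sketch is the division of labor: switches report what they see (bytes sent, queue depth, timestamps), and the end host turns that into a precise window adjustment instead of guessing from loss or delay alone.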
Fully meets the cluster IO interconnect needs of large-scale distributed systems such as compute, storage, big data, and AI
Performance SLA: the target shifts from an availability SLA to a performance SLA
Stable IO: network events are visible and detected; failover on anomalies within milliseconds

Alibaba's high-performance predictable data center network system
A predictable high-performance data center based on end-network cooperation
Integrated software and hardware: a fully in-house data center system
In-house protocol: Solar-RDMA
In-house flow control: HPCC
HAIL data center architecture: High Availability, Intelligence, and Low latency
Full-stack in-house development: in-house NICs, switches, and optical interconnects
Automation platform: NET
Visual monitoring: Beidou (北斗), telemetry

The future of predictable, high-performance data center networks
[Figure: CPU/xPU nodes under two TORs. Surviving labels: 2 m distance, 1 us; 100 m distance, 5n*10us; 2 m distance, 1 us; Ethernet; PCIe/CXL; path layer.]
End-network cooperation
Low-latency, predictable network services
Full-stack collaboration with integrated software and hardware
Predictable performance, delivered as a service

02 Predictable Network Quality

Network Monitoring in the Old Days
Focus on network outages
Passive, after-the-fact response
Lack of meaningful network quality metrics
Tools: SYSLOG, SNMP, PING, gRPC, CLI

Predictable Network Quality
Network quality is measured by path, latency, and packet loss.
Path: flow-level exact network path
Latency: nanosecond-level end-to-end and node-to-node latency measurement
Packet loss: key packet-loss information and root cause

Key Technologies
Metrics covered: path, latency, packet loss
Technologies: Precision Time Protocol (PTP), In-band Network Telemetry (INT), Mirror on Retransmission (MOR), Smartflow, Aggregate Drop, Hashlib

Key Technologies: INT
Flow-level exact network path
Node-to-node latency
Queue-level information
[Figure: path S-NC, ASW, PSW, DSW, PSW, ASW, D-NC, with a Collector. Labels: INT probe, INT metadata, original packet path, INT packet path, DSCP-marked original packet, INT bounce packet path, GRE.]

Key Technologies: PTP
Nanosecond-level end-to-end latency
NIC hardware timestamps remove protocol-stack jitter and improve accuracy

Key Technologies: MOR
Mirror retransmitted packets (Mirror on Retransmission)
[Figure: path S-NC, ASW, PSW, DSW, PSW, ASW, D-NC, with a Collector. Labels: Retrans Pkt, GRE.]

Future Plan
Full coverage of the physical network
Full coverage of the virtual network
More granular network SLA
From best effort to deterministic QoS

THANKS
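
Appendix sketch: the INT and PTP slides describe recovering a flow's exact path and its node-to-node latency from per-hop metadata. The Python sketch below shows one way a collector could do this, assuming each hop contributes a record with a timestamp taken from a synchronized clock. The record layout (`HopRecord`), the function names, and the sample values are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HopRecord:
    """Per-hop metadata as a collector might see it (hypothetical layout)."""
    node: str         # hop label, e.g. "ASW"
    ts_ns: int        # timestamp at this hop in ns; assumes synchronized clocks
    queue_depth: int  # queue occupancy reported by the hop (unit is assumed)


Segment = Tuple[str, str, int]  # (from node, to node, latency in ns)


def summarize_path(hops: List[HopRecord]) -> Tuple[List[str], List[Segment], int]:
    """Return the hop-by-hop path, per-segment latency, and end-to-end latency."""
    path = [h.node for h in hops]
    # Each segment latency is the difference between adjacent hop timestamps.
    segments = [(a.node, b.node, b.ts_ns - a.ts_ns)
                for a, b in zip(hops, hops[1:])]
    total = hops[-1].ts_ns - hops[0].ts_ns if hops else 0
    return path, segments, total


def slowest_segment(segments: List[Segment]) -> Segment:
    """Pick the segment that contributes the most latency."""
    return max(segments, key=lambda s: s[2])


if __name__ == "__main__":
    # Illustrative values only; node labels follow the figure on the INT slide.
    hops = [
        HopRecord("S-NC", 0, 0),
        HopRecord("ASW", 900, 2),
        HopRecord("PSW", 2100, 0),
        HopRecord("DSW", 5600, 40),
        HopRecord("D-NC", 6400, 0),
    ]
    path, segments, total = summarize_path(hops)
    print(" -> ".join(path))
    print(slowest_segment(segments), total)
```

Differences between adjacent timestamps are only meaningful if the clocks agree, which is why the deck pairs INT with PTP and hardware timestamps.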