Enabling AI Infrastructure (PDF, 23 pages)
Broadcom Proprietary and Confidential. Copyright 2024 Broadcom. All Rights Reserved. The term "Broadcom" refers to Broadcom Inc. and/or its subsidiaries.

Ethernet for AI Scale
Hasan Siraj, Head of Software and AI Infrastructure Products, Broadcom

Exponential Acceleration of Compute for AI
CPU: optimized for serial tasks; multiple cores
GPU: optimized for parallel tasks; thousands of cores
GPU clusters: scale-up GPU network for AI workloads; tens of thousands of GPUs

THE NETWORK IS THE COMPUTER

"Network I/O is Key for Recommendation Workloads."
Source: OCP keynote by Alexis Bjorlin at the 2022 OCP Global Summit
Ranking requires high injection and bisection bandwidth
Time spent in networking (M# = ML model #): 35%, 57%, 18%, 38%
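
The networking-time percentages on this slide hint at how much a faster fabric could shorten a job. The sketch below is a back-of-the-envelope model, not something taken from the keynote: it assumes communication is not overlapped with compute, and the 2x network speedup and the function name `job_time_ratio` are hypothetical values chosen only for illustration.

```python
def job_time_ratio(network_fraction: float, network_speedup: float) -> float:
    """Relative job time after speeding up only the networking portion.

    Assumes (hypothetically) that compute and communication do not overlap,
    so total time = compute time + network time.
    """
    compute_fraction = 1.0 - network_fraction
    return compute_fraction + network_fraction / network_speedup


# Percentages as listed on the slide; the 2x speedup is an illustrative assumption.
for pct in (35, 57, 18, 38):
    ratio = job_time_ratio(pct / 100, network_speedup=2.0)
    print(f"{pct}% of time in networking -> job time x{ratio:.3f} with a 2x faster network")
```

Under this simplified model, the workloads that spend the largest share of time in networking benefit the most from a faster fabric.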

What Makes AI Networks Unique?
Compute, Communicate, Synchronize
Very high bandwidth RDMA traffic
Bulk data transfers
Intermittent data surges
Straggler data significantly impacts job completion time
Training jobs run over long durations (hours, days)

Broadcom's AI Network Solutions
Switch scheduled: Jericho3-AI (leaf) and Ramon (spine), connecting GPUs
Endpoint scheduled: Tomahawk 5 (leaf) and Tomahawk 5 (spine), connecting GPUs

Jericho3-AI Fabric: Switch Scheduled Ethernet Network
32,000 AI accelerators at 800 Gbps each
Lowest time spent in networking
10% performance improvement = network more than pays for itself
[Diagram: AI accelerators attached to Jericho3-AI devices across the Jericho3-AI fabric]
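
The two Jericho3-AI figures can be turned into quick arithmetic. The sketch below multiplies out the stated scale (32,000 accelerators at 800 Gbps each) and expresses the "pays for itself" claim as a simple inequality. The cost comparison is an assumption made for illustration: it treats a performance gain as worth the same fraction of total cluster cost, and the 8% network cost share in the usage line is hypothetical, not a figure from the slide.

```python
ACCELERATORS = 32_000          # from the slide
GBPS_PER_ACCELERATOR = 800     # from the slide


def aggregate_injection_pbps(accelerators: int, gbps_each: int) -> float:
    """Total injection bandwidth in Pbps (1 Pbps = 1,000,000 Gbps)."""
    return accelerators * gbps_each / 1_000_000


def network_pays_for_itself(network_cost_share: float, performance_gain: float) -> bool:
    """Hypothetical break-even test: the gain exceeds the network's share of cluster cost."""
    return performance_gain > network_cost_share


print(aggregate_injection_pbps(ACCELERATORS, GBPS_PER_ACCELERATOR), "Pbps aggregate")
# Illustrative only: an assumed 8% network cost share against the slide's 10% gain.
print(network_pays_for_itself(network_cost_share=0.08, performance_gain=0.10))
```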

Tomahawk: Endpoint Scheduled Ethernet Network
Relentless, unmatched advancement: 80x bandwidth increase, 90+% energy consumption reduction
[Chart: switch bandwidth by generation, 2010 to 2022. Device labels: Trident2, Tomahawk, Tomahawk 5. Bandwidths: 640G, 1.28T, 3.2T, 6.4T, 12.8T, 25.6T, 51.2T]
*G = Gbps, T = Tbps

Introducing THOR2: AI Optimized NIC
400G high-performance NIC
High-scale RDMA
Longest reach 100G SerDes
Industry's lowest power
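
The 80x figure can be checked directly against the bandwidth values on the chart. The short sketch below does that arithmetic; the values are the ones listed on the slide, converted to Gbps, while the function names are just illustrative helpers.

```python
# Switch bandwidth values read off the chart, converted to Gbps (T = 1,000 G).
GENERATIONS_GBPS = [640, 1_280, 3_200, 6_400, 12_800, 25_600, 51_200]


def total_increase(values: list[int]) -> float:
    """Ratio between the last and the first value in the series."""
    return values[-1] / values[0]


def step_ratios(values: list[int]) -> list[float]:
    """Ratio between each value and the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


print("Overall increase:", total_increase(GENERATIONS_GBPS))
print("Step-by-step ratios:", step_ratios(GENERATIONS_GBPS))
```

Most steps are a doubling; the step from 1.28T to 3.2T is the one exception at 2.5x, which is how six steps reach 80x instead of 64x.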

THOR2 Consumption Models
Board
Chiplet
IP

End-to-End High Performance Ethernet AI Network
Perfectly load balanced fabric
Programmable E2E congestion control
Zero Impact Failover (ZIF)
Secure and multi-tenant fabric
Efficient interconnects
Largest cluster scale
[Diagram: GPUs with THOR2 NICs attached to the Ethernet AI network]

Ethernet Beats InfiniBand: 10+% Improvement in Job Completion Time
[Chart: bus bandwidth (GBps, axis from 90 to 130) versus message size (16 MB to 1024 MB) for Ethernet and InfiniBand]

Ethernet Provides 30x Faster Failover than InfiniBand
Optics annual failure rate*: 2%
Failures per month**: 15
Fast recovery reduces job completion time
Recovery time: Ethernet 53 microseconds, InfiniBand 1600 microseconds (30x reduction)
*Typical industry failure rate.
**Assuming 4K node cluster using 9.2K optic modules.
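
The failover numbers on this slide are consistent with each other, as the arithmetic below shows. All inputs come from the slide (9.2K optic modules, a 2% annual failure rate, and recovery times of 53 and 1600 microseconds); the even spread of failures across twelve months is a simplifying assumption, and the function names are illustrative.

```python
OPTIC_MODULES = 9_200            # "9.2K optic modules" in the footnote
ANNUAL_FAILURE_RATE = 0.02       # 2% per year
ETHERNET_RECOVERY_US = 53        # microseconds
INFINIBAND_RECOVERY_US = 1_600   # microseconds


def failures_per_month(modules: int, annual_rate: float) -> float:
    """Expected optic failures per month, assuming failures spread evenly over the year."""
    return modules * annual_rate / 12


def recovery_ratio(slow_us: float, fast_us: float) -> float:
    """How many times longer the slower recovery takes."""
    return slow_us / fast_us


print(f"Failures per month: {failures_per_month(OPTIC_MODULES, ANNUAL_FAILURE_RATE):.1f}")
print(f"Recovery ratio: {recovery_ratio(INFINIBAND_RECOVERY_US, ETHERNET_RECOVERY_US):.1f}x")
```

Both results round to the figures quoted on the slide: about 15 failures per month and a roughly 30x difference in recovery time.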

Reducing AI Interconnect Cost and Power
Extended reach for copper cables: 4m+ DAC (2x IEEE spec)
Linear pluggable optics: 33% lower power
Co-packaged optics: lowest power and cost

Ethernet Is the De Facto AI Network

Large Ethernet AI Clusters
[Figure: cluster sizes of 60,000+, 30,000+, 10,000+, and 20,000+; the labels are not recoverable]

New AI Models: 100X+ Scale Distributed Computing
1 million AI accelerators

Ultra Ethernet: AI at Scale
Incredibly strong industry reception: 55+ companies

Modernizing RDMA
Classic RDMA limitation -> modernized behavior:
In-order packet delivery -> Out-of-order placement, in-order message completion
Go-back-N inefficient -> Selective ack and retransmit
No multipathing -> Packet-level multipathing
DCQCN hard to tune -> Configuration-free congestion control
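
The contrast between go-back-N and selective retransmit can be made concrete with a toy count of retransmitted packets. This is a simplified sketch, not a model of any real RDMA transport: it assumes the sender has already transmitted every packet in the burst before it learns of the loss, that retransmissions themselves are never lost, and the function names are hypothetical.

```python
def go_back_n_resends(packets_sent: int, lost: set[int]) -> int:
    """Go-back-N: rewind to the first lost packet and resend everything after it."""
    if not lost:
        return 0
    return packets_sent - min(lost)


def selective_resends(packets_sent: int, lost: set[int]) -> int:
    """Selective retransmit: resend only the packets that were actually lost."""
    return len(lost)


# Illustrative burst: 64 packets (indices 0..63), with packet 3 lost.
burst, lost_packets = 64, {3}
print("go-back-N resends:", go_back_n_resends(burst, lost_packets))
print("selective resends:", selective_resends(burst, lost_packets))
```

In this toy case a single early loss forces go-back-N to resend almost the whole burst, which is the inefficiency the slide points to.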

Ethernet for AI Networks
Pervasively deployed, open and standards-based technology
Highest RDMA performance for AI fabrics
Lowest cost compared to proprietary technologies
Provides deployment consistency across front-end, back-end, storage and management networks: no technology islands
Highly available, reliable and easy to use
Broad silicon, hardware, software, automation, monitoring and debugging tools ecosystem: staffing and operational skills widely available

Ethernet Network for AI
[Diagram: accelerators attached to the Ethernet AI network: MTIA, XPU, MAIA, MI, Gaudi, GPU, Dojo, Trainium, Inferentia]

Comprehensive Ethernet Portfolio and Ecosystem
Network control and automation
Operating system
Hardware platform
Jericho, Trident, Tomahawk, Thor