OCP Global Summit | October 18, 2023 | San Jose, CA
Presented by Matthew Williams, CTO, Rockport Networks, now Cerio

The Future of AI/ML Innovation Is Row-Scale Disaggregation

Acceleration and memory are key for AI/ML innovation, growth and profitability.
The decades-old monolithic system model traps GPUs and other resources inside servers.

The Problem: Closed ecosystems, low GPU utilization, and operational complexity at scale

Traditional Systems: GPU "Capacity Trap"

Open Systems GPU Disaggregation

[Diagram labels: Underlay Fabric; Server Pool (100s); Device Pool (100s); PCIe; Device Enclosures (100s of devices: GPU, NVMe, TPU); Servers (100s); Fabric Nodes; SHFLs; Logical and Physical views; Fabric Manager (Discovery, Policy); IT Service Management & Orchestration]

Any device from any vendor, best fit
Commodity components
Linear scale, highly resilient
Software-based repair

Open Systems Platform

Services: PCIe Services (Device Composition); Ethernet Services (Layer 2 Switching); CXL Services (Advanced Memory)
Layers: Overlay Services (Logical); Adaptation Services; Composable Services (Logical); Underlay Fabric (Physical); Optical Interconnect (Physical)
Capabilities: Resources; Capacity Calibration; Deadlock-Free Routing; Link Reliability; FLIT Switching; Topology Discovery; Adaptive Multipath; Ultra High Priority; Software-based Repair; Application Acceleration; Dynamic Attachment; Rapid Integration; Reassembly; E2E Reliability; Class of Service; Open APIs; Segmentation
Platform attributes: Topology Agnostic; Passive Cabling; OTS Optics; Use Case-optimized

Rockport Fabric Node in Host

Full PCIe Gen 5 compatibility
PCIe TLPs are segmented into FLITs
The FLIT switch forwards the FLITs across multiple optical paths

[Diagram labels: PCIe TLPs, x16; QSFP-DD (8 links) ×2]

Up to 32 devices
PCIe hierarchy enumerated by host

[Diagram: Virtual PCIe Switch with one upstream port; each downstream port holds a placeholder for a remote device]
[Diagram: Virtual PCIe Switch (UP, DP, Dev) with a placeholder for a remote device]
[Diagram: Virtual PCIe Switch with one upstream port; two downstream ports hold the pseudo-devices GPU 1 and GPU 2, and the remaining downstream ports hold placeholders for remote devices]

Performance per Dollar

53% lower cost per GPU than highly dense specialized servers
50% more GPUs at 34% lower cost than highly dense specialized servers
Eliminates stranded assets and maximizes GPU efficiency

[Chart: Disaggregated GPU Capacity/Cost Value]

Learn more about the Cerio open systems platform
mattcerio.io
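
The "Rockport Fabric Node in Host" slide states that PCIe TLPs are segmented into FLITs, forwarded across multiple optical paths, and (per the capability list) reassembled. The following Python sketch illustrates that general idea only. The FLIT size, the header fields (`tlp_id`, `seq`, `last`), the round-robin path selection, and the choice of 8 paths are all assumptions made for illustration; the slides do not describe the actual FLIT format or path-selection policy.

```python
from dataclasses import dataclass
from typing import Dict, List

FLIT_PAYLOAD_BYTES = 64  # hypothetical; the slides do not give a FLIT size


@dataclass(frozen=True)
class Flit:
    tlp_id: int     # which TLP this fragment belongs to (hypothetical field)
    seq: int        # position of the fragment within the TLP
    last: bool      # True only on the final fragment
    payload: bytes


def segment(tlp_id: int, tlp: bytes, size: int = FLIT_PAYLOAD_BYTES) -> List[Flit]:
    """Split one TLP into ordered FLITs of at most `size` payload bytes."""
    chunks = [tlp[i:i + size] for i in range(0, len(tlp), size)] or [b""]
    return [
        Flit(tlp_id, seq, seq == len(chunks) - 1, chunk)
        for seq, chunk in enumerate(chunks)
    ]


def spray(flits: List[Flit], num_paths: int) -> Dict[int, List[Flit]]:
    """Assign FLITs to paths round-robin (a simple stand-in for multipath)."""
    paths: Dict[int, List[Flit]] = {p: [] for p in range(num_paths)}
    for i, flit in enumerate(flits):
        paths[i % num_paths].append(flit)
    return paths


def reassemble(received: List[Flit]) -> bytes:
    """Rebuild the TLP from FLITs that may have arrived in any order."""
    ordered = sorted(received, key=lambda f: f.seq)
    complete = (
        bool(ordered)
        and ordered[-1].last
        and [f.seq for f in ordered] == list(range(len(ordered)))
    )
    if not complete:
        raise ValueError("missing or duplicated FLITs")
    return b"".join(f.payload for f in ordered)


if __name__ == "__main__":
    tlp = bytes(range(200))
    flits = segment(tlp_id=1, tlp=tlp)
    paths = spray(flits, num_paths=8)  # 8 paths is an assumption
    # Simulate out-of-order arrival by draining the paths in reverse order.
    arrived = [f for p in sorted(paths, reverse=True) for f in paths[p]]
    assert reassemble(arrived) == tlp
    print(len(flits), "FLITs reassembled into", len(tlp), "bytes")
```

Carrying a sequence number in each fragment is what lets the receiver tolerate different latencies on different paths. A real fabric would also need per-FLIT integrity checks and retransmission, which this sketch omits.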