Agnostic Software for AI Workloads: Decoupling Software from Hardware (PDF)

PDF, 18 pages, 5.11 MB


Slide 1: Agnostic Software for AI Workloads
Decoupling Software from Hardware

Slide 2: Change is Coming
Adapt or

Slide 3: Workloads are Evolving at an Accelerated Pace
Observations:
- Models are increasing in size exponentially, with no foreseeable slowdown
- Network architectures are changing rapidly
- Transformers are not the only solution that will exist
- The balance between training and inference is 50/50, even at scale
Insight: The progression from ML to AI to agents is not slowing down.

Slide 4: Machine Learning is the Gateway to Autonomous Agents
- Software 1.0 (1957): Rule-based and deterministic
- Software 2.0 (2011): Data-driven and discriminative
- Software 3.0 (2018): Data-driven and generative
- Software 4.0 (2024): Goal-directed agents
- Software 5.0 (2030): Semi-autonomous agents
- Software 6.0 (2036): General-purpose agents

Slide 5: Can Hardware Architecture Evolution Keep Up?
Observations:
- Industry is hungry for a GPU alternative with better efficiency
- We've reached the limit of physics for improved efficiency
- Even at maximum capacity, fabs cannot supply enough hardware, GPU or otherwise
- Physical build-out of data centers can only provide 10% of the capacity we need
- Novel hardware architectures could help, but software takes a decade to catch up
Insight: There is enough capacity in today's compute architectures if we could maximize utilization.

Slide 6: If Not Specialized Hardware, Then What?
What we have:
- Build-out provides insufficient compute and is cost-prohibitive
- Models gain adoption based on how well they map to GPUs
- Increased reliance on GPUs leaves more and more compute elsewhere unused
- Barriers to adopting any alternative to GPUs are high
What we need:
- Heterogeneous compute is necessary to meet the compute demand of AI
- Automated performance optimization on heterogeneous clusters
- Architectures and compilers designed in the context of a cluster unit
Gap: 6 orders of magnitude (by 2032)

Slide 7: Can Software Tools and Performance Engineering Bridge the Gap?
Observations:
- Software tools lagged 10 years behind the advent of parallel computing
- Discrete compilers for each architecture are ineffective at maximizing utilization
- The onus of performance and hardware efficiency falls on a finite pool of performance engineers
- Finite resources running a slow, tedious performance-engineering process cannot keep pace with workloads and models
- The hardware architecture that wins is the one that is most easily programmable
Insight: Efficient utilization requires software innovation.

Slide 8: Generality is Necessary for Scale

Slide 9: Creating Scalability by Focusing on Compute Clusters (Generality)
Current state:
- Software infrastructure and tools lack portability, which limits adoption of new hardware architectures
- Evaluating and adopting a new hardware architecture depends on the availability of a massive kernel library
- The AI hardware industry is mimicking disconnected elements of the HPC, cloud, and commodity industries
Future state:
- Scalable abstractions that span the unit of compute, from a component to a cluster, to create more opportunities
- Software should mimic elements of the HPC, cloud, and commodity industries
- Replace platform-specific software with a universal software stack

Slide 10: Hardware Lock-In Caused by Compilers
- The Missing Middle End: There is no common interchange for mapping compute IR to hardware IR.
- Explosion of Kernels: Optimally transformed graphs introduce an excessive reliance on hand-written kernels.
- The World is Dynamic: Workloads and hardware each contribute to execution-time variability, so static scheduling introduces overhead.
- The Operators are Not the Kernels: The operators expressed in applications imply false memory barriers.
Consequences: increased overhead, limited performance, limited utilization, and increased vendor lock-in.

Slide 11: Easy to Use Portable Performance

- Automatically fuses and partitions compute graphs
- Automatically lowers hardware-specific custom kernel code to machine code
- Controls dynamic variance in execution time to directly manage memory, power, and utilization

Slide 12: Bind Compute IR and Hardware IR (Compiler Innovation #1: The Operators are Not the Kernels)
- Enumerate any specific target hardware into a hardware intermediate representation
- Precisely optimize workloads for the target hardware throughout the full stack
- Comprehend memory and compute boundaries from the frontend onward
- Reduce overhead with a generic framework that makes hardware easy to enumerate

Slide 13: Parallel-Programming-Pattern Intermediate Representation (Compiler Innovation #2: The Missing Middle End)
- Represents workloads holistically
- Includes dynamic shapes, dynamic control flow, and serial code
- Maintains high-level semantic information
- Supports idiomatic graph transformation
- Delivers aggressive and comprehensive optimization to achieve performance

Slide 14: Procedural Kernel-Code Generation (Compiler Innovation #3: Explosion of Kernels)
- Comprehensive graph transformation requires support for any kernel
- Uses high-level semantic information for optimal lowering
- Leverages general HPC techniques that incorporate hardware-architecture insight
- Exceeds the performance of expert "kernel ninjas" and creates portability

Slide 15: Heterogeneous Hierarchical Dynamic Runtime (Compiler Innovation #4: The World is Dynamic)
- Directly scheduled data movement for performance and power consumption
- Takes advantage of load-balance asymmetry
- Protects against NUMA effects and dark silicon
- Achieves higher utilization

Slide 16: Improve Performance by 10x
- Portability: Frictionless migration, reducing barriers to hardware adoption
- Performance: Reduced latency, reduced energy, and increased throughput
- Productivity: Achieve the desired performance quickly, without expert performance engineering
- Scalability: Compiler abstractions and architecture designed to scale out to cluster-level compute

Slide 17: Scale Requires Heterogeneity
Heterogeneity Requires Full-Stack Innovations

Slide 18: Thank You
Jay Dawani
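The slide 10 claim that "the operators expressed in applications imply false memory barriers," together with the slide 11 promise to automatically fuse compute graphs, can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the presenter's compiler: every operator is assumed to be an elementwise scalar function, a "kernel" is modeled as one pass over a Python list, and `run_unfused`, `fuse`, and `run_fused` are hypothetical names. Running one kernel per operator writes a full intermediate buffer after every step; composing the chain into a single kernel yields the same values with one output buffer.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model: an elementwise operator maps one scalar to one scalar.
Op = Callable[[float], float]


def run_unfused(ops: Sequence[Op], data: List[float]) -> Tuple[List[float], int]:
    """Run each operator as its own 'kernel'.

    Every kernel reads the whole input buffer and writes a whole output
    buffer before the next one starts: a memory barrier imposed by the
    operator boundary, not by the math.
    """
    buffers_written = 0
    current = data
    for op in ops:
        current = [op(x) for x in current]  # materialize the intermediate
        buffers_written += 1
    return current, buffers_written


def fuse(ops: Sequence[Op]) -> Op:
    """Compose a chain of elementwise operators into one scalar function."""
    def fused(x: float) -> float:
        for op in ops:
            x = op(x)
        return x
    return fused


def run_fused(ops: Sequence[Op], data: List[float]) -> Tuple[List[float], int]:
    """Run the whole chain as a single 'kernel': one pass, one output buffer."""
    kernel = fuse(ops)
    return [kernel(x) for x in data], 1


if __name__ == "__main__":
    chain: List[Op] = [lambda x: 2.0 * x, lambda x: x + 1.0, math.tanh]
    inputs = [0.0, 0.5, -1.0]
    unfused, n_unfused = run_unfused(chain, inputs)
    fused, n_fused = run_fused(chain, inputs)
    print(unfused == fused, n_unfused, n_fused)  # prints: True 3 1
```

Real fusion must also respect reductions, data layout, and on-chip memory capacity, all of which this sketch ignores; it only shows why operator boundaries need not be kernel boundaries.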

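Slide 10 says execution-time variability makes static scheduling costly, and slide 15 lists exploiting load-balance asymmetry as a goal of a dynamic runtime. The Python sketch below contrasts a static round-robin assignment with a dynamic pull-based scheme (classic greedy list scheduling: whichever device becomes idle first takes the next task) on two devices of unequal speed. The device speeds, task sizes, and the names `static_round_robin` and `dynamic_pull` are illustrative assumptions, not figures or APIs from the talk, and the dynamic policy is not claimed to be what the presenter's runtime does.

```python
import heapq
import random
from typing import List, Sequence, Tuple


def static_round_robin(work: Sequence[float], speeds: Sequence[float]) -> float:
    """Fix the assignment ahead of time: task i always goes to device i % n.

    Returns the makespan (time until the last device finishes).
    """
    busy = [0.0] * len(speeds)
    for i, w in enumerate(work):
        d = i % len(speeds)
        busy[d] += w / speeds[d]  # run time = work / device speed
    return max(busy)


def dynamic_pull(work: Sequence[float], speeds: Sequence[float]) -> float:
    """Decide at run time: the device that becomes idle first takes the
    next task (greedy list scheduling). Returns the makespan."""
    # Min-heap of (time the device becomes free, device index).
    idle: List[Tuple[float, int]] = [(0.0, d) for d in range(len(speeds))]
    heapq.heapify(idle)
    for w in work:
        free_at, d = heapq.heappop(idle)
        heapq.heappush(idle, (free_at + w / speeds[d], d))
    return max(t for t, _ in idle)


if __name__ == "__main__":
    speeds = [4.0, 1.0]  # hypothetical: one fast accelerator, one slow device
    rng = random.Random(0)
    # Task sizes vary at run time; a fixed plan cannot react to this.
    work = [rng.uniform(0.5, 1.5) for _ in range(64)]
    print(f"static makespan:  {static_round_robin(work, speeds):.2f}")
    print(f"dynamic makespan: {dynamic_pull(work, speeds):.2f}")
```

With eight equal tasks of size 1.0 on devices of speed 4.0 and 1.0, round-robin forces the slow device to finish four tasks (makespan 4.0), while the pull-based schedule lets the fast device absorb most of them (makespan 2.0). Greedy pulling is not optimal in general; it only adapts to run-time conditions that a fixed plan cannot see.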