The State of AI Infrastructure at Scale 2024
Unveiling Future Landscapes, Key Insights, and Business Benchmarks
Presented by ClearML in partnership with the AI Infrastructure Alliance and FuriosaAI

The Artificial Intelligence (AI) sector has experienced an unprecedented surge in recent years. As the applications and use cases of Generative AI expand and more companies shift from research and evaluation to production, we expect the growing need for robust infrastructure and computational capabilities to drive intense market demand.

According to new research published by Allied Market Research and reported by CIO News1, the trajectory of the global AI infrastructure market reflects this burgeoning demand. The report notes that the AI infrastructure market was valued at $23.5 billion in 2021 and is estimated to soar to an astounding $309.4 billion by 2031, growing at a Compound Annual Growth Rate (CAGR) of 29.8% from 2022 to 2031.

One of the primary drivers of growth in the AI infrastructure market is the realization among enterprises of how AI can elevate their operational efficiency and productivity, as well as expand revenue and reduce costs, through the automation and orchestration of AI/ML workflows. At the same time, the wide array of new AI infrastructure tools leaves prospective buyers grappling with the challenge of sorting through their critical AI infrastructure business needs for scaling AI into production. That may be why prior research conducted in 2023 indicated that only 5-10% of enterprises had started to move Gen AI into production.

As organizations navigate the AI infrastructure market, they are actively seeking clarity on AI/ML platforms that can support scale while optimizing their compute utilization. To deliver that clarity, ClearML, along with the AI Infrastructure Alliance and FuriosaAI, conducted a global AI infrastructure research survey of AI/ML and technology leaders at 1,000 companies across multiple geographies (North America, Europe, and Asia Pacific) and various company sizes. We wanted to explore and map out new market trends and insights after the first year of mainstream Generative AI adoption.

One thing is clear: scalable AI infrastructure is crucial for global businesses commercializing AI, as it ensures that their AI systems can handle growing computational demands and AI workloads. That is why ClearML recently announced newly enhanced orchestration and scheduling capabilities, along with GPU partitioning and MIG capabilities that drive maximal GPU utilization, and ClearGPT, an enterprise-grade unified platform offering state-of-the-art LLMs. Meanwhile, FuriosaAI offers its first-gen NPU, WARBOY, and a next-gen LLM-optimized NPU with High Bandwidth Memory 3 (HBM3) for cutting-edge distributed inference.

1. https://
Both companies provide solutions that allow organizations to develop and host LLMs in a cost-efficient, performant way, tailored to the organization's internal data and running securely on its own network to power enterprise AI transformation. As global startups that believe in providing companies with hardware awareness and clarity into the AI/ML space, we partnered with the AI Infrastructure Alliance (AIIA), a nonprofit whose mission is to help the AI/ML community make informed decisions about its AI infrastructure. Together, we looked at how AI and technology leaders are approaching the build-out of their AI infrastructure, the key challenges and considerations they face, and how they rank priorities when evaluating AI infrastructure solutions against their current needs and business use cases.

This survey is ClearML's third global AI research survey, following two previous surveys that covered Generative AI adoption and the hidden costs, challenges, and TCO of Gen AI adoption in the enterprise. In the first survey, 1,000 executives and tech leaders (CTOs, VPs of AI, Chief Data and Analytics Officers, and others) from Fortune 1000 companies reported wasted opportunities and missed financial goals as a result of poor AI/ML operationalization or commercialization1, with some incurring losses of more than $200 million2. In our second survey, when asked about key Generative AI cost drivers, the top response was tools, systems, and infrastructure integration costs, followed by the GPU and compute costs of model development and training3. Despite these challenges, 56.8% of companies surveyed expected double-digit revenue increases from AI/ML investments and enterprise AI transformation in 2024.

Based on these recent surveys, industry metrics, and data from the AI Infrastructure Alliance, we expect that compute infrastructure, especially AI chips, will continue to be in high demand as Generative AI and the number of Large Language Models (LLMs) in production increase at scale.
Optimizing a company's current tech stack to maximize existing compute resources is an efficient way to get more for less, but AI leaders and executives also need to look ahead to future-proof technology that is flexible and scalable enough to support future AI/ML compute needs.

When thinking about AI/ML in production, model training, where models are trained on a training data set and learn how to make sense of data, is just one part of a more holistic workflow. Inference is a key part of moving AI into production: it involves taking a trained machine learning (ML) model and using it to make real-time predictions or solve tasks. Inference powers use cases across a multitude of industries, including healthcare, automotive, and telecommunications. With the demand for Large Language Model (LLM)-powered products, inference can power real-time answers for enterprise end users.
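To make the training/inference distinction concrete, here is a minimal sketch in PyTorch. The tiny model, random data, and hyperparameters are purely illustrative assumptions, not a reference to any respondent's workload.

```python
import torch
import torch.nn as nn

# A toy classifier standing in for any production model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# --- Training: fit the model to a labeled data set ---
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
X = torch.randn(64, 4)            # toy features
y = torch.randint(0, 2, (64,))    # toy labels
for _ in range(10):               # a few gradient steps
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# --- Inference: use the trained model for real-time predictions ---
model.eval()                      # switch off training-only behavior
with torch.no_grad():             # no gradients needed to serve predictions
    prediction = model(torch.randn(1, 4)).argmax(dim=1)
print(prediction.item())
```

Training is compute-hungry but episodic; inference like the last few lines runs continuously once a model is in production, which is why it dominates so much of the infrastructure discussion that follows.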
In this report, we share our global AI infrastructure survey results, including 1) respondents' compute infrastructure growth plans, 2) their experience with current scheduling and compute solutions, and 3) model and AI framework use and plans for 2024. Read on to dive into the key findings!

1. Page 10 and/or 14 of AIIA and ClearML, Enterprise Generative AI Adoption: C-Level Key Considerations, Challenges, and Strategies for Unleashing AI at Scale, https://go.clear.ml/new-research-report-on-enterprise-generative-ai-adoption
2. Page 16 of AIIA and ClearML, ibid.
3. Page 13 of AIIA and ClearML, The Hidden Costs, Challenges, and Total Cost of Ownership of Generative AI Adoption in the Enterprise, https://go.clear.ml/the_hidden_costs_challenges_and_tco_of_gen_ai_adoption
12 KEY FINDINGS

96% of companies plan to expand their AI compute capacity and investment, with availability, cost, and infrastructure challenges weighing on their minds. Nearly all respondents (96%) plan to expand their AI compute infrastructure, with 40% considering more on-premise and 60% considering more cloud, and they are looking for flexibility and speed. The top concern for cloud compute is wastage and idle costs. When asked about challenges in scaling AI for 2024, compute limitations (availability and cost) topped the list, followed by infrastructure issues; respondents felt they lacked automation or did not have the right systems in place. The biggest concern for deploying Generative AI was moving too fast and missing important considerations (e.g., prioritizing the wrong business use cases). The second-ranked concern was moving too slowly due to a lack of ability to execute.

A staggering 74% of companies are dissatisfied with their current job scheduling tools and face resource allocation constraints regularly, while limited on-demand, self-serve access to GPU compute inhibits productivity. Job scheduling capabilities vary, executives are generally not satisfied with their job scheduling tools, and respondents report that productivity would dramatically increase if real-time compute could be self-served by data science and machine learning (DSML) team members. 74% of respondents see value in having compute and scheduling functionality as part of a single, unified AI/ML platform (instead of cobbling together an AI infrastructure tech stack of stand-alone point solutions), but only 19% of respondents actually have a scheduling tool that supports the ability to view and manage jobs within queues and effectively optimize GPU utilization. Respondents reported varying levels of scheduling functionality and features, led by quota management (56%) and followed by dynamic multi-instance GPUs/GPU partitioning (42%) and the creation of node pools (38%). 65% of companies surveyed use a vendor-specific solution or cloud service provider for managing and scheduling their AI/ML jobs, 25% of respondents use Slurm or another open source tool, and 9% use Kubernetes alone, which does not support scheduling capabilities. 74% of respondents report feeling dissatisfied or only somewhat satisfied with their current scheduling tool. The share of DSML practitioners able to self-serve compute resources independently and manage job scheduling hovers between 22% and 27%. However, 93% of survey respondents believe that their AI team's productivity would substantially increase if real-time compute resources could be self-served easily by anyone who needed them.

The key buying factor for inference solutions is cost. To address GPU scarcity, approximately 52% of respondents reported actively looking for cost-effective alternatives to GPUs for inference in 2024, compared to 27% for training, signaling a shift in AI hardware usage. Yet one-fifth of respondents (20%) reported that they were interested in cost-effective alternatives to GPUs but were not aware of existing alternatives. This indicates that cost is a key buying factor for inference solutions, and we expect that, since most companies have not reached Gen AI production at scale, demand for cost-efficient inference compute will grow.

The biggest compute challenges were latency, followed by access to compute and power consumption. Latency, access to compute, and power consumption were consistently ranked as the top compute concerns across all company sizes and regions. More than half of respondents plan to use LLMs (Llama and Llama-like models) in their commercial deployments in 2024, followed by embedding models (BERT and family) at 26%. Mitigating compute challenges will be essential to realizing these aspirations.

Optimizing GPU utilization is a major concern for 2024-2025, with the majority of GPUs underutilized during peak times. 40% of respondents, regardless of company size, are planning to use orchestration and scheduling technology to maximize their existing compute infrastructure. When asked about peak periods of GPU usage, 15% of respondents report that less than 50% of their available and purchased GPUs are in use, 53% believe 51-70% of GPU resources are utilized, and 25% report utilization of 71-85%. Only 7% of companies believe their GPU infrastructure achieves more than 85% utilization during peak periods. When asked about current methods for managing GPU usage, respondents employ queue management and job scheduling (67%), multi-instance GPUs (39%), and quotas (34%). Tools for optimizing GPU allocation between users include open source solutions (24%), HPC solutions (27%), and vendor-specific solutions (34%); another 11% use Excel, and 5% have a home-grown solution. Only 1% of respondents do not maximize or optimize their GPU utilization.

Open Source AI solutions and model customization are top priorities, with 96% of companies focused on customizing primarily Open Source models. Almost all executives (95%) reported that having and using external Open Source technology solutions is important for their organization. In addition, 96% of companies surveyed are currently customizing or planning to customize Open Source models in 2024, with Open Source frameworks having the highest adoption globally. PyTorch was the leading framework for customizing Open Source models, with 61% of respondents using PyTorch, 43% using TensorFlow, and 16% using JAX. Approximately one-third of respondents currently use or plan to use CUDA for model customization.
DEMOGRAPHY

We surveyed a mix of company sizes: 20% of respondents work in companies with 500-2,000 employees, 25% in companies with 2,001-10,000 employees, and a majority of 55% in enterprises with more than 10,000 employees. We included larger companies due to our hypothesis that they would have higher AI infrastructure maturity and thus be best suited to share relevant experiences in the Generative AI space.

We talked primarily with AI/ML and technology leadership and team leads, with job titles such as CTO, Head of AI, VP of Data, or VP of Artificial Intelligence. We also included Directors of Engineering and Heads of Data Science. Our results therefore primarily represent the C-suite and senior engineers with decision-making power.

We targeted major economies in North America, Europe, and Asia-Pacific, and included a large range of verticals, including manufacturing, telecommunications, energy, food, healthcare, legal, and more. The largest representation came from Information Technology companies, but no vertical surveyed represented more than 8% of total respondents.
[Chart: Title/Position of Surveyed Respondents]
[Chart: Industries Represented by Respondents]
[Chart: Headquarters of Responding Businesses]
[Chart: Company Size: 500-2,000 employees (20%), 2,001-10,000 (25.1%), more than 10,000 (54.9%)]
[Chart: Geographic Regions of Headquarters: North America, EMEA, APAC]
QUESTIONS AND INSIGHTS

SECTION I - Planning for 2024: Key Drivers of Expanding AI Infrastructure & Challenges in Scaling AI and Compute

1) What are your biggest challenges in scaling AI at your organization?
Companies' biggest challenge in scaling AI for 2024 is compute limitations (availability and cost); it is the top-ranked issue for 32% of respondents. The next biggest challenge was infrastructure issues, the top-ranked challenge for 27% of respondents and the second-ranked challenge for 29%. Respondents felt they lacked automation or did not have the right systems in place for scale.

[Chart: Ranked challenges in scaling AI (Rank 1-4): compute limitations (availability, compute costs), infrastructure issues, talent, lack of executive support]

2) Rank your organization's compute concerns for 2024.
When asked about their organization's compute concerns, latency was the top-ranked answer for 28% of respondents, followed by power consumption, top-ranked by 21%. Time delays in getting access to compute also weigh on respondents' minds: although top-ranked by only 14% of respondents, this concern received 30% of the votes as the second-ranked issue.

[Chart: Ranked compute concerns (Rank 1-4): latency (27.8% Rank 1), power consumption (20.8%), throughput, accuracy, time delays in getting compute access (14.3%)]

3) Cloud, On-premise, or Hybrid? What type of AI infrastructure setup does your organization currently have for AI compute resources?
Respondents are fairly evenly divided in their current infrastructure setup: 33% have compute fully on-premise, 38% are fully in the cloud, and a little less than 29% have hybrid environments of both on-prem and cloud.

[Chart: Current AI compute setup: in the cloud (38.1%), on-premise (33.2%), hybrid (28.7%)]

4) What is your organization's greatest concern about deploying Generative AI?
The biggest concern for deploying Generative AI was moving too fast and missing important considerations (e.g., prioritizing the wrong use cases), whereas the second most important concern was moving too slowly due to a lack of ability to execute, exposing ambiguity amidst leadership. It appears that executives are caught between the desire to move quickly and the danger of costly mistakes. Governance also weighed on the back of respondents' minds, with upcoming regulations and lack of control over usage and scaling as the next two most important concerns.

[Chart: Ranked concerns about deploying Generative AI (Rank 1-7): moving too fast and missing important considerations; moving too slow due to lack of ability to execute; upcoming regulations with hard-to-predict consequences; financial cost of compute; negative customer experiences; power consumption of compute; lack of control over usage and scaling]

5) In 2024, what are your plans for expanding your AI compute infrastructure?
96% of companies surveyed plan to expand their AI infrastructure in 2024.
Overall, more than 60% plan to use more cloud compute, and 40% of respondents are planning to buy more GPU machines on-premise in 2024. 40% plan to use orchestration technology to maximize existing compute resources; respondents from companies of all sizes are planning to use orchestration technology to get more from existing resources.

[Chart: 2024 expansion plans, by company size: buy more GPU machines on-premise; use more cloud compute; use orchestration technology to maximize existing compute resources; no plans to add more/use what we already have]

6) What are your organization's biggest concerns about cloud compute?
With 60% planning to expand their AI infrastructure with more cloud compute, companies will need to plan for challenges around cost. Wastage/idle costs were executives' biggest concern with cloud compute, followed by the expensive cost of compute power consumption.

[Chart: Ranked cloud compute concerns (Rank 1-7): wastage/idle costs; expensive cost of compute power consumption; availability; inability to monitor, manage, and control overall costs; data security and privacy; expensive cost to rent or purchase compute; environmental sustainability]
7) What are the key drivers and considerations in your 2024 plans for expanding your AI compute infrastructure?
Respondents reported that flexibility and speed are the top drivers, cited by 65% and 55% respectively when expanding AI infrastructure. These supersede security and even budget, suggesting that respondents may be willing to pay a premium for infrastructure options that are more extensible and facilitate AI/ML output, even if it means overspending.

[Chart: Key drivers of AI compute infrastructure expansion: flexibility, speed, security, power consumption, budget]

8) Estimate your current allocation of existing GPU resources (i.e., non-idle GPUs) during peak periods.
When asked about peak periods of GPU usage, 15% of respondents report that fewer than 50% of available GPUs are in use, 53% believe 51-70% of GPU resources are utilized, and 25% report utilization of 71-85%. Only 7% of companies believe their GPU infrastructure achieves more than 85% utilization during peak periods. Most respondents (78%) are using more than 50% of their total allocation of existing GPU resources during peak periods, indicating a need to better manage their existing compute and/or expand it with alternatives.

[Chart: GPU utilization during peak periods: 50% or less (14.7%), 51-70% (53.4%), 71-85% (25.4%), 86-100% (6.5%)]
9) How is your company planning to address GPU scarcity in 2024?
To address GPU scarcity, approximately 52% of respondents reported actively looking for cost-effective alternatives to GPUs for inference in 2024, compared to 27% for training, signaling a significant shift in AI hardware usage. One-fifth of respondents (20%) reported that they were interested in cost-effective alternatives to GPUs but were not aware of existing alternatives. Cost appears to be a key driver of buying decisions for inference solutions.

[Chart: Plans to address GPU scarcity: actively looking for cost-effective alternatives to GPUs for inference (51.6%); for training (26.7%); interested in cost-effective alternatives but not aware of existing ones (19.6%); other (2.1%)]

10) In 2024, what LLM models do you plan to use in your commercial deployments?
More than half of respondents plan to use LLMs (Llama and Llama-like models) in their commercial deployments in 2024, followed by embedding models (BERT and family) at 26%. This indicates that inference workloads are expected to grow substantially. Notably, only 15% reported plans to use diffusion models, and only 7% had plans to use multi-modal models. Why might this be? One potential explanation is that tech leaders are leveraging approaches such as model chaining to enable the use of multiple models in production (a minimal sketch of that idea follows below). Another potential reason could be that when thinking about LLMs, tech leaders already have multimodal capabilities in mind. In future research, we will delve into tech leaders' perceptions, plans, and use of ML models in production.

[Chart: Planned 2024 model use in commercial deployments, by company size: LLMs (Llama and family), embedding models (BERT and family), diffusion models, multi-modal models]
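To illustrate the model-chaining idea mentioned above, the sketch below wires two off-the-shelf Hugging Face pipelines together, with the first model's output routing the prompt for the second. The task labels, prompt template, and default models are hypothetical placeholders, not a description of any respondent's stack.

```python
from transformers import pipeline

# Stage 1 routes the request; stage 2 drafts the answer.
classifier = pipeline("zero-shot-classification")
generator = pipeline("text-generation")

def answer(query: str) -> str:
    # Stage 1: classify the query so the right prompt template is used.
    labels = ["billing", "technical support", "general"]
    topic = classifier(query, candidate_labels=labels)["labels"][0]
    # Stage 2: feed the routed query to a text-generation model.
    prompt = f"[{topic}] Answer the customer question: {query}\n"
    return generator(prompt, max_new_tokens=50)[0]["generated_text"]

print(answer("Why was my card charged twice?"))
```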
SECTION II - Open Source Demand & Model Customization Plans

1) How important is it for your organization's external technology solutions to be Open Source?
Almost all executives (95%) reported that having Open Source external technology solutions is at least somewhat important for their organization, with 26% deeming it very important or critical.

[Chart: Importance of Open Source external technology solutions: critical, very important, important, somewhat important, not important]

2) How is your company planning to customize, or what is it currently using to customize, your Open Source models in 2024?
96% of companies surveyed are currently customizing or planning to customize models in 2024, with Open Source frameworks having the highest adoption globally. Across survey responses, PyTorch is the leading framework for customizing Open Source models, with 61% of respondents using PyTorch, 43% using TensorFlow, and 16% using JAX; approximately one-third of respondents currently use or plan to use CUDA for model customization (a minimal example follows below). Only 4% are not customizing, and do not plan to customize, models in 2024. PyTorch and TensorFlow had significant market share in APAC, CUDA adoption is generally equal across regions, and JAX is most popular in North America.

[Chart: Frameworks used to customize Open Source models, by region: CUDA, PyTorch, TensorFlow, JAX, not customizing Open Source models]
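For readers unfamiliar with what customizing an Open Source model looks like in practice, here is a minimal PyTorch sketch that fine-tunes a pretrained Hugging Face checkpoint on a toy batch. The checkpoint name, labels, and data are illustrative assumptions only; production fine-tuning adds proper datasets, evaluation, and checkpointing.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased"   # example Open Source checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

texts = ["great product", "terrible support"]   # toy labeled data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):                               # a few gradient steps
    out = model(**batch, labels=labels)          # loss computed internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```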
3) How satisfied are you with your company's current solutions for customizing Open Source models (e.g., PyTorch, CUDA)?
Most respondents use Open Source frameworks for model customization and are satisfied with their current solutions. More than 78% of respondents are satisfied or very satisfied with their current solution, indicating that Open Source frameworks are providing respondents with what they need.

[Chart: Satisfaction with current customization solutions: satisfied (53.4%), very satisfied (25.0%), neutral (15.0%), dissatisfied (5.6%), very dissatisfied (1.0%)]

4) What types of solutions does your organization currently lack in your AI/ML tech stack?
Tech leaders and executives are dealing with gaps in scheduling and job management (63%), model training solutions (52%), and model serving (36%). To deal effectively with these issues while driving success in their 2024 plans, they will need to carefully manage their infrastructure expansion while planning for higher compute demand; these are decisions that cannot be easily changed and are costly if wrong. With ambitious plans to use LLMs in production and to address GPU scarcity with alternatives for inference, executives are making tough decisions that require balancing current challenges against readiness for future plans.

Scheduling remains a big pain point in AI stacks. Model serving, which refers to hosting ML models and enabling access to model functionality through APIs, is a core component of building applications that integrate AI; approximately one-third of respondents currently lack model serving solutions (a minimal serving sketch follows below). Because Generative AI models require highly performant inference workloads, we expect the need for model serving solutions to grow.

[Chart: Solutions currently lacking in AI/ML tech stacks: scheduling and job management, model training solutions, model serving solutions, pipeline workflow building, compute utilization and monitoring (orchestration), real-time data responsiveness]
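To make the model-serving concept concrete, the sketch below exposes a model through a small HTTP endpoint. FastAPI is just one of many serving options, and the TorchScript file name and request schema are assumptions for illustration.

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.jit.load("model.pt")   # assumed: a saved TorchScript artifact
model.eval()

class Request(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: Request):
    # Run inference on each request and return the predicted class.
    with torch.no_grad():
        x = torch.tensor([req.features])
        return {"prediction": model(x).argmax(dim=1).item()}

# Run with: uvicorn serve:app --port 8000  (assuming this file is serve.py)
```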
5) Do you see value in having compute and scheduling functionality as part of an AI/ML platform, or as a stand-alone point solution?
Almost 75% of respondents (74.3%) see value in having compute and scheduling as part of an end-to-end platform. Compute is a fundamental part of such a platform, enabling fast and efficient model development and deployment. Coupling compute access with scheduling capabilities can be low-hanging fruit as a catalyst for AI/ML development.

[Chart: Compute and scheduling preference: MLOps platform (74.3%), stand-alone point solution (25.7%)]

SECTION III - Job Scheduling

1) How is job scheduling managed at your organization (queues, job prioritization, etc.)?
Approximately half of respondents (47%) reported that job scheduling is managed by DevOps and ML engineers, and only 27% of companies offer users the ability to self-serve, indicating a significant opportunity for companies to improve their AI/ML infrastructure to streamline development.

[Chart: How job scheduling is managed: managed by DevOps and ML Engineers; users can prioritize their own jobs for the resources they can access and self-serve as needed; completely managed by DevOps; no scheduling solution]
2) Which computing resource scheduling/job management tool do you currently use as part of your AI/ML tech stack?
65% of companies surveyed use vendor-specific solutions or cloud service providers for managing and scheduling their AI/ML jobs. 25% use Slurm or another open source tool, and 9% use Kubernetes alone, which does not support scheduling.

[Chart: Scheduling/job management tools in use: Kubernetes, Slurm, vendor-specific solution, cloud providers, other open source tool]

3) What are you able to do with the scheduling tool you currently have?
More than half of respondents (56%) can do quota management, followed by 42% who can manage dynamic MIG/GPU partitioning to optimize GPU utilization. Only 19% of respondents are able to view and manage jobs within queues, potentially resulting in inefficiency.

[Chart: Scheduling tool capabilities: quota management, dynamic MIG/GPU partitioning, node pools, user self-serve]
4) Are you satisfied with the computing resource scheduling/job management tool you've chosen?
61% of respondents are either dissatisfied, somewhat dissatisfied, or only somewhat satisfied with the scheduling tool they have chosen, and another 12% report being neutral, indicating room for improvement.

[Chart: Satisfaction with chosen scheduling tool: satisfied (26.1%), somewhat satisfied (56.0%), neutral (11.9%), somewhat dissatisfied (5%), dissatisfied (1%)]

5) If you chose Neutral, Somewhat Dissatisfied, or Dissatisfied in the previous question, what are the main reasons for your dissatisfaction?
The main pain points were that the tool cannot do enough to optimize GPU usage (53%), followed by the tool not being easy for developers or data scientists to use (47%). It is also notable that approximately 25% cited lack of control and friction with their existing AI/ML stack as reasons for dissatisfaction. Given these results, we recommend investing in a scheduling tool that offers GPU usage optimization, is easy to use, and works well with the other pieces of AI/ML infrastructure.

[Chart: Main reasons for dissatisfaction: the tool cannot do enough to optimize GPU usage; the tool is not easy for developers or data scientists to use; the tool creates friction with my existing AI/ML tech stack; the tool does not provide enough control; not applicable]

6) How would your organization's AI team productivity be impacted if real-time compute could be self-served and easily accessed by anyone who needed it, in a seamless and cost-controlled way?
93% reported that their organization's AI team productivity would increase if real-time compute could be self-served easily by anyone who needed it.

[Chart: Expected productivity increase from self-serve real-time compute: no impact (7%), 5-10% increase, 11-20% increase, 21-50% increase, 51% or more increase]

SECTION IV - Optimizing Compute Utilization

1) How does your organization currently maximize utilization of GPU usage?
We were impressed to see the sophistication with which companies are managing their compute infrastructure. The top three methods respondents employ to maximize GPU utilization are queue management and job scheduling (67%), multi-instance GPUs (39%), and quotas (34%); a toy sketch of the first and last of these follows below.

[Chart: Methods for maximizing GPU utilization: queue management and job scheduling, multi-instance GPUs, quotas, none of the above]
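To illustrate the two most-cited methods, queue management/job scheduling and quotas, here is a toy Python sketch of a priority queue with per-team GPU quotas. Real schedulers (Slurm, Kubernetes-based tools, vendor platforms) are far richer; the team names and limits here are invented for the example.

```python
import heapq

QUOTAS = {"nlp": 4, "vision": 2}     # max concurrent GPUs per team
in_use = {"nlp": 0, "vision": 0}
queue = []                           # entries: (priority, seq, team, gpus)
seq = 0

def submit(team, gpus, priority):
    """Enqueue a job; lower priority numbers run sooner."""
    global seq
    heapq.heappush(queue, (priority, seq, team, gpus))
    seq += 1

def dispatch():
    """Launch queued jobs whose team still has quota headroom."""
    pending = []
    while queue:
        prio, s, team, gpus = heapq.heappop(queue)
        if in_use[team] + gpus <= QUOTAS[team]:
            in_use[team] += gpus
            print(f"run {team} job on {gpus} GPU(s)")
        else:
            pending.append((prio, s, team, gpus))   # keep waiting
    for job in pending:
        heapq.heappush(queue, job)

submit("nlp", 2, priority=0)
submit("vision", 4, priority=1)   # exceeds the vision quota, so it waits
dispatch()
```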
2) What tool do you use to optimize GPU allocation between users?
Tools for optimizing GPU allocation between users include open source solutions (24%), HPC solutions (27%), and vendor-specific solutions (34%). Another 11% use Excel, and 5% have a home-grown solution. Only 1% of respondents are doing nothing to maximize or optimize their GPU utilization.

[Chart: Tools for optimizing GPU allocation: vendor-specific solution (33.8%), HPC solutions (26.6%), open source (24%), Excel (10.5%), home-grown (4.5%), nothing currently (0.6%)]

SECTION V - Monitoring Compute

1) What tool does your organization use to monitor GPU cluster utilization?
For monitoring GPU cluster utilization, GCP GPU utilization metrics top the list (36%), followed by NVIDIA AI Enterprise (30%). IBM LSF and Kubernetes were selected by 15% and 13% of respondents, respectively.

[Chart: GPU cluster monitoring tools: GCP-GPU utilization metrics, NVIDIA AI Enterprise, IBM LSF, Kubernetes monitoring tools, Altair PBS Pro, other vendor-specific HPC solution]
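Whichever dashboard sits on top, monitoring tools for NVIDIA GPUs typically read the same NVML counters underneath. The minimal sketch below samples per-GPU utilization directly via the pynvml bindings (the nvidia-ml-py package; requires an NVIDIA driver), shown only to illustrate the kind of signal these tools aggregate.

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # % busy
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes
        print(f"GPU {i}: {util.gpu}% busy, "
              f"{mem.used / mem.total:.0%} memory in use")
finally:
    pynvml.nvmlShutdown()
```

Sampling numbers like these over time is what lets teams see the peak-period underutilization reported earlier in this survey.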
CONCLUSION

As we have seen from these global survey results, while most organizations are planning to expand their AI infrastructure, executives and tech leaders face diverse challenges in their current workloads. Their ambitious plans for the future signal a need for highly performant, cost-effective alternatives to GPUs and seamless, end-to-end AI/ML platforms. GPU utilization is a major concern for 2024, with the majority of respondents saying they are not maximizing their GPUs at peak times. Nearly all companies we surveyed reported that AI team productivity would increase if real-time compute could be self-served easily by anyone who needed it.

Tech leaders and executives have ambitious plans for LLMs, and mitigating compute challenges will be essential to realizing their aspirations. More than half plan to use LLMs in commercial deployments, and more than half are looking for cost-effective alternatives to GPUs for inference. We believe that highly performant inference workloads with low latency and efficient power consumption will be crucial to reducing the TCO of Generative AI deployments.

Alongside this shift, executives and tech leaders value Open Source solutions. Nearly all reported that having Open Source solutions was at least somewhat important for their organization. Accordingly, we observed that Open Source AI frameworks are preferred for model customization, with PyTorch leading TensorFlow and JAX in production.

For successful deployment of AI at scale, taking a holistic view of AI workloads will be essential. We observed a lack of scheduling tools that support the ability to view and manage jobs within queues. And while current gaps in AI tech stacks include training, model serving is top of mind. Executives and tech leaders will need to balance solving their current pain points with executing on their future plans. Time delays in getting compute access and high latency can break product experiences. We recommend that AI leaders consider the multiple factors that impact their TCO, such as compute, scheduling, and power-efficient, low-latency inference, when planning for Gen AI business adoption. Only then can they be confident in accurately predicting and forecasting the TCO of Gen AI for their organization.

We hope that the insights in this report have shed light on the experiences of leaders making decisions for today and the future, and that they empower you to find solutions that bring AI into production in a way that is aligned with your organization's vision.

NEXT STEPS
From AI/ML model development to training and inference, ClearML's dynamic scheduling, orchestration, and MIG support help you maintain optimal GPU/compute utilization for any cluster size and scale. To request a demo of ClearML, please visit https://clear.ml/demo.

FuriosaAI's highly efficient, inference-focused products run cutting-edge distributed inference with low TCO. To request a demo of FuriosaAI's first-gen AI chip WARBOY for computer vision, please visit https://www.furiosa.ai/getstarted. For more information about FuriosaAI's second-gen AI chip with High Bandwidth Memory 3 (HBM3), which provides H100-level performance to power ChatGPT-scale models, visit https://www.furiosa.ai/comingsoon.

Copyright 2024 by ClearML. All rights reserved. All trademarks are the properties of their respective owners.
About AIIA
The AI Infrastructure Alliance is dedicated to bringing together the essential building blocks for the Artificial Intelligence applications of today and tomorrow. The Alliance and its members bring striking clarity to this quickly developing field by highlighting the strongest platforms and showing how different components of a complete enterprise machine-learning stack can and should interoperate. They deliver essential reports and research, virtual events packed with fantastic speakers, and visual graphics that make sense of an ever-changing landscape. To learn more, visit https://ai-infrastructure.org/.

About FuriosaAI
FuriosaAI is a semiconductor company designing high-performance data center AI accelerators with vastly improved power efficiency. Through our innovative architecture and products, we strive to unlock the transformative potential of AI and make its benefits accessible to all. Our first-generation chip WARBOY, which runs computer vision applications for data centers and enterprise customers, is available now, and our second-generation chip for LLM and multimodal deployment will launch later this year. To learn more, visit the company's website at https://www.furiosa.ai/.

About ClearML
As the leading Open Source, end-to-end solution for unleashing AI in the enterprise, ClearML is used by more than 1,600 enterprise customers to develop a highly repeatable process for their end-to-end AI model lifecycle, from product feature exploration to model deployment and monitoring in production. Use all of our modules for a complete ecosystem, or plug in and play with the tools you have. ClearML is an NVIDIA DGX-ready software partner and is trusted by more than 250,000 forward-thinking Data Scientists, Data Engineers, ML Engineers, DevOps, Product Managers, and business unit decision makers at leading Fortune 500 companies, enterprises, academia, and innovative start-ups worldwide. To learn more, visit the company's website at https://clear.ml.