
Predictable Scaling and Infrastructure.pdf

Uploaded by: c** | ID: 465037 | 2025-01-12 | PDF | 27 pages | 1.76 MB

Part of the collection: Hot Chips 2024 (2024 High-Performance Chips Symposium) speaker presentation slides


Slide 1: Predictable Scaling and Infrastructure
Trevor Cai
Confidential

Slide 2: Predictable Scaling

Slide 3: The What and How of ChatGPT
What:
- Collect a dataset of text, code, images, audio, math.
- Pre-train a model to predict the next word.
- Post-train it to e.g. follow instructions, be conversational, use tools.
How:
- Synchronous SGD of a transformer on a large cluster of accelerators.
- Combine many forms of data and model parallelism.
- Reinforcement learning from human feedback.

Slide 4: The Sentiment Neuron (2017)
- Trained a character-level neural network on product reviews.
- Observed: there is a neuron encoding sentiment!
- Results in state-of-the-art sentiment analysis.
Source: Learning to Generate Reviews and Discovering Sentiment, Radford et al. (2017)

Slide 5: Prediction is Compression
- Next word prediction models the underlying generative process.
- If the data is the internet, the underlying generative process is the world.
- Grandiose. But theoretically justified via Solomonoff Inductive Inference.
Source: Learning to Generate Reviews and Discovering Sentiment, Radford et al. (2017)

Slide 6: Returns to Scale (2018-2023)
Release: New Behaviors
- GPT-1 (June 2018): State-of-the-art language understanding (using task-specific fine-tuning).
- GPT-2 (Feb 2019): Coherent text generation and zero-shot transfer.
- GPT-3 (Mar 2020): In-context learning.
- GPT-4 (Mar 2023): Actually being useful.

Slide 7: Scale Works (2023)
This example required GPT-4 to:
- Understand both English and French.
- Interpret a diagram in context of the text.
- Solve a physics problem!
Source: GPT-4 Technical Report, OpenAI (2023)

Slide 8: Predictable Scaling
Source: GPT-4 Technical Report, OpenAI (2023)

Slide 9: Predictable Scaling of Practical Capabilities
Source: GPT-4 Technical Report, OpenAI (2023)

Slide 10: What Log-Log Plots Obscure
Source: How predictable is language model benchmark performance?, Owen (2024)

Slide 11: Recap: Scaling Laws for AI Models
1. Next-word prediction is meaningful.
2. There are returns to scale
3. which are predictable and extrapolative (!)

Slide 12: Implications for Infrastructure

Slide 13: Industry Compute Trends
Source: Training Compute of Frontier AI Models Grows by 4-5x per Year, Sevilla and Roldán (2024)

Slide 14: OpenAI Compute Trends
Source: Microsoft Build 2024

Slide 15: Industry Compute Trends
Source: Training Compute of Frontier AI Models Grows by 4-5x per Year, Sevilla and Roldán (2024)

Slide 16: Inference Demand is Driven By Intelligence

Slide 17: The Bull Case for AI Compute
1. Compute scaling has been predictable and looks to continue.
2. Intelligence drives inference demand.
3. Technology and economics are ripe for scale this decade.

Slide 18: (no text recovered)

Slide 19: "Sometimes lines really do go up"
Source: Photovoltaic growth: reality versus projections of the International Energy Agency, Hoekstra (2018)

Slide 20: "Sometimes lines really do go up"
Source: What is Moore's Law?, Roser and Ritchie (2020)

Slide 21: Design for Mass Deployment

Slide 22: Cluster-Level RAS
- Optics MTBF alone measured in minutes.
- Not to mention HBM DUE, board failures, etc.
- SDCs: disturbingly common, and sometimes unreproducible.
- Failures have a very wide blast radius.

Slide 23: Cluster-Level RAS
- Minimize cost of repair.
  Exception → Process Restart → GPU Reset → Node Reboot → RMA.
  Example: Ideally, a failed write over scale-up is a catchable exception.
- Minimize blast radius.
  Example: Link flaps on one port should not affect neighboring ports.
  Example: Ideally, an uncorrectable memory error only affects its own GPU, even in the presence of a coherent memory fabric.

Slide 24: Cluster-Level RAS
- Consider graceful degradation.
  Some failures are more worth technician time than others.
  Example: Disable faulty banks of second-tier memory instead of requiring RMA.
- Validation must be automated, fast, extensive, and performable in-field.
  Example: In-depth correctness checks after SDC accusation.

Slide 25: Power Management
- Power bottlenecks mean we need to maximize use of the power we have.
- Synchronized training steps result in power draw jitter.
- Need: Low-latency power telemetry and OOB power management.
- Want: Dynamic power sloshing.

Slide 26: Takeaways
1. Predictable scaling motivates rapidly expanding AI training compute usage.
2. Delivering AI to the world will demand massive infrastructure buildout.
3. Design for mass deployment.
4. Performance is only one of many requirements.

Slide 27: Thank You!
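
The "minimize cost of repair" ladder from the Cluster-Level RAS slides (exception, process restart, GPU reset, node reboot, RMA) can be read as an escalation loop that always tries the cheapest action first. The sketch below is only an illustration of that idea: the action names mirror the slide, but the `Repair` enum, the `escalate` function, and the `attempt_fix` callback are hypothetical names invented here, not part of any real fleet-management API.

```python
from enum import IntEnum
from typing import Callable, List, Tuple


class Repair(IntEnum):
    """Repair actions ordered from cheapest to most expensive (order taken from the slide)."""
    CATCH_EXCEPTION = 0
    PROCESS_RESTART = 1
    GPU_RESET = 2
    NODE_REBOOT = 3
    RMA = 4


def escalate(attempt_fix: Callable[[Repair], bool]) -> Tuple[Repair, List[Repair]]:
    """Try repairs in increasing order of cost; stop at the first one that works.

    `attempt_fix` is a hypothetical callback that performs the action and
    returns True if the fault is cleared. RMA is treated as the last resort
    and is never "attempted" in software.
    """
    tried: List[Repair] = []
    for action in Repair:  # IntEnum iterates in definition order
        tried.append(action)
        if action is Repair.RMA:
            break  # nothing cheaper worked: the hardware goes back to the vendor
        if attempt_fix(action):
            break
    return tried[-1], tried


def cleared_by_gpu_reset(action: Repair) -> bool:
    """Illustrative fault that only a GPU reset (or anything stronger) clears."""
    return action >= Repair.GPU_RESET


resolved, tried = escalate(cleared_by_gpu_reset)
print(resolved.name, [a.name for a in tried])
```

The design point the slide makes is visible in the ordering: the more faults that can be cleared at the top of the ladder (for example, a failed write surfacing as a catchable exception), the fewer that reach a node reboot or an RMA.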


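This document outlines the development of the GPT family of AI models and the technical progress from GPT-1 to GPT-4, and highlights the use of predictive models in language understanding and generation. It notes that these models are trained on large datasets and model the generative process behind language by predicting the next word. The GPT series demonstrates the ability of predictive language models to understand and generate text, as well as their application to tasks such as sentiment analysis. GPT-4 in particular shows practical value through its performance in multilingual understanding, diagram interpretation, and physics problem solving. The document also discusses the scaling and predictability of compute and what this means for AI infrastructure design. It stresses that designing AI systems for mass deployment requires attention to key factors such as fault tolerance, automated validation, and low-latency power management. Overall, it argues that predictable scaling is the key driver behind the rapid growth of AI training compute, that delivering AI to the world requires a massive infrastructure buildout, and that systems must be designed for mass deployment while meeting requirements beyond performance.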
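
As a rough illustration of what "predictable scaling" means in practice, the sketch below fits a power law with an irreducible loss term, L(C) = a * C^(-b) + c, to a few small training runs and extrapolates to a much larger compute budget. The data is synthetic, the constants are made up, and the irreducible loss is assumed to be known in advance; none of these numbers come from the deck. The function names `fit_power_law` and `predict_loss` are likewise hypothetical.

```python
import math


def fit_power_law(compute, loss, irreducible=0.0):
    """Least-squares fit of loss = a * compute**(-b) + irreducible.

    The fit is done in log-log space, where the reducible part of the loss
    is a straight line: log(loss - irreducible) = log(a) - b * log(compute).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l - irreducible) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict_loss(a, b, irreducible, compute):
    """Evaluate the fitted scaling law at a given compute budget."""
    return a * compute ** (-b) + irreducible


# Synthetic "small runs" generated from made-up constants (assumptions, not real data).
TRUE_A, TRUE_B, IRREDUCIBLE = 20.0, 0.05, 1.7
small_runs = [1e18, 1e19, 1e20, 1e21]  # training compute in FLOPs
losses = [predict_loss(TRUE_A, TRUE_B, IRREDUCIBLE, c) for c in small_runs]

a, b = fit_power_law(small_runs, losses, IRREDUCIBLE)
target = 1e24  # 1000x beyond the largest small run
extrapolated = predict_loss(a, b, IRREDUCIBLE, target)
print(f"a={a:.3f} b={b:.4f} predicted loss at {target:.0e} FLOPs: {extrapolated:.4f}")
```

Real runs are noisy, so a fit like this carries uncertainty that grows with the extrapolation distance, and a small residual on a log-log plot can correspond to a large error on a downstream benchmark, which is the caveat the "What Log-Log Plots Obscure" slide title points at.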