
Under the Hood: Intelligent Workload Management.pdf

Uploader: 2*** | ID: 139027 | 2023-06-04 | PDF, 26 pages, 224.44 KB

Part of the collection: Data + AI Summit 2023 presentation slides


Intelligent Workload Management in Databricks SQL
Under the Hood
2023 Databricks Inc. All rights reserved. Confidential and Proprietary

Overview
- Background
- Workload Management in Databricks SQL
- Load-based workload management
- Query Costing
- What's next

Background

What Is Workload Management
Workload management means efficient compute utilization in Databricks SQL:
- When and where to run a query
- When to scale up or down

[Diagram: Databricks SQL logical architecture]

When and Where to Run a Query
- Whether to run the query or put it in a queue
- Which compute resource to run the query on

When to Scale Up/Down
- Upscale when we see queueing or high utilization
- Downscale when we see idle compute or low utilization

How to Do This Right: What to Optimize For?
- Latency? Keep latency the same even if cost increases.
- Throughput? Process as many queries as possible.
- Cost? Use as few resources as possible.

How to Do This Right: Principles
- Latency is important for short queries.
- Throughput is important for longer queries.
- Both should be optimized against cost.
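The scale-up/scale-down rule above can be sketched as a single decision function. This is a minimal illustration, not Databricks's implementation: the slides only name the signals (queueing, high utilization, idle compute), so the `WarehouseState` fields, the 0.8/0.2 utilization thresholds, and the one-cluster-at-a-time step are hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the slides give no values.
HIGH_UTILIZATION = 0.8
LOW_UTILIZATION = 0.2


@dataclass
class WarehouseState:
    clusters: int        # clusters currently running
    min_clusters: int    # user-configured lower bound
    max_clusters: int    # user-configured upper bound
    queued_queries: int  # queries waiting to be admitted
    utilization: float   # fraction of compute in use, 0.0 to 1.0


def scaling_decision(s: WarehouseState) -> int:
    """Return +1 to add a cluster, -1 to remove one, or 0 to hold."""
    # Upscale signals: queueing or high utilization.
    wants_up = s.queued_queries > 0 or s.utilization >= HIGH_UTILIZATION
    # Downscale signals: nothing queued and compute mostly idle.
    wants_down = s.queued_queries == 0 and s.utilization <= LOW_UTILIZATION
    if wants_up and s.clusters < s.max_clusters:
        return 1
    if wants_down and s.clusters > s.min_clusters:
        return -1
    return 0


if __name__ == "__main__":
    print(scaling_decision(WarehouseState(2, 1, 4, queued_queries=5, utilization=0.5)))  # 1
    print(scaling_decision(WarehouseState(2, 1, 4, queued_queries=0, utilization=0.1)))  # -1
```

The min/max bounds stand in for the user's scaling limits; when the warehouse is already at a bound, the function holds even if the signals say otherwise.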

Databricks SQL: Workload Management

Workload Management Today
- Query-concurrency based: allows a static concurrency
- Autoscaling based on query throughput, the rate of incoming queries, and queued queries
- Autoscale decision evaluated every 2 minutes

Databricks SQL: Knobs
There are two knobs that users use to tune workload management:
1. Cluster size (S, L, XL, …)
2. Scaling min/max
If you see high execution latency, use a larger cluster. If you see high queueing latency, use more clusters.

Common Workload Management Issues
The current solution works well for a large variety of cases. However, it doesn't work well in some situations:
- When quicker autoscaling is needed
- When you are running many extremely large queries
- When the cluster size is suboptimal for the workload

The Solution: Intelligent Workload Management
Keep queries running fast.
Query prioritization
- Before admittance: prioritize queued queries based on query size
- After admittance: reserve a higher share of compute for shorter queries
Query admittance based on cluster utilization metrics
- Evaluate compute utilization based on the current queries
- Allow new queries if compute utilization is low

Intelligent Workload Management (continued)
Autoscaling with compute utilization
- Faster autoscaling
- Continuous evaluation of compute load
- Special handling for low/no workload for quicker downscaling
Improved observability
- Improved monitoring pages
- System tables

Adaptive Workload Management

Query Prioritization
Pre-admittance: based on a rejection model
- Applies the principle that the latency of short queries is important.
- AI-based: queries get an estimated cost from an AI model.
- Queries with a low cost are sent for execution immediately.
- Queries with a high cost are "rejected" and put at the back of the queue.
Risks
- A high-cost query may be starved if there are many low-cost queries; we consider this an acceptable trade-off.
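The pre-admittance rejection model, combined with the utilization-based admittance from the solution slide, can be sketched as a queue that lets cheap queries jump ahead. This is a sketch under stated assumptions: `estimate_cost` is a stand-in callable for the AI cost model (the slides don't describe its inputs), and `cost_threshold` and `max_utilization` are hypothetical tuning values. Admitting whichever expensive query sits at the front when no cheap query is waiting is a choice made here so the queue drains; the slides don't specify it.

```python
from collections import deque
from typing import Callable, Deque, Optional


class PreAdmissionQueue:
    """Cost-based pre-admission: cheap queries go first, expensive ones wait."""

    def __init__(self, estimate_cost: Callable[[str], float],
                 cost_threshold: float, max_utilization: float) -> None:
        self.estimate_cost = estimate_cost      # stand-in for the AI cost model
        self.cost_threshold = cost_threshold    # hypothetical "low cost" cutoff
        self.max_utilization = max_utilization  # hypothetical admission ceiling
        self.queue: Deque[str] = deque()

    def submit(self, query: str) -> None:
        self.queue.append(query)

    def next_to_run(self, utilization: float) -> Optional[str]:
        """Return the next query to admit, or None if nothing should start now."""
        if utilization >= self.max_utilization:
            return None  # compute is busy: admit nothing new
        for _ in range(len(self.queue)):
            query = self.queue.popleft()
            if self.estimate_cost(query) <= self.cost_threshold:
                return query  # low cost: send for execution immediately
            self.queue.append(query)  # high cost: "rejected", back of the queue
        # Only expensive queries are waiting (or none): admit the front one, if any.
        return self.queue.popleft() if self.queue else None


if __name__ == "__main__":
    costs = {"a": 100.0, "b": 1.0, "c": 2.0}  # hypothetical cost estimates
    q = PreAdmissionQueue(costs.__getitem__, cost_threshold=10.0, max_utilization=0.9)
    for name in "abc":
        q.submit(name)
    print([q.next_to_run(0.5) for _ in range(4)])  # ['b', 'c', 'a', None]
```

The starvation risk from the slide is visible here: as long as cheap queries keep arriving, the expensive query `a` keeps being rotated behind them.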

Query Prioritization (continued)
Post-admittance: prioritization of compute resources
- Prioritize short queries when many concurrent queries are running.
- Every query starts as a short query and gets a share of the reserved capacity. As the query runs longer, its share of the reserved capacity goes down.
- The reserved capacity itself is dynamic: it is small if only long-running queries are observed and large if many short-running queries arrive.

Load-Based Scheduling
Builds on the rejection model and combines it with utilization:
- A utilization metric based on:
  - Cost estimates
  - Currently running and scheduled tasks for admitted queries
- New queries are rejected if current utilization is above a threshold.
- A learning model improves cost estimates.
- Aware of the autoscaling state.

Autoscaling
- Scales up faster with changing workloads
- Optimized for quick serverless provisioning
- Scaling decision based on queueing and the new utilization metric
- Faster scaling up/down for spiky workloads, with continuous evaluation
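The post-admittance scheme from the "Query Prioritization (continued)" slide (each query starts with a share of a reserved pool that shrinks as it runs, and the pool itself grows or shrinks with the observed query mix) can be sketched as follows. Everything numeric is an assumption: capacity is modeled as abstract "slots", a query counts as short if it finished within a hypothetical 10 s cutoff, the reserved fraction is clamped to a hypothetical 10 to 60 percent, and the shrinking share uses a half-life decay, which is just one possible curve; the slides don't say which one Databricks uses.

```python
from typing import Dict, List


def reserved_capacity(total_slots: float, recent_runtimes_s: List[float],
                      short_cutoff_s: float = 10.0,
                      min_frac: float = 0.1, max_frac: float = 0.6) -> float:
    """Size the reserved pool from the recent mix of short vs. long queries.

    Mostly long queries -> small reserve; many short queries -> large reserve.
    """
    if not recent_runtimes_s:
        return total_slots * min_frac
    short_count = sum(1 for t in recent_runtimes_s if t <= short_cutoff_s)
    short_frac = short_count / len(recent_runtimes_s)
    frac = min(max(short_frac, min_frac), max_frac)  # clamp to [min_frac, max_frac]
    return total_slots * frac


def allocate(total_slots: float, elapsed_s: Dict[str, float],
             reserved: float, half_life_s: float = 5.0) -> Dict[str, float]:
    """Give each running query an equal share of the unreserved pool plus a slice
    of the reserved pool weighted by 0.5 ** (runtime / half_life_s)."""
    if not elapsed_s:
        return {}
    # Older queries get exponentially smaller weights in the reserved pool.
    weights = {q: 0.5 ** (t / half_life_s) for q, t in elapsed_s.items()}
    total_weight = sum(weights.values())
    shared = (total_slots - reserved) / len(elapsed_s)
    return {q: shared + reserved * w / total_weight for q, w in weights.items()}


if __name__ == "__main__":
    reserve = reserved_capacity(100.0, [1.0, 2.0, 3.0, 120.0])  # 3 of 4 short -> clamped to 60%
    print(allocate(100.0, {"dashboard": 0.5, "etl": 300.0}, reserve))
```

With this shape, a freshly admitted dashboard query takes most of the reserved pool, while a long-running ETL query falls back to roughly its equal share of the unreserved capacity.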

Query Cost Estimates

Challenges with Query Cost Estimation
Estimating a query's cost is hard!
- A short query and a long query may "look" similar.
- Cached data has a high impact.
- It is especially hard with AQE (Adaptive Query Execution).
- Estimates get better toward the later stages of query execution; however, waiting that long adds cost and latency if the query is then rejected.

How Databricks SQL Does Costing
History-based costing
- Databricks uses an ML model for cost categorization based on query features.
- In addition, Databricks builds a local model of past observed queries and predicts the cost with a confidence score.
Plan-based costing
- Used when history-based costing has a low confidence score.
- Relies on query plan statistics.
- Zero overhead for short queries.
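The history-first, plan-fallback costing described above can be sketched like this. The slides don't describe the models' features, so this sketch swaps in something much simpler: a per-fingerprint runtime history whose confidence grows with sample count and shrinks with variance. `fingerprint`, the confidence formula, `min_confidence`, and the `plan_estimate` callable (standing in for an estimate derived from query plan statistics) are all hypothetical.

```python
import re
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def fingerprint(sql: str) -> str:
    """Hypothetical normalization so repeated queries with different literals match."""
    sql = re.sub(r"'[^']*'", "?", sql.lower())    # string literals -> ?
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals -> ?
    return " ".join(sql.split())                  # collapse whitespace


class HistoryCostModel:
    """Per-fingerprint runtime history with a simple confidence score."""

    def __init__(self, min_samples: int = 3) -> None:
        self.min_samples = min_samples
        self.history: Dict[str, List[float]] = defaultdict(list)

    def observe(self, sql: str, runtime_s: float) -> None:
        self.history[fingerprint(sql)].append(runtime_s)

    def predict(self, sql: str) -> Tuple[float, float]:
        """Return (estimated runtime in seconds, confidence in [0, 1])."""
        runs = self.history.get(fingerprint(sql), [])
        if not runs:
            return 0.0, 0.0
        mean = statistics.mean(runs)
        spread = statistics.pstdev(runs) / mean if mean > 0 else 0.0
        # More samples and less variance -> higher confidence.
        sample_factor = min(1.0, len(runs) / self.min_samples)
        return mean, sample_factor / (1.0 + spread)


def estimate_cost(sql: str, model: HistoryCostModel,
                  plan_estimate: Callable[[str], float],
                  min_confidence: float = 0.7) -> Tuple[float, str]:
    """History-based estimate if confident enough, otherwise plan-based."""
    cost, confidence = model.predict(sql)
    if confidence >= min_confidence:
        return cost, "history"
    return plan_estimate(sql), "plan"


def placeholder_plan_estimate(sql: str) -> float:
    """Placeholder for an estimate computed from query plan statistics."""
    return 30.0


if __name__ == "__main__":
    model = HistoryCostModel()
    for _ in range(3):
        model.observe("SELECT * FROM sales WHERE region = 'EU'", 2.0)
    print(estimate_cost("SELECT * FROM sales WHERE region = 'US'", model,
                        placeholder_plan_estimate))  # (2.0, 'history')
    print(estimate_cost("SELECT count(*) FROM returns", model,
                        placeholder_plan_estimate))  # (30.0, 'plan')
```

A query seen several times with stable runtimes is costed from history; a first-time query falls through to the plan-based estimate.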

What's Next

Trying Out Intelligent Workload Management
- Available for serverless warehouses
- Features are in various states of rollout:
  - Query prioritization (GA)
  - Intelligent/faster autoscaling (Public Preview)
  - Load-based scheduling (Public Preview)
- More improvements coming soon

Thank You


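Summary: The slides describe intelligent workload management in Databricks SQL. Main points:
1. Workload management: aims to make compute utilization in Databricks SQL efficient by deciding when and where to run each query and when to scale resources up or down.
2. Principles: optimize latency for short queries and throughput for long queries, both against cost.
3. Current workload management: query-concurrency based with a static concurrency; autoscaling is driven by query throughput, the rate of incoming queries, and queued queries, and the autoscaling decision is evaluated every 2 minutes.
4. Intelligent workload management: query prioritization, admission based on cluster utilization, faster utilization-aware autoscaling, and improved observability keep queries running fast.
5. Query prioritization: before admission, a rejection model uses cost estimates from an AI model to run low-cost queries immediately and move high-cost queries to the back of the queue; after admission, a dynamically sized reserved share of compute favors short queries, and each query's share shrinks the longer it runs.
6. Query cost estimation: Databricks SQL uses history-based costing (an ML model over query features plus a local model of past observed queries that reports a confidence score) and falls back to plan-based costing from query plan statistics when that confidence is low.
7. What's next: the features are rolling out for serverless warehouses, including query prioritization (GA), intelligent/faster autoscaling (Public Preview), and load-based scheduling (Public Preview).
In short, intelligent workload management in Databricks SQL improves query speed and compute utilization while keeping cost down, serving the different needs of short and long queries.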
