© 2024 Databricks Inc. All rights reserved

Benchmarking Data and AI Platforms
Shannon Barrow, Lead Solutions Architect, Databricks
Joe Harris, Sr Software Engineer, Databricks

How Much You Bench, Bro? (Shannon Barrow, Lead Solutions Architect)
- Joined Databricks in March 2019
- Previously: Principal, Innovation and Thought Leadership, Accenture Applied Analytics
- Despite the overarching benchmark discussion, I may:
  - Put extra focus on TPC-DI
  - Put on my Databricks hat for short segments

Today's Scope: Focus on Lakehouse and AI Related Benchmarks
- Primarily TPC, but others will be mentioned
- Suggestion: view the following benchmarks through the lens of a full end-to-end Lakehouse architecture
- Lakehouse/OLAP: How can an organization get a "full picture" of end-to-end TCO? Highlights and challenges
- ML/Gen AI: What to benchmark? Focus on Gen AI; lessons learned from Mosaic

Why Benchmark? Level-Setting on the Value and Limitations of Benchmarks
Value (a level playing field for all platforms):
- Standardization and repeatability: conforming to the same practices and to common industry operations, use cases, input/output, and scale
- Industry "agreed upon" testing heuristics
- "Official" submissions
Limitations (it can be hard to believe any results):
- Potential for cheating, bias, and abuse
- Slow pace of modernization

Lakehouse Benchmarking
TP(P)C: The Ubiquitous Standard
- The most prevalent and well-known: the Transaction Processing Performance Council, formed in 1988
- Active benchmarks per the TPC website span multiple domains:
  - Decision Support (OLAP): TPC-DI, TPC-H, TPC-DS (the only ones in scope today)
  - Transaction Processing (OLTP): TPC-C, TPC-E
  - "Big Data": TPCx-HS, TPCx-BB
  - Virtualization: TPCx-V, TPCx-HCI
  - Internet of Things: TPCx-IoT
  - AI: TPCx-AI

Other OLAP Benchmarks: Are TPC Benchmarks the Only Game in Town?
- SSB (Star Schema Benchmark)
  - Designed to measure the performance of databases in a star-schema setup
  - Simpler than the TPC benchmarks, but focused on specific aspects of OLAP querying
- ClickBench, the no-join benchmark
  - Focuses on workloads without joins
  - Simulates scenarios common in clickstream analytics
Lakehouse Focus, 10,000-Foot View: TPC Fragments the Lakehouse Architecture into Separate Benchmarks (TPC-DI, TPC-H, TPC-DS)
- There is no SQL consumption in TPC-DI
- There are no transformations in TPC-H or TPC-DS
- Most "unofficial" results even skip the data loading step altogether

ETL: TPC-DI

TPC-DI: Data Integration (The Ingestion and ETL One)
- ZERO official submissions. Was Databricks first to code it?
- I originally presented a completed benchmark at DAIS 2022
- Not submitted (not for lack of trying)
- Extremely short TL;DR: zero code given
  - Ingest: TXT, CSV, XML
  - Transform: based upon 100+ pages of business rules
  - Load: all 3 medallion layers
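Because TPC-DI provides no code, every implementation starts from scratch by normalizing the three raw formats into a uniform bronze layer. As a rough illustration only (not the authors' implementation; file layouts and field names here are hypothetical), a minimal pure-Python sketch of that ingestion step:

```python
import csv
import io
import xml.etree.ElementTree as ET

def bronze_from_csv(text):
    """Parse CSV text into raw records, tagging each with its source format."""
    return [{"src": "csv", **row} for row in csv.DictReader(io.StringIO(text))]

def bronze_from_txt(text, fields, sep="|"):
    """Parse delimited TXT (no header row) using a supplied field list."""
    return [{"src": "txt", **dict(zip(fields, line.split(sep)))}
            for line in text.splitlines() if line]

def bronze_from_xml(text, record_tag):
    """Parse XML, flattening each record element's attributes."""
    return [{"src": "xml", **elem.attrib}
            for elem in ET.fromstring(text).iter(record_tag)]

# Hypothetical miniature inputs in the three TPC-DI raw formats
csv_rows = bronze_from_csv("trade_id,price\nT1,10.5\nT2,11.0")
txt_rows = bronze_from_txt("T3|9.75\nT4|10.1", fields=["trade_id", "price"])
xml_rows = bronze_from_xml(
    '<Trades><Trade trade_id="T5" price="10.9"/></Trades>', "Trade")

bronze = csv_rows + txt_rows + xml_rows  # one uniform bronze-layer table
```

In a real Lakehouse run these records would land as Delta tables, with silver and gold layers built on top per the benchmark's business rules.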
Is TPC-DI Valuable? A Frustrating Benchmark That Hides Some Real Valuable Insight
The best official ETL benchmark available:
- Robust, even though it was built for legacy DWs
- Business rules make for a realistic test (though it suffers data-quality issues with the data generator at higher scale factors)
- Flexibility in how the rules are coded allows practitioners to optimize to their platform
- Is anyone aware of another "official" ETL benchmark?
The worst official ETL benchmark available:
- No "official" submittals
- Scoring metrics are confusing and do not even allow for cloud platforms
- No provided code means it is extremely frustrating to attempt this benchmark
- Made worse by long, confusing business rules

Excuse Me While I Digress: I Will Speak Longer on TPC-DI Than Originally Planned
Why?
A) Because me
2) Because everybody
D) Because Joe abandoned us

2 Years of TPC-DI on Databricks: From Initial Implementation to Scorching Performance Today
- This is a slide from the DAIS 2022 session in which we announced that the TPC-DI had finally been implemented
- Photon price per billion rows: $1.51
- In April 2023, we published a blog, "How We Performed ETL on One Billion Records For Under a Dollar," to tout the power and TCO of Delta Live Tables on this benchmark
- Photon price per billion rows: $0.96
- A video compiled in September 2023 compares a dbt implementation against CDW competitors
- Photon price per billion rows: $0.73

The Truth is Out There: A Prominent CDW Was Missing
- Some can handle large file sizes, others can't
- We tried benchmarking the other CDW but found it wholly intractable at larger scale factors, since it is the only one unable to split raw files natively
- We weren't the only ones to notice

2 Years of TPC-DI on Databricks (continued)
- Last month (May 2024) we expanded the benchmark to test non-DWs, which required moving off dbt
- Since AWS has been improving EMR over the last few years, it became the obvious first choice for non-dbt tests: 9 Graviton 16-core workers = 144 cores
- 2.2x faster on 1/4 the cores versus 2 years ago: 24 minutes down to 10.75 minutes, 576 cores down to 144 cores
- Improvements from:
  - Photon shifting into overdrive
  - Gradual code and orchestration improvements (no code is provided, so you optimize code to match the platform)
  - Newer generation VMs
  - Other platform enhancements
- 1 year ago: "a billion rows for under a dollar"; today: as low as $0.20 on spot ($0.27 on-demand)
- Price per billion rows over time: Jun-22 $1.51, Apr-23 $0.96, Sep-23 $0.73, May-24 $0.20
- Databricks digression over
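The cost trend in the chart above can be summarized arithmetically; the milestone figures are the deck's own, while the helper function is ours:

```python
# Price per billion rows at each published TPC-DI milestone (from the slides)
milestones = [("Jun-22", 1.51), ("Apr-23", 0.96), ("Sep-23", 0.73), ("May-24", 0.20)]

def total_drop_pct(points):
    """Percentage cost reduction from the first to the last milestone."""
    first, last = points[0][1], points[-1][1]
    return round(100 * (first - last) / first, 1)

print(total_drop_pct(milestones))  # -> 86.8
```

That is roughly an 87% reduction in price per billion rows over the two-year span.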
SQL: TPC-H & TPC-DS

So TPC-H Is Still a Thing? The "OG" OLAP Benchmark
- Released in April 1999 to "fix" issues with TPC-D
- However, the following year the TPC moved to develop a new decision-support benchmark to better reflect modern OLAP implementations
- In January 2012, TPC-DS was released, begging the question of why TPC-H is still used by organizations

How Is TPC-H Constructed? With the "Easy Button"
- Inmon-style DW model with 1 very large table (lineitem) and 7 smaller tables
- All tables contain DATE and STRING columns and are joined using numeric business keys
- Low query complexity: 22 queries with only 1 LEFT JOIN, simple aggregates and subqueries, no nested CTEs, and predicates applied directly to large tables
- Easier tuning complexity: often "super-tuned," with each query getting a perfectly covering index
- Does not require a sophisticated optimizer: needs only join reordering and predicate pushdown
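To make "low query complexity" concrete: TPC-H's flagship Q1 is essentially a filtered group-by with simple aggregates over lineitem. A toy Python equivalent over hypothetical rows (not the official query or data):

```python
from collections import defaultdict

# Hypothetical miniature of lineitem: (returnflag, linestatus, quantity, extendedprice)
lineitem = [
    ("A", "F", 10, 100.0),
    ("A", "F", 5, 50.0),
    ("N", "O", 7, 70.0),
]

# Q1-style aggregation: GROUP BY (returnflag, linestatus), SUM(quantity), SUM(price)
groups = defaultdict(lambda: {"sum_qty": 0, "sum_price": 0.0})
for flag, status, qty, price in lineitem:
    g = groups[(flag, status)]
    g["sum_qty"] += qty
    g["sum_price"] += price

print(groups[("A", "F")])  # -> {'sum_qty': 15, 'sum_price': 150.0}
```

Nothing here requires join reordering, CTE planning, or windowing, which is why a modest optimizer handles TPC-H well.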
TPC-DS: The Popular Kid in School (The One We All Fight Over)
- First published in 2012 to counter the aging TPC-H's limitations and tackle current OLAP trends
- A benchmark that is often misused/misrepresented by skipping one or more of the 3 components (Load Test, Throughput Test, and Data Maintenance Test) that were designed to ensure operational considerations aren't forgotten for the sake of over-indexing on this benchmark's SQL queries
- A multi-cloud data warehouse platform even publishes a highly tuned, preloaded TPC-DS dataset in all deployed warehouses for users to consume
- Still valuable in a vacuum, when results can be trusted and validated

How Is TPC-DS Constructed? A Retailer Selling Goods via 3 Distribution Channels: Store, Catalog, Internet
- Based on Kimball dimensional modeling (The Data Warehouse Toolkit)
- Replaced TPC-H's 3NF approach with a hybrid between 3NF and star schema, or a "multiple snowflake schema"
- Significantly more complicated than TPC-H
- Heavy on advanced SQL features/functions and lopsided filters
- 99 queries compared to a meager 22 in TPC-H
- 4 query classes: pure reporting queries, pure ad-hoc queries, iterative OLAP queries, and extraction or data mining queries

Feature Comparison: TPC-H & TPC-DS (an Easier-to-Consume Cheat Sheet)
- Data Model
  - TPC-H: simpler schema, Inmon-style DW model
  - TPC-DS: complex schema, Kimball-style dimensional model
- Schema
  - TPC-H: 1 very large table (lineitem) plus 7 smaller tables
  - TPC-DS: 6 fact tables (3 _sales, 3 _returns) plus 18 dimension tables
- Data Types
  - TPC-H: all tables contain DATE and STRING columns; tables are joined using numeric business keys
  - TPC-DS: fact tables use only INTEGER and NUMERIC columns; only dimension tables use TIMESTAMP and STRING; tables are joined using numeric surrogate keys
- Query Complexity
  - TPC-H: 22 queries, low complexity; only 1 LEFT JOIN; only simple aggregates; subqueries are simple, no nested CTEs; predicates applied directly to large tables
  - TPC-DS: 99 queries, high complexity; 9 queries use LEFT JOIN and 3 use a cross join; complex aggregates, with 15 queries using window functions; complex nested CTEs in most queries; predicates applied only to dimension tables
- Tuning Complexity
  - TPC-H: easier for vendors to tune; often "super-tuned" with perfect indexes; does not require a sophisticated optimizer
  - TPC-DS: harder for vendors to tune; optimizing a specific query can make others slower; requires a sophisticated query optimizer

Is TPC-H Valuable? Best for Ad-Hoc Manual Benchmarking ("Easy Peasy Man": Too Simple and Easy to Shortcut. Been Replaced!)
Pros:
- Simpler schema, easier to understand and manage
- Fewer benchmark queries, and they are easy to understand
- Tables contain DATE and STRING columns that are used as predicates
Cons:
- Less realistic; not representative of more complex modern data warehousing needs
- Simple queries do not reflect the hyper-complex real-world queries from tools like Tableau and dbt
- The simple schema does not reflect best practices such as SCD Type 2

Is TPC-DS Valuable? Best for Vendor-Supported POC Evaluations (the Most Modern of the Common SQL Benchmarks; Complexity & Popularity Result in Missed Stages)
Pros:
- Complex, realistic schema that better mimics enterprise data warehouses
- Covers a broad spectrum of query types, SQL operators, and complex joins
- Requires a sophisticated optimizer, testing more capabilities
Cons:
- Higher complexity in setup and longer time to implement and tune
- Many complex queries can make the results hard to evaluate
- Can require significant resources to fully utilize and to understand performance implications
TPC-? How Do We Get From Here to a Full Lakehouse Benchmark? Benchmark It All

The State of "Lakehouse" Benchmarks: No Real "Official" Lakehouse Benchmark
- Each benchmark focuses on only a portion of an end-to-end Lakehouse platform
- This favors bias and "shortcuts" to improve performance
- It also reveals flaws in keeping benchmarks current; example: TPC-DI has no way to calculate its benchmarked metric for cloud platforms
- LHBench: a Berkeley white paper implementing a Lakehouse benchmark on EMR, composed of 4 tests:
  - TPC-DS
  - TPC-DS Refresh
  - Merge Microbenchmark
  - Large File Count
- The pattern appears sound and it's a great start, but it is not "official"
- The community may want to move beyond TPC-DS as the core of the benchmark. How can the industry do better?
Thought Experiment: Cluster TPC-DI. How Costly Is Optimizing, and What Can Be Learned to "Balance" a Lakehouse Benchmark?
- Modified to run OPTIMIZE on all fact tables
- Adjusting for cluster start times, it takes 44% longer: 15m 29s (adjusted for the cluster start difference)
- $6.42 on on-demand nodes ($4.92 on spot)
- Despite tuning the tables, this price is still less than half the price of EMR, a third the price of BigQuery, and over 15x cheaper than any other CDW
- Is this worth it? The answer is always the same: yes, if the consumption savings are greater than the costs to optimize the data
- Autostats feature: stats on write! Leverages Liquid Clustering
- Point-lookup, ad-hoc-type query: 0.35s optimized vs. 10.2s not optimized, a roughly 30x improvement
- BI-like query using dimensional filtering and dynamic file pruning: a 20x task-time improvement over not optimized
- How do we balance the query load against the latency that OPTIMIZE adds in TPC-DI? Back-of-the-napkin math:
  - Conservatively assume 2x performance gains for SQL
  - Assume OPTIMIZE adds 5 minutes to the ETL (at the 10k scale factor), so we need to save at least 5 minutes in queries
  - With a 2x improvement, approximately 10 minutes of query time on non-optimized tables becomes 5 minutes on optimized tables, which is the break-even point
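The napkin math above generalizes to a one-line break-even formula; the 2x speedup and 5-minute OPTIMIZE overhead are the deck's assumptions, the helper is ours:

```python
def breakeven_query_minutes(optimize_overhead_min, sql_speedup):
    """Minutes of non-optimized query time at which OPTIMIZE pays for itself.

    Savings are t - t/speedup, and these must cover the extra ETL time
    spent optimizing: t - t/s = overhead  =>  t = overhead * s / (s - 1).
    """
    return optimize_overhead_min * sql_speedup / (sql_speedup - 1)

# Deck's assumptions: OPTIMIZE adds ~5 min of ETL, SQL gets ~2x faster
print(breakeven_query_minutes(5, 2))  # -> 10.0
```

Any workload that spends more than that on non-optimized tables comes out ahead; note that a larger speedup lowers the break-even threshold.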
AI Benchmarking
(https://harvard-edge.github.io/cs249r_book/contents/benchmarking/benchmarking.html)

Why Benchmark in AI/ML?
- Standardized methods allow us to quantitatively know the capabilities of different models, software, and hardware, enabling fair comparisons across different solutions
- They allow ML developers to measure inference time, memory usage, power consumption, and other metrics that characterize a system
- Goals and objectives: performance assessment, resource evaluation, validation and verification, competitive analysis, credibility, regulation and standardization

What to Benchmark in AI/ML? How Does One Benchmark Something So Subjective?
- 3 primary categories: hardware/system, model, data
- Granularity: micro, macro, end to end
- Training vs. inference

In an LLM Not Far Away...
Keeping Pace and Choosing Wisely: Benchmarks Are Rapidly Created and Deprecated. What Can Mosaic's Gauntlet Teach Us?
- According to Stanford's 2024 AI Index Report, 15 benchmarks were deprecated in 2023 alone, many of which were less than 4 years old, while 18 new benchmarks were added in 2023
- The "Mosaic Evaluation Gauntlet" (blog) evaluated 39 public benchmarks split across 6 core competencies
- "In order to prioritize the metrics that are most useful for research tasks across model scales, we tested the benchmarks using a series of increasingly advanced models"

Challenges and Trends: Human Evaluation Is "In"
- Practitioners are growing incredibly skeptical about academic benchmarks
- There are habitual issues with overfitting models to existing benchmarks: MMLU, HumanEval, and HellaSwag are bona fide benchmarks, but model creators game the system so their models do well on them
- Accordingly, practitioners today tend to prefer evaluating their LLM options by human preference in the real world, like LMSYS
- The Stanford HAI report even points out that "human evaluation is in" (Chapter 2)
- LMSYS allows users to vote on the better response based on a prompt they provide to the LLMs (the user is blind to which models they're given)

Q&A
68、ut model creators game the system for models to do well on themAccordingly,practitioners today tend to prefer evaluating their LLM options by human preference in the real-world-like LMSYSThe HAI Stanford Report even points out“human evaluation is in”(Chapter 2)LMSYS:Allows users to vote on the better response based on a prompt they provide to the LLMs-the user is blind to the choice of the models theyre given).Challenges and TrendsChallenges and Trends47Human evaluation is“in”Human evaluation is“in”2024 Databricks Inc.All rights reserved48Q&AQ&A