2017 Next-Generation Data Warehouse: Apache HAWQ.pdf

PDF, 34 pages, 13.69 MB

Copyright 2016. All rights reserved.

Next-Generation Data Warehouse: HAWQ

Contents
- Company introduction
- HAWQ
- Success stories

Data ecosystem
- Applications: user behavior analysis, anti-fraud, user profiling, credit models
- BI: Qlik, PowerBI
- Analytics, data mining, machine learning/AI: SAS, SPSS, TensorFlow
- ETL: Informatica, Talend, Kettle
- OLAP: data warehouse (MPP, SQL-on-Hadoop, New Data Warehouse)
- Data governance and data security
- OLTP: relational databases, NoSQL, NewSQL

Global data warehouse market
- Market size reached tens of billions of US dollars in 2016
- Cloud (public and private)
[Chart 1, 2014 to 2020: values 920, 1403, 2091, 3302, 5070, 7278, 10270; growth 52.50%, 49.04%, 57.91%, 53.54%, 43.55%, 41.11%.]
[Chart 2, 2014 to 2020: values 1038, 1692, 2485, 3517, 5907, 9347, 13626; growth 26.47%, 63.01%, 46.87%, 41.53%, 67.96%, 58.69%, 45.36%.]
[Series labels and units were lost in extraction.]

Databases: 55 years
- The term "database" appeared in 1962: Inverted File Database System, System Development Corporation

Stages of database history
- 1960s: navigational DBMS (network and hierarchical models): Integrated Data Store (IDS), Information Management System (IMS)
- 1970s-1990s: SQL/relational DBMS: OLTP, data warehouse, MPP
- 2000s-present: post-relational: NoSQL (XML, KV, graph, tree), NewSQL, NewDW

Core of a database
- Data model and query language
- Query optimization and execution
- Indexing and storage
- Transaction processing

Relational model
- Edgar F. Codd, 1981 Turing Award
- Jim Gray, 1998 Turing Award
- Michael Stonebraker, 2014 Turing Award
- Example: find all customers who live in Harrison
    select customer_name
    from customer
    where customer_city = 'Harrison';
- Reference: "A Relational Model of Data for Large Shared Data Banks"

Graph/Tree/KV models
- Key-value: Cassandra (CQL), HBase (API)
- Graph model: Neo4j, Giraph/Pregel
- Tree: XML databases, MongoDB
- Streaming

Other ways to classify databases
- Transaction processing vs. analytical processing
- Parallel vs. serial

- Hardware: CPU vs. GPU vs. FPGA vs. memory
- Cloud vs. non-cloud databases?

Evolution of the data warehouse

Traditional warehouse, shared storage (share-storage hardware/software architecture): DB instances 1 to 4 on top of shared storage
- Hardware: mostly proprietary hardware platforms
- Scalability: lacks elasticity, hard to adjust; a dozen or so nodes
- Use cases: traditional BI analytics
- Representative systems: Oracle, DB2

Traditional warehouse, MPP (share-nothing hardware/software architecture): DB instances 1 to 4, each with its own disk
- Hardware: mostly industry-standard x86 servers
- Scalability: lacks elasticity, hard to adjust; dozens of nodes
- Use cases: traditional BI analytics with complex computation needs
- Representative systems: Teradata, Vertica, Greenplum, Redshift

New Data Warehouse (share-nothing hardware architecture plus software-implemented distributed shared storage): DB instances 1 to 3 on top of a distributed file system spanning the disks
- Hardware: industry-standard x86 servers
- Scalability: elastic scaling, CaaS platform support, flexible configuration; thousands of nodes
- Use cases: big data and AI, with data lake support
- Representative systems: Hive, HAWQ, SparkSQL, Snowflake

Data warehouse engine comparison
[Quadrant chart. One axis: open source, open, linearly scalable vs. proprietary, closed source, not linearly scalable. Other axis: limited performance and SQL compatibility vs. high performance and SQL compatibility. Product logos were lost in extraction; only Amazon Athena is legible.]

NewDW subcategories
- SQL on Hadoop: SparkSQL, Hive, HAWQ 2.x, Presto
- SQL on Object Store: Snowflake (on S3), Amazon Athena (on S3)
- Hybrid (own storage, pluggable external storage): HAWQ 3.x, Oushu Database, Impala

NewDW feature comparison
Categories: SQL on Hadoop (Hive, SparkSQL, Presto); SQL on Object Store (Snowflake, Athena); SQL on Hybrid Storage (HAWQ, Oushu, Impala)

Feature | Hive | SparkSQL | Presto | Snowflake | Athena | HAWQ | Oushu | Impala
Performance | low | middle | low | low | low | high | top | middle
Scalability | high | high | high | high | high | high | high | high
Update/Delete | bad | N/A | N/A | weak | N/A | N/A | good | weak
Indexes | bad | N/A | N/A | N/A | N/A | N/A | yes | weak
SQL compatibility | middle | middle | bad | middle | bad | good | good | middle
High-concurrency queries | no | no | no | no | no | no | yes | no

HAWQ

Apache HAWQ history
- 2011: Dr. Lei Chang (常雷) proposes the idea at EMC/Pivotal and the HAWQ project starts.
- 2013: HAWQ 1.0 is released, with performance hundreds of times that of Hive.
- 2014: The HAWQ SIGMOD paper is published and recognized by the international database community.
- 2014: HAWQ is adopted by several large enterprise customers worldwide.
- 2015: HAWQ is open-sourced and becomes an Apache project.
- 2016: Dr. Chang and the HAWQ core team found Oushu (偶數科技).
- 2017: Oushu receives investment from top international VCs and commits to the development of HAWQ.
- 2017: Oushu Database 3.0 enterprise edition is released with a brand-new executor, billed as the world's fastest data warehouse.

HAWQ major milestones: 10x performance improvement

Greenplum Database (2003)
[Diagram: a master host and segment hosts joined by an interconnect. Each segment host runs primary and mirror segments holding data and catalog, kept in sync by replication.]
- Degree of parallelism = 8
- Segments per node = 4

HAWQ Alpha: Greenplum Database on HDFS (2011)
[Diagram: a master host and segment hosts joined by an interconnect. Each segment host runs primary and mirror segments; the catalog is replicated between them. Data lives on HDFS DataNodes across Rack1 and Rack2, replicated by HDFS, with metadata operations going to the NameNode.]
- Degree of parallelism = 8
- Segments per node = 4
Issues:
- Recovery complexity
- Expansion complexity
- Management complexity (many segments per node)
- Fixed degree of parallelism

HAWQ 1.0 GA Architecture (2013)
[Diagram: a master host holding the catalog, and segment hosts running stateless segments, joined by an interconnect. Data lives on HDFS DataNodes across Rack1 and Rack2, replicated by HDFS, with metadata operations going to the NameNode.]
- Degree of parallelism = 8
- Segments per node = 2
Issues:
- Recovery complexity
- Expansion complexity
- Management complexity (many segments per node)
- Fixed degree of parallelism

HAWQ 2.0: Architecture Change (2016 Q2)
[Diagram: a master host holding the catalog and a resource manager. Each segment host runs one stateless segment that hosts several virtual segments (vsegs). Data lives on HDFS DataNodes across Rack1 and Rack2, replicated by HDFS, with metadata operations going to the NameNode.]
- Degree of parallelism = any (number of vsegs)
- Segments per node = 1
Issues:
- Recovery complexity
- Expansion complexity
- Management complexity (many segments per node)
- Fixed degree of parallelism
The world's first parallel SQL engine natively integrated with PaaS/Docker cloud platforms.
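The parallelism figures on the architecture slides can be reproduced with a little arithmetic. The sketch below is an illustration, not HAWQ code. It assumes the four segment hosts that the diagrams appear to show, and that two of the four segments per node in the Greenplum layout are primaries. The function names `fixed_dop` and `elastic_dop` and the per-host cap on virtual segments are hypothetical.

```python
def fixed_dop(hosts: int, primaries_per_host: int) -> int:
    """Degree of parallelism when every host runs a fixed set of segments."""
    return hosts * primaries_per_host


def elastic_dop(requested_vsegs: int, hosts: int, max_vsegs_per_host: int) -> int:
    """Degree of parallelism when virtual segments are granted per query.

    Hypothetical policy: grant what the query asks for, capped by what
    the cluster can host. A real resource manager is more involved.
    """
    if requested_vsegs < 1:
        raise ValueError("a query needs at least one virtual segment")
    return min(requested_vsegs, hosts * max_vsegs_per_host)


HOSTS = 4  # assumption: four segment hosts, as the diagrams suggest

# Greenplum and HAWQ Alpha: 4 segments per node, assumed 2 primaries + 2 mirrors.
greenplum_dop = fixed_dop(HOSTS, primaries_per_host=2)
# HAWQ 1.0: 2 stateless segments per node, no mirrors.
hawq1_dop = fixed_dop(HOSTS, primaries_per_host=2)
# HAWQ 2.0: one segment per node; parallelism follows the vseg count.
small_query = elastic_dop(requested_vsegs=4, hosts=HOSTS, max_vsegs_per_host=8)
large_query = elastic_dop(requested_vsegs=24, hosts=HOSTS, max_vsegs_per_host=8)

print(greenplum_dop, hawq1_dop, small_query, large_query)
```

Under these assumptions both fixed layouts come out at 8, matching the slides, while the vseg-based layout lets each query run at a different degree of parallelism on the same hardware.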

HAWQ+ 3.0: Hornet Execution Engine (2017 Q3)
[Diagram: a master host holding the catalog and a resource manager. Each segment host runs one stateless segment with virtual segments (vsegs) and a Hornet executor. Data lives on HDFS DataNodes across Rack1 and Rack2, replicated by HDFS, with metadata operations going to the NameNode.]
- Hornet Execution Engine: SIMD/new hardware
- 10 times faster
- "The Fastest Engine in the World"

Oushu Database 3.0 vs SparkSQL 2.2
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select count(*) from lineitem; | 21.28 | 2555 | 120.06
select count(*) from lineitem; | 22.77 | 2440 | 107.16
AVERAGE | 22.03 | 2497.50 | 113.61
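The Ratio column appears to be the Spark time divided by the Oushu time, so a larger value means a larger speedup for Oushu. A minimal check in Python using the two measured runs from the table (the helper name `speedup` is ours, and the deck's figures may have been computed from unrounded timings):

```python
def speedup(oushu_ms: float, spark_ms: float) -> float:
    """How many times longer Spark took than Oushu for the same query."""
    return spark_ms / oushu_ms


# (Oushu ms, Spark ms) for the two count(*) runs in the table
runs = [(21.28, 2555.0), (22.77, 2440.0)]
ratios = [speedup(oushu, spark) for oushu, spark in runs]

for (oushu, spark), ratio in zip(runs, ratios):
    print(f"Oushu {oushu} ms, Spark {spark} ms, ratio {ratio:.2f}")
```

The recomputed values agree with the published 120.06 and 107.16 to within 0.01.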

count over columns of different data types
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select count(l_orderkey) from lineitem; | 306.70 | 3925 | 12.80
select count(l_partkey) from lineitem; | 274.35 | 3674 | 13.39
select count(l_suppkey) from lineitem; | 244.77 | 3466 | 14.16
select count(l_linenumber) from lineitem; | 133.67 | 3265 | 24.43
select count(l_quantity) from lineitem; | 110.12 | 3689 | 33.50
select count(l_extendedprice) from lineitem; | 112.05 | 3627 | 32.37
select count(l_discount) from lineitem; | 108.64 | 3886 | 35.77
select count(l_tax) from lineitem; | 115.14 | 3723 | 32.33
select count(l_returnflag) from lineitem; | 70.41 | 4591 | 65.20
select count(l_linestatus) from lineitem; | 73.01 | 4208 | 57.64
select count(l_shipdate) from lineitem; | 127.12 | 4218 | 33.18
select count(l_commitdate) from lineitem; | 135.43 | 4506 | 33.27
select count(l_receiptdate) from lineitem; | 134.36 | 4193 | 31.21
select count(l_shipinstruct) from lineitem; | 236.63 | 4311 | 18.22
select count(l_shipmode) from lineitem; | 177.66 | 4173 | 23.49
select count(l_comment) from lineitem; | 344.94 | 5885 | 17.06
AVERAGE | 169.06 | 4083.75 | 29.88

sum/avg over columns of different data types
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select sum(l_orderkey) from lineitem; | 323.16 | 3414 | 10.56
select sum(l_partkey) from lineitem; | 298.30 | 3321 | 11.13
select sum(l_suppkey) from lineitem; | 263.69 | 3243 | 12.30
select sum(l_linenumber) from lineitem; | 154.20 | 3193 | 20.71
select sum(l_quantity) from lineitem; | 128.39 | 4004 | 31.19
select sum(l_extendedprice) from lineitem; | 138.48 | 4042 | 29.19
select sum(l_discount) from lineitem; | 141.68 | 3500 | 24.70
select sum(l_tax) from lineitem; | 143.07 | 3536 | 24.72
select avg(l_orderkey) from lineitem; | 327.68 | 3511 | 10.71
select avg(l_partkey) from lineitem; | 303.51 | 3583 | 11.81
select avg(l_suppkey) from lineitem; | 269.36 | 3331 | 12.37
select avg(l_linenumber) from lineitem; | 161.41 | 3196 | 19.80
select avg(l_quantity) from lineitem; | 131.92 | 3614 | 27.40
select avg(l_extendedprice) from lineitem; | 138.48 | 3554 | 25.66
select avg(l_discount) from lineitem; | 134.01 | 3618 | 27.00
select avg(l_tax) from lineitem; | 137.92 | 3549 | 25.73
AVERAGE | 199.70 | 3513.06 | 20.31

group by a single column, with count
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select l_orderkey, count(*) from lineitem group by l_orderkey; | 14314.14 | OOM | NaN
select l_partkey, count(*) from lineitem group by l_partkey; | 4127.98 | 29299 | 7.10
select l_suppkey, count(*) from lineitem group by l_suppkey; | 1142.61 | 18181 | 15.91
select l_linenumber, count(*) from lineitem group by l_linenumber; | 363.51 | 9570 | 26.33
select l_quantity, count(*) from lineitem group by l_quantity; | 370.15 | 11367 | 30.71
select l_extendedprice, count(*) from lineitem group by l_extendedprice; | 4929.78 | 29736 | 6.03
select l_discount, count(*) from lineitem group by l_discount; | 392.41 | 10371 | 26.43
select l_tax, count(*) from lineitem group by l_tax; | 352.99 | 10371 | 29.38
select l_returnflag, count(*) from lineitem group by l_returnflag; | 545.86 | 11346 | 20.79
select l_linestatus, count(*) from lineitem group by l_linestatus; | 329.30 | 11217 | 34.06
select l_shipdate, count(*) from lineitem group by l_shipdate; | 638.51 | 16077 | 25.18
select l_commitdate, count(*) from lineitem group by l_commitdate; | 642.31 | 16161 | 25.16
select l_receiptdate, count(*) from lineitem group by l_receiptdate; | 647.12 | 15649 | 24.18
select l_shipinstruct, count(*) from lineitem group by l_shipinstruct; | 823.09 | 11539 | 14.02
select l_shipmode, count(*) from lineitem group by l_shipmode; | 630.63 | 11371 | 18.03
select l_comment, count(*) from lineitem group by l_comment; | 39032.16 | OOM | NaN
AVERAGE (excluding the statements on which Spark ran out of memory) | 1138.30 | 15161.07 | 21.66

group by columns of different data types, with sum and avg
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select l_partkey, sum(l_partkey), avg(l_partkey) from lineitem group by l_partkey; | 8333.37 | 54470 | 6.54
select l_suppkey, sum(l_suppkey), avg(l_suppkey) from lineitem group by l_suppkey; | 1527.32 | 19505 | 12.77
select l_linenumber, sum(l_linenumber), avg(l_linenumber) from lineitem group by l_linenumber; | 416.03 | 9914 | 23.83
select l_quantity, sum(l_quantity), avg(l_quantity) from lineitem group by l_quantity; | 390.82 | 11949 | 30.57
select l_extendedprice, sum(l_extendedprice), avg(l_extendedprice) from lineitem group by l_extendedprice; | 9148.20 | 32005 | 3.50
select l_discount, sum(l_discount), avg(l_discount) from lineitem group by l_discount; | 418.81 | 10757 | 25.68
select l_tax, sum(l_tax), avg(l_tax) from lineitem group by l_tax; | 357.99 | 10733 | 29.98
AVERAGE | 2941.79 | 21333.29 | 18.98
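It is worth noting how the AVERAGE row is formed. Recomputing from the seven group-by sum/avg queries suggests that each column is averaged independently, so the Ratio entry is the mean of the per-query ratios rather than the ratio of the mean timings. The sketch below checks this; the variable names are ours.

```python
from statistics import mean

# (Oushu ms, Spark ms, ratio) for the seven group-by sum/avg queries
rows = [
    (8333.37, 54470, 6.54),   # group by l_partkey
    (1527.32, 19505, 12.77),  # group by l_suppkey
    (416.03, 9914, 23.83),    # group by l_linenumber
    (390.82, 11949, 30.57),   # group by l_quantity
    (9148.20, 32005, 3.50),   # group by l_extendedprice
    (418.81, 10757, 25.68),   # group by l_discount
    (357.99, 10733, 29.98),   # group by l_tax
]

avg_oushu = mean(r[0] for r in rows)
avg_spark = mean(r[1] for r in rows)
mean_of_ratios = mean(r[2] for r in rows)   # what the AVERAGE row reports
ratio_of_means = avg_spark / avg_oushu      # a different, smaller number

print(f"{avg_oushu:.2f} {avg_spark:.2f} {mean_of_ratios:.2f} {ratio_of_means:.2f}")
```

The first three values reproduce the AVERAGE row (2941.79, 21333.29, 18.98). The ratio of the mean timings is only about 7.25, because the two slowest queries (group by l_partkey and l_extendedprice) dominate the mean timings while having the smallest ratios.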

group by multiple columns
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select l_partkey, l_suppkey, count(*) from lineitem group by l_partkey, l_suppkey; | 13074.79 | OOM | NaN
select l_partkey, l_linenumber, count(*) from lineitem group by l_partkey, l_linenumber; | 18091.03 | OOM | NaN
select l_suppkey, l_extendedprice, count(*) from lineitem group by l_suppkey, l_extendedprice; | 145543.51 | OOM | NaN
select l_partkey, l_shipmode, count(*) from lineitem group by l_partkey, l_shipmode; | 21298.14 | OOM | NaN
select l_partkey, l_shipdate, count(*) from lineitem group by l_partkey, l_shipdate; | 71890.82 | OOM | NaN
select l_suppkey, l_tax, count(*) from lineitem group by l_suppkey, l_tax; | 3994.25 | 28334 | 7.09
select l_shipdate, l_commitdate, count(*) from lineitem group by l_shipdate, l_commitdate; | 3159.43 | 32811 | 10.39
select count(l_orderkey) from lineitem group by l_linenumber, l_quantity, l_tax; | 1179.85 | 18080 | 15.32
AVERAGE | 2777.84 | 26408.33 | 10.93

group by an expression
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select l_partkey+l_suppkey, count(*) from lineitem group by l_partkey+l_suppkey; | 4050.55 | 31601 | 7.80
select l_partkey+1000 from lineitem group by l_partkey+1000; | 2869.51 | 27083 | 9.44
select l_tax*100 from lineitem group by l_tax*100; | 426.14 | 10005 | 23.48
AVERAGE (group by expression) | 2448.73 | 22896.33 | 13.57

Multiple aggregate functions
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
select l_partkey, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_partkey; | 11878.22 | OOM | NaN
select l_suppkey, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_suppkey; | 2399.98 | 23745 | 9.89
select l_linenumber, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_linenumber; | 698.18 | 10943 | 15.67
select l_quantity, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_quantity; | 702.60 | 13496 | 19.21
select l_discount, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_discount; | 741.17 | 12668 | 17.09
select l_tax, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_tax; | 670.63 | 12046 | 17.96
select l_returnflag, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_returnflag; | 913.23 | 12812 | 14.03
select l_linestatus, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_linestatus; | 675.94 | 12444 | 18.41
select l_shipdate, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_shipdate; | 1025.86 | 17846 | 17.40
select l_shipmode, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_shipmode; | | |
select l_comment, count(*), count(l_orderkey), sum(l_orderkey), avg(l_orderkey) from lineitem group by l_comment; | 117636.74 | OOM | NaN
AVERAGE | 1722.58 | 17189.46 | 14.97

TPC-H query
Unit: milliseconds (ms)

Query | Oushu | Spark | Ratio
TPC-H Q1 | 1175.99 | 18626 | 15.84
TPC-H Q1 | 1140.01 | 18060 | 15.84
TPC-H Q1 | 1161.93 | 18096 | 15.57
AVERAGE | 1159.31 | 18260.67 | 15.75

TPC-H Q1:
    select l_returnflag, l_linestatus,
           sum(l_quantity) as sum_qty,
           sum(l_extendedprice) as sum_base_price,
           sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
           sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
           avg(l_quantity) as avg_qty,
           avg(l_extendedprice) as avg_price,
           avg(l_discount) as avg_disc,
           count(*) as count_order
    from lineitem_1gorc_none
    where l_shipdate <= '1998-08-20'
    group by l_returnflag, l_linestatus;

Oushu Database 4.0: GlobalScale (2017 H1)
- GlobalScale: no master, P2P, geo-replication, mixed workload
[Diagram: a grid of peer Hornet nodes.]
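To see what the TPC-H Q1 query from the benchmark slide computes, the same statement shape can be run on a toy table with Python's built-in `sqlite3` module. This is only a sketch: the four rows are invented, the table is called `lineitem` rather than the deck's `lineitem_1gorc_none`, dates are stored as ISO text, the cutoff uses `<=` as in the standard TPC-H Q1 definition, and an `order by` is added so the output order is deterministic. It says nothing about HAWQ or Spark performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table lineitem (
        l_returnflag text, l_linestatus text, l_quantity real,
        l_extendedprice real, l_discount real, l_tax real, l_shipdate text
    )
""")
conn.executemany(
    "insert into lineitem values (?, ?, ?, ?, ?, ?, ?)",
    [
        ("A", "F", 10, 100.0, 0.5, 0.0, "1998-01-15"),
        ("A", "F", 30, 300.0, 0.0, 0.5, "1998-03-02"),
        ("N", "O", 20, 200.0, 0.0, 0.0, "1998-08-20"),
        ("N", "O", 99, 999.0, 0.0, 0.0, "1998-09-01"),  # after the cutoff, filtered out
    ],
)

q1 = """
    select l_returnflag, l_linestatus,
           sum(l_quantity) as sum_qty,
           sum(l_extendedprice) as sum_base_price,
           sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
           sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
           avg(l_quantity) as avg_qty,
           avg(l_extendedprice) as avg_price,
           avg(l_discount) as avg_disc,
           count(*) as count_order
    from lineitem
    where l_shipdate <= '1998-08-20'
    group by l_returnflag, l_linestatus
    order by l_returnflag, l_linestatus
"""
rows = conn.execute(q1).fetchall()
for row in rows:
    print(row)
```

Each output row is one (l_returnflag, l_linestatus) group: the query scans every qualifying line item once and maintains eight aggregates per group, which is why it is a common stress test for scan and aggregation speed.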

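HAWQ users worldwide (partial list)

Case: a large manufacturing company
Background
- Large volumes of sensor data could not be processed in time
- Failures could not be detected in time, causing heavy losses
- Traditional solutions were too expensive
Goals
- Build a big data platform to raise processing capacity
- Analytics cluster of 200+ nodes
- Petabyte-scale data storage
- Enable applications such as real-time failure prediction

Case: a large stock exchange
Challenges
- Replace the existing Oracle EDW platform to cope with daily growth in trading volume
- Retain transaction data at the finest granularity for compliance
- Process terabytes of incremental data every day in a cost-effective way
Solution
- Load all transaction data into Hadoop and HAWQ
- Put 1.2 billion records into HAWQ for query and analysis, with better performance

About Oushu (偶數科技)
- Founded by the creator of EMC/Pivotal HAWQ and members of the HAWQ core team
- Two data warehouse/AI products: Oushu Database (HAWQ+) and Apache HAWQ
- Most members are Apache committers and PMC members, drawn from major cloud and big data companies such as EMC/Pivotal, Oracle, IBM, and Teradata
- Graduates of top universities in China and abroad, including several ACM programming contest medalists
- Research published at top international data management conferences (such as SIGMOD), with multiple international patents
- Funded by top international VCs: Redpoint and Sequoia

Thank you!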
