Apache Doris at China Telecom and Its Provincial Companies

Shang Shujie (尚書杰), Technical Expert, Big Data Department, China Telecom
Doris Summit Asia 2023

Contents
1. TDP-Doris: technical background and product overview
   - Background: new-generation MPP technology
   - TDP-Doris product overview
2. TDP-Doris at China Telecom Group and its provincial companies
   - TDP-Doris migration case 1
   - TDP-Doris migration case 2
   - TDP-Doris migration case 3
3. Doris community interaction

1. TDP-Doris: technical background and product overview
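
Why we need a new-generation MPP database (1)

Weaknesses of the Hadoop + Hive data warehouse stack
- Data ingestion is not real-time: mostly day-level latency, occasionally hour-level, so in practice it is an offline warehouse.
- Queries are not interactive: response times are measured in minutes or hours.
- Many components, high complexity, and a high barrier to entry.
- Data lake components are an improvement, but not enough.
  - The problems above remain; in real production, ingestion latency is minute-level at best, which does not cover every scenario.
  - The components are still evolving rapidly.
- Complex Kerberos configuration.

A more complete view of big data
- Big data is not only large tables and large files; there is also medium-sized and small data.
- The goal is to solve data problems, not only big-data problems.
- Workloads are not only batch inserts and batch queries; there are also appends, deletes, updates, point queries, and point updates, as well as rollups.
- Google has more than its "three carriages" (the GFS, MapReduce, and Bigtable papers); it has other components too, such as Google Mesa.
- Google is neither the beginning nor the end of big data, only an episode in it.
- New-generation MPP databases such as ClickHouse and Doris are developing rapidly.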
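
Why we need a new-generation MPP database (2)

Previous-generation MPP databases struggle to adapt to new demands
- They evolved from single-node databases.
- Single-node databases lean toward OLTP.
- Storage is mainly row-based; columnar storage is an add-on and is under-optimized.
- They do not use CPU instructions for acceleration.

Current situation at the provincial companies: being easy to use, pleasant to use, and sufficient is what matters
- Small teams maintain large clusters, under heavy pressure and workload.
- The vast majority of tables hold fewer than one billion rows per day or per month.
- Many components on outdated versions: GBase, Kudu + Impala, Phoenix + HBase, and others.
- Oracle has to be phased out.
- MPP databases have to be replaced with domestic products and adapted to domestic hardware.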
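
The TDP-MPP software package

TDP-MPP is an MPP database distribution that China Telecom built through secondary development.
- Includes Tdp-Doris (2.0), Tdp-Clickhouse, and Tdp-Trino.
- Fully compatible with the open-source features, so user habits do not change.
  - Built on the latest stable releases and upgraded continuously to stay fully compatible with the community editions.
- Integrates the Tdp-manager cluster management tool.
  - Visual deployment, one-click distribution of configuration changes, and rolling upgrades.
  - Monitoring, alerting, and log collection and analysis.
- Integrates the big-data middle-platform components.
  - Visual table management, SQL authoring and execution, and visual configuration of data synchronization.
- Compiled in-house, adapted to domestic platforms, and shipped with tuned default parameters.
- In-house new features, load balancing, and a product white paper (with case studies).
- Supports bare-metal deployment, containerized deployment on K8S, mixed deployment with Docker, and hot/cold tiered storage.

Architecture diagram (labels only)
- Applications: channel system, expert system, label trial calculation, self-service data extraction
- TDP-MPP new-generation distributed database: storage nodes, backed by a remote storage cluster (Ceph/Hdfs/S3.)
- Data sources: Kafka real-time data, Hive tables

TDP (Telecom Data Platform) is China Telecom's big-data platform.
- Contains more than 30 components.
- In-house visual deployment and management components.
- In-house data middle platform, monitoring and alerting, and log collection systems.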

The TDP-MPP software package

A 50-person expert team working closely with the community
- Provincial company issues get a first-response answer.
- Help with solution design, architecture optimization, and usage tuning.
- Special requirements from provincial companies are developed with priority:
  - Doris JDBC catalog for GBase
  - Doris HBase migration

Other components
- Haproxy
- Milvus
- Kudu 1.15 + Impala 4.1
- TSDB
- GraphDB
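
TDP manager: in-house cluster management tool

Visual deployment, simplified operations.
- Automated deployment of China Telecom's in-house MPP engines.
- Automated operations for 30+ components.
- Best and latest versions, customized, drawing on experience from across the whole network.
- One-click cluster deployment, rolling restarts, and tuned configuration (Doris FE is shown as the example).
- Process monitoring, with automatic restart on failure.
- Metrics monitoring; logs are shipped to ELK.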

In-house data middle platform

A visual data middle platform.
- Visual SQL query interface.
- Visual table management tool.
- Visual import interface: create data sources, map fields, schedule tasks.
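
Main application scenarios for TDP-Doris

1. Query acceleration
- Load application-layer and summary-layer data into the MPP database to accelerate queries.
- Configure interactive reports for multidimensional queries.
- Simplify the warehouse: load the DWD and DWS layers directly and query them directly, with no further processing.

2. Real-time applications and real-time data warehouse
- Ingest Kafka data directly into the MPP database, so it can be queried as soon as it arrives.
- Retain historical data so that problems can be traced back easily.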

2. TDP-Doris at China Telecom Group and its provincial companies
TDP-Doris performance comparison tests

Doris vs. Spark query performance
- Spark on YARN: 32 executors, 16G, 2 cores.
- doris-hive: 4 BEs on physical machines, version 2.0.1, parallelism set to 10.
- doris-inner: 3 BEs on physical machines, version 2.0.1, parallelism set to 10.
- Dataset: labeled "TDP-DS 1T" on the slide.

Performance test: 10 x Doris vs. 30 x GBase
Comparison test 1: multi-table join
- 10 Doris BE nodes against 30 GBase nodes with the same hardware configuration.
- Tables joined: t645 (200 million rows), t17 (500 million rows), t705 (30 million rows).
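
Performance test: 10 x Doris vs. 30 x GBase
Comparison test 2: simple aggregate queries
- 10 Doris BE nodes against 30 GBase nodes with the same hardware configuration.
- GBase precomputes statistics for sum, count, and similar aggregates; Doris can achieve the same with materialized views.
- Adding a WHERE condition makes GBase's precomputed statistics unusable.
- The charts on this slide all show execution times after a WHERE condition was added.
- Every queried table holds on the order of ten billion rows.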
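
A minimal sketch of why a filter defeats whole-table statistics while a grouped pre-aggregation can still answer the query. This is a toy in-memory model, not how GBase or Doris implement it; the rows, the `region` column, and the helper name are all hypothetical, and it assumes the WHERE condition is on the grouping column.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, amount).
rows = [("north", 10), ("south", 20), ("north", 5), ("east", 7)]

# Whole-table statistics computed ahead of time. They answer
#   SELECT SUM(amount), COUNT(*) FROM t
# instantly, but say nothing about any subset of the rows.
table_sum = sum(amount for _, amount in rows)
table_count = len(rows)

# Grouped pre-aggregation, analogous to a materialized view defined as
#   SELECT region, SUM(amount), COUNT(*) FROM t GROUP BY region
by_region = defaultdict(lambda: [0, 0])
for region, amount in rows:
    by_region[region][0] += amount
    by_region[region][1] += 1


def sum_count_for_region(region):
    """Answer SUM/COUNT with WHERE region = ? from the grouped result only.

    Returns (0, 0) for an unknown region; SQL would return NULL for the SUM.
    """
    total, count = by_region.get(region, (0, 0))
    return total, count


print(table_sum, table_count)           # whole-table answer, no filter
print(sum_count_for_region("north"))    # filtered answer without rescanning rows
```

The grouped result is as small as the number of distinct regions, so a filtered aggregate reads a handful of entries instead of the base rows. A filter on a column that is not part of the grouping would still require the base data.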
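
Load performance test
- After tuning, with 16 parallel tasks, the load finished within 20 minutes and the whole script within 30 minutes.
- In testing, SeaTunnel stayed unresponsive for a long time when a task failed.
- Environment: 3 FEs, 10 BEs.

Test condition (resources, parameters, mode)   Run time    Notes (ease of use, caveats)
broker load                                    3 minutes   Asynchronous job; progress has to be checked in the background
seatunnel, parallelism=16                      3 minutes   Uses Spark resources, which reduces the load on Doris
spark load, 20*5G                              5 minutes   Higher concurrency, lower memory per task, faster run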
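
Hive-to-Doris data load performance test

Conclusions
1. For join analysis, Doris is 3 times faster than Spark and 8 times faster than Oracle.
2. Synchronizing data from the big-data platform to Doris is about as efficient as synchronizing it to Oracle.
3. One provincial company migrated its big-data platform and Oracle processing flows to Doris. Big-data platform processing dropped from 1 hour to 20 minutes, data migration stayed at 0.5 hours, and Oracle processing dropped from 2 hours to under 30 minutes. The overall flow went from 3.5 hours to under 80 minutes, so application dashboards are ready 2 hours earlier.

Environment
- Doris cluster: 3 FEs and 9 BEs, each with 40 cores and 256 GB of memory.
- Oracle cluster: 2 compute nodes (72 cores, 512 GB memory) and 15 storage nodes (16 cores, 64 GB memory).

Data processing efficiency
Scenario: decision support, processing of the 隨翼選 package development list. The Spark, Doris, and Oracle times (seconds) are run together in the source and are reproduced as they appear.

Sub-scenario               Data volume                                                  Spark/Doris/Oracle time (s)
Single-table processing    wide table, 170 million rows                                 36771782
Multi-table processing     1. wide table, 170 million rows; 2. two tables, 40 million   15583623
(join analysis)            rows; 3. intermediate table, 170 million rows

Data synchronization efficiency
Scenario: write-back of the sales-offering instance wide table. The source gives no unit for the times.

City                        Doris (broker load)   Oracle (sqoop)
City 1                      18                    18
City 2                      13                    11
City 3                      14                    12
City 4                      14                    8
City 5                      9                     9
City 6                      14                    10
City 7                      15                    8
City 8                      14                    10
City 9                      8                     9
City 10                     13                    10
City 11                     7                     7
City 12                     8                     7
City 13                     14                    8
City 14                     14                    11
City 15                     7                     7
All cities, concurrently    18                    18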
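
The end-to-end figures in conclusion 3 can be checked with a few lines of arithmetic. This is only a sketch of the sums quoted in the text; the stage names are paraphrased, and "under 30 minutes" is treated as an upper bound of 30.

```python
# Stage durations in minutes before and after the migration to Doris,
# as quoted in conclusion 3.
before = {
    "big-data platform processing": 60,
    "data migration": 30,
    "Oracle processing": 120,
}
after = {
    "big-data platform processing": 20,
    "data migration": 30,          # unchanged
    "Oracle processing": 30,       # "under 30 minutes", taken as an upper bound
}

total_before = sum(before.values())   # 210 minutes, i.e. 3.5 hours
total_after = sum(after.values())     # 80 minutes
saved = total_before - total_after    # 130 minutes, a little over 2 hours

for stage in before:
    print(f"{stage}: {before[stage]} -> {after[stage]} min")
print(f"total: {total_before} -> {total_after} min, saved {saved} min")
```

The saving of 130 minutes matches the "2 hours earlier" claim once rounded down.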

Doris KV query test (user label scenario)
- Environment: 3 FEs, 4 BEs, 128 GB memory / 16 cores, Doris 1.2.4.
- Test statement: look up labels for a given key.
    select lab3, lab4, ... from dm_awt_all_tran where key = xx;
- Requirement: 100 million rows, 800 columns, 8000/min.
- Multidimensional queries on non-key columns and joins are also supported.
- Doris 2.0 delivers higher KV QPS, reaching 8767.

thread-num    Avg response time (ms)    QPS       Error rate
10            9                         1040.3    0.00%
20            10                        1958.8    0.00%
50            12                        3826.8    0.00%
100 (5*20)    17.6                      5534.1    0.00%
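
As a sanity check on the KV results, throughput in a closed-loop test should be close to the number of client threads divided by the mean response time (Little's law). The sketch below reads the four result rows as (threads, mean latency in ms, QPS), which is an interpretation of the flattened table, and treats "8000/min" as 8000 queries per minute, which is also an assumption. Latencies are rounded in the source, so the estimate is approximate.

```python
# (client threads, mean response time in ms, measured QPS)
rows = [
    (10, 9.0, 1040.3),
    (20, 10.0, 1958.8),
    (50, 12.0, 3826.8),
    (100, 17.6, 5534.1),
]


def estimated_qps(threads, latency_ms):
    """Little's law for a closed loop: throughput = concurrency / latency."""
    return threads / (latency_ms / 1000.0)


for threads, latency_ms, measured in rows:
    est = estimated_qps(threads, latency_ms)
    print(f"{threads:>3} threads: estimated {est:7.1f} QPS, "
          f"measured {measured:7.1f} QPS ({measured / est:.0%} of estimate)")

# Assumed reading of the requirement: 8000 queries per minute.
required_qps = 8000 / 60
print(f"required: {required_qps:.1f} QPS")
```

Every measured value lands within about 10% of the estimate, which supports this reading of the table. Under the assumed reading of the requirement, even the 10-thread run exceeds it by a wide margin.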

2. TDP-Doris at China Telecom Group and its provincial companies
TDP-Doris migration case studies
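
Case 1: Migrating application systems to Doris
- Data model migration: inventory the models of the grid holographic system, channel support system, enterprise health system, financial intelligence display system, enterprise data portal, and other systems, then create the models to be migrated in Doris.
- Historical data migration: move existing data from Oracle and the big-data platform into Doris.
- Processing script conversion: rewrite every migrated data processing script to account for SQL syntax differences between the databases, and make sure it still runs efficiently.
- Scheduling workflow development: on the ETL platform, configure a scheduling workflow for every processing script (upstream semaphore, processing component, output semaphore, quality checks and alerts).
- Upstream data delivery: add data delivery flows from the big-data platform to Doris, and from Doris to the databases that store data for application presentation.
- New API interfaces: retire DBLINK-based data sharing and add APIs that serve data to Contract Assistant (承包助手), District Assistant (包區助手), Mobile BI (掌上BI), and other systems.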
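
Case 2: Continuing to explore real-time data ingestion into the lake and building a Hudi + Doris lakehouse architecture
- Real-time warehouse architecture: the big-data platform is primary and MPP is secondary, with a hybrid real-time warehouse built on Hudi + Doris. On top of the offline warehouse, it adds a real-time integration layer, a near-real-time integration layer, a near-real-time summary layer, and a near-real-time mart layer.
- Interface layer: billing call-detail records are loaded into Doris in real time (as files); other real-time data goes to the big-data cluster (as messages).
- Integration layer: switch fully to the cloud-based model and land the raw data; build full datasets from incremental real-time feeds (sensitive data is encrypted and stored in the secure zone).
- Summary layer: follow business hot spots and build near-real-time wide tables by subject, such as 天翼云眼, cloud services, 天翼視訊, and 隨翼選; make user subscription, allocation, location, and usage labels near-real-time.
- Mart layer: build real-time and near-real-time mart models for business development, 產數標品, digital life, customer perception, and similar subjects.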
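
Case 3: Replacing aging components with Hudi + Doris
- A new lakehouse architecture based on Hudi + Doris.
- Hudi serves as the data lake component and cuts data readiness time from days to minutes.
- Tdp-doris serves as the unified query component, replacing Oracle, Kudu + Impala, GBase, and part of HBase.
  - Real-time KV lookups of labels.
  - Multidimensional queries.
  - Direct ingestion of Kafka data.
  - Federated queries across Hive tables, local tables, and tables in the legacy databases, which eases data migration.
- The new architecture has clear logic and a simpler data flow, uses fewer components, and is easier to maintain.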

Replacing Oracle/GBase: data synchronization
- Connect to Oracle/GBase through a JDBC catalog, using an in-house JDBC catalog.
- No external tables need to be created: once the catalog exists, its databases and tables are visible automatically.
- Staged (gray) rollout with gradual replacement.
- Direct query:
    select * from mysql_catalog.db.table where k1 > 1000 and k3 = 'term';
- Two-way data synchronization: load data with statements of the form
    insert into <local table> select ... from mysql_catalog.db.table where <partition condition>
- New Trino GBase/Doris connector: Trino workers write to the BEs for high-concurrency loading.

Example catalog definition:
    CREATE CATALOG jdbc_oracle PROPERTIES (
        "type" = "jdbc",
        "user" = "root",
        "password" = "123456",
        "jdbc_url" = "jdbc:oracle:thin:@127.0.0.1:1521:helowin",
        "driver_url" = "ojdbc8.jar",
        "driver_class" = "oracle.jdbc.driver.OracleDriver"
    );
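
The "insert into local table, select from the catalog table, filter by partition" pattern lends itself to batching by day, so each statement moves one slice and can be retried on its own. The helper below is a sketch that only generates the SQL text; the table names, the `order_date` column, and the date range are hypothetical, and it assumes the source table has a date column suitable for slicing.

```python
from datetime import date, timedelta


def daily_insert_statements(local_table, catalog_table, partition_column, start, end):
    """Yield one INSERT INTO ... SELECT statement per day in [start, end).

    Identifiers are interpolated directly, so pass trusted names only.
    """
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        yield (
            f"INSERT INTO {local_table} "
            f"SELECT * FROM {catalog_table} "
            f"WHERE {partition_column} >= '{day.isoformat()}' "
            f"AND {partition_column} < '{next_day.isoformat()}';"
        )
        day = next_day


statements = list(daily_insert_statements(
    local_table="internal.demo_db.orders",     # hypothetical Doris table
    catalog_table="jdbc_oracle.DEMO.ORDERS",   # hypothetical table behind the catalog
    partition_column="order_date",             # hypothetical date column
    start=date(2023, 10, 1),
    end=date(2023, 10, 4),
))

for sql in statements:
    print(sql)
```

Each generated statement can be submitted through any MySQL-protocol client connected to the Doris FE. Half-open date ranges keep adjacent slices from overlapping.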

3. Doris community interaction
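
TDP-Doris community interaction
- Ma Ruyue (馬如悅), founder of Doris and CEO of SelectDB, visited China Telecom.
  - He spent three consecutive hours explaining the latest Doris 2.0 optimizations and how users are adopting it.
- Help with production issues.
  - Technical meetings, shared documents, on-site training, and more.
- Near-term development plans
  - doris mixed operator
  - Oracle syntax compatibility and automated migration of stored procedures
  - Doris Ranger adaptation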

More community news and best practices
- Doris Summit website: doris-
- Doris Summit replays: https:/
- Doris website: doris.apache.org
- Apache Doris GitHub:
- Doris official platform: