Apache Doris in Xiaomi's Big Data Scenarios: Application Practice
Wei Zuo (魏祚), Database R&D Engineer at Xiaomi, Apache Doris PMC member
Doris Summit Asia 2023

Contents
1. History of OLAP selection at Xiaomi and current usage
2. Application practice of Apache Doris at Xiaomi
3. Optimization practice of Apache Doris at Xiaomi
4. Future plans for Doris at Xiaomi

1. History of OLAP selection at Xiaomi and current usage

History of OLAP selection at Xiaomi
In Xiaomi's A/B experiment scenario, the vectorized version of Doris (1.1.2) more than doubled overall query performance compared with the non-vectorized Doris 0.13. In some other scenarios, query performance improved by as much as 35x.

Advantages of Doris
- Materialized view / Rollup acceleration, rich index support, vectorized engine.
- Strong distributed capabilities, easy scale-out and scale-in, no dependence on external components.
- Support for standard SQL.
- Active community, with community development led by a commercial company.

Current usage of Doris at Xiaomi
Within Xiaomi, Doris mainly serves BI dashboard and report analysis scenarios. It supports several hundred businesses internally, dozens of which are core businesses. There are dozens of clusters, running on several hundred machines.

[Figure: largest single-cluster scale]

2. Application practice of Apache Doris at Xiaomi

Doris in Xiaomi's BI platform
One of the most important uses of Doris at Xiaomi is as a data source for the BI platform. The BI platform supports multiple underlying data sources: MySQL, Doris, Hive, Iceberg, Excel, CSV, and Feishu Sheets. Dashboards are created with SQL or drag-and-drop components, and custom metrics and dimensions are supported.

Doris in Xiaomi's user behavior analysis platform
The data comes from tracking events that each business collects on its web pages or apps. Every user action on a web page or in an app is abstracted into an event entity. User behavior analysis is then built by modeling on these events.

Doris in Xiaomi's user behavior analysis platform: retention analysis

    SELECT retention_count(c.retention_info)
    FROM (
        SELECT distinct_id,
               retention_info(1664553600000, 'day', timestamp,
                   CASE WHEN event_name = 'view' THEN 1 ELSE 0 END |
                   CASE WHEN event_name = 'buy' THEN 2 ELSE 0 END) AS retention_info
        FROM retention_analysis_test
        WHERE timestamp = 1664553600000
        GROUP BY distinct_id
    ) c;

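The slide does not define what `retention_info` and `retention_count` compute, so the following Python sketch only illustrates one common definition of day-N retention, not the behavior of those functions. It assumes that a user belongs to the initial cohort if they performed the first event (`view`) on the start day, and is retained on day N if they performed the second event (`buy`) N days later. The names `retention_counts` and `DAY_MS` are hypothetical.

```python
from collections import defaultdict

DAY_MS = 86_400_000  # milliseconds per day


def retention_counts(events, start_ms, first_event="view", return_event="buy"):
    """Count day-N retention under the assumed definition described above.

    events: iterable of (distinct_id, timestamp_ms, event_name).
    Returns {day_offset: user_count}. Offset 0 counts users who performed
    first_event on the start day; offset N > 0 counts those of them who
    performed return_event N days after the start day.
    """
    first_day_users = set()
    return_days = defaultdict(set)  # distinct_id -> day offsets with return_event
    for distinct_id, ts, name in events:
        if ts < start_ms:
            continue  # ignore events before the analysis window
        day = (ts - start_ms) // DAY_MS
        if name == first_event and day == 0:
            first_day_users.add(distinct_id)
        elif name == return_event:
            return_days[distinct_id].add(day)

    counts = defaultdict(int)
    counts[0] = len(first_day_users)
    for user in first_day_users:
        for day in return_days.get(user, ()):
            if day > 0:
                counts[day] += 1
    return dict(counts)
```

Grouping by `distinct_id` first and aggregating afterwards mirrors the two-level shape of the query: an inner per-user summary and an outer count over those summaries.
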
Doris in Xiaomi's user behavior analysis platform: funnel analysis

    SELECT funnel_count(c.funnel_info)
    FROM (
        SELECT distinct_id,
               funnel_info(1664586000000, 604800000,
                   CASE WHEN event_name = 'view' THEN 1 ELSE 0 END |
                   CASE WHEN event_name = 'open' THEN 2 ELSE 0 END |
                   CASE WHEN event_name = 'buy' THEN 4 ELSE 0 END,
                   timestamp) AS funnel_info
        FROM funnel_analysis_test
        WHERE timestamp = 1664586000000
        GROUP BY distinct_id
    ) c;

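The query encodes each funnel step as one bit (`view` = 1, `open` = 2, `buy` = 4) and passes a window of 604800000 ms (7 days). The exact semantics of `funnel_info` and `funnel_count` are not given on the slide. As a rough illustration, the Python sketch below assumes that a user's funnel depth is the number of steps completed in order, with every later step falling within the window that starts at the user's first `view`. It is a simplification: it anchors the window on the first `view` only. The names `STEP_BITS`, `funnel_depth`, and `funnel_count` as written here are hypothetical Python helpers.

```python
STEP_BITS = {"view": 1, "open": 2, "buy": 4}  # same encoding as the CASE WHEN expressions


def funnel_depth(user_events, window_ms):
    """Return how many funnel steps a single user completed in order.

    user_events: list of (timestamp_ms, event_name) for one user.
    Assumed rule: a later step counts only if it occurs at or after the
    previous step and within window_ms of the user's first step-1 event.
    """
    steps = sorted(STEP_BITS, key=STEP_BITS.get)  # ["view", "open", "buy"]
    depth = 0
    window_start = None
    for ts, name in sorted(user_events):
        if depth == len(steps):
            break  # the whole funnel is already complete
        if name != steps[depth]:
            continue  # not the step we are waiting for
        if depth == 0:
            window_start = ts
            depth = 1
        elif ts - window_start <= window_ms:
            depth += 1
    return depth


def funnel_count(events_by_user, window_ms):
    """Aggregate per-user depths into the number of users reaching each step."""
    reached = [0] * len(STEP_BITS)
    for user_events in events_by_user.values():
        for step in range(funnel_depth(user_events, window_ms)):
            reached[step] += 1
    return reached
```

The result is a list with one entry per step, so the conversion rate between two steps is the ratio of adjacent entries.
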
Doris data job management at Xiaomi
[Figure: Xiaomi data ecosystem and data ingestion into Doris]

Doris data job management at Xiaomi
[Figure: atomic update of table data; atomic update of partition data]

3. Optimization practice of Apache Doris at Xiaomi

Support for Flink exactly-once semantics
- Problem: Doris did not support two-phase commit for Stream Load, so data could be written more than once after a Flink restart.
- Optimization: the data-write transaction flow was reworked to add a pre-committed transaction state, which enables two-phase commit for Stream Load.
- Result: two-phase commit for Stream Load supports Flink exactly-once semantics and guarantees the atomicity of multiple Stream Load tasks.

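To make the role of the pre-committed state concrete, here is a toy in-memory model in Python. It does not use the Doris or Flink APIs; `ToyTable`, `run_job`, and `demo` are hypothetical names, and the recovery logic is simplified to the single case where a job crashes between the two phases. The point it illustrates is that rows written in phase 1 stay invisible until phase 2, so a replay after a restart does not produce duplicates.

```python
class ToyTable:
    """In-memory stand-in for a table with a pre-committed transaction state."""

    def __init__(self):
        self.visible = []        # rows that queries can see
        self.precommitted = {}   # txn_id -> rows written but not yet visible
        self._next_txn = 0

    def precommit(self, rows):
        """Phase 1: persist rows under a new transaction, invisible to queries."""
        txn_id = self._next_txn
        self._next_txn += 1
        self.precommitted[txn_id] = list(rows)
        return txn_id

    def commit(self, txn_id):
        """Phase 2: publish a pre-committed transaction; repeating it is a no-op."""
        self.visible.extend(self.precommitted.pop(txn_id, []))

    def abort(self, txn_id):
        """Discard a pre-committed transaction that will never be committed."""
        self.precommitted.pop(txn_id, None)


def run_job(table, source, start_offset, batch_size, crash_after_precommit=False):
    """Process one checkpoint interval; return (resume_offset, dangling_txn).

    The offset advances only when the checkpoint completes, so a crash
    between the two phases makes the job replay the same batch.
    """
    batch = source[start_offset:start_offset + batch_size]
    txn_id = table.precommit(batch)      # done while taking the checkpoint
    if crash_after_precommit:
        return start_offset, txn_id      # checkpoint never completed
    table.commit(txn_id)                 # done once the checkpoint completes
    return start_offset + len(batch), None


def demo():
    source = list(range(6))
    table = ToyTable()
    offset, _ = run_job(table, source, 0, 3)
    # Crash after pre-commit: rows 3..5 are written but not visible.
    offset, dangling = run_job(table, source, offset, 3, crash_after_precommit=True)
    table.abort(dangling)                # recovery drops the dangling transaction
    offset, _ = run_job(table, source, offset, 3)  # replay the same batch
    return table.visible
```

Without the pre-committed state, the crashed run would already have made rows 3 to 5 visible, and the replay would write them a second time.
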
Support for single-replica data writes

Three-replica write
- Replica roles: all 3 replicas have equal status.
- All 3 replicas perform exactly the same sorting, aggregation, encoding, and compression at the same time, then flush files.
- CPU and memory usage is high.

Single-replica write
- Replica roles: 1 master replica + 2 slave replicas.
- The master replica performs the resource-intensive operations (sorting, aggregation, encoding, compression) and flushes files.
- The slave replicas sync the master replica's files, which keeps the data highly available.
- CPU and memory usage is reduced.

Results
- High-concurrency write scenario: execution efficiency of data-write jobs improved by 1.5x.
- Single-task scenario: memory usage reduced by 2/3, CPU usage reduced by 1/3.

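The difference between the two write paths can be sketched in a few lines of Python. This is an illustrative model only, not Doris code: `build_segment` is a hypothetical stand-in for the expensive sort, aggregate, encode, and compress work, and the returned counter simply records how many times that work runs.

```python
def build_segment(rows):
    """Stand-in for the expensive work: sum values per key, then sort by key."""
    merged = {}
    for key, value in rows:
        merged[key] = merged.get(key, 0) + value
    return sorted(merged.items())


def three_replica_write(rows, replicas=3):
    """Every replica builds its own file from the same rows independently."""
    files, heavy_ops = [], 0
    for _ in range(replicas):
        files.append(build_segment(rows))
        heavy_ops += 1
    return files, heavy_ops


def single_replica_write(rows, replicas=3):
    """Only the master builds the file; slaves copy the finished file."""
    master_file = build_segment(rows)
    files = [master_file] + [list(master_file) for _ in range(replicas - 1)]
    return files, 1
```

Both paths end with three identical files, but the expensive step runs once instead of three times, which is the intuition behind the lower CPU and memory usage reported above.
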
4. Future plans for Doris at Xiaomi

- Bring in newer community versions of Apache Doris to support tenant isolation.
- Build a monitoring platform for metadata and user usage to support fine-grained service monitoring and governance.

For more community news and best practices
Doris Summit official website: doris-
Doris Summit replay: https:/
Doris official website: doris.apache.org
Apache Doris GitHub:
Doris official platform: