Best Practices for Apache Doris in Yurun Group's Data Scenarios
Shi Gongxing, Big Data Architect, Yurun Group
Doris Summit Asia 2023

About the speaker
Shi Gongxing | Architect of the Foundational Data Platform, Yurun Group
Shi graduated from Nanjing University and has 8 years of big data experience. He previously worked at iFLYTEK and Suning as a big data development engineer and big data architect. He joined Yurun Group in 2021, where he is responsible for building Yurun's foundational data platform and planning its data middle platform.

Contents
1. Background
2. Architecture Evolution
3. Doris in Practice
4. Summary and Outlook

1. Business Background

About Yurun Group
Yurun Group is a large conglomerate spanning food, commerce, agricultural-product logistics, real estate, hotels, internet, property management, finance, and construction. Headquartered in Nanjing, Jiangsu, it has more than 300 subsidiaries and branches in 30 provinces, municipalities, and autonomous regions across China. The group currently owns two listed companies: Yurun Food (1068.HK) and Central Emporium (600280.SH).
Yurun is not just a brand but a way of life, touching every aspect of people's daily lives: clothing, food, housing, transportation, and entertainment. Yurun is committed to livelihood industries: centered on people's everyday needs, it makes good products, provides attentive service, and creates possibilities with care.

Business background
Business scenarios:
- Offline reports: T+1 offline reports are produced on a daily schedule for senior management's morning meetings.
- Real-time analysis: same-day store operations and sales data are displayed in real time, so operations staff can adjust marketing strategies promptly based on sales.
- Ad hoc queries: data warehouse engineers provide business users with wide tables, self-service reports, and a data portal.
Data types:
- Fresh products data: 25 plants nationwide; for each plant, slaughtering, frozen-product, supplier, and sales information.
- Deep-processing data: 17 plants nationwide; for each plant, warehouse, production cost, procurement cost, and sales information.
- Livestock farming data: 8 farms nationwide with about 70,000 pigs; every document for each pig is recorded, including feeding, farrowing, nursing, and breeding.

2. Architecture Evolution

Offline data warehouse architecture based on Hive
- Data sources (data types): new retail, livestock farming, OA, fresh products, deep processing, and third-party data.
- Ingestion: Sqoop loads T+1 full data into a TMP layer, which is then merged with the historical full data in ODS.
- Hive offline warehouse (Hive as the compute engine): ODS (historical full data) → DWD (cleansing and joining) → DWS (light aggregation) → ADS (application layer).
- Data applications (Sqoop is also used on the output side): BI reports, dashboards, data push, and data APIs.
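
The daily merge step above (later listed as a pain point) is essentially an upsert of yesterday's changed rows into the full history. A minimal Python sketch follows, assuming each table has a primary key `id` and a change timestamp `update_time` (both hypothetical names; the slides do not show the schema or the Hive SQL actually used).

```python
from typing import Dict, Iterable, List


def merge_into_history(history: Iterable[dict], increment: Iterable[dict],
                       key: str = "id", version: str = "update_time") -> List[dict]:
    """Return the new full snapshot after applying yesterday's changed rows to the history."""
    merged: Dict[object, dict] = {row[key]: row for row in history}
    for row in increment:
        current = merged.get(row[key])
        # New keys are inserted; existing keys are replaced unless the stored row is newer.
        if current is None or row[version] >= current[version]:
            merged[row[key]] = row
    # Sort by key so the output is deterministic.
    return sorted(merged.values(), key=lambda r: r[key])


history = [{"id": 1, "qty": 5, "update_time": "2023-06-01"},
           {"id": 2, "qty": 7, "update_time": "2023-06-01"}]
increment = [{"id": 2, "qty": 9, "update_time": "2023-06-02"},
             {"id": 3, "qty": 1, "update_time": "2023-06-02"}]
print([r["qty"] for r in merge_into_history(history, increment)])  # [5, 9, 1]
```

In a non-transactional Hive table this logic typically means rewriting the whole table every day. A Doris Unique Key table replaces rows with the same key at load time (optionally ordered by a sequence column), which is one way such a merge step can be dropped.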

Original real-time architecture
The diagram has four layers: data sources, data computation, data storage, and applications.
- Data sources and collection: real-time MySQL binlog sync (Flink CDC), IoT devices and producers writing to Kafka (message queue), files parsed by Flume, and offline ETL sync.
- Computation: Flink, Spark, and Hive SQL, with cron jobs.
- Storage: HBase, HDFS, MySQL, and Redis.
- Applications: T+1 indicators and T+0 indicators for apps.
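
Pain points
- Staff efficiency: Hadoop is complex to operate, requires highly skilled staff, and the development process is complicated.
- Historical data updates: yesterday's updated data must be merged with the historical data before being inserted into ODS.
- Slow queries: when a production issue occurs, the data must be traced back, and some large jobs alone take 20 minutes to finish.
- Small files: some business lines have small data volumes, and HDFS is not suited to storing small files.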

Why Doris
- Highly active community: development problems get timely feedback from the community, which helps with later system upgrades and maintenance.
- MySQL protocol compatibility: our in-house reporting platform did not need to be redeveloped; only the pagination syntax had to be changed.
- Strong performance: ZSTD compression reached a compression ratio of nearly 10x at best; vectorized execution engine; materialized views.
- Simple operations: no dependency on other external systems, no small-file problem, and nodes scale linearly.

Offline data warehouse architecture based on Doris 2.0
- Data sources (data types): new retail, livestock farming, OA, fresh products, deep processing, and third-party data.
- Data integration: a jar-based import program and Doris Catalog.
- Doris offline warehouse: ODS (historical full data) → DWD (cleansing and joining) → DWS (light aggregation) → ADS (application layer).
- Data applications: BI reports, dashboards, data push, and data services.

Real-time data warehouse architecture based on Doris 2.0
- Data collection: Flink CDC for real-time sync of multiple tables at once, and JDBC for offline backfilling.
- Real-time warehouse: raw layer (ODS) → detail layer (DWD) → service layer (DWS) → application layer (ADS), maintained with zero-code SQL and scheduled at minute granularity.
- Applications: real-time dashboards, real-time reports, real-time alerts, message push, and AI algorithms.

Results after the improvements
Categories: computing efficiency, cost, storage resources saved, staff reduction, efficiency improvement
Figures: 3x, 90%, 30x, 1 million

3. Doris in Practice

DDL synchronization tool
- DDL conversion: recognizes the column types in MySQL and Oracle DDL and converts them to Doris column types. VARCHAR lengths are multiplied by 3, because Doris measures VARCHAR length in bytes while MySQL measures it in characters, and a UTF-8 Chinese character takes 3 bytes.
- DDL updates: when the DDL of a source table changes, the structure is synchronized automatically.
- Batch creation: select multiple tables on a web page to create the corresponding Doris tables in bulk.
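
The type conversion rule can be sketched in Python. The mapping table, function names, and the Unique Key / bucket defaults are illustrative assumptions, not the tool's actual code; only the VARCHAR ×3 rule comes from the slide. Oracle types and `unsigned` integers are not handled.

```python
import re
from typing import List, Sequence, Tuple

# Illustrative mapping from MySQL base types to Doris types (not exhaustive).
SIMPLE_TYPES = {
    "tinyint": "TINYINT",
    "smallint": "SMALLINT",
    "int": "INT",
    "bigint": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "date": "DATE",
    "datetime": "DATETIME",
    "text": "STRING",
    "longtext": "STRING",
}
DORIS_VARCHAR_MAX = 65533  # largest VARCHAR length Doris accepts, in bytes


def mysql_to_doris_type(mysql_type: str) -> str:
    """Convert one MySQL column type, e.g. 'varchar(50)' or 'int(11)', to a Doris type."""
    t = mysql_type.strip().lower()
    m = re.fullmatch(r"varchar\((\d+)\)", t)
    if m:
        # MySQL counts characters, Doris counts bytes; a UTF-8 Chinese character takes 3 bytes.
        return f"VARCHAR({min(int(m.group(1)) * 3, DORIS_VARCHAR_MAX)})"
    m = re.fullmatch(r"decimal\((\d+),\s*(\d+)\)", t)
    if m:
        return f"DECIMAL({m.group(1)}, {m.group(2)})"
    base = re.sub(r"\(.*?\)", "", t).strip()  # drop display widths such as int(11)
    if base not in SIMPLE_TYPES:
        raise ValueError(f"unsupported MySQL type: {mysql_type}")
    return SIMPLE_TYPES[base]


def build_create_table(table: str, columns: Sequence[Tuple[str, str]],
                       key_columns: List[str], buckets: int = 10) -> str:
    """Build a Doris Unique Key CREATE TABLE; key columns must come first in `columns`."""
    cols = ",\n".join(f"  `{name}` {mysql_to_doris_type(t)}" for name, t in columns)
    keys = ", ".join(f"`{k}`" for k in key_columns)
    return (f"CREATE TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
            f"UNIQUE KEY({keys})\n"
            f"DISTRIBUTED BY HASH({keys}) BUCKETS {buckets};")


print(build_create_table("ods.t_order", [("id", "bigint(20)"), ("shop_name", "varchar(50)")], ["id"]))
```

Batch creation then amounts to calling `build_create_table` once per selected table; the 65533 cap keeps very long MySQL VARCHAR columns within the Doris limit.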

Data integration
Because our scheduler is DolphinScheduler, we packaged the import program as a jar and run it through shell tasks. It currently supports importing from MySQL, Oracle, and Doris databases. Each run is specified by: 1. a data source ID, 2. the query SQL, and 3. the target Doris table.
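
The three run parameters map naturally onto a command-line interface. The sketch below is in Python rather than Java, and the flag names are hypothetical. The slides do not say how the jar writes into Doris; Stream Load (an HTTP PUT of the rows to `/api/{db}/{table}/_stream_load`) is assumed here as one common option, and the code only builds the request instead of sending it. Note that the FE answers a Stream Load with a redirect to a BE node, so the HTTP client must follow redirects for PUT.

```python
import argparse
import base64
import json
import uuid
from typing import Dict, List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """The three run parameters from the slide; the flag names are hypothetical."""
    p = argparse.ArgumentParser(description="Import the result of a source query into a Doris table")
    p.add_argument("--db-id", type=int, required=True, help="ID of a registered source database")
    p.add_argument("--sql", required=True, help="query to run on the source database")
    p.add_argument("--target-table", required=True, help="target Doris table as db.table")
    return p.parse_args(argv)


def build_stream_load_request(fe_host: str, target_table: str, rows: List[dict],
                              user: str, password: str, label: Optional[str] = None,
                              http_port: int = 8030) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a JSON Stream Load; sending it is left to the caller."""
    db, table = target_table.split(".", 1)
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        # Doris refuses a label already used by a successful load, so reuse it on retry.
        "label": label or f"import_{table}_{uuid.uuid4().hex}",
        "format": "json",
        "strip_outer_array": "true",  # the body is a JSON array of row objects
    }
    body = json.dumps(rows, ensure_ascii=False, default=str).encode("utf-8")
    return url, headers, body


args = parse_args(["--db-id", "3", "--sql", "SELECT id, amt FROM sales", "--target-table", "ods.t_sales"])
url, headers, body = build_stream_load_request("fe1", args.target_table, [{"id": 1, "amt": 9.5}], "root", "")
print(url)  # http://fe1:8030/api/ods/t_sales/_stream_load
```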

Data service
Developers configure a data source and the SQL behind each API, then register the service with the API gateway, which handles authentication and authorization, black/white lists, and precise rate limiting. Data applications (BI dashboards, mobile apps, and business systems) call the gateway, which invokes the data service; the service queries Doris and returns the results.
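
The gateway's checks can be sketched as follows. The slide does not say which rate-limiting algorithm is used; a per-API-key token bucket is assumed here, and the check order (black/white list, then authentication, then rate limit) and the returned HTTP status codes are also assumptions. The injectable `clock` exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Dict, Iterable


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float]):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ApiGateway:
    """Hypothetical gateway check: IP black/white lists, API-key auth, per-key rate limit."""

    def __init__(self, api_keys: Iterable[str], whitelist: Iterable[str] = (),
                 blacklist: Iterable[str] = (), rate: float = 10.0, capacity: float = 20.0,
                 clock: Callable[[], float] = time.monotonic):
        self.api_keys = set(api_keys)
        self.whitelist, self.blacklist = set(whitelist), set(blacklist)
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def check(self, client_ip: str, api_key: str) -> int:
        """Return an HTTP status code; 200 means forward the call to the data service."""
        if client_ip in self.blacklist or (self.whitelist and client_ip not in self.whitelist):
            return 403
        if api_key not in self.api_keys:
            return 401
        if api_key not in self.buckets:
            self.buckets[api_key] = TokenBucket(self.rate, self.capacity, self.clock)
        return 200 if self.buckets[api_key].allow() else 429


now = [0.0]
gw = ApiGateway(api_keys={"bi-dashboard"}, blacklist={"10.0.0.9"}, rate=1.0, capacity=2.0,
                clock=lambda: now[0])
print([gw.check("10.0.0.1", "bi-dashboard") for _ in range(3)])  # [200, 200, 429]
```

A token bucket allows short bursts up to `capacity` while holding the long-run rate to `rate` requests per second per key, which suits dashboards that fire several queries at once on page load.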

4. Summary and Outlook
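
Lessons learned
Analytical computing:
- Joins between large tables should use Colocation Join wherever possible: network overhead between large tables is high, and a shuffle is very expensive.
- Join order: put the large table on the left and the small table on the right. Building the hash table on the small table is cheap and makes better use of Runtime Filters.
Table model:
- Indexes: use a Bloom filter index for columns with many distinct (enumerated) values, and a bitmap index for columns with few.
- Bucketing: to avoid data skew, choose columns with widely dispersed values as bucketing columns. We generally use WHERE-condition columns and join columns as bucketing columns.
Data integration:
- Real-time loading: accumulate data into batches before loading. Every load or update generates a segment, so batched writes reduce the number of compactions.
- Offline bulk loading (initialization): load in batches, for example by querying the minimum and maximum values of a column, splitting that range into intervals, and generating multiple SQL statements that run one after another.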
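
The two loading recommendations can be sketched in Python: `split_range_queries` turns a min/max pair into contiguous range queries for the initial load, and `MicroBatcher` accumulates real-time rows until a size or age threshold is reached. Both names and the thresholds are hypothetical; the sketch assumes an integer split column and a base query with no WHERE clause, and `flush_fn` stands in for whatever loader is actually used.

```python
import time
from typing import Callable, List, Optional


def split_range_queries(base_sql: str, column: str, lo: int, hi: int, parts: int) -> List[str]:
    """Split [lo, hi] of an integer column into at most `parts` contiguous ranges, one query each."""
    step = max(1, -(-(hi - lo + 1) // parts))  # ceiling division
    queries = []
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        queries.append(f"{base_sql} WHERE {column} BETWEEN {start} AND {end}")
        start = end + 1
    return queries


class MicroBatcher:
    """Buffer rows; flush when `max_rows` is reached or the oldest buffered row is `max_wait_s` old."""

    def __init__(self, flush_fn: Callable[[List[dict]], None], max_rows: int = 10000,
                 max_wait_s: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self.flush_fn, self.max_rows, self.max_wait_s, self.clock = flush_fn, max_rows, max_wait_s, clock
        self.buffer: List[dict] = []
        self.first_ts: Optional[float] = None

    def add(self, row: dict) -> None:
        if not self.buffer:
            self.first_ts = self.clock()
        self.buffer.append(row)
        # Checked only on add; a real job also needs a timer so idle periods still flush.
        if len(self.buffer) >= self.max_rows or self.clock() - self.first_ts >= self.max_wait_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.flush_fn(batch)  # one load per batch instead of one per row


for q in split_range_queries("SELECT * FROM src.orders", "id", 1, 10, 3):
    print(q)
# SELECT * FROM src.orders WHERE id BETWEEN 1 AND 4
# SELECT * FROM src.orders WHERE id BETWEEN 5 AND 8
# SELECT * FROM src.orders WHERE id BETWEEN 9 AND 10
```

Ranges on a dense integer key give evenly sized chunks; on a sparse or skewed key the chunks can differ greatly in row count, so the split column matters.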

Future plans
1. Data lineage management based on Doris: parse the Doris SQL in the scheduling platform and store the dependencies in a graph database. Visualizing table-level and column-level lineage will make problems easier to trace and improve development efficiency.
2. A data map based on Doris: manage Doris metadata, organize tables by subject area and data domain, and support search by subject, table, and column.
3. Deepen our collaboration with Doris and closely follow Doris updates and improvements.
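
A toy version of the planned lineage parsing is sketched below. It uses regular expressions and an in-memory dictionary as stand-ins for a real SQL parser and the graph database, covers table-level lineage only (no column lineage), and misreads constructs such as `EXTRACT(YEAR FROM col)`, CTEs, and subquery aliases. The table names in the example are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Naive patterns: good enough for simple INSERT ... SELECT jobs only.
TARGET_RE = re.compile(r"insert\s+(?:into|overwrite\s+table)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql: str) -> Dict[str, Set[str]]:
    """Map each target table written by `sql` to the set of tables it reads."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for statement in sql.split(";"):
        target = TARGET_RE.search(statement)
        if not target:
            continue
        for source in SOURCE_RE.findall(statement):
            if source.lower() != target.group(1).lower():
                edges[target.group(1)].add(source)
    return dict(edges)


def upstream(edges: Dict[str, Set[str]], table: str) -> Set[str]:
    """All tables that `table` depends on, directly or transitively (iterative DFS)."""
    seen: Set[str] = set()
    stack = [table]
    while stack:
        for src in edges.get(stack.pop(), ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


jobs = """
INSERT INTO dwd.orders SELECT * FROM ods.orders o JOIN ods.stores s ON o.store_id = s.id;
INSERT INTO dws.daily_sales SELECT dt, SUM(amt) FROM dwd.orders GROUP BY dt;
"""
lineage = extract_lineage(jobs)
print(sorted(upstream(lineage, "dws.daily_sales")))  # ['dwd.orders', 'ods.orders', 'ods.stores']
```

In a graph database each edge would become a relationship between two table nodes, and the `upstream` walk corresponds to a variable-length path query when tracing a problem back to its sources.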

Get more community news and best practices
Doris Summit website: doris-
Doris Summit replays: https:/
Doris website: doris.apache.org
Apache Doris GitHub:
Doris official platform: