Building a Log Analytics Solution with 10x Cost-Effectiveness on Apache Doris

Xiao Kang (肖康)
Vice President of Technology, 飛輪科技
Apache Doris Committer

Doris Summit Asia 2023

Contents

1. Requirements of log storage and analysis scenarios
2. Pain points of ES-based log platforms
3. A next-generation log analytics platform based on Doris
4. Practical cases

1. Requirements of Log Storage and Analysis Scenarios

Typical Application Scenarios for Log Storage and Analysis

- Network security: reduce security risk and improve system security
- Business analysis: support business analysis and accelerate business growth
- Observability: keep services stable and improve user experience

The 3 Vs of Log Storage and Analysis

- Volume (large data volume): large storage scale and long retention periods; sensitive to storage cost
- Velocity (real-time write and retrieval): real-time ingestion of massive data with low-latency visibility; real-time interactive analysis
- Variety (Schema Free): diverse data types, Text and JSON; Schema Evolution

2. Pain Points of ES-Based Log Systems

Typical Architecture of an ES-Based Log System

Components in the diagram: Log, Collector (e.g. filebeat), Kafka, Logstash, ES, Kibana UI, API Request (ES DSL)

- Total data volume: PB level
- Daily increment: TB level
- Query concurrency: tens of QPS

Challenge 1: Limited Schema Free Support

- Field types are fixed
  - Writes with conflicting field types are rejected
  - A field's type cannot be changed, so the data must be rewritten with reindex
- Indexes are fixed
  - Indexes on existing fields cannot be added or dropped, so every field is indexed up front
  - Tokenization and other parameters of an existing field's index cannot be adjusted

Challenge 2: Weak Analysis Capabilities

- Query DSL has a steep learning curve
  - The DSL (Domain Specific Language) is designed for search scenarios
  - It does not match common usage habits, and writing queries often requires consulting the manual
- DSL functionality is limited, with no Join support
  - Only simple single-table analysis is supported
  - Complex analysis such as multi-table Join, subqueries, and views is not supported
- The DSL ecosystem is closed
  - The ES ecosystem is self-contained, and connecting it to BI systems or data ecosystem tools is difficult
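
Challenge 3: Low Cost-Effectiveness

- Low write performance
  - Writing data requires building indexes, which consumes a lot of CPU and makes ingestion inefficient
  - Business peaks easily trigger rejects, and write latency rises
- High storage cost
  - Several copies of the data are stored (forward index, inverted index, columnar storage), which is highly redundant
  - The overall compression ratio is about 1:1.5, far below the common 1:5
- Large queries are unstable
  - Write peaks easily destabilize the cluster
  - Large queries easily trigger JVM OOM, which affects writes and queries across the whole cluster

Data scale labels in the diagram: TB, hundreds of TB, PB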

3. A Next-Generation Log Analytics Platform Based on Doris

Architecture of the Next-Generation Log System Based on Apache Doris

- More log ingestion options: LogCollector (e.g. filebeat, flume, fluent), Kafka, LogStash, Flink
- Unified storage that eliminates data silos
- Open ecosystem with stronger analysis capabilities: Discover, MySQL, API Request, BI Tools, Grafana

Advantage 1: Native Support for Semi-Structured Data

- Rich data types
  - Text, JSON, Array, Map
  - Variant, which allows multiple types in one field
- Schema Evolution
  - Add or drop fields online within milliseconds
  - Add or drop indexes online on demand, with incremental index builds
  - Change types online on demand

Advantage 2: SQL-Based Analysis Engine

- Simple and easy to use
  - Standard SQL support, with no extra learning cost
  - SQL syntax highly compatible with MySQL
- Rich data ecosystem
  - MySQL protocol compatible, so the MySQL CLI can be used directly
  - Seamless integration with BI tools and big data ecosystem components
- Powerful analysis capabilities
  - Supports retrieval, aggregation, multi-table JOIN, subqueries, window functions, UDFs, views and materialized views
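
As a rough illustration of the kind of multi-table JOIN plus aggregation that the SQL-based engine is said to support (and that the ES DSL slide lists as unsupported), here is a minimal sketch. It uses Python's built-in SQLite purely as a stand-in so the query is runnable; it is not Doris, and the table names, columns, and rows are hypothetical.

```python
import sqlite3

# SQLite (in-memory) is only a runnable stand-in for a SQL engine; this is NOT Doris.
# Table names, column names, and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE access_log (ts TEXT, service_id INTEGER, status INTEGER, latency_ms INTEGER);
INSERT INTO service VALUES (1, 'checkout'), (2, 'search');
INSERT INTO access_log VALUES
  ('2023-10-01 10:00:00', 1, 200, 35),
  ('2023-10-01 10:00:01', 1, 500, 120),
  ('2023-10-01 10:00:02', 2, 200, 18),
  ('2023-10-01 10:00:03', 2, 200, 22),
  ('2023-10-01 10:00:04', 1, 500, 140);
""")

# Join the log table with a dimension table, then aggregate per service:
# count of 5xx responses and average latency.
query = """
SELECT s.name,
       SUM(CASE WHEN l.status >= 500 THEN 1 ELSE 0 END) AS errors,
       AVG(l.latency_ms) AS avg_latency_ms
FROM access_log AS l
JOIN service AS s ON s.id = l.service_id
GROUP BY s.name
ORDER BY errors DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the slide states that Doris speaks the MySQL protocol and a MySQL-compatible SQL dialect, a query of this shape would normally be sent through any MySQL client rather than SQLite.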

Advantage 3: Very High Cost-Effectiveness

[Bar charts: Apache Doris vs. Elasticsearch on write speed (MB/s), storage space (GB), and query time (s), one set of charts per test set]

- ES official benchmark, httplogs test set: 32 GB, 247 million rows, 11 queries
- Microsoft Azure logsbench test set: 1 TB, 4 billion rows, 10 queries

Compared with ES: 3x to 5x higher write throughput, 80% less storage space, 2x to 3x faster queries

Advantage 3: Very High Cost-Effectiveness

Item                                          Elasticsearch    Apache Doris     Apache Doris (hot/cold tiering)
Daily data increment (TB)                     100              100              100
Hot data retention (days)                     3                7                7
Cold data retention (days)                    27               23               23
Data compression ratio                        1.5              7.5              7.5
Hot data storage space (TB)                   200              40               40
Cold data storage space (TB)                  1800             360              360
Server configuration                          16C 64G 26.3TB   16C 64G 26.3TB   16C 64G 6.1TB
Number of servers                             95               19               19
Compute cost (10,000 CNY/month)               23.1             4.6              4.6
Cloud disk storage cost (10,000 CNY/month)    71.7             14.3             1.4
Object storage cost (10,000 CNY/month)        0                0                3.8
Total cloud cost (10,000 CNY/month)           94.8             18.9             9.8
Overall cost-effectiveness                    1                5x               9.7x

Key Technology: Inverted Index
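
The slide itself carries only a diagram, so as background here is a minimal, generic sketch of what an inverted index does: it maps each token to the rows that contain it, so a keyword search becomes a set intersection instead of a scan. This is a toy illustration, not Doris's implementation; the tokenizer, function names, and sample log lines are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer; real analyzers are more elaborate.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each token to the sorted list of row ids whose text contains it.
    postings = defaultdict(set)
    for row_id, text in enumerate(docs):
        for token in tokenize(text):
            postings[token].add(row_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def search_all(index, query):
    # Return the row ids that contain every token of the query (AND semantics).
    token_sets = [set(index.get(token, [])) for token in tokenize(query)]
    if not token_sets:
        return []
    return sorted(set.intersection(*token_sets))


# Hypothetical sample log lines.
logs = [
    "GET /index.html 200",
    "POST /login 500 timeout",
    "GET /login 200",
    "POST /checkout 500 timeout",
]
index = build_inverted_index(logs)
print(search_all(index, "500 timeout"))  # rows 1 and 3
print(search_all(index, "GET /login"))   # row 2 only
```

The lookup cost depends on the posting lists of the query tokens rather than on the total number of rows, which is the general reason inverted indexes suit keyword retrieval over large log volumes.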

Key Technology: Log Retrieval Query Optimization

Result: retrieval over logs at the ten-billion-row scale responds within seconds

Key Technology: Ingestion Performance Optimization

Key Technology: Storage Cost Optimization

[Diagram: a layout of forward index, inverted index, and columnar storage is reduced to inverted index plus columnar storage]

- Remove the forward index
- Simplify the inverted index
- Columnar storage + ZSTD compression
- Hot/cold data tiering
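
To see how much these storage optimizations weigh in the overall bill, the monthly cost figures from the cost comparison slide earlier in the deck can simply be re-added. The sketch below only restates those slide figures (unit: 10,000 CNY per month); the dictionary keys and variable names are hypothetical.

```python
# Monthly cost components as stated on the cost comparison slide
# (unit: 10,000 CNY per month). Key names are hypothetical labels.
costs = {
    "Elasticsearch": {"compute": 23.1, "cloud_disk": 71.7, "object_storage": 0.0},
    "Apache Doris": {"compute": 4.6, "cloud_disk": 14.3, "object_storage": 0.0},
    "Apache Doris (hot/cold tiering)": {"compute": 4.6, "cloud_disk": 1.4, "object_storage": 3.8},
}

# Total monthly cost per option, rounded to one decimal like the slide.
totals = {name: round(sum(parts.values()), 1) for name, parts in costs.items()}

# Cost-effectiveness relative to Elasticsearch: baseline total / option total.
baseline = totals["Elasticsearch"]
ratios = {name: round(baseline / total, 1) for name, total in totals.items()}

for name in costs:
    print(f"{name}: total {totals[name]}, ratio {ratios[name]}x")
```

With these inputs the totals come out to 94.8, 18.9, and 9.8, and the ratios to roughly 5x without tiering and 9.7x with it. Most of the gap between the last two comes from moving cold data off cloud disks (14.3 down to 1.4) and onto object storage (3.8).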

4. Practical Cases

Case 1: Network Security

"Doris uses only 1/5 of the original servers, carries 1 GB/s of write traffic, and responds faster to security analysis queries."

Components in the diagram: message system, data ingestion, unified log storage and analysis platform, security data analysis

- Cluster size: 10 physical machines
- Daily increment: 15 billion new log entries per day, 8.3 TB; 1.4 TB after Doris compression (including the inverted index; compression ratio 5.9)
- Total data: 3 replicas retained for 60 days, 252 TB and 900 billion entries in total
- Write performance: production average 200,000 rows/s at 100 MB/s; peak 1,000,000 rows/s at 500 MB/s; stress test on 3 machines 2,000,000 rows/s at 1 GB/s
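
The Case 1 figures are related by simple arithmetic, which the following sketch reproduces. All inputs are taken from the slide; the variable names are hypothetical.

```python
# Figures as stated on the Case 1 slide.
raw_tb_per_day = 8.3           # raw log volume per day, TB
compressed_tb_per_day = 1.4    # after Doris compression, incl. inverted index, TB
rows_per_day = 15_000_000_000  # 15 billion log entries per day
replicas = 3
retention_days = 60

# Compression ratio = raw size / compressed size.
compression_ratio = raw_tb_per_day / compressed_tb_per_day

# Total footprint = compressed daily size x replicas x retention days.
total_tb = compressed_tb_per_day * replicas * retention_days

# Total retained entries = daily entries x retention days.
total_rows = rows_per_day * retention_days

print(round(compression_ratio, 1), round(total_tb), total_rows)
```

The results match the slide: a compression ratio of about 5.9, 252 TB in total, and 900 billion retained entries.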
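
Case 2: Communications Manufacturing

"Doris full-text search meets the needs of log retrieval and analysis; log storage space dropped to 1/6 of that of ES, and system cost fell significantly."

Components in the diagram: log collection and processing, Logstash, unified log storage and analysis platform, log retrieval and download

- Cluster size: 20 physical machines
- Daily increment: 100 billion new log entries per day, 50 TB; 7 TB after Doris compression (including the inverted index; compression ratio 7.1)
- Total data: 2 replicas retained for 90 days, 1.26 PB and 9 trillion entries in total
- Write performance: production average 1,500,000 rows/s at 500 MB/s; peak 6,000,000 rows/s at 2 GB/s

Case 3: Observability

"Doris provides the flexible semi-structured data type variant; cost is 70% lower than ES on the cloud, and query performance is 2x to 3x better."

Components in the diagram: observability data collector, Log/Trace unified storage platform, observability visual analysis

- Cluster size: 10 virtual machines
- Daily increment: 40 billion new rows per day, 40 TB; 7 TB after Doris compression (including the inverted index; compression ratio 5.7)
- Total data: 1 replica retained for 30 days, 150 TB and 1.2 trillion rows in total
- Write performance: production average 400,000 rows/s at 400 MB/s; peak 1,000,000 rows/s at 1 GB/s; data is written in real time with second-level latency
- Query concurrency: on the order of 100 QPS in production, p99 latency 230 ms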

Visual Log Retrieval

Thank You