Building a Real-Time Cloud-Native Data Warehouse on Object Storage, by Zhang Yanfei (張雁飛). Slide deck, 48 pages.
Databend
A modern warehouse with Rust for your massive-scale analytics
https:/

[...] and Data Warehouse
- OLTP: high concurrency, low latency, row-oriented (MySQL etc.)
- HTAP: resource isolation? duplicate data? row + column-oriented (TiDB etc.)
- OLAP: complex query, query speed, column-oriented (ClickHouse etc.)

Outline
- What "new" problems has big data analytics run into?
- Why can't traditional data warehouses solve these "new" problems?
- How should a next-generation real-time, elastic data warehouse be designed?
- What is it like to build a data warehouse from 0 to 1 in Rust?

Bohu TANG (張雁飛)
- Co-Creator of Databend: https:/
- [...] and MySQL (TokuDB) heavy contributor
- Database Kernel | Distributed Database | Data Warehouse
- https://bohutang.me/

01 New Big Data Problems Today (2023)

Global data is growing exponentially
- 1024 PB = 1 EB, 1024 EB = 1 ZB

Challenges
- Resource utilization at large data volumes (IO-bound, CPU-bound)
- How can the system become smarter and create indexes automatically from query patterns, so that it gets faster the more it runs?
- How should it be designed for combined Warehouse + Datalake needs?

Cloud-Native Execution Design
- SQL is compiled into an execution plan
- An execution plan contains multiple operators
- Operators have to be executed: Pull-Based or Push-Based
- Why is this a poor fit for object storage?

Cloud-Native Execution Design
- The execution plan is orchestrated into Pipelines
- Dual mode: Pull-Based + Push-Based
- Workload-aware, scaling dynamically at runtime

Cloud-Native Caching Design
- Metadata cache
- Index cache
- Data cache
- Cache media: Memory / Local Disk / Redis etc.

Monolithic Warehouse Modules
- SQL Query
- Parser, Optimizer, Executor, Storage, Catalog

Warehouse Modules as Microservices
- SQL Query
- Meta Services (User/Schema): Metadata, Security, Transaction
- Compute Services (Query Engine + Table Format): Executor, Executor, Executor
- Storage Services (AWS S3): Data, Data, Data

Databend Architecture
[four architecture diagram slides]

Automatic Tuning
- Cluster Key (age)
- Marked values: Age=20, Age=35, Age=80
- File ranges on age: min 5 / max 100, min 15 / max 40, min 20 / max 60 (file1, file2, file3)

Automatic Tuning
- Cluster Key (age)
- Age=20 Depth: 2, Age=35 Depth: 3, Age=80 Depth: 1
- File ranges on age: min 5 / max 100, min 15 / max 40, min 20 / max 60 (file1, file2, file3)
- Queries shown one per slide:
  SELECT * FROM table WHERE age = 20
  SELECT * FROM table WHERE age = 80
  SELECT * FROM table WHERE age = 35

Automatic Tuning
- Cluster Key (age)
- Age=20 Depth: 2, Age=35 Depth: 2, Age=80 Depth: 1
- Two files remain, with ranges on age: min 5 / max 100, min 15 / max 60 (file1, file2)
- SELECT * FROM table WHERE age = 35

Databend + Hive
CREATE CATALOG my_hive TYPE=HIVE CONNECTION=(URL=THRIFT_PROTOCOL=BINARY);
SELECT * FROM my_hive.db1.table;
Multiple Catalog RFC: https://databend.rs/doc/contributing/rfcs/multiple-catalog

Databend + Iceberg
CREATE CATALOG my_iceberg TYPE=ICEBERG CONNECTION=(URL=s3://my_bucket/path/to/iceberg);
SELECT * FROM my_iceberg.db1.table;
Multiple Catalog RFC: https://databend.rs/doc/contributing/rfcs/multiple-catalog

Cost
- Hive, replacing Trino/Presto: cost reduced to 25% (Kuaishou)
- Archiving: cost reduced to 5% (Dmall, Weimob)
- 400 TB/day (as of January 2023) is written to public-cloud object storage through Databend
- Open source and open; simple to operate and easy to get started with

04 Databend Open-Source Community

Open-Source Community
- 140+ Contributors
- 5.3K Stars
- Very fast iteration
- https:/

Databend Open-Source Community
- Community developers come from: SAP, Yahoo, Fortinet, Shopee, PingCAP, Alibaba, Tencent, ByteDance, EMQ, Kuaishou (co-building the lakehouse)
- The Databend community is driven by top-tier requirements and top-tier scenarios

Try Databend: On-Premises, Serverless
- On-Premises, community edition: https://databend.rs
- Serverless Cloud, overseas (AWS): https:/
- Serverless Cloud, China (Alibaba Cloud): https:/

Databend Users
More

Thanks
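
The Automatic Tuning slides can be read as min/max pruning on the cluster key: a query on age only has to open the files whose [min, max] range covers the value, and the "Depth" next to a value is the number of such overlapping files. The Python sketch below is an illustration under stated assumptions and is not Databend's implementation: the names FileStats, prune, depth and merge are hypothetical, the file labels a, b and c are placeholders (the slide text does not say which range belongs to which file), ranges are treated as inclusive, and "tuning" is modeled simply as merging the two narrower files into one range, which is what the last slide's 15 to 60 range suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max statistics on the cluster key (hypothetical type)."""
    name: str
    min_age: int
    max_age: int

    def may_contain(self, age: int) -> bool:
        # Assumption: both bounds are inclusive.
        return self.min_age <= age <= self.max_age


def prune(files, age):
    """Return only the files whose range could hold rows with this age."""
    return [f for f in files if f.may_contain(age)]


def depth(files, age):
    """Clustering depth at a value: how many file ranges overlap it."""
    return len(prune(files, age))


def merge(a, b, name):
    """Model a recluster step as combining two files into one wider range."""
    return FileStats(name, min(a.min_age, b.min_age), max(a.max_age, b.max_age))


# Ranges taken from the slides; the labels a/b/c are placeholders.
before = [
    FileStats("a", 5, 100),
    FileStats("b", 15, 40),
    FileStats("c", 20, 60),
]

# After tuning, the slides show two ranges: 5..100 and 15..60.
after = [before[0], merge(before[1], before[2], "b+c")]

if __name__ == "__main__":
    for age in (35, 80):
        print(age, depth(before, age), depth(after, age))
```

With these ranges, a lookup for age = 35 touches three files before tuning and two afterwards, while age = 80 touches one file in both cases, matching the depths shown on the slides for those two values. A lower depth means fewer object-storage reads per point query.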