2024 Databricks Inc. All rights reserved.

Building an open lakehouse with Delta Lake UniForm
Jonathan Brito
June 13, 2024

Closed architecture
Separate stacks for data science and data warehousing
[Diagram: Sources (MySQL, System Logs, Salesforce); Storage (S3 bucket); Transform; Serving: BI & Reporting (Tableau), Analytics, Machine Learning (Notebooks, Feature Engineering, Model Serving)]
- ETL runs in the data warehouse in a proprietary format
- Data is copied back to S3 for ML use cases
- Analytics workloads are inaccessible to Trino users

My goals
Build an open data lakehouse:
- Optimize price/performance: efficiently scale costs as data grows
- Avoid data duplication: interoperate on a single copy of data
- Use any compute engine: choose the best tool for the workload
Migration will require significant effort, cost, and risk, so we need high confidence that our new architecture will meet these goals!

My engines
Challenge: pick a format that supports all my engines. Storage format options: Apache Hudi, Delta Lake, Apache Iceberg.

My engines: but do I actually need to choose?
- Data: all formats use Parquet!
- Metadata: used as the transaction source of truth, for concurrency control, etc.
Delta Universal Format: one copy of Parquet data plus metadata.

The open data lakehouse
Catalog: open interfaces that systems can connect to (Unity Catalog, REST Catalog, open APIs).

Delta Lake supports all ecosystems
Support for any architecture you choose today or in the future.

How UniForm works
Delta Lake with UniForm: data stored in Delta can be read as if it were Iceberg or Hudi.
- Metadata is automatically generated to make Delta accessible as Iceberg/Hudi
- Parquet files remain the same
- Metadata is co-located with data
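The slide says the Iceberg metadata is generated automatically. On Databricks this generation runs asynchronously after Delta writes, and a manual, synchronous trigger is also documented. The sketch below assumes a table that already has UniForm enabled; the name main.default.uniFormTable is borrowed from the enablement example in this deck, and the commands should be checked against the current Databricks documentation before use.

```sql
-- Databricks SQL. Assumes main.default.uniFormTable already has UniForm (Iceberg) enabled.

-- Iceberg metadata is normally generated automatically (asynchronously) after Delta writes.
-- This command triggers generation synchronously for the latest Delta version.
MSCK REPAIR TABLE main.default.uniFormTable SYNC METADATA;

-- List the table properties to confirm the UniForm-related settings are present.
SHOW TBLPROPERTIES main.default.uniFormTable;
```

The Parquet data files are not rewritten by either command; only metadata is produced or read, which matches the "Parquet files remain the same" point above.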

Open data architecture
Unified serving layer for analytics, BI, AI, and ML
[Diagram: Sources (MySQL, System Logs, Salesforce); Storage (S3 bucket with bronze, silver, and gold layers); ETL and processing engine; Serving: BI & Reporting (Tableau), Analytics, Machine Learning (Notebooks, Feature Engineering, Model Serving)]
- ETL runs cost-effectively in an open format
- A single copy of data is read as Delta or Iceberg
- Option to swap compute for a workload

Write UniForm in Databricks
Perform high-performance, cost-effective ETL on the data lake: BRONZE (raw data), SILVER (cleaned, joined, enriched), GOLD (aggregated). Enable UniForm on gold-layer tables read by downstream Iceberg clients.

1. Enable Delta Universal Format. Create a table using the new table feature:

    CREATE TABLE main.default.uniFormTable (c1 INT)
      TBLPROPERTIES ('delta.enableIcebergCompatV2' = 'true', 'delta.universalFormat.enabledFormats' = 'iceberg');

2. Write to the Delta table. Iceberg metadata is automatically generated:

    INSERT INTO main.default.uniFormTable VALUES (111);
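The enablement steps create a new table. Gold-layer tables usually exist already, so the same properties can instead be set with ALTER TABLE. This is a minimal Databricks SQL sketch, assuming a hypothetical gold table main.gold.daily_sales and the property names used in the Delta Lake UniForm documentation. Setting column mapping explicitly is an assumption made here to satisfy IcebergCompatV2's prerequisite; it upgrades the table protocol, so confirm that older readers are not affected.

```sql
-- Databricks SQL. main.gold.daily_sales is a hypothetical gold-layer table.

-- Enable UniForm (Iceberg) on an existing Delta table.
-- Column mapping in 'name' mode is a prerequisite for IcebergCompatV2.
ALTER TABLE main.gold.daily_sales SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);

-- Alternative: if the table uses features IcebergCompatV2 does not support
-- (for example, deletion vectors), Databricks documents an upgrade that
-- rewrites the data files and enables UniForm in one step.
REORG TABLE main.gold.daily_sales APPLY (UPGRADE UNIFORM (ICEBERG_COMPAT_VERSION = 2));
```

Only one of the two statements is needed for a given table. The REORG path costs a full rewrite of the data files, so the ALTER TABLE path is the cheaper choice when the table has no conflicting features.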

Cost-effective ingestion and ETL
[Chart: Ingestion performance, lower is better: Iceberg (Snowflake) vs. UniForm (Databricks); callouts "6x" and "90% less expensive"]
[Chart: ETL performance, lower is better: UniForm on Delta (Databricks). Iceberg (Snowflake): no support for MERGE or partitioning, making ETL impractical]
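The Snowflake read step assumes that Snowflake can already see the table's Iceberg metadata through a catalog integration. The sketch below shows that setup in Snowflake SQL. Every integration, volume, and table name is a hypothetical placeholder, and so is every metadata path. It also assumes an external volume that points at the table's S3 storage already exists, and that METADATA_FILE_PATH is relative to that volume.

```sql
-- Snowflake SQL. All object names and paths below are hypothetical placeholders.

-- Catalog integration for Iceberg tables whose metadata lives in object storage.
CREATE CATALOG INTEGRATION uniform_catalog_int
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE;

-- Register the UniForm table's Iceberg metadata as a Snowflake Iceberg table.
-- Assumes the external volume uniform_vol already points at the table's S3 location;
-- METADATA_FILE_PATH is relative to that volume.
CREATE ICEBERG TABLE my_uniform_table
  EXTERNAL_VOLUME = 'uniform_vol'
  CATALOG = 'uniform_catalog_int'
  METADATA_FILE_PATH = 'path/to/table/metadata/v10.metadata.json';

-- After new Delta commits produce a newer Iceberg metadata file,
-- point Snowflake at it (the file name here is illustrative).
ALTER ICEBERG TABLE my_uniform_table REFRESH 'path/to/table/metadata/v11.metadata.json';
```

Snowflake only reads the table in this setup, which fits the write-in-Databricks, read-in-Snowflake split shown in the deck.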

Read UniForm in Snowflake
Use a catalog integration to read an Iceberg table from object storage.
5. Read the table in Snowflake. Iceberg metadata file: s3:/tmp/v10.metadata.json. Read the table as Iceberg:

    SELECT * FROM my_uniform_table;

Minimize disruption to end users
Comparable query performance in Snowflake minimizes disruption to downstream BI and analytics workflows.
[Chart: Read performance in Snowflake, lower is better: Iceberg (Snowflake) vs. UniForm (Databricks); callout "10%"]

Connect to any Delta or Iceberg client
[Diagram: Sources 1 to 3 feed ETL in Databricks, which feeds serving; the Iceberg metadata location is exposed through the Iceberg REST Catalog to Athena, Redshift, BigQuery, Snowflake, Spark, Trino, Flink, and DBSQL]

Since launch last year
- During Public Preview: used by 250+ customers, including JPMC, AT&T, Disney, Instacart, and Goldman Sachs
- Fully open-sourced in Delta Lake 3.0
- New features in UniForm: proven compatibility with popular Iceberg readers, including Snowflake, Athena, and Redshift; compatibility with Liquid Clustering; support for Hudi

"At M Science, UniForm provides us with the flexibility to write a single copy of our data that can be queried by any engine that supports Delta or Iceberg; this is key to reducing costs and accelerating time-to-value."
Ben Tallman, Chief Technology Officer at M Science

UniForm is now GA! To get started, see the public documentation.

Our Vision
[Diagram: Delta Lake with UniForm and Unity Catalog; labels: Supporting Other Platforms, Connect to Other Platforms, Interoperate across formats]
- Interoperate with other systems regardless of table format
- Provide open interfaces for other systems to connect to Unity Catalog
- Enable Databricks compute to federate to other catalogs

Product safe harbor statement
This information is provided to outline Databricks' general product direction and is for informational purposes only. Customers who purchase Databricks services should make their purchase decisions relying solely upon services, features, and functions that are currently available. Unreleased features or functionality described in forward-looking statements are subject to change at Databricks' discretion and may not be delivered as planned or at all.