Blueprint: How Unilever Uses Metadata to Power Its Lakehouse

BLUEPRINT
Roberto Alejandro Flores Meregote
2024-06-08
2024 Databricks Inc. All rights reserved

NICE TO MEET YOU

THE DE TEAM!

Roberto Flores Meregote
Europe Head of Data Engineering, Unilever

AGENDA!

CONTEXT

UNILEVER'S DATA ESTATE
[Architecture diagram; labels recovered in slide order]
Column headings: Sources; Ingestion; Foundational Data Lake Systems; Semantic Layer; Global Products.
Sources: SAP ECC, SAP BW, Teradata, 100+ sources; POS, Product, Store; Internal, Retailer/Purchased Data; Market Local Data Sources (POS, Product); 3rd Party Data Acquisition Platforms, e.g. TradeEdge.
Platform components: Azure Databricks, Azure Data Factory, Universal Data Lake, Business Data Lakes (Raw, Staging, Curated), Databricks SQL Serverless, Power BI, Internal, Machine Learning & Self Serve (Databricks Machine Learning, Azure Synapse Analytics, Azure ML), Data Distribution Layer, Unity Catalog, Azure Purview.
Market Data Lake: Azure Databricks (Raw, Staging, Curated), Azure Synapse Analytics, Market(s) Products, Unity Catalog, Azure Purview, Azure Active Directory, Power BI, Azure Analysis Services.

EUROPE IS A COMPLEX MARKET
PRIOR TO MDL, COUNTRIES HAD DIFFERENT DATA MATURITY LEVELS

EUROPE IS A COMPLEX MARKET
SIMILAR IN SIZE TO NORTH AMERICA, BUT WITH HIGH DATA COMPLEXITY

DATA COMPLEXITY (North America vs. Europe)
UL Markets: 2 vs. 38
UL BG Cells: 10 vs. 129
Official Languages: 1 vs. 24
Databases (external): 20 vs. 316
Source: Europe MDL | Country/Category Cells with a Nielsen subscription

EUROPE IS A COMPLEX MARKET
PREVIOUS SETUP MEANT THAT WE STARTED WITH MANY ENVIRONMENTS
Environments: MDL_mkt_dev, MDL_mkt_qa, MDL_mkt; EU1, EU2, EU3, EU4, EU5, EU6, EU7

FRAMEWORK

WHAT WE WANTED TO MANAGE IN A TECH-AGNOSTIC WAY
FRAMEWORK OVERVIEW

Common Artifacts
1. Onboarding
2. Unity Catalog Management
3. Power Platform Integrations
4. Data Factory Pipelines

DATABASE
FRAMEWORK OVERVIEW

Stored Procedure Sample

    CREATE OR ALTER PROCEDURE utils.sp_staging_info
        @flow_group_name NVARCHAR(255),
        @staging_type NVARCHAR(255),
        @meta_schema NVARCHAR(255),
        @flow_filter NVARCHAR(255) = NULL
    AS
    SELECT DISTINCT
        fw.flow_name,
        fw.blueprint_version,
        az.kv_url,
        az.landing_adls_url,
        az.delta_lake_path,
        az.subscription_id,
        az.tenant_id,
        az.resourcegroup_name,
        az.secret_scope,
        az.azure_devops_organization,
        ct.user_id,
        ct.user_name,
        (...)

WHAT DOES IT LOOK LIKE
PARENT PIPELINE

WHAT DOES IT LOOK LIKE
STAGING PIPELINES

WHAT DOES IT LOOK LIKE
STAGING PIPELINE SAMPLE - SFTP

WHAT DOES IT LOOK LIKE
ONBOARDING

WHAT DOES IT LOOK LIKE
LAYER EXECUTION, DEPENDENCIES AND RUN ORDER

REPO STRUCTURE

FLOW FILES
METADATA REPOSITORY STRUCTURE

Sample: Flow.yaml

    Flows:
      - flow_name: DACH_Retailer
        azure_id: Azure Info
        bi_id: EPOS_DE_PBI
        databricks_id: UC_Enabled_Cluster
        downstream_adhoc:
        downstream_flow_name:
        mail_id: DACH ePos Mail
        run_downstream: True
        master_pipeline: PL_Master_MetadataDriven
        blueprint_version: 2.0.12
    Contracts:
      - contract_name: DACH_REWE_SALES_AS
        adls_mount: /dbfs/mnt/landingzone/
        business_keys: null
        connection_id: DACH SFTP
        delimiter: ";"
        file_type: .csv
        header: 0
        layer: bronze
        (...)
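
The contract fields in the Flow.yaml sample (delimiter, file_type, header) are enough to drive a generic file reader. The following is a minimal sketch of that idea, not Blueprint's actual code: it assumes the contract has already been loaded into a Python dict and that `header` is the zero-based index of the header row. `parse_landing_file` and the sample data are hypothetical.

```python
import csv
import io

# Contract fields copied from the Flow.yaml sample; the rest is illustrative.
contract = {
    "contract_name": "DACH_REWE_SALES_AS",
    "delimiter": ";",
    "file_type": ".csv",
    "header": 0,
    "layer": "bronze",
}


def parse_landing_file(contract: dict, text: str) -> list:
    """Parse delimited text into row dicts using the contract's settings.

    Assumption: `header` is the zero-based index of the header row, and any
    rows before it are skipped.
    """
    if contract["file_type"] != ".csv":
        raise ValueError(f"unsupported file_type: {contract['file_type']}")
    rows = list(csv.reader(io.StringIO(text), delimiter=contract["delimiter"]))
    header_index = contract["header"]
    columns = rows[header_index]
    # Pair each data row with the column names taken from the header row.
    return [dict(zip(columns, row)) for row in rows[header_index + 1:]]


# Hypothetical file content, using the contract's ";" delimiter.
sample = "store;product;units\nREWE-001;Soap;12\nREWE-002;Tea;7\n"
records = parse_landing_file(contract, sample)
print(records[0])
```

Keeping the parsing rules in the contract rather than in code means a new retailer file with a different delimiter only needs a new contract entry.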

SCHEMAS
METADATA REPOSITORY STRUCTURE

Sample: Schemas.json

    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "Flows": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "flow_name": {"type": ["integer", "string"]},
              "downstream_flow_name": {"type": ["string", "null"]},
              "downstream_adhoc": {"type": ["string", "null"]},
              (...)

GENERALIZED ONBOARDER
METADATA REPOSITORY STRUCTURE

Sample: Onboarder.py

    # Databricks notebook source
    from blueprint.engineering.kiln import Kiln
    import os

    # COMMAND ----------

    dbutils.widgets.removeAll()
    flow_name = dbutils.widgets.get("flow_name")
    contract_name = dbutils.widgets.get("contract_name")
    meta_schema = dbutils.widgets.get("meta_schema")
    print(f"Flow Name: {flow_name}")
    print(f"Onboarding Contract: {contract_name}")
    print(f"Metadata Schema: {meta_schema}")

    # COMMAND ----------

    print(f"Running with version of Blueprint: {os.environ['PKG_VERSION']}")
    (...)

DATA QUALITY (GREAT EXPECTATIONS)
METADATA REPOSITORY STRUCTURE

Sample: DataQuality.py

    import pyspark.sql.functions as F
    import json
    from pyspark.sql.types import IntegerType, StringType, DoubleType, LongType
    from delta.tables import *
    from great_expectations.core.batch import RuntimeBatchRequest
    from great_expectations.util import get_context
    from great_expectations.data_context.types.base import (
        DataContextConfig,
        FilesystemStoreBackendDefaults,
    )
    import great_expectations as ge
    from great_expectations.dataset import (
        SparkDFDataset,
        MetaSparkDFDataset,
    )
    (...)

UNITY CATALOG METADATA
METADATA REPOSITORY STRUCTURE

Sample: UnityCatalogDefinition.yaml

    UC_Def:
      - flow_name: Init_schema
        schemas:
          - name: init_schema
            comment: This is just a dummy file to init the correct schema
            tags:
              - Test
            access:
              - acls_group_name: SEC-ES-DA-p-903444-europe-analyst
                acls_access_type: SELECT,MODIFY
                acls_access_catalogue: mdl_europe_anz_dev
            tables:
              - name: init_she
                comment: "# Supports Markdown 1. First item 2. Second item"
                tags:
                  - Test
                columns:
                  - name: Country_Code
                    comment: Country code key as of 07/03/2024
                    tags:
                      - PK
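
One way to apply a definition like UnityCatalogDefinition.yaml is to translate it into Unity Catalog SQL statements. The sketch below is an assumption about how that could work, not Blueprint's actual implementation: `build_statements` and `sql_string` are hypothetical helpers, the dict mirrors one entry of the YAML sample, the catalog for comments is passed in separately, and only `GRANT`, `COMMENT ON` and column comments are generated (tags are left out).

```python
def sql_string(text: str) -> str:
    """Quote text as a SQL string literal using backslash escapes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_statements(uc_def: dict, default_catalog: str) -> list:
    """Turn one UC_Def entry into a list of SQL statements (not executed here)."""
    statements = []
    for schema in uc_def.get("schemas", []):
        schema_fqn = f"{default_catalog}.{schema['name']}"
        if schema.get("comment"):
            statements.append(
                f"COMMENT ON SCHEMA {schema_fqn} IS {sql_string(schema['comment'])}"
            )
        for acl in schema.get("access", []):
            # "SELECT,MODIFY" in the YAML becomes "SELECT, MODIFY" in SQL.
            privileges = ", ".join(p.strip() for p in acl["acls_access_type"].split(","))
            target = f"{acl['acls_access_catalogue']}.{schema['name']}"
            statements.append(
                f"GRANT {privileges} ON SCHEMA {target} TO `{acl['acls_group_name']}`"
            )
        for table in schema.get("tables", []):
            table_fqn = f"{schema_fqn}.{table['name']}"
            if table.get("comment"):
                statements.append(
                    f"COMMENT ON TABLE {table_fqn} IS {sql_string(table['comment'])}"
                )
            for column in table.get("columns", []):
                if column.get("comment"):
                    statements.append(
                        f"ALTER TABLE {table_fqn} ALTER COLUMN {column['name']} "
                        f"COMMENT {sql_string(column['comment'])}"
                    )
    return statements


# One UC_Def entry, as it might look after loading the YAML sample.
uc_def = {
    "flow_name": "Init_schema",
    "schemas": [{
        "name": "init_schema",
        "comment": "This is just a dummy file to init the correct schema",
        "access": [{
            "acls_group_name": "SEC-ES-DA-p-903444-europe-analyst",
            "acls_access_type": "SELECT,MODIFY",
            "acls_access_catalogue": "mdl_europe_anz_dev",
        }],
        "tables": [{
            "name": "init_she",
            "comment": "Sample table",
            "columns": [{"name": "Country_Code",
                         "comment": "Country code key as of 07/03/2024"}],
        }],
    }],
}

statements = build_statements(uc_def, default_catalog="mdl_europe_anz_dev")
for statement in statements:
    print(statement)
```

In a Databricks notebook each statement could then be passed to `spark.sql`; generating the strings first keeps the translation easy to test without a cluster.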

INIT SCRIPTS
METADATA REPOSITORY STRUCTURE

Sample: BluePrintInit.sh

    curl -sL https://aka.ms/InstallAzureCLIDeb | bash
    python -m pip install --upgrade pip
    az login --service-principal -u $AZ_DEVOPS_SP_APP_ID -p $AZ_DEVOPS_SP_SECRET --tenant $AZ_TENANT_ID
    AZ_SP_TOKEN=$(az account get-access-token --query accessToken -o tsv)
    pip install BluePrint==$PKG_VERSION --index-url https://token:$AZ_SP_TOKEN...

UNITY CATALOG METADATA
METADATA REPOSITORY STRUCTURE

Sample: EnvMetadata_dev.yaml

    DatabricksDefinition:
      - databricks_id: Strong Cluster
        default_catalog: mdl_europe_anz_dev
        workspace_url: https://adb-
        resource_id: /subscriptions/blablabla
        policy_id: 0000EB555555
        cluster_version: 12.2.x-scala2.12
        cluster_type: Standard_E8d_v4
        scaling: 5:20
        init_script: /Metadata/InitScripts/BluePrintInit.sh
    AzureDefinition:
      - azure_id: Azure Info
        kv_url: https://

ARTIFACTS

PYTHON ARTIFACTS
HOW DO WE DEPLOY IT

YAML sample: Blueprint Python Artifact Creation

    stages:
      - stage: Building_new_version
        jobs:
          - job: Build
            pool:
              name: Azure Pipelines
              vmImage: windows-2019
            steps:
              - checkout: self
                persistCredentials: true
                clean: true
              - script: |
                  pip install twine
                  pip install build
                displayName: Install twine and build
              - script: |
                  python -m build -w blueprint
                displayName: Build wheel
              - task: TwineAuthenticate@1
                inputs:
                  artifactFeed: Europe_MDL
              - script: |
                  python -m twine upload -r Europe_MDL --config-file $(PYPIRC_PATH) blueprint/dist/*.whl
                displayName: Upload build whl to feed
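
Because every flow is described in metadata, a release pipeline could also check flow files against Schemas.json before anything is published. The sketch below is a hand-rolled validator covering only the keywords visible in the Schemas.json sample (`type`, `properties`, `items`). It is not a full JSON Schema implementation, `validate` is a hypothetical name, and a real pipeline would more likely rely on a dedicated JSON Schema library.

```python
# Maps JSON Schema type names to the Python types they accept.
TYPE_MAP = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def matches_type(value, type_name: str) -> bool:
    # bool is a subclass of int in Python, so treat booleans separately.
    if isinstance(value, bool):
        return type_name == "boolean"
    return isinstance(value, TYPE_MAP[type_name])


def validate(value, schema: dict, path: str = "$") -> list:
    """Return a list of error messages; an empty list means the value is valid."""
    allowed = schema.get("type")
    if allowed is not None:
        names = allowed if isinstance(allowed, list) else [allowed]
        if not any(matches_type(value, name) for name in names):
            found = type(value).__name__
            return [f"{path}: expected {' or '.join(names)}, got {found}"]
    errors = []
    if isinstance(value, dict):
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors += validate(value[key], sub_schema, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors += validate(item, schema["items"], f"{path}[{index}]")
    return errors


# The visible part of the Schemas.json sample, as a Python dict.
schema = {
    "type": "object",
    "properties": {
        "Flows": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "flow_name": {"type": ["integer", "string"]},
                    "downstream_flow_name": {"type": ["string", "null"]},
                    "downstream_adhoc": {"type": ["string", "null"]},
                },
            },
        },
    },
}

good = {"Flows": [{"flow_name": "DACH_Retailer", "downstream_flow_name": None}]}
bad = {"Flows": [{"flow_name": 2.5, "downstream_flow_name": 7}]}

print(validate(good, schema))
for error in validate(bad, schema):
    print(error)
```

Failing the build on a non-empty error list would catch a malformed flow file before it reaches a shared environment.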

PYTHON PACKAGE ARTIFACT FEED
PACKAGE

PYTHON PACKAGE ARTIFACT FEED
DATA FACTORY ARM TEMPLATES

TESTING AIMS TO ENSURE STABILITY ACROSS VERSIONS
REUSABILITY REQUIRES RESILIENCE
[Testing Pipeline Screenshot]

INTEGRATION WITH DATA ESTATE

PBI PREMIUM CAPACITY
REFRESHING VIA ENHANCED API

PBI PREMIUM CAPACITY
USAGE STATISTICS

D365 INTEGRATION
D365 API SERVICE PRINCIPAL

TABLE CREATION BASED ON YAML DEFINITION
D365 INTEGRATION
API Permissions

    dv_target_yaml = """
    table:
      edf_SnOP_Target_Test:
        unique_column: ID
        column_definitions:
          - column: Timeframe
            data_type: string_100
          - column: TargetMetric
            data_type: string_100
          - column: Portfolio
            data_type: choice_portfolio
          - column: Region
            data_type: string_100
          - column: ProductCategory
            data_type: string_50
    """
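
The `data_type` values in the dv_target_yaml definition appear to encode a kind plus a detail: a maximum length for strings (`string_100`, `string_50`) or a named choice set (`choice_portfolio`). The following is a minimal sketch of parsing them, assuming that naming convention holds. `ColumnSpec` and `parse_data_type` are hypothetical, and the actual D365 API call that creates the table is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    kind: str
    max_length: Optional[int] = None
    choice_set: Optional[str] = None


def parse_data_type(column: str, data_type: str) -> ColumnSpec:
    """Split '<kind>_<detail>' into a structured column specification."""
    kind, _, detail = data_type.partition("_")
    if kind == "string":
        # Assumed meaning: the suffix is the maximum string length.
        return ColumnSpec(column, "string", max_length=int(detail))
    if kind == "choice":
        # Assumed meaning: the suffix names a predefined choice set.
        return ColumnSpec(column, "choice", choice_set=detail)
    raise ValueError(f"unknown data_type for {column}: {data_type}")


# Column definitions copied from the dv_target_yaml sample.
column_definitions = [
    {"column": "Timeframe", "data_type": "string_100"},
    {"column": "TargetMetric", "data_type": "string_100"},
    {"column": "Portfolio", "data_type": "choice_portfolio"},
    {"column": "Region", "data_type": "string_100"},
    {"column": "ProductCategory", "data_type": "string_50"},
]

specs = [parse_data_type(c["column"], c["data_type"]) for c in column_definitions]
for spec in specs:
    print(spec)
```

Rejecting unknown kinds up front means a typo in the YAML fails before any table is created.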

D365 INTEGRATION
DATA FLOW INBOUND

D365 INTEGRATION
DATA FLOW OUTBOUND

SELF SERVICE ENGINEERING

SELF SERVICE
REDUCES BOTTLENECKS

MONITORING

COST MONITORING
ACTIVITY BASED COSTING
Flow A, Flow B, Flow X

USAGE MONITORING
USAGE SEGMENTED BY PERSONA AND TOOL

THANK YOU!
Q&A