2024 Databricks Inc. All rights reserved

Healthcare Data with Unity Catalog: Providence's Journey
Anna Erickson, Satish Marripelli, Janet Vickers, Lawrence Yapp

Providence St Joseph Health
[Infographic: organization at a glance]
- 122K caregivers, 38K nurses, 34K physicians
- 51 hospitals, 1000 clinics
- 29M total patient visits
- 2.6M covered lives
- $2.1B community benefit
- 1700+ public research studies
- 1 health plan
- 18 supportive housing facilities
- High school, nursing schools & universities
- 178 partner sites
- 87 Community Connect partners
- A second set of figures also appears on the slide: $1.9B community benefit, 52 hospitals, 950 clinics, 28M total patient visits, 2.1M covered lives, 1780+ public research studies, 17 supportive housing facilities

Our Problem: The Wild West
- 131 unique workspaces across 10 subscriptions
- No visibility into cost and usage
- Experimentation with no governance, no cluster policies
- Duplicate data as each workspace had its own source data
- Opportunities for optimization
- Opportunities for tighter collaboration across teams

The Team
We are a cross-functional group spanning multiple IS teams.
- Healthcare Intelligence
- Cloud Infrastructure
- Cyber Security
- Networking & Firewall
- And Databricks!

Onsite Sessions
- We prepped and planned for multiple in-person sessions that lasted a week each time
- Focus was on collaboration and achieving a specific set of deliverables
- Because we were in one room, we could quickly clear barriers
- “Let the architects, architect”

Team Building, Even After Hours!
While we were onsite, we continued to collaborate and discussed project plans over dinner. It may seem insignificant, but social gatherings were needed for team forming and for building trust and rapport.

Setting up Infrastructure
- Networking and firewall configuration
- Involved Cyber Security throughout the entire process
- Infrastructure clean-up
- Source data stored in ADLS instead of DBFS, Code Repo and Monitoring
- Workspace-by-workspace information gathering
- Deployed a single workspace each for dev, test, and production in the central subscription

Databricks Unity Catalog Overview
- Unity Catalog is enabled in a centrally managed subscription for the enterprise. This makes it easier to govern and to manage users and permissions.
- Data catalogs are structured similarly to Snowflake databases
- Metadata layer that interacts with data schemas, tables, views, etc.
- Data resides in ADLS Gen2
- Consult with a Data Architect to avoid a data swamp
[Diagram: Unity Catalog (User Management, Metastore, System Tables for usage, metadata, lineage, etc.) shared by the Prod, Test, and Dev workspaces, each with notebooks, clusters, and jobs. Data catalog labels: HI, AdminTech, CRCAACOE, Rev Cycle Insights, Population Health, ISB, Supply Chain.]
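
Unity Catalog addresses data through a three-level name (catalog.schema.table), and access is granted at each of those levels. The sketch below shows how a per-domain catalog like those in the diagram might be paired with read grants for a group. The catalog, schema, table, and group names are hypothetical, and the SQL is only generated as strings here; nothing is run against a workspace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A Unity Catalog table identified by its three-level name."""
    catalog: str
    schema: str
    table: str

    @property
    def full_name(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def read_grants(ref: TableRef, group: str) -> list[str]:
    """Return the GRANT statements a group needs to read one table.

    Reading a table requires USE CATALOG and USE SCHEMA on its parents
    in addition to SELECT on the table itself.
    """
    principal = f"`{group}`"  # backticks allow hyphens in group names
    return [
        f"GRANT USE CATALOG ON CATALOG {ref.catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {ref.catalog}.{ref.schema} TO {principal}",
        f"GRANT SELECT ON TABLE {ref.full_name} TO {principal}",
    ]


# Hypothetical names, for illustration only.
ref = TableRef("population_health", "curated", "patient_visits")
statements = read_grants(ref, "pophealth-analysts")
for stmt in statements:
    print(stmt)
```

Keeping the grants at the group level rather than per user matches the deck's emphasis on managing users and permissions centrally.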

Unity Catalog Benefits and Impact
Why are we consolidating, cleaning up, and migrating to Unity Catalog?
- Reduces the number of workspaces from 131 to 3 (one per environment)
- Centralizing makes it easier to manage and remove duplicate data
- Remove low-usage and low-value workspaces and clusters
- Easier to secure, govern, apply policies, and manage users/permissions
- Easier to enable CI/CD and Infrastructure as Code (IaC)
- Easier to monitor and to enable lifecycle management (remove unused clusters, etc.)
- Easier to enable tagging for cost and usage reporting
- Easier to enable new AI features

Migration is not a lift and shift
- We began the migration in Jan 2024 and it's a work in progress
- Requires prep work with legacy workspace owners
- Understand existing workflows, data sources and destinations, and how to organize their data in the new UC
- Test new jobs and pipelines before shutting down legacy workspaces
- Onboarded 20+ different teams to UC!

Design Patterns: Machine Learning
- Reviewed existing HI workspaces and created 7 design patterns
- Helped inform cluster policy recommendations and cross-team collaboration on use cases
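
To make "cluster policy recommendations" concrete: a Databricks cluster policy is a JSON document that maps cluster attributes to constraints such as a fixed value, an allowed list, or a numeric range. The sketch below builds one for an interactive development pattern. The attribute paths and constraint types follow the Databricks policy definition format and should be checked against current documentation; the function name, VM sizes, and limits are illustrative assumptions, not Providence's actual policy.

```python
import json


def interactive_dev_policy(
    allowed_node_types: list[str],
    default_node_type: str,
    max_workers: int,
    max_dbus_per_hour: float,
) -> dict:
    """Build a policy definition for small, self-terminating dev clusters."""
    if default_node_type not in allowed_node_types:
        raise ValueError("default node type must be in the allowed list")
    return {
        # Only all-purpose (interactive) clusters may use this policy.
        "cluster_type": {"type": "fixed", "value": "all-purpose"},
        # Idle clusters shut down; 30 minutes is the default, 60 the ceiling
        # (both values are assumptions for this sketch).
        "autotermination_minutes": {
            "type": "range", "maxValue": 60, "defaultValue": 30,
        },
        # Steer users toward a short list of VM sizes, smallest by default.
        "node_type_id": {
            "type": "allowlist",
            "values": allowed_node_types,
            "defaultValue": default_node_type,
        },
        # Bound the cluster size with a min/max worker range.
        "autoscale.min_workers": {"type": "fixed", "value": 1},
        "autoscale.max_workers": {
            "type": "range",
            "maxValue": max_workers,
            "defaultValue": min(2, max_workers),
        },
        # Cap hourly DBU consumption for any cluster created under the policy.
        "dbus_per_hour": {"type": "range", "maxValue": max_dbus_per_hour},
    }


# Hypothetical inputs: two Azure VM sizes, at most 4 workers, 10 DBUs/hour.
policy = interactive_dev_policy(
    allowed_node_types=["Standard_DS3_v2", "Standard_DS4_v2"],
    default_node_type="Standard_DS3_v2",
    max_workers=4,
    max_dbus_per_hour=10,
)
print(json.dumps(policy, indent=2))
```

Generating the definition from a function, rather than hand-editing JSON per team, is one way to keep many policies consistent across design patterns.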

Enable Guardrails via Policies

Cluster Guardrails
- Restrict to types of clusters (job, interactive, DLT, etc.)?
- Restrict from scheduling interactive clusters
- Restrict interactive clusters in Prod (only for development)
- Set default to small cluster/VM size
- Set default to auto-terminate in 30 minutes
- Are there certain types of VMs with reservations we want to promote the use of (make default)?
- How many machines in the cluster? (min/max range) Default number of nodes
- Do we want to restrict DBUs used per hour?
- Assign tags with default values based on type of processing/pattern

Other Guardrails
- Restrict new workspace creation outside of Unity Catalog
- Ability to opt in to new features (deny preview items)

40 policies put in place thus far
Cluster policies are tied to a user's AAD groups

Reinforced Tagging Through Policy
- We used to have one workspace per project
- Now that we're consolidating workspaces, it was important for us to use tags for visibility into telemetry, cost/usage, and lifecycle management.
  - Policies by user's AAD group, pattern, and cluster type
  - Automatically assign tags or have a defined list of values (no more free text)

Tags
- Business Customer: Who is this for? Which downstream customer benefits from this processing?
- Business Effort: Business-friendly name for the project/effort
- Project State: Used to indicate where the cluster is in the lifecycle and to manage guardrails
- Support Email: Contact email for operational support
- Team Owner: Functional team that manages the cluster

Cloud Value Optimization Goal
For Azure Databricks, we saved $420K in 2023
- Partnered with Databricks and Microsoft to understand Total Cost of Ownership: DBU, Support, Influenced Costs
- Reviewed highest-cost workspaces
  - Understand the value the data processing provides
  - Collaborate with Databricks to identify key data patterns to inform guardrails and provide visibility
  - Showbacks leading to changes in behavior
  - Identified low-value workspaces that should be sunset
- Focus on optimization
  - Changing timeouts and right-sizing high-cost VMs
  - Optimizing foundational data (Clarity Ingest, De-Identification, Truveta Trucking Service)
  - Remove low-cost/unused workspaces
- Use data to better inform P3 Reservations and future commit agreements
Continued focus on optimization in 2024
- Unity Catalog Migration: focus on rightsizing VMs, optimization, and turning off low-value clusters.
- Interactive vs Job Clusters: policy put in place to restrict scheduling interactive clusters (up to 5X the cost of job clusters). Work on changing behavior to use interactive clusters for development only.

Databricks Total Cost of Ownership
- Central managed subscription
  - rg-dbx-enterprise-dev-wus2: non-managed resource group; includes Databricks DBU + Support costs
  - rg-mgmd-dbx-enterprise-dev-wus2: managed resource group; includes Databricks Influenced Costs
- When a Databricks cluster is running, it incurs DBU, Support, and Influenced Costs.
- Use Azure Portal for a big-picture view of total cost of ownership.
- Note: Serverless clusters will not show Influenced Costs in Azure Portal. Expect DBU spend to increase to reflect these costs.

Cost Optimization: Rightsizing VMs
Extract data to find oversized and underutilized Databricks virtual machines
[Flow: Azure Portal / API -> Extract Databricks VM Data -> VM Utilization Report]
Low average CPU % utilized and high available memory (GB) indicated we need to follow up with the project teams to right-size the clusters/VMs.

Unity Catalog System Tables
- Allows for reporting of both Unity Catalog and non-Unity Catalog workspaces
- Identified clusters with a timeout of 120+ minutes
- Jobs usage (the exact DBUs that each job's cluster consumes)
- DBUs by cluster type/SKU
- Audit logs track user activities
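
The kind of reporting described above (long timeouts, missing tags, DBUs by SKU) can be sketched as a small audit over rows exported from system tables. This is an illustration under stated assumptions: the record layout, field names, tag keys, and SKU labels below are hypothetical stand-ins rather than the actual system table schema, and the sample data is invented.

```python
from collections import defaultdict

# Tag keys modeled on the five tags in the tagging slide (key spelling assumed).
REQUIRED_TAGS = {
    "BusinessCustomer", "BusinessEffort", "ProjectState",
    "SupportEmail", "TeamOwner",
}
LONG_TIMEOUT_MINUTES = 120  # threshold mentioned in the system tables slide


def audit_cluster(cluster: dict) -> list[str]:
    """Return human-readable findings for one cluster record."""
    findings = []
    timeout = cluster.get("auto_termination_minutes")
    if timeout is not None and timeout >= LONG_TIMEOUT_MINUTES:
        findings.append(f"auto-termination is {timeout} min (>= {LONG_TIMEOUT_MINUTES})")
    missing = REQUIRED_TAGS - set(cluster.get("tags", {}))
    if missing:
        findings.append("missing tags: " + ", ".join(sorted(missing)))
    return findings


def dbus_by_sku(usage_rows: list[dict]) -> dict[str, float]:
    """Sum DBU usage per SKU across usage rows."""
    totals: dict[str, float] = defaultdict(float)
    for row in usage_rows:
        totals[row["sku_name"]] += row["usage_quantity"]
    return dict(totals)


# Invented sample records, for illustration only.
clusters = [
    {"cluster_name": "adhoc-exploration", "auto_termination_minutes": 180,
     "tags": {"TeamOwner": "HI"}},
    {"cluster_name": "nightly-ingest", "auto_termination_minutes": 30,
     "tags": {tag: "set" for tag in REQUIRED_TAGS}},
]
usage = [
    {"sku_name": "JOBS_COMPUTE", "usage_quantity": 12.5},
    {"sku_name": "ALL_PURPOSE_COMPUTE", "usage_quantity": 40.0},
    {"sku_name": "JOBS_COMPUTE", "usage_quantity": 7.5},
]

report = {c["cluster_name"]: audit_cluster(c) for c in clusters}
sku_totals = dbus_by_sku(usage)
print(report)
print(sku_totals)
```

In practice the same checks would more likely be written as SQL over the system tables; plain Python is used here only to keep the logic visible and runnable without a workspace.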

What's Next? What about serverless?
Emerging AI Forces
- Healthcare is a highly regulated environment.
- Tension between being innovative and being safe/secure
- We are focused on serverless and other Databricks features that advance our AI work.
- Enabled Serverless Model Serving
- Initial use case is to deploy one of our machine learning models as a REST API so it can be called by downstream applications

Enabling AI Features
- Partner with Databricks to share our AI use cases and learn about upcoming features
- We are focused on infrastructure to enable Azure OpenAI models
- Keep the Databricks feature roadmap up to date and plan quarterly for the next set of features to enable

Questions?
Thank you for attending our presentation!