FABRICATOR: A DECLARATIVE FEATURE PLATFORM AT DOORDASH
Hebo Yang, Kunal Shah
6/12/2024
2024 Databricks Inc. All rights reserved

AGENDA
- Machine Learning at DoorDash
- Feature Platform Journey
- Fabricator: overview
- Architecture deep dives
- Results and learnings

MACHINE LEARNING AT DOORDASH
- Fraud Detection
- Robot and Drone Deliveries
- Personalized Marketing
- Virtual Assistants
- Estimated Delivery Time
- Restaurant Recommendations

MACHINE LEARNING SYSTEM
[Figure from "Hidden Technical Debt in Machine Learning Systems", Google, 2015]

MACHINE LEARNING PLATFORM
Centralized team to accelerate the ML development velocity
- Simplify & reduce complexity in ML development process
- Provide needed infrastructure
- Built once and leveraged by multiple ML & DS teams within the company

AGENDA
- Feature Platform Journey

FEATURE PLATFORM
Looking back when we started in 2021
- Backfill jobs: 0
- Users: 20
- Batch jobs: 60
- Realtime jobs: 11
- Unique features: 400
- Feature values: 100B

OUR LEGACY SYSTEM
- Efficient feature store
- ETL framework with a robust warehouse
- Manual steps for everything else

PAIN POINTS
- Fragmentation hampers velocity: Data Scientists have to interface with many loosely coupled systems
- Infrastructure evolution is slow: Improving best practices and integrations takes way too long
- No control plane: Maintaining features requires more than just code

WHAT DOES AN IDEAL PLATFORM LOOK LIKE?
- Single entrypoint
- Semantic feature representation
- Simplified abstractions
- High iteration velocity
- Automatic feature lifecycle management

ARCHITECTURE OF AN IDEAL PLATFORM
[Figure: architecture diagram]

AGENDA
- Fabricator: overview

FABRICATOR VISION
Enable Data Scientists to declaratively define efficient end-to-end feature pipelines and automate the operational life cycle of features.
- Centralized Declarative Registry: An entrypoint that allows ML practitioners to define E2E feature semantics in simple abstractions
- Unified Execution Environment: An execution environment with simple APIs for high iteration velocity
- Infrastructure Automation: An automated integration for all other downstream operations

FABRICATOR ARCHITECTURE
- Registry as central entrypoint
- Unified execution env for dev and prod
- Infra automation for downstreams

AGENDA
- Architecture deep dives

FEATURE REGISTRY
- Simple YAML definitions for feature semantics
- Protobuf backed schema for YAML objects
- DB backed service for global access for definitions
- Continuously deployed for every change
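
As a rough illustration of how schema-backed definitions might behave, the sketch below models source, sink and feature definitions as Python dataclasses standing in for the protobuf messages. The talk does not show Fabricator's actual schema, so every class name, field name and the `parse_definition` helper here is hypothetical, and a plain dict stands in for a parsed YAML document.

```python
from dataclasses import dataclass, fields


# Hypothetical stand-ins for protobuf-backed registry messages.
@dataclass(frozen=True)
class Source:
    name: str
    query: str
    schedule: str = "daily"  # optional field with a default: older definitions stay valid


@dataclass(frozen=True)
class Sink:
    name: str
    store: str


@dataclass(frozen=True)
class Feature:
    name: str
    source: str
    sink: str
    entity: str


KINDS = {"source": Source, "sink": Sink, "feature": Feature}


def parse_definition(doc: dict):
    """Validate one parsed definition (assumed to come from YAML) against its schema."""
    doc = dict(doc)  # copy so the caller's dict is not mutated
    kind = doc.pop("kind", None)
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    cls = KINDS[kind]
    unknown = set(doc) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"{kind} has unknown fields: {sorted(unknown)}")
    try:
        return cls(**doc)
    except TypeError as exc:  # raised when a required field is missing
        raise ValueError(f"invalid {kind} definition: {exc}") from exc
```

A typed schema of this kind lets a malformed definition fail at registration time rather than inside a running job, and lets new optional fields be added without touching existing definitions.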

FEATURE REGISTRY: Feature Semantics
An E2E pipeline requires only a few YAML definitions:
- Source
- Sink
- Feature

FEATURE REGISTRY: Datasets
The registry also supports generating intermediate, training and validation datasets.

FEATURE REGISTRY: Benefits of the Design
- Evolution is easy: Protobuf based backend makes our definitions robust to extension
- Support for infrastructure flexibility: New storage and compute paradigms can be adopted without significant shifts
- Global availability: Every downstream has immediate access to definitions

UNIFIED EXECUTION ENVIRONMENT
- Library suite that bridges registry and infrastructure
- Enables contextual executions of registry definitions
- Provides black box optimizations
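
One way to picture a contextual execution is a thin wrapper that runs the same definition under a dev or prod context. This is a minimal sketch under stated assumptions, not Fabricator's library: `SourceDef`, `Context`, `run_source`, the `{ds}` placeholder and the `sandbox.` prefix for dev runs are all hypothetical, and `execute` is any callable that accepts a SQL string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SourceDef:
    """Hypothetical registry definition for one source job."""
    name: str
    query: str          # may contain a {ds} date placeholder
    output_table: str


@dataclass(frozen=True)
class Context:
    """Hypothetical execution context shared by notebooks and scheduled jobs."""
    env: str            # "dev" or "prod"
    ds: str             # logical date partition, e.g. "2024-06-12"

    def resolve_table(self, table: str) -> str:
        # Assumption: dev runs write to a sandbox copy, prod runs to the real table.
        return table if self.env == "prod" else f"sandbox.{table}"


def run_source(defn: SourceDef, ctx: Context, execute: Callable[[str], None]) -> str:
    """Render the definition for this context, execute it, and return the target table."""
    sql = defn.query.format(ds=ctx.ds)
    target = ctx.resolve_table(defn.output_table)
    execute(f"INSERT OVERWRITE TABLE {target} {sql}")
    return target
```

Because only the context changes between a notebook run and a scheduled run, the definition itself needs no edits when it moves from development to production.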

UNIFIED EXECUTION ENVIRONMENT: Contextual Executions
Pythonic wrappers around YAML definitions designed to "execute" the YAMLs efficiently.

UNIFIED EXECUTION ENVIRONMENT: Benefits of the Design
- Most jobs are no-code: Unless you need customizations, the same code executes multiple YAMLs
- High fidelity testing: Notebook clusters mimic production job setup
- Efficient execution: Users don't have to optimize for different storage or compute choices

INFRASTRUCTURE AUTOMATION
A central registry and a unified library suite provide every downstream integration to a feature definition for free:
- Orchestration
- Online Serving
- Feature Discovery

INFRASTRUCTURE AUTOMATION: Orchestration
- Automated DAG construction
- Date partitioning
- Scalable and parallelized backfilling

INFRASTRUCTURE AUTOMATION: Online Serving
Automate materialization of features to our scalable feature store.
- Upload Service
- Redis Store
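
The orchestration ideas listed earlier (automated DAG construction, date partitioning, parallelized backfilling) can be sketched with the Python standard library. This is an illustrative sketch only: the talk does not describe the actual orchestrator, so the function names, the `deps` mapping and the `run` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from graphlib import TopologicalSorter
from typing import Callable


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Order jobs so that every job runs after the jobs it reads from.

    `deps` maps a job name to the set of upstream job names, as could be
    derived from the source references in registry definitions.
    """
    return list(TopologicalSorter(deps).static_order())


def date_partitions(start: date, end: date) -> list[date]:
    """Return one partition per day, inclusive of both endpoints."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def backfill(
    deps: dict[str, set[str]],
    start: date,
    end: date,
    run: Callable[[str, date], None],
    max_workers: int = 4,
) -> list[str]:
    """Backfill every job over [start, end], parallelizing across date partitions."""
    partitions = date_partitions(start, end)
    order = build_order(deps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for job in order:
            # Partitions of one job are independent, so they run in parallel.
            # list() waits for all of them (and surfaces errors) before any
            # downstream job starts.
            list(pool.map(lambda ds, job=job: run(job, ds), partitions))
    return order
```

Parallelizing across dates while keeping jobs in dependency order is one simple design; a production scheduler would also need retries and resource limits, which this sketch leaves out.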

INFRASTRUCTURE AUTOMATION: Feature Discovery
- Automate registry synchronization with data catalogs
- Registry enables metadata extractors

AGENDA
- Results and learnings

FEATURE PLATFORM
Flashing back to when we started in 2021
- Backfill jobs: 0
- Users: 20
- Batch jobs: 60
- Realtime jobs: 11
- Unique features: 400
- Feature values: 100B

FEATURE PLATFORM
Current daily scale on Fabricator
- Backfill jobs: 500
- Users: 400
- Batch jobs: 1.2K
- Realtime jobs: 80
- Unique features: 7K
- Feature values: 2.4T

LEARNINGS
- Build products, not systems: Adoption was slower when users interfaced with systems, rather than a single product
- Make it easy to do the right thing: Simplify the most common patterns, and leave room for customization
- Build for integrations: Scale beyond one team by actively integrating with other services

LEARNINGS
Declarative feature pipelines
Very easy to author feature jobs
- Warehouse cost and compute contention
  - team warehouses & queue
- Spark job performance & cost
  - Spark tuning guidelines
  - auto optimizations
  - attribution & reporting
- Abandoned jobs
  - auto disable failing jobs
  - lineage with active model

LEARNINGS
Auto backfill downstream jobs to create historical data for model training
Improved dev velocity
- High complexity and cost
  - explicit flag to trigger downstreams

LEARNINGS
Three core components
- Feature Registry
- Unified Execution Environment (Library Suite)
- Infrastructure Automation (Orchestration & Integrations)
Single entry point, end-to-end experience
- Complexity & error detection
  - Categorize errors
  - Analytical metrics in addition to system observability

THANK YOU!
Q&A