Fabricator: Simplifying Declarative Feature Engineering at DoorDash.pdf


FABRICATOR: A DECLARATIVE FEATURE PLATFORM AT DOORDASH
Hebo Yang, Kunal Shah
6/12/2024
2024 Databricks Inc. All rights reserved

AGENDA
- Machine Learning at DoorDash
- Feature Platform Journey
- Fabricator: overview
- Architecture deep dives
- Results and learnings

MACHINE LEARNING AT DOORDASH
- Fraud Detection
- Robot and Drone Deliveries
- Personalized Marketing
- Virtual Assistants
- Estimated Delivery Time
- Restaurant Recommendations

MACHINE LEARNING SYSTEM
[Figure from "Hidden Technical Debt in Machine Learning Systems", Google, 2015]

MACHINE LEARNING PLATFORM
Centralized team to accelerate the ML development velocity
- Simplify and reduce complexity in the ML development process
- Provide needed infrastructure
- Built once and leveraged by multiple ML and DS teams within the company

FEATURE PLATFORM JOURNEY

FEATURE PLATFORM
Looking back when we started in 2021:
  Backfill jobs      0
  Users              20
  Batch jobs         60
  Realtime jobs      11
  Unique features    400
  Feature values     100B

OUR LEGACY SYSTEM
- Efficient feature store
- ETL framework with a robust warehouse
- Manual steps for everything else

PAIN POINTS
- Fragmentation hampers velocity: Data Scientists have to interface with many loosely coupled systems
- Infrastructure evolution is slow: Improving best practices and integrations takes way too long
- No control plane: Maintaining features requires more than just code

WHAT DOES AN IDEAL PLATFORM LOOK LIKE?
- Single entrypoint
- Semantic feature representation
- Simplified abstractions
- High iteration velocity
- Automatic feature lifecycle management

ARCHITECTURE OF AN IDEAL PLATFORM
[Figure: architecture diagram]

FABRICATOR: OVERVIEW

FABRICATOR VISION
Enable Data Scientists to declaratively define efficient end-to-end feature pipelines and automate the operational life cycle of features
- Centralized Declarative Registry: An entrypoint that allows ML practitioners to define E2E feature semantics in simple abstractions
- Unified Execution Environment: An execution environment with simple APIs for high iteration velocity
- Infrastructure Automation: An automated integration for all other downstream operations

FABRICATOR ARCHITECTURE
- Registry as central entrypoint
- Unified execution env for dev and prod
- Infra automation for downstreams

ARCHITECTURE DEEP DIVES

FEATURE REGISTRY
- Simple YAML definitions for feature semantics
- Protobuf backed schema for YAML objects
- DB backed service for global access for definitions
- Continuously deployed for every change

FEATURE REGISTRY: FEATURE SEMANTICS
An E2E pipeline requires only a few YAML definitions:
- Source
- Sink
- Feature

FEATURE REGISTRY: DATASETS
The registry also supports generating intermediate, training and validation datasets

FEATURE REGISTRY: BENEFITS OF THE DESIGN
- Evolution is easy: Protobuf based backend makes our definitions robust to extension
- Support for infrastructure flexibility: New storage and compute paradigms can be adopted without significant shifts
- Global availability: Every downstream has immediate access to definitions
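The registry slides name three definition kinds (Source, Sink, Feature) but do not show their fields. The following is a minimal sketch of how such definitions could be typed and cross-checked, assuming each YAML document has already been parsed into a dict. Every field name (`compute`, `query`, `store`, `entity`, `sinks`), the `Registry` class, and the example values are hypothetical, and plain dataclasses stand in for the Protobuf-backed schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# All field names below are assumptions made for illustration only.
@dataclass(frozen=True)
class Source:
    name: str
    compute: str                      # assumed values, e.g. "spark"
    query: str


@dataclass(frozen=True)
class Sink:
    name: str
    store: str                        # assumed values, e.g. "redis"


@dataclass(frozen=True)
class Feature:
    name: str
    source: str                       # name of the Source that computes it
    entity: str                       # key the feature is looked up by
    sinks: Tuple[str, ...] = ()


KINDS = {"source": Source, "sink": Sink, "feature": Feature}


class Registry:
    """In-memory stand-in for a DB-backed registry service."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, object]] = {k: {} for k in KINDS}

    def register(self, doc: dict) -> None:
        # `doc` is what a YAML parser would return for one definition.
        fields = dict(doc)
        kind = fields.pop("kind")
        if "sinks" in fields:
            fields["sinks"] = tuple(fields["sinks"])
        obj = KINDS[kind](**fields)   # unknown or missing fields raise TypeError
        if obj.name in self.objects[kind]:
            raise ValueError(f"duplicate {kind}: {obj.name}")
        self.objects[kind][obj.name] = obj

    def validate(self) -> List[str]:
        """Return dangling references between definitions."""
        errors = []
        for feat in self.objects["feature"].values():
            if feat.source not in self.objects["source"]:
                errors.append(f"{feat.name}: unknown source {feat.source}")
            for sink in feat.sinks:
                if sink not in self.objects["sink"]:
                    errors.append(f"{feat.name}: unknown sink {sink}")
        return errors


registry = Registry()
for doc in [
    {"kind": "source", "name": "store_orders_daily", "compute": "spark",
     "query": "SELECT store_id, COUNT(*) AS order_count FROM orders GROUP BY store_id"},
    {"kind": "sink", "name": "online_store", "store": "redis"},
    {"kind": "feature", "name": "order_count", "source": "store_orders_daily",
     "entity": "store_id", "sinks": ["online_store", "offline_store"]},
]:
    registry.register(doc)

problems = registry.validate()
print(problems)  # the feature references a sink that was never registered
```

Rejecting unknown fields at load time and checking references in one place is one way a schema-backed registry can catch mistakes before any job runs; the real system may validate differently.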

UNIFIED EXECUTION ENVIRONMENT
- Library suite that bridges registry and infrastructure
- Enables contextual executions of registry definitions
- Provides black box optimizations

UNIFIED EXECUTION ENVIRONMENT: CONTEXTUAL EXECUTIONS
Pythonic wrappers around YAML definitions designed to "execute" the YAMLs efficiently

UNIFIED EXECUTION ENVIRONMENT: BENEFITS OF THE DESIGN
- Most jobs are no-code: Unless you need customizations, the same code executes multiple YAMLs
- High fidelity testing: Notebook clusters mimic the production job setup
- Efficient execution: Users don't have to optimize for different storage or compute choices

INFRASTRUCTURE AUTOMATION
A central registry and a unified library suite provide every downstream integration to a feature definition for free:
- Orchestration
- Online Serving
- Feature Discovery

INFRASTRUCTURE AUTOMATION: ORCHESTRATION
- Automated DAG construction
- Date partitioning
- Scalable and parallelized backfilling
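The three orchestration ideas (DAG construction, date partitioning, parallelized backfilling) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the Fabricator orchestrator: the job names, the `run_partition` placeholder, and the rule that a job starts only after all partitions of its upstream jobs finish are hypothetical simplifications.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def build_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order jobs so each one runs after the jobs it reads from."""
    return list(TopologicalSorter(dependencies).static_order())


def date_partitions(start: date, end: date) -> List[date]:
    """One partition per day, both ends inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def run_partition(job: str, day: date) -> str:
    # Hypothetical placeholder for executing one job definition
    # against one date partition.
    return f"{job}/{day.isoformat()}"


def backfill(dependencies: Dict[str, Set[str]], start: date, end: date,
             max_workers: int = 4) -> List[str]:
    done: List[str] = []
    partitions = date_partitions(start, end)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for job in build_order(dependencies):
            # Partitions of one job are independent, so they run in parallel;
            # the jobs themselves run one after another in dependency order.
            done.extend(pool.map(lambda d, j=job: run_partition(j, d), partitions))
    return done


# Each key maps a job to the set of jobs it depends on.
deps = {"store_features": {"orders_clean"}, "orders_clean": set()}
order = build_order(deps)
result = backfill(deps, date(2024, 6, 1), date(2024, 6, 3))
print(order)
print(result)
```

Because the same `run_partition` call serves every job, adding a job means adding an entry to the dependency map rather than writing new pipeline code, which is the spirit of the "most jobs are no-code" claim above.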

INFRASTRUCTURE AUTOMATION: ONLINE SERVING
Automate materialization of features to our scalable feature store
[Diagram: Upload Service, Redis Store]

INFRASTRUCTURE AUTOMATION: FEATURE DISCOVERY
- Automate registry synchronization with data catalogs
- Registry enables metadata extractors
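For the online serving step, the slides mention only an upload service and a Redis store. A rough sketch of batched materialization into a key-value layout follows, assuming one record per entity that maps feature names to values. `InMemoryStore` is a dict-based stand-in and not a Redis client, and the row shape, key format, and batch size are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, str, float]   # (entity_id, feature_name, value), assumed shape


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group rows into lists of at most `size` items."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class InMemoryStore:
    """Dict-based stand-in for the online store; not a Redis client."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, float]] = {}
        self.writes = 0

    def write_batch(self, batch: List[Row]) -> None:
        self.writes += 1
        for entity_id, feature, value in batch:
            # One record per entity, holding all of its feature values.
            self.data.setdefault(entity_id, {})[feature] = value


def materialize(rows: Iterable[Row], store: InMemoryStore, batch_size: int = 2) -> int:
    """Upload rows in batches and return the number of batch writes issued."""
    for batch in batches(rows, batch_size):
        store.write_batch(batch)
    return store.writes


store = InMemoryStore()
rows = [
    ("store:1", "order_count", 12.0),
    ("store:1", "avg_basket", 31.5),
    ("store:2", "order_count", 7.0),
]
writes = materialize(rows, store)
print(writes, store.data)
```

Batching keeps the number of round trips to the store proportional to the batch count rather than the row count; how the actual upload service groups writes is not described in the slides.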

RESULTS AND LEARNINGS

FEATURE PLATFORM
                     When we started in 2021    Current daily scale on Fabricator
  Backfill jobs      0                          500
  Users              20                         400
  Batch jobs         60                         1.2K
  Realtime jobs      11                         80
  Unique features    400                        7K
  Feature values     100B                       2.4T

LEARNINGS
- Build products, not systems: Adoption was slower when users interfaced with systems, rather than a single product
- Make it easy to do the right thing: Simplify the most common patterns, and leave room for customization
- Build for integrations: Scale beyond one team by actively integrating with other services

LEARNINGS: DECLARATIVE FEATURE PIPELINES
Very easy to author feature jobs
- Warehouse cost and compute contention -> team warehouses and queue
- Spark job performance and cost -> Spark tuning guidelines, auto optimizations, attribution and reporting
- Abandoned jobs -> auto disable failing jobs, lineage with active model
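The two mitigations for abandoned jobs (auto disabling failing jobs and checking lineage against active models) could look roughly like the sketch below. The threshold of three consecutive failures, the `JobState` record, and the model-to-features map are assumptions made for illustration; the slides do not state how Fabricator decides either case.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class JobState:
    name: str
    features: Set[str]
    consecutive_failures: int = 0
    enabled: bool = True


def record_run(job: JobState, succeeded: bool, max_failures: int = 3) -> bool:
    """Update the failure streak; disable the job once it reaches the limit."""
    if succeeded:
        job.consecutive_failures = 0
    else:
        job.consecutive_failures += 1
        if job.consecutive_failures >= max_failures:
            job.enabled = False
    return job.enabled


def jobs_without_active_model(jobs: Iterable[JobState],
                              model_features: Dict[str, Set[str]]) -> List[str]:
    """Names of jobs whose features feed no active model (lineage check)."""
    used: Set[str] = set().union(*model_features.values())
    return [job.name for job in jobs if not (job.features & used)]


flaky = JobState("store_features", {"order_count"})
for ok in [False, True, False, False, False]:
    record_run(flaky, ok)   # a success resets the streak, three failures disable

orphan = JobState("legacy_features", {"old_score"})
active_models = {"eta_model": {"order_count", "prep_time"}}
unused = jobs_without_active_model([flaky, orphan], active_models)
print(flaky.enabled, unused)
```

Counting consecutive rather than total failures avoids disabling a job that fails occasionally but recovers, which is one reasonable reading of "failing jobs".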

LEARNINGS: AUTO BACKFILL
Auto backfill downstream jobs to create historical data for model training
- Improved dev velocity
- High complexity and cost -> explicit flag to trigger downstreams

LEARNINGS: THREE CORE COMPONENTS
- Feature Registry
- Unified Execution Environment (Library Suite)
- Infrastructure Automation (Orchestration and Integrations)
Single entry point, end-to-end experience
Complexity and error detection -> categorize errors; analytical metrics in addition to system observability

THANK YOU!
Q&A
