Many Model Forecasting in Real-Time
Anastasia Prokaieva
13 May 2024
2024 Databricks Inc. All rights reserved

Meet your Speaker
Anastasia Prokaieva
- Specialist Architect, AI and GeoSpatial
- At Databricks since 2021; Global SME on AI and product champion on Model Serving
- Background in Physics & Applied Mathematics
- Book co-author: "Databricks ML in Action" by Packt
Let's connect!

Problem Statement
Time Series Forecasting
[Figure: time series z(t), x1(t), x2(t), x3(t) plotted against time t]

Types of Forecasting Algorithms

Local Models
Predict individual time series separately. Each model is trained on and applied to a specific time series, which makes local models suitable for forecasting at a granular level, such as product-level sales forecasting in a large enterprise. A local model takes only one time series at a time.

Global Models
Consider multiple time series collectively and forecast across a broader set of data. Global models are useful for capturing complex dependencies between different time series, which makes them valuable for broader, cross-entity forecasting tasks. A global model learns parameters for multiple time series.
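
The local versus global distinction can be made concrete with a toy example. The sketch below is not from the talk: it assumes a deliberately simple one-parameter autoregressive model (next value ≈ phi × previous value), and the names `fit_ar1`, `lagged_pairs` and `series_by_store` are hypothetical. The local variant learns one phi per series, the global variant learns a single phi from all series pooled together.

```python
def lagged_pairs(series):
    """Return (previous, current) pairs for a single time series."""
    return list(zip(series[:-1], series[1:]))


def fit_ar1(pairs):
    """Least-squares slope through the origin for current ~ phi * previous."""
    num = sum(prev * cur for prev, cur in pairs)
    den = sum(prev * prev for prev, _ in pairs)
    return num / den if den else 0.0


# Hypothetical toy data: one short sales series per store.
series_by_store = {
    "store_1": [10.0, 11.0, 12.1, 13.3],   # growing
    "store_2": [50.0, 45.0, 40.5, 36.5],   # shrinking
    "store_3": [20.0, 20.0, 20.0, 20.0],   # flat
}

# Local models: one model (here, one parameter) per time series.
local_models = {
    store: fit_ar1(lagged_pairs(values))
    for store, values in series_by_store.items()
}

# Global model: a single parameter learned from all series at once.
all_pairs = [
    pair for values in series_by_store.values() for pair in lagged_pairs(values)
]
global_model = fit_ar1(all_pairs)


def forecast_next(phi, series):
    """One-step-ahead forecast from the last observed value."""
    return phi * series[-1]


local_forecast = forecast_next(local_models["store_1"], series_by_store["store_1"])
global_forecast = forecast_next(global_model, series_by_store["store_1"])
```

With many stores, the local approach produces as many fitted objects as there are series, which is what makes training, storing and serving them at scale a problem of its own.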

Our focus today

Our use case
Let's talk business first
A retailer that operates hundreds of thousands of stores wants to bring operational sales forecasting to all stores in real time, using all the available data and taking the available metadata into account.

Key problems today:
- Training takes weeks
- Problems joining freshly arriving features (weather, promos, marketing campaigns, etc.)
- Data volumes are hard to maintain
- Updated forecasts must be delivered on demand
- The team would like to standardize on MLOps

[Figure: Your data, ML model ensemble, Store 1, Store 2, Store 3]

Your Final Architecture (one of many)

Your Final Architecture FS (zoom in)
A. Model Serving with Online Store for Feature LookUp & all models are inside a container
B. Model Serving with Online Store for Feature & MODELS LookUp
C. Model Serving with Online Store for Feature LookUp & all models are inside Online Store

Part 1.a
Creating a FS Training Dataset
- A Delta table with constraints and a primary key = Feature Table.
- FS creates a DAG with metadata behind the scenes that can be attached to our model as a Spec.

Part 1.b
Publish Features to Online Store
- Your Delta tables with PKs and Change Data Feed enabled
- Your Delta tables with PKs published to an Online Store that syncs to tables
- You can serve your features externally with low latency
- Feature Serving via published Spec

Part 2.a
Training our models at scale
1) Make sure to pass a class, otherwise Spark does not serialise this properly.
2) Log your models, parameters and errors into MLflow with a nested run.
3) We serialise our object into a str and return an array of strings!

1) Make sure to return the same type as the provided schema, otherwise it will cause a type problem.
2) applyInPandas will apply your function to the grouped data; the function gets a pandas DataFrame (pdDF) as input.

Part 3.a
Wrap your model with Artifact

Part 3.b
Wrap your model with Online Store
The DAG behind the scenes is attached to the metadata of FS. When you invoke the model in batch or serving, the features are looked up and joined to the dataset on the PK.

Part 4
Serving our models at scale
- All 3 models can be queried using the same schema
- You can pass data, and it will be replaced

Conclusions
What have we learned by doing?
- Feature Engineering Client and Online Tables from Databricks, combined with Model Serving, significantly simplify feature lookups and joins with a TimeStamp dependency on feature updates.
- We can store various types of data under Online Tables, e.g. serialized models for real-time calls.
- Feature Engineering Client and Online Tables can be used across any project, such as Forecasting, Recommender Systems, GenAI Agents, etc.
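
The flow described in Parts 2 to 4 (train one model per group, serialise it into a string, keep it in a keyed store, then look up features and model by primary key at request time) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code from the talk: plain dicts stand in for the Online Tables, a loop over groups stands in for Spark's applyInPandas, MLflow logging is left out, and the "model" is a hypothetical toy (average sales with and without a promo). All names (`train_group`, `model_table`, `feature_table`, `serve`) are illustrative.

```python
import base64
import pickle
from collections import defaultdict
from statistics import mean


def train_group(store_id, rows):
    """Stand-in for the function applied to each group: one group in, one record out."""
    sales_by_promo = defaultdict(list)
    for row in rows:
        sales_by_promo[row["promo"]].append(row["sales"])
    # Toy "model": average sales per promo flag.
    params = {promo: mean(values) for promo, values in sales_by_promo.items()}
    # Serialise the object into a str so it fits a string column.
    payload = base64.b64encode(pickle.dumps(params)).decode("ascii")
    # Return exactly the declared shape: store_id (str), model (array of str).
    return {"store_id": store_id, "model": [payload]}


# Hypothetical training data, one row per store and day.
rows = [
    {"store_id": "store_1", "promo": 0, "sales": 10.0},
    {"store_id": "store_1", "promo": 0, "sales": 14.0},
    {"store_id": "store_1", "promo": 1, "sales": 20.0},
    {"store_id": "store_2", "promo": 0, "sales": 100.0},
    {"store_id": "store_2", "promo": 1, "sales": 130.0},
    {"store_id": "store_2", "promo": 1, "sales": 150.0},
]

# Group by the key, as a grouped apply would.
groups = defaultdict(list)
for row in rows:
    groups[row["store_id"]].append(row)

# Dict keyed by primary key: stand-in for an online table of serialized models.
model_table = {}
for store_id, group_rows in groups.items():
    record = train_group(store_id, group_rows)
    model_table[record["store_id"]] = record["model"]

# Stand-in for an online feature table with fresh features per store.
feature_table = {
    "store_1": {"promo": 1},
    "store_2": {"promo": 0},
}


def serve(store_id, horizon):
    """Look up features and model by primary key, rebuild the model, forecast."""
    features = feature_table[store_id]
    payload = "".join(model_table[store_id])
    params = pickle.loads(base64.b64decode(payload))
    level = params[features["promo"]]
    return [level] * horizon


forecast_store_1 = serve("store_1", horizon=2)
forecast_store_2 = serve("store_2", horizon=1)
```

Because the request only carries the key, swapping in a retrained model or a fresher feature row changes the forecast without touching the serving code, which is the appeal of keeping both behind a lookup.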

Warnings/Limitations
A few tips and tricks to make it successful

Fixed Container Memory
Limitation: 4 GB RAM for a CPU container
Solution:
- Will be lifted; if needed, contact your Dbx team
- Use a small GPU container
- Move to a pure Online Store solution

Online Store Str size
Limitation: 65 KB of a string type per row
Solution:
- Publish your serialized model as array(string)
- Use smaller models
- Compress your model

MLOps
Limitation: To update models under an artifact, you have to redeploy the model container
Solution:
- Use a pure Online Store solution with a TimeStamp key on model updates
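
Two of the workarounds for the string size limit, compressing the model and publishing it as an array of strings, combine naturally. The sketch below is one possible way to do it with the Python standard library and is not taken from the talk: `MAX_CHUNK_CHARS` is an assumed reading of the "65Kb" figure on the slide (check the actual limit for your store), and `to_string_chunks` / `from_string_chunks` are hypothetical helper names.

```python
import base64
import pickle
import random
import zlib

# Assumption: characters allowed per string value, based on the slide's "65Kb".
MAX_CHUNK_CHARS = 65_000


def to_string_chunks(obj, max_chars=MAX_CHUNK_CHARS):
    """Pickle, compress, base64-encode, then split into strings of bounded length."""
    raw = pickle.dumps(obj)
    packed = zlib.compress(raw, 9)  # 9 = strongest compression level
    text = base64.b64encode(packed).decode("ascii")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def from_string_chunks(chunks):
    """Inverse of to_string_chunks: join, decode, decompress, unpickle."""
    packed = base64.b64decode("".join(chunks))
    return pickle.loads(zlib.decompress(packed))


# Hypothetical "model": a dict of weights large enough to exceed one chunk.
rng = random.Random(0)
model = {"weights": [rng.random() for _ in range(50_000)]}

chunks = to_string_chunks(model)
restored = from_string_chunks(chunks)
```

The chunks must be stored and read back in order; an array column preserves that order, whereas spreading chunks over several rows would need an explicit index. Only unpickle payloads that come from a store you control, since unpickling untrusted data can execute arbitrary code.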