How Rec Room Processes Billions of Events Per Day with Databricks and RudderStack
Databricks 2023
Albert Hu, Senior Analytics Engineer, Rec Room
Lewis Mbae, Head of Customer Engineering, RudderStack

About Rec Room
Rec Room is the best place to build and play games together. With more than 80M lifetime users, you can party up with friends from all around the world to play, hang out, explore MILLIONS of player-created rooms, or build something new!
Founded in 2016
More Users, More Data

Making Impact with Data at Rec Room
Data-driven culture powered by high data volume and quick adoption
A/B test as many decisions and features as possible
Recommend new, interesting rooms and items to players
Share metrics with Creators to help grow their player base

Rec Room's Data Challenges
Disparate data sources
Disparate data destinations
Multiple environments
(Diagram labels: dev, prod, Amplitude, Facebook, Azure Event Hubs, webhook, Flutter, Databricks)

150+ experiments launched as of Q2 2023
Started A/B testing March 2022

One A/B test, big impact
From a negative to a positive 25% change in chat messages between players
(Chart: Chat Messages)

About RudderStack
RudderStack is the leading lakehouse native customer data platform. RudderStack runs on top of your lakehouse and does not store data, alleviating security concerns, reducing costs, and unlocking the value of your lakehouse investment.
Founded in 2019

The Data Activation Lifecycle

Rec Room Data Architecture
(Diagram labels: Sources, Event Stream, Databricks Unity Catalog, Cloud Tools, Reverse ETL, dbt Modeling)

What is RudderStack Transformations?
Transformations lets users customize event data in real time using JavaScript or Python.
Benefits:
Guarantee high quality data
Secure and build data trust
Quickly adapt to new business needs
(Diagram labels: Sources, Destinations, Transformations)

How Rec Room uses Transformations
Filtering events by destination
Cleaning and enriching event data before it lands in the destination
Testing Transformations in dev environments before shipping to production

Tips for RudderStack Transformations
Define functions that are used repeatedly to keep code DRY
Treat RudderStack Transformations like any other code
Version control and PR process
Automate testing using unit tests via GitHub Actions

Simplifying Databricks Data Processing
RudderStack's clean lakehouse data model enables painless data processing
Bronze: Raw ingestion
  Copied from blob storage and inserted hourly
  No table stats collection
Silver: Filtered, cleaned, augmented
  Merge w/ 2-day lookback window to process late arriving data
  Partition by server received date
  Table stats collected daily
Gold: Business-facing
  Append by server received date
  Partition by event date
  Table stats collected weekly

Simplifying Databricks Data Processing
Using a medallion data architecture to power multiple use-cases

Configure tblproperties sooner rather than later
Configure appropriate tblproperties during the pipeline building process
Structure table columns to optimize z-ordering and take advantage of data skipping by data type: keys/numericals on the left and strings on the right

Choose the right compute
Choose the appropriate compute based on frequency and processing time (including cluster startup)
For scheduled automated jobs, check whether a Job or an All-Purpose Cluster is appropriate (3x cost difference)
For SQL-only jobs, try Serverless SQL Warehouses

Optimize compute cost over time
Map workloads to compute that optimizes cost over time
Selecting the smallest compute is not always cheapest (and vice versa)
Look for clues that a workload might not be compatible with its compute, such as bytes spilling to disk

Choosing a partition
Query the DeltaLog to understand if a table is partitioned appropriately
Table size 1TB
Size of each partition is 1GB+
Field is low cardinality and will be used for filtering and merge operations

Prune Partitions to Optimize Merge
Rewrite as few files as possible
Filter on the partition during merge
3x time difference after partition pruning

Maximize Photon Runtime
If you have Photon enabled, maximize the % of tasks using Photon
Query performance can be dramatically faster on Photon (2x+)
Check if queries are using Photon-compatible functions

Measure and Meet
Analyze job metadata and create dashboards to measure pipeline processing time. Translate time into dollars
Create team rituals to review and understand drivers behind trends

RudderStack + Databricks: Unlocking value benefits for Rec Room
With RudderStack, we can collect data once and unify it across our entire stack
RudderStack is the backbone of A/B testing the user experience, providing us product insights as well as additional data on user preferences
Engineering teams are spending more time on product features and less time troubleshooting integrations
Transformations enable the team to quickly make changes to existing events
A/B testing (150+ experiments executed through Statsig as of Q2 2023)
Clean data for product analytics in Amplitude
Recommendation models to improve Discovery
(Slide column headings: Clean, high-quality data; Seamless integrations; Unlocking business outcomes)

What's Next for Rec Room?
Enable even more personalized recommendations for our players
Share new metrics and datasets with our Creator Community
Create tools to automate the optimization decisions needed to scale the lakehouse

What's Next for RudderStack?
Build a unified 360 view of a customer on top of the lakehouse to power more impactful use-cases
Bolster data governance offering for collection of high quality first-party data
Provide more out-of-the-box integrations for our customers
Unlock real-time use cases with near real-time sync to Databricks

Albert H
Lewis M
Thank you for Attending!
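
The Transformations slides describe filtering events by destination and cleaning and enriching event data before it lands, with shared helper functions to keep code DRY. A minimal sketch of that idea in plain Python might look like the following. Everything named here is hypothetical: the event names, the destination labels, the `environment` field, and the function names are illustrative and do not come from the talk, and wiring the function into RudderStack's own transformation entry point is deliberately not shown.

```python
from typing import Any, Dict, Optional

# Hypothetical block list: which event names to drop for which destination.
NOISY_EVENTS_BY_DESTINATION = {
    "amplitude": {"heartbeat", "debug_log"},
}


def clean_properties(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Drop None values and strip surrounding whitespace from strings."""
    cleaned = {}
    for key, value in properties.items():
        if value is None:
            continue
        cleaned[key] = value.strip() if isinstance(value, str) else value
    return cleaned


def should_drop(event: Dict[str, Any], destination: str) -> bool:
    """Return True when this event is blocked for the given destination."""
    blocked = NOISY_EVENTS_BY_DESTINATION.get(destination, set())
    return event.get("event") in blocked


def transform_event(
    event: Dict[str, Any], destination: str, environment: str
) -> Optional[Dict[str, Any]]:
    """Filter, clean, and enrich one event; None means 'do not forward'."""
    if should_drop(event, destination):
        return None
    transformed = dict(event)  # shallow copy so the input is left untouched
    transformed["properties"] = clean_properties(event.get("properties", {}))
    # Enrichment step: tag the event with the (assumed) environment label.
    transformed["properties"]["environment"] = environment
    return transformed
```

Because the helpers are ordinary functions with no platform dependencies, they can be unit tested in a CI job before the transformation is promoted from dev to prod, which is the workflow the tips slide recommends.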
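
The "Prune Partitions to Optimize Merge" slide advises filtering on the partition during merge, and the Silver layer merges with a 2-day lookback window. One way to sketch this is a small Python helper that builds a Delta Lake `MERGE INTO` statement whose `ON` clause restricts the target to recent partitions. The table names, key column, and partition column below are hypothetical, and the statement is only assembled as a string here, not executed; it assumes the identifiers come from trusted pipeline configuration rather than user input.

```python
from datetime import date, timedelta


def build_pruned_merge(
    target: str,
    source: str,
    key: str,
    partition_col: str,
    run_date: date,
    lookback_days: int = 2,
) -> str:
    """Build a MERGE statement that only touches recent target partitions."""
    earliest = run_date - timedelta(days=lookback_days)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        # Partition predicate: lets the engine skip files in older partitions.
        f"AND t.{partition_col} >= DATE '{earliest.isoformat()}'\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table and column names, for illustration only.
sql = build_pruned_merge(
    target="silver.events",
    source="bronze.events_batch",
    key="event_id",
    partition_col="received_date",
    run_date=date(2023, 6, 15),
)
print(sql)
```

One caveat with this pattern: a source row whose matching target row is older than the lookback window will not match and will be inserted again, so the source batch should be limited to the same window.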
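
The compute slides note that the smallest compute is not always the cheapest and that processing time, including cluster startup, should be translated into dollars. The arithmetic can be sketched as below. The runtimes and hourly rates are made-up numbers chosen only to illustrate the comparison; they are not figures from the talk or from any price list.

```python
def job_cost(runtime_minutes: float, startup_minutes: float, hourly_rate: float) -> float:
    """Cost of one run: billed time (runtime plus startup) times the hourly rate."""
    return (runtime_minutes + startup_minutes) / 60 * hourly_rate


# Hypothetical workload: a small cluster runs slowly (for example, because it
# spills to disk), while a cluster three times the price finishes much sooner.
small = job_cost(runtime_minutes=90, startup_minutes=5, hourly_rate=2.0)
large = job_cost(runtime_minutes=20, startup_minutes=5, hourly_rate=6.0)

print(f"small cluster: ${small:.2f} per run")
print(f"large cluster: ${large:.2f} per run")
```

Under these assumed numbers the larger cluster costs less per run despite the higher hourly rate, which is why the comparison needs measured runtimes from job metadata rather than rates alone.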