Future of Data Engineering
Data and Analytics Perspective
Data Summit 2025, May 15

Who We Are
We're a global digital consultancy transforming how the world's leading enterprises and biggest brands connect with customers and grow their businesses. With Perficient, you get experience and expertise, speed and agility, and a healthy dose of pragmatism to drive your business forward.

Who Am I
Jerry Locke (Snowflake Practice Director)
- Been in data my entire 20+ year career
- Worked in the cloud for the past 10 years (mostly in Snowflake)
- Adjunct Professor at USD (University of San Diego)
- Been part of hundreds of cloud deployments
- Believe almost all human problems can be solved with data
"If you don't measure, it can never get better."

Our Purpose
Mission: To shatter boundaries, obsess over outcomes, and forge the future.
Vision: To be the place where great minds and great companies converge to boldly advance business.

Our Partnerships
Strengthened by: [partner logos]

A Glimpse Into Our Technology Partner Ecosystem
[Figure: technology partner ecosystem]

Data Engineering
Moving data for decision making, analytics, and reporting is the framework the data industry has accepted. In the near future, those outcomes still hold true. However, the volume of data, the speed to decisions, and AI frameworks will enable business insights in ways we have only just begun to comprehend and use.

The future is now

Growth: Modern Data Engineering Paradigm
Challenges driving our industry:
- Global data volume is expected to reach 180 zettabytes this year (2025), up from 64 zettabytes in 2020
- IoT devices (global rate of adoption is 2x since 2020)
- Video growth
- AI: generative and agentic
- Data integration and source fragmentation
- Data quality
- Lack of governance
- Scaling and an abundance of tooling (dbt, Airflow, Apache Kafka)
- DevOps, SecOps

Modern Data Engineering Reference Architecture Example
[Figure: reference architecture diagram]

Adoption: Cloud Data Engineering Perspective
- Public cloud adoption: cloud migration products and frameworks
- AI adoption (ELT, ETL, code generation)
- AI agents (how they work and where)

The global cloud computing market is projected to reach $912.77 billion by the end of 2025, with an anticipated compound annual growth rate (CAGR) of 21.2% through 2034. Worldwide spending on public cloud services is forecast to hit $805 billion in 2024 and is expected to double by 2028.

Cloud Data Engineering Reference Architecture Example
[Figure: reference architecture diagram]
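The growth figures on the Growth and Adoption slides can be turned into implied annual rates with the standard CAGR formula, (end / start)^(1 / years) - 1. What follows is a minimal Python sketch. It assumes the quoted endpoints are exact and that growth compounds smoothly each year. `cagr` is an illustrative helper name and does not come from the talk.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from `start` to `end` in `years`."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Global data volume: 64 ZB in 2020 -> 180 ZB expected in 2025 (Growth slide).
data_rate = cagr(64, 180, 2025 - 2020)

# Public cloud spending: $805B in 2024, "expected to double by 2028" (Adoption slide).
cloud_rate = cagr(805, 2 * 805, 2028 - 2024)

if __name__ == "__main__":
    print(f"Implied data-volume growth: {data_rate:.1%} per year")
    print(f"Implied public cloud spending growth: {cloud_rate:.1%} per year")
```

Run as a script, this prints roughly 23.0% and 18.9% per year. These are back-of-the-envelope rates derived only from the endpoints quoted in the talk, not independent forecasts.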
Predictions: Scale of Adoption
Given the challenges we discussed, what will data engineering look like in the next 2-5 years?

Year 1-3: Serverless Data Engineering
- Serverless computing: No server provisioning or management. Event-driven (e.g., triggered by new data or scheduled tasks). Scales automatically.
- Pay-per-use model: You pay only when your functions run, which makes it cost-efficient for intermittent workloads.
- Stateless functions: Functions (like AWS Lambda) do one job quickly. State is stored in external systems (e.g., S3, DynamoDB).

Typical Serverless Stack (AWS example)
Step | Tool/Service | Purpose
Data ingestion | Amazon Kinesis / API Gateway / S3 | Ingest streaming or batch data
Data processing | AWS Lambda / AWS Glue / EMR Serverless / dbt | Transform, clean, and enrich data
Storage | Amazon S3 / DynamoDB / Redshift Serverless | Store raw, processed, or aggregated data
Orchestration | AWS Step Functions / EventBridge / Airflow | Coordinate workflows and handle retries
Monitoring | CloudWatch / X-Ray | Log, monitor, and trace performance

Serverless Data Engineering Reference Architecture
[Figure: reference architecture diagram]
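To make "event-driven" and "stateless" concrete, here is a minimal Python sketch in the shape of an AWS Lambda handler (`handler(event, context)`) that reacts to an S3 object-created notification. Several parts are assumptions. The in-memory dict stands in for an external store such as a DynamoDB table, which a real deployment would reach through an AWS SDK like boto3. The bucket name, object key, and `status` field are hypothetical.

```python
from typing import Any, Dict, MutableMapping
from urllib.parse import unquote_plus

# Hypothetical stand-in for an external state store (e.g., a DynamoDB table).
# The function keeps no state of its own between invocations.
StateStore = MutableMapping[str, Dict[str, Any]]
EXTERNAL_STORE: StateStore = {}


def handler(event: Dict[str, Any], context: Any = None,
            store: StateStore = EXTERNAL_STORE) -> Dict[str, int]:
    """Lambda-style entry point triggered by an S3 object-created notification."""
    processed, skipped = 0, 0
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        object_id = f"{bucket}/{key}"

        # Notifications can be delivered more than once, so consult the
        # external store first to keep the function idempotent.
        if object_id in store:
            skipped += 1
            continue

        # "One job quickly": register the new file for downstream processing.
        store[object_id] = {"size": record["s3"]["object"].get("size", 0),
                            "status": "ingested"}
        processed += 1
    return {"processed": processed, "skipped": skipped}


if __name__ == "__main__":
    sample_event = {"Records": [
        {"s3": {"bucket": {"name": "raw-zone"},
                "object": {"key": "sales/2025-05-15+orders.csv", "size": 1024}}},
    ]}
    print(handler(sample_event))  # first delivery: processed
    print(handler(sample_event))  # duplicate delivery: skipped
```

Passing the store in as a parameter keeps the function itself stateless and easy to test. It also mirrors the slide's point that state lives outside the function.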
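Year 2-5: AI Data Engineering
- Data ingestion with AI: Pipelines are built automatically. Event-driven, AI-validated data movement. Semantic layers are generated from these pipelines.
- Data cleansing and quality with AI: Cost-efficient for ingestion and anomaly detection. Platforms are already using IDEs (Visual Studio) to leverage this.
- AI detection agents: Functions such as pattern-driven access controls, and detecting the creation of new entities and standards that are not being adhered to.

AI Data Engineering vs Traditional Data Engineering
Aspect | Traditional Data Engineering | AI Data Engineering
Data use | BI, dashboards, reports | ML training and inference
Processing | ETL/ELT for structured data | Feature pipelines, time series, text, images
Storage | Data warehouses, relational DBs | Data lakes, NoSQL, vector DBs
Latency | Often batch | Batch + real-time (for model serving)
Tools | SQL, Spark, Airflow | Pandas, PySpark, Kubeflow, MLflow, TFX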
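A single example can illustrate both the "anomaly detection" bullet and the "feature pipelines, time series" row. The sketch below is deliberately simple. In place of the AI-driven quality checks the slide describes, it uses a rolling z-score, which is a classical statistical check and not an AI model. The function name, the 7-load window, and the 3.0 threshold are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def flag_anomalies(row_counts: Sequence[float], window: int = 7,
                   threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Flag loads whose row count deviates sharply from the preceding loads.

    For each load, a rolling mean and standard deviation over the previous
    `window` loads are computed (time-series features), and the load is
    flagged when its z-score exceeds `threshold` in absolute value.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a deviation")
    flagged: List[Tuple[int, float]] = []
    for i in range(window, len(row_counts)):
        history = row_counts[i - window:i]
        mu = mean(history)
        # Guard against a perfectly flat history (standard deviation of zero).
        sigma = max(stdev(history), 1e-9)
        z = (row_counts[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, round(z, 2)))
    return flagged


if __name__ == "__main__":
    daily_rows = [1000, 1020, 990, 1010, 1005, 995, 1015, 1008, 120, 1002]
    print(flag_anomalies(daily_rows))  # only the load at index 8 is flagged
```

The sample row counts are made up for illustration. Once an outlier enters the window, it inflates the deviation for the next few loads. A production check might therefore exclude flagged loads from the history.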
Composable: What Does That Mean for Our Data?
What Composable Means in Data Software
In simple terms, composable means that the system is made of modular, interchangeable components that can be mixed, matched, and reconfigured as needed, much like LEGO blocks. Instead of one big, all-in-one data platform (a monolith), you get a flexible architecture where you can plug in best-in-class tools for each part of your data stack.

Example: A Composable Data Stack
Layer | Composable Tools
Data ingestion | Fivetran, Airbyte, Stitch
Data storage/warehouse | Snowflake, BigQuery, Redshift
Transformation | dbt
Orchestration | Airflow, Prefect
Analytics/BI | Looker, Mode, Tableau
Reverse ETL | Hightouch, Census

Composable Data Reference CDP Architecture Example
[Figure: composable CDP reference architecture diagram]

Open Questions
The dumbest question is the one not asked.