AT&T Migrates Billions of Events Processing from Hadoop.pdf

PDF, 21 pages, 1.81 MB

© 2024 AT&T Intellectual Property. AT&T and globe logo are registered trademarks and service marks of AT&T Intellectual Property and/or AT&T affiliated companies. All other marks are the property of their respective owners. AT&T Proprietary (Internal Use Only) - Not for use or disclosure outside the AT&T companies except under written agreement.

AT&T Billions of Events Processing Migration
Praveen Vemulapalli, Director Technology, AT&T
Akshay Sharma, Sr. Solutions Consultant, Databricks
June 11, 2024

Praveen Vemulapalli
Things I love to do:
Love hiking and camping
Love motorcycle riding
Spend loads of time with my family
Data & AI technology evangelism
Drive change and evolution

AT&T Background
AT&T started with Bell Patent Association, a legal entity established in 1874 to protect the patent rights of Alexander Graham Bell after he invented the telephone system. Originally a verbal agreement, it was formalized in writing in 1875 as Bell Telephone Company.

By 2024, we're turning to public cloud providers to host our non-network workloads. Think traditional IT applications like billing and customer care, and corporate applications like HR and finance. (stated in 2019) (source: https:/

In June 2021, Microsoft and AT&T reached a major milestone when we announced an industry-first collaboration to evolve Microsoft's hybrid cloud technology to support AT&T's 5G core network workloads. (source: https:/

Drivers / Future-State Goals / Success To-Date
Single Version of Truth
Parallelize, Simplify & Automate
Move Resources up the Value Chain
Free Capital for Growth-Oriented Investments
Enable streaming pipelines & analytics
Empower citizen data scientists & analytics
+60 BUs
5-year Migration ROI of +300%
Source: https:/
Chief Data Office - Enterprise Data Technology / June 27, 2023

Problem to Solve: Large scale event time correlation process
22-30 hrs: daily batch run times on the proprietary analytics platform for processing
6400 CPUs: core Hadoop system was used to manage the daily processing
17B+: events generated by the network daily across our apps that do analytics

End state: Large scale event time correlation process
60% reduction in data processing time, from 30 hrs to 8 hrs
8 hrs: analytics processing moved to Spark & Scala
1000 CPUs: used dynamically for analytics processing
30%: cost reduction compared to the Hadoop environment, with substantial savings at scale

Akshay Sharma
Things I love to do:
Listening to music
Learning new technologies
Playing PC games
LeetCode challenges

High level Solution Architecture
[Diagram: Streaming Flow. Labels: Data sources; Events; Event data; Input files; Kafka streaming processes (Extract and Load Data); Kafka Topics; Kafka Connector; temp storage; Azure Data Lake Gen 2; Databricks; Consume Events; Transform Data; Azure Data Lake Gen 2; Output files.]
Figures shown on the diagram: 15+ millions of uploads daily; 2000+ Kafka streaming servers across multiple Kafka clusters; 13-15+ TB files daily; 45+ billion rows; output daily, 10 x files.

Challenges
1. Code Migration (Loops, Disk IO): MR to RDDs to Dataframes
2. Tuning Storage account API Rate limits
3. Data Quality issues (DeDuplication, Nulls, DateTime formats)
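The third challenge (de-duplication, nulls, and inconsistent DateTime formats) can be illustrated with a minimal plain-Python sketch. This is not the pipeline described in the deck: the field names `event_id` and `event_time` and the three timestamp formats are assumptions made only for the example. In Spark the analogous DataFrame operations would be `dropDuplicates`, `dropna`, and `to_timestamp`.

```python
from datetime import datetime

# Hypothetical timestamp formats; the deck does not list the real ones.
ASSUMED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y%m%dT%H%M%S")


def parse_event_time(raw):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    return None


def clean_events(rows):
    """Drop rows with nulls or unparseable times, then de-duplicate."""
    seen = set()
    cleaned = []
    for row in rows:
        event_id = row.get("event_id")
        raw_time = row.get("event_time")
        if event_id is None or raw_time is None:
            continue  # null handling: skip incomplete rows
        parsed = parse_event_time(raw_time)
        if parsed is None:
            continue  # unrecognised DateTime format
        key = (event_id, parsed)
        if key in seen:
            continue  # duplicate once the timestamp is normalised
        seen.add(key)
        cleaned.append({"event_id": event_id, "event_time": parsed})
    return cleaned


sample_rows = [
    {"event_id": "e1", "event_time": "2024-06-11 08:00:00"},
    {"event_id": "e1", "event_time": "11/06/2024 08:00"},  # same instant, other format
    {"event_id": None, "event_time": "2024-06-11 08:00:00"},
    {"event_id": "e2", "event_time": None},
    {"event_id": "e3", "event_time": "not a date"},
    {"event_id": "e4", "event_time": "20240611T090000"},
]

cleaned_rows = clean_events(sample_rows)
```

Normalising the timestamp before de-duplicating matters here: the first two sample rows only compare equal after both are parsed into the same `datetime`.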

Task Orchestration
A = 30 mins, B = 20 mins, C = 60 mins, D = 15 mins, E = 5 mins
30 + 20 + 60 + 15 + 5 = 130 mins (2 hrs 10 mins)
Here A, B, C, D, and E are individual tasks, or let's say Notebooks, which are executed one after the other.

Task Orchestration
Total Time: A + max(B, C) + D + E
New Time: 30 + 60 + 15 + 5 = 110 mins (1 hr 50 mins), less by 20 mins
Cluster 1: A, C, D, E
Cluster 2: B
Here we have enabled parallelism by having A fan out to B and C.

Best Practices in Action
Cache and Persist
Flexible Databricks Runtimes
Data Distribution
Photon Execution

Data Skew Example

Photon
The next-generation engine for the lakehouse
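The task orchestration timings earlier in this section (130 minutes when A through E run one after the other, 110 minutes when A fans out to B and C) can be checked with a small critical-path calculation. The sketch below is illustrative only: the dependency map is an assumption inferred from the formula A + max(B, C) + D + E, namely that D waits for both B and C and E waits for D.

```python
from functools import lru_cache

# Task durations in minutes, as given on the slides.
durations = {"A": 30, "B": 20, "C": 60, "D": 15, "E": 5}

# Assumed dependencies: A fans out to B and C, which fan in to D, then E.
depends_on = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}


def sequential_minutes(task_durations):
    """Total time when every task runs one after the other."""
    return sum(task_durations.values())


def critical_path_minutes(task_durations, dependencies):
    """Total time when independent tasks run in parallel.

    A task starts when its slowest upstream task finishes, so the job
    ends at the latest finish time of any task.
    """

    @lru_cache(maxsize=None)
    def finish(task):
        start = max((finish(dep) for dep in dependencies[task]), default=0)
        return start + task_durations[task]

    return max(finish(task) for task in task_durations)


sequential = sequential_minutes(durations)              # 130
parallel = critical_path_minutes(durations, depends_on)  # 110
saved = sequential - parallel                            # 20
```

The saving equals the duration of B, the shorter of the two parallel branches, because B now runs entirely in the shadow of C.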

Key Takeaways
1. Stick with Dataframes and their supported features.
2. Consider your Storage Account.
3. Data quality impacts parallel processing.

Databricks Workflows
Databricks Notebooks, Python Scripts, Python Wheels, SQL Files/Queries, Delta Live Tables Pipeline, dbt, Java JAR file, Spark Submit
Jobs consist of one or more Tasks.
Control flows can be established between Tasks: Sequential, Parallel, Conditionals (Run If), Jobs-as-a-Task (Modular), For-Each Loop
Jobs support different Triggers: Manual Trigger, Scheduled (Cron), API Trigger, File Arrival Triggers, Continuous (Streaming)

Parameterisation
Job Parameters: passed into each Task with behaviour based on the type, e.g. additional options for JARs, spark-submit, Python Args
Job Contexts: special set of templated variables that provide introspective metadata about job and task, e.g. run_id, job_id, start_time
Task Values: custom parameters that can be shared between Tasks in a Job, e.g. anything that can be programmatically set or retrieved!

Webhooks
Allows customers to build event-driven integrations with Databricks.
Supported destinations are Slack and Webhooks, with the below notification events. For example, you can send a message to a Slack #channel when:
On start: a job or a parent run is started
On success: a job or a parent run finished without any errors
On failure: a job fails or a parent run is terminated with one of the children in a failed state

Task Dependencies
When a task is Done, it can be in a Success, Failure, or Excluded state.
All Succeeded: default behaviour
At Least 1 Succeeded: e.g. fan in with at least some success
None Failed: e.g. run task(s) at the end of DAG if nothing fails
All Done: e.g. perform clean-up even if tasks have failed or been excluded
At Least 1 Failed: e.g. perform clean-up with observability or specific actions
All Failed: e.g. perform clean-up with observability or specific actions
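The six Run If conditions listed under Task Dependencies can be modelled in a few lines of plain Python. This is a simplified sketch based only on the wording of the slide, not on the Databricks implementation, which may apply further rules in edge cases such as excluded upstream tasks. The names `TaskState` and `should_run` are hypothetical and exist only for this example.

```python
from enum import Enum


class TaskState(Enum):
    """States a Done task can be in, per the slide."""
    SUCCESS = "Success"
    FAILURE = "Failure"
    EXCLUDED = "Excluded"


def should_run(condition, upstream_states):
    """Decide whether a task runs, given the states of its Done upstream tasks."""
    total = len(upstream_states)
    succeeded = sum(1 for s in upstream_states if s is TaskState.SUCCESS)
    failed = sum(1 for s in upstream_states if s is TaskState.FAILURE)

    rules = {
        "All Succeeded": succeeded == total,   # default behaviour
        "At Least 1 Succeeded": succeeded >= 1,
        "None Failed": failed == 0,
        "All Done": True,                      # runs regardless of outcome
        "At Least 1 Failed": failed >= 1,
        "All Failed": failed == total,
    }
    if condition not in rules:
        raise ValueError(f"Unknown Run If condition: {condition}")
    return rules[condition]


# Example: a fan-in task whose two upstream tasks ended differently.
mixed = [TaskState.SUCCESS, TaskState.FAILURE]
run_cleanup = should_run("All Done", mixed)       # True
run_default = should_run("All Succeeded", mixed)  # False
```

A clean-up task at the end of a DAG would typically use "All Done" so that it still runs after a failure, while the default "All Succeeded" stops downstream work as soon as any upstream task fails.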
