Global Agile Operations Summit, Guangzhou (全球敏捷運維峰會 廣州站)
PUBLIC

Big Data Intelligent Processing & Data Visualization
Speaker: 吳仕櫓

Business Insights & Analytics: How it Works

1) Source systems are ingested into staging (a shared preparation area), typically through Sqoop (database copy), CDC (streaming-style change updates) or Juniper (in-house platform).
2) System tables are copied into the Discovery environment, where this production data is processed and models/insights are developed after the Data Factory.
3) The Data Factory takes raw data through a number of steps:
   i. Profiling: looking at the data to identify its contents and tag it with the correct metadata.
   ii. Cleansing & curating: restructuring the data into the simplest and most efficient form, highlighting errors to refer back to source system owners.
   iii. Enriching: creating new derived fields based on the raw data (e.g. flags) and appending reference data for models to utilise.
   iv. Record linking: using advanced techniques to join up disparate data and masses of separate sources into a single logical model.
   v. Indexing: organising the final data asset into an index, making it quickly searchable.
4) Stabilised assets and models are pushed through our UAT environment for testing and data validation by the consuming users.
5) Final models and assets are then landed in our production environment, their insight ready for consumption through agreed patterns (typically APIs or file transfers).
6) The Data Guardian controls all consumption compliance.
7) Data Exchange hosts APIs/apps to source data to consumers.

Data & Analytics Execution

Pipeline: Ingest, Transform, Profile, Link, Analyse, Consume

Ingest: Automated feed of data, copying the source systems into the GBM Data & Analytics Lake. Example: raw XML trade data.
Transform: Data is pre-processed, transformed and optimised by Data Engineers. Example: pre-processed source data.
Profile: Data is profiled to tag components for metadata analysis. Algorithms are used to predict the data type and tag it automatically. Example: metadata modelling.
Link: The tagged data is linked and enriched using machine learning, generating unique identifiers for clients. Example: record-linked network graph.
Analyse: The enriched data is validated against business rules to ensure that it is fit for purpose. Example: data validation results.
Consume: The finalised data is passed into a range of MI, analytics and data science applications to generate business value. Example: time-series application.

[Diagram rows: Pipeline, Example, Technologies. Use cases in Execute: Case 1 to Case 8.]

Data Guardian-1

Information Asset Registry
- Golden source for physical-to-logical mappings, mastered in the Data Factory.
- Repository for the logical attribute hierarchy, containing terms where necessary.

Source Data
- Data ingested from hundreds of source systems.
- Data cleansed via the GBM Data Factory.
- Data presented in use case assets.

Data Guardian
- Policy administration tool linked to the metadata store, allowing policy rules to be entered in logical terms.
- Each "data access request type" is assessed by the Policy Engine in order to produce a Policy Decision Point summarising the resultant compliant dataset.
- Automatic adaptation of queries and in-process filters in order to produce a compliant data view.

Data Sharing Policies
- Policies obtained from regional legal and compliance teams.
- Each policy converted into a set of sharing rules.
- Rules converted into a Standard Rules Template ready for consumption.

[Diagram labels: Data Asset, Compliance, Rules, Attribute Tagging, Audit, Automatic, Impact]

Data Guardian-2

Data Exchange

Rapid-V Design

Rapid-V Demo

Rapid-V Sample
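
As a rough illustration of the Data Factory steps listed under "How it Works" (profiling, cleansing, enriching, record linking, indexing), the sketch below runs a few in-memory records through each step in Python. It is not the platform's implementation: every function name, field name (client, country, notional), the region lookup and the 1,000,000 threshold are hypothetical, and record linking is reduced to simple name normalisation as a stand-in for the "advanced techniques" the slide mentions.

```python
from collections import defaultdict


def _is_number(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile(rows):
    """Profiling: tag each field as 'numeric' or 'text' from its non-empty values."""
    tags = {}
    for field in rows[0]:
        values = [r[field] for r in rows if r[field] not in (None, "")]
        numeric = bool(values) and all(_is_number(v) for v in values)
        tags[field] = "numeric" if numeric else "text"
    return tags


def cleanse(rows, tags):
    """Cleansing: trim text, cast numerics, set aside rows to refer back to owners."""
    clean, errors = [], []
    for row in rows:
        fixed, bad = {}, False
        for field, value in row.items():
            text = "" if value is None else str(value).strip()
            if tags[field] == "numeric":
                if _is_number(text):
                    fixed[field] = float(text)
                else:
                    bad = True  # missing or malformed numeric value
            else:
                fixed[field] = text
        if bad:
            errors.append(row)
        else:
            clean.append(fixed)
    return clean, errors


def enrich(rows, regions, threshold=1_000_000.0):
    """Enriching: add a derived flag and append reference data (hypothetical rules)."""
    return [
        {**r,
         "large_trade": r["notional"] >= threshold,
         "region": regions.get(r["country"], "UNKNOWN")}
        for r in rows
    ]


def link(rows):
    """Record linking (simplified): same normalised name -> same client id."""
    ids, linked = {}, []
    for r in rows:
        key = "".join(ch for ch in r["client"].lower() if ch.isalnum())
        client_id = ids.setdefault(key, f"C{len(ids) + 1:04d}")
        linked.append({**r, "client_id": client_id})
    return linked


def build_index(rows):
    """Indexing: inverted index from name token to client ids."""
    index = defaultdict(set)
    for r in rows:
        for token in r["client"].lower().split():
            index[token].add(r["client_id"])
    return index


# Made-up sample records, for illustration only.
raw = [
    {"client": " Acme Ltd ", "country": "GB", "notional": "2500000"},
    {"client": "ACME LTD.", "country": "GB", "notional": "40000"},
    {"client": "Borealis AG", "country": "DE", "notional": ""},
]

tags = profile(raw)
clean, errors = cleanse(raw, tags)
linked = link(enrich(clean, {"GB": "EMEA", "DE": "EMEA"}))
index = build_index(linked)
```

Keeping each step as a separate function mirrors the staged layout of the slide: the rows collected in errors are the ones that would be referred back to source system owners, while the remaining rows flow on to enrichment, linking and indexing.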