AtScale: How Data Governance and a Semantic Layer Empower Data Mesh, white paper (English edition, 27 pages)
How Data Governance and a Semantic Layer Supports Data Mesh

By George Firican

George is a passionate advocate for the importance of data, a frequent conference speaker, and a YouTuber. He is ranked among the Top 5 Global Thought Leaders and Influencers on Big Data and Digital Disruption, and in the Top 15 on Innovation.

Whitepaper

Data are Assets - Valuable and Impactful
When Performance Matters - Insights and Analytics Deliver

Data is valuable! Data is an asset! We hear this a lot. Clive Humby, a mathematician and the co-creator of the Tesco Clubcard, the world's first supermarket loyalty card, coined the phrase “data is the new oil” in 2006. The message that Mr. Humby and others want to convey is that a company's data is a tangible asset. In fact, data is arguably one of the most important assets that any organization has. Why? We need data for two key reasons: to help us answer key business questions and to provide feedback about our performance, customers, markets and competitors.

We hear so much today about the benefits of using machine learning, but what makes machine learning so valuable is the data that it uses. Enterprises learn from data. Data is the catalyst for improving enterprise relevance and distinctiveness. Data helps accelerate productivity, efficiency, and competitive advantage. Companies that learn from their data improve their performance.

Research from Tableau confirms that 83% of CEOs want their companies to be data driven, but only 46% are achieving that goal. For those that do achieve it, the rewards are plentiful. According to a survey of more than 1,000 senior executives conducted by PwC, highly data-driven organizations are three times more likely to report significant improvements in decision-making compared to those that
rely less on data.

The benefits of using data to generate actionable insights and analytics are real and extend across every industry and functional capability. For example, in banking the value could be driven by improving fraud detection and real-time analysis of market data. For insurance agencies, insights and analytics help predict and mitigate risk. The logistics industry relies on insights and analytics to optimize inventories and coordinate shipments to meet demand while minimizing costs. The healthcare industry uses analytics to improve the diagnosis of illness and medical conditions in patients while also improving predictions and planning for infectious disease threats or outbreaks.

Perhaps no other industry has transformed itself by applying data, insights and analytics like the sports industry. Sports has moved from a “gut feel” culture to an analytical, statistics-based culture where most strategic and tactical decisions (selecting players, improving plays, improving fitness and training) are guided by data, insights and analytics. Why? Because in sports, performance is measurable and, ultimately, winning is what matters. The winners reap the greatest rewards.

Data is most valuable when it's used to create actionable insights that lead to improvements in business outcomes. These benefits can be seen across all industries. For example, in banking the value could be driven by improving fraud detection and real-time analysis of market data. For insurance agencies, we see how big data
aids in mitigating risks or reducing the calculation time of the value at risk. The supply chain industry relies on data to ensure that inventories and shipments are optimized to reduce costs while meeting customers' demands. The healthcare industry can use data to better diagnose illnesses and medical conditions in patients, while also enabling better predictions and planning for infectious disease threats or outbreaks. The sports industry is a favorite for anyone who watched Moneyball, as it showcases how data can be used to improve performance strategically and tactically. Even governments rely on data for crime prevention, better emergency response, and smart city initiatives that provide more personalized and efficient services for constituents.

Recent research reveals that 60% of enterprise organizations use data and analytics to drive strategy and change, improve processes, and realize cost-efficiency (MicroStrategy, 2020). The resulting investment in big data technology reveals the scope of this transformation. According to research firm International Data Corporation (IDC), worldwide spending on big data and business analytics (BDA) solutions in 2021 was forecast to reach $215.7 billion, an increase of 10.1% over 2020. Furthermore, IDC forecasts that BDA spending will gain strength over the next five years as the global economy recovers from the COVID-19 pandemic. The compound annual growth rate (CAGR) for global BDA spending over the 2021-2025 forecast period will be 12.8%, much larger than every other category of IT spending.

Delivering Actionable Insights

We can't deliver actionable, effective, consistent insights and analytics without governing the inputs. To that end, enterprises need to understand and manage how the data sources, the processes for integrating data, and the data reports are created and deployed. Increasingly, enterprises need data to understand the decisions and actions that are being taken, and the impact they have on customers and compliance. This is why we need to deploy effective data governance.

Traditionally, data governance has been viewed as a set of bureaucratic, controlling and restrictive activities that create constraints and slow down progress. In many cases, the processes were slow because governance tasks, such as managing access for users and usage, were done manually, with approvals handled in batch mode rather than continuously. As organizations migrate to cloud-based data platforms, they are demanding that data governance do the same, and move from being restrictive to enabling greater speed, scale and productivity. This is what modern data governance helps deliver. Recent research from Deloitte shows that modernizing data is among the
#1 or #2 reasons for moving to the cloud, along with security and cost savings.

Modern Data Governance - The Catalysts

[Figure: Top drivers for cloud migration. Percentage of respondents who ranked each category as No. 1 or No. 2. Categories: security and data protection; data modernization; cost and performance of IT operations.]

What's driving the evolution of modern data governance? Fundamentally, we use data to answer business questions, and to answer those questions businesses need actionable insights and analytics, delivered with speed, scale, governance and cost-effectiveness. To make progress and have an impact with data, companies will need to do the following:
- Deliver actionable insights faster via automation, self-service and a hub-and-spoke delivery model
- Achieve scale in terms of data sources, users and usage
- Manage costs via cloud-based infrastructure

Reduced redundancy, reuse, collaboration and compute optimization will continue to power progress and impact.

Modern Data Governance - Enabling Speed, Scale and Cost Effectiveness

[Figure: Modern Data Landscape, Key Evolution Drivers.
- Speed: faster time to insights with fewer resources
- Scale: more data sources, users and uses, including self-serve
- Cost-Effectiveness: improved productivity and infrastructure utilization/optimization
- Governance: governed access, activities, usage and compliance
Outcome: Actionable Insights and Analytics (Relevant, Actionable, Impactful)]

Why is Data Governance Important?

Defining Data Governance

Before we define modern data governance, here's a list of the core questions that need to be addressed by data
governance:
- What foundation do we need for collecting clean data, documented metadata, and categorized and classified data? We need some policies in place.
- What do we need for creating repeatable steps to clean our data, to make it consistent, to provide access to it, to secure it, and to define it? We need to establish and follow processes.
- How do we ensure consistency in our cleanliness, definitions, and categorization? We need to establish and comply with certain standards.
- Who's going to create all of these policies, processes, standards, rules and definitions? Who will approve them, who will maintain them, and who will enforce them? We need to define and assign certain roles and responsibilities.

The answers to the core questions above help define why companies need to implement data governance. So then, what is data governance?

Data governance is a collection of processes, roles, policies, and standards that ensure the effective and efficient use of data in enabling an organization to achieve its goals. It establishes the policies, processes, standards, roles and responsibilities that ensure the quality and security of the data used across a business or organization. Data governance also defines who can take what action, upon what data, in what situations, using what methods, while following set standards and definitions.

Defining Modern Data Governance

Let's define modern data governance. Modern data governance is defined as the use of cloud-based technology and tools to govern the use of data effectively and continuously, focusing on the following five key capabilities:
- Modern Cloud-based Data Platforms
- Data Democratization
- Data as a Product
- Federated, Hub-and-Spoke Delivery
- Data Observability/Accountability

In this whitepaper we will cover in more detail the key capabilities enabling modern data governance, including a brief review of the core elements of all data governance programs. Then, we will cover the importance of using a semantic layer to deliver improved data governance for data products.
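
The earlier definition, that data governance decides who can take what action, upon what data, in what situations, using what methods, can be pictured as a lookup against explicit rules. The following is only a minimal sketch under stated assumptions: the `AccessRule` type, the `is_allowed` helper, and every role, dataset, situation and method name are hypothetical and do not come from the white paper or from any particular governance product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """One hypothetical governance rule (all field values are illustrative)."""
    role: str        # who
    action: str      # what action
    dataset: str     # upon what data
    situation: str   # in what situation
    method: str      # using what method


# Illustrative rules only; these names are assumptions made for the sketch.
RULES = {
    AccessRule("data_steward", "update", "customer_master",
               "data_quality_fix", "approved_workflow"),
    AccessRule("analyst", "read", "sales_metrics",
               "reporting", "semantic_layer"),
}


def is_allowed(role: str, action: str, dataset: str,
               situation: str, method: str) -> bool:
    """Return True only if an explicit rule grants this exact combination."""
    return AccessRule(role, action, dataset, situation, method) in RULES


print(is_allowed("analyst", "read", "sales_metrics",
                 "reporting", "semantic_layer"))   # True
print(is_allowed("analyst", "update", "customer_master",
                 "reporting", "direct_sql"))       # False
```

The sketch denies anything that is not explicitly granted. That is one possible design choice, not something the white paper prescribes.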

[Figure: Modern Data Governance Key Capability Elements.
- Modern Data Platforms: cloud-based data platforms and tools
- Data Democratization: rapid data access and self-service enablement
- Data as a Product: create and manage data as a product
- Hub-and-Spoke Delivery Model: decentralize insights creation, centralize data management
- Data Observability/Accountability: data usage, decisions, actions and compliance]

Next, let's explore the current challenges, the importance of data governance, and what needs to change to modernize data governance for everyone's benefit.

The Challenge - Delivering Actionable Insights at Scale

Delivering actionable data is a process that requires many steps. Actionable data is often created from multiple data sources, each of which needs to be assessed, cleansed, prepared and made ready for use. Creating actionable insights often requires as many as seven steps:
- Access
- Profile
- Prepare
- Integrate
- Extract/Aggregate
- Analyze
- Publish/Present

Further, the data that's sourced and transformed into actionable insights must also address data governance requirements for the following:
- Quality: consistent, accurate data delivered on time with actionable recency
- Security: secure storage, access, transformation and usage
- Privacy: compliance with privacy standards
- Legal Compliance: compliance with local laws regarding data privacy and usage
- Data Usage Compliance: compliance with standards for how data is used to prevent identification

Core Elements of Traditional Data Governance - The Basic Building Blocks

The core elements and deliverables of data governance include:
- Roles and responsibilities
- Data policies
- Data standards
- Data processes
- Defined metadata

Roles & Responsibilities

Data governance ensures that
the right people are assigned the right data responsibilities. The main roles are as follows:

Data governance lead - Responsible for all aspects of defining and operating the data governance policies and supporting the multiple data domains. The lead is ultimately responsible for implementing the data governance program vision, promoting the role of governance, and enforcing policy, all while following data governance best practices.

Data governance council - A governing body that is responsible for the strategic guidance of the data governance program, prioritization of data governance projects and initiatives, and approval of organization-wide data policies and standards, as well as providing ongoing support, understanding and awareness of the data governance program. It includes the sponsor, the data governance lead, the lead data steward, the IT lead, and key business/data stakeholders.

Data stakeholder - Anyone that could affect, or be affected by, data governance decisions, processes, policies, standards, etc.

Data owner - An internal data stakeholder that has the authority to make decisions about business term definitions, data quality, accessibility and retention requirements as they tie to the business needs.

Data steward - An internal data stakeholder responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the technical and business metadata related to those data assets.

Data custodian - A data stakeholder responsible for maintaining the data and its relevant systems and infrastructure in accordance with the business's requirements.

A couple of other notable roles that are not necessarily tied to data governance:
- Insights Creator:
Any user, user interface, automation, service or device that creates or collects data relevant to a business and turns it into actionable insights and analytics
- Insights Consumer: Any user, application, or system that uses data that was collected or produced by another user or system, or that is stored in a data repository

For the insights creator to communicate effectively with the insights consumer, we also need the following elements of data governance: data policies, standards, procedures, and defined metadata.

Data Policies

A policy is a statement of a selected course of action and a high-level description of desired behavior to achieve a set of goals. Data governance typically defines policies related to privacy, security, access, usage, analytics (algorithms), compliance, and quality. Guidelines also cover the previously discussed roles and responsibilities of those implementing policies and compliance measures. In the end, the purpose of these policies is to ensure that organizations are able to maintain and secure high-quality data. Data governance policies form the base of your larger data governance strategy and enable you to clearly define how data governance is carried out. A few common areas covered by data governance policies are:
- Data quality: ensuring data is correct, consistent, and free of “noise” that might impede usage and analysis
- Data accessibility and availability: ensuring that data is available, accessible and easy to consume by the business functions that require it
- Data usability: ensuring that data is clearly structured, documented and labeled, that it enables easy search and retrieval, and that it is compatible with tools used by business users
- Data integrity: ensuring data retains its essential qualities even as it is stored, converted, transferred, and viewed across different platforms
- Data security: ensuring data is classified according to its sensitivity, and defining processes for safeguarding information and preventing data loss and leakage

Data Standards

A data standard is an agreement on the representation, format, definition, structure, tagging, transmission, manipulation, use, and management of data. We need standards to create, share, integrate, and use data. Standards aid in data cleansing and data transformation, but they also support data policies and adherence to them. Data standards cover everything about how to record master data, reference data, transactional data, and analytics data such as measures/metrics.

Data Processes

Many data governance processes exist to carry out data governance policies and to ensure standards are followed and met. The remaining ones ensure that the activities of data governance itself are carried out. The typical processes address the planning, designing, managing, operating, and sustaining of:
- Regulatory compliance
- Standards and policies
- Master data support
- Data analytics and ML/AI
- Security and privacy
- Enterprise data model
- Technical and business metadata
- Data governance program planning and management

Defined Metadata

There are two types of metadata for which data governance sets the roles and responsibilities, processes, and standards to create and maintain:
- Business metadata, i.e. the business concepts for an organization or industry. Business metadata defines things like what a customer is, sale conversion rate, credit, and funds received. It is mostly information authored and controlled by data stewards.
- Technical metadata, i.e. the specifications about a field in a database. These specifications typically include data type, allowed values, default values, constraints, relations to other data elements, meaning and purpose. This information is mainly handled by data custodians.

Both of these types of metadata, if properly defined and maintained, provide the necessary information and context for data stakeholders and data consumers to use the data without making incorrect assumptions.

The Main Reasons for Implementing Data Governance

Understanding why your organization is implementing data governance is crucial for several reasons. Most importantly, you need to know the why in order to guide your data governance strategy and your change management and adoption strategy within your organization. Moreover, it will provide you with a direction on what should be tackled first. After all, you need to start somewhere and you can't tackle everything at once. While not every organization will implement data governance for exactly the same reasons, there are some general whys that can provide a prioritization guide on what areas of the data and the data governance program you should focus on first. These are:

Scalability

The immense potential of data is well recognized by organizations across all industries. But for any data initiative to succeed and scale, the data must first be accessible and available, compliant, defined and understood, and of high quality. This is where data governance comes in, to set the right foundations for scalable data initiatives. In order to scale up, one needs to ensure that:
- Data is not siloed, but made available to the right people for the right usage
- Data is put into context, described, and understood through common business verbiage and a data dictionary
- Data quality is at the level needed to meet the business requirements
- There are clear rules and roles for data creation/acquisition, maintenance, usage/dissemination, and archival/destruction
- Mechanisms are in place for managing the data lifecycle
- The same data can be used for different purposes by different teams

Access and availability

Data governance helps with the mapping, lineage and organization of a company's data and ensures organizations have more visibility and control over the data being gathered across the company. This allows data stakeholders, data producers and data consumers to share insights and eliminate data silos. In turn, it often establishes a consistent and complete single source of truth of critical data and metrics that business stakeholders can agree upon to make better cooperative decisions. Proper governance enables collaboration across departments, fostering broader insights, fueling better decisions, and promoting a more data-driven organization. In the end, data needs to be accessible and available to the right people at the right time.

Compliance

Data governance allows organizations to have clear control processes over
their data to align with pre-set business rules around regulatory compliance, data privacy and security. Data governance allows organizations to ensure they have policies, standards, and processes in place to identify and control data covered under specific regulations, and to assure that all relevant compliance regulations are met in all of the organization's data practices. This makes it easier to stay compliant and avoid big fines.

Actionability

Data governance is the first step in creating an organization that makes decisions based on indisputable data. And it's a simple fact that organizations that take action based on their data are more likely to achieve growth than organizations that continue to operate in data silos.

In order to be actionable, data first needs to comply with data quality requirements, and second it needs to be defined and understood. In many organizations the data producers are not fully aware of the data quality requirements of the data consumers, or of the different business processes and data products that depend on the data, such as data visualizations and dashboards, data warehouses, recommender systems, and unstructured data classifications. A data governance program relies on clearly defined roles and responsibilities to allow data stakeholders, and in particular data stewards, to measure, monitor, and improve the data quality dimensions that are relevant to their line of business.

Data governance also allows users to define and understand the data, its context, and its usage, while allowing them to better troubleshoot and prevent data issues. Data governance helps create, establish, and socialize common verbiage around metrics and datasets, providing all stakeholders with a common data language and consistent terminology that's easily understood across the organization.

Without quality data that's defined and understood, organizations can't drive the correct actions out of their data. Users will make poor decisions because those decisions are based either on bad-quality data or on incorrect assumptions derived from a lack of common data definitions.
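
To make "measure and monitor data quality dimensions" more concrete, here is a small sketch of how a data steward might score two common dimensions, completeness and validity, against field-level technical metadata (data type and allowed values). It is an illustration under stated assumptions, not a method taken from the white paper: `FieldSpec`, `quality_report`, and the sample field names and rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical technical metadata for one field."""
    name: str
    data_type: type
    allowed_values: Optional[frozenset] = None


def quality_report(rows: list[dict[str, Any]],
                   specs: list[FieldSpec]) -> dict[str, dict[str, float]]:
    """Score each field on completeness and validity, as fractions in [0, 1]."""
    report = {}
    total = len(rows)
    for spec in specs:
        # Completeness: share of rows where the field is present and not None.
        present = [r[spec.name] for r in rows if r.get(spec.name) is not None]
        # Validity: share of present values matching the type and allowed set.
        valid = [
            v for v in present
            if isinstance(v, spec.data_type)
            and (spec.allowed_values is None or v in spec.allowed_values)
        ]
        report[spec.name] = {
            "completeness": len(present) / total if total else 0.0,
            "validity": len(valid) / len(present) if present else 0.0,
        }
    return report


# Sample metadata and rows, invented for the illustration.
specs = [
    FieldSpec("customer_id", int),
    FieldSpec("segment", str, allowed_values=frozenset({"retail", "corporate"})),
]
rows = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "unknown"},       # value outside the allowed set
    {"customer_id": None, "segment": "corporate"},  # missing identifier
    {"customer_id": 4, "segment": "retail"},
]

report = quality_report(rows, specs)
print(report["customer_id"])  # {'completeness': 0.75, 'validity': 1.0}
print(report["segment"])      # {'completeness': 1.0, 'validity': 0.75}
```

Scores like these give data producers and data consumers a shared, measurable view of whether a dataset meets the agreed requirements. Which dimensions and thresholds matter is a decision for each line of business.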

[Figure: Access and accessibility, actionability, compliance, and business outcomes. Labels: shared data; increased confidence in data quality; clear rules and data processes; data policy alignment to regulations; common data dictionary and business verbiage; data is no longer siloed; single source of truth; reduced costs; actionable results; increased efficiency; data-driven decisions; IT, data, and business teams more agile; higher compliance and reduced risks.]

Traditional vs. Modern Data Governance

Let's refer to traditional data governance as data governance practices done in a pre-cloud environment, and modern data governance as those practices done in a post-cloud environment.

Traditional data governance

Because of the gargantuan effort of breaking down technology and people silos and tackling data management and data governance organization-wide, data governance used to be focused on one of the following two:
- A core system, usually an Enterprise Resource Planning (ERP) or a Customer Relationship Management (CRM) system
- A particular business unit, such as marketing, procurement, finance, or privacy and security

As noted before, starting a data governance program can be challenging, so setting the initial focus on a core system or a particular business unit can get things off the ground more quickly and easily. After all, the scope is bounded by the data housed within that core system or pertaining to that particular business unit.

If the focus was set on a core system, some of the data governance definitions, standards, and processes were simply inherited from it, or in better
cases served as a starting point. The downside of this pathway was that data governance was mainly led by IT, with a lot of the requirements dictated by the technical limitations of that core system. The business usually lacked sufficient representation, as there would have been data stakeholders that were not also system stakeholders. Therefore, these data stakeholders, although key in helping establish an enterprise-wide data governance program, were often omitted from consultations and decisions. Moreover, these omitted data stakeholders often included the data analysts and data scientists, as their work would not necessarily yield data that had to be fed back into the system. This meant that their work often resided outside the boundaries and the scope of a data governance program that started with a focus on a core system.

If the focus was on a particular business unit, there were some benefits from already knowing the stakeholders and having the relationships within the business unit to gain support and influence adoption. This was much easier than having a business unit's data governance office reach into the unknown of other units or lines of business. The good thing was that the data from multiple systems could be in scope, if those systems had the business unit as an owner or a key stakeholder. Furthermore, if data analysts and data scientists created data products for that business unit, those would also fall under the data governance umbrella.

The common challenges
with both choices on how to focus a traditional data governance program were:
- Organization-wide business requirements were not captured
- Organization-wide data needs would not be met
- Scaling beyond the core system or the business unit was difficult and costly
- Redundant and sometimes conflicting efforts were conducted if different data governance initiatives were started around separate systems and business units

Typically, there are three operating models of traditional data governance: decentralized, centralized, and federated.

Decentralized Data Governance

In a decentralized model we would find multiple, concurrent data governance initiatives and groups that aim to govern the data value creation activities addressed by siloed teams belonging to different business units. The result may yield value for the given unit, but not necessarily for the broader organization, which will likely be receiving conflicting reports
104、of its master data,redundant efforts creating the same metrics multiple times,and inconsistencies on how data was defined,created,maintained,and consumed.Centralized Data GovernanceTypically,as the data governance program would try to scale up and include newer core systems or other business units,a
common data governance program would emerge, led by a single data governance council. Its membership included representation of all those with their data in scope. The data, analytics, and data value creation were no longer handled in siloed teams but were mainly done centrally. Consensus, accuracy, and consistency were achieved, but because data, analytics, and data value creation were practically gated by this central team, progress was slow, which often impeded innovation and the ability to react quickly to changing business needs.

Federated Data Governance
Federated data governance aimed to be the best of both worlds. It still provided a centralized structure that oversaw the enterprise-level data, analytics, and data value creation while allowing flexibility and self-governance for anything that was unique to each business unit.

Modern Data Governance - Data Democratized and Federalized
All
of these operating models were still plagued by the traditional way of setting the focus of data governance. A new way had to be established to enable an organization-wide data governance program to emerge, one that enables the delivery of actionable insights at scale. Enter modern data governance.

Defining Modern Data Governance
Modern Data Governance is defined as the use of cloud-based technology and tools to govern the use of data effectively, with a continuous focus on the following five key capabilities:
• Modern Cloud-based Data Platforms
• Data Democratization
• Data as a Product
• Federalized, Hub-and-Spoke Delivery
• Data Observability/Accountability

[Figure: Modern Data Governance Key Capability Elements]
• Modern Data Platforms: cloud-based data platforms and tools
• Data Democratization: rapid data access and self-service enablement
• Data as a Product: create and manage data as a product
• Hub-and-Spoke Delivery Model: decentralize insights creation, centralize data management
• Data Observability/Accountability: data usage, decisions, actions, and compliance

Modern Data Platforms and the Cloud
Data cloud environments and the tools that power them are starting to make it much easier to integrate data from multiple
systems and start handling and consuming the data from the point of view of the entire organization, not just isolated business units or solitary systems. The cloud architecture enables organizations to open the floodgates of their data and makes it cheaper and easier to start democratizing the data. Suddenly, business users with little knowledge of the data space could leverage modern data platforms to get close-to-real-time insights about the business without having to rely on the Business Intelligence team every time they needed an answer. The data science team could start creating their AI products and yielding new measures/metrics without having to wait for new infrastructure to be in place. The data governance team could set their scope on the cloud and not have to worry as much about the source systems the data came from; they could focus more on the needs of the data consumers.

Data Democratization
Now that companies are able to store and provide access to data centrally in the cloud, insights creators and consumers are increasingly demanding that data be democratized. There is a common misperception that data democratization is a synonym for data access. In reality, it is much more than that. Data democratization is a process that enables all data and business stakeholders within an organization to work with the data they need in order to deliver actionable insights at scale and make data-informed decisions.

Here are the major elements and differences between a modern data governance program within a data democratization environment and a traditional one:

Defining Modern Data Governance - Major Elements

Traditional | Modern Data Governance - Democratized and Federalized
Data access provided according to each project need | Data access provided according to business role
Only technically skilled people can work with data | Any stakeholder can work with data
Analytical tools not designed for product teams | Analytical tools designed for product teams
Know-how and context of the data gatekept by data experts | The necessary metadata and context are available for all data consumers
New data products are created by dedicated BI and analytics teams | New data products can be created by any stakeholder
New data products are mainly available to their creators | New data products are available to anyone that needs them
Complex value creation model bottlenecked by central IT and data teams | More agile value creation through self-service analytics

With these changes, data democratization yields both caution and optimism about the potential outcomes. Those who are either supportive or wary of this new world of data democratization identify some common opportunities and challenges.

Data democratization opportunities
• Empowering employees by providing quicker access to data that is both defined and understood
• Switching from a siloed approach to accountability over the organization's data to a more all-encompassing, centralized one
• Gaining more insights by having a diverse set of teams collaborate on the same measures/metrics and data

Data democratization challenges
• More users could now have access to more data, creating risk
• Duplication of efforts if different users prepare the same measures/metrics without knowing about it
• More prevalent misuse of data by misinterpreting it if there was a
lack of context and definitions

To alleviate the challenges and capitalize on the data democratization opportunities, modern data governance has to come into play. The best practice is therefore to govern this data by data domain (also called subject area). By data domain, I'm referring to “a logical grouping of items of interest to the organization, or areas of interest within the organization”. You can think of data domains as high-level categories of data for the purpose of assigning accountability and responsibility for the data. Examples of such categories or subject areas include:
• Customer
• Product
• Service
• Location
• Vendor/Supplier

The benefit of governing data this way is that you have organization-wide representation from any data stakeholder handling any data, analytics, and data value creation within each domain. The needs of the whole organization as well as each business unit are now considered and evaluated, and the overall roles and responsibilities, policies, definitions, standards, and processes tend to be agnostic of the systems in which the data reside.

Modern Data Governance Delivered via a Federalized, Hub-and-Spoke Delivery
This method brings even more benefits when coupled with a federalized, hub-and-spoke delivery of data governance. For each data domain, all data stakeholders come together to define any policies, processes, standards, and metrics related to it, but these are created in the central data governance office, or the hub. The artifacts and information then flow to its spokes for adoption
and implementation. The spokes could be represented by individual business units and departments, but also by geographically different offices and sites. The spokes differ greatly from one another from a market, product, or functional need perspective, but they are free to create their own addendums or specific artifacts that only concern their area and do not impact others. The hub-and-spoke model allows organizations to analyze and process more forms of data, for different business needs, while still offering a structured delivery of data governance.

But the reality is that even though governing data with a focus on data domains through a hub-and-spoke model enables a data-driven organization, data governance is not keeping pace with data product creation. There's still a missing piece.

Data governance hand in hand with data product creation
Modern data consumers and producers tend to have the same data challenges they've always had:
• Ensuring they are working with trusted, accurate, reliable data
• Understanding the data and its context
• Having the data readily available and accessible

Only now, these challenges are at a higher magnitude as there are more stakeholders who need
to work with data and produce data products. In turn, these data products also need to be brought under the same data governance umbrella so that they can be used as quickly as possible by others. The ideal state of data democratization that organizations are trying to achieve is to govern the data centrally so that all the benefits of data governance come to fruition, while not creating a slow queue for new data products to be created or, once created, to be approved for use.

Let's step back for a second and remind ourselves that data democratization comes with a big
cultural shift when it comes to working with data. Deriving insight, both hindsight and foresight, out of our data is no longer gate-kept by IT and specialized data teams such as BI and data analytics. Instead, these capabilities are made available to all relevant stakeholders throughout the organization. Within a democratized data environment, analytics and reporting are embedded throughout the organization from top to bottom. One can immediately see the value in this, as those who know the business well should be provided with a way to also know and use the data to ask and answer questions relevant to their business units.

Unfortunately, all these benefits also come with challenges. The truth is that many companies utilize a manual approach to data governance for creating standards and processes to vet data, compare reports, map upstream and downstream dependencies, and so on. So, any new data products, like new measures/metrics, have to wait in line, sometimes for too long, before the data governance team can get to them.

Modern data governance ensures data traceability and data quality for these new data products as well. It also makes data and business stakeholders more efficient at finding and understanding their data so that they can ask the right questions as well as answer them properly. Modern data governance also introduces the ability to reuse data and data processes, thus reducing repeated and redundant work. After all, a more productive stakeholder maximizes the income-generating potential of data.

Modern data governance can't be overly restrictive to the point of slowing down progress. It needs to be more flexible and agile and quickly adapt to the increased volume, variety, and velocity of data and data products that organizations consume and create. Modern data governance needs to expand its scope to the entire value creation chain.

Semantic Layer Enables the Full Potential of Modern Data Governance
So, what's the solution? What's the missing piece to unlock the full potential of modern data governance? Our modern data governance needs the help of a semantic layer.

A Semantic Layer is a business representation of data that helps data stakeholders and, in particular, data consumers access data using common business terms. A Semantic Layer maps business data into familiar terms to offer a unified, consolidated view of data across the organization, ideally also grouped by data domain. In simple words, the Semantic Layer provides the context and information needed for actionable analytics.

The Semantic Layer needs to offer:
• A common understanding of data: Data consumers, irrespective of their data or business knowledge, need to find the right data, understand its technical and business metadata, and gain the context to use it correctly.
• Consistency in data usage: As organizations continue relying on different skills and teams to perform different BI and data analytics functions and implement different applications with the same data, consistency in the data usage is critical.
• Decentralized data value creation: As data products get created, things like new metrics/measures and AI results should be quickly instantiated and made available for access by other data consumers such as the BI team.
• Data availability, auditable access and data lineage: As data is made available to more and more stakeholders, its access needs to be controlled in relation to the type of data, business roles, and business use. As data gets transformed into new data products, and consumed and used in different mediums and applications, the ability to trace that lineage will remain instrumental.

Through this type of Semantic Layer, a federated data governance operating model would function best, as it would offer a centralized data governance function for enterprise-level data while offering decentralized governance for data product creation. The data governance team could still exert control over the Semantic Layer by
requiring any user on-boarded onto the Semantic Layer to be trained on how to contribute to and maintain the roles and responsibilities, policies, standards, processes, and metadata related to the available data.

Semantic Layer Rising
Over the past two years, there has been a tremendous resurgence among enterprises in using the Semantic Layer. This traces to their recent experience migrating to modern data platforms. Enterprises now need to improve speed, scale, and cost savings for AI and BI, and are able to generate actionable insight from newly available data sources for many new users and use cases.

The good news is that recent research affirms the value of using a semantic layer. Companies that use a semantic layer realize the promise of successful, impactful data and analytics programs, including effective data governance, and stand in stark contrast to those companies that don't use a semantic layer.

According to a recent survey of 300 respondents from Ventana Research, organizations that have successfully implemented a semantic model/layer:
• Are significantly more satisfied with analytics (77% compared with 33% of total respondents)
• Have more of the workforce engaged in analytics (43% compared with 23% have more than one-half the workforce using analytics)
• Find analytics capabilities completely adequate (62% vs. 33% of total respondents)
• Are more comfortable with self-service (54% very comfortable vs. 14% of total respondents)
• Say data governance capabilities are completely adequate (51% vs. 25% of total respondents)

[Figure: Value of semantic models. Compares organizations that implemented a semantic model with all participants on Satisfaction with Analytics, Majority of Workforce using Analytics, Reporting Completely Adequate, Data Governance Completely Adequate, and Comfortable with Self-Service.]

George is a passionate advocate for the importance of data, a frequent conference speaker and a YouTuber, being ranked among Top 5 Global Thought Leaders and Influencers on Big Data, Digital Disruption and Top 15 on Innovation. His innovative approach to addressing data challenges received international recognition through
award-winning program and project implementations in data governance, data quality, business intelligence and data analytics. He advises organizations on how to treat data as an asset, and he shares his practical takeaways on social media and various industry sites and publications. George has been a data professional for more than ten years. One of George's passions is to create informative, practical and engaging educational content to share with individuals such as yourself, and help organizations get more visibility on social media. George is the proud founder of LightsOnD and its YouTube channel and is a co-host of the Lights On Data Show.