Alteryx: 2024 Research Report on Improving Data Quality in the Era of Generative AI (English version, 16 pages)
Custom content for Alteryx and Databricks by studioID

Improving Data Quality in the Age of Generative AI

Data fuels artificial intelligence (AI), shaping the future of your business. The quality, integrity and availability of that data are pivotal in determining the success of generative AI initiatives.

Within a few months of ChatGPT's launch, generative AI was elevated to boardroom conversations across every industry. Businesses that hope to stay relevant are already investing in the technology. For chief information officers (CIOs), the imperative is not only to adopt these solutions but also to ensure that the data that feeds generative AI models is accurate and reliable.

Good models cannot overcome bad data. The adage “garbage in, garbage out” is particularly relevant here. Outputs will inherit any flawed or biased input, resulting in unreliable models and misinformed decisions. Some estimates place the failure rate of AI projects as high as 80%, with a lack of data quality and availability among the underlying causes. [1]

72% of business leaders said that data problems were more likely than other factors to jeopardize their achievement of AI goals. [2] (Databricks Global CIO Survey on AI Adoption by 2025)

The Data Quality Challenge

CIOs also face pressure to support data democratization by strategically deploying technologies that bridge the gap between complexity and business insight. For that to happen, data insights must be readily available to those who need them. However, when talking about democratization, we are also talking about scale. Data quality becomes incredibly important at scale because the more people who touch the data, the more trust people need to have in it.

The Need for Democratization

Maintaining high-quality curated data will get only harder as organizations scale AI adoption. AI has
also exposed the limitations of existing IT architectures, data stacks and skill sets. To ensure that data quality does not end up a barrier to innovation, CIOs must evaluate their strategies by addressing two key areas:

1. Ensuring data architecture can support data quality and integrity: Gaining an understanding of existing IT infrastructures and identifying the steps needed to adapt for AI-readiness.
2. Engaging business domain experts to contribute to data quality: How to leverage internal business expertise to enhance the relevance and accuracy of datasets.

To help CIOs navigate these challenges, this guide will provide insight into these essential areas and illustrate how Databricks and Alteryx work together to elevate data quality in the era of generative AI.

AI models rely on massive volumes of data. Most of that data is unstructured, demanding capabilities and resources for its collection, storage, processing and analysis. Moreover, most data falls under the category of “dark data,” which, by definition, is insufficiently governed for quality and integrity. However, data in any form may hold hidden opportunities, especially for training the next generation of AI models. For example, unstructured data, such as recordings or transcripts of customer service calls, can be useful in training a conversational AI model designed to handle routine customer queries.

Modernizing Data Architecture for Data Quality

The Problem: Most Data Stacks Fall Short on Data Quality

The challenge is that most data architectures are not ready for AI workloads. New research by Alteryx also found that 90% of IT leaders are still using out-of-date technology stacks. [3] However, even modern architectures might not be ready because of their complexity. These factors compromise data quality in two main ways:

No. 1: Important data often ends up lost in departmental silos. The Alteryx survey found that 48% of businesses do not share data outside the department in which it is generated. The issue often stems from a data architecture that lacks interoperability, hindering the seamless sharing of data between departments. These fragmented technology stacks make it hard to maintain oversight of data. This leads to inconsistency, duplication and outdated information.

No. 2: AI workloads demand significant computational power. AI workloads, particularly those involving deep learning, require a great deal of computing power because of the large datasets involved. Having high-quality, accurately labeled and relevant data significantly enhances efficiency. Conversely, poor data quality can lead to increased computational demands, as achieving the desired accuracy and performance requires more iterations and processing power.

Enterprises have many working components in their technology stacks: multiple data warehouses, multiple clouds, and multiple orchestration and workflow systems. It's important that organizations have a self-service platform for consuming and governing various forms of data before they unleash it to their stakeholders.

A unified data platform democratizes data and lays the foundation for an enterprisewide data-driven culture that is capable of producing and maintaining high-quality data. The first step to achieve this is to evaluate your infrastructure and identify the steps needed to update it to meet data-quality needs:

No. 1: Does it break down silos? A unified data architecture provides the technical foundation needed to support seamless sharing and collaboration across departments. Conducting an audit of all data sources helps determine the extent of integration required to obtain a unified view.

No. 2: Does it enable auditability? Visibility and auditability in your data architecture ensure that you can track data lineage, access and changes over time, allowing you to see what happened to your data at each step of its life cycle. Incorporating monitoring tools and processes lets you regularly review and test data
for completeness and accuracy, as well as compliance with regulations and internal policies.

No. 3: Is it scalable and adaptable? Data curation and quality are foundational elements of any successful AI or machine learning (ML) project, but they require efficient data consumption and processing at a scale that is impossible to meet with traditional architectures. This is where a data lakehouse comes in with its scalable, low-cost, high-performance and flexible architecture designed for diverse use cases.

The Solution: Building an Architecture That Facilitates Data Quality

A data lakehouse combines the adaptability and scalability of a data lake with the governance and transactional capabilities of a data warehouse. This hybrid approach allows enterprises to store vast amounts of raw data, while also providing a structured environment and the tools necessary for efficient data querying, analysis and reporting. The result is a unified platform that better supports AI and ML workloads.

Complexity is often the biggest barrier to adopting a modern data lakehouse infrastructure. Almost all solutions are code-based, requiring deeply specialized skill sets that may not be readily available. This also broadens the divide between IT teams and the line-of-business users who rely on data for insights. IT leaders can address this by combining Databricks with Alteryx. With this unified approach to managing, processing and using huge volumes of data, you can enhance data quality by:

Unifying all data assets with centralized access across environments.
Improving and simplifying governance of structured and unstructured data.
Supporting data democratization with self-service business analytics.

How Databricks and Alteryx Can Help

Databricks combines the world's first Data Intelligence Platform powered by generative AI with the business-friendly user interface of the Alteryx AI Platform for Enterprise Analytics.

The Databricks Data Intelligence Platform has helped customers in four areas (figures cited: 52%, 50%, 60%, 428%):

Time-to-value of big data projects (accelerated).
Process efficiency in BI and MLOps (improved).
Data team productivity (increased).
Average return on investment.

Empowering Business Domain Experts With Data Democratization

Once you have a modern data architecture, you still need to make it accessible to your business users. You also need to give users the training and tools necessary for them to do their jobs. However, during that training, the company also has to trust its users, and users need to trust the quality of the data they are using.

While AI is a powerful tool both for giving end users the opportunity to contribute diverse perspectives to model development and for gaining access to richer business insights, it is essential to keep business domain experts in the loop. This requires cultivating a data-driven company culture, in which business domain experts are in sync with IT teams.

Business domain experts are often left out of the data analytics journey. This is a common mistake, and
a serious one. After all, business domain experts tend to know enterprise data best, giving them an essential role in ensuring data quality in any AI-adoption strategy. Democratization is essential for enabling a broader range of professionals, as opposed to just data scientists, to solve problems with AI, as well as support the development of AI applications. In fact, generative AI itself is a catalyst for data democratization simply because it mimics human understanding and engagement. It is designed to be accessible to everyone, regardless of their role.

Addressing the lack of trust in data

Despite the clear advantages of generative AI and other models in business decision-making, a lack of trust in data quality persists. If your data is not cleansed and treated correctly, you risk creating misinformation in your organization.

The Problem: Business Experts Are an Underused Resource on Data Quality

To mitigate these risks, board members and business users alike must become part of the data-quality conversation from the outset. When driving that conversation in the boardroom, CIOs must emphasize the role of data quality in achieving business goals. With that in mind, here are some of the key points CIOs should impart in the boardroom:

Ineffective data leads to misinformed decisions, reputational damage and financial loss.
Robust governance is vital for mitigating risks associated with data privacy and ownership.
Democratization of AI supports a companywide data-driven foundation for innovation.

The Solution: Engaging Business Experts With Accessible Analytics

While business domain experts usually lack the technical skills to use a code-based platform for these processes, you can bring them in with an accessible, no-code analytics interface and self-service tools. Generative AI offers the next level
of data democratization, with AI capabilities that can enhance productivity and help business users get started with analytics faster.

However, democratization is also a double-edged sword. On one hand, having more people touching the data means there is a greater scope for something to go wrong, thus demanding a far higher degree of trust and governance capabilities. On the other, democratization is fundamental for eliminating biases and incorporating diverse perspectives into AI model development and training. Democratization is also fundamental for making data insights accessible to business domain experts. Business users, for example, want to ensure the reports they build are based on trustworthy data. This is why data quality is incredibly important when talking about democratization.

To build a data-driven culture aware of the importance of data quality, leaders must take several fundamental steps:

Establish a culture of data ownership in which business domain experts can be responsible for data quality and drive the analytics journey.
Implement training programs on data literacy and quality management to ensure that business users are up to date with the best practices for handling and analyzing data.
Select a self-service analytics platform that also respects the governance policies necessary for your organization's data privacy and security needs.
Create feedback loops for data quality by setting up mechanisms for business domain experts to report inaccuracies or
inconsistencies directly.

To truly democratize AI, you need to democratize your entire analytics journey, from how you collect data to how you prepare and cleanse it through exploratory analysis and feature engineering all the way through to building and maintaining AI models.

Navigating the data journey from collection to insight can be challenging, especially for business users who lack skills in coding or AI/ML model development. The integration of Alteryx and Databricks breaks down technical barriers to support a broader range of business users to engage with data analytics and AI.

With a deep integration with Databricks, the Alteryx AI Platform enables nontechnical business experts to benefit from Databricks features, without needing to code. When used together, both platforms support a more collaborative and productive data-quality process. For example, both Databricks and Alteryx have built-in AI features that enhance data exploration and preparation workflows to accelerate productivity and support higher-quality data.

The Alteryx AI Platform for Enterprise Analytics has:

Become the platform of choice for nearly half of the Forbes Global 2000.
Helped over 8,000 customers scale their analytics.
Built a community of over 500,000 members.
Reached No. 1 in the Gartner Peer Insights.

How Databricks and Alteryx Can Help

Alteryx introduces a user-friendly, no-code analytics interface that allows business domain experts to use Databricks' advanced infrastructure capabilities to enhance data exploration and preparation.
Use AI to document your datasets and the steps you took. For instance, generate a description of a dataset in Databricks Unity Catalog, and use the Workflow Summary Tool in Alteryx Designer to generate a description of your analytic workflow.
Alteryx supports Databricks Unity Catalog, which enables business users to discover the datasets they need and securely connect to Databricks data for transformation and analytics.
Alteryx users can leverage Databricks as their execution engine for data cleansing and processing while using the no-code tools in Alteryx, which can minimize data movement and speed up runtimes.

1. https://hbr.org/2023/11/keep-your-ai-projects-on-track
2. https:/
3. https:/

Alteryx powers actionable insights with the AI Platform for Enterprise Analytics. With Alteryx, organizations can drive smarter, faster decisions with a secure platform deployable in on-prem, hybrid and cloud environments. More than 8,000 customers globally rely on Alteryx to automate analytics to improve revenue performance, manage costs and mitigate risks across their organizations. Alteryx is a registered trademark of Alteryx, Inc. All other product and brand names may
be trademarks or registered trademarks of their respective owners.

Databricks is a data and AI company. More than 7,000 organizations worldwide rely on the Databricks Data Intelligence Platform to unify their data, analytics and AI. The company is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world's toughest problems.

studioID is Industry Dive's global content studio offering brands an ROI rich tool kit: Deep industry expertise, first-party audience insights, an editorial approach to brand storytelling, and targeted distribution capabilities. Our trusted in-house content marketers help brands power insights-fueled content programs that nurture prospects and customers from discovery through to purchase, connecting brand to demand.