Pathways to Open Data
Findings from the 2024 World Open Innovation Conference Challenge Session

Anna Hermansen, The Linux Foundation
Paul Wiegmann, Eindhoven University of Technology
Foreword by Professor Henry Chesbrough, Luiss University and Haas School of Business at UC Berkeley

Copyright 2025 The Linux Foundation | March 2025. This report is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International Public License.

- Data silos hamstring research & innovation, and have become increasingly onerous alongside growing data needs to train AI models.
- Data privacy concerns stem from compliance with regulations such as GDPR, which create a climate of risk aversion.
- Significant human resources are required for cleaning, standardizing, & maintaining a dataset.
- Overture Maps Foundation has built an open, agnostic, & standardized geospatial data platform for data owners & service providers to leverage.
- Open data is freely accessible for universal use, leading to new avenues for innovation, greater reliability, & increased trust.
- Proprietary control over data gives companies greater certainty around compliance & quality while reducing the fear of losing competitive advantage.
- The financial & resource costs of dataset maintenance engender a tradeoff between the quality of the data & the cost of accessing it.
- Building open data infrastructure requires a reworking of current data collection & sharing processes.
- The unique qualities of data as compared to software, such as maintenance, quality, privacy, & license diversity, make its openness challenging.
- "Semi-open" data platforms allow for collaborators to share best practices and other pre-competitive data while maintaining their competitive advantage.
- While a lack of standardization makes datasets unusable, AI tools offer opportunities to better manage unstructured data.
- Open data requires incentivizing collaboration around a pre-competitive layer while incorporating checks & balances in the governance structure.

Contents
- Foreword
- Introduction: The history and future of open data
- Why open data matters
- Current challenges of open data
- Open data successes & opportunities
- Next steps: What is needed for open data structures
- Conclusion
- Methodology
- Acknowledgments
- About the authors

Foreword

In an increasingly digital world, we are all both users and producers of data. But we often ignore the possible ramifications of this, as we are hustling to make a purchase, read a story, make a post, or react to a photo. Companies that figured this out early on have become enormously valuable, and now mediate the access to the data they have accumulated. The concept of open data is a response to this state of affairs. This is the backdrop of this report, on pathways to open data, supported by the Linux Foundation. The Linux Foundation chose to host a workshop on the topic of open data at the 11th meeting of the World Open Innovation Conference. Fueling interest and participation in the workshop was the rapid growth of artificial intelligence software (AI), which requires extensive amounts of data to train the algorithms that AI employs.

This report summarizes the takeaways from the workshop, so I will simply underscore a few insights here that I found helpful. One was that everyone wants to protect their own data, yet everyone seeks algorithms that can perform very well. To get algorithms to perform at a high level, a lot of data are required. So for all but the very, very largest organizations, it makes sense to open up access to the data so that better AI algorithms result.

It is also critical to recognize that data will grow and change over time. So there isn't a one-time set of actions and expenses to move to Open Data. Rather, this will be an ongoing journey, and identifying an economic model to support the cost and effort to keep up with the inevitable changes in the data must become part of management's commitment to support the project.

Where to from here? There were at least three ideas put forward about the ways in which Open Data might advance in the future. The first was data ownership, in which users might have the ability to provide their personal data under certain conditions, and might choose to restrict the use of their data in other conditions. A second idea was to create incentives for contributing data to a "pre-competitive" data set. This would protect contributed data from being used to identify specific people, for example, while allowing more general characteristics to be analyzed. Importantly, this pre-competitive dataset would be made widely available, democratizing access to data that previously was prohibitively expensive to access, or simply unavailable, to smaller firms and individuals. A third important idea was that of governance. Repositories of large amounts of data must be stored somewhere. There are costs for hardware, for software, for security, and for maintenance. In order to sustain broad access to useful data repositories, there needs to be an economic model of some kind. And the decisions that are taken around access to data, and whatever expenses might be involved in that access, have to be taken within a governance mechanism that is credible to the stakeholders supporting the Open Data process.

Henry Chesbrough
LUISS University in Rome, UC Berkeley in the USA

Introduction: The history and future of open data

Our data is everywhere and powering everything. From marketing, to healthcare, to government services, to the emerging phenomenon of programming AI agents, organizations leverage data to be as efficient and effective as possible. However, data is often siloed within entities, and any third-party data access requires overcoming significant technical, legal, economic, operational, and cultural obstacles that are multifactorial and at times may seem intractable.1 The increasing reliance on data calls for an assessment of these obstacles and how organizations can shift toward greater openness and sharing.

The concept of open data has its roots in open science, where non-personal and non-commercial data is freely published for the purpose of greater innovation, transparency, and collaboration.2 This culture of openness is strongest in public institutions, where the data collected is considered a public good without profit-generating opportunities, and where transparency of government and public-sector information is encouraged.3 Open government was popularized in the 2000s with the Obama administration's open data initiative (2009) and the Public Sector Information Directive in Europe (2003),1 and soon, many governments developed open data portals for citizens to access and analyze public information about their municipality. According to the United States government's data.gov portal, its mission is to "unleash the power of government open data to inform decisions by the public and policymakers, drive innovation and economic activity, achieve agency missions, and strengthen the foundation of an open and transparent government."4

When entering commercial and/or personal data ecosystems, the notion of open data becomes much more complex. Without the open government mandate, organizations grapple with profit incentives, privacy concerns, and expectations of control that diminish the value of open data in the eyes of data owners. For some sectors, data sharing becomes an ethical imperative (e.g., in healthcare), while others may be incentivized by the value of triangulating with other third-party datasets (e.g., in marketing).5,6 However, when data access involves personally identifiable information, abiding by the privacy regulations that protect this data becomes paramount, and opening up data becomes risky. Added to this privacy risk is the fact that data generation and collection has become a key component of the profit model, causing large corporations to build walled gardens around their data and controlling the flow of information.7 The current data market model consists of commercial entities that take ownership of the "data commons."9

The walled garden concept is felt across industries and sectors. For example, in healthcare, data is siloed within different hospitals and clinics using their own electronic record systems that lack interoperability between the different systems. These silos reduce the value of the data and negatively impact the patient, the clinician, and the researcher. They also cause a lack of standardization, making the data messy, fractal, and even unusable.8 The European Commission has worked on initiatives and programs for electronic patient file transfer across healthcare providers in different countries, but this interoperability is still nascent and is not the norm across many geographies.9 Similarly, as the energy sector digitally transforms and electrifies, sharing the data collected between all the connected devices in a system is a challenge without standardization and interoperability. Without better access to data generated at different points in the system, operators and distributors lack insights needed to study demand and grid health.10

Defining open data

In this research, we defined open data as data infrastructure that has the technical and legal requirements in place to make the data freely accessible for universal use, reuse, and redistribution.16 We also explored openness beyond data-centered definitions. This included dimensions of openness in the context of open standards, such as: access to, control over, and cost of the development of the artefact; access to, control over, and cost of use; the completeness of the artefact; sharing of the artefact; and collaboration with competing systems.17
Although access to data is not a new problem, the explosion of generative AI tooling has introduced a heightened pressure for data needed to train the models, and in particular, data that is licensed in a way that makes this kind of use legal. Organizations are turning to their own proprietary data to train their models. As found by Lawson et al. (2024), organizations are relying on a portion of their own data to train both their proprietary models and the open source models they are implementing.13 The desire to build proprietary models is strong, as it gives organizations more control over their data.11 However, complete reliance on proprietary data is not sustainable, and organizations are in need of quality training data from other sources to build effective, robust, and unbiased models.10 In this regard, data governance becomes a top priority for open source AI projects, where data workflows are managed responsibly with attention to quality and compliance.13 In this next era, where generative AI becomes a key tool across industries, the future of open data becomes paramount.

In November 2024, the authors of this report attended the World Open Innovation Conference (WOIC) in Berkeley, CA, and held a session asking participants: What are the pathways to open and accessible data? Focusing on the data ecosystem's obstacles, needs, and opportunities, we asked participants to discuss the following questions:
- What are some challenges you face in access to and use of data?
- How does your organization or project rely on data to innovate?
- How does your organization make its data open to internal and external access? How do you access relevant third-party data?
- How have you incorporated technology to address your data needs? What solutions outside of the technological realm have been implemented at your organization (e.g., cultural or policy change)?
- How do you believe we can make data more open? What is needed, given your experience?

The following report is structured around a thematic analysis of this 75-minute session. Under the Chatham House Rule, participants are free to use the information received, but none of the participants from the session may be identified. Session participants included academics and practitioners from a variety of industry sectors with expertise in the area of Open Innovation. They shared relevant insights about barriers and opportunities for open data, based both on their theoretical expertise and their practical experience of working with open and closed data in various settings.

Why open data matters

For the audience of entrepreneurs, academics, and innovators, access to data is a crucial part of business intelligence and innovation. According to one participant, the analysis of publicly available data about investments is an important factor of business intelligence for their clients. Another participant discussed the value of individual-level internal employee data, and their clients' desire for transparency of this level of data. They commented, "the team leader can look at the data and say, look, we can do that. It's empowerment." Being able to drive certain outcomes using public and internal data makes the case for data openness and accessibility. As studied by Ambiel (2024), triangulating third-party data with internal proprietary data "is essential to train large AI models, validate research, or discover market opportunities."7

Session participants noted that the current state of data access does not necessarily allow for these opportunities and points of validation. For example, as one participant explained, "We are finding it hard to find out who is researching what." Similarly, another participant expressed the lack of collaboration among academics on one dataset: "If you get your hands by any chance on a good data source, it's a gold mine. Then, you cannot possibly analyze all facets of it, and you don't have the bandwidth to understand what this could mean for another science. For example, maybe it's good for engineering, maybe it's good for social science, maybe chemistry, but how would I even know?" This represents a missed opportunity, where that dataset may be useful to other research teams but is not discoverable by those groups.

These missed opportunities mean unfulfilled innovation. One participant in the sports industry commented, "Sports data wants to be free; where the teams compete is to find a commercially valuable product that is usually a value add on top of the data. In other words, predictive analytics." Sharing data, even among competitors, can help an organization innovate faster and build a product on top of that shared data.

This collaboration produces big data that allows analysts to abstract away from the individuals who represent the datapoints, reducing privacy concerns. One participant gave the example of footfall data, where an individual's location data can be very personal on its own, but when aggregated with other datapoints to demonstrate how many people show up at a location, "it becomes just the general histogram. And so you can abstract that data into something less personal." As another participant stated, "If you zoom in on the micro level of big data, it's worthless anyway. It's only the trends and the analysis on top; it only becomes useful the moment that it's big enough." Knowing that the bigger the dataset the more valuable it is, participants argued that this should be an incentive to contribute data: to make a more valid dataset for all.
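The abstraction participants describe, from personal datapoints to a "general histogram," can be sketched in a few lines. This is a minimal illustration, not the method of any participant; the record fields and the suppression threshold are hypothetical:

```python
from collections import Counter

# Hypothetical raw footfall records: each ping ties a person to a place.
# On its own, each row is personal; aggregated, only the histogram remains.
pings = [
    {"person_id": "u1", "location": "station"},
    {"person_id": "u2", "location": "station"},
    {"person_id": "u1", "location": "cafe"},
    {"person_id": "u3", "location": "station"},
]

def footfall_histogram(pings, min_count=2):
    """Count visits per location, dropping person identifiers entirely.
    Locations with fewer than `min_count` visits are suppressed, since
    rare datapoints are the ones most likely to identify someone."""
    counts = Counter(p["location"] for p in pings)
    return {loc: n for loc, n in counts.items() if n >= min_count}

print(footfall_histogram(pings))  # {'station': 3}
```

The suppression threshold echoes the small-group concern raised later in the privacy discussion: aggregates over very few people can still be identifying, so they are withheld rather than published.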
Beyond the analytical value of a shared dataset, this activity also increases trust. One participant gave an example of a consortium around a data sharing and analytics platform and how "there's an implication of trust by the participants... that is rather compelling, when you're talking about building a partnership." This social contract of collaboration allows for shared activities that are dependent on trust, as described by another participant: "Maybe I cannot see the data myself, but I can rely on my partner to do that for me. And I have to trust them, but I trust them because we're part of the same group." The act of opening up data leads to new avenues for innovation, increased effectiveness and reliability of datasets, and greater team trust.

Current challenges of open data

As discussed above, open data platforms are hampered by a myriad of technical, regulatory, economic, and cultural challenges. In the first half of the challenge session, we introduced some of these barriers and asked participants to reflect on their own experiences confronting them in their work.

Uniqueness of data

When considering the challenges faced by open data, it is important to consider the characteristics of data that make it unique as compared to other content, such as software. In his blog post, Marc Prioleau, the executive director of Overture Maps Foundation, lists six characteristics that make open data different from open code:
- The proprietary origins of data;
- The patchwork of data licenses to navigate;
- The scale and cost of collecting, hosting, and maintaining data;
- The workflows required for the ongoing production of data;
- Assuring accuracy and quality of data; and
- Protecting personally identifiable information.12

Bennet et al. (2024) also point out the unique challenges faced by data-intensive applications (in their example, AI applications), in particular the potential violations of consent and managing the different open licenses of datasets.13 Participants picked up on these and other challenges during the challenge session.

The cost/quality tradeoff

One theme that came up a number of times during the session was the idea that there exists a tradeoff between cost and quality. Some participants expressed that they pay for data because they consider it better quality than open data: "I would pay for private data... because it's better data, curated and so on," one participant said. However, this is an expensive option that is "not sustainable in our business model," and so they use a mixture of free and paid sources. This tradeoff was expressed by another participant, explaining that they bolster their free data with paid sources, but "if I had unlimited money, I would pay for private data... because it's better data, curated and so on."

Reflecting on the cost of data, participants brought up the expense of curating data, and the lack of incentive to do so without charging for access. Without an economic model, open datasets rely on volunteer contributions that are considered unreliable and that are not standardized. As one participant reflected, "It would be nice if the people creating the data did it in a standard way, right? But the problem is, they don't really have much benefit, and unless they have a benefit to doing it, they're going to say, 'Why should I do that?' The person who actually has to do it either doesn't have an incentive or they're not forced to do it."

Another complication in maintaining a quality dataset is data mutability. The artifacts that data are collected on can change, which makes datasets less reliable. One participant gave the example of mapping data: "The hard part about maps is, they reflect the physical world, and the physical world changes, so the map data has to change." The speed at which these artifacts can change in some sectors creates the need for a continuous feedback loop to make sure the data reflects reality.

Without continuous updating and maintenance of the dataset, concerns arise around the quality of the data. This includes whether or not the data is up to date, and how well the data represents different populations and geographies. A participant gave an example from their work, explaining how they want to provide their clients access to worldwide information, but this is rarely the case: "Worldwide means North America, maybe Brazil if we have data from Brazil, Central and Western Europe, but maybe not Eastern Europe. So it comes with all sorts of different geographical limitations." Because of these limitations, skepticism arises around claims of a dataset being up to date and complete. This puts pressure on those relying on open datasets, as they become a "source of truth" to their customers, despite third-party data streams being outside of their control.

The labor-intensity of creating open datasets

As described above, data curation is expensive. This is in large part due to the labor required to manage the different workflows of data, from collection to maintenance to quality control. As one participant plainly stated, "I'm currently doing a lot of data cleaning. I think there's no shortcut for that. There's no workaround, it's part of research, of course." However, there are also important resource considerations for integrating third-party open datasets within an organization's intelligence. Participants explained the work that goes into conducting quality control on their use of open datasets: "I cannot even describe to you how difficult it is to find up-to-date information... you always end up having to call the person, and that is labor intensive, it doesn't work... These databases are a good starting point but they are rarely the single source of the truth or the end point of the search."

Because these datasets are often incomplete or are potentially unreliable, a decision must be made on how to use them. As one participant explained, "I need to import the data, good or bad, and then I need to do some manual work. And then the challenge is, do I put in the work, or do I just leave it one-quarter filled out and the rest is not filled out because I don't want to google it myself... and what this leads to, which is the ultimate challenge, is an incomplete data set, and then it's unusable." Of course, once the data is cleaned, there is still significant human input involved in subsequent activities. "It can last years," one participant stated. "Maybe it should not just be, how many hours do you put into cleaning, but how many hours you put into analyzing, researching, publishing, reviewing."

Standardization

One important aspect of cleaning data is its standardization. If a dataset is not standardized, this impacts the reliability of the data and its usefulness in comparison with other data, as expressed by one participant: "If everyone can write whatever, you will never get standardized data, which can give us the overall picture... what are the competencies of that department or that group of researchers and stuff like this. So then, we would have a false positive that we found the right person for that problem... So this is the main problem for us right now." The process of standardization is more complex for some industries than others, impacting the ability for stakeholders to read and use third-party data: "For engineering data, you have a number and you have a unit, and that's it, and maybe a timestamp that comes with it," explained a participant. "But to interpret healthcare data, you also need the methods used, the conditions where it was measured, when it was measured, and so on." This makes cross-industry data sharing even more complex.

Although standardization came up as a significant concern, some expressed that it may be less of a problem in the future, in part because of AI. As one participant stated, "I would expect that standardization becomes easier over time, on two aspects. First of all, there's more and more data, which is by default being annotated and more structured than it used to be, let's say, ten years ago... Secondly, we see more and more algorithms capable of doing something structured with unstructured data. So for me, from a technology point of view, I'm very optimistic that that problem will solve itself." The benefits of transparency around the need for standardization, and the tools available to make unstructured data more usable, may potentially diminish the impact that non-standardization has on data sharing practices.

Data privacy

Beyond the quality of the data, protecting data for privacy and business reasons was considered another significant barrier to open data. The potential to expose sensitive data, such as personally identifiable information (PII), made some participants hesitant to open their data: "We have to develop ourselves, because our data are private, so we cannot use open source, open data. So we have all our AI and advanced analytics groups who develop our own on-prem tools to monitor everything." Concerns mainly came from compliance with regulations, such as regulations to only use on-prem storage and not cloud storage, as well as complying across different borders: "governance mechanisms and policies are quite sensitive when it comes to how open you can make the data, because it really depends on each country, right?" Participants also expressed concern about AI models, where "each country has their own AI act or not, and that makes it quite difficult when the data, because data is crossing borders, becomes global." The complexity around which regulations will apply to the activity, particularly when considering activities that flow across borders, causes understandable hesitation.

The General Data Protection Regulation (GDPR) became the archetype for a number of participants when discussing the impacts of regulation on data sharing. One participant shared an example from work they had completed for a client, where "the company wanted to map how the workers in the production process move, in order to avoid risk for them. But then it came out that there was a GDPR problem because they were collecting personal data." This hampered the participant's ability to provide meaningful results, and, interestingly, was in opposition to what the client and the workers wanted. Despite GDPR regulations prohibiting the collection and sharing of datasets comprising fewer than five people, "people actually would like to do it regardless, because for them, it's a great tool to work with their data and to see the data... from that perspective, the employees actually kind of fight with us against the Workers Council, because they want their data to be seen."

According to one participant, GDPR's negative effect on data sharing is not uncommon, where "instead of using GDPR for its intended purpose, they just have made everybody scared for what it could do. The first five years, GDPR was only used to kill stuff, whereas, in fact, GDPR allows tons of stuff. There's no issue. But most people who don't know what you can do with it, they just go to this very safe side and say, you can't do anything anymore." This creates asymmetric risk, as referred to by another participant, where if a lawyer approves a certain data sharing activity and they're wrong, they risk losing their job, and so it becomes easier to deny where any uncertainty exists.
"When you ask, can I share this data? That eventually goes to someone who is incentivized to tell you no... So, you have to solve the asymmetric risk." This creates strong resistance to opening up a dataset.

Control over data

A need for control, from a regulatory as well as business perspective, explained the resistance to open data. This manifested as internal scrutiny, where one participant explained how their "IT people don't want to provide open access to data." As discussed above, an organization's legal team can also impede data access: "Data sharing or data receiving is stopped by someone saying... It's almost like somebody's saying, 'Listen, we have to ask Mr. X,' and this guy, with all respect, is kind of a legal guide... Even if we needed to open source code, whatever, open source software which is available, we need to use the IT department to say no, I'm not allowed to. So sometimes you get into conflict with the regulation." Finally, another form of internal control came from one participant who explained, "instead of relying on a third party's blood pressure measurements, it's way more safe for me to just redo the measurement, because then I have everything under control." The safer and easier alternative is for organizations to keep their data closed.

Beyond the privacy and quality concerns that keep data closed, participants also expressed the possibility of losing a competitive advantage by sharing data. One academic participant gave an example in the context of publishing a paper as well as the dataset they collected, where they ask themselves, "should we publish the dataset beforehand for other researchers? It's a question of strategy, and also, to be honest, being afraid of other researchers being much faster than us in using the data set afterwards in the same research topic we are in." There was a fear that opening up an organization's data to the public would diminish its potential.

This leads to the question: Is some data not meant to be open? One participant expressed this sentiment, arguing: "Of course, not all data lends itself to being open." As discussed by Ambiel (2024), for some enterprises and organizations in industries such as financial services and healthcare, their data is too valuable or sensitive to introduce new risks, and as a result they must limit the distribution and use of their data.7 However, one participant made the case that there is some nuance to consider: "Is all your data highly proprietary? What is the data that actually is proprietary to you and your competitive advantage, and what data actually is not that proprietary?" It is important to consider what data makes up your competitive advantage, or is too sensitive, compared to the data that would be more valuable when shared with others.

Open data successes & opportunities

The second half of the challenge session focused on the future of open data, and participants discussed case studies and ideas for the open data ecosystem. One participant shared an example of a fundraising platform where collaborators share best practices and outcomes from working with startups. In this system, "the only open data is all the characteristics of the startup, so who they worked with, and they had some data about the timeline, when the experiment took place, the scale... So you have to contribute enough data to say we did something or we didn't achieve something, but not enough for you to give away, for example, the donors' names. So, semi-open data." Building this pre-competitive layer provides an opportunity for different players in the ecosystem to share non-proprietary data in a way that benefits all groups, without reducing the advantage of the individual organization.
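One way to think about such a semi-open layer is field-level redaction: each record is split into a pre-competitive part that is shared and a proprietary part that stays private. The sketch below is an assumption about how such a split could work, not the platform's actual design; the field names mirror the fundraising example but are invented:

```python
# Hypothetical record about a startup experiment on a semi-open platform.
record = {
    "startup": "Acme Labs",
    "experiment_start": "2024-03-01",
    "scale": "pilot",
    "outcome": "did not reach target",
    "donor_names": ["(private)"],  # proprietary: never shared
    "donation_total": 125000,      # proprietary: never shared
}

# Allowlist of pre-competitive fields; everything else is withheld by default.
SHARED_FIELDS = {"startup", "experiment_start", "scale", "outcome"}

def to_semi_open(record):
    """Return only the allowlisted, pre-competitive view of a record."""
    return {k: v for k, v in record.items() if k in SHARED_FIELDS}

shared = to_semi_open(record)
assert "donor_names" not in shared
```

The design choice worth noting is the allowlist: sharing nothing unless a field is explicitly listed fails safer than trying to enumerate everything that must be hidden.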
From an academic perspective, the concern around openly publishing datasets led to discussion around potential crediting and licensing strategies. One participant asked the group, "You'd like to make the dataset available before publication, but you fear that others might publish before you... so what if, just by collecting the data, you get the credit anyway? They are doing the research faster, but still, you get the credit... licensing the data. This is my data; if you use it for research, I get credit for that." This audit trail of data collection and use through licensing presents a potential solution to competitive advantage fears, in particular in an academic context.

CASE STUDY: OVERTURE MAPS FOUNDATION

Overture Maps Foundation is transforming the mapping industry by creating reliable, interoperable open map data that is freely accessible for use in any map product. Through strategic collaboration, member organizations develop standardized schemas and datasets, combining data from community, government, and corporate sources. Overture ensures data quality through rigorous validation and standardization, ensuring its suitability for commercial applications while maintaining the benefits of open data.

Overture addresses a fundamental industry challenge: the increasing cost and complexity of processing and conflating geospatial data, which often exceeds licensing costs. By building shared infrastructure and standardized data pipelines, Overture eliminates redundant efforts across organizations. A key innovation is the Global Entity Reference System (GERS), which provides stable, unique identifiers to map features globally. GERS is distinctive for being global, open, and entity-based, enabling organizations to link external data directly to the base map and ensure interoperability across applications. This collaborative approach enables organizations to focus on value-added services while leveraging a standardized, continuously improving base layer that accelerates innovation across the industry.

Overture has made significant strides since it was founded in December 2022, with its data powering applications used by hundreds of millions of consumers through platforms like Facebook, Instagram, Bing/Azure maps, and Esri's ArcGIS Living Atlas. As of 2024, Overture has released production-ready datasets covering 2.3 billion building footprints, 54 million points of interest, divisions, and contextual layers including land and water data. The transportation dataset maps 86 million kilometers of roads worldwide, including detailed traffic rules and restrictions. From its four founding members, Overture has expanded to over 37 organizations across diverse sectors, establishing itself as an open foundational layer for the entire mapping ecosystem.14
110、tigated by Majer(2024),the entire walled garden approach needs to be dismantled with new governance mechanisms,decentralization,collaboration,and open source.9 Analysis of the discussion revealed three important themes to help reshape the data sharing landscape and shift the ecosystem toward greater
 openness.

First, data ownership, as a significant public concern, is a useful avenue for rethinking current data collection and sharing practices. One participant referred to the current power and control dynamics as “asymmetric,” stating, “we come from an era where data was mainly gathered and used for tons of money, generating a business model, without me as a user ever getting feedback or ever getting a refund… And this is the reason why we kind of over-regulated a number of things.” From their perspective, this asymmetric power dynamic, and the regulations that attempt to counter it, could be addressed through reconfiguring usage rights. This could look like, “I give you my data, or I don’t give you my data. Or, I give you my data, but I only allow you to do certain things with it… I’ll give you my data, but you can’t use it for advertising or anything. You can only use it to cure cancer or something like that.” These
 usage rights increase the visibility and transparency of data use, reduce fears around data openness and privacy, and create an environment where sharing becomes more important than protecting data.

Participants shared technological ideas for establishing usage rights in practice. For example, one participant discussed the idea of guaranteeing that the data will be used in a particular way: “how are you addressing that? Just thinking about, you know, blockchain or something, I think technology can play a role here in giving the assurance to the people who are willing to share their data under certain conditions or for certain purposes.” Another participant agreed, saying, “you could add a layer of smart contract, or something with a blockchain.” Beyond blockchain, another participant suggested the Solid standard as a potential way to regain control over one’s own data: “For me, that’s going to be a huge change… Because I as a user will be able to switch on or switch off sharing at my volition, so the moment you as a company are no longer doing what I like, I just turn it down, which is something even today is not possible.”

Data ownership should also be managed through licenses such as the Community Data License Agreement (CDLA), which provides the legal framework to share data. Its latest release, CDLA 2.0, outlines the terms under which the data can be used, modified, and shared, protecting the owner of the data while allowing widespread sharing and use. When building a governance structure and collaboration model around an open dataset, this kind of license provides structure around dataset activities that increases confidence and streamlines the data sharing process.15

Second, incentivizing collaboration around a pre-competitive dataset could address some data sharing challenges and support a cultural shift toward greater openness. As described above, participants understood that there is a layer of data, or an abstraction of data, that becomes useful for collaboration with others in the industry. Building a value proposition for contributing data to a collaborative dataset is key to making this happen. As one participant noted, “Very often the incentive to build the dataset in a certain way is dependent on one party, but benefits another party… The question is, how do you get the people who have the data to give that to you?” The incentive for collaboration then becomes the positive externalities that arise when data is shared. For some, building a dataset with others means adding datapoints that actually make the dataset workable: “In those early phases, sometimes it makes sense to share data, just because there’s not enough data. So that could be an incentive. There’s not enough. If I only have 100 observations, and maybe someone else has 200 and I find 100 more somewhere, maybe that makes for a more valid dataset for all of us.” This contribution makes the shared dataset more valuable and leads to greater potential for innovation. Altruism is another key incentive
 to consider, as hinted at in the discussion of user rights. This is particularly clear in the health sector, where sharing data to save lives is a strong incentive for most. However, encouraging altruistic behaviour depends on how the value proposition is framed, as one participant expressed: “If we ask the question… can we use your data for drug development, where then the net result is Pfizer, or X company… a number of studies have come to the conclusion people do not want to share. But if you make the value proposition different, more like: with your blood samples, we will be able to find the treatment for cancer, and put it open, and you kind of guarantee that it will not go to just one player on its own… all of a sudden, a lot of people become quite okay with sharing all their medical data.” This is based on trust in the data request, and that the entity will use the data the way they say they will: “I’m going to share it… because I’m helping this or that… you’re hoping that ends up somewhere, and you trust that party to do the right thing with it.” Altruism is a key behavioral mechanism that can help incentivize greater collaboration around contributing to an open dataset.

Third, a stumbling block to building open datasets is outlining the right kind of governance structure, one that balances a culture of collaboration and neutrality while still managing for checks and balances. The current perspective on open datasets, at least among some members of the challenge session, is that there is no ownership
, which impacts the quality of the data. As one mentioned, “No one technically owns the data, so there’s no incentive to keep it updated.” When considering how to solve for this, one participant suggested, “what kind of government mechanism or policy would be needed that this could happen, right? What would an open database need? Trust, and hierarchy in some way.” According to another participant, encouraging a hierarchy would mean that one entity hosts the dataset, pays for the servers, and manages access and security: “Who will pay for all the servers? And if I want to access it, I need a user name, and who will take care of that? Security? You want logs of who accessed it when. Someone has to take care of that. Whose IT office should do that? Inevitably you need a hierarchy.” Considering new forms of governance that support incentives to share data and bring the individual back into the process has the potential to transform the data landscape and encourage greater publishing of data for the benefit of all.

Conclusion

The WOIC challenge session revealed insights into how academics and practitioners are weighing the tradeoffs between open and closed data, and identified some realistic concerns and expectations for open databases. Through the analysis and reporting of this session, we hope to shed light on the importance of open data and encourage those working with data to consider the ways that they can better collaborate on datasets, incentivize sharing, and reshape the culture of their organization to support greater openness. As policies and cultures shift with new technologies, new governments, and new economic concerns, it is crucial to establish an orientation of openness no matter the headwinds.

Methodology

The findings discussed in this study were developed from transcripts of a 75-minute session at the 2024 World Open Innovation Conference in Berkeley, California, on November 6th. The authors hosted the session, introducing the topic before guiding the group discussion. The discussion was recorded and turned into transcripts using Otter.ai. The first author coded the transcripts, developed themes from patterns in the codes, and wrote the report using secondary literature to bolster the findings. The report underwent peer review by the second author and other stakeholders before production.

Endnotes

1 Attard, Judie, Fabrizio Orlandi, Simon Scerri, et al. “A syst
ematic review of open government data initiatives.” Government Information Quarterly, no. 32 (October 2015): 399-418. https://doi.org/10.1016/j.giq.2015.07.006
2 Braunschweig, Katrin, Julian Eberius, Maik Thiele, and Wolfgang Lehner. “The State of Open Data: Limits of Current Open Data Platforms.” (2012). https://api.semanticscholar.org/CorpusID:17298359
3 Zuiderwijk, Anneke, and Marijn Janssen. “Open data policies, their implementation and impact: A framework for comparison.” Government Information Quarterly, no. 31 (January 2014): 17-29. https://doi.org/10.1016/j.giq.2013.04.003
4 “Data.gov Home.” Data.gov, accessed February 14, 2025. https://data.gov/
5 Ambiel, Suzanne. “The Case for Confidential Computing: Delivering Business Value Through Protected, Confidential Data Processing.” The Linux Foundation. July 2024. https://www.linuxfoundation.org/research/confidential-computing-use-case-study
6 Gaba, Jeanne Fabiola, Maximilian Siebert, Alain Dupuy, et al. “Funders’ data-sharing policies in therapeutic research: A survey of commercial and non-commercial funders.” PLoS ONE, 15(8). https://doi.org/10.1371/journal.pone.0237464
7 Majer, Alan. “Decentralization and AI: The Building Blocks of a Resilient and Open Digital Future.” The Linux Foundation. November 2024. https://www.linuxfoundation.org/research/decentralized-internet
8 Hermansen, Anna. “An Open Architecture for Health Data Interoperability: How Open Source Can Help the Healthcare Sector Overcome the Information Dark Ages.” The Linux Foundation. October 2024. https://www.linuxfoundation.org/research/health-data-interoperability
9 “Exchange of electronic health records across the EU.” European Commission, accessed February 25, 2025. https://digital-strategy.ec.europa.eu/en/policies/electronic-health-records
10 Dover, Mike. “Open Source and Energy Interoperability: Opportunities for Energy Stakeholders in Canada.” The Linux Foundation. August 2024. https://www.linuxfoundation.org/research/canadian-energy-interoperability
11 Lawson, Adrienn, Stephen Hendrick, Nancy Rausch, et al. “Shaping the Future of Generative AI: The Impact of Open Source Innovation.” The Linux Foundation. November 2024. https://www.linuxfoundation.org/research/gen-ai-2024
12 Prioleau, Marc. “The Unique Challenges of Open Data Projects: Lessons From Overture Maps Foundation.” The Linux Foundation. January 13, 2025. https://www.linuxfoundation.org/blog/the-unique-challenges-of-open-data-projects-lessons-from-overture-maps-foundation
13 Bennet, Karen, Gopi Krishnan Rajbahadur, Arthit Suriyawongkul, et al. “Implementing AI Bill of Materials (AI BOM) with SPDX 3.0: A Comprehensive Guide to Creative AI and Dataset Bill of Materials.” October 2024. https://www.linuxfoundation.org/research/ai-bom
14 “Overture provides free and open map data.” Overture Maps Foundation, accessed February 14, 2025. https://overturemaps.org/?utm_source=LF&utm_id=opendatareport
15 “Open Data Sharing.” Community Data License Agreement, accessed February 28, 2025. https://cdla.dev/
16 “What is open?” Open Knowledge Foundation, accessed February 25, 2025. https://okfn.org/en/library/what-is-open/
17 West, Joel. “The economic realities of open standards: black, white, and many shades of gray.” In: Greenstein S, Stango V, eds. Standards and Public Policy. Cambridge University Press; 2006: 87-122.

Acknowledgments

The authors would like to thank the organizers of the World Open Innovation Conference for ho
sting a seamless and immersive conference and for incorporating this challenge session into the agenda. The session participants were a diverse and highly engaged group that brought relevant, personal, and constructive insights, which form the foundation of this report. Thanks to Hilary Carter and Henry Chesbrough for their keen review of the manuscript, and to the Linux Foundation Creative Services team and Christina Oliviero for producing this report and managing its publication.

About the authors

Anna Hermansen is a Researcher and the Ecosystem Manager for Linux Foundation Research, where she supports end-to-end management of the Linux Foundation’s research projects. She has conducted qualitative and systematic review research in health data infrastructure and the integration of new technologies to better support data sharing in healthcare, and has presented this research at conferences and working groups. Her interests lie at the intersection of health informatics, precision medicine, and data sharing. She is a generalist with experience in client services, program delivery, project management, and writing for academic, corporate, and web user audiences. Prior to the Linux Foundation, she
 worked for two different research programs, the Blockchain Research Institute and BC Cancer’s Research Institute. She received her Master of Science in Public Health and a Bachelor of Arts in International Relations, both from the University of British Columbia.

Paul Wiegmann is an Assistant Professor at Eindhoven University of Technology (TU/e), where he researches and teaches about standards and standardisation in an innovation context. His work sits at the intersection of management and policy, investigating how various stakeholders in standardisation ecosystems can shape and implement standards to support innovation and positive societal change. Paul’s work has been published in outlets such as Research Policy, the Academy of Management Annals, and Environmental Innovation and Societal Transitions, as well as in a single-authored book. Paul is the president of the European Academy for Standardisation (EURAS) and has been a visiting scholar at the University of California, Davis; Yonsei University; and the Technical University of Berlin. Prior to joining TU/e, Paul received PhD and MSc degrees in Innovation Management from Erasmus University Rotterdam, and a BSc in Management from the University of Warwick in
 the UK.

Founded in 2021, Linux Foundation Research explores the growing scale of open source collaboration, providing insight into emerging technology trends, best practices, and the global impact of open source projects. Through leveraging project databases and networks, and a commitment to best practices in quantitative and qualitative methodologies, Linux Foundation Research is creating the go-to library for open source insights for the benefit of organizations the world over.

Copyright 2025 The Linux Foundation. This report is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International Public License.

To reference this work, please cite as follows: Anna Hermansen and Paul Wiegmann, “Pathways to Open Data: Findings from the 2024 World Open Innovation Conference Challenge Session,” foreword by Henry Chesbrough, The Linux Foundation, March 2025.