Governance Implications of Synthetic Data in the Context of International Security
A Technology and Security Seminar Report
FEDERICO MANTELLASSI

Acknowledgements
Support from UNIDIR’s core funders provides the foundation for all of the Institute’s activities. This publication was funded by the European Union as part of UNIDIR’s Security and Technology Programme, which is also supported by the Governments of Czechia, France, Germany, Italy, the Netherlands, Norway and Switzerland, and by Microsoft. The author would like to extend his sincere thanks to Wenting He for her moderation and organization of the event’s first panel, as well as Jessica Espinosa Azcarraga for help in organizing the event. Additionally, the author would like to extend his thanks to all the speakers for their participation, as well as Dr. Giacomo Persi Paoli, Sarah Grand-Clément, Wenting He, Calum Inverarity, Dr. Ana Beduschi and Aldo Lamberti for their comments on this report.

About UNIDIR
The United Nations Institute for Disarmament Research (UNIDIR) is a voluntarily funded, autonomous institute within the United Nations. Being one of the few policy institutes worldwide focusing on disarmament, UNIDIR generates knowledge and promotes dialogue and action on disarmament and security. It is based in Geneva and assists the international community in developing the practical, innovative ideas needed to find solutions to critical security problems.

About the Security and Technology Programme
Contemporary developments in science and technology present new opportunities as well as challenges to international security and disarmament. UNIDIR’s Security and Technology Programme aims to build knowledge and awareness about the international security implications and risks of specific technological innovations and convenes stakeholders to explore ideas and develop new thinking on ways to address them.

Note
The designations employed and the presentation of material in this publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area, or of its authorities, or concerning the delimitation of its frontiers or boundaries. The views expressed in the publication are the sole responsibility of the individual authors. They do not necessarily represent the views or opinions of the United Nations, UNIDIR, its staff members, or sponsors.

Citation
Mantellassi, Federico. “Governance Implications of Synthetic Data in the Context of International Security: A Technology and Security Seminar Report”. Geneva, Switzerland: UNIDIR, 2024.

About the Author
Federico Mantellassi is a Researcher in the Security and Technology Programme at UNIDIR. His work focuses on the international security implications, risks and opportunities of emerging science and technology developments
and innovations. Previously, Federico was a Research and Project Officer at the Geneva Centre for Security Policy, conducting research on the intersection between emerging technologies, international security and warfare. He holds a master’s degree in Intelligence and International Security from King’s College London and a bachelor’s degree in International Studies from the University of Leiden.
Federico Mantellassi, Researcher, Security and Technology Programme

Acronyms & Abbreviations
AI      Artificial intelligence
EU      European Union
GDPR    General Data Protection Regulation
IEEE    Institute of Electrical and Electronics Engineers
ISO     International Organization for Standardization
ODI     Open Data Institute
PET     Privacy Enhancing Technology

Table of Contents
1. Introduction
1.1. About the Event
2. Synthetic Data and International Security: Framing the Issue
2.1. What is Synthetic Data
2.2. Synthetic Data in the Military Domain
3. Governance Challenges and Implications
3.1. Synthetic Data and Civilian Data Governance
3.2. The Role of Standards
3.3. Synthetic Data and the International Governance of Military AI
3.3.1. On the novelty of challenges and the applicability of existing frameworks
3.3.2. On the importance of a multistakeholder approach
3.3.3. On guidelines and context specificity
3.3.4. On governance opportunities in the military domain
4. Conclusion
Annex: Event Agenda and Participants

1. Introduction
1 Hao, Shuang et al. 2024. “Synthetic Data in AI: Challenges, Applications and Ethical Implications.” School of Software Engineering, Huazhong
University of Science and Technology. https://arxiv.org/pdf/2401.01629; Lee, Peter. 2024. “Synthetic Data and the Future of AI.” Cornell Law Review. Forthcoming. https:/; Deng, Harry. 2023. “Exploring Synthetic Data for Artificial Intelligence and Autonomous Systems: A Primer.” United Nations Institute for Disarmament Research. https://unidir.org/wp-content/uploads/2023/11/UNIDIR_Exploring_Synthetic_Data_for_Artificial_Intelligence_and_Autonomous_Systems_A_Primer.pdf.
2 Naughton, Mitchell et al. 2023. “Synthetic Data as a Strategy to Resolve Data Privacy and Confidentiality Concerns in the Sport Sciences: Practical Examples and an R Shiny Application.” International Journal of Sports Physiology and Performance. Vol 18(10): 1213-1218. doi:10.1123/ijspp.2023-0007; Syntheticus. “Synthetic Data 101: What Is It, How It Works and What It’s Used For.” Syntheticus. Web. n.d. https://syntheticus.ai/guide-everything-you-need-to-know-about-synthetic-data#chapter-8.
3 Chahal, Husanjot et al. 2020. “Messier than Oil: Assessing Data Advantage in Military AI.” Center for Security and Emerging Technology. https://cset.georgetown.edu/wp-content/uploads/Messier-than-Oil-Brief-1.pdf.
4 Ibid.
5 Deng, Harry. 2023. “Exploring Synthetic Data for Artificial Intelligence and Autonomous Systems: A Primer.” United Nations Institute for Disarmament Research. https://unidir.org/wp-content/uploads/2023/11/UNIDIR_Exploring_Synthetic_Data_for_Artificial_Intelligence_and_Autonomous_Systems_A_Primer.pdf.

Data is crucial to the training and development of artificial intelligence (AI) systems. However, three key data-related issues can act as barriers to the development and deployment of AI capabilities and systems. First, the development of AI technologies has at least in part depended on the availability of large datasets to train AI models. Second, data is a resource whose availability, collection, cleaning, use and sharing are affected by factors such as collection costs, lack of real-world data in certain domains, as well as regulatory, legal and ethical constraints. Third, data quality, representativeness, and diversity are directly linked to an AI model’s performance, level of bias, accuracy,
and reliability. Synthetic data (data that is artificially generated in the digital world with properties that are often derived from an original set of data) has been proposed as a solution to address some of these data-related issues, especially for AI model training.1 Indeed, synthetic data can help
to address issues such as biases in datasets while also enabling their expansion, creation, diversification, and fine-tuning. Synthetic data is also often referred to as a privacy-enhancing technology (PET), facilitating the use and sharing of sensitive datasets.2 Synthetic data is particularly promising for domains such as the military.3 In this sensitive domain, AI-enabled capabilities are in increasing demand, but high-quality, diverse datasets are in short supply and the consequences of faulty algorithms are potentially serious. Synthetic data could make it possible to develop advanced AI capabilities with less need for troves of real-world data.4 However, synthetic data is not a panacea and has been shown to potentially exacerbate many of the issues it seeks to curtail, sparking governance and regulatory discussions.5

Synthetic data exists in a relative grey zone in terms of regulation
and governance. Major data governance and AI regulatory frameworks, such as the European Union’s AI Act and the General Data Protection Regulation (GDPR), mention synthetic data only in passing, if at all. For some, this means that synthetic data, as a PET, can be a way around stringent regulatory frameworks, or a useful compliance tool.6 Others point to the fact that synthetic data carries with it many of the same risks as real-world data, and can result in similar downstream effects on AI model accuracy, safety, fairness, and representativeness, and thus they insist that new regulatory frameworks and approaches are necessary to avoid governance gaps and blind spots.7 In this light, it is of utmost importance to understand how current governance (civilian and military) and regulatory frameworks encompass synthetic data, whether they are fit for purpose to address potential risks, and whether they need to be adjusted.

Regulatory and governance gaps are of particular consequence in the context of the fast-advancing adoption of AI-enabled capabilities in the military domain. Understanding the implications of synthetic data for emerging military AI governance discussions is therefore essential.

6 Zojer, Alexander. “Synthetic Data: A Key Tool for AI Compliance under the EU’s AI Act.” Mostly.AI. 30 October 2023. https://mostly.ai/blog/ai-compliance-with-eu-ai-act-using-synthetic-data.
7 Gal, Michal, Lynskey, Orla. 2023. “Synthetic Data: Legal Implications of the Data Generation Revolution.” Iowa Law Review. 109. Forthcoming. https:/ Deslandes, Norman, Justin. 2024. “Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent Circumvention.” Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3630106.3659002.

To explore the governance challenges of synthetic data in the context of international security, UNIDIR’s Security and Technology Programme held an event titled “Technology and Security Seminar on Synthetic Data: Exploring Governance Implications”. This report provides a
summary of the key themes and takeaways from discussions at the event. The report is divided into two parts, reflecting the structure of the event. The first part provides a short overview of the technology and its uses in the military domain. The second part presents the various views, issues, and potential challenges to governance posed by synthetic data in the context of international security.

1.1. About the Event
The Technology and Security Seminars comprise a series of events organized by UNIDIR’s Security and Technology Programme focused on various enabling technologies. The key objectives of
the series are threefold: to expose the diplomatic community to a wide range of emerging, critical enabling technologies; to alert the diplomatic community to the potential international security implications of the development and use of such technologies; and to explore governance possibilities through multi-stakeholder dialogue and engagement. On 29 October 2024, a Technology and Security Seminar was held on the topic of synthetic data governance. This half-day event consisted of a Technology Breakfast, serving as an introduction to the technology for policymakers, as well as a Multi-Stakeholder Dialogue
on Synthetic Data where experts from industry, international organizations, and academia convened to share a variety of views on the specific governance challenges in the context of international security. The event took place virtually, on the margins of the seventy-ninth session of the United Nations General Assembly’s First Committee in 2024. For a full programme of the event, please see the annex to this report.

2. Synthetic Data and International Security: Framing the Issue
Takeaways
Advances in the field of generative AI and growing AI adoption across sectors have expanded the pervasiveness of synthetic data, increasing the scale and ease with which it can be generated as well as its variety and quality.
Synthetic data holds promise and offers potential solutions to data-related challenges (bias, scarcity, quality, representativeness, privacy) in both civilian and military domains.
Armed forces are increasingly turning towards synthetic data in the context of their growing adoption of AI-enabled capabilities to train military AI models for identification, targeting systems, operational and tactical planning, as well as the development of scenarios and synthetic environments.
Despite its benefits, synthetic data can perpetuate existing data-related risks, create new ones, or expand the magnitude of their impacts.

2.1. What is Synthetic Data8
8 This section builds on previous work undertaken by UNIDIR’s Security and Technology Programme on Synthetic Data and International Security. For a detailed, in-depth exploration of what synthetic data is, and of the international security risks and opportunities linked to synthetic data, especially in the context of AI-enabled and autonomous military capabilities, see https://unidir.org/wp-content/uploads/2023/11/UNIDIR_Exploring_Synthetic_Data_for_Artificial_Intelligence_and_Autonomous_Systems_A_Primer.pdf.
9 De Wilde, Philippe et al. 2024. “Recommendations on the Use of Synthetic Data to Train AI Models.” United Nations University. https://collections.unu.edu/eserv/UNU:9480/Use-of-Synthetic-Data-to-Train-AI-Models.pdf.
10 Syntheticus. “Synthetic Data 101: What Is It, How It Works and What It’s Used For.” Syntheticus. Web. n.d. https://syntheticus.ai/guide-everything-you-need-to-know-about-synthetic-data.

Synthetic data can be defined as “information created by computer simulations
or algorithms that reproduce some structural and statistical properties of real-world data.”9 Various generation methods exist for synthetic data, and the resulting datasets can be fully synthetic (with all data artificially generated), partially synthetic (with a small portion of a real dataset replaced with synthetic data), or hybrid (where real-world and fully synthetic data are blended).10 In short, synthetic data is mostly used to complement datasets (and to address issues in the data, such as those related to bias or representativeness), create datasets where none exist, or remove personally identifiable information when sensitivity requires it. Hence, the value of synthetic data lies in its ability to assist with key data issues, namely bias, representativeness, quality, scarcity, and privacy.

While synthetic data is not a novel concept, and has been used for some time, recent technological advances, especially in generative AI, have dramatically increased the scale and ease with which it can be produced, the diversity of types of data that can be created, as well as its quality. These advances
have lowered the bar of access to synthetic data and vastly expanded the number of individuals and organizations without extensive technical expertise that can now use it. In turn, this has increased its pervasiveness, with some assessments estimating that 60% of all AI training data will be synthetic as of 2024.11 The increasing popularity of synthetic data is furthermore the result of the ever-growing need for more data for the training of AI models.

11 Gartner. “Gartner Identifies Top Trends Shaping the Future of Data Science and Machine Learning.” 1 August 2023. https:/
12 Shumailov, Ilia, et al. 2024. “AI Models Collapse When Trained on Recursively Generated Data.” Nature. https:/
13 Deng, Harry. 2023. “Exploring Synthetic Data for Artificial Intelligence and Autonomous Systems: A Primer.” United Nations Institute for Disarmament Research. https://unidir.org/wp-content/uploads/2023/11/UNIDIR_Exploring_Synthetic_Data_for_Artificial_Intelligence_and_Autonomous_Systems_A_Primer.pdf.
14 Ibid.

Synthetic data is, however, no panacea, and has been shown to potentially perpetuate, and sometimes exacerbate, the problems its use aims to address. Indeed, synthetic data is not inherently private, secure, representative
or unbiased, and considerable care and curation are needed to make it so. Furthermore, research has shown that the repeated training of AI models on synthetic data generated from previous versions of themselves can lead to model collapse, whereby a model forgets its underlying data distribution, leading to a drastic reduction in output quality and accuracy.12 Additionally, the increased prevalence of synthetic data could expand the risk surface in data-related issues and increase the magnitude of negative impacts.13

2.2. Synthetic Data in the Military Domain
Synthetic data is increasingly prevalent
in the military domain, where issues surrounding data scarcity, bias, and sensitivity are particularly acute.14 As in the civilian sector, the increased use of synthetic data in this domain is linked to armed forces’ turn towards AI-enabled solutions. In this context, synthetic data is primarily used for the training of military AI models for identification, targeting, operational and tactical planning, as well as the development of scenarios and synthetic environments. Principally, synthetic data can help armed forces to fill gaps and increase the quality of their datasets, such as by creating images of
objects from different angles and in different conditions to increase the performance of AI models. Additionally, synthetic data can assist in data management, helping to reduce costs associated with labelling and collection, and accelerating the development of AI products. Furthermore, synthetic data can
be used to create realistic simulations of various military operations, including adversarial attacks. These simulations can enable States to test the effectiveness of their AI systems, develop new strategies and tactics, and prepare for a wider range of potential threats in a controlled and safe environment.

However, the use of synthetic data in the military domain carries the risks inherent in such data. Indeed, synthetic data, despite aiming to represent reality, can perpetuate and even reinterpret existing biases found in the original data it is derived from. That possibility presents a significant risk, particularly in sensitive military contexts where biased decisions can have severe consequences. Moreover, risks of re-identification of individuals or sensitive information within datasets persist, potentially leading to the disclosure of sensitive military data, while data poisoning attacks by malicious actors could skew the learning process of AI systems.15

15 Ibid.

3. Governance Challenges and Implications
Takeaways
The governance landscape for synthetic data is immature in both the civilian and military domains. More work is needed to provide clarity over how existing governance frameworks and regulations apply to synthetic data, and how they might need to be adapted to better cover possible gaps. No consensus exists over the need for new and dedicated regulations and frameworks specifically focused on synthetic data.
International standards are an important tool in the technology governance toolbox. While no international standards exist with respect to synthetic data, work on their development is ongoing and will be instrumental in fostering responsible innovation and adoption of the technology.
Due to its increased use in the military domain to train AI systems, synthetic data is of high relevance to military AI governance discussions. More work should be undertaken to apply, adapt, or build upon established practices and governance concepts linked to data in the military domain.
Synthetic data presents opportunities for the governance of military AI, by potentially enabling greater data-sharing, joint development of AI applications, and common development of guidelines for responsible synthetic data generation and use, thereby advancing global responsible military AI goals.
Governance of synthetic data in the military domain will require more multi-stakeholder engagement. This entails cooperation among States, but also with the private sector, which should be closely involved in governance discussions and efforts. Fostering trust between governments and industry will be fundamental to this effort.

3.1. Synthetic Data and Civilian Data Governance
Balancing the risks and the opportunities of synthetic data will require an understanding of its governance challenges. While synthetic data is not necessarily novel, governance discussions relating to its generation and use are only now emerging in both civilian and military domains. Questions regarding synthetic data’s legal status, regulatory needs, and potential governance approaches are embryonic, and the governance landscape
remains immature. No legislation or frameworks specific to synthetic data currently exist. Some regulatory frameworks such as the EU AI Act mention synthetic data in passing, while select governments have issued guidelines on synthetic data generation.16

In the civilian domain, no consensus exists on whether synthetic data challenges data regulatory and governance frameworks, and if so, in which ways. For example, it is noted that synthetic data could challenge the categories of personal/non-personal data, which are the foundation of data governance regulations and frameworks such as the EU’s GDPR. It is argued that regulations such as these are not adequately equipped to address the complexities of synthetic data, which can blur the lines between these categories. Depending on the type of synthetic data (fully synthetic, partially synthetic, or hybrid), the level of personal information present and the risk of re-identification, and therefore the applicability of data protection laws, can vary considerably.17 This ambiguity creates legal uncertainties for both developers and users of synthetic data. In this respect, the increasing use of synthetic data may necessitate an expansion of the scope of the traditional personal/non-personal data paradigm in data protection regulation.

For others, appropriately generated synthetic data is a useful PET, to be used as a tool for compliance with various data regulation frameworks. Additionally, synthetic data could be of use in achieving broader data governance goals, by democratizing access to valuable data while protecting privacy, enabling transparent data catalogues and audit trails for accountability, improving data quality by providing a consistent and controllable data source, and facilitating secure data-sharing on national and international levels. Disagreements persist over whether synthetic data is a useful tool to be incentivized, or an innovation possibly undermining legal mechanisms developed to guard against various data-related risks.18

What transpires is therefore a lack of legal, and normative, clarity with respect to the processing of synthetic data. Hence, some have argued for the need for clear guidelines, to ensure transparency, fairness, and accountability in the processing of all types of synthetic data, as well as enhanced clarity and guidelines with regard to what data has been employed in the foundational models used to generate synthetic data.19 Propositions include:
transparency: synthetic data should be clearly labelled as such, and information about its generation process should be available;
accountability: clear procedures for calling to account those responsible for the generation and processing of synthetic data should be developed; and
fairness: synthetic data should include some guarantees that it is not being generated and used in ways that bring adverse effects, such as perpetuating biases or creating new ones.

Due to a lack of legal clarity, it is possible that synthetic data falls outside of regulatory oversight while potentially carrying with it some of the same issues that these governance frameworks seek to address in terms of real-world data. Hence, a key governance challenge in the civilian domain will be ensuring that synthetic data is developed and used in a way that, if it falls outside the scope of current data regulations, does not perpetuate existing harms or create new ones. Further research and work are therefore required to more clearly spell out synthetic data’s legal standing, as well as to identify potential governance gaps in existing regulations and frameworks to offer clarity to developers and users of synthetic data.

16 Personal Data Protection Commission of Singapore. 2024. “Privacy Enhancing Technology Proposed Guidance on Synthetic Data Generation.” Personal Data Protection Commission of Singapore. https://www.pdpc.gov.sg/-/media/files/pdpc/pdf-files/other-guides/proposed-guide-on-synthetic-data-generation.pdf.
17 Fully synthetic datasets contain data that is fully generated by an AI model and contain no real-world data. The model identifies the statistical properties and patterns of a dataset and generates an entirely new one. Partially synthetic data replaces selected sensitive features of a dataset with synthetic values, while keeping some real data. Hybrid synthetic data combines real-world and fully synthetic data, pairing random records from a real dataset with a synthetic record. For more detail, please see Syntheticus. “Synthetic Data 101: What Is It, How It Works and What It’s Used For.” Syntheticus. Web. n.d. https://syntheticus.ai/guide-everything-you-need-to-know-about-synthetic-data. and IBM. “What is synthetic data.” IBM. n.d. https:/
18 Gal, Michal, Lynskey, Orla. 2023. “Synthetic Data: Legal Implications of the Data Generation Revolution.” Iowa Law Review. 109. Forthcoming. https:/
19 Beduschi, Ana. 2024. “Synthetic Data Protection: Towards a Paradigm Change in Data Regulation?” Big Data and Society. Vol 11(1). https://doi.org/10.1177/20539517241231277.
20 The Institute of Electrical and Electronics Engineers. “Synthetic Data.” The Institute of Electrical and Electronics Engineers. n.d. https://standards.ieee.org/industry-connections/activities/synthetic-data/.
21 International Organization for Standardization. “Information Technology - Artificial intelligence - Overview of Synthetic Data in the Context of AI Systems.” International Organization for Standardization. n.d. https://www.iso.org/standard/86899.html#lifecycle.
22 Simperl, Elena and Thomas Carey-Wilson. “The ODI to Help Develop an Open Metadata Standard for Machine Learning Data.” Open Data Institute. 6 March 2024. https://theodi.org/news-and-events/blog/the-odi-to-help-develop-an-open-meta-data-standard-for-machine-learning-data/.

3.2. The Role of Standards
Standards (both technical and non-technical) are an important aspect of civilian
technology governance. In the context of synthetic data, they are both much needed and an important area of current work. Indeed, no international standard for the generation of synthetic data exists, and there are no universally agreed-upon definitions or benchmarks for evaluating the quality and trustworthiness of synthetic data. By providing clear definitions, methodologies, and evaluation criteria, standards can create common understandings and benchmarks for assessing synthetic data. Standards can reassure organizations that the synthetic data they use meets specific quality and privacy thresholds. Moreover, standards for the labelling and documentation of synthetic data, as well as auditing and mechanisms to track provenance, could form a key building block for ensuring transparency, fairness and accountability in the generation and processing of synthetic data.

Work to this effect is beginning on various fronts. The Institute of Electrical and Electronics Engineers (IEEE), for example, is leading efforts to develop a global standard and best practices for privacy-safe synthetic data.20 Similar efforts are underway in the International Organization for Standardization (ISO).21 The Open Data Institute (ODI) has contributed towards the development of a tool named Croissant, a community standard that provides machine-readable metadata for datasets, helping to standardize documentation of machine learning datasets.22 Standards will provide a framework and parameters for responsible innovation, incentivizing good practices in the private sector for synthetic data generation, use and innovation.

3.3. Synthetic Data and the International Governance of Military AI
23 Afina, Yasmin, Persi Paoli, Giacomo. 2024. “Governance of Artificial Intelligence in the Military Domain: A Multi-Stakeholder Perspective on Priority Areas.” United Nations Institute for Disarmament Research. https://unidir.org/wp-content/uploads/2024/09/UNIDIR_Governance_of_Artificial_Intelligence_in_the_Military_Domain_A_Multi-stakeholder_Perspective_on_Priority_Areas.pdf.

Data is of increasing military importance due to the growing centrality of AI in many aspects of the military domain. In light of this, and because synthetic data is primarily used in the military domain for the training and development of various AI capabilities, governance discussions
linked to synthetic data’s impact on international security should take place within the context of the governance of military AI. Data, and its related issues, have been identified as one of the priority areas of work for responsible AI in the military domain.23 However, these issues have not been
central to ongoing governance efforts, and their treatment remains largely high-level and lacking in granularity. It therefore follows that discussions surrounding synthetic data, its governance, and importantly its potential effect on ongoing military AI governance efforts have themselves been embryonic. Some regional-level discussions, although similarly nascent, have taken place, especially in still-digitalizing parts of the world where data gaps present a significant barrier of entry to military AI and where alternatives such as synthetic data can act as a surrogate.

3.3.1. On the novelty of challenges and the applicability of existing frameworks
Synthetic data in the military domain brings about similar challenges, with similar implications, to real data, such as biases (their perpetuation, exacerbation, or creation), reliability and representation concerns, accountability, traceability, and lack of explainability, among others. Therefore, synthetic data should not be excluded or remain an unexplored issue in the context of military AI. Importantly, excluding it could lead to a situation where certain types of data remain outside the scope of governance discussions while potentially perpetuating risks, hence exacerbating data-related risks in military AI. For example, ensuring data accountability, a key tenet of responsibility, could be further complicated in the context of synthetic data. Indeed, the use of synthetic data introduces an additional layer of persons, sometimes external actors, responsible for its generation, thereby making it harder to trace direct accountability in case of errors. Relatedly, synthetic data could exacerbate data explainability issues, due to the lack of internationally agreed-upon standards on the generation, use, and labelling of synthetic datasets. Lack of clarity over data provenance due to limited traceability could then hinder auditing capabilities to address biases in datasets. Moreover, the democratization of data access through synthetic data could be an opportunity for greater access to the development of AI and other digital capabilities. On the one hand, this could help to address
107、 issues surrounding the digital divide. On the other hand, in the context of international security, it could also act as an enabler of greater proliferation of military AI capabilities by lowering the bar for entry into the development 
108、of advanced AI models.24 Yet, rather than representing novel challenges, these are further complications of pre-existing data challenges in the military domain, which require specific attention and clarification in the context of synthetic data. Governance discussions surrounding novel 
109、technologies' impacts on international security should therefore ideally first focus on whether and how legal and normative frameworks apply. In the context of synthetic data, it is important to analyze whether these governance challenges are new, whether they only complicate pre-existing data challenges, 
110、and to what extent existing data governance concepts apply to synthetic data. 24 Maas, Matthijs M. 2019. “Innovation-Proof Global Governance for Military Artificial Intelligence? How I Learned to Stop Worrying and Love the Bot.” Journal of International Humanitarian Legal Studies. Vol 10(1). https:/ Hence, as 
111、opposed to necessarily designing new governance frameworks or approaches, the international community should look to established practices and concepts, such as equitability, responsibility, traceability and reliability, and work on applying, adapting, or building upon them for synthetic data. In this ma
112、tter, the international community is not starting from scratch and can leverage an already extensive body of work on what constitutes good data and emerging knowledge on what good data practices in the military domain look like. BOX 1. Area of Future Research: International Trade of Synthetic Datasets An
113、 important area needing further research is the potential implications of the international trade in synthetic datasets, and whether and how such trade should in some cases be controlled, monitored, or restricted. Indeed, a market could develop to trade synthetic datasets which could be used by malicious
114、 actors in the development of disruptive AI capabilities. The international community should hence consider how the trade in synthetic datasets interfaces with non-proliferation efforts and arms control. It should explore whether some synthetic datasets, or types of synthetic data, should be controlled 
115、through tools such as control lists for export controls. 3.3.2. On the importance of a multistakeholder approach The governance frameworks that will most effectively leverage the benefits of synthetic data for the military domain are those that are multi-stakeholder in nature. This ent
116、ails not only cooperation among States, but also close cooperation with private sector actors, who should be involved in such governance discussions. The private sector plays a major role in military AI, being primarily responsible for the research and development of core technologies. This remains true with
117、 synthetic data, where most of the capabilities for generation as well as testing and evaluation of synthetic datasets rest with industry players. This creates additional dependencies on private technology companies, especially for States 
118、with fewer resources, which may not have independent testing and evaluation capabilities for synthetic data, and which rely on the private sector for the quality of synthetic datasets. This dependency necessitates frameworks for public-private partnerships that prioritize building 
119、trust between governments and industry. Such trust is crucial to ensure that private sector actors engage in governance discussions and adopt responsible practices in developing, deploying, and testing synthetic data for military AI systems. Furthermore, multi-stakeholder approaches contribute to the muc
120、h-needed creation of a common language and understanding of synthetic data. 3.3.3. On guidelines and context specificity While the development of clear guidelines for the generation and processing of synthetic data should be an aspiration, it has been noted that, especially in a military context, their devel
121、opment might be premature. Indeed, clear guidelines are typically grounded in well-defined best practices. However, in the case of synthetic data, the field might still be too nascent to establish definitive best practices. In this context, standardizing testing procedures or establishing rigid guidelines 
122、before a thorough understanding of the technology's capabilities, benefits, limitations, and potential risks has been developed might be counterproductive. Furthermore, clear guidelines might not accommodate the highly context-specific nature of assessing the appropriateness and level of responsibility 
123、of synthetic data use in a military context. In fact, a given synthetic dataset might be used responsibly in one scenario, while its use in another context could be seen as irresponsible. 25 Deng, Harry. 2023. “Exploring Synthetic Data for Artificial Intelligence and Autonomous Systems: A Primer.” United Nations Institute for Disarmament Research. https://unidir.org/wp-content/uploads/2023/11/UNIDIR_Exploring_Synthetic_Data_for_Artificial_Intelligence_and_Autonomous_Systems_A_Primer.pdf. 
124、This context-dependent nature of synthetic data use makes it difficult to develop universally applicable guidelines t
125、hat effectively address the nuances of different use cases. Additionally, quality metrics are themselves also context dependent. For example, the closeness with which a synthetic dataset represents reality is used as a key indicator of its quality.25 However, in some cases, particularly in the military 
126、domain, deviation from reality is precisely the intent. In other words, using synthetic data to represent unprecedented scenarios to help with creative planning might be one of the advantages unlocked by the use of synthetic data in the context of military operations. Moreover, governance approaches will re
127、quire regional and national contexts to be taken into account. Due to the external dependencies synthetic data can create, such as the reliance on external actors for the generation of synthetic datasets and their quality assurance, it is particularly important to ensure that synthetic datasets create
128、d outside a given region reflect local realities in the intended context of use. This will require transparency over the parameters and assumptions of synthetic datasets, setting out why, how, what for, and by whom synthetic data is being created. 3.3.4. On governance opportunities in the military domain Sy
129、nthetic data not only poses governance challenges but presents opportunities as well, especially for the governance of military AI. There is indeed the potential 
130、for synthetic data to facilitate data-sharing abilities between armed forces, as well as to help in the common development of military AI capabilities. For example, synthetic data's privacy-preserving potential could provide opportunities to share datasets, something that is often desired, but impeded by the sensitivity and classified nature of militar
131、y data. In the context of the military domain, this presents tremendous value within and across government organizations and nations. Synthetic data could serve as neutral ground for collaborative military AI projects between nations. By using synthetic datasets that mirror real-world scenarios but do 
132、not contain sensitive information, States could work together to develop and test AI systems, enhance interoperability, and share best practices without the risks associated with exchanging real military data. Additionally, States can collaborate on developing collective synthetic datasets that can be 
133、used to train and test AI systems for enhanced interoperability, a key issue in the development of military AI capabilities. This collaborative approach could foster greater cohesion among allied forces, improve the effectiveness of joint operations, and contribute to a more stable and secure internati
134、onal environment in the context of military AI. Moreover, synthetic data represents an opportunity for States to develop common norms and guidelines surrounding its generation and use in the military domain. Because the field is still nascent, discussions surrounding synthetic data present the international c
135、ommunity with an opportunity to develop shared responsibility frameworks. Multiple States could agree on governance principles and begin sharing best practices to move together in a systematic way towards the establishment of good practices for synthetic data generation and use. A multilateral taskf
136、orce on data governance could, for example, jointly address some issues and provide a forum for the development of procedures, processes, and accountability standards. 4. Conclusion Synthetic data p
137、resents significant potential for advancing AI capabilities across both civilian and military domains. Its advantages (addressing data scarcity, enhancing privacy, and facilitating the creation of more representative and less biased datasets) make it a powerful tool. However, synthetic data is no panacea, 
138、and its use carries inherent risks. To extract the most benefit from this technology, it is of utmost importance that governance discussions begin to consider the issue. These efforts are currently in their infancy in both the civilian and military domains, and legal and normative ambiguities persist f
139、or the generation, processing, and use of synthetic data. To avoid a legal and normative vacuum that leaves synthetic data risks unaddressed, efforts should be directed at identifying gaps in existing frameworks and providing clarity to users and generators of synthetic data. To this end, guidelines, the dev
140、elopment of international technical standards, and cooperation with industry will be key. In the military domain, data-related issues remain at the periphery of AI governance discussions. In this context, efforts should be directed not only at securing their place within these discussions, but also at specifically c
141、onsidering the effects of synthetic data use in the military domain. As synthetic data mostly complicates existing governance challenges, as opposed to creating an entirely novel landscape, this will not necessarily entail the development of novel frameworks or regulations. It could entail a need for th
142、e application of military data best practices and concepts to the generation and use of synthetic data. Hence, more work on the issue is required to extend emerging military AI governance frameworks to synthetic data, clarifying how these practices and concepts can be applied. As synthetic data grows in
143、creasingly prevalent, it brings with it not only governance challenges but also opportunities for collaborative international efforts which could have positive downstream effects on global military AI governance. To this end, multi-stakeholder efforts bringing together States, and importantly the priv
144、ate sector, will be instrumental. Looking ahead, synthetic data will not be the last innovation in data science. This underscores the importance of creating governance frameworks adaptable to future developments. Building such frameworks with flexibility at their core will be essential to their sustaina
145、bility, ensuring that they remain relevant as new technologies and use cases emerge. Annex: Event Agenda and Participants Introductory Remarks Federico Mantellassi, Researcher, United Nations Institute for Disarmament Resea
146、rch. Technology Breakfast on Synthetic Data and International Security. Dr. Eleonore Fournier-Tombs, Head of Anticipatory Action and Innovation, Centre for Policy Research, United Nations University. Calum Inverarity, Senior Researcher, Open Data Institute. Moderated by Wenting He, Associate Researcher, United Nations
147、 Institute for Disarmament Research. Multi-Stakeholder Dialogue on Synthetic Data: What Opportunities and Challenges for International Governance. Dr. Jane Pinelis, Chief AI Engineer of the Applied Information Sciences Branch at Johns Hopkins University's Applied Physics Laboratory. Aldo Lamberti, Founder and CE
148、O, Syntheticus; Subject Matter Expert, European Commission; Vice-Chair, Industry Connection Synthetic Data, IEEE; Working Group Expert, Standard for Security and Trustworthiness Requirements in Generative Pretrained Artificial Intelligence (AI) Models, IEEE. Yasmin Afina, Researcher, United Nations Institute for 
149、Disarmament Research; Expert, Global Commission on Responsible Artificial Intelligence in the Military Domain. Dr. Ana Beduschi, Full Professor of Law with a Personal Chair at the University of Exeter; Director, Research Centre for Science, Culture and the Law, University of Exeter Law School. Moderated by Federico Mantellassi, Researcher, United Nations Institute for Disarmament Research. Palais des Nations, 1211 Geneva, Switzerland. UNIDIR, 2024. WWW.UNIDIR.ORG