Stanford University: 2025 White Paper on the Application of Generative AI in Health in Low- and Middle-Income Countries (English edition, 52 pages)
Generative AI for Health
IN LOW & MIDDLE INCOME COUNTRIES

TABLE OF CONTENTS
Key definitions
Executive summary
Introduction
Scoping analysis
Survey results
Framework to guide the use of GenAI for LMIC healthcare
  Key principles
  Key risks
  Key recommendations
Case Studies in Health-Related Behavioral Change
  Jacaranda Health: PROMPTs
  Viamo: Ask Viamo Anything (AVA)
  Girl Effect
  Audere: Self-Care From Anywhere
  Noora Health
Conclusions
Acknowledgements
  Research Team
  Case study teams, interviewees, workshop & roundtable participants
Appendix
References

KEY DEFINITIONS

Low- and middle-income countries (LMICs): As classified by the World Bank Atlas method using gross national income (GNI) per capita [3].

Generative artificial intelligence (GenAI): Computational techniques capable of generating seemingly new, meaningful content such as text, images, or audio from training data [1].

Large language model (LLM): A type of GenAI system, trained on large amounts of text data, that understands and generates human-like language [2].

Human-in-the-loop (HITL): The use of human interaction or intervention to control or change the outcome of a process [5].

Retrieval-Augmented Generation (RAG): A technique for enhancing the accuracy of LLM outputs by retrieving relevant information from specific external sources to supplement the model's training data [6].

Tokens: The basic units of text processed by a language model. Depending on tokenization strategy, a token may comprise a phrase, a word, part of a word, or a character. Language
models break down text into tokens to analyze and generate responses. The number of tokens used in a query affects processing time, cost, and the amount of information the model can consider at once [7].

Application Programming Interface (API): A set of rules and tools that allows different software applications to communicate with each other. In the context of AI, an API enables developers to integrate AI capabilities (such as text generation, speech recognition, or image analysis) into their own applications without needing to build an AI model from scratch [8].

Health behaviors or health-related behaviors: Intentional or unintentional actions taken by individuals that affect health or mortality [4]. Examples include smoking, diet, physical activity, sleep, substance use, risky sexual activities, healthcare-seeking behaviors, and adherence to prescribed medical treatments and vaccination programmes.

EXECUTIVE SUMMARY

Generative AI (GenAI) has the potential to improve health and healthcare in low- and middle-income countries (LMICs). Where is GenAI currently being used and what are the greatest successes? How can we realize greater impact and unlock the full potential of GenAI, both for behavior change and broader healthcare applications?

To help answer these questions, from August to December 2024 we conducted an extensive review including two roundtable events, in-depth interviews with dozens of people who are actively working on applications of GenAI to health and healthcare in LMICs (including academics, health system leaders, implementers, and funders), and a quantitative survey with over 100 respondents. Additionally, we reviewed 14 GenAI accelerator programs for health that have collectively supported over 250 projects worldwide.

This white paper has a specific focus on the use of GenAI tools to drive health-related behavior change (HBC). Our scoping analysis, framework and key recommendations are inclusive of a range of health use cases, to contextualise HBC interventions within the wider ecosystem and facilitate broadly applicable learnings. Here is what we found.

Where is GenAI being currently used, and what are the greatest successes?

Use cases typically centered around applying large language models (LLMs) to health and healthcare tasks involving summarization, classification, extraction, translation, and/or conversation (please see definitions in Table 1). Use cases typically fell in one of three categories: direct-to-consumer, direct-to-provider, or system-level. Some examples include:

Direct-to-Consumer: Offering personalized counseling on sensitive topics (e.g., HIV testing, sexual and reproductive health) via conversational LLM-based agents. In some cases, using improved voice capabilities of LLMs to better engage consumers, especially in low-literacy settings.

Direct-to-Provider: Providing better support for healthcare worker-to-consumer communication in traditional helpdesk workflows, including triaging and routing incoming questions, providing personalized suggested responses for healthcare workers, and live translation between languages.

System-Level: Generating early-warning alerts for potential emerging pandemics by analyzing large amounts of unstructured data from diverse sources, such as health records, news articles, social media and climate data.

We provide quantitative summaries of projects
from the GenAI accelerator programs, encompassing a broad range of health use cases, as well as survey respondents' perspectives on priority use cases and health areas, and key factors and barriers for successful implementation of GenAI health interventions. Additionally, we profile five case studies of GenAI deployments with a specific focus on health-related behavior change in LMICs.

In terms of scale of deployment, we found many projects in the “pilot phase”, including some that are deployed to over 10,000 monthly users. As of late 2024, we found only one application of GenAI in health-related behavior change for LMICs reaching scale (100,000 or more monthly users), detailed in our included case studies. Several pilots had promising preliminary data on cost-effective impact, including health worker efficiency gains, and all are planning further evaluation in 2025 with a move to greater scaling. Pilots conducted as part of a broader scaled system (for example, an existing helpdesk workflow with millions of total users that is now testing GenAI integration for efficiency improvements) have a more predictable path to fast scaling. Given the nascency of the field, the relatively small scale of existing
projects and limited evaluation data is unsurprising, but highlights the need for sustained focus to realize greater health impact.

While this review represents the most comprehensive analysis of GenAI in health-related behavior change to date, it is not exhaustive. Our findings focus on deployments funded by major GenAI accelerator programs and insights from expert interviews. While we believe this provides a representative snapshot of the current landscape, we recognize that some applications may not be captured. Additionally, this paper has not directly explored the perspectives of end users. We welcome input on additional large-scale deployments and user-centered insights to build collective knowledge in this evolving space.

“To unlock the full potential of generative AI in healthcare for low- and middle-income countries, we must bridge technical innovation with local realities. This means sharing knowledge, building inclusive infrastructure, and creating systems that learn and evolve with communities. The true measure of success is not just technological advancement, but the lives we improve and the health disparities we reduce through thoughtful, collaborative action.”
- Fei-Fei Li, PhD, Co-Director, Stanford Human-Centered AI Institute (HAI), Professor of Computer Science, Stanford University

EXECUTIVE SUMMARY: FINDINGS

HOW CAN WE REALIZE GREATER IMPACT AND UNLOCK THE FULL POTENTIAL OF GENAI?

Share learnings

Stakeholders wanted to learn more from others' experiences; this is especially important given how quickly technology and applications are evolving. Specific needs included: (a) understanding of the types of tasks LLMs are well suited to, their weaknesses, and strategies to address them; and (b) summaries of specific successes, with concrete case studies reporting on comparable outcome metrics.

Strategies to address:
a. Produce practical guidance on how to identify LLM applications while mitigating risks and then pilot/validate/scale them. A regular update process will be required given technical capabilities are changing quickly.
b. Utilize consistent outcome metrics to describe scale of projects (such as monthly active users, total users and retention of users) and specificity regarding the type of AI system being used (for example deterministic vs. generative) to facilitate meaningful comparisons and benchmarking.
c. Establish a regular process to identify successes and disseminate learnings, including case studies.

Focus on actionable measurement

Stakeholders wanted better ways of measuring benefits, costs, and risks, in ways that provide rigorous but also timely data. For example, funders cannot wait 3 years for results of a randomized controlled trial to guide annual investment decisions, but we still need scientifically valid ways to measure success to inform implementation decisions in the interim. Establishing a clear evidence base will also be essential for supporting government decisions to implement successful applications at a national scale.

Strategies to address:
a. Establish standards for measurement and best practice, with concrete examples.
b. Identify opportunities for implementer partnership with academics on measurement. Since many projects are in pilot phase (and some are starting to scale), there is a time-sensitive opportunity for accelerative partnerships.

Improve technical capacity & shared infrastructure

Experts noted that technical capabilities of GenAI implementers varied dramatically; similarly, some funders and health system leaders identified gaps in their own knowledge that, if addressed, would allow them to make more impactful funding and procurement decisions. They also noted that some technical barriers (e.g., language models) likely would be better addressed centrally vs. in a fragmented way.

Strategies to address:
a. Identify elements of technical infrastructure that should be shared, and establish ways to centralize these efforts.
b. Provide technical capacity and consulting expertise to health system leaders, funders, and implementers.

Improve digital & basic health infrastructure

Throughout our research, the risk of inadvertently perpetuating the digital divide emerged as a key concern: no matter how advanced AI models and datasets become, their potential to effect behavior change is wasted if the people who need them most cannot access the necessary digital or physical infrastructure (for example, lack of stable internet connection or access to healthcare facilities recommended by GenAI chatbots).

Strategies to address:
a. Prioritize investment in basic healthcare infrastructure alongside digital interventions.
b. Consider whether GenAI is the highest impact way of addressing your use case, taking into account existing basic healthcare and digital infrastructure.
c. Evaluate an organization's digital readiness before deploying AI tools to avoid costly, preventable failures, and first focus funds on ensuring digital readiness where needed.

Improve language & localization

Experts noted that the quality of models varies by language, by medium (with voice particularly important for low-literacy settings) and by use case (e.g., health-specific contexts). The fact that large language models are not trained on or fluent in local languages was the most commonly selected barrier to using GenAI in healthcare settings in LMICs in our quantitative survey. We highlight the importance of identifying and closing gaps in quality as a key next step.

Strategies to address:
a. Establish standardized measures to evaluate model performance across different languages and specific health contexts to ensure consistent quality.
b. Curate high-quality datasets for underserved languages, including region-specific dialects, culturally relevant health information, and voice data for low-literacy populations.

INTRODUCTION

There has been a rapid rise in the use of GenAI since the launch of Chat Generative Pre-trained Transformer (ChatGPT) in November 2022, with an array of emerging use cases in healthcare already in the implementation phase. GenAI has the potential to improve health and healthcare in LMICs on an unprecedented scale. Yet, we are at an early stage, with an urgent need for cross-sectional collaboration to address barriers to realizing maximal impact. Implementing such technologies in health contexts requires specific considerations and nuances, with the risk of potential harms, but also potential benefits, amplified compared to other sectors.

This white paper has a particular focus on the use of GenAI tools to drive health-related behavior change (HBC), with our selected case studies illustrating use cases within HBC. Well-designed HBC interventions can empower individuals to make informed decisions to improve health. Equally, ill-considered interventions risk exacerbating existing disparities stemming from lack of supporting infrastructure. Our scoping analysis, framework and key recommendations are inclusive of a range of health use cases, not limited to behavior change, to contextualise the HBC interventions within the wider ecosystem and facilitate broadly applicable learnings.

Alongside five case studies of HBC applications, we present our learnings from an analysis of key GenAI accelerator programs; a quantitative survey with 145 respondents; two roundtable events; and 24 in-depth qualitative interviews with experts in Generative AI and digital health, encompassing perspectives from academia, funding bodies, implementers and health system leaders.

STRENGTHS OF LARGE LANGUAGE MODELS

LLMs have been shown to perform significantly better than previous AI approaches in specific task areas. Technologies are advancing rapidly, but some currently validated task domains with healthcare-specific examples are outlined in Table 1. The most successful LLM implementations require identifying domain-specific use cases that map well onto these potential strengths.

“The big question mark that remains is, are the risks associated with using a GenAI based tool outweighed by the benefits of what you can now achieve?”
- Bilal Mateen, Chief AI Officer, PATH

“The ability of Gen AI to be much more nuanced and talk much more directly to the user's specific question is really exciting. I think that's going to be a real step change.”
- Isabelle Amazon-Brown, The MERL Tech Initiative

“One of the other opportunities for Gen AI and health behavior change is that AI doesn't get frustrated or tired; it will keep having that conversation with that person and answer all their questions, and it will never act as if it's getting bored of the conversation.”
- Shawna Cooper, Principal Product Manager, Audere

Table 1: Key LLM Task Domains

| LLM Task | Definition | Healthcare-specific example |
| --- | --- | --- |
| Summarization | Condensing content into shorter summaries | Summarizing long medical guidelines into succinct summaries for immediate in-clinic use. |
| Classification | Assigning labels or categories to content | Categorizing incoming patient messages in an online healthcare portal into categories such as medical versus administrative queries, to facilitate more efficient handling of queries. |
| Extraction | Identifying and retrieving information from a larger body of content | Identifying and extracting salient data points such as diagnoses, medications, and test results from patients' medical records. |
| Translation | Converting content from one form to another, across languages, formats, or styles | Rewriting content to match different tones or styles, e.g. transforming clinical documents into patient-facing material. |
| Conversation | Engaging in dynamic, context-aware exchanges | A health chatbot providing real-time, personalized responses to user questions. |

KEY LIMITATIONS OF THIS REPORT

While this review provides what we believe to be the most comprehensive analysis of GenAI applications in health-related behavior change to date, it is important to acknowledge its limitations. Our findings are based primarily on deployments funded by GenAI accelerator programs and insights gathered through snowball sampling interviews. As a result, while we believe this report is representative of the current landscape in global public health, it is possible that some deployments, particularly those outside of these funding networks, have not been captured. Additionally, this paper did not engage directly with end users, meaning that their perspectives on the usability, impact, and challenges of these interventions are not reflected. Further, its findings may not fully reflect commercial use cases, since its primary focus is grantmaking in global health.

We envisage this report as a starting point rather than a definitive catalogue of successful applications, and we encourage stakeholders who are aware of additional large-scale or high-impact deployments, as well as those who can contribute user-centered perspectives, to share their insights. Continued collaboration and knowledge-sharing will be essential in tracking the evolution of GenAI deployment for health behavior change.

SCOPING ANALYSIS

How widespread are GenAI deployments for health currently?

Our scoping analysis encompasses 285 grants from 14 accelerator programs sponsored by 10 funding organizations, funding 279 projects (please see Appendix for the list of included programs). These projects cover a diverse spectrum of health areas and use cases in LMICs across the globe.

Funded projects by health area (chart values):
Health systems strengthening (HSS): 31.9%
Communicable diseases: 16.1%
Maternal, newborn & child health: 14.4%
Non-communicable diseases (NCDs): 10.9%
Sexual & reproductive health: 10.9%
Mental health: 10.2%
Environmental health: 2.5%
Medical research support: 1.8%
Injury-related conditions: 1.4%

Key inclusion criteria:
Accelerator funds GenAI or LLM projects in health for LMICs.
Accelerators whose projects had broader scope (e.g. AI outside of GenAI/LLMs, or topics beyond health) were included as long as they had at least 1 project that utilized GenAI or an LLM for health in an LMIC.
Accelerator programs with sufficient public data available to enable classification were included. Although OpenAI did not have public data available, we worked closely with their team to secure relevant project details given their very substantial contributions to the current funding landscape.
Projects which were inclusive of both high-income and low- and middle-income settings were included.

Classification of projects:
By reviewing the established global health literature and piloting 50 projects from Gates Global Grand Challenges, we developed a taxonomy to categorize projects by use case and health areas described below. Although in some instances projects encompassed multiple different use cases or health areas, the project was classified according to its primary focus.

Use cases typically fell into one of three categories: direct-to-provider, direct-to-consumer,
or system-level. Although direct-to-provider interventions will always involve human review, the degree to which human-in-the-loop safeguards are implemented for direct-to-provider and system-level interventions varies depending on the use case. Despite the lack of current clear regulatory guidance, most of the use cases we reviewed currently incorporate a high level of human supervision.

The most commonly funded health area was health system strengthening, which included a variety of projects aiming to streamline access to or delivery of healthcare services generally. Communicable diseases was the most commonly funded disease-specific health area (16% of funded projects), followed by maternal, newborn and child health (14% of funded projects).

Funded projects by use case (chart values):
Direct-to-consumer
Health education & awareness: 22.1%
Health-related behavior change: 15.1%
Direct-to-provider
Clinical decision support: 10.9%
Diagnostic & triage tools: 7.4%
Disease prediction & risk modelling: 3.9%
System-level
Workflow optimization & health system efficiency: 16.1%
Remote care: 13.3%
Public health surveillance: 7.7%
Medical research support: 3.5%

System-level interventions (public health surveillance, remote care, workflow optimization and healthcare system efficiency, and medical research support) were the most commonly funded group in terms of GenAI use case, representing a combined 41% of funded projects. This was followed by direct-to-consumer interventions (health education and awareness, and health-related behavior change), which represented a combined 37% of funded projects. Direct-to-provider interventions (clinical decision support, diagnostic and triage tools, and disease prediction and risk modeling) made up the remaining 22% of funded projects.

Funded projects by region (chart values):
Africa: 68.6%
Asia/Pacific: 17.4%
Latin America: 9.1%
Unspecified: 3.1%
Middle East: 1.7%

The majority of funded projects were based primarily in Africa (68% of funded projects). This reflects Global Burden of Disease (GBD) data, with Sub-Saharan Africa having the lowest life expectancy of the super-regions, followed by South Asia [9], and Sub-Saharan Africa also having the lowest coverage of essential health services [10].

Each intervention has been categorized according to the following taxonomies for Use Case and Health Area:

USE CASE CATEGORIES

Direct-to-consumer

Health-Related Behavior Change: AI-driven systems that provide personalized recommendations or nudges to encourage healthier behaviors. Example: AI chatbots that give personalized advice on smoking cessation.

Health Education and Awareness: AI-based tools designed to educate populations or raise awareness about specific health topics or access to healthcare. Example: AI chatbots for reproductive health education.

Direct-to-provider

Clinical Decision Support: AI tools that assist healthcare providers in making better clinical decisions or answering patient queries. Example: an AI system that assists health worker responses to consumer questions via more efficient triaging and routing of questions and suggested answers.

Diagnostic and Triage Tools: AI systems that help in diagnosing diseases, or triaging patients for urgent assessment. Example: machine learning models to diagnose tuberculosis based on X-rays.

Disease Prediction and Risk Modeling: AI tools that predict individual health risks and outcomes. Example: predicting an individual patient's risk of developing diabetes.

System-level interventions

Public Health Surveillance: AI tools that predict disease outbreaks or perform public health surveillance. Example: early warning systems for infectious disease outbreaks.

Remote Care: AI-enabled platforms that support remote health consultations, monitoring or care delivery. Example: systems enabling virtual consultations between patients and healthcare providers.

Workflow Optimization and Health System Efficiency: AI tools designed to streamline healthcare operations, optimize workflows, and improve the efficiency of healthcare delivery systems. Example: tools for improving supply chain logistics for medications.

Medical Research Support: AI tools that optimize medical research such as clinical trials. Example: a tool that can streamline clinical trial recruitment by identifying suitable participants.

HEALTH AREA CATEGORIES

Communicable Diseases: includes infectious diseases such as HIV/AIDS, tuberculosis, malaria, viral hepatitis and neglected tropical diseases [11].

Non-Communicable Diseases (NCDs): also known as chronic diseases; the main types of NCD are cardiovascular diseases, cancers, chronic respiratory diseases and diabetes [12].

Injury-related conditions: covers health issues that result from trauma, violence, road traffic accidents and occupational hazards.

Maternal, Newborn, and Child Health (MNCH): addresses pregnancy, childbirth, neonatal care, child nutrition, and the prevention of maternal and child mortality.

Sexual and Reproductive Health (SRH): includes access to contraception, sexual education, prevention and treatment of sexually transmitted infections, and protection from gender-based violence (GBV).

Mental Health: mental health conditions include depression, anxiety and psychosis, as well as neurological and substance use disorders [11].

Environmental Health: includes health risks related to environmental factors such as air pollution, climate instability, access to clean water, sanitation and hygiene, and exposure to hazardous chemicals [13].

Health Systems Strengthening (HSS): includes the strategies, responses, and activities that are designed to sustainably improve the performance of a health system [14], including enhancing healthcare infrastructure, policy implementation, and workforce training. Note that while HSS interventions are often system-level use cases, this is not always the case. For instance, a chatbot delivering information about available healthcare services would be classified by use case as direct-to-consumer, and by health area as HSS. Conversely, a public health surveillance tool focusing on cervical cancer would be classified by use case as system-level, but by health area as non-communicable diseases (not HSS).

Medical Research Support: Tools that optimize medical research, such as facilitating clinical trial recruitment or streamlining ethical review board processes.

SURVEY RESULTS

To support our scoping analysis and qualitative interviews, our research partner, the Bay Area Global Health Alliance, sent out a survey to 572 individuals on its 2024 AI and Global Health Discussion Series email list, encompassing perspectives from research, funding, and implementation. We received 145 responses (25% response rate). Respondents selected their primary professional role in relation to the use of GenAI as follows:

Health Implementer: Part of organization responsible for creating or implementing health programs (37%)
Tech Facilitator: Part of technology platforms or business solutions providers that support technical implementation (19%)
Academic/Researcher: Part of organization that conducts research on health programs (18%)
Health Funder: Part of organization responsible for funding the implementation of health programs; can include NGOs or government (12%)
Health System Expert: Part of organization that delivers health care services, including running hospitals and clinics (5%)

Fifty-one percent of respondents were actively involved in projects or initiatives using GenAI for health in LMICs, with 91% of these projects including work in Africa, and 30% including work in Asia/Pacific (note that many projects include multiple regions of implementation).

WE ASKED THE FOLLOWING QUESTIONS ABOUT THE USE OF GENAI FOR LMIC HEALTHCARE:

What are the primary use cases you would like GenAI to be used for in a healthcare setting in LMICs?

We asked respondents to select their top 3 priority use cases. The most popular response was Health education and awareness (selected by 50% of participants), closely followed by Clinical decision support (44%) and Health-related behavior change (43%).

Health education & awareness: 50%
Clinical decision support: 44%
Health-related behavior change: 43%
Workflow optimization & health system efficiency: 38%
Diagnostic & triage tools: 29%
Public health surveillance: 29%
Disease prediction & risk modelling: 22%
Remote care: 16%
Medical research support: 6%
Other: 4%
Not planning to use GenAI in our future projects: 2%

Which health areas do you think should be prioritized for the use of GenAI in healthcare settings in LMICs?

We asked respondents to select their top 3 priority health areas. Communicable diseases was the most popular response (67% of respondents), followed by Maternal, newborn and child health (49%), Health systems strengthening (40%), Mental health (36%), Non-communicable diseases (35%) and Sexual and reproductive health (29%).

Communicable diseases: 67%
Maternal, newborn, & child health (MNCH): 49%
Health Systems Strengthening (HSS): 40%
Mental Health: 36%
Non-Communicable Diseases: 35%
Sexual and Reproductive Health: 29%
Medical Research Support: 7%
Other: 7%
Environmental Health: 6%
Injury-Related Conditions: 1%

What do you think are the most important factors for ensuring successful use of GenAI for health in LMICs?

In terms of the most important factors for ensuring successful use of GenAI, the majority of participants selected Workforce capacity and training (61%), followed by Technical infrastructure (57%). Data security and privacy protection, and Regulatory and health policy environment were selected by 41% and 36% of respondents respectively. Slightly more respondents felt that cultural acceptability from healthcare practitioners (32%) was an important factor than cultural acceptability from patients (28%).

Workforce capacity and training: 61%
Technical infrastructure: 57%
Data security and privacy protection: 41%
Regulatory and health policy environment: 36%
Cultural acceptability (healthcare practitioners' perspectives): 32%
Affordability: 32%
Cultural acceptability (patients' perspectives): 28%
Other: 5%

What are the key barriers to using GenAI in healthcare settings in LMICs?

In terms of key barriers to use, Large language models not trained on or fluent in local languages was the most commonly selected response (45%), followed by Lack of other required digital infrastructure (e.g., access to digital devices) and Concerns about incorrect outputs or inaccuracies of GenAI (both 42%). Unstable internet connection and Low digital literacy or awareness were both selected by 38% of respondents as key barriers. Fewer participants were concerned about cultural acceptability as a potential barrier to use.

Large language models not trained on or fluent in local languages: 45%
Lack of other required digital infrastructure (e.g., access to digital devices): 42%
Concerns about incorrect outputs or inaccuracies of GenAI: 42%
Unstable internet connection: 38%
Low digital literacy or awareness (e.g., of healthcare workers/patients): 38%
Concerns about data security and privacy: 32%
Concerns about the cost/sustainability of GenAI: 30%
Cultural acceptability (healthcare practitioners' perspectives): 17%
Cultural acceptability (patients' perspectives): 14%
Other: 5%
The institution health policy does not permit the use of GenAI: 1%

KEY PRINCIPLES

Prioritize user-centered design

GenAI solutions must be tailored to the
needs of their end-users, whether healthcare workers or patients. Partnership with community organizations is crucial to ensuring the needs and concerns of target end-users are heard and integrated throughout the design and implementation process. Interventions should play to the particular strengths of LLMs, map onto local priorities, and take into account the level of contextual risk given the use case.

“One of the best ways we can mitigate some of the risk is to authentically co-develop solutions with the communities where we want them, sitting with the people who are going to be using these solutions.” - Stanford Workshop

Define and implement evaluation frameworks

Although existing health-specific outcome measures are broadly applicable across digital health interventions, with morbidity and mortality remaining key, we currently lack established standards for measurement and benchmarking specific to GenAI tools. Measuring success starts with clearly defining goals and outcome metrics upfront. Going forward, utilizing consistent outcome metrics to describe scale of projects (such as monthly active users, total users and retention of users) and specificity regarding the type of AI system being used (for example deterministic vs. generative) will facilitate meaningful comparisons and benchmarking. For tools using LLMs to generate responses to queries (whether from a healthcare worker or user), metrics of interest are likely to include correctness and completeness of LLM responses and time- and cost-savings compared to previous non-GenAI methods, as well as qualitative factors such as understandability, empathy, and appropriateness of tone and style. Unlike a pharmaceutical compound, GenAI outputs are constantly evolving, so evaluation and review processes must be continuous to ensure tools remain accurate, relevant, and aligned with local health guidelines.

“It's changing under us constantly, every minute of the day... So we need to make sure that if we're using it for healthcare we are putting it through its paces on a regular basis” - Stanford Workshop

Cost-effectiveness is a key evaluation metric in resource-constrained contexts where healthcare systems face significant financial and operational challenges. Evaluations should consider not only the upfront costs of implementing GenAI tools but also the cost-benefit analysis involved in improving healthcare access and outcomes.

“The moment you start
introducing services that give more access to patients who need healthcare, your costs go up. So the first thing that we saw when we started using digital tools to increase patient loyalty in their care journey, the cost went up. However, the cost went up a little bit, and the quality went up big time. So there's a cost efficiency in the improvement of health outcomes.” - Nicole Spieker, Chief Executive Officer, PharmAccess

Balance safety and potential benefits

The principle of “do no harm” is deeply embedded in healthcare. Yet while safety remains paramount, it is important to consider the opportunity cost of not using GenAI to address unmet health needs, particularly in LMICs. Pursuing perfection with GenAI tools is futile, and disproportionate given existing human error rates in healthcare.

“Perfection is the enemy here because even when talking to large companies that are based here in LMICs, I don't think they visibly understand how bad the next best alternative is for many people” - Stanford Workshop

FRAMEWORK TO GUIDE THE USE OF GENERATIVE AI FOR HEALTHCARE IN LOW- AND MIDDLE-INCOME COUNTRIES

Aiming to synthesize a diverse range of perspectives from research, policy, funding and implementation, across the
public and private sectors, we highlight four key principles and four key risks of GenAI health interventions. The recent FUTURE-AI framework delineates broad ethical principles and considerations for the development and deployment of trustworthy AI tools in healthcare, covering the lifecycle of healthcare AI.15 Our recommendations complement this by offering deeper implementation insights and actionable strategies, supported by real-world case studies.

THE EVOLVING LANDSCAPE OF AI REGULATION

The EU Artificial Intelligence Act is the first comprehensive regulatory framework for AI globally, coming into force on August 1, 2024.16 It regulates AI based on risk levels: unacceptable-risk systems, like social scoring and manipulative AI, are prohibited. High-risk systems (which typically include healthcare applications) face strict requirements including risk-mitigation systems, high-quality data sets, clear user information and human oversight. Limited-risk systems, such as non-health-focussed chatbots, must meet transparency requirements. Minimal-risk systems, like AI in video games, are mostly unregulated, though this may change with advances in GenAI.

While the EU AI Act sets a high bar with binding, risk-based regulations, AI governance frameworks are rapidly emerging in LMICs: a growing number of countries in Africa, South Asia and South America have released national AI strategies, with others in development. However, these often focus more on strategic development and ethical guidelines rather than enforceable legal requirements. Examples include:

Nigeria's National Artificial Intelligence Strategy, released in 2024, outlines ethical principles and governance priorities, including data protection and responsible AI use.17 However, it stops short of establishing binding regulations or specific compliance mechanisms, focusing on guiding principles for AI development in sectors including healthcare while placing reliance on existing legislation not specific to AI, such as the Nigerian Data Protection Act (NDPA).

Kenya's Draft National AI Strategy, developed in collaboration with German and EU partners, similarly promotes responsible AI through ethical guidelines, data governance frameworks, and capacity-building initiatives.18 Whilst it acknowledges the need for regulatory oversight in sensitive areas such as health, it currently lacks enforceable regulatory provisions, instead emphasizing a roadmap for future legislative development.

India's National Strategy for Artificial Intelligence (#AIforAll) aims to foster AI innovation across key sectors, including healthcare, through ethical guidelines and policy recommendations. However, India does not yet have a dedicated AI regulatory framework; governance relies on existing data protection and IT laws, with AI-specific regulatory discussions still ongoing.19

Brazil's National AI Strategy (EBIA) provides broad principles for AI ethics and responsible innovation.20 Recent proposals include investments to support AI governance, but like many LMIC strategies, Brazil's approach focuses on strategic direction and sector-specific guidelines rather than comprehensive AI-specific regulations.

Striking this balance can be a challenge given there is not a clear regulatory landscape internationally, and regulators in many LMICs lack the resources to govern digital health tools effectively. Ideally forthcoming regulation will foster experimentation and implementation in lower-risk applications to maximize benefits, while putting in appropriate controls in higher-risk settings.

“The most important thing we can do amidst all the hype and flashy tools, is ensure we are leading with evidence. It's true that generating such evidence is not straightforward and we are largely in uncharted territory but this is no excuse for not ensuring we do that hard work. This is also likely to challenge our current models of regulatory approval and post launch surveillance, but this too, shouldn't slow down the enormous upside the tools have for communities in the Global South. The challenge is worth taking on!” - Zameer Brey, Deputy Director, Technology Diffusion, Gates Foundation

Ensure collaboration, transparency & knowledge-sharing

The need for cross-organizational collaboration and sharing of experiences and learnings was stressed consistently throughout workshop discussions and interviews alike. Even for those working at the forefront of innovation in this arena, there is much that is not yet well understood, and the speed at which the field is evolving entails considerable uncertainties along the way. Failures are inevitable, but are also valuable opportunities to learn; transparency and knowledge sharing can help us avoid repeating the same mistakes.

“We have to waste time, but let's do it quick. Let's learn, let's iterate. There's going to be a ton of failures, and that's okay, but if
there's no cross-organizational learning... that is one of my biggest fears of AI” - Stanford Workshop

“A lot of the lessons learned on the digital health side were hard, hard-won lessons that we're still working to try to implement. And by breaking the AI out independent of the rest of the digital health space, they're going to relearn all of those lessons again. So rather, you encapsulate it within it and use the same sorts of approaches and metrics that we would for any sort of digital intervention; that's the biggest approach we want to take.” - Merrick Schaefer, Director of the Center for Innovation and Impact, USAID's Global Health Bureau

Government buy-in is crucial in successfully scaling innovations, particularly in public health systems that serve diverse and widespread populations. The complexity of decentralized systems, where federal, state, and local authorities often have separate but overlapping roles, means any scaling effort requires engagement across multiple layers of governance. Achieving alignment across such diverse entities takes time, but lays the groundwork for sustainable and equitable implementation.

“The government ownership was critical. I think when you're looking at scale, you're talking of each and every public health facility, which means the first mile to the last mile, and in any public health system in any middle-, lower middle-income country, typically, 90% of these health facilities are sub-district level, and focus usually ends up being at the national or the state level hospitals, but really it's the primary and secondary health centers which are the first contact for these communities that they serve, where the system really works.” - Manish Pant, Policy Specialist, Digital Health, UNDP, on his work digitizing vaccine supply chains in India in 2015

Photo Source: Jacaranda Health

RAISE HEALTH INITIATIVE

Stanford University's RAISE Health (Responsible AI for Safe and Equitable Health) initiative is dedicated to advancing responsible AI innovation in biomedicine by fostering collaboration across organizations and sectors. Convening stakeholders from academia, government, and industry, RAISE Health promotes knowledge-sharing on best practices and challenges in AI development and implementation. Through its forums and collaborations, RAISE Health aims to ensure that AI technologies are designed and deployed with safety, accountability, and inclusivity in mind.21

“We launched RAISE Health because we recognize no single organization can chart the future of AI in biomedicine alone. To build responsible AI solutions, it is imperative that we collaborate across sectors, ensuring that every voice and interest is part of the process.” - Lloyd Minor, MD, Dean of the Stanford School of Medicine and Vice President for Medical Affairs, Stanford University

KEY RISKS

Even when embracing the above key principles, risks associated with the use of GenAI tools for health will remain and must be considered throughout design, implementation and evaluation phases.

Model-based risks

Inaccuracies

Inaccuracies
or “hallucinations” (outputs that sound plausible but are incorrect) are a universal flaw in current GenAI algorithms. To mitigate potential harms associated with inaccurate outputs, it is important to establish an acceptable error rate for a given use case, integrate human oversight, and employ strategies like Retrieval-Augmented Generation (RAG) to improve contextually accurate responses.

“Any sort of machine learning tool is never going to be 100% correct... Different applications have different error rates that are acceptable. And in some applications, it's more severe if you get things wrong than others
” - Stanford Workshop

Cost and environmental impact

Large Language Models (LLMs) are resource-intensive. Costs vary by language, with non-Western languages generally requiring more processing power due to higher token density. Additionally, GenAI systems are energy-consumptive, raising environmental concerns.

Data security and privacy

While this issue goes beyond the intrinsic risks associated with GenAI, there is a specific model-based concern regarding the potential of GenAI chatbots to elicit large amounts of personal information more readily than in current clinical settings. This raises concerns about
storage practices and the risk of misuse, particularly in contexts where a given health-related behavior (such as same-sex sexual activity) is stigmatized or criminalized.

Limitations of the training data

GenAI models are only as good as the data they are trained on. Existing datasets are heavily skewed towards European languages and cultural contexts. Training LLMs requires vast amounts of written text in the target language, and for many languages globally this may be challenging to acquire. Other challenges arise from differences between cultural contexts, which may lead to LLMs misinterpreting local
slang expressions. This can have profound consequences in the context of medical communication, from damaging trust and rapport in provider-patient relations to missing key health information. Additionally, the available data for a given language may have inadequate coverage for particular health areas; for example, in places which lack gender parity, women's health concerns may be overlooked or trivialized in existing language datasets.

“If you ask a chat model in Swahili, 'My baby has this high fever, what should I do?' Chances are the model will tell you to go to your local pediatrician, which is just not a thing, right? Nobody has a local pediatrician... you've offered essentially garbage information to someone who maybe just needs to call a nurse, or go to the local clinic. But either we'll spend a lot of money to go to a pediatrician, or we'll just disregard the message and say, 'Well, I can't do that'... That's the appropriate message to give to someone who lives in North America if you have access to care, but it is not the appropriate message to give to someone else.” - Sathy Rajasekharan, Co-Executive Director of Jacaranda Health

“A user asked about an empty chest... GPT-4 interpreted that one as depression, and it turned out that the person was having chest pains... And that's a common way people in Nigeria might say chest pain” - Stephen Meyer, Director of Partnerships, Viamo

We welcome the announcement in February 2025 by the AI for Development Funders Collaborative (a global partnership including FCDO, IDRC, BMZ, and
the Gates Foundation) of $10 million towards the development of AI models that are inclusive of African languages.22

Digital divide & lack of basic health infrastructure

No matter how advanced AI models and datasets become, their potential to effect behavior change is wasted if the people who need them most cannot access the necessary digital or physical infrastructure. The issue of digital access is particularly urgent: in 2022, the World Bank reported that only 36% of people in Africa had broadband internet access.23 Many low-income settings also lack the computing capacity needed to train and deploy advanced AI models.21 This forces reliance on external infrastructure, which can reduce local control and ownership of AI initiatives. Additionally, basic healthcare infrastructure often falls short. For instance, a chatbot that encourages young women to get the HPV vaccine will be ineffective if local clinics do not stock it. Even in regions with adequate digital and healthcare infrastructure, a lack of education, familiarity with digital tools, and trust in their use can lead to failed implementation.

“We too often just look at the intervention in isolation... What does infrastructure look like at the last mile? Are there power and connectivity challenges? Are the health workforce digitally literate? Are the tools well designed based on their needs, their user experience? There are all these factors that are outside the individual interventions that tend to have an outsize impact on whether these interventions are going to be successful or not.” - Sean Blaschke, Senior Health Specialist, UNICEF

“Let's just take the elephant in the room: we assume that the end user's got some kind of device and we can reach them. And obviously, disproportionately, people without devices will be less capable of accessing health service anyway. And we need to keep that in mind, because it's very easy to just say, 'Well, everybody's got mobile phones,' but that's clearly not the case.” - Gustav Praekelt, Co-Founder, Turn.io

It is important to recognize how digital divide challenges interact with other societal dynamics and inequalities.
For instance, while the gap is narrowing, women in LMICs are still 15% less likely than men to use mobile internet, with a more pronounced disparity in Sub-Saharan Africa (32%) and South Asia (31%).245

Reflecting societal biases and problems

AI systems have a powerful ability to reflect and amplify existing societal biases. These biases are not intrinsic to AI but arise from issues like skewed training data or the assumptions of those designing the systems. For example, even a well-representative dataset may inadvertently embed the biases of its developers. In contexts with entrenched cultural biases such
as gender discrimination, the need for language- and context-appropriate LLM training data could result in local gender stereotypes being reinforced. There is also a serious lack of consistency in standards for responsible AI reporting: the Stanford Institute for Human-centered AI (HAI) has found that leading developers including OpenAI, Google, and Anthropic test their models against different benchmarks.27 A move towards standardization will enable easier interpretation of risks and optimal solutions. Further, AI technologies can be exploited by bad actors, who are not bound by ethical guidelines or Responsible AI frameworks.

“A number of the things we're concerned about are systemic social problems that you need to find different routes of actually addressing” - Stanford Workshop

PATH INITIATIVES TO TACKLE DATASET LIMITATIONS IN SUB-SAHARAN AFRICA

Digital Square at PATH is spearheading initiatives to accelerate the effective use of LLMs for improving primary healthcare in sub-Saharan Africa through three targeted workstreams in 2025:25

1. Developing localized datasets to address biases inherent in LLMs trained on data from high-income countries. In collaboration with partners in Kenya, Nigeria,
and Rwanda, PATH is gathering medical questions and answers to create datasets that reflect local disease burdens and medical practices. They will use the datasets to test the performance of existing LLMs, where an expert panel will compare LLM responses to responses provided by relevant medical experts. The datasets will be made publicly available for others to train and fine-tune LLMs. PATH is also supporting the development of the AfriMed-QA dataset: a multi-institution, open-source dataset of 25,000 Africa-focused medical question-answer pairs created to represent the disease burden and medical practices common across Africa.26

2. Evaluating the accuracy, safety, and effectiveness of LLM-enabled clinical decision support systems for frontline workers in primary health care settings. Clinical trials in Kenya, Nigeria, and Rwanda will assess tools ranging from voice-based call centers for community
health workers to electronic medical record-integrated consult features for clinicians. These trials aim to measure impact on care quality, patient outcomes, and the tools' appropriateness for diverse healthcare environments.

3. Establishing a community of practice to bring together stakeholders from technology companies, academia, donor organizations, and implementing partners to share lessons learned and prevent duplication of efforts. The community of practice will engage through monthly virtual meetings.

KEY RECOMMENDATIONS

Prioritize user-centered design

GenAI solutions must be designed with a clear focus on the needs and priorities of their end-users, whether healthcare workers or patients.

For Funders: Support co-design approaches by funding projects that actively engage local stakeholders and potential end-users in the design process. Ensure that funding priorities reflect the realities of on-the-ground healthcare workers.

For Implementers: Engage local partners such as healthcare workers, community leaders, and policymakers to co-design solutions that address local priorities. Ensure interventions are realistic and sustainable in resource-limited settings. Tailor safeguards to the deployment environment, considering varying levels of contextual risk for clinician-facing and consumer-facing use cases.

“Typically, our grants focus on supporting institutions that are very close to the issues that are at hand and that put people and communities at the center of the interventions” - Topaz Mukulu, Strategy Analyst, Patrick J McGovern Foundation grantmaking team

Ensure robust error safeguards

While no system, human or AI, is error-free, careful planning and safeguards can help minimize these mistakes and their potential impact.

“We cannot guarantee that we have eliminated hallucinations from a particular project. So the goal then becomes to minimize those, or build some infrastructure around containing them”. - Brian DeRenzi, VP, Research and AI, Dimagi

For Funders: Support efforts to establish acceptable error thresholds and need for human oversight mechanisms for different healthcare applications.

For Implementers: Improve accuracy with design techniques such as prompt engineering and retrieval-augmented generation. Define appropriate error tolerances for the tool's purpose. For example, for a tool using GenAI to categorize incoming user queries by intent, false negatives (i.e. health queries misclassified as non-health) should be minimized at the cost of a higher false positive rate. Ensure appropriate human oversight: human-in-the-loop is a practice increasingly emphasized in the field of AI, and a key feature of many current use cases where errors would be detrimental to patient care. Conduct regular monitoring of error rates, mindful of the need for continual review processes given the ever-changing nature of GenAI algorithms.

Define and enable actionable measurement

People wanted better ways of measuring benefits, costs, and risks, in ways that provide rigorous but also timely data to inform implementation decisions. Traditional evaluation methods such as randomized controlled trials (RCTs) can take years to provide actionable results, meaning we also need better ways to measure success to inform time-critical implementation and funding decisions in interim periods. Establishing a clear evidence base will also be essential for supporting government decisions to implement successful applications at a national scale.

For Funders: Identify opportunities and establish funding streams for implementer and academic partnerships to develop robust measurement and evaluation frameworks, leveraging implementers' access to data and academics' expertise on measurement. Require grantees to adopt standardized metrics for evaluating GenAI health interventions.

For Implementers: Develop continuous monitoring mechanisms to account for the evolving nature of LLM responses. Partner with academics to identify appropriate evaluation approaches to enable agile, real-time assessment. Prioritize transparency in reporting methodology, data sources, and key assumptions in evaluation.

“How can we do research studies going forward, where they're a bit more agile? Because we have a bunch of different system-defined prompts in our clinical decision support system, and we want to be able to iterate on those in real time. Sometimes it's a super small thing: we want to add an additional example to the prompt for how it should behave. And I think the current research paradigm wants us to set up the intervention, exactly as it is, and then freeze it like that for two, three, four months, while we run the trial. And in the meantime, we know we could improve this prompt. So I think being able to have more flexible research paradigms for the LLM age is important.” - Robert Korom, Chief Medical Officer, Penda Health

Share Learnings

Funders have a vital role to play in creating the incentives for knowledge sharing between organizations across the public and private sectors, whilst implementers have access to the most cutting-edge knowledge regarding successes and failures.

For Funders: Provide funding streams for collaborative approaches, and ongoing roundtable and workshop events to accelerate problem-solving around key challenges. Support regular processes for sharing and disseminating case studies, as demonstrated in this document. Support the production of practical guidance on how to identify LLM applications while mitigating risks and then pilot/validate/scale them. A regular update process will be required given technical capabilities are changing quickly.

“How do we create the right incentives for organizations to collaborate? Especially where things aren't working well... which is not intuitive. It's not what they get rewarded for” - Stanford Workshop

For Implementers: Define and use consistent outcome metrics to describe the scale of projects (such as monthly active users, total users, and retention of users) and specificity regarding the type of AI system being used (for example, deterministic vs. generative) to facilitate meaningful comparisons and benchmarking. Share implementation insights and lessons learned, particularly regarding challenges and barriers encountered, to inform future efforts for the field.

“Being able to capture the lessons that our partners have identified, whether that's challenges that they encountered, barriers
or just lessons. And so I think success also means generating the evidence and learning that can inform future efforts, whether it's for your organization or just the field at large.” - Topaz Mukulu, Strategy Analyst, Patrick J McGovern Foundation grantmaking team

Improve digital & basic health infrastructure

There is a risk that implementing GenAI tools could further exacerbate the digital divide in low-resource contexts where digital infrastructure is unevenly distributed. While GenAI tools can increase demand for health services by empowering users with better information and decision-making support, this must be accompanied by corresponding investment in the supply side of service delivery.

“LMICs need to invest in digitizing care. Otherwise, we're not going to be able to take advantage of this” - Robert Korom, Chief Medical Officer, Penda Health

For Funders: Prioritize sustained investment in foundational healthcare systems alongside digital initiatives; both are essential for meaningful, equitable progress. Prioritize GenAI investments that complement existing healthcare systems.

For Implementers: Evaluate whether GenAI is the highest-impact way to address a given use case, considering existing healthcare and digital infrastructure. Assess organizational digital readiness before implementing AI tools, using tools such as the Global Digital Health Monitor.28 Design with maximal inclusivity in mind, considering how to reach individuals without smartphones and internet connectivity.

“Just because we have a hammer, we don't want to go out and think that everything's an AI nail” - Stanford Workshop

Improve language and localization

The effectiveness of GenAI tools in health behavior change hinges on their ability to communicate clearly and accurately across diverse languages and cultural contexts. However, the quality of models varies considerably by language, by medium (with voice particularly important for low-literacy settings) and by use case (e.g. health-specific contexts). There is a pressing need for ways to identify and close gaps in quality.

For Funders: Invest in the development of high-quality datasets for underserved languages, including region-specific dialects, culturally relevant health information, and voice data for low-literacy populations. Fund efforts to establish standardized measures to evaluate model performance across different languages and specific health contexts to ensure consistent quality.

For Implementers: Ensure LLM-generated content is accurate and culturally appropriate by involving local experts in the testing and training process. Test and train models on data relevant to the intended healthcare setting.

“There is potential for Gen AI to bridge that spoken language gap... the danger, though, is that most of AI has been trained on English, and is there enough written material in some of these other spoken languages to really exploit the possibility?” - Stanford Workshop

“I would hope more funders understand the need for building capacity locally to be able to tune and train models appropriately.” - Sathy Rajasekharan, Co-Executive Director of Jacaranda Health

Design for scale and consider shared infrastructure

Too often, promising digital health interventions stall after the pilot phase due to insufficient funding, inadequate infrastructure, or a lack of strategic planning for expansion.

For Funders: Structure funding to support pilots in achieving outcome data that will be needed to take an intervention to scale. Remain open to funding pilots initially supported by other funding bodies to enable sustained investment required for scale. Identify opportunities for centralized investment in shared infrastructure that multiple organizations can access and adapt. Encourage licensing and development of open-source models to maximize collective impact.

For Implementers: Design solutions for long-term sustainability from the outset. Prioritize establishing a clear evidence base for your intervention to support government decisions to implement at a national scale. Partner with governments and healthcare systems to embed GenAI tools into national strategies and service delivery models.

“There's also been a shift in funding, where initially there was a lot of small scale, innovation-based funding for pilots. The problem that is ever-present in the sector is that the whole purpose of a pilot is so that you can then ideally scale something that works well, but without the funding to do that, you're just left with a pilot. And so it was just a lot of, Look at how we were able to use
AI in the small use case, and then they kind of fizzle, or you can't maintain the systems because, of course, you need the money and so on. And we are seeing more scale funding.” - Elizabeth Shaughnessy, Director of Digital Programming & Co-Lead of AI Working Group, NetHope

TURN.IO: EXEMPLIFYING SHARED INFRASTRUCTURE SUPPORTING MULTIPLE ORGANIZATIONS

“We find the best implementing organizations, of which there are thousands in the world that are trying to have an impact in the Global South, and we try and provide them with the resources, the technology or the platforms and advice in order to deliver and to scale evidence based solutions in healthcare” - Gustav Praekelt, Co-Founder, Turn.io

Turn.io addresses the challenges of health service scalability through its GenAI-powered helpdesk, enabling organisations in LMICs to deploy chat-based solutions across the public health and low-cost private healthcare sectors. The system functions as a centralized platform that various health organizations can use and customize. This shared infrastructure approach reduces redundant technology development, contributing to the development of a shared “health commons” and enabling multiple organizations to scale their interventions efficiently. Their target market encompasses healthcare providers, NGOs, and government health services across the Global South seeking to scale their digital health engagement.

After successful pilots, the platform is now deployed by over 200 organizations, with 50 million users across deployments including 12 million new users in 2024. Multiple organizations using the platform are running RCTs with results expected in 2025. Successful deployments include:
Penda Health (Kenya): Penda Health is a primary healthcare provider, using the Turn.io helpdesk to enable remote care and increase access. Using the platform, they have scaled telemedicine delivery from 20-50 to over 1,500 interactions per month, with a 50% reduction in response time.
MomConnect (South Africa): MomConnect is a flagship initiative of the Department of Health in South Africa, providing interactive maternal health support. Through the Turn.io platform, they are answering an average of 40,000 maternal health questions per month.
Noora Health (India): Noora Health's digital companion powered by Turn.io supported 700,000 users in 2024. Please see our in-depth Noora Health case study for further information.

Confront societal biases and problems

GenAI tools have the potential to amplify and perpetuate societal biases embedded in their training data or influenced by the assumptions of developers, which must be accounted for throughout the design and implementation phase.

“When we talk about bias and algorithms, we're talking about very specific technical bias that is very difficult to mitigate in practice, because if you're using a package or tool off the shelf, it might already have the encoded bias. So it's important to acknowledge that we won't be able to fully understand the scope of what that bias might look like, but maybe we can see what the impacts will look like, and how to mitigate it from there, which also means human review is really important.” - Elizabeth Shaughnessy, Director of Digital Programming & Co-Lead of AI Working Group, NetHope

For Funders:
Fund research into ethical AI frameworks tailored to LMIC contexts.

For Implementers:
Develop algorithms that include checks against bias and discrimination.29
Use datasets that reflect diverse populations, geographies, and healthcare contexts to minimize bias.
Collaborate with local experts to include underrepresented perspectives, including gender-specific health needs.
Regularly audit training datasets for bias and inaccuracies.
Ensure transparency in development by clearly communicating methodologies, data sources, and bias mitigation strategies.
Comply with legal governance frameworks and Responsible AI guidelines where they are in place, and remain alert to the potential for rapid changes in the regulatory landscape.
Understand and monitor how adversarial actors are using GenAI to accomplish their goals, and establish appropriate mitigations.

DIMAGI'S OPEN CHAT STUDIO

“So our approach to getting involved in large language models was to build a platform, initially for ourselves, just as a way of being able to quickly spin up different chatbots and try to understand what it looked like to do different prompting and understand kind of the safety and controls of everything. So we put that together and then realized it might be useful for other people. So open sourced it and put it out as Open Chat Studio” - Brian DeRenzi, VP, Research and AI, Dimagi

Dimagi is a global social enterprise working to build and scale sustainable, high-impact digital solutions that amplify frontline work in healthcare and other sectors. Their Open Chat Studio (OCS) is an open-source platform to facilitate
the rapid prototyping, testing and deployment of LLM-based chatbots, democratizing access to GenAI technology and helping to ensure that its benefits are realised equitably by enabling developers to tailor interventions to local needs. The platform works with any LLM with an API, such as GPT-4, and can be deployed over the web as well as via mobile messaging apps including WhatsApp and Telegram. There are approximately 50 organizations currently onboarded with Open Chat Studio.

Innovative recent deployments (currently in early pilot testing) include collaborations with two established multimedia behavior change organizations: Shujaaz in Kenya and Réseau Africain de l'Éducation pour la Santé (RAES) in Senegal. Both organizations seek to enhance sexual and reproductive health (SRH) education and behavior change among adolescents through role-playing conversations, with chatbots emulating known characters from
comic books and TV series.

STANDING Together (STANdards for data Diversity, INclusivity and Generalisability), a partnership of over 30 academic, regulatory, policy, industry, and charitable organisations worldwide, has published recommendations to support transparency regarding limitations of health datasets and proactive evaluation of their effect across population groups, with the aim of reducing the risk of perpetuating existing biases and health inequalities when using AI technologies.30

Build local stakeholder and government buy-in

For GenAI health interventions to succeed, gaining the trust and engagement of local and national stakeholders (including patients, healthcare workers and policymakers) is essential, not only for ensuring the relevance and acceptability of these tools but also for enabling smoother implementation, scalability and long-term sustainability.

“I think scaling in low and middle income countries is challenging because success requires more than just a great product, but you're also looking at local buy-in. So that's one of the criteria that we're trying to understand when we talk to folks: who are they connected to? Are they working with governments? Are they working with Ministries of Health? Do they have those networks already?” - Topaz Mukulu, Strategy Analyst, Patrick J McGovern Foundation grantmaking team

“Ultimately, any digital health solution, be it AI enabled or non-AI enabled... At the most basic level, it relies on data generation, which happens by the people in the field, the Last Mile Health Worker, and if they are convinced that this technology is of no use to me in my daily work, they will not use it. And the best of technologies will fail... So the buy-in of the Last Mile Health Worker is critical, because that's where the real health outcome data health comes in” - Manish Pant, Policy Specialist, Digital Health, UNDP

For Funders:
Support alignment with national health strategies by funding projects that actively engage local policymakers and health system leaders.
Provide grants for community engagement efforts needed to develop successful interventions.

For Implementers:
Partner with local organizations and government agencies to ensure alignment with regional and national health priorities.
Demonstrate relevance and impact by ensuring evaluation metrics address specific local challenges.
Clearly communicate the capabilities, limitations, and safeguards of GenAI health tools.
Offer ongoing training and support for end-users to ensure sustained adoption.
Create channels for stakeholders to provide feedback and voice concerns during implementation.

Photo Source: Noora Health

CASE STUDIES IN HEALTH-RELATED BEHAVIOR CHANGE

Through our two roundtable events, in-depth qualitative interviews, analysis of key GenAI accelerator programs and survey, we have identified many promising projects utilizing GenAI for health-related behavior change in the pilot phase, including some that are already deployed to 10,000+ monthly users. As of late 2024, the only widely scaled application (to 100,000 or more monthly users) of GenAI in health-related behavior change for LMICs we were able to find is Jacaranda Health's PROMPTS. However, several others have imminent scaling plans, and a predictable path to fast scaling is apparent for GenAI pilots conducted as part of an established broader scaled system: for example, an existing helpdesk workflow with millions of total users that is now testing integrating GenAI for efficiency improvements.

We have identified sharing of learnings, including case studies demonstrating implementation principles and evaluation processes, as an important process to accelerate progress in the field. We therefore present five exemplar case studies which have promising preliminary impact data, all of which are planning further evaluation in 2025 with a move to greater scaling. These projects also demonstrate various key principles and risk mitigation approaches detailed in our recommended framework.

Given the nascency of the field, most projects are still in early phases regarding outcome data specific to GenAI integration. Where possible, we have presented outcome data on:
Scale of deployment
Evaluation of LLM performance, for example accuracy and completeness of LLM responses, as well as qualitative factors such as understandability, empathy, and appropriateness of tone and style
Health impact: intended or actual health-related behavior change (largely not yet available)
Cost-effectiveness analysis (largely not yet available).

We found that in terms of describing scale, there is currently not a consistent set of metrics being used by implementers, and it is often difficult to separate out the impact related to non-AI, deterministic AI, and generative AI elements of a given use case. As highlighted in our key recommendations section, a move towards consistency with this going forward, whilst acknowledging unique applications and individualized requirements, would facilitate more meaningful comparisons between projects. Evaluation of cost-effectiveness will be an important focus in upcoming evaluation frameworks.

CASE STUDY: Jacaranda Health: PROMPTS
TARGET AUDIENCE: Pregnant and postpartum mothers in Sub-Saharan Africa.
USE CASE: Direct-to-consumer SMS messaging for maternal and newborn health.
DEPLOYMENT STATUS: Scaled: 526,000 users of GenAI-enabled platform in Kenya in 2024. 3 million cumulative users in Kenya inclusive of pre-GenAI models.
BENEFITS: Improved knowledge of pregnancy/postpartum danger signs and uptake of maternal health services; improved government accountability to improve services.
GENAI OUTCOME DATA: Significant improvement in average response times for users (10-15 mins vs. 2-4 days); capacity to respond to 10,000+ incoming questions a day from mothers.
FUTURE PLANS: Improved personalization at scale; clinical efficiency and cost-effectiveness analyses; further platform expansion in Sub-Saharan Africa.

CASE STUDY: Viamo: Ask Viamo Anything (AVA)
TARGET AUDIENCE: No-/low-literacy, underserved populations using non-internet phones in Sub-Saharan Africa and Asia.
USE CASE: Direct-to-consumer, voice-based query responses accessed by basic phone call, for populations without internet access.
DEPLOYMENT STATUS: Late pilot: 32,000 users in Zambia, DR Congo, Nigeria, Botswana, Tanzania and Pakistan.
BENEFITS: Increased access to healthcare services; stigma-free information delivery; reaching underserved populations.
GENAI OUTCOME DATA: 94% response listening rate; 59% female users; high engagement in rural areas; high self-reported behavior change.
FUTURE PLANS: Scale to Viamo's 14 other countries and 27 million existing non-AI users; increased localization; traceable referrals to health product and service providers.

CASE STUDY: Girl Effect: Big Sis & Bol Behen
TARGET AUDIENCE: Adolescent girls and young women in Sub-Saharan Africa and South Asia.
USE CASE: Direct-to-consumer chatbots for youth SRH education.
DEPLOYMENT STATUS: Over 75,000 users in India routed to content via LLMs. Successful early pilot of GenAI chatbot in South Africa (4,000 users) has led to scale-up phase. Deterministic chatbots active at scale, with over a million conversations initiated since 2018.
BENEFITS: Improved user knowledge and agency; increased SRH service uptake.
GENAI OUTCOME DATA: A/B testing of GenAI service demonstrated a significant increase in key message consumption and service access; 104% deep content consumption increase in India.
FUTURE PLANS: Further A/B testing to assess impact of GenAI integration on user engagement, retention and behavior change; potential RCT evaluation of chatbots.

CASE STUDY: Audere: Self-Care from Anywhere
TARGET AUDIENCE: High-stigma populations at risk for HIV.
USE CASE: Direct-to-consumer conversational AI for HIV, SRH, GBV education, counseling, and linkage to care.
DEPLOYMENT STATUS: Early pilot; scaling studies planned for 2025 in South Africa and Zimbabwe.
BENEFITS: Improved sexual and reproductive health education; stigma-free access to HIV testing, prevention or confirmatory care options; improving efficiency and efficacy of clinical follow-up.
GENAI OUTCOME DATA: Greater than 90% usability, acceptability and appropriateness of the AI Companion; qualitative data demonstrated success in building user trust and comfort in addressing sensitive topics, and gathering more honest risk information.
FUTURE PLANS: Three scaling studies planned for 2025, targeting vulnerable populations for HIV counseling, prevention and treatment support.

CASE STUDY: Noora Health: Remote Engagement Service query classifier
TARGET AUDIENCE: Caregivers in India, Bangladesh, and Indonesia.
USE CASE: Direct-to-provider query classification for clinical teams.
DEPLOYMENT STATUS: Classification system in use by a 20-person clinical team processing 10,000 messages/day.
BENEFITS: Operational efficiency.
GENAI OUTCOME DATA: 80% reduction in nurse-reviewed queries; low false negative rate.
FUTURE PLANS: Partnership with local organizations to improve vernacular language coverage and reduce misclassifications.

CASE STUDY 1: JACARANDA HEALTH: PROMPTS

“We realized very early on, quite accidentally, that if you send messages and there's a free way to respond, moms start asking questions seeded by the messages you're sending, but they have a ton of questions, and then we realized very quickly that we need a way to answer those questions in an efficient way.” - Sathy Rajasekharan, Co-Executive Director of Jacaranda Health

What is the GenAI use case?
Use Case Category: Direct-to-consumer (human-in-the-loop)
Health Area: Maternal, Newborn and Child Health (MNCH)
PROMPTS is a two-way SMS service designed to promote positive care-seeking behaviors amongst new and expectant mothers through timely health information and support throughout the pregnancy and postpartum journey. Responses are generated by Jacaranda's customized LLM, UlizaLlama, which is based on Meta's Llama 2 and fine-tuned for use in Swahili and English. Launched in October 2023, UlizaLlama is the first free-to-use Swahili LLM, and since August 2024 has been further fine-tuned for other African languages. As of October 2024, GenAI is integrated as standard on the platform.

What tasks are LLMs being used for?
LLM tasks:
Summarization: Condensing health guidance into easy-to-understand SMS messages.
Classification: Identifying high-risk users based on message content and flagging them for escalation.
Extraction: Extracting key information from user conversational history and clinical information to enable personalized risk profiles.
Translation: Handling mixed-language queries, local vernacular and slang; presenting technical medical information in accessible, patient-friendly language.
Conversation: Personalized, two-way communication with users in Swahili and English.

Designing for inclusivity:
Offline accessibility: SMS-based model ensures accessibility for users without smartphones or internet connectivity.
Local adaptation: UlizaLlama supports multiple African languages, ensuring responses are culturally and linguistically relevant.
Community trust-building: Strong government partnerships and local tech and helpdesk teams enhance credibility and engagement.

Mitigating risks:
All messages flagged as high-risk are escalated straight to the helpdesk for a human response (from a trained clinical nurse on the PROMPTS helpdesk).
For routine queries, a second LLM audits UlizaLlama-generated responses for correct grammar, clarity and coherence, and medical accuracy. Responses scoring 85% or more are sent directly to the user (this threshold was derived from a previous system assessing responses of human helpdesk agents).
Responses that fail the audit process are sent to human agents for review.
Users who flag query answers as unsatisfactory are connected to the human helpdesk.

What is the current deployment status?
Approximately 526,000 unique users in Kenya engaged with the GenAI-enabled platform in 2024. Around 10,000 questions are answered per day, with 70% directly answered by GenAI (generated response passed audit), and the remainder referred to human agents for review (generated response failed audit) or answered directly by humans (high-risk cases bypassed automated response).

“The generative tool does not answer questions related to the miscarriage, or significant trauma; anything like that goes straight to a human being, but they need time to be able to answer that. So by taking away all the other stuff, they're now able to focus on that... Staff are much happier not answering backlog questions and focusing
on significant problems.” - Sathy Rajasekharan, Co-Executive Director of Jacaranda Health

How widely deployed could it be over time?
Target market is new and expectant mothers in Kenya and potentially across Sub-Saharan Africa. Sub-Saharan Africa is home to over 250 million women of reproductive age.
In total, over 3 million people have used the PROMPTS platform since its launch in 2017.
Work has begun on expansion into Ghana, Nigeria, Eswatini, and Nepal.
Work is underway to make the multilingual UlizaLlama LLM specific to the maternal and newborn health domain for other African languages, including Hausa,
Yoruba, Xhosa, and Zulu.

How are they measuring success?
LLM Performance:
Evaluation of LLM responses for medical accuracy and appropriateness, personability, and simplicity: UlizaLlama outperformed the top-rated off-the-shelf LLMs by approximately 14% on overall scores. For example, off-the-shelf models use more complex words and medical jargon, which is not appropriate for the PROMPTS audience (as per reading level estimates).
Response time to user queries: Average response times decreased from approximately 5 hours in September 2024 to less than 15 minutes in December 2024. With GenAI, 70% of queries receive an instant response.
Language-specific outcomes, assessed using established NLP evaluation tools such as BLEU (Bilingual Evaluation Understudy).

Health impact (data specific to GenAI integration forthcoming; the figures below predate GenAI integration):
20% increase in mothers attending 4+ prenatal care visits.
100% increase in uptake of postpartum family planning services.
89% of users exclusively breastfed for the first six months post-delivery.

Cost-Effectiveness:
GenAI adds an estimated $0.10 to cost per user ($0.74 per user for duration of use prior to GenAI integration).
The principal cost driver is SMS costs, at around $0.40 per user. Enrollment costs per user are $0.12-0.20, depending on rural or urban location, with remaining costs operational (e.g. helpdesk team, field agents).

Future measurement plans:
Developing a framework for efficiency of maternal care, to evaluate the cost-effectiveness of PROMPTS
in terms of appropriate care-seeking.

What's been instrumental for PROMPTS to enable scaling?

“This is a topic that's come up a lot in the last couple of weeks - what's the secret sauce? The reality is this series of years of just battling. We started off as a small pilot, so I never have a problem with pilots; there's a thing people say: 'Oh, we're tired of pilots.' And the problem with that is it assumes you can innovate without having pilots, which is not possible. All good things emerge from some idea that needs iteration. I think enablers for our scale have been:
A relatively lean platform. Right from the start, we have always focused on the most efficient way to deliver the service for impact. So continuously, we've looked at the impact and then said, what can we pare down to maintain that impact?
Partnership with the government has been key - we're partnering directly with the Ministry of Health in Kenya that builds trust with the enrolled at facilities. This probably wouldn't have worked as well if we did some sort of big mass media campaign to enroll people, because there's a lot of fluff out there, and you probably wouldn't reach the right audience.
Continuous learning has been helpful, because we're able to pivot to things like using AI as part of a solution. As we scale from 1,000 moms to 100,000 to a million moms, very different prospect in terms of technology and operations that you need.
Team is a huge piece of the scaling thing that I think gets discounted a lot because everyone wants a model that they
could say, 'oh, here's how you replicate.' But I think the people piece is probably the biggest one. Our entire tech team is Kenyan. We've benefited a lot from smart partnerships where we've had experts come in and guide the team, but by and large, they're the ones who've done and built everything, and I think that's been instrumental as well. So this isn't a project. This is our business. This is what we do, and it's healthier for us to have a local team building solutions for a local problem... Finding the right people for your organization makes or breaks it.” - Sathy Rajasekharan, Co-Executive Director of Jacaranda Health

Photo Source: Jacaranda Health

CASE STUDY 2: VIAMO: ASK VIAMO ANYTHING (AVA)

“If you want to reach economically disadvantaged people in low- and middle-income countries, most don't have smartphones, or don't have access to the internet, and literacy is an issue. You need to engage disadvantaged groups on the device they already have in their pocket.” - Stephen Meyer, Director of Partnerships at Viamo

What is the GenAI use case?
Viamo has two GenAI products which have been deployed in 9 countries and 5 languages in Africa and Asia: Ask Viamo Anything (AVA) and Ask An Expert (AAE). We focus on AVA as an example of a tool with exciting potential to drive health-related behavior change.

“In short, it's ChatGPT for offline audiences on a phone call.” - David McAfee, CEO, Viamo

Use Case Category: Direct-to-consumer (fully autonomous)
Health Area: Health Systems Strengthening (HSS)
Ask Viamo Anything (AVA) is a voice-based GenAI system enabling conversational interactions, designed for users with basic, non-internet-enabled phones. The platform converts user calls into text files to input to the LLM (GPT-4 or others depending on language), and converts the LLM text responses to voice files to return to users. This approach tackles two key barriers: (1) low literacy, which hinders use of text-based solutions; and (2) digital divide challenges in LMIC contexts, by enabling offline access to reliable, context-specific information.
During a 2024 pilot in Zambia, approximately 30% of a total of 570,000 questions from 32,000 users were health-related, addressing topics including HIV prevention, maternal health, and mental health. Responses to health-related queries aim to effect meaningful behavior change, such as attending a healthcare clinic to access essential care.

“People are asking questions strongly around highly stigmatized topics... We suspect that our users have maybe never had an honest and direct and open conversation about HIV before, and now they have the opportunity to have this conversation from the privacy of their own phone, where they don't have to discuss with their auntie or their health worker or their partner, and they just ask all these nitty-gritty questions and get answers.” - Stephen Meyer, Director of Partnerships at Viamo

What tasks are LLMs being used for?
Summarization: Condensing complex health information into accessible query responses.
Classification: Categorizing user queries by topic to map to relevant responses.
Extraction: Extracting relevant details from user queries to provide tailored responses.
Translation: Supporting conversations in multiple core languages including English, French, Swahili, Portuguese, Urdu, Arabic; adapting to local vernacular and cultural nuances; converting between voice and text formats.
Conversation: Facilitating tailored, empathetic interactions on sensitive and stigmatized health topics.

Designing for inclusivity:
A key focus of AVA is to reach underserved populations (e.g. women in rural areas; non-literate users), achieved by:
Voice-based interface: Responses delivered by phone call to tackle challenges of low literacy.
Offline accessibility: Designed for non-smartphone users without access to internet connectivity.
Localized content: Responses are tailored to users' cultural norms.

Photo Source: Viamo

Mitigating risks:
Early auditing of a large sample of health-related questions with Harvard Global Health found that all responses given were accurate, with room for improvement in localization and referrals.
This resulted in modifications to prompt engineering and in-country partnership strategy, and improved use of ChatGPT's moderations API.
HITL processes are no longer in place.

What is the current deployment status?
Late pilot: pilot testing with 32,000 users in Zambia and smaller pilots in DR Congo, Nigeria, Botswana, Tanzania and Pakistan conducted in 2024.
On average, users called in 7.7 times per month.

How widely deployed could it be over time?
Target audiences are underserved populations including those who lack internet access (2.7 billion people globally), use non-smartphone devices, and/or have low literacy.
Viamo now has 27 million users on the basic, non-GenAI platform, in 68 languages across 19 countries.

How are they measuring success?
LLM performance: Evaluation of LLM outputs for accuracy, cultural appropriateness, and empathy.
User engagement metrics: 94% of users listened to complete responses in pilot testing; engagement across different demographics: 59% of users were female; 74% were aged 24 or under; high engagement in rural areas.
Health outcomes:
Key health outcomes
of interest (formal evaluation data forthcoming):
Self-reported behavior change
Verified behavior change (such as redeeming a coupon, connecting to a partner call center, attending an appointment).

Cost-effectiveness:
Viamo has negotiated zero-rating agreements with telecommunications companies (Airtel, MTN, Vodafone, Orange, and more) to eliminate the airtime costs. Telecoms contributed over $200M in airtime in 2024.
Viamo has so far secured free product credits for all GenAI tech, and will explore either zero-rating or open-source options at scale.
The resulting cost-per-engagement is a few cents, and will continue to decrease with scaling.

Future measurement plans:
The non-GenAI version of the Viamo platform has traceably connected users directly to health services (family planning and HIV appointments, health call centres, and many non-health services). Viamo is interested in understanding if AVA is a more powerful tool to create these traceable outcomes.

CASE STUDY 3: GIRL EFFECT: BIG SIS AND BOL BEHEN

“The role of our AI-enhanced chatbots is largely around building space for young people to be able to ask questions, and they're in a contemplation phase in making a choice about their sexual health... we have discovered in every market that they lack a judgment-free space to contemplate these decisions.” - Karina Rios Michel, Chief Creative and Technology Officer, Girl Effect

What is the GenAI use case?
Use Case Category: Direct-to-consumer (human-in-the-loop)
Health Areas: Sexual and Reproductive Health (SRH); Mental Health
Girl Effect is a nonprofit organization dedicated to empowering adolescent girls globally by providing them with the tools, information, and support they need to overcome societal barriers, including those relating to accessing healthcare, focussing within the domains of sexual health, economic empowerment, education and mental health. They use social behavior change methods through multi-channel, multi-product programs delivered via radio, TV, social media, and community activations to promote agency of girls and young women.
Central to Girl Effect's strategy are chatbots designed to motivate and inspire young people to take charge of their health and access relevant health services. These chatbots have previously used deterministic, classification-based BERT (Bidirectional Encoder Representations from Transformers) AI models to map user queries to pre-curated responses. In 2024, Girl Effect began integrating GenAI to deliver more personalized, dynamic responses, piloting a GenAI version of their South African chatbot Big Sis, and enhancing their Indian chatbot Bol Behen with LLM-powered content classification. GenAI integration is supported by HITL safeguards, enabling escalation of high-risk cases.

“The advantages of GenAI... we're using it for a much more complex understanding of what users want. So we're kind of seeing it as a way to understand our users better, and how we can use that power to deliver our services better.” - Soma Mitra-Behura, Lead AI Researcher, Girl Effect

What tasks are LLMs being used for?
Summarization: Condensing health guidance into easy-to-understand SMS messages.
Classification: Classifying user inputs according to topic to provide relevant responses; identifying high-risk cases for escalation to human supervisors.
Extraction: Retrieving relevant details from user queries to provide tailored responses.
Translation: Handling mixed-code languages such as Hinglish (Hindi-English) and Sheng (Swahili-English), and adapting to local/youth vernaculars and slang.
Conversation: Natural, human-like interactions with users on sensitive topics, emulating a trusted big sister persona.

Designing for inclusivity:
Youth-friendly design: chatbots emulate a trusted big sister persona to encourage rapport.
Cultural relevance: chatbots adapt to local slang and code-mixed languages, ensuring contextual authenticity.
Community co-creation: target users are engaged throughout, ensuring content delivery reflects their needs and concerns.

Mitigating risks:
LLM classification flags high-risk disclosures for human review, and automatically directs users to emergency or professional support services.
LLM outputs are limited to predefined and vetted content boundaries.
Bespoke LLM evaluation framework has
317、 been built for Girl Effect,measuring safety,accuracy,relevance and tone.Unsupervised user engagement could only commence once the framework demonstrated a suffi ciently high pass rate.What is the current deployment status?Early pilot of generative chatbot(Big Sis)in South Africa:Supervised co-creat
318、ion and alpha testing of the GenAI prototype were conducted in South Africa in August 2024 to assess user trust and barriers to GenAI engagement. An unsupervised beta pilot ran from December 2024 to January 2025 in South Africa, reaching 4,000 users and generating over 11,000 responses. A/B testing com
319、pared the GenAI-enabled chatbot to the deterministic model, with users rating the GenAI chatbot as significantly more supportive and trustworthy. Results showed users were significantly more likely to engage with key messaging content and access service information; GenAI implementation will now be s
320、caled. Generative classification system is active in India: Over 75,000 users have had their questions routed to content by LLMs. 88,000 user submissions were successfully routed to key messaging. Photo Source: Girl Effect. How widely deployed could it be over time? Target users are 18- to 24-year-olds wit
321、h high unmet needs in sexual and reproductive health and mental health support. Across South Africa, Kenya, and India, Girl Effect's deterministic AI chatbots engage approximately 1.4 million users, with 110,000 monthly active users and 80,000 daily messages received. Potential market reach is estimated at
322、 110 million users across these markets, with expansion plans in Nigeria. How are they measuring success? LLM evaluation: User engagement: high levels of engagement in supervised testing of the generative chatbot in South Africa. User experience: qualitative feedback is positive on chatbot tone
323、 and relevance of responses. Evaluation framework: LLM tests the safety, accuracy, reliability and tone of the GenAI answers. A/B testing comparing the efficacy of GenAI and non-GenAI experiences. Health outcomes: Increased consumption of content previously demonstrated to be successful in achieving behavi
324、or change: the generative classification system in India has achieved a 104% increase in repeat engagement with key messaging, and increased breadth of content consumed. A/B testing in South Africa found a 300% increase in engagement among users of the GenAI model compared to the legacy model. GenAI user
325、s were 12% more likely to access service information in the chatbot and 11% more likely to engage with key messaging compared to the control group. Key health outcomes of interest (demonstrated for pre-GenAI models; formal GenAI evaluation data forthcoming): Increase in awareness and knowledge of contracep
326、tive methods, STI prevention and mental health coping mechanisms. Increase in intention to use and actual usage of contraceptives among adolescent girls and young women (AGYW). Increase in AGYW intending to get tested and regularly getting tested for HIV. Cost-effectiveness: Girl Effect is assessing the c
327、ost-effectiveness of their Kenyan chatbot by analyzing costs in relation to contraceptive needs and service uptake conversions. Research by the Guttmacher Institute shows each dollar spent on contraceptive services for adolescents in Kenya saves $2.71 in maternal and newborn healthcare costs; fully add
328、ressing the contraceptive needs of adolescents in Kenya could reduce pregnancy-related healthcare expenses by $46 million. For Girl Effect's non-AI-enabled chatbot WAZZII, the estimated cost to encourage a young person to take up a health service was $28 per user. Four-week beta testing of Big Sis in South
329、 Africa cost approximately $80 for the duration of the test ($0.007 per question), indicating that using GenAI could remain cost-efficient. Future measurement plans: Analysis of A/B testing results to determine the impact of GenAI integration on user uptake of topic recommendations, user engagement and r
330、etention, and user behavior change. Additional A/B testing will explore the impact of other system variables: allowing users to have a conversation history of three or more messages, different editorial approaches, and additional layers of natural language understanding before generating answers. An RCT is planned in Kenya
331、 in partnership with the World Bank; exploring potential for an additional RCT in South Africa. Photo Source: Audere. “Sometimes when people visit a clinic, it's really hard for clinicians to remain fully empathetic and not have some of the questions come off as abrasive, because they just don't have time to
332、 slowly gather those data. They have to ask these really personal questions... And the individuals can feel stigmatized for their answers and just don't want to answer while they're looking somebody in the eye.” - Shawna Cooper, Director of Product, Audere. What is the GenAI use case? Use case category: Direct-to-
333、consumer (human-in-the-loop). Health Areas: Communicable Diseases; Sexual and Reproductive Health. Audere is a nonprofit organization using GenAI to enhance HIV prevention and counseling services in South Africa. Their Self-Care from Anywhere program was co-created with local community partners and SHOUT-IT-NOW, a South African nonprofit providing youth-focused HIV prevention and sexual health services. P