Creating trustworthy open data for scientific discovery
New York Scientific Data Summit 2024: Addressing Data Challenges in Digital Twins
New York City, New York, September 16, 2024
Grace C.Y. Peng, PhD

Sculley et al. (2015): Hidden technical debt in machine learning systems. doi:10.5555/2969442.2969519

December 6, 2019 ACD AI WG Report: https://acd.od.nih.gov/documents/presentations/12132019AI_FinalReport.pdf
December 13, 2019 ACD presentation: https://acd.od.nih.gov/documents/presentations/12132019AI.pdf

The NIH Bridge2AI Program
Supported by the NIH Common Fund
- Co-Chairs: Michael Chiang, Eric Green, Helene Langevin, Steve Sherry, Bruce Tromberg
- Common Fund Program Leader: Haluk Resat
- Common Fund Program Officers: Chris Kinsinger, George Papanicolaou
- Federal Working Group (100+ members): CC, CIT, FIC, NCATS, NCI, NCCIH, NEI, NHGRI, NIA, NIAID, NICHD, NIBIB, NIDA, NIDDK, NIAMS, NIGMS, NIMHD, NINDS, NLM
- Other Federal Agencies: DARPA, DOE, FDA, NIST, NSF

Bridge2AI Program Management Team
- Working Group Coordinators: James Gao (NEI), Lanay Mudd (NCCIH), Grace Peng (NIBIB), Shurjo Sen (NHGRI)
- Common Fund Staff: Natalie Vineyard (Comm), David Dzamashvili (Ops), Karen Kellton (Prog Mgmt), Kristina Faulk (Prog Coord)
- Awards Management: Kristen Kreuter (DOTM), Erna Petrich (DOTM)

DATA: Diverse, FAIR, AI-ready
ETHICS: Accurate, Reliable, Ethically-sourced
PEOPLE: Diverse teams & research cohorts, Training

Bridge to Artificial Intelligence (Bridge2AI)
Vision: To propel biomedical and behavioral research forward by setting the stage for widespread use of artificial intelligence (AI) technologies.
Goals:
- Use biomedical and behavioral research grand challenges to generate flagship datasets
- Prepare AI/ML-friendly data
- Prioritize ethical best practices
- Promote diverse perspectives

Figure: Scientific Discovery Pipeline. Stages shown: Data Acquisition, Data Preparation, Public Data Repos, AI/ML Model Development, Model Evaluation, Model-Driven Experimental Design, Biomedical & Behavioral Science Discovery. Also shown: Teaming, Ethics, Standards, Tools, Skills & Workforce Development.

Grand Challenges: Data Generation Projects
- Clinical Care: Using imaging, clinical, and other data collected in an ICU setting for diagnosis and risk prediction
- Precision Public Health: Using voice as a biomarker for human health, revealing how genomic variation, human development, behavioral, and environmental factors affect individual and population health
- Salutogenesis (Return to Health): Uncovering the details of how human health is restored after disease, using type 2 diabetes as a model
- Functional Genomics: Mapping the spatiotemporal architecture of human cells to interpret cell structure/function in health and disease

From Vision to Deliverables
- Grand Challenge: Motivation
- People, Ethics, Data Preparation: Processes
- Stakeholder Testing on DATA: Feedback
Deliverables: Data Sheets, Criteria for ML-Friendly Data, Model Cards, Data Access Standards, Consent Standards, Ethical Principles, Curricula, Flagship Data Generation

Bridge2AI: Generating ethically sourced data and best practices
Chen, Clayton, Novak, Anders, Malin. Human-Centered Design to Address Biases in Artificial Intelligence. JMIR. 2022.
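Data sheets appear in the deliverables list, but the slides do not give a schema for them. As a hypothetical sketch only, the Python below assumes the seven question categories from Gebru et al.'s "Datasheets for Datasets" (motivation, composition, collection process, preprocessing/cleaning/labeling, uses, distribution, maintenance) and shows how a team might flag incomplete datasheets before a data release. The class `Datasheet`, its field names, and the example dataset name are illustrative assumptions, not a Bridge2AI format.

```python
from dataclasses import dataclass, fields


# Hypothetical datasheet record. Section names follow the question categories in
# "Datasheets for Datasets" (Gebru et al.); this is NOT a Bridge2AI schema.
@dataclass
class Datasheet:
    dataset_name: str
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""  # preprocessing / cleaning / labeling
    uses: str = ""
    distribution: str = ""
    maintenance: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are empty or whitespace-only."""
        return [
            f.name
            for f in fields(self)
            if f.name != "dataset_name" and not getattr(self, f.name).strip()
        ]


if __name__ == "__main__":
    sheet = Datasheet(
        dataset_name="example-dataset",  # hypothetical name
        motivation="Generate a flagship dataset for an AI grand challenge.",
        composition="De-identified waveforms with linked clinical data.",
    )
    # Lists the sections still to be written before release.
    print(sheet.missing_sections())
    # → ['collection_process', 'preprocessing', 'uses', 'distribution', 'maintenance']
```

Keeping each section as a free-text field mirrors the question-driven style of datasheets; a production pipeline would more likely validate against a published, machine-readable schema.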
Bridge2AI is supported by NIH U54 HG012510, U54 HG012513, U54 HG012517, OT2 OD032720, OT2 OD032742, OT2 OD032644, OT2 OD032701.

Flagship datasets by grand challenge (Precision Public Health, Salutogenesis, Functional Genomics, Clinical Care), spanning EHR/clinical, survey, imaging, sensor-based, omics, and waveform data.

Precision Public Health
A database of 10,000 diverse bioacoustic waveforms is being built to establish voice biomarkers in mental health, respiratory, neurological, and other areas.
Data collected:
- EHR/clinical: Demographics; diagnosis (ICD); severity of disease; treatment information; social history (smoking, alcohol)
- Surveys: 12 validated questionnaires (e.g., MoCA, GAD-7, VHI-10, PANAS, DI)
- Imaging: Brain MRI/CTs; chest/neck CTs; laryngoscopy
- Omics: Whole genome sequencing
- Waveform: Bioacoustic data from voice and non-voice sound tasks, shared as waveforms, Mel spectrograms, and features
Standards: EHR/clinical and surveys: OMOP; brain imaging: DICOM; laryngoscopy: MP4; genomics: CRAM & VCFs with metadata; waveforms: Waveform Database (WFDB); a new standard for bioacoustics is being created.

Salutogenesis
Creating a temporal atlas from 3,000 individuals, spanning pathogenesis and salutogenesis, to expand applications of AI in clinical care, focusing on type 2 diabetes.
Data collected:
- EHR/clinical: Demographics, SDoH; diet; social history; lab tests (blood, urine); monofilament test; physical assessment; medications; vision testing
- Surveys: Multiple validated self-reporting surveys (CES-D, PAID-5, etc.)
- Imaging: Retinal imaging (undilated/dilated fundus photography, pupillary dilation, FLIO, optical coherence tomography (OCT), OCT angiography)
- Sensor-based: Continuous glucose monitoring (CGM); physical activity monitoring (heart rate, steps, sleep phases); environmental sensors (air quality and particulate measures, temperature)
- Omics: Whole genome sequencing
- Waveform: Electrocardiogram (ECG)
Standards: EHR/clinical and surveys: OMOP, LOINC; imaging: DICOM; CGM and physical activity: Open mHealth; air: Earth Science Data Spec; genomics: CRAM & VCFs with metadata; waveforms: Waveform Database (WFDB).

Clinical Care
Establishing a set of 100,000 patients from 14 ICU sites across the United States to improve recovery from acute illnesses.
Data collected:
- EHR/clinical: Demographics, SDoH; clinical notes; lab tests; medications; encounters; procedures
- Imaging: All imaging acquired in the ICU setting and captured in PACS (MR, CT, US, x-ray)
- Waveform: Physiological data (ECG; electroencephalogram, EEG)
Standards: EHR/clinical: OMOP, LOINC; imaging: DICOM; waveforms: Waveform Database (WFDB).

Functional Genomics
Creating a library of large-scale maps of cellular structure, function, and disease contexts using cell lines. 200 genes/proteins are the subject of coordinated experiments in three modalities:
- Immunofluorescence imaging for cell imaging
- Proteomic mass spectrometry
- CRISPR perturbation with scRNA-seq
Outputs: datasets and cell maps.
Standards: cell imaging: RO-Crate with 4-channel JPEG (red, green, blue, yellow) and metadata; mass spectrometry: RO-Crate with TSV and metadata; CRISPR: RO-Crate with h5ad file and metadata; cell maps: RO-Crate with Cytoscape CX and metadata.

Cloud environments: Microsoft Azure (three projects) and Google Cloud (one project).

Bridge2AI: Lessons Learned So Far

What makes Bridge2AI challenging?
Biomedical research | AI/ML
Humans | Machines
Inferred knowledge | Explicit knowledge
Heterogeneous, messy data | "Complete" data
Non-standardized processes | Standardized algorithms
Validation through scientific method | Training
Diversity | Bias
Open culture of sharing | Closed culture of security

Our goal: Propel scientific discovery

Ethical Challenges for Open Science
- Biases: Issues related to inherent biases of the data
- Informed consent: Going beyond a legal consent form. How do we ensure consent given the evolving landscape of AI/ML?
- Re-identification: Navigating the risk of re-identification with multimodal data
- Unauthorized use: How do we prevent unauthorized secondary use?

People Challenges
Teaming & collaboration:
- Multidisciplinary teams
- Cross-consortium collaboration
- Community engagement committees
- Diverse cohorts for data collection
- Consent & privacy
- Legal issues
- Sovereignty issues
AI/ML training needs:
- Computational science training on the ethical, legal, and social implications
- New material with use cases
- Training for non-computational scientists (e.g., clinicians, physician scientists)
- Hands-on training

Lessons Learned
- Program vision & goals: Promote them repeatedly, continuously, and consistently
- Governance: Create an iterative governance structure that adapts to changing needs
- Iterative AI model building and evaluation: While data and best practices are still being created
- Synchronized stakeholders: Partner with each team equitably from the outset
- Sustainability plan: Plan for data storage, access, distribution, and sovereignty from the outset

Other NIH Programs Supporting Trustworthy Data for Open Science
Major bias sources: data collection, data preparation, model development, model evaluation, model deployment (https://www.midrc.org)
Bias Awareness Tool: Diversity Calculator
Trustworthy open data requires understanding dependencies! https:/

2024 IMAG MSM Consortium Meeting
Setting up TEAMS for Biomedical Digital Twins (Teaming4BDT)
September 30 to October 2, 2024, Bethesda, Maryland
Register on the IMAG Wiki. In-person and online, open to all!
Organized and hosted by the Interagency Modeling and Analysis Group (IMAG) and the Multiscale Modeling (MSM) Consortium.

Day 1: Defining Biomedical Digital Twins (BDT)
- Goal 1: To understand the NASEM digital twin components
- Goal 2: To identify unique features of digital twins in the biomedical domain (BDT)
- Create a requirements template for BDT
Day 2: Approaches to address BDT challenges
- Goal 1: To understand the challenges unique to developing BDT
- Goal 2: To discuss needs with experts and compile BDT component resources
- Create a review template for BDT
Day 3: Operationalizing team science for BDT
- Goal 1: To form BDT idea teams guided by team science approaches
- Goal 2: To present and review realizable, fit-for-purpose BDT ideas
- Use the consensus requirements and review templates developed on Days 1 and 2

Special thanks to NSF for providing travel awards.
Special thanks to the Society for Mathematical Biology for providing refreshments.