Efficiency Challenges in Genomics.pdf

AI Hardware & Systems

Efficiency Challenges in Genomics
Tom Sheffler

Preface
- The goal is to give insight into the characteristics of genomics computations
- Explain AI/ML on the edge for DNA processing
- Challenges in AI/ML for genomics (from the real world)

Genomics Applications: why does it matter?
- Cancer screening
  - identify DNA changes that increase a person's risk
  - guide selection of therapies
- Whole genome sequencing for newborns (Wash Post 2018)
  - 6 days old, severe seizures
  - 39 hours to sequence the whole genome
  - simple treatment identified
  https:/…

Rapidly decreasing cost, increasing data and computation
Cost for WGS (Whole Genome Sequencing):
- $300K in 2006
- $1000 in 2020
- $100 with the Ultima UG 100, Jan 2024
https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost

Introduction to Sequencing

Sequencing Workflow and Analysis
[Figure: Extraction -> Library Prep -> Pooling -> Sequencing -> Analysis Pipeline (primary analysis, demux, then consensus and variant calling, x N)]

Extraction
[Figure: DNA sequence ATGCTACG]

Library Preparation
[Figure: template, fragmented DNA, adapter ligation onto a fragment, sequencing library; example adapter "ACAC"]

Pooling
[Figure: pool libraries]

Sequencing
[Figure: pool -> sequencer -> data]
- 100 GB to 1 TB+ of data
- 12 to 48 hours
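
The sequencing figures quoted above (100 GB to 1 TB+ over 12 to 48 hours) imply an average data rate that the downstream pipeline has to absorb. A rough back-of-the-envelope sketch follows. It assumes decimal units (1 GB = 10^9 bytes) and data emitted evenly over the run, neither of which the slides state, and the function name is hypothetical.

```python
def sustained_rate_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average data rate in MB/s for a run emitting total_bytes over `hours`.

    Assumes the data is produced evenly across the run (an assumption,
    not something the slides state).
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_bytes / (hours * 3600) / 1e6


GB = 1e9   # decimal gigabyte, assumed
TB = 1e12  # decimal terabyte, assumed

# Smallest output spread over the longest run, and the reverse.
low_rate = sustained_rate_mb_per_s(100 * GB, 48)
high_rate = sustained_rate_mb_per_s(1 * TB, 12)

print(f"low:  {low_rate:.2f} MB/s")
print(f"high: {high_rate:.2f} MB/s")
```

Under these assumptions the average rate ranges from well under 1 MB/s to a few tens of MB/s. Real instruments may emit data in bursts, so peak rates can be higher than these averages.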

Data Pipeline
Inherent data parallelism, potential streaming parallelism
[Figure: sensor -> primary analysis -> base calls (500 GB+) -> demux -> demuxed base calls (500 GB+) -> consensus -> consensus reads -> variant calling -> variants (100 MB+); consensus and variant calling repeat x N]

Primary Analysis
[Figure: pipeline diagram]

Nanopore Sequencing
Primary: raw signal to base calls
[Figure: many nanopores produce raw data, which is analyzed by a neural net or DSP to yield base calls such as aacgtcgtactagtctactctaggtacctagtactaa]
https:/…

Challenge: ML for Basecalling
- Keeping up with real-time constraints
- Runs are expensive
- Data changes with chemistry
- Noise in data: sequencing is inherently inexact

Demultiplex
[Figure: pipeline diagram]

Demultiplexing
[Figure: demultiplexing of pooled reads]

Demultiplexing using Machine Learning
Challenges applying ML to this process:
- Error rates from the previous step stack up
- Consider a basecalling accuracy of 99% (Q20) and an adapter sequence length of 40
- The probability that the entire barcode sequence is correct is 0.99^40 ≈ 0.67
  A(.99) C(.99) C(.99) T(.99) G(.99) T(.99) C(.99) A(.99) ... = 0.99^40 ≈ 0.67
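
The barcode arithmetic above can be checked with a short Python sketch. It assumes per-base errors are independent and identically distributed, which is a simplification of real basecaller behavior, and the function names are hypothetical. It also computes the chance of at most one miscalled base, to show why error-tolerant barcode matching helps.

```python
from math import comb


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy for a Phred quality score Q: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def p_at_most_k_errors(length: int, accuracy: float, k: int = 0) -> float:
    """Probability that a `length`-base sequence has at most k miscalled bases.

    Assumes independent per-base errors (binomial model); this is an
    illustrative assumption, not a claim about any specific basecaller.
    """
    err = 1.0 - accuracy
    return sum(
        comb(length, i) * err**i * accuracy ** (length - i)
        for i in range(k + 1)
    )


acc = phred_to_accuracy(20)                    # Q20 corresponds to 99% per base
p_exact = p_at_most_k_errors(40, acc)          # all 40 bases correct: 0.99**40
p_one_off = p_at_most_k_errors(40, acc, k=1)   # zero or one miscalled base

print(f"all 40 bases correct: {p_exact:.3f}")
print(f"at most one error:    {p_one_off:.3f}")
```

Under this model roughly one barcode in three contains at least one miscalled base, while allowing a single mismatch recovers most of them. Tolerating mismatches is only safe when the barcodes in the set differ from each other in enough positions.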

ONT: Demultiplexing using Machine Learning
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10173771/

Consensus
[Figure: pipeline diagram]

Machine Learning for Consensus
"Deep Consensus" acceleration:
- One 8M SMRT Cell can take 500 hours to run; with 500-way parallelization that is 1 hour per shard (GPU V100 3.3x faster) [4]
- On PacBio Revio: 5.5 hours per SMRT Cell [3]
- "Being able to achieve higher accuracy using DeepConsensus now allows us to deliver accurate HiFi reads to customers in a shorter amount of time; whereas the Sequel IIe has a standard sequencing time of 30 hours, for our new platform, we can now reduce that to 24 hours." [3]

To the Cloud, and Back
[Figure: the pipeline (primary analysis, demux, alignment, consensus, variant calling x N, interpretation, therapy) shown twice, with stages split between "Instrument" and "Cloud/Datacenter" in one version and between "Instrument + AI/ML with GPUs" and "Cloud" in the other]

Challenge: Heat and Noise
- GPUs generate a lot of heat, more than some labs can cool.
- Noise from the fans cooling the GPUs can exceed sound allowances for operators.
- Solution: separate acceleration unit
- Problem: IT department headaches

Challenge: Deploying Updates
- Updates in a clinical setting
  - Validation of software updates
  - Updates on the order of 6 months may be tolerated
- Connectivity and size
  - Bandwidth to labs is continually improving
  - But may not be what other industries are used to

Challenge: Obtaining Datasets for Training
Problems:
- Privacy
- Consent
- Population bias
- Noise in data: sequencing is inherently inexact

Summary
- Genomics computations are inherently parallel!
- Challenges exist in
  - Thermals, noise and power
  - Deployment, updates
  - Obtaining training data
- ML and GPUs have enabled strides in
  - Turnaround time
  - Processing more data
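
The DeepConsensus sharding figures quoted earlier (500 hours of work, 500-way parallelization, a V100 GPU 3.3x faster) follow from simple division when the work is data-parallel. A minimal Python sketch follows. It assumes the work splits evenly across shards with no scheduling or I/O overhead, and it reads the 3.3x GPU figure as a per-shard speedup; both are assumptions rather than statements from the slides, and the function name is hypothetical.

```python
def hours_per_shard(total_hours: float, shards: int, speedup: float = 1.0) -> float:
    """Wall-clock hours per shard when work splits evenly across shards.

    Assumes perfect data parallelism with no overhead; `speedup` is an
    optional per-shard acceleration factor (e.g. from a GPU).
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return total_hours / shards / speedup


cpu_hours = hours_per_shard(500, 500)               # figures from the slide
gpu_hours = hours_per_shard(500, 500, speedup=3.3)  # assumed per-shard speedup

print(f"CPU shard: {cpu_hours:.2f} h")
print(f"GPU shard: {gpu_hours:.2f} h ({gpu_hours * 60:.0f} min)")
```

The estimate is a lower bound on turnaround time: uneven shard sizes, data transfer, and merging results all add to it in practice.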
