SNIA-SDC23-Li-Approximate-DNA-Storage.pdf


Approximate DNA Storage with High Robustness and Density for Images
Presented by Bingzhe Li, Assistant Professor, University of Texas at Dallas
Virtual Conference, September 28-29, 2021
2023 SNIA. All Rights Reserved.

Slide 3: Big Data Era
Data is doubled almost every 2 years.
44 zettabytes in 2020; 175 zettabytes in 2025.
Image from: https:/

Slide 4: Why DNA Storage?
Large gap between generated data and installed storage capacity [3].
HDD: 25,000 x 8TB HDDs; 5-10 years of warranty.
1 EB data center, Fort Worth, TX: 750,000 sq ft.
DNA: 1 gram DNA [1]; several centuries [2].
Photo: Tara Brown/UW.
[1] Allentoft et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proceedings of the Royal Society B: Biological Sciences, 279(1748):4724-4733, 2012.
[2] Grass et al. Robust chemical preservation of digital information on DNA in silica with error-correcting codes. Angewandte Chemie International Edition, 54(8):2552-2555, 2015.
[3] Figure source: IDC.

Slide 5: What is DNA Storage?
Nucleotides/bases: A, T, C, G.
Simple encoding (bits to base): 00 = A, 01 = T, 10 = G, 11 = C.
Example: 1001001100110110 encodes to GTACACTG.
Strand layout (150-300 bases): primer, metadata, payload, primer.
Write: encoding, assembling, synthesis.
Read: sequencing, disassembling, decoding.

Slide 6: Issues of DNA Storage
Errors of DNA storage: some patterns may increase error rates:
  Consecutive identical nucleotides (e.g., "AAAA").
  Hairpin structure/secondary structure, etc.
Figure: substitution, deletion, and insertion errors on the original sequence GTACA (e.g., substitution GTGCA, deletion GTCA).
DNA storage is:
  Error-prone.
  Expensive (e.g., $1 million/GB).
  Slow (e.g., hours/GB).
  In need of special preservation.
  Low in encoding density (the ideal is 2 bits/nt: 00-A, 01-T, 10-C, 11-G).

Slide 7: Error Propagation in DNA Storage
Encoding table: 00 = A, 01 = T, 10 = G, 11 = C.
Binary:                ... 01100001 01110000 01110000 01101100 01100101 ...
Original DNA sequence: ... TGAT TCAA TCAA TGCA TGTT ...
Synthesis and sequencing introduce a deletion error (the C in the second group is lost).
Sequencing result:     ... TGAT TAAT CAAT GCAT GTT ...
Decoded binary:        ... 01100001 01000001 11000001 10110001 100101 ...
Conclusion: one nucleotide error causes a series of errors in the rest of the sequence.

Slide 8: Error Propagation (EP) in DNA Storage, cont.
Conclusion: error propagation in a DNA sequence means one nucleotide error causes a series of errors in the rest of the sequence.
EP in sequencing (figure): reads GTACA, GTACA, GTA, GTACACT; voting; millions of DNA strands.
Lin, Dehui, Yasamin Tabatabaee, Yash Pote, and Djordje Jevdjic. Managing reliability skew in DNA storage. In Proceedings of the 49th Annual International Symposium on Computer Architecture, pp. 482-494. 2022.

Slide 10: Issues of DNA Storage
DNA storage is error-prone, expensive (e.g., $1 million/GB), slow (e.g., hours/GB), needs special preservation, and has low encoding density (the ideal is 2 bits/nt: 00-A, 01-T, 10-C, 11-G).
DP-DNA (MASCOTS'23), IMG-DNA (Systor'21), HL-DNA (ICCD'22).

Slide 11: Increase Density of DNA Storage
DP-DNA: A Digital Pattern-Aware DNA Encoding Scheme to Improve Encoding Density of DNA Storage [1]
[1] Bingzhe Li, Li Ou, Bo Yuan, and David Du, "DP-DNA: A Digital Pattern-Aware DNA Encoding Scheme to Improve Encoding Density of DNA Storage", The 31st International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (2023).

Slide 12: A typical encoding scheme: rotation code
Rotating encoding [1].
Avoids long homopolymers.
GC content is roughly maintained.
[1] James Bornholt, Randolph Lopez, Douglas M Carmean, Luis Ceze, Georg Seelig, and Karin Strauss. A DNA-based archival storage system. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pages 637-649, 2016.

Slide 13: Issues of previous work
Low encoding density.
Mapping 8 bits to 5 or 6 trits (base 3): 1.57 bits/nt.
Theoretically, encoding density is 2 bits/nt, or 1.98 bits/nt.

Slide 14: Encoding scheme: 2bit-code and unbalance code
Issue: how about 111111 for 2bit-code? Long homopolymers.

Slide 15: Issue of 11-code
On average, encoding density is 1.6 bits/nt.
But in an extreme case, a sequence of 1111,1111 with an A at the beginning, the DNA sequence will be A ACAC,ACAC and the encoding density is 1 bit/nt.

Slide 16: Observation: to solve the issue
The four patterns (i.e., 00, 01, 10, and 11) have different distributions among sequences.
1 nt/bit is used for the pattern with the lowest percentage.
The lower bound case is 25% for all patterns.

Slide 17: Digital Pattern aware code (DP-DNA)
Find the lowest-frequency pattern and use the corresponding code.
For example, if 11 has the lowest frequency in a binary sequence, use the 11-code.
Worst case: all patterns appear evenly in a sequence; encoding density is 1.60 bits/nt (vs. 1.57 bits/nt).

Slide 18: Adding 2bit-code and Using Variable Length
Adding 2bit-code: ideal encoding density (2 bits/nt). If some sequences encoded with the 2bit-code have no bio-constraint violations, we can encode those sequences with the 2bit-code.
Variable length: ideal encoding density (2 bits/nt).
Figure: a sequence encoded with 2 bits/nt; bio-constraint violation; encoding with 2bit-code.
Encoding density (equation not recoverable from the extracted text): the symbols with subscripts 1 and 2 indicate the code densities of the low-density and high-density codes, respectively. L is the default length of the binary sequence to be encoded. M indicates how many bits are excluded for the high-density code. Lmeta refers to the number of nucleotides used for metadata such as primer pairs and internal index in DNA strands.

Slide 19: DP-DNA overall design
(Figure.)

Slide 20: Experimental results
Datasets: Web, Database, Text, Image, Video.

Slide 21: Increase Robustness of DNA Storage for Images
IMG-DNA: approximate DNA storage for images [1]
[1] Bingzhe Li, Li Ou, and David Du. IMG-DNA: approximate DNA storage for images. Proceedings of the 14th ACM International Conference on Systems and Storage. 2021.

Slide 22: High Demand for Storing Images
(Figure.)

Slide 23: Observations of DNA Storage Encoding
Small practical tube capacity: about 230 GB per tube for random-access based DNA storage [1].
Error-prone: propagation errors [2]. One nucleotide error causes a series of errors in the rest of the sequence.
[1] Y. Wei, B. Li, and D. H. Du, "DNA storage: A promising large scale archival storage?" arXiv preprint arXiv:2204.01870, 2022.
[2] B. Li, L. Ou, and D. Du, "IMG-DNA: approximate DNA storage for images," in Proceedings of the 14th ACM International Conference on Systems and Storage, 2021, pp. 1-9.
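The propagation effect described above can be reproduced in a few lines. This is a minimal sketch, assuming the simple 2-bit mapping from the "What is DNA Storage?" slide (00 = A, 01 = T, 10 = G, 11 = C) and the ASCII word "apple" as input; the function names are illustrative and not taken from the talk.

```python
# Illustrative sketch only: the 2-bit mapping is the one shown on the slides;
# the helper names and the "apple" input are assumptions made for this example.
BIT_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BIT = {base: bits for bits, base in BIT_TO_BASE.items()}


def text_to_bits(text):
    """Return the 8-bit ASCII representation of text as a string of 0s and 1s."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def encode(bits):
    """Map each pair of bits to one nucleotide."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BIT_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand):
    """Map each nucleotide back to two bits."""
    return "".join(BASE_TO_BIT[base] for base in strand)


def delete_at(strand, index):
    """Simulate a single deletion error at the given position."""
    return strand[:index] + strand[index + 1:]


original_bits = text_to_bits("apple")
strand = encode(original_bits)
damaged = delete_at(strand, 5)          # lose the sixth nucleotide
received_bits = decode(damaged)

print("original strand:", strand)
print("damaged strand: ", damaged)
print("original bits:  ", original_bits)
print("received bits:  ", received_bits)
```

Only the nucleotides before the deletion decode to the original bits. Every later bit pair is shifted by one nucleotide, so a single error corrupts the remainder of the strand.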

Slide 24: Background of JPEG-based Image
The 14th ACM International Systems and Storage Conference (Systor21)
DCT: Discrete Cosine Transform.
DPCM: Differential Pulse Code Modulation.
JFIF: JPEG File Interchange Format.
Figure labels: DC, AC.
Two observations [1, 2]:
  Fault tolerance.
  DC and AC coefficients have different influence on the quality of images.
[1] Yu-Chun Kuo, Ruei-Fong Chiu, and Ren-Shuo Liu. Long-term JPEG data protection and recovery for NAND flash-based solid-state storage. In 2019 35th Symposium on Mass Storage Systems and Technologies (MSST), pages 141-147. IEEE, 2019.
[2] Qianqian Fan, David J Lilja, and Sachin S Sapatnekar. Adaptive-length coding of image data for low-cost approximate storage. IEEE Transactions on Computers, 69(2):239-252, 2019.

Slide 25: Our Contributions
Image-based DNA storage architecture.
AC/DC coefficient separation at DNA level.
Adding barriers.
Asymmetric barriers for AC/DC coefficients.

Slide 26: Image-based DNA Storage Architecture
1. AC/DC separation
2. Encoding
3. Adding barrier
4. Chunking & assembling
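Step 1 of the architecture (AC/DC separation) can be sketched as follows. This is a rough illustration under stated assumptions, not the IMG-DNA implementation: each block is assumed to be a list of quantized DCT coefficients in zigzag order with the DC coefficient first, and DC values are assumed to be DPCM-coded as in baseline JPEG. The function names are hypothetical.

```python
# Hypothetical helpers illustrating AC/DC separation; not the authors' code.

def separate_ac_dc(blocks):
    """Split zigzag-ordered DCT blocks into a DC stream and per-block AC lists.

    Assumes blocks[i][0] is the DC coefficient and the rest are AC coefficients.
    """
    dc_values = [block[0] for block in blocks]
    ac_values = [list(block[1:]) for block in blocks]
    return dc_values, ac_values


def dpcm_encode(values):
    """Store each value as the difference from the previous one (first vs. 0)."""
    diffs, previous = [], 0
    for value in values:
        diffs.append(value - previous)
        previous = value
    return diffs


def dpcm_decode(diffs):
    """Rebuild the values by accumulating the differences."""
    values, running = [], 0
    for diff in diffs:
        running += diff
        values.append(running)
    return values


# Three tiny 4-coefficient blocks (real JPEG blocks have 64 coefficients).
blocks = [[50, 1, 0, 0], [53, 0, 2, 0], [49, 0, 0, 0]]
dc, ac = separate_ac_dc(blocks)
dc_diffs = dpcm_encode(dc)

print("DC stream:", dc, "-> DPCM:", dc_diffs)
print("AC stream:", ac)
# A single wrong DC difference shifts every later block's DC value.
print("corrupted DC decode:", dpcm_decode([50, 4, -4]))
```

Because DPCM chains each DC value to the previous one, an error in the DC stream affects all following blocks, while an AC error stays inside its own block. That asymmetry is one way to read the observation that DC and AC coefficients influence image quality differently.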

Slide 27: Adding Barriers and Asymmetric Barriers
"AA" as a barrier keeps the error propagation within a partition.
  The rotation encoding scheme never produces two consecutive "A"s.
  The probability of errors generating "AA" is low.
  A barrier window is used to guard against insertion and deletion errors.
Asymmetric barriers for AC/DC coefficients:
  Quality: AC and DC coefficients have different influence on the quality of images.
  Overhead: there are many more AC coefficients than DC coefficients.

Slide 28: Experimental Results

Dataset: ImageNet.
Baselines: 1) Raw-DNA; 2) Approx-IMG; 3) IMG-DNA.
Metric: SSIM (structural similarity index metric).
DNA strand length: 250 bp.
Environment: a system with Intel i-7-47900 CPU 3.6GHz and 8GB memory; MATLAB 2020a.
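For reference, the SSIM metric named above is commonly defined for two image windows $x$ and $y$ as shown below. This is the standard textbook form; the slides do not say which window size or constants were used, so $K_1$, $K_2$, and the dynamic range $R$ are left symbolic.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 R)^2,\quad C_2 = (K_2 R)^2
```

Here $\mu_x, \mu_y$ are the window means, $\sigma_x^2, \sigma_y^2$ the variances, and $\sigma_{xy}$ the covariance. Identical windows give a value of 1, and lower values indicate more structural distortion.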

Slide 29: Robustness of Image-based DNA System
The higher the SSIM, the better the quality of images.
More results are shown in the paper.
Figure: a graphic view of an image with different encoding schemes (0.1% error rate).
Figure: overall comparison.
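To illustrate why barriers help robustness, the sketch below encodes three partitions with a rotation-style code, joins them with "AA", and then deletes one nucleotide. It is a simplified illustration under several assumptions, not the IMG-DNA encoder: the input is already a list of trits per partition, every partition restarts the rotation from "A" (so it never begins with "A"), a barrier is recognized as "AA" followed by a non-"A" base, and the barrier window mentioned on the slide is not modeled. All names are hypothetical.

```python
import re

# Hypothetical sketch of "AA" barriers around rotation-coded partitions.
BASES = "ACGT"
BARRIER = "AA"


def _choices(previous):
    """The three bases that differ from `previous`, in cyclic ACGT order."""
    start = BASES.index(previous)
    return [BASES[(start + step) % 4] for step in (1, 2, 3)]


def rotation_encode(trits, start="A"):
    """Encode trits so that no base repeats the base immediately before it."""
    previous, out = start, []
    for trit in trits:
        previous = _choices(previous)[trit]
        out.append(previous)
    return "".join(out)


def rotation_decode(strand, start="A"):
    """Invert rotation_encode; return None if a repeated base is found."""
    previous, trits = start, []
    for base in strand:
        if base == previous:  # cannot occur in an error-free partition
            return None
        trits.append(_choices(previous).index(base))
        previous = base
    return trits


def build_strand(partitions):
    """Join rotation-coded partitions with the barrier marker."""
    return BARRIER.join(rotation_encode(part) for part in partitions)


def split_strand(strand):
    """Split at barriers: 'AA' followed by a base other than 'A'."""
    return re.split(r"AA(?=[^A])", strand)


partitions = [[0, 1, 2, 0, 1], [2, 2, 1, 0, 0], [1, 0, 2, 1, 2]]
strand = build_strand(partitions)
damaged = strand[:8] + strand[9:]       # delete one base inside partition 1
decoded = [rotation_decode(part) for part in split_strand(damaged)]

print("strand: ", strand)
print("damaged:", damaged)
print("decoded:", decoded)
```

In this run the first and third partitions decode exactly, and only the partition containing the deletion is wrong, which matches the slide's claim that a barrier keeps error propagation within a partition.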

Slide 30: Increase Robustness and Density of DNA Storage for Images
HL-DNA: A Hybrid Lossy/Lossless Encoding Scheme to Enhance DNA Storage Density and Robustness for Images [1]
[1] Yi Li, David HC Du, Li Ou, and Bingzhe Li. HL-DNA: A Hybrid Lossy/Lossless Encoding Scheme to Enhance DNA Storage Density and Robustness for Images. In 2022 IEEE 40th International Conference on Computer Design (ICCD), pp. 434-442. IEEE, 2022.

Slide 31: Motivation
Images are error tolerant.
DNA storage is error-prone.
Consider them together.

Slide 32: Lossless code design
DNA strands need to follow some bio-constraints to avoid high error rates.
Rotation code helps avoid homopolymers (e.g., AAAA).
Lossless code design:
  High density area: 2 bits/nt.
  Low density area: 1 bit/nt.

Slide 33: Lossless code design, cont.
Common first nucleotide.

Slide 34: Lossy code design
Combine two low density rows together.
Use four different codes (C10, C11, C00, and C01).
The four codes have different error preferences.
1X(0) indicates that 11 and 10 are both encoded into the same nucleotides but will be decoded back to 10.

Slide 35: Partition Scheme: Adding Barrier
"A" as a barrier indicator.
Improves the robustness of DNA storage, as in [1].
Restricts the error propagation to a partition.
Enables multiple encodings in the same DNA strand to improve the encoding density and reduce error rates induced by the lossy encoding.
[1] B. Li, L. Ou, and D. Du, "IMG-DNA: approximate DNA storage for images," in Proceedings of the 14th ACM International Conference on Systems and Storage, 2021, pp. 1-9.

Slide 36: Overall Design of HL-DNA
1. Encode binary to nucleotides based on the encoding scheme. Which encoding is used is selected based on density and lossiness.
2. Insert a "barrier" into the DNA sequence.
3. Add the corresponding metadata such as primers, index, ECC, etc.
4. Add a coding format to indicate multiple encodings in the DNA strand.

Slide 37: Experimental Setup
Dataset: ImageNet.
Four schemes: Church et al. [1], Organick et al. [2], Blawat et al. [3], and HL-DNA.
Metrics: encoding density (bits/nt); SSIM (structural similarity index metric).
DNA strand length: 300 bp.
[1] G. M. Church, Y. Gao, and S. Kosuri, "Next-generation digital information storage in DNA," Science, vol. 337, no. 6102, pp. 1628-1628, 2012.
[2] L. Organick, S. D. Ang, Y.-J. Chen, R. Lopez, S. Yekhanin, K. Makarychev, M. Z. Racz, G. Kamath, P. Gopalan, B. Nguyen et al., "Random access in large-scale DNA data storage," Nature Biotechnology, vol. 36, no. 3, p. 242, 2018.
[3] M. Blawat, K. Gaedke, I. Huetter, X.-M. Chen, B. Turczyk, S. Inverso, B. W. Pruitt, and G. M. Church, "Forward error correction for DNA data storage," Procedia Computer Science, vol. 80, pp. 1011-1022, 2016.

Slide 38: Overall encoding density comparison
HL-DNA increases the average encoding density of the previous studies by about 20.2%-89.4%.
HL-DNA achieves the highest SSIM, which indicates the best robustness among the compared schemes.

Slide 39: Robustness of Image-based DNA System
The higher the SSIM, the better the quality of images.
Figure: a graphic view of an image with different encoding schemes (0.5% error rate).

Slide 40: Potential DNA storage research
Scalability.
Capability.
Encoding/ECC.
Microfluidic system.
More issues:
  DNA storage preservation.
  Limited read number.
  Performance of sequencing/synthesis.
  API to users.

Slide 41: Conclusions
DP-DNA increases areal density.
IMG-DNA is a robust architecture of DNA storage for images.
HL-DNA is a hybrid lossy/lossless encoding based DNA storage architecture.
Potential DNA storage research directions.

Slide 42: Thanks! Q&A

Slide 43: Acknowledgement
Prof. David Du, Dr. Li Ou, Yixun Wei, Alex Sensintaiffar, Yi Li.
