ISSCC 2024, SESSION 16
Security: From Processors to Circuits

16.1: A 2.7-to-13.3µJ/boot/slot Flexible RNS-CKKS Processor in 28nm CMOS Technology for FHE-Based Privacy-Preserving Computing
2024 IEEE International Solid-State Circuits Conference
Hyunhoon Lee*, Hyeokjun Kwon*, and Youngjoo Lee
Pohang University of Science and Technology, Korea

Outline
- Introduction
- RNS-CKKS FHE Scheme
- Proposed Processor
- Measurement Results
- Summary
Evolutions of Cloud Computing
- Cloud computing services are growing rapidly: the market is projected to grow from $652B in 2024 to $1668B in 2030.*
- [Figure: cloud-service market size (USD), 2024 vs. 2030, including server-scale AI]
*https:/

Privacy Concerns in Cloud Computing
- There is a risk of leaking sensitive data, such as health or financial data.
- [Figure: clients send data to a server and receive results; an attacker can intercept them (leakage hazard)]

Limitation of Traditional Encryption Schemes
- Traditional encryption schemes cannot process encrypted data: the server cannot compute on the data while it stays encrypted.

Fully Homomorphic Encryption (FHE)
- FHE allows computations to be performed directly on encrypted data.
- It supports an unlimited number of additions and multiplications.
- It is a solution to privacy concerns in cloud computing: the server computes on ciphertexts, and only the client decrypts the result.

Limitation of Current FHE-Based Computing
- HE operations are costly because computation is performed directly on ciphertexts.
- An encrypted multiplication consumes 8402x more energy than an unencrypted multiplication.*
- Key-switch is the most complicated operation: in an encrypted multiplication, key-switch accounts for 95.9% and the multiplication itself for 4.1%.
- [Figure: energy consumption (J/data, log scale) of unencrypted vs. encrypted multiplication]
*Intel(R) Xeon(R) Gold 6230 CPU (80 cores)

Limitation of Previous Hardware for FHE
- Previous FHE hardware has several limitations.
- This work presents cost-efficient and flexible hardware for FHE-based privacy-preserving computing (PPC).
- [Table: support for FHE requirements (large N, efficiency, prime range, fabricated chip) in MICRO21, ISCA22, HPCA23, DATE23, and this work]

RNS-CKKS FHE Scheme
- RNS-CKKS enables fixed-point arithmetic, which makes it suitable for machine-learning (ML) applications.
- [Figure: the client encrypts fixed-point data, the server performs encrypted computation, and the client decrypts the result]

Encryption in RNS-CKKS
- A fixed-point vector is encrypted into several polynomials with a small error.

Structure of Ciphertext
- A ciphertext consists of two polynomial sets.
- Each polynomial set consists of $l+1$ residue polynomials.
- Each residue polynomial is an element
  of $\mathbb{Z}_{q_i}[X]/(X^N+1)$.
- A coefficient consists of the residues at the same degree across a polynomial set.
- Notation: $l$ is the level of the ciphertext, $q_i$ is a prime number, and $N$ is the degree of the polynomial.

Hierarchy of FHE Operations
- Bootstrapping enables an unlimited number of computations.
- It refreshes the error and the level of a ciphertext.
- It consists of several primitive HE operations.

Primitive HE Operations
- Primitive HE operations include addition, multiplication, and rotation.
- During an HE operation, the form of the ciphertext changes; the key-switch returns it to its original form.
- [Figure: multiplication process]

Key-Switch (KS) Operation
- KS is the most complex primitive operation.
- It consists of several low-level operations: NTT/iNTT, BConv, and MAC operations.
- *$\alpha$: number of temporary moduli; $\beta = (l+1)/\alpha$.
- KS is performed in a predefined order, and NTT and BConv account for most of the KS cost.
- [Figure: KS cost breakdown ($N=2^{16}$, $l+1=24$, $\alpha=8$)]

Number Theoretic Transform (NTT)
- The NTT is a variant of the DFT over $\mathbb{Z}_q$. It reduces
  the complexity of polynomial multiplication.
- Most operations can be performed in NTT-applied form, so polynomials remain mostly in NTT-applied form.

Basis Conversion (BConv)
- BConv changes the basis of residue polynomials, e.g., from $\mathcal{C} = \{q_0, q_1, \ldots\}$ to $\mathcal{B} = \{p_0, p_1, \ldots\}$.
- BConv is applied to minimize the KS error.
- BConv cannot be performed in NTT-applied form, so a series of iNTT-BConv-NTT operations is required.

Flexibility Requirements of FHE
- There are various trade-offs: between prime bit-width and the precision of the output data, and between prime bit-width and the ciphertext level.
- As $\alpha$ increases, the security level decreases; however, the evaluation key size and the KS complexity also decrease.

Overall Architecture
- The proposed processor has three features:
  1) an instruction-based programmable core design;
  2) optimized KS scheduling;
  3) cost-reduced computing engines (CEs).
- [Figure: block diagram. On-chip memory: instruction (352B), FIFO, parameter (101KB), NTT (248KB). Top controller: fetcher, decoder, cache. Computing engines: arithmetic engine, NTT/iNTT engine, BConv engine]

KS Scheduling Optimization
- Conventional scheduling computes the polynomial sets one by one and strictly follows the data dependencies among CEs.
- Intra-set scheduling of residue polynomials: each computed residue polynomial is passed directly to the CE that requires it.
- Inter-set scheduling of residue polynomials: different polynomial sets ($d_0$ and $d_1$) are processed in parallel.
- With the proposed scheduling optimizations, the processor achieves low KS latency and high average engine utilization.
- [Figure: KS latency (ms) and average engine utilization (%) for conventional scheduling, intra-set scheduling, and intra-set + inter-set scheduling ($N=2^{16}$, $l+1=24$, $\alpha=8$); annotated values: 38%, 15%, 47%, 1.9x, 1.6x, 1.2x]

NTT Engine
- The NTT engine consists of a butterfly-unit (BU) array and a twiddle-factor (TF) generator.
- On-the-fly TF generation is adopted to reduce TF memory.
- A heterogeneous ModMult architecture is adopted.
- Global-level data access pattern: external data reordering
  is performed for large $N$ (threshold: $2^{15}$).
- On-the-fly TF generation requires fewer TF seeds.
- Heterogeneous ModMult architecture: different ModMults are used for different modules.
- With the proposed optimizations, the cost of the NTT engine is reduced.

BConv Engine
- BConv can be represented as a matrix multiplication:
$$\mathrm{BConv}_{\mathcal{C}\rightarrow\mathcal{B}}(a) = \Big( \sum_{q_i \in \mathcal{C}} \big[ a_i \cdot \hat{q}_i^{-1} \big]_{q_i} \cdot \hat{q}_i \bmod p_j \Big)_{p_j \in \mathcal{B}}, \qquad \hat{q}_i = \prod_{q_{i'} \in \mathcal{C},\, i' \neq i} q_{i'}$$
- The results of the step 1 operation, $[a_i \cdot \hat{q}_i^{-1}]_{q_i}$, can be reused, and the step 2 operation (multiplication by $\hat{q}_i \bmod p_j$ and accumulation) can be decomposed.
- Unified conversion unit (CU) architecture: buffers keep the step 1 results, which are reused in repeated step 2 operations.
- Merged CU architecture: when $\alpha \leq 16$, each CU performs the step 2 operation for one coefficient; when $17 \leq \alpha \leq 32$, two CUs together perform the step 2 operation for one coefficient.
- With the proposed optimizations, the cost of the BConv engine is reduced.

Arithmetic Engine
- The modular unit (MU) array performs various modular operations.
- The automorphism unit calculates automorphism indices.
- [Figure: arithmetic engine with an automorphism unit and an MU array (MU0-MU7); each MU contains mod. sub., mod. add., and mod. mult. units with parameters a and b (port widths labeled 62)]

Chip Micrograph and Verification Platform
- The proposed processor was fabricated in 28nm CMOS and tested on an FPGA-based platform.

Comparison with Previous Works
                                    | CICC18       | ISSCC19      | ISSCC23   | MICRO21            | ISCA22             | HPCA23             | This work
Platform                            | ASIC         | ASIC         | ASIC      | Simulation         | Simulation         | FPGA               | ASIC
Technology                          | 40nm         | 40nm         | 28nm      | 12/14nm            | 12/14nm            | 16nm               | 28nm
Frequency                           | 300MHz       | 12-72MHz     | 500MHz    | 1GHz               | 1GHz               | 450MHz             | 333MHz
Voltage                             | 0.9V         | 0.68-1.1V    | 0.9V      | N/A                | N/A                | N/A                | 1V
Power                               | 216.5mW      | 710mW        | 4W/12W    | 113W^a             | 320W               | N/A                | 180mW
Area                                | 2.05mm²      | 0.28mm²      | 42.96mm²  | 54.56mm²^a         | 240.5mm²^a         | N/A                | 11.28mm²
Application                         | PQC          | PQC          | HE        | FHE                | FHE                | FHE                | FHE
HE support                          | No           | No           | Paillier  | RNS-CKKS           | RNS-CKKS           | RNS-CKKS           | RNS-CKKS
Flexible parameters                 | Bit-width, N | Bit-width, N | Bit-width | Bit-width, N, l, α | Bit-width, N, l, α | Bit-width, N, l, α | Bit-width, N, l, α
logN                                | 6-11         | 6-11         | N/A       | 14^b               | 17                 | 16                 | 17
Prime bit-width                     | 32 bit       | 24 bit       | N/A       | 32 bit             | 28 bit             | 32 bit             | 1-28 bit
# of slots                          | -            | -            | -         | 1                  | 32768              | 32768              | 16384 / 32768 / 65536
Throughput (boots/s)                | -            | -            | -         | 769.2              | 255.8              | 7.9                | 1.4 / 0.4 / 0.07
Energy eff. (mJ/boot)               | -            | -            | -         | 11.5^a             | 775.7^a            | 3267.8^a           | 43.8 / 179.2 / 874
Energy eff. per slot (µJ/boot/slot) | -            | -            | -         | 11505              | 23.7               | 99.7               | 2.7 / 5.5 / 13.3
^a FHE accelerator part only. ^b logN not reported; assumed to be 14. * I/O cost excluded.

Summary
- The first ASIC RNS-CKKS processor for FHE-based PPC:
  - high flexibility for various PPC scenarios;
  - KS scheduling optimizations for low-latency PPC;
  - cost-efficient designs for feasible FHE-based PPC.
- Prototype implemented in 28nm CMOS:
  - achieves 1737x higher KS energy efficiency and 25x higher throughput than a CPU;
  - attains 5.5µJ/slot bootstrapping energy consumption (4x lower than the state of the art).
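The BConv engine relies on two observations: step 1 results can be reused, and step 2 can be decomposed. The minimal Python sketch below shows why both hold, using the standard fast basis conversion from RNS-CKKS software implementations. The toy primes and the function names (`bconv_step1`, `bconv_step2`, `bconv`) are hypothetical illustrations, not the processor's datapath. The sketch also shows that fast BConv is approximate. For an input $x < Q$, each output equals $(x + uQ) \bmod p_j$ for one small integer $0 \le u < k$, where $k$ is the number of source primes.

```python
from math import prod


def bconv_step1(residues, q):
    """Step 1: y_i = [a_i * (Q/q_i)^(-1)] mod q_i.
    Depends only on the source basis, so the results can be buffered
    and reused for every target prime."""
    big_q = prod(q)
    y = []
    for a_i, q_i in zip(residues, q):
        q_hat_inv = pow((big_q // q_i) % q_i, -1, q_i)  # modular inverse (Python 3.8+)
        y.append(a_i * q_hat_inv % q_i)
    return y


def bconv_step2(y, q, p):
    """Step 2: for each target prime p_j, a modular dot product of y with the
    constants (Q/q_i) mod p_j, i.e. one row of a matrix-vector product."""
    big_q = prod(q)
    out = []
    for p_j in p:
        acc = 0
        for y_i, q_i in zip(y, q):
            acc = (acc + y_i * ((big_q // q_i) % p_j)) % p_j  # modular MAC
        out.append(acc)
    return out


def bconv(residues, q, p):
    """Fast basis conversion from basis q to basis p (result may be off by u*Q)."""
    return bconv_step2(bconv_step1(residues, q), q, p)


# Toy example with small primes (real RNS-CKKS primes are much larger).
q = [97, 193, 257]   # hypothetical source basis C
p = [769, 12289]     # hypothetical target basis B
x = 123456           # an integer below Q = 97 * 193 * 257
residues = [x % q_i for q_i in q]
print(bconv(residues, q, p))
```

Because `bconv_step1` does not depend on the target basis, its outputs can stay in a buffer while `bconv_step2` runs once per target prime. This mirrors the unified CU idea in a simplified form.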
2024 IEEE International Solid-State Circuits Conference
16.2: A 28nm 69.4KOPS 4.4uJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems

A 28nm 69.4kOPS 4.4µJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems
Yihong Zhu^1,2, Wenping Zhu^1, Yi Ouyang^1, Junwen Sun^1,2, Min Zhu^3, Qi Zhao^1,2, Jinjiang Yang^1, Chen Chen^1, Qichao Tao^1, Guang Yang^1, Aoyang Zhang^1, Shaojun Wei^1,2, Leibo Liu^1,2
^1 School of Integrated Circuits, Tsinghua University
^2 Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China
^3 Micro Innovation Integrated Circuit Design, Wuxi, China
*DS1

Self Introduction of Yihong Zhu
- B.S., Electrical Engineering, 2014-2018, The University of Electronic Science and Technology of China, Chengdu, China
- Ph.D., Electrical Engineering, 2018-now, Tsinghua University, Beijing, China
- Research interests: reconfigurable architectures for cryptography processors, especially PQC processors; agile mapping of algorithms to hardware; domain-specific accelerator design
- Special acknowledgments: Deng Feng Fund

Outline
- Backgrounds and Motivations
- System Architecture and Contributions
- Key Details in Data Path
- Measurement and Comparisons
- Conclusion

Backgrounds
- Digital applications such as online shopping, email systems, web surfing, and smart cards rely on traditional public-key systems for their security: RSA, ECC (Elliptic Curve Cryptography), Diffie-Hellman, DSA, and ECDSA.
- Because quantum computers threaten these systems, security must migrate to post-quantum cryptography (PQC), the future quantum-resistant public-key mechanism: lattice-based (Kyber), code-based (McEliece), and hash-based (SPHINCS+) schemes.
- [Figure: current state and migration roadmap of PQC]
- [Figure: crypto-agility and performance requirements in PQC usage]
- Existing PQC hardware designs [1]-[3] have two limitations: 1) insufficient research; 2) a lack of flexibility.
  *Only the hardware design of the verification phase achieves algorithmic customization.

[1] U. Banerjee et al., "An Energy-Efficient Reconfigurable DTLS Cryptographic Engine for End-to-End Security in IoT Applications," ISSCC, 2018.
[2] Y. Zhu et al., "A 28nm 48KOPS 3.4uJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems," ISSCC, 2022.
[3] P. Karl et al., "Post-Quantum Signatures on RISC-V with Hardware Acceleration," IACR ePrint, 2022.

Motivation
- [Figure: design objectives and contributions]
System Architecture and Contributions: Overall Architecture; Scalable Clustered Architecture; Region-based Task Path

Overall Architecture
- System architecture:
  1) TP: task path (generates and issues tasks);
  2) TOC: task-operator clusters (PQC calculations);
  3) BUF: poly-buffer (communicates with the TOCs);
  4) MEM: bulk data storage and input/output.
- Contributions: a region-based task path; a task-clustering architecture; PQC task operators and a heterogeneous processing array.

Clustered Architecture
- Benefits of clustering: 1) the ports of BUF are reused, which reduces the complexity of the crossbar; 2) scalability improves.
- Principles for clustering operators together: 1) the tasks are of the same type; 2) the operators share the same module.
- Pipelined cluster design: exploits the potential parallelism among different data variables.

PQC Task Path
(a) Task-mem: the memory that stores all the pre-generated tasks.
(b) Task Fetcher: fetches tasks from Task-mem and manages the read address.
(c) Task Updater: dynamically updates tasks, including unfolding loops.
(d) Task Scheduler: automatically schedules tasks and manages region-based dependencies.

PQC Task Path (Updater)
- Example task: Keccak(SHAKE128, len=20, dst+20*i, src0+20*i); for i = 1, the updater generates Keccak(SHAKE128, len=20, dst+20, src0+20).
- [Figure: task fields (src0, len, src1, dst) before and after the update]

PQC Task Path (Scheduler)
For the task Keccak(SHAKE128, len=20, dst+20, src0+20), the scheduler:
1) checks whether the Hash cluster is busy;
2) checks whether the BUF regions [src0+20, src0+39] and [dst+20, dst+39] conflict with other running tasks;
3) if there is no conflict, issues the Keccak task to the Hash cluster.
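The updater's loop unfolding and the scheduler's checks can be modeled in a few lines of Python. This is a hypothetical software model, not the chip's task format. The `Task` record, its field names, and the rule that any overlap of BUF address ranges counts as a conflict are all assumptions. The overlap rule is conservative: it also blocks read-read overlaps.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; fields loosely follow the slide's example."""
    op: str
    cluster: str   # cluster that executes the task, e.g. "hash"
    src0: int      # BUF start address of the input
    dst: int       # BUF start address of the output
    length: int    # number of BUF words read and written

    def regions(self):
        """Inclusive BUF address ranges touched by the task."""
        return [(self.src0, self.src0 + self.length - 1),
                (self.dst, self.dst + self.length - 1)]


def unfold(template, i, stride):
    """Task Updater: instantiate loop iteration i by offsetting the addresses."""
    return replace(template,
                   src0=template.src0 + stride * i,
                   dst=template.dst + stride * i)


def overlaps(a, b):
    """True if two inclusive address ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def can_issue(task, running, busy_clusters):
    """Task Scheduler: (1) the target cluster must be idle and (2) no BUF region
    may overlap a running task's regions; only then (3) may the task be issued."""
    if task.cluster in busy_clusters:
        return False
    return not any(overlaps(r, s)
                   for other in running
                   for r in task.regions()
                   for s in other.regions())


# Keccak(SHAKE128, len=20, dst+20*i, src0+20*i) with hypothetical src0 = 0, dst = 100
template = Task("keccak-shake128", "hash", src0=0, dst=100, length=20)
t1 = unfold(template, i=1, stride=20)
print(t1.regions())  # [(20, 39), (120, 139)]
```

For i = 1 the checked ranges are [src0+20, src0+39] and [dst+20, dst+39], matching the scheduler example. A hardware scheduler would compare these ranges in parallel rather than looping.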
PQC Task Path (Parallelism Optimization)
- The proposed architecture exposes different types of potential parallelism.
- A vector forwarding strategy mitigates read-after-write (RAW) conflicts.

Key Details in Data Path (TOC)
- Hash Cluster: parallel Keccak core
- Format Cluster: dynamic data-format aligner
- Sample Cluster: parallel configurable rejection sampler
- Arithmetic Cluster: reusable NTT and FFT hardware

Clustered Architecture
The TOC comprises five clusters:
1) Hash Cluster: hashing and pseudo-random number generation.
2) Sample Logic Cluster: sampling functions and a few logic functions.
3) Arithmetic Cluster: operators involving expensive crypto-arithmetic calculations.
4) Format Logic Cluster: functions involving logic calculations and data-format conversions.
5) Communication Cluster: data movement among the outside world, MEM, BUF, and TP.
Key Blocks: Hash Cluster (Keccak Core)
- Configurable (4x2)-parallel Keccak: 1) automatic input padding; 2) (4x2)-parallel Keccak cores; 3) automatic output aligning.
- The operators are clock-gated while idle.
- Each core takes 12 cycles per invocation of the Keccak f-function.
- Throughput = 1344 b × 4 cores × 550 MHz / 12 cycles = 246 Gbps.
- Area: 0.18 mm²

Key Blocks: Format Cluster (Aligner)
- Generic data-format transformation: 1) hash and in/out data; 2) word-width conversion.
- Row-size adapter: converts any input width to any output width.

Key Blocks: Sample Cluster (Rejection Sampler)
- 16-parallel rejection sampler (arbitrary rejection condition and threshold, 32-bit sampling).
- Configuration path: 1) a Config-Task configures the registers; 2) a Run/Communicate-Task reads them.
- For Len = 256 and Threshold = 3329: latency = 25.1 cycles; throughput = 256 / 25.1 × 12 b × 550 MHz = 67.3 Gbps.
- Area: 0.02 mm²
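The following is a software reference for what the sample cluster computes in its Kyber configuration (Len = 256, Threshold = 3329, 12-bit candidates). It is a minimal Python sketch of Kyber-style uniform rejection sampling. Several parts are assumptions: the seed, the single SHAKE128 stream without Kyber's index-based domain separation, and the 16-lane batching. The batching only mimics the 16-parallel sampler in software and says nothing about its cycle-level behavior.

```python
import hashlib


def candidates_12bit(data):
    """Split a byte string into 12-bit candidates, two per 3 bytes (Kyber-style parse)."""
    for k in range(0, len(data) - 2, 3):
        b0, b1, b2 = data[k], data[k + 1], data[k + 2]
        yield b0 | ((b1 & 0x0F) << 8)
        yield (b1 >> 4) | (b2 << 4)


def rejection_sample(seed, n=256, q=3329, lanes=16):
    """Accept candidates < q until n coefficients are collected.
    Candidates are consumed `lanes` at a time to mimic a lanes-wide sampler.
    Returns the coefficients and the number of lane batches used."""
    nbytes = 168  # one SHAKE128 rate block (1344 bits)
    while True:
        data = hashlib.shake_128(seed).digest(nbytes)
        cands = list(candidates_12bit(data))
        coeffs, batches = [], 0
        for start in range(0, len(cands) - lanes + 1, lanes):
            batches += 1
            for c in cands[start:start + lanes]:
                if c < q and len(coeffs) < n:
                    coeffs.append(c)
            if len(coeffs) == n:
                return coeffs, batches
        nbytes += 168  # too few accepted candidates: squeeze one more block


coeffs, batches = rejection_sample(b"example-seed")  # hypothetical seed
print(len(coeffs))  # 256
```

About 19% of 12-bit candidates are rejected when q = 3329 (since 3329/4096 ≈ 0.81), so the number of batches varies from seed to seed. This is why the slide quotes an average latency (25.1 cycles) rather than a fixed one.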
Key Blocks: Arithmetic Cluster
- Architecture of the Arithmetic Cluster (AE: Arithmetic Element; FE: Float Element), with triple-field calculation (Zq, binary, complex) and leveled execution.
- Arithmetic element (AE): 32-bit width (BIG-MUL: 64 bit); 32 modular operations per cycle; throughput = 2 × 32 × 32 × 550 = 1126.4 Gbps.
- Float element (FE): 64-bit double-precision floating point; 8 operations per cycle; throughput = 2 × 64 × 8 × 550 = 563.2 Gbps.
- NTT module with the AE array; FFT module with the FE and AE arrays; "in-place optimized permutations" (Y. Zhu et al., "A 28nm 48KOPS 3.4uJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems," ISSCC, 2022).
- NTT cycles $= \frac{1}{2W} \, n \log_2 n$ with $W = 32$; FFT cycles $= \frac{1}{2W} \, n \log_2 n$ with $W = 8$.
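The cycle-count formula and the peak-throughput arithmetic for the arithmetic cluster can be reproduced with a small Python model. It is an idealized sketch: it assumes every butterfly unit is busy in every cycle and ignores pipeline fill, permutations, and memory stalls. The function names are hypothetical.

```python
def transform_cycles(n, w):
    """Ideal cycle count for an n-point NTT/FFT: log2(n) stages of n/2 butterflies,
    with w butterflies issued per cycle, giving n*log2(n) / (2*w)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    log_n = n.bit_length() - 1
    return n * log_n / (2 * w)


def peak_gbps(operand_bits, ops_per_cycle, f_mhz, operands=2):
    """Peak input bandwidth in Gbps: operands x bits x ops per cycle x frequency."""
    return operands * operand_bits * ops_per_cycle * f_mhz / 1000


print(transform_cycles(256, w=32))  # 32.0   (256-point NTT on the AE array)
print(transform_cycles(512, w=8))   # 288.0  (512-point FFT on the FE array)
print(peak_gbps(32, 32, 550))       # 1126.4 (AE array)
print(peak_gbps(64, 8, 550))        # 563.2  (FE array)
```

The model makes the trade-off visible: with four times as many lanes (W = 32 vs. W = 8), the AE array finishes a transform of the same size in one quarter of the cycles.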
Measurement and Comparisons

Die Photo and Chip Characteristics
- Technology: TSMC 28nm HPC
- Supply voltage: 0.9V
- Package: FCBGA
- Die size: 2.2 mm × 3.3 mm
- Core area: 3.2 mm²
- SRAM: 228.5KB
- Logic gates: 2.1M (NAND2 equiv.)
- Hash function: SHA3-256/384/512
- PRNG: ChaCha20/AES/SHAKE
- Crypto-fields: Zq/Binary/Complex
- Power: 110-420 mW
- [Figure: die photo of the cryptography core (2.2 mm × 3.3 mm) and measurement set-up (Demo Session: demo video)]
Measurement: throughput and energy efficiency of the supported schemes
Throughput range: 569 KOPS; energy-efficiency range: 4.423030 μJ/Op (1 Op = Keygen + Encaps + Decaps).

Measurement: voltage-frequency scaling and area
Voltage-frequency scaling (Kyber512 keygen): 0.7~1.1V, 300~750MHz, 110~850mW. Area breakdown (total 3.2mm²): AE array 29.3%, FE array 10.3%.

Comparisons
Energy efficiency compared with mainstream CPUs (figure: energy-efficiency improvements vs. CPUs).

Comparisons
44.6% higher throughput, 19.3% less energy-delay product, and more flexible.

Conclusion
Agile PQC processor:
Task-clustering-based architecture: provides scalability and potential parallelism (Kyber: 44.6% higher throughput).
Region-based PQC task-path: augments flexibility and improves scheduling efficiency (7 algorithms from the 4th round or to be standardized are supported; previous: 3 algorithms).
Efficient PQC task operators and heterogeneous processing array: optimize execution, improve energy efficiency and reduce area overheads (Dilithium: 19.3% less EDP; 74.6% of the area is reusable (BUF + MEM + AE/FE)).

Thanks for your kind attention! If you have any further questions, please contact: zhuyihon18@mails.tsinghua.edu

16.3: 3-nm Physical Unclonable Function with Multi-Mode Self-Destruction and 3.48×10^-5 Bit Error Rate
2024 IEEE International Solid-State Circuits Conference
Eric Hunt-Schroeder (1,2), Parker Lin-Butler (2), Amit Degada (1), Tian Xia (2)
(1) Marvell Technology, Inc. and (2) University of Vermont

Outline
Physical Unclonable Function (PUF) Introduction; Pre-Amplifier PUF System; Achieving a low bit error rate; Multi-mode self-destruct; 3-nm Silicon Summary; Conclusion.
Physical Unclonable Function Introduction
What is a Physical Unclonable Function (PUF)? PUFs provide full-entropy cryptographic keys for use in device security. The output response to a challenge is derived from local manufacturing variations. Key properties: intrinsic, randomness, uniqueness, stability.
Our 3-nm PUF: custom pre-amplifier bitcell; self-destruct through electromigration and time-dependent dielectric breakdown; full-entropy key passing NIST SP800-90B and SP800-22.

Physical Unclonable Function System
Block diagram: safety lock circuitry; pre-amplifier entropy source with self-destruct (1024-bit entropy source); electromigration circuitry; time-dependent dielectric breakdown circuitry; state machine & control logic; tamper detection circuitry; AND gates.

Pre-Amplifier Bitcell (one data bit) [1]
Schematic of one bitcell between VDD and VSS, with wordline (WL), Bitline True (BLT) and Bitline Complement (BLC). A plot of V(BLT) versus V(BLC) (0 to 1 V) shows the unstable bitcells in a shaded region.
[1] E. Hunt-Schroeder and T. Xia, "12-nm Stable Pre-Amplifier Physical Unclonable Function With Self-Destruct Capability," IEEE TVLSI Systems, vol. 31, no. 6, pp. 840-850, June 2023.

Achieving a low bit error rate
Read waveforms (voltage vs. time, a.u.) with WL0 = VREAD and SA_NCC show an increasing differential signal between BLT (Bitline True) and BLC (Bitline Complement) of the array cells, which feed the sense amplifier. Additional gain NFETs on BLT and BLC increase the signal further and reduce the BER.

Multi-Mode Self-Destruction
Motivation for self-destruct capability: researchers have demonstrated the ability to clone some PUF topologies [2]; critical program information is protected only if the PUF key remains safe; adversaries have significant time and resources for tampering; end-of-life recycling and anti-counterfeiting. Self-destruction turns the full-entropy key into a corrupted key.
[2] C. Helfmeier, C. Boit, D. Nedospasov and J.-P. Seifert, "Cloning Physically Unclonable Functions," 2013 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), Austin, TX, USA, 2013, pp. 1-6.
A high voltage (VPUMP) is applied to the bitcell NFETs, giving elevated self-heating effects and high current densities along the BL. Sustained high current results in electromigration (EM). Sustained overvoltage stress results in time-dependent dielectric breakdown (TDDB). Bias conditions in the schematic: WL0 = WL15 = 2.5V, WLT_EMN = WLC_EMN = 0V, WLT_EM = WLC_EM = VDD, with self-destruct currents I_BLT and I_BLC (WLDRV and analog-mux blocks shown).
Before/after self-destruct: 16 wordlines × 6 bitlines damaged (96 bits).
On-Chip High Voltage Generator
Block diagram: START; an oscillator; phase generators A and B with phase-shifted outputs OSCA and OSCC; charge pumps A to D producing VPUMP for the array cells; regulator output VH; an analog mux (SEL*) selecting the target VTRG from 0.51V, 0.98V, 1.20V, 1.35V and 1.54V; VIO = 1.8V.

3-nm Silicon Summary
Each packaged die contains 4× 1 Kb PUF arrays.

Hamming weight by test condition
Across supply voltages of 650, 750, 850 and 950 mV and temperatures of -40, 0, 85 and 125 °C, the measured Hamming weight stays between 49.90% and 50.02% for all 16 conditions.

Local correlation analysis
Bitcells in rows 0 to 3 and columns 0 to 1 feed sense amplifiers 0 to 5 (outputs Q0 to Q2). Minimum entropy per bit by readout order: data fast 0.764, column fast 0.722, row fast 0.800.
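These slides and the comparison table report Hamming weight, inter-chip HD, BER and minimum entropy, which are standard PUF metrics. Below is a minimal sketch of how they are commonly computed from bit responses. It assumes each response is a list of 0/1 integers, and the function names are ours.

The min-entropy function is a naive per-bit frequency estimate. Because it only counts ones and zeros, it returns the same value for every readout order. The order dependence in the local-correlation results therefore has to come from estimators that also account for correlation between neighbouring bits, such as those in NIST SP 800-90B.

```python
import math
from itertools import combinations
from typing import Sequence


def hamming_weight(bits: Sequence[int]) -> float:
    """Fraction of ones in one response (ideal value 0.5)."""
    return sum(bits) / len(bits)


def fractional_hd(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def inter_chip_hd(responses: Sequence[Sequence[int]]) -> float:
    """Mean fractional HD over all pairs of chips (ideal value 0.5)."""
    pairs = list(combinations(responses, 2))
    return sum(fractional_hd(a, b) for a, b in pairs) / len(pairs)


def bit_error_rate(reference: Sequence[int], rereads: Sequence[Sequence[int]]) -> float:
    """Mean fractional HD between repeated reads of one chip and its enrolled reference."""
    return sum(fractional_hd(reference, r) for r in rereads) / len(rereads)


def naive_min_entropy_per_bit(bits: Sequence[int]) -> float:
    """Plug-in estimate -log2(max(p1, 1 - p1)); ignores bit order and correlation."""
    p1 = hamming_weight(bits)
    return -math.log2(max(p1, 1.0 - p1))


if __name__ == "__main__":
    chip_a = [1, 0, 1, 1, 0, 0, 1, 0]
    chip_b = [1, 1, 1, 0, 0, 0, 1, 0]
    print(hamming_weight(chip_a))          # 0.5
    print(fractional_hd(chip_a, chip_b))   # 0.25
    print(bit_error_rate(chip_a, [chip_a, [1, 0, 1, 1, 0, 0, 1, 1]]))  # 0.0625
```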
Comparison (This Work | TVLSI'23 [1] | ISSCC'21 [3] | ISSCC'21 [4])
Technology (nm): 3 | 12 | 28 | 65
Entropy source: Pre-Amplifier | Pre-Amplifier | SRAM | Inv. Chain
Added features/techniques: Multi-stage Self-Destruct, BER Reduction NFETs | Self-Destruct, Stable Bit Identification | SRAM, PUF, TRNG | Self-Check/Heal
VDD (V): 0.65~0.95 | 0.70~1.00 | 0.75~1.05 | 0.70~1.40
Temp. (°C): -40~125 | -40~125 | -25~100 | -40~125
Area (μm²): 135309954154005150
BER (%): 0.00348 | 0.174 | 1.8 3.78 | 3.34E-6
Inter-chip HD (%): 49.83 | 49.66 | 50.30 | 49.95
Hamming weight (%): 49.95 | 50.77 | 49.80 | -
Minimum entropy: 0.764 | 0.697 | 0.9997 | -
[1] E. Hunt-Schroeder and T. Xia, "12-nm Stable Pre-Amplifier Physical Unclonable Function With Self-Destruct Capability," IEEE TVLSI Systems, vol. 31, no. 6, pp. 840-850, June 2023.
[3] S. Taneja, et al., "Unified In-Memory Dynamic TRNG and Multi-Bit Static PUF Entropy Generation for Ubiquitous Hardware Security," 2021 IEEE ISSCC, 2021, pp. 498-500.
[4] Y. He, et al., "An Automatic Self-Checking and Healing Physically Unclonable Function (PUF) with 3×10^-8 Bit Error Rate," 2021 IEEE ISSCC, San Francisco, CA, USA, 2021, pp. 506-508.

Conclusion & Acknowledgements
Physical unclonable function system in 3-nm FinFET technology; multi-stage self-destruct (EM & TDDB); low 3.48×10^-5 bit error rate; industry & university collaboration.
Acknowledgements: special thanks to Darren Anand, Steven Burns, Steven Lamphier, Darrin Hinterneder, Jon Raymond and Blake Hewgill.

16.4: High-Density and Low-Power PUF Designs in 5nm Achieving 23× and 39× BER Reduction After Unstable Bit Detection and Masking
2024 IEEE International Solid-State Circuits Conference (ISSCC 2024)
Sudhir Kudva (1), Mahmut Ersin Sinangil (1), Stephen Tell (2), Nikola Nedovic (1), Sanquan Song (1), Brian Zimmer (1), C. Thomas Gray (2)
(1) Nvidia, Santa Clara, CA; (2) Nvidia, Durham, NC

Outline
Motivation; Previous work; High-Density PUF: diode-clamped inverter PUF; Low-Power PUF: leakage-biased inverter PUF; Measurement results; Conclusion.
Motivation: Economics
Revenue loss due to breaches comes from piracy [1], regulatory compliance fines [2] and reputation loss [3]. The hardware security module market [4] is currently $2 billion and is projected to reach $8 billion by 2035.
[1] https:/

Motivation: Hardware root of trust
A hardware root of trust provides additional protection. An on-die PUF uses device variation to generate a unique key.

Motivation: Key generation block
Helper and mask data are generated during enrollment and stored in on-die fuses, which consume a large area. Example: native PUF output 1 0 0 1 ? 1 0 ? 1 ? 0 0; mask generated after detection U U U U M U U M U M U U; masked PUF output 1 0 0 1 1 0 1 0 0 (U = unmasked, M = masked, ? = unstable bit).

Motivation: BER vs. helper data
A higher PUF BER requires larger helper data for ECC, and a large number of masked cells requires a larger PUF array. A plot of HW cost vs. BER for BCH codes (helper data size, 0 to 5 Kbits, vs. BER, 0.005 to 0.03) shows that fuse area increases drastically at higher BER. BCH codes become infeasible beyond the point marked on the plot.

Error detection: Principle
PUF operation: device variation, compare, amplify (SRAM PUF; inverter-chain PUF with INV1, INV2 and amplification). Mask generation: bias injection detects the cells with small variation.

Previous work: Bias injection through the supply
Bias can be injected using the supply-voltage terminal, with the supply split between the compared stages (SSCL 2018, JSSC 2023).
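The key-generation and BER-vs-helper-data slides above describe enrollment-time masking followed by ECC. Below is a minimal sketch of both steps under stated assumptions:

- The mask rule in `generate_mask` is a hypothetical illustration, not the authors' circuit. It masks a cell whose output differs between two injected-bias reads.
- Bit errors are assumed independent with a uniform BER.
- `block_failure_probability` is the generic binomial tail for any t-error-correcting block code.

For a binary BCH code of length n = 2^m - 1, the parity (helper) data is at most m·t bits. A lower BER needs a smaller t, which is why it shrinks the fuse area. All function names are ours. The usage example reproduces the slide's mask example, with the native output written with an unstable bit ('?') at each of the three masked positions. The 5% BER is illustrative.

```python
import math


def generate_mask(read_bias0: str, read_bias1: str) -> str:
    """HYPOTHETICAL enrollment rule: mask ('M') every cell whose output differs
    between the two injected-bias states; keep ('U') the others."""
    a, b = read_bias0.split(), read_bias1.split()
    if len(a) != len(b):
        raise ValueError("both reads must cover the same cells")
    return " ".join("U" if x == y else "M" for x, y in zip(a, b))


def apply_mask(native: str, mask: str) -> str:
    """Keep only the PUF bits at unmasked ('U') positions."""
    bits, flags = native.split(), mask.split()
    if len(bits) != len(flags):
        raise ValueError("mask length must match the PUF output")
    kept = [bit for bit, flag in zip(bits, flags) if flag == "U"]
    if "?" in kept:
        raise ValueError("an unstable bit was left unmasked")
    return " ".join(kept)


def block_failure_probability(n: int, t: int, ber: float) -> float:
    """Probability that more than t of n independent bits flip, i.e. that a
    t-error-correcting block code fails on one n-bit block (binomial upper tail)."""
    return sum(math.comb(n, i) * ber**i * (1.0 - ber) ** (n - i) for i in range(t + 1, n + 1))


def required_t(n: int, ber: float, target: float) -> int:
    """Smallest correction capability t whose block failure probability is <= target."""
    for t in range(n + 1):
        if block_failure_probability(n, t, ber) <= target:
            return t
    return n


def bch_helper_bits_upper_bound(n: int, t: int) -> int:
    """A binary BCH code of length n = 2**m - 1 needs at most m*t parity bits."""
    m = (n + 1).bit_length() - 1
    if n != 2**m - 1:
        raise ValueError("BCH length must be 2**m - 1")
    return m * t


if __name__ == "__main__":
    print(apply_mask("1 0 0 1 ? 1 0 ? 1 ? 0 0", "U U U U M U U M U M U U"))  # 1 0 0 1 1 0 1 0 0
    for ber in (0.05, 0.05 / 23):  # illustrative BER before and after a 23x reduction
        t = required_t(255, ber, 1e-6)
        print(ber, t, bch_helper_bits_upper_bound(255, t))
```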
Previous work: Bias injection, non-supply
Capacitance modulation works only with positive feedback. Body bias can emulate temperature variation (TCAS-I 2020, JSSC 2020).

This work
Goal: an unstable-bit detection technique that is compatible with lower technology nodes and easy to lay out. The unstable-cell detection technique is applied to two PUFs: a high-density PUF (diode-clamped inverter PUF) and a low-power PUF (leakage-biased inverter PUF). Mask sizes are digitally controllable and variable.

High-Density PUF: Diode-clamped inverter PUF
It has a DC characteristic similar to an inverter's, with lower power than a simple inverter chain (TCAS-I 2018).

High-Density PUF: Bias injection
Bias is injected in parallel with the diodes; NMOS bias injection is implemented. Gate voltages per configuration:
Stage 1 bias injection: MN7, MP7 = Bias; MN8 = GND, MP8 = VDD
Stage 2 bias injection: MN8, MP8 = Bias; MN7 = GND, MP7 = VDD
NMOS bias injection: MN7, MN8 = Bias; MP7 = VDD, MP8 = VDD
PMOS bias injection: MP7, MP8 = Bias; MN7 = GND, MN8 = GND

High-Density PUF: Bias generation
A CMOS self-biased Vth-reference current source generates the bias. The bias is applied only during enrollment and is otherwise pulled to GND.
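The gate-voltage table above can be captured as a lookup table. Below is a minimal sketch that assumes the string levels 'Bias', 'GND' and 'VDD' are enough to describe each configuration. The names `BIAS_MODES` and `gate_levels` are ours.

The enrollment switch follows the bias-generation slide: bias is applied only during enrollment and is otherwise pulled to GND. It is modelled only for the implemented NMOS mode, where a grounded gate turns MN7 and MN8 off.

```python
# Gate assignments transcribed from the "High-Density PUF: Bias injection" table.
BIAS_MODES = {
    "stage1": {"MN7": "Bias", "MP7": "Bias", "MN8": "GND", "MP8": "VDD"},
    "stage2": {"MN8": "Bias", "MP8": "Bias", "MN7": "GND", "MP7": "VDD"},
    "nmos": {"MN7": "Bias", "MN8": "Bias", "MP7": "VDD", "MP8": "VDD"},
    "pmos": {"MP7": "Bias", "MP8": "Bias", "MN7": "GND", "MN8": "GND"},
}
IMPLEMENTED_MODE = "nmos"  # the slide states that NMOS bias injection is implemented


def gate_levels(mode: str = IMPLEMENTED_MODE, enrolling: bool = True) -> dict:
    """Return the gate level of each injection device for one configuration."""
    levels = dict(BIAS_MODES[mode])  # copy so callers cannot mutate the table
    if enrolling:
        return levels
    if mode != "nmos":
        raise ValueError("pull-to-GND outside enrollment is only modelled for the NMOS mode")
    # Outside enrollment the bias node is pulled to GND, so MN7/MN8 are off
    # while the PMOS gates stay at VDD (also off).
    return {gate: ("GND" if level == "Bias" else level) for gate, level in levels.items()}


if __name__ == "__main__":
    print(gate_levels())                 # enrollment: MN7/MN8 driven by the bias
    print(gate_levels(enrolling=False))  # normal operation: all injection devices off
```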
Low-Power PUF: Leakage-biased inverter PUF
Core PUF: an inverter with an OFF-state NMOS (ISSCC'17), with a 2b current DAC for bias injection. Between Vin and Vout the circuit comprises a bias stage, an amplification stage, a buffer stage and the 2b current DAC stage.
Low-Power PUF: Bias injection
An OFF-state T-gate provides a high resistance. Bias = 0 and bias = 1 are the two states that shift the bias-stage output.

Low-Power PUF: Layout challenges
Edge effects have a severe impact on the bias stage, and coupling to the input node causes oscillation. The coupling capacitance Cc between Vin (metal) and Vout (metal) must therefore be minimized to prevent ringing. A plot of Vin and Vout (0 to 1 V over 0 to 1,000 ns) shows that the unwanted ringing was fixed with careful layout. (Representative layout, not to scale: DAC, amplification stages, buffer stage and bias stage.)

Die micrograph
Implemented in the TSMC 5nm node.

Implementation and Testing
The PUF arrays are 512b. Diode-clamped inverter PUF area: 870 μm²; leakage-biased inverter PUF area: 3384 μm². Masking overhead is 14.2% for the diode-clamped inverter PUF and 21.4% for the leakage-biased inverter PUF. In total, 18 chips were tested: 7 FF, 5 TT and 6 SS corner chips.
Measurement: Valid bits, high-density PUF
A 3b control adjusts the injected bias. On average, 50~120 bits (23%) are masked.

Measurement: BER, high-density PUF
This slide shows BER under voltage and temperature variation. Masking gives a 23× BER reduction (plot annotations: 23× and 16× BER reduction).

Measurement: intra/inter HD, high-density PUF
Inter-die HD remains the same while intra-die HD decreases, giving a 500× separation between the inter- and intra-die HD means. Histograms are shown unmasked, at medium mask strength (code = 4) and at max mask strength (code = 8).

Measurement: Valid bits, low-power PUF
A 2b control adjusts the injected bias. On average, 40~140 bits (27%) are masked.

Measurement: BER, low-power PUF
With 2b masking control, maximum masking gives a 39× BER reduction.

Measurement: intra/inter HD, low-power PUF
Most of the unstable cells are detected by code 2. There is a 500× separation between the inter- and intra-die HD means. Histograms are shown unmasked, at medium mask strength (code = 2) and at max mask strength (code = 4).

Comparison
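The percentages on the valid-bit slides are consistent with one reading: the upper end of each masked-bit range divided by the 512b array size. Here is a quick check under that assumption (the function names are ours):

```python
ARRAY_BITS = 512  # PUF array size from the implementation slide


def masked_percent(masked_bits: int, array_bits: int = ARRAY_BITS) -> float:
    """Share of the array removed by the mask, in percent."""
    if not 0 <= masked_bits <= array_bits:
        raise ValueError("masked bit count out of range")
    return 100.0 * masked_bits / array_bits


def valid_bits(masked_bits: int, array_bits: int = ARRAY_BITS) -> int:
    """Bits left for key generation after masking."""
    return array_bits - masked_bits


if __name__ == "__main__":
    print(masked_percent(120), valid_bits(120))  # 23.4375 392 (high-density PUF, upper end)
    print(masked_percent(140), valid_bits(140))  # 27.34375 372 (low-power PUF, upper end)
```

Both values round to the quoted 23% and 27%. This suggests, but does not prove, that the percentages refer to the maximum mask strength.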
Conclusion
Two different PUFs with unstable-cell detection and masking were implemented in 5nm technology:
- a diode-clamped inverter PUF with low area (582 F²);
- a leakage-biased inverter PUF with low power (8 fJ/bit).
Detection and mask generation are single-point: the mask is generated at 0.7V and 30°C and works across temperature and voltage variation. This enables the use of a simple ECC with low fuse area.

16.5: Synthesizable Design-Agnostic Timing Fault Injection Monitor Covering 2MHz to 1.26GHz Clocks in 65nm CMOS
2024 IEEE International Solid-State Circuits Conference
Yan He, Kaiyuan Yang, Secure and Intelligent Micro-Systems (SIMS) Lab, Rice University, Houston TX

Fault Injection Attack (FIA)
FIAs are a serious threat to modern computing systems. A physical perturbation causes a computation error, leading to OS security bypass or cryptography key leakage (P. Qiu, CCS'19; K. Murdock, S&P'20; B. Giller, Blackhat'15).

Fundamental Types of FIAs: Laser fault (soft error)
A laser fault requires a high-cost laser setup, and low-cost countermeasures exist:
- a laser detection circuit (LDC) (R. Kumar, ISSCC'23);
- a laser monitor with high detection accuracy, integrated with a digital ASIC using inverter-sized photosensors (H. Zhang, JSSC'23).

Fundamental Types of FIAs: Timing fault
Timing faults have versatile, low-cost injection methods, which creates demand for a low-cost countermeasure with comprehensive coverage. Timing diagrams (CLK, D, Q) contrast normal operation, which respects tsetup and thold, with hold-time and setup-time violations. Changing the propagation delay of the data or changing the CLK waveform causes such a violation and hence an error.
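The timing-fault slide relies on the two standard flip-flop timing constraints. Below is a minimal static-timing sketch of why a clock glitch or a slowed data path produces the violations it shows. It assumes a single launch/capture register pair and no jitter. The picosecond values are illustrative, not from the paper, and the `Path` and `check` names are ours.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Path:
    t_cq: float         # clock-to-Q delay of the launching flip-flop (ps)
    t_logic_max: float  # slowest combinational path to the capture flip-flop (ps)
    t_logic_min: float  # fastest combinational path (ps)
    t_setup: float      # setup time of the capturing flip-flop (ps)
    t_hold: float       # hold time of the capturing flip-flop (ps)


def check(path: Path, period: float, skew: float = 0.0) -> str:
    """Classify one register-to-register path.

    skew = capture-clock arrival minus launch-clock arrival (ps); a positive
    skew relaxes the setup constraint but tightens the hold constraint.
    """
    setup_slack = period + skew - (path.t_cq + path.t_logic_max + path.t_setup)
    hold_slack = path.t_cq + path.t_logic_min - (path.t_hold + skew)
    if setup_slack < 0:
        return "setup violation"
    if hold_slack < 0:
        return "hold violation"
    return "ok"


if __name__ == "__main__":
    nominal = Path(t_cq=100, t_logic_max=3200, t_logic_min=150, t_setup=100, t_hold=50)
    print(check(nominal, period=4000))                      # ok
    print(check(nominal, period=2000))                      # setup violation: glitch shortens the cycle
    print(check(replace(nominal, t_logic_max=4200), 4000))  # setup violation: slowed logic (e.g. voltage drop)
    print(check(nominal, period=4000, skew=300))            # hold violation: shifted capture edge
```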
Timing-FIA: Clock fault injection
Localized, block-level injection targets a function block (F) in three ways: voltage injection on VDD through the power management (Buck, LDO, etc.), EM/temperature injection (EM, heating, freezing), or clock fault injection. Clock fault injection induces abnormal clock patterns to incur timing violations. There are 12 basic types (T1 to T12), grouped as add pulse, skip cycle, change phase, change duty cycle and glitched waveform. A real attack could combine several basic types. Glitched clocks can come from DVFS, a rogue PLL, an attack on the external clock, or a hardware Trojan.

Timing-FIA: Voltage/EM/temperature interference
Voltage/EM/temperature injection changes the propagation delay. Because fault injection can be localized within the chip, the countermeasure needs to be distributable.

Timing-FIA: Countermeasures
Logic checking (R. Kumar, ISSCC'23) is timing-FIA agnostic but design specific, and it is a complex design.
Timing-FIA: Countermeasures compared (D. Nemiroff, Intel'22; S. Song, VLSI'22; R. Kumar, ISSCC'23)
Physical anomaly detection (TRC on the CLK/D/Q path): timing-FIA agnostic and design agnostic, but high testing cost and not distributable.
Over-sampling (FLL locked to N× the clock frequency, plus decision logic): design agnostic, but clock FIA only, large area and power, and not distributable.
Logic checking: timing-FIA agnostic, but design specific and a complex design.
This work (CLK pulse-width compare with a DLL locked to the clock frequency): timing-FIA agnostic, design agnostic, low cost and distributable.
Design principles: CLK pulse-width monitor
The monitor generates a CLK replica that is low-pass filtered over the past few cycles. It creates an acceptance window (WNeg and WPos) that determines the variation and noise tolerance. The Glitch signal is asserted if the CLK pulse width (PW) falls outside the window.

Implementation: DMin, DL, DMax generation
DL is the CLK replica, so PW(DL) = PW(CLK). The acceptance window is set by the PW differences WNeg = |PW(DL) - PW(DMin)| and WPos = |PW(DL) - PW(DMax)|.

Implementation: RMin, RL, RMax generation
RMin and RMax are generated by falling-edge sampling. Detection logic: if {RMin, RMax} ≠ {0, 1}, Glitch = 1; otherwise Glitch = 0.

FIA detection example
The monitor detects violations of either tolerance window (WNeg or WPos) and issues the alert in the same cycle.

FIA detection coverage
Clock injection attacks change the clock pulse width, and voltage/EM/temperature injection attacks change the propagation delay. For example, a voltage glitch on VDD lengthens the DMin delay through the voltage drop, which raises Glitch.

CLK FIA coverage: single monitor
A single monitor (M1) detects 9 of the 12 clock-glitch types. The 3 undetected types (T4, T9, T12: skip cycle and change phase) do not change the positive pulse width.
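The window check and the {RMin, RMax} decision rule above can be modelled behaviourally. Below is a minimal sketch that assumes ideal pulses:

- PMin and PMax start with the clock's rising edge and last PW(DL) - WNeg and PW(DL) + WPos respectively.
- RMin and RMax record whether each pulse is still high at the clock's falling edge.

Waveforms are bit strings with 500 ps per bit, as in the pattern-sweep experiment. The 300 ps windows, the waveform names and all function names are illustrative assumptions, not the circuit.

```python
BIT_PS = 500  # one pattern bit = 500 ps, as in the pattern-sweep setup


def positive_pulse_widths(pattern: str, bit_ps: int = BIT_PS) -> list[int]:
    """Widths (ps) of the high pulses in a bit-pattern waveform.

    A high run still open at the end of the pattern is dropped, because its
    full width is not visible in the captured window.
    """
    widths, run = [], 0
    for bit in pattern.replace(" ", ""):
        if bit == "1":
            run += 1
        else:
            if run:
                widths.append(run * bit_ps)
            run = 0
    return widths


def sample_flags(pw_clk: int, pw_ref: int, w_neg: int, w_pos: int) -> tuple[int, int]:
    """(RMin, RMax): whether PMin / PMax are still high at the CLK falling edge."""
    r_min = 1 if pw_clk < pw_ref - w_neg else 0  # PMin lasts pw_ref - w_neg
    r_max = 1 if pw_clk < pw_ref + w_pos else 0  # PMax lasts pw_ref + w_pos
    return r_min, r_max


def glitch(pw_clk: int, pw_ref: int, w_neg: int, w_pos: int) -> bool:
    """Detection logic: Glitch = 1 unless {RMin, RMax} == {0, 1}."""
    return sample_flags(pw_clk, pw_ref, w_neg, w_pos) != (0, 1)


def single_monitor_alerts(pattern: str, pw_ref: int = 2000, w_neg: int = 300, w_pos: int = 300) -> bool:
    """True if any complete positive pulse falls outside the acceptance window."""
    return any(glitch(pw, pw_ref, w_neg, w_pos) for pw in positive_pulse_widths(pattern))


if __name__ == "__main__":
    for name, wave in [
        ("no glitch", "0000 1111 0000 1111 0000 1111"),
        ("narrow pulse", "0000 1111 0000 0111 0000 1111"),
        ("merged pulse", "0000 1111 1111 1111 0000 1111"),
        ("skipped pulse", "0000 1111 0000 0000 0000 1111"),
    ]:
        print(f"{name}: alert={single_monitor_alerts(wave)}")
    # prints False, True, True, False for the four waveforms in order
```

The skipped-pulse waveform passes because every complete positive pulse it contains keeps the nominal 2 ns width. This matches the slide's explanation of why a single monitor misses three glitch types.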
CLK FIA coverage: dual inverted monitor
A second monitor (M2) on the inverted clock, combined with M1, covers all clock-glitching types. The two monitors do not need to be placed together, which suits a distributed scenario.

Pulse-width locking
The locking loop maintains PW(DL) = PW(CLK). It uses a 3-stage (coarse, medium, fine) configurable delay line (CDL) under a locking FSM, for fast locking with fine resolution.

Configurable delay line (CDL): coarse stage
The coarse stage is RO-based. A free-running ring oscillator is counted (CntRO) until Count = ConfC, which converts the rising edge into a delayed pulse. The stage can be bypassed (BypassC) for high-frequency clock monitoring. The CDL uses a 9b coarse stage, 8b and 16b medium stages, and a 4b fine stage.

Configurable delay line (CDL): medium stage
The medium stage uses path-selection-based tuning (B. Liu, TCAS-I'21) with small inherent delay and medium resolution. It is controlled by a thermometer code and is also used for acceptance-window generation.
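The medium stage is controlled by a thermometer code. Below is a minimal sketch of that encoding. It assumes an idealized linear delay of 100 ps per selected unit, the medium-stage step size in the measured CDL range table; real stages also add inherent delay. The function names are ours.

```python
def thermometer(n_on: int, width: int = 8) -> str:
    """n_on ones filled from the right-hand end, e.g. 3 -> '00000111'."""
    if not 0 <= n_on <= width:
        raise ValueError("n_on out of range")
    return "0" * (width - n_on) + "1" * n_on


def is_thermometer(code: str) -> bool:
    """True if code is all zeros followed by all ones (rejects any other character)."""
    return code == thermometer(code.count("1"), len(code))


def medium_delay_ps(code: str, step_ps: int = 100) -> int:
    """Idealized linear model: each selected unit adds one step of delay."""
    if not is_thermometer(code):
        raise ValueError("not a thermometer code")
    return code.count("1") * step_ps


def step_up(code: str) -> str:
    """Next setting: one more bit turns on (saturates at all ones)."""
    return thermometer(min(code.count("1") + 1, len(code)), len(code))


if __name__ == "__main__":
    codes = [thermometer(i) for i in range(9)]
    print(codes[3], medium_delay_ps(codes[3]))  # 00000111 300
```

Consecutive thermometer settings differ in exactly one bit. Stepping the code one increment at a time therefore changes the delay monotonically, a commonly cited reason to use thermometer control on a delay line that is retuned while running.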
Configurable delay line (CDL): fine stage
The fine stage uses digital-varactor-based tuning with large inherent delay and fine resolution. It can be bypassed (BypassF) for high-frequency clock monitoring.

Clock pulse-width locking FSM
Flowchart states: Start; reset all configurations; coarse setting (ConfC = CntRO); medium and fine linear search until the sampled pattern 010 is found, with a "stuck at min?" check that sets BypassF/BypassC = 1; Ready; then full-range linear tracking with ConfSkip. In short: 3-stage linear locking, followed by full-range linear tracking after locking.

Full-range linear tracking
An overlapped delay configuration gives full delay coverage. Coarse, medium or fine switching during tracking can make the acceptance window drift, causing false alerts or drops. ConfSkip provides programmable skipping of configurations (example: CMedium = 1, CFine = 9, MFine = 6), so the delay transitions monotonically and no false alert occurs.

Chip micrograph
The monitor is built in 65nm CMOS with an area of 1500 μm² (50 μm × 30 μm). It is a fully synthesizable design with no manual layout effort. The test chip contains DUTs 1 to 10 and a pattern generator & pulse adder.
Measured CDL range (delay in ps, min:step:max)
Coarse 9b (w/ bypass): 130:500:255670
Medium 8b: 100:100:800
Fine 4b: 295:10:475
Fine 4b, bypassed: 130
Medium 16b: 100:100:1600
Total delay (CLK → PL): 400:100:700, 725:10:257075
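The stage ranges overlap. Under the idealized model below, the medium span of 800 ps exceeds the 500 ps coarse step, and the 150 ps fine span exceeds the 100 ps medium step. A coarse-to-fine linear search can therefore approach any in-range target delay to within one fine step.

Below is a simplified sketch under an idealized linear delay model. It uses the step sizes from the table above (500, 100 and 10 ps), plus two assumptions of ours: a 400 ps fixed offset and a 15-step fine range. Unlike the paper's locking FSM, which searches on sampled pulse patterns, this version is given the target delay directly. All names are ours.

```python
# (stage, step in ps, max code): idealized linear stages using the measured step sizes.
STAGES = (("coarse", 500, 511), ("medium", 100, 8), ("fine", 10, 15))
OFFSET_PS = 400  # assumed fixed delay with all codes at zero


def delay_ps(codes: dict[str, int], offset_ps: int = OFFSET_PS) -> int:
    """Total delay of the idealized CDL for a given set of stage codes."""
    return offset_ps + sum(codes[name] * step for name, step, _ in STAGES)


def lock(target_ps: int, offset_ps: int = OFFSET_PS) -> dict[str, int]:
    """Coarse-to-fine linear search: raise each stage one step at a time while
    the total delay stays at or below the target."""
    codes = {name: 0 for name, _, _ in STAGES}
    for name, step, max_code in STAGES:
        while codes[name] < max_code and delay_ps(codes, offset_ps) + step <= target_ps:
            codes[name] += 1
    return codes


if __name__ == "__main__":
    codes = lock(2345)
    print(codes, delay_ps(codes))  # {'coarse': 3, 'medium': 4, 'fine': 4} 2340
```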
210、Coarse MediumFineCLKConfACCPLPMinPMaxConfFSMCDLMeasured Min and Max delaySteps extrapolated assuming linear increment16.5:Synthesizable Design-Agnostic Timing Fault Injection Monitor Covering 2MHz to 1.26GHz Clocks in 65nm CMOS 2024 IEEE International Solid-State Circuits Conference29 of 45Locking W
211、aveformRequires 7 26 cycles to ready the monitorLocking StartReady40nsCDL Config.0 0 0X X X3 4 7 8C M F60ns 250MHz=15 cycles16.5:Synthesizable Design-Agnostic Timing Fault Injection Monitor Covering 2MHz to 1.26GHz Clocks in 65nm CMOS 2024 IEEE International Solid-State Circuits Conference30 of 45Lo
212、cking Frequency Range VariationFrequency(MHz)10310210110010-10.60.81.01.21.4Voltage(V)0.4MeanSTD25CFreqMaxFreqMinFrequency(MHz)Mean=2STD=0.2015Count1.61.82.02.22.4 FMin 1000 1200 1400 1600 FMaxMean=1267STD=122.2Frequency(MHz)Count50 DUTs measured at 25C16.5:Synthesizable Design-Agnostic Timing Fault
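As a quick sanity check on the CDL table, the sketch below converts a matched pulse length into a clock frequency. It assumes, as our own reading, that the "Total delay (CLK PL)" row is the clock high phase the CDL can match and that the clock has a 50% duty cycle; neither assumption is stated on the slide, and the function name is hypothetical.

```python
def clock_freq_for_pulse_length(pulse_ps: float, duty: float = 0.5) -> float:
    """Clock frequency (Hz) whose high phase lasts pulse_ps picoseconds.

    Assumes the matched pulse is the high phase and that the clock has the
    given duty cycle (50% by default, an illustrative assumption).
    """
    period_s = pulse_ps * 1e-12 / duty
    return 1.0 / period_s


if __name__ == "__main__":
    # Min and max total CDL delay from the measured-range table.
    for delay_ps in (400, 257_075):
        freq_mhz = clock_freq_for_pulse_length(delay_ps) / 1e6
        print(f"{delay_ps} ps high phase -> {freq_mhz:.2f} MHz clock")
```

Under these assumptions the 400ps and 257,075ps ends correspond to about 1.25GHz and 1.9MHz, roughly the 2MHz to 1.26GHz range quoted in the title.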
Locking Frequency Range Variation
[Figure: FreqMax and FreqMin (MHz) vs. temperature (0 to 120°C) at 1.2V VDD; 50 DUTs measured at 1.2V]
CLK Glitch Detection: Pattern Sweep
[Setup: on-chip PLL (2GHz) → shift-register-based pattern generator with Injection Enable → DUT1, DUT2; 250MHz clock]
- Glitch types: T1 to T12; clock period: 4ns; glitch width: 500ps.
- Injected waveforms (one bit = 500ps, so one 4ns clock period spans 8 bits):
  No glitch: 0000 1111 0000 1111 0000 1111
  Glitched patterns used for T1 to T12:
  0000 1111 0000 0111 0000 1111
  0000 1111 0100 1111 0000 1111
  0000 1111 0000 1011 0000 1111
  0000 1111 0001 1111 0000 1111
  0000 1111 1111 1111 0000 1111
  0000 1111 0000 0000 0000 1111
  0000 1111 0000 1110 0000 1111
  0000 1111 0001 1110 0001 1110
  0000 1111 1000 0111 1000 0111
  0000 1111 0000 1111 1000 1111
  0000 1111 0000 1110 0001 1110
  0000 1111 0000 0111 1000 0111
CLK Glitch Detection: Pattern Sweep Results
[Figure: number of missing alerts in 100 trials per configuration, for glitch types T1 to T12 and no glitch, vs. acceptance window (100 to 600ps)]
- 100% detection rate with a proper acceptance window.
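To make the acceptance-window idea concrete, the sketch below checks the injected 24-bit patterns (one bit = 500ps) against the nominal 2ns half period of the 4ns clock. It assumes, for illustration only, that every complete high and low phase is compared with the nominal width and that an alert fires when the deviation exceeds the acceptance window; the actual monitor compares pulse widths against its locked delay line, and the constants and function names here are hypothetical.

```python
from itertools import groupby

BIT_PS = 500             # one pattern bit at the 2GHz pattern-generator rate
NOMINAL_PHASE_PS = 2000  # half of the 4ns clock period


def complete_phases(pattern: str) -> list[tuple[str, int]]:
    """Return (level, width_ps) for every phase fully inside the pattern."""
    bits = pattern.replace(" ", "")
    runs = [(level, len(list(group))) for level, group in groupby(bits)]
    # The first and last runs may be cut off by the observation window, so skip them.
    return [(level, n * BIT_PS) for level, n in runs[1:-1]]


def glitch_alert(pattern: str, window_ps: int) -> bool:
    """True if any complete phase deviates from nominal by more than the window."""
    return any(abs(width - NOMINAL_PHASE_PS) > window_ps
               for _, width in complete_phases(pattern))


if __name__ == "__main__":
    clean = "0000 1111 0000 1111 0000 1111"
    early_edge = "0000 1111 0001 1111 0000 1111"  # one rising edge 500ps early
    for window in (300, 600):
        print(window, glitch_alert(clean, window), glitch_alert(early_edge, window))
```

In this toy model a 300ps window flags the 500ps edge shift while a 600ps window lets it through, which illustrates why the detection rate depends on choosing a proper acceptance window.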
CLK Glitch Detection: Fast Pulse Injection
[Setup: on-chip PLL (250MHz, 4ns period) → synchronous pulse adder (R, D, Q) with Injection Enable → DUT]
- Glitch types: T1, T2; clock period: 4ns; glitch width: 100ps.
[Figure: missing alerts in 100 trials vs. acceptance window (ps) for T1 and T2; waveform annotations: 100ps glitch, 300ps]
- 100% detection rate.

Voltage Glitch Detection
[Setup: function generator on the 1.2V VDD of the DUT; 100MHz clock from the PLL]
[Waveform: VDD dip with pulse depth = 120mV (annotations 50mV, 9ns, 90ns) and the resulting glitch alert]
[Figure: detection vs. pulse depth (110 to 190mV) and acceptance window (100 to 1600ps)]
- An electromagnetic pulse (EMP) attack has similar effects (A. Dehbaoui, FDTC'12).
EMI Fault Detection
[Setup: RF signal generator and DC supply combined through a bias tee onto VDD of the DUT; 100MHz clock from the PLL]
- EMI off: 1.2V DC. EMI on: 60mV Vpp, 10MHz, -10dBm RF power.
[Waveform: VDD ripple (40mV annotation, 1ms span) and the resulting glitch alert]
- Same clock and EMI frequency as a previously reported attack (D. Fujimoto, EMC/APEMC'18).
EMI Fault Detection (continued)
[Figure: detection vs. RF power (-11 to 1dBm) and acceptance window (100 to 1600ps), same setup]

Temperature Fault Detection
Attack | Temperature (°C) | Average slew rate (°C/min) | Testing equipment | Monitor result
Heating attack | 25 → 122 | 1200 | Hot air rework station | Glitch detected
Freezing attack | 25 → -11 | -600 | Freeze spray | Glitch detected
Temperature drift | -40 → 125 | 2 | Temp. chamber | No glitch
[Photos: heating with a hot air rework station, 122.5°C, glitch alert on; freezing with freeze spray, -11.6°C, glitch alert on; normal condition, 24.8°C, glitch alert off]

Power across Temperature
[Figure: power (mW) vs. temperature (0 to 120°C) at 1.2V for 2.5MHz, 50MHz, 250MHz, and 500MHz]
- 50 DUTs at 250MHz, 1.2V, 25°C: mean = 0.487mW, STD = 0.036mW.

Power across VDD
[Figure: power (mW, 0 to 2.5) vs. VDD (0.4 to 1.4V) at 25°C for 5MHz, 100MHz, 250MHz, 500MHz, 833MHz, and 1250MHz]

Comparison
 | ISSCC'23 | VLSI'22 | This work
Tech. | 44nm | 55nm | 65nm
Application | AES-256 | Design agnostic | Design agnostic
Principle | Error checking | High-freq. sampling | PW comparison
Digital design | Fully synthesizable | Partially digital | Fully synthesizable
Voltage (V) | 0.75 | 0.5 to 1.0 | 0.4 to 1.4
Temp. (°C) | 25 | 25 | -40 to 125
Power (mW) | - | 0.8025 (a) | 0.487 (b)
Area (MF²) | 244.56 | 192 | 0.355
Monitor precision | - | FLL period | DLL delay step
Clock frequency | 0 to 780MHz | 0 to 40MHz | 2MHz to 1.26GHz
Target attacks | Any fault attack on AES | Low-freq. clock | Clock, voltage, EM, temp.
(a) Measured at 0.75V VDD, locking to a 40MHz clock.
(b) Measured at 1.2V VDD, locking to a 250MHz clock.
Conclusions
- A low-cost timing FIA monitor: design-agnostic; fully synthesizable, with no manual layout effort; distributable.
- Prototype in 65nm CMOS: 2× lower power consumption and 500× smaller area; monitors a 2MHz to 1.26GHz clock frequency range; covers clock, voltage, EM, and temperature FIA.
- Demonstration presented in DS1.
16.6: PACTOR: A Variation-Tolerant Probing-Attack Detector for a 2.5Gb/s 4-Channel Chip-to-Chip Interface in 28nm CMOS
2024 IEEE International Solid-State Circuits Conference
Mao Li¹, Zhaoqing Wang¹, Sanu K. Mathew², Vivek De², Mingoo Seok¹
¹Columbia University, ²Intel Corporation

Outline
- Background and challenges
- Circuit architecture
- Measurements
- Conclusion

Adversary Tampering at the PCB Level
- An adversary can place a probe on a PCB trace (e.g., between an SoC and memory) to steal critical information and take over the victim system (Black Hat 2017).
- Probing is the starting point of other hazardous attacks, e.g., side-channel attacks and fault-injection attacks.
Detection-Driven Protection
- Normal operation → an adversary tampers with a PCB trace → the attack is detected → the protection engine is enabled to protect transmitted data.
[Figure: detector, µP, protection engine, memory]
- Detection-driven protection incurs less delay and energy overhead than static protection.
Probe Survey
- All probes have capacitive loading. Probe loading: passive 3.9pF; active 0.5pF.
- Our target: detect loads as small as 0.5pF, robustly across PVT variations.
[Figure: price ($, 100 to 100k) vs. capacitance (0.5 to 15pF) for single-ended passive probes, multi-channel logic probes, low-voltage single-ended probes, and low-voltage differential probes]
PACTOR Architecture
[Figure: 4-channel × 2.5Gb/s link between a TX chip (TX0 to TX3) and an RX chip (RX0 to RX3) over channels 0 to 3, with link_clk]
PACTOR Architecture
- A 4-channel detector sits on both TX and RX; the following focuses on TX detection.
- Each cap comparator compares the external loading with an internal reference.
- A shared binary-weighted capacitor array
- A temperature sensor for dynamic thresholding
- An FSM
[Figure: per-channel cap comparators on TX0 to TX3 (EN3:0) and RX0 to RX3 (RX_EN3:0), shared binary-weighted capacitor arrays (sw7:0, RX_sw7:0), temperature sensors (high/low output), and FSMs]
PACTOR Operation
1. TX transmits data.
2. Pause data transmission and decouple TX from the link.
3. Enable the detector and run detection channel by channel.
4. Resume data transmission if all channels are clear.
- Detection executes periodically within a short time window, with minimal impact on throughput.
[Figure: TX-side detector (cap comparators, binary-weighted capacitor array, temperature sensor, FSM) at each step]
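The four-step operation can be modeled as a single detection window. This is a hypothetical Python sketch: the callbacks stand in for the link control and the per-channel comparator result, and the early return reflects the rule that the FSM moves to the next channel only when the current one is clear.

```python
from typing import Callable, Optional, Sequence


def run_detection_window(
    channels: Sequence[int],
    pause_tx: Callable[[], None],
    resume_tx: Callable[[], None],
    probe_detected: Callable[[int], bool],
) -> Optional[int]:
    """One periodic detection window; returns the flagged channel or None."""
    pause_tx()  # pause data transmission and decouple TX from the link
    for ch in channels:
        # The FSM enables one comparator (EN[ch]) at a time and moves on
        # only if the current channel is clear.
        if probe_detected(ch):
            return ch  # keep TX paused; the protection engine takes over (not modeled)
    resume_tx()  # all channels clear: resume data transmission
    return None


if __name__ == "__main__":
    log = []
    hit = run_detection_window(range(4), lambda: log.append("pause"),
                               lambda: log.append("resume"), lambda ch: ch == 2)
    print(hit, log)
```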
Detector Architecture
- Cap comparator: a StrongARM comparator without input transistors. Inputs: the reference capacitance and the external loading. Output: the regenerative comparison result.
- Shared reference capacitor array (1X to 128X, binary-weighted): 70% area savings.
- Binary-search algorithm: eight rounds of comparison convert the external loading into an 8-bit digital code.
- Channel-by-channel detection: the FSM controls the EN signal of each channel and moves to the next channel only if the previous one is clear.
[Figure: four cap comparators (EN0 to EN3, clk_ch3, inputs ref3 and ext3) with an SR latch and a D latch producing cout3, sharing the 1X to 128X array through sw7:0]
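The eight-round binary search over the 1X to 128X array behaves like a successive-approximation conversion. The sketch assumes an ideal, offset-free comparator and a hypothetical unit capacitance `c_unit_ff`; it ignores comparator offset and parasitic loading.

```python
def binary_search_code(c_ext_ff: float, c_unit_ff: float, bits: int = 8) -> int:
    """Successive-approximation search over a binary-weighted reference array.

    Each round tentatively switches in the next-largest weight (128X, 64X, ..., 1X)
    and keeps it only if the comparator reports that the external load is at
    least as large as the reference. Returns an integer code in [0, 2**bits - 1].
    """
    code = 0
    for bit in reversed(range(bits)):      # eight rounds when bits = 8
        trial = code | (1 << bit)
        if c_ext_ff >= trial * c_unit_ff:  # idealized comparator decision
            code = trial
    return code


if __name__ == "__main__":
    print(binary_search_code(37.5, 0.5))
```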
Temperature-Sensor-Based Thresholds
- Detection results are sensitive to temperature variations. With only one threshold (blue), low temperatures cause missed alarms and high temperatures cause false alarms.
[Figure: measured Dout (170 to 190) vs. temperature (-20 to 105°C) for 0pF and 0.5pF loads at VDD = 0.9V, with the HT threshold for temperatures above 25°C and the LT threshold for temperatures below 25°C; temperature sensor: comparator of V_sens against V_ref with an SR latch, output temp]
- The temperature sensor separates the high- and low-temperature regions by indicating whether the temperature is above room temperature (RT, black dashed line).
- This allows two different thresholds depending on the temperature and supports a wider temperature range.
Noise Suppression Algorithm
- Repeat detection 2^N times and take the average to reduce the impact of random noise.
- This increases the margin by 5× and largely eliminates false alarms.
[Figure: margin vs. N]
- Flow: Start → run detection → if temp = 1, check whether dout exceeds the HT threshold; otherwise, check whether dout exceeds the LT threshold → if yes, a probing attack is detected → End.
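The averaging and the temperature-selected threshold can be sketched together. Averaging 2^N independent readings shrinks the noise standard deviation by a factor of √(2^N); the readout callback, the comparison direction (a larger code means more external load), and all numbers below are assumptions for illustration.

```python
import random
from statistics import mean
from typing import Callable


def detect_probe(read_dout: Callable[[], float], n: int, temp_high: bool,
                 lt_threshold: float, ht_threshold: float) -> bool:
    """Average 2**n detector readings, then apply the temperature-selected threshold."""
    avg = mean(read_dout() for _ in range(2 ** n))           # suppress random noise
    threshold = ht_threshold if temp_high else lt_threshold  # temp-sensor output selects
    return avg > threshold  # assumed: a larger code means more external load


if __name__ == "__main__":
    rng = random.Random(0)

    def noisy_readout() -> float:
        return 181.0 + rng.gauss(0.0, 2.0)  # hypothetical noisy Dout

    print(detect_probe(noisy_readout, n=4, temp_high=True,
                       lt_threshold=184, ht_threshold=179))
```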
Minimum Detectable Capacitance (MDC)
- MDC = 71fF at a typical condition (VDD = 0.9V, 25°C), similar for all five tested chips.
[Figure: measured Dout (170 to 195) vs. added capacitance (0 to 150fF) for chips 1 to 5]

One-Temperature-Point Calibration
After the chip is assembled onto the board:
1. Set the temperature to room temperature.
2. Read the detector output.
3. Set the LT threshold equal to the output plus a small margin (e.g., 2).
4. Set the HT threshold based on the simulated linear relationship between the LT and HT thresholds.
- The worst-case MDC is 0.5pF across the five chips.
[Figure: MDC (pF) for chips 1 to 5 at VDD = 0.9V; HT threshold vs. LT threshold for chips 1 to 5]
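The calibration steps reduce to two assignments. The linear LT-to-HT mapping comes from simulation in the paper; here `slope` and `offset` are hypothetical placeholders for that fit.

```python
def calibrate_thresholds(rt_readout: float, slope: float, offset: float,
                         margin: float = 2.0) -> tuple[float, float]:
    """One-temperature-point calibration.

    lt: room-temperature detector output plus a small margin (e.g., 2 codes).
    ht: derived from lt through a linear fit ht = slope * lt + offset that is
        assumed to come from simulation; slope and offset are placeholders.
    """
    lt = rt_readout + margin
    ht = slope * lt + offset
    return lt, ht


if __name__ == "__main__":
    print(calibrate_thresholds(178, slope=1.0, offset=4.0))
```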
The Worst-Case MDC Across VT Variations
- MDC is measured across VDD and temperature sweeps (0.65 to 1.1V, -20 to 105°C), using the same HT and LT thresholds set at the typical condition.
- For each combination, the smallest capacitance that triggers detection is found.
- Worst case: 0.5pF.
[Figure: MDC heat map (0.08 to 0.50pF) over VDD (0.65 to 1.1V) and temperature (-20 to 105°C)]

Comparison
 | PACTOR | SSCL'23 | Oksman'20
Technology | 28nm | 28nm | FPGA
Attack modality | Probing attack | Probing attack | Probing attack
Integration target | IO cell | IO cell | Memory controller
MDC at a typical condition (pF) | 0.071 | 1 | N/A
MDC across all conditions with the same thresholds (pF) | 0.5 | 2 | N/A
Working temperature (°C) | -20 to 105 | 30 to 90 | N/A
VDD (V) | 0.65 to 1.1 | 0.8 to 0.9 | N/A
IO data rate (b/s) | 2500M | 160M | 1066M
Area (µm²) | 7141 | 3910 | N/A
Area per channel (µm²) | 1785.3 | 3910 | N/A
Power (mW) | 0.145 | 0.0368 @ 40Mb/s | N/A
Power per channel (mW) | 0.0362 | 0.0368 @ 40Mb/s | N/A
- Highlighted improvements: 14×, 4×, 2×, 4.5×, 2.3×, 2.1×.
Conclusion
- Probing attacks can be very detrimental.
- We proposed detection-driven protection, with less energy and delay overhead.
- PACTOR achieves the most precise and robust on-chip probing-attack detection capability: 0.5pF worst-case MDC across -20 to 105°C and 0.65 to 1.1V.
16.8: A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm
2024 IEEE International Solid-State Circuits Conference
Jieun Park, Yong Ki Lee, Karpinskyy Bohdan, Yunhyeok Choi, Jonghoon Shin, Hyo-Gyuem Rhew, Jongshin Shin
Samsung Electronics, Republic of Korea
Outline
- Background: cryptographic key, TRNG, TRNG standard
- Proposed TRNG structure: entropy source cell, TRNG structure
- Mathematical model
- Evaluation results
- Summary
Background: Cryptographic Key
- The cryptographic key is the “Root of Trust”: all security starts from the cryptographic key.
- Protecting the key from attack is important.
[Figure: hardware, firmware, OS, API, and protocol layers built on the cryptographic key]
Background: TRNG
- A TRNG (true random number generator) produces the random numbers used as cryptographic keys.
- Essential for security protocols and cryptographic algorithms.
- Applications: data integrity, confidentiality, authenticity, etc.
- Requirements: high entropy, stability across PVT variations, and high throughput.

Background: TRNG Standard
- Goal of the standard: guarantee the quality of the TRNG output, with evaluation criteria for detecting TRNG entropy degradation.
- Requirements of the standard: a mathematical model of the TRNG (for deriving the throughput) and test results of the TRNG.
Our Entropy Source Cell: STR
- STR (self-timed ring): a ring of n stages, each built from a Muller C-element.
- Depending on the state of each node, a token is delivered to the next stage:
  - If the i-th stage differs from the (i+1)-th stage, the i-th stage holds a token.
  - If the i-th stage equals the (i+1)-th stage, the i-th stage holds a bubble.
[Schematic: stages M1 to Mn, each with forward input F, reverse input R, and output C; truth table of the Muller C-element]

Compare STR-TRNG vs. IRO-TRNG
- STR oscillator: several tokens can propagate simultaneously, giving fast oscillation.
- The performance limit can ideally be improved indefinitely.
- The time difference between adjacent transitions can be set as intended.
[Figure: 5-stage STR-TRNG carrying tokens 1 and 2 vs. 5-stage IRO-TRNG (IRO: inverter ring oscillator)]
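The token and bubble rule can be checked with a small behavioral model. This sketch assumes the standard STR formulation in which stage i is a Muller C-element with inputs C[i-1] and NOT C[i+1], and it uses an idealized unit delay, so it captures token movement but not the Charlie or drafting effects that set the real timing. The function names are hypothetical.

```python
def str_step(state: list[int]) -> list[int]:
    """One idealized unit-delay step of a self-timed ring.

    Stage i is a Muller C-element with inputs C[i-1] (forward) and NOT C[i+1]
    (reverse). It switches when both inputs agree on a value different from its
    output, i.e. when stage i-1 holds a token and stage i holds a bubble.
    """
    n = len(state)
    nxt = list(state)
    for i in range(n):
        prev, cur, succ = state[i - 1], state[i], state[(i + 1) % n]
        if prev != cur and cur == succ:  # token at i-1 meets bubble at i
            nxt[i] = prev                # C-element output follows both inputs
    return nxt


def count_tokens(state: list[int]) -> int:
    """Stage i holds a token when C[i] != C[i+1]; otherwise it holds a bubble."""
    n = len(state)
    return sum(state[i] != state[(i + 1) % n] for i in range(n))


if __name__ == "__main__":
    # 45 stages initialized with 22 tokens, the configuration used later in this design.
    ring = [i % 2 for i in range(22)] + [0] * 23
    for _ in range(1000):
        ring = str_step(ring)
    print(count_tokens(ring))
```

Because a stage fires only when a token meets a bubble, and two adjacent stages can never fire at once, the token count is conserved: a 45-stage ring initialized with 22 tokens keeps 22 tokens circulating.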
Our Entropy Source Cell: STR
During STR operation, two effects come into play:
- Charlie effect: the closer the two input signals are in time, the longer the propagation delay.
- Drafting effect: the C-element output changes again before it has fully reached VDD or GND, so the reaction speed increases and the delay is reduced.
[Figure: voltage vs. time in evenly spaced mode (Charlie effect dominant, uniform oscillation) and burst mode (not periodic)]
Design Challenge
- Harder in an ASIC than in an FPGA: it is not easy to keep the STR in evenly spaced mode.
- Under every PVT condition, the Charlie effect must dominate the drafting effect, and the forward/reverse delay ratio has to be insensitive to PVT.
- High throughput with a small footprint.

TRNG Structure
The TRNG consists of a 45-stage STR, a merging part, and a sampling part:
- STR (45 stages): accumulates jitter.
- Merging: improves throughput.
- Sampling: Merged_C is captured by the sampling clock.
[Figure: 45-stage STR with outputs C1 to C45 and complements Cb1 to Cb45 → merging of C[L:1] → Merged_C → sampling → DOUT]

TRNG Structure: STR 1-Stage
- Balance between the forward and reverse paths is important; the stage is designed to keep this balance under PVT conditions.
[Schematic: STR stage with inputs F/Fb and R/Rb, outputs C/Cb, and control signals STR_MODE/STR_MODEb and SET_C/SET_Cb]
TRNG Structure: STR 1-Stage
- Initialization mode (STR_MODE = L): initialize every stage and set the tokens and bubbles (22 tokens). If SET_C = H, then C = H and Cb = L; if SET_C = L, then C = L and Cb = H.
- Oscillation mode (STR_MODE = H): the STR oscillates.
[Schematic: stage switch states in initialization and oscillation modes]

TRNG Structure: Merging
- The STR outputs C1 to CL are merged; the small jitter of each output makes the merged output random.
[Figure: merging of C1 to CL, with jitter]
TRNG Structure: Sampling
To prevent the signal from collapsing:
1. Each STR output is first captured with the sampling clock (Sampled_C1 to Sampled_Cn).
2. The captured values are merged into Merged_C using XORs.
3. Merged_C is finally sampled to the output DOUT.
[Figure: per-stage flip-flops clocked by the sampling clock, XOR tree, and output flip-flop]
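The three-step sampling scheme can be illustrated with a behavioral model. Each stage output is represented by edge times with accumulated Gaussian jitter and evenly spaced phases; those timing parameters, the seeded random generator, and the function names are assumptions, and the model does not reproduce the 4nm circuit.

```python
import bisect
import random
from functools import reduce
from operator import xor


def jittered_edges(half_period: float, phase: float, sigma: float,
                   t_end: float, rng: random.Random) -> list[float]:
    """Edge times of one stage output; jitter accumulates from edge to edge."""
    edges, t = [], phase
    while t < t_end:
        edges.append(t)
        # Clamp the step so edges stay strictly increasing even for large sigma.
        t += max(half_period + rng.gauss(0.0, sigma), 0.01 * half_period)
    return edges


def sample_level(edges: list[float], t: float) -> int:
    """Flip-flop sample of a stage that starts low: high after an odd edge count."""
    return bisect.bisect_right(edges, t) % 2


def trng_bits(n_stages: int, half_period: float, sigma: float,
              sample_period: float, n_bits: int, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    t_end = sample_period * (n_bits + 1)
    stages = [jittered_edges(half_period, i * half_period / n_stages, sigma, t_end, rng)
              for i in range(n_stages)]
    bits = []
    for k in range(1, n_bits + 1):
        t = k * sample_period
        sampled = [sample_level(e, t) for e in stages]  # step 1: Sampled_C1..Sampled_Cn
        bits.append(reduce(xor, sampled, 0))            # steps 2-3: XOR to Merged_C, DOUT
    return bits


if __name__ == "__main__":
    out = trng_bits(n_stages=5, half_period=1.0, sigma=0.05, sample_period=7.3, n_bits=200)
    print(sum(out) / len(out))
```

Capturing each output before the XOR means the XOR tree only ever sees stable levels, which is one way to read the slide's goal of preventing the signal from collapsing.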
Mathematical Model: Sampling Timing
- The model considers the worst-case scenario in terms of sampling timing:
  - Best case: sampling at the transition point.
  - Worst case: sampling at iT ± T/4.
[Figure: jitter distribution relative to the sampling instant in the best and worst cases, with the window from iT − T/4 to iT + T/4]
T: period of the sampling signal.
Mathematical Model: Derivation of Throughput
1. $P(1) = \sum_{i=-\infty}^{\infty} \int_{iT - T/4}^{iT + T/4} \mathcal{N}(0, \sigma_{total})\,dt$
   - P(1): probability of a 1 in the random data
   - T: period of the sampling signal
   - σ_total: standard deviation of the accumulated jitter at the sampling point
2. $\sigma_{total} = k\,T = k\,\frac{T_S}{L}\,\frac{1}{N_T}$
   - k: constant for entropy satisfaction
   - T_S: travel time of a token for two cycles of the STR
   - L: number of STR stages
   - N_T: number of tokens
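Equation 1 can be evaluated numerically. The sketch below assumes that N(0, σ_total) is a zero-mean Gaussian with standard deviation σ_total and adds a bisection that finds the smallest k for a chosen bias tolerance. The tolerance value and function names are our own illustrations, not the paper's entropy criterion; the bisection relies on the bias shrinking monotonically as k grows, which holds for this Gaussian model.

```python
import math


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_one_worst_case(T: float, sigma_total: float) -> float:
    """Equation 1: Gaussian mass inside the windows [iT - T/4, iT + T/4]."""
    m = int(math.ceil(8.0 * sigma_total / T)) + 1  # windows beyond ~8 sigma are negligible
    return sum(normal_cdf((i * T + T / 4) / sigma_total)
               - normal_cdf((i * T - T / 4) / sigma_total)
               for i in range(-m, m + 1))


def min_k(max_bias: float, T: float = 1.0) -> float:
    """Smallest k (sigma_total = k * T) with |P(1) - 0.5| <= max_bias, by bisection."""
    lo, hi = 1e-3, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if abs(p_one_worst_case(T, mid * T) - 0.5) <= max_bias:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for k in (0.1, 0.25, 0.5):
        print(k, round(p_one_worst_case(1.0, k), 4))
    print("k for |P(1) - 0.5| <= 0.01:", round(min_k(0.01), 3))
```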
Mathematical Model: Derivation of Throughput (continued)
3. $\sigma_{total}^2 = \mathrm{Cycle}_{need}(T_S)\,\sigma_{PJ}(T_S)^2$
   - Cycle_need(T_S): number of cycles a token needs to travel around the STR
   - σ_PJ(T_S): standard deviation of the period jitter for a token during T_S
4. $\mathrm{Cycle}_{need}(T_S) = freq_S \cdot t_{need} = freq_S \cdot \frac{1}{throughput}$
   - freq_S: frequency of a token traveling two rounds (= 1/T_S)
   - t_need: oscillation time required for the given throughput

Mathematical Model: STR-TRNG vs. IRO-TRNG
[Table: throughput (Hz) equations of STR-TRNG and IRO-TRNG]
1) k: constant for entropy satisfaction
2) L: number of STR stages
3) N_T: number of tokens
4) T_S: travel time of a token for two cycles of the STR
5) σ_PJ(T_S): standard deviation of the period jitter for a token during T_S
6) σ_PJ(T_I): standard deviation of the period jitter for one cycle of the oscillation signal
Main difference between STR-TRNG and IRO-TRNG:
- The throughput of the STR-TRNG is multiplied by (L·N_T)².
- The performance limit can ideally be improved indefinitely.
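Substituting equation 2 into equation 3 and applying equation 4 gives, under the definitions above, throughput = L²·N_T²·σ_PJ(T_S)² / (k²·T_S³); setting L = N_T = 1 and T_S = T_I gives the single-oscillator (IRO) form. This is our own algebra from the listed equations, offered as a sketch; the function names and example numbers are hypothetical.

```python
import math


def str_trng_throughput(k: float, L: int, n_tokens: int,
                        t_s: float, sigma_pj: float) -> float:
    """Throughput (bit/s) implied by equations 2 to 4."""
    sigma_total = k * t_s / (L * n_tokens)      # eq. 2
    cycle_need = (sigma_total / sigma_pj) ** 2  # eq. 3 solved for Cycle_need
    freq_s = 1.0 / t_s                          # freq_S = 1 / T_S
    return freq_s / cycle_need                  # eq. 4: throughput = freq_S / Cycle_need


def iro_trng_throughput(k: float, t_i: float, sigma_pj: float) -> float:
    """Single-oscillator case: one stage, one token, period T_I."""
    return str_trng_throughput(k, 1, 1, t_i, sigma_pj)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the (L * N_T)**2 scaling.
    k, t, sigma = 0.5, 1e-9, 2e-12
    gain = str_trng_throughput(k, 45, 22, t, sigma) / iro_trng_throughput(k, t, sigma)
    print(f"gain = {gain:.0f}, (L*N_T)**2 = {(45 * 22) ** 2}")
```

The scaling with (L·N_T)² assumes the same T_S and period jitter for both oscillators; in silicon these differ, so the ratio is an idealized upper trend rather than a measured gain.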
Evaluation Condition
- Process: 4nm
- Voltage: 0.75V ± 10%
- Temperatures: -40°C, 25°C, 150°C
- Frequency: 60MHz
- Number of chips: 320 (64 chips per process corner)
- Data size: 4kbit for every PVT condition; 2Mbit (25°C) for analysis according to NIST SP 800-90B
Evaluation Items
- P(1): probability of a 1 in the random data.
- 8-bit chi-squared: measures the difference between observed and expected data.
- Autocorrelation function: the correlation between values of a random signal at different lags.
- Min-entropy (NIST SP 800-90B): a conservative measure of the unpredictability of the results; a lower bound on entropy.
- Non-IID test (NIST SP 800-90B; IID: independent and identically distributed): entropy estimation with 10 types of non-IID tests.
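The first four evaluation items can be computed on a bit list as follows. `mcv_min_entropy` implements only the most-common-value estimate of NIST SP 800-90B (Section 6.3.1), one of the estimators used for non-IID data, not the full 10-test suite; the other helpers use textbook definitions, and all names are our own.

```python
import math
from collections import Counter


def p_one(bits: list[int]) -> float:
    return sum(bits) / len(bits)


def chi_squared_8bit(bits: list[int]) -> float:
    """Chi-squared statistic of non-overlapping 8-bit words (df = 255)."""
    words = [int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits) - 7, 8)]
    expected = len(words) / 256
    counts = Counter(words)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def autocorrelation(bits: list[int], lag: int) -> float:
    """Normalized sample autocorrelation at the given lag."""
    n, mu = len(bits), p_one(bits)
    var = sum((b - mu) ** 2 for b in bits)
    cov = sum((bits[i] - mu) * (bits[i + lag] - mu) for i in range(n - lag))
    return cov / var


def mcv_min_entropy(bits: list[int]) -> float:
    """Most-common-value estimate (99% upper bound on p_max), in bits per sample."""
    n = len(bits)
    p_hat = max(Counter(bits).values()) / n
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_u)


if __name__ == "__main__":
    # Every byte value exactly once: a perfectly balanced toy sequence.
    bits = [int(c) for v in range(256) for c in format(v, "08b")]
    print(p_one(bits), chi_squared_8bit(bits), round(mcv_min_entropy(bits), 3))
```

With short sequences the MCV bound is dominated by the confidence term, which is one reason the paper analyzes 2Mbit per chip for the SP 800-90B estimates rather than the 4kbit PVT samples.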
316、nce27 of 33P(1)&8bit chi-squared8 bit X2TrialsProbability of 1Trials0204060801000100200300400150200250300350Result of 8bit X2mean=255.6std=23.0(df is 255)Result of P(1)mean=0.499std=0.030P(1)=0.5(ideal)X2=255(ideal)Test conditions 5 corners+3 voltages+3 temp.320 chips(4Kbit per chip)Test conditions
317、5 corners+3 voltages+3 temp.320 chips(4Kbit per chip)P(1)test8bit chi-squared test16.8:A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm 2024 IEEE International Solid-State Circuits Conference28 of 33Auto correlation&Min-entropy-0.0200.020100200300400500ACFLag LengthACF with95%con
318、fidence0.650.70.750.8FFFSNNSFSSProcess cornersMin-entropySupply Voltage0.675V0.750V0.825V0.770.73Auto correlation functionMin-entropy NIST SP 800-90B16.8:A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm 2024 IEEE International Solid-State Circuits Conference29 of 33Non-IID 10 typ
319、es of non IID test16.8:A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm 2024 IEEE International Solid-State Circuits Conference30 of 33Micrograph of TRNGLayout and Die photo16.8:A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm 2024 IEEE International Solid-Sta
320、te Circuits Conference31 of 33Outline BackgroundCryptographic KeyTRNGTRNG Standard Proposed TRNG structureEntropy Source CellTRNG Structure Mathematical Model Evaluation Results Summary16.8:A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm 2024 IEEE International Solid-State Circu
321、its Conference32 of 33Summary Our TRNG achieves the high throughput 60Mbps Robust operation across wide PVT condition Small foot-print Same TRNGs schemes can be used in advanced technologies16.8:A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm 2024 IEEE International Solid-State Circuits Conference33 of 33Q&A16.8:A 60Mb/s TRNG with PVT-Variation-Tolerant Design Based on STR in 4nm 2024 IEEE International Solid-State Circuits Conference34 of 33Please Scan to Rate Please Scan to Rate This PaperThis Paper
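The P(1) and 8-bit chi-squared evaluation items of paper 16.8 can be reproduced on a captured bit stream roughly as follows. This is a minimal Python sketch, not the authors' test code. It assumes the bits are packed MSB-first into non-overlapping bytes for the chi-squared test, and the function names `p_one` and `chi_squared_8bit` are hypothetical. With 256 equally likely byte values the statistic has 255 degrees of freedom, which is why 255 is the ideal χ² value.

```python
import random


def p_one(bits):
    """Return the fraction of 1s in a 0/1 sequence (ideal value: 0.5)."""
    if not bits:
        raise ValueError("empty bit sequence")
    return sum(bits) / len(bits)


def chi_squared_8bit(bits):
    """Pearson chi-squared statistic over non-overlapping 8-bit symbols.

    Assumption: bits are packed MSB-first into bytes; trailing bits that do
    not fill a whole byte are ignored. All 256 byte values are expected
    equally often, so the statistic has 255 degrees of freedom.
    """
    n_bytes = len(bits) // 8
    if n_bytes == 0:
        raise ValueError("need at least 8 bits")
    counts = [0] * 256
    for i in range(n_bytes):
        value = 0
        for b in bits[8 * i:8 * i + 8]:
            value = (value << 1) | b  # shift in the next bit, MSB first
        counts[value] += 1
    expected = n_bytes / 256
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    # Usage example on software-generated bits (a stand-in for TRNG output).
    rng = random.Random(1)
    bits = [rng.getrandbits(1) for _ in range(1 << 16)]
    print(f"P(1) = {p_one(bits):.4f}")
    print(f"8-bit chi-squared = {chi_squared_8bit(bits):.1f} (df = 255)")
```

For ideal random data, `p_one` should be close to 0.5 and `chi_squared_8bit` close to 255; a chi-squared variable with 255 degrees of freedom has a standard deviation of sqrt(2 × 255) ≈ 22.6.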
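The autocorrelation result of paper 16.8 plots the sample ACF at each lag against a 95% confidence band. The sketch below computes both. It assumes the common white-noise bound of ±1.96/√N for N samples; the extracted slides do not state how the paper's band was derived. The function names `autocorrelation` and `white_noise_bound_95` are hypothetical, and the plain-Python loop costs O(N × max_lag).

```python
import math
import random


def autocorrelation(bits, max_lag):
    """Sample autocorrelation of a 0/1 sequence for lags 1..max_lag.

    Uses the biased estimator: each lagged covariance sum is divided by the
    full sum of squared deviations, which keeps every |r| <= 1.
    """
    n = len(bits)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(bits) - 1")
    mean = sum(bits) / n
    x = [b - mean for b in bits]
    denom = sum(v * v for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / denom
            for lag in range(1, max_lag + 1)]


def white_noise_bound_95(n):
    """Approximate 95% confidence bound for the ACF of white noise."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(3)
    bits = [rng.getrandbits(1) for _ in range(20000)]
    acf = autocorrelation(bits, max_lag=50)
    bound = white_noise_bound_95(len(bits))
    outside = sum(abs(r) > bound for r in acf)
    print(f"bound = +/-{bound:.4f}; {outside} of {len(acf)} lags outside")
```

For an uncorrelated source, roughly 5% of lags are expected to fall outside the band by chance. For multi-megabit streams, a vectorized or FFT-based ACF would be much faster than this direct loop.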
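The min-entropy results of paper 16.8 follow NIST SP 800-90B, whose non-IID track reports the minimum over all of its estimators. The sketch below implements only one of them, the Most Common Value estimate (SP 800-90B, Section 6.3.1). It shows how a conservative lower bound arises from an upper confidence limit on the probability of the most likely symbol. This illustrates that single estimator, not the authors' evaluation flow; the function name `mcv_min_entropy` is hypothetical, and NIST's reference assessment tool should be used for real measurements.

```python
import math
import random
from collections import Counter


def mcv_min_entropy(samples):
    """Most Common Value min-entropy estimate (NIST SP 800-90B, Sec. 6.3.1).

    Returns the estimated min-entropy per sample, in bits. For a bit stream,
    the ideal per-bit min-entropy is 1.0.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    p_hat = max(Counter(samples).values()) / n
    # Upper bound of a 99% confidence interval on the most-common-value probability.
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_u)


if __name__ == "__main__":
    rng = random.Random(5)
    bits = [rng.getrandbits(1) for _ in range(1 << 20)]
    print(f"MCV min-entropy: {mcv_min_entropy(bits):.4f} bits per bit")
```

Because `p_u` is an upper confidence limit, the estimate stays below the ideal even for a perfectly balanced stream: 10,000 alternating bits give about 0.963 bits per bit.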