SESSION 17 Hardware Security.pdf


ISSCC 2025, Session 17: Hardware Security

17.1: Sensor-Less Laser Voltage Probing Attack Detection via Run-Time Leakage Shift Monitoring with 4.35% Area Overhead
2025 IEEE International Solid-State Circuits Conference

Hui Zhang*1, Longyang Lin*2, Dingyi Xiong1, Massimo Bruno Alioto1 (*equally credited authors)
1 National University of Singapore, Singapore
2 Southern University of Science and Technology, Shenzhen, China

Outline
  Laser Voltage Probing Attacks (LVP)
  On-Chip LVP Detection, Sensor vs. Sensor-Less
  Proposed Sensor-Less Detection Scheme
  Test Setup, & Post-Silicon Processing for Mounting LVP
  Evaluation of LVP Attacks: Thermal, SNR vs Tattack
  Evaluation of VVDD with Extreme Stdcell Config.
  Full Demonstration: AES
  Comparison With Prior Arts
  Conclusion

Laser Voltage Probing Attacks (LVP)
  [Figure: LVP optical setup. Labels: Si aplanatic solid immersion lens (ASIL, high N.A.); transistor under LVP; cont.-wave laser; FWHM; laser spot; PO pitch; Si die under attack (100 µm); lens tube; XY scanner; beam profiler; beam polarization module; IR source; beam splitter; photo detector (GHz-BW); key$s; dynamic timing signal.]
  Frequency mapping: static imaging
  Waveform averaging: SNR-Tattack trade-off
  Time averaging across acquisitions to reach an SNR of 40 dB
  Spatially accurate
  Temporally accurate

  [Chart: sensor vs. sensor-less comparison; the row and column assignment is not recoverable from the extracted text. Criteria: require dedicated sensor, area overhead, dynamic security-performance tradeoff, power overhead, spatial coverage. Entries as extracted: 58% (IOLTS'17 [2], TCAS-I'17 [3]); JSSC'23 [4], VLSI'23 [5]; 0%; 0%; high in ISQED'22 [7], IOLTS'17 [2], TCAS-I'17 [3]; 100% (JSSC'23 [4], VLSI'23 [5]).]

  [Figure: VVDD versus time with PG=1, laser on vs. laser off, VREF = 0.75 VDD, SENS_EN = 1. Leakage terms labeled ILKG_PD, ILKG_HDR, ILIG_HDR, ILIG_P1; the relation between them lost its operators in extraction. Annotation: VVDD at VDD or slightly higher.]

  [Figure: infrared thermal imaging of 7-stage 6.7 GHz ring oscillators inducing hotspots (95 °C); VVDD characterization harness (PD size: 7 µm × 7 µm).]

Post-Silicon Processing for Mounting LVP
  [Figure labels: 28nm CMOS; FC; Au stud bump; back-lapped die (100 µm); dimensions 2.5 mm, 2 mm, 6.5 mm, 6 mm; Si extension to hold SIL.]
  Si die back-lapped to 100 µm for best optical resolution
  Customized die slicing for SIL attachment (4-mm extension)

LVP Attack Setup
  [Figure labels: ASIL; objective lens; PCB (flip-chip); data record; oscilloscope; SMU/DC-supply; test controller; LVP machine; spectrum analyzer & oscilloscope.]

Evaluation of LVP Attacks: Thermal Field
  Easy differentiation of LVP laser (290 °C) from on-chip hot spot (95 °C)
  Thermal field spreads wider: 25× in FWHM (5 µm)
  Enables large power domain partition, hence small area overhead

Evaluation of LVP Attacks: SNR vs Tattack
  [Plot: LVP attack SNR (dB) vs. attack time Tattack / acquisitions. Annotations: R2 = 0.9897; typical SNR: 40 dB; 9.76 dB/dec; 64.3-hr attack needed to recover 42.6 dB SNR gap; 10k acq./s (max); 30% optical intensity (typical); 8% optical intensity; 0.54 hrs attack.]
  1,319 nm low-noise IR laser source
  Typical SNR target for LVP attack: 40 dB
  64 hrs Tattack to reveal signal of interest by using 8% optical power (design target); attack unsuccessful after few hours
  Acquisitions at max allowed rate of 10 kacq./s

Test Structure w/ Extreme Stdcell Config.
  [Figure: logic gate/input configurations with extreme PMOS/NMOS leakage combination in power domain. Cells NR2, ND2, INV(TL), INV(TH), each ×18 between VVDD and VDD with PG, inputs tied high or low, off PMOS or off NMOS, parallel or stacked; optical intensity: 8%; close placement to track local hotspot.]
  VVDD measured for parallel-connected and stacked off-P/NMOS to comprehensively study decision margin (and across corners)
  Case study of size of power domain: 7 µm × 7 µm

Measured Transient VVDD, W/, W/O Laser, TT
  [Plots: transient VVDD vs. time for INV(TH) and ND2 with PG, VREF = 0.75 VDD; w/ ROs at 6.7 GHz, with and without LVP mounted; temperature labels 125 °C, 23 °C, 95 °C. Some labels are truncated in the extracted text.]
  Inherent robustness against PVT by large VVDD with laser on/off

Measured Decision Margin: TT, SS, FF
  [Plots: performance degradation (%) and decision margin vs. # of clk cycles for NR2, ND2, INV(TL). FF corner: measured CLK period 3.4 ns at VDD = 1.05 V; SS corner: 7 ns at VDD = 1.05 V; decision margin (30% VDD and 30% VSS to mimic FF, SS). Annotated degradations as extracted: FF: NR2 (1.5%), ND2 (1.36%), INV(TL) (0.7%); SS: NR2 (1.9%), ND2 (5.2%), INV(TL) (0.79%). Axis tick values are not recoverable.]
  [Plot: INV(TH) with PG, VREF = 0.75 VDD, highest leakage across configurations, w/, w/o laser, FF corner.]
  Worst perf. degrade: 5.2% across corners
  Corner-specific VREF trimming is viable off-chip
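The SNR vs Tattack slide above quotes a fitted slope of 9.76 dB per decade and a 42.6 dB SNR gap. The arithmetic behind such a trade-off can be sketched in Python as below. This is a minimal sketch, assuming SNR keeps growing log-linearly with averaging time at the quoted slope; the function name is hypothetical and the baseline time from which the slides count is not stated in the text, so only the time multiplier is computed.

```python
def time_multiplier_for_snr_gap(gap_db: float, slope_db_per_decade: float) -> float:
    """Factor by which averaging time must grow to gain `gap_db` of SNR.

    Assumes SNR(T) = SNR0 + slope * log10(T / T0), i.e. a straight line on a
    log-time axis (an assumption made for this sketch).
    """
    return 10 ** (gap_db / slope_db_per_decade)


# Values quoted on the slide: 42.6 dB gap, 9.76 dB/decade fitted slope.
measured = time_multiplier_for_snr_gap(42.6, 9.76)

# Ideal averaging of uncorrelated noise gives 10 dB/decade for comparison.
ideal = time_multiplier_for_snr_gap(42.6, 10.0)

print(f"measured slope: x{measured:.3g} more averaging time")  # about 2.3e4
print(f"ideal slope:    x{ideal:.3g} more averaging time")
```

A slope slightly below the ideal 10 dB/decade means each extra decibel costs the attacker somewhat more time than pure averaging theory would suggest.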

Full-Area Coverage of LVP: AES
  [Figure: histograms of ALM voltage (V) vs. count (×10^3) at SS, FF and TT corners, each marked 100% detection; AES fCLK = 294 MHz, 200 MHz, 142 MHz; 1,319 nm laser, spot FWHM: 200 nm, 1 µm step, optical intensity: 8%; total: 10^10 encryptions.]
  [Block diagram: 128b AES w/ run-time leakage shift monitoring; VDD, plain text, cipher text, key$s, ALM, power domains (PD), comparator output in power domain detectors.]
  100% detection in 480,000 measurements across corners (VDD = 1.05, VREF = 0.75 VDD)
  LVP: 1,319-nm laser, 8% optical intensity
  Total 10^10 encryptions: 100% detection across TT, FF, SS

Detection Latency

  [Figure: waveforms of VVDD, PG, SENS_EN (sensing enable) and ALM (alarm signal) vs. time under attack; 128-bit AES, VDD = 1.05 V, TT corner, fCLK = 200 MHz. Annotation: worst case full-system latency 2 (value truncated).]

  [Table: comparison with prior art for 17.1; the layout is not recoverable from the extracted text. Entries as extracted: JSSC'20 [9]: backside buried metal, 130, 7.5%, 2, not reported, not reported, 0%, LFI (1064 nm, pulsed), post-Si wafer processing, yes, 8 µm, -40% (limited buried metal pitch), LFI (NIR pulsed). JSSC'23 [4]: on-chip stdcell photosensor, 28, 150%, 0.5-1.05, 0.1% (+40% leakage), 0%, yes, yes, 220 nm, yes, 100%. VLSI'23 [5]: on-chip stdcell thermal sensor, 28, 58%, 0.9-1.05.]

17.2: A 28nm 4.05µJ/Encryption 8.72kHMul/s Reconfigurable Multi-Scheme Fully Homomorphic Encryption Processor for Encrypted Client-Server Computing
2025 IEEE International Solid-State Circuits Conference

  [Table fragment from an NTT comparison slide: 1.0; 1.89c/5.83d; Norm. Area Efficiency (Gbps/mm2): 31.94, 34.39b, -, *1.29, 4.88b, 13.96c/37.62d. Annotations: flexibility; best area efficiency; highest NTT throughput in silicon-proven works.]

Comparisons
                         MICRO21 [1]        CHES23 [2]         ISSCC23 [3]   ESSCIRC23 [4]   ISSCC24 [5]   This Work
  Technology             12nm (Synthesis)   12nm (Synthesis)   28nm          28nm            28nm          28nm
  Area (mm2)             151.4a             150a               42.96         1.69            11.28h        5.4
  Frequency (MHz)        1,000a             1,000-2,000a       500           0.5-157         333           125-625
  Voltage (V)            N/A                0.72               0.90          0.64-1.10       1.00          0.70-1.10
  Power Consumption (W)  180.4              57.5-115.0         4.0-12.0      0.013           0.180         0.138-1.185

  The remaining rows lost their empty cells in extraction, so the values are listed in their original order without column assignment; the last value in each row is This Work.
  Client-Side Operations
    BFV-Enc: Throughput (KOPS): 1.22, 41.48e; Norm. Energy (µJ/OP): 10.33, 4.05e
    BFV-Dec: Throughput (KOPS): 1.95, 39.43e; Norm. Energy (µJ/OP): 6.45, 4.41e
  Server-Side Operations
    CKKS-HMul: Throughput (KOPS): 16,667, 8.72e/20.92f; Norm. Energy (µJ/OP): 58.81, 19.11e/39.53f
    BGV-HAdd: Throughput (Gbps): 5,461.33b, 172.03g, 23.55*, 36.56f; Norm. Energy (pJ/bit): 179.85b, 3,639.56g, 175.78*, 11.41f
    BGV-PMul: Throughput (Gbps): 1,780.87b, 57.34g, 6.28*, 5.74f; Norm. Energy (pJ/bit): 551.52b, 10,919.28g, 659.18*, 172.10f
  Annotations: 27x faster; 50% energy; 19x better energy efficiency; 8x larger
  *: Paillier (Partially HE without HMul)

Outline
  Background and Motivations
  Key Design Challenges
  System Architecture and Contributions
  Measurement and Comparisons
  Conclusion

Conclusion
  A Reconfigurable Multi-Scheme Fully Homomorphic Encryption Processor
    Fabricated in 28nm.
    Supports client-side & server-side operations efficiently.
    Evaluates various PPML tasks fully on-chip.
  Enhanced Performance
    Improved NTT throughput in silicon-proven works.
    Better area-efficiency (2.9x to 7.7x compared to ISSCC24).

Thanks for your attention!
For further questions, please contact:

17.3: A 30.4 GOPS/mW MK-CKKS Processor for Secure Multi-Party Computation
2025 IEEE International Solid-State Circuits Conference

Liang-Hsin Lin, Yao-Kai Yang, Chia-Hsiang Yang
National Taiwan University

Outline
  Introduction
  Preliminaries
  System Architecture
  Algorithm-hardware Co-optimizations
  Chip Implementation
  Summary

Multi-party Computation (MPC)
  Key approach for privacy-preserving applications
  Multiple parties (users) collaboratively perform computations
  Each party's data remain secret
  [Figure: MPC applications: Secure Machine Learning [1], Secure Healthcare Data Analytics [2]]
  [1] D.H. Kang, Scientific Reports, 2024
  [2] M. Yang, Cell Systems, 2024

Promising Solution: MK-FHE
  Three major schemes to implement MPC:
    Homomorphic encryption (HE) with secret sharing [3]
    Fully homomorphic encryption (FHE) [4]
    Multi-key fully homomorphic encryption (MK-FHE) [5]
  MK-FHE is promising for several advantages:
    Unlimited number of parties
    Parties can join or leave the computation on the fly
    Non-interactive communication between parties
  [3] C. Juvekar, USENIX Security, 2018
  [4] J.H. Cheon, SAC, 2019
  [5] A. López-Alt, STOC, 2013

Robust MK-FHE: MK-CKKS [6]
  MK-FHE algorithm for fixed-point data using packed ciphertexts
  Homomorphic operations: computations performed on ciphertexts
    Data are encrypted and remain secret during homomorphic operations
    Three levels of homomorphic operations: task, ciphertext, polynomial
  [6] H. Chen, CCS 2019

Toward Efficient MPC Processor
  CPU: low throughput due to irregular MPC data access
    Dedicated architecture for efficient MPC is required
  Dedicated processors for two-party computation (2PC) [7-10]
    Limited to single-key setups between two distinct parties
    Restricted configuration and lack of scalability
  This work presents the first MK-CKKS processor
    MPC with an unlimited number of parties
    Flexible and scalable architecture for diverse applications
  [7] G. Shi, ISSCC, 2023
  [8] J. Kim, MICRO, 2022
  [9] S. Kim, ISCA, 2022
  [10] H. Lee, ISSCC, 2024

Number Theoretic Transform (NTT)
  A radix-N NTT is an FFT-like operation that maps a polynomial of degree N-1 to an ordered tuple of N elements
    N(logN)/2 butterfly (BF) operations involved
    [Equation defining the transform in terms of the twiddle factors of the NTT; not recoverable from the extracted text.]
  Transformation between two polynomial domains
    Coefficient domain: $\mathbb{Z}_q[X]/(X^N+1)$, and NTT domain (its definition is not recoverable from the extracted text)

Automorphism: Polynomial Permutation
  Two types of permutations: rotation (ROT) & conjugation (CON)
  Permutation can be performed in coefficient domain (CD) or NTT domain (ND)
  Four types of automorphism: ROT-CD, CON-CD, ROT-ND, CON-ND

Modular Arithmetic
  Addition and multiplication in Montgomery reduction with Montgomery factor R
    Calculates $xR^{-1} \bmod q$ for some large number $R$ (a power of two)
    Involves two integer multiplications with bit width close to $\log q$
  Mathematical structure of primes in MK-CKKS
    Primes in MK-CKKS are close to a power of two (e.g., 0x20000000280001)
    Data precision for homomorphic operations [6] can be increased
    [Equation for the prime structure; not recoverable from the extracted text.]
  [6] H. Chen, CCS 2019

Computational Framework
  MPC task can be decomposed into polynomial-level operations
  Each operation is performed in parallel with customized instructions
  Data are accessed in a row-wise manner for pipeline execution

Design Parameters (W, D)
  Two pre-defined design parameters
    W: number of data in each row (= twice the number of PEs)
    D: depth of the data buffer
    Instantiated with (W, D) = (64, 64)
  Larger design parameters lead to higher performance
    Ideally, throughput should scale proportionally with W (with a varying D)
  [Figure: Configurable PE 0, Configurable PE 1, ..., Configurable PE 31 (= W/2 - 1); input buffer and output buffer, each with Bank 0 and Bank 1, W x 64 bit wide and D deep.]

System Architecture
  Single instruction multiple data (SIMD) architecture

Arithmetic Engine
  Includes a processing element (PE) array and two switch networks
  Supports the modular arithmetic required for polynomial-level operations

Automorphism Engine
  Includes multiple switch networks with a configurable datapath
  Supports four types of automorphism: ROT-CD, CON-CD, ROT-ND, CON-ND

Instruction Decoder and Data Handler
  Coordinates the arithmetic and automorphism engines with fine-grained instructions
  Stores polynomials using double buffering and data mapping to prevent data hazards

Architectural Features
  Large configuration space by using fine-grained instructions
    Multiple algorithmic parameters and optimizations supported
  Simple dataflow for pointwise arithmetic
    High resource utilization with linearly scaled throughput

Conflict-free Data Mapping
  A polynomial is stored with different orders for different domains
    Column major (CM) order for coefficient domain
    Split-row major (SRM) order for NTT domain
  Enables efficient computational flow for NTT and automorphism
    Data hazard can be eliminated for row-wise access
  Size of a polynomial N = M x W

NTT: Hazard-free Computational Flow
  BF operations in radix-N NTT are distributed across 3 phases
    In each phase, BF operations are performed in parallel, along a distinct direction to avoid data hazards
    The number of BF operations in each phase is determined by (N, W, D)
    [Equations for the per-phase counts; not recoverable from the extracted text.]

NTT: Row-wise Data Access
  Two rows of the data buffer are fetched in each cycle
    Data are sent to the PE array to perform W/2 BF operations in parallel
    Data are then reordered to meet the computational order of NTT
  The data mapping gradually changes from CM to SRM

NTT: Batching for Large Polynomial Size
  BF operations in each phase can be performed in batches
    Each batch only requires a portion of the polynomial
  Batches from the first and second phases are combined
    Intermediate result of NTT is fully reused to reduce external memory access

NTT: Performance with Arithmetic Engine
  Optimal performance is achieved by choosing appropriate D
    8-to-32x smaller buffer size with same external memory access
  2x higher throughput with 8x smaller memory compared to [10]
  [10] H. Lee, ISSCC, 2024
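The NTT slides describe the transform as FFT-like with N(logN)/2 butterfly operations. The Python sketch below illustrates that count on a toy example. It is a minimal illustration under stated assumptions, not the processor's dataflow: it uses a plain cyclic radix-2 NTT (not the negacyclic variant modulo X^N+1 that CKKS uses, and not the 3-phase row-wise schedule of the slides), and the modulus 17, root 2 and function names are chosen here for illustration only.

```python
def ntt(a, omega, q):
    """In-place iterative radix-2 cyclic NTT; returns the butterfly count.

    Computes a[k] = sum_j a[j] * omega^(j*k) mod q, where omega is a
    primitive len(a)-th root of unity mod q and len(a) is a power of two.
    """
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    butterflies = 0
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # twiddle step for this stage
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * w % q
                a[start + k] = (u + v) % q                # butterfly, upper output
                a[start + k + length // 2] = (u - v) % q  # butterfly, lower output
                w = w * w_len % q
                butterflies += 1
        length <<= 1
    return butterflies


def naive_ntt(a, omega, q):
    """O(N^2) reference used only to check the fast version."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, q) for j in range(n)) % q for k in range(n)]


# Toy parameters (assumed for illustration): q = 17, N = 8, omega = 2,
# since 2^8 = 256 = 1 (mod 17) and 2^4 = 16 = -1 (mod 17).
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
fast = list(coeffs)
count = ntt(fast, 2, 17)
print(count, fast == naive_ntt(coeffs, 2, 17))
```

With N = 8 there are log2(8) = 3 stages of N/2 = 4 butterflies each, which is the N(logN)/2 figure from the slide.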

Automorphism: Twiddle Factor Reordering
  Data no longer remain in the same row after automorphism
  The order of twiddle factors needs to be rearranged
    Automorphism can now be performed with row-wise access
    Each row is moved to another row with intra-row reordering

Automorphism Engine
  Area-efficient implementation of intra-row reordering
    Four types of intra-row reordering are supported with a configurable datapath
    Reordering decomposed into shifting, re-indexing, sign-flipping operations
    These operations are implemented by switch networks with width W
    The most hardware-costly switch network is the barrel shifter (B-shift)
  Batching for large polynomial size
    Proposed data mapping is leveraged
  Performance
    35x higher throughput than [8] (given the same hardware cost)

Proposed Efficient Modular Reduction
  Mathematical structure of primes in MK-CKKS is leveraged
  Proposed method exhibits several properties:
    [Equations defining the method and its parameters; not recoverable from the extracted text.]
    Correctness
    Boundedness
    Efficiency: two integer multiplications with small bit width

Modular Reduction Unit
  Based on proposed modular reduction method
    Two integer multipliers with bit width a
    Configurable registers for prime-related parameters (e.g., a, k, n)
  Saves 49% area compared to Montgomery-based design

Chip Micrograph & Summary

Operations for Benchmarking
  Ciphertext-level operation
    Ciphertext multiplication (including key switching) at max ciphertext level
  Task-level operation
    Homomorphic logistic regression training [1] and oblivious neural network [6]
  Task-level operation for 2PC: bootstrapping
    Amortized performance (per ciphertext level per slot) is adopted
  With multiple configuration options to test the design's flexibility and scalability
  [6] H. Chen, CCS 2019
  [1] D.H. Kang, Scientific Reports, 2024

Performance Evaluation: Ciphertext Multiplication
  Linear key switching process [11] adopted
    For 65 parties, throughput is improved by 64x compared to baseline with quadratic key switching [6]
  [11] H. Kim, CCS 2023
  [6] H. Chen, CCS 2019

Flexibility Test with Bootstrapping
  Flexibility demonstrated w/ feasible bootstrapping configurations
    Optimal performance can be achieved according to the target metrics

Scalability Test with Bootstrapping
  Scalability demonstrated w/ feasible design parameters
    Throughput scales linearly with the number of processing elements (PEs)

Comparison with CPU
  This work (210 MHz) outperforms Intel Platinum CPU (2.6 GHz)
    14x speedup for homomorphic logistic regression training
    11.5x speedup for oblivious neural network inference

  Multi-key Homomorphic Encryption Logistic Regression
    Dataset: mobile price classification
    Input size: 20
    Output label: 0 or 1
    Training data: 512
    Batch size: 128
    Training Speed (Epochs/minute)
      # of Data Providers   This Work   CPU
      4                     5.6         0.39
      8                     2.8         0.20

  Oblivious Neural Network Inference
    Dataset: MNIST classification
    Input size: 28x28
    Output label: 0-9
    Inference data: 10^4
    Batch size: 128
    Inference Speed (Labels/s): This Work 6.48, CPU 0.56
    Accuracy: 98.4%

Comparison with State-of-the-Art Designs
  Supports an unlimited number of parties
  Offers flexibility and scalability
  Row-wise access for all polynomial-level operations
  Alternative method replacing Montgomery reduction
  97% PE utilization
  2.1-to-69x higher energy efficiency
  2.4-to-10.2x higher area efficiency

Performance for Bootstrapping
  3.2-to-196x lower amortized energy than state-of-the-art CKKS processors [8-10]
  [Chart: bootstrapping amortized energy (+, *) for MICRO22 [8], ISCA22 [9], ISSCC24 [10] and This Work. The bar values run together in the extracted text as "131" and "24241.23".]
  +: J/bootstrapping/level after bootstrapping/slots
  *: Normalized to 40nm technology

Summary
  The first MK-CKKS processor for MPC
    Unlimited number of parties
    Flexible and scalable for diverse applications
  Algorithm-hardware co-optimizations
    SIMD architecture with hazard-free row-wise access
    NTT: 2x higher throughput with 8x smaller memory
    Automorphism: 35x higher throughput given the same hardware cost
    Modular reduction: 49% less hardware cost
  Chip implementation (40nm CMOS)
    6.72GOPS and 30.4GOPS/W at 210MHz from a 1.3V supply
    11.5-to-14x speedup (at a 12.4x lower frequency) compared to CPU
    2.1-to-69x higher energy efficiency and 2.4-to-10.2x higher area efficiency than prior 2PC processors
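The modular reduction savings above are quoted against a Montgomery-based design. For reference, textbook Montgomery reduction can be sketched in Python as below. This is the baseline technique only, not the proposed reduction method, whose equations did not survive extraction. The function names and the choice R = 2^64 are assumptions for illustration; the modulus is the example value shown on the Modular Arithmetic slide.

```python
def montgomery_setup(q: int, r_bits: int) -> int:
    """Precompute q' = -q^{-1} mod R for an odd modulus q and R = 2^r_bits."""
    r = 1 << r_bits
    return (-pow(q, -1, r)) % r  # pow with exponent -1 needs Python 3.8+


def montgomery_reduce(t: int, q: int, r_bits: int, q_neg_inv: int) -> int:
    """Return t * R^{-1} mod q for 0 <= t < q * R, without dividing by q."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * q_neg_inv) & mask  # first integer multiplication
    u = (t + m * q) >> r_bits            # second multiplication; exact shift by R
    return u - q if u >= q else u        # single conditional subtraction


# Example modulus from the slides; R = 2^64 is an assumption for this sketch.
Q = 0x20000000280001
R_BITS = 64
Q_NEG_INV = montgomery_setup(Q, R_BITS)

x, y = 123456789123456789 % Q, 987654321987654321 % Q
reduced = montgomery_reduce(x * y, Q, R_BITS, Q_NEG_INV)
expected = (x * y * pow(1 << R_BITS, -1, Q)) % Q
print(reduced == expected)
```

The two multiplications inside `montgomery_reduce` are each about as wide as the modulus, which is the cost the slides refer to as "two integer multiplications with bit width close to log".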

98、 MK-CKKS Processor for Secure Multi-Party Computation 2025 IEEE International Solid-State Circuits Conference42 of 42Acknowledgements This work is supported by National Science and Technology Council(NSTC)of Taiwan and Intelligent&Sustainable Medical Electronics Research Fund in National Taiwan Univ

99、ersity The authors also thank Taiwan Semiconductor Research Institute(TSRI)for technical support on chip design and fabrication17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference1 of 25An Efficient Vth-Tilting PUF De

100、sign in 3nm GAA and 8nm FinFet TechnologiesBohdan Karpinskyy,Yong Ki Lee,Sumin Noh,Yunhyeok Choi,JieunPark,Jisu Kang,Taewook Park,Eunhye Oh,Gapkyung Kim,SunghaLee,Hyunwoo Ko,Jonghoon Shin,Hyo-Gyuem Rhew,Jongshin ShinSamsung Electronics,Republic of Korea17.4:An Efficient Vth-Tilting PUF Design in 3nm

101、 GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference2 of 25Outline BackgroundThe role of PUFPUF design issues Proposed PUF designVth-tilting circuit PUF evaluation resultsEnrollment efficiency&SteadinessUniquenessRandomness Summary17.4:An Efficient Vth-Tilting PUF

102、 Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference3 of 25Outline BackgroundThe role of PUFPUF design issues Proposed PUF designVth-tilting circuit PUF evaluation resultsEnrollment efficiency&Steadiness UniquenessRandomness Summary17.4:An Efficient

Background: the role of extensive* PUF
PUF response utilization for authentication
- Benefits: primarily applicable to PUFs with an extensive challenge-response space.
- Limitations: limited applicability due to unresolved issues with PUF response modeling. Requires additional mechanisms, including cryptographic methods, to overcome these limitations.
*aka "strong" PUF

Background: the role of confined* PUF
PUF response for secret key generation
- Benefits:
  - The PUF response remains inaccessible when the circuit is powered off.
  - The response is physically and potentially mathematically unclonable.
  - The security strength of the generated key can be enhanced by utilizing PUF technology.
  - ISO/IEC 20897-1/2 specifies security requirements and evaluation methods for PUF responses.
- Limitations: currently, no security certification body or lab validates PUF-generated secret keys.
*aka "weak" PUF

Background: PUF design challenges
- Entropy source
  - Power, performance and area considerations for the PUF entropy source.
  - Mitigation of information leakage through side channels.
- Key generation infrastructure
  - Error-correction mechanisms to address PUF instability and aging effects.
  - Enrollment procedure, including duration and variation in voltage/temperature (V/T) conditions.
  - Non-volatile memory (NVM) for storing PUF helper data.
- Technology/products coverage
  - Scalability of PUF design (response size) and portability across technologies.

Proposed PUF design: Schematic TOP
[Figure: top-level schematic. Labeled blocks: 32 x 32 PUF cells, Vth tilting circuitry (tilting direction, tilting strength), path gate cells (x32), power-balanced sampler, laser attack detectors, valid checker, combination of responses, power gating circuit (SLEEP; global power VDDG, gated power VDD). Signals: SEL, ms, DATA, VALID, Laser Attack.]

Proposed PUF design: Schematic Tilting
[Figure, shown in four build-up steps: Stage 1 Vth tilting circuitry in transistor view and in standard-cell view, with controllable tilt-up and tilt-down branches (up1[0..2], down1[0..2]), path gate cell, switches SW1 and SW2; Stage 2 and Stage 3 Vth tilting circuitry (up2/down2, up3/down3); PUF cell with Vth1, Vth2, SEL, ms and R.]

Proposed PUF design: PUF Enrollment with Vth Tilting
3nm PUF cell SPICE simulation (Monte Carlo).
- Measured ΔVth (V): mean = 0.00034, std = 0.01925
- ΔVth = Vth1 − Vth2 ~ N(μ1 − μ2, σ1² + σ2²) = N(0, σ12²), where Vth1 ~ N(μ1, σ1²) and Vth2 ~ N(μ2, σ2²)
[Figure: ΔVth distribution with controllable Vth tilt-down, controllable Vth tilt-up, and the region screened with tilting.]

Proposed PUF design: Micrograph (3nm GAA example)
[Figure: die micrograph. Regions: (1) valid checker, power-balanced sampler, attack countermeasures; (2) isolation for output ports; input ports and power gating cells; 32 x 32 PUF cells with tilting circuitry.]
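The enrollment idea above (screening cells by tilting Vth up and down) can be illustrated with a small Monte Carlo experiment. This is a minimal sketch under stated assumptions, not the authors' flow: each cell's response is taken to be the sign of ΔVth, ΔVth is drawn from a Gaussian with the simulated standard deviation of 0.01925 V, and a cell is kept only if its response is identical under both tilts. The tilt magnitude `tilt`, the readout noise `noise_sigma`, and all function names are hypothetical.

```python
import random

def simulate_tilt_screening(n_cells=20_000, sigma=0.01925, tilt=0.006,
                            noise_sigma=0.002, n_reads=10, seed=1):
    """Monte Carlo sketch of Vth-tilt screening (all parameters are assumptions)."""
    rng = random.Random(seed)
    kept = errors_kept = errors_all = 0
    for _ in range(n_cells):
        dvth = rng.gauss(0.0, sigma)          # cell mismatch, in volts
        golden = dvth > 0                     # noise-free response bit
        # Enrollment: the cell is "stable" if tilting up and down gives the same bit.
        stable = (dvth + tilt > 0) == (dvth - tilt > 0)
        # Later readouts: additive Gaussian noise may flip marginal cells.
        flips = sum((dvth + rng.gauss(0.0, noise_sigma) > 0) != golden
                    for _ in range(n_reads))
        errors_all += flips
        if stable:
            kept += 1
            errors_kept += flips
    return {
        "kept_fraction": kept / n_cells,
        "ber_all": errors_all / (n_cells * n_reads),
        "ber_kept": errors_kept / (max(kept, 1) * n_reads),
    }

if __name__ == "__main__":
    print(simulate_tilt_screening())
```

In this toy model, a larger `tilt` rejects more cells and lowers the bit error rate of the cells that remain, which is the trade-off behind a screening ratio.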

Enrollment & Steadiness: PUF Enrollment Efficiency*
[Figure: two panels, 3nm GAA and 8nm FinFET, each showing BER trends and stable responses.]
*Vtyp, Ttyp operational conditions

Enrollment & Steadiness: Vth Tilting PUF Steadiness

8nm FinFET (0.75 V nominal): PUF bit error rate, %
Temp, °C | Voltage     | FF    | FS    | NN    | SF    | SS
150      | 0.75 V −20% | 0.001 | 0.001 | 0.000 | 0.002 | 0.001
150      | 0.75 V +20% | 0.067 | 0.137 | 0.054 | 0.074 | 0.132
25       | 0.75 V −20% | 0.008 | 0.003 | 0.009 | 0.005 | 0.008
25       | 0.75 V +20% | 0.002 | 0.006 | 0.004 | 0.003 | 0.008
−40      | 0.75 V −20% | 0.305 | 0.161 | 0.313 | 0.192 | 0.159
−40      | 0.75 V +20% | 0.000 | 0.000 | 0.000 | 0.001 | 0.000
36x BER reduction

3nm GAA (0.70 V nominal): PUF bit error rate, %
Temp, °C | Voltage     | FF    | FS    | NN    | SF    | SS
125      | 0.70 V −20% | 0.024 | 0.004 | 0.003 | 0.019 | 0.011
125      | 0.70 V +20% | 0.014 | 0.068 | 0.165 | 0.019 | 0.223
25       | 0.70 V −20% | 0.012 | 0.004 | 0.012 | 0.004 | 0.015
25       | 0.70 V +20% | 0.000 | 0.012 | 0.011 | 0.005 | 0.041
−40      | 0.70 V −20% | 0.092 | 0.122 | 0.200 | 0.157 | 0.181
−40      | 0.70 V +20% | 0.001 | 0.001 | 0.012 | 0.005 | 0.010
51x BER reduction

PUF Uniqueness (3nm GAA)
Uniqueness of PUF responses (five distributions):
Plot | min    | mean   | max    | std
1    | 0.4287 | 0.4950 | 0.5674 | 0.0158
2    | 0.4326 | 0.4965 | 0.5664 | 0.0157
3    | 0.4219 | 0.4962 | 0.5654 | 0.0157
4    | 0.4346 | 0.4966 | 0.5723 | 0.0158
5    | 0.4336 | 0.4982 | 0.5684 | 0.0157

PUF Uniqueness (8nm FinFET)
Uniqueness of PUF responses (five distributions):
Plot | min    | mean   | max    | std
1    | 0.4209 | 0.4978 | 0.5654 | 0.0156
2    | 0.4297 | 0.4979 | 0.5645 | 0.0157
3    | 0.4326 | 0.4981 | 0.5693 | 0.0156
4    | 0.4365 | 0.4980 | 0.5625 | 0.0156
5    | 0.4307 | 0.4986 | 0.5615 | 0.0156

Randomness evaluation of PUF
NIST SP 800-90B test summary, entropy per bit:
Test        | 3nm GAA  | 8nm FinFET
MCV*        | 0.892916 | 0.985641
Collision   | 0.860454 | 0.934789
Markov      | 0.894155 | 0.988406
Compression | 0.723963 | 0.833262
T-Tuple     | 0.864429 | 0.925578
LRS         | 0.988841 | 0.996388
MultiMCW    | 0.896237 | 0.992699
Lag         | 0.965752 | 0.996234
MultiMMC    | 0.892989 | 0.986301
LZ78Y       | 0.892946 | 0.986111
H_original  | 0.723963 | 0.833262
*MCV is the Most Common Value estimate.
[Figures, for each technology: PUF 4-bit evaluation; T5* autocorrelation evaluation (autocorrelation score vs. score occurrences).]
*The T5 test comes from the AIS31 package for TRNGs.

Summary
Comparison table. Columns, left to right: this work (3nm GAA, 8nm FinFET), then prior works 1 to 6. Values are listed in slide order.
Technology: 3nm (GAA) | 8nm (FinFET) | 8nm (FinFET) | 40nm | 3nm (GAA) | 5nm (FinFET) | 40nm | 65nm
Stabilizing technique: Vth tilt, mask. | Mask. | TMV, burn-in, mask. | Vth tilt, mask. | Vth tilt, mask. | TMV, Vth tilt, mask. | TMV, mask., reconfig.
PUF entropy source: INV Vth | INV Vth | SRAM | SRAM + pre-amplifier | Diode clamped | Leakage inverter | Hybrid-RO | INV Vth
Additional features: Attacks counter.*1), anti-aging | Attacks counter., anti-aging | NA*2) | Self-destruction | Anti-aging | NA | Anti-aging
Design portability: Good | Good | Good | Low | Low | Low | Low | Low
Enrollment cost*3): Low: nominal V/T | Med: 3 V configs., nominal T | High (hardening) | NR*2) | Low: nominal V/T | Low | High: 6 corner V/T
Mask size, bits: n*1 | n*1 | n*1 | n*1 | n*1 | n*3 | n*2
Screening ratio, %: =75 (accepted range) | 24 | 27 | NR | 23.2 | 27.7 | 3.64 | 27
Temp., °C: −40 to 125 | −40 to 150 | −40 to 150 | 25 to 110 | −40 to 125 | 0 to 90 | −40 to 125 | −40 to 125
Operational voltage, range %: 20% | 10% | 15% | −13 +26% | 17% | −30% +40% | −58% +16%
BER, %: 0.2227 | 0.3127 | 2.23 | 1.46 | 0.00348 | 1.18 | 0.55 | 1.9E-3*3) | 3.34E-6*3)
Inter-PUF HD, norm.: 0.4982 | 0.4978 | 0.4965 | 0.4860 | 0.4983 | 0.5010 | 0.4990 | 0.4963 | 0.4995
MinEntropy (NIST): 0.724 | 0.833 | NR | NR | 0.764 | NR | NR | NR
Bit rate, Mbits/s: 160 | NR | NR | NR | NR | NR | 22750
# of tested chips: 308320330500020181610(15)
*1) Side-channel (power) attack countermeasures and laser attack detector.
*2) NA stands for Not Available and NR stands for Not Reported.
*3) The cost of enrollment is high when corner V/T or response hardening is required.
*4) Not the worst V/T is reported.
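The uniqueness and randomness figures reported above (normalized inter-PUF Hamming distance and NIST min-entropy) follow from standard computations. The sketch below is illustrative only: the responses are synthetic bits from Python's `random` module, not measured data, and the function names are hypothetical. It assumes 1024-bit responses (matching a 32 x 32 array) and implements only the most-common-value (MCV) estimate from NIST SP 800-90B, which upper-bounds the most likely symbol probability with a 99% confidence interval.

```python
import itertools
import math
import random
from collections import Counter

def normalized_hd(a, b):
    """Fraction of bit positions in which two equal-length responses differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def inter_puf_hd(responses):
    """Return (min, mean, max, std) of the pairwise normalized Hamming distances."""
    d = [normalized_hd(a, b) for a, b in itertools.combinations(responses, 2)]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return min(d), mean, max(d), std

def mcv_min_entropy(bits):
    """Most-common-value min-entropy estimate per bit (NIST SP 800-90B, MCV)."""
    n = len(bits)
    p_hat = Counter(bits).most_common(1)[0][1] / n
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_upper)

# Synthetic stand-in for 30 chips with 1024-bit responses (assumption, not real data).
rng = random.Random(0)
chips = [[rng.getrandbits(1) for _ in range(1024)] for _ in range(30)]
hd_min, hd_mean, hd_max, hd_std = inter_puf_hd(chips)
h_mcv = mcv_min_entropy([bit for chip in chips for bit in chip])

if __name__ == "__main__":
    print(f"inter-PUF HD: min={hd_min:.4f} mean={hd_mean:.4f} max={hd_max:.4f} std={hd_std:.4f}")
    print(f"MCV min-entropy per bit: {h_mcv:.4f}")
```

For ideal independent 1024-bit responses, the normalized distance has mean 0.5 and standard deviation sqrt(0.25/1024) ≈ 0.0156, which is a useful reference point when reading the uniqueness statistics.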

Unified design
- Technology portability and verification are simplified by using a standard cell library.
- The same design has been successfully verified in 3nm GAA and 8nm FinFET technologies.
- Side-channel attack countermeasures are integrated into the design.

Enrollment simplicity and design efficiency
- Efficient PUF enrollment at nominal voltage and temperature (V/T) operational conditions.
- Vth tilting has proven effective for screening and eliminating unstable selections.
- Reduced NVM requirements compared to the no-tilting version.
- A robust PUF solution providing low BER and strong uniqueness/randomness properties.

Q&A

17.5: An Eye-Opening Arbiter PUF for Fingerprint Generation Using Auto-Error Detection for PVT-Robust Masking and Bit Stabilization Achieving a BER of 2e-8 in 28nm CMOS
2025 IEEE International Solid-State Circuits Conference
Bjoern Driemeyer, Holger Mandry, David-Peter Wiens, Joachim Becker, John G. Kauffman, Maurits Ortmanns
University of Ulm, Ulm, Germany

Introduction to PUFs
- Each device has randomly different properties due to mismatch.
- Physical Unclonable Function (PUF): translation of these properties into an IC fingerprint.
- Demand for IC identification: use the IC fingerprint.
[Figure: device process variation and local mismatch -> PUF: translation into digital fingerprint -> device-dependent fingerprint (example bit strings 110111001010, 100010111011).]

PUF: Fundamental Design Challenge(s)
- Ideal (weak) PUF: constant fingerprint / PUF output.
- Noise causes random fingerprint errors.
- Unstable PUF bits cause permanent fingerprint errors over changing environmental conditions.
- Account for additional stabilization and noise reduction.
[Figure: readouts over time, temperature and supply voltage compared with the golden key 100010111011; each readout is marked Reliable or Unreliable (for example 100010011011 | Unreliable), and an unstable PUF bit is highlighted.]

SotA: Noise-Robustness and Stabilization
- Many techniques are used for noise reduction and PUF-bit stabilization.
- Noise averaging: Temporal Majority Voting (TMV).
- Blanking mask: identified unstable bits are removed from the fingerprint.
[Figure: readouts over time, temperature and supply voltage against the golden key 100010111011; TMV averages noisy PUF bits and the blanking mask blanks the unstable PUF bit.]
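The two state-of-the-art techniques named above, temporal majority voting and a blanking mask, can be sketched in a few lines. This is an illustrative model only: each PUF bit is assumed to be the sign of a fixed margin plus Gaussian readout noise, and the margin distribution, noise level, vote count and function names are all hypothetical.

```python
import random

def noisy_read(margins, noise_sigma, rng):
    """One raw readout: each bit is the sign of its margin plus Gaussian noise."""
    return [int(m + rng.gauss(0.0, noise_sigma) > 0) for m in margins]

def tmv_read(margins, noise_sigma, rng, votes=5):
    """Temporal majority voting: repeat the readout and take a per-bit majority."""
    reads = [noisy_read(margins, noise_sigma, rng) for _ in range(votes)]
    return [int(2 * sum(col) > votes) for col in zip(*reads)]

def build_blanking_mask(margins, noise_sigma, rng, trials=50):
    """Mark a bit as stable only if it never flipped during the enrollment trials."""
    reads = [noisy_read(margins, noise_sigma, rng) for _ in range(trials)]
    return [len(set(col)) == 1 for col in zip(*reads)]

def bit_error_rate(bits, golden, mask=None):
    """BER against the golden key, optionally restricted to unmasked bits."""
    pairs = [(b, g) for i, (b, g) in enumerate(zip(bits, golden))
             if mask is None or mask[i]]
    return sum(b != g for b, g in pairs) / max(len(pairs), 1)

rng = random.Random(7)
margins = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # assumed mismatch per bit
golden = [int(m > 0) for m in margins]
mask = build_blanking_mask(margins, 0.1, rng)

n_reads = 50
ber_raw = sum(bit_error_rate(noisy_read(margins, 0.1, rng), golden)
              for _ in range(n_reads)) / n_reads
ber_tmv = sum(bit_error_rate(tmv_read(margins, 0.1, rng), golden)
              for _ in range(n_reads)) / n_reads
ber_masked = sum(bit_error_rate(noisy_read(margins, 0.1, rng), golden, mask)
                 for _ in range(n_reads)) / n_reads

if __name__ == "__main__":
    print(f"raw={ber_raw:.4f} tmv={ber_tmv:.4f} masked={ber_masked:.4f}")
```

TMV costs extra readouts per bit, while the mask costs fingerprint bits; the two are complementary, which is why they are commonly combined.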

This Work: Noise-Robustness and Stabilization
- Introduce additional information about readout reliability.
- Auto-Error Detection: use the reliability information to re-read only if necessary.
- Use Auto-Error Detection under nominal conditions to predict the blanking mask.
[Figure: readouts over time, temperature and supply voltage against the golden key 100010111011, each tagged Reliable or Unreliable; Auto-Error Detection yields a noise-robust PUF, and the blanking mask blanks the unstable PUF bit.]

Proposed EOA-Architecture
- Ring-oscillator PUF with phase-domain (arbiter) readout until the phase difference exceeds the deadzone (DZ).
- Oscillation is only allowed until either ARB=1 or TEN=0.
- High bit rate and low energy per bit readout.
- ARB at the end of TEN allows judging the reliability of the PUF bit.
[Figure: block diagram and timing with EN, RO1, RO2, D, P, Q, PUF-bit, ARB and TEN.]

PUF Stability vs. Temperature & VDD Change
[Figure: RO1 and RO2 frequency over temperature/VDD in three cases: stable and noise-robust (Δf_nom large, Δf_read large); stable but noisy (Δf_nom large, Δf_read small); unstable and noisy (Δf_read large, Δf_nom small, crossing).]
- Simulate RO frequency over temperature and supply-voltage variation.
- Statistics of Δf_nom and the VT gradient (accounts for the environmental range).
- Predict the minimum Δf_nom,min needed to ensure RO-pair stability.
- Same considerations for the minimum Δf_read,min for noise robustness.
- Set both Δf_nom,min and Δf_read,min by the combination of DZ and TEN.

Proposed Two Phase PUF Operation - I
1. Enrolment phase:
- Perform once at nominal operating condition.
- Use short TEN,nom = 10ns.
- Investigate ARB after TEN,nom.
- ARB=1: mark RO-pair as stable in the blanking mask.
- ARB=0: mark RO-pair as unstable in the blanking mask.

Proposed Two Phase PUF Operation - II
2. Readout phase:
- Perform always after enrolment, at any conditions.
- Blanking mask: run only stable RO-pairs with TEN,read = 25ns.
- ARB=1: accept the PUF-bit readout.
- ARB=0: repeat the readout (Auto-Error Detection).

Manufactured Chip
- Fabrication in 28nm bulk CMOS.
- Single arbiter shared by 4 RO-pairs (area per PUF bit: 34.84 µm²).
- Total of 480 separate EOA cells per die (16 dies measured in total).
- Enrolment phase performed at 20 °C and 0.9 V supply voltage.
- 903 PUF bits per die (47%) found stable on average.

Evaluation of PUF Stability
- Investigated environmental range: −40 °C to 125 °C, ±10% VDD.
- HTOL aging equivalent to 4 years of continuous usage.
- Raw BER between 4.68% (nominal) and 7.9% (worst case).
- Both (b) and (c) reduce BER (x2 Auto-Error Detection, 3e-4 Mask).
- Using (b) and (c) combined: resulting BER = 2e-6%.
[Figure: bit error rate, %, for (a) raw, (b) Auto-Error Detection, (c) mask only, (d) both (b)+(c).]

Comparison to Prior Art
[Comparison table not preserved in the extracted text.]

Conclusion
- Eye-Opening Arbiter PUF: oscillation until the accumulated phase difference exceeds the deadzone.
- Simulation: frequency thresholds for stability and noise.
- Frequency thresholds: set by the deadzone / enable duration combination.
- Enrolment phase: find always-stable RO-pairs -> blanking mask.
- Readout phase: Auto-Error Detection -> noise robustness.
- The resulting BER is only 2e-8.

17.6: A 100MHz Self-Calibrating RC Oscillator Capable of Clock-Glitch Detection for Hardware Security in a 3nm FinFET Process
2025 IEEE International Solid-State Circuits Conference

A 100MHz Self-Calibrating RC Oscillator Capable of Clock-Glitch Detection for Hardware Security in

a 3nm FinFET Process
Nandish Mehta¹, Stephen Tell², Sanquan Song¹, Sudhir Kudva¹, Brian Zimmer¹, Mahmut Sinangil¹, C. Thomas Gray²
¹NVIDIA, Santa Clara, CA; ²NVIDIA, Durham, NC

SoC Hardware Security: What is it?
- Hardware countermeasures preventing unauthorized access, tampering, extraction of sensitive data, or malicious modifications to the SoC.
- Physical attacks: voltage, clock, EM, laser, body-bias, etc.
- Impact: faults alter intended SoC behavior, revealing the encryption key, bypassing secure authentication, etc.
[Figure: physical attacks -> fault injection -> device under attack -> security breach.]

SoC Hardware Security: Clock-Glitch Injection
- Clock glitches are commonly injected (1) during run-time or (2) during boot-up.
- Clock glitches induce transient faults.
- They force the SoC to reveal the encrypted key or bypass boot authentication.
- Often a single fault is enough! [C.H. Kim, DATE'07]

Countermeasures for Clock-Glitch Attacks: clock glitches during runtime
Few examples: [Y. He, ISSCC'24]
- Locks to input CKREF.
- Detects injected pulse and clock stop.
- Hard to capture FM attacks.
- Needs a standalone on-chip oscillator.

Countermeasures for Clock-Glitch Attacks: clock glitches during boot-up
Few examples: Boot-on-RO, boot-up using an on-chip oscillator [N. Mehta, JSSC'22]
- Requires fairly accurate oscillator frequency.
- Needs trimming and fuses.

Design Goals
- Problem 1: runtime clock-glitch detection.
- Problem 2: oscillator for Boot-on-RO application.
- Single proven and validated IP.
Problem 1, need -> solution:
- 1. Cycle-by-cycle detection; 2. High resolution -> multi-phase VCO [S. Song, VLSI'22]
Problem 2, need -> solution:
- 1. Supply sensitivity; 2. Low frequency drift -> digital frequency-locked loop [D.S. Truesdell, JSSC'21]
- 3. Temperature sensitivity -> low temp-co thin-film resistors (Hi-R) [B.S. Lien, ISSCC'24]
- 4. Process spread -> 0-trim background self-calibration

Conceptual view of the Proposed Solution
- Multi-phase VCO oversamples CKREF by 4N -> high-resolution glitch detection.
- CKREF is a stable clock source as it is derived from a crystal oscillator.
- If there is no attack, CKREF calibrates the RC oscillator, improving its accuracy.

Proposed Architecture of RC Oscillator
- Linear search FSM cancels comparator offset and locks the VCO to the RC core.
- 72-bit PMOS DAC (6b coarse and 64b fine) tunes FVCO.
- Efficient layout of RC cores: Hi-R resistors stacked on metal capacitors.
- Multiple phases of the VCO sample CKREF.

Operation of Glitch-Detector
- Sample and Sync logic aligns all samples into a single clock domain (CKSAMP).
- Clock pulses narrower than 1/(4FVCO) are filtered by the clock-glitch filter.
[Figure: timing diagrams with measured high and low widths WH0, WL0, WH1, WL1.]
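The glitch-detector operation above (oversample CKREF, align the samples, compare high and low widths) can be mimicked in software. The sketch below is not the circuit's actual decision logic, which the slides do not spell out: it assumes the oversampled clock is available as a list of 0/1 samples, measures run lengths as stand-ins for the WH/WL widths, and flags any interior run whose width deviates from the expected width by more than a tolerance. `expected_width`, `tolerance` and the function names are hypothetical.

```python
def run_lengths(samples):
    """Collapse a 0/1 sample stream into (level, width) runs."""
    runs = []
    level, width = samples[0], 1
    for s in samples[1:]:
        if s == level:
            width += 1
        else:
            runs.append((level, width))
            level, width = s, 1
    runs.append((level, width))
    return runs

def detect_glitches(samples, expected_width, tolerance=1):
    """Return (start_index, level, width) for runs with an unexpected width.

    The first and last runs are skipped because they may be truncated by the
    observation window rather than by a glitch.
    """
    runs = run_lengths(samples)
    flagged = []
    position = 0
    for i, (level, width) in enumerate(runs):
        interior = 0 < i < len(runs) - 1
        if interior and abs(width - expected_width) > tolerance:
            flagged.append((position, level, width))
        position += width
    return flagged

# Assumed oversampling: 16 samples per CKREF period, 50% duty cycle.
clean = ([1] * 8 + [0] * 8) * 4
glitched = list(clean)
glitched[19] = glitched[20] = 0   # inject a short low pulse into a high phase

if __name__ == "__main__":
    print(detect_glitches(clean, expected_width=8))
    print(detect_glitches(glitched, expected_width=8))
```

A higher oversampling ratio shrinks the narrowest pulse that can be resolved, which is consistent with the slides' emphasis on high-resolution, cycle-by-cycle detection.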

Self-Calibration Logic
- RC Core-A/B switched into self-calibration.
- Stable 100 MHz CKOUT from the crystal oscillator.
- Comparator offset calibrated.
- 12-bit resistor DAC (R-DAC), 8b coarse and 4b fine, compensates process and temperature.

Die Micrograph and Test setup
- Die flip-chip attached.
- TSMC 3nm FinFET chip packaged on an organic substrate.
- Optical image after back-side substrate thinning.

Glitch Detection for various external attacks
[Figure: measured waveforms with the clock glitch injected.]

False Negative using On-Chip Glitch Generator
- The glitch generator adds glitches to CKREF as per the pattern code.
- False negatives: glitches close to a rising/falling edge.
- Sensitivity of glitch detection scales with FVCO/CKREF.

Frequency Stability w. Supply and Temperature
- A supply regulator can reduce frequency variation with supply.
- Temperature sensitivity is limited by the non-zero temp-co of the Hi-R resistors.

Effectiveness of Self-Calibration
- Compensates spread due to process and temperature.
- Accuracy is limited by quantization of the 12-bit R-DAC.
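The statement that calibration accuracy is limited by R-DAC quantization can be made concrete with a toy model. The following sketch rests on several assumptions that are not in the slides: the oscillator frequency is taken to be inversely proportional to a resistance that changes by a fixed relative step `LSB` per DAC code, the 8b coarse and 4b fine fields are flattened into one 12-bit code, and the calibration is modeled as an exhaustive linear search against the 100 MHz reference. All names and the value of `LSB` are hypothetical.

```python
F_REF = 100e6      # reference frequency derived from the crystal, in Hz
DAC_BITS = 12      # R-DAC resolution stated on the slide
LSB = 1e-4         # assumed relative resistance change per DAC code

def osc_frequency(code, f_raw):
    """Toy RC oscillator model: f is inversely proportional to R(code).

    f_raw is the untrimmed frequency at mid-code, including process and
    temperature error.
    """
    mid = 1 << (DAC_BITS - 1)
    return f_raw / (1.0 + LSB * (code - mid))

def self_calibrate(f_raw):
    """Linear search over all codes; return (best_code, relative_error)."""
    best_code, best_err = 0, float("inf")
    for code in range(1 << DAC_BITS):
        err = abs(osc_frequency(code, f_raw) - F_REF)
        if err < best_err:
            best_code, best_err = code, err
    return best_code, best_err / F_REF

if __name__ == "__main__":
    for f_raw in (85e6, 100e6, 115e6):
        code, rel_err = self_calibrate(f_raw)
        print(f"f_raw={f_raw / 1e6:.0f} MHz -> code={code}, residual error={rel_err:.2e}")
```

In this model the residual error after calibration never exceeds roughly one DAC step, regardless of the initial spread, which mirrors the quantization limit noted above.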

Performance Summary
Columns, left to right: This Work | S. Pan, ISSCC'24 | W. Choi, ISSCC'21 | N. Mehta, JSSC'22 | K. Park, ISSCC'23 | A. Khashaba, JSSC'22 | A. Delke, JSSC'23
Process: 3nm FinFET | 180nm CMOS | 65nm CMOS | 5nm FinFET | 65nm CMOS | 65nm CMOS | 130nm HV CMOS SOI
Frequency, MHz: 100 | 32 | 28 | 77 | 100 | 32 | 70
Power, mW: 0.89 | 0.13 | 0.142 | 0.84 | 0.142 | 0.034 | 0.21
Supply range, V: 0.75 to 0.95 | 1.7 to 2.0 | 0.85 to 1.05 | 1.1 to 1.35 | 1.1 to 1.3 | 1.1 to 2.3 | 3 to 3.6
Supply sensitivity, ppm/V: 1221 | 3000 | 2900 | 2000* | 1400 | 80* | 92*
Temp. range, °C: −20 to 125 | −40 to 125 | −40 to 85 | −40 to 125 | −40 to 85 | −40 to 85 | −63 to 165
Frequency error, %: 0.26 | 0.09 | 0.02 | 0.3 | 0.076 | 0.11 | 0.0084
Period jitter, ps: 11.618.77-5.122.314.5
No. of samples: 122812814618
Calibration type: On-chip self | 2-point trim | 2-point trim | 2-point trim | 2-point trim | 2-point trim | 1-point trim
Clock glitch detection?: Yes | No
Area, mm²: 0.00225 | 0.028 | 0.06 | 0.0152 | 0.22 | 0.18 | 0.69
Footnotes: *Includes an on-chip LDO. *From 16 samples. Estimated from plots.

Conclusions
- A 100 MHz RC oscillator for hardware security applications, capable of clock-glitch attack detection.
- Stable oscillation frequency enabled by a digital frequency-locked loop.
- High 0-trim accuracy using background self-calibration.

Acknowledgements: We thank the MSDV team of Nvidia, Santa Clara, for equipment and test support, with special thanks to Neil Pham, Lamar Tatro, and Andy Tran.

Thank you for your time and attention!
