ISSCC 2025
SESSION 17: Hardware Security

17.1: Sensor-Less Laser Voltage Probing Attack Detection via Run-Time Leakage Shift Monitoring with 4.35% Area Overhead
2025 IEEE International Solid-State Circuits Conference

Hui Zhang*1, Longyang Lin*2, Dingyi Xiong1, Massimo Bruno Alioto1 (*equally credited authors)
1 National University of Singapore, Singapore
2 Southern University of Science and Technology, Shenzhen, China

Outline
- Laser Voltage Probing Attacks (LVP)
- On-Chip LVP Detection, Sensor vs. Sensor-Less
- Proposed Sensor-Less Detection Scheme
- Test Setup, & Post-Silicon Processing for Mounting LVP
- Evaluation of LVP Attacks: Thermal, SNR vs Tattack
- Evaluation of VVDD with Extreme Stdcell Config.
- Full Demonstration: AES
- Comparison With Prior Arts
- Conclusion

Laser Voltage Probing Attacks (LVP)
[Figure: LVP setup. Labels: Si aplanatic solid immersion lens (ASIL, high N.A.); transistor under LVP; cont.-wave laser; laser spot FWHM vs. PO pitch; Si die under attack (100 µm); lens tube; XY scanner; beam profiler; beam polarization module; IR source; beam splitter; photo detector (GHz-BW); key $s; dynamic timing signal]
- Frequency mapping: static imaging
- Waveform averaging: SNR-Tattack trade-off
- Time averaging across acquisitions for SNR 40 dB
- LVP is spatially accurate
- LVP is temporally accurate

[Table residue: on-chip LVP detection, sensor vs. sensor-less. Rows: require dedicated sensor; area overhead; power overhead; dynamic security-performance tradeoff; spatial coverage. Works cited: IOLTS17 [2], TCAS-I17 [3], JSSC23 [4], VLSI23 [5], ISQED22 [7]. Surviving cell values: 0%, 58%, 100%, "high"; column alignment lost.]

[Figure residue: proposed sensor-less detection scheme. Leakage terms with laser off: ILKG_HDR + ILIG_HDR + ILIG_P1, ILKG_PD. VVDD vs. time with PG = 1, laser on / laser off, SENS_EN = 1, VREF = 0.75 VDD; annotation: VVDD at VDD or slightly higher.]

[Figure: infrared thermal imaging of 7-stage 6.7 GHz ring oscillators inducing hotspots (95 °C); VVDD char. harness (PD size: 7 µm x 7 µm)]

Post-Silicon Processing for Mounting LVP
[Figure labels: 28nm CMOS FC; Au stud bump; back-lapped die (100 µm); dimensions 2.5 mm, 2 mm, 6.5 mm, 6 mm; Si extension to hold SIL]
- Si die back-lapped to 100 µm for best optical resolution
- Customized die slicing for SIL attachment (4-mm extension)

LVP Attack Setup
[Figure labels: ASIL; objective lens; PCB (flip-chip); data record oscilloscope; SMU/DC-supply; test controller; LVP machine; spectrum analyzer & oscilloscope]

Evaluation of LVP Attacks: Thermal Field
- Easy differentiation of LVP laser (290 °C) from on-chip hot spot (95 °C)
- Thermal field spreads wider: 25X in FWHM (5 µm)
- Enables large power domain partition, hence small area overhead

Evaluation of LVP Attacks: SNR vs Tattack
[Figure: LVP attack SNR vs. time/acquisitions. Axes: SNR [dB] vs. Tattack [s]. Annotations: linear fit R2 = 0.9897, 9.76 dB/dec; typical SNR: 40 dB; 30% optical intensity (typical): 0.54-hr attack; 8% optical intensity: 64.3-hr attack needed to recover 42.6 dB SNR gap; 10k acq./s (max)]
- 1,319 nm low-noise IR laser source
- Typical SNR target for LVP attack: 40 dB
- 64 hrs Tattack to reveal signal of interest by using 8% optical power (design target): attack unsuccessful after few hours
- Acquisitions at max allowed rate of 10 kacq./s
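
The SNR vs. Tattack slide reports a straight-line fit of SNR against log10(time) with a slope of 9.76 dB/decade. A minimal sketch of how such a fit can be extrapolated to a target SNR is given below. The function name and the reference point used in the example are hypothetical and are not taken from the slides; only the slope value comes from the figure annotation.

```python
def attack_time_for_snr(t_ref_s, snr_ref_db, snr_target_db, slope_db_per_decade=9.76):
    """Extrapolate the averaging time needed to reach a target SNR.

    Assumes SNR grows linearly with log10(time), as in the fitted line on the
    slide. t_ref_s / snr_ref_db is any known point on that line.
    """
    decades = (snr_target_db - snr_ref_db) / slope_db_per_decade
    return t_ref_s * 10.0 ** decades


# Hypothetical reference point (NOT from the slides): 20 dB after 100 s.
t_needed = attack_time_for_snr(100.0, 20.0, 40.0)
print(f"time to reach 40 dB: {t_needed / 3600:.2f} h")
```

Because each additional 9.76 dB costs roughly 10x more acquisition time, lowering the usable optical intensity (and so the starting SNR) stretches the attack time exponentially, which is the trade-off the slide relies on.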

Test Structure w/ Extreme Stdcell Config.
[Figure: logic gate/input configurations with extreme PMOS/NMOS leakage combination in power domain. X18 cells per configuration, supplied from VVDD through PG: NR2 tie high (off PMOS, stacked); NR2 tie low (off NMOS); ND2 tie high (off PMOS); ND2 tie low (off NMOS, stacked); INV(TL); INV(TH). Close placement to track local hotspot; optical intensity: 8%]
- VVDD measured for parallel-connected and stacked off-P/NMOS to comprehensively study decision margin (and across corners)
- Case study of size of power domain: 7 µm x 7 µm

Measured Transient VVDD, W/, W/O Laser, TT
[Figure: VVDD vs. time for INV(TH) and ND2 with PG, VREF = 0.75 VDD; curves w/ LH ROs at 6.7 GHz and w/ LH ROs at 6.7 GHz with LVP mounted; temperatures 23 °C, 95 °C, 125 °C; VVDD char. harness (PD size: 7 µm x 7 µm); ring oscillators inducing hotspots]
- Inherent robustness against PVT by large VVDD with laser on/off

Measured Decision Margin: TT, SS, FF
[Figure: number of clk cycles and performance degradation [%] vs. decision margin for NR2, ND2, INV(TL). FF corner: measured CLK period 3.4 ns, VDD = 1.05 V. SS corner: measured CLK period 7 ns, VDD = 1.05 V. 30% VDD and 30% VSS used to mimic FF, SS. Inset: VVDD w/, w/o laser at FF corner for INV(TH), highest leakage across configurations, VREF = 0.75 VDD]
- Worst perf. degrade: 5.2% across corners
- Corner-specific VREF trimming is viable off-chip

Full-Area Coverage of LVP: AES
[Figure: histograms of comparator output in power domain detectors (ALM voltage [V], count x10^3) at SS, FF and TT corners, 100% detection in each; AES fCLK = 294 MHz, 200 MHz, 142 MHz; block diagram of 128b AES w/ run-time leakage shift monitoring (plain text, cipher text, key $s, PD detectors, ALM); 100% detection in 480,000 measurements across corners (VDD = 1.05, VREF = 0.75 VDD)]
- LVP: 1,319-nm laser, spot FWHM: 200 nm, 1 µm step, 8% optical intensity
- Total 10^10 encryptions: 100% detection across TT, FF, SS

Detection Latency
[Figure: detection latency, 128-bit AES, VDD = 1.05 V, TT corner, fCLK = 200 MHz. Traces (voltage [V] vs. time [s]): VVDD, PG, VVDD under attack, SENS_EN (sensing enable), ALM (alarm signal); worst case full-system latency]

[Table residue: comparison with prior arts. Works: JSSC20 [9] (backside buried metal; LFI, 1064 nm, pulsed; post-Si wafer processing), JSSC23 [4] (on-chip stdcell photosensor; LFI, NIR pulsed), VLSI23 [5] (on-chip stdcell thermal sensor). Column alignment of the numeric cells is lost.]

17.2: A 28nm 4.05J/Encryption 8.72kHMul/s Reconfigurable Multi-Scheme Fully Homomorphic Encryption Processor for Encrypted Client-Server Computing
2025 IEEE International Solid-State Circuits Conference

[Table residue: NTT comparison. Surviving row: Norm. Area Efficiency (Gbps/mm2). Annotations: flexibility; best area efficiency; highest NTT throughput in silicon-proven works]

Comparisons
                       MICRO21 [1]       CHES23 [2]        ISSCC23 [3]  ESSCIRC23 [4]  ISSCC24 [5]  This Work
Technology             12nm (Synthesis)  12nm (Synthesis)  28nm         28nm           28nm         28nm
Area (mm2)             151.4a            150a              42.96        1.69           11.28h       5.4
Frequency (MHz)        1,000a            1,000-2,000a      500          0.5-157        333          125-625
Voltage (V)            N/A               0.72              0.90         0.64-1.10      1.00         0.70-1.10
Power Consumption (W)  180.4             57.5-115.0        4.0-12.0     0.013          0.180        0.138-1.185

Remaining rows (values in source order; empty cells marked "-" have collapsed, so column alignment is lost):
Client-Side Operations
- BFV-Enc: Throughput (KOPS) - 1.22 - 41.48e; Norm. Energy (J/OP) - 10.33 - 4.05e
- BFV-Dec: Throughput (KOPS) - 1.95 - 39.43e; Norm. Energy (J/OP) - 6.45 - 4.41e
Server-Side Operations
- CKKS-HMul: Throughput (KOPS) 16,667 - 8.72e/20.92f; Norm. Energy (J/OP) 58.81 - 19.11e/39.53f
- BGV-HAdd: Throughput (Gbps) 5,461.33b, 172.03g, 23.55*, -, 36.56f; Norm. Energy (pJ/bit) 179.85b, 3,639.56g, 175.78*, -, 11.41f
- BGV-PMul: Throughput (Gbps) 1,780.87b, 57.34g, 6.28*, -, 5.74f; Norm. Energy (pJ/bit) 551.52b, 10,919.28g, 659.18*, -, 172.10f
Annotations: 27x faster; 50% energy; 19x better energy efficiency; 8x larger
* Paillier (partially HE without HMul)

Outline
- Background and Motivations
- Key Design Challenges
- System Architecture and Contributions
- Measurement and Comparisons
- Conclusion

Conclusion
- A Reconfigurable Multi-Scheme Fully Homomorphic Encryption Processor
  - Fabricated in 28nm.
  - Supports client-side & server-side operations efficiently.
  - Evaluates various PPML tasks fully on-chip.
- Enhanced Performance
  - Improved NTT throughput in silicon-proven works.
  - Better area-efficiency (2.9x to 7.7x compared to ISSCC24).

Thanks for your attention!
For further questions, please contact:

17.3: A 30.4 GOPS/mW MK-CKKS Processor for Secure Multi-Party Computation
2025 IEEE International Solid-State Circuits Conference

Liang-Hsin Lin, Yao-Kai Yang, Chia-Hsiang Yang
National Taiwan University

Outline
- Introduction
- Preliminaries
- System Architecture
- Algorithm-hardware Co-optimizations
- Chip Implementation
- Summary

Multi-party Computation (MPC)
- Key approach for privacy-preserving applications
- Multiple parties (users) collaboratively perform computations
- Each party's data remain secret
[Figure: multi-party computation (MPC) use cases: secure machine learning [1], secure healthcare data analytics [2]]
[1] D.H. Kang, Scientific Reports, 2024
[2] M. Yang, Cell Systems, 2024

Promising Solution: MK-FHE
- Three major schemes to implement MPC:
  - Homomorphic encryption (HE) with secret sharing [3]
  - Fully homomorphic encryption (FHE) [4]
  - Multi-key fully homomorphic encryption (MK-FHE) [5]
- MK-FHE is promising for several advantages:
  - Unlimited number of parties
  - Parties can join or leave the computation on the fly
  - Non-interactive communication between parties
[3] C. Juvekar, USENIX Security, 2018
[4] J.H. Cheon, SAC, 2019
[5] A. López-Alt, STOC, 2013

Robust MK-FHE: MK-CKKS [6]
- MK-FHE algorithm for fixed-point data using packed ciphertexts
- Homomorphic operations: computations performed on ciphertexts
  - Data are encrypted and remain secret during homomorphic operations
  - Three levels of homomorphic operations: task, ciphertext, polynomial
[6] H. Chen, CCS 2019.

Toward Efficient MPC Processor
- CPU: low throughput due to irregular MPC data access
  - Dedicated architecture for efficient MPC is required
- Dedicated processors for two-party computation (2PC) [7-10]
  - Limited to single-key setups between two distinct parties
  - Restricted configuration and lack of scalability
- This work presents the first MK-CKKS processor
  - MPC with an unlimited number of parties
  - Flexible and scalable architecture for diverse applications
[7] G. Shi, ISSCC, 2023
[8] J. Kim, MICRO, 2022
[9] S. Kim, ISCA, 2022
[10] H. Lee, ISSCC, 2024

Number Theoretic Transform (NTT)
- A radix-N NTT is an FFT-like operation that maps a polynomial of degree N-1 to an ordered tuple of N elements
  - N(log N)/2 butterfly (BF) operations involved
  - $\hat{a}_i = a(\omega_i)$, where $\omega_i$ are the twiddle factors of NTT
- Transformation between two polynomial domains
  - Coefficient domain: $a(X) \in \mathbb{Z}_q[X]/(X^N+1)$ and NTT domain: $\hat{a} = (\hat{a}_0, \ldots, \hat{a}_{N-1})$, $i = 0, 1, \ldots, N-1$
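
The butterfly count quoted on the NTT slide, N(log N)/2, can be checked with a small software model. The sketch below is a textbook iterative radix-2 NTT over a toy modulus, not the processor's datapath: it uses a cyclic transform with q = 17, N = 8 and omega = 2 (all assumptions chosen for readability), whereas the slides work in the ring modulo X^N + 1 with much larger primes.

```python
def ntt(a, omega, q):
    """Iterative radix-2 NTT of a length-N list (N a power of two).

    Returns (transformed list, number of butterfly operations).
    omega must be a primitive N-th root of unity modulo q.
    """
    n = len(a)
    a = list(a)
    # Bit-reversal permutation so the in-place butterflies yield natural order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    butterflies = 0
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # twiddle step for this stage
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * w % q
                a[start + k] = (u + v) % q
                a[start + k + length // 2] = (u - v) % q
                w = w * w_len % q
                butterflies += 1
        length *= 2
    return a, butterflies


def naive_ntt(a, omega, q):
    """Direct O(N^2) evaluation at the powers of omega, used as a reference."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q for i in range(n)]


# Toy parameters (assumptions): q = 17, N = 8; omega = 2 has order 8 modulo 17.
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
fast, count = ntt(coeffs, 2, 17)
assert fast == naive_ntt(coeffs, 2, 17)
print(count)  # N * log2(N) / 2 = 12 butterflies for N = 8
```

Each of the log2(N) stages performs N/2 butterflies, which is where the N(log N)/2 total comes from.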

Automorphism: Polynomial Permutation
- Two types of permutations: rotation (ROT) & conjugation (CON)
- Permutation can be performed in coefficient domain (CD) or NTT domain (ND)
- Four types of automorphism: ROT-CD, CON-CD, ROT-ND, CON-ND

Modular Arithmetic
- Addition and multiplication in Montgomery reduction with Montgomery factor R
  - Calculates $abR^{-1} \bmod q$ for some large number $R$ (a power of 2)
  - Involves two integer multiplications with bit width close to $\log q$
- Mathematical structure of primes in MK-CKKS
  - Primes in MK-CKKS are close to a power of two (e.g., 0x20000000280001)
  - Data precision for homomorphic operations [6] can be increased
  - [Equation: structure of the primes; symbols lost in extraction]
[6] H. Chen, CCS 2019
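
For reference, the Montgomery reduction that the Modular Arithmetic slide uses as its baseline can be sketched as follows. This is the standard textbook REDC algorithm, not the authors' hardware; the choice R = 2^64 and the helper names are assumptions made for illustration. The modulus is the example value quoted on the slide, and the two wide integer multiplications the slide refers to are marked in the comments.

```python
def montgomery_setup(q, r_bits):
    """Precompute -q^{-1} mod R for R = 2**r_bits (q must be odd)."""
    r = 1 << r_bits
    return (-pow(q, -1, r)) % r  # modular inverse via pow needs Python 3.8+


def montgomery_reduce(t, q, r_bits, q_neg_inv):
    """Return t * R^{-1} mod q for 0 <= t < q * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * q_neg_inv) & mask   # first integer multiplication
    u = (t + m * q) >> r_bits             # second integer multiplication
    return u - q if u >= q else u


def montgomery_mul(a_m, b_m, q, r_bits, q_neg_inv):
    """Multiply two values already in Montgomery form (x * R mod q)."""
    return montgomery_reduce(a_m * b_m, q, r_bits, q_neg_inv)


Q = 0x20000000280001      # example modulus from the slide
R_BITS = 64               # assumption: R = 2**64
Q_NEG_INV = montgomery_setup(Q, R_BITS)

a, b = 123456789, 987654321
a_m = (a << R_BITS) % Q   # convert to Montgomery form
b_m = (b << R_BITS) % Q
prod_m = montgomery_mul(a_m, b_m, Q, R_BITS, Q_NEG_INV)
prod = montgomery_reduce(prod_m, Q, R_BITS, Q_NEG_INV)  # convert back
assert prod == (a * b) % Q
```

Both multiplications operate on operands about as wide as q, which is the cost the later "Proposed Efficient Modular Reduction" slide aims to cut.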

Computational Framework
- MPC task can be decomposed into polynomial-level operations
  - Each operation is performed in parallel with customized instructions
  - Data are accessed in a row-wise manner for pipelined execution

Design Parameters (W, D)
- Two pre-defined design parameters
  - W: number of data in each row (= twice the number of PEs)
  - D: depth of the data buffer
  - Instantiated with (W, D) = (64, 64)
- Larger design parameters lead to higher performance
  - Ideally, throughput should scale proportionally with W (with a varying D)
[Figure: configurable PE 0, PE 1, ..., PE 31 (= W/2 - 1); input buffer and output buffer, each with Bank 0 and Bank 1, W x 64 bit wide, depth D]

System Architecture
- Single instruction multiple data (SIMD) architecture

Arithmetic Engine
- Includes a processing element (PE) array and two switch networks
- Supports the modular arithmetic required for polynomial-level operations

Automorphism Engine
- Includes multiple switch networks with a configurable datapath
- Supports four types of automorphism: ROT-CD, CON-CD, ROT-ND, CON-ND

Instruction Decoder and Data Handler
- Coordinates the arithmetic and automorphism engines with fine-grained instructions
- Stores polynomials using double buffering and data mapping to prevent data hazards

Architectural Features
- Large configuration space by using fine-grained instructions
  - Multiple algorithmic parameters and optimizations supported
- Simple dataflow for pointwise arithmetic
  - High resource utilization with linearly scaled throughput

Conflict-free Data Mapping
- A polynomial is stored with different orders for different domains
  - Column major (CM) order for coefficient domain
  - Split-row major (SRM) order for NTT domain
- Enables efficient computational flow for NTT and automorphism
  - Data hazard can be eliminated for row-wise access
- Size of a polynomial: N = M x W

NTT: Hazard-free Computational Flow
- BF operations in radix-N NTT are distributed across 3 phases
  - In each phase, BF operations are performed in parallel, along a distinct direction to avoid data hazards
  - The number of BF operations in each phase is determined by (N, W, D)
  - [Equation: number of BF operations per phase; symbols lost in extraction]

NTT: Row-wise Data Access
- Two rows of the data buffer are fetched in each cycle
  - Data are sent to the PE array to perform W/2 BF operations in parallel
  - Data are then reordered to meet the computational order of NTT
  - The data mapping gradually changes from CM to SRM

NTT: Batching for Large Polynomial Size
- BF operations in each phase can be performed in batches
  - Each batch only requires a portion of the polynomial
- Batches from the first and second phases are combined
  - Intermediate result of NTT is fully reused to reduce external memory access

NTT: Performance with Arithmetic Engine
- Optimal performance is achieved by choosing appropriate D
  - 8-to-32x smaller buffer size with same external memory access
  - 2x higher throughput with 8x smaller memory compared to [10]
[10] H. Lee, ISSCC, 2024
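
The relation between the design parameter W and NTT throughput can be illustrated with a back-of-the-envelope cycle count. The sketch below assumes the ideal case stated on the slides (W/2 butterflies issued every cycle, no stalls, no external memory traffic); the function name and the polynomial size N = 2^16 in the example are assumptions, not figures from the presentation.

```python
import math


def ideal_ntt_cycles(n, w):
    """Idealized cycle count for one radix-N NTT with W/2 butterflies per cycle.

    n: polynomial size (power of two); w: data per buffer row (= 2 x number of PEs).
    Ignores pipeline fill, reordering and external memory access.
    """
    log_n = n.bit_length() - 1          # exact log2 for a power of two
    butterflies = n * log_n // 2        # N (log N) / 2 butterflies in total
    return math.ceil(butterflies / (w // 2))


# Assumed polynomial size N = 2**16 with the instantiated W = 64.
print(ideal_ntt_cycles(2**16, 64))   # 524288 butterflies / 32 per cycle = 16384
print(ideal_ntt_cycles(2**16, 128))  # doubling W halves the ideal cycle count
```

Under these assumptions the cycle count is inversely proportional to W, consistent with the slide's statement that throughput should ideally scale proportionally with W.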

Automorphism: Twiddle Factor Reordering
- Data no longer remain in the same row after automorphism
- The order of twiddle factors needs to be rearranged
  - Automorphism can now be performed with row-wise access
  - Each row is moved to another row with intra-row reordering

Automorphism Engine
- Area-efficient implementation of intra-row reordering
  - Four types of intra-row reordering are supported with a configurable datapath
  - Reordering is decomposed into shifting, re-indexing, and sign-flipping operations
  - These operations are implemented by switch networks with width W
  - The most hardware-costly switch network is the barrel shifter (B-shift)
- Batching for large polynomial size
  - Proposed data mapping is leveraged
- Performance
  - 35x higher throughput than [8] (given the same hardware cost)
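
To make the re-indexing and sign-flipping steps concrete, the sketch below applies a coefficient-domain automorphism X -> X^k to a polynomial modulo X^N + 1 in plain Python. It is a software reference only and says nothing about how the switch networks are organized. The function name is hypothetical, and the remark that rotation uses k = 5^r mod 2N while conjugation uses k = 2N - 1 is the usual CKKS convention, assumed here rather than stated on the slides.

```python
def automorphism_cd(coeffs, k):
    """Apply X -> X^k to a polynomial in Z[X]/(X^N + 1), coefficient domain.

    coeffs[i] is the coefficient of X^i; k must be odd so the map is a
    signed permutation of the coefficients.
    """
    n = len(coeffs)
    out = [0] * n
    for i, c in enumerate(coeffs):
        e = (i * k) % (2 * n)      # re-indexing: X^(2N) = 1 in this ring
        if e < n:
            out[e] = c
        else:
            out[e - n] = -c        # sign flip: X^N = -1
    return out


poly = [1, 2, 3, 4]                       # 1 + 2X + 3X^2 + 4X^3, N = 4
rot = automorphism_cd(poly, 3)            # k = 3, an odd exponent
con = automorphism_cd(poly, 2 * 4 - 1)    # k = 2N - 1 (conjugation, assumed convention)
assert rot == [1, 4, -3, 2]
assert con == [1, -4, -3, -2]
```

Every coefficient lands in exactly one output slot, possibly negated, which is why the operation reduces to a permutation plus sign flips.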

Proposed Efficient Modular Reduction
- Mathematical structure of primes in MK-CKKS is leveraged
- Proposed method exhibits several properties:
  - Correctness
  - Boundedness
  - Efficiency: two integer multiplications with small bit width
[Equations: definition of the proposed reduction with its correctness and boundedness conditions; symbols lost in extraction]

Modular Reduction Unit
- Based on proposed modular reduction method
  - Two integer multipliers with bit width a
  - Configurable registers for prime-related parameters (e.g., a, k, n)
  - Saves 49% area compared to Montgomery-based design

Chip Micrograph & Summary

Operations for Benchmarking
- Ciphertext-level operation
  - Ciphertext multiplication (including key switching) at max ciphertext level
- Task-level operation
  - Homomorphic logistic regression training [1] and oblivious neural network [6]
- Task-level operation for 2PC: bootstrapping
  - Amortized performance (per ciphertext level per slot) is adopted
  - With multiple configuration options to test the design's flexibility and scalability
[1] D.H. Kang, Scientific Reports, 2024
[6] H. Chen, CCS 2019

Performance Evaluation: Ciphertext Multiplication
- Linear key switching process [11] adopted
  - For 65 parties, throughput is improved by 64x compared to baseline with quadratic key switching [6]
[11] H. Kim, CCS 2023
[6] H. Chen, CCS 2019

Flexibility Test with Bootstrapping
- Flexibility demonstrated w/ feasible bootstrapping configurations
  - Optimal performance can be achieved according to the target metrics

Scalability Test with Bootstrapping
- Scalability demonstrated w/ feasible design parameters
  - Throughput scales linearly with the number of processing elements (PEs)

Comparison with CPU
- This work (210 MHz) outperforms Intel Platinum CPU (2.6 GHz)
  - 14x speedup for homomorphic logistic regression training
  - 11.5x speedup for oblivious neural network inference

Multi-key Homomorphic Encryption Logistic Regression
Dataset: mobile price classification; input size: 20; output label: 0 or 1; training data: 512; batch size: 128
Training speed (epochs/minute):
# of data providers   CPU    This work
4                     0.39   5.6
8                     0.20   2.8

Oblivious Neural Network Inference
Dataset: MNIST classification; input size: 28x28; output label: 0-9; inference data: 10^4; batch size: 128; accuracy: 98.4%
Inference speed (labels/s):
CPU    This work
0.56   6.48
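
The headline speedups on the CPU comparison slide can be re-derived from the per-task figures. The short check below uses the values as read from the flattened chart labels (assigning the larger number in each pair to this work is an interpretation of the extracted text) and the two clock frequencies quoted on the slide.

```python
# (this work, CPU) pairs as read from the slide; the assignment is an interpretation.
training_epochs_per_min = {4: (5.6, 0.39), 8: (2.8, 0.20)}   # keyed by # of data providers
inference_labels_per_s = (6.48, 0.56)

for providers, (chip, cpu) in training_epochs_per_min.items():
    print(f"training, {providers} providers: {chip / cpu:.1f}x")

chip, cpu = inference_labels_per_s
inference_speedup = chip / cpu
print(f"inference: {inference_speedup:.1f}x")

# Clock ratio between the 2.6 GHz CPU and the 210 MHz chip.
frequency_ratio = 2600 / 210
print(f"frequency ratio: {frequency_ratio:.1f}x")
```

The ratios come out at about 14x for training, about 11.6x for inference, and about 12.4x for the clock frequencies, in line with the 14x, 11.5x and 12.4x figures quoted in the slides.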

Comparison with State-of-the-Art Designs
- Supports an unlimited number of parties
- Offers flexibility and scalability
- Row-wise access for all polynomial-level operations
- Alternative method replacing Montgomery reduction
- 97% PE utilization
- 2.1-to-69x higher energy efficiency
- 2.4-to-10.2x higher area efficiency

Performance for Bootstrapping
- 3.2-to-196x lower amortized energy than state-of-the-art CKKS processors [8-10]
[Table residue: bootstrapping amortized energy (+, *) for MICRO22 [8], ISCA22 [9], ISSCC24 [10] and this work; raw values in source order: "131", "24241.23"; column alignment lost]
+ : J/bootstrapping/level after bootstrapping/slots
* : Normalized to 40nm technology

Summary
- The first MK-CKKS processor for MPC
  - Unlimited number of parties
  - Flexible and scalable for diverse applications
- Algorithm-hardware co-optimizations
  - SIMD architecture with hazard-free row-wise access
  - NTT: 2x higher throughput with 8x smaller memory
  - Automorphism: 35x higher throughput given the same hardware cost
  - Modular reduction: 49% less hardware cost
- Chip implementation (40nm CMOS)
  - 6.72GOPS and 30.4GOPS/W at 210MHz from a 1.3V supply
  - 11.5-to-14x speedup (at a 12.4x lower frequency) compared to CPU
  - 2.1-to-69x higher energy efficiency and 2.4-to-10.2x higher area efficiency than prior 2PC processors

Acknowledgements
- This work is supported by National Science and Technology Council (NSTC) of Taiwan and Intelligent & Sustainable Medical Electronics Research Fund in National Taiwan University
- The authors also thank Taiwan Semiconductor Research Institute (TSRI) for technical support on chip design and fabrication

17.4: An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies
2025 IEEE International Solid-State Circuits Conference

Bohdan Karpinskyy, Yong Ki Lee, Sumin Noh, Yunhyeok Choi, Jieun Park, Jisu Kang, Taewook Park, Eunhye Oh, Gapkyung Kim, Sungha Lee, Hyunwoo Ko, Jonghoon Shin, Hyo-Gyuem Rhew, Jongshin Shin
Samsung Electronics, Republic of Korea

Outline
- Background
  - The role of PUF
  - PUF design issues
- Proposed PUF design
  - Vth-tilting circuit
- PUF evaluation results
  - Enrollment efficiency & Steadiness
  - Uniqueness
  - Randomness
- Summary

Background: the role of extensive* PUF
PUF response utilization for authentication
- Benefits:
  - Primarily applicable to PUFs with an extensive challenge-response space.
- Limitations:
  - Limited applicability due to unresolved issues with PUF response modeling.
  - Requires additional mechanisms, including cryptographic methods, to overcome these limitations.
* aka "strong" PUF

Background: the role of confined* PUF
PUF response for a secret key generation
- Benefits:
  - The PUF response remains inaccessible when the circuit is powered off.
  - The response is physically and potentially mathematically unclonable.
  - The security strength of the generated key can be enhanced by utilizing PUF technology.
  - ISO/IEC 20897-1/2: Specifies security requirements and evaluation methods for PUF responses.
- Limitations:
  - Currently, no security certification body or lab validates PUF-generated secret keys.
* aka "weak" PUF
107、echnologies 2025 IEEE International Solid-State Circuits Conference6 of 25Background:PUF design challenges Entropy sourcePower,Performance and Area considerations for the PUF entropy source.Mitigation of information leakage through side-channels.Key generation infrastructureError-correction mechanis
108、ms to address PUF instability and aging effects.Enrollment procedure,including duration and variation in voltage/temperature(V/T)conditions.Non-Volatile Memory(NVM)for storing PUF helper data Technology/products coverageScalability of PUF design(response size)and portability across technologies.17.4
109、:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference7 of 25Outline BackgroundThe role of PUFPUF design issues Proposed PUF designVth-tilting circuit PUF evaluation resultsEnrollment efficiency&SteadinessUniquenessRandomne
110、ss Summary17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference8 of 25Proposed PUF design Schematic TOPSELmsABYABYPath Gate Cellx32x32tiltingtilting strengthABYx32ABYx32Power balanced samplerLaser Attack Detectors132AB
111、Yx32Power Gating circuitSLEEPVDDGCombination of responses Global power(VDDG)Gated power(VDD)DATAiVALIDLaser AttackDATAtiltingtilting directiontilting strength32323232323223232x32 PUF CELLsVth tilting circuitryValid Checker17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologie
112、s 2025 IEEE International Solid-State Circuits Conference9 of 25Proposed PUF design Schematic TiltingVDDControllable tilt downControllable tilt upStage1 Vth tilting circuitry(transistor view)GNDup10down10up11down11up12down12Path Gate Cellms10tilting VDD1VSS1A1 Y1VDD2VSS2A2 Y2VDD0VSS0A0 Y0VDD4VSS4A4
113、Y4VDD3VSS3A3 Y3VDD5VSS5A5 Y5VDD6VSS6A6 Y6VDDGNDStage 1(standard cells view)msmsmsmsmsmsnet_0234net_0234net_0234net_0234net_12net_0234net_12net_12net_45net_45net_356net_356net_356net_356net_356SW2SW2net_45net_0234net_356net_45net_0234net_12net_0234net_356net_45net_12710000010tilting103731tiltuptiltdo
114、wnSELVth1Vth2msRABYABYStage2 Vth tilting circuitryup13down1300down12:0up12:0down22:0up22:000down32:0up32:0up23down23Stage3 Vth tilting circuitryup34down34PUF celltiltingtiltingSW117.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circu
115、its Conference10 of 25Proposed PUF design Schematic TiltingVDDControllable tilt downControllable tilt upStage1 Vth tilting circuitry(transistor view)GNDup10down10up11down11up12down12Path Gate Cellms10tilting VDD1VSS1A1 Y1VDD2VSS2A2 Y2VDD0VSS0A0 Y0VDD4VSS4A4 Y4VDD3VSS3A3 Y3VDD5VSS5A5 Y5VDD6VSS6A6 Y6V
116、DDGNDStage 1(standard cells view)msmsmsmsmsmsnet_0234net_0234net_0234net_0234net_12net_0234net_12net_12net_45net_45net_356net_356net_356net_356net_356SW2SW2net_45net_0234net_356net_45net_0234net_12net_0234net_356net_45net_12710000010tilting103731tiltuptiltdownSELVth1Vth2msRABYABYStage2 Vth tilting c
117、ircuitryup13down1300down12:0up12:0down22:0up22:000down32:0up32:0up23down23Stage3 Vth tilting circuitryup34down34PUF celltiltingtiltingSW117.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference11 of 25Proposed PUF design
118、Schematic TiltingVDDControllable tilt downControllable tilt upStage1 Vth tilting circuitry(transistor view)GNDup10down10up11down11up12down12Path Gate Cellms10tilting VDD1VSS1A1 Y1VDD2VSS2A2 Y2VDD0VSS0A0 Y0VDD4VSS4A4 Y4VDD3VSS3A3 Y3VDD5VSS5A5 Y5VDD6VSS6A6 Y6VDDGNDStage 1(standard cells view)msmsmsmsm
119、smsnet_0234net_0234net_0234net_0234net_12net_0234net_12net_12net_45net_45net_356net_356net_356net_356net_356SW2SW2net_45net_0234net_356net_45net_0234net_12net_0234net_356net_45net_12710000010tilting103731tiltuptiltdownSELVth1Vth2msRABYABYStage2 Vth tilting circuitryup13down1300down12:0up12:0down22:0
120、up22:000down32:0up32:0up23down23Stage3 Vth tilting circuitryup34down34PUF celltiltingtiltingSW1ms17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference12 of 25Proposed PUF design Schematic TiltingVDDControllable tilt do
121、wnControllable tilt upStage1 Vth tilting circuitry(transistor view)GNDup10down10up11down11up12down12Path Gate Cellms10tilting VDD1VSS1A1 Y1VDD2VSS2A2 Y2VDD0VSS0A0 Y0VDD4VSS4A4 Y4VDD3VSS3A3 Y3VDD5VSS5A5 Y5VDD6VSS6A6 Y6VDDGNDStage 1(standard cells view)msmsmsmsmsmsnet_0234net_0234net_0234net_0234net_1
122、2net_0234net_12net_12net_45net_45net_356net_356net_356net_356net_356SW2SW2net_45net_0234net_356net_45net_0234net_12net_0234net_356net_45net_12710000010tilting103731tiltuptiltdownSELVth1Vth2msRABYABYStage2 Vth tilting circuitryup13down1300down12:0up12:0down22:0up22:000down32:0up32:0up23down23Stage3 V
123、th tilting circuitryup34down34PUF celltiltingtiltingSW1ms17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference13 of 25Proposed PUF design PUF Enrollment with Vth Tilting3nm PUF cell SPICE simulation(Monte Carlo).measur
124、ed Vth:mean=0.00034 std=0.01925controllable Vth tilt-downcontrollable Vth tilt-upscreened with tiltingVth,V1=Vth 1 1,1=1 2=1 2,12+22=N 0,122=Vth 2 2,217.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference14 of 25Propose
125、d PUF design Micrograph(3nm GAA example)1.Valid checker,Power BalancedSampler,Attacks Countermeasures 2.Isolation for output portsInput ports and Power gating cellsInput ports and Power gating cells2132 x 32 PUF cells with tilting circuitry17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm
126、FinFET Technologies 2025 IEEE International Solid-State Circuits Conference15 of 25Outline BackgroundThe role of PUFPUF design issues Proposed PUF designVth-tilting circuit PUF evaluation resultsEnrollment efficiency&SteadinessUniquenessRandomness Summary17.4:An Efficient Vth-Tilting PUF Design in 3
127、nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference16 of 25Enrollment&Steadiness PUF Enrollment Efficiency*3nm GAA8nm FinFet*Vtyp,Ttyp operational conditions BER trendsstable responsesBER trendsstable responses17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA
128、 and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference17 of 25Enrollment&Steadiness Vth Tilting PUF Steadiness3nm GAA8nm FinFetTemp,CVoltagePUF Bit Error Rate,%FFFSNNSFSS1500.75V-20%0.0010.0010.0000.0020.0010.75V+20%0.0670.1370.0540.0740.132250.75V-20%0.0080.0030.0090.0
129、050.0080.75V+20%0.0020.0060.0040.0030.008-400.75V-20%0.3050.1610.3130.1920.1590.75V+20%0.0000.0000.0000.0010.00036x times BER reductionTemp,CVoltagePUF Bit Error Rate,%FFFSNNSFSS1250.70V-20%0.0240.0040.0030.0190.0110.70V+20%0.0140.0680.1650.0190.223250.70V-20%0.0120.0040.0120.0040.0150.70V+20%0.0000
130、.0120.0110.0050.041-400.70V-20%0.0920.1220.2000.1570.1810.70V+20%0.0010.0010.0120.0050.01051x times BER reduction17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference18 of 25PUF Uniqueness(3nm GAA)Uniqueness of PUF res
131、ponsesminmeanmaxstd0.42870.49500.56740.0158minmeanmaxstd0.43260.49650.56640.0157minmeanmaxstd0.42190.49620.56540.0157minmeanmaxstd0.43460.49660.57230.0158minmeanmaxstd0.43360.49820.56840.015717.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid
132、-State Circuits Conference19 of 25PUF Uniqueness(8nm FinFet)Uniqueness of PUF responsesminmeanmaxstd0.42090.49780.56540.0156minmeanmaxstd0.42970.49790.56450.0157minmeanmaxstd0.43260.49810.56930.0156minmeanmaxstd0.43650.49800.56250.0156minmeanmaxstd0.43070.49860.56150.015617.4:An Efficient Vth-Tiltin
133、g PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference20 of 25Randomness evaluation of PUF(3nm GAA)TestEntropy per bitMCV*0.892916Collision0.860454Markov0.894155tCompression0.723963T-Tuple0.864429LRS0.988841MultiMCW0.896237Lag0.965752MultiMMC0.892
134、989LZ78Y0.892946H_original0.723963*MCV is a most Common ValueNIST SP 800-90b test summary*T5 test comes from AIS31 package for TRNGLag=1.23751Autocorrelation scorescoreoccurrencesPUF 4-bit evaluation T5*Autocorrelation evaluation17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Tech
135、nologies 2025 IEEE International Solid-State Circuits Conference21 of 25Randomness evaluation of PUF(8nm FinFet)TestEntropy per bitMCV*0.985641Collision0.934789Markov0.988406tCompression0.833262T-Tuple0.925578LRS0.996388MultiMCW0.992699Lag0.996234MultiMMC0.986301LZ78Y0.986111H_original0.833262*MCV i
136、s a most Common ValueNIST SP 800-90b test summary*T5 test comes from AIS31 package for TRNGLag=1.23751Autocorrelation scorescoreoccurrencesPUF 4-bit evaluation T5*Autocorrelation evaluation17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-S
137、tate Circuits Conference22 of 25Outline BackgroundThe role of PUFPUF design issues Proposed PUF designVth-tilting circuit PUF evaluation resultsEnrollment efficiency&SteadinessUniquenessRandomness Summary17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE Intern
138、ational Solid-State Circuits Conference23 of 25SummaryThis work123456Technology3nm(GAA)8nm(FinFet)8nm(finfet)40nm3nm(GAA)5nm(finfet)40nm65nmStabilizing technique Vth tilt,mask.Mask.TMV,Burn-in,mask.Vth tilt,mask.Vth tilt,mask.TMV,Vth tilt.,mask.TMV,mask.,reconfig.PUF entropy sourceINV VthINV VthSRAM
139、 SRAM+pre-amplifierDiode ClampedLeakage inverterHybrid-ROINV VthAdditional featuresAttacks counter.*1),anti-aging Attacks counter.,anti-aging NA*2)Self-descructionAnti-aging NAAnti-agingDesign portabilityGoodGoodGoodLowLowLowLowLowEnrollment cost*3)Low:nominal V/TMed:3V configs.nominal THigh(hardeni
140、ng)NR*2)Low:nominal V/TLowHigh:6 corner V/TMask size,bitsn*1n*1n*1n*1n*1n*3n*2Screening ratio,%=75(accepted range)2427NR23.227.73.6427Temp.,C-40125-40150-4015025110-40125090-40125-40125Operational Voltage,range%20%10%15%-13 +26%17%-30%+40%-58%+16%BER,%0.22270.31272.231.460.003481.180.551.9E-3*3)3.34
141、E-6*3)Inter-PUF HD,norm.0.49820.49780.49650.48600.49830.50100.49900.49630.4995MinEntropy(NIST)0.7240.833NRNR0.764NRNRNRBit Rate,Mbits/s160NRNRNRNRNR22750#of tested chips308320330500020181610(15)*1)Side-channel(power)attacks countermeasures and laser attack detector.*2)NA stands for Not Available and
142、 NR stands for Not Reported.*3)The cost of enrollment is high when corner V/T or the response hardening are required.*4)Not the worst V/T is reported.17.4:An Efficient Vth-Tilting PUF Design in 3nm GAA and 8nm FinFET Technologies 2025 IEEE International Solid-State Circuits Conference24 of 25Summary
Unified design
  Technology portability and verification are simplified by using a standard cell library.
  The same design has been successfully verified in 3nm GAA and 8nm FinFET technologies.
  Side-channel attack countermeasures are integrated into the design.
Enrollment simplicity and design efficiency
  Efficient PUF enrollment at normal voltage and temperature (V/T) operational conditions.
  Vth tilting has proven effective for screening and eliminating unstable selections.
  Reduced NVM requirements compared to the no-tilting version.
A robust PUF solution providing low BER and strong Uniqueness/Randomness properties.

Q&A

17.5: An Eye-Opening Arbiter PUF for Fingerprint Generation Using Auto-Error Detection for PVT-Robust Masking and Bit Stabilization Achieving a BER of 2e-8 in 28nm CMOS
2025 IEEE International Solid-State Circuits Conference

An Eye-Opening Arbiter PUF for Fingerprint Generation Using Auto-Error Detection for PVT-Robust Masking and Bit Stabilization Achieving a BER of 2e-8 in 28nm CMOS
Bjoern Driemeyer, Holger Mandry, David-Peter Wiens, Joachim Becker, John G. Kauffman, Maurits Ortmanns
University of Ulm, Ulm, Germany

Introduction to PUFs
Each device has randomly different properties by mismatch
Physical Unclonable Function: Translation into IC-fingerprint
Demand for IC identification: Use IC fingerprint
[Figure: device process variation & local mismatch; PUF: translation into digital fingerprint; device dependent fingerprint]

PUF: Fundamental Design Challenge(s)
Ideal (weak-)PUF: Constant fingerprint / PUF output
Noise causes random fingerprint errors
Unstable PUF-bits cause permanent fingerprint errors over changing environmental conditions
Account for additional stabilization & noise reduction
[Figure: fingerprint readouts over time, temperature and supply voltage compared with the golden key 100010111011; noisy and perfect readouts; unstable PUF-bit]

SotA: Noise-Robustness and Stabilization
Many techniques used for noise reduction & PUF-bit stabilization
Noise averaging: Temporal Majority Voting (TMV)
Blanking mask: Identified unstable bits removed from fingerprint
[Figure: TMV, PUF-bit averaged; blanking mask, unstable PUF-bit blanked; readouts over time, temperature and supply voltage compared with the golden key 100010111011]
This Work: Noise-Robustness and Stabilization
Introduce additional information about readout reliability
Auto-Error-Detection: Use reliability to re-read only if necessary
Use Auto-Error-Detection under nominal conditions to predict the blanking mask
[Figure: Auto-Error Detection gives a noise-robust PUF by tagging each readout as Reliable or Unreliable against the golden key 100010111011; the blanking mask blanks unstable PUF-bits over time, temperature and supply voltage]

Proposed EOA-Architecture
Ring-oscillator PUF with phase-domain (arbiter) readout until the phase difference exceeds the deadzone (DZ)
Oscillation only allowed until either ARB=1 or TEN=0
High bit-rate & low energy per bit readout
ARB at the end of TEN allows judging the reliability of the PUF-bit
[Figure: circuit with RO1, RO2, EN and outputs PUF-bit and ARB; timing diagram of RO1/RO2, EN, TEN, Q and ARB]

PUF Stability vs. Temperature & VDD Change
[Figure: RO1 and RO2 frequency vs. temperature/VDD in three cases: stable and noise-robust (fnom large, fread large); stable but noisy (fnom large, fread small); unstable and noisy (fread large, fnom small, crossing)]
Simulate RO frequency over temperature + supply-voltage variation
Statistic of fnom and VT-gradient (accounts for environmental range)
Predict minimum fnom,min to ensure RO-pair stability
Same considerations for minimum fread,min for noise robustness
Set both fnom,min and fread,min by combination of DZ and TEN

Proposed Two Phase PUF Operation - I
1. Enrolment phase:
  Perform once at nominal operating condition
  Use short TEN,nom = 10ns
  Investigate ARB after TEN,nom
  ARB=1: mark RO-pair as stable in blanking mask
  ARB=0: mark RO-pair as unstable in blanking mask
[Figure: circuit and timing diagram of RO1/2, EN and ARB with a 10ns enable window]

Proposed Two Phase PUF Operation - II
2. Readout phase:
  Perform always after enrolment, at any conditions
  Blanking mask: Run only stable RO-pairs with TEN,read = 25ns
  ARB=1: accept PUF-bit readout
  ARB=0: repeat readout (Auto-Error Detection)
[Figure: circuit and timing diagram of RO1/2, EN and ARB with a 25ns enable window]

Manufactured Chip
Fabrication in 28nm bulk-CMOS
Single arbiter shared by 4 RO-pairs (area per PUF-bit: 34.84 µm²)
Total of 480 separate EOA-cells per die (16 dies measured in total)
Enrolment phase performed at 20 °C and 0.9V supply voltage
903 PUF-bits per die (47%) found stable on average

Evaluation of PUF Stability
Investigated environmental range: -40 °C to 125 °C, 10% VDD
HTOL aging equivalent to 4 years of continuous usage
Raw BER between 4.68% (nominal) and 7.9% (worst-case)
Both (b) and (c) reduce BER (x2 Auto-Error Detection, 3e-4 Mask)
Using (b) and (c) combined: resulting BER = 2e-6 %
[Plot: Bit-Error-Rate, % for (a) Raw, (b) Auto-Error Detection, (c) Mask only, (d) Both (b)+(c)]

Comparison to Prior Art

Conclusion
Eye-Opening Arbiter PUF: Oscillation until the accumulated phase difference exceeds the deadzone
Simulation: frequency thresholds for stability & noise
Frequency thresholds: deadzone / enable duration combination
Enrolment phase: Find always stable RO-pairs -> Blanking Mask
Readout phase: Auto-Error Detection -> noise-robustness
The resulting BER is only 2e-8

17.6: A 100MHz Self-Calibrating RC Oscillator Capable of Clock-Glitch Detection for Hardware Security in a 3nm FinFET Process
2025 IEEE International Solid-State Circuits Conference

A 100MHz Self-Calibrating RC Oscillator Capable of Clock-Glitch Detection for Hardware Security in
a 3nm FinFET Process
Nandish Mehta1, Stephen Tell2, Sanquan Song1, Sudhir Kudva1, Brian Zimmer1, Mahmut Sinangil1, C. Thomas Gray2
1 NVIDIA, Santa Clara, CA
2 NVIDIA, Durham, NC

SoC Hardware Security: What is it?
Hardware countermeasures preventing unauthorized access, tampering, extraction of sensitive data, or malicious modifications to the SoC.
[Diagram: physical attacks (voltage, clock, EM, laser, body-bias, etc.) -> fault injection -> device under attack -> security breach impact]
Security breach impact:
  Faults alter intended SoC behavior
  Revealing encryption key
  Bypass of secure authentication, etc.

SoC Hardware Security: Clock-Glitch Injection
Clock-glitches are commonly injected (1) during run-time or (2) during boot-up
Clock-glitches induce transient faults
Forces SoC to reveal encrypted key or bypass boot authentication
Often a single fault is enough! (C.H. Kim DATE07)

Countermeasures for Clock-Glitch Attacks
Clock-glitches during runtime, few examples:
Y. He ISSCC24
  Locks to input CKREF
  Detects injected pulse and clock stop
  Hard to capture FM attacks
  Needs standalone on-chip oscillator

Countermeasures for Clock-Glitch Attacks
Clock-glitches during boot-up, few examples:
Boot-on-RO: Boot-up using on-chip oscillator (N. Mehta JSSC22)
  Requires fairly accurate osc. freq.
  Needs trimming and fuses

Design Goals
Single proven and validated IP
Problem-1: Runtime clock-glitch detection
  Need: 1. Cycle-by-cycle detection; 2. High-resolution
  Solution: Multi-phase VCO (S. Song VLSI22)
Problem-2: Oscillator for Boot-on-RO application
  Need: 1. Supply sensitivity; 2. Low freq drift; 3. Temp. sensitivity; 4. Process spread
  Solution: Digital-freq locked loop (D.S. Truesdell JSSC21); low temp-co thin-film resistors (Hi-R) (B.S. Lien ISSCC24); 0-trim background self-calibration

Conceptual view of the Proposed Solution
Multi-phase VCO oversamples CKREF by 4N: high-resolution glitch detect
CKREF is a stable clock source as it is derived from a crystal oscillator
If no attack, CKREF calibrates the RC oscillator, improving its accuracy

Proposed Architecture of RC Oscillator
Linear search FSM cancels comparator offset and locks VCO to RC core
72-bit PMOS DAC (6b coarse and 64b fine) tunes FVCO
Efficient layout of RC cores: Hi-R resistors stacked on metal capacitors
Multiple phases of VCO sample CKREF

Operation of Glitch-Detector
Sample and Sync logic aligns all samples into a single clock domain (CKSAMP)
Clock-pulses narrower than 1/(4FVCO) are filtered by the clock-glitch filter
[Timing diagram: CKREF high and low widths WH0, WL0, WH1, WL1]

Self-Calibration Logic
RC Core-A/B switched into self-calibration
Stable 100 MHz CKOUT from crystal oscillator
Comp. offset calibrated
12-bit resistor DAC (R-DAC), 8b coarse and 4b fine
Compensates process and temperature

Die Micrograph and Test setup
Die flip-chip attached
TSMC 3nm FinFET chip packaged on an organic substrate
Optical image after back side substrate thinning

Glitch Detection for various external attacks
[Figure: clock-glitch injected]

False Negative using On-Chip Glitch Generator
Glitch generator adds glitches to CKREF as per the pattern code
False negatives: Glitches close to rising/falling edge
Sensitivity of glitch detection: FVCO/CKREF

Frequency Stability w. Supply and Temperature
Supply regulator can reduce frequency variation with supply
Temperature sensitivity limited by the Hi-R resistors' non-zero temp-co

Effectiveness of Self-Calibration
Compensates spread due to process and temperature
Accuracy is limited by quantization of the 12-bit R-DAC

Performance Summary
Columns: This Work | S. Pan ISSCC24 | W. Choi ISSCC21 | N. Mehta JSSC22 | K. Park ISSCC23 | A. Khashaba JSSC22 | A. Delke JSSC23
Process: 3nm FinFET | 180nm CMOS | 65nm CMOS | 5nm FinFET | 65nm CMOS | 65nm CMOS | 130nm HVCMOS SOI
Frequency, MHz: 100 | 32 | 28 | 77 | 100 | 32 | 70
Power, mW: 0.89 | 0.13 | 0.142 | 0.84 | 0.142 | 0.034 | 0.21
Supply range, V: 0.75 to 0.95 | 1.7 to 2.0 | 0.85 to 1.05 | 1.1 to 1.35 | 1.1 to 1.3 | 1.1 to 2.3 | 3 to 3.6
Supply sensitivity, ppm/V: 1221300029002000*140080*92*
Temp. range, C: -20 to 125 | -40 to 125 | -40 to 85 | -40 to 125 | -40 to 85 | -40 to 85 | -63 to 165
Frequency error, %: 0.26 | 0.09 | 0.02 | 0.3 | 0.076 | 0.11 | 0.0084
Period jitter, ps: 11.618.77-5.122.314.5
No. of samples: 122812814618
Calibration type: On-chip self | 2-point trim | 2-point trim | 2-point trim | 2-point trim | 2-point trim | 1-point trim
Clock glitch detection?: Yes (this work) | No (all others)
Area, mm2: 0.00225 | 0.028 | 0.06 | 0.0152 | 0.22 | 0.18 | 0.69
*Includes an on-chip LDO *From 16 samples Estimated from plots

Conclusions
A 100 MHz RC oscillator for hardware security applications
  capable of clock-glitch attack detection
  stable oscillation frequency enabled by digital frequency-locked loop
  high 0-trim accuracy using background self-calibration
Acknowledgements: We thank the MSDV team of NVIDIA, Santa Clara, for equipment and test support, with special thanks to Neil Pham, Lamar Tatro, and Andy Tran.
Thank you for your time and attention!
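
The glitch-detector slides of talk 17.6 describe oversampling CKREF with multiple VCO phases and comparing the resulting high and low widths (WH0, WL0, ...). The following is a minimal behavioral sketch of that idea in Python, not the authors' circuit. The function names, the `expected_width` parameter (samples per half period of CKREF) and the `tolerance` parameter are all assumptions made for illustration. A pulse narrower than one sample cannot appear in the sample stream at all, which loosely mirrors the slide's note that pulses narrower than 1/(4FVCO) are filtered out.

```python
from typing import List, Tuple


def make_clock(periods: int, half_width: int) -> List[int]:
    """Ideal oversampled reference clock: half_width samples high, then low."""
    return ([1] * half_width + [0] * half_width) * periods


def run_lengths(samples: List[int]) -> List[Tuple[int, int]]:
    """Collapse a 0/1 sample stream into (level, width_in_samples) runs."""
    runs: List[Tuple[int, int]] = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


def detect_glitches(
    samples: List[int], expected_width: int, tolerance: int = 1
) -> List[Tuple[int, int, int]]:
    """Return (start_index, level, width) for every suspicious run.

    A run is suspicious when its width differs from expected_width by more
    than tolerance samples. The first and last runs are skipped because the
    observation window may have truncated them.
    """
    runs = run_lengths(samples)
    flagged: List[Tuple[int, int, int]] = []
    position = 0
    for i, (level, width) in enumerate(runs):
        interior = 0 < i < len(runs) - 1
        if interior and abs(width - expected_width) > tolerance:
            flagged.append((position, level, width))
        position += width
    return flagged


if __name__ == "__main__":
    clean = make_clock(periods=4, half_width=8)
    glitched = list(clean)
    glitched[19:21] = [0, 0]  # hypothetical 2-sample low glitch in a high phase
    print(detect_glitches(clean, expected_width=8))
    print(detect_glitches(glitched, expected_width=8))
```

In this sketch a single injected pulse produces three flagged runs, because it splits one nominal high phase into a short high, a short low, and another short high run. A hardware detector would typically raise one alarm per cycle instead of listing runs.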