
Efficient Key Switching Accelerator for Fully Homomorphic Encryption.pdf

Uploaded by: 蘆葦 | ID: 651854 | 2025-05-01 | PDF | 35 pages | 3.01 MB

Part of the collection: invited-talk slide decks from the 30th Asia and South Pacific Design Automation Conference (ASP-DAC 2025)


Efficient Key Switching Accelerator for Fully Homomorphic Encryption
Seoyoon Jang, Sungjin Park, Dongsuk Jeon
Seoul National University, Seoul, South Korea
ASP-DAC 2025

Motivation: Advent of FHE
- Fully Homomorphic Encryption (FHE): the savior of privacy-preserving computation in cloud services.
- The main bottleneck operation: Key-Switching (KS).

Motivation: What makes KS expensive?
- Expensive Number Theoretic Transform (NTT) and inverse-NTT (iNTT) operations in KS.
  - [NTT definition: a sum over indices 0 to N-1; the symbols are lost in the extracted text]
  - At least N log N computations.
  - Irregular memory access pattern. [Figure: 8-point butterfly network over indices 0 to 7, Stages 1 to 3]
- Frequent transitions in data access patterns (element-wise, ring-wise, coefficient-wise).
  - Element-wise: data access within each polynomial (NTT/iNTT, [second item lost in extraction]).
  - [Figure: overall KS dataflow, with steps labeled ring-wise, ring-wise, coefficient-wise, element-wise]
- Redundant external memory access.

Goal: design a dedicated KS accelerator for maximum energy efficiency.
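The motivation slides name two costs of the NTT: the N log N operation count and an access pattern that changes from stage to stage. The sketch below illustrates both on a toy transform. The modulus 17, the root of unity 2, and the length 8 are illustrative assumptions chosen for readability, not parameters from the talk, and the function names are hypothetical.

```python
def naive_ntt(a, omega, q):
    """Reference transform straight from the definition: N**2 multiplications."""
    n = len(a)
    return [sum(a[i] * pow(omega, i * k, q) for i in range(n)) % q
            for k in range(n)]


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fast_ntt(a, omega, q):
    """Iterative radix-2 NTT.

    Returns the transform, the number of butterflies executed, and the
    distance between the two operands of a butterfly in each stage.
    """
    n = len(a)
    bits = n.bit_length() - 1
    x = [a[bit_reverse(i, bits)] for i in range(n)]
    butterflies = 0
    strides = []
    length = 2
    while length <= n:
        half = length // 2
        strides.append(half)                 # operand distance doubles every stage
        w_len = pow(omega, n // length, q)   # primitive `length`-th root of unity
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                u = x[start + j]
                v = x[start + j + half] * w % q
                x[start + j] = (u + v) % q
                x[start + j + half] = (u - v) % q
                w = w * w_len % q
                butterflies += 1
        length *= 2
    return x, butterflies, strides


# Toy parameters (assumed): 2 has multiplicative order 8 modulo 17.
Q, OMEGA, N = 17, 2, 8
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
result, butterflies, strides = fast_ntt(coeffs, OMEGA, Q)
```

The butterfly count is (N/2) log2 N, and the `strides` list shows why a fixed memory layout cannot serve every stage equally well.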

Parameters selected for the KS accelerator
(log N, L, dnum, α) = (17, 35, 9, 4)
- N: number of coefficients for each polynomial
- L: maximum circuit depth level
- dnum: decomposition number
- α = (L + 1)/dnum [the slide also relates this to the amount of computation and the required switching keys (swk); that relation is garbled in the extracted text]

Constraints for parameter selection
- Security level 128: directly related to [a ratio whose symbols are lost; only "/log" survives].
- [Symbol lost] large enough to guarantee [symbol lost] = 15~20 for bootstrapping, which enables FHE.
- Providing a sufficient number of lightweight prime moduli for large [symbol lost].
- Moduli of the form q = 2^a ± 2^(b0) ± 2^(b1) ± 2^(b2) + 1 [exponent names and signs reconstructed from a garbled expression]: sparse, which enables efficient hardware implementation of modular multipliers.

Overall Design of KS accelerator
- Router: transfers instructions and external data to the target core.
- LUT: holds the moduli set and the modulus-related constants required by each core.
- NTT unit: unified NTT/iNTT operation; processes 2^9 coefficients per unit [a further relation "(= ...)" is lost].
- Modular-Multiply-and-Accumulate (MMAC) unit: Conv operation.
- Local distributor: internal router.
[Figure: core with one MMAC unit and two NTT units]

Proposed Design Techniques for Energy Efficiency
I. Modular Multiplier for Sparse Moduli Set
II. NTT Unit
   A. Efficient Twiddle Factor Generator (TFG)
   B. Conflict-free Addressing Scheme for Single-port Memory
III. Bandwidth-efficient Behavior in Core

I. Modular Multiplier for Sparse Moduli Set
- Moduli: q = 2^a ± 2^(b0) ± 2^(b1) ± 2^(b2) + 1 [exponent names and signs reconstructed from a garbled expression].
- Barrett constant: 2^(2a)/q, which the slide also writes as a sum of a few signed powers of two ending in 1.
- The multiplications by q and by the Barrett constant in Barrett modular multiplication are replaced with shift-adders.
- Benefits of sparsity [qualifying expression garbled: "(2 1+1,)", "(2 0+1,)"]:
  - 41 moduli available: sufficient for bootstrapping in FHE.
  - Using [symbol lost] = 59.
  - Take the most advantage out of this inherent sparsity!
- Simplified computation: shift-adding, removing one multiplication. [The step-by-step derivation is garbled in the extracted text.]
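To make the shift-adder idea concrete, here is a minimal sketch of Barrett modular multiplication in which both constant multiplications (by the Barrett constant and by q) are carried out with shifts and additions only. Everything specific in it is an assumption: the 20-bit modulus is a made-up example of the sparse shape (it is not from the talk and has not been checked for primality, which Barrett reduction does not require), and the generic signed-digit (NAF) decomposition of the Barrett constant stands in for the closed-form sparse constant that the slides derive for their own moduli set.

```python
def shift_add_mul(x, terms):
    """Multiply x by sum(sign * 2**shift for sign, shift in terms) with shifts and adds."""
    acc = 0
    for sign, shift in terms:
        if sign > 0:
            acc += x << shift
        else:
            acc -= x << shift
    return acc


def signed_power_terms(n):
    """Non-adjacent form of n > 0 as a list of (sign, shift) pairs."""
    terms = []
    shift = 0
    while n:
        if n & 1:
            digit = 2 - (n & 3)      # +1 if n % 4 == 1, -1 if n % 4 == 3
            terms.append((digit, shift))
            n -= digit
        n >>= 1
        shift += 1
    return terms


# Hypothetical sparse modulus: 2**20 - 2**14 + 2**9 + 2**3 + 1 (assumption).
K = 20
Q_TERMS = [(+1, 20), (-1, 14), (+1, 9), (+1, 3), (+1, 0)]
Q = shift_add_mul(1, Q_TERMS)

# Barrett constant floor(2**(2K) / Q), decomposed into signed powers of two.
M = (1 << (2 * K)) // Q
M_TERMS = signed_power_terms(M)


def barrett_mulmod(a, b):
    """Return a * b mod Q for 0 <= a, b < Q."""
    x = a * b                                     # the one true operand multiplication
    t = shift_add_mul(x, M_TERMS) >> (2 * K)      # quotient estimate, never too large
    r = x - shift_add_mul(t, Q_TERMS)             # remainder candidate, below 3 * Q
    while r >= Q:                                 # at most two corrections
        r -= Q
    return r
```

With a modulus set designed for sparsity, as in the talk, the term lists stay short, so each constant multiplication costs a handful of adders instead of a full multiplier.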

Results for the sparse modular multiplier (I. Modular Multiplier for Sparse Moduli Set):
- The (V*T) product exploits sparsity as well [expression garbled; the surviving text mentions "(... + 1)'s sparsity" and ±1 factors taken from the sign bit of 2^(b0) in q, with exponent names reconstructed].
- Area (power) comparison: vs. non-sparse: 47.8 (46.0)%; vs. sparse [1]: 24.6 (22.5)%.

II-A. Efficient Twiddle Factor Generator (TFG)
- A full set of twiddle factors is required for each NTT/iNTT on N coefficients [exact count lost in extraction]: too much overhead in data loading latency and memory area.
- Twiddle Factor Generator: saves memory area for the twiddle factors by generating them during NTT/iNTT operations.
- Twiddle factors are generated using a geometric progression with log N seed elements, reducing the twiddle factor memory area [from a size lost in extraction] to log N.
- An additional pre-processing stage further reduces the number of seed elements:
  (1) Pre-processing (before NTT/iNTT starts). Input: seed elements. Output: secondary seed elements.
  (2, 3) Geometric progression (run-time). Input: secondary seed elements. Output: twiddle factors.
  MM: modular multiplier.
- [Comparison of twiddle factor memory area against prior designs 2, 6, 17; the figures are lost in extraction.]

[1] Kim et al., "FPGA-based accelerators of fully pipelined modular multipliers for homomorphic encryption," ReConFig, 2019.
[2] Kim et al., "ARK: Fully homomorphic encryption accelerator with run-time data generation and inter-operation key reuse," MICRO, 2022.
[6] Geelen et al., "BASALISC: Flexible asynchronous hardware accelerator for fully homomorphic encryption," preprint, arXiv, 2022.
[17] Kim et al., "Hardware architecture of a number theoretic transform for a bootstrappable RNS-based homomorphic encryption scheme," FCCM, 2020.
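The twiddle factor generator trades a stored table for a few seeds plus one modular multiplication per generated factor. The sketch below shows one plausible reading of "geometric progression with log N seed elements": one seed per NTT stage, namely that stage's common ratio. This is an assumption about the scheme, the toy parameters (modulus 17, root 2, length 8) are not from the talk, the function names are hypothetical, and the pre-processing stage with secondary seeds is not modeled.

```python
def stage_seeds(omega, n, q):
    """One seed per stage: the primitive root of unity of order 2, 4, ..., n."""
    seeds = []
    length = 2
    while length <= n:
        seeds.append(pow(omega, n // length, q))
        length *= 2
    return seeds


def twiddles_for_stage(seed, count, q):
    """Yield 1, seed, seed**2, ... using one modular multiplication per new factor."""
    w = 1
    for _ in range(count):
        yield w
        w = w * seed % q


def stored_table_for_stage(omega, n, length, q):
    """What a precomputed table would hold for the stage with butterfly span `length`."""
    return [pow(omega, j * (n // length), q) for j in range(length // 2)]


# Toy parameters (assumed): 2 has multiplicative order 8 modulo 17.
Q, OMEGA, N = 17, 2, 8
seeds = stage_seeds(OMEGA, N, Q)            # log2(N) values kept in memory
generated = [list(twiddles_for_stage(s, 2 ** i, Q)) for i, s in enumerate(seeds)]
```

Only `len(seeds)` values are stored, while the generated lists match what a full per-stage table would contain.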

II-B. Conflict-free Addressing (CFA) Scheme for Single-port Memory
- Replace dual-port memory with single-port memory [the bank sizes, printed as "/2", are garbled] around the network of butterfly units in the NTT unit.
- Potential conflict: read-after-write. Memory is accessed across multiple banks; the result of the read issued at cycle t is written back after the network pipelining delay (16 cycles on the slide), so that write can collide with the read issued in the same cycle.
- With CFA: no throughput degradation, reduced silicon area and power.
- CFA accesses memory by a (bank, address) pair derived from the index, over the range [0, N/2) [symbols partly lost].
- Goal: no conflict. The slide states the condition in terms of bits [3:2] of the access index at t and at t + 16, for the given network pipelining stage [the expressions are garbled in the extracted text].
  - With the plain mapping, the condition is not satisfied for all t.
  - After applying bit-reverse + Gray coding, the condition is satisfied for all t.
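The conflict condition can be checked mechanically. The sketch below models a single-port, multi-bank memory in which, at cycle t, the unit reads access number t and writes back the result of access number t - 16 to the location it was read from. It then counts the cycles where both land in the same bank. The model, the four-bank count, and both mappings are assumptions for illustration; in particular, the skewed mapping is a simple stand-in and is not the bit-reverse + Gray scheme from the talk, whose exact expressions are not recoverable from the extracted text.

```python
PIPELINE_DELAY = 16   # cycles between a read and the write of its result (from the slide)
NUM_BANKS = 4         # assumed bank count


def count_conflicts(bank_of, num_accesses, delay=PIPELINE_DELAY):
    """Count cycles where the read of access t and the write-back of access
    t - delay target the same single-port bank."""
    conflicts = 0
    for t in range(delay, num_accesses):
        if bank_of(t) == bank_of(t - delay):
            conflicts += 1
    return conflicts


def naive_bank(i):
    """Low-order interleaving: the delay is a multiple of the bank count,
    so the read and the delayed write always share a bank."""
    return i % NUM_BANKS


def skewed_bank(i):
    """Illustrative skew: rotate the bank by one for every block of
    PIPELINE_DELAY accesses, so accesses `delay` apart differ by one bank."""
    return (i + i // PIPELINE_DELAY) % NUM_BANKS


naive_conflicts = count_conflicts(naive_bank, 256)
skewed_conflicts = count_conflicts(skewed_bank, 256)
```

Any candidate address mapping can be passed to `count_conflicts` in the same way to see whether it keeps the single-port banks conflict-free under this model.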

II-B. Conflict-free Addressing (CFA) Scheme for Single-port Memory: example
- Example: Stage 0 to Stage 1 transition in INTT [the example index values are garbled in the extracted text].
- Memory access conflicts are most likely to occur during stage transitions; with CFA the transition is conflict-free.
- Same throughput; area (power): 67.87 (44.39)%.

III. Bandwidth-efficient Behavior in Core
- Frequent data access pattern transitions between the NTT and MMAC units cause expensive external memory access.
- The KS dataflow is modified for better data utilization, using dedicated buffers (Tmp) between the NTT and MMAC units.
- External memory access: 38.7% [direction symbol lost in extraction].

Chip Implementation
[Figure; GD (LD): Global/Local Distributor]

Comparison with Prior Works
[Table not recoverable from the extracted text]

Conclusion
- Designed a dedicated accelerator for KS, an operation that requires frequent transitions in data access patterns and thereby incurs redundant, expensive external memory accesses.
- Proposed design techniques on various levels for high energy efficiency: modular multiplier, NTT unit, and data access behavior in the core, making this a full-stack optimization.
- As a result, the design shows significant improvement in performance and energy efficiency compared with prior FHE implementations.
- Although designed specifically for KS, it remains highly applicable, since KS operations dominate power, time, and bandwidth across the entire computation.
- The techniques can also be applied to other parameter sets:
  - modular multiplier: as long as the same moduli set is used and L + 1 ≤ 41;
  - NTT unit: down to N = 2^16 (the least amount to support bootstrapping reported in literature).


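Summary

This talk presents a dedicated accelerator for fully homomorphic encryption (FHE) that targets key switching (KS), a critical FHE operation. Key switching is essential in FHE but computationally expensive. Its main bottlenecks are the Number Theoretic Transform (NTT) and its inverse, together with the redundant external memory accesses caused by frequent changes of data access pattern. The talk proposes several design techniques to improve energy efficiency:
1. A modular multiplier designed for a sparse moduli set, which reduces the number of multiplications and external memory accesses.
2. An efficient NTT unit, including twiddle factor generation by geometric progression to reduce memory footprint.
3. Optimized data access behavior in the core: the KS dataflow is modified and dedicated buffers carry data between the NTT and MMAC units, reducing external memory access.

Key points:
- A dedicated KS accelerator was designed, with parameters selected in view of security level, circuit depth, and the decomposition number.
- The design generates twiddle factors on the fly to reduce memory footprint and data loading latency.
- Optimizing data access patterns reduces external memory access and therefore energy consumption.

In summary, by designing a dedicated key switching accelerator and the accompanying design techniques, the work significantly improves the energy efficiency of fully homomorphic encryption. The techniques can also be applied to other parameter sets to improve overall FHE performance.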
