High-Parallel In-Memory NTT Engine with Hierarchical Structure and Even-Odd Data Mapping.pdf

No. 651796 | PDF | 27 pages | 1.26 MB

Bing Li (1), Huaijun Liu (2), Yibo Du (3, 4), Ying Wang (3, 4)
(1) Institute of Microelectronics, Chinese Academy of Sciences
(2) Capital Normal University
(3) Institute of Computing Technology, Chinese Academy of Sciences
(4) University of Chinese Academy of Sciences

Outline: Background and Motivation; Proposed Method Overview; Architecture & Data Mapping; Evaluation and Results; Conclusion

Background and Motivation
Fully homomorphic encryption (FHE) enables applications in medical treatment, cloud computing, machine learning, and fitness apps (FHE review: Viand A, et al., S&P 2021). It offers data security and powerful functionality, but it comes with a high computational overhead.

Classic NTT: Challenges & Advantages
Figure: 8-point NTT butterfly network over Stage 1 to Stage 3, with inputs a0, a4, a2, a6, a1, a5, a3, a7 and outputs A0 to A7.

Algorithm: In-Place Cooley-Tukey-based NTT
Input: a = (a_{n-1}, ..., a_0) ∈ R_q; n-th root of unity ω ∈ Z_q, with bit-reversed order
Output: A = NTT(a) in bit-reversed order
1: t = n
2: for (m = 1; m < n; m = 2m) do
3:   t = t/2
4:   for (i = 0; i < m; i++) do
…

Mod Algorithm Optimization
Adapt the original Barrett algorithm for efficient implementation on CIM.

Original Barrett reduction:
1. …
2. t1 = c >> (n - 1)
3. t2 = t1 · mu
4. t3 = t2 >> (n + 1)
5. r1 = c mod 2^(n+1)
6. r2 = (t3 · q) mod 2^(n+1)
7. r = r1 - r2
Condition: compare r with q/2 and return either (r - q) or r
Return r

Implementing in CIM (calculation: r = c mod q, where q has n bits):
1. x = c >> (n - 1)
2. a = x …
…
Condition: compare r with q/2 and return either (r - q) or r
Return r

Figure: (a) shift in CIM (right and left shifts) and (b) subtraction in CIM, illustrated on the bit patterns of c, x, and a for q = 829. Both operations offer low latency and low energy.

Mod Module: Data Mapping
Figure: each coefficient is stored as MSB and LSB halves (A0,msb and A0,lsb; A3,msb and A3,lsb; likewise for A252 and A255) across SubArray0 to SubArray255, each with a sense amplifier. The array also includes a WL decoder & driver, read/write & comparator logic, and MOD PEs holding q_msb and q_lsb. The MOD PEs produce Result A0, A3, …, A252, A255.
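Only parts of the in-place Cooley-Tukey NTT listing and the Barrett steps above have survived. The Python sketch below models those two pieces in software. It is not the CIM bit-level implementation.

It rests on several assumptions of ours:
- It uses the cyclic NTT with an n-th root of unity ω. Each group's twiddle factor is computed on the fly rather than read from a bit-reversed table.
- The final Barrett correction is the textbook "subtract q while r ≥ q", because the slide's q/2 condition is only partly legible.
- Intermediates keep n + 2 low bits instead of n + 1, so the subtraction cannot wrap.
- q = 12289 and n = 16 are test values.
- `Barrett`, `ntt_inplace`, `find_root_of_unity`, and `ntt_reference` are hypothetical names.

```python
"""Sketch: Barrett reduction (textbook, base 2) driving an in-place Cooley-Tukey NTT."""


def bit_reverse(x: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of x (bits = 0 returns 0)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


class Barrett:
    """Hypothetical helper: c mod q for an n-bit modulus q via Barrett reduction."""

    def __init__(self, q: int) -> None:
        self.q = q
        self.n = q.bit_length()               # q is an n-bit modulus
        self.mu = (1 << (2 * self.n)) // q    # mu = floor(2^(2n) / q)
        # Assumption: keep n + 2 low bits (the slide uses n + 1). Before the final
        # correction the remainder can approach 3q, which may exceed 2^(n+1).
        self.mask = (1 << (self.n + 2)) - 1

    def reduce(self, c: int) -> int:
        if not 0 <= c < (1 << (2 * self.n)):
            raise ValueError("c must fit in 2n bits")
        t1 = c >> (self.n - 1)                # t1 = c >> (n - 1)
        t2 = t1 * self.mu                     # t2 = t1 * mu
        t3 = t2 >> (self.n + 1)               # t3 = t2 >> (n + 1), quotient estimate
        r1 = c & self.mask                    # low bits of c
        r2 = (t3 * self.q) & self.mask        # low bits of t3 * q
        r = (r1 - r2) & self.mask             # equals c - t3*q, which lies in [0, 3q)
        while r >= self.q:                    # textbook final correction, at most twice
            r -= self.q
        return r


def ntt_inplace(a: list[int], w: int, red: Barrett) -> None:
    """Cyclic NTT: natural-order input, bit-reversed output, same loop shape as the slide."""
    n, q = len(a), red.q
    t, m = n, 1                               # 1: t = n
    while m < n:                              # 2: for (m = 1; m < n; m = 2m)
        t //= 2                               # 3: t = t/2
        for i in range(m):                    # 4: for (i = 0; i < m; i++)
            # Twiddle for group i, computed on the fly (assumption, not the slide's table).
            s = pow(w, (n // (2 * m)) * bit_reverse(i, m.bit_length() - 1), q)
            for j in range(2 * i * t, 2 * i * t + t):
                u, v = a[j], red.reduce(a[j + t] * s)
                a[j] = u + v - q if u + v >= q else u + v
                a[j + t] = u - v if u >= v else u - v + q
        m *= 2


def find_root_of_unity(n: int, q: int) -> int:
    """Element of order exactly n (q prime, n a power of two dividing q - 1)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) != 1:
            return w
    raise ValueError("no n-th root of unity found")


def ntt_reference(a: list[int], w: int, q: int) -> list[int]:
    """Direct O(n^2) definition: A_k = sum_j a_j * w^(j*k) mod q."""
    n = len(a)
    return [sum(a[j] * pow(w, j * k, q) for j in range(n)) % q for k in range(n)]


if __name__ == "__main__":
    q, n = 12289, 16                          # assumed test modulus and transform size
    red = Barrett(q)
    w = find_root_of_unity(n, q)
    a = [(7 * i + 3) % q for i in range(n)]
    out = a.copy()
    ntt_inplace(out, w, red)
    ref = ntt_reference(a, w, q)
    log_n = n.bit_length() - 1
    assert all(out[k] == ref[bit_reverse(k, log_n)] for k in range(n))
    print("in-place NTT matches the definition (bit-reversed order)")
```

The butterflies use only add, subtract, and compare, and every multiplication is followed by the shift/mask/subtract sequence of `reduce`. That keeps the sketch close in spirit to the shift and subtraction primitives shown for the CIM mod module.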

Mod Module: Computation
Figure: SubArray0 to SubArray255 with sense amplifiers, plus MOD PEs using q_msb and q_lsb, producing Result A0, A3, …, A252, A255.

Evaluation Setup
Design                 Platform   Algorithm      NTT parameters (n, log2 q)
HP-CIM (ours)          6T SRAM    MVM            (32K, 32)
BP-NTT                 6T SRAM    CT butterfly   (1024, 16)
MeNTT                  6T SRAM    CT butterfly   (32K, 32)
RM-NTT                 ReRAM      MVM            (1024, 16)
CryptoPIM (baseline)   ReRAM      Butterfly      (32K, 32)

HP-CIM settings
MVM module: 16 PEs, 32 KB/PE, 256 subarrays/PE, 64×16 subarray
MOD module: 2 PEs, 8 KB/PE, 128 subarrays/PE, 8×64 subarray

Evaluation Result: Latency
Figure: normalized latency of HP-CIM, RM-NTT, MeNTT, BP-NTT, and CryptoPIM for (n, log2 q) = (256, 14), (512, 16), and (1024, 16).
HP-CIM reduces latency by up to 3.08× compared with RM-NTT, the fastest existing CIM-based NTT accelerator.

Evaluation Result: Energy
Figure: normalized energy of HP-CIM, RM-NTT, MeNTT, BP-NTT, and CryptoPIM for the same parameter sets.
HP-CIM saves up to 4.96× energy compared with MeNTT, the most energy-efficient prior solution.

Evaluation Result: Large-Scale Parameters
Under large-scale NTT parameter settings, HP-CIM outperforms the other designs in both latency and energy.
Figure: latency (µs) and energy (µJ) of HP-CIM, MeNTT, and CryptoPIM.
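The setup table lists HP-CIM as computing the NTT in matrix-vector (MVM) form on an MVM module with 16 PEs. The title also names an even-odd data mapping. The slides do not show how rows or coefficients are assigned to PEs, so the sketch below is only a software model under our own assumptions.

It makes two choices of its own:
- Output rows of the NTT matrix are split into contiguous blocks, one block per PE.
- A radix-2 even/odd split is applied before the MVMs, so one half-size matrix serves both halves and each intermediate result feeds two outputs.

`ntt_matrix`, `mvm_tiled`, `ntt_even_odd`, and `num_pes` are hypothetical names. The sketch is not the HP-CIM mapping.

```python
"""Sketch: NTT as a matrix-vector product (MVM), rows split across PEs, plus an even-odd split."""


def find_root_of_unity(n: int, q: int) -> int:
    """Element of order exactly n (q prime, n a power of two dividing q - 1)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) != 1:
            return w
    raise ValueError("no n-th root of unity found")


def ntt_matrix(n: int, w: int, q: int) -> list[list[int]]:
    """Dense NTT matrix with W[k][j] = w^(j*k) mod q."""
    return [[pow(w, j * k, q) for j in range(n)] for k in range(n)]


def mvm_tiled(W: list[list[int]], x: list[int], q: int, num_pes: int) -> list[int]:
    """y = W x mod q, with output rows divided into contiguous blocks, one block per PE."""
    rows = len(W)
    rows_per_pe = -(-rows // num_pes)         # ceiling division
    y = [0] * rows
    for pe in range(num_pes):                 # PEs would work concurrently; sequential here
        for k in range(pe * rows_per_pe, min(rows, (pe + 1) * rows_per_pe)):
            y[k] = sum(W[k][j] * x[j] for j in range(len(x))) % q
    return y


def ntt_even_odd(a: list[int], w: int, q: int, num_pes: int = 16) -> list[int]:
    """Natural-order NTT built from two half-size MVMs on even and odd coefficients."""
    n = len(a)
    half = n // 2
    W_half = ntt_matrix(half, w * w % q, q)   # one half-size matrix serves both halves
    E = mvm_tiled(W_half, a[0::2], q, num_pes)
    O = mvm_tiled(W_half, a[1::2], q, num_pes)
    A = [0] * n
    for k in range(half):
        t = pow(w, k, q) * O[k] % q           # E[k] and O[k] each feed two outputs
        A[k] = (E[k] + t) % q
        A[k + half] = (E[k] - t) % q          # uses w^(k + n/2) = -w^k
    return A


if __name__ == "__main__":
    q, n = 12289, 16                          # assumed test parameters
    w = find_root_of_unity(n, q)
    a = [(5 * i + 1) % q for i in range(n)]
    full = mvm_tiled(ntt_matrix(n, w, q), a, q, num_pes=16)
    assert ntt_even_odd(a, w, q) == full
    print("even-odd MVM NTT matches the full MVM NTT")
```

The full MVM needs n² multiply-accumulates. The even-odd version needs two (n/2)² products plus n/2 twiddle multiplications, roughly halving the work. This is one plausible reading of why reusing intermediate results pays off, stated here as an illustration only.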

The large-scale comparison uses n = 32K and log2 q = 32, with annotated improvements of 3.7×, 13×, 1.7×, and 20×.

Evaluation Result: Comparison with CPU
Figure: execution time (µs) versus polynomial order n = 256, 512, 1024 for CPU_16b, Ours_16b, CPU_32b, and Ours_32b.
HP-CIM reduces execution time by more than 2.4× compared with the CPU.

Conclusion
1. High parallelism with a hierarchical SRAM architecture: introduced a digital SRAM-based CIM NTT engine that uses a hierarchical structure to achieve high parallelism and scalability for large-scale NTT operations.
2. Novel even-odd data mapping strategy: proposed an even-odd data mapping approach that optimizes memory utilization and enables efficient reuse of intermediate computation results for better scalability.
3. Integrated mod computation within CIM arrays: developed efficient mod operations directly within the CIM arrays using SRAM read/write capabilities, eliminating extra peripheral circuits and improving area and energy efficiency.
4. Significant performance and energy improvements: achieved up to 3.08× faster execution and 4.96× energy savings compared with prior CIM-based designs, validated through extensive comparisons with state-of-the-art methods.

References
[1] Gentry C. Fully homomorphic encryption using ideal lattices[C]. Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, Bethesda, Maryland, 2009: 169-178.
[2] Fan J, Vercauteren F. Somewhat practical fully homomorphic encryption[J]. IACR Cryptology ePrint Archive, 2012, 2012(2012): 144-162.
[3] Kim S, Kim J, Kim M J, et al. BTS: An accelerator for bootstrappable fully homomorphic encryption[C]. Proceedings of the 49th Annual International Symposium on Computer Architecture, New York, 2022: 711-725.
[4] Samardzic N, Feldmann A, Krastev A, et al. F1: A fast and programmable accelerator for fully homomorphic encryption[C]. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Greece, 2021: 238-252.
[5] He Y, Qu S, Lin G, et al. Processing-in-SRAM acceleration for ultra-low power visual 3D perception[C]. Proceedings of the 59th ACM/IEEE Design Automation Conference, San Francisco, California, 2022: 295-300.
[6] Li D, Pakala A, Yang K. MeNTT: A compact and efficient processing-in-memory number theoretic transform (NTT) accelerator[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022, 30(5): 579-588.
[7] Albrecht M, Chase M, Chen H, et al. Homomorphic encryption standard[J]. Protecting Privacy through Homomorphic Encryption, 2021: 31-62.
