Kota Shiba, The University of Tokyo

A 7-nm FinFET 1.2-TB/s/mm² 3D-Stacked SRAM with an Inductive Coupling Interface Using Over-SRAM Coils and Manchester-Encoded Synchronous Transceivers

2022 Hot Chips 34 Symposium, Aug. 21-23, Virtual Conference

Kota Shiba¹, Mitsuji Okada², Atsutake Kosuge², Mototsugu Hamada²,
and Tadahiro Kuroda²
¹The University of Tokyo, ²Research Association for Advanced Systems (RaaS)

Abstract

A 0.7-pJ/bit, 8.5-Gbps/link inductive coupling inter-chip wireless communication interface for a 3D-stacked SRAM has been developed in a 7-nm FinFET process. A new physical placement method that allows coils to be placed over off-the-shelf SRAM macros with small magnetic field attenuation, together with the use of synchronous communication using Manchester encoding and a clocked comparator to enable the detection of small-swing signals, achieves a 26% reduction in SRAM die area compared to TSV-based stacking. Inter-chip communication at 0.7 pJ/bit, 8.5 Gbps/link was confirmed using test chips. A 4-hi 3D-stacked SRAM module using the proposed interface is estimated to achieve a 1.2-TB/s/mm² area efficiency, representing a two-orders-of-magnitude improvement over state-of-the-art 3D-stacked SRAM.

Introduction

- Mobile AI devices need high-bandwidth, low-latency memory with a small form factor.
- 3D-stacked SRAM (3D-SRAM) can meet these demands.
- But current 3D-SRAM using TSVs and µ-bumps has issues with cost, yield, and area efficiency [1][2].

[Figure: 3D-stacked SRAM with TSV and µ-bump [1][2]. Labels: SRAM, Logic, Si, TSV, µ-bump, 40 µm; callouts: cost, yield.]

[1] K. Cho, et al., Hot Chips, 2020  [2] S.-K. Seo, et al., ECTC, 2021

Inductive Coupling Technology (TCI)

- To eliminate TSVs and µ-bumps, the ThruChip Interface (TCI) is proposed, which is a wireless version of TSV.
- TCI is compatible with the standard CMOS process, leading to low cost and high yield.

[Figure: TCI link. Labels: TX, RX, coil, receiver, M, I_TX, V_RX, TxData, RxData; waveforms of TxData, I_TX, V_RX, and RxData versus time.]

|         | TSV + µ-bump [1]   | TCI [3]               |
| Process | Additional process | Standard CMOS process |
| Cost    | High               | Low                   |
| Yield   | Low                | High                  |

[3] D. Ditzel, et al., Hot Chips, 2014

Conventional 3D-SRAM Using TCI

- A 40-nm 96-MB 3D-stacked SRAM using inductive coupling was proposed.
- But limited bandwidth due to large coils is an issue.

[Figure: Conventional 3D-SRAM using inductive coupling (TCI). Labels: logic, SRAM macro, total 512-KB SRAM/CH, coils TX1 to TX5, RX1 to RX4, CLK, CS, DQS; dimension labels: 200 µm, 2250 µm, 1945 µm; callout: large coil area.]

[4] K. Ueyoshi, et al., JSSC, 2019  [5] K. Shiba, et al., TCAS-I, 2020

Proposed 3D-SRAM Using TCI

- This work proposes a 3D-stacked SRAM using inductive coupling with minimized area overhead, reducing SRAM die area by 26% vs. TSV.
  (A) Over-SRAM coils: enable high area efficiency while limiting magnetic field attenuation to 30%.
  (B) Manchester-encoded synchronous transceiver: detects small received signals with low power.

[Figure, three panels.
(a) System overview: inductive coupling (TCI), 4.3 GB/s/vault.
(b) Channel floorplan: 430 µm × 1200 µm, 128-kb SRAM macros, 1 MB/ch, TX and RX coils, 110 µm.
(c) Scaling trend of SOTA SRAM: 3D-stacked SRAM channel area achieving 1 MB and 4.3 GB/s [mm²] (axis 0.4 to 1.0) versus CMOS process node (7 nm, 5 nm, 3 nm). Data labels: 0.58 mm² (0.84)*; [1][2]: 0.70 mm² (1); this work: 0.52 mm² (0.74); 0.49 mm² (0.71).
*Extrapolated based on 7-nm SRAM [6] and 5-nm SRAM [7].]

[6] J. Chang, et al., ISSCC, 2017  [7] J. Chang, et al., ISSCC, 2020

(A) Over-SRAM Coil

- The proposed physical layout method of coils over off-the-shelf SRAM macros suppresses magnetic field attenuation due to eddy currents on SRAM macros.

[Figure, panels (a) to (c): coil placement relative to the SRAM macros, with I_Tx and V_Rx marked.
(a) Off SRAM [5]: efficient communication, poor area efficiency.
(b) Over legacy-node SRAM: good area efficiency, 99% attenuation (eddy current).
(c) Over 7-nm SRAM: good area efficiency, only 30% attenuation.]

[5] K. Shiba, IEEE TCAS-I, 2021.

|                      | (d) TSV + µ-bump [1][2]                | (e) Proposed method                  |
| Bandwidth            | 4.3 GB/s (= 0.76 Gbps × 45 data links) | 4.3 GB/s (= 8.5 Gbps × 4 data links) |
| IO area overhead     | 0.179 mm²                              | 0.0037 mm²                           |
| Area efficiency      | 24 GB/s/mm²                            | 1162 GB/s/mm²                        |
| Floorplan dimensions | 580 µm (1) × 1200 µm                   | 430 µm (0.74) × 1200 µm              |

[Figure, panels (d) and (e): floorplans with 128-kb SRAM macros. (d) 0.76 Gb/s, 40-µm pitch, 112 bumps. (e) Legend: Tx or Rx, power mesh.]

(B) Manchester-encoded Synchronous TRx

- (a) A clocked comparator and (b) Manchester encoding achieve detection of small pulse signals with low transmission power.
  (a) detects low-swing pulses by utilizing clock-triggered positive feedback, leading to low transmission power.
  (b) generates a pulse signal in every cycle for clock-triggered data reception.

[Figure, two panels, each with a block diagram and waveforms.
(a) Conventional TCI. Labels: TxData, TX, I_T, V_RP, V_RN, amp, hysteresis comparator (Hys.), differential-to-single-ended conversion (D2S), RxData, from Ser. Notes: (a) large power required to increase the received pulse; (b) pulse signal not generated in every cycle.
(b) Proposed TCI. Labels: TxData, CLK, ENCP, ENCN, TX, I_T, V_RP, V_RN, amp, clocked comparator (INP, INN, OUTP, OUTN, CLK), SB0, RB0, SR latches, CLKH, RxData0, RxData1, from Ser., to Des. Notes: (a) small power thanks to the clocked comparator; (b) pulse generated in every cycle thanks to Manchester encoding.]

Test Chip

- A test chip was fabricated in a 7-nm FinFET process.

[Figure: Test chip, 2.0 mm × 2.0 mm. Labels: 128-kb SRAM (×4), TRx, coil, 110 µm.]

Measurement Results

- Inter-chip wireless communication at 0.7 pJ/bit, 8.5 Gbps/link was measured for a 2-hi 3D-SRAM.
- A 4-hi 3D-SRAM is estimated to achieve 1.2 TB/s/mm², a two-orders-of-magnitude improvement over TSV-based 3D-SRAM [1].

[Figure, two panels.
(a) Measured bathtub curve: bit error rate (10⁻⁶ to 10⁻¹²) versus timing [ps] (-20 to 20); annotation: 14 ps.
(b) Effect of sandwiched SRAMs on TCI: normalized driver power [a.u.] (0 to 1.0) versus # of SRAM dies (0 to 4). Series: over 7-nm SRAM w/ clocked comparator, over 7-nm SRAM w/ hysteresis comparator; measured and simulated points; annotations: 0.5 pJ/bit, 0.7 pJ/bit, 0.7 pJ/bit, 4-Hi.]

Performance Comparisons

|                                | MICRO'17 [8]   | ISSCC'20 [9]            | Hot Chips'20 [1] | Hot Chips'20 [1] (extrapolated to 4 Hi) | This work       |
| Technology                     | 20-nm DRAM     | 1y-nm DRAM              | 7-nm FinFET      | 7-nm FinFET                             | 7-nm FinFET     |
| Memory type                    | HBM2 DRAM      | HBM2E DRAM              | SRAM             | SRAM                                    | SRAM            |
| Data bus                       | Bi-directional | Bi-directional          | Uni-directional  | Uni-directional                         | Uni-directional |
| Stack #                        | 8              | 12                      | 1                | 4                                       | 4               |
| Bandwidth                      | 256 GB/s       | 640 GB/s                | 24.3 GB/s        | 24.3 GB/s                               | 4.3 GB/s        |
| µ-bump pitch                   | 48/55 µm       | 48/55 µm                | 40 µm            | 40 µm                                   | -               |
| IO area overhead (*1)          | 2.8 mm²        | 2.8 mm²                 | 0.92 mm²         | 0.92 mm²                                | 0.0037 mm²      |
| Bandwidth per IO area overhead | 92 GB/s/mm²    | 231 GB/s/mm²            | 26 GB/s/mm²      | 26 GB/s/mm²                             | 1162 GB/s/mm²   |
| Data rate                      | 2.0 Gb/s       | 5.0 Gb/s                | 0.76 Gb/s        | 0.76 Gb/s                               | 8.5 Gb/s        |
| I/O energy consumption         | 2 pJ/bit       | N/A (2.5 pJ/bit (*2))   | 0.1 pJ/bit       | 0.4 pJ/bit (*3)                         | 0.7 pJ/bit      |
| Interface type                 | TSV + µ-bump   | TSV + µ-bump            | TSV + µ-bump     | TSV + µ-bump                            | TCI             |
| Chip size                      | 12 mm × 8 mm   | 11 mm × 10 mm           | 9.0 mm × 9.0 mm  | -                                       | 2.0 mm × 2.0 mm |

*1: IO area only for signal, excluding power.
*2: Estimated from the ratio of the squared voltage and stack # of HBM2 (1.2 V, 8 Hi, [8]) and HBM2E (1.1 V, 12 Hi, [9]).
*3: Capacitance load of 4× # of Rxs, µ-bumps, and TSVs driven by Tx
compared with 1 Hi.

[8] M. O'Connor, MICRO, 2017  [9] C.-S. Oh, et al., ISSCC, 2020

Conclusion

- A 3D-stacked SRAM using inductive coupling is proposed with two new methods to minimize area overhead.
  (A) Over-SRAM coils: enable high area efficiency while limiting magnetic field attenuation to 30%.
  (B) Manchester-encoded synchronous transceiver: detects small received signals with low power.
- A test chip was fabricated in a 7-nm FinFET process.
- Inter-chip wireless communication at 0.7 pJ/bit, 8.5 Gbps/link was measured for a 2-hi 3D-SRAM.
- A 4-hi 3D-SRAM is estimated to achieve 1.2 TB/s/mm², a two-orders-of-magnitude improvement over conventional TSV-based 3D-SRAM.

References

[1] K. Cho, et al., “SAINT-S: 3D SRAM Stacking Solution based on 7nm TSV technology,” IEEE Hot Chips, Aug. 2020.
[2] S.-K. Seo, et al., “CoW Package Solution for Improving Thermal Characteristic of TSV-SiP for AI-Inference,” IEEE ECTC, June 2021.
[3] D. Ditzel, et al., “Low-cost 3D chip stacking with ThruChip wireless connections,” IEEE Hot Chips, Aug. 2014.
[4] K. Ueyoshi, et al., “QUEST: Multi-purpose log-quantized DNN inference engine stacked on 96-MB 3-D SRAM using inductive coupling technology in 40-nm CMOS,” IEEE JSSC, vol. 54, no. 1, pp. 186-196, Jan. 2019.
[5] K. Shiba, et al., “A 96-MB 3D-Stacked SRAM Using Inductive Coupling with 0.4-V Transmitter, Termination Scheme and 12:1 SerDes in 40-nm CMOS,” IEEE TCAS-I, vol. 68, no. 2, pp. 692-703, Feb. 2021.
[6] J. Chang, et al., “A 7nm 256Mb SRAM in high-k metal-gate FinFET technology with write-assist circuitry for low-VMIN applications,” IEEE ISSCC, Feb. 2017.
[7] J. Chang, et al., “A 5nm 135Mb SRAM in EUV and High-Mobility-Channel FinFET Technology with Metal Coupling and Charge-Sharing Write-Assist Circuitry Schemes for High-Density and Low-VMIN Applications,” IEEE ISSCC, Feb. 2020.
[8] M. O'Connor, et al., “Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems,” MICRO-50, Oct. 2017.
[9] C.-S. Oh, et al., “A 1.1V 16GB 640GB/s HBM2E DRAM with a Data-Bus Window-Extension Technique and a Synergetic On-Die ECC Scheme,” IEEE ISSCC, Feb. 2020.

Acknowledgement

The authors would like to thank UltraMemory Inc. and Jedat Inc. for their technical support in design, implementation, and evaluation. This work was supported by JST, ACT-X Grant Number JPMJAX210A and JSPS KAKENHI Grant Number 21J11729.
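
As a supplementary illustration of point (B), the following is a minimal Python sketch of Manchester encoding at the bit level. It only shows why the encoded stream changes level in every bit period, which is the property that lets a clocked receiver expect a pulse in every cycle; it does not model the coils, the analog pulse, or the clocked comparator. The polarity convention (1 as high-then-low, 0 as low-then-high) and all function names are assumptions made for illustration, since the slides do not state which convention the transceiver uses.

```python
from typing import List


def manchester_encode(bits: List[int]) -> List[int]:
    """Map each data bit to two half-cycle levels with a mid-cycle transition.

    Assumed convention (not stated in the slides): 1 -> (1, 0), 0 -> (0, 1).
    """
    out: List[int] = []
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        out.extend((1, 0) if b else (0, 1))
    return out


def manchester_decode(levels: List[int]) -> List[int]:
    """Recover data bits from half-cycle levels (inverse of manchester_encode)."""
    if len(levels) % 2:
        raise ValueError("expected two half-cycle levels per bit")
    bits: List[int] = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("no mid-cycle transition: not a valid Manchester symbol")
        bits.append(first)  # under the assumed convention the first half equals the bit
    return bits


def transitions_per_cycle(levels: List[int]) -> List[bool]:
    """Return True for every bit period that contains a mid-cycle level change."""
    return [a != b for a, b in zip(levels[0::2], levels[1::2])]


if __name__ == "__main__":
    tx_data = [1, 0, 0, 1]  # the 1001 pattern that appears as TxData in the waveform labels
    line = manchester_encode(tx_data)
    print(line)
    print(all(transitions_per_cycle(line)))
    print(manchester_decode(line) == tx_data)
```

In an unencoded stream, consecutive equal bits such as 0, 0 produce no level change, which is consistent with the slide's note that the conventional scheme does not generate a pulse in every cycle.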