ISSCC 2025
SESSION 30: Nonvolatile Memory and DRAM

30.1 A 28Gb/mm² 4XX-Layer 1Tb 3b/cell WF-Bonding 3D-NAND Flash with 5.6Gb/s/pin IOs
2025 IEEE International Solid-State Circuits Conference

Sang-Soo Park, Jae-Doeg Lyu, Myungjun Kim, Jaeyun Lee, Younsun Song, Chung-Ho Yu, Hirano Makoto, Yongseok Kwon, Jong-Hoon Park, Ho-Joon Kim, Daein Lee, Donghyun Seo, Byungrok Go, Seoyoon Jeon, Yoonjee Kim, Doo-Hyun Kim, Youngmin Jo, Hyunjun Yoon, Junehong Park, Inmo Kim, Sunghoon Kim, Hokil Lee, Je-Hyeon Yu, Sang-Lok Kim, Hwan-Seok Ku, Jungmin Seo, Jindo Byun, Seung-Hyeon Yun, Kyoungtae Kang, Seung-Beom Kim, Yohan Lee, Yongkyu Lee, Kyunghwa Kang, Han-Jun Lee, Younghwan Ryu, Hyundo Kim, Wontae Kim, Hyeongdo Choi, Juho Jeon, Ansoo Park, Raehyun Song, Jae-Hwan Kim, Jung-Soo Kim, Hwa-Seok Lee, Moo-Kyung Lee, Jae-Ick Son, Jiho Cho, Moosung Kim, Jae-Woo Im, Jongmin Park, Hyuckjoon Kwon, Youngdon Choi, Chiweon Yoon, Seungjae Lee, Kiwhan Song, Sung-Hoi Hur
Samsung Electronics, Hwaseong, Korea

Outline
Introduction
  NAND Challenges
  Merits of BV-NAND Architecture (IO, Bit-density)
  Key Features
Key Design
  2-Transistor Coded-GSL
  Stack-Dependent Pass-Voltage Control
  External Power Assisted Core Driving
  Proposed SCA protocol with /ODT
  Low power High speed IO scheme for 5.6-Gb/s/pin (PI-LTT, DFE, RDCA)
Conclusion

NAND Challenges
This work focuses on three challenges: 1) array size reduction, 2) power reduction, and 3) improving data transfer efficiency.
Vertical scaling and lateral scaling are both becoming difficult for high-capacity NAND.
Power consumption increases as the number of WLs increases, because WL loading increases.
IO bandwidth efficiency decreases as IO speed increases, because of command overhead.
[Figures (source: Samsung): number of WLs (vertical scaling) and number of channel holes (lateral scaling) versus V-NAND generation; core power split into WL charging and other current components, showing increasing power consumption for WL charging; data transfer efficiency and IO bandwidth versus IO speed (1.2 to 5.6 Gb/s).]

Merits of BV-NAND Architecture (IO, Bit-density)
The WF-Bonding process is adopted to improve IO speed and bit density.
Cell Over Peri (COP): the cell and peripheral processes are manufactured together.
Wafer Bonding (BV-NAND, Bonding Vertical-NAND): the cell wafer and the peri wafer are bonded after separate manufacturing processes.
CMOS circuits in the peripheral are not affected by the heat from the cell process.
The transistor characteristics are improved, and the transistor size is reduced.
This work reaches the fastest IO speed (5.6Gb/s/pin) and the highest bit density (28Gb/mm²) in the comparison.
[Figure: bit density (Gb/mm²) versus IO speed (Gb/s/pin), with min/max ranges for COP and BV-NAND. Compared works: [1] Jung, ISSCC 2024 (4b/cell); [2] Kawai, ISSCC 2024; [3] Kim, ISSCC 2024; [4] Kim, ISSCC 2022; [9] Cho, ISSCC 2021; [10] Higuchi, ISSCC 2021; This Work.]

Key Features
                          This Work  [2] ISSCC 2024  [3] ISSCC 2023  [4] ISSCC 2022  [1] ISSCC 2024  [5] ISSCC 2022
Bits per cell             3          3               3               3               4               4
Architecture              BV-NAND    COP             COP             COP             COP             COP
Density (Tb)              1          1               1               1               1               1
# of Planes               4          6               4               4               4               4
# of Stacked WLs          400        200             300             220             280             176
Page Size (KB/Page)       16         16              16              16              16              16
Bit Density (Gb/mm²)      28.2**     20~25*          20~25*          11.55           28.5            14.8
I/O Speed (Gb/s/pin)      5.6        3.6             2.4             2.4             3.2             1.6
tR (µs)                   38         32              34              45              85              90
Write Throughput (MB/s)   231        300             194             164             41              40
* Estimated from figures
** Including the additional blocks of 0.2Tb for the design for test

Focus points
  28.2Gb/mm²: 400 WLs, lateral scaling
  400 WL layers: power reduction scheme
  5.6Gb/s/pin: efficient IO scheme
[Figure: die photograph with 4-plane architecture.]

2-Transistor Coded-GSL
Motivation: Lateral Scaling & Power Reduction
  A dummy-hole-less SSL-cut process is adopted in the merged-GSL layer.
  The goal is to reduce power consumption during NAND core operation.
Concept: Electrically Controlled Merged-GSL Layer
  The GSL is coded by controlling the GSL Vth and the gate voltage (Vg) individually.
  Two transistors are implemented in a GSL layer to enhance data-retention reliability.

Removal of dummy holes with a Coded-GSL
A Coded-GSL is adopted in a dummy-hole-less SSL-cut process.
The array size is reduced by 4% by removing the dummy-hole area.
The Coded-GSL scheme makes it possible to separate the merged GSL electrically for each SSL.
WL charging current is reduced because of the small capacitance of the SSL1 channel.
[Figure: top view of the block cuts with dummy holes (SSL/GSL cut) and without dummy holes (SSL-only cut), showing the 4% area saving. Control of the Coded-GSL: GSL1 at high voltage and GSL0 at low voltage, with high-Vth and low-Vth GSL cells, selected SSL0 and unselected SSL1. With the merged GSL (large capacitance) both the selected and the unselected strings conduct; with the Coded-GSL (small capacitance) the selected string conducts and the unselected string floats.]
1) SSL: String Select Line, 2) GSL: Ground Select Line

Merits of 2-Transistor Coded-GSL
By implementing the 2-Transistor Coded-GSL scheme:
  The 2-Transistor Coded-GSL has better data-retention characteristics, by up to 0.5V.
  Core power consumption is reduced by 8% for both the 1-Tr and the 2-Tr Coded-GSL.
[Figure: vertical view of the 1-Tr and 2-Tr Coded-GSL strings (BL, SSL0, SSL1, GSL0, GSL1, CSL); core power consumption (a.u.) for merged GSL, 1-Tr Coded-GSL and 2-Tr Coded-GSL (8% reduction); data retention shown as cell Vth of the GSL versus time (years, log scale, 1 year and 10 years marked).]

Stack-Dependent Pass-Voltage Control (SPVC)
Motivation: Vertical Scaling & Cell Reliability
  A 3-stack channel-hole etch process is adopted and the number of WL layers is increased.
  The goal is to reduce the pass-voltage disturbance, which degrades cell reliability.
Concept: Cut off the other unselected stacks and lower the pass-voltage
  The voltage of the dummy WLs is controlled during programming sequences.
  The pass-voltage level of the other unselected stacks is lowered.

Operation of SPVC scheme (Stack-Dependent Pass-Voltage Control)
The pass-voltage of unselected stacks can be lowered when the gate voltage of the dummy WLs is lowered below their cell Vth.
The pass-voltage of upper stacks must be maintained to ensure the BL forcing operation.
[Figure: 3-stack structure (BL, SSL, Stack-3 WLs, dummy WLs, Stack-2 WLs, dummy WLs, Stack-1 WLs, GSL, CSL) and the unselected-WL pass-voltage for Stack-3, Stack-2 and Stack-1 programming. In each case the erased stacks below the programmed stack receive the lower pass-voltage, and the already-programmed stacks above it keep the pass-voltage needed for BL forcing.]

Merit of SPVC scheme (Stack-Dependent Pass-Voltage Control)
The SPVC scheme reduces the pass-voltage disturbance by lowering the pass-voltage for field relaxation.
The bit error rate is improved by 4% compared to previous work.
[Figure: 3-stack structure under SPVC programming, with program order (1), (2), (3) across the stacks and the lower pass-voltage applied to unselected stacks; erase-state cell distribution showing field relaxation (8V to 2V); bit error rate (a.u.) for all-stack pass-voltage (8V), SPVC (2V) and SPVC (4V), with a 4% difference between previous work and this work.]

External Power-Assisted Core Driving (EPACD)
Motivation: Low Power NAND Flash Memory
  The goal is to reduce core power consumption.
Concept: Additional External Power Supply
  Regulators and DC circuits in the NAND directly use the VppL power supply.
  This avoids the inefficiency of using the internal charge pump.

Implementation and Merits of EPACD scheme (External Power-Assisted Core Driving)
An additional external low power supply (VppL) is adopted.
Core power consumption is reduced by 9.4% (program) and 6.2% (read).
With the faster setup time, performance is enhanced by 1.2% (program) and 2.8% (read).
[Figure: core power (a.u.) and performance (a.u.) for previous work versus this work, for program and read; block diagram with VppH (higher voltage) and VppL (lower voltage) supplies feeding the internal pumps, voltage regulators, WL/BL generators, VREF DC circuits and peripherals.]

Proposed SCA (Separate Command and Address, JEDEC standard) protocol with /ODT (On Die Termination)
Motivation: IO bandwidth
  The goal is to improve Data Transfer Efficiency (DTE).
Concept: Non-Target-ODT enabled by an additional pin (/ODT)
  The Non-Target-ODT command between DMA transfers is removed.
[Equation: DTE defined in terms of N, tDMA and tCMD.]
  N is the number of dies.
  tDMA is the time required to perform DMA.
  tCMD is the time required to transfer the command/address.
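
The DTE equation itself is not legible in this copy, so the following is only a sketch under an assumed definition: DTE is taken as the fraction of total bus time spent on DMA, that is N * tDMA / (N * (tDMA + tCMD)). The function name and all timing values are hypothetical and are not taken from the slides.

```python
def data_transfer_efficiency(n_dies: int, t_dma_ns: float, t_cmd_ns: float) -> float:
    """Assumed definition: share of bus time that carries data.

    n_dies   -- number of dies (N on the slide)
    t_dma_ns -- time to perform one DMA transfer (tDMA)
    t_cmd_ns -- time to transfer command/address for one DMA (tCMD)
    """
    if n_dies <= 0 or t_dma_ns <= 0 or t_cmd_ns < 0:
        raise ValueError("n_dies and t_dma_ns must be positive, t_cmd_ns non-negative")
    data_time = n_dies * t_dma_ns
    total_time = n_dies * (t_dma_ns + t_cmd_ns)
    return data_time / total_time


# Hypothetical timings: removing a command from the sequence shortens tCMD.
dte_with_nto_command = data_transfer_efficiency(4, 900.0, 100.0)
dte_with_odt_pin = data_transfer_efficiency(4, 900.0, 50.0)
relative_gain = dte_with_odt_pin / dte_with_nto_command - 1.0
```

Under this assumed form, any reduction in per-transfer command time raises DTE, which is the effect the /ODT pin is meant to achieve by dropping the Non-Target-ODT command.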

Conventional protocol vs SCA protocol
The SCA protocol separates the command/address bus from the DQ bus.
DTE is improved by the parallel operation of CA[1:0] and DQ[7:0].
[Figure: timing comparison. Conventional protocol: DQ[7:0] with ALE/CLE carries CMD/ADDR (Chip 1), DATA (Chip 1), CMD/ADDR (Chip 2), DATA (Chip 2) in sequence, with latency between data bursts. SCA protocol: CA[1:0] carries CMD/ADDR while DQ[7:0] carries DATA, so the CA bus and the DQ bus operate in parallel.]
[Figure: NAND die pins. Conventional protocol: DQ[7:0], DQS, /DQS, RE, /RE, /CE, ALE, CLE, /WE. SCA protocol: DQ[7:0], DQS, /DQS, RE, /RE, /CA_CE, CA0, CA1, CA_CLK.]

Merit of Advanced SCA protocol with /ODT
DTE is improved by 3.8% through reduced command overhead.
[Figure: conventional SCA with NT-ODT enabled by the NTO command: the CA bus carries SCE, SCT, NTO On and NTO Off as command overhead (CMD O/H) around each DMA on the DQ bus. Proposed SCA with NT-ODT enabled by the /ODT pin: the CA bus carries only SCE and SCT, the /ODT pin signals NAND#0 and NAND#1 directly, and the command overhead is reduced.]
[Figure: data transfer efficiency (a.u.), previous work (JEDEC SCA) versus this work (SCA with /ODT), 3.8% improvement.]
* SCE: Select Chip Enable
* SCT: Select Chip Terminate
* DMA: Direct Memory Access
* NTO: Non-Target ODT

Low-Power High-Speed IO Scheme
Motivation: Low-Power High-Speed NAND Flash
  To reduce channel power consumption.
  To improve eye-window (EW) and eye-height (EH).
Concept: PI_LTT with DFE/RDCA scheme
  Power Isolated LTT (PI_LTT) with the lower voltage VccQL.
  Mitigation of signal-integrity degradation with the DFE and RDCA schemes.

Power Isolated Low Tapped Termination (PI_LTT)
By adding the low power source VccQL to the main IO-driver circuit, PI_LTT achieves 82% and 73% power reduction compared to CTT and LTT, respectively.
The low output swing of PI_LTT can affect signal integrity because of inter-symbol interference and duty-cycle distortion.
[Equation: channel power expression involving VccQL.]
[Figure: channel power (a.u.): CTT 1 AU (VccQ = 1.2V), LTT 0.66 AU (VccQ = 1.2V), PI-LTT 0.41 AU and 0.18 AU (VccQL = 0.8V and 0.6V); Tx/Rx circuit with VccQL, VccQ and VRX.]
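
The quoted 82% and 73% reductions can be cross-checked against the normalized channel-power values on the slide (1, 0.66 and 0.18 AU). The short sketch below only does that arithmetic; the dictionary keys and function name are illustrative.

```python
# Normalized channel power read from the slide (arbitrary units).
CHANNEL_POWER_AU = {
    "CTT": 1.00,
    "LTT": 0.66,
    "PI_LTT": 0.18,  # lowest PI-LTT value shown on the slide
}


def reduction_percent(baseline: float, improved: float) -> float:
    """Power saved relative to the baseline, in percent."""
    return 100.0 * (1.0 - improved / baseline)


vs_ctt = reduction_percent(CHANNEL_POWER_AU["CTT"], CHANNEL_POWER_AU["PI_LTT"])
vs_ltt = reduction_percent(CHANNEL_POWER_AU["LTT"], CHANNEL_POWER_AU["PI_LTT"])
```

Rounded to whole percent, the two results agree with the 82% (versus CTT) and 73% (versus LTT) figures stated on the slide.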

Circuit Diagram and Impact of DFE
DFE (Decision Feedback Equalizer) equalizes channel loss by feeding back a weighted signal based on previous data.
DFE improves EW and EH by 5% and 10mV, achieving EW = 67%UI and EH = 150mV.
[Figure: DFE block diagram: input y_k, summing node, z_k, clocked decision (slicer), output d_k, and a feedback (FIR) filter with one delay element Z^-1 and weight W1.]
[Figure: write eye mask with DFE at 5.6Gb/s/pin, voltage versus time (ps). DFE off: EW 62%UI, EH 140mV. DFE on: EW 67%UI, EH 150mV.]

Circuit Diagram and Impact of RDCA
RDCA (Read Duty Cycle Adjustment, Tx) adjusts the duty cycle to 50%.
RDCA improves EW and EH by 6% and 19mV, achieving EW = 76%UI and EH = 130mV.
[Figure: Tx/Rx circuit with RDCA blocks, VccQL and VccQ.]
[Figure: read eye mask with RDCA at 5.6Gb/s/pin, voltage (mV) versus time (ps). RDCA off: EW 70%UI, EH 111mV. RDCA on: EW 76%UI, EH 130mV.]
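
The DFE block diagram on the DFE slide (summing node, slicer, one delay element and weight W1) corresponds to a one-tap feedback loop. The following is a behavioral sketch of such a loop, not the circuit from the paper: the channel model, the ISI coefficient `alpha` and the tap weight are hypothetical, and the bit pattern is the one printed on the slide.

```python
def channel_with_isi(bits, alpha):
    """Map bits to +/-1 symbols and add one post-cursor ISI term (toy channel)."""
    received = []
    prev_symbol = 0.0
    for b in bits:
        symbol = 1.0 if b else -1.0
        received.append(symbol + alpha * prev_symbol)
        prev_symbol = symbol
    return received


def dfe_1tap(samples, w1):
    """One-tap DFE: z_k = y_k - w1 * d_(k-1), d_k = sign(z_k)."""
    decisions = []
    prev_decision = 0.0  # no history before the first symbol
    for y in samples:
        z = y - w1 * prev_decision          # subtract the weighted previous decision
        d = 1.0 if z > 0.0 else -1.0        # slicer
        decisions.append(1 if d > 0.0 else 0)
        prev_decision = d
    return decisions


bits = [0, 1, 0, 1, 1, 0, 1, 0]
rx = channel_with_isi(bits, alpha=1.2)   # ISI strong enough to close the eye
with_dfe = dfe_1tap(rx, w1=1.2)          # tap weight matched to the toy channel
without_dfe = dfe_1tap(rx, w1=0.0)       # plain slicer, no feedback
```

With the tap weight matched to the assumed channel, the feedback cancels the post-cursor interference and the slicer recovers the pattern, while the plain slicer makes errors on the same samples.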

Conclusion
A 4XX-Layer 1Tb 3b/cell 3D-NAND device was fabricated.
  To enhance bit density and IO speed, we adopted 1) the BV-NAND architecture and 2) the dummy-hole-less SSL-cut process.
  To reduce core power consumption and to improve performance, we proposed 1) the 2-Transistor Coded-GSL, 2) SPVC (Stack-dependent Pass-Voltage Control), and 3) EPACD (External Power Assisted Core Driving) schemes.
Low-power, high-speed (5.6Gb/s) IO was achieved.
  To reduce channel power consumption, we proposed PI_LTT.
  To enhance IO bandwidth and IO speed, we proposed 1) the SCA protocol with /ODT, 2) DFE (Decision Feedback Equalizer), and 3) RDCA (Read Duty Cycle Adjustment) schemes.

30.2 A 1Tb 3b/cell 3D Flash Memory with a 29%-Improved-Energy-Efficiency Read Operation and 4.8Gb/s Power-Isolated Low-Tapped-Termination I/Os
2025 IEEE International Solid-State Circuits Conference

Kosuke Yanagidaira¹, Mario Sako¹, Yasuhiro Hirashima¹, Junya Matsuno¹, Yumi Higashi¹, Yutaka Shimizu¹, Akihiro Imamoto¹, Kazuaki Kawaguchi¹, Koji Tabata¹, Takeshi Nakano¹, Yusuke Ochi¹, Hiroaki Hoshino¹, Takeshi Hioka¹, Shigehito Saigusa¹, Hiroki Date¹, Masaki Unno¹, Jumpei Sato¹, You Kamata¹, Hardwell Chibvongodze², Naoki Ojima², Hiroshi Sugawara², Masahiro Kano², Jang-woo Lee², Hiroyuki Mizukoshi², Ryuji Yamashita², Kensaku Abe², Naohito Morozumi², In-Soo Yoon², Takuya Ariki², Jong Hak Yuh², Khin Htoo², Yosuke Kato², Yoshihisa Watanabe¹, Toshiyuki Kouchi¹
¹Kioxia Corporation, Tokyo, Japan
²Western Digital Corporation, Milpitas, CA

Outline
Introduction
Power network with CMOS directly bonded to array (CBA)
I/O features
Word line (WL) voltage-swing reduction control
Summary

Introduction of CBA
Reference: M. Tagami, "CMOS Directly Bonded to Array (CBA) Technology for Future 3D Flash Memory," IEEE IEDM Tech. Digest, 2023.
Rough process flow:
  (1) Independent process of memory cells and CMOS on different wafers
  (2) Wafer bonding
  (3) Wafer dicing
  (4) Wire bonding
[Figure: wafer bond process, with the memory chip inverted and bonded onto the CMOS chip; schematic cross section of the CBA chip and bond wire, showing the bond interface, control circuits, cell array, BP(*), overhang and bond wire.]
(*) Bonding pads for packaging

History of 1Tb 3b/cell chips
Higher area density, smaller chip size
  No additional metal layers are used for CMOS.
  How the extensive power network is connected is a key to high area efficiency.
Higher energy consumption with higher speed operations
  High energy efficiency is required.
[Figure: chip history. [7]: 10Gb/mm², 2.0Gb/s. [1]: 17Gb/mm², 3.2Gb/s. This work: density 29Gb/mm², I/O 4.8Gb/s.]
[1] M. Sako et al., IEEE VLSI Tech. and Cir., p. 978, 2023.
[7] T. Higuchi et al., ISSCC, p. 428, 2021.

Chip architecture
The highest bit density and smallest chip size in the 1Tb products.
  Capacity:      1Tb (3 bits/cell)
  Technology:    CBA, 332 WL layers
  Bit density:   29 Gb/mm² (highest in the world)
  Organization:  Page size: 16kB + ECC; Block size: 93MB (5976 pages/block); 4 planes (351 + ext. blocks/plane)
[Figure: chip photograph. Overhang from memory cells for bonding pads for packaging (BPs); (98% of chip size).]
Highly area-efficient memory chip
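
The organization figures on the chip-architecture slide can be checked for internal consistency. The sketch below assumes binary units (1 kB = 1024 bytes, 1 MB = 1024 kB), ignores the ECC area and the extra blocks, and uses illustrative names; none of these assumptions is stated on the slide.

```python
PAGE_KIB = 16            # page size, excluding ECC
PAGES_PER_BLOCK = 5976
WL_LAYERS = 332
PLANES = 4
BLOCKS_PER_PLANE = 351   # excluding the "ext." blocks


def block_size_mib() -> float:
    """Block size in MiB from page size and pages per block."""
    return PAGES_PER_BLOCK * PAGE_KIB / 1024


def pages_per_wl_layer() -> float:
    """Pages contributed by each word-line layer of a block."""
    return PAGES_PER_BLOCK / WL_LAYERS


def capacity_gibit() -> float:
    """Total capacity in Gibit from planes, blocks and block size."""
    total_mib = PLANES * BLOCKS_PER_PLANE * block_size_mib()
    return total_mib * 8 / 1024
```

Under these assumptions the block size works out to 93.375 MiB, in line with the 93MB on the slide, each WL layer contributes 18 pages per block, and the total is slightly above 1024 Gibit, consistent with a 1Tb device. The 18 pages per layer would fit 3 pages per word line (3b/cell) across 6 strings, although the slides do not state the string count.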

Design challenge in CBA
Compaction of circuits and wires suitable for memory cell array migration (higher stack or smaller area):
  No unnecessary openings remain in the memory cell array chip.
  No drastic design rule change (no additional layer) is made in the CMOS chip.
Appropriate compaction enables low migration cost.
[Figure: schematic cross section of CBA chips: the CMOS chip (control circuits) bonded to the memory cell array chip (cell array and BP, no control circuits), with the overhang and an opening marked.]

Results of appropriate compaction
Bit density improvement: 71% (from 17 Gb/mm² [1] to 29 Gb/mm²)
Number of WL layers increase: 52% (from 218 layers [1] to 332 layers)
Migration with high area efficiency
[Figure: chip layouts. [1]: large openings near the BPs. This work: few openings near the BPs.]
[1] M. Sako et al., IEEE VLSI Tech. and Cir., p. 978, 2023.
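
The two percentages on this slide follow directly from the quoted values, and their ratio gives a rough sense of how much of the density gain does not come from added layers. The sketch below is plain arithmetic on the slide's numbers; the per-layer ratio is a derived illustration, not a figure from the presentation.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return 100.0 * (new / old - 1.0)


density_gain = percent_increase(29.0, 17.0)    # bit density, Gb/mm²
layer_gain = percent_increase(332.0, 218.0)    # number of WL layers

# Bit density per WL layer, this work relative to the prior chip [1].
per_layer_ratio = (29.0 / 332.0) / (17.0 / 218.0)
```

The density gain exceeds the layer gain, so the remaining improvement (about 12% more bits per layer per unit area by this estimate) has to come from lateral area efficiency, which is what the compaction is meant to deliver.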

Power network
Memory cell array chip: the top metal layer is dedicated to the power network and BPs.
CMOS chip: the upper 2 of 5 metal layers are used for the power network.
The top metal layers have the lowest sheet resistance in the CBA chip.
The two layers with the lowest sheet resistance for the power network in one CBA chip are effectively connected with BWWPC, giving a strong power network.
[Figure: peel-back view of the CBA chip: memory cell array chip (BL, WL, stacked WLs, connection region of stacked WLs, planes 0 to 3, BPs arranged on the bottom side), CMOS chip (control circuits), top metal layers, and bonding pads for wafer-to-wafer power connection (BWWPC).]

How CBA power network is effective
Vccq (low voltage) domain
  The prior architecture has a wire track limitation, so Vccq covered only the area around the BPs.
  This work has sufficient wire tracks, so Vccq can cover the whole chip.
Energy reduction [mJ/GB] in the data path
  Data input: -15%
  Data output: -20%
[Figure: top view of the Vccq domain, prior architecture versus this work.]

I/O features
Improvement items, rated for energy, area and performance:
  Power-isolated low-tapped termination (PI LTT)
  Per-pin VREF training (PPVT)
  2-way time-interleaved decision-feedback equalizer (2TI DFE)
  Separated-command-address (SCA) protocol

Power-isolated low-tapped termination (PI LTT)
PI LTT for output buffers: VccqL (< Vccq) is introduced.
  DQ has a large capacitance, so a lower power supply voltage directly reduces energy consumption.
  A VccqL domain is required; CBA makes it possible to add a new power domain without any area penalty for power lines.
[Figure: NAND die with internal circuitry on Vccq and output buffer on VccqL driving DQ, referenced to VSS; this work versus prior architecture.]

PPVT and 2TI DFE
2TI DFE (2TI: 2-way time interleaved)
  DFE improves the data input margin.
  The 2TI architecture requires only half the area of a 4TI DFE.
  Timing margin is improved with a smaller area.
PPVT for input receivers (IRECs)
  The internal VREF shared by all DQs is separated into a per-pin VREF_i.

SCA protocol (SCA: Separated command address)
The CA bus is independent of the DQ bus.
Command/address input and data input/output can be handled in parallel.
Latency from command input to data output is 26% lower.
[Figure: timing comparison. Conventional protocol: DQ[7:0] carries CMD/ADDR and data for chip 0, then CMD/ADDR and data for chip 1, with latency in between. SCA protocol: CA[1:0] carries CMD/ADDR while DQ[7:0] carries data for chips 0 to 3, with less latency (performance improvement).]
[Figure: NAND die pins and role change. Conventional protocol: DQ[7:0]; DQS, BDQS; /RE, /REn; /WP; /CE; ALE; CLE; /WE. SCA protocol: DQ[7:0]; DQS, BDQS; /RE, /REn; /WP; /CA_CE; CA0; CA1; CA_CLK.]

Background of WL voltage-swing reduction
The total capacitance of the unselected WLs increases with the number of WL layers.
A smaller voltage swing reduces tRead and energy consumption.
[Figure: NAND string (source line, SGS, WL0 to WLn, SGD, BL). [1]: WL0 to WL217, 217 unselected WLs. This work: WL0 to WL331, 331 unselected WLs, larger parasitic capacitance.]
[1] M. Sako et al., IEEE VLSI Tech. and Cir., p. 978, 2023.
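
Why a smaller swing saves energy can be illustrated with a first-order model: recharging a capacitance C by a step ΔV from an ideal supply at Vread draws C * Vread * ΔV. The sketch below applies that model to the unselected word lines; the per-WL capacitance and both voltages are hypothetical, and only the WL counts (217 and 331) come from the slide.

```python
def wl_recharge_energy(n_unselected: int, c_wl_farad: float,
                       v_read: float, v_rest: float) -> float:
    """Energy drawn from an ideal v_read supply to raise the unselected WLs
    from their resting level v_rest back to v_read (first-order model)."""
    delta_v = v_read - v_rest
    return n_unselected * c_wl_farad * v_read * delta_v


C_WL = 1e-12   # hypothetical capacitance per word line, in farads
V_READ = 6.0   # hypothetical read pass voltage, in volts

# Conventional read: WLs return to VSS (0 V) between reads.
e_conventional = wl_recharge_energy(331, C_WL, V_READ, 0.0)
# Reduced swing: WLs rest at a hypothetical intermediate potential.
e_reduced = wl_recharge_energy(331, C_WL, V_READ, 3.0)
# Effect of the layer count alone, at full swing.
e_prior_chip = wl_recharge_energy(217, C_WL, V_READ, 0.0)
```

In this model the recharge energy scales linearly with both the number of unselected WLs and the swing, so holding the WLs at an intermediate level offsets part of the growth caused by going from 217 to 331 unselected WLs.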

Proposal
Conventional read: between the read from the 1st WL and the read from the 2nd WL, the WLs are discharged from Vread to VSS and then charged again.
WL voltage-swing reduction control: between consecutive reads (1st WL, 2nd WL, ..., Nth WL), the WLs are held at an intermediate potential instead of VSS. The reduced swing gives energy reduction and tRead reduction.
[Figure: selected and unselected WL waveforms for the 1st and 2nd WL reads, conventional read (swing between Vread and VSS) versus WL voltage-swing reduction control (swing between Vread and an intermediate potential).]

WL voltage-swing reduction control
[Figure: improvements of tRead and energy consumption.]

Summary
We developed an area- and energy-efficient 3D flash memory.
The power network with CBA, consisting of two low-resistance metal layers, enabled a 71% improvement in bit density, even though the number of WL layers increased by only 52%.
The I/O system, composed of low-voltage, small circuits such as PI LTT and 2TI DFE, achieved a 4.8Gb/s data transfer rate.
WL voltage-swing reduction control realized a 29% improvement in read energy consumption.

30.3 A 24Gb 42.5Gb/s GDDR7 DRAM with Low-Power WCK Distribution, an RC-Optimized Dual-Emphasis TX, and Voltage/Time-Margin-Enhanced Power Reduction
2025 IEEE International Solid-State Circuits Conference

Sang-Hoon Kim, Jaehyeok Baek, Moon-Chul Choi, Daewoong Lee, Donggun An, Se mi Kim, Yeonggeun Song, Minkyo Shim, Sung-Yong Cho, Dongha Lee, Gunhee Cho, In-Woo Jun, Juseop Park, TaeYoon Lee, Hwan-Chul Jung, Chanyong Lee, Gil-Young Kang, Hye-Ran Kim, Jongmyung Lee, Young Su Joo, Hyo-Jin Jung, Bokyeon Won, Ji-Hak Yu, Sangkeun Han, Yechan Hwang, Chungman Kim, Seok-Jung Kim, YoungSeok Lee, Young-Tae Kim, Myeong-O Kim, Wonhwa Shin, Tae-Young Oh, SangJoon Hwang
Samsung Electronics, Hwaseong, Korea

Outline
Introduction
Key features
  Proposed WCK distribution
  RC-optimized dual-emphasis TX
  Techniques for high-density memory
  Proposed on-chip EVC detector
Implementation and measurement results
Conclusion

GDDR Application
The demand for high data rate and density has increased.
[Figure: graphics DRAM applications: high-resolution graphics processing, automotive, gaming consoles, large-scale data analysis, AI/ML.]

GDDR Trend
Developing a GDDR DRAM solution is essential to meet these growing requirements.
This work: 42.5Gb/s and 24Gb.

GDDR Challenge: High Density
The Global I/O (GIO) length increases with the large array size.
For high-density memory, the increased GIO loading needs to be compensated.
[Figure: chip floorplan with DQ/WCK and CA regions and the GIO lines.]

GDDR Challenge: High Data Rate with Low Power
The length of the clock distribution is reduced by the DQ arrangement.
Power for clock distribution to the DQs is reduced by 26%.
[Figure: chip floorplan with DQ/WCK and CA regions; longest clock path from WCK to the DQs for the 1-row DQ and 2-row DQ arrangements.]

WCK Architecture in Previous Work [1]
  Mode             Buffer
  Low-Freq (LF)    CMOS_EVC
  High-Freq (HF)   CML
* EVC: External supply voltage
[Figure: WCK distribution from RX w/ DCC to the DQ and CA blocks. HF path: CML DIV2 and CML buffers. LF path: LF C2C, CMOS DIV2 and CMOS buffers. Clock nodes: WCK2, WCK4, WCK2_CML, WCK4_DQ, CK4_CA.]
106、/Time-Margin-Enhanced Power Reduction 2025 IEEE International Solid-State Circuits Conference11 of 40WCK Architecture in Previous Work 1ModeBufferLow-Freq(LF)CMOS_EVCHigh-Freq(HF)CML Unnecessary current near the fLFbecause of CMLs static currentRXw/DCCLF C2CCMOS DIV2CMLDIV2CML DIV2CMOS DIV2DQCAWCK4_
107、DQWCK2_CMLCK4_CAWCK2WCK4HF pathLF pathLF(CMOS_EVC)FREQPrev.Work 1HF(CML)fLFfHF30.3:A 24Gb 42.5Gb/s GDDR7 DRAM with Low-Power WCK Distribution,an RC-optimized Dual-Emphasis TX,and Voltage/Time-Margin-Enhanced Power Reduction 2025 IEEE International Solid-State Circuits Conference12 of 40WCK Architect
108、ureModeBufferLow-Freq(LF)CMOS_EVCHigh-Freq Low-Power(HF_LP)CMLCMOS_EVC Less power,poor PSIJ(SE clocking)High-Freq(HF)CMLRXw/DCCLF C2CCMOS DIV2CMLDIV2HF_LP C2CCML DIV2CMOS DIV2DQCAWCK4_DQWCK2_CMLWCK2_CMOSCK4_CAWCK2WCK4HF pathHF_LP pathLF path30.3:A 24Gb 42.5Gb/s GDDR7 DRAM with Low-Power WCK Distribu
109、tion,an RC-optimized Dual-Emphasis TX,and Voltage/Time-Margin-Enhanced Power Reduction 2025 IEEE International Solid-State Circuits Conference13 of 40WCK ArchitectureRXw/DCCLF C2CCMOS DIV2CMLDIV2HF_LP C2CCML DIV2CMOS DIV2DQCAIVC regionWCK4_DQWCK2_CMLWCK2_CMOSCK4_CAWCK2WCK4HF pathHF_LP pathLF pathMod
110、eBufferLow-Freq(LF)CMOS_EVCCMOS_IVC Better PSIJHigh-Freq Low-Power(HF_LP)CMLCMOS_IVC Less powerHigh-Freq(HF)CML*IVC:Internal supply voltage30.3:A 24Gb 42.5Gb/s GDDR7 DRAM with Low-Power WCK Distribution,an RC-optimized Dual-Emphasis TX,and Voltage/Time-Margin-Enhanced Power Reduction 2025 IEEE Inter
111、national Solid-State Circuits Conference14 of 40RXw/DCCLF C2CCMOS DIV2CMLDIV2HF_LP C2CCML DIV2CMOS DIV2DQCAIVC regionWCK4_DQWCK2_CMLWCK2_CMOSCK4_CAWCK2WCK4HF pathHF_LP pathLF pathWCK Architecture CMOS_IVC Smaller power(HF_LP)and better PSIJ(LF)LF(CMOS_EVC)FREQPrev.Work 1HF(CML)fLFfHF30.3:A 24Gb 42.5
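The mode table on the WCK Architecture slides can be read as a frequency-based path selection. The following is only a minimal sketch under stated assumptions: the slides do not give the switching thresholds, so `f_lf_ghz` and `f_hf_lp_max_ghz` are hypothetical parameters, and the function name `select_wck_mode` is invented for illustration. The buffer types per mode follow the table on slide 13.

```python
from enum import Enum


class WckMode(Enum):
    LF = "Low-Freq"
    HF_LP = "High-Freq Low-Power"
    HF = "High-Freq"


# Buffer type per mode, as listed in the slide 13 table.
BUFFER_BY_MODE = {
    WckMode.LF: "CMOS_IVC",
    WckMode.HF_LP: "CMOS_IVC",
    WckMode.HF: "CML",
}


def select_wck_mode(freq_ghz: float, f_lf_ghz: float, f_hf_lp_max_ghz: float) -> WckMode:
    """Pick a WCK path from the clock frequency.

    Both thresholds are hypothetical; the slides only show fLF and fHF
    as marks on a frequency axis.
    """
    if freq_ghz <= f_lf_ghz:
        return WckMode.LF
    if freq_ghz <= f_hf_lp_max_ghz:
        return WckMode.HF_LP
    return WckMode.HF
```

With illustrative thresholds of 2 GHz and 4 GHz, a 3 GHz clock would take the HF_LP path and therefore a CMOS_IVC buffer rather than a CML buffer with static current.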
Slide 15 of 40: Outline (repeated)

Slide 16 of 40: Transmitter and Receiver
TX: AC Equalizer (ACEQ) + Feed-Forward Equalizer (FFE)
RX: CTLE + DFE
*CTLE: Continuous-Time Linear Equalizer / DFE: Decision Feedback Equalizer
[Figure: TX (T-coil, Main, FFE, ACEQ, on-chip EQ) and RX (CTLE, x4, DFE) block diagram]

Slide 17 of 40: Transmitter
TX: AC Equalizer (ACEQ) + Feed-Forward Equalizer (FFE)
RC-optimization: Shared-R Source-Series Terminated (SST) driver
Dual-emphasis: ACEQ using DRAM Metal-Oxide-Metal (MOM) capacitor
[Figure: same TX and RX block diagram]

Slide 18 of 40: Transmitter: SST Driver
BP and ACT resistor
  High sheet resistance of ACT resistor
  Needs small area
  Sensitive to process variation
[Figure: TX block diagram; Rout vs. Code with BP and ACT resistors (previous work [1])]

Slide 19 of 40: Transmitter: SST Driver
Only BP resistor
  Less sensitive to process variation
  Increased area by low sheet resistance (x1/10)
[Figure: TX block diagram; Rout vs. Code for previous work [1] (BP, ACT) and for BP only]
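The area penalty of a BP-only resistor follows from the standard sheet-resistance relation R = Rs * (L / W). The sketch below is illustrative only: the absolute sheet resistance, target resistance, and width are hypothetical values chosen for the example, and only the 1/10 ratio between the BP and ACT sheet resistances comes from the slide.

```python
def resistor_area_um2(r_target_ohm: float, sheet_res_ohm_per_sq: float, width_um: float) -> float:
    """Area of a rectangular resistor, from R = Rs * (L / W)."""
    squares = r_target_ohm / sheet_res_ohm_per_sq  # number of squares, L / W
    length_um = squares * width_um
    return length_um * width_um


ACT_SHEET_RES = 1000.0             # hypothetical value, ohm/sq
BP_SHEET_RES = ACT_SHEET_RES / 10  # slide: BP sheet resistance is x1/10

# Same target resistance and width for both resistor types (hypothetical values).
area_act = resistor_area_um2(240.0, ACT_SHEET_RES, 1.0)
area_bp = resistor_area_um2(240.0, BP_SHEET_RES, 1.0)
area_ratio = area_bp / area_act
```

At equal width and equal target resistance, a tenfold lower sheet resistance needs ten times the length, so `area_ratio` comes out at about 10. This is the area increase that the BP-only driver has to absorb.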
Slide 20 of 40: Transmitter: RC-optimized SST Driver
Only BP resistor + shared R
  Less sensitive to process variation
  Increased area by low sheet resistance (x1/10) -> reduced area and capacitance
[Figure: TX block diagram; Rout vs. Code for previous work [1] (BP, ACT) and this work (BP)]

Slide 21 of 40: Transmitter: ACEQ using DRAM MOM Capacitor
MOS capacitor
  High capacitance per unit area
  Capacitance variation across voltages
[Figure: TX block diagram; capacitance vs. voltage]

Slide 22 of 40: Transmitter: ACEQ using DRAM MOM Capacitor
MOS capacitor (bi-direction)
  High capacitance per unit area
  Less capacitance variation across voltages
  Needs an ESD protection device because of its exposed gate
[Figure: TX block diagram; capacitance vs. voltage]

Slide 23 of 40: Transmitter: ACEQ using DRAM MOM Capacitor
MOM capacitor
  Lower capacitance per unit area than the MOS capacitor, but still enough
  Stable capacitance across voltages
  No SI degradation caused by an ESD protection device
[Figure: TX block diagram; capacitance vs. voltage]
Slide 24 of 40: Outline (repeated)

Slide 25 of 40: Techniques for High-Density
GDDR's core access time is the shortest due to its speed.
  Limited length of GIO line to support high-frequency operation
Increased density -> increased GIO length
  Higher GIO loading causes Fmax degradation and an increase in current.
  Techniques are required to enhance voltage margin and optimize power.
[Figure: chip floorplan with DQ/WCK and CA blocks and GIO lines]

Slide 26 of 40: Techniques for High-Density: Cross-coupled PMOS
Increased GIO length -> decreased voltage margin
Voltage margin is enhanced by cross-coupled PMOS.
[Figure: LIO/LIOB and GIO/GIOB voltage waveforms over time (V_LIO, V_LSA), compared against the case w/o XC-PMOS; LSA circuit with LIO, LIOB, GIO, GIOB, VLIO, VLSA, EN_LSA, LSA_SRC, GIO SW, near and far cells; callouts: Increased LIO/dGIO, Split TR size, Split signal timing]

Slide 27 of 40: Techniques for High-Density: GIO Switch
Increased GIO length -> increased power
Core power in 4W and 4R is reduced by 5.5% and 4.9% by GIO switches.
[Figure: IOSA, GIO line, GIO SW, and LSA for near and far cells]

Slide 28 of 40: Techniques for High-Density: Split Timing
Longer activated duration in near cell by RC delay
Unnecessary current consumption
[Figure: total I_LSA vs. GIO length; EN_LSA, EN_LSA_FAR, and EN_LSA_NEAR timing (tEN_LSA) for far and near cells; same LSA circuit as slide 26]

Slide 29 of 40: Techniques for High-Density: Split Timing
Reduced activated duration in near cell by split timing
Minimizing unnecessary current consumption
[Figure: total I_LSA vs. GIO length; EN_LSA, EN_LSA_FAR, and EN_LSA_NEAR timing for far and near cells; same LSA circuit as slide 26]
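The split-timing idea can be illustrated with a toy on-time model. This is a sketch under stated assumptions, not the circuit's actual behavior: it assumes each local sense amplifier only needs to stay enabled for its own GIO RC delay plus a fixed sensing time, and that current scales with enabled duration. The delay and sensing values are hypothetical.

```python
def lsa_on_time_ps(gio_rc_delay_ps: int, sense_time_ps: int) -> int:
    """Enable window one amplifier needs: its own RC delay plus sensing time."""
    return gio_rc_delay_ps + sense_time_ps


def total_on_time_ps(near_delay_ps: int, far_delay_ps: int, sense_time_ps: int,
                     split_timing: bool) -> int:
    """Summed enable duration of one near and one far amplifier."""
    far_window = lsa_on_time_ps(far_delay_ps, sense_time_ps)
    if split_timing:
        # Separate EN_LSA_NEAR: the near amplifier gets its own, shorter window.
        near_window = lsa_on_time_ps(near_delay_ps, sense_time_ps)
    else:
        # Single EN_LSA sized for the far cell: the near amplifier stays on too long.
        near_window = far_window
    return near_window + far_window


# Hypothetical numbers: 20 ps near delay, 100 ps far delay, 200 ps sensing time.
common = total_on_time_ps(20, 100, 200, split_timing=False)
split = total_on_time_ps(20, 100, 200, split_timing=True)
```

In this toy case the shared enable keeps the two amplifiers on for 600 ps in total, while split timing needs 520 ps; the difference is the near cell's unnecessary active time.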
Slide 30 of 40: Techniques for High-Density: Split TR Size
Longer activated duration in near cell by RC delay
Unnecessary power consumption
[Figure: LIO/LIOB, EN_LSA, and LSA_SRC waveforms and charge Q against QMIN over time for far and near cells; same LSA circuit as slide 26]

Slide 31 of 40: Techniques for High-Density: Split TR Size
Reduced bias current in the near cell by split TR size
Minimizing unnecessary power consumption
[Figure: LIO/LIOB, EN_LSA, and LSA_SRC waveforms and charge Q against QMIN over time for far and near cells, with a smaller TR; same LSA circuit as slide 26]

Slide 32 of 40: Outline (repeated)

Slide 33 of 40: Proposed On-Chip EVC Detector
Different paths for input and strobe -> timing skew
At high VDD: leading strobe signal -> setup margin (Tsetup) limit
At low VDD: lagging strobe signal -> hold margin (Thold) limit

Slide 34 of 40: Proposed On-Chip EVC Detector
Same points as slide 33, plus:
Increased timing margin by adjusting T in EVC detector
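The setup and hold limits described for the EVC detector can be sketched with a simple timing model. Everything numeric here is an assumption for illustration: the unit interval, the required setup and hold times, and the linear skew model in `strobe_skew_ps` are hypothetical, and the compensation simply applies a delay that cancels the modeled skew. Only the direction of the effect (leading strobe at high VDD, lagging at low VDD) and the 1.2V nominal supply come from the slides.

```python
VDD_NOMINAL_MV = 1200  # nominal supply from the Chip Implementation slide


def strobe_skew_ps(vdd_mv: int) -> int:
    """Hypothetical skew model: strobe leads (negative) at high VDD, lags at low VDD."""
    return -(vdd_mv - VDD_NOMINAL_MV) // 10


def margins_ps(ui_ps: int, strobe_ps: int, t_setup_ps: int, t_hold_ps: int) -> tuple[int, int]:
    """Setup and hold margins for a strobe placed strobe_ps after the data edge."""
    setup_margin = strobe_ps - t_setup_ps
    hold_margin = (ui_ps - strobe_ps) - t_hold_ps
    return setup_margin, hold_margin


def worst_margin_ps(vdd_mv: int, compensate: bool,
                    ui_ps: int = 100, t_setup_ps: int = 30, t_hold_ps: int = 30) -> int:
    """Smaller of the two margins, with or without a VDD-dependent delay adjustment."""
    skew = strobe_skew_ps(vdd_mv)
    delta_t = -skew if compensate else 0  # adjustment chosen from the detected VDD
    strobe = ui_ps // 2 + skew + delta_t
    return min(margins_ps(ui_ps, strobe, t_setup_ps, t_hold_ps))
```

In this model a high supply (for example 1300 mV) moves the strobe early and the setup margin becomes the limit; a low supply moves it late and the hold margin becomes the limit. Re-centering the strobe restores the worst-case margin in both cases.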
Slide 35 of 40: Outline (repeated)

Slide 36 of 40: Chip Implementation
Process           DRAM 1b nm
Supply voltage    1.2V
Data rate         42.5Gb/s at 1.2V, 38.8Gb/s at 1.1V
Package type      266B FBGA
Density           24Gb/PKG
Chip size         38.85mm2/die

Slide 37 of 40: Measurement Results: tCK Shmoo
42.5Gbps data rate at 1.2V (38.8Gbps at 1.1V)
[Figure: shmoo plot of PASS (P) and FAIL (F) results against tWCK (ps), with 42.5Gb/s at 1.2V and 38.8Gb/s at 1.1V marked]
Slide 38 of 40: Measurement Results: Power Consumption
4W/4R: reduced by 9.1%/15.5%
Power efficiency: improved by 36%/13.6% over G6 [3] / Prev. G7 [1]
[Figure: IDDQ4W and IDDQ4R current/PKG (mA) vs. data rate (Gb/s) for HF and HF_LP modes, with fHF marked and reductions of 9.1% and 15.5%; power efficiency of G6 [3], G7 [1], and this work, with reductions of 36.0% and 13.6%]

Slide 39 of 40: Outline (repeated)

Slide 40 of 40: Conclusion
A 24Gb GDDR7 is implemented with 1b DRAM technology.
  The fastest DRAM I/O speed (42.5Gbps)
For low power and high data rate
  Low-power WCK distribution
  RC-optimized dual-emphasis transmitter: (1) 1-TR and shared-R SST driver (2) ACEQ using DRAM MOM cap
  Optimized circuits on cell array: (1) GIO switches (2) Split timing (3) Split transistor size
For enhancing voltage/time-margin
  For voltage margin: cross-coupled PMOS in LSA
  For time margin: on-chip EVC detector

30.4: A 16Gb 12.7Gb/s/pin LPDDR5-Ultra-Pro DRAM with 4-Phase Self-Calibration and AC-Coupled Transceiver Equalization in a 5th-Generation 10nm DRAM Process
2025 IEEE International Solid-State Circuits Conference

Slide 1 of 27: A 16Gb 12.7Gb/s/pin LPDDR5-Ultra-Pro DRAM with 4-Phase Self-Calibration and AC-Coupled Transceiver Equalization in a 5th-Generation 10nm DRAM Process
Jin-Hyeok Baek, Jang-Hoo Kim, Yoo-Chang Sung, Jae-Woo Jeong, Jin-Kwan Park, Hyun-Kyu Oh, Bo-Hyeon Lee, Dong-Wan Ko, Tae-Seob Oh, Seung-Gi Hong, Chang-Ki Kwon, Daihyun Lim, Myeong-O Kim, Seung-Jun Bae, Tae-Young Oh, Sang-Jun Hwang
Samsung Electronics, Hwasung, Korea
Slide 2 of 27: Outline
Introduction
Key schemes
  4-phase Self-Calibration
  AC-Coupled Transceiver
Measurements
Conclusion

Slide 3 of 27: Outline (repeated)

Slide 4 of 27: Introduction
High-speed I/O and high-capacity requirement
[Figure: memory bandwidth requirements; data rate (Gb/s/pin) by year across LPDDR4X, LPDDR5, and LPDDR5X: 4.2Gb/s/pin, 6.4Gb/s/pin, 8.5Gb/s/pin, 9.6Gb/s/pin, 10.7Gb/s/pin, and this work (12.7Gb/s/pin)]
[Figure: LPCAMM2 with multi-rank DRAM; channel between TX/RX and the DRAM transceiver; big loadings (4-rank)]
*LPCAMM2: Low-power compression-attached memory module 2

Slide 5 of 27: Introduction
LPDDR5X adopts internal 4-phase signaling.
The quality of the 4-phase clocks significantly impacts I/O margin.
Ideally, 90-degree phase difference.
[Figure: WCK/WCKB through the WCK buffer and /2 divider to the internal 4-phase WCK (WCK_0, WCK_90, WCK_180, WCK_270); tWCK; eye diagrams for zero and nonzero phase skew]

Slide 6 of 27: Outline (repeated)

Slide 7 of 27: Key schemes
Overall configuration of the proposed LPDDR5X
  Calibrating internal 4-phase clocks
  Reducing the ISI in I/O

Slide 8 of 27: Outline (repeated)
Slide 9 of 27: 4-phase Self-Calibration
The 4-phase clocks can be distorted due to non-idealities such as transistor and layout mismatches.
These timing skews degrade the quality of the eye margin.
[Figure: internal clock path with phase skew between the clocks]

Slide 10 of 27: 4-phase Self-Calibration
Conventional DCA is limited in correcting the 4-phase skews.
To remove remaining skews, additional correction is needed.
  The duty cycle is adjusted by DCA.
  The phase difference between 0-180 and 90-270 cannot be adjusted.
*DCA: Duty Cycle Adjuster
[Figure: clock waveforms with DCA = 0 and with nonzero DCA]

Slide 11 of 27: 4-phase Self-Calibration
The monitoring path also has its own non-idealities.
This offset from non-idealities must be canceled out to improve correction accuracy.
[Figure: phase skew monitoring path and the clocks requiring correction]

Slide 12 of 27: 4-phase Self-Calibration
The flip mux outputs clock combinations depending on the calibration state.

Flip mux output by calibration mode and unflip/flip state:
Calibration mode    0-180                   90-270
State               unflip      flip        unflip      flip
MUX_A               Phase0      Phase180    Phase90     Phase270
MUX_B               Phase90     Phase270    Phase180    Phase0
MUX_C               Phase90     Phase270    Phase180    Phase0
MUX_D               Phase180    Phase0      Phase270    Phase90
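The flip mux table on slide 12 can be written as a lookup keyed by calibration mode and state. This is a small sketch for reading the table, with phases expressed in degrees; the dictionary name `FLIP_MUX` and the helper `flip_mux_output` are hypothetical names introduced for the example.

```python
# (calibration mode, state) -> clock phase in degrees on each mux output,
# transcribed from the slide 12 table.
FLIP_MUX = {
    ("0-180", "unflip"): {"MUX_A": 0, "MUX_B": 90, "MUX_C": 90, "MUX_D": 180},
    ("0-180", "flip"): {"MUX_A": 180, "MUX_B": 270, "MUX_C": 270, "MUX_D": 0},
    ("90-270", "unflip"): {"MUX_A": 90, "MUX_B": 180, "MUX_C": 180, "MUX_D": 270},
    ("90-270", "flip"): {"MUX_A": 270, "MUX_B": 0, "MUX_C": 0, "MUX_D": 90},
}


def flip_mux_output(mode: str, state: str) -> dict[str, int]:
    """Return the phase routed to each mux output for a mode and state."""
    return FLIP_MUX[(mode, state)]
```

One pattern visible in the table is that, within each calibration mode, the flip state routes every output to the phase 180 degrees away from its unflip phase, so MUX_A and MUX_D simply swap.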
Slide 13 of 27: 4-phase Self-Calibration
The phase skew behaves as a differential offset.
[Figure: flip mux routing and CPIN-CPINB in each state. Unflip state: Phase0 on MUX_A, Phase180 on MUX_D, high duty. Flip state: Phase180 on MUX_A, Phase0 on MUX_D, low duty.]

Slide 14 of 27: 4-phase Self-Calibration
The phase skew behaves as a common-mode offset.
[Figure: flip mux routing and CPIN-CPINB in each state. Unflip state: Phase0 on MUX_A, Phase180 on MUX_D, high duty. Flip state: Phase180 on MUX_A, Phase0 on MUX_D, high duty.]

Slide 15 of 27: 4-phase Self-Calibration
The duty-cycle error is converted into a voltage level.
[Figure: FOUT-FOUTB (V) for UNFLIP and FLIP; callouts: Error is canceled out, Phase skew is detected]

Slide 16 of 27: 4-phase Self-Calibration
The codes for unflip and flip are averaged.
The same calibration process applies to phase 90 and 270.
[Figure: COMPOUT over a code sweep from negative to positive codes (111000000000000111); Unflip code 000100, Flip code 010000, AVG code 000001; annotations +4, -2, +1]
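One way to read slides 13 to 16 is that the detector sees the clock skew with a sign that follows the flip state, plus a monitoring-path offset whose sign does not change, so averaging the two nulling codes cancels the offset. The sketch below models that reading with signed integer codes. The sign convention, the code range, and the function names are assumptions; the example values (skew 1, offset 3) were picked only because they give codes of +4, -2, and +1, which resemble the annotations on slide 16 if those correspond to unflip, flip, and average.

```python
CODES = range(-8, 9)  # hypothetical signed correction-code range


def detector_output(true_skew: int, code: int, monitor_offset: int, flipped: bool) -> int:
    """Detector reading: residual skew (sign follows the flip state) plus monitor offset."""
    residual = true_skew - code
    seen = -residual if flipped else residual
    return seen + monitor_offset


def null_code(true_skew: int, monitor_offset: int, flipped: bool) -> int:
    """Sweep the codes and keep the one that brings the detector closest to zero."""
    return min(CODES, key=lambda c: abs(detector_output(true_skew, c, monitor_offset, flipped)))


def calibrated_code(true_skew: int, monitor_offset: int) -> tuple[int, int, int]:
    """Return (unflip code, flip code, averaged code)."""
    unflip = null_code(true_skew, monitor_offset, flipped=False)
    flip = null_code(true_skew, monitor_offset, flipped=True)
    return unflip, flip, (unflip + flip) // 2
```

In this model the unflip sweep settles at skew + offset and the flip sweep at skew - offset, so the average recovers the skew alone regardless of the offset value.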
Slide 17 of 27: 4-phase Self-Calibration
Manual trimming of phase skews can increase the test time.
The proposed self-calibration operates automatically after power-up.
Sequence: Power-up -> DRAM Initialization -> Self-Calibration -> Training (incl. DCM/DCA)
Self-calibration steps shown (operates automatically):
  Calibrate the phase between 0 and 180
  Calibrate the phase between 90 and 270
  Operate internal osc.
  Generate codes and apply them to the circuit
*DCA: Duty Cycle Adjuster
*DCM: Duty Cycle Monitor

Slide 18 of 27: Outline (repeated)

Slide 19 of 27: AC-Coupled Transceiver
Fully-differential cascode amplifier with an ACCB.
The ACCB prevents the CML divider from failing due to small swings at high frequencies.
[Figure: WCK buffer and its output with ACCB = off and ACCB = on]

Slide 20 of 27: AC-Coupled Transceiver
Auxiliary input transistors help the HF operation of the SA.
Superimposed outputs, OUT and OUTB, can be rapidly charged and discharged.
[Figure: DQ Rx; write eye with ACCE off and ACCE on; ISI is compensated]
Slide 21 of 27: AC-Coupled Transceiver
The phase splitter is used to reduce power consumption.
The operating bandwidth of READ is extended with ACCP.
[Figure: DQ Tx with main signal and ACCP paths; read eye for SOCAMM 512GB, 4-rank, 6.7Gbps with ACCP off and ACCP on (30 tick, 31 tick); slope improvement]

Slide 22 of 27: Outline (repeated)

Slide 23 of 27: Measurements
The 1-sigma distribution of the 4-phase skew is reduced by 33.7%.
[Figure: skew distribution with self-calibration and without self-calibration]

Slide 24 of 27: Measurements
Eye opening at 12.7Gbps and VDD2H = 1.05V

Slide 25 of 27: Measurements
VDD2H vs. tWCK Shmoo
[Figure: chip photo with DQ0-7, CA, DQ8-15, and two cell areas]

Slide 26 of 27: Outline (repeated)

Slide 27 of 27: Conclusion
4-phase Self-Calibration
  Employs flip and unflip technique: enables accurate correction
  Operates automatically: reduces the test time (no manual trimming)
AC-Coupled Transceiver
  ACCB, ACCE, and ACCP: extend I/O bandwidth
4-phase skew is reduced by 33.7%.
A 12.7Gb/s/pin LPDDR5X DRAM is achieved.

30.5: A 321-Layer 2Tb 4b/cell 3D-NAND-Flash Memory with a 75MB/s Program Throughput
2025 IEEE International Solid-State Circuits Conference

Slide 1 of 25: A 321-Layer 2Tb 4b/cell 3D-NAND-Flash Memory with 75MB/s Program Throughput
Wanik Cho (1), Chanhui Jeong (1), Jongwoo Kim (1), Jongseok Jung (1), Keunseon Ahn (1), Jayoon Goo (1), Sangkyu Lee (1), Kayoung Cho (1), Tei Cho (1), Dauni Kim (1), Gwan Park (1), Yushin Ahn (1), Sooyeol Chai (1), Gwihan Ko (1), Sunyoung Jung (1), Eunwoo Jo (1), Taehun Park (1), Jinhyun Ban (1), Cheoljoong Park (1), Jae Hyun Park (1), Sanghoon Oh (1), Sojin Jeong (1), Youngjun Kwak (1), Kyungsoo Jeong (1), Jinyeop Kim (1), Minchol Shin (1), Eunho Yang (1), Taisik Shin (1), Youngil Kim (1), Jiseong Mun (1), Chanyang Ryu (1), Huihyeon Park (1), Changwan Ha (2), Jong Tai Park (2), Peng Zhang (3), Sooyong Park (2), Rezaul Haque (3), Hang Tian (2), Sunghwa Ok (1), Wonbeom Choi (1), Junyoun Lim (1), Dongkyu Yoon (1), Sechun Park (1), Wonsun Park (1), Kichang Gwon (1), Seungpil Lee (1), Hwang Huh (1), Woopyo Jeong (1), Jungdal Choi (1)
(1) SK hynix, Icheon, Korea; (2) SK hynix, San Jose, CA; (3) SK hynix, Rancho Cordova, CA

Slide 2 of 25: Outline
Introduction
Place & Routing for 2Tb/6-plane Arch.
Method To Reduce Program Disturbance
Consecutive Bias Generator across Temperatures
Area/CIO Reduction on Tx Interface Circuit
Conclusion

Slide 3 of 25: Outline (repeated)

Slide 4 of 25: Chip Architecture
[Figure: two row decoders and six planes (Plane0 to Plane5), each a 333Gb array]

Slide 5 of 25: Key Parameter
Capacity        2Tb (4 bits/cell)
Technology      321 WL Layer
Chip Size       68.31mm2
Organization    (16KB+ECC)/Page, 10272 Pages/Block, 160.5MB/Block, (293+EXT) Blocks/Plane, 6 Plane
Throughput      Read (tR): 80us (Ave.); Program: 75MB/s
I/O Speed       3200Mbps DDR, X8
Power Supply    VCC: 2.35 to 2.75V; VCCQ: 1.2V
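The organization figures in the Key Parameter table can be cross-checked with simple arithmetic. The sketch below assumes that the block size counts only the 16KB data area of each page (the ECC area is excluded) and that 1MB = 1024KB; the function names are introduced only for this example.

```python
PAGE_KB = 16             # data area per page, from the table
PAGES_PER_BLOCK = 10272  # from the table
WL_LAYERS = 321          # from the table


def block_size_mb(page_kb: int, pages_per_block: int) -> float:
    """Block size in MB, assuming 1 MB = 1024 KB and no ECC area."""
    return page_kb * pages_per_block / 1024


def pages_per_wl_layer(pages_per_block: int, wl_layers: int) -> int:
    """Pages in a block per wordline layer (exact division expected)."""
    return pages_per_block // wl_layers


block_mb = block_size_mb(PAGE_KB, PAGES_PER_BLOCK)
pages_per_layer = pages_per_wl_layer(PAGES_PER_BLOCK, WL_LAYERS)
```

Under these assumptions the block size evaluates to 160.5MB, matching the table, and the page count divides evenly into 32 pages per wordline layer.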
Slide 6 of 25: 4b/Cell 3D NAND Comparison
ISSCC                 This Work        Samsung '24      SK hynix '22
Technology            321 WL Layers    280 WL Layers    176 stacked WL
Density               2Tb              1Tb              1Tb
# of plane            6                4                4
Bit Density           28.8Gb/mm2       28.5Gb/mm2       14.8Gb/mm2
tR                    80us             85us             90us
Program Throughput    75MB/s           41MB/s           40MB/s
IO Speed              3200MB/s         3200MB/s         1600MB/s
Slide 7 of 25: Outline (repeated)

Slide 8 of 25: Need for 2Tb-6Plane Architecture
Increased # of dies within limited pkg height: requested for increased NAND density.
Due to increased NAND density, an increased # of planes is requested to enhance seq. performance.
[Figure: die stack-up in a NAND package with limited height for 1TB, 2TB, and 4TB; seq. write performance (MB/s) of 2Tb-6Plane vs. 2Tb-4Plane, 2.04X]

Slide 9 of 25
Problem: VCC IR drop
Increased VCC power line resistance (R1 R2 R2 (6Plane)
[Figure label: VCC pad]

Slide 11 of 25: Challenge & Solution in 6-Plane Architecture (3/3)
Problem: Limited top area available to fit wordline (WL) voltage circuitry for 4 -> 6 planes
How to place it and route the WL voltage lines?
WL voltage lines in 4P: (1) short from PUMP to HV REG, (2) long from HV REG to GW DEC
WL voltage lines in 6P: (1) long from PUMP to HV REG, (2) short from HV REG to GW DEC
Use highly conductive metal.
(1)+(2) resistance is equal b/w 4P and 6P.
HV REG: High Voltage Regulator / GW DEC: Global Wordline Decoder
[Figure: top area with WL voltage line routing in 4P and 6P]

Slide 12 of 25: Outline (repeated)

Slide 13 of 25: Program Disturb Improvement
As the channel is boosted higher, program disturbance is restrained.
Raising the bitline voltage level increases the channel boosting level.
VPGM: program pulse voltage
VCORE: internal cell supply voltage
[Figure: program disturb vs. bitline level (VCORE); program bitline, inhibit bitline, and self boosting]
207、A 321-Layer 2Tb 4b/cell 3D-NAND-Flash Memory with a 75MB/s Program Throughput 2025 IEEE International Solid-State Circuits Conference14 of 25 Higher bitline level(VCORE)helps mitigate program disturb,but it results in an increase in power/currentVCORE is elevated up during program pulse onlyChalleng
208、e and Idea Increased CurrentDuring Read&VerifyConventional VCOREThis Work30.5:A 321-Layer 2Tb 4b/cell 3D-NAND-Flash Memory with a 75MB/s Program Throughput 2025 IEEE International Solid-State Circuits Conference15 of 25 As a result,+18%margin between erase and RV1 cells while other RV cells are unaf
209、fectedResult A This WorkVt DistributionProgram ICCErase-RV1 MarginA:Maintain VCORE(Conventional Method)B:Increase VCORE all the time during PGM#of Cells30.5:A 321-Layer 2Tb 4b/cell 3D-NAND-Flash Memory with a 75MB/s Program Throughput 2025 IEEE International Solid-State Circuits Conference16 of 25Ou
210、tline Introduction Place&Routing for 2Tb/6-plane Arch.Method To Reduce Program Disturbance Consecutive Bias Generator across Temperatures Area/CIOReduction on Tx Interface Conclusion30.5:A 321-Layer 2Tb 4b/cell 3D-NAND-Flash Memory with a 75MB/s Program Throughput 2025 IEEE International Solid-State

Voltage Bias Generator across Temperatures
Problem: the temperature coefficient (TC) is generated from a discrete temperature sensor code, so the output voltage is discontinuous across temperature.
R2 is changed by digital codes.
[Figure: output voltage (mV) versus temperature, cold to hot; the discrete code values cause jumps in the output]
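
The discontinuity described here can be reproduced with a small numerical sketch. The base voltage, temperature coefficient, and 10 C code step are hypothetical, and the two functions are illustrative stand-ins rather than the circuit on the slide.

```python
def discrete_bias_mv(temp_c, base_mv=1000.0, tc_mv_per_c=-2.0, code_step_c=10.0):
    """Bias whose temperature term follows a quantized sensor code."""
    code = int(temp_c // code_step_c)  # hypothetical sensor: one code per 10 C
    return base_mv + tc_mv_per_c * code * code_step_c


def continuous_bias_mv(temp_c, base_mv=1000.0, tc_mv_per_c=-2.0):
    """Bias whose temperature term tracks the temperature directly."""
    return base_mv + tc_mv_per_c * temp_c


def largest_jump_mv(bias_fn, temps):
    """Largest change in output between neighbouring temperature points."""
    values = [bias_fn(t) for t in temps]
    return max(abs(b - a) for a, b in zip(values, values[1:]))


temps = [t * 0.5 for t in range(-80, 251)]  # -40 C to 125 C in 0.5 C steps
print("discrete jump (mV):  ", largest_jump_mv(discrete_bias_mv, temps))
print("continuous jump (mV):", largest_jump_mv(continuous_bias_mv, temps))
```

With these numbers the code-driven output moves in steps of one full code interval, while the continuous output changes only by the slope times the sweep step.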

Consecutive Voltage Bias Generator (1/2)
A fully differential amplifier is used for an accurate and continuous voltage bias.
CTAT: complementary-to-absolute-temperature coefficient
PTAT: proportional-to-absolute-temperature coefficient
CMFB: common-mode feedback
[Figure: bias generator circuit; the accompanying equation involving VBE is not recoverable from the extracted text]

Consecutive Voltage Bias Generator (2/2)
OUTC (or OUTP) provides a continuous bias across temperature through a two-step process.
Bias Adjustment Flow
Step 1: At hot temperature, equalize the CTAT and PTAT levels.
Step 2: At cold temperature, sweep the R2/R1 ratio (by changing R2) to reach the desired OUTC level.
[The expressions for OUTC and OUTP and the per-step conditions are not recoverable from the extracted text]

NAND Tx Interface Trend
Both low-tap termination (LTT) and center-tap termination (CTT) are still required for interoperability.
As the data transfer rate increases, IO capacitance (CIO) must be minimized, but leakage in the Tx circuit increases.

Tx driver (dual interface)   LTT    CTT
Pull-up                      NMOS   PMOS
Pull-down                    NMOS   NMOS

[Figure: IO capacitance (pF) at 1600MB/s, 2400MB/s and 3200MB/s]

Power Gating Method in Tx Interface
Power gating is necessary to reduce leakage current, but CIO and area must also be reduced.
The power-gating header is added to the pull-up NMOS only, in order to decrease CIO and area.
[Figure: conventional versus this work]

Result
Tx interface area: -55%
Tx IO capacitance: -20%

Conclusion
A 321-layer 2Tb 4b/cell 3D-NAND flash memory with 75MB/s program throughput has been developed successfully.
The 6-plane architecture is achieved by decreasing the VCC IR drop and improving the line resistance between the pump and the global wordline decoder.
The BL voltage is increased during the program pulse to lessen program disturb, which is reduced by 18% as a result.
A fully differential amplifier is used to generate a continuous voltage across temperature.
Removing the pull-up PMOS header in the Tx interface circuit led to a 55% decrease in area and a 20% reduction in CIO.

30-6: A 64Gb DDR4 STT-MRAM Using a Timing-Controlled Discharge-Reading Scheme for a 0.001681µm² 1Selector-1MTJ Cross-Point Cell
2025 IEEE International Solid-State Circuits Conference
Kosuke Hatsuda1, Katsuhiko Hoya1, Ryousuke Takizawa1, Fumiyoshi Matsuoka1, Takaya Yasuda1, Akira Katayama1, Tadashi Miyakawa1, Kazuyo Senju1, Kazuki Okawa1, Yuka Furukawa1, Yu Shimada1, Katsuya Kotake1, Sayaka Hirokawa1, Min Chul Shin2, Dong Keun Kim2, Tae Ho Kim2, Kyunghoon Kim2, Hisanori Aikawa3, Jeonghwan Song2, Toshihiko Nagase3, Soo Man Seo2, Soo Gil Kim2, Jaeyun Yi2, Seon Yong Cha2
1 KIOXIA Corp., Yokohama, Japan; 2 SK hynix Inc., Icheon, Korea; 3 KIOXIA Korea Corp., Seoul, Korea

Outline
Introduction
  Issues for high density MRAM
Novel Read scheme
  Timing-controlled discharge read (TCDR) scheme
  Local capacitance (LC) mode
Evaluation results
Summary

Introduction
Demand for new applications: AI, big-data processing, storage class memory (SCM)
MRAM is one of the candidates.
High-density MRAM has not yet been achieved. Why?
The largest reported so far was 4Gb (ISSCC 2017).
The challenging items are:
1) Shrinking the cell size
2) Reliably reading the small signal
[Figure: memory landscape of SRAM, DRAM, SCM and Flash by speed (10ns to 1us, faster) and capacity (100Mb to 1Tb, larger)]

Smallest cross-point 1S1M cell & array
Cell: selector and MTJ between WL and BL, 41nm, 0.001681µm² (H. Aikawa et al., IEDM 2024)
[Figure: cell array with WL (4K) and BL (2K) over peripheral under cells (PUC): WLSW, BLSW, S/A, W/D and logic]
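
The cell-size figure can be cross-checked with simple arithmetic. The sketch assumes that the 41nm label is the edge length of a square cell, that WL (4K) and BL (2K) give the dimensions of one array tile, that each cell stores one bit in a single layer, and that 64Gb uses binary prefixes; none of these readings is confirmed by the slide text.

```python
CELL_EDGE_NM = 41            # slide label, read here as the cell edge length
WORDLINES = 4 * 1024         # "WL(4K)", assumed per array tile
BITLINES = 2 * 1024          # "BL(2K)", assumed per array tile
CAPACITY_BITS = 64 * 2**30   # 64Gb with binary prefixes (assumption)

cell_area_nm2 = CELL_EDGE_NM ** 2
cell_area_um2 = cell_area_nm2 / 1e6           # 1 um^2 = 1e6 nm^2
cells_per_tile = WORDLINES * BITLINES
tiles_needed = CAPACITY_BITS // cells_per_tile
raw_cell_area_mm2 = CAPACITY_BITS * cell_area_nm2 / 1e12  # 1 mm^2 = 1e12 nm^2

print(f"cell area: {cell_area_nm2} nm^2 = {cell_area_um2:.6f} um^2")
print(f"cells per tile: {cells_per_tile}, tiles for 64Gb: {tiles_needed}")
print(f"raw cell area for 64Gb: {raw_cell_area_mm2:.1f} mm^2")
```

Under these assumptions 41nm squared gives exactly the 0.001681µm² quoted in the title, and the raw cell footprint excludes any array overhead.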

Characteristics for Selector and 1S1M
[Figure: I-V curve of the selector, showing off-state, on-state, Vth, Vhold and Ihold; I-V curve of selector + MTJ (1S1M), showing off-state, on-state, P-state and AP-state]

Issue of read operation for 1S1M cell
Constant Current (CC): signal instability
Constant Voltage (CV): read disturb
[Figure: voltage-versus-time and I-V trajectories through steps (1) to (3) for CC and CV reads, marked with Vth, Vhold, Ihold, Ir_lim, Vread and Iread; one label reads Iread = Ir_lim]

Timing-controlled discharge read (TCDR)
Attribute in reading cells (signal stability and read disturb):
CC: unstable (oscillation)
CV: large current flow
TCDR: allowable current and stable
[Figure: BL (AP-state), BL (P-state) and WL voltage versus time through steps (1) to (3), labelled discharge and stop; I-V trajectory marked with Vth, Ir_lim and Ihold for the P and AP states]

Timing-controlled discharge read (TCDR)
Step (1): Pre-charge and floating
Step (2): Pull-down
Step (3): Pull-up; the current is stopped by pulling up WL
[Figures, one per step: read path with PRCH, SINK, WLSW, BLSW, buffer, driver, amp and array, with the on/off state of each switch; WL, BL (AP-state) and BL (P-state) voltage versus time marked with Vth and Vhold; I-V trajectory marked with Vth, Ir_lim and Ihold for the P and AP states]
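
The idea of stopping the discharge at a chosen time can be sketched with a plain RC model. This is a deliberate simplification and not the authors' circuit: it ignores the selector threshold, Vhold and Ir_lim, treats the cell as a fixed resistor, and uses hypothetical values for the P and AP resistances and the bitline capacitance.

```python
import math


def bl_voltage(t_ns, r_cell_kohm, c_bl_ff, v_start=1.0):
    """Voltage left on a floating bitline after discharging through the cell."""
    tau_ns = r_cell_kohm * c_bl_ff * 1e-3  # kOhm * fF = ps, converted to ns
    return v_start * math.exp(-t_ns / tau_ns)


def read_signal(wl_pulse_ns, r_p_kohm=20.0, r_ap_kohm=40.0, c_bl_ff=100.0):
    """AP-minus-P bitline voltage frozen when WL is pulled up after wl_pulse_ns."""
    v_ap = bl_voltage(wl_pulse_ns, r_ap_kohm, c_bl_ff)  # high resistance, slow
    v_p = bl_voltage(wl_pulse_ns, r_p_kohm, c_bl_ff)    # low resistance, fast
    return v_ap - v_p


widths_ns = range(1, 9)  # WL pulse widths to compare, in ns
best_width_ns = max(widths_ns, key=read_signal)

for w in widths_ns:
    print(f"WL pulse {w} ns -> signal {read_signal(w):.3f} (a.u.)")
print("largest signal at", best_width_ns, "ns")
```

In this toy model the separation between the two states first grows and then collapses as both bitlines discharge fully, so the stop time set by the WL pulse determines how much signal is captured.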

Timing-controlled discharge read (TCDR)
Using Vhold of selector: the current is stopped by the selector turning off by itself.
WL timing control (this work): the current is stopped by pulling up WL. This case is advantageous for read margin.
[Figure: BL (AP-state), BL (P-state) and WL voltage versus time through steps (1) to (3) for both cases, with Vhold marked]

Local capacitance (LC) mode
(i) Conventional mode: C = CBL + CGBL
(ii) Local capacitance mode: C = CBL
[Figure: array, BL, BLSW, GBL and amp, with capacitances CBL and CGBL, for both modes; BL and WL voltage and cell current versus time, comparing discharge time and spike current for (i) and (ii)]
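
The effect of the two capacitance settings can be estimated with an RC discharge. The capacitance, resistance and voltage values below are hypothetical, and the cell is again reduced to a fixed resistor, so this is only a rough sketch of why a smaller discharging capacitance helps.

```python
import math


def discharge_time_ns(r_kohm, c_ff, v_start, v_end):
    """Time for an RC discharge from v_start to v_end (kOhm * fF = ps, in ns)."""
    return r_kohm * c_ff * 1e-3 * math.log(v_start / v_end)


def discharged_charge_fc(c_ff, v_start, v_end):
    """Charge that flows through the cell during the discharge (fF * V = fC)."""
    return c_ff * (v_start - v_end)


C_BL_FF = 50.0       # hypothetical local bitline capacitance
C_GBL_FF = 150.0     # hypothetical global bitline capacitance
R_CELL_KOHM = 30.0   # hypothetical cell resistance
V_START, V_END = 1.0, 0.5

modes = {
    "conventional (C = CBL + CGBL)": C_BL_FF + C_GBL_FF,
    "local capacitance (C = CBL)": C_BL_FF,
}
for name, c_ff in modes.items():
    t = discharge_time_ns(R_CELL_KOHM, c_ff, V_START, V_END)
    q = discharged_charge_fc(c_ff, V_START, V_END)
    print(f"{name}: discharge time {t:.2f} ns, charge through cell {q:.1f} fC")
```

With these values, disconnecting the global bitline cuts both the discharge time and the charge through the cell by the ratio of the two capacitances.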

Results for TCDR scheme + LC mode
[Figure: readout BL voltage (a.u.) of the P and AP states versus WL pulse width (1 to 8 ns); dots are measured, lines are simulated]
[Figure: readout voltage variation (a.u.) of the AP and P states versus WL pulse width (1 to 8 ns), with the stop-by-WL and stop-by-selector regions, the "better" direction and the optimum point marked]
[Insets: BL and WL voltage versus time for stop by WL and stop by selector, with Vhold, read voltage and read voltage variation marked]

Read margin for TCDR scheme + LC mode
[Figure: distribution in sigma (-4 to 4) of the P and AP states versus S/A reference voltage (a.u.), comparing local cap. mode with conv. mode]

Summary
Reliable operation of a 64Gb high-density MRAM using the world's smallest cross-point 1S1M cell has been presented.
The proposed timing-controlled discharge reading with local capacitance mode enables read disturb suppression, a high-speed read pulse and a large read margin.
These results demonstrate that MRAM has the potential to enter new markets such as SCM applications and CXL.
64Gb MRAM: circuit implementation capable of supporting 128Gb (2-layer array)
[Figure: chip layout showing the memory array, core control, and peripheral + DDR4 I/F]

Acknowledgements
The authors appreciate the invaluable contributions from all current and former development members of SK hynix and KIOXIA.
Thank you for your attention.