ISSCC 2024 SESSION 13: High-Density Memory and Interfaces

13.1 A 35.4-Gb/s/pin 16-Gb GDDR7 with a Low-Power Clocking Architecture and PAM3 IO Circuitry
2024 IEEE International Solid-State Circuits Conference

Jaehyeok Yang, Hyeongjun Ko, Kyunghoon Kim, Hyunsu Park, Jihwan Park, Ji-Hyo Kang, Jinyoup Cha, Seongjin Kim, Youngtaek Kim, Minsoo Park, Gangsik Lee, Keonho Lee, Sanghoon Lee, Gyunam Jeon, Sera Jeong, Yongsuk Joo, Jaehoon Cha, Seonwoo Hwang, Boram Kim, Sangyeon Byeon, Sungkwon Lee, Hyeonyeol Park, Joohwan Cho, Jonghwan Kim
SK Hynix

Outline: Industry Trends & Issues; GDDR7 Overall Architecture (Major Changes & Placement); Clocking Architecture; TX Implementation; RX Implementation; Measurement Results; Conclusions.

Trend of Graphics Memory
[Figure: speed (Gb/s/pin, up to 40) versus publication year for ISSCC graphics-memory papers from 2006 to 2022, covering GDDR3, GDDR4, GDDR5(X) and GDDR6(X) with NRZ signaling and GDDR7 with PAM3 signaling; this work is marked.]

Issue on High-speed Memory
[Figure: current breakdown (IDD3N, IDD0, IDD4R, IDD4W, IDD2N) for 3DMark TimeSpy_4K at 14 Gbps, split into active and standby; temperature, power and speed over 0 to 15 minutes.]
Performance degrades due to thermal throttling. New PAM3-related blocks inevitably increase absolute power. IDD3N (active standby current) contributes significantly to system power consumption.
Active Stand-by Current Components
[Figure: IDD3N breakdown into CDN_A, CDN_B, C/A, TX/RX, GEN/BIAS and ETC; CDN enable timing over bank idle and bank active (ACT, RD/WR, PRE).]
The clock distribution network (CDN) for TX/RX must be ready for incoming or outgoing data. In the previous design the CDN stays on during active standby; in this design it is off during IDD3N (power saving) and turned on at the RD/WR command (fast wake-up). The work therefore focuses on reducing CDN current in IDD3N.
GDDR7 Architecture: 5 Key Changes
1. WCK_t/_c: common clock inputs for the DQs and CA.
2. Single data rate (SDR) CA, with the number of CA pins reduced from 10 to 5.
3. PAM3/NRZ signaling.
4. 32n prefetch size and 256 bits per array (32n prefetch × 8 DQs = 256 bits).
5. Four separate, independent channels (CHA, CHB, CHC, CHD).
[Figure: die floorplan with CA and DATA periphery, WCK CDN, BANK_UP and BANK_DN, and the DQs of channels A and B (8 DQs plus DQX, Y and E).]

GDDR7 Architecture: Placement
[Figure: pad-row placement with the divided WCK, CA RX units interleaved with power/ground pads, staggered pads, left and right DQ groups, ECC_UP/ECC_DN, PAM3 encoder/decoder, scrambler, and BANK_UP/BANK_DN.]
Placement considerations are, for the C/A periphery, pin-to-pin skew and phase mismatch of the divided WCK, and for the data periphery, logic delay and power (blocks configured as signal repeaters).
Clocking Architecture
[Figure: WCK/WCKB input with a DCC (active feedback), CMOS /2 dividers and phase interpolators, with clocks labeled 8 GHz, 4 GHz and 2 GHz; I_REF and O_REF distributed over four WCK global lines to the local clock generators (LCGs) of DQ0 to DQn, with CML2CMOS conversion, MUXes and SYNC_FLAG; half-rate (2), quad-rate (4) and octa-rate (4) clocks and an external input.]
Multi-phase clocks are generated locally from a single-phase reference. This is advantageous in terms of transmission power and skew, but disadvantageous in terms of wake-up time.
Local Clock Generator
[Figure: a series of latches clocked by WCKD/WCKDB and fed by I_REF/O_REF, with 4-phase generators (I and O) and CML2CMOS converters, producing ICLK0, QCLK90, ICLKB180 and QCLKB270 for the DQ transceiver.]
Edge-aligned multi-phase clocks are taken from each stage of the latch chain. The drawback is that the reference needs a few cycles to pass through the series of latches.
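The latch chain can be illustrated with a small behavioral model. It is a sketch under stated assumptions, not the circuit from the paper: I_REF is assumed to toggle at half the WCKD rate, and consecutive latches are assumed to be transparent on opposite WCKD phases, so each stage adds half a WCKD period, which is 90 degrees of I_REF. Which stage drives which named clock (ICLK0, QCLK90, ICLKB180, QCLKB270) is likewise only illustrative.

```python
# Behavioral sketch of a latch chain producing edge-aligned quadrature phases.
# Assumptions (not from the slide): time advances in half WCKD periods, I_REF
# has a period of 4 half-periods, and each latch adds one half-period of delay.

def iref_wave(n_half_periods):
    """I_REF sampled once per WCKD half-period (low, low, high, high, ...)."""
    return [1 if t % 4 in (2, 3) else 0 for t in range(n_half_periods)]

def latch_chain(ref, n_stages):
    """Return the output of every latch stage; each stage delays by one step."""
    taps, signal = [], ref
    for _ in range(n_stages):
        signal = [0] + signal[:-1]  # latch output starts at its reset value 0
        taps.append(signal)
    return taps

def first_rising_edge(wave):
    for t in range(1, len(wave)):
        if wave[t - 1] == 0 and wave[t] == 1:
            return t
    raise ValueError("no rising edge in window")

def tap_phases_deg(taps):
    """Phase of each tap relative to the first one, in degrees of I_REF."""
    t0 = first_rising_edge(taps[0])
    return [((first_rising_edge(w) - t0) * 90) % 360 for w in taps]

names = ["ICLK0", "QCLK90", "ICLKB180", "QCLKB270"]  # illustrative labels
taps = latch_chain(iref_wave(32), 4)
for name, phase in zip(names, tap_phases_deg(taps)):
    print(name, phase)  # 0, 90, 180, 270 degrees in stage order
```

The model also shows the cost named on the slide: the last tap only becomes valid after the reference has traveled through every stage, which is why wake-up time is listed as a drawback of generating multi-phase clocks locally.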
L.C.G Fast Wake-up (Previous Design)
[Figure: WCK/WCKB distributed through CML /2 and CML2CMOS stages to the 4-phase generators, C2C converters and transceivers of DQ0 to DQn, with stabilization time accumulating from one LCG to the next.]
The reference clock is passed sequentially from the near LCG to the far LCG, so the initialization time is long.

L.C.G Fast Wake-up (This Design)
[Figure: CMOS /2 dividers, phase interpolators and CML2CMOS drive a multi-drop line to all LCGs.]
With multi-drop reference-clock delivery, all LCGs start simultaneously, so the initialization time is short.

C2C Fast Wake-up Implementation
[Figure: C2C input node A settling toward VDD/2 after WCK/WCKB turn on, without and with the direct bridge (D.B).]
A direct path is activated at wake-up for faster DC convergence. The convergence time is reduced by 83%, from 12 to 2, at 32 Gb/s.
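A toy timing model makes the difference between the two reference-delivery schemes concrete. The per-LCG stabilization time t_stab and the count of 11 LCGs are arbitrary placeholders; the model ignores wire delay and captures only the sequential-versus-simultaneous structure described above. The last line checks the 83% figure quoted for the C2C convergence time.

```python
# Toy wake-up model for local clock generators (LCGs); all numbers hypothetical.

def ready_times_sequential(n_lcg, t_stab):
    """Previous design: LCG k gets the reference only after LCG k-1 has
    stabilized and passed it on, so ready times grow with distance."""
    return [(k + 1) * t_stab for k in range(n_lcg)]

def ready_times_multidrop(n_lcg, t_stab):
    """This design: the reference is dropped to all LCGs at the same time."""
    return [t_stab] * n_lcg

def reduction(before, after):
    return (before - after) / before

n_lcg, t_stab = 11, 1.0  # t_stab in arbitrary time units
print("sequential:", max(ready_times_sequential(n_lcg, t_stab)))  # 11.0
print("multi-drop:", max(ready_times_multidrop(n_lcg, t_stab)))   # 1.0
print(f"C2C convergence: {reduction(12, 2):.0%} shorter")          # 83% shorter
```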
Active-standby Current Reduction
[Figure: command sequence (PDE, PDX, ACT, WT, S0 to S15 on CA, DQ and WL) with the corresponding IDD modes (IDD2P, IDD2N, IDD3N), the clock enable CLK_EN, and current with and without fast wake-up compared with IDD4W.]
Thanks to the fast wake-up feature, the CDN can be enabled by the WT command, which brings IDD3N as low as IDD2P/IDD2N. The resulting rapid changes in current, however, degrade SNR through power-supply-induced jitter (PSIJ), so an enhanced PDN is used.
TX Architecture
[Figure: serializer chain of a 32:16 MUX, two 16:16 MUXes with PAM3/NRZ selection, and 16:8, 8:4, 4:2 and 2:1 stages driven by octa-rate, quad-rate and half-rate clocks (ICLK/OCLK) derived from WCK, producing DMSB and DLSB (MSB 16b and LSB 16b in NRZ mode); the driver is split into PU_MSB, PU_LSB, PD_MSB and PD_LSB segments with 80 Ohm and 40 Ohm legs.]
The 2:1 serializer uses a half-rate clock, and DDJ (data-dependent jitter) is improved. The TX supports both PAM3 and NRZ modes. Each PAM3 symbol sets the MSB and LSB driver halves as follows (the figure details the symbol 01 case):

Symbol   MSB   LSB
00       L     L
01       L     H
11       H     H
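The symbol table can be turned into output levels with a simple resistive model. This is a sketch under assumed values: each driver half is modeled as an ideal source-series-terminated leg of resistance r_half that pulls the pad to VDDQ for H and to ground for L, and the link is terminated by rt to vtt. None of these values, nor the choice vtt = VDDQ/2, comes from the paper.

```python
# Sketch: PAM3 pad voltage from MSB/LSB driver halves (thermometer coding).
# Assumed, illustrative component values; not the paper's driver sizing.

SYMBOL_TO_DRIVERS = {"00": ("L", "L"), "01": ("L", "H"), "11": ("H", "H")}

def pad_voltage(symbol, vddq=1.2, vtt=0.6, r_half=80.0, rt=40.0):
    """Nodal analysis of the pad: two driver legs plus the termination."""
    msb, lsb = SYMBOL_TO_DRIVERS[symbol]
    legs = [(vddq if state == "H" else 0.0, r_half) for state in (msb, lsb)]
    legs.append((vtt, rt))
    conductance = sum(1.0 / r for _, r in legs)
    return sum(v / r for v, r in legs) / conductance

for sym in ("00", "01", "11"):
    print(sym, round(pad_voltage(sym), 3))  # 0.3, 0.6, 0.9
```

With matched legs the three levels are evenly spaced; unequal pull-up and pull-down resistance would shift the middle level, which is the level-mismatch (RLM) concern that the shared-R SST driver addresses.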
Tx Architecture
[Figure: non-shared-R SST versus shared-R SST drivers (RON, RT, CP), with RLM and bandwidth trading off against each other in the non-shared case.]
The shared-R structure resolves the trade-off between RLM (ratio of level mismatch) and bandwidth by (1) providing sufficient Vds regardless of Rt and (2) giving a faster slew rate.

TX Equalization
[Figure: ACEQ with capacitive boosting (1.0 versus 1.15) compared with FFE taps (CA0 to CA2, CF0 to CF2), each with DFE.]
ACEQ (capacitive EQ): main-cursor boosting plus post-cursor compensation by the DFE. FFE: little post-cursor ISI is left to be compensated by the DFE.
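The FFE idea can be shown by convolving a channel pulse response with the TX taps. The pulse response and tap values below are hypothetical, and the taps are not normalized to the driver's swing limit as a real TX FFE would be; the point is only that a post-cursor tap removes the first post-cursor, so little residual ISI is left for the receiver DFE.

```python
# Sketch: a 2-tap TX FFE cancelling the first post-cursor of a channel.

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

channel = [1.0, 0.4, 0.1]   # main cursor, 1st and 2nd post-cursor (assumed)
ffe = [1.0, -0.4]           # main tap and 1st post-cursor (de-emphasis) tap
equalized = convolve(channel, ffe)
print([round(v, 3) for v in equalized])  # [1.0, 0.0, -0.06, -0.04]
```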
PAM3 RX Structure
[Figure: DQ input to a CTLE; VREFD generators for VREFDH and VREFDL; samplers with DFE (DFE gain, DFE_H, DFE_L) and SR latches for the I, Q, IB and QB phases; CMFB; alignment and PAM3 decoder.]
A CTLE and a DFE compensate for channel insertion loss. Their gain and bandwidth are programmable for different channel environments.
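The two-reference PAM3 decision and its DFE can be sketched behaviorally. Assumptions, none of them taken from the paper: symbols are -1, 0 and +1; the channel adds only a first post-cursor h1; the upper and lower thresholds sit at +0.5 and -0.5 (standing in for VREFDH and VREFDL); and the DFE subtracts h1 times the previous decision before both comparators.

```python
# Behavioral sketch: PAM3 slicing with two references and a 1-tap DFE.

def slice_pam3(v, vref_h=0.5, vref_l=-0.5):
    """Two comparators form a thermometer code, decoded to -1, 0 or +1."""
    above_h, above_l = v > vref_h, v > vref_l
    if above_h:
        return 1
    return 0 if above_l else -1

def isi_channel(symbols, h1):
    """y[n] = x[n] + h1 * x[n-1]."""
    out, prev = [], 0
    for x in symbols:
        out.append(x + h1 * prev)
        prev = x
    return out

def receive(samples, h1, use_dfe=True):
    decisions, prev = [], 0
    for y in samples:
        v = y - h1 * prev if use_dfe else y   # DFE removes the 1st post-cursor
        d = slice_pam3(v)
        decisions.append(d)
        prev = d
    return decisions

tx = [1, 1, 0, -1, -1, 0, 1, -1, 0, 0]
rx = isi_channel(tx, h1=0.6)
print(receive(rx, 0.6, use_dfe=False) == tx)  # False: ISI crosses a threshold
print(receive(rx, 0.6) == tx)                 # True
```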
Rx Offset Cancellation
[Figure: VREFDH and VREFDL with per-DQ trained codes (VREFD TR_H, VREFD TR_L) and offsets VOFFH/VOFFL at the CTLE, followed by the I, Q, IB and QB samplers, each with its own offset (VOFFI, VOFFQ, VOFFIB, VOFFQB) around VCM.]
The CTLE offset is compensated by per-DQ VREFD training. Each sampler's offset, however, cannot be compensated this way, so offset calibration is needed.

Rx Offset Cancellation
[Figure: sampler with inputs IN/INB, outputs OUT/OUTB and clock CLK; binary-weighted trim devices (X1, X2, X4, X8) on the left and right sides, enabled through OC/OCB and OC_EN; control blocks with a 4b counter (CNT), register (REG) and OSC_CLK.]
Offsets are calibrated during the post-manufacturing test. Transistor size is minimized to reduce the driving power.
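The counter-driven trim can be sketched as follows. It is an illustration under assumptions not stated on the slides: with the sampler inputs shorted, the output sign shows the sign of the residual offset; the 4-bit counter then steps a binary-weighted trim (X1, X2, X4, X8) on whichever side opposes the offset until the output flips. The LSB size and the offset values are hypothetical.

```python
# Sketch: counter-based sampler offset calibration with a 4-bit trim.

def calibrate_offset(offset_mv, lsb_mv=1.0, bits=4):
    """Return (signed trim code, residual offset in mV)."""
    max_code = (1 << bits) - 1                 # X1+X2+X4+X8 -> codes 0..15

    def residual(code, side):
        return offset_mv - side * code * lsb_mv

    initial_out = offset_mv > 0                # sampler decision, inputs shorted
    side = 1 if initial_out else -1            # trim the side that opposes it
    code = 0
    while code < max_code:
        code += 1                              # one count per OSC_CLK (assumed)
        if (residual(code, side) > 0) != initial_out:
            break                              # output flipped: within 1 LSB
    return side * code, residual(code, side)

print(calibrate_offset(6.3))   # code 7, residual about -0.7 mV
print(calibrate_offset(-2.5))  # code -3, residual 0.5 mV
```

The returned code stands for the value that would be held in the register after the post-manufacturing test; the slide's point about minimizing transistor size is a circuit-level concern this model does not capture.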
Simulation Result
[Figure: board model with regulator, package model (S-parameters) and GDDR7 DRAM model with a CPM for each of channels A to D; simulated VDD (1.16 V to 1.20 V) while cycling IDD4W, IDD4R and IDD2S, for on-die capacitance (C-Die) of 39 nF (Vpp = 43.8 mV) and 394 nF (Vpp = 31.2 mV).]
The peak-to-peak supply variation is reduced from 43.8 mV to 31.2 mV. The area-efficient decoupling capacitor is 10× larger than in the previous design.
Measurement Result
[Figure: measured distribution of the output timing of all DQs (TX DQ0 and the mean of all DQs) versus read count, from RD#300 to RD#3750, with VDDQ.]
Over 4000 reads, the drift peaks at the 1900th read and stabilizes at the 3750th read. The maximum drift is reduced from 3.22 ps in the GDDR6 design to 2.5 ps.

Measurement Result
[Figure: RX shmoo of all DQs with calibration on and off; eye-height/eye-width pairs of 158 mV/45 ps, 171 mV/49 ps, 152 mV/45 ps and 165 mV/48 ps; minimum labels of 4 ps, 6.6 mV and 13.2 mV; per-DQ standard deviations of 0.0025/0.0018, 0.0025/0.0015, 11.53/9.39 and 13.44/10.30.]
The RX valid window improves by 13 mV and 3 ps at 28 Gbps, and the standard deviation of each DQ's eye size also decreases.
Measurement Result
PAM3 eye diagrams with a PRBS15 data pattern: at 28 Gbps, vertical 136 mV and horizontal 0.54 UI; at 32 Gbps, vertical 129 mV and horizontal 0.41 UI.
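For reference, the PRBS15 pattern used for these eye diagrams can be generated with a 15-bit linear-feedback shift register. The sketch uses the commonly used PRBS15 polynomial x^15 + x^14 + 1. How the test setup maps these bits onto PAM3 symbols is not described on the slide, so only the binary sequence is produced.

```python
# Sketch: PRBS15 generator (x^15 + x^14 + 1), Fibonacci LFSR form.

def prbs15(seed=0x7FFF):
    """Yield PRBS15 bits forever; the seed must be a non-zero 15-bit value."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        bit = ((state >> 14) ^ (state >> 13)) & 1  # taps at stages 15 and 14
        state = ((state << 1) | bit) & 0x7FFF
        yield bit

def prbs15_period(seed=0x7FFF):
    """Count steps until the register returns to its seed (2**15 - 1)."""
    start = seed & 0x7FFF
    state, steps = start, 0
    while True:
        bit = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | bit) & 0x7FFF
        steps += 1
        if state == start:
            return steps

gen = prbs15()
print([next(gen) for _ in range(16)])
print(prbs15_period())  # 32767
```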
Measurement Result
[Figure: VDD-versus-speed shmoo, supply voltage 0.9 V to 1.3 V versus data rate, with axis marks from 17.3 Gb/s to 43.5 Gb/s.]
Tested at 108 °C, the device operates at 35.4 Gbps at 1.20 V with 2tCK tCCD operation, and at 38.8 Gb/s with further optimization.

Measurement Result
A simple workload is used for the power-efficiency comparison: power down 35%, read 20%, write 10%, active 20% and idle 15%. The power efficiency of GDDR7 at 32 Gbps improves by 47% over GDDR6 at 18 Gbps.

Chip Implementation
[Die photograph.]
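The workload-based comparison on the power-efficiency slide can be expressed as a weighted average. The state mix below is the one listed on that slide; the per-state power values, the pin count and the definition of efficiency as delivered bits per joule are assumptions for illustration, so this sketch does not reproduce the paper's 47% result.

```python
# Sketch: workload-weighted power and efficiency (bits per joule).
# Only the state mix comes from the slide; everything else is an input.

WORKLOAD = {"power_down": 0.35, "read": 0.20, "write": 0.10,
            "active": 0.20, "idle": 0.15}
assert abs(sum(WORKLOAD.values()) - 1.0) < 1e-9, "state shares must sum to 1"

def average_power_w(state_power_w):
    """Time-weighted mean power over the workload states."""
    return sum(share * state_power_w[state] for state, share in WORKLOAD.items())

def bits_per_joule(data_rate_gbps, pins, state_power_w):
    """Assumed metric: data moves only in the read and write states."""
    busy = WORKLOAD["read"] + WORKLOAD["write"]
    throughput_bps = data_rate_gbps * 1e9 * pins * busy
    return throughput_bps / average_power_w(state_power_w)

def improvement(new_eff, old_eff):
    """Relative gain, e.g. 0.47 for a 47% improvement."""
    return new_eff / old_eff - 1.0

# Placeholder: 1 W in every state, not measured data.
placeholder = dict.fromkeys(WORKLOAD, 1.0)
print(average_power_w(placeholder))            # about 1.0 W
print(bits_per_joule(32, 32, placeholder))     # 32 pins is an assumption
```

To compare two devices, evaluate bits_per_joule with each device's measured per-state power and data rate, then pass the two results to improvement.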
Conclusions
A 1a Tech 16Gb GDDR7 with PAM3 I/O is implemented, achieving 35.4 Gb/s at a 1.2 V operating condition. The power efficiency of GDDR7 at 32 Gb/s improves by 47% compared with GDDR6 at 18 Gb/s. The clocking architecture with the fast wake-up feature brings IDD3N as low as in power-down mode. The enhanced power-distribution network reduces the supply-voltage variation and the PSIJ drift.
13.2: A 32-Gb 8.0-Gb/s/pin DDR5 SDRAM with a Symmetric-Mosaic Architecture in a 5th-Generation 10-nm DRAM Process
2024 IEEE International Solid-State Circuits Conference

Ikjoon Choi, Seunghwan Hong, Kihyun Kim, Jeong-Sik Hwang, Seunghan Woo, Young-Sang Kim, Cheong-Ryong Cho, Eun-Young Lee, Hun-Jae Lee, Min-Su Jung, Hee-Yun Jung, Ju-Seong Hwang, Junsub Yoon, Wonmook Lim, Hyeong-Jin Yoo, Won-Ki Lee, Jung-Kyun Oh, Dong-Su Lee, Jong-Eun Lee, Jun-Hyung Kim, Young-Kwan Kim, Su-Jin Park, Byung-Kyu Ho, Byong-Wook Na, Hye-In Choi, Chung-Ki Lee, Soo-Jung Lee, Hyunsung Shin, Young-Kyu Lee, Jang-Woo Ryu, Sangwoong Shin, Sungchul Park, Daihyun Lim, Seung-Jun Bae, Young-soo Sohn, Tae-Young Oh, SangJoon Hwang
Samsung Electronics, Hwasung, Korea
Outline: Introduction of DDR5; How to achieve high-density DRAM (proposed 32Gb DDR5 architecture; 3DS DRAM with "Chip ID Pre-Decode"); How to achieve high-speed performance (proposed DFE architecture; input offset calibration with majority voting; read clock distribution with open-loop DCC; 3DS DRAM with "Broadcasting Exit"); Measurement results; Conclusion.

DDR Trend
[Figure: DDR2/3, DDR4 and DDR5 generations.]
DRAM is moving toward high speed, high density and low power consumption. The 32Gb DDR5 achieves 8 Gb/s/pin at a 1.1 V supply voltage.
How to double the capacity
[Figure: 16Gb and 32Gb floorplans with A/B/C/D bank mosaics inside the fixed 10.0 mm × 11.0 mm package.]
The goal was to double the capacity in the 5th-generation 10-nm DRAM process, but conventional methods of doubling capacity within the specified package size are not possible. It is possible, however, to grow 1.5 times horizontally and 1.33 times vertically, which together double the total capacity (1.5 × 1.33 ≈ 2).

Previous and proposed 32Gb DDR5 architecture
ISSCC23 DDR5 24Gb (conventional): 16Gb to 24Gb by making the array 1.5 times longer; the horizontal length increases. Read latency (16Gb to 32Gb): 2.1 ns.
ISSCC21 LPDDR5 16Gb (complex mosaic bank): 12Gb to 16Gb by combining 1Gb and 512Mb banks; density increases, but the design is hard and the banks are non-symmetric. Read latency (16Gb to 32Gb): 3.9 ns.
ISSCC24 DDR5 32Gb (proposed, symmetric mosaic bank): 16Gb to 32Gb, with 2 stacked banks sharing 1 data line (global IO line, GIO); this puts 32Gb density in the fixed 10 mm × 11 mm package while maintaining the center IO and PAD structure. Read latency (16Gb to 32Gb): 1.9 ns.
[Figure: (a) 1.5× horizontal and 1.33× vertical growth; (b) stacked Bank A and Bank B (0.67Gb and 0.33Gb portions) with row and column decoders and a shared GIO toward the I/O interface.]
How to manage a large-density stacked bank
[Figure: conventional versus proposed bank organization, with the A and C banks around the column decoder sharing column select lines and the global IO line (data line); physical versus logical banks.]
One physical bank is divided into two logical banks using tCCD_L timing, taking advantage of the guaranteed read-to-read timing between different logical banks. Because two adjacent high-capacity banks share refresh noise with each other, a longer offset calibration time is used during refresh operations.

Proposed 32Gb DDR5 architecture and power
The larger the capacity, the larger the chip size, which costs latency and power. To overcome these two shortcomings, an HKMG (high-K metal gate) process is used. Compared with the previous work (24Gb DDR5), IDD7 power is reduced by 20% (operating/standby power, architecture).
Comparison with previous work (24Gb DDR5 [1] vs. proposed 32Gb DDR5):
Fabrication process: 10nm DRAM (4th generation) vs. 10nm DRAM (5th generation)
Die density: 24 Gb vs. 32 Gb
Density per bank: 0.75 Gb vs. 1 Gb
GIO length: 1× vs. 1.33×
IDD7 (4800 Mbps, x4, 1.166 V, 100): 459 mW vs. 364 mW, i.e. (459 − 364)/459 ≈ 20.7% lower
3DS DRAM command path
In the previous 3DS DRAM command path (a), the command decoder of every rank operates each time a command is received, which consumes a lot of power. In the proposed command path with CID pre-decoding (b), the command decoder operates only in the rank that matches the CID of the command, which reduces power consumption by avoiding unnecessary decoding operations.
[Figure: primary and secondary ranks connected through TSVs, with CS/CA pads, command decoders, chip ID and tIS/H; (b) adds a chip-ID pre-decode stage with driver enable/disable (DRV EN/DIS).]
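The power argument can be sketched by counting how often a full command decoder runs. In this hypothetical model (the class and function names are illustrative), every command carries a chip ID; without pre-decoding every rank decodes every command, while with CID pre-decoding only the rank whose ID matches does. The small cost of the pre-decode comparators themselves is ignored.

```python
# Sketch: command-decoder activations with and without CID pre-decoding.
from dataclasses import dataclass

@dataclass
class Command:
    cid: int   # chip ID (rank) the command targets
    op: str    # e.g. "RD", "WR"; not used by the count

def decoder_activations(commands, n_ranks, cid_predecode):
    """Count how many times any rank's full command decoder runs."""
    count = 0
    for cmd in commands:
        for rank in range(n_ranks):
            # Without pre-decode every rank decodes; with it, only the target.
            if not cid_predecode or rank == cmd.cid:
                count += 1
    return count

cmds = [Command(cid=i % 4, op="RD") for i in range(100)]
print(decoder_activations(cmds, 4, cid_predecode=False))  # 400
print(decoder_activations(cmds, 4, cid_predecode=True))   # 100
```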
Proposed DFE architecture
[Figure: DQ and DQS inputs to CML summers (taps W2 to W4, feedback FB2 to FB4) and samplers with SR latches for the DQS I, Q, IB and QB phases, where the first tap (W1, FB1) is applied; CML summer replica DAC with VBTAP, VBTAP/2 and VBCML biases and a SIGN input.]
In the separated DFE architecture, the 1st-tap feedback is applied at the sampler to meet the DFE feedback time, while the 2nd- to 4th-tap feedback is applied at the summer for precise tuning and tolerance of PVT variation. A write-eye simulation at 7200 Mbps compares the normal and separated DFE (TxDQS = 52.1 ticks / TxV = 48 ticks and TxDQS = 47.9 ticks / TxV = 40 ticks).
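A behavioral model helps separate what the split changes from what it does not. In the sketch below (hypothetical tap weights and channel; NRZ data), the first tap is subtracted at the sampler input and taps 2 to 4 at the summer output; numerically this equals an ordinary 4-tap DFE. The benefit claimed on the slide lies in circuit timing and tuning accuracy, which such a model cannot show.

```python
# Behavioral sketch of a 4-tap DFE with tap 1 at the sampler and taps 2-4
# at the summer; weights and channel are hypothetical, data are NRZ (+1/-1).

def isi_channel(bits, h):
    """y[n] = sum_k h[k] * x[n-k], with h[0] the main cursor."""
    return [sum(h[k] * bits[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(bits))]

def separated_dfe(samples, weights):
    w1, w_later = weights[0], weights[1:]
    history = [0, 0, 0, 0]              # d[n-1], d[n-2], d[n-3], d[n-4]
    decisions = []
    for y in samples:
        # Summer: subtract taps 2-4, which have a relaxed timing budget.
        summer_out = y - sum(w * d for w, d in zip(w_later, history[1:]))
        # Sampler: subtract tap 1, which must settle within one UI.
        d = 1 if summer_out - w1 * history[0] > 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions

h = [1.0, 0.5, 0.3, 0.2, 0.1]           # main cursor + 4 post-cursors (assumed)
bits = [1, 1, 1, 1, -1, -1, 1, -1, 1, 1]
rx = isi_channel(bits, h)
print(separated_dfe(rx, h[1:]) == bits)         # True: taps match the channel
print(separated_dfe(rx, [0, 0, 0, 0]) == bits)  # False: ISI flips the fifth bit
```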
Input offset calibration with majority voting
The input offset calibration takes place at the same time as ZQ calibration. The calibration direction is decided by a majority vote of 4 outputs. To minimize calibration error, the VGA gain is set to its maximum during auto calibration.
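The majority-vote decision can be sketched as a bang-bang calibration loop. Assumptions for illustration: the four outputs are four samplers observed with no input signal, each seeing the residual offset plus Gaussian noise; three or more votes in one direction move the trim one step, and a 2-2 tie holds the trim (the slide does not say how ties are resolved). Noise, step size and iteration count are hypothetical, and the maximum-VGA-gain setting is not modeled.

```python
# Sketch: offset calibration steered by a majority vote of four outputs.
import random

def majority_direction(votes):
    """votes: four 0/1 sampler outputs. Returns the trim direction."""
    ups = sum(votes)
    if ups >= 3:
        return -1   # mostly high: residual offset positive, trim down
    if ups <= 1:
        return +1   # mostly low: residual offset negative, trim up
    return 0        # 2-2 tie: hold (assumed policy)

def calibrate(offset_mv, step_mv=0.5, noise_mv=1.0, iterations=64, seed=1):
    rng = random.Random(seed)
    trim_mv = 0.0
    for _ in range(iterations):
        residual = offset_mv + trim_mv
        votes = [1 if residual + rng.gauss(0.0, noise_mv) > 0 else 0
                 for _ in range(4)]
        trim_mv += majority_direction(votes) * step_mv
    return offset_mv + trim_mv   # residual offset after calibration

print(round(calibrate(8.0), 2))
```

Voting across several outputs keeps a single noisy decision from pushing the trim the wrong way, which is the role the slide gives to majority voting.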
Input offset calibration with majority voting
In Monte Carlo simulation, the input-referred offset distribution narrows from 4.34 mV with auto calibration off to 1.80 mV with auto calibration on.
[Figure: measured per-pin VrefDQ training results with and without input offset calibration.]
Read clock distribution with open loop DCC
[Figure: conventional DDR5 read block diagram: CK/CKB buffer, DLL, QEC and clock tree to DQ/DQS.]
The DLL matches the skew between CK/CKB and DQS/DQSB, and the QEC matches the 4-phase skew. DLL code changes caused by external factors produce duty-cycle distortion, which appears as 4-phase skew at the QEC; if the QEC cannot correct it immediately, deterministic jitter is generated.
[Figure: proposed DDR5 read block diagram, which adds an open-loop duty-cycle corrector (DCC) acting on the I/Q and IB/QB clocks.]
Monte Carlo simulation with and without the open-loop DCC shows the 4-phase skew distribution narrowing from 3.24 ps (DCC off) to 1.83 ps (DCC on).
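The link between duty-cycle distortion and 4-phase skew can be shown with a few lines of arithmetic. Assumption for this sketch: the IB and QB phases are taken from the falling edges of the I and Q clocks, so a duty cycle D moves those edges to D·t_ck instead of t_ck/2. The 250 ps period (a 4 GHz clock) and the 51% duty cycle are only examples.

```python
# Sketch: 4-phase edge errors caused by duty-cycle distortion (DCD).

def phase_errors_ps(t_ck_ps, duty_i, duty_q):
    """Edge error of each phase versus the ideal 0/90/180/270-degree grid."""
    edges = {
        "I": 0.0,
        "Q": t_ck_ps / 4,
        "IB": duty_i * t_ck_ps,                # falling edge of I
        "QB": t_ck_ps / 4 + duty_q * t_ck_ps,  # falling edge of Q
    }
    ideal = {"I": 0.0, "Q": t_ck_ps / 4, "IB": t_ck_ps / 2, "QB": 3 * t_ck_ps / 4}
    return {name: edges[name] - ideal[name] for name in ideal}

print(phase_errors_ps(250.0, duty_i=0.51, duty_q=0.50))
# IB lands about 2.5 ps late; I, Q and QB stay on the grid
```

Correcting D toward 50% directly, as an open-loop DCC does, removes this error without waiting for a feedback loop such as the QEC to respond, which matches the motivation given on the slides.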
3DS DRAM data-out path
The previous 3DS DRAM data-out path (a) is a nested two-FIFO system: FIFO_B handles inner-rank variation and FIFO_A handles inter-rank variation, so monolithic and 3DS dies have different data paths. In the proposed data-out path with broadcasting EXIT (b), a single FIFO system (FIFO_A) compensates both inner- and inter-rank variation using the broadcast EXIT signal, so monolithic and 3DS dies have the same data path.
[Figure: primary and secondary ranks with DRAM core, data bus, I/O circuit, DQ/DQS pads and data TSV array; (a) FIFO_B in each rank plus FIFO_A, (b) FIFO_A only, with EXIT broadcasting and the monolithic (Mono) die.]
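One way to read the broadcast-EXIT scheme is as a common release time for all ranks. The sketch below encodes that reading as an assumption: each rank's read data reaches its FIFO after a rank-specific delay (covering both inner- and inter-rank variation), and a single EXIT time broadcast to all ranks tells every FIFO when to release, so one FIFO stage absorbs both kinds of variation. Rank names and delays are hypothetical, in clock cycles.

```python
# Sketch: one broadcast EXIT time aligning read data from several ranks.

def fifo_plan(arrival_cycles, exit_cycle=None):
    """Return the common EXIT cycle and how long each rank's data waits."""
    latest = max(arrival_cycles.values())
    if exit_cycle is None:
        exit_cycle = latest                     # earliest time all ranks can meet
    if exit_cycle < latest:
        raise ValueError("EXIT is earlier than the slowest rank's data")
    hold = {rank: exit_cycle - t for rank, t in arrival_cycles.items()}
    return exit_cycle, hold

exit_at, hold = fifo_plan({"primary": 11, "secondary_1": 13, "secondary_2": 12})
print(exit_at, hold)  # 13 {'primary': 2, 'secondary_1': 0, 'secondary_2': 1}
```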
Measurement results
[Figure: proposed DDR5 tCK shmoo for read and write operation.]
The proposed DDR5 operates at 6.6 Gb/s at 1.02 V and at 8.0 Gb/s at 1.10 V, compared with 6.4 Gb/s at 1.10 V for the previous work.

Measurement results
[Figure: read shmoo and write shmoo at 8 Gbps.]
The proposed open-loop DCC improves the read timing margin by 8 ps at 8 Gb/s, and the proposed auto offset calibration improves the write voltage margin by 25 mV at 8 Gb/s.
Conclusion
An 8 Gb/s 32Gb DDR5 SDRAM is implemented in a 5th-generation 10nm DRAM technology, with a chip size of 76.75 mm². The proposed DDR5 includes a symmetric mosaic bank architecture, a separated DFE architecture, input offset calibration with majority voting, read clock distribution with an open-loop DCC, and a new 3DS technique with CID pre-decoding.
[Die photograph: peripheral area and cell arrays.]

13.3 A 280 layer 1Tb 4bit/cell 3D NAND Flash Memory with 28.5Gb/mm² Areal Density and 3.2GB/s High Speed IO Rate
2024 IEEE International Solid-State Circuits Conference

Wontaeck Jung, Hyunggon Kim, Do-Bin Kim, Tae-Hyun Kim, Namhee Lee, Dongjin Shin, Minyong Kim, Youngsik Rho, Hun-Jong Lee, Yujin Hyun, Jaeyoung Park, Taekyung Kim, Hwiwon Kim, Gyeongwon Lee, Jisang Lee, Joonsuc Jang, Jungmin Park, Sion Kim, Su Chang Jeon, Suyong Kim, Jung-Ho Song, Min-Seok Kim, Taesung Lee, Byung-Kwan Chun, Tongsung Kim, Young Gyu Lee, Hokil Lee, Soowoong Lee, Hwaseok Lee, Dooho Cho, Sang-Wan Nam, Yeomyung Kim, Kunyong Yoon, Yoonjae Lee, Sunghoon Kim, Jungseok Hwang, Raehyun Song, Hyunsik Jang, Jaeick Son, Hongsoo Jeon, Myunghun Lee, Mookyung Lee, Kisung Kim, Eungsuk Lee, Myeongwoo Lee, Sungkyu Jo, Chan Ho Kim, Jong Chul Park, Kyunghwa Yun, Soonock Seol, Ji-Ho Cho, Seungjae Lee, Jin-Yub Lee, and Sung-Hoi Hur
Samsung Electronics, Hwaseong, Korea

Outline: Introduction (trend of bits-per-cell transition and bit density of QLC; key features); Write performance (reprogram challenges of QLC 3D NAND; coded-data on-chip buffered program (C-OBP); multi-state predictive program (MPP)); Bit density (charge pump stack control scheme); IO performance (time-domain decision feedback equalizer (TD-DFE); recursive ZQ calibration scheme); Conclusion.
Bits-Per-Cell Transition
[Figure; source: Forward Insights.]

Bit Density Trend of 4bit/cell NAND Flash
[Figure: bit density (Gb/mm²) versus technology for the 4bit/cell NAND papers at ISSCC 2018, 2020, 2021 and 2022 and this work (280 stacked WLs).]

Key features
[Die photograph, focus points and comparison table.]
The focus points are the QLC reliability issue, addressed by the reprogram technique; the demand for cost-effective storage, met by maximizing bit density; and the demand for high-speed I/O, met by maximizing I/O bandwidth.
Reprogram Challenges of QLC 3D NAND
Reprogramming is an essential technique for achieving high reliability in QLC 3D NAND, but it raises several challenges:
1) write-performance and area overhead from backing up the coarse-program data with on-chip buffered program (OBP);
2) an increasing number of verify operations.
More advanced techniques are needed to improve reprogramming efficiency.
[Figure: concept of reprogram and concept of OBP. After the coarse program, the coarse-program data are backed up to a recovery data block (4-page SLC program) so that they can be recovered after a power loss (data loss) before the fine program. This backup is the overhead.]

Coded-data On-Chip Buffered Program (C-OBP)
Motivation: improve program performance by reducing the SLC programming needed to back up the coarse-program data.
Concept: generate 1-bit coded data for recovery. The coded data, mapped to 0 or 1, carry the even/odd state information of the 16 states. Because adjacent states always differ in this bit, the coded data effectively widen the read window.

Concept of C-OBP
After the coarse program operation, the bit error ratio of a data read is very high because the windows between states are very narrow, so the coarse-programmed data cannot be read directly. The even/odd state information separates adjacent states, which enables a high-accuracy recovery read.
[Figure: mapping 4-bit state data to 1-bit coded data (even/odd state information); narrow read window after the coarse program; coarse-program data recovery with the coded data; widened read window due to the coded data.]

Impact of C-OBP
The amount of data that must be backed up is reduced to a quarter, a 75% reduction in the size of the data backup block, and write performance improves by 5%.
- Conventional OBP: original data (1-bit program x4), with program-time and area overhead for the backup data.
- C-OBP: coded compact bits (1-bit program x1), giving program-time reduction and density improvement.

Multi-State Predictive Program (MPP)
Motivation: improve program performance by reducing the number of verify operations, in the coarse program only.
Concept: auto-inhibit after additional program pulses.
- Pn: normal (conventional) verify and inhibit operation.
- Pn+k: verified at Pn and automatically inhibited after the kth additional program pulse, without its own verify.

Concept of MPP and Optimal Point Decision
A cell targeting state PN+7 that is determined to be an off-cell at VPN becomes inhibited after 7 more program pulses are applied. MPP with 2 groups is the optimal point, where the trade-off between the performance improvement and the degradation of the Vth distribution is well balanced.
Group | Verify voltage | Included verify states
G1 | V1 | P1, P2, P3, P4, P5, P6, P7
G2 | V8 | P8, P9, P10, P11, P12, P13, P14, P15
[Figure: state step-down with each PGM pulse after the off-cell decision (PN through PN+7).]
Impact of MPP
In the reprogram scheme, MPP is applied to the coarse program, reducing the number of verify operations in the coarse program by 75% compared with [1] (8-group predictive verify). The total programming time is reduced by 12% compared with [1], with minimal degradation of the Vth distribution.
[Figure: coarse program in the conventional reprogram vs. Multi-State Predictive Program.]
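A toy model can make the two write-performance schemes concrete. It is a sketch under stated assumptions, not the authors' implementation: Vth is measured in units of one state step, one ISPP pulse raises Vth by exactly one step, and the verify count scales with the number of verify groups. All function names are hypothetical. For C-OBP, a cell whose coarse Vth is 0.6 step high is misread by a plain nearest-state read but is recovered once the candidates are restricted to the backed-up even/odd bit. One backup page instead of four gives the 75% figure. For MPP, a cell verified only at its group's base level and then auto-inhibited after (target - base) extra pulses still lands in its target state. Using 2 groups instead of 8 is consistent with the reported 75% verify reduction.

```python
# Toy models (hypothetical) of C-OBP coded-data recovery and MPP auto-inhibit.
# Vth is in units of one state step; state s nominally sits at Vth = s.

NUM_STATES = 16  # 4 bits/cell -> 16 states (erase state 0, P1..P15)


def coded_bit(state: int) -> int:
    """C-OBP coded data: the even/odd (parity) information of the 4-bit state."""
    return state & 1


def naive_read(vth: float) -> int:
    """Read a coarse-programmed cell by picking the nearest of all 16 states."""
    return min(range(NUM_STATES), key=lambda s: abs(vth - s))


def cobp_recover(vth: float, parity: int) -> int:
    """Recovery read: only states whose parity matches the backed-up bit are
    candidates, so neighbouring candidates are two state steps apart."""
    candidates = [s for s in range(NUM_STATES) if s & 1 == parity]
    return min(candidates, key=lambda s: abs(vth - s))


def backup_reduction(bits_per_cell: int = 4) -> float:
    """Conventional OBP backs up one SLC page per bit; C-OBP backs up one page."""
    return 1 - 1 / bits_per_cell


def group_base(target: int) -> int:
    """Verify level of the MPP group containing the target (G1: P1, G2: P8)."""
    return 1 if target <= 7 else 8


def mpp_program(start_vth: float, target: int, step: float = 1.0) -> float:
    """Pulse until the cell passes the group-base verify, then apply
    (target - base) more pulses without verify and inhibit (assumed ideal ISPP)."""
    base = group_base(target)
    vth = start_vth
    while vth < base:  # group verify fails -> apply one more pulse
        vth += step
    return vth + (target - base) * step  # k pulses, then auto-inhibit


def verify_reduction(groups: int, ref_groups: int) -> float:
    """Verify-count saving if verifies per pulse scale with the group count."""
    return 1 - groups / ref_groups


if __name__ == "__main__":
    true_state, coarse_vth = 6, 6.6  # coarse Vth 0.6 step above state 6
    print(naive_read(coarse_vth))                           # 7 (misread)
    print(cobp_recover(coarse_vth, coded_bit(true_state)))  # 6 (recovered)
    print(f"{backup_reduction():.0%}")                      # 75%
    print(f"{mpp_program(-2.3, 5):.1f}")                    # 5.7 (inside P5)
    print(f"{verify_reduction(2, 8):.0%}")                  # 75%
```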
Charge Pump Stack Control Scheme
Motivation: enhance bit density. The pumps occupy a significant area, and this overhead grows with the number of wordlines.
Concept: use a shared common pump. During the read operation, the common pump is connected in parallel with the unselect-voltage pump. During the program and erase operations, it is connected in series with the select-voltage pump.

Voltage and Current Margin for Each Operation
The unselect-voltage (Vunselect) pump requires a large current capacity, and the select-voltage (Vselect) pump requires a high voltage. During the program and erase operations, the Vunselect pump has margin in its current budget. During the read operation, the Vselect pump has voltage margin relative to its target level. The pump block must be reconfigured to exploit these margins.
[Figure: Vunselect and Vselect pump usage for PGM and ERS (labels: selected WL, high-voltage drive, massive WLs drive to channel).]

Operation and Impact
In the read operation, the common pump supports the Vunselect pump so that it supplies a sufficient amount of current. During the program and erase operations, the common pump extends the number of stages of the Vselect pump. Reducing the area overhead of the Vunselect pump by about 40% enhances the bit density of the NAND flash memory.
[Chart: pump area (a.u.), conventional vs. stack control (Vunselect, common, Vselect); 40% reduction.]
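The textbook idealized Dickson charge-pump model, Vout = (N+1)·VDD - N·I/(f·C), illustrates the pump reconfiguration. Putting pumps in parallel adds their currents at the same output voltage, and stacking them in series adds stages and therefore voltage. This is a sketch, not the paper's circuit. The pump sizes, the 2.5 V supply, the f·C product, and the 7 V / 150 µA read and 20 V program requirements are all hypothetical, and threshold drops and other non-idealities are ignored.

```python
from dataclasses import dataclass

# Idealised Dickson charge-pump model (textbook approximation, hypothetical
# numbers): Vout = (N + 1) * VDD - N * I / (f * C), diode/Vt drops ignored.


@dataclass(frozen=True)
class Pump:
    stages: int   # number of pump stages N
    vdd: float    # input supply (V)
    f_c: float    # clock frequency x stage capacitance (A/V)

    def v_open(self) -> float:
        """No-load output voltage."""
        return (self.stages + 1) * self.vdd

    def i_max(self, v_req: float) -> float:
        """Largest load current at which the output still reaches v_req."""
        return max(0.0, self.f_c * (self.v_open() - v_req) / self.stages)


def in_series(lower: Pump, upper: Pump) -> Pump:
    """Stacking adds stages: the upper pump starts from the lower pump's output,
    so the open-circuit voltage becomes (N1 + N2 + 1) * VDD."""
    assert lower.vdd == upper.vdd and lower.f_c == upper.f_c
    return Pump(lower.stages + upper.stages, lower.vdd, lower.f_c)


def i_max_parallel(v_req: float, *pumps: Pump) -> float:
    """Pumps in parallel share one output node, so their currents add."""
    return sum(p.i_max(v_req) for p in pumps)


# Hypothetical pumps: 2.5 V supply, f*C = 100 uA/V.
unselect = Pump(stages=3, vdd=2.5, f_c=1e-4)
common = Pump(stages=3, vdd=2.5, f_c=1e-4)
select = Pump(stages=6, vdd=2.5, f_c=1e-4)

if __name__ == "__main__":
    # Read: unselected WLs need ~7 V at high current (assumed 150 uA).
    print(unselect.i_max(7.0) >= 150e-6)                    # False
    print(i_max_parallel(7.0, unselect, common) >= 150e-6)  # True
    # Program/erase: the selected WL needs ~20 V at low current.
    print(select.v_open() >= 20.0)                          # False
    print(in_series(select, common).v_open() >= 20.0)       # True
```

In this model, one common pump covers both the current shortfall in read and the voltage shortfall in program, instead of over-sizing each dedicated pump.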
IO Performance Improvement Schemes
Motivation: achieve a data rate of 3.2Gbps despite:
- degraded signal integrity due to heavy channel capacitive loads (CIO) in multi-stacked NAND dies;
- increased RON mismatch due to the binary-type DQ driver.
Concept:
- Time-domain decision feedback equalizer (TD-DFE): uses pulse-width modulation to enhance signal integrity.
- Recursive ZQ calibration scheme: converts the reference impedance into the target RON value to reduce RON mismatch.
Circuit Diagram and Impact of TD-DFE
TD-DFE implements two pulse-width-modulated paths in a loop-unrolled manner, and one path is selected depending on the previous decision data. TD-DFE makes it possible to adopt a DFE in a matched DQS/DQ path architecture (tDLY_DQS = tDLY_DQ), improving eye width by 18%.
[Figure: circuit diagram (DQ, VREF, DQ tree, DQS tree, decision data fed back through Z^-1, tDLY_DQ, tDLY_DQS) and eye diagrams (RxV vs. RxT) with DFE off and DFE on; eye width is 1.18x with DFE on.]
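A voltage-domain analogue helps explain the loop-unrolled idea behind TD-DFE. Both speculative decisions are computed every UI, and the previous decision only selects one of them, so the feedback never has to settle within a UI. This is a hypothetical sketch, not the paper's circuit. The TD-DFE realizes the two paths as pulse-width-modulated timing paths, whereas here they are two slicer thresholds at ±h1. The single-post-cursor channel, h1 = 0.4, and the bit pattern are assumptions, and the sketch does not model or reproduce the measured 18% eye-width gain.

```python
# Voltage-domain analogue (hypothetical) of a loop-unrolled 1-tap DFE.


def channel(bits: list[int], h1: float) -> list[float]:
    """Single-post-cursor ISI: y[n] = x[n] + h1 * x[n-1], x in {-1, +1}.
    The line is assumed to idle at -1 before the first bit."""
    out, prev = [], -1.0
    for b in bits:
        x = 1.0 if b else -1.0
        out.append(x + h1 * prev)
        prev = x
    return out


def unrolled_dfe(samples: list[float], h1: float) -> list[int]:
    """Both speculative decisions are formed every UI; the previous decision
    only selects one of them."""
    decisions, prev = [], 0
    for y in samples:
        if_prev_one = int(y > h1)    # path assuming previous bit was 1
        if_prev_zero = int(y > -h1)  # path assuming previous bit was 0
        prev = if_prev_one if prev else if_prev_zero
        decisions.append(prev)
    return decisions


def eye_margin(samples: list[float], bits: list[int], h1: float, dfe: bool) -> float:
    """Smallest distance from a sample to its effective slicing threshold."""
    prevs = [0] + bits[:-1]
    thr = [(h1 if p else -h1) if dfe else 0.0 for p in prevs]
    return min(abs(y - t) for y, t in zip(samples, thr))


if __name__ == "__main__":
    bits, h1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0], 0.4
    rx = channel(bits, h1)
    print(unrolled_dfe(rx, h1) == bits)                   # True
    print(round(eye_margin(rx, bits, h1, dfe=False), 3))  # 0.6
    print(round(eye_margin(rx, bits, h1, dfe=True), 3))   # 1.0
```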
Circuit Diagram of Recursive ZQ Calibration
The recursive ZQ calibration scheme converts the reference impedance into the target RON value by adjusting the impedance of a dummy driver based on the previous ZQ code. This reduces RON mismatch and improves signal integrity.
[Figure: ZQ pad with RZQ, dummy driver, VREF comparator, logic, and previous-loop code (RZQ/N); ZQ code plot showing the target, RON mismatch, shifted code, and recursive code versus RZQ value.]
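A generic comparator-driven ZQ loop can illustrate RON mismatch and code refinement. The loop reuses the previous loop's code as its starting point and steps a dummy driver one LSB at a time until the comparator flips. This is a hypothetical sketch, not the paper's conversion scheme. The 7-leg driver, its deliberately non-binary leg weights (standing in for the binary-type-driver mismatch), the 240 Ω RZQ, and the RZQ/6 target are all assumptions.

```python
# Generic comparator-driven ZQ loop (hypothetical illustration).

R_UNIT = 3840.0  # ohms of a nominal 1x leg (hypothetical)
LEG_WEIGHTS = [1.00, 2.05, 3.90, 8.20, 15.6, 32.5, 63.0]  # mismatched 1x..64x legs
MAX_CODE = (1 << len(LEG_WEIGHTS)) - 1


def ron(code: int) -> float:
    """Driver resistance for a 7-bit leg-enable code (enabled legs in parallel)."""
    weight = sum(w for i, w in enumerate(LEG_WEIGHTS) if (code >> i) & 1)
    return float("inf") if weight == 0 else R_UNIT / weight


def zq_loop(prev_code: int, r_target: float) -> int:
    """Step from prev_code toward r_target; stop when the comparator flips and
    keep whichever of the last two codes is closer to the target."""
    code = prev_code
    too_weak = ron(code) > r_target  # comparator decision
    step = 1 if too_weak else -1     # add or remove conductance
    while 0 <= code + step <= MAX_CODE:
        nxt = code + step
        if (ron(nxt) > r_target) != too_weak:  # comparator flipped
            return min(code, nxt, key=lambda c: abs(ron(c) - r_target))
        code = nxt
    return code


if __name__ == "__main__":
    r_zq, n = 240.0, 6  # external RZQ and a hypothetical RZQ/N target
    code = zq_loop(prev_code=90, r_target=r_zq / n)
    print(code, round(ron(code), 2))  # 97 39.79
```

Starting from the previous code keeps each update short. The residual error comes from the leg mismatch, which is the RON-mismatch problem the slides attribute to the binary-type driver.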
Conclusion
- A robust 4b/cell (QLC) 3D NAND device is fabricated.
- To overcome the limitations of reprogramming, we proposed 1) C-OBP and 2) MPP.
- Cost-effective storage with high-speed I/O is achieved: to enhance bit density and I/O performance, we proposed 1) charge pump stack control, 2) TD-DFE, and 3) recursive ZQ calibration.
- Compared with other devices of the same generation, the highest bit density (28.5Gb/mm²) and the fastest I/O speed (3.2Gbps) are achieved.

13.4: A 48GB 16-High 1280GB/s HBM3E DRAM with All-Around Power TSV and a 6-Phase RDQS Scheme for TSV Area Optimization
2024 IEEE International Solid-State Circuits Conference, 1 of 27

Jinhyung Lee*, Kyungjun Cho*, Chang Kwon Lee, Yeonho Lee, Jae-Hyung Park, Su-Hyun Oh, Yucheon Ju, Chunseok Jeong, Ho Sung Cho, Jaeseung Lee, Tae-Sik Yun, Jin Hee Cho, Sangmuk Oh, Junil Moon, Young-Jun Park, Hong-Seok Choi, In-Keun Kim, Seung Min Yang, Sun-Yeol Kim, Jaemin Jang, Jinwook Kim, Seong-Hee Lee, Younghyun Jeon, Juhyung Park, Tae-Kyun Kim, Dongyoon Ka, Sanghoon Oh, Jinse Kim, Junyeol Jeon, Seonhong Kim, Kyeong Tae Kim, Taeho Kim, Hyeonjin Yang, Dongju Yang, Minseop Lee, Heewoong Song, Dongwook Jang, Junghyun Shin, Hyunsik Kim, Changki Baek, Hajun Jeong, Jongchan Yoon, Seung-Kyun Lim, Kyo Yun Lee, Young Jun Koo, Myeong-Jae Park, Joohwan Cho, Jonghwan Kim
SK hynix, Icheon, Korea
*Equally Credited Authors (ECAs)

Outline
- Introduction
- New design schemes: all-around power through-silicon via (TSV); 6-phase read-data-strobe (RDQS) scheme; voltage-drift compensator for write-data-strobe (WDQS); byte-mapping swap scheme
- Experimental results
- Conclusion

Introduction
- Language model size approaches 1 trillion parameters.
- HBM bandwidth has increased 10x and density 48x over the last 10 years.
[Charts: language model size (in billions) from 2017 to 2023; HBMs in ISSCC, cube density (GB) and bandwidth (GB/s) for HBM1, HBM2, HBM2E, HBM3, and HBM3E (this work), 2012 to 2026.]
HBM Architecture
- 3D-stacked structure using TSVs: a base die with up to 16 core-die slices.
- 2.5D IC package using a silicon interposer.
- 1024 IOs with 16 channels.
[Figure: processor and HBM on a silicon interposer (1024 single-ended IOs) over a package substrate; base die (PHY, TSV, DA, MBIST) under core dies rank 0 to rank 3 with channels CHa to CHp; C4 bumps.]

- 4 channels per slice and 16 banks per pseudo channel.
- Bandwidth must be extended by 25% and die density by 50% within the same chip size.
[Figure: core-die floorplan with channels CHa to CHd around the TSV area; each pseudo channel has banks BK0 to BK15 with X-CTRL and Y&ECC blocks.]

Category | Item | HBM3 | HBM3E | Change
Performance | Max. bandwidth | 1024GB/s | 1280GB/s | +25%
Performance | Data rate | 8Gb/s/pin | 10Gb/s/pin |
Configuration | Stack height | 12-High | 16-High |
Configuration | Die density | 16Gb | 24Gb | +50%
Configuration | Cube density | 24GB | 48GB |
Configuration | Chip size | 11mm x 11mm | 11mm x 11mm | same

All-Around Power TSV
The number of bank power TSVs is increased by 475%, and the height of the peripheral region is reduced by 31%.
[Figure: HBM3 vs. HBM3E cell arrays and peripheral region with bank power TSVs.]
Dynamic Voltage Drop Color Map
The VPP IR drop improves by 75% from HBM3 to HBM3E.
[Figure: dynamic voltage drop maps of VPP, VSS, and VDD power for HBM3 and HBM3E; labels 16%, 4%, 75%.]

6-Phase RDQS Scheme Overview
- Memory density extension.
- Reduce the number of TSVs.
- Required delay for consecutive read commands to different ranks.
[Figure: RDQS routing for CH0, CH4, CH8, and CH12.]

4-Phase RDQS Scheme (HBM3)
- Multiple sets of RDQS TSVs.
- 2 sets of FDQS per channel (at each data PERI).
[Figure: core dies (rank 0, rank 1) above the base die; each data PERI (PC0, PC1) has its own RD FIFO, DQS/DQ control, and RDQS0/RDQS1 TSVs; the CMD PERI holds the command decoder.]

6-Phase RDQS Scheme (HBM3E)
- 1 set of RDQS TSVs.
- 1 set of FDQS per channel (at the CMD PERI).
- Reported reductions: number of TSVs 8%, PERI height 31%.
[Figure: the CMD PERI contains the command decoder, PC FDQS split, and a single RDQS TSV shared by the data PERIs of PC0 and PC1.]

Timing Diagram of RDQS TSV Control
In the 4-phase RDQS scheme (HBM3), the RDQS_EN pulses of rank 0 and rank 1 for consecutive reads separated by tCCDR leave no timing margin on the shared TSV. The 6-phase RDQS scheme (HBM3E) guarantees a 1-tCK timing margin between the two EN pulses from different ranks.
[Figure: RD CMD, RDQS, RDQS_EN, and FDQS waveforms for rank 0 and rank 1 under TSV RDQS control for both schemes, with 2-tCK and 1-tCK intervals marked.]
Voltage-Drift Compensator for WDQS
Supply noise causes delay drift in the WDQS clock distribution network (CDN), degrading the sampling margin of the DQ receivers.
[Figure: WDQS buffer with the proposed voltage-drifted delay compensator (VDDC) and VCBG, D2S converter, 4-phase generator (divide-by-2) producing DQS_IT/IB/QT/QB, and a two-stage WDQS H-tree distributing to groups of 8 DQ I/Os in a DWORD.]
- VCBG: the bias current tracks supply-voltage changes.
- VDDC: compensates for the delay drift.
The delay difference within the JEDEC voltage range is decreased by 5.7 times on average.
Byte-Mapping Swap Scheme
- PHY bump map at the base die: mirror symmetry.
- Cell array at the core die: shift symmetry.
[Figure: base-die PHY channel placement (CHa to CHp, DWORD0/DWORD1/AWORD) and core-die pseudo-channel floorplans, showing byte mapping (Byte0 to Byte3) for the left channel (HBM3 and HBM3E), the right channel (HBM3), and the right channel (HBM3E).]
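One way to see why a mirror-symmetric PHY and a shift-symmetric core array clash is to count crossings among the byte-lane routes between them. When straight routes join two ordered rows, the number of crossings equals the number of inversions of the induced permutation. This reading of the figure is an assumption: the slides do not state that avoiding crossed lanes is the benefit. The lane orders below are hypothetical.

```python
from itertools import combinations

# Toy routing check (our interpretation, not from the slides).


def crossings(phy_order: list[int], core_order: list[int]) -> int:
    """Crossings of straight byte-lane routes between two parallel rows,
    i.e. the inversion count of the permutation phy -> core."""
    pos = {byte: i for i, byte in enumerate(core_order)}
    seq = [pos[byte] for byte in phy_order]
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])


LEFT_PHY = [0, 1, 2, 3]       # left-channel bump order (assumed)
RIGHT_PHY = LEFT_PHY[::-1]    # mirror-symmetric PHY: right channel reversed
SHIFTED_CORE = [0, 1, 2, 3]   # shift-symmetric core array, no swap
SWAPPED_CORE = [3, 2, 1, 0]   # byte-mapping swap on the right channel

if __name__ == "__main__":
    print(crossings(LEFT_PHY, SHIFTED_CORE))   # 0
    print(crossings(RIGHT_PHY, SHIFTED_CORE))  # 6
    print(crossings(RIGHT_PHY, SWAPPED_CORE))  # 0
```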
Byte-Mapping Swap Scheme
- ECC correction capability is the same.
- The data pattern within the octet unit (OCT) used for memory-cell screening remains unchanged.
[Figure: bank organization (OCT0 to OCT18) for HBM3 and the left channel of HBM3E (Byte0, Byte1, parity/ECC, Byte2, Byte3) and for the right channel of HBM3E (Byte3, Byte2, parity/ECC, Byte1, Byte0).]

tCK Shmoo
10Gbps data rate at 1.1V (9.6Gbps at 1.0V).
[Shmoo plot: data rate (Gbps, 8.0 to 10.0) vs. VDD (V, 1.00 to 1.30); 10.0Gbps at 1.1V marked.]

Comparison Table
Generation | HBM3 [1], [2] | HBM3E (this work)
Supply voltage | VDD=1.1V, VDDQ=1.1V, VDDQL=0.4V, VPP=1.8V | VDD=1.1V, VDDQ=1.1V, VDDQL=0.4V, VPP=1.8V
Data rate | 7.0 Gb/s/pin*, 8.0 Gb/s/pin | 10.0 Gb/s/pin
Bandwidth | 896 GB/s*, 1024 GB/s | 1280 GB/s
Max. density | 16 Gb 12-High = 24 GB | 24 Gb 16-High = 48 GB
Addresses | RA, CA, BA, SID | RA, CA, BA, SID
Organization | 16 channel, 2 PCH, 32 I/O | 16 channel, 2 PCH, 32 I/O
Microbump ballmap | 7.08 mm x 8.82 mm | 7.08 mm x 8.82 mm
Microbump pitch | 96 µm x 110 µm | 96 µm x 110 µm
Chip size | 11 mm x 11 mm | 11 mm x 11 mm
Backward compatibility | |
*Ref. [1]; unmarked HBM3 values: Ref. [2].
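The headline HBM3/HBM3E numbers in these tables follow from simple arithmetic. The I/O count is channels × pseudo channels × I/Os per pseudo channel, cube bandwidth in GB/s is I/Os × Gb/s/pin / 8, and cube capacity in GB is die Gb × stack height / 8. The function names are illustrative.

```python
# Cross-check of the HBM3/HBM3E headline numbers (illustrative helper names).


def io_count(channels: int, pseudo_channels: int, ios_per_pc: int) -> int:
    return channels * pseudo_channels * ios_per_pc


def cube_bandwidth_gbyte_s(ios: int, gbps_per_pin: float) -> float:
    return ios * gbps_per_pin / 8  # bits -> bytes


def cube_capacity_gbyte(die_gbit: int, stack_height: int) -> float:
    return die_gbit * stack_height / 8  # Gb -> GB


if __name__ == "__main__":
    ios = io_count(16, 2, 32)  # 16 channels, 2 PCH, 32 I/O
    print(ios)                                                        # 1024
    print([cube_bandwidth_gbyte_s(ios, r) for r in (7.0, 8.0, 10.0)])  # [896.0, 1024.0, 1280.0]
    print(cube_capacity_gbyte(16, 12), cube_capacity_gbyte(24, 16))    # 24.0 48.0
```

The ratios reproduce the stated extensions: 1280/1024 = 1.25 (+25% bandwidth), 24 Gb/16 Gb = 1.5 (+50% die density), and 48 GB/24 GB = 2 (2x cube density).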
Die Photograph
16-Hi stack, 4 slices per rank, 4 channels per slice.
[Photo: base die (PHY, TSV, DA) and core die (CHa to CHd); 16-Hi section view.]

Conclusion
- A 48GB 16-high 1280GB/s HBM3E DRAM is introduced.
- Four new design schemes and features are proposed for a 25% bandwidth extension and a 2x density extension: all-around power TSV, 6-phase RDQS scheme, voltage-drift compensator for WDQS, and byte-mapping swap scheme.
- The proposed HBM3E DRAM is expected to meet the memory bandwidth and capacity required by high-end systems, with backward compatibility.
References
[1] M.-J. Park et al., "A 192-Gb 12-High 896-GB/s HBM3 DRAM With a TSV Auto-Calibration Scheme and Machine-Learning-Based Layout Optimization," IEEE JSSC, vol. 58, no. 1, pp. 256-269, Jan. 2023.
[2] Y. Ryu et al., "A 16 GB 1024 GB/s HBM3 DRAM With Source-Synchronized Bus Design and On-Die Error Control Scheme for Enhanced RAS Features," IEEE JSSC, vol. 58, no. 4, pp. 1051-1061, Apr. 2023.
[3] M. Mansuri and C.-K. K. Yang, "A Low-Power Adaptive Bandwidth PLL and Clock Buffer With Supply-Noise Compensation," IEEE JSSC, vol. 38, no. 11, pp. 1804-1812, Nov. 2003.
[4] JESD238A: JEDEC Standard High Bandwidth Memory (HBM) DRAM Specification, Jan. 2023.

13.5: A 64-Gb/s/pin PAM4 Single-Ended Transmitter with a Merged Pre-Emphasis Capacitive-Peaking Crosstalk Cancellation Scheme for Memory Interfaces in 28nm CMOS
2024 IEEE International Solid-State Circuits Conference, 1 of 29

Weitao Wu, Hongzhi Wu, Liping Zhong, Xuxu Cheng, Xiongshi Luo, Dongfan Xu, Catherine Wang, Zhenghao Li, Quan Pan
Southern University of Science and Technology, Shenzhen, China

Outline
- Motivation
- Architecture
- Circuit implementation: proposed merged C-peaking XTC; reconfigurable FS-FFE; clock calibration
- Measurement results
- Conclusions

Motivation
- PAM-4 has been adopted to increase the per-pin data rate in GDDR (figure: ISSCC 2021, Micron GDDR6X).
- Crosstalk limits the scaling of channel pitch.
- To further increase throughput, crosstalk cancellation (XTC) is required.
(T. M. Hollis, ISSCC21; H.-G. Ko, ISSCC20)
Motivation
- Complex transitions introduce larger data-dependent jitter (DDJ). A fractional-spaced FFE (FS-FFE) extends bandwidth and reduces DDJ.
- The SNR is degraded by 9.5dB while the eye height is reduced to 1/3, so the extra swing reduction caused by de-emphasis XTC should be avoided.
(Kai Sheng, ISSCC23)
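The 9.5dB figure is consistent with the textbook PAM4 penalty. Three stacked eyes share the swing of one NRZ eye, so each eye is 1/3 as tall, and 20·log10(3) ≈ 9.54 dB. The sketch also converts the 64-Gb/s/pin target into a symbol rate and unit interval. These are standard relations, not values taken from the slides, and the helper names are illustrative.

```python
import math

# Back-of-envelope PAM4 numbers for a 64-Gb/s/pin link (textbook relations).


def pam4_symbol_rate_gbaud(gbps: float) -> float:
    return gbps / 2  # 2 bits per PAM4 symbol


def ui_ps(gbaud: float) -> float:
    return 1e3 / gbaud  # unit interval in picoseconds


def pam4_eye_penalty_db() -> float:
    # Three eyes share the NRZ swing: each eye is 1/3 as tall.
    return 20 * math.log10(3)


if __name__ == "__main__":
    baud = pam4_symbol_rate_gbaud(64)
    print(baud, ui_ps(baud))                # 32.0 31.25
    print(round(pam4_eye_penalty_db(), 1))  # 9.5
```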
Architecture
[Block diagram: quarter-rate transmitter with 4-phase clock generation (CKP/CKN, divider), adaptive DCC and QEC, four 4:1 serializers with tap position modulators (1-UI pulse generators and re-timer, clock phases 0/90/180/270 rotated per serializer), pattern generator, MSB (2x) and LSB (1x) SST driver slices with taps TAPA (6x), TAPB (12x), and TAPC (6x), FFE preset control, and the XTC path with pre-drivers biased by P_bias/N_bias for the adjacent-lane data DN+1/DN-1.]
- 3-tap reconfigurable FS-FFE: further extends bandwidth.
- Clock calibration: adaptive DCC and QEC.
- Merged C-peaking XTC: pre-emphasis.
- Quarter-rate architecture: more time margin.

Prior Art: FIR-XTC
FIR-XTC is achieved by subtracting FIR taps: the XTC path combines the adjacent-lane data D1 and its delayed copy with the main-path data D0. (S.-Y. Kao, JSSC13)
- The XTC signal aligns well with FEXT.
- Power consumption is high due to short-circuit current.
- The output swing is decreased due to de-emphasis.

Prior Art: Fibonacci Coding XTC
Some data patterns are avoided to decrease crosstalk-induced jitter. (Q. Liu, JSSC23)
- No coefficient control is needed.
- Pin efficiency is decreased by 25%.
- Complexity and cost are increased for PAM-4.
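A discrete-time toy model shows why the FIR-XTC prior art, which subtracts a "current minus delayed" copy of the adjacent-lane data, lines up with FEXT. Far-end crosstalk on coupled lines is commonly approximated as proportional to the aggressor's time derivative, modeled here per UI as k_f·(a[n] - a[n-1]). The coefficients and patterns are hypothetical. The model ignores the short-circuit current and swing reduction the slides list as FIR-XTC drawbacks, and it is not the paper's circuit.

```python
# Toy FEXT / FIR-XTC model (hypothetical coefficients).


def transitions(aggr: list[float]) -> list[float]:
    """Aggressor data minus its one-UI-delayed copy (derivative proxy)."""
    prev = [aggr[0]] + aggr[:-1]
    return [a - p for a, p in zip(aggr, prev)]


def received(victim: list[float], aggr: list[float], k_f: float, k_x: float) -> list[float]:
    """Victim at the far end: FEXT adds k_f * transition, XTC subtracts k_x * transition."""
    d = transitions(aggr)
    return [v + k_f * t - k_x * t for v, t in zip(victim, d)]


def peak_error(victim: list[float], aggr: list[float], k_f: float, k_x: float) -> float:
    rx = received(victim, aggr, k_f, k_x)
    return max(abs(r - v) for r, v in zip(rx, victim))


if __name__ == "__main__":
    aggr = [-1, 1, 1/3, -1/3, 1, -1]  # PAM4 levels on the aggressor lane
    victim = [1/3] * len(aggr)        # quiet victim held at one level
    print(round(peak_error(victim, aggr, 0.08, 0.0), 3))   # 0.16 (no XTC)
    print(round(peak_error(victim, aggr, 0.08, 0.08), 3))  # 0.0 (matched XTC)
    print(round(peak_error(victim, aggr, 0.08, 0.06), 3))  # 0.04 (under-cancelled)
```

The residual scales with the coefficient mismatch |k_f - k_x|, which is why the schemes in this paper provide coefficient control.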
Prior Arts: Dual-Mode Equalization (S.-M. Lee, ISSCC20)
A C-peaking driver acts as a FEXT compensator, driven by the adjacent lane's input.
Pre-emphasis is achieved without SNR degradation.
Excessive parasitics at the output limit BW.
A promising XTC technique for PAM-4.
Proposed Merged C-Peaking XTC
[Figure: merged driver schematic with main path and XTC path (MP1/MP2, MN1/MN2), impedance control, a phase aligner, and two 5b capacitor arrays; inputs D0 and D1]
Phase aligner: the XTC signal stays aligned with FEXT across PVT variation.
Cap arrays: coefficient control of the XTC.
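A rough behavioral model shows why an injected derivative-shaped (C-peaking) term can cancel FEXT. The sketch below rests on a first-order simplification that does not come from the paper:

- The far-end crosstalk on the victim equals a coupling coefficient times the first difference of the aggressor symbols.
- The XTC path, assumed to be perfectly aligned by the phase aligner, subtracts that same difference scaled by a coefficient. This coefficient stands in for the 5b cap-array setting.

The names `k_fext` and `k_xtc` and their values are hypothetical.

```python
import random

def first_difference(seq):
    """Discrete-time derivative: d[n] = x[n] - x[n-1], with x[-1] = 0."""
    out, prev = [], 0
    for x in seq:
        out.append(x - prev)
        prev = x
    return out

def victim_at_far_end(victim, aggressor, k_fext, k_xtc):
    """Victim symbols plus residual crosstalk after TX-side cancellation.

    Assumed model: FEXT adds k_fext * diff(aggressor); the XTC path injects
    -k_xtc * diff(aggressor), already phase-aligned with the FEXT.
    """
    return [v + (k_fext - k_xtc) * d
            for v, d in zip(victim, first_difference(aggressor))]

random.seed(0)
PAM4 = [-3, -1, 1, 3]
victim = [random.choice(PAM4) for _ in range(1000)]
aggressor = [random.choice(PAM4) for _ in range(1000)]

k_fext = 0.12  # hypothetical coupling coefficient
for k_xtc in (0.0, 0.06, 0.12):  # XTC off, under-compensated, matched
    rx = victim_at_far_end(victim, aggressor, k_fext, k_xtc)
    worst = max(abs(r - v) for r, v in zip(rx, victim))
    print(f"k_xtc = {k_xtc:.2f}: worst-case crosstalk = {worst:.3f}")
```

The residual scales with the coefficient mismatch (k_fext - k_xtc). This is why the slide stresses both the tunable cap arrays (matching the magnitude) and the phase aligner (matching the timing).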
Comparison with Prior XTCs
[Figure: normalized TX output swing versus FEXT]
The proposed XTC is based on pre-emphasis without SNR degradation, which ensures an adequate PAM-4 vertical eye opening.
The proposed XTC is merged into the driver to decrease the parasitics at the output, which improves TX BW for high-speed operation.
Simulated TX eye diagrams (32Gb/s NRZ): dual-mode equalization gives Trise = 12.3ps and BW = 28.4GHz; the proposed merged C-peaking XTC gives Trise = 10.2ps and BW = 34.3GHz (21% higher BW).
4:1 Serialization
[Figure: 4:1 MUX built from four 1-UI pulse generators clocked by quadrature phases (0/90/180/270), with re-timer, tap position modulator, and FFE preset]
4:1 MUX: dynamic logic (NAND+NOR) for high speed with small parasitics.
Tap position modulator: a voltage-controlled delay line in the CLK path implements the reconfigurable FS-FFE.
Inverter-based MUX: low power with no static current.
Reconfigurable FS-FFE
[Figure: timing diagram with main taps (TAPA/TAPB) clocked by CK0 and post tap (TAPC) clocked by CK90; the post-tap position is adjusted by the tap position modulator]
[Figure: measured 64Gb/s PAM-4 eyes with the post-slice UI-spaced FFE on and with the post-slice FS-FFE on; scale 70mV, 6.25ps; eye-width labels 0.39UI, 0.49UI, 0.36UI and 0.47UI, 0.53UI, 0.41UI]
The reconfigurable FS-FFE enhances the width of the top, middle, and bottom eyes by 20.5%, 8.2%, and 17.1%, respectively.
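A fractionally spaced FFE places its taps at sub-UI spacing. The sketch below applies a 3-tap FIR to a PAM-4 symbol stream. It assumes a 2x oversampled waveform, so the tap spacing is T/2, and uses hypothetical tap weights. It illustrates the FS-FFE idea only and does not model the paper's tap position modulator or delay-line circuit.

```python
def oversample(symbols, osr):
    """Hold each symbol for `osr` samples (zero-order hold)."""
    return [s for s in symbols for _ in range(osr)]

def fir(samples, taps):
    """y[k] = sum_i taps[i] * x[k - i], with x[k] = 0 for k < 0."""
    out = []
    for k in range(len(samples)):
        acc = 0.0
        for i, c in enumerate(taps):
            if k - i >= 0:
                acc += c * samples[k - i]
        out.append(acc)
    return out

# Hypothetical 3-tap FS-FFE: main tap plus two post-cursor taps.
# With osr=2, one sample is half a UI, so tap i sits i*T/2 after the main cursor.
taps_fs = [1.0, -0.15, -0.10]
symbols = [-3, -1, 3, 3, 1, -3]  # PAM-4 levels
wave = fir(oversample(symbols, osr=2), taps_fs)
for k, y in enumerate(wave):
    print(f"t = {k * 0.5:.1f} UI  y = {y:+.2f}")
```

A UI-spaced FFE corresponds to the same FIR with the post tap moved to lag 2 (one full UI). Sliding the tap between these lags is, loosely, what a tap-position control adds.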
Clock Calibration
QECs and DCCs are placed before the 4-phase clock generators to halve the power and area consumption.
[Figure: CLKP/N divided by 2 into CLKI/CLKQ, each path passing through a QEC and an adaptive DCC before a 4-phase clock generator producing CK0/180 and CK90/270]
[Figure: circuit details of the QEC, the adaptive DCC with its duty-cycle detector, and the 4-phase clock generator/S2D]
Measured Clock Calibration Performance
Quadrature error: from 11.36% to 1.12%.
Duty-cycle error: from 16.16% to 0.48%.
[Figure: measured clock waveforms w/o QEC and DCC (phase intervals from 26.2ps to 35.3ps) and w/ QEC and DCC (phase intervals from 30.9ps to 31.6ps, ideal 31.25ps)]
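The quadrature and duty-cycle errors above are percentages of an ideal spacing, but these slides do not spell out the definitions. The sketch below therefore assumes one common convention:

- Duty-cycle error is the deviation of the high time from 50% of the period.
- Quadrature error is the worst deviation of adjacent-phase spacing from T/4, relative to T/4.

The 125ps period is also an assumption: a quarter-rate 8GHz clock for 32Gbaud, consistent with the 31.25ps ideal spacing in the measured waveforms. The edge times are made up.

```python
PERIOD_PS = 125.0  # assumed 8GHz quarter-rate clock (ideal phase spacing 31.25ps)

def duty_cycle_error(t_high_ps, period_ps=PERIOD_PS):
    """Deviation of the high time from 50% of the period, in percent of the period."""
    return abs(t_high_ps / period_ps - 0.5) * 100.0

def quadrature_error(edges_ps, period_ps=PERIOD_PS):
    """Worst deviation of adjacent-phase spacing from T/4, in percent of T/4.

    edges_ps: rising-edge times of CK0, CK90, CK180, CK270 within one period.
    """
    ideal = period_ps / 4.0
    spacings = [(edges_ps[(i + 1) % 4] - edges_ps[i]) % period_ps for i in range(4)]
    return max(abs(s - ideal) for s in spacings) / ideal * 100.0

# Hypothetical edge times before and after calibration.
before = [0.0, 35.0, 62.5, 93.75]
after = [0.0, 31.5, 62.5, 93.75]
print(f"quadrature error before: {quadrature_error(before):.2f}%")
print(f"quadrature error after:  {quadrature_error(after):.2f}%")
print(f"duty-cycle error for a 60ps high time: {duty_cycle_error(60.0):.2f}%")
```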
Die Photo and Power Breakdown
[Figure: die photo, 2000um x 960um, four lanes (Lane-1 to Lane-4), each with CLK path, serializer, pre-driver, and driver; lane core 170um x 500um]
Power per lane: driver 27mW, CLK path 25mW, serializer 21mW, pre-driver 8mW; total 81mW.
Implemented in a 28nm CMOS technology.
Energy efficiency: 1.27pJ/b.
Core area: 0.085mm2/lane.
Measurement Setup
[Figure: Keysight N6705C power supply, PC with I2C/VISA control, test chip on a regulation board, Keysight 8196A AWG, channels with crosstalk (XT), Keysight N1060 oscilloscope; 16GHz TX clock and 8GHz sampling clock]
Channel response: -11dB and -15.8dB at 16GHz.
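The reported energy efficiency follows directly from the per-lane power breakdown on the die-photo slide. Since mW divided by Gb/s gives pJ/b, 81mW at 64Gb/s is about 1.27pJ/b. A small check:

```python
# Per-lane power breakdown from the die-photo slide, in mW.
power_mw = {"driver": 27, "clk_path": 25, "serializer": 21, "pre_driver": 8}
data_rate_gbps = 64  # per-pin data rate (PAM-4)

total_mw = sum(power_mw.values())
# 1 mW / 1 Gb/s = 1e-3 J/s / 1e9 b/s = 1e-12 J/b = 1 pJ/b
energy_pj_per_bit = total_mw / data_rate_gbps

print(f"total power: {total_mw} mW, energy efficiency: {energy_pj_per_bit:.2f} pJ/b")
for block, p in power_mw.items():
    print(f"{block:>10}: {p:2d} mW ({p / total_mw:.0%})")
```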
Measured 32Gb/s NRZ Eye Diagrams
[Figure: eyes w/o XT, w/ XT and w/o XTC, and w/ XT and w/ XTC; horizontal openings 0.64UI, 0.32UI, and 0.6UI; scale labels 80mV, 100mV, 90mV, 6.25ps]
Crosstalk reduces the horizontal eye opening from 0.64UI to 0.32UI.
XTC improves the horizontal eye opening from 0.32UI to 0.6UI.
Measured 64Gb/s PAM-4 Eye Diagrams
[Figure: eyes w/o XT, w/ XT and w/o XTC, and w/ XT and w/ XTC; horizontal openings 0.43UI, 0, and 0.36UI; scale labels 90mV, 100mV, 6.25ps]
Crosstalk reduces the horizontal eye opening from 0.43UI to 0.
XTC improves the horizontal eye opening from 0 to 0.36UI.
Measured Eye Diagrams
[Figure: vertical eye opening (mV), horizontal eye opening (UI), and CIJ reduction (%) with XTC off and on, for 32Gb/s NRZ and 64Gb/s PAM-4]
The CIJ reduction ratio of the proposed XTC is 87% for 32Gb/s NRZ and 82% for 64Gb/s PAM-4. The ratio is defined as

$$\text{CIJ Reduction Ratio} = \frac{W_{\mathrm{XT,XTC}} - W_{\mathrm{XT}}}{W_{\mathrm{noXT}} - W_{\mathrm{XT}}}$$

where the three widths are:

- W_noXT is the eye width w/o XT.
- W_XT is the eye width w/ XT and w/o XTC.
- W_XT,XTC is the eye width w/ XT and w/ XTC.
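Applying the ratio above to the 32Gb/s NRZ eye widths from the measured eye-diagram slide reproduces the reported figure: (0.6 - 0.32)/(0.64 - 0.32) = 87.5%. The helper below is a minimal sketch, and its name is illustrative.

```python
def cij_reduction_ratio(w_no_xt, w_xt, w_xt_xtc):
    """Fraction of the crosstalk-induced eye-width loss recovered by XTC.

    w_no_xt:  eye width without crosstalk
    w_xt:     eye width with crosstalk, XTC off
    w_xt_xtc: eye width with crosstalk, XTC on
    """
    loss = w_no_xt - w_xt
    if loss <= 0:
        raise ValueError("crosstalk did not shrink the eye")
    return (w_xt_xtc - w_xt) / loss

# Horizontal eye widths (UI) from the 32Gb/s NRZ eye-diagram slide.
nrz = cij_reduction_ratio(w_no_xt=0.64, w_xt=0.32, w_xt_xtc=0.60)
print(f"32Gb/s NRZ CIJ reduction ratio: {nrz:.1%}")
```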
Comparison Table
[Table: comparison with JSSC13 [2], ISSCC20 [5], ISSCC20 [6], JSSC23 [4], and Kim CICC22 (prior XTC types: FIR-XTC, dual-mode equalization, Fibonacci coding). Rows: reference, technology, data rate (Gb/s/pin), signaling, IL and FEXT at Nyquist frequency (dB), TX equalization, XTC type, PAM-4 XTC support, pin efficiency, CIJ reduction ratio, jitter reduction ratio, energy efficiency (pJ/bit), FoM (pJ/bit/dB). This work: 28nm CMOS, 64Gb/s/pin, NRZ/PAM-4, 3-tap reconfigurable FS-FFE, merged C-peaking XTC, PAM-4 XTC supported, CIJ reduction ratio 87% (NRZ)/82% (PAM-4), 1.27pJ/bit]
*Figure of Merit: energy efficiency (pJ/bit) / IL at Nyquist frequency (dB).
*Estimated from the reduction ratio of the crosstalk noise amplitude.
*According to the power breakdown.
Conclusions
A 64-Gb/s/pin PAM4 single-ended transmitter with a merged pre-emphasis capacitive-peaking crosstalk cancellation scheme for memory interfaces is implemented in 28nm CMOS.
The proposed merged C-peaking XTC reduces CIJ by 82% for PAM-4 without reducing the output swing.
A 3-tap reconfigurable FS-FFE is adopted to decrease DDJ and widen the PAM-4 eyes.
QEC and adaptive DCC calibrate the quadrature error and duty-cycle error to reduce output jitter.
13.6: A 16Gb 37Gb/s GDDR7 DRAM with PAM3-Optimized TRX Equalization and ZQ Calibration, 2024 IEEE International Solid-State Circuits Conference
Sung-Yong Cho, Moon-Chul Choi, Jaehyeok Baek, Donggun An, Sanghoon Kim, Daewoong Lee, Seongyeal Yang, Gil-Young Kang, Juseop Park, Kyungho Lee, Hwan-Chul Jung, Gunhee Cho, Chanyong Lee, Hye-Ran Kim, Yong-Jae Shin, Hanna Park, Sangyong Lee, Jonghyuk Kim, Bokyeon Won, Jungil Mok, Kijin Kim, Unhak Lim, Hong-Jun Jin, YoungSeok Lee, Young-Tae Kim, Heonjoo Ha, Jinchan Ahn, Wonju Sung, Yoontaek Jang, Hoyoung Song, Hyodong Ban, TaeHoon Park, Tae-Young Oh, Changsik Yoo, SangJoon Hwang
Samsung Electronics, Hwasung, Korea
Outline
Introduction of GDDR7
Key schemes: adaptive gain controlled TX FFE; PAM3 optimized RX equalizer; ZQ calibration; WCK distribution
Measurements
Conclusion
GDDR Trend
Higher I/O speed, lower supply voltage.
[Figure: data rate (Gb/s/pin) versus year, 2015 to 2025, for ISSCC16 GDDR5, ISSCC17 GDDR5X, ISSCC18 GDDR6, ISSCC21 GDDR6X, ISSCC21 GDDR6, ISSCC22 GDDR6, and this work (GDDR7)]
Supply voltage: GDDR5/5X VDD = 1.5V; GDDR6/6X VDD = 1.35V; GDDR7 VDD = 1.2V.
Introduction of GDDR7
Comparison between GDDR6 and GDDR7:
Supply: GDDR6 1.35V; GDDR7 1.2V
Max. bandwidth: GDDR6 27Gb/s/pin (ISSCC22); GDDR7 37Gb/s/pin (this work)
DQ signaling: GDDR6 NRZ; GDDR7 PAM3/NRZ
Clock pins: GDDR6 WCK_T/C, CK_T/C; GDDR7 WCK_T/C, RCK_T/C
DQ pins/CH: GDDR6 DQ[15:0], DBI[1:0], EDC[1:0]; GDDR7 DQ[9:0], DQE
CA pins/CH: GDDR6 CA[9:0]; GDDR7 CA[4:0], ERR
Number of CH/PKG: GDDR6 2; GDDR7 4
Introduction of GDDR7
GDDR7 uses 2 dies for the 4-CH configuration.
CK_T/C are eliminated and RCK_T/C are added.
GDDR7 employs PAM3 signaling for higher bandwidth.
[Figure: controller GDDR7 PHY (PAM3 encoder/decoder, 16:8 MUX, 8:16 DEMUX, transmitters, receivers, divider) connected over the GDDR7 channel (CA[4:0], DQ[9:0], DQE, WCK_t/WCK_c, RCK_t/RCK_c) to the DRAM cell array; package with channels CH-A to CH-D in 4-CH mode (2 channels per die) and in 2-CH mode]
PAM3 in GDDR7
PAM3 signaling extends bandwidth while using the minimum number of DQs.
At the same WCK frequency, PAM3 coding lets GDDR7 double the bandwidth of GDDR6 while adding only 2 DQs.
[Figure: GDDR6 sub-channel with 9 I/O pins (DQ0 to DQ7, EDC) using NRZ versus GDDR7 channel with 11 I/O pins (DQ0 to DQ9, DQE) using PAM3 coding (11 bits to 7 symbols)]
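Packing 11 bits into 7 ternary symbols works because 3^7 = 2187 >= 2^11 = 2048. That gives about 1.57 bits per symbol, versus 1 for NRZ. The sketch below shows the idea using a plain base-3 conversion as a stand-in mapping. It is an illustrative assumption, not the GDDR7 11-bit-to-7-symbol code table, and the function names are hypothetical.

```python
BITS, SYMBOLS = 11, 7
assert 3 ** SYMBOLS >= 2 ** BITS  # 2187 >= 2048, so 139 symbol patterns go unused

def encode_11b7s(word):
    """Map an 11-bit integer to 7 PAM3 levels in {-1, 0, +1}, most significant first."""
    if not 0 <= word < 2 ** BITS:
        raise ValueError("word must fit in 11 bits")
    digits = []
    for _ in range(SYMBOLS):
        word, d = divmod(word, 3)
        digits.append(d - 1)  # base-3 digit 0/1/2 -> level -1/0/+1
    return digits[::-1]

def decode_11b7s(symbols):
    """Inverse of encode_11b7s; rejects the unused patterns."""
    word = 0
    for s in symbols:
        word = word * 3 + (s + 1)
    if word >= 2 ** BITS:
        raise ValueError("unused codeword")
    return word

print(encode_11b7s(0b10110011101))
print("bits per symbol:", BITS / SYMBOLS)
```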
PAM3 TX
A fractionally spaced FFE is used.
Optimal FFE coefficients are derived from the ZQ calibration results.
[Figure: 4:1 serializer and FFE encoder driving a main driver (legs A and B, PU/PD codes) and an FFE driver (PU_FFE/PD_FFE codes) with delay control, T-coil at the DQ pad, WCK0/1/2/3 clocks, on-chip EQ and ZQ calibration; fractionally spaced FFE with PVT-compensated FFE gain control]
PAM3 TX
A 1TR-1R driver is employed: smaller Cdie, better linearity, better slew.
[Figure: conventional TX versus the 1TR-1R TX of this work (PU[n:0]/PD[n:0] driving DOUT)]
The 1TR-1R TX has a better voltage margin (16Gbaud/s simulation results).
PAM3 RX
The receiver consists of a CTLE and a DFE with two reference voltages (VREFD_H and VREFD_L).
The PAM3 DFE is implemented as a current-summer type.
[Figure: DQ into CTLE_H and CTLE_L with references VREFD_H and VREFD_L, V2I summers, and 4-phase sense amplifiers clocked by WCK0/90/180/270]
PAM3 DFE
Merging the 4-phase feedback through V2I improves the feedback timing margin.
[Figure: conventional direct-feedback DFE with a congested feedback path versus the proposed DFE with a simplified feedback path; sense amplifiers SA_H0 and SA_L0]
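The following behavioral sketch shows why a PAM3 receiver needs two reference voltages and how a decision-feedback tap works. Its assumptions:

- Levels are normalized to {-1, 0, +1}.
- Thresholds at ±0.5 stand in for VREFD_L and VREFD_H.
- A single post-cursor h1 is subtracted by the DFE using the previous decision.

The channel model and values are assumptions. The paper's current-summing, 4-phase implementation is not modeled.

```python
import random

def pam3_slice(v, vref_h=0.5, vref_l=-0.5):
    """Two comparators (against VREF_H and VREF_L) resolve three levels."""
    if v > vref_h:
        return 1
    if v < vref_l:
        return -1
    return 0

def channel(symbols, h1):
    """Received sample = current symbol + h1 * previous symbol (post-cursor ISI)."""
    prev, out = 0, []
    for s in symbols:
        out.append(s + h1 * prev)
        prev = s
    return out

def dfe_receive(samples, h1_est):
    """1-tap DFE: subtract h1_est times the previous decision before slicing."""
    prev_dec, decisions = 0, []
    for y in samples:
        d = pam3_slice(y - h1_est * prev_dec)
        decisions.append(d)
        prev_dec = d
    return decisions

random.seed(1)
tx = [random.choice([-1, 0, 1]) for _ in range(2000)]
rx = channel(tx, h1=0.6)

no_dfe = sum(d != s for d, s in zip(dfe_receive(rx, 0.0), tx))
with_dfe = sum(d != s for d, s in zip(dfe_receive(rx, 0.6), tx))
print(f"symbol errors without DFE: {no_dfe}, with DFE: {with_dfe}")
```

The feedback must settle within one UI at the slicer, which is why the feedback path timing matters.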
PAM3 DFE
13.7 A 1-Tb Density 3-b/Cell 3D-NAND Flash on a 2YY-Tier Technology with a 300-MB/s Write Throughput, 2024 IEEE International Solid-State Circuits Conference
[Figure: Dual SSPC concept versus conventional SSPC. Conventional SSPC uses verify levels PV and PPV with bit-line biases 0V (fast), VBL_sspc (slow), and Vcc (inhibit). Dual SSPC adds an SPPV level and a bias VBL_sspc2 ("faster slow"). Labels: vstep, vstep/2, (1 to 2x)vstep. Bit-line drivers: BL_pgm, BL_sspc, BL_sspc2, BL_inhibit, with VBL_sspc generated by a source follower]
Dual Selective Slow Program Convergence
WL modulation for the additional SPPV level maximizes flexibility.
The extra verify of the SPPV level adds time to program verify.
[Figure: WL waveforms for program loops Ln, Ln+1, Ln+2 with verify levels PV, PPV, and SPPV; extra verify times T_extra1 and T_extra2]
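The bit-line biasing behind SSPC can be sketched as a per-cell lookup on where the cell's threshold voltage sits relative to the verify levels. Two parts of the sketch are assumed readings of the concept figure rather than a confirmed description of the design:

- The ordering of the verify levels, SPPV < PPV < PV.
- The assignment of VBL_sspc2 to the SPPV-to-PPV band.

All voltages are placeholders.

```python
from enum import Enum

class Mode(Enum):
    FAST = "fast"          # BL = 0V, full program strength
    SLOW2 = "faster slow"  # BL = VBL_sspc2 (assumed intermediate bias)
    SLOW = "slow"          # BL = VBL_sspc
    INHIBIT = "inhibit"    # BL = Vcc, cell has passed verify

# Hypothetical bit-line voltages in volts (placeholders, not from the paper).
BL_BIAS = {Mode.FAST: 0.0, Mode.SLOW2: 0.3, Mode.SLOW: 0.6, Mode.INHIBIT: 2.5}

def select_mode(vt, sppv, ppv, pv, dual=True):
    """Choose the bit-line mode for the next pulse from the cell Vt.

    Assumes sppv < ppv < pv. With dual=False this reduces to
    conventional SSPC, which uses only PV and PPV.
    """
    if vt >= pv:
        return Mode.INHIBIT
    if vt >= ppv:
        return Mode.SLOW
    if dual and vt >= sppv:
        return Mode.SLOW2
    return Mode.FAST

levels = dict(sppv=1.6, ppv=1.8, pv=2.0)  # hypothetical verify levels (V)
for vt in (1.2, 1.7, 1.9, 2.1):
    conv_mode = select_mode(vt, dual=False, **levels)
    dual_mode = select_mode(vt, **levels)
    print(f"Vt={vt:.1f}V  conv: {conv_mode.value:<11} dual: {dual_mode.value:<11} "
          f"BL={BL_BIAS[dual_mode]:.1f}V")
```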
Outline
Design Architecture
Key Feature Comparison
Key Design Items: Dual Selective Slow Programming Convergence (Dual SSPC); Page Buffer sequencer with stable oscillator; Peak Power Management (PPM); 3.6GT/s & Wafer Level Speed Test (WLST)
Conclusion
Page Buffer Sequencer per Plane
Parallel offloading from the main controller reduces tR and tProg.
Each plane has its own sequencer with a dedicated FIFO queue and oscillator.
A PVT-robust oscillator minimizes plane-to-plane variation.
[Figure: NAND die with the controller and per-plane page buffer sequencers (plane parallel), each with its own queue and oscillator]
PVT Robust Local Oscillator
BJT-based first-order zero-TC (temperature coefficient) Iref and Vbg generator.
Source-follower-based comparator in the SR-latch relaxation core.
Achieves 5% variation over process, voltage, and a 130°C temperature range.
[Figure: PTAT and CTAT generators producing VBG and IREF (with trim), source-follower comparators, and an RS latch generating CLK]
Peak Power Management
Round-robin scheme with a synchronized counter.
2-stage arbitration: the Threads Manager (TM) performs internal arbitration; the PPM Manager (PM) performs external arbitration.
[Figure: a token passed among Die#0 to Die#n, each with a TM and a PM, sharing a peak current budget]
The user can provide a maximum peak current budget.
A priority control feature lets the system prioritize operations that must execute with minimum performance impact.
[Figure: plane peak current requests go to the Threads Manager, which sums them and forwards them to the PPM Manager; the PPM Manager listens to the other dies, checks the budget, and grants each plane permission to run]
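A toy round-robin arbiter captures the budget rule described above:

- Planes request peak current.
- A token rotates so each requester gets its turn.
- Priority requests are served first.
- A grant is issued only if the running total stays within the user-supplied budget.

This is a hypothetical simplification. It collapses the two-stage TM/PM split and the cross-die synchronized counter into one function, and all names and numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class Request:
    plane: int
    current_ma: float
    priority: bool = False

def arbitrate(requests, budget_ma, token):
    """One arbitration round.

    Requests are visited in round-robin order starting at `token`; priority
    requests are moved to the front (the sort is stable, so round-robin order
    is kept within each class). A request is granted only if the granted
    total stays within budget_ma. Returns (granted planes, next token).
    """
    n = len(requests)
    order = [requests[(token + i) % n] for i in range(n)]
    order.sort(key=lambda r: not r.priority)
    granted, used = [], 0.0
    for r in order:
        if used + r.current_ma <= budget_ma:
            granted.append(r.plane)
            used += r.current_ma
    next_token = (token + 1) % n if n else 0
    return granted, next_token

reqs = [Request(0, 40), Request(1, 40), Request(2, 40, priority=True), Request(3, 40)]
token = 0
for cycle in range(4):
    granted, token = arbitrate(reqs, budget_ma=100, token=token)
    print(f"round {cycle}: granted planes {granted}")
```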
WLST (Wafer Level Speed Test) Function
WLST can screen high-speed AC parameters.
The page buffer handles data pattern transfer and pass/fail checks.
[Figure: WLST oscillator and clock generator driving the input buffer, deserializer, FIFO, serializer, and output driver on DQ and DQS/RE, with the page buffer cache and latches]
3.6GT/s Input/Output Measurement Results
The measured pass window during write is 68% UI margin.
The eye diagram measured during read shows 75% UI margin.
[Figure: write pass/fail window of 190ps (68.5% UI) at Vref 186mV, and read eye of 75.5% UI; 1UI = 277ps]
Conclusion
High bit density: 20 Gb/mm2 (2YY-tier technology, CMOS under array).
High write throughput: 300MB/s (6-plane architecture, Dual Selective Slow Program Convergence, page buffer sequencer with stable oscillator).
High-speed interface: 3.6GT/s (Wafer Level Speed Test).
13.8: A 1a-nm 1.05V 10.5Gb/s/pin 16Gb LPDDR5 Turbo DRAM with WCK Correction Strategy, a Voltage-Offset-Calibrated Receiver and Parasitic Capacitance Reduction, 2024 IEEE International Solid-State Circuits Conference
Yangho Seo*, Jihee Choi*, Sunki Cho, Hyunwook Han, Wonjong Kim, Gyeongha Ryu, Jungil Ahn, Younga Cho, Sungphil Choi, Seohee Lee, Wooju Lee, Chaehyuk Lee, Kiup Kim, Seongseop Lee, Sangbeom Park, Minjun Choi, Sungwoo Lee, Mino Kim, Taekyun Shin, Hyeongsoo Jeong, Hyunseung Kim, Houk Song, Yunsuk Hong, Seokju Yoon, Giwook Park, Hokeun You, Changkyu Choi, Hae-Kang Jung, Joohwan Cho, Jonghwan Kim
SK Hynix
Outline
Introduction
Speed Enhancement Features: WCK; Receiver; IO for DRAM Test
Measurements and Chip Micrograph
Conclusion
Mobile DRAM Trend
The LPDDR product family seeks the lowest possible supply voltage and the fastest possible IO speed.
[Figure: VDD (V) and data rate (Mb/s/pin) across LPDDR generations, with this work at 10.5G; photo credit: Qualcomm]
Speed Enhancement Features
Interface architecture and key features:
WCK: self-DCC, phase averaging, DCM/DCA, 4-phase trimming.
RX: offset calibration, DFE.
IO for DRAM test: pad capacitance reduction.
[Figure: controller-to-DRAM channel with WCK_T/WCK_C passing through DIV/2, CML2CMOS, DCM, single-edge ROD, DCA, and WCK distribution to the 4:1 DQ serializers and receivers, plus test buffers/drivers]
WCK Correction Strategy (1/3): Self-Duty Cycle Corrector (DCC)
WCK input duty error: ±7%.
Buffer output with DCC: duty error ±4%.
[Figure: WCK input buffer with self-DCC (WCK_T/WCK_C to WCKO_T/WCKO_C) within the WCK path (DIV/2, CML2CMOS, DCM, single-edge ROD with 12-bit counter CNT_OUT[11:0], DCC, DCA, WCK distribution, DQ TX), showing external duty error, internal duty error, and 4-phase skew]
C. Lee et al., "An 8.5-Gb/s/Pin 12-Gb LPDDR5 SDRAM with a Hybrid-Bank Architecture, Low Power, and Speed-Boosting Techniques," IEEE JSSC, vol. 56, no. 1, pp. 212-224, Jan. 2021.
WCK Correction Strategy (2/3): Duty Cycle Monitor/Adjuster (DCM/DCA) and Phase Averaging
[Figure: DCM comparator sampling WCK_H/WCK_L with enable and latch signals; phase averaging of WCK_0/WCK_180 into WCK_0D/WCK_180D with trim codes TRIM0[2:0] and TRIM180[2:0], shown for fast and slow process skew]
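The DCM/DCA pair forms a feedback loop. The monitor decides whether the duty cycle is above or below 50%, and the adjuster moves a trim code one step in the opposite direction. The bang-bang sketch below assumes a linear adjuster with a hypothetical 0.5% duty step per code and a 5-bit signed code range. It omits phase averaging and the circuit details shown in the figure.

```python
STEP = 0.005                  # hypothetical duty change per trim code (0.5%)
CODE_MIN, CODE_MAX = -16, 15  # hypothetical 5-bit signed trim range

def measured_duty(input_duty, code):
    """Duty cycle after the adjuster (DCA), assuming a linear trim."""
    return input_duty + code * STEP

def dcm(duty):
    """Duty cycle monitor: +1 if the high phase is too long, else -1."""
    return 1 if duty > 0.5 else -1

def run_dcc(input_duty, iterations=64):
    """Bang-bang DCC loop: step the trim code against the monitor's decision."""
    code, history = 0, []
    for _ in range(iterations):
        code -= dcm(measured_duty(input_duty, code))
        code = max(CODE_MIN, min(CODE_MAX, code))
        history.append(code)
    return code, history

for duty_in in (0.43, 0.50, 0.57):  # spans a +/-7% input duty error
    code, _ = run_dcc(duty_in)
    print(f"input {duty_in:.2f} -> code {code:+d}, corrected {measured_duty(duty_in, code):.3f}")
```

A bang-bang loop ends up dithering by one code around 50%, so the residual duty error is bounded by one trim step. A hardware loop would typically freeze the code once it detects this dithering.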
WCK Correction Strategy (3/3): 4-Phase Skew Trimming with Single-Edge ROD
[Figure: 4-phase skew between WCK_0/90/180/270 measured with the single-edge ROD; histogram of residual skew error: 3.01 ± 1.24 ps]
Receiver
Double-tail latch with a 1-tap direct DFE.
Offset calibration scheme.
[Figure: double-tail latch (IN/VREF inputs, DFE[0:2]/DFEB[0:2] and OSUP[0:1]/OSDN[0:1] controls, WCK/WCKB clocks) and per-DQ receivers (DQ0 to DQ7), each with bit lanes BL0 to BL3 clocked from a WCK generator]
1-Tap DFE
Current summer at the input stage.
[Figure: DFE latch with a current-summing input stage (IN, VREF, feedback from OUT(N-1) with DFE strength control) and output stage; measured pass/fail maps w/o DFE and w/ DFE]
Offset Calibration
Sequential calibration from DQ0 to DQ7: detect and latch the OS code.
[Figure: per-receiver offset detection and code latch (OS/OSB codes, comparator output), sequenced over all bit lanes BL0 to BL3 and all DQ and DMI pins by the internal CLK, OCC EN, and Reset_n]
Measured DQ offset: 3.38 ± 1.40mV (histogram of Voffset from -15mV to 15mV).
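The sequential offset calibration can be sketched as a per-receiver search. With the input tied to VREF, step the offset-trim code until the comparator output goes high, latch that code, then move on from DQ0 to DQ7. The linear sweep, the 1mV-per-code step, the code range, and the offsets themselves are hypothetical and chosen for illustration.

```python
import random

LSB_MV = 1.0             # hypothetical offset-trim step per code (mV)
CODES = range(-15, 16)   # hypothetical signed trim code range

def comparator(offset_mv, code):
    """Latch output with its input tied to VREF: the sign of the residual offset."""
    return offset_mv + code * LSB_MV > 0

def calibrate_receiver(offset_mv):
    """Sweep the trim code upward and latch the first code with a high output.

    If the output is already high at the lowest code, or never goes high,
    the code saturates at the end of the range.
    """
    for code in CODES:
        if comparator(offset_mv, code):
            return code
    return CODES[-1]

random.seed(7)
offsets = {f"DQ{i}": random.gauss(0.0, 5.0) for i in range(8)}  # made-up offsets

codes = {}
for dq in sorted(offsets):  # sequential: DQ0, DQ1, ..., DQ7
    codes[dq] = calibrate_receiver(offsets[dq])
    residual = offsets[dq] + codes[dq] * LSB_MV
    print(f"{dq}: offset {offsets[dq]:+6.2f}mV, code {codes[dq]:+3d}, residual {residual:+.2f}mV")
```

Within the trim range, the residual offset ends up between 0 and one LSB.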
IO for DRAM Test
A dedicated pad for DRAM test with a simplified TX and RX.
No defects on the package pad and no additional capacitance.
[Figure: DQ pad and separate test pad; test data path with data control, WCKi, and flip-flops on Datai]
Measurements
Shmoo of tCK versus VDD2: 10Gb/s at VDD2 = 0.95V and 10.5Gb/s at VDD2 = 1.05V.
[Figure: tCK-VDD2 shmoo (tCK from 740ps to 940ps, VDD2 from 0.959V to 1.13V; pass/fail)]
[Figure: write and read shmoos at 10.5Gb/s (2.5ps/step, 5mV/step); window labels 72.5ps (0.76UI), 75ps (0.79UI), 170mV, and 175mV]
Chip Micrograph
[Figure: micrograph of the 10.5Gb/s/pin 16Gb LPDDR5T DRAM]
Product: LPDDR5 Turbo
Fabrication process: 1a-nm DRAM
Maximum data rate: 10.5 Gb/s/pin
Area: 51.3 mm2/Ch.
Supply voltage: VDD2 = 1.05V, VDDQ = 0.5V
Package: 496B PoP
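The UI figures quoted for the eye windows follow from the data rate. At 10.5Gb/s one UI is 1000/10.5 = 95.2ps, so 72.5ps is about 0.76UI and 75ps is about 0.79UI. A minimal check:

```python
def ui_ps(data_rate_gbps):
    """Unit interval in ps for a per-pin data rate given in Gb/s."""
    return 1000.0 / data_rate_gbps

ui = ui_ps(10.5)
print(f"UI at 10.5Gb/s: {ui:.2f} ps")
for window_ps in (72.5, 75.0):
    print(f"{window_ps:.1f} ps window = {window_ps / ui:.5f} UI")
```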
Conclusion
A 10.5Gb/s/pin 16Gb LPDDR5T is implemented in a 1a-nm DRAM process with the following speed enhancement schemes:
WCK duty correction: self-DCC and DCM/DCA.
WCK skew correction: phase averaging and trimming.
DQ RX with 1-tap DFE and OCC.
Pad capacitance reduction: IO for DRAM test.
A 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133% I/O Bandwidth/Pin Efficiency and 19.3% DBI Efficiency, 2024 IEEE International Solid-State Circuits Conference
Chanheum Han*, Ki-Soo Lee*, and Joo-Hyung Chae (*equally contributed authors)
Kwangwoon University, Seoul, Korea
319、id-State Circuits Conference2 of 32Outline Motivation Proposed Scheme:Embedded pDBI for PAM-3 Implementation Measurement Results Comparison Table and ConclusionsA 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficie
320、ncy 2024 IEEE International Solid-State Circuits Conference3 of 32Outline Motivation Proposed Scheme:Embedded pDBI for PAM-3 Implementation Measurement Results Comparison Table and ConclusionsA 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133%I/O Bandwidth/Pin
321、Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference4 of 32 DRAM trends:higher bandwidth,lower power consumption A high-bandwidth and low-power memory interface is requiredDRAM TrendsA 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achie
322、ving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference5 of 32Low BWGood SNRBad SNRHigh BWBetter SNR than PAM-4Higher BW than NRZSignaling:NRZ vs.PAM-4 vs.PAM-3A 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Ac
323、hieving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference6 of 32Problem Using PAM-3 in Memory InterfaceMismatch between PAM-3(3b/2UI)transmitting bit and Burst Length(2N b)PAM-3 theoretical data rate improvement:150%PAM-3 practical dat
324、a rate improvement:133%with dummy bit8b+1dummy 9b/6UIA 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference7 of 32Power Improvement:DBI-DCAdditional DBI pin
325、 and powerReduce signaling currentReduce IR drop of the power supplyA 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference8 of 32Problem Using NRZ DBI with
326、PAM-3 DQNRZ DQ+NRZ DBIPower consumption of DQ driver High:O,low:XPower consumption of additional DBI driver Signaling power reduction of 18.3%88.9%pin efficiency(8DQ/9Pin)PAM-3 DQ+NRZ DBIPower consumption of DQ driver High and middle:O,low:XPower consumption of additional DBI driver Signaling power
327、reduction of 5.6%88.9%pin efficiency(8DQ/9Pin)A 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference9 of 32Outline Motivation Proposed Scheme:Embedded pDBI
328、for PAM-3 Implementation Measurement Results Comparison Table and ConclusionsA 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference10 of 32 Data encoding pr
329、ocess with pDBI logicpDBI encoding:based on the#of“1”s in D7:2Proposed Scheme:Embedded pDBI for PAM-3 A 25.2-Gb/s/pin NRZ/PAM-3 Dual-Mode Transmitter with Embedded Partial DBI Achieving a 133%I/O Bandwidth/Pin Efficiency and 19.3%DBI Efficiency 2024 IEEE International Solid-State Circuits Conference
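The 18.3% signaling-power reduction and 88.9% pin efficiency quoted for NRZ DQ + NRZ DBI can be checked by enumerating all 256 byte values. This is a minimal sketch under assumed conditions, not the authors' model: every byte is equally likely, a bit driven high costs one unit of current and a low bit costs none (the "high O, low X" rule), and DBI-DC inverts the byte when more than four of its eight bits are 1, asserting an extra DBI pin that also costs one unit. The helper `pdbi_encode` (a hypothetical name) applies the same idea to D7:2 only; its threshold of "more than three 1s" is an assumption because the exact comparison is not legible on the slide, and it ignores the PAM-3 symbol mapping, so it does not reproduce the paper's PAM-3 results.

```python
def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def dbi_dc_encode(byte: int) -> tuple[int, int]:
    """NRZ DBI-DC over D7:0: invert all 8 bits when more than four of them are 1.

    Returns (encoded_byte, dbi_flag).
    """
    if popcount(byte) > 4:
        return byte ^ 0xFF, 1
    return byte, 0


PDBI_MASK = 0b1111_1100  # selects D7:2


def pdbi_encode(byte: int, threshold: int = 3) -> tuple[int, int]:
    """Hypothetical partial DBI: examine and invert only D7:2; D1:0 pass through.

    Assumption: invert when the count of 1s in D7:2 exceeds `threshold`.
    """
    if popcount(byte & PDBI_MASK) > threshold:
        return byte ^ PDBI_MASK, 1
    return byte, 0


def mean_high_bits(encode=None) -> float:
    """Average number of high bits per byte (data bits plus flag) over all 256 bytes."""
    total = 0
    for b in range(256):
        if encode is None:
            total += popcount(b)
        else:
            data, flag = encode(b)
            total += popcount(data) + flag
    return total / 256


baseline = mean_high_bits()               # 4.0 high bits per byte without DBI
with_dbi = mean_high_bits(dbi_dc_encode)  # 837 / 256 = 3.26953125
print(f"DBI-DC signaling reduction: {1 - with_dbi / baseline:.1%}")  # 18.3%
print(f"pin efficiency with a DBI pin: {8 / 9:.1%}")                 # 88.9%
print(pdbi_encode(0b1111_1001))  # (5, 1): five 1s in D7:2, so D7:2 are inverted
```

Under these assumptions the enumeration gives 837/256 ≈ 3.27 high bits per byte against 4.0 without DBI, which matches the 18.3% figure. The same count says nothing about the PAM-3 case, where the middle level also draws driver current.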
Proposed Scheme: Embedded pDBI for PAM-3
(1) pDBI in 6b/4UI: inversion of data in D7:2 reduces signaling current; expected signaling current applying pDBI in 6b/4UI
(2) pDBI position: expected signaling current according to pDBI position, with minimum signaling current at position A
[Figure: 9b/6UI-based pDBI encoding. D7:0 and a pDBI bit (1 or 0) are arranged in groups A, B, and C and sent on a PAM-3 DQ over 6 UI; partial DBI on D7:2 is decided by the number of 1s]
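The 150% and 133% figures follow from simple ratios. PAM-3 carries 3 bits in 2 UI (1.5 b/UI versus 1 b/UI for NRZ), but a power-of-two burst leaves 8 data bits to fit into 9-bit frames, so 8 bits occupy 6 UI. The sketch below packs D7:0 plus one extra bit into six PAM-3 symbols and, as an assumption about what "embedded" means here, puts the pDBI flag in the slot the dummy bit would otherwise occupy. Both the 3-bit-to-symbol-pair table in `bits3_to_pam3` and the A/B/C grouping in `pack_9b_6ui` (hypothetical names) are illustrative choices loosely following the bit grid on the slide, not the transmitter's actual encoder. Under this grouping, D7:2 fall in groups B and C, i.e., 6 bits over 4 UI.

```python
from fractions import Fraction


# Hypothetical 3b -> 2-symbol PAM-3 mapping (levels: 0 = low, 1 = middle, 2 = high).
# Eight 3-bit values use 8 of the 9 possible symbol pairs; (2, 2) stays unused.
def bits3_to_pam3(value: int) -> tuple[int, int]:
    if not 0 <= value < 8:
        raise ValueError("expected a 3-bit value")
    first, second = divmod(value, 3)
    return first, second


def pack_9b_6ui(byte: int, flag: int) -> list[int]:
    """Send D7:0 plus a pDBI flag (in the dummy-bit slot) as six PAM-3 symbols."""
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    d = [(byte >> i) & 1 for i in range(8)]  # d[i] holds Di
    # Assumed grouping: A = (flag, D1, D0), B = (D4, D3, D2), C = (D7, D6, D5).
    groups = [(flag, d[1], d[0]), (d[4], d[3], d[2]), (d[7], d[6], d[5])]
    symbols: list[int] = []
    for b2, b1, b0 in groups:
        symbols.extend(bits3_to_pam3((b2 << 2) | (b1 << 1) | b0))  # 3 bits -> 2 UI
    return symbols


nrz = Fraction(1, 1)          # NRZ: 1 bit per UI
theoretical = Fraction(3, 2)  # PAM-3: 3 bits per 2 UI
practical = Fraction(8, 6)    # 8 data bits in 6 UI; the 9th bit is the dummy/flag
print(f"theoretical: {float(theoretical / nrz):.0%}")  # theoretical: 150%
print(f"practical:   {float(practical / nrz):.0%}")    # practical:   133%
print(pack_9b_6ui(0b1010_0101, flag=1))                # [1, 2, 0, 1, 1, 2]
```

Keeping the mapping as a separate function makes it easy to swap in a real 3b/2UI table, which changes which symbols land on the middle and high levels and therefore the signaling current, without touching the 9b/6UI framing.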