ISSCC 2024 Forum 1: Efficient Chiplets and Die-to-Die Communications
February 18th, 2024. Presentations start at 8:15 AM.
2024 IEEE International Solid-State Circuits Conference

Organizing Committee
Organizers: Shidhartha (Sid) Das, AMD, Cambridge, United Kingdom; John Wuu, AMD, Fort Collins, Colorado
Co-Organizers: Yvain Thonnart, CEA-List, Grenoble, France; Hugh Mair, MediaTek, Austin, Texas
Champions: Fatih Hamzaoglu, Intel, Hillsboro, Oregon; Kostas Doris, NXP, Eindhoven, The Netherlands

General Information
8 talks; each 45-minute talk is followed by 5 minutes of Q&A (please state your name and affiliation during Q&A). There are 2 coffee breaks and one lunch break. A digital copy of all slides will be provided for the forum. Please switch your mobile devices to silent mode, and remember to complete the speaker evaluation forms.

Forum Agenda ("Why", "What", "How")
8:15 AM - Introduction - John Wuu, AMD
8:25 AM - Advanced CMOS and Packaging Technology for Multi-Chiplet and Trillion Transistor 3DIC System-in-Package by 2030 - Yujun Li, Geoffrey Yeap, TSMC
9:15 AM - The Packaging and Interconnect Requirements of the IC Industry's Chiplet-based Future - Sam Naffziger, AMD
10:05 AM - Break
10:20 AM - Do Chiplets Open the Space for Emerging Memory in the HPC System? - Sebastien Couet, Gouri Sankar Kar, IMEC
11:10 AM - In-memory Computing Chiplets for Future AI Accelerators - Echere Iroaga, EnCharge AI
12:00 PM - Lunch
1:20 PM - Efficient Domain-Specific Compute with Chiplets - Dejan Markovic, UCLA
2:10 PM - Innovations in Chiplet Interconnects, Protocols and the Path to Standardization - Lihong Cao, ASE US
3:00 PM - Break
3:15 PM - Photonics for Die-to-Die Interconnects: Links and Optical I/O Chiplets - Chen Sun, Ayar Labs
4:05 PM - Robust Circuit/Architecture Co-Design for Chiplet Integration - Wen-Chou Wu, MediaTek
4:55 PM - Closing Remarks - Sid Das, AMD
F1.1: Advanced CMOS and Packaging Technology for Multi-Chiplet and Trillion Transistor 3DIC System-in-Package by 2030
Dr. Yujun Li, Director, HPC Business Development; Dr. Geoffrey Yeap, Vice President, R&D
Taiwan Semiconductor Manufacturing Company Limited ("Unleash Innovation")

Outline
Forces driving chiplets and integration
Advanced CMOS technologies
Domain-specific CMOS chiplet optimization
Advanced packaging technologies
Specialty chiplets for platform solutions
Design enablement and ecosystem
The Perfect Storm
Insatiable market needs, process technology development, and packaging technology development: the three forces that drive higher levels of integration and the adoption of chiplets, converging on advanced packaging.

Generative AI Accelerates Computing Needs
Growing AI/HPC Computing Requirements
Heterogeneous compute, more computing cores, higher memory capacity, higher memory bandwidth, and more I/O bandwidth. Advanced process technology and advanced 3DIC packaging technology (SoC/SoIC plus HBMs on CoWoS) are the key enablers to achieve a trillion-transistor system-in-package.

Degrees of System-Level Integration
AMD MI300X GPU: 153 billion transistors with SoIC and up to 192 GB of HBM3 memory, in TSMC N5/N6 FinFET processes.
Cerebras WSE-2: 46,225 mm2 of silicon with wafer-scale integration, 2.6 trillion transistors, in TSMC N7 FinFET process.
Advanced CMOS Technology

Process Technology Evolution
Logic density has scaled from N28 and N22 through N16, N10, N7/N6, N5/N4, and N3E/N3P toward N2 (2010-2026).
Device architecture: planar with high-K metal gate; FinFET with enhanced strained-Si and high-density MIM; nanosheet with low-R MEOL/BEOL, self-aligned features, low-K spacer, FinFlex with 1-fin, high-mobility channel, and super-high-density MIM.
Lithography: single-patterning immersion, double patterning, self-aligned double-patterning immersion, then EUV.
Source: TSMC
Transistor Architecture Outlook
Device architecture progresses from FinFET to nanosheet to CFET, with 2D TMD (transition metal dichalcogenide) channels and carbon nanotubes (CNT) as beyond-silicon candidates, improving power, performance, and area (PPA) over time.
Source: Y.J. Mii, 2022 VLSI Symposium

Technology Innovation Drives Energy Efficiency
5nm vs. 7nm: 1.83X logic density, +13% speed, -21% energy.
3nm vs. 5nm: 1.57X logic density, +11% speed, -30% energy.
Source: Y.J. Mii, 2022 VLSI Symposium
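As a quick sanity check, the two node steps compound multiplicatively; a minimal sketch using only the figures quoted above:

```python
# Compound the per-node gains quoted above (7nm -> 5nm -> 3nm).
density_7_to_5, density_5_to_3 = 1.83, 1.57
speed_7_to_5, speed_5_to_3 = 1.13, 1.11
energy_7_to_5, energy_5_to_3 = 1 - 0.21, 1 - 0.30  # -21%, -30% energy

print(f"7nm -> 3nm logic density: {density_7_to_5 * density_5_to_3:.2f}X")  # ~2.87X
print(f"7nm -> 3nm speed:         {speed_7_to_5 * speed_5_to_3:.2f}X")      # ~1.25X
print(f"7nm -> 3nm energy:        {energy_7_to_5 * energy_5_to_3:.2f}X")    # ~0.55X
```

Two node generations together still roughly halve energy per operation while nearly tripling density, which is the scaling that the following slides show feeding directly into compute power efficiency.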
Technology Enables Energy-Efficient Compute
Both performance/Watt/mm2 and power efficiency improve steadily from N28 through N16, N10, N7, and N5 to N3.
Source: TSMC

Metal Pitch Scaling Continues But Is Slowing Down
Metal pitch now scales by only about 2x every 6 years.
Source: ASML 2021 Investor Day
Interconnect Technology Innovations Continue
Interconnect scaling from 1987 to 2024 has been sustained by material innovation (Cu + low-K, ELK, low-R barrier, Co cap layer/Co liner, metal-oxide ESL) and lithography innovation (immersion, double patterning, EUV).
Source: Y.J. Mii, 2019 TSMC Technology Symposium

SRAM Bit Cell Area Reduced by 100X
From 130nm in 1998 to 3nm in 2024, SRAM bit cell area has shrunk by two orders of magnitude. Technology levers: strained silicon, high-K metal gate, double patterning, M0 bit-line, FinFET, EUV, and high-mobility channel. Design levers: tall cell to wide cell, high-current/high-density cells, color-aware design and design assist, write assist for FinFET, novel dual-rail schemes, FLY bit-line, double word-line, metal-coupling negative bit-line, and compact periphery layout (FCST).
Source: TSMC
What Drives Chiplet Adoption?
Insatiable market needs: the rise of AI creates an insatiable need for more compute, memory, and IO, while workload-optimized architecture diversification pushes heterogeneous integration.
Process technology development: CMOS scaling continues with more DTCO contributions, but the pace of scaling differs between logic, SRAM, and analog/IO, and process complexity keeps increasing.
Packaging technology development: wafer-scale advanced packaging and 3DIC WoW and CoW provide a much more effective way to integrate compute, memory, and IO.
Domain-Specific Chiplet Optimization

From SoC to Multi-Chip SoIC
Historically, the process is optimized for the SoC to serve a broad audience; with chiplets, the process can be further optimized to achieve better PPA.
SoC: generations of success, but SRAM and analog/IO face scaling challenges.
Chiplet: compute die on node N for highest performance, analog/IO on N-1 or N-2 to optimize cost, MCM for low-cost interconnect.
SoIC: logic SoIC to increase performance; CPU, GPU, or SRAM can stack.
Future: optimizing logic across different technology nodes, logic stacking to increase performance, on-board memory to improve memory bandwidth, and interposers for higher connection bandwidth.
Chiplet Optimization: PPAC and Time-to-Market
Each chiplet can be independently optimized for adoption time, die size, and process node. Compute dies need the latest process technology for best PPA, while SRAM and analog/IO scaling are slowing down (FinFET vs. GAA). Defect density improves over time with learning (node N-2, N-1, N at product launch), and IP availability differs per node.
Chiplet Design and Process Optimization
An SoC needs to balance a common process window across all devices and layout styles (standard cell, SRAM, analog/IO). The process window for a chiplet can be optimized thanks to specialization and focus: a chiplet with process simplification (for example, an SRAM chiplet) can reduce cost and improve yield relative to the SoC common process window.
Chiplet Process Optimization by Application
HPC vs. mobile: dynamic vs. static power. Client CPU vs. server CPU: peak vs. throughput performance. CPU vs. GPU: metal stack optimization. Operating points range from client CPU overdrive, through moderate-to-low Vdd for networking and server, to extremely low Vdd for mining.
N3X delivers high performance where it matters: transistors optimized for high-performance overdrive conditions, with selective use of N3X standard cells to speed up critical paths while minimizing the leakage impact at product level.
Chiplet vs. Monolithic
By breaking up a large SoC, chiplets at smaller die sizes enjoy better yield and lower cost: relative die cost grows super-linearly with SoC die size, while chiplet total cost grows more slowly. The chiplet vs. monolithic product choice still needs to be carefully balanced; the crossover point depends on factors such as cost components, defect density, and harvesting yield. Chiplets bring lower D0 exposure and higher redundancy/harvest, while a monolithic SoC keeps package cost lower.
Advanced Packaging Technology

TSMC 3DFabric Technology Portfolio
Advanced packaging: CoWoS with a Si interposer (CoWoS-S) or an RDL interposer (CoWoS-L/R), and InFO (InFO-PoP, InFO-2.5D, InFO-3D). PoP: package-on-package; RDL: redistribution layer.
3D Si stacking (TSMC-SoIC, System on Integrated Chips): SoIC-P (bumped), CoW at 18-25 um pitch; SoIC-X (bumpless), SoIC-X-C (CoW) at 4.5-9 um pitch and SoIC-X-W (WoW) at 3 um pitch.
Source: TSMC

3D Si Stacking Schemes
SoIC-P (bumped), CoW at 18-25 um pitch: SoIC-P-RB (with RDL) and SoIC-P-F (without RDL).
SoIC-X (bumpless): SoIC-X-C (chip on wafer) at 4.5-9 um pitch, and SoIC-X-W (wafer on wafer) at 3 um pitch.
Source: TSMC
Cross-Chip Interconnects Improve Throughput
[Figure: bandwidth density per energy (Tbps/mm2 per pJ/bit) vs. D2D routing length (mm), improving from advanced packaging (2D/2.5D) at 40 um and 25 um bump pitches to chip stacking (3D) at 9 um and 6 um bond pitches.]
D2D Interconnect Comparison (2.5D vs. 3D)
Technology:                          CoWoS   SoIC-X (F2B/F2F)
Pad/uBump pitch (um):                40      9       6       4.5
Density:                             1.0X    20X     45X     80X
Max areal BW density (GB/s/mm2):     1.0X    45X     75X     180X
Interconnect energy (pJ/bit)*:       1.0X    0.10X   0.05X   0.05X
*The interconnect energy efficiency includes only the die-to-die interconnect at 4 Gbps; it does not include the energy consumption of the physical-layer circuits.
Source: TSMC
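The density row tracks the inverse square of the bond pitch; a minimal sketch verifying that relationship against the table above:

```python
# Areal connection density scales as 1/pitch^2; normalize to the 40 um CoWoS baseline.
pitches_um = [40, 9, 6, 4.5]
baseline = pitches_um[0]

for p in pitches_um:
    density_gain = (baseline / p) ** 2
    print(f"pitch {p:>4} um -> {density_gain:5.1f}X density")
# 40 um -> 1.0X, 9 um -> 19.8X (~20X), 6 um -> 44.4X (~45X), 4.5 um -> 79.0X (~80X)
```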
System Integration with TSMC 3DFabric
2D: InFO (chip A + chip B). 2.5D: CoWoS (SoC/chiplet with HBMs). 3D: micro-bump and SoIC chip stacks. InFO: Integrated Fan-Out; SoIC: System on Integrated Chips.
Source: M. Liu, "Unleashing the Future of Innovation," 2021 ISSCC

Higher Integration, Compact Electronic Systems
Function per footprint and system performance both improve when moving from package-to-package on board (2D), to die-to-die on interposer (2.5D), to die-on-die (3D and 3D+2.5D).
Source: D. Yu, "Foundry Solutions for 2.5D/3D Integration," ISSCC 2021
D2D Interface ESD Roadmap and Design Improvement
For D2D ESD requirements, the industry is moving toward enhanced package process control, reducing switching power and increasing D2D interface density. ESD capacitance (fF) and area (um2) shrink as targets step down from package-IO ESD (CDM 250V industry spec) to uBump-IO ESD (CDM 50V/30V) to SoIC-bond IO ESD (CDM 10V/5V) and beyond.
Ref: Roadmap of CDM targets for D2D interfaces, JEP196
Platform-Level Solutions: Specialty Technology Optimization

Package Is the New SoC
Specialty technology plus advanced packaging, some already in use and some under development, enables 2.5D + 3D integration and holistic system-level optimization:
Specialty for interconnect speed: HBM with logic.
Specialty for memory bandwidth: SRAM on logic (memory + logic), SHD MiM, pad metal routing, DTC in the interposer, capacitor over active logic.
Specialty for power delivery and die partition: 3D logic-on-logic integration, optical engine on substrate (PIC + logic), VR integration (logic + VR), and DRAM on logic (memory + logic).

Memory Bandwidth: A Limiter to System Throughput
From 2006 to 2020, normalized logic throughput grew about 1.81x every 2 years, while normalized memory bandwidth grew only about 1.56x every 2 years, an ever-widening gap.
Source: H.-S. P. Wong et al., DAC, 2020
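A minimal sketch of how quickly those two growth rates diverge, using the slide's 1.81x and 1.56x per-2-year figures as labeled above:

```python
# Project the logic-throughput vs. memory-bandwidth gap from the quoted growth rates.
logic_per_2yr, mem_per_2yr = 1.81, 1.56

for years in (2, 6, 10, 14):
    logic = logic_per_2yr ** (years / 2)
    mem = mem_per_2yr ** (years / 2)
    print(f"after {years:2d} years: logic {logic:6.1f}x, memory {mem:5.1f}x, gap {logic / mem:4.1f}x")
# Over the 14-year span of the plot the gap compounds to ~(1.81/1.56)^7 ~ 2.8x.
```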
Memory Bandwidth Improvement by 3D Stacking
DRAM bandwidth improves from GDDR6X to HBM3E to 3D-stacked DRAM.
Source: SK Hynix, TSMC

Switch Bandwidth Doubles Every 2 Years
Datacenter switch ASIC bandwidth reached 51.2T in 2022 (512x100G in 5nm), driven largely by faster lanes.
Source: Broadcom press releases; "Co-packaged datacenter optics: Opportunities and challenges," Cyriel Minkenberg et al.; TSMC
Increasing Power Contribution from SerDes
The relative power contribution of SerDes to total switch ASIC power keeps growing, motivating co-packaged optics (CPO, silicon photonics) switches.
Source: "Co-packaged datacenter optics: Opportunities and challenges," Cyriel Minkenberg et al.

Co-Packaged Optics
Relative transmission power: Cu interconnect (10 m reach) 1X; fiber (km reach) about 1/3X; CPO (km reach) about 1/5X.
Source: TSMC
Co-Packaged Optics: Benefit of 3D
Comparing side-by-side OE on MCM (uBump OE) against CPO with COUPE (Compact Universal Photonic Engine; EIC on PIC in CoWoS-S):
OE-ASIC link length (mm): 5 vs. 1
Line width/spacing (um): 22/44 vs. 0.4/0.4
Routing density: 1.0X vs. 80X
BW density: 1.0X vs. 37.6X
System energy consumption: 1.0X vs. 0.19X
Source: Douglas Yu et al., 2021 IEDM invited paper
Gen 1 vs. Gen 2: Data Center Power Architecture
The old design distributes 480V AC 3-phase through transformers and cables; the new design moves the conversion stages closer to the load.
Source: "Next Generation of Power Supplies," Fred C. Lee, Virginia Tech, https://cpes.vt.edu/library/download/31672

Power Delivery Network for HPC/Data Centers
Conventional: 208V AC, grid to 12V AC/DC, then a 12V-to-1V multi-phase DC/DC feeding the CPU/GPU at 0.7V/1V.
New generation: grid to 48V AC/DC, a 48V-to-12V/6V DC/DC module, a 12V/6V-to-1.8V multi-phase DC/DC, and finally a 1.8V-to-0.7/1V multi-phase IVR at the load.
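A minimal sketch of why the 48V distribution plus on-package IVR helps, assuming a 1 kW accelerator load (the power figure is illustrative, not from the slide): distribution current, and hence I^2*R loss, drops with the square of the bus voltage.

```python
# Compare distribution current and relative I^2*R loss for the same delivered power.
load_power_w = 1000.0   # assumed accelerator load, for illustration only
core_voltage = 0.8      # ~0.7-1V core rail

for bus_v in (12.0, 48.0):
    bus_current = load_power_w / bus_v
    # I^2*R loss in the distribution path, relative to the 12V case (same R)
    rel_loss = (bus_current / (load_power_w / 12.0)) ** 2
    print(f"{bus_v:4.0f}V bus: {bus_current:6.1f} A, relative wiring loss {rel_loss:.3f}")

print(f"core rail current at {core_voltage}V: {load_power_w / core_voltage:.0f} A "
      "(why the last conversion stage must sit next to, or inside, the package)")
```

The 48V bus carries a quarter of the current and 1/16th of the wiring loss, while the kiloamp-scale core current makes the last-stage VR chiplet described next essential.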
Last-Stage VR Chiplet: Modular VR Chiplets Next to the Main Processor, Going Vertical
A PMIC chiplet in an LBGA power module (power IC plus inductor) is cost-effective and flexible; an integrated VR (IVR) chiplet inside CoWoS reduces the input current at the ball.

Technology-Optimized Chiplets
Chiplets with advanced technology (CPU, GPU, NPU, xPU, IOD, SRAM) and chiplets with specialty technology (memory, interconnect/SerDes, silicon photonics, capacitor, IVR, and GaN chiplets) combine through advanced packaging (CoWoS or equivalent, InFO or equivalent, SoIC CoW or WoW) into a 3DIC system-in-package solution.
Design Enablement from 2D to 2.5D/3D

Increasing Design Complexity: 2D to 2.5D/3D
A tightly coupled package/chip co-design flow is required for faster design convergence, and it demands strong EDA/IP ecosystem collaboration. Beyond the traditional 2D SoC/package design flow, 2.5D/3D adds serial IO + ESD, parallel IO + ESD, die-to-die SI/PI and STA/DFT, PDN and decap co-design and sign-off, thermal-aware design, and KGD/KGS/KGP (known-good die/stack/package).
Source: Y.J. Mii, 2022 VLSI Symposium
Integrated Chip/Package Design Co-Optimization
The flow runs from system partitioning through per-chip and package design/verification to system integration verification. Modularization simplifies the design flow for all package types: APR, DRC/LVS/RCX, SI/PI/IR/EM, DFT, multi-die timing analysis, thermal, and the die-to-die interface. Hierarchical timing analysis mitigates the exponential growth of multi-die process corners; hierarchical thermal analysis balances runtime and accuracy.
Source: Y.J. Mii, 2022 VLSI Symposium
The 3Dblox Standard
3Dblox is a modularized language with constructs for chiplets, interfaces, and connections, designed to model all current and future 3DIC structures (die, RDL, and bridge interfaces) and to streamline the EDA design flow while promoting interoperability. Physical constructs (Chiplet:) list the physical chiplets; connection constructs (Conn:, Path:) capture chiplet-to-chiplet connections and path assertions, together forming full-stack physical and connection representations.
https://3dblox.org/newscenter
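To make the construct split concrete, here is a minimal Python data-model sketch of the physical/connection separation described above; this is an illustration of the concept only, not actual 3Dblox syntax.

```python
from dataclasses import dataclass, field

@dataclass
class Chiplet:
    """Physical construct: one die in the stack."""
    name: str
    interface: str  # e.g. a die, RDL, or bridge interface

@dataclass
class Conn:
    """Connection construct: a chiplet-to-chiplet link plus path assertions."""
    top: Chiplet
    bottom: Chiplet
    path_assertions: list[str] = field(default_factory=list)

# Full-stack representation: one physical list plus one connection list, the
# single source consumed by downstream PDN / thermal / STA / DRC-LVS flows.
chiplet1 = Chiplet("Chiplet1", interface="die")
chiplet2 = Chiplet("Chiplet2", interface="die")
stack = [Conn(chiplet2, chiplet1, ["Path1: bond_pitch <= 9um"])]
```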
One Format, Multiple Products
One 3Dblox representation feeds all downstream analyses (3D PDN, 3D thermal with thermal TF, 3D STA, 3D DRC/LVS), replacing hundreds of repetitive code fragments per tool. Support from all EDA vendors creates a unified design ecosystem, and 3Dblox is open to all.
System Integration Is the Future
More transistors (2D shrink, DTCO, 3D transistors) and more memory (emerging memory, stacked SRAM) come together through 2.5D InFO, 2.5D CoWoS, and 3D SoIC into system-level integration, where CV2 power, thermal management, and end-to-end optimization are the central challenges.
Source: Y.J. Mii, 2022 VLSI Symposium
Summary
Advanced CMOS process and packaging technologies for logic and memory integration are already deployed today to serve growing datacenter and AI market demand. Emerging heterogeneous integration solutions, such as memory/logic chip stacking, silicon photonics co-packaged optics (CPO), and integrated voltage regulators (IVRs), can enable a trillion-transistor 3DIC system-in-package by 2030. To achieve faster design convergence, the industry needs a tightly coupled package/chip co-design flow; 3Dblox is receiving growing support from EDA vendors.

F1.2: The Packaging and Interconnect Requirements of the IC Industry's Chiplet-based Future
Samuel Naffziger, AMD

Outline
Fundamental drivers of modular chiplet designs
Chiplet interconnect classes and metrics
Examples from the AMD chiplet product portfolio: organic-package-based chiplets, advanced-packaging chiplet architecture, 3D stacked chiplets
Key levers for the future

Key Challenge: Economics of Si Scaling
Normalized cost per yielded mm2 fell node over node (1.00, 0.74, 0.56, 0.43, 0.33 from 45nm through 14/16nm) but has flattened around 0.31 at 7nm and 5nm, while cost per mm2 keeps increasing and SoC scaling flattens.
Modular Chiplet Architectures and DSA Are Essential
The slowing of Moore's Law scaling trends and the efficiency advantage of domain-specific accelerators over general-purpose CPUs/GPUs across the application space make chiplets essential.

Basics of Chiplet Economics

Monolithic Die Manufacturing
On a wafer, defects (marked X) kill entire large dies, leaving few yielded processors.

High-Level Chiplets Concept
The same wafer populated with smaller chiplet dies loses only the chiplets that land on defects, yielding more processors.

Chiplet Cost
Silicon cost is non-linear with die area: two dies of area X/2 yield better than one die of area X.

Chiplet Overheads
Inter-chiplet communication interfaces, per-die functionality duplication, and architectural design effort for partitioning.
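A minimal sketch of the yield math behind these slides, using a standard Poisson defect-yield model with assumed defect density and die sizes (the numbers are illustrative, not AMD's):

```python
import math

def die_yield(area_mm2: float, d0_per_mm2: float) -> float:
    """Poisson yield model: probability a die of the given area has no killer defect."""
    return math.exp(-d0_per_mm2 * area_mm2)

D0 = 0.002            # assumed defect density, defects/mm2 (illustrative)
monolithic_mm2 = 600  # assumed large SoC
chiplet_mm2 = 150     # assumed quarter-size chiplet
overhead = 1.10       # ~10% area overhead for D2D interfaces and duplicated logic

y_mono = die_yield(monolithic_mm2, D0)
y_chip = die_yield(chiplet_mm2 * overhead, D0)

# Silicon cost per good unit ~ area / yield. Chiplets are tested before assembly
# (known-good die), so each of the four chiplets yields independently.
cost_mono = monolithic_mm2 / y_mono
cost_chip = 4 * chiplet_mm2 * overhead / y_chip

print(f"monolithic yield {y_mono:.1%}, chiplet yield {y_chip:.1%}")
print(f"relative silicon cost: chiplet/monolithic = {cost_chip / cost_mono:.2f}")  # ~0.46
```

Even after paying ~10% interface overhead, the split design roughly halves silicon cost at this defect density, which is the non-linearity the X-vs-X/2 slides illustrate.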
Chiplet Interconnect Classes
Advanced packaging interconnects: short reach (about 2mm or less), requiring closely matched placements; wide, lower-speed links; low energy (about 0.6pJ/bit).
Organic package interconnects: enable chiplet spacings up to 25mm with the most flexible placements and die sizes; narrow, high-speed serial links at higher link energy.
The architectural need for bandwidth, the die partition options, and the package technology create a multi-disciplinary optimization: chiplet package architecture selection requires balancing a complex equation.

Improving Key Parameters
Moving from 2D MCM through 2.5D Si interposer/EFB to 3D chiplets raises both linear interconnect density (wires/mm/layer) and area interconnect density (wires/mm2), driving high-performance computing toward the highest performance, lowest power, and smallest area.

Organic Package-Based Chiplets

EPYC Server CPU Example
A 7nm CCD plus a 12nm IOD delivers a great cost benefit vs. a hypothetical monolithic 7nm die: two tape-outs cover the full product stack, die cost scales linearly with core count (16 to 64 cores), 64 cores become possible at all, and every SKU keeps full memory and IO.
(CCD: CPU Complex Die; IOD: I/O Die.)

Leveraging Technology Across Markets
The same CCD scales from server (eight CCDs around a server IOD with DDR and I/O) down to desktop (up to a 16-core part with a client IOD), with direct IOD IP leverage and CCD reuse.

Chiplet Benefits for AMD Ryzen Processors
Normalized die cost for the chiplet design (CCD + cIOD, where cIOD is the client I/O die) is well below a hypothetical monolithic 7nm die at both 8 and 16 cores.

Chiplet Modularity
The same CCDs pair with the server IOD across 2nd- and 3rd-generation EPYC, and with the cIOD and X570 chipset on the client side.

Advanced Packaging Chiplets

Chiplet Technology: Applied to GPUs
An EPYC CPU server routes hundreds of signals between chiplets; a traditionally monolithic "Navi21" GPU moves tens of thousands internally. Chiplets enabled the use of advanced nodes where they benefit CPU performance and mature nodes for IO and interfaces, and high-speed organic package links meet CPU bandwidth requirements; but GPU shader engines require massive amounts of connectivity compared to CPUs, so a different approach is required.
Chiplet Technology: A Better Way to Partition
Comparing "Navi21" to "Navi31": the graphics engine is what benefits from advanced N5 technology; the AMD Infinity Cache is critical to performance but barely shrinks in N5, and the GDDR6 interfaces are also large and won't shrink at all. Splitting those poorly scaling components off as chiplets (MCD: Memory Cache Die) and shrinking the graphics core (GCD: Graphics Compute Die) into N5 gives full N5 performance with better yield, perf/$, and configurability.

How to Connect the Chiplets?
GCD-MCD partitioning is great, but the bandwidth requirements are still extremely high: the MCD-required bandwidth is over 10X what a CPU CCD requires in EPYC. A breakthrough is required: advanced packaging and a new interface, High Performance Fanout with Ultra Short Reach (USR) links.
USR GCD-MCD Connectivity: Bandwidth Density
USR links operating at 9.2Gb/s over High Performance Fanout die-to-die routing provide almost 10X the bandwidth density of the IFOP organic-package links used in Ryzen and EPYC, enabling industry-leading peak bandwidth of 5.3TB/s. (See endnote RX-817.)

Interconnect
About 25 wires fit on an organic substrate in the space where 50 wires fit on High Performance Fanout (images approximately to scale).

USR Link Power Efficiency
USR links are engineered for low-voltage operation and aggressive clock gating for low power, saving up to 80% energy per bit relative to organic package links. The result: 3.5TB/s of effective bandwidth for less than 5% of GPU power consumption.
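A minimal sketch of that power claim, working backwards from the slide's numbers with an assumed organic-link energy of about 1.5pJ/bit (illustrative; the slide only gives the 80% relative saving):

```python
# Interconnect power = bandwidth (bits/s) * energy per bit. Assumptions marked below.
effective_bw_tbs = 3.5                  # TB/s, from the slide
bits_per_s = effective_bw_tbs * 1e12 * 8

organic_pj_per_bit = 1.5                # assumed organic-package link energy
usr_pj_per_bit = organic_pj_per_bit * (1 - 0.80)  # "up to 80% energy saving"

for name, pj in (("organic", organic_pj_per_bit), ("USR", usr_pj_per_bit)):
    watts = bits_per_s * pj * 1e-12
    print(f"{name:7s}: {pj:.2f} pJ/bit -> {watts:5.1f} W at 3.5 TB/s")
# At ~0.3 pJ/bit the USR fabric burns ~8.4 W, i.e. under 5% of a ~300 W GPU.
```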
USR Link Latency
The USR chiplet interface costs a modest amount of latency vs. on-die wiring ("Navi31" at "Navi21" frequency shows this in both Infinity Cache and DRAM access latency). "Navi31" eliminates this latency with higher clock rates: the base Infinity Fabric clock is up 43% and the GFx game clock up 18%, so the common case of an Infinity Cache hit has 10% lower latency on "Navi31".

GPU Chiplets: Summary
Chiplet architecture with advanced packaging is the future; AMD leveraged its leadership chiplet expertise to deliver the first chiplet-based gaming GPU, "Navi31", with up to 54% higher performance per Watt. Massive 5.3TB/s bandwidth comes from innovative USR links on High Performance Fanout, and negligible latency and power overheads enable leadership performance/Watt. (See endnote RX-817.)
3D Stacked Chiplets

AMD CDNA 3: Next-Gen AI Accelerator Architecture
Dedicated accelerator engines for AI and HPC, 3.5D packaging with 4th-Gen AMD Infinity architecture, optimized for performance and power efficiency.

Bandwidth Drives AI Performance
Server and gaming workloads require high bandwidth, but leading-edge AI is at another level: from 665GB/s of gaming memory bandwidth and 2.4TB/s of gaming cache-read bandwidth to multi-TB/s AI memory and cache-read bandwidth (AMD internal analysis). Power-efficient delivery of these bandwidth requirements demands new approaches: for the same bandwidth, power rises steeply from 3D interconnect through UCIe-AP and UCIe-SP to off-package links.
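A minimal sketch of the bandwidth-vs-power trade implied here, with assumed per-class energy figures (illustrative values in commonly cited ranges, not AMD's internal numbers):

```python
# Power to sustain an AI-class 3 TB/s memory stream across interconnect classes.
bw_bytes_per_s = 3e12
bits_per_s = bw_bytes_per_s * 8

# Assumed energy/bit per interconnect class (illustrative orders of magnitude).
classes_pj_per_bit = {
    "off-package SerDes": 5.0,
    "UCIe standard package": 1.0,
    "UCIe advanced package": 0.5,
    "3D hybrid bond": 0.05,
}

for name, pj in classes_pj_per_bit.items():
    print(f"{name:22s}: {bits_per_s * pj * 1e-12:6.1f} W for 3 TB/s")
# The same stream costs ~120 W off-package but only ~1 W over 3D stacking.
```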
3.5D Packaging Motivation
Relative bits/Joule improves by orders of magnitude from off-package copper and optical, through on-package and advanced packaging, to 3D stacking. The key to power-efficient performance is tight integration: advanced 3D hybrid bonding provides by far the densest, most power-efficient chiplet interconnect, while advanced 2.5D enables more compute and HBM in a package for increased system-level efficiency.

3D Hybrid Bonding Evolved
AMD 3D V-Cache technology: hybrid bonding size 7 x 10 mm; logic die as base, N7 cache die (X3D) on an N5 base (CCD) die; significant performance gains for desktop gaming and servers; up to 2.5TB/s vertical bandwidth.
AMD Instinct MI300 accelerator: leverages the integration and manufacturing learnings from V-Cache; hybrid bonding size 13 x 29 mm (0.45x reticle); logic die on top enables improved thermals; N5 XCD/CCD stacked on an N6 base die (IOD); same 9 um TSV pitch; up to 17TB/s vertical bandwidth.

MI300 Advanced Packaging
An advanced 3D hybrid-bonded architecture for compute density and perf/W (XCDs or CCDs plus dummy silicon and carrier silicon on IOD base dies, connected through BPVs/BPM), combined with an advanced 2.5D architecture for IOD-IOD and HBM3 integration on a silicon interposer: a large module on substrate with LGA pads, lid, and BSM + TIM (illustration purposes only).
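A minimal sketch relating the two hybrid-bond generations above: at a fixed TSV/bond pitch, vertical bandwidth scales roughly with bonded area (this simple proportionality is an assumption; not every bond is a signal connection):

```python
# Vertical BW ~ bonded area / pitch^2 at a fixed per-connection rate.
vcache_area_mm2 = 7 * 10    # 70 mm2, up to 2.5 TB/s
mi300_area_mm2 = 13 * 29    # 377 mm2, up to 17 TB/s (same 9 um TSV pitch)

area_ratio = mi300_area_mm2 / vcache_area_mm2
print(f"area ratio: {area_ratio:.1f}x -> {2.5 * area_ratio:.1f} TB/s from area alone")
# ~13.5 TB/s from area scaling; the quoted 17 TB/s implies roughly 25% more
# per-connection bandwidth on MI300 than a pure area scaling of V-Cache.
```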
Chiplet Reuse and Modularity Benefits Exemplified
The same CCD serves Genoa and MI300A. The CCD is adapted to work for 4th-Gen EPYC CPUs and the AMD Instinct MI300A 3D stack: the EPYC MCM uses the "GMI" SerDes interface through the package substrate, while the MI300A vertical stack uses a dense TSV interface from the IOD to the CCD in two-link-wide mode. The dramatically higher 3D signal density enabled virtually no die size increase, with simple interface multiplexing.

AMD Instinct MI300 Accelerator: Modular Construction
The multi-variant (APU/XPU) architecture requires all chiplets (IODs, XCDs, and CCDs, in normal, mirrored, and R180 orientations) to act as if they are LEGO blocks. Many new construction and analysis tools had to be developed to enable this capability; mirrored versions of the IODs enable symmetric construction.

Connecting Chiplets in 3.5D: Mirrored Heterogeneous Chiplet Interfaces
BPV: Bond Pad Via, the landing site on the stacked die that is aligned with a TSV in the IOD. The IOD supports two separate landing sites for CCD BPVs to enable IOD mirroring, since CCDs can only be rotated (not mirrored); similarly, the XCD/IOD interface also has extra TSVs to support IOD mirroring. (The XCD and CCD BPV fields and the TSV fields for CCDs and XCDs are highlighted in the floorplan.)

AMD Instinct MI300 Accelerator: Floorplan Power TSVs
BPVs, TSVs, and uBumps connect the hybrid-bonded stacked chiplet (SC) to the base chiplet (AID). Power delivery to the top die must support IOD mirroring, XCD/CCD rotation (0 and 180 degrees), and different stacked dies (CCD and XCD). This placed new symmetry requirements on the power grid, and significant advance planning was needed to ensure exact alignment of all power and ground TSVs and BPVs.
AMD Instinct MI300 Accelerator: Power Management and Heat Extraction
Key to MI300 power efficiency is the ability to dynamically "slosh" power between the fabric (IOD/AID), GPU (XCDs), and CPU (CCDs) across CPU-intensive, GPU-intensive, CPU+GPU-balanced, and memory-intensive operating points. Massive HBM and Infinity Cache bandwidth can drive high data-movement power in the SoC domain, and compute capability can similarly consume high power. This creates two types of extreme operating conditions, GPU-intensive and memory-intensive, and both thermal and power delivery must support the full range through careful engineering of the TSVs and the power map.
Summary
Technology trends are pushing the industry to heterogeneous, domain-specific computing, and economics require small die and chiplet architectures. Advanced packaging is the frontier for technology-architecture synergy, enabling cost-effective, efficient designs; standardized interfaces and a chiplet ecosystem will unlock the potential. The road ahead is paved with chiplets.

Endnotes
RX-817: Based on AMD internal analysis, November 2022, comparing the published chiplet interconnect speeds of Radeon RX 7900 Series GPUs to the Intel Ponte Vecchio GPU and Apple M1 Ultra.

Disclaimer
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED "AS IS" WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
AMD, the AMD Arrow logo, CDNA and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. 2024 Advanced Micro Devices, Inc. All rights reserved.

F1.3: Do Chiplets Open the Space for Emerging Memory in the HPC System?
Sebastien Couet, Gouri Sankar Kar, imec, Leuven, Belgium
Outline
Introduction: compute needs and bottlenecks; the chiplet approach; interconnect pitch scaling and its advantages; the chiplet revolution in High Performance Compute (HPC)
3D system integration technology: opportunity for emerging memory; comparative analysis of 3D interconnect requirements; 3D interconnect technology in production and the imec roadmap
Memory roadmaps and alternatives: magnetic memory; BEOL-compatible capacitorless (2T0C) eDRAM; DRAM roadmap and possibilities; CXL memory; ferroelectrics; Ovonic Threshold Switch (OTS) memory
Summary

Compute Needs for Machine Learning Continue to Grow
Diversity of Applications and Workloads
AR/VR: low power, ultra-low latency, high memory bandwidth, small form factor.
Autonomous driving: multi-sensor fusion, distributed real-time computation, reliable and explainable AI.
GPUs for training: high-throughput parallel compute, very high memory bandwidth, very high GPU-GPU bandwidth.

Compute Capability Is Improving Faster than Memory Interconnect Bandwidth
CPU/GPU peak performance grows about 3.1x every 2 years, while interconnect bandwidth grows only about 1.4x every 2 years.
Source: Amir Gholami et al., "AI and Memory Wall"

2D-SoC Chiplet Approach
A 2D SoC with memory, core, and communication blocks is chiplet-partitioned, with PHY blocks providing the die-to-die communication.

2D-SoC to 3D-SoC
3D-SoC adds multi-tier memory and logic/SRAM stacking on top of the 2D partitioning.

Interconnect Pitch Scaling and Advantages
Chiplet Revolution in HPC
Chiplets are a new way of integrating and delivering High Performance Compute, and they open new possibilities for many technologies: new memory technologies, 3D integration, optical interconnect, and more. The idea behind chiplets is to break the system-on-chip apart into its composite functional blocks, or parts.

3D System Integration Technology
Driven by the "memory wall": memory-logic partitioning needs high-bandwidth, low-energy 3D interconnects (area-array and lateral), and different technologies need different 3D integration densities.
Opportunity for Embedded Emerging Memory
Applications: gaming, for example, with rapidly changing background textures and shapes; modern graphics use ray tracing and image enhancement.
Architecture: 64 MB of V-Cache in the Ryzen 9 7950X and 128 MB in the Ryzen 9 7950X3D; cores under V-Cache run at slightly lower clock speeds, trading off clock for high capacity.
Memory options: magnetic STT/SOT is a good option for infrequently accessed memory (e.g., storing weights); IGZO 2T0C eDRAM is a good option for frequently accessed memory (activation memory); 1T1C ferroelectric capacitor memory depends on endurance for embedded application.
How will V-Cache-style stacking enable emerging embedded memory? (1) Logic compatibility. (2) Cost: no need for expensive logic in the memory die, and expensive metal levels can be avoided. (3) It removes logic-critical thermal budget and process limitations.
Comparative Analysis: 3D Interconnect Requirements (Design Technology Metrics)
Mobile: optimal memory tier L2 + LLC/SLC; 15-20K 3D interconnect pins for the sub-system partitioning scheme; 3D pitches of 15-10 um required; fan-out impact minimal due to the 2D length replaced (mm).
Server: optimal memory tier L2 + LLC; 40-50K pins; 3D pitches of 12-8 um; fan-out impact minimal due to the 2D length replaced (mm).
Graphics/gaming: optimal memory tier L1 (data) + SMEM + L2; 750-800K pins; 3D pitches of 4-2 um; fan-out impact minimal due to the 3D pitch.
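A minimal sketch of why those pin counts force the quoted pitches: the die-area budget for an area-array 3D interface is roughly pins times pitch squared (the proportionality ignores keep-out and routing overhead):

```python
# Area consumed by an area-array 3D interface ~ pins * pitch^2.
cases = [
    ("mobile",    20_000, 10e-3),  # 20K pins at 10 um pitch (pitch in mm)
    ("server",    50_000,  8e-3),  # 50K pins at 8 um
    ("graphics", 800_000,  2e-3),  # 800K pins at 2 um
]

for name, pins, pitch_mm in cases:
    area_mm2 = pins * pitch_mm ** 2
    print(f"{name:8s}: {pins:>7,} pins at {pitch_mm * 1e3:>4.0f} um -> {area_mm2:5.1f} mm2")
# Graphics-class counts (~800K pins) stay near ~3 mm2 only because the pitch
# drops to ~2 um; at a 10 um pitch the same interface would need ~80 mm2.
```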
3D Interconnect Technology in Production and the imec Roadmap
Includes data from H.-S. P. Wong et al., "A Density Metric for Semiconductor Technology," Proc. IEEE, vol. 108, no. 4, 2020.

Memory Roadmaps and Alternatives

Memory and Storage Roadmaps
Effective bitcell area (um2) projected from 2022 to 2034: SRAM continues classical 2D scaling through nanosheets toward CFET; DRAM continues alongside alternative 2D or 3D DRAM; 3D NAND scales from 2 to 4 tiers toward 1000 layers, with a separate periphery wafer and multiple array wafers.
Magnetic Solutions for LLC Applications

Magnetic Memory Options
Spin Transfer Torque (STT): MTJ optimized for current-based operation, low R; for high-density on/off-chip e-NVM. Performance: OK for last-level cache. Cost/area: 50% gain over HD SRAM. Power: big gain at system level with use of non-volatility. Reliability: difficult for high-endurance specs (failing tail bits), though the MgO barrier can be tuned for reliability. Maturity: ready.
Spin Orbit Torque (SOT): read and write paths separated. Performance: OK for all cache levels, GHz capability. Cost/area: like HD SRAM. Power: marginal gain over HP SRAM. Reliability: good, since read and write paths are separated. Maturity: research.
Voltage Control Magnetic Anisotropy (VCMA): MTJ optimized for voltage-based operation, higher R. Cost/area: 50% gain over HD SRAM. Power: voltage-controlled, lowest power. Reliability: probably OK (low-current device). Maturity: exploratory; specs to be defined.
Voltage-Gated Spin Orbit Torque (VGSOT): single transistor in the write path, cell selectivity through the voltage gate. Performance: OK for all cache levels, GHz capability. Cost/area: 60% gain over HD SRAM. Power: big gain over HP/HD SRAM. Reliability: good, since read and write paths are separated. Maturity: research.
STT-MRAM: Current Status
In production (N28 to N14): embedded flash replacement in MCUs (driven by the eflash scaling wall), at e-flash spec; next: automotive. In R&D / on the roadmap (N14 to N7/5): large non-volatile embedded cache for Edge-AI, with cache-like specs, high endurance (1e12), and 5-10ns latency; and large-capacity memory cache, possibly as a chiplet. What could the benefit be?

STT-MRAM: Potential Value Proposition as a Chiplet
Typically a 3X density gain at the bitcell level vs. SRAM, independent of node, reduced to about a 2X gain due to the larger control periphery (S. Sakhare, imec, IEDM 2018). The write-power crossover vs. SRAM is around 5MB capacity, and read latency improves significantly thanks to the word-line length reduction.
What Comes Next After STT? SOT-MRAM
Key features: a 3-terminal device; write by spin current, read by TMR readout. Separated read and write paths give better endurance (1e15). Sub-nanosecond switching makes it a high-speed NVM cache-like memory, and it is BEOL compatible.
Challenges: density (two transistors per bit, comparable to SRAM cell size), high switching current, and field-free switching (magnetic hard mask).
References: K. Garello et al., VLSI, 81-82 (2018); S. Couet et al., VLSI (2021); M. Gupta et al., IEDM, 24.5.1-24.5.4 (2020); K. Garello et al., VLSI Circuits, T194-T195 (2019).

VG-SOT Concept
A voltage-gated SOT device, multipillar with individual pillar selection: moving from 2T1R to (n+2)TnR means fewer transistors per bit, so a smaller cell size is possible (multi-pillar schematic and integration).
Selectivity Demonstration: WER
Statistical measurements of the individual Psw of two bits and the joint Psw of two bits show a working window with low switching current (K. Cai et al., 2022 VLSI).

Density Scaling: SOT Footprint
At the A14 node (CPP = 42 nm, MP = 18 nm), SOT-1S1MTJ reduces footprint by 50% w.r.t. 6-track SRAM, on par with A5 SRAM. SOT-1S1MTJ reaches iso-bit density with VGSOT-2MTJ at 4 tracks/bit, without invoking VCMA. MRAMs sit on the imec SRAM scaling roadmap.

What Comes After STT?
The present status of the best-in-class SOT device (Kaiming Cai, imec, IEDM 2022): many challenges still need to be addressed, including cell size, energy efficiency, and field-free switching.

BEOL-Compatible eDRAM
IGZO-Based DRAM Enables Long Retention with Low Storage Capacitance
Retention loss in DRAM cells is mainly driven by the IOFF of the access transistor. Thanks to the extremely low IOFF of IGZO TFTs, a retention time tret > 10s is possible even with CS = 1 aF, so CS can be as low as the Cox of scaled transistors; a 2T0C configuration is therefore proposed, replacing the 2T1C configuration (Belmonte et al., IEDM 2020).
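A minimal sketch of the retention arithmetic behind that claim, tret ~ CS x dV / IOFF; the tolerable storage-node droop dV below is an assumption for illustration:

```python
# Retention time of a capacitorless cell: t_ret ~ C_S * dV / I_OFF.
dV = 0.1        # assumed tolerable storage-node droop (V), illustrative
i_off = 3e-21   # A; the IGZO off-current scale quoted on the measurement slide

for c_s in (1e-18, 0.5e-15):  # 1 aF floor from the slide; ~0.5 fF assumed Cox-scale node
    t_ret = c_s * dV / i_off
    print(f"C_S = {c_s:.1e} F -> t_ret ~ {t_ret:,.0f} s ({t_ret / 3600:.2f} h)")
# Even the 1 aF floor retains for ~33 s; a Cox-scale node (~fF) reaches hours,
# consistent with the >4.5 h retention reported for the measured devices.
```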
2T0C Configuration: Layout and TEM
Wtr = write transistor, Rtr = read transistor. The cell stacks IGZO TFTs (Al2O3/SiO2/IGZO/Al2O3 source-gate-drain stacks) above the Si substrate: Wtr on WWL/WBL charges the storage node (SN), and Rtr on RWL/RBL senses it via its own Cox.
Key features: capacitorless 2T gain cell; a complete BEOL solution; long retention and thus low refresh power.
Challenges: a disruptive technology, the hydrogen sensitivity of IGZO, and reliability.

Electrical Characteristics of RIE-Patterned Devices
At LG = 25nm, a full wafer of 138 dies shows 100% yield, ION around 10 uA/um, and uniformity across the wafer for both single transistors and 2T0C cells. Retention beyond 4.5 hours, corresponding to IOFF around 3e-21 A/um, is achieved (A. Belmonte et al., VLSI 2023).
2T0C Endurance and Write Speed
More than 1e11 write cycles are demonstrated at short write times, with simplified integration and scaling, low operating current, low power, and fast operation.

Ferroelectric Memory (FeRAM)
FeRAM is a more than 20-year-old technology and still in production. In the PZT-based material system, scaling was the most challenging aspect; the recent breakthrough in hafnia-based ferroelectric material research made thickness scaling possible, with operating fields in the MV/cm range. Challenges: destructive read, cycling (1e12), 2Pr > 40uC/cm2 in a 3D capacitor, write voltage around 1V, etc.
References: [1] Tahara et al., VLSI, 2021. [2] Kozodaev et al., JAP, 2019. [3] Kim et al., Adv. Electron. Mater., 2021. [4] Fu et al., IEDM 2022. M. Popovici, imec, IEDM 2022.
Recent Breakthrough in High-Density FeRAM
Micron (IEDM 2023) showed an excellent stackable solution with a poly-silicon channel select transistor and a cylindrical ferroelectric capacitor. But it does not increase density significantly over DRAM and is not an efficient bit-cost scaling solution; real 3D solutions are required.

Summary
2.5D, 3D, and chiplets are opening the door for emerging memory solutions. Emerging memories and their challenges:
LLC: STT-MRAM (switching current, footprint, tail bits, and cost); SOT-MRAM (SOT track Isw ~100uA, field-free switching, high-density bitcell); BEOL eDRAM (IGZO reliability); FeRAM (endurance toward 1e14 cycles).
Memory: 3D DRAM (4F2 vs. 3D; which channel: epi vs. oxide semiconductor); 3D FeRAM (which cell architecture: 1T1C vs. 1TnC; 2Pr > 50uC/cm2, 1e12 cycles); OTS memory (material research to deliver performance and reliability, ALD OTS for 3D memory); 1S1MTJ MRAM (finding the selector and high-density patterning); 3D FeFET (many competitors for the same application); ML FeRAM/NDR (window > 20, 1TnC, NDR).
Storage: 3D NAND (gate stack with a ferro layer to reduce Vprog by 2V; airgap, nitride cut; trench cell with window > 10V).
190、nternational Solid-State Circuits ConferenceIn-Memory Computing Chiplets for Future AI AcceleratorsEchere Iroaga()ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators1 2024 IEEE International Solid-State Circuits ConferenceOutline2 AI Deployment Trends In-memory Computing(IMC
191、)Basics IMC Macro Approaches Architectural considerations ConclusionsISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceTrend 1:AI Model SizeAlexNetResNet-50ResNet-101ResNet-152TransformerGPT-1BERT-LargeMegatronGPT-2GPT
192、-31101001,00010,000100,0002010201220142016201820202022Giga-OperationsYearAlexNetResNet-50ResNet-101ResNet-152TransformerGPT-1BERT-LargeMegatronGPT-2GPT-3101001,00010,000100,0001,000,0002010201220142016201820202022Millions of ParametersYear10,000 X10,000 X Model sizes are large and increasing,driving
193、 the number of operations required3ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits Conference AI is needed across a wide range of form factors/compute capabilitiesSmart Phones/Mobile devices (MobileNet,ResNet,Gemeni Nano,Llama
194、 2B.)Client devices/Laptops (MobileNet,ResNet,Llama 13B,Stable diffusion,ViT.)On-prem servers (Llama 70B)Hyperscale/Cloud Datacenters (GPT-3.5/4)Chiplets enable scalability across scaling form-factors&Model sizeTrend 2:AI Platform Diversity4ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Futur
195、e AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceLearning from Todays Multi-Chip LLM Execution Large Models(and inference artifacts)dont fit into GPU memory driving the need for multi-GPU inference solutions.GPT-3.5/4 Inference needs 18 H-100(SXM5)GPUs purely from a memory pe
196、rspective for each group of inferences(Batch size lots of data Energy in processing engine Data access bit at a time Data movement Eliminated Energy in processing engine Drastically reduced data access Access compute result over many bits Data movement Eliminated Eliminate energy in processing engin
197、e Analog processing inside memoryISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceCompute/data-movement energies10MULT(INT8):0.3pJMULT(INT32):3pJMULT(FP32):5pJMULT(INT4):0.1pJMemory Size()Energy per Access 64b Word(pJ
198、)1MB(45nm technology)Data-movement costs are significant relative to compute costs!ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceTodays Digital Accelerators for Maximal Re-use11Data reuse is critical to addressing
199、data-movement costsMotivates spatial architectures=In-memory is well suited for thisMVM(=):=,Spatial Architecture:Reuse 60-200(32kB buffer)MemoryBoundComputeBoundAmount of Data Re-useISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circ
200、uits Conference10X higher Efficiency makes Memory the bottleneck.D.Bankman,ISSCC18Insufficient to address compute costs without addressing data-movement costsCompute engineMemoryMany Neural Networks(e.g.,Conv.Nets.)101012ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 20
201、24 IEEE International Solid-State Circuits ConferenceIn-memory computing(IMC)13,BitBit-cell arraycell array+ADCADC,MVM(=):=,Systolic ArrayIMC IMC maximizes 2D reuse via dense processing engines(Bit Cells)Spatial ArchitecturesISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerator
202、s 2024 IEEE International Solid-State Circuits ConferenceFundamental IMC trade-off:SNR14Memory(D1/2 D1/2array)ComputationMemory&Computation(D1/2 D1/2array)D1/2TraditionalIMCMetricTraditionalIn-memoryBandwidth1/D1/21LatencyD1EnergyD3/2 DSNR11/D1/2 Consider:accessing bits of data associated with compu
203、tation,from array with columns rows.IMC benefits communication&computation energy/delay at cost of SNRIMC tradeoff is controlled by row parallelismISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceWhat about digital IM
204、C?15Reduction by in-memory digital-logic adder treeY.-D.Chih,ISSCC21Bit-cell area:0.379 m2Macro area(64 kb):202,000 m2Bit-cell array just 11%of macro area!Tech.NodeArchitecture4-b TOPS/W4-b TOPS/mm2Adv.22 nmDigital accel.20121.2X-2.5XDigital IMC150155 nmDigital accel.40801.9XDigital IMC2751501Y.-D.C
205、hih,ISSCC21,2H.Fujiwara,ISSCC22Digital IMC reverts to digital acceleration Advantage from custom implementation/layoutISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceBest digital(7nm)N.Shanbhag,OJ-SSCS22Where does an
206、alog IMC stand today16Low-SNR IMC(22nm)high-SNR SC IMC(28nm)Low-SNR IMC(22nm)A.Papistas,CICC21Noise sigma is 0.43LSB of 6-bADC,for onecolumn at single temp.MVM output 6-b ADCHigh-SNR SC IMC(28nm)J.Lee,VLSI21Noise sigma is 0.3LSB of 8-bADC,for 256columns across temp.Error bars show sigma across 256 I
207、MC columns256-column overlayMVM output 8-b ADCIMC enables 10X higher efficiency(and throughput)than digital.But SNR trade-off poses the critical limitation today!ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceOutlin
208、e17 Large Scale AI:Properties and Trends In-memory Computing(IMC)Basics IMC Macro Approaches Architectural considerations ConclusionsISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceEx.1:SRAM-based binarized IMCVBIAS,
209、OVBIAS1x2x16xMA,R MD,R CLASS_ENX0X1X4WL_RESETWLXOffsetBLBLBMA MD Bit-cell replicaI-DACIMC ModeSRAM Mode J.Zhang,VLSI16J.Zhang,JSSC175-b WL DAC:00.10.20.30.4WL Voltage(V)Time(ns)012345X=5b 00001X=5b 11111X=5b 000000.020.040.06WLDAC CodeVBL(V)05101520253035Ideal transfer curveNominal transfer curve Bi
210、t-cell and peripheral circuitry limits SNR18ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceEx.2:FLASH-based IMCInner-product Accuracy=,Accumulation=,=,X.Guo,IEDM17 Flash variation limits SNR,output TIA degrades powe
211、r efficiency19ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceEx.3:RRAM-based IMC20Two CellsReRAM ArrayADCs(3b)&muxesS.Yin,T-ED Oct.2020 Low Bit-Cell SNR requires high-sensitivity readout,degraded AreaSignal(4x)Noise
212、TSMC 40nm C.-C.Chou,ISSCC18RRAM Cell SNRISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceEx.4:MRAM-based IMC P.Deaville,VLSI Symp.2022 Low Bit-Cell SNR requires high-sensitivity readout,degraded Area21Signal(2x)NoiseR
213、esistanceGF 22nm D.Shum,VLSI17MRAM Cell SNRISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceEx.5:Capacitor-based IMCADC Output Code(8b)25020015010050002004006008001000 1200Ideal Output Value(1152 inner dimension)H.Val
214、avi,VLSI Symp.2018J.Lee,VLSI Symp 2021TransistorsInterconnectCapacitor Precision BEOL capacitors provide high SNR enabling aggressive IMC efficiency and compute density.22CLmAm,1RSTVRSTIAN/IAbNIA1/IAb1diff.DACdiff.DACAm,NM-BCM-BCBLbmBLmWLnIAnIAbnAbnmAnmto ADCCLmwm,Nwm,1x1xNISSCC 2024-Forum 1.4:In-me
215、mory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceOutline23 Large Scale AI:Properties and Trends In-memory Computing(IMC)Basics IMC Macro Approaches Architectural considerations ConclusionsISSCC 2024-Forum 1.4:In-memory Computing Chiplets for F
216、uture AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceArchitectural Considerations for IMC Systems Re-use IMC Spatial architecture SNR Capacitor Based INC Programmability Support for variety of operations Utilization Support for Parallelism D2D impact on future IMC systemsMust
217、 be addressed to enable wide-scale adoption of in-memory computing solutions!ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceNeed for programmability25Residual ConnectionsDepth-wise ConvolutionsDilated Convolutions W
218、ide variety of operations for inter-layer(dataflow)and intra-layer(convolutions).ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceRange of AI-model operations26B.Fleischer,VLSI18General Matrix Multiply(256 2300=590k e
219、lements)Single/few-word operands(traditional,near-mem.acceleration)MVM only 70-90%of operations IMC must integrate heterogenous architecturesISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceIMC utilization mapping cha
220、llenges27IMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMC IMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMC +Weight LoadingIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMC Macro UtilizationIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMCIMC Bit-cell Utilization IMC is efficient but rigidNeed to watch ut
221、ilization to maintain efficiencyWeight loading(temporal utilization),macro and bit-cell utilizationISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceIMC utilization parallelism challenges Different forms of parallelism
222、=different overheadsData Parallelism(replication)Model Parallelism(broadcast)Pipeline Parallelism(pipelining)Weight-loading overheadNetwork communication overheadLatency overhead28ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuit
223、s ConferenceScalable dataflow IMC-architectureCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUCIMUOn-chip Network On-chip Network On-chip Network On-chip Network Segmented Weight BufferActivation BufferActivation BufferActivation BufferActivation BufferOff-chipControlPLLWeight NetworkWei
224、ght NetworkWeight NetworkWeight NetworkWeight NetworkWeight NetworkWeight NetworkWeight NetworkCompute-In-Memory Unit(CIMU)Compute-In-Memory Array(CIMA)Programmable Digital SIMDCompute and Dataflow BuffersProg ing&ControlOn-Chip Network(OCN)Network Out BlockNetwork Out BlockNetwork In BlockNetwork I
225、n BlockDisjoint Buffer SwitchDuo-DirectionalPipelined RoutingSwitch BlockCIMU Out PortFully-synthesized,pipelined routing segments Fully-disjoint switch blockConfigured via dedicated network1152256 IMC bank(CIMA),bit scalability from 1-8 b Programmable digital near-memory computing SIMDLocal bufferi
226、ng Local controlH.Jia,ISSCC21Dataflow architecture enables flexible optimization of parallelism(data/pipeline)29ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceD2D interconnect impact on future IMC Systems Efficient&
227、scalable workload mapping across chiplet systemsIMC architectures employ dataflow network of cores for flexible parallelismCompiler optimizations within IMC die translate across IMC dies,with proper heuristics applied for D2D-interconnect bandwidth and energy IMC compute density for enabling short-r
228、ange D2D interconnectsIn distributed execution,compute die TOPS optimized for attached memory BWHigh IMC compute density,enables smaller die and shorter chiplet interconnects Optimization with emerging memory technologyIMC must work with secondary memory(especially for large parameter models).Will r
229、equire optimization with next gen memory capacity/interconnect bandwidth.2024 IEEE International Solid-State Circuits ConferenceOutline31 Large Scale AI:Properties and Trends In-memory Computing(IMC)Basics IMC Macro Approaches Architectural considerations ConclusionsISSCC 2024-Forum 1.4:In-memory Co
230、mputing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceConclusions32Execution of large scale AI models require multi-chip execution compute die to die interconnect technologies are well suited for this.Efficient AI compute requires solving compute AND data
231、-movement bottlenecks IMC is distinctly suited for this.IMC instates fundament energy/throughput vs.SNR tradeoffs These drive macro technologies and approaches.IMC faces architectural challenges for programmability&efficient execution Parallelism must be addressed through specialized architectures.I
232、MC chiplet based systems are coming soon(commercially)Die to Die interconnect performance will drive optimizations across the system.ISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI Accelerators 2024 IEEE International Solid-State Circuits ConferenceFuture IMC Based Products on the Ho
233、rizon33Automotive/Industrial EdgeClient ComputingNPU Chipletsmulti-die ASICPCIe CardCompute ServersOn-prem Enterprise ServersCloud DatacentersISSCC 2024-Forum 1.4:In-memory Computing Chiplets for Future AI AcceleratorsChiplet powered scalable solutions from the edge to the cloudM.2 Card 2024 IEEE In
234、ternational Solid-State Circuits ConferenceISSCC 2024-Forum X.Y:34 of 16Please Scan to Rate Please Scan to Rate This PaperThis Paper 2024 IEEE International Solid-State Circuits ConferenceISSCC 2024 Forum F1:Efficient Chiplets and Die-to-Die Communications1.5:Efficient Domain-Specific Compute with C
235、hipletsProf.Dejan MarkoviUCLA ECE Departmentdejanucla.eduISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets1 of 62 2024 IEEE International Solid-State Circuits ConferenceEvolving Standards:Flexibility&EfficiencyObjectives:lower development cost and shorter time-to-market SoC/ASIC r
236、evision/iteration is$($100M in 16nm CMOS)Long design cycles(1 yr)3with increasing design complexity1,2ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets2 of 62 2024 IEEE International Solid-State Circuits ConferenceSoCs Today=CPU/GPU+AcceleratorsMaltiel Consulting estimates4 Shao e
237、t al.IEEE Micro15Apple A12 die photo912172229A12201844Hardware accelerators(45%area)35A11Number of accelerator blocks in Apple APsA8A7A6A5A4ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets3 of 62 2024 IEEE International Solid-State Circuits ConferenceTwo ways to think about it Ad
238、d flexibility to accelerators Narrow coverage of DSPsThe how Interconnect Switch-boxes Sw toolchainOptimize for Efficiency and Flexibility1010.10.0011100.010.1Average Area Efficiency(GOPS/mm2)Average Energy Efficiency(GOPS/mW)ProcFPGACPUFPGA*DSPsDedicatedThis Work100*DSPs include CGRAs&FPGA-DSPs5-7I
239、SSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets4 of 62 2024 IEEE International Solid-State Circuits ConferenceEfficient Multi-Chip Module(MCM)Scaling Large SoCs incur higher costLower yield of larger chipsDelayed time-to-market Cost benefits of MCM scalingSmaller chips give bette
240、r yieldAMD 32-core chip(777mm2):1.0 x Cost4 x 8-core chiplet(4x213mm2):0.6x20(+)894%76%35%YieldG:10B:18T:28G:103B:33T:136G:620B:38T:658ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets1,600mm2400mm2100mm25 of 62 2024 IEEE International Solid-State Circuits ConferenceChallenges:Hig
241、h bandwidth density Low link latency Low energy transfer Low I/O areaChiplet size:Sweet spot:100mm2 UDSP prototype($limited):6mm2Challenges with MCM Design2x2 UDSP on Si-IFISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets6 of 62 2024 IEEE International Solid-State Circuits Confere
242、nce Domain-specific hardware acceleration ASIC-like energy efficiency and throughput Just-enough flexibility for a domain Key:flexible cores,efficient interconnect Tile-able chiplets on Silicon Interconnect Fabric(Si-IF)Develop scalable interconnects Near-range I/O and PHY for cutting-edge bandwidth
243、/latency/energy Low-area,portable timing correction circuits for Si-IF I/OsResearch AimsISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets7 of 62 2024 IEEE International Solid-State Circuits ConferenceUniversal Digital Signal Processor(UDSP)ArrayA 16nm 2x2 Chiplet with 10-m Pitch-I
244、/OUDSP Chiplet2-Layer Si-IF10-m I/O bump pitch9 U.Rathore,S.Nagi,S.Iyer,D.Markovic,ISSCC 2022.ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets8 of 62 2024 IEEE International Solid-State Circuits ConferenceUDSP Multi-Chip,Multi-Program TenancySNR-10 Link Vertical StackInactive Pro
245、gram(Soft Reset)Simultaneous Multi ProgramCross UDSP AlgorithmsProgram being ErasedControl&PLL2-Layer Si-IFUDSP Dielet10-m I/O bump pitchISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets9 of 62 2024 IEEE International Solid-State Circuits ConferenceCo-designComputeInterconnectI/O
246、channelCompilersPackageUDSP OverviewMemCore(1)Interconnect(2)RTRAMCM Assembly CompilerRTRAProgramming(5)Switchbox(3)I/O(4)ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets10 of 62 2024 IEEE International Solid-State Circuits ConferenceEvolution of UDSP Core24.5mm2(40nm)Slice L/MSl
247、ice L/MSlice L/MSlice LDSP-48,Slice L,BRAMSlice L/MSlice L/MSlice L/M64-8kFFT16-core UDSPFPGAInterconnectCHIP AREA10 C.C.Wang,et al.,ISSCC 2014.143Mtrans.Logic25%75%Logic25%25%Post-Proc.Pre-Proc.Path Selc.Path Selc.fastpathinterconnectdata mem.data mem.fastpathinterconnectShifter&Multiplier2014 Lewi
248、s Winner AwardUDSP coreISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets11 of 62 2024 IEEE International Solid-State Circuits ConferenceEfficiency and Flexibility in Comm.DSP1010.10.0110100.11001Average Area Efficiency GOPS/mm2Average Energy Efficiency GOPS/mWUDSP21CPUASICFPGA10 C
249、.Wang et al.,ISSCC 2014.11 F.-L.Yuan et al.,VLSI 2014.12 F.-L.Yuan et al.,VLSI 2015.D-CLASIC(v1)CLASIC(v2)UCLA FPGA:1.eFPGA interconnect2.Coarse-grain kernels UDSP based CLASIC designs Domain-specific for comm.DSPISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets12 of 62 2024 IEEE
250、International Solid-State Circuits ConferenceExample DSP kernels derived from common DSP algorithms Up/Down Conv.MIMO IFFF/FFT Neural Network Zero Forcing MMSE Vector-dot product MAC,FIR,EuclidianAlgorithm Ontology:Example DSP Kernels|2|2Lattice FilterFIR FilterRadix-2Mtx-MultVDP/BFZF/MMSEEDComplex-
251、MACCordicISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets13 of 62 2024 IEEE International Solid-State Circuits Conference16-bit fix-pt1.1 GHz Clk256b D-Mem384b I-Mem4 In,4 OutIterative Process of Core Design|2|2Lattice FilterFIR FilterRadix-2Mtx-MultVDP/BFZF/MMSEEDComplex-MACCord
252、icI-MemD-MemConnection AdjustMappingDSP KernelsUDSP Core v4.2 Balancing core granularity and core utilization to maximize energy and area efficiencyISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets14 of 62 2024 IEEE International Solid-State Circuits ConferenceDesign challenges En
253、ergy/area Flexibility Scalability Clk speedInterconnects:An Exercise in Co-DesignCoreInterconnectAlgorithmRoutingDSP ArrayCompilerISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets15 of 62 2024 IEEE International Solid-State Circuits ConferenceLayer-1 Interconnect(Distance=1)Vertic
254、al StackLayer 3 Switchboxes Layer 2 Switchboxes Layer 1 Switchboxes Bottom Layer of Cores4 x 16bDistance=ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets16 of 62 2024 IEEE International Solid-State Circuits ConferenceLayer-2 Interconnect(Distance=)Vertical StackLayer 3 Switchboxe
255、s Layer 2 Switchboxes Layer 1 Switchboxes Bottom Layer of CoresDistance=2 x 16bISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets17 of 62 2024 IEEE International Solid-State Circuits ConferenceLayer-3 Interconnect(Distance=2)Vertical StackLayer 3 Switchboxes Layer 2 Switchboxes Lay
256、er 1 Switchboxes Bottom Layer of CoresDistance=2 x 16bISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets18 of 62 2024 IEEE International Solid-State Circuits ConferenceLayer-4 InterconnectVertical StackCDF(Wire Distance)Distance from NodeFraction of WiresDistance from NodeFraction
257、of Wires0.650.60.750.850.80.70.950.9123457618Vertical StackISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets19 of 62 2024 IEEE International Solid-State Circuits ConferenceLayer-4 InterconnectCDF(Wire Distance)Distance from NodeFraction of WiresDistance from NodeFraction of Wires0
258、.650.60.750.850.80.70.950.9123457618Vertical StackRegistered Layer 4 SBLonger Distance RoutesLayer 4Vertical StackISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets20 of 62 2024 IEEE International Solid-State Circuits Conference Hyper-vector cross-correlation(HVCC)in each dimension
259、(layer)HVCC for a layer measures inter-dependencies of pathsN-Layer Switch Box:Hyper-Matrix ModelN-Layer Switch BoxI4I3I2I1O1O2O3O4M1M2M3M4N-D Hyper-Matrix RepresentationIMO1234N-D Hyper-MatrixN-Layer Switch BoxISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets21 of 62 2024 IEEE In
260、ternational Solid-State Circuits Conference MCBF&MCBF/HWC plotted against layer density for 3-layer SBDSE:Search Space TraversalDistance from NodeHW Cost(HWC)103542768100150200250350300509MCBF0.00500.0150.0250.020.010.0350.030.04HW Cost(HWC)10015020025035030050MCBF/HWCFully ConnectedCompiler Hw Area
261、 SparseCompiler Hw Area MCBF:Mean ConnectionsBefore FailureISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets22 of 62 2024 IEEE International Solid-State Circuits Conference Sw/Hw balance is at the peak of MCFB/HWCDSE:Maximizing Silicon Area EfficiencyDistance from NodeHW Cost(HWC)
262、103542768100150200250350300509MCBF0.00500.0150.0250.020.010.0350.030.04HW Cost(HWC)10015020025035030050Hw/Compiler Co-OptimizedSwitch BoxMCBF/HWCMCBF:Mean ConnectionsBefore FailureISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets23 of 62 2024 IEEE International Solid-State Circuit
263、s ConferenceFour UDSP dielets8mm x 8mm Si-IFSi-IF interface7,168 data pins160 control+PLL pins22,291 power/ground pins2-layer routingAssembly considerationsSelect known good diesSelect known good Si-IFDie handling,cleaning,ESDDielet alignment,bondingSi-IF Assembly OverviewISSCC 2024-Forum 1.5:Effici
264、ent Domain-Specific Compute with Chiplets59m24 of 62 2024 IEEE International Solid-State Circuits Conference Low loss at 10 GHz 10 ps RTT,negligible ISISi-IF Characteristics10-m pitch Cu bumps*350 mChannel Loss(dB)ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets13*9.8 m with opti
265、cal shrink25 of 62 2024 IEEE International Solid-State Circuits Conference UDSP dielet powered on to verify Clk tree and shift-registersLow-freq Clk applied using a probe station Dice defect-free Si-IF sites for assemblyTemplate-based wafer scan for repeated patterns Dielets assembled on Si-IF using
266、 direct Cu-Cu TCBIn-situ formic acid(FA)vapor treatment Ionizers on the bonding tool to ensure an ESD-safe assemblyDefault 20 l/min flow interferes with the FA vapor flow of 4.5 l/minLeads to inadequate cleaning of Cu pads,inferior bonding qualityShear strength$1TGrowing double by 2030PC EraSmartpho
267、ne EraData CenterAI Era$205B2000Semiconductor Industry Landscape 2024 IEEE International Solid-State Circuits Conference4MORE THAN MOOREMarket demand for AI performance is faster than Moores Law doubling transistors every 18 months AI/ML performance has increased nearly 6.8x11x in the past two years
268、 from 2021 2022 that outstrips and more than Moores LawScaling for AI eraISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits Conference5Compute Requirements Exploded in AI era20102020203020402050Current Tre
269、nd(Device Scaling)“Market dynamics limited”scenarioWorlds energy productionCompute Energy in J/year1.E+181.E+201.E+22ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationExascale=21MW(52GF/watt)Zettascale=500MW(2140 GF/watt)Nuclear plant1GW 2024 IEEE In
270、ternational Solid-State Circuits Conference6Technology Converging and Business Ecosystem2020sCSYS Multi-dieChipletsIDMs/Foundry/OSTAs/EMSEcosystem1990s2000s 2010sSoCSoC w/IP MCMIDM orFoundryFoundry&3rdParty IPIDM,Foundry&OSAT2010s 2000s1990sSiPHDI PCB FR4 PCB OSATsEMSEMS SiPSystem integrated on boar
271、d driven by OSATs/EMS in semiconductor markets Si CMOS(foundry-focused)to CSYS(Complementary Systems)Chiplets&heterogenous integration on substrate become mainstream in the futureISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE Internatio
272、nal Solid-State Circuits Conference Mix&Match systems-Enable construction of Different Si Nodes Reuse IPsPackage becomes new System-on a Chip(SoC)System flexibility-Processors,accelerator Performance optimization-Low latency&high BW Time to Market Low CostEnable optimal process technology;Smaller fo
273、r better yieldModularized SoC(Chiplets)Monolithic SoCDrivers for on-package Chiplets 72023 IEEE 73rd Electronic Components and Technology Conference Orlando,Florida May 30 June 2,2023Chiplets and Heterogeneous Integration7ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the P
274、ath to Standardization 2024 IEEE International Solid-State Circuits Conference8Chiplet 1Process node 1On-die busChiplet 2Process node 2PCI/CXLControllerOptional interposer/bridgepackageD2DPHYInterfacelogicD2DPHYInterfacelogicOn-die bus1-20 mmMonothetic Chip:Scale SoC&Homogenous one die in a packageC
275、hiplet:Split SoC&Heterogeneous multiple dies in a packageD2D interconnectOff-package interconnect+simple packageOn-package interconnect+more complex packagesChiplets-Disaggregation&IntegrationISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IE
276、EE International Solid-State Circuits ConferenceChiplets More Complex Workflow Design&VerificationCustomerSpecificationAssemblyPackaging&TestFoundrySystemSoftwareOEM andProductcodesignphysical modularityFunctional modularityadditional constraints with chipletsKey factors for more complexFunctional m
277、odularityPhysical modularityInterconnectPackagingTest and operationsSupply chainChipletsISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization9 2024 IEEE International Solid-State Circuits ConferenceChiplets Design EcosystemNode/Board Level IntegrationCP
278、UCPUAcceleratorI/O TileMemMemMemMemCXL/PCIe/CPU-CPU(Electrical/Optical/)DDRPackage Level IntegrationOn-die Integration Seamless Integration from Node Package On-die Standardization for Chiplets D2D and interoperation Same Software,IP,and Subsystem to build scalable solutionsISSCC 2024-Forum F1.6:Inn
279、ovations in Chiplet Interconnects,Protocols and the Path to Standardization10 2024 IEEE International Solid-State Circuits ConferenceChiplet Form FactorDie Size/bump locationPower deliverySoC Construction(Application Layer)Reset and InitializationRegister accessSecurityDie-to-Die Protocols(Data Link
280、 to Transaction Layer)PCIe/CXL/Streaming Plug and play IpsDie-to-Die I/O(Physical Layer)Electrical,bump arrangement,channel,reset,power,latency,test repair,technology transition Die-to-Die I/ODie(Chiplet)ProtocolDie-to-Die I/OProtocolDie(Chiplet)ChipDie-to-DieI/ODie-to-DieProtocolChipletForm FactorS
281、oC Construction(Example SoC showing two chiplets only)Chiplets Interconnect&Interoperation11ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits Conference D2D interface Functional block connecting data inte
282、rface between two dies assembled in same package(MCM)/interposer(2.5D,Fan out,Si bridge,3D stacking)D2D very short channels High Power efficiency High bandwidth Chiplets Die to Die Interface D2D Structures Typically made of PHY&a controller Block(a physical layer,link layer,and transaction layer)Two
283、 types of PHY Architectures SerDes series connection(standard MCM)High density parallel(2.5D,Fan out RDL,Si bridge,3D stacking)Standard D2D&Proprietary IP D2D Open-source standards:UCIe,Bow,OHBI,More IP D2D:NVlink(Nvidia),Lipincon(TSMC),Infinity Fabric(AMD),MDIO/AIB)(Intel),XSR/USR(Rambus)ISSCC 2024
284、-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization12 2024 IEEE International Solid-State Circuits ConferenceChiplets Growth in StandardsComponentStatusD2D Interconnect(Huge growth/awareness)UCIe,BoW,AIB,XSRTestIEEE 1838,IEEE P3405Chiplet descriptionJEDEC-OCP J
285、EP 30 CDXML(new in 2023)Size guardrailsX Power delivery guardrailsXThermal guardrailsXWiring density guardrailsXMechanical guardrailsXBump and assembly pitch guardrailsX13ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Soli
286、d-State Circuits ConferenceChiplets D2D Standard-UCIeLayered Approach with industry-leading KPIsPhysical Layer:Die-to-Die I/ODie to Die Adapter Support for multiple protocols:bypassed in raw modeProtocol:CXL/PCIe and Streaming CXL/PCIe for volume attach&plug-and-playSoC construction issues are addre
287、ssed w/CXL/PCIe CXL/PCIe addresses common use casesI/O attach,Memory,Accelerator Streaming for other protocolsScale-up(e.g.,CPU/GP-GPU/Switch from smaller dies)Protocol can be anything(e.g.,AXI/CHI/SFI/CPI/etc.)Well defined specification:interoperability and future evolutionISSCC 2024-Forum F1.6:Inn
288、ovations in Chiplet Interconnects,Protocols and the Path to Standardization14 2024 IEEE International Solid-State Circuits ConferenceChiplets D2D Standard-UCIeISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization15DIE-TO-DIE ADAPTERPHYSICAL LAYERPROTOCO
289、L LAYERPCIe,CXL,Streaming(e.g.,AXI,CHI,symmetric coherency,memory,etc.)Flit-Aware Die-to-Die Interface(FDI)Raw Die-to-Die Interface(RDI)Link TrainingLane Repair/Reversal(De)Scrambling,Analog Front end/Clocking Sideband,Config&Registers ChannelArb/Mux(if multiple protocols)CRC/Retry(when applicable)L
290、ink state managementParameter negotiation&Config Registers(Bumps/Bump Map)Form FactorRaw Mode(bypass D2D Adapter to RDI e.g.,SERDES to SoC)2024 IEEE International Solid-State Circuits ConferenceChiplet D2D StandardUCIe-PHYByte to Lane mapping for data transmission Interconnect redundancy remappingWi
291、dth degradationScrambling&training pattern generationLane reversalLink initialization,training&power management statesTransmitting&receiving sideband messagesOne,two or four module per Adapter allowed both advanced&standard PackageStandard package example configurations16ISSCC 2024-Forum F1.6:Innova
292、tions in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits ConferenceUCIe Usage Model Streaming for PCIe/CXL AMBA CHIUCIe ProtocolStreamingAdapterPHY Transporting same on-chip protocol allows seamless use of architecture specific features wi
293、thout protocol conversion Streaming interface with additional flit formats provide link robustness using UCIe defined data-link CRC&retryCHI/CXLUCIe ProtocolStreamingAdapterPHYISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization17AccelerationMem contro
294、llerGPUAcceleratorsInternal(CHI interconnect)ComputeCPUCPUCHI interconnectCPUD2DAdapterMemory controllerCPUCPUCPUCPUCPUCPUCPUCPUCPUPHYComputeCPUCPUCHI interconnectCPUMemory controllerCPUCPUCPUCPUCPUCPUCPUCPUCPUPHYD2DAdapterPHYCHI/CXLUCIeUCIeUCIeUCIe(3 dies on one package)PHYD2DAdapterD2DAdapterCHI/C
295、XL 2024 IEEE International Solid-State Circuits ConferenceChiplets D2D-UCIe Key Metrics18UCIe 1.0/1.1 Characteristics and Key Metrics 2024 IEEE International Solid-State Circuits Conference19AreaOutputD2D PHYBunch of Wires 1.0Bunch of Wires 2.02.1 in flight1ststandard scaling laminate to advanced pa
296、ckaging,2-32 Gbps/laneImplementations from 65,22,16,12,7,6,5,4,3nm 10+products in flightProven power of 0.3 to 0.5 pJ/bit,D2D SpreadsheetV3.0 in flightBiennial release compares data on all PHYsD2D Link and TransactionTLP 1.0Only known“streaming mode”open link layerD2D Transaction ProfileNXP DiPort/O
297、ther ProfilesOnly known open maps of AXI SOC Traffic to D2D PHYsCDXJEDEC-OCP JEP 30 PM;Open 3DKStandard for physical chiplet descriptionWorkflow white papersBusinessChiplet cost modelBusiness white paperOpen spreadsheet model to compare chiplet/monoProduct planning assistance documentPrototypingTest
298、 Package1stopen chiplets integration across vendorsFully open package design and analysisChiplets D2DOCP/ODSA Standards BoWISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits ConferenceChiplets D2D Standard
299、-BoWD2D is made up of wires,slices,and stacksPhysical Layer:Slice(Die-to-Die I/O)It must have 18 or 20 signal bumps.It must have 2 bumps for the differential clock and 16 single-ended data bumps It also have the optional single-ended signals AUX and FEC.The long edge of a slice must be parallel to t
300、he chip edge A stack composed of one or more slices stacked from chip edge to center A link composed of one or more stacks along the chip edgeBoW Link Components20ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State
301、Circuits ConferenceChiplets D2D Standard-BoWBoW PHY in the ODSA Stack BoW for Common Transaction Protocols 21ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits ConferenceChiplets D2D Standard-BoWBoW PHY Mo
302、des and Targets 22ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits ConferenceStandardThroughputDensityMax.DelayAdvanced Interface Bus(AIB,Intel)2 Gbps504 Gbps/mm5 nsBandwidth Engine(Mosys)10.3 GbpsN/A2.4
303、 nsBunch of Wires(BOW,OCP/ODSA)16 Gbps1280 Gbps/mm5 nsUniversal Chiplet Interconnect express(UCIe)32Gbps1350Gbps/mm2 nsHBM3(JEDEC)4.8 GbpsN/AN/AInfinity Fabric(AMD)10.6 GbpsN/A9 nsLipincon(TSMC)2.8 Gbps536 Gbps/mm14 nsMulti-die IO(MDIO,Intel)5.4 Gbps1600 Gbps/mmN/AXSR/USR(Rambus)112 GbpsN/AN/A23ISSC
304、C 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationChiplets D2D Interface Summary 2024 IEEE International Solid-State Circuits ConferenceChiplet IntegrationStandard&Advanced Packages(Standard Package)(Multiple Advanced Package Choices)Die-2Package Substra
305、teDie-0Die-1 Standard Package:2D cost effective,longer distance Advanced Package:2.5D,high density Fanout,embedded Si bridge power-efficient,high bandwidth density 24ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationPackage SubstrateSilicon Bridge(e.
306、g.EMIB)(e.g.EMIB)Silicon Bridge Die-1Die-0Die-2Package SubstrateInterposer(e.g.CoWoS)Die-1Die-0Die-2Package SubstrateInterposer(e.g.FOCoS-B)Silicon Bridge Silicon Bridge Die-1Die-0Die-2 2024 IEEE International Solid-State Circuits Conference251995-NowPerformance1984-Now2009-20212022 Flip ChipBall Gr
307、id Array2.5DThroughSilicon ViaFOCoSWire BondCu Pillar Flip Chip(Dev 2006)FOPOP2.5/3DFan-OutWafer Level PackageFOSIPSolder Flip Chip(Dev 1964)FOCoS-BVIPack PlatformCo-SiPhWire bond(Dev in 1956)Mobile-Networking-Compute-AI,Edge Automotive-IndustrialDensity3D Advanced RDL TechnologyFan Out Package(Dev
308、2009)ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationASE Advanced Packaging Technology Offerings 2024 IEEE International Solid-State Circuits Conference26High Density Interconnection-High I/O connect 10000 with fine RDL L/S 2/2um-Support package si
309、ze 60 x60mm Chip Last(FOCoS-CL)Chip First/CL w/Bridge(FOCoS-B)Chip First(FOCoS-CFP)ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationFOCoS Packaging Technology Offerings 2024 IEEE International Solid-State Circuits Conference27 2ASIC+4HBM2+4 Si Bridg
310、e die Module size:47x31 mm2 1 RDL,L/S 10/10 um Si Bridge Die L/S 0.8/0.8um Package size:78x70 mm2 Total 10 chiplets in MCM package ASIC+2HBM3 4 RDL,L/S 2/2um Package size:75x75 mm2 ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationFOCoS Packaging Tec
311、hnology 2024 IEEE International Solid-State Circuits Conference28 High-Density Interconnect Min.L/S 0.4/0.4um Power module or DTC integration Optics integration(optic I/O and Photonic)ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization2.5D TSV Heterog
312、enous Integration 2024 IEEE International Solid-State Circuits ConferenceFan-out WaferDRC,DFM&LVSGDSII out Layout Cadence APDPKG Netlist Net name and coordinateDOCSAuto-routerCadenceLayoutChecking ProgramAuto-mask DesignPDKCalibre More than 50%layout cycle-time saving by auto-router 29ISSCC 2024-For
313、um F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationIntegrated Design Ecosystem(IDE)2024 IEEE International Solid-State Circuits Conference UCIe D2D interface bump out diagram foradvancedpackagingwiththebumppitchbetween 40um to 50umVia land diameter=16umRDL L/S=2/2
314、umuBump pitch=45um FOCoS RDL Design Rule 10 columns for x64 TX&RX data lanes&total156lanesfortheD2DinterfaceroutingUCIe D2D Interconnect using FOCoS Packaging 30ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Ci
315、rcuits ConferenceUCIe D2D Interconnect Design for FOCoS PackagingMoldingUBMUBMUBMCuPCuPCuPGSGGSPI2PI1PI3RDL1RDL2PI4PI5RDL3RDL4RDL5PI7GSGGSSGSSGGSGGSSGSSG2 um 2 umUnderfill GSG type X-sectionSGSSGPI6RDL6 Total 1516 I/O for 1 bump pitch 3 I/O layers&2 isolation GND for SS type design 6L RDL layers nee
316、ded for each I/O routing with ground RDL traces surrounding31ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits ConferenceElectrical Analysis for UCIe D2D in Advanced Packages32ISSCC 2024-Forum F1.6:Innova
317、tions in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits Conference Key Performance Indicators Bandwidth density(linear&area)Data Rate&Bump Pitch Energy Efficiency(pJ/b)Scalable energy consumption Low idle power(entry/exit time)Latency(end
318、-to-end:Tx+Rx)Channel Reach Technology,frequency&BER Reliability&Availability Cost(Standard vs advanced packaging)Factors Affecting Wide Adoption Interoperability Full-stack,plug-and-play with existing s/w is+Different usages/segments Technology Across process nodes&packaging options Power delivery&
319、cooling Repair strategy(failure/yield improvement)Debug controllability&observability Broad industry support/Open ecosystem Learnings from other standards efforts33ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationChiplets D2D Interface Standards Ado
320、ption 2024 IEEE International Solid-State Circuits ConferenceKey Takeaways Chiplets heterogeneous integration optimizes system performance to continue scaling Moores law with cost advantage Interoperability,plug and play for different usages and broad industry support are very critical to the wide a
321、doption of chiplets D2D interface standardization Advanced packaging solutions(HDFO,2.5D&3D)enables chiplets and heterogeneous integration that optimizes system performance34ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International S
322、olid-State Circuits ConferenceInclude Key References“Bunch of Wires PHY Specification”,The Open Domain-Specific Architecture BoW Workstream,2022“UCIe Specification”,July 2023“Interconnects for 2D and 3D Architectures”Heterogenous Integration Roadmap(HIR)2021 EditionSamuel Naffziger et al.,“Pioneerin
323、g Chiplet Technology and Design for the AMD EPYC and Ryzen Processor Families:Industrial Product,”2021 ACM/IEEE 48thAnnual International Symposium on Computer Architecture(ISCA)Anthony Mastroianni et al.,“Proposed Standardization of Heterogenous Integrated Chiplet Models,”2021 IEEE International 3D
324、Systems Integration Conference(3DIC)Shahab Ardalan at at.,“Bunch of Wires:An Open Die-to-Die Interface”,2020 IEEE Symposium on High-Performance Interconnects(HOTI)John Park,”Chiplets and Heterogeneous Packaging Are Changing System Design and Analysis”,Cadence white paper,Lihong Cao et at.,“Advanced
325、Packaging Design Platform for Chiplets and Heterogeneous Integration”ECTC,2023R.Farjadrad et at.,A Bunch of Wires(BoW)Interface for Inter-Chiplet Communication,Hot Interconnect,201935ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE Intern
326、ational Solid-State Circuits Conference36Thank youISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits ConferenceISSCC 2024-Forum X.Y:37 of 16Please Scan to Rate Please Scan to Rate This PaperThis Paper 2024
327、 IEEE International Solid-State Circuits ConferencePhotonics for Die-to-Die Interconnects:Links and Optical I/O ChipletsChen SunAyar Labs,Inc.ISSCC 2024-Forum 1.7:Photonics for Die-to-Die Interconnects:Links and Optical I/O Chiplets1 of 47 2024 IEEE International Solid-State Circuits Conference Emer
328、ging computing applications,such as AI/ML,have an ever-insatiable demand for interconnect bandwidths.Gap between in-package and off-package I/O bandwidth continues to grow.Interconnect BW Growth Driven by AI/ML(Nvidia GTC March 2022)G.Keeler,DARPA ERI Summit 2019ISSCC 2024-Forum 1.7:Photonics for Di
329、e-to-Die Interconnects:Links and Optical I/O Chiplets2 of 47 2024 IEEE International Solid-State Circuits ConferenceScaling Challenges for Off-package I/O*Source:Gordon Keeler,DARPA MTO,ERI Summit 2019Target for Optical I/O ChipletsHBM2e112G XSRCritical Performance Metrics:ISSCC 2024-Forum 1.7:Photo
330、nics for Die-to-Die Interconnects:Links and Optical I/O ChipletsEnergy efficiency(pJ/bit)Bandwidth density(Gbps/mm)Reach(mm to meters)Latency(ns)Optical I/O chiplets bridge the in-package and off-package performance gap 3 of 47 2024 IEEE International Solid-State Circuits Conference Chiplets for Opt
331、ical I/OSystem architectureBuilding blocksChiplet D2D interfaces Retimed optical I/O Chiplet designProcess and fiber attachElectrical interface designOptical transceiver design Measurement results ConclusionAgendaISSCC 2024-Forum 1.7:Photonics for Die-to-Die Interconnects:Links and Optical I/O Chipl
332、ets4 of 47 2024 IEEE International Solid-State Circuits Conference Chiplets for Optical I/OSystem architectureBuilding blocksChiplet D2D interfaces Retimed optical I/O Chiplet designProcess and fiber attachElectrical interface designOptical transceiver design Measurement results ConclusionAgendaISSC
333、C 2024-Forum 1.7:Photonics for Die-to-Die Interconnects:Links and Optical I/O Chiplets5 of 47 2024 IEEE International Solid-State Circuits Conference Optical I/O chiplets can bridge the D2D interfaces between two socketsOptical I/O System ArchitectureASIC PackageASIC PackageASICASICOptical I/O ChipletOptical I/O ChipletElectrical D2D I/FElectrical D2D I/FSingle Mode fiber ExternalMulti-wavelength