F1 - Efficient Chiplets and Die-to-Die Communications.pdf


ISSCC 2024 Forum 1: Efficient Chiplets and Die-to-Die Communications
February 18th, 2024. Presentations start at 8:15 AM.
2024 IEEE International Solid-State Circuits Conference

Organizing Committee
- Organizers: Shidhartha (Sid) Das, AMD, Cambridge, United Kingdom; John Wuu, AMD, Fort Collins, Colorado
- Co-Organizers: Yvain Thonnart, CEA-List, Grenoble, France; Hugh Mair, MediaTek, Austin, Texas
- Champions: Fatih Hamzaoglu, Intel, Hillsboro, Oregon; Kostas Doris, NXP, Eindhoven, The Netherlands

General Information
- 8 talks; each 45-minute talk will be followed by 5 minutes of Q&A
- Please state your name and affiliation during Q&A
- 2 coffee breaks and one lunch break
- A digital copy of all slides will be provided for the forum
- Please switch your mobile devices to silent mode
- Please remember to complete the speaker evaluation forms

Forum Agenda ("Why", "What", "How")
- 8:15 AM: Introduction (John Wuu, AMD)
- 8:25 AM: Advanced CMOS and Packaging Technology for Multi-Chiplet and Trillion Transistor 3DIC System-in-Package by 2030 (Yujun Li, Geoffrey Yeap, TSMC)
- 9:15 AM: The Packaging and Interconnect Requirements of the IC Industry's Chiplet-based Future (Sam Naffziger, AMD)
- 10:05 AM: Break
- 10:20 AM: Do Chiplets Open the Space for Emerging Memory in the HPC System? (Sebastien Couet, Gouri Sankar Kar, IMEC)
- 11:10 AM: In-memory Computing Chiplets for Future AI Accelerators (Echere Iroaga, EnCharge AI)
- 12:00 PM: Lunch
- 1:20 PM: Efficient Domain-Specific Compute with Chiplets (Dejan Markovic, UCLA)
- 2:10 PM: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization (Lihong Cao, ASE US)
- 3:00 PM: Break
- 3:15 PM: Photonics for Die-to-Die Interconnects: Links and Optical I/O Chiplets (Chen Sun, Ayar Labs)
- 4:05 PM: Robust Circuit/Architecture Co-Design for Chiplet Integration (Wen-Chou Wu, MediaTek)
- 4:55 PM: Closing Remarks (Sid Das, AMD)

ISSCC 2024 Forum F1.1: Advanced CMOS and Packaging Technology for Multi-Chiplet and Trillion Transistor 3DIC System-in-Package by 2030
Dr. Yujun Li, Director, HPC Business Development, and Dr. Geoffrey Yeap, Vice President, R&D, Taiwan Semiconductor Manufacturing Company Limited ("Unleash Innovation")

Outline
- Forces driving chiplets and integration
- Advanced CMOS technologies
- Domain-specific CMOS chiplet optimization
- Advanced packaging technologies
- Specialty chiplets for platform solutions
- Design enablement and ecosystem

The Perfect Storm
- Three forces drive higher levels of integration and the adoption of chiplets: insatiable market needs, process technology development, and packaging technology development.

Generative AI Accelerates Computing Needs

Growing AI/HPC Computing Requirements
- Heterogeneous compute, more computing cores, higher memory capacity, higher memory bandwidth, and more I/O bandwidth.
- Advanced process technology and advanced 3DIC packaging technology are the key enablers to achieve a trillion-transistor system-in-package (SoC/SoIC plus HBMs on CoWoS).

Degrees of System-Level Integration
- AMD MI300X GPU: 153 billion transistors, SoIC, and up to 192 GB of HBM3 memory; TSMC N5/N6 FinFET process.
- Cerebras WSE-2: 46,225 mm² of silicon with wafer-scale integration and 2.6 trillion transistors; TSMC N7 FinFET process.

Advanced CMOS Technology

Process Technology Evolution (Source: TSMC)
- Logic density scaling from N28 through N16, N10, N7, N6, N5, N4, N3E, and N3P to N2 (2010-2026).
- Device architecture milestones: planar, high-K metal gate, enhanced strained-Si, high-density MIM, FinFET, super-high-density MIM, high-mobility channel, FinFlex with 1-fin, low-K spacer, self-aligned features, low-R MEOL/BEOL, and nanosheet.
- Lithography: single patterning, double patterning, immersion, self-aligned double patterning immersion, and EUV single and double patterning.

Transistor Architecture Outlook (Source: Y.J. Mii, 2022 VLSI Symposium)
- Power, performance, area (PPA) improves over time from FinFET to nanosheet to CFET, and beyond Si to CNT and 2D TMD (transition metal dichalcogenide) channels.

Technology Innovation Drives Energy Efficiency (Source: Y.J. Mii, 2022 VLSI Symposium)
- Core area vs. speed (GHz) across 7nm, 5nm, and 3nm: one node transition delivers 1.83X logic density, +13% speed, and -21% energy; the next delivers 1.57X logic density, +11% speed, and -30% energy.

Technology Enables Energy-Efficient Compute (Source: TSMC)
- Perf/Watt/mm² and power efficiency improve steadily from N28 through N16, N10, N7, and N5 to N3.

Metal Pitch Scaling Continues But Is Slowing Down (Source: ASML 2021 Investor Day)
- Roughly 2x every 6 years.

Interconnect Technology Innovations Continue (Source: Y.J. Mii, 2019 TSMC Technology Symposium)
- Material innovation: Cu + low-K, ELK, low-R barrier, Co cap layer/Co liner, metal-oxide ESL.
- Lithography innovation: immersion, double patterning, EUV.

SRAM Bit Cell Area Reduced by 100X (Source: TSMC)
- Bit-cell area (µm²) shrinks from 130nm through 90nm, 65nm, 45nm, 28nm, 20nm, 16nm, 10nm, 7nm, and 5nm to 3nm, moving from tall to wide cells and high-current/high-density cell variants.
- Technology: strained silicon, high-K metal gate, double patterning, M0 bit-line, FinFET, EUV and high-mobility channel.
- Design: color-aware design and design assist, write assist for FinFET, novel dual-rail scheme, FLY BL, double word line, metal-coupling negative bit-line, and compact periphery layout (FCST).

What Drives Chiplet Adoption?
- The same three forces drive higher levels of integration and the adoption of chiplets:
- Insatiable market needs: the rise of AI and its insatiable need for more compute/memory/IO; workload-optimized, diversified architectures; heterogeneous integration.
- Process technology development: CMOS scaling continues with more DTCO contributions, but the pace of scaling differs for logic, SRAM, and analog/IO, and process complexity keeps increasing.
- Packaging technology development: wafer-scale advanced packaging and 3DIC WoW and CoW are a much more effective way to integrate compute/memory/IO.

Domain-Specific Chiplet Optimization

From SoC to Multi-Chip SoIC
- Historically, the process is optimized for the SoC to serve a broad audience; with chiplets, the process can be further optimized to achieve better PPA.
- SoC: generations of success, but SRAM and analog/IO face scaling challenges.
- Chiplet: compute die on node N for highest performance, analog/IO on N-1 or N-2 to optimize cost, MCM for low-cost interconnect.
- SoIC: logic SoIC to increase performance; CPU, GPU, or SRAM can stack.
- Future: optimizing logic with different technology nodes, logic stacking to increase performance, on-board memory to improve memory bandwidth, and an interposer for higher connection bandwidth.

Chiplet Optimization: PPAC and TTM
- Each chiplet can be independently optimized for adoption time, die size, and process node.
- Optimizing chiplets: compute dies need the latest process technology for best PPA; SRAM and analog/IO scaling are slowing down (FinFET vs. GAA); defect density improves over time with more learning; IP availability matters.

Chiplet Design and Process Optimization
- An SoC needs to balance a common process window for all devices and layout styles (standard cell, SRAM, analog/IO).
- The process window for a chiplet can be optimized thanks to specialization and focus.
- A chiplet with process simplification can reduce cost and improve yield (for example, an SRAM chiplet vs. the SoC common process window).

Chiplet Process Optimization by Application
- HPC vs. mobile: dynamic vs. static power. Client CPU vs. server CPU: peak vs. throughput performance. CPU vs. GPU: metal stack optimization.
- Operating points: client CPU overdrive; networking and server at moderate to low Vdd; mining at extremely low Vdd.
- N3X, high performance where it matters: transistors optimized for high-performance overdrive conditions, with selective use of N3X standard cells to speed up critical paths while minimizing the leakage impact at the product level.

Chiplet vs. Monolithic (relative die cost vs. SoC die size)
- By breaking up a large SoC, chiplets at smaller die sizes enjoy better yield and lower cost (lower D0, higher redundancy/harvest), while the monolithic SoC keeps a lower package cost.
- The chiplet vs. monolithic product choice needs to be carefully balanced; the crossover point depends on factors such as cost components, defect density, harvesting yield, etc.
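The yield argument above can be made concrete with a standard die-cost model. The sketch below is illustrative only (not TSMC data): it assumes a simple negative-binomial yield model and made-up defect density, die sizes, and wafer cost to show how splitting one large die into two chiplets can lower total silicon cost.

```python
def die_yield(area_mm2, d0_per_mm2, alpha=3.0):
    """Negative-binomial (clustered-defect) yield model."""
    return (1 + d0_per_mm2 * area_mm2 / alpha) ** (-alpha)

def cost_per_good_die(area_mm2, d0_per_mm2, wafer_cost=10000.0, wafer_area_mm2=70000.0):
    """Cost of one yielded die, ignoring edge loss, test, and packaging."""
    dies_per_wafer = wafer_area_mm2 / area_mm2
    return wafer_cost / (dies_per_wafer * die_yield(area_mm2, d0_per_mm2))

D0 = 0.001            # assumed defect density, defects per mm^2 (0.1 per cm^2)
mono_area = 600.0     # hypothetical monolithic SoC area, mm^2 (assumption)
chiplet_area = 320.0  # roughly half the logic plus D2D interface overhead, mm^2 (assumption)

mono = cost_per_good_die(mono_area, D0)
split = 2 * cost_per_good_die(chiplet_area, D0)   # two chiplets per product
print(f"monolithic {mono:.0f}  vs  2 chiplets {split:.0f}  (ratio {split / mono:.2f})")
```

With these assumed numbers the two-chiplet option comes out roughly 15% cheaper even after paying the partitioning overhead, because per-die yield rises sharply as area shrinks. Lowering D0 or shrinking the monolithic die moves the crossover point, which is exactly the "carefully balanced" choice the slide refers to.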

Advanced Packaging Technology

TSMC 3DFabric(TM) Technology Portfolio (Source: TSMC)
- 3D Si stacking, TSMC-SoIC (System on Integrated Chips): SoIC-P (bumped, CoW, 18-25 µm pitch) and SoIC-X (bumpless), with SoIC-X-C (CoW, 4.5-9 µm pitch) and SoIC-X-W (WoW, 3 µm pitch).
- Advanced packaging: CoWoS with a Si interposer (CoWoS-S) or RDL interposer (CoWoS-L/R), and InFO (InFO-PoP, InFO-2.5D, InFO-3D).
- PoP: package-on-package; RDL: redistribution layer.

3D Si Stacking Schemes (Source: TSMC)
- SoIC-P (bumped, CoW, 18-25 µm pitch): SoIC-P-RB (with RDL) and SoIC-P-F (without RDL).
- SoIC-X (bumpless): SoIC-X-C (chip on wafer, 4.5-9 µm pitch) and SoIC-X-W (wafer on wafer, 3 µm pitch).

Cross-Chip Interconnects Improve Throughput
- Bandwidth density per energy (Tbps/mm²/pJ/bit) vs. D2D routing length (mm): advanced packaging (2D/2.5D) at 40 µm and 25 µm bump pitch, and chip stacking (3D) at 9 µm and 6 µm bond pitch and below.

D2D Interconnect Comparison, 2.5D vs. 3D (Source: TSMC)
- CoWoS at 40 µm pad/µbump pitch: density 1.0X, max areal bandwidth density 1.0X (GB/s/mm²), interconnect energy 1.0X (pJ/bit).
- SoIC-X at 9 µm pitch: 20X density, 45X bandwidth density, roughly 0.10X energy.
- SoIC-X at 6 µm pitch: 45X density, 75X bandwidth density, roughly 0.05X energy.
- SoIC-X at 4.5 µm pitch: 80X density, 180X bandwidth density, roughly 0.05X energy.
- Structures shown in cross-section include face-to-back (F2B) and face-to-face (F2F) stacking.
- *The interconnect energy efficiency includes only the die-to-die interconnect at 4 Gb/s; it does not include the energy consumption of the physical-layer circuits.
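The density multipliers in the table follow directly from the pitch ratios, since area interconnect density scales roughly as 1/pitch². A quick check (illustrative arithmetic only, using the pitches listed above):

```python
base_pitch = 40.0  # µm, CoWoS µbump pitch from the table
for pitch in (9.0, 6.0, 4.5):          # SoIC-X bond pitches, µm
    density_gain = (base_pitch / pitch) ** 2
    print(f"{pitch:4.1f} µm pitch -> {density_gain:5.1f}x interconnect density vs. 40 µm")
# prints ~19.8x, ~44.4x, ~79.0x, matching the 20X / 45X / 80X entries in the table
```

The bandwidth-density column in the table grows faster still than the pad density, presumably because per-connection signaling conditions also improve at the shorter 3D reach.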

System Integration with TSMC 3DFabric(TM) (Source: M. Liu, "Unleashing the Future of Innovation," 2021 ISSCC)
- 2D: InFO (chip A + chip B). 2.5D: CoWoS (SoC/chiplet + HBMs). 3D: micro-bump stacking and SoIC.
- InFO: Integrated Fan-Out; SoIC: System on Integrated Chips.

Higher Integration, Compact Electronic Systems (Source: D. Yu, "Foundry Solutions for 2.5D/3D Integration," ISSCC 2021)
- Function per footprint and system performance increase moving from package-to-package on board (2D), to die-to-die on an interposer (2.5D), to die-on-die (3D) and combined 3D + 2.5D.

D2D Interface ESD Roadmap and Design Improvement
- For D2D ESD requirements, the industry is moving toward enhanced package process control, enabling switching-power reduction and higher D2D interface density.
- ESD capacitance (fF) and ESD area (µm²) shrink along the roadmap: package-IO ESD at CDM 250V (industry spec), µbump-IO ESD at CDM 50V/30V, and SoIC-bond IO ESD at CDM 10V/5V and below in the future.
- Ref: roadmap of CDM targets for D2D interfaces, JEP196.

Platform-Level Solutions: Specialty Technology Optimization

Package Is The New SoC (technology plus advanced packaging; some options already in use, others under development)
- 2.5D + 3D integration; specialty for interconnect speed: HBM + logic.
- Memory + logic: SRAM on logic for memory bandwidth, and DRAM on logic.
- Specialty for power delivery and die partitioning: super-high-density MiM, pad metal routing, deep-trench capacitors (DTC) in the interposer, and capacitors over active.
- 3D logic-on-logic integration; PIC + logic with an optical engine on the substrate; VR integration (logic + voltage regulator).
- Holistic system-level optimization.

Memory Bandwidth: A Limiter to System Throughput (Source: H.-S. P. Wong et al., DAC, 2020)
- From 2006 to 2020, normalized logic throughput grew roughly 1.81x every 2 years while normalized memory bandwidth grew only about 1.56x every 2 years.

Memory Bandwidth Improvement by 3D Stacking (Source: SK Hynix, TSMC)
- DRAM bandwidth rises from GDDR6X to HBM3E to 3D-stacked DRAM.

Switch Bandwidth Doubles Every 2 Years (Source: Broadcom press releases; "Co-Packaged Datacenter Optics: Opportunities and Challenges," Cyriel Minkenberg et al.; TSMC)
- 51.2 Tb/s switches (512 x 100G lanes, 5nm) arrived in 2022, with scaling continuing through faster lanes.

Increasing Power Contribution from SerDes (Source: "Co-Packaged Datacenter Optics: Opportunities and Challenges," Cyriel Minkenberg et al.)
- The relative power contribution of SerDes to total switch ASIC power keeps rising, motivating CPO (silicon photonics) switches.

Co-Packaged Optics (Source: TSMC)
- Transmission power: Cu interconnect over ~10 m at 1X, fiber over km reach at about 1/3X, and CPO over km reach at about 1/5X.
- CPO shortens the electrical path between the switch ASIC and the optical engine from roughly 100 mm of Cu to roughly 10 mm before the fiber.

Co-Packaged Optics, the Benefit of 3D (Source: Douglas Yu et al., 2021 IEDM invited paper)
- Side-by-side OE with MCM (µbump OE) vs. CPO with COUPE (Compact Universal Photonic Engine) on CoWoS-S: ASIC + EIC + PIC (+ HBM).
- OE-ASIC link length (mm): 5 vs. 1. Line width/space (µm): 22/44 vs. 0.4/0.4. Routing density: 1.0X vs. 80X. Bandwidth density: 1.0X vs. 37.6X. System energy consumption: 1.0X vs. 0.19X.

Gen 1 Data Center Power Architecture, Old Design (Source: "Next Generation of Power Supplies," Fred C. Lee, Virginia Tech, https://cpes.vt.edu/library/download/31672)
- 480V AC 3-phase distribution through transformer and cable.

Gen 2 Data Center Power Architecture, New Design (same source)

Power Delivery Network for HPC/Data Center
- Conventional: 208V AC from the grid, AC/DC conversion to 12V, then a 12V-to-1V multi-phase DC/DC stage feeding the CPU/GPU at 0.7-1V.
- New generation: 208V AC from the grid, AC/DC conversion to 48V, a 48V-to-12V/6V DC/DC module, a 12V/6V-to-1.8V multi-phase DC/DC stage, and finally a 1.8V-to-0.7V multi-phase IVR delivering 0.7-1V at the load.
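The motivation for 48V distribution with a final IVR stage is basic conduction-loss arithmetic: for the same delivered power, the distribution current, and with it the I²R loss in cables, connectors, and package planes, drops with the square of the voltage ratio. A rough, illustrative calculation (the load power and resistance are assumptions, not values from the talk):

```python
P_load = 1000.0      # W delivered to one accelerator board (assumption)
R_dist = 0.002       # ohms of distribution resistance, cables + connectors (assumption)

for v_bus in (12.0, 48.0):
    i_bus = P_load / v_bus                 # distribution current
    p_loss = i_bus ** 2 * R_dist           # conduction (I^2 * R) loss
    print(f"{v_bus:4.0f} V bus: {i_bus:6.1f} A, {p_loss:5.1f} W lost in distribution")
# 12 V -> ~83 A and ~13.9 W; 48 V -> ~21 A and ~0.9 W: a 16x loss reduction for the same R
```

The last-stage IVR then handles the final 1.8V-to-sub-1V conversion right next to (or under) the die, which is what the VR-chiplet options on the next slide provide.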

Last-Stage VR Chiplet: Modular VR Chiplet to the Main Processor, Going Vertical
- Power module: a PMIC chiplet in an LBGA power module next to the xPU (power IC + inductor); cost-effective and flexible.
- Integrated VR (IVR) chiplet inside CoWoS: reduces the input current at the ball.

Technology-Optimized Chiplets
- Chiplets in advanced technology: CPU, GPU, NPU, xPU, IOD, SRAM.
- Chiplets in specialty technology: memory, interconnect/SerDes, silicon photonics, capacitor chiplet, IVR chiplet, GaN chiplet.
- 3DIC system-in-package solution with advanced packaging: CoWoS or equivalent, InFO or equivalent, SoIC (CoW or WoW).

Design Enablement from 2D to 2.5D/3D

Increasing Design Complexity, 2D to 2.5D/3D (Source: Y.J. Mii, 2022 VLSI Symposium)
- A tightly coupled package/chip co-design flow is needed for faster design convergence, and it requires strong EDA/IP ecosystem collaboration.
- 2D: serial IO + ESD; the traditional 2D SoC/package design flow.
- 2.5D/3D (with HBM): parallel IO + ESD, die-to-die SI/PI and STA/DFT, PDN and decap co-design and sign-off, thermal-aware design, and known-good die/stack/package (KGD/KGS/KGP).

Integrated Chip/Package Design Co-Optimization (Source: Y.J. Mii, 2022 VLSI Symposium)
- Flow: system partitioning, then design and verification of each chip and of the package, then system integration verification.
- Modularization simplifies the design flow for all package types: APR, DRC/LVS/RCX, SI/PI/IR/EM, DFT, multi-die timing analysis, thermal, and the die-to-die interface.
- Hierarchical timing analysis mitigates the exponential growth of multi-die process corners; hierarchical thermal analysis balances runtime and accuracy.

The 3Dblox Standard (https://3dblox.org/newscenter)
- Modularized 3Dblox language constructs describe chiplets, interfaces, and connections; the constructs are designed to model all current and future 3DIC structures (die, RDL, and bridge interfaces).
- Physical constructs (Chiplet:) list the physical chiplets; connection constructs (Conn:) and path assertions (Path:) build the full-stack physical and connection representations.
- Goal: streamline the EDA design flow and promote interoperability.

One Format, Multiple Products
- One 3Dblox representation drives all downstream analysis (3D PDN, 3D thermal and thermal TF, 3D STA, 3D DRC/LVS) across EDA vendors.
- It replaces hundreds of lines of repetitive per-tool code, and support from all EDA vendors creates a unified design ecosystem; 3Dblox is open to all.
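To make the "one description, many tools" idea concrete, here is a small conceptual sketch in Python. It is not 3Dblox syntax (the actual constructs are defined by the standard at 3dblox.org); it only illustrates how a single physical-plus-connectivity description of a stack could be handed unchanged to several analysis back-ends.

```python
from dataclasses import dataclass, field

@dataclass
class Chiplet:                        # physical construct: one die in the stack
    name: str
    process: str
    size_mm: tuple                    # (width, height)

@dataclass
class Connection:                     # connection construct: an interface between two chiplets
    top: str
    bottom: str
    kind: str                         # e.g. "hybrid_bond", "ubump", "rdl"
    pitch_um: float

@dataclass
class Stack:                          # full-stack representation
    chiplets: list = field(default_factory=list)
    connections: list = field(default_factory=list)

def run_thermal(stack: Stack):        # stand-ins for separate EDA back-ends
    print("thermal model over", [c.name for c in stack.chiplets])

def run_sta(stack: Stack):
    print("multi-die timing over", [(c.top, c.bottom) for c in stack.connections])

design = Stack(
    chiplets=[Chiplet("soc_top", "N3", (10, 12)), Chiplet("base_die", "N6", (10, 12))],
    connections=[Connection("soc_top", "base_die", "hybrid_bond", pitch_um=9.0)],
)
for analysis in (run_thermal, run_sta):   # same description feeds multiple downstream tools
    analysis(design)
```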

System Integration Is the Future (Source: Y.J. Mii, 2022 VLSI Symposium)
- More transistors: 2D shrink, DTCO, 3D transistors.
- More memory: emerging memory, stacked SRAM.
- System-level integration ("we are here"): 2.5D InFO, 2.5D CoWoS, 3D SoIC, together with CV² energy reduction, thermal management, and end-to-end optimization.

Summary
- Advanced CMOS process and packaging technologies for logic and memory integration are already deployed today to serve growing datacenter and AI market demand.
- Emerging heterogeneous integration solutions, such as memory/logic chip stacking, silicon-photonics co-packaged optics (CPO), and integrated voltage regulators (IVRs), can enable a trillion-transistor 3DIC system-in-package by 2030.
- To achieve faster design convergence, the industry needs a tightly coupled package/chip co-design flow; 3Dblox is receiving growing support from EDA vendors.

ISSCC 2024 Forum 1.2: The Packaging and Interconnect Requirements of the IC Industry's Chiplet-based Future
Samuel Naffziger, AMD

Outline
- Fundamental drivers of modular chiplet designs
- Chiplet interconnect classes and metrics
- Examples from the AMD chiplet product portfolio: organic-package-based chiplets, advanced-packaging chiplet architecture, 3D-stacked chiplets
- Key levers for the future

Key Challenge: Economics of Si Scaling
- SoC scaling is flattening while cost per yielded mm² keeps increasing (figure: normalized cost per yielded mm² across the 45nm, 32nm, 28nm, 20nm, 14/16nm, 7nm, and 5nm nodes, 2018-2034).

Modular Chiplet Architectures and DSA Are Essential
- The slowing of Moore's Law scaling trends and the benefits of domain-specific compute make chiplets essential: domain-specific accelerators gain efficiency over general-purpose CPUs/GPUs across the application space.

BASICS OF CHIPLET ECONOMICS

Monolithic Die Manufacturing
- On a wafer with large monolithic dies, each defect kills an entire processor, leaving relatively few yielded processors.

High-Level Chiplets Concept
- With the same wafer and defect map but smaller chiplet dies, each defect discards less silicon, so more processors are yielded.

Chiplet Cost
- Silicon cost is non-linear with die area: two dies of area X/2 cost less per good die than one die of area X.

Chiplet Overheads
- Inter-chiplet communication interfaces and per-die functionality add area to each chiplet.
- Architectural design effort and partitioning are additional overheads.

Chiplet Interconnect
- Advanced packaging interconnects: short reach (~2 mm), require closely matched placements, use wide, lower-speed links, and achieve low energy (~0.6 pJ/bit).
- Organic package interconnects: enable chiplet spacings up to 25 mm and the most flexible placements and die sizes, using narrow high-speed serial links at higher link energy.
- The architectural need for bandwidth, the die-partition options, and the package technology create a multi-disciplinary optimization: chiplet package architecture selection requires balancing a complex equation.

Improving Key Parameters: Driving High-Performance Computing Forward
- Linear interconnect density (wires/mm/layer) and area interconnect density (wires/mm²) both increase from 2D MCM to 2.5D Si interposer/EFB to 3D chiplets, toward the highest performance, lowest power, and smallest area.

ORGANIC PACKAGE-BASED CHIPLETS

EPYC Server CPU Example
- Great cost benefit vs. monolithic: normalized die cost of the 7nm CCD* + 12nm IOD* design vs. a hypothetical monolithic 7nm die across 16, 24, 32, 48, and 64 cores.
- Two tape-outs cover the full product stack, cost is linear with core count, and the approach makes 64 cores possible with full memory and IO.
- *CCD: CPU Complex Die; IOD: I/O Die.
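Why the chiplet cost curve stays roughly linear in core count while a monolithic design does not can be shown with the same kind of yield arithmetic used earlier. The numbers below are invented for illustration (they are not AMD cost data): a fixed-size CCD is simply replicated per 8 cores, while the hypothetical monolithic die grows with core count and its yield falls.

```python
def nb_yield(area_mm2, d0=0.001, alpha=3.0):
    """Negative-binomial yield model; d0 in defects per mm^2 (assumed)."""
    return (1 + d0 * area_mm2 / alpha) ** (-alpha)

def silicon_cost(area_mm2, cost_per_mm2=1.0):
    """Relative cost of one *good* die of the given area."""
    return area_mm2 * cost_per_mm2 / nb_yield(area_mm2)

CCD_AREA, IOD_AREA, CORES_PER_CCD = 75.0, 120.0, 8   # illustrative sizes only

for cores in (16, 32, 64):
    ccds = cores // CORES_PER_CCD
    chiplet = ccds * silicon_cost(CCD_AREA) + silicon_cost(IOD_AREA)
    mono = silicon_cost(IOD_AREA + ccds * CCD_AREA)   # one big die with everything
    print(f"{cores:2d} cores: chiplet {chiplet:6.1f}   monolithic {mono:6.1f}")
```

The chiplet line grows almost linearly (one extra CCD per 8 cores), while the monolithic curve bends upward as yield drops, which is the gap the slide's bar chart illustrates.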

Leveraging Technology Across Markets
- Direct IOD IP leverage and CCD reuse: the same CCDs pair with a server IOD (8 CCDs with DDR and I/O) or a client IOD (up to a 16-core desktop with 2 CCDs).

Chiplet Benefits for AMD Ryzen Processors
- Normalized die cost of the chiplet design (CCD + cIOD) vs. a hypothetical monolithic 7nm die at 8 and 16 cores; cIOD = client I/O die.

Chiplet Modularity
- Chiplets are reused across generations: the same server IOD supports both 2nd- and 3rd-generation CCDs, and the 3rd-generation family spans the server IOD, the cIOD, and the X570 chipset.

ADVANCED PACKAGING CHIPLETS

Chiplet Technology Applied to GPUs
- The traditionally monolithic "Navi21" GPU vs. the EPYC CPU server: a CPU needs 100s of signals between chiplets, while a GPU needs 10s of 1000s.
- Chiplets let CPUs use advanced nodes where they benefit performance and mature nodes for IO and interfaces, and high-speed organic package links meet CPU bandwidth requirements.
- GPU shader engines require massive amounts of connectivity compared to CPUs, so a different approach is required.

Chiplet Technology: A Better Way to Partition ("Navi21" vs. "Navi31")
- The graphics engine is what benefits from advanced N5 technology; the AMD Infinity Cache is critical to performance but barely shrinks in N5, and GDDR6 interfaces are also large and won't shrink at all.
- Split those poorly scaling components off as chiplets and shrink the GFx core into N5: full N5 performance, better yield for perf/$, and configurability.
- MCD = Memory Cache Die; GCD = Graphics Compute Die.

How to Connect the Chiplets?
- GCD-MCD partitioning is great, but the bandwidth requirements are still extremely high: over 10X what a CPU CCD requires in EPYC.
- Breakthrough advanced packaging and a new interface are required: High Performance Fanout and Ultra Short Reach (USR) links.

USR GCD-MCD Connectivity: Bandwidth Density (SEE ENDNOTE RX-817)
- USR links, operating at 9.2 Gb/s over High Performance Fanout (die-to-die fan-out routing), provide almost 10X the bandwidth density of the IFOP links used in Ryzen and EPYC, enabling industry-leading peak bandwidth of 5.3 TB/s.

Interconnect
- Roughly 25 wires fit on the organic substrate in the space that holds 50 wires on High Performance Fanout (images approximately to scale).

USR Link Power Efficiency
- USR links are engineered for low-voltage operation and aggressive clock gating for low power, saving up to 80% energy per bit relative to organic package links.
- Result: 3.5 TB/s of effective bandwidth for less than 5% of GPU power consumption.
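As a sanity check on the "less than 5% of GPU power" claim, link power is just bandwidth times energy per bit. The values below are assumptions for illustration (the talk gives only the ~0.6 pJ/bit figure for advanced-packaging links in general and the 80% savings claim), so treat the result as an order-of-magnitude estimate, not an AMD specification.

```python
bandwidth_tbps = 3.5 * 8          # 3.5 TB/s effective bandwidth, in Tb/s
energy_pj_per_bit = 0.6           # assumed USR-class link energy (pJ/bit)
board_power_w = 350.0             # assumed GPU board power budget (W)

link_power_w = bandwidth_tbps * 1e12 * energy_pj_per_bit * 1e-12
print(f"~{link_power_w:.0f} W of link power")                       # ~17 W
print(f"~{100 * link_power_w / board_power_w:.1f} % of board power") # ~4.8 %
```

At ~0.6 pJ/bit the 3.5 TB/s of traffic costs on the order of 17 W, a few percent of a ~350 W board budget; at organic-package link energies (roughly 5x higher per the 80% savings figure) the same traffic would no longer be negligible.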

USR Link Latency
- The USR chiplet interfaces cost a modest amount of latency vs. on-die wiring; higher clock rates eliminate it: the base Infinity Fabric clock is up 43% and the GFx game clock up 18%.
- The common case of an Infinity Cache hit is 10% lower latency on "Navi31" (memory-latency comparison of Navi21, Navi31 at Navi21 frequency, and Navi31 for DRAM access and Infinity Cache).

GPU Chiplets: Summary
- Chiplet architecture with advanced packaging is the future; AMD leveraged its leadership chiplet expertise to deliver the first chiplet-based gaming GPU.
- Massive 5.3 TB/s bandwidth with innovative USR links on High Performance Fanout; negligible overheads in latency and power enable leadership performance/Watt: up to 54% better performance per Watt for "NAVI 31" with the chiplet architecture (SEE ENDNOTE RX-817).

3D STACKED CHIPLETS

AMD CDNA 3: Next-Gen AI Accelerator Architecture
- Dedicated accelerator engines for AI and HPC, 3.5D packaging with the 4th-Gen AMD Infinity architecture, optimized for performance and power efficiency.

Bandwidth Drives AI Performance
- Server and gaming require high bandwidth, but leading-edge AI is at another level; power-efficient delivery of these bandwidth requirements demands new approaches.
- Bandwidth and power requirements (AMD internal analysis): gaming memory BW, gaming cache read BW, AI memory BW, and AI cache read BW range from 665 GB/s to several TB/s (2.1-3 TB/s), with the power cost depending on whether the link is off-package, UCIe-SP, UCIe-AP, or 3D (2.5D vs. 3D interconnect).

3.5D Packaging Motivation
- The key to power-efficient performance is tight integration: relative bits per Joule improves by up to roughly 60X moving from off-package copper and off-package optical links to on-package, advanced-packaging, and 3D-stacked links.
- Advanced 3D hybrid bonding provides, by orders of magnitude, the densest and most power-efficient chiplet interconnect; advanced 2.5D enables more compute and HBM in a package; together they increase system-level efficiency.

3D Hybrid Bonding Evolved
- AMD 3D V-Cache technology: hybrid bonding size 7 x 10 mm; logic die as the base, with the N7 cache die (X3D) on an N5 base (CCD) die; significant performance gains for desktop gaming and servers; up to 2.5 TB/s of vertical bandwidth.
- AMD Instinct MI300 accelerator: leverages the integration and manufacturing learnings from V-Cache; hybrid bonding size 13 x 29 mm (0.45x reticle); logic die on top enables improved thermals; N5 XCD/CCD stacked on an N6 base die (IOD); same 9 µm TSV pitch; up to 17 TB/s of vertical bandwidth.

MI300 Advanced Packaging (illustration purposes only)
- Advanced 3D hybrid-bonded architecture for compute density and perf/W: XCDs or CCDs plus dummy (DMY) and carrier silicon stacked on each IOD, connected through BPVs.
- Advanced 2.5D architecture for IOD-IOD and HBM3 integration on a silicon interposer, with LGA pads, lid, and BSM + TIM: a large module on the substrate.

Chiplet Reuse and Modularity Benefits Exemplified: Same CCD for Genoa + MI300A
- The same CCD was adapted to work for 4th-Gen EPYC CPUs and the AMD Instinct MI300A 3D stack.
- The EPYC MCM uses the "GMI" SerDes interface through the package substrate; the MI300A vertical stack uses a dense TSV interface from the IOD to the CCD in two-link-wide mode.
- The dramatically higher 3D signal density enabled virtually no die-size increase, with simple interface multiplexing.

AMD Instinct MI300 Accelerator: Modular Construction
- The multi-variant (APU/XPU) architecture requires all chiplets to act as if they are LEGO blocks (IOD, IOD-Mirror, and their R180 rotations; XCD and CCD plus their R180 rotations).
- Many new construction and analysis tools had to be developed to enable this capability; mirrored versions of the IODs enable symmetric construction.

Connecting Chiplets in 3.5D: Mirrored Heterogeneous Chiplet Interfaces
- BPV: Bond Pad Via, the landing site on the stacked die that is aligned with a TSV in the IOD (XCD and CCD BPV fields; TSV fields highlighted).
- The IOD supports two separate landing sites for CCD BPVs to enable IOD mirroring, while CCDs can only be rotated (not mirrored); similarly, the XCD/IOD interface also has extra TSVs to support IOD mirroring.

AMD Instinct MI300 Accelerator: Floorplan Power TSVs
- BPVs, TSVs, and µbumps connect the hybrid-bonded stacked chiplet (SC) to the base chiplet (AID).
- Power delivery to the top die must support IOD mirroring, XCD/CCD rotation (0 and 180 degrees), and different stacked die (CCD and XCD); this placed new symmetry requirements on the power grid and required significant advance planning to ensure exact alignment of all power and ground TSVs + BPVs.

AMD Instinct MI300 Accelerator: Power Management and Heat Extraction
- The key to MI300 power efficiency is the ability to dynamically "slosh" power between the fabric (IOD), GPU (XCD), and CPU (CCD).
- Massive HBM and Infinity Cache bandwidth can drive high data-movement power in the SoC domain, and the compute capability can similarly consume high power; this creates two types of extreme operating conditions, GPU-intensive and memory-intensive (alongside CPU-intensive and CPU+GPU-balanced power sharing across the CPU CCDs, GPU XCDs, AID, and HBM).
- Both thermal and power delivery must support the full range, requiring careful engineering of the TSVs and the power map.

Summary
- Technology trends are pushing the industry to heterogeneous, domain-specific computing, and economics require small die and chiplet architectures.
- Advanced packaging is the frontier for technology-architecture synergy, enabling cost-effective, efficient designs.
- Standardized interfaces and a chiplet ecosystem will unlock the potential: The Road Ahead is Paved with Chiplets.

Endnotes
- RX-817: Based on AMD internal analysis, November 2022, comparing the published chiplet interconnect speeds of Radeon RX 7900 Series GPUs to the Intel Ponte Vecchio GPU and Apple M1 Ultra.

Disclaimer
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED "AS IS" WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
AMD, the AMD Arrow logo, CDNA and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. 2024 Advanced Micro Devices, Inc. All rights reserved.

ISSCC 2024 Forum F1.3: Do Chiplets Open the Space for Emerging Memory in the HPC System?
Sebastien Couet, Gouri Sankar Kar, imec, Leuven, Belgium

Outline
- Introduction: compute needs and bottlenecks, the chiplet approach, interconnect pitch scaling and its advantages, and the chiplet revolution in High Performance Computing (HPC).
- 3D system integration technology: the opportunity for emerging memory, a comparative analysis of 3D interconnect requirements, and 3D interconnect technology in production plus the imec roadmap.
- Memory roadmaps and alternatives: magnetic memory, BEOL-compatible capacitorless (2T0C) e-DRAM, the DRAM roadmap and possibilities, CXL memory, ferroelectrics, and Ovonic Threshold Switch (OTS) memory.
- Summary.

Compute Needs for Machine Learning Continue to Grow

Diversity of Applications and Workloads
- AR/VR: low power, ultra-low latency, high memory bandwidth, small form factor.
- Autonomous driving: multi-sensor fusion, distributed real-time computation, reliable and explainable AI.
- GPUs for training: high-throughput parallel compute, very high memory bandwidth, very high GPU-GPU bandwidth.

Compute Capability Is Improving Faster than Memory/Interconnect Bandwidth (Source: Amir Gholami et al., "AI and Memory Wall")
- CPU/GPU peak performance has scaled about 3.1x every 2 years, while interconnect bandwidth has scaled only about 1.4x every 2 years.
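Compounding those two growth rates shows how quickly the gap opens; the short calculation below simply extrapolates the slide's 3.1x and 1.4x per-2-year figures over a decade (an illustration of the trend, not a measurement).

```python
compute_rate, bw_rate = 3.1, 1.4     # growth factors per 2 years, from the slide
years = 10
periods = years / 2

compute_growth = compute_rate ** periods
bw_growth = bw_rate ** periods
print(f"over {years} years: compute x{compute_growth:.0f}, "
      f"bandwidth x{bw_growth:.1f}, gap x{compute_growth / bw_growth:.0f}")
# over 10 years: compute x286, bandwidth x5.4, gap x53
```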

2D-SoC Chiplet Approach
- A 2D SoC is partitioned into chiplets: memory, core, and communication blocks connected through chiplet PHY blocks.

2D-SoC to 3D-SoC
- From the 2D SoC (memory, core, communication) to a 3D SoC with multi-tier memory and logic/SRAM stacking.

Interconnect Pitch Scaling and Its Advantages

Chiplet Revolution in HPC
- A new way of integrating and delivering High Performance Compute, and new possibilities for many technologies, for example new memory technologies, 3D integration, optical interconnect, etc.
- The idea behind chiplets is to break the system-on-chip apart into its composite functional blocks, or parts.

3D System Integration Technology
- Driven by the "memory wall": memory-logic partitioning needs high-bandwidth, low-energy 3D interconnect, and different technologies need different 3D integration densities (area-array and lateral interconnect).

Opportunity for Embedded Emerging Memory
- Applications: gaming, for example rapidly changing background textures and shapes; modern graphics use ray tracing and image enhancement.
- Architecture: 64 MB of cache in the Ryzen 9 7950X vs. 128 MB in the Ryzen 9 7950X3D; cores under the V-Cache run at slightly lower clock speeds, trading off the higher capacity.
- Memory options: magnetic STT/SOT is a good option for infrequently accessed memory (e.g., storing weights); IGZO 2T0C eDRAM is a good option for frequently accessed memory (activation memory); 1T1C ferroelectric-capacitor memory depends on endurance for embedded applications.
- How will V-Cache-style stacking enable emerging embedded memory? 1) Logic compatibility. 2) Cost: no need for expensive logic under the memory, and expensive metal levels can be avoided. 3) It removes logic-critical thermal budget and process limitations.

Comparative Analysis: 3D Interconnect Requirements (design technology metrics)
- Mobile: optimal memory tier for partitioning is L2 + LLC/SLC; roughly 15-20K 3D interconnects required; 3D pitches of about 15-10 µm; fan-out impact is minimal due to the 2D length replaced (mm).
- Server: optimal memory tier is L2 + LLC; roughly 40-50K 3D interconnects required; 3D pitches of about 12-8 µm; fan-out impact is minimal due to the 2D length replaced (mm).
- Graphics/gaming: optimal memory tier is L1 (data) + SMEM + L2; roughly 750K-800K 3D interconnects required; 3D pitches of about 4-2 µm; fan-out impact is minimal due to the 3D pitch.
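The pitch requirements follow from simple area arithmetic: the pins must fit under the sub-system being partitioned, and an area-array of N connections at pitch p occupies roughly N x p² of silicon. The illustrative check below uses the pin counts from the table and assumed pitches to show why the graphics/gaming case forces 4-2 µm pitches.

```python
def array_area_mm2(pins, pitch_um):
    """Approximate silicon area of an area-array of `pins` at the given pitch."""
    return pins * (pitch_um * 1e-3) ** 2   # each pin occupies ~pitch^2

cases = {"mobile (20K pins)": (20_000, 10), "server (50K pins)": (50_000, 8),
         "graphics (800K pins)": (800_000, 4), "graphics @ 10 um": (800_000, 10)}
for name, (pins, pitch) in cases.items():
    print(f"{name:22s} at {pitch:2d} um pitch -> ~{array_area_mm2(pins, pitch):6.1f} mm^2")
# 800K pins: ~12.8 mm^2 at 4 um, but ~80 mm^2 at 10 um, which no longer fits
# under the partitioned L1/SMEM/L2 region of a compute die
```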

162、or Semiconductor Technology”,Proc.IEEE,vol.108,No.4,2020.ISSCC 2024-Forum F1:Do Chiplets Open the Space for Emerging Memory in the HPC System?Memory Road Maps and Alternatives 2024 IEEE International Solid-State Circuits ConferenceMemory and storage roadmaps15 of 441.00E-061.00E-051.00E-041.00E-031.

163、00E-021.00E-012022202420262028203020322034Effective bitcell area um2NanosheetsClassical 2D scaling3D NANDSRAMDRAMAlternative 2D or 3D DRAMCFET2 to 4 Tiers,toward 1000 layersSeparate peri wafer,multiple array wafersISSCC 2024-Forum F1:Do Chiplets Open the Space for Emerging Memory in the HPC System?M

Magnetic Solutions for LLC Applications

Magnetic Memory Options (for high-density on/off-chip e-NVM)
- Spin Transfer Torque (STT): MTJ optimized for current-based operation, low R; the MgO barrier can be tuned for reliability. Performance: OK for last-level cache. Cost/area: 50% gain over HD SRAM. Power: big gain at the system level from non-volatility. Reliability: difficult for high-endurance specs (failing tail bits). Maturity: ready.
- Spin Orbit Torque (SOT): read and write paths separated. Performance: OK for all cache levels, GHz capability. Cost/area: like HD SRAM. Power: marginal gain over HP SRAM. Reliability: good, read and write paths separated. Maturity: research.
- Voltage Control Magnetic Anisotropy (VCMA): MTJ optimized for voltage-based operation, higher R. Performance: to be defined. Cost/area: 50% gain over HD SRAM. Power: voltage controlled, lowest power. Reliability: probably OK (low-current device). Maturity: exploratory.
- Voltage Gated Spin Orbit Torque (VGSOT): single transistor in the write path, cell selectivity through the voltage gate. Performance: OK for all cache levels, GHz capability. Cost/area: 60% gain over HD SRAM. Power: big gain over HP/HD SRAM. Reliability: good, read and write paths separated. Maturity: research.

STT-MRAM: Current Status
- Embedded flash replacement in MCUs (the eFlash scaling wall): in production, N28 to N14, eFlash-like spec.
- Large non-volatile embedded cache for edge AI: N14 to N7/5, high cache-like spec, high endurance (10^12), 5-10 ns latency; in R&D/on the roadmap; next: automotive.
- Large-capacity memory cache, possibly as a chiplet: what could the benefit be?

STT-MRAM: Potential Value Proposition as a Chiplet (S. Sakhare, imec, IEDM 2018)
- Typically a 3X density gain at the bitcell level vs. SRAM, independent of node, reduced to about a 2X gain due to the larger control periphery.
- The write-power crossover with SRAM is around 5 MB, and there is a significant improvement in read latency due to the word-line length reduction.

What Comes Next After STT? SOT-MRAM
- Key features: a 3-terminal device, written by spin current and read by TMR readout; separated read and write paths give better endurance (10^15); sub-nanosecond switching enables high-speed NVM cache-like memory; BEOL compatible. [K. Garello et al., VLSI 2018; S. Couet et al., VLSI 2021; M. Gupta et al., IEDM 2020; K. Garello et al., VLSI Circuits 2019]
- Challenges: density (two transistors per bit, comparable to the SRAM cell size), high switching current, and field-free switching (magnetic hard mask).

VG-SOT Concept
- A voltage-gated SOT device: a multi-pillar structure with individual pillar selection moves from 2T1R to (n+2)TnR; fewer transistors make a smaller cell size possible (multi-pillar schematic and integration).

Selectivity Demonstration: WER (K. Cai et al., 2022 VLSI)
- Statistical measurements of the individual and joint switching probabilities (Psw) of two bits show a working window at low switching current.

Density Scaling: SOT Footprint (MRAMs on the imec SRAM scaling roadmap)
- At the A14 node (CPP = 42 nm, MP = 18 nm), an SOT-1S1MTJ cell has a footprint about 50% smaller than 6-track SRAM, on par with A5 SRAM.
- SOT-1S1MTJ reaches the same bit density as VGSOT-2MTJ (4 tracks per bit) without invoking VCMA.

What Comes After STT? Present Status of the Best-in-Class SOT Device (Kaiming Cai, imec, IEDM 2022)
- Many challenges still need to be addressed: cell size, energy efficiency, field-free switching, etc.

BEOL-Compatible e-DRAM

IGZO-Based DRAM Enables Long Retention with Low Storage Capacitance (Belmonte et al., IEDM 2020)
- Retention loss in DRAM cells is mainly driven by the IOFF of the access transistor; thanks to the extremely low IOFF of IGZO TFTs, retention above 10 s is possible with a storage capacitance CS of about 1 aF.
- CS can be as low as the Cox of scaled transistors, so a capacitorless 2T0C configuration (write transistor Wtr on WWL/WBL, storage node SN, read transistor Rtr on RWL/RBL with Cox,Rtr) is proposed instead of 2T1C.
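The retention claim is consistent with a first-order charge-leakage estimate, t_ret ≈ CS x ΔV / IOFF. The sense margin ΔV and the effective off-current below are assumptions for illustration (a later slide quotes IOFF below roughly 3e-21 A/µm), not measured values.

```python
C_s = 1e-18       # storage capacitance, 1 aF (from the slide)
dV = 0.1          # tolerable storage-node droop before a read fails, V (assumption)
I_off = 1e-20     # effective off-current of the write transistor, A (assumption)

t_ret = C_s * dV / I_off
print(f"retention ~ {t_ret:.0f} s")   # ~10 s, the order of magnitude quoted on the slide
```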

2T0C Configuration: Layout and TEM
- Wtr = write transistor, Rtr = read transistor; the IGZO channel sits in an Al2O3/SiO2/IGZO/Al2O3 stack above the Si substrate, with WBL, storage node (SN), RBL, WWL, and RWL wiring.
- Key features: a capacitorless 2T cell (gain cell), a complete BEOL solution, and long retention for low refresh power.
- Challenges: a disruptive technology, the hydrogen sensitivity of IGZO, and reliability.

Electrical Characteristics of RIE-Patterned Devices (A. Belmonte et al., VLSI 2023)
- LG = 25 nm on a full wafer of 138 dies with 100% yield; ION around 10 µA/µm with good uniformity across the wafer, for both single transistors and 2T0C cells.
- Retention above 4.5 hours is achieved, corresponding to IOFF below about 3e-21 A/µm.

2T0C Endurance and Operation
- More than 10^11 write cycles demonstrated, with simplified integration and scaling, low operating current (about 15 µA), low power, and fast operation.

Ferroelectric Memory
- FeRAM is a more than 20-year-old technology that is still in production; with the PZT-based material system, scaling was the biggest challenge. The recent breakthrough in hafnia-based ferroelectric materials made thickness scaling possible; the operating field is in the MV/cm range.
- Challenges: destructive read, cycling (1E12), 2Pr above roughly 40 µC/cm² in a 3D capacitor, and write voltage around 1 V. [1 Tahara et al., VLSI 2021; 2 Kozodaev et al., JAP 2019; 3 Kim et al., Adv. Electron. Mater. 2021; 4 Fu et al., IEDM 2022; M. Popovici, imec, IEDM 2022]

185、national Solid-State Circuits ConferenceRecent breakthrough in high density FeRAM42 of 44Micron,IEDM 2023 Excellent stackable solution with poly-silicon channel select transistor and cylindrical ferro capacitor.But it does not help to increase the density significantly over DRAM and not an efficient

186、 bit-cost scaling solution.Required real 3D solutions.ISSCC 2024-Forum F1:Do Chiplets Open the Space for Emerging Memory in the HPC System?Summary 2024 IEEE International Solid-State Circuits Conference2.5D,3D,chiplet opening the door for emerging memory solutionsEmerging memory and their challenges

Summary (44 of 44)

  Application | Topic          | Challenges
  LLC         | STT-MRAM       | Switching current, footprint, tail-bit, and cost
  LLC         | SOT-MRAM       | SOT track (Isw ~100 µA), field-free switching, high-density bit-cell
  LLC         | BEOL eDRAM     | IGZO reliability
  LLC         | FeRAM          | Cycling (1e14 cyc)
  Memory      | 3D DRAM        | 4F2 vs. 3D; which channel? Epi vs. oxide semiconductor
  Memory      | 3D FeRAM       | Which cell architecture? 1T1C vs. 1TnC; 2Pr 50 µC/cm², 1e12 cyc
  Memory      | OTS memory     | Material research to deliver performance and reliability; ALD OTS for 3D memory
  Memory      | 1S1MTJ (MRAM)  | Finding a selector and high-density patterning
  Memory      | 3D FeFET       | Many competitors for the same application
  Memory      | ML FeRAM (NDR) | Window 20, 1TnC, NDR
  Storage     | 3D NAND        | Gate stack with ferro layer to reduce Vprog by 2 V; airgap, nitride cut; trench cell with window 10 V

ISSCC 2024-Forum F1: Do Chiplets Open the Space for Emerging Memory in the HPC System? (44 of 44)

2024 IEEE International Solid-State Circuits Conference
In-Memory Computing Chiplets for Future AI Accelerators
Echere Iroaga
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators (1)

Outline (2)
- AI Deployment Trends
- In-memory Computing (IMC) Basics
- IMC Macro Approaches
- Architectural considerations
- Conclusions
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Trend 1: AI Model Size (3)
- Figure: compute (Giga-Operations) and model size (Millions of Parameters) for AlexNet, ResNet-50/101/152, Transformer, GPT-1, BERT-Large, Megatron, GPT-2, and GPT-3 over 2010-2022; both grow by roughly 10,000X.
- Model sizes are large and increasing, driving the number of operations required.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Trend 2: AI Platform Diversity (4)
- AI is needed across a wide range of form factors / compute capabilities:
  - Smartphones / mobile devices (MobileNet, ResNet, Gemini Nano, Llama 2B, ...)
  - Client devices / laptops (MobileNet, ResNet, Llama 13B, Stable Diffusion, ViT, ...)
  - On-prem servers (Llama 70B)
  - Hyperscale / cloud datacenters (GPT-3.5/4)
- Chiplets enable scalability across form factors and model sizes.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Learning from Today's Multi-Chip LLM Execution
- Large models (and inference artifacts) don't fit into GPU memory, driving the need for multi-GPU inference solutions.
- GPT-3.5/4 inference needs ~18 H-100 (SXM5) GPUs purely from a memory perspective for each group of inferences (batch).
- Compared approaches differ in where the energy goes: energy in the processing engine, data access (a bit at a time vs. accessing a compute result over many bits), and data movement (eliminated in the in-memory case, where analog processing happens inside the memory).
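The "~18 H-100s purely from a memory perspective" point can be sanity-checked with a back-of-the-envelope capacity calculation. The sketch below assumes a parameter count, bytes per parameter, and a KV-cache allowance that are illustrative only (the talk does not give them); the 80 GB of HBM per H-100 SXM5 is the public figure.

```python
# Back-of-the-envelope: how many GPUs does a large LLM need just to *hold* the model?
# Assumed, illustrative numbers -- not from the talk: parameter count, precision,
# and KV-cache overhead. H-100 SXM5 HBM capacity of 80 GB is public.

HBM_PER_GPU_GB = 80

def gpus_for_capacity(params_billion, bytes_per_param=2, kv_cache_frac=0.3):
    """GPUs needed purely for memory capacity (weights + KV cache and activations)."""
    weights_gb = params_billion * bytes_per_param          # 1e9 params * bytes -> GB
    total_gb = weights_gb * (1.0 + kv_cache_frac)          # crude inference-artifact allowance
    return total_gb / HBM_PER_GPU_GB

# e.g. a ~500B-parameter model in FP16 with a 30% KV-cache allowance:
print(round(gpus_for_capacity(500)))    # ~16 GPUs, the same order as the ~18 quoted
```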

ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Compute / data-movement energies (10)
- Multiply energy (45 nm technology): MULT(INT4) ~0.1 pJ, MULT(INT8) ~0.3 pJ, MULT(INT32) ~3 pJ, MULT(FP32) ~5 pJ.
- Figure: energy per access of a 64 b word (pJ) versus memory size, up to 1 MB.
- Data-movement costs are significant relative to compute costs!
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators
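Those per-operation figures make the imbalance easy to quantify: fetching two 8-bit operands from a large on-chip SRAM (let alone DRAM) can cost tens to hundreds of times more than the INT8 multiply itself. The comparison below uses the multiply energies quoted above; the memory-access energies are typical published 45 nm figures and are assumptions, not numbers from this talk.

```python
# Energy budget per INT8 operand fetch vs. the multiply itself.
# Multiply energies are the ones quoted on the slide; the per-access memory
# energies below are assumed, typical 45 nm figures (not from the talk).

mult_pj = {"INT4": 0.1, "INT8": 0.3, "INT32": 3.0, "FP32": 5.0}

access_pj_per_64b = {          # assumed illustrative values
    "8 KB SRAM": 10.0,
    "1 MB SRAM": 100.0,
    "off-chip DRAM": 1300.0,
}

for mem, e_word in access_pj_per_64b.items():
    e_operand = e_word / 8                    # one INT8 operand out of a 64 b word
    ratio = e_operand / mult_pj["INT8"]
    print(f"{mem:>13}: {e_operand:6.1f} pJ per INT8 operand "
          f"= {ratio:5.0f}x the INT8 multiply")
```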

Today's Digital Accelerators for Maximal Re-use (11)
- Data reuse is critical to addressing data-movement costs; this motivates spatial architectures, and in-memory computing is well suited for this.
- Matrix-vector multiply (MVM, y = W·x) dominates; a spatial architecture achieves a data reuse of roughly 60-200 with a 32 kB buffer, moving workloads from memory-bound toward compute-bound as the amount of data re-use grows.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

10X higher efficiency makes memory the bottleneck (12)
- D. Bankman, ISSCC'18: it is insufficient to address compute costs without addressing data-movement costs (compute engine vs. memory, for many neural networks, e.g. conv. nets).
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

In-memory computing (IMC) (13)
- A bit-cell array plus ADCs performs the MVM (y = W·x) in place; compare a systolic array with IMC.
- IMC maximizes 2D reuse via dense processing engines (the bit cells) within a spatial architecture.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Fundamental IMC trade-off: SNR (14)
- Traditional: the memory (a D^1/2 x D^1/2 array) is read out and the computation is done separately. IMC: memory and computation share the same D^1/2 x D^1/2 array.
- Consider accessing D bits of data associated with a computation, from an array with D^1/2 columns and D^1/2 rows:

  Metric    | Traditional | In-memory
  Bandwidth | 1/D^1/2     | 1
  Latency   | D           | 1
  Energy    | D^3/2       | D
  SNR       | 1           | 1/D^1/2

- IMC benefits communication and computation energy/delay at the cost of SNR; the trade-off is controlled by row parallelism.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators
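The table's scalings are easy to evaluate numerically. The sketch below just computes the four metrics for a few array sizes to show how quickly the latency/energy advantage grows while SNR degrades only as the square root; it is an illustration of the scaling laws on the slide, not a circuit model.

```python
# Relative scaling of the traditional vs. in-memory access of D bits
# from a sqrt(D) x sqrt(D) array, following the trade-off table above.
import math

def metrics(D):
    s = math.sqrt(D)
    traditional = {"bandwidth": 1 / s, "latency": D,   "energy": D * s, "snr": 1.0}
    in_memory   = {"bandwidth": 1.0,   "latency": 1.0, "energy": D,     "snr": 1 / s}
    return traditional, in_memory

for D in (64, 1024, 16384):
    t, m = metrics(D)
    print(f"D={D:6d}: energy ratio {t['energy']/m['energy']:6.1f}x, "
          f"latency ratio {t['latency']/m['latency']:7.0f}x, "
          f"IMC SNR factor {m['snr']:.3f}")
```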

What about digital IMC? (15)
- Reduction by an in-memory digital-logic adder tree (Y.-D. Chih, ISSCC'21): bit-cell area 0.379 µm², macro area (64 kb) 202,000 µm² -- the bit-cell array is just ~11% of the macro area!

  Tech. node | Architecture   | 4-b TOPS/W | 4-b TOPS/mm² | Adv.
  22 nm      | Digital accel. | 20         | 12           | 1.2X-2.5X
  22 nm      | Digital IMC    | 150        | 15           |
  5 nm       | Digital accel. | 40         | 80           | 1.9X
  5 nm       | Digital IMC    | 275        | 150          |
  [1] Y.-D. Chih, ISSCC'21; [2] H. Fujiwara, ISSCC'22.

- Digital IMC reverts to digital acceleration; its advantage comes from custom implementation/layout.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators
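The "bit-cell array is only ~11% of the macro" claim follows directly from the two areas quoted above; a one-line check (taking 64 kb as 65,536 cells, which is an assumption about how the macro capacity is counted):

```python
# Fraction of the 64 kb digital-IMC macro occupied by the bit-cell array itself,
# from the areas quoted on the slide (0.379 um^2 per cell, 202,000 um^2 macro).
cells = 64 * 1024                 # assuming 64 kb means 65,536 bit cells
bitcell_area_um2 = 0.379
macro_area_um2 = 202_000

fraction = cells * bitcell_area_um2 / macro_area_um2
print(f"bit-cell array share of macro: {fraction:.1%}")   # ~12%, i.e. "just 11%"
```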

Where does analog IMC stand today (16)
- Low-SNR IMC (22 nm), A. Papistas, CICC'21: noise sigma is 0.43 LSB of a 6-b ADC, for one column at a single temperature (MVM output into a 6-b ADC).
- High-SNR SC IMC (28 nm), J. Lee, VLSI'21: noise sigma is 0.3 LSB of an 8-b ADC, for 256 columns across temperature; error bars show the sigma across a 256-column overlay (MVM output into an 8-b ADC).
- Best digital (7 nm): N. Shanbhag, OJ-SSCS'22.
- IMC enables ~10X higher efficiency (and throughput) than digital, but the SNR trade-off poses the critical limitation today!
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Outline (17)
- Large Scale AI: Properties and Trends; In-memory Computing (IMC) Basics; IMC Macro Approaches; Architectural considerations; Conclusions.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Ex. 1: SRAM-based binarized IMC (18)
- J. Zhang, VLSI'16; J. Zhang, JSSC'17: word lines driven through an I-DAC with a bit-cell replica, operating in an IMC mode alongside the normal SRAM mode.
- Figure: 5-b WL DAC waveforms (WL voltage vs. time) and the VBL vs. WL-DAC-code transfer curve, ideal vs. nominal.
- Takeaway: bit-cell and peripheral circuitry limit SNR.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Ex. 2: FLASH-based IMC (19)
- X. Guo, IEDM'17: inner-product accuracy and accumulation measurements.
- Takeaway: Flash variation limits SNR, and the output TIA degrades power efficiency.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Ex. 3: RRAM-based IMC (20)
- S. Yin, T-ED Oct. 2020: two-cell ReRAM array with 3-b ADCs and muxes; RRAM cell SNR (signal ~4x vs. noise), TSMC 40 nm, C.-C. Chou, ISSCC'18.
- Takeaway: low bit-cell SNR requires high-sensitivity readout and degrades area.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Ex. 4: MRAM-based IMC (21)
- P. Deaville, VLSI Symp. 2022: MRAM cell SNR set by the resistance distributions (signal ~2x vs. noise), GF 22 nm, D. Shum, VLSI'17.
- Takeaway: low bit-cell SNR requires high-sensitivity readout and degrades area.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Ex. 5: Capacitor-based IMC (22)
- H. Valavi, VLSI Symp. 2018; J. Lee, VLSI Symp. 2021: differential DACs drive the bit-cell rows and the column capacitances accumulate the result onto an ADC.
- Figure: 8-b ADC output code vs. ideal output value for an 1152 inner dimension.
- Takeaway: precision BEOL capacitors (as opposed to transistors and interconnect) provide high SNR, enabling aggressive IMC efficiency and compute density.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Outline (23)
- Large Scale AI: Properties and Trends; In-memory Computing (IMC) Basics; IMC Macro Approaches; Architectural considerations; Conclusions.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Architectural Considerations for IMC Systems (24)
- Re-use: IMC spatial architecture.
- SNR: capacitor-based IMC.
- Programmability: support for a variety of operations.
- Utilization: support for parallelism.
- D2D impact on future IMC systems.
These must be addressed to enable wide-scale adoption of in-memory computing solutions!
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Need for programmability (25)
- Residual connections, depth-wise convolutions, dilated convolutions.

- Wide variety of operations, both inter-layer (dataflow) and intra-layer (convolutions).
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Range of AI-model operations (26)
- B. Fleischer, VLSI'18: general matrix multiply (256 x 2300 = ~590k elements) alongside single/few-word operands (traditional, near-memory acceleration).
- MVM is only 70-90% of operations, so IMC must integrate heterogeneous architectures (see the sketch below).
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators
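Amdahl's-law arithmetic explains why the 70-90% MVM share forces heterogeneity: even a large speed-up on the MVM portion leaves the end-to-end gain capped by the remaining operations. A quick illustration (the 10x IMC speed-up on MVM is an assumed round number in line with the efficiency claims earlier in the talk):

```python
# Amdahl-style bound on end-to-end speedup when only the MVM fraction
# of the workload runs on IMC. The 10x MVM speedup is an assumed figure.

def end_to_end_speedup(mvm_fraction, mvm_speedup=10.0):
    return 1.0 / ((1.0 - mvm_fraction) + mvm_fraction / mvm_speedup)

for frac in (0.70, 0.80, 0.90):
    print(f"MVM = {frac:.0%} of ops -> overall speedup {end_to_end_speedup(frac):.1f}x")
# 70% -> 2.7x, 90% -> 5.3x: the non-MVM operations must also be handled
# efficiently (programmable SIMD, near-memory engines) to keep utilization up.
```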

IMC utilization mapping challenges (27)
- Figure: an array of IMC macros annotated with weight loading, macro utilization, and bit-cell utilization.
- IMC is efficient but rigid: utilization must be watched to maintain efficiency -- weight loading (temporal utilization), macro utilization, and bit-cell utilization.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

IMC utilization parallelism challenges (28)
- Different forms of parallelism incur different overheads:
  - Data parallelism (replication): weight-loading overhead.
  - Model parallelism (broadcast): network communication overhead.
  - Pipeline parallelism (pipelining): latency overhead.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Scalable dataflow IMC-architecture (29)
- H. Jia, ISSCC'21: a 4x4 array of Compute-In-Memory Units (CIMUs) tied together by an on-chip network, with a segmented weight buffer, activation buffers, weight networks, off-chip control, and a PLL.
- Each CIMU contains a Compute-In-Memory Array (CIMA), a programmable digital SIMD, compute and dataflow buffers, programming & control, and on-chip-network (OCN) in/out blocks with a disjoint buffer switch and duo-directional pipelined routing.
- Fully-synthesized, pipelined routing segments; fully-disjoint switch block; configured via a dedicated network.
- 1152x256 IMC bank (CIMA) with bit scalability from 1-8 b; programmable digital near-memory-computing SIMD; local buffering and local control.
- The dataflow architecture enables flexible optimization of parallelism (data/pipeline).
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

D2D interconnect impact on future IMC systems
- Efficient &

227、scalable workload mapping across chiplet systemsIMC architectures employ dataflow network of cores for flexible parallelismCompiler optimizations within IMC die translate across IMC dies,with proper heuristics applied for D2D-interconnect bandwidth and energy IMC compute density for enabling short-r

228、ange D2D interconnectsIn distributed execution,compute die TOPS optimized for attached memory BWHigh IMC compute density,enables smaller die and shorter chiplet interconnects Optimization with emerging memory technologyIMC must work with secondary memory(especially for large parameter models).Will r

229、equire optimization with next gen memory capacity/interconnect bandwidth.2024 IEEE International Solid-State Circuits ConferenceOutline31 Large Scale AI:Properties and Trends In-memory Computing(IMC)Basics IMC Macro Approaches Architectural considerations ConclusionsISSCC 2024-Forum 1.4:In-memory Co

mputing Chiplets for Future AI Accelerators

Conclusions (32)
- Execution of large-scale AI models requires multi-chip execution; compute die-to-die interconnect technologies are well suited for this.
- Efficient AI compute requires solving compute AND data-movement bottlenecks; IMC is distinctly suited for this.
- IMC introduces fundamental energy/throughput vs. SNR trade-offs; these drive macro technologies and approaches.
- IMC faces architectural challenges for programmability and efficient execution; parallelism must be addressed through specialized architectures.
- IMC chiplet-based systems are coming soon (commercially); die-to-die interconnect performance will drive optimizations across the system.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

Future IMC-Based Products on the Horizon (33)
- Automotive/industrial edge, client computing, NPU chiplets, multi-die ASICs, M.2 cards, PCIe cards, compute servers, on-prem enterprise servers, cloud datacenters.
- Chiplet-powered scalable solutions from the edge to the cloud.
ISSCC 2024-Forum 1.4: In-memory Computing Chiplets for Future AI Accelerators

ISSCC 2024 Forum F1: Efficient Chiplets and Die-to-Die Communications
1.5: Efficient Domain-Specific Compute with Chiplets
Prof. Dejan Marković, UCLA ECE Department, dejan@ucla.edu
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets (1 of 62)

Evolving Standards: Flexibility & Efficiency (2 of 62)
- Objectives: lower development cost and shorter time-to-market.
- An SoC/ASIC revision/iteration is expensive (~$100M in 16 nm CMOS), with long design cycles (>1 yr) and increasing design complexity [1-3].
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

SoCs Today = CPU/GPU + Accelerators (3 of 62)
- Maltiel Consulting estimates [4]; Shao et al., IEEE Micro'15; Apple A12 die photo.
- The number of accelerator blocks in Apple application processors has grown steadily from the A4 to the A12, which by 2018 contained dozens of accelerator blocks occupying ~45% of die area.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Optimize for Efficiency and Flexibility (4 of 62)
- Figure: average area efficiency (GOPS/mm²) vs. average energy efficiency (GOPS/mW) for CPUs, FPGAs, DSPs*, dedicated hardware, and this work (*DSPs include CGRAs and FPGA-DSPs [5-7]).
- Two ways to think about it: add flexibility to accelerators, or narrow the coverage of DSPs.
- The how: interconnect, switch-boxes, and the SW toolchain.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Efficient Multi-Chip Module (MCM) Scaling (5 of 62)
- Large SoCs incur higher cost: lower yield of larger chips and delayed time-to-market.
- Cost benefits of MCM scaling: smaller chips give better yield. AMD 32-core chip (777 mm²): 1.0x cost; 4 x 8-core chiplets (4 x 213 mm²): ~0.6x cost.
- Yield example (good/bad/total dies per wafer): 100 mm² die: 620/38/658 = 94%; 400 mm² die: 103/33/136 = 76%; 1,600 mm² die: 10/18/28 = 35%.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets
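Those yield numbers and the ~0.6x chiplet cost follow from standard defect-density die-yield models. The sketch below uses a negative-binomial (clustered-defect) yield model; the defect density D0 and the clustering parameter are illustrative assumptions, not values given in the talk.

```python
# Die-yield and relative-cost sketch behind the MCM scaling argument.
# Negative-binomial yield model: Y = (1 + A*D0/alpha)^(-alpha).
# D0 (defects/cm^2) and alpha are assumed, illustrative values.

def die_yield(area_mm2, d0_per_cm2=0.1, alpha=3.0):
    a_cm2 = area_mm2 / 100.0
    return (1.0 + a_cm2 * d0_per_cm2 / alpha) ** (-alpha)

mono_area = 777.0            # AMD 32-core monolithic die (mm^2)
chiplet_area = 213.0         # one 8-core chiplet (mm^2)

y_mono = die_yield(mono_area)
y_chip = die_yield(chiplet_area)

# Silicon cost ~ area / yield; four chiplets replace one monolithic die.
cost_mono = mono_area / y_mono
cost_chiplets = 4 * chiplet_area / y_chip
print(f"yield: monolithic {y_mono:.0%}, chiplet {y_chip:.0%}")
print(f"relative silicon cost of chiplets: {cost_chiplets / cost_mono:.2f}x")
```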

Challenges with MCM Design (6 of 62)
- High bandwidth density, low link latency, low energy transfer, low I/O area.
- Chiplet size: sweet spot ~100 mm²; the UDSP prototype (budget-limited) is 6 mm².
- 2x2 UDSP array on Si-IF.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets 2024 IEEE International Solid-State Circuits Confere

242、nce Domain-specific hardware acceleration ASIC-like energy efficiency and throughput Just-enough flexibility for a domain Key:flexible cores,efficient interconnect Tile-able chiplets on Silicon Interconnect Fabric(Si-IF)Develop scalable interconnects Near-range I/O and PHY for cutting-edge bandwidth

243、/latency/energy Low-area,portable timing correction circuits for Si-IF I/OsResearch AimsISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets7 of 62 2024 IEEE International Solid-State Circuits ConferenceUniversal Digital Signal Processor(UDSP)ArrayA 16nm 2x2 Chiplet with 10-m Pitch-I

244、/OUDSP Chiplet2-Layer Si-IF10-m I/O bump pitch9 U.Rathore,S.Nagi,S.Iyer,D.Markovic,ISSCC 2022.ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets8 of 62 2024 IEEE International Solid-State Circuits ConferenceUDSP Multi-Chip,Multi-Program TenancySNR-10 Link Vertical StackInactive Pro

245、gram(Soft Reset)Simultaneous Multi ProgramCross UDSP AlgorithmsProgram being ErasedControl&PLL2-Layer Si-IFUDSP Dielet10-m I/O bump pitchISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets9 of 62 2024 IEEE International Solid-State Circuits ConferenceCo-designComputeInterconnectI/O

246、channelCompilersPackageUDSP OverviewMemCore(1)Interconnect(2)RTRAMCM Assembly CompilerRTRAProgramming(5)Switchbox(3)I/O(4)ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets10 of 62 2024 IEEE International Solid-State Circuits ConferenceEvolution of UDSP Core24.5mm2(40nm)Slice L/MSl

247、ice L/MSlice L/MSlice LDSP-48,Slice L,BRAMSlice L/MSlice L/MSlice L/M64-8kFFT16-core UDSPFPGAInterconnectCHIP AREA10 C.C.Wang,et al.,ISSCC 2014.143Mtrans.Logic25%75%Logic25%25%Post-Proc.Pre-Proc.Path Selc.Path Selc.fastpathinterconnectdata mem.data mem.fastpathinterconnectShifter&Multiplier2014 Lewi

248、s Winner AwardUDSP coreISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets11 of 62 2024 IEEE International Solid-State Circuits ConferenceEfficiency and Flexibility in Comm.DSP1010.10.0110100.11001Average Area Efficiency GOPS/mm2Average Energy Efficiency GOPS/mWUDSP21CPUASICFPGA10 C

249、.Wang et al.,ISSCC 2014.11 F.-L.Yuan et al.,VLSI 2014.12 F.-L.Yuan et al.,VLSI 2015.D-CLASIC(v1)CLASIC(v2)UCLA FPGA:1.eFPGA interconnect2.Coarse-grain kernels UDSP based CLASIC designs Domain-specific for comm.DSPISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets12 of 62 2024 IEEE

International Solid-State Circuits Conference

Algorithm Ontology: Example DSP Kernels (13 of 62)
- Example DSP kernels derived from common DSP algorithms: up/down conversion, MIMO, IFFT/FFT, neural networks, zero forcing, MMSE, vector-dot product, MAC, FIR, Euclidean distance.
- Kernel set: |x|², lattice filter, FIR filter, radix-2, matrix multiply, VDP/BF, ZF/MMSE, ED, complex MAC, CORDIC.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Iterative Process of Core Design (14 of 62)
- UDSP core v4.2: 16-bit fixed-point, 1.1 GHz clock, 256 b D-Mem, 384 b I-Mem, 4 inputs, 4 outputs.
- The DSP kernels are mapped onto the core, connections are adjusted, and the I-Mem/D-Mem configuration is revisited iteratively.
- Balancing core granularity and core utilization maximizes energy and area efficiency; an example kernel of this class is sketched below.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets
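To make the kernel list concrete, here is a minimal fixed-point FIR/MAC kernel of the kind listed above, in plain Python. The 16-bit Q15 format matches the "16-bit fix-pt" datapath quoted above; everything else (tap values, saturation policy) is an illustrative assumption, not a detail of the UDSP core.

```python
# Minimal 16-bit fixed-point (Q15) FIR / MAC kernel, illustrating the kind of
# DSP kernel listed above. Tap values and saturation policy are assumptions.

Q = 15                                  # Q15 fixed point: 1 sign bit, 15 fraction bits

def sat16(x):
    return max(-32768, min(32767, x))   # saturate to the int16 range

def fir_q15(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], all values in Q15."""
    out = []
    hist = [0] * len(taps)
    for x in samples:
        hist = [x] + hist[:-1]
        acc = 0
        for h, t in zip(hist, taps):
            acc += h * t                # 16x16 -> 32-bit multiply-accumulate
        out.append(sat16(acc >> Q))     # renormalize back to Q15
    return out

taps = [int(0.25 * 2**Q)] * 4           # simple 4-tap moving average
print(fir_q15([int(0.5 * 2**Q)] * 8, taps)[:6])
```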

Interconnects: An Exercise in Co-Design (15 of 62)
- Design challenges: energy/area, flexibility, scalability, clock speed.
- Co-design spans core, interconnect, algorithm, routing, DSP array, and compiler.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Layer-1 Interconnect (Distance = 1) (16 of 62)
- Vertical stack: layer-3 switchboxes, layer-2 switchboxes, layer-1 switchboxes, bottom layer of cores; 4 x 16 b links.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Layer-2 Interconnect (17 of 62)
- Same vertical stack; 2 x 16 b links at a longer hop distance.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Layer-3 Interconnect (Distance = 2) (18 of 62)
- Same vertical stack; 2 x 16 b links.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Layer-4 Interconnect (19-20 of 62)
- Figure: CDF of wire distance (fraction of wires vs. distance from node); most wires are short, and a registered layer-4 switchbox serves the longer-distance routes in the vertical stack.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Hyper-vector cross-correlation (HVCC) in each dimension

259、(layer)HVCC for a layer measures inter-dependencies of pathsN-Layer Switch Box:Hyper-Matrix ModelN-Layer Switch BoxI4I3I2I1O1O2O3O4M1M2M3M4N-D Hyper-Matrix RepresentationIMO1234N-D Hyper-MatrixN-Layer Switch BoxISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets21 of 62 2024 IEEE In

ternational Solid-State Circuits Conference

DSE: Search Space Traversal (22 of 62)
- MCBF and MCBF/HWC plotted against layer density for a 3-layer switchbox (MCBF: mean connections before failure; HWC: hardware cost).
- A fully connected switchbox eases the compiler but costs hardware area; a sparse one saves area but burdens the compiler.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

DSE: Maximizing Silicon Area Efficiency (23 of 62)
- The SW/HW balance sits at the peak of MCBF/HWC: the HW/compiler co-optimized switch box.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets 2024 IEEE International Solid-State Circuit

263、s ConferenceFour UDSP dielets8mm x 8mm Si-IFSi-IF interface7,168 data pins160 control+PLL pins22,291 power/ground pins2-layer routingAssembly considerationsSelect known good diesSelect known good Si-IFDie handling,cleaning,ESDDielet alignment,bondingSi-IF Assembly OverviewISSCC 2024-Forum 1.5:Effici

264、ent Domain-Specific Compute with Chiplets59m24 of 62 2024 IEEE International Solid-State Circuits Conference Low loss at 10 GHz 10 ps RTT,negligible ISISi-IF Characteristics10-m pitch Cu bumps*350 mChannel Loss(dB)ISSCC 2024-Forum 1.5:Efficient Domain-Specific Compute with Chiplets13*9.8 m with opti

265、cal shrink25 of 62 2024 IEEE International Solid-State Circuits Conference UDSP dielet powered on to verify Clk tree and shift-registersLow-freq Clk applied using a probe station Dice defect-free Si-IF sites for assemblyTemplate-based wafer scan for repeated patterns Dielets assembled on Si-IF using

direct Cu-Cu thermo-compression bonding (TCB) with in-situ formic acid (FA) vapor treatment; ionizers on the bonding tool ensure an ESD-safe assembly.
- The default 20 l/min flow interferes with the 4.5 l/min FA vapor flow, leading to inadequate cleaning of the Cu pads and inferior bonding quality, as measured by shear strength.
ISSCC 2024-Forum 1.5: Efficient Domain-Specific Compute with Chiplets

Semiconductor Industry Landscape
- From $205B in 2000 (PC era), through the smartphone era, to the data-center/AI era: a >$1T industry, expected to roughly double by 2030.

More Than Moore (4)
- Market demand for AI performance is growing faster than Moore's Law (doubling transistors every 18 months): AI/ML performance increased nearly 6.8x-11x from 2021 to 2022, outstripping Moore's Law. Scaling for the AI era.
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

Compute Requirements Exploded in the AI era (5)
- Figure: compute energy in J/year (1E+18 to 1E+22), 2010-2050, for the current trend (device scaling) and a "market dynamics limited" scenario, against the world's energy production.
- Exascale = 21 MW (52 GF/watt); zettascale = 500 MW (2140 GF/watt); a nuclear plant is ~1 GW.
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization 2024 IEEE In

270、ternational Solid-State Circuits Conference6Technology Converging and Business Ecosystem2020sCSYS Multi-dieChipletsIDMs/Foundry/OSTAs/EMSEcosystem1990s2000s 2010sSoCSoC w/IP MCMIDM orFoundryFoundry&3rdParty IPIDM,Foundry&OSAT2010s 2000s1990sSiPHDI PCB FR4 PCB OSATsEMSEMS SiPSystem integrated on boar

271、d driven by OSATs/EMS in semiconductor markets Si CMOS(foundry-focused)to CSYS(Complementary Systems)Chiplets&heterogenous integration on substrate become mainstream in the futureISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE Internatio

272、nal Solid-State Circuits Conference Mix&Match systems-Enable construction of Different Si Nodes Reuse IPsPackage becomes new System-on a Chip(SoC)System flexibility-Processors,accelerator Performance optimization-Low latency&high BW Time to Market Low CostEnable optimal process technology;Smaller fo

273、r better yieldModularized SoC(Chiplets)Monolithic SoCDrivers for on-package Chiplets 72023 IEEE 73rd Electronic Components and Technology Conference Orlando,Florida May 30 June 2,2023Chiplets and Heterogeneous Integration7ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the P

ath to Standardization 2024 IEEE International Solid-State Circuits Conference (8)
- Chiplet 1 (process node 1, on-die bus) and Chiplet 2 (process node 2, PCI/CXL controller) connect through D2D PHYs and interface logic over an optional interposer/bridge in the package (1-20 mm reach).
- Monolithic chip: scaled SoC, homogeneous -- one die in a package. C

275、hiplet:Split SoC&Heterogeneous multiple dies in a packageD2D interconnectOff-package interconnect+simple packageOn-package interconnect+more complex packagesChiplets-Disaggregation&IntegrationISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IE

276、EE International Solid-State Circuits ConferenceChiplets More Complex Workflow Design&VerificationCustomerSpecificationAssemblyPackaging&TestFoundrySystemSoftwareOEM andProductcodesignphysical modularityFunctional modularityadditional constraints with chipletsKey factors for more complexFunctional m

277、odularityPhysical modularityInterconnectPackagingTest and operationsSupply chainChipletsISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization9 2024 IEEE International Solid-State Circuits ConferenceChiplets Design EcosystemNode/Board Level IntegrationCP

278、UCPUAcceleratorI/O TileMemMemMemMemCXL/PCIe/CPU-CPU(Electrical/Optical/)DDRPackage Level IntegrationOn-die Integration Seamless Integration from Node Package On-die Standardization for Chiplets D2D and interoperation Same Software,IP,and Subsystem to build scalable solutionsISSCC 2024-Forum F1.6:Inn

279、ovations in Chiplet Interconnects,Protocols and the Path to Standardization10 2024 IEEE International Solid-State Circuits ConferenceChiplet Form FactorDie Size/bump locationPower deliverySoC Construction(Application Layer)Reset and InitializationRegister accessSecurityDie-to-Die Protocols(Data Link

280、 to Transaction Layer)PCIe/CXL/Streaming Plug and play IpsDie-to-Die I/O(Physical Layer)Electrical,bump arrangement,channel,reset,power,latency,test repair,technology transition Die-to-Die I/ODie(Chiplet)ProtocolDie-to-Die I/OProtocolDie(Chiplet)ChipDie-to-DieI/ODie-to-DieProtocolChipletForm FactorS

281、oC Construction(Example SoC showing two chiplets only)Chiplets Interconnect&Interoperation11ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits Conference D2D interface Functional block connecting data inte

282、rface between two dies assembled in same package(MCM)/interposer(2.5D,Fan out,Si bridge,3D stacking)D2D very short channels High Power efficiency High bandwidth Chiplets Die to Die Interface D2D Structures Typically made of PHY&a controller Block(a physical layer,link layer,and transaction layer)Two

283、 types of PHY Architectures SerDes series connection(standard MCM)High density parallel(2.5D,Fan out RDL,Si bridge,3D stacking)Standard D2D&Proprietary IP D2D Open-source standards:UCIe,Bow,OHBI,More IP D2D:NVlink(Nvidia),Lipincon(TSMC),Infinity Fabric(AMD),MDIO/AIB)(Intel),XSR/USR(Rambus)ISSCC 2024

-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization (12)

Chiplets - Growth in Standards (13)

  Component                                 | Status
  D2D interconnect (huge growth/awareness)  | UCIe, BoW, AIB, XSR
  Test                                      | IEEE 1838, IEEE P3405
  Chiplet description                       | JEDEC-OCP JEP 30 CDXML (new in 2023)
  Size guardrails                           | X
  Power delivery guardrails                 | X
  Thermal guardrails                        | X
  Wiring density guardrails                 | X
  Mechanical guardrails                     | X
  Bump and assembly pitch guardrails        | X

ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization 2024 IEEE International Soli

286、d-State Circuits ConferenceChiplets D2D Standard-UCIeLayered Approach with industry-leading KPIsPhysical Layer:Die-to-Die I/ODie to Die Adapter Support for multiple protocols:bypassed in raw modeProtocol:CXL/PCIe and Streaming CXL/PCIe for volume attach&plug-and-playSoC construction issues are addre

287、ssed w/CXL/PCIe CXL/PCIe addresses common use casesI/O attach,Memory,Accelerator Streaming for other protocolsScale-up(e.g.,CPU/GP-GPU/Switch from smaller dies)Protocol can be anything(e.g.,AXI/CHI/SFI/CPI/etc.)Well defined specification:interoperability and future evolutionISSCC 2024-Forum F1.6:Inn

288、ovations in Chiplet Interconnects,Protocols and the Path to Standardization14 2024 IEEE International Solid-State Circuits ConferenceChiplets D2D Standard-UCIeISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization15DIE-TO-DIE ADAPTERPHYSICAL LAYERPROTOCO

289、L LAYERPCIe,CXL,Streaming(e.g.,AXI,CHI,symmetric coherency,memory,etc.)Flit-Aware Die-to-Die Interface(FDI)Raw Die-to-Die Interface(RDI)Link TrainingLane Repair/Reversal(De)Scrambling,Analog Front end/Clocking Sideband,Config&Registers ChannelArb/Mux(if multiple protocols)CRC/Retry(when applicable)L

290、ink state managementParameter negotiation&Config Registers(Bumps/Bump Map)Form FactorRaw Mode(bypass D2D Adapter to RDI e.g.,SERDES to SoC)2024 IEEE International Solid-State Circuits ConferenceChiplet D2D StandardUCIe-PHYByte to Lane mapping for data transmission Interconnect redundancy remappingWi

291、dth degradationScrambling&training pattern generationLane reversalLink initialization,training&power management statesTransmitting&receiving sideband messagesOne,two or four module per Adapter allowed both advanced&standard PackageStandard package example configurations16ISSCC 2024-Forum F1.6:Innova

292、tions in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits ConferenceUCIe Usage Model Streaming for PCIe/CXL AMBA CHIUCIe ProtocolStreamingAdapterPHY Transporting same on-chip protocol allows seamless use of architecture specific features wi

293、thout protocol conversion Streaming interface with additional flit formats provide link robustness using UCIe defined data-link CRC&retryCHI/CXLUCIe ProtocolStreamingAdapterPHYISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization17AccelerationMem contro

294、llerGPUAcceleratorsInternal(CHI interconnect)ComputeCPUCPUCHI interconnectCPUD2DAdapterMemory controllerCPUCPUCPUCPUCPUCPUCPUCPUCPUPHYComputeCPUCPUCHI interconnectCPUMemory controllerCPUCPUCPUCPUCPUCPUCPUCPUCPUPHYD2DAdapterPHYCHI/CXLUCIeUCIeUCIeUCIe(3 dies on one package)PHYD2DAdapterD2DAdapterCHI/C

XL
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

Chiplets D2D - UCIe Key Metrics (18)
- UCIe 1.0/1.1 characteristics and key metrics (table in the original slide).

Chiplets D2D - OCP/ODSA Standards (BoW) (19)

  Area                     | Output                                    | Notes
  D2D PHY                  | Bunch of Wires 1.0; BoW 2.0/2.1 in flight | 1st standard scaling from laminate to advanced packaging, 2-32 Gbps/lane; implementations from 65, 22, 16, 12, 7, 6, 5, 4, 3 nm; 10+ products in flight; proven power of 0.3 to 0.5 pJ/bit
  D2D Spreadsheet          | V3.0 in flight                            | Biennial release comparing data on all PHYs
  D2D Link and Transaction | TLP 1.0                                   | Only known open "streaming mode" link layer
  D2D Transaction Profile  | NXP DiPort / other profiles               | Only known open maps of AXI SoC traffic onto D2D PHYs
  CDX                      | JEDEC-OCP JEP 30 PM; Open 3DK             | Standard for physical chiplet description; workflow white papers
  Business                 | Chiplet cost model; business white paper  | Open spreadsheet model to compare chiplet vs. monolithic; product-planning assistance document
  Prototyping              | Test package                              | 1st open chiplet integration across vendors; fully open package design and analysis

ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

Chiplets D2D Standard - BoW (20)
- A BoW D2D link is made up of wires, slices, and stacks.
- Physical layer - slice (die-to-die I/O): a slice has 18 or 20 signal bumps -- 2 bumps for the differential clock and 16 single-ended data bumps, plus the optional single-ended AUX and FEC signals. The long edge of a slice must be parallel to the chip edge.
- A stack is composed of one or more slices stacked from the chip edge toward the center; a link is composed of one or more stacks along the chip edge (BoW link components). A per-slice bandwidth estimate follows below.
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization
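With 16 data bumps per slice, the raw slice bandwidth is simply 16 times the per-wire data rate, and the beachfront (shoreline) bandwidth density follows from how many slices fit per millimetre of die edge. The per-wire rate and slice width below are assumptions for illustration (BoW allows a range of rates and packaging pitches), not fixed values from the talk.

```python
# Per-slice and per-mm-of-die-edge bandwidth for a BoW-style slice:
# 16 single-ended data wires per slice; data rate and slice width are assumed.

DATA_WIRES_PER_SLICE = 16

def slice_bandwidth_gbps(gbps_per_wire):
    return DATA_WIRES_PER_SLICE * gbps_per_wire

def edge_density_gbps_per_mm(gbps_per_wire, slice_width_mm):
    """Shoreline density for a single stack of slices along the die edge."""
    return slice_bandwidth_gbps(gbps_per_wire) / slice_width_mm

# Illustrative: 16 Gbps/wire and a ~0.5 mm-wide slice footprint (assumed).
print(slice_bandwidth_gbps(16.0), "Gbps per slice")             # 256 Gbps
print(edge_density_gbps_per_mm(16.0, 0.5), "Gbps/mm per stack") # 512 Gbps/mm
# Stacking multiple slices inward from the edge multiplies this density.
```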

Chiplets D2D Standard - BoW (21)
- BoW PHY in the ODSA stack; BoW for common transaction protocols.
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

Chiplets D2D Standard - BoW (22)
- BoW PHY modes and targets.

ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

Chiplets D2D Interface Summary (23)

  Standard                                      | Throughput | Density      | Max. Delay
  Advanced Interface Bus (AIB, Intel)           | 2 Gbps     | 504 Gbps/mm  | 5 ns
  Bandwidth Engine (MoSys)                      | 10.3 Gbps  | N/A          | 2.4 ns
  Bunch of Wires (BoW, OCP/ODSA)                | 16 Gbps    | 1280 Gbps/mm | 5 ns
  Universal Chiplet Interconnect Express (UCIe) | 32 Gbps    | 1350 Gbps/mm | 2 ns
  HBM3 (JEDEC)                                  | 4.8 Gbps   | N/A          | N/A
  Infinity Fabric (AMD)                         | 10.6 Gbps  | N/A          | 9 ns
  LIPINCON (TSMC)                               | 2.8 Gbps   | 536 Gbps/mm  | 14 ns
  Multi-die I/O (MDIO, Intel)                   | 5.4 Gbps   | 1600 Gbps/mm | N/A
  XSR/USR (Rambus)                              | 112 Gbps   | N/A          | N/A
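The density column translates directly into how much aggregate bandwidth a given stretch of die edge (beachfront) can carry. The sketch below multiplies the table's linear densities by an assumed 5 mm of available die edge per interface; the edge length is an illustrative assumption.

```python
# Aggregate beachfront bandwidth for an assumed 5 mm of die edge,
# using the linear-density column of the D2D interface summary table.

density_gbps_per_mm = {      # from the table above (N/A entries omitted)
    "AIB (Intel)": 504,
    "BoW (OCP/ODSA)": 1280,
    "UCIe": 1350,
    "LIPINCON (TSMC)": 536,
    "MDIO (Intel)": 1600,
}

EDGE_MM = 5.0                # assumed available die-edge length

for name, d in sorted(density_gbps_per_mm.items(), key=lambda kv: -kv[1]):
    total_tbps = d * EDGE_MM / 1000.0
    print(f"{name:16s}: {total_tbps:5.1f} Tbps over {EDGE_MM:.0f} mm of edge")
# e.g. UCIe at 1350 Gbps/mm gives ~6.8 Tbps over 5 mm of beachfront.
```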

ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

Chiplet Integration - Standard & Advanced Packages (24)
- Standard package (Die-0/Die-1/Die-2 on the package substrate): 2D, cost-effective, longer distance.
- Advanced package (multiple choices): 2.5D, high-density fan-out, embedded Si bridge -- power-efficient, high bandwidth density.
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization
Package Substrate / Silicon Bridge (e.

306、g.EMIB)(e.g.EMIB)Silicon Bridge Die-1Die-0Die-2Package SubstrateInterposer(e.g.CoWoS)Die-1Die-0Die-2Package SubstrateInterposer(e.g.FOCoS-B)Silicon Bridge Silicon Bridge Die-1Die-0Die-2 2024 IEEE International Solid-State Circuits Conference251995-NowPerformance1984-Now2009-20212022 Flip ChipBall Gr

307、id Array2.5DThroughSilicon ViaFOCoSWire BondCu Pillar Flip Chip(Dev 2006)FOPOP2.5/3DFan-OutWafer Level PackageFOSIPSolder Flip Chip(Dev 1964)FOCoS-BVIPack PlatformCo-SiPhWire bond(Dev in 1956)Mobile-Networking-Compute-AI,Edge Automotive-IndustrialDensity3D Advanced RDL TechnologyFan Out Package(Dev

308、2009)ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationASE Advanced Packaging Technology Offerings 2024 IEEE International Solid-State Circuits Conference26High Density Interconnection-High I/O connect 10000 with fine RDL L/S 2/2um-Support package si

309、ze 60 x60mm Chip Last(FOCoS-CL)Chip First/CL w/Bridge(FOCoS-B)Chip First(FOCoS-CFP)ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationFOCoS Packaging Technology Offerings 2024 IEEE International Solid-State Circuits Conference27 2ASIC+4HBM2+4 Si Bridg

310、e die Module size:47x31 mm2 1 RDL,L/S 10/10 um Si Bridge Die L/S 0.8/0.8um Package size:78x70 mm2 Total 10 chiplets in MCM package ASIC+2HBM3 4 RDL,L/S 2/2um Package size:75x75 mm2 ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationFOCoS Packaging Tec

311、hnology 2024 IEEE International Solid-State Circuits Conference28 High-Density Interconnect Min.L/S 0.4/0.4um Power module or DTC integration Optics integration(optic I/O and Photonic)ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization2.5D TSV Heterog

312、enous Integration 2024 IEEE International Solid-State Circuits ConferenceFan-out WaferDRC,DFM&LVSGDSII out Layout Cadence APDPKG Netlist Net name and coordinateDOCSAuto-routerCadenceLayoutChecking ProgramAuto-mask DesignPDKCalibre More than 50%layout cycle-time saving by auto-router 29ISSCC 2024-For

um F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization
Integrated Design Ecosystem (IDE)

UCIe D2D Interconnect using FOCoS Packaging (30)
- UCIe D2D interface bump-out diagram for advanced packaging with a bump pitch between 40 µm and 50 µm; via land diameter = 16 µm; RDL L/S = 2/2 µm; µbump pitch = 45 µm (FOCoS RDL design rule).
- 10 columns for the x64 TX & RX data lanes, and a total of 156 lanes for the D2D interface routing.
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

UCIe D2D Interconnect Design for FOCoS Packaging
- GSG-type cross-section: molding, UBM, Cu pillars, polyimide layers PI1-PI7 and RDL1-RDL6 with 2 µm / 2 µm traces, underfill; GSG and GSSG shielding patterns.
- Total 15-16 I/O per bump pitch; 3 I/O layers and 2 isolation GND layers for the SS-type design; 6 RDL layers are needed for each I/O routing, with ground RDL traces surrounding the signals.
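The "15-16 I/O per bump pitch over 3 routing layers" figure follows from simple escape-routing arithmetic: how many 2 µm / 2 µm RDL traces fit under one 45 µm bump pitch once signals are shielded by ground traces, as in the GSG/GSSG patterns of the cross-section. A rough check is below; treating half the tracks as ground shields and ignoring via-land keep-outs are simplifying assumptions, not statements from the slide.

```python
# Escape-routing arithmetic for the FOCoS RDL rules quoted above:
# how many signal traces fit under one 45 um bump pitch per routing layer
# when signals are shielded by ground traces (GSG / GSSG patterns).

BUMP_PITCH_UM = 45.0
LINE_UM, SPACE_UM = 2.0, 2.0            # RDL L/S = 2/2 um
I_O_LAYERS = 3                          # 3 I/O routing layers in the stack-up

tracks_per_layer = int(BUMP_PITCH_UM // (LINE_UM + SPACE_UM))   # 2/2 um tracks
signals_per_layer = tracks_per_layer // 2   # roughly half the tracks are ground shields

total = signals_per_layer * I_O_LAYERS
print(f"{tracks_per_layer} tracks/layer -> ~{signals_per_layer} shielded signals/layer")
print(f"~{total} signal escapes per bump pitch over {I_O_LAYERS} layers "
      "(the same order as the 15-16 I/O per bump pitch quoted above)")
```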

ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization (31)

Electrical Analysis for UCIe D2D in Advanced Packages (32)
ISSCC 2024-Forum F1.6: Innova

317、tions in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International Solid-State Circuits Conference Key Performance Indicators Bandwidth density(linear&area)Data Rate&Bump Pitch Energy Efficiency(pJ/b)Scalable energy consumption Low idle power(entry/exit time)Latency(end

318、-to-end:Tx+Rx)Channel Reach Technology,frequency&BER Reliability&Availability Cost(Standard vs advanced packaging)Factors Affecting Wide Adoption Interoperability Full-stack,plug-and-play with existing s/w is+Different usages/segments Technology Across process nodes&packaging options Power delivery&

319、cooling Repair strategy(failure/yield improvement)Debug controllability&observability Broad industry support/Open ecosystem Learnings from other standards efforts33ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to StandardizationChiplets D2D Interface Standards Ado

320、ption 2024 IEEE International Solid-State Circuits ConferenceKey Takeaways Chiplets heterogeneous integration optimizes system performance to continue scaling Moores law with cost advantage Interoperability,plug and play for different usages and broad industry support are very critical to the wide a

321、doption of chiplets D2D interface standardization Advanced packaging solutions(HDFO,2.5D&3D)enables chiplets and heterogeneous integration that optimizes system performance34ISSCC 2024-Forum F1.6:Innovations in Chiplet Interconnects,Protocols and the Path to Standardization 2024 IEEE International S

olid-State Circuits Conference

Key References (35)
- "Bunch of Wires PHY Specification", The Open Domain-Specific Architecture (ODSA) BoW Workstream, 2022.
- "UCIe Specification", July 2023.
- "Interconnects for 2D and 3D Architectures", Heterogeneous Integration Roadmap (HIR), 2021 Edition.
- Samuel Naffziger et al., "Pioneering Chiplet Technology and Design for the AMD EPYC and Ryzen Processor Families: Industrial Product," 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
- Anthony Mastroianni et al., "Proposed Standardization of Heterogenous Integrated Chiplet Models," 2021 IEEE International 3D Systems Integration Conference (3DIC).
- Shahab Ardalan et al., "Bunch of Wires: An Open Die-to-Die Interface," 2020 IEEE Symposium on High-Performance Interconnects (HOTI).
- John Park, "Chiplets and Heterogeneous Packaging Are Changing System Design and Analysis," Cadence white paper.
- Lihong Cao et al., "Advanced Packaging Design Platform for Chiplets and Heterogeneous Integration," ECTC, 2023.
- R. Farjadrad et al., "A Bunch of Wires (BoW) Interface for Inter-Chiplet Communication," Hot Interconnects, 2019.
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization

Thank you (36)
ISSCC 2024-Forum F1.6: Innovations in Chiplet Interconnects, Protocols and the Path to Standardization 2024

327、 IEEE International Solid-State Circuits ConferencePhotonics for Die-to-Die Interconnects:Links and Optical I/O ChipletsChen SunAyar Labs,Inc.ISSCC 2024-Forum 1.7:Photonics for Die-to-Die Interconnects:Links and Optical I/O Chiplets1 of 47 2024 IEEE International Solid-State Circuits Conference Emer

328、ging computing applications,such as AI/ML,have an ever-insatiable demand for interconnect bandwidths.Gap between in-package and off-package I/O bandwidth continues to grow.Interconnect BW Growth Driven by AI/ML(Nvidia GTC March 2022)G.Keeler,DARPA ERI Summit 2019ISSCC 2024-Forum 1.7:Photonics for Di

e-to-Die Interconnects: Links and Optical I/O Chiplets (2 of 47)

Scaling Challenges for Off-package I/O (3 of 47)
- Source: Gordon Keeler, DARPA MTO, ERI Summit 2019. Figure: reference points include HBM2e and 112G XSR, with a target region marked for optical I/O chiplets.
- Critical performance metrics: energy efficiency (pJ/bit), bandwidth density (Gbps/mm), reach (mm to meters), latency (ns).
- Optical I/O chiplets bridge the in-package and off-package performance gap.
ISSCC 2024-Forum 1.7: Photonics for Die-to-Die Interconnects: Links and Optical I/O Chiplets
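An optical I/O chiplet reaches its bandwidth-density targets mainly through dense wavelength-division multiplexing: many wavelengths per fiber, many fibers per millimetre of die edge. The arithmetic below is a generic WDM illustration with assumed numbers (wavelength count, per-wavelength rate, fiber pitch); it is not a description of Ayar Labs' specific design.

```python
# Generic WDM bandwidth-density arithmetic for an optical I/O chiplet.
# All parameters are assumed, illustrative values (not from the talk).

def fiber_bandwidth_gbps(n_wavelengths, gbps_per_wavelength):
    return n_wavelengths * gbps_per_wavelength

def edge_density_gbps_per_mm(n_wavelengths, gbps_per_wavelength, fiber_pitch_um):
    fibers_per_mm = 1000.0 / fiber_pitch_um
    return fiber_bandwidth_gbps(n_wavelengths, gbps_per_wavelength) * fibers_per_mm

# e.g. 8 wavelengths x 32 Gbps per fiber, fibers on a 250 um attach pitch:
per_fiber = fiber_bandwidth_gbps(8, 32)            # 256 Gbps per fiber
density = edge_density_gbps_per_mm(8, 32, 250)     # ~1 Tbps per mm of fiber-attach edge
print(per_fiber, "Gbps/fiber,", round(density), "Gbps/mm")
```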

Agenda (4 of 47)
- Chiplets for Optical I/O: system architecture, building blocks, chiplet D2D interfaces.
- Retimed optical I/O chiplet design: process and fiber attach, electrical interface design, optical transceiver design.
- Measurement results.
- Conclusion.
ISSCC 2024-Forum 1.7: Photonics for Die-to-Die Interconnects: Links and Optical I/O Chiplets (5 of 47)

Optical I/O System Architecture
- Optical I/O chiplets can bridge the D2D interfaces between two sockets: each ASIC package contains an ASIC and an optical I/O chiplet connected over an electrical D2D I/F, and the two packages are linked by single-mode fiber with an external multi-wavelength
