Session 1 Overview: Plenary INVITED PAPERS
Chair: Edith Beigné, Meta, Menlo Park, CA, ISSCC Conference Chair
Associate Chair: Thomas Burd, Advanced Micro Devices, Santa Clara, CA, ISSCC International Technical-Program Chair

8 / 2025 IEEE International Solid-State Circuits Conference / ISSCC 2025 / SESSION 1 / PLENARY / OVERVIEW / 979-8-3315-4101-9/25/$31.00 © 2025 IEEE

The Plenary Session starts with welcoming remarks and an introduction from the Conference Chair, Edith Beigné, followed by the International Technical Program Chair, Thomas Burd, providing an overview of ISSCC 2025. The Plenary Session will feature four distinguished keynote speakers, who are leaders and pioneers in their domains, together covering a broad spectrum of our industry. An Awards Ceremony will take place after the first two Plenary talks to recognize major technical and professional accomplishments, presented by the IEEE, the Solid-State Circuits Society (SSCS), and ISSCC.

The first plenary talk, "AI Era Innovation Matrix," is by Navid Shahriari, Senior Vice President of Foundry Technology Development at Intel. This presentation describes the significant innovation needed, from transistors to software, to enable AI systems to continue their rapid rate of performance scaling. This innovation encompasses a matrix of technologies, including process technology, packaging with advanced 3D integration, interconnect at the board and system levels, power delivery across the system and to high-wattage SoCs, system hardware architecture, and a co-designed software stack.

The second plenary talk, "From Chips to Thoughts: Building Physical Intelligence into Robotic Systems," is by Daniela Rus, Director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Andrew (1956) and Erna Viterbi Professor in EECS at the Massachusetts Institute of Technology. This presentation starts with an excellent analysis of the untenability of current large-language models in terms of computational resources and power efficiency. Compact state-space models are proposed to radically improve AI energy efficiency and enable a new era of physical intelligence, followed by a discussion on how to address the new challenges posed by interactive, autonomous robotic systems.

The third plenary talk, "AI Revolution Driven by Memory Technology Innovation," is by Jaihyuk Song, Corporate President and CTO of Device Solutions at Samsung Electronics. This presentation analyzes the memory requirements and design challenges of AI systems and provides a comprehensive overview of memory architectures and technologies, as well as storage technologies, to prevent the "Memory Wall" from impeding the rapid performance scaling of current and future AI systems.

The fourth plenary talk, "The Crucial Role of Semiconductors in the Software-Defined Vehicle," is by Peter Schiefer, President and CEO of the Automotive Division at Infineon Technologies. The first part of the presentation focuses on the key trends driving the automotive industry, the ever-increasing semiconductor content, and the key value-drivers for software-defined vehicles. As the industry evolves towards a new automotive electrical/electronic architecture, the second part explores challenges, possible solutions, and innovation needed in the areas of high-performance computation, power delivery, connectivity, and security, all while meeting
the extremely high reliability demanded by the automotive industry. We hope that you will find these presentations informative, inspiring, and motivating!

8:30 AM FORMAL OPENING OF THE CONFERENCE
ISSCC 2025 / February 17, 2025 / 8:30 AM / 9 / DIGEST OF TECHNICAL PAPERS

11:15 AM 1.4 The Crucial Role of Semiconductors in the Software-Defined Vehicle
Peter Schiefer, President & CEO, Automotive Division, Infineon Technologies, Munich, Germany

The automotive industry is undergoing a significant transformation, driven by the rise of software-defined vehicles (SDVs). Semiconductors will play a pivotal role in enabling this transition, powering the complex systems that underpin the features and functions of modern cars. This paper explores the key trends driving the growth of the automotive semiconductor market, including green mobility, autonomous driving, and smarter cars. It delves into the challenges and opportunities associated with the development of SDVs, highlighting the importance of advanced microelectronics, artificial intelligence, and secure communication solutions. The paper concludes by emphasizing the crucial role of semiconductors in shaping the future of mobility. By addressing the challenges and
embracing the opportunities presented by SDVs and AI, the automotive industry can create a more sustainable and innovative future.

8:50 AM 1.1 AI Era Innovation Matrix
Navid Shahriari, Senior Vice President, Foundry Technology Development, Intel, Chandler, AZ

AI holds transformative potential for humanity, enhancing our ability to solve complex problems with speed and accuracy, and unlocking new realms of innovation and understanding. The lightning-fast progression of AI, unprecedented in history, necessitates rapid advancements at a system level, from low-power and edge-AI devices to cloud-based computing, and in the communication networks that connect them. This need for rapid AI system scaling is driving the innovation frontier in silicon, packaging, architecture, and software. This paper describes a matrix of technologies that empowers the industry to achieve remarkable progress at every level,
from chips to systems.

9:20 AM 1.2 From Chips to Thoughts: Building Physical Intelligence into Robotic Systems
Daniela Rus, Massachusetts Institute of Technology, Director, CSAIL, & Andrew (1956) and Erna Viterbi Professor, Cambridge, MA

In today's robot revolution, a record 3.1 million robots are now working in factories, doing everything from assembling computers to packing goods and monitoring air quality and performance. A far greater number of smart machines impact our lives in countless other ways: improving the precision of surgeons, cleaning our homes, extending our reach to distant worlds. We are on the cusp of even more exciting opportunities. Future machines, enabled by recent advances in AI, will come in diverse forms and materials, embodying a new level of physical intelligence. Physical intelligence is achieved when the power of AI to understand text, images, signals, and other
information is used to make physical machines such as robots intelligent. However, a critical challenge remains: balancing the capabilities of AI with sustainable energy usage. To achieve effective physical intelligence, we need energy-efficient AI systems that can run reliably on robots, sensors, and other edge devices. In this paper I will discuss the energy challenges of transformer-based foundational AI models, introduce several state-space models and explain how they achieve energy efficiency, and show how state-space models enable physical intelligence.

10:45 AM 1.3 AI Revolution Driven by Memory
Technology Innovation
Jaihyuk Song, Corporate President & CTO, Device Solutions, Samsung Electronics, Hwaseong, South Korea

The recent AI revolution, spearheaded by Large Language Models (LLMs), demands substantial computing resources and corresponding memory solutions. However, unlike processors that can leverage advancements in fabrication processes, memory devices are increasingly struggling to meet the high-bandwidth, large-capacity, and power-efficiency requirements of AI systems. This paper analyzes the requirements and limitations of systems in the AI era, categorizing application-specific
memory needs in terms of performance, power, and capacity. We introduce performance-centric solutions such as HBM (High Bandwidth Memory) and PIM (Processing-In-Memory) technologies, energy-efficient solutions including custom HBM and LPW (LPDDR Wide-IO) memory, and capacity-focused solutions such as SSDs (Solid-State Drives) and CXL (Compute Express Link) memories. Additionally, we discuss how continuous scaling of DRAM and NAND Flash processes, as well as 3D-packaging technologies, can address the trade-offs among performance, power, and capacity more effectively. Finally, the importance of software technologies in optimizing the utilization of these increasingly specialized memory solutions is emphasized, along with a discussion of the enabling core technologies for each solution. To meet the high demands of AI systems, the ongoing advancement of existing memory devices and the development of new memory solutions
will play crucial roles. These efforts will support the advancement of AI technologies and contribute to human society.

9:50 AM ISSCC, SSCS, IEEE AWARD PRESENTATIONS
10:15 AM BREAK
11:45 AM PRESENTATION TO PLENARY SPEAKERS
11:50 AM CONCLUSION
1.1 AI Era Innovation Matrix

Navid Shahriari, Senior Vice President, Foundry Technology Development, Intel, Chandler, AZ

AI holds transformative potential for humanity, enhancing our ability to solve complex problems with speed and accuracy, and unlocking new realms of innovation and understanding. The lightning-fast progression of AI, unprecedented in history, necessitates rapid advancements at a system level, from low-power and edge-AI devices to cloud-based computing, and in the communication networks that connect them. This need for rapid AI system scaling is driving the innovation frontier in silicon, packaging, architecture, and software. This presentation describes a matrix of technologies that empowers the industry to achieve remarkable progress at every level, from chips to systems.

1.0 Introduction

The rapid expansion of Artificial Intelligence (AI) is pushing traditional compute technology to its limits, requiring sustainable and energy-efficient solutions for the exponential scaling of parallel compute systems. The compute industry must meet the growing demand for computing power, memory bandwidth, connectivity, high-performance
infrastructure, and AI across all sectors. This paper emphasizes advancements across the technology matrix shown in Figure 1.1.1, from software and system architecture to silicon and packaging. Progress in each area is necessary, but the entire system must be co-optimized to maximize performance, power, and cost. Strong ecosystem partnerships and novel design methodologies are crucial for efficient co-optimization and faster time to market, setting the stage for AI's transformative potential.

2.0 Silicon

Silicon scaling has been a fundamental driver of progress in the semiconductor industry and remains
a cornerstone of the innovation matrix. The silicon roadmap is enabled by non-incremental transistor and interconnect architectural advances, High NA EUV lithography, and associated mask and modeling solutions. The feature scaling and improvements for each technology generation are guided by a design-technology co-optimization (DTCO) process that sets, and drives towards, holistic goals for logic, memory, and analog/mixed-signal power, performance, area (PPA), and cost scaling. This iterative loop between design and process technology is essential to achieve continued silicon-scaling benefits.

2.1 RibbonFET
RibbonFET, a gate-all-around transistor, advances beyond the FinFET architecture, offering performance scaling and workload flexibility [1]. Varying ribbon widths provide custom solutions for diverse performance and efficiency needs within the same technology base.

2.2 PowerVia

PowerVia [2], a high-yielding backside power-delivery technology, integrates power delivery to the transistor, reducing IR drop by 5× and providing additional front-side wiring for signal routing. It meets all JEDEC thermomechanical stress requirements with zero failures and shows over 5% frequency benefit in silicon [3]. Intel 18A, Intel's leading-edge process node, will offer an industry-first combination of RibbonFET and PowerVia technologies.

2.3 The High NA EUV Advantage

High NA EUV enables flexible design rules, reducing parasitic capacitance and enhancing performance [4]. It simplifies aspects of Electronic Design Automation (EDA) by reducing design-rule complexity and the need for multi-patterning. Intel 14A front-side interconnects are optimized for High NA single-expose patterning, improving yield and reliability (Figure 1.1.2) [5].

2.4 Empowering AI with High NA EUV for Full-Field, Large-Die AI Applications

High NA EUV tools have a smaller imaging field size, but Intel has developed solutions for electrically stitching die across boundaries. The EDA ecosystem is creating tools to support this, and the mask ecosystem is working towards full-field-size capability without reticle stitching (Figure 1.1.3), increasing productivity by 23-50% [5].

2.5 Enhancing the High NA EUV Advantage with AI and Curvilinear Mask Solutions

High NA EUV lithography requires advanced modeling and mask solutions. Intel uses AI and machine learning to achieve accuracy while managing computational costs. Curvilinear masks improve pattern-space utilization, widen the process window, and significantly reduce variability.

3.0 3D Integrated Circuits (3DIC), Packaging, and Assembly

As data-processing demand grows, achieving more computing power in a smaller area with lower energy consumption is crucial. 3DIC technology reduces costs and footprint
45、through heterogeneous integration,enhancing performance with higher bandwidth,and lower power consumption via vertical stacking.The base die on an advanced node is critical for enabling Through Silicon Vias(TSV)and advanced interfaces,integrating 3D elements seamlessly.On-package vertical and latera
46、l interconnects must continue to scale,providing increased interconnect densities for bandwidth growth and improved energy efficiency 6(Figure 1.1.4).Cost-effective interconnect scaling,combined with the use of standardize-based links like UCIe,is essential to create a chiplets ecosystem where plug-
47、and-play will enable product diversity and customization.Maturing the use of glass to scale package substrate interconnect geometries,size,and signaling features(Figure 1.1.5),is an important technology vector.The increasing power demanded by AI applications must be addressed by improving system lev
48、el power delivery efficiency(described later)and expanding the thermal envelope through component and system-level innovation(Figure 1.1.6).Advanced packaging technologies are evolving in a manner where the boundary between packaging and silicon backend interconnects is increasingly blurred as the f
49、eature dimensions and manufacturing processes overlap.Additionally,the package becomes a complex heterogeneous structure.Manufacturing and test processes must evolve to ensure that yield stays high(Figure 1.1.7).A modular design environment that allows for straightforward assembly of multi-Si,co-pac
50、kaged systems optimizing cost,performance,and bandwidth is critical.Comprehensive EDA tool and flow capabilities are needed for design partitioning across dies,enabling successful co-design,and optimization of dies and packages.Current 3DIC design flows lack thermal and mechanical stress modeling,le
51、ading to potential failures and redesign efforts that impact time to market.3DIC Design tools must span implementation,extraction,reliability,and verification to ensure seamless integration(Figure 1.1.8).4.0 Interconnect The exponential scaling of parallel AI workloads is putting pressure on interco
52、nnect bandwidth density,latency,and power.All three of these metrics are improved by tighter integration of components with dense 2.5D and 3D assembly technologies,as described in Section 3.New packaging techniques provide better total cost of ownership(TCO)by minimizing the very costly(both in pric
53、e and power)interconnect between the GPUs.The energy to transmit each bit of data scales as a function of the channel loss 7.This tradeoff has driven the definition of industry specifications like UCIe for low-power,high-density in-package communication.UCIe enables up to 1.35TB/s per millimeter of
54、die perimeter at 1pJ/bit.Longer interconnects within the board and rack,which constitute the high-bandwidth domain in a scale-up network topology,require increasing data serialization to account for practical connector signal density in order to scale up aggregate bandwidth.Serial per-channel data r
55、ates have scaled by a factor of 2 every 3-4 years,including industry specifications like Ethernet,PCIe,and OIF-CEI.The latest production wireline SerDes has reached 212Gb/s PAM4 to support within-rack(1 meter reach)communication at 4-6pJ/bit.The energy-per-bit for the analog circuits and digital equ
56、alization both continue to benefit from process technology scaling.Figure 1.1.9 shows measured TX and RX eye diagrams for a 212Gb/s SerDes operating in Intel 18A over a 40dB channel.As wireline interconnect data rates continue to scale up,the distance that can be bridged between SerDes retimers redu
57、ces because of higher channel loss at higher symbol rates.Adding more retimers extends reach,but adds power,latency,and cost.This empirical tradeoff has led to the adoption of optical interconnects across a range of applications,from undersea cables to rack-to-rack networks.In addition,extending the
58、 reach of the high-bandwidth domain beyond the rack with optics aligns with the scale-up network strategy for AI.Therefore,optical interconnects will need to move into the rack to scale bandwidth,and reach at an acceptable power envelope.Technologies like co-packaged optics(CPO)and direct drive line
59、ar optics are being developed to make this transition.Intel recently demonstrated a 4Tb/s(8 fibers 8 wavelengths/fiber 32Gbps/wavelength in each direction)bidirectional fully integrated Optical Compute Interconnect(OCI)chiplet based on Intel s in-house Silicon Photonics technology(Figure 1.1.10)8 an
60、d 224Gb/s PAM4 over 23km fiber with direct drive linear optics(Figure 1.1.11)9.An industry-wide effort to accelerate this in-rack optical interconnect ecosystem is underway,developing high-yield manufacturing processes,materials,and equipment while improving bandwidth density,total power,reliability
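As a back-of-envelope check, the figures quoted in this section can be recomputed from energy-per-bit and bandwidth alone. The sketch below does so in Python; the 10mm of UCIe shoreline and the 7-year projection window are assumed example values for illustration, not figures from this paper.

```python
# Back-of-envelope checks on the interconnect figures quoted above.
# The shoreline length and projection window are illustrative assumptions.

def link_power_w(bandwidth_bps: float, energy_pj_per_bit: float) -> float:
    """Power (W) = bandwidth (bits/s) x energy per bit (pJ -> J)."""
    return bandwidth_bps * energy_pj_per_bit * 1e-12

# UCIe: up to 1.35 TB/s per mm of die perimeter at 1 pJ/bit.
# Assume 10 mm of shoreline is dedicated to UCIe links (illustrative).
ucie_bw_bps = 1.35e12 * 8 * 10            # TB/s -> b/s, times 10 mm
ucie_power = link_power_w(ucie_bw_bps, 1.0)

# One 212 Gb/s PAM4 SerDes lane at the 6 pJ/bit end of the quoted range.
serdes_power = link_power_w(212e9, 6.0)

# OCI chiplet: 8 fibers x 8 wavelengths x 32 Gb/s per direction, both ways.
oci_per_dir_bps = 8 * 8 * 32e9
oci_total_tbps = 2 * oci_per_dir_bps / 1e12

# Serial rates doubling every 3-4 years: project 212 Gb/s forward 7 years
# at the midpoint cadence of one doubling per 3.5 years.
projected_rate_gbps = 212 * 2 ** (7 / 3.5)

print(f"UCIe power for 10 mm shoreline: {ucie_power:.0f} W")
print(f"212 Gb/s SerDes lane power:     {serdes_power:.3f} W")
print(f"OCI aggregate bandwidth:        {oci_total_tbps:.3f} Tb/s")
print(f"Projected lane rate in 7 years: {projected_rate_gbps:.0f} Gb/s")
```

The OCI line confirms the quoted aggregate (2 × 8 × 8 × 32Gb/s ≈ 4Tb/s), and the first two lines show why pJ/bit is the figure of merit: at fixed energy-per-bit, link power grows linearly with aggregate bandwidth.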
5.0 Power Delivery

Per-package power for parallel workloads like AI is scaling up rapidly (Figure 1.1.12). A common approach to providing power to the package is motherboard voltage regulators (MBVRs) (Figure 1.1.14(a)). These regulators step the board-level power supply (e.g., 12V) down to the voltage used by the dies on the package (VOUT). Whether positioned next to the package (lateral MBVR) or under the package (vertical MBVR), the current density provided by MBVRs will not keep pace with future high-performance chips. Furthermore, regulator efficiency degrades with higher power and current (I²R loss), costing system performance (Figure 1.1.13). Solutions are needed that bring the voltage conversion closer to the die with high current density, conversion efficiency, and regulation bandwidth.

One solution vector is the use of fully integrated voltage regulators (FIVRs) that bring the last step of power conversion onto the package (Figure 1.1.14(b)). Having a final voltage step-down on the package reduces the energy lost when routing power rails onto the package by reducing the current for a given power (Figure 1.1.13). Intel first introduced FIVR in the Haswell product over ten years ago [10, 11], using dense on-die capacitors and air-core package inductors. This first-generation FIVR converted a 1.8V input supply rail to multiple on-die voltage domains. This architecture has been used in many products over the past decade, with incremental improvements like denser on-package magnetic inductors and on-chip capacitors. In addition to the FIVR integrated into the SoC, Intel has also developed a CMOS-based standalone 2.4V IVR chiplet that used Intel's high-density capacitor (HDMIM) technology to implement a switched-capacitor voltage regulator (SCVR) with a continuously scalable voltage-conversion ratio [12].

Further evolutionary scaling of on-package power capacity beyond 1-2kW will suffer from an unacceptable degradation in regulator efficiency using the existing MBVR architecture, as illustrated in Figure 1.1.12. This problem can be mitigated by integrating the high-voltage (12V) power conversion onto the package. 12V regulator integration will reduce the current delivered into the package, thereby reducing the I²R loss. One promising approach is to pair a high-voltage (12V) switched-capacitor voltage regulator (SCVR) on the package with a lower-voltage (1.8-2.4V) IVR for a two-step conversion (Figure 1.1.14(c)). The power density and efficiency of this two-step architecture rely on dense on-package passives, like embedded deep-trench capacitors (eDTC) and magnetic inductors, along with dense on-die capacitors.

The use of wide-bandgap process technologies like gallium nitride (GaN) can enable high-voltage converters with higher efficiency and density than silicon-based solutions. However, on-package implementation of power converters requires a higher switching frequency with integrated drivers that cannot be supported on a GaN-only process. Fabricating GaN devices with silicon CMOS can open more opportunities for on-package integration of high-voltage power converters, as it enables the design of a CMOS driver and a GaN power FET on the same chip. To this end, Intel recently demonstrated a technology that combines GaN and silicon CMOS on the same 300mm wafer [13, 14].
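The I²R argument that runs through this section can be made concrete with a small sketch. The ~1kW package power matches the scaling discussion above, but the 50µΩ delivery-path resistance is an assumed illustrative value, not a figure from this paper; the point is the quadratic payoff of moving the 12V conversion onto the package.

```python
# Illustrative I^2*R arithmetic behind on-package high-voltage conversion.
# Package power matches the text; the path resistance is an assumed value.

def i2r_loss_w(power_w: float, rail_v: float, resistance_ohm: float) -> float:
    """Conduction loss delivering power_w at rail_v through resistance_ohm."""
    current_a = power_w / rail_v          # I = P / V
    return current_a ** 2 * resistance_ohm

PACKAGE_POWER_W = 1000.0   # ~1 kW package, per the scaling discussion
R_PATH_OHM = 50e-6         # assumed 50 uOhm power-delivery path resistance

# MBVR-style: the ~1 V die rail is routed into the package (1000 A).
loss_low_v = i2r_loss_w(PACKAGE_POWER_W, 1.0, R_PATH_OHM)

# On-package 12 V conversion: 12 V crosses the boundary instead (~83 A).
loss_12v = i2r_loss_w(PACKAGE_POWER_W, 12.0, R_PATH_OHM)

print(f"Loss routing a 1 V rail:  {loss_low_v:.1f} W")
print(f"Loss routing a 12 V rail: {loss_12v:.3f} W")
print(f"Reduction factor:         {loss_low_v / loss_12v:.0f}x")
```

Because loss scales as (P/V)², raising the voltage crossing the package boundary from 1V to 12V cuts the conduction loss by 12² = 144× for the same delivered power, which is why the two-step 12V SCVR-plus-IVR architecture becomes attractive as packages head beyond 1-2kW.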
The combined GaN-plus-CMOS technology can support high-voltage IVR options with an input voltage of up to 12V, enabling power scaling beyond 1-2kW.

6.0 Architecture and Software

Next-generation compute architectures must drive exponential improvements in system-performance metrics like performance-per-Watt-$-mm while addressing thermal and power-integrity challenges. Innovations should enable cohesive systems by stacking and interconnecting wafers and chiplets through advanced packaging and silicon processes. Additionally, they must support the seamless integration of custom accelerators for various workloads [15]. Software, a crucial part of the innovation matrix, must advance through collaboration, standardization, and interoperability in open-source ecosystems. Automation should enhance security and streamline processes, while highly optimized software is essential for efficient use of silicon resources. Distributing software across thousands of GPUs presents significant bandwidth and latency challenges, as in high-performance computing. AI software will be key in fine-tuning system elements, ensuring seamless integration and delivering remarkable advancements.

7.0 Looking Beyond Classical Computing

Technologies such as neuromorphic and quantum computing are critical to the breakthroughs in efficiency and speed needed to scale AI. Since 2018, Intel's Loihi research chips, used by over 250 labs globally, have shown that neuromorphic chips manufactured with CMOS process technology can deliver orders-of-magnitude gains on a broad range of example algorithms and applications [16]. While many of these examples relate to novel brain-inspired algorithms that are currently not compatible with today's software and AI methods, an emerging class of techniques shows that 1000× gains will be achievable in the near future for the deep-learning and transformer methods in wide use today [17, 18]. These neuromorphic innovations may be essential for the proliferation of advanced AI capabilities into power-, latency-, and data-constrained intelligent devices operating in real-time settings.

Quantum computing represents a new paradigm that harnesses the power of quantum physics to solve complex problems exponentially faster than conventional compute. It promises to revolutionize industries and solve critical problems including climate change; chemical engineering; drug design and discovery; finance; and
aerospace design. Making steady progress in transitioning this transformative technology from the lab into the domain of engineering, to deliver customer solutions for useful, nearer-term applications, is critical. Intel's unique approach to quantum research spans the full computing stack, including qubit manufacturing [19], cryogenic-CMOS technologies for qubit control [20], software, compilers, algorithms, and applications. With more than fifty years of experience in transistor manufacturing at scale, Intel is utilizing its proven technology to develop silicon spin qubits as the optimal path forward for quantum-computing scalability [21]. Intel is also investing in capabilities like custom-designed cryoprobers that dramatically speed up Intel's quantum testing and validation workflows [19].

The current state of quantum-computing hardware does not yet have the robustness and scale to have a direct impact on AI today. Another challenge for AI with quantum computers is how to feed large amounts of data into these complex machines. However, there are clear benefits once we have a scalable, fault-tolerant quantum computer. Quantum computers can perform complex computations faster than classical computers, and this could enable faster training and analysis of AI models. Two of the key principles of quantum computing are superposition and entanglement, which enable exploration of multiple solutions at the same time, and this could directly benefit the training and optimization of AI models. The possibility of analyzing large amounts of data in parallel can also improve the ability of AI to recognize patterns, for example in images or speech. Instead of using classical AI algorithms, new AI algorithms directly optimized to leverage quantum properties could be developed. Finally, quantum computers should not be seen as a replacement for classical computers, but rather as compute accelerators for special applications. Therefore, the system solution for AI in the future will likely leverage a hybrid implementation of classical and quantum computing.

8.0 Ecosystem Collaboration

Rapidly developing the next advanced compute systems will require collaboration on this innovation matrix across the industry ecosystem. Engaging with end users and partners across the technology stack, from manufacturing to design tools and IP to system design to software, ensures that the development process is aligned with market needs and timelines, environmentally sustainable, and leveraging key learnings and developments across the ecosystem. System-level co-optimization requires close collaboration to achieve rapid progress. Interdisciplinary expertise and knowledge sharing across strategic partnerships are essential for efficient problem-solving and accelerated development cycles. Leveraging cross-industry strengths and avoiding duplication of efforts will enable teams to work more effectively.

9.0 Industry Challenges & Opportunities

Nearly twenty years ago, CPU clock-frequency scaling faced a dilemma: the continued pursuit of exponential performance improvements hit a wall in terms of power density. The outcome was a new set of parallel processor architectures, along with an array of supporting technologies for silicon, packaging and thermals, interconnect, power delivery, and core architectures. Today, we are in a similar situation where exponential performance scaling, this time in support of AI, is running into fundamental challenges for power, connectivity, and cost. Once again, incremental scaling of our systems will not be enough, and we will require new approaches to solve this problem: an AI innovation matrix. There is no
92、shortage of engineering challenges to take on,from process technology scaling to 3DIC system design to power delivery,interconnect,and core architecture.We will need the combined benefits of innovation across these areas to meet the industry demand for compute power in a manufacturable,sustainable,a
93、nd cost-effective way.Acknowledgement:The author sincerely appreciates Intel subject matter experts for their visionary guidance and invaluable insights that underpin this paper.Special thanks to Maharshi Chauhan,Frank Ehrig,Frank O Mahony,Craig Orr,Mondira Pant,and Shruti Seshadri.The author also e
94、xtends thanks to the scientists,engineers,and technicians at Intel and its partners across the semiconductor ecosystem for tirelessly working to bring new technologies to market.References:1 X.Wang et al.,“0.021m2 High-Density SRAM in Intel-18A-RibbonFET Technology with PowerVia-Backside Power Deliv
95、ery,”2025 IEEE International Solid-State Circuits Conference(ISSCC),San Francisco,CA,USA,2025.2 W.Hafez et al.,“Intel PowerVia Technology:Backside Power Delivery for High Density and High-Performance Computing,”2023 IEEE Symposium on VLSI Technology and Circuits(VLSI Technology and Circuits),Kyoto,J
96、apan,2023,pp.1-2,doi:10.23919/VLSITechnologyandCir57934.2023.10185208 3 M.Shamanna et al.,“E-Core Implementation in Intel 4 with PowerVia(Backside Power)Technology,”2023 IEEE Symposium on VLSI Technology and Circuits(VLSI Technology and Circuits),Kyoto,Japan,2023,pp.1-2,doi:10.23919/VLSITechnologyan
97、dCir57934.2023.10185369.4 A.Kelleher,“Evolution of advanced lithography and patterning in the system technology co-optimization era of Moore s law,”Proc.SPIE PC12953,Optical and EUV Nanolithography XXXVII,PC1295302(10 April 2024);https:/doi.org/10.1117/12.3027043 5 M.Phillips,“Exposure tool and ecos
98、ystem status for 0.55NA EUV lithography,”Proc.SPIE 13216,Photomask Technology 2024,1321602(13 November 2024);https:/doi.org/10.1117/12.3039295 112 2025 IEEE International Solid-State Circuits ConferenceI SSCC 2025/SESSI ON 1/PLENARY/1.1979-8-3315-4101-9/25/$31.00 2025 IEEE6 R.Mahajan et al.,“Chapter
99、 22:Interconnects for 2D and 3D Architectures,”in Heterogeneous Integration Roadmap 2024 Edition,IEEE Electronics Packaging Society,2024.Accessed:Nov.22,2024.Online.Available:https:/eps.ieee.org/images/files/HIR_2024/HIR_2024_ch22_2D-3D.pdf 7“2024 PRESS KIT.”Accessed:Nov.22,2024.Online.Available:htt
Figure 1.1.1: The innovation matrix.
Figure 1.1.2: Silicon scaling: the journey continues with High-NA EUV.
Figure 1.1.3: Full field size capability of High-NA EUV.
Figure 1.1.4: Lateral and vertical interconnect scaling: density and energy-efficiency opportunities.
“…20dBm Adjacent-Channel IIP3 Consuming less than 25mW” Stef van Zanten, Ronan van der Zee, Bram Nauta; University of Twente, Enschede, The Netherlands

2024 Takuo Sugano Award for Outstanding Far-East Paper: “A 12GS/s 12b 4× Time-Interleaved Pipelined ADC with Comprehensive Calibration of TI Errors and Linearized Input Buffer” Yuefeng Cao1, Minglei Zhang1, Yan Zhu1, R. P. Martins1,2, Chi-Hang Chan1; 1University of Macau, Macau, China; 2Instituto Superior Tecnico/University of Lisboa, Lisbon, Portugal

2024 Jack Kilby Award for Outstanding Student Paper: “0.25-to-4GHz Harmonic-Resilient Receiver with Built-In HR at Antenna and BB Achieving +14/+16.5dBm 3rd/5th IB Harmonic B1dB” Soroush Araei, Shahabeddin Mohin, Negar Reiskarimian; Massachusetts Institute of Technology, Cambridge, MA

2024 ISSCC Award for Outstanding Forum Presenter: “Extending and Augmenting Analog with Digital to Overcome Technology Scaling Limitations” Alvin L. S. Loke; IEEE Solid-State Circuits Society, San Diego, CA

2024 ISSCC Award for Outstanding Forum Presenter: “Prediction and Mitigation of Spurs in Fractional Synthesizers” Michael Peter Kennedy; University College Dublin, Dublin, Ireland

2024 Demonstration Session Certificate of Recognition: “LSPU: A Fully Integrated Real-Time LiDAR-SLAM SoC with Point-Neural-Network Segmentation and Multi-Level kNN Acceleration” Jueun Jung1, Seungbin Kim1, Bokyoung Seo1, Wuyoung Jang1, Sangho Lee1, Jeongmin Shin1, Donghyeon Han2, Kyuho Jason Lee1; 1Ulsan National Institute of Science and Technology, Ulsan, Korea; 2Massachusetts Institute of Technology, Cambridge, MA

124 2025 IEEE International Solid-State Circuits Conference ISSCC 2025 / AWARDS 979-8-3315-4101-9/25/$31.00 ©2025 IEEE

ISSCC AWARDS

2024 Demonstration Session Certificate of Recognition: “ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications” Yoonho Boo, Jaewan Bae, Minjae Kwon, Karim Charfi, Jinseok Kim, Hongyun Kim, Myeongbo Shim, Changsoo Ha, Wongyu Shin, Jae-Sung Yoon, Miock Chi, Byungjae Lee, Sungpill Choi, Donghan Kim, Jeongseok Woo, Seokju Yoon, Hyunje Jo, Hyunho Kim, Hyungseok Heo, Young-Jae Jin, Jiun Yu, Jaehwan Lee, Hyunsung Kim, Minhoo Kang, Seokhyeon Choi, Seung-Goo Kim, Myunghoon Choi, Jungju Oh, Yunseong Kim, Haejoon Kim, Sangeun Je, Junhee Ham, Juyeong Yoon, Jaedon Lee, Seonhyeok Park, Youngseob Park, Jaebong Lee, Boeui Hong, Jaehun Ryu, Hyunseok Ko, Kwanghyun Chung, Jongho Choi, Sunwook Jung, Yashael Faith Arthanto, Jonghyeon Kim, Heejin Cho, Hyebin Jeong, Sungmin Choi, Sujin Han, Junkyu Park, Kwangbae Lee, Sung-il Bae, Jaeho Bang, Kyeong-Jae Lee, Yeongsang Jang, Jungchul Park, Sanggyu Park, Jueon Park, Hyein Shin, Sunghyun Park, Jinwook Oh; Rebellions, Seongnam-si, Korea

ISSCC 2025 Silkroad Award: “A 6.78MHz Single-Stage Regulating Rectifier with Dual Outputs Simultaneously Charged in a Half Cycle Achieving 92.2%-Efficiency and 131mW Output Power” Quanrong Zhuang; Nanjing University, Nanjing, China

ISSCC 2025 Silkroad Award: “A 0.024mm2 All-Digital Fractional Output Divider with 257fs Worst-Case Jitter Using Split-DTC-Based Background Calibration” Yan Yu; National University of Defense Technology, Changsha, China; Hunan University, Changsha, China

ISSCC 2024 Student-Research Preview (SRP) Poster Award: “8:1 Multiplexer Driving Josephson Arbitrary Waveform Synthesizer for Quantum Applications” Yerzhan Kudabay; TU Braunschweig, Braunschweig, Germany

ISSCC 2024 Student-Research Preview (SRP) Poster Award: “A 40μW Room Temperature Gas Sensor Based on Molecularly Imprinted Polymers Demonstrating SARS-CoV-2 and D-Glucose Aerosol Sensing” Ryan Burns; University of California, San Diego, CA

IEEE SOLID-STATE CIRCUITS SOCIETY AWARDS

2023 Journal of Solid-State Circuits Best Paper Award: “Deep-Learning-Based Inverse-Designed Millimeter-Wave Passives and Power Amplifiers” Emir Ali Karahan, Zheng Liu, Kaushik Sengupta; Princeton University, Princeton, NJ

2024 IEEE SSCS Chapter of the Year Award: Kerala Section SSCS Chapter (India); Taipei Section SSCS Chapter (Taiwan)

2024 IEEE SSCS Student Branch Chapter of the Year Award: National Engineering School of Sfax SSCS Student Branch Chapter (Tunisia); Tsinghua University SSCS Student Branch Chapter (China)

2024 SSCS Chapter with Best Educational Program Award: National Engineering School of Sousse SSCS Student Branch Chapter (Tunisia)

2024 SSCS Chapter with Best Distinguished Lecturer Program Award: Central Texas SSCS/CASS Joint Chapter (USA); Oregon Section SSCS Chapter (USA)

IEEE Technical Field Awards

2025 IEEE Donald O. Pederson Award in Solid-State Circuits: Michiel S. J. Steyaert; KU Leuven, Leuven, Belgium. “For pioneering contributions to RF CMOS circuits and integrated power converters.”

2025 IEEE Fellows

Kamran Entesari, Texas A&M University, College Station, TX: “For contributions to millimeter-wave high-efficiency front ends and high-linearity mixer-first receivers.”

Brian Ginsburg, Texas Instruments, Dallas, TX: “For contributions to CMOS mm-wave radars.”

Chih-Ming Hung, National Chiao Tung University, Hsinchu, Taiwan: “For contributions to CMOS digitally-assisted RF designs.”

Patrick Mercier, University of California, San Diego, CA: “For contributions to low-power and energy-efficient circuits and systems.”

Kaushik Sengupta, Princeton University, Princeton, NJ: “For contributions to millimeter-wave and terahertz technology in silicon-based integrated circuits.”

Adrian J. Tang, NASA Jet Propulsion Laboratory/Caltech, Pasadena, CA: “For contributions to millimeter-wave systems-on-chip instruments for space science.”

Kea-Tiong Tang, National Tsing Hua University, Hsinchu, Taiwan: “For contributions to smart miniature electronic nose circuits and systems.”

Nan Sun, Tsinghua University, Beijing, China: “For contributions to noise-shaping ADCs and mixed-signal circuits.”

Aarno Pärssinen, University of Oulu, Oulu, Finland: “For contributions to direct conversion and digital RF transceivers and hardware-aware communications systems.”

Nils Pohl, Ruhr-Universität Bochum, Bochum, Germany: “For contributions to wideband and ultra-precise millimeter-wave radar sensors.”

Patrick Schaumont, Worcester Polytechnic Institute, Worcester, MA: “For contributions to the implementation and evaluation of hardware security.”

Makoto Takamiya, University of Tokyo, Tokyo, Japan: “For contributions to digitally controlled integrated power management circuits.”

Shouyi Yin, Tsinghua University, Beijing, China: “For contributions to energy-efficient AI chip architectures.”

1.3 AI Revolution Driven b
y Memory Technology Innovation
Jaihyuk Song, Corporate President & CTO, Device Solutions, Samsung Electronics, Hwaseong, South Korea

1.0 Introduction
The memory industry is facing unprecedented challenges as it enters the AI era. The “memory wall” phenomenon, which impedes the speed of system improvements and the evolution of AI algorithms, is intensifying under existing memory solutions and restricts the development of AI algorithms. As a result, memory companies have been preparing numerous solutions for several years, gradually applying them to systems, and ultimately providing various products and technologies to meet the future AI industry's power, performance, and capacity requirements.

1.1 Recent AI Trends
AI has been advancing through the application of various algorithms to achieve fast and accurate results. Generally, as computational complexity increases, the number of bits used
 to represent values and the parameter size (indicating more information) also increase, leading to higher-quality results. However, due to the large volume of data, obtaining results takes longer and requires more storage capacity. For example, AI based on transformers, which combine complex activation functions and attention layers, shows superior performance in various fields compared to the previously popular CNN/RNN/LSTM models, despite its high computational complexity. In particular, image classification and audio/speech recognition are also moving to transformers, which feature prominently in the widely acclaimed LLMs such as GPT, as shown in Figure 1.3.1. Additionally, LLMs demonstrate higher performance as larger parameter sizes are used, leading to increased system requirements.

1.2 Limitations of Current AI Systems
Traditional systems have increased their computing capability by adding mo
re superscalar resources for branch prediction and out-of-order processing in the front-end. They have also expanded the number of reorder buffers and SRAM-based caches, as well as processing units and operating frequencies, to enhance overall performance. Additionally, the I/O frequency and pin count of physical memory channels have been increased to ensure sufficient memory bandwidth and to supply data continuously to the computing resources. However, following the AI revolution, application characteristics have changed so significantly that performance improvements can no longer be expected through these methods alone. AI systems requiring more stream processing, such as matrix operations, need to allocate more silicon area to execution units to provide high computing performance. Traditional systems, however, have struggled with AI applications because they were designed primarily for general-purpose performance in workloads with many conditional situations, like the SPEC industry benchmarks. Recently, hardware/software methodologies based on GPUs/NPUs, which execute workloads in a SIMD/MIMD parallel manner and allocate most of the silicon area to processing units and SRAM buffers while applying a weak cache policy to preserve parallelism, have evolved and become mainstream thanks to their excellent computational power. AI algorithms have also adapted to these characteristics. Despite these trends, the emergence of LLMs, as described previously, has exposed inadequate growth in the bandwidth and capacity of existing memory solutions, resulting in decreased system utilization (the memory wall). Additionally, the shift toward a computing environment centered on GPUs/NPUs has impacted the power budget of data centers. Although GPUs/NPUs are more energy-efficient than CPUs, overall power consumption has increased significantly due to the substantial rise in total computation, as shown in Figure 1.3.2. Furthermore, while the trend of integrating more computing power into a single chip continues, power efficiency does not improve proportionally with the integration of transistors through process scaling [1]. As a result, the overall TDP (Thermal Design Power) of systems has significantly increased.

1.3 The Need for New Memory Solutions
Efforts are ongoing to optimize AI algorithms to align with hardware advancements and minimize the impact of memory limitations. To implement high-performance models as parameter sizes increase, optimiz
ations such as attention-layer optimization [2], weight compression, quantization and pruning, and knowledge distillation are employed to maximize system computational utilization and improve AI algorithm performance. Various software and algorithmic techniques [3, 4] are also utilized to enhance the utilization of computing elements and achieve higher-quality results. However, while these optimization methodologies can provide additional relief to memory performance, they ultimately require improved memory solutions to shift the underlying trade-off curve. For the reasons mentioned above, implementing efficient AI systems using commodity memory solutions chosen on quantity and cost alone is challenging. It is essential to quickly configure and provide memory solutions that emphasize specific trade-off points between bandwidth, capacity, and power efficiency, tailored to the characteristics of each system, to secure competitiveness in the AI industry. In this paper, we introduce the limitations of memory systems in the AI era, as well as the solutions and technologies to overcome them. Section 2 explains the limitations of main-memory systems in the current AI environment, including storage constraints and the diversification of memory requirements. Sections 3, 4, and 5 discuss memory solutions that maximize the benefits of bandwidth, capacity, and energy efficiency to address challenges in various system environments. Section 6 introduces technological efforts in DRAM and NAND flash processes and packaging, and software advancements, to further enhance these solutions.

2. Challenges and Opportunities for AI Memory
As previously mentioned, the AI era demands solutions that enhance the attributes suited to different application environments. In this Section, we divide these environments into server/cloud and edge/mobile
categories and discuss the limitations of memory solutions in each context, as well as the challenges in storage environments and evolving memory requirements.

2.1 Challenges for Server/Cloud HPC Systems
In large-scale server and cloud environments, the challenges associated with running AI, especially LLMs, can be summarized as follows. The first challenge is the limitation of memory bandwidth and capacity. To deliver higher-quality results, the parameter sizes of LLMs are increasing, and the performance of GPUs/NPUs has improved significantly to handle these larger models. Consequently, the bandwidth requirements for AI training and inference have also increased. However, the gap between the required bandwidth and the available memory bandwidth is widening, as illustrated in Figure 1.3.3. During LLM training, which is primarily conducted in server/cloud environments, the system must load the entire model into memory. Additionally, each layer's gradient values need to be stored in full precision (FP32), and batch processing is employed to maximize the utilization of the xPUs. As a result, in environments training models with around 6 billion parameters, the required memory capacity increases by more than five times compared to inference, as depicted in Figure 1.3.4. In LLM inference, where response time is crucial, increasing bandwidth is important to minimize latency. For example, in the case of Llama-2 with 13 billion parameters, Figure 1.3.5 shows that higher bandwidth (represented by a steeper slope in the roofline analysis) results in better performance (OPS). Additionally, the right graph in Figure 1.3.5 shows that as memory bandwidth on the x-axis increases, the decode time on the y-axis, which represents AI inference latency, decreases.

The second challenge in AI server/cloud environments is power-related issues.
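Before turning to the power challenge, the bandwidth-roofline argument above can be sketched numerically. This is an illustrative model only: the 13-billion-parameter count echoes the Llama-2 example, but the accelerator's peak compute, bandwidth values, and arithmetic intensity below are assumptions, not figures from this paper.

```python
def attainable_tops(peak_tops, mem_bw_tbps, ops_per_byte):
    # Roofline: attainable throughput is capped either by the compute peak
    # or by memory bandwidth times arithmetic intensity (the "slope").
    return min(peak_tops, mem_bw_tbps * ops_per_byte)

def decode_ms_per_token(param_count, bytes_per_param, mem_bw_gbps):
    # Batch-1 LLM decode is memory-bound: each token streams all weights
    # once, so per-token latency is roughly model size over bandwidth.
    model_gb = param_count * bytes_per_param / 1e9
    return 1e3 * model_gb / mem_bw_gbps

# Hypothetical accelerator: 400 TOPS peak, 3.35 TB/s of HBM bandwidth.
# Decode's low arithmetic intensity (~2 ops/byte) lands far below the peak.
bound = attainable_tops(400, 3.35, 2)        # memory-limited throughput

# 13B-parameter model in FP16 (26GB of weights):
slow = decode_ms_per_token(13e9, 2, 1000)    # at 1 TB/s
fast = decode_ms_per_token(13e9, 2, 2000)    # at 2 TB/s
```

Doubling the bandwidth halves the per-token decode time, which is the slope effect Figure 1.3.5 illustrates.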
Power was not a significant concern in the past; however, with the increased demand for computing resources in AI server/cloud systems, the total Thermal Design Power (TDP) of the system has risen significantly. As a result, power reduction in memory has become essential. To meet the demands for increased memory bandwidth and capacity, I/O bandwidth, channels, and ranks (stacks) have been expanded. However, since off-chip memory channels have not undergone significant physical changes, the energy consumed by data transmission does not decrease dramatically with process scaling [5]. Furthermore, achieving higher bandwidth requires circuit blocks that consume additional power, such as equalizers, Pulse-Amplitude Modulation (PAM), and termination circuits, which accelerates the energy-consumption trend. As noted earlier, the increasing parameter sizes of models lead to exponential growth in power consumption during AI training, as shown in Figure 1.3.6. To mitigate this trend, solutions with lower energy consumption per bit (pJ/bit) are being introduced in cloud/server systems. This underscores the critical demand for energy-efficient improvements in memory solutions.
2.2 Challenges for Mobile/Client Systems
Driven by increasing demand for data privacy and personalization, on-device AI is becoming widely adopted in various PC and mobile devices. Mobile applications such as image generation and real-time translation require quick response times and high energy efficiency, while PC applications, like MS Copilot, involve more complex tasks such as document editing, software coding, and various forms of personal assistance. To meet these growing needs, the latest mobile devices and PCs are introducing significant improvements in Neural Processing Unit (NPU) performance. Consequently, we expect memory bandwidth and capacity requirements to scale accordingly [6, 7], at a rate similar to that of server-grade systems. Indeed, we observe that on-device LLM performance scales with memory bandwidth, which thereby determines the maximum model size. Our analysis of LLM applications on the latest smartphones shows a memory footprint of around 1GB out of the available 12–16GB DRAM capacity, translating to a model size of only 1–3 billion parameters [8, 9, 10]. This suggests that the LLM size was set based on the LPDDR5X data rate (7.5–8.5Gb/s) to achieve performance of around 20–30 tokens per second. Notabl
y, scaling the parameter count to 7 billion [11] reduces performance to 11 tokens per second, as the memory footprint increases to 3.8GB and average DRAM bandwidth utilization rises to 75–90%. Figure 1.3.7 illustrates our projection of DRAM bandwidth's impact on LLM application performance, assuming DRAM bandwidth is the sole bottleneck. We project the maximum LLM size and the resulting tokens-per-second performance, with the average DRAM bandwidth utilization set to 80%. For example, a 1GB LLM model with 1–3 billion parameters achieves 20–30 tokens per second, assuming current LPDDR5X running at 7.5–8.5Gb/s (a). If we scale the model size to 7–8 billion parameters (b1), 20 tokens per second may be achieved with 10.7Gb/s LPDDR5X; achieving 25 tokens per second further necessitates a transition to LPDDR6 memory running at 10.7–11.8Gb/s (b2). Enabling an 8GB LLM requires at least six channels of LPDDR6 (c), suggesting the need for new memory technologies to scale beyond current capabilities. As our projections suggest, DRAM bandwidth and capacity determine the LLM model size and the estimated tokens-per-second performance, ultimately influencing the functionality and real-time user experience of the application.
With on-device AI applications expected to utilize DRAM bandwidth more extensively (around 80%) than gaming applications (around 60%), techniques to improve power consumption may become necessary in the near future.

2.3 Storage Challenges
With the evolution of AI computing generations (for example, NVIDIA A100 to H100 to GB200), the performance requirements for shared storage are expected to double with each generation. This trend is evident in NVIDIA's latest H100 DGX SuperPod system, which comprises eight DGX compute nodes sharing a common storage system. The pattern is projected to continue with the next generation, GB200, which is expected to deliver a 2.3-fold increase in compute performance. As indicated by Amdahl's Law [12], merely increasing computing performance has its limitations; shared storage performance must also improve in tandem to effectively scale overall system performance. The storage usage patterns of training and inference AI applications differ significantly. In the case of training, shared storage is used to store training inputs and snapshots generated during the training process.
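Amdahl's Law makes the storage-scaling point concrete. A small sketch, assuming a hypothetical job that spends 20% of its time in shared-storage I/O (the 20% split is an assumption; the 2.3x factor is the generational compute step cited above):

```python
def amdahl_speedup(accelerated_fraction, factor):
    # Amdahl's Law: the un-accelerated fraction bounds overall speedup.
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# Compute gets 2.3x faster but storage I/O (20% of runtime here) stays fixed:
compute_only = amdahl_speedup(0.8, 2.3)   # well short of 2.3x end to end
```

Only when storage throughput scales alongside compute does the system approach the full 2.3x generational gain.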
For compute-intensive AI applications, optimizing GPU utilization is crucial. Given that SSD I/O latency is approximately 1000 times greater than memory latency, high SSD performance is essential to minimize GPU wait times during data preprocessing and snapshot-storage operations. Inference applications can be classified into LLM, image, and video applications. In recent LLM applications, Retrieval-Augmented Generation (RAG) technology is employed to prevent inference errors (hallucinations). RAG-based LLM systems, as shown in Figure 1.3.8, extract embedding vectors from text documents and store them in a vector database (DB). These embedding vectors, which represent multidimensional information about the text documents, grow in size as the dimensionality increases. Consequently, vector-DB storage requires high-capacity SSDs, as the size of the stored data can be several times larger than the original data.
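The storage blow-up is easy to quantify. A sketch with assumed corpus and embedding parameters (the one-million-chunk corpus and the 1,536-dimension width are hypothetical, merely typical values, and are not taken from the text):

```python
def vector_db_bytes(num_chunks, dims, bytes_per_dim=4):
    # FP32 embeddings: each text chunk becomes a dims-dimensional vector.
    return num_chunks * dims * bytes_per_dim

# Assume 1 GB of raw text split into 1 million chunks, each embedded
# at 1,536 dimensions:
raw_gb = 1.0
index_gb = vector_db_bytes(1_000_000, 1536) / 1e9   # ~6.1 GB of vectors
```

Under these assumptions the index alone is several times the source text, before any metadata or replication, which is why vector-DB tiers lean on high-capacity SSDs.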
Thus, with the growth in compute-node performance and the utilization of shared storage, the bandwidth and capacity of storage systems must continually increase to meet evolving demands. In the overall AI processing pipeline, the computing stages and the data input/output for each stage can be succinctly illustrated as shown in Figure 1.3.9. The data sizes listed are examples and can vary based on the processing capacity of the AI system. Each pipeline stage has distinct data-access patterns and speed requirements:
Raw data reading and preprocessing: this stage involves reading large amounts of data at once, making read throughput critical. High-capacity storage is required to handle large raw data sizes effectively.
Vector-DB search: this stage frequently requires random reads of relatively small amounts of data, where low latency is crucial to ensure efficient data retrieval.
Check-pointing: during training, checkpoints are written intermittently every few iterations. This stage requires high write throughput to handle these sporadic but high-speed write operations effectively.
To prevent I/O bottlenecks and optimize the overall system from a comprehensive perspective, efforts are underway to further segment the data pipeline [13, 14]. Enhancing the performance of the storage devices themselves is essential for continuously scaling up AI processing capabilities.

2.4 Diversification of Memory Requirements
As repeatedly emphasized, memory requirements in the market have diversified to such an extent that
 traditional classification criteria have become largely obsolete. In cloud environments, low-power operation has become increasingly important due to power-budget constraints. Conversely, in mobile and client environments, ensuring adequate bandwidth to support LLM quality and responsiveness (latency) has become critical. As a result, there is a growing trend to employ different memory solutions based on application characteristics. For instance, as illustrated in Figure 1.3.10, LPDDR-based LPCAMM (Low Power Compression Attached Memory Module) may be used in servers or client devices, while high-performance and power-efficient memories such as LPW are being utilized at the edge. This reflects a blend of memory solutions tailored to the specific needs of various environments and applications. In this evolving landscape, focusing solely on single-dimensional trade-offs in memory solutions may hinder competitiveness and lead to market obsolescence. A more nuanced approach that balances or maximizes the characteristics of DRAM and NAND flash is necessary. The memory market is transitioning into an era of high-value mass customization, in which close collaboration between cloud, client, and mobile host developers and memory developers becomes crucial. Moreover, in-depth discussions of requirements and trade-offs, together with integrated co-development methodologies, ensure that memory solutions are effectively tailored to the diverse and evolving needs of the m
arket.

3. High-Performance Memory Solutions
In the era of AI, high performance is the most critical trade-off factor for memory. Enhancing system competitiveness necessitates providing higher bandwidth to accelerate AI workloads. Two notable memory solutions addressing this need are High-Bandwidth Memory (HBM) and Processing-In-Memory (PIM) technology, both of which deliver superior internal bandwidth.

3.1 HBM DRAM
As the flagship memory solution of the AI era, HBM has been evolving with each generation to offer increasingly higher bandwidth and greater capacity. HBM technology has seen continuous advanceme
nts from HBM2 to HBM3E, with bandwidth expanding by approximately 1.5 times per generation on average [15, 16, 17]. Notably, HBM4 has achieved a bandwidth increase of up to 2 times compared to HBM3E. To support higher speeds, the application range of High-K Metal Gate (HKMG) technology has expanded from just the base die to include the core die. The total number of channels per stack in HBM4 has doubled compared to the HBM3 generation, and the I/O count has also increased twofold (DQ: from 1K to 2K). To accommodate these changes and reduce bump-pitch area and operating power, HBM3E's base die, which previously used DRAM processes, has now incorporated logic processes using Samsung Foundry's FinFET technology. As illustrated in Figure 1.3.11, HBM is set to advance through successive generations, continuously increasing both bandwidth and capacity. To achieve these improvements, the number of channels and I/O connections has risen, leading to increased power consumption. This rise in power consumption generates additional heat, which can constrain performance. Consequently, each generation has delivered an approximately 10% improvement in energy efficiency through process and internal-circuit enhancements, and HBM4 is expected to offer over a 40% improvement in energy efficiency compared to HBM3E. The total capacity of HBM stacks has been increasing by 1.5 to 2 times per generation. This growth is facilitated not only by improvements in process technology but also by increasing the number of core dies. HBM3 and HBM3E currently support up to 12-high (H, the number of stacked core dies) configurations, while HBM4 is anticipated to support 16H configurations using Hybrid Copper Bonding (HCB) technology, as discussed in Section 6. To address potential future limitations in HBM stacking and performance enhancement, advancements such as Co-Packaged Optics (CPO) are being explored to enable HBM modules with enhanced capabilities.
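The generational bandwidth step is simple width-times-rate arithmetic. The per-pin data rate below is an assumed, representative figure; the 1K-to-2K DQ doubling is the change described above:

```python
def stack_bandwidth_gb_s(dq_count, gb_s_per_pin):
    # Per-stack bandwidth = I/O width (bits) x per-pin data rate, in bytes/s.
    return dq_count * gb_s_per_pin / 8

rate = 9.6                                   # assumed Gb/s per pin
hbm3e = stack_bandwidth_gb_s(1024, rate)     # ~1.2 TB/s per stack
hbm4 = stack_bandwidth_gb_s(2048, rate)      # doubling DQ doubles bandwidth
```

Holding the pin rate constant, doubling the I/O count alone accounts for the up-to-2x step; any per-pin speed gain multiplies on top, at the cost of the interface power discussed earlier.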
3.2 PIM Technology
The concept of PIM, which aims to maximize the utilization of DRAM's internal bus, offers a unique solution for computing environments facing bandwidth bottlenecks. By leveraging bank parallelism, PIM can potentially enhance performance by a factor equal to the number of banks, without altering the external physical channels. To overcome the limitations of existing DRAM channels, which cannot provide higher bandwidth, Arithmetic Logic Units (ALUs) and Floating-Point Units (FPUs) are integrated near the DRAM cell banks, as shown in Figure 1.3.12. This approach allows data that would otherwise need to be moved to the host for computation to be processed directly near the DRAM cells, with only the reduced results sent to the host. Although this method is limited to relatively simple operations, it has shown potential for applications including LLMs in AI, and efforts are underway to commercialize the technology through research and development.
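A toy model of the near-bank reduction just described: each bank computes a partial result over its own slice of the weights, and only the per-bank partial sums cross the external channel. The bank count and vector sizes are illustrative:

```python
def pim_dot(weights, activations, num_banks=16):
    # Near-bank PIM sketch: every bank reduces its local slice in place
    # (conceptually in parallel), then ships one partial sum to the host.
    n = len(weights)
    step = n // num_banks
    partials = []
    for bank in range(num_banks):
        lo, hi = bank * step, (bank + 1) * step
        partials.append(sum(w * a for w, a in zip(weights[lo:hi],
                                                  activations[lo:hi])))
    values_on_channel = num_banks        # vs. n for a conventional read-out
    return sum(partials), values_on_channel

w = [1.0] * 64
x = [2.0] * 64
result, moved = pim_dot(w, x)            # result 128.0, only 16 values moved
```

Host traffic shrinks from the full operand size to one partial per bank, which is where the bandwidth multiplication and the reported energy savings come from.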
Using this approach, peak bandwidth can be increased by up to 8 times compared to conventional DRAM. Existing PIM systems have demonstrated that system-level performance improvements vary by application but generally reach around 3 to 4 times better performance (latency reduction) and 70% energy savings. The PIM block, positioned near the DRAM cells, allows for significant performance improvements and area efficiency despite the area overhead of implementing PIM on the DRAM die. Additionally, by integrating the circuitry close to the DRAM and operating it passively, this approach enhances the execution engine without dealing with the complexities of a general-purpose CPU/GPU front-end. HBM-based PIM solutions, such as HBM-PIM [18, 19], have already been developed and presented. System-level evaluations using the FPGA U280 [20] and the GPU MI100 [21] have shown, as illustrated in Figure 1.3.13, that a cluster of 32 MI100s targeting the T5-MoE model AI application achieved a 2.55-times performance improvement and a 2.67-times energy-efficiency improvement. Additionally, AMD reported benchmark results utilizing HBM-PIM at the recent ISSCC, demonstrating an 85% reduction in power consumption [22]. Based on the successful evaluation of PIM technology, we are planning to develop LPDDR5X-based PIM solutions to support AI applications across diverse environments [23]. We are also preparing to expand these solutions to server-grade applications and to collaborate with industry partners on JEDEC standardization and so
208、lution implementation.4.Energy Efficient Memory Solutions 4.1 Custom HBM(cHBM)Custom High-Bandwidth Memory(cHBM)involves preserving the core-die with DRAM cells from standard HBM generations while tailoring the base-die to meet specific client requirements.The base-die is integral to data path funct
209、ionality,interfacing with the core-die via TSV(Through-Silicon Vias)and PHY circuits for unified access.After assembling the HBM stack,the base-die includes DFT(Design for Test)blocks such as DA(Direct Access)circuit for the observability into the HBM stack and MBIST(Memory Built-In Self-Test)circui
210、ts for the core-dies.Historically,base-dies consumed substantial power due to numerous interconnections and buffers used for HBM PHY(I/O)and core-die connections via TSVs.However,with the introduction of HBM4,power consumption has been reduced through the adoption of advanced logic processes(Samsung
211、 Foundry s FinFET process).This advancement enables the integration of additional logic into the base-die to enhance usability and value of HBM at the request of customers.For instance,as shown in Figure 1.3.14,simplifying connections to the host(computing die)through D2D(Die-to-Die)connections and
incorporating some host circuits onto the base-die allows for more efficient use of silicon area on both the host and the base-die, enabling the addition of other logic to support extra functionality. As depicted in Figure 1.3.15, Standard HBM4 (sHBM4) connections to the host have increased to 2,048 I/O signals, double that of previous generations, placing a substantial burden on the interposer. To address this, serialized D2D connections, such as those used in UCIe-A, have been implemented. These connections reduce area by 60% compared to a traditional HBM4 PHY, shorten the channel length from 6mm to 2mm, and lower I/O current by over 37%. Additionally, the unused area on the base-die can be used to integrate the HBM controller for the D2D host, which effectively reduces the power consumed by the HBM PHY and TSV routing relative to JEDEC HBM4. However, integrating more complex logic onto the base-die and handling
higher bandwidth per pin for D2D connections introduces new challenges, including increased power density and thermal issues, which must be managed effectively.

4.2 LLW (Low-Latency Wide-I/O) and LPW (LPDDR Wide-I/O)
LLW is designed for applications that require both low latency and buffer memory, such as Extended Reality (XR) applications (AR/VR). The technology is being actively developed and promoted for uses beyond its initial scope, including on-device AI-inference memory and SRAM cache. LLW aims to overcome cache-capacity limitations by leveraging DRAM cache and tiered-memory solutions, providing faster access times and better performance in latency-sensitive applications. LPW is tailored for on-device AI applications, offering a high-performance, low-power DRAM solution. As depicted in Figure 1.3.16, LPW is specifically designed to support always-on AI agents and to improve LLM processing speeds through high bandwidth. LPW offers superior energy efficiency (1.9 pJ/bit, compared to 3.5 pJ/bit for LPDDR5X) while providing high bandwidth (204.8 GB/s), and it supports high-capacity package solutions (16GB+). LPW is currently being developed in collaboration with System-on-Chip (SoC) manufacturers and is undergoing JEDEC standardization. LPW-equipped mobile products are anticipated by 2028, promising significant advances in mobile AI capability and performance.

5. Extended Memory Solution for Capacity
In traditional systems, system RAS (Reliability, Availability, Serviceability) has long been important, and its importance is expected to increase even further in AI systems. As multiple systems process parameters (particularly for LLMs) in parallel over extended periods, there is a growing need to ensure their reliable operation. Solutions are also required for environments in which LLM training involves multiple systems sharing a common memory pool. Large-scale memory solutions increasingly use PCIe interfaces of growing bandwidth and have predominantly relied on NAND
flash (SSD) using the NVMe protocol for page transfer. Recently, there has been a shift toward CXL interfaces, which offer cache-line access; this shift includes NAND flash and DRAM solutions that support Type 3 interfaces based on the CXL protocol.

5.1 SSD
The performance and capacity requirements of AI applications are expected to drive the expansion of SSD performance and capacity, as illustrated in Figure 1.3.17. Scaling both requires advances across several technologies, including greater NAND flash integration based on QLC (Quad-Level Cell), higher die counts per package, and support for PCIe Gen6/Gen7. Challenges such as thermal issues and signal integrity, which arise from increasing integration within constrained form factors, must also be addressed, and SSD hardware and software architectures must be revisited to improve performance and capacity under limited power budgets. Ensuring data security and system availability, alongside performance and capacity, is crucial for AI applications. The following sections examine technologies for high-capacity, high-performance I/O as well as security and availability, and discuss future directions in these areas.

5.1.1 Technical Challenges and Advancements in High-Capacity SSDs
In implementing Ultra-High-Capacity (UHC) SSDs for AI applications, efforts are being made to maximize mounting-space efficiency while addressing the challenges that such high-density designs introduce. Enhancing space efficiency requires optimizing the trade-off between NAND flash integration density (dies per package) and the mounting space within the SSD. For example, as NAND capacity increases, both chip size and package size grow, which can lead to inefficient use of SSD space and a potential reduction in final SSD capacity; a holistic approach to spatial design is therefore crucial. In UHC SSDs, the increased data capacity that comes with high integration raises the burden of ensuring data reliability and introduces risks such as reduced Power-Loss-Protection (PLP) capacity and thermal-path limitations, which may force a reduction in performance. Research efforts addressing these issues include advanced die-level RAID, die-failure recovery techniques, minimization of PLP dump energy, high-capacity polymer-tantalum capacitors, convection air-path formation, improved heat-dissipation structures based on mid-plate designs, and techniques to block electrostatic-discharge (ESD) noise from the external server environment (for example, DRAM walls and conformally shielded DRAM).

5.1.2 Technical Challenges and Advancements for High-Performance I/O
As multiple NAND chips are connected to a channel to build ultra-high-capacity SSDs, signal degradation occurs due to stub effects. Minimizing stub length with DQ-swap functionality can improve PCB-level signal-integrity (SI) characteristics, as shown in Figure 1.3.18.
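To put the stub-effect concern in perspective, a back-of-envelope sketch can relate stub length to the unit interval (UI) on a NAND bus. The propagation delay (~6.9 ps/mm, typical for an FR4 microstrip) and the 3600 MT/s data rate below are illustrative assumptions, not figures from this paper:

```python
# Back-of-envelope: when does a reflection off an unterminated stub land,
# relative to the unit interval (UI)?  The propagation delay and data rate
# are assumed illustrative values, not numbers from the paper.

PROP_DELAY_PS_PER_MM = 6.9   # assumed FR4 microstrip propagation delay

def stub_round_trip_ps(stub_len_mm: float) -> float:
    """Round-trip delay of a reflection off an unterminated stub, in ps."""
    return 2 * stub_len_mm * PROP_DELAY_PS_PER_MM

def ui_ps(data_rate_mtps: float) -> float:
    """Unit interval in picoseconds for a data rate given in MT/s."""
    return 1e6 / data_rate_mtps

for stub_mm in (10.0, 3.0):  # a long stub vs. a DQ-swap-shortened stub (assumed)
    frac = stub_round_trip_ps(stub_mm) / ui_ps(3600)
    print(f"{stub_mm:4.1f} mm stub: reflection lands {frac:.2f} UI after the edge")
```

Shortening the stub moves the reflection earlier within each bit, which is why minimizing stub length (for example, via DQ swap) helps keep the eye open as per-pin rates rise.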
To address crosstalk in the NAND interface, DQ shielding will become essential. Advanced equalizers, such as continuous-time linear equalizers and decision-feedback equalizers [24], along with training/re-training technologies and I/O power-reduction techniques, will remain ongoing challenges. As non-volatile memory capacity continues to grow, the operating speed of the high-speed serial PCI Express (PCIe) interface is also advancing. With the introduction of PCIe Gen 6.0, PAM-4 signaling will be adopted. Compared with the NRZ (Non-Return-to-Zero) signaling of previous generations, PAM-4 significantly reduces eye height, as shown in Figure 1.3.19, resulting in a substantial decrease in SI margin. Minimizing crosstalk between lanes and power-noise-induced jitter is crucial for reliable operation. The design complexity of high-capacity SSDs makes them susceptible to various power-noise sources, including PMIC (Power Management Integrated Circuit) switching noise and switching noise from concurrent high-capacity IC operation. As illustrated in Figure 1.3.20, power noise from the various active ICs and PMICs can induce additional jitter, known as Power-Supply-Induced Jitter (PSIJ). Increased I/O speeds exacerbate power-noise issues, which propagate through the hierarchical Power Distribution Network (PDN) of the SSD. Designing an efficient PDN is particularly challenging in UHC SSDs, and recent research is actively optimizing PDN design for power integrity in UHC SSDs using advanced techniques such as deep reinforcement learning [25].

ISSCC 2025 / February 17, 2025 / 10:45 AM

5.1.3 Data Security and Availability Enhancement Technologies
The primary advantage of SSD solutions lies in their capacity; however, security and availability are also critical for AI systems. To strengthen SSD security, Post-Quantum Cryptography (PQC) is employed, providing resistance against decryption attempts by quantum computers. SSDs also support the Security Protocol and Data Model (SPDM) version 1.2 and offer integrity and encryption features over PCIe. To improve the availability of AI systems, several additional capabilities are implemented: dual-port support; fail-in-place operation [26], which allows an SSD to keep functioning even if some NAND dies fail; and predictive methods for SSD failure based on telemetry data [27].
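As a minimal illustration of the idea behind telemetry-based failure prediction, the sketch below scores a drive from a few telemetry attributes and flags it for proactive data migration before it fails in place. The attribute names, weights, and thresholds are invented for this example; production predictors are models trained on fleet telemetry, as in [27]:

```python
# Toy telemetry-driven SSD health scoring.  All attribute names, weights,
# and limits below are invented for illustration only.

WEIGHTS = {                      # invented per-attribute importance
    "media_errors_per_tb": 0.5,
    "percent_life_used": 0.3,
    "throttle_events_per_day": 0.2,
}
LIMITS = {                       # invented "alarm" scale per attribute
    "media_errors_per_tb": 100.0,
    "percent_life_used": 100.0,
    "throttle_events_per_day": 50.0,
}

def risk_score(telemetry: dict) -> float:
    """Weighted sum of attributes normalized to their alarm scale (0..~1)."""
    return sum(
        WEIGHTS[k] * min(telemetry.get(k, 0.0) / LIMITS[k], 1.0)
        for k in WEIGHTS
    )

def should_migrate(telemetry: dict, threshold: float = 0.6) -> bool:
    """Flag a drive for proactive data migration before it fails in place."""
    return risk_score(telemetry) >= threshold

healthy = {"media_errors_per_tb": 2, "percent_life_used": 30, "throttle_events_per_day": 1}
worn    = {"media_errors_per_tb": 80, "percent_life_used": 95, "throttle_events_per_day": 20}
print(should_migrate(healthy), should_migrate(worn))  # → False True
```

Even this crude scoring shows the operational pattern: telemetry is summarized into a risk signal, and drives crossing a threshold are drained while still readable, complementing fail-in-place operation.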
5.2 CXL-Based Solutions
CXL-based solutions are xPU-agnostic and memory-media-agnostic, allowing memory to be expanded independently of the CPU, GPU, or memory generation. This approach aims to reduce the overall Total Cost of Ownership (TCO) of systems and is expected to evolve into memory-pooling solutions. The development roadmap for each CXL-based solution is illustrated in Figure 1.3.21, and their positions within the system are shown in Figure 1.3.22. Detailed explanations are provided below.
CMM-D: This memory-expansion solution uses the CXL interface to significantly increase the system's DRAM capacity by managing memory attached through a CXL host chip in the E3.S form factor. A single CMM-D can accommodate up to 80 DRAM components, supporting a maximum of 1TB of DRAM.
CMM-H: This hybrid memory expander integrates NAND flash with DRAM and comes in two configurations. The CMM-H TM (Tiered Memory) supports up to 4TB of NAND flash capacity and uses DRAM as a cache to deliver both high performance and large capacity. The CMM-H PM (Persistent Memory) supports 32GB of DRAM and adds features such as battery backup and Global Persistent Flush (GPF); GPF flushes DRAM data to NAND flash in the event of a power loss and restores it to DRAM for continued use. In GPU servers for AI applications, CMM-H can streamline the data path between DRAM and NAND, enhancing GPU utilization and overall system performance by simplifying data management.
CMM-B: This memory-pooling appliance for rack computing consists of eight E3.S CMM-D units (PCIe Gen5). It enables disaggregated memory allocation, managing capacity as a pool that can be accessed remotely and shared across multiple servers over the network. CMM-B effectively separates the available computing and memory resources, facilitating independent resource allocation within rack clusters. Through composable memory orchestration, it allows memory pools to be shared across hosts and, as illustrated in Figure 1.3.23, combines available memory to create and allocate larger pools where needed.

6. Expanding Trade-off Curve: Scaling
To meet the growing demands of AI systems, memory solutions must address the trade-offs between performance, power, and capacity, which requires advances in memory design, processes, and component technologies. Memory technology is approaching the limits of independent improvement along each trade-off axis, making it increasingly difficult to enhance all aspects simultaneously as effectively as in the past. Despite these limits, ongoing research and development continue to find improvements through a variety of technological advances. Efforts are also being made to fully exploit memory hardware by extending into the software ecosystem; optimizing how software uses memory complements advances in hardware, potentially
yielding better overall system performance and efficiency. By balancing the trade-offs among performance, power, and capacity, and by integrating improvements across both hardware and software, it is possible to push the boundaries of what memory systems can achieve for AI applications and beyond.

6.1 DRAM Scaling Technology
Cutting-edge scaling technologies, such as 3D transistors and EUV lithography, along with a range of high-speed IPs and architectures, are under active development. However, the most significant factor for competitiveness remains process migration: advances to smaller-geometry processes are the foundation for high performance, large capacity, and low power consumption. Samsung has therefore established a roadmap toward sub-10nm technology and is actively pursuing research and development in this area. The key technologies in these advanced processes fall into four main areas: patterning, cell transistor, cell capacitor, and peri/core transistor, as shown in Figure 1.3.24. The traditional 6F² DRAM cell structure faces several issues from a transistor perspective: cell-reliability problems such as row hammering, caused by the buried wordline passing adjacent to the transistor's active region; increased parasitic capacitance caused by the proximity of the bitline (BL) and bitline contact (BC); and etching, bending, and leakage challenges due to the growing aspect ratio of the capacitor. To address these issues, new structures are being researched and developed. Prominent examples include the 4F² DRAM cell using Vertical Channel Transistors (VCT), shown in Figure 1.3.25 [28]; Vertically Stacked DRAM (VSDRAM), which lays the existing cells horizontally and stacks them in 3D, shown in Figure 1.3.26 [29]; and new channel materials such as IGZO (Indium Gallium Zinc Oxide) transistors, which minimize leakage current, shown in Figure 1.3.27 [28]. These advances are expected to enable continued scaling of DRAM nodes down to the sub-10nm range. Beyond cell scaling, improving the performance of DRAM peripheral transistors is essential to meet rising bandwidth demands. HKMG (High-k Metal Gate) transistors are in use today, but future devices will adopt FinFET technology, as shown in Figure 1.3.28, ensuring the low-power, high-speed operation needed by next-generation DRAM.

6.2 NAND Scaling Technology
Advances are being made not only in DRAM but also in NAND flash, to develop the high-performance, high-capacity NAND suitable for the AI
era. As shown in Figures 1.3.29 and 1.3.30, technologies such as BVNAND (Bonding Vertical NAND [30, 31]) and Multi-BVNAND are under active development. BVNAND addresses the thermal-stress issues of the COP (Cell-over-Peri) structure by using C2C hybrid bonding: the cell and peripheral (Peri) layers are fabricated separately, allowing advanced processes to be applied specifically to the Peri layer and thus overcoming the I/O speed limits imposed by cell-process constraints. To reach more than 1,000 WL (wordline) layers, Multi-BVNAND technology is being developed to overcome vertical-stack and Peri-scaling constraints.

6.3 Advanced Package Solution
Technological advances aimed at strengthening DRAM competitiveness in the AI era are also being applied to DRAM packaging. Techniques such as TCB (Thermal Compression Bonding), HCB (Hybrid Copper Bonding), and VIMS (Vertical Interconnect Multichip Stack), shown in Figure 1.3.31, are being employed in stacked-DRAM domains such as HBM and LPDDR. These technologies dramatically shorten die-to-die datapath distances compared with traditional methods, lowering the power consumed by die connections, reducing overall package thickness while achieving higher capacity, and increasing datapath density. Future advances will target higher bandwidth, lower power, and greater capacity, for use in technologies such as HBM4E, LP6, and LPW. Methods for connecting AI accelerators
and DRAM are evolving in multiple directions. Figure 1.3.32 illustrates a packaging roadmap that uses a 2.5D structure with an interposer to place the host and memory on the same plane. The plan is to replace traditional, high-cost silicon-interposer connection layers (I-Cube S) with I-Cube R/E technology, which uses panel-level RDL (Redistribution Layer) interposers and silicon bridges. This change aims to improve signal integrity (SI) and enable large-area chiplet systems to be implemented as cost-effective single components. Successful implementation, however, will require close technical collaboration with GPU/NPU manufacturers, particularly on die floorplanning, since the GPU/NPU and DRAM must function as a unified product.

6.4 A New Memory Hierarchy with Software Support
As emphasized throughout this paper, every type of memory inherently involves trade-offs. While it is still possible to develop memory systems that improve all properties simultaneously, progress has become very slow and is approaching its limits. For this reason, the traditional von Neumann architecture, built around a single DRAM/storage structure, needs to evolve. The memory systems discussed in Sections 3, 4, and 5 of this paper, which enhance performance, power efficiency, and capacity, are designed to perform their functions effectively within a segmented memory hierarchy. These advances are expected to provide a more energy-efficient computing environment capable of accommodating both current AI algorithms and future applications. As illustrated in Figure 1.3.33, a segmented memory structure makes it imperative for operating systems and system software to undergo structural changes to support the new hierarchy. In particular, technologies such as PIM and computational storage require modifications to the operating system and compiler architecture to enable computation within memory.
In the AI era, memory and storage vendors must go beyond providing new hardware; they must also offer integrated software solutions that exploit this hardware effectively. To this end, Samsung is actively collaborating with the open-source community, operating-system vendors, and standardization organizations (for example, the Open Compute Project and SNIA) to build an ecosystem for the new memory-storage hierarchy. In computational storage, for instance, SNIA has defined a Computational Storage API that standardizes an interface for applications to offload computation to SSDs [32]. Additionally, to fully exploit the bandwidth of NVMe SSDs, Samsung has provided an open-source system-software stack, contributing to the development of the NVMe ecosystem [33]. These efforts aim to maximize the benefits of memory and storage solutions across the different layers of AI systems and to provide the best possible environment.

7. Conclusion
Over the past 70 to 80 years, we have faced numerous challenges while envisioning the AI era, and we have recently achieved remarkable advances. To accelerate this progress further, it is essential to integrate collaborative efforts not only from hardware providers, such as those of CPUs, GPUs, memory, and storage, but also from the software architects involved in design and optimization. To advance the semiconductor industry, we must provide comprehensive solutions spanning everything from basic die design and manufacturing to packaging. Leveraging partner designs and services is also crucial to meeting the diverse needs of our customers through collaboration with various partners. This close cooperation will foster innovation and growth throughout the industry, continually driving progress toward our shared goal of creating
a better future. We will listen to the voices of our customers and the industry, and we will continue to challenge ourselves to positively impact human life.

References:
[1] I. Kang, "The Art of Scaling: Distributed and Connected to Sustain the Golden Age of Computation," IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, pp. 25-31, 2022.
[2] J. Ainslie, "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2023.
[3] G.-I. Yu, "Orca: A Distributed Serving System for Transformer-Based Generative Models," in 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), Carlsbad, CA, 2022.
[4] Y. Leviathan, "Fast Inference from Transformers via Speculative Decoding," in Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
[5] N. P. Jouppi, "Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product," in ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021.
[6] B. Kim, "The Breakthrough Memory Solutions for Improved Performance on LLM Inference," IEEE Micro, vol. 44, no. 3, pp. 40-48, 2024.
[7] J. Lin, "AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration," Proceedings of Machine Learning and Systems, vol. 6, pp. 87-100, 2024.
[8] T. Gunter, "Apple Intelligence Foundation Language Models," arXiv preprint arXiv:2407.21075, 2024.
[9] Gemini Team, "Gemini: A Family of Highly Capable Multimodal Models," arXiv preprint arXiv:2312.11805, 2023.
[10] Google, "Android on-device AI under the hood," Android Developers on YouTube, 2024. [Online]. Available: https://
[11] MLC team, "MLC-LLM," 2023. [Online]. Available: https://
[12] G. M. Amdahl, "Computer Architecture and Amdahl's Law," IEEE Computer, 2013, pp. 38-46.
[13] S. W. D. Chien, "Characterizing Deep-Learning I/O Workloads in TensorFlow," in IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS), 2018.
[14] G. Wang, "FastPersist: Accelerating Model Checkpointing in Deep Learning," arXiv preprint arXiv:2406.13768, 2024.
[15] K. Sohn, "A 1.2V 20nm 307GB/s HBM DRAM with At-Speed Wafer-Level I/O Test Scheme and Adaptive Refresh Considering Temperature Distribution," in IEEE International Solid-State Circuits Conference (ISSCC), 2016.
[16] C.-S. Oh, "A 1.1V 16GB 640GB/s HBM2E DRAM with a Data-Bus Window-Extension Technique and a Synergetic On-Die ECC Scheme," in IEEE International Solid-State Circuits Conference (ISSCC), 2020.
[17] Y. Ryu, "A 16GB 1024GB/s HBM3 DRAM with On-Die Error Control Scheme for Enhanced RAS Features," in IEEE Symposium on VLSI Technology and Circuits, 2022.
[18] Y.-C. Kwon, "A 20nm 6GB Function-In-Memory DRAM Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications," in IEEE International Solid-State Circuits Conference (ISSCC), 2021.
[19] S. Lee, "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology," in ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021.
[20] S. Kang, "An FPGA-Based RNN-T Inference Accelerator with PIM-HBM," in Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2022.
[21] J. H. Kim, "Samsung PIM/PNM for Transformer-Based AI: Energy Efficiency on PIM/PNM Cluster," in IEEE Hot Chips 35 Symposium (HCS), 2023.
[22] L. Su, "Innovation For the Next Decade of Compute Efficiency," in IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, 2023.
[23] S. Hong, "A Reflection and Crosstalk Canceling Continuous-Time Linear Equalizer for High-Speed DDR SDRAM," in Symposium on VLSI Circuits, 2021.
[24] Y.-H. Kim, "A 16Gb Sub-1V 7.14Gb/s/pin LPDDR5 SDRAM Applying a Mosaic Architecture with a Short-Feedback 1-Tap DFE, an FSS Bus with Low-Level Swing and an Adaptively Controlled Body Biasing in a 3rd-Generation 10nm DRAM," in IEEE International Solid-State Circuits Conference (ISSCC), 2021.
[25] J. Song, "Novel Target-Impedance Extraction Method-Based Optimal PDN Design for High-Performance SSD Using Deep Reinforcement Learning," IEEE Transactions on Signal and Power Integrity, vol. 2, pp. 1-12, 2023.
[26] Samsung Semiconductor, 2019. [Online]. Available: https://
[27] Y. Zhang, "MSFRD: Mutation Similarity based SSD Failure Rating and Diagnosis for Complex and Volatile Production Environments," in USENIX Annual Technical Conference (USENIX ATC 24), 2024.
[28] D. Ha, "Highly Manufacturable, Cost-Effective, and Monolithically Stackable 4F2 Single-Gated IGZO Vertical Channel Transistor (VCT) for Sub-10nm DRAM," in International Electron Devices Meeting (IEDM), 2023.
[29] J. W. Han, "Ongoing Evolution of DRAM Scaling via Third Dimension - Vertically Stacked DRAM," in IEEE Symposium on VLSI Technology and Circuits, 2023.
[30] Z. Huo, "Unleash Scaling Potential of 3D NAND with Innovative Xtacking Architecture," in IEEE Symposium on VLSI Technology and Circuits, 2022.
[31] S. I. Shim, "Trends and Future Challenges of 3D NAND Flash Memory," in IEEE International Memory Workshop (IMW), 2023.
[32] SNIA, "Computational Storage API v1.0," 2023. [Online]. Available: https://www.snia.org/sites/default/files/technical-work/csapi/release/Computational_Storage_API_v1.0.pdf
[33] Samsung Semiconductor, "DS Open Ecosystem Support." [Online].
Available: https://

Figure 1.3.1: Language Model History.
Figure 1.3.2: Increasing xPU Power Consumption. Source: "Thermal techniques for higher data center compute density," Hot Chips 2024; https://
Figure 1.3.3: The Evolution of GPT Model Size, GPU Performance, and HBM Bandwidth.
Figure 1.3.4: Memory Capacity Requirements for LLM Training and Inference. Source: https://
Figure 1.3.5: Effect of Bandwidth on Roofline Model on Llama-2-13b.
Figure 1.3.6: Power Consumption by AI Training Model. Source: "LLM Inference Unveiled: Survey and Roofline Model Insights."
Figure 1.3.7: Suitable Memory Solution by LLM Size and Token/s Requirement.
Figure 1.3.8: RAG-Based LLM Inference.
Figure 1.3.9: Computing Stage and Input/Output Data in the AI Processing Pipeline.
Figure 1.3.10: Paradigm Shift of Memory Solutions.
Figure 1.3.11: HBM Technology Roadmap.
Figure 1.3.12: Normal DRAM vs. PIM DRAM.
Figure 1.3.13: Performance Gain and Energy Efficiency of HBM-PIM Clustered System.
Figure 1.3.14: Block Diagram of Standard HBM (sHBM) vs. Custom HBM (cHBM) Base-Die.
Figure 1.3.15: Interface Difference of sHBM and cHBM.
Figure 1.3.16: LPW DRAM Architecture.
Figure 1.3.17: Advancing SSD Performance and Capacity.
Figure 1.3.18: DQ Swap Technology for Ultra-High-Capacity SSD.
Figure 1.3.19: NRZ vs. PAM-4 Signaling for PCIe Host Interface.
Figure 1.3.20: SI Margin Degradation Caused by Jitter Induced by Power Noise in SSD.
Figure 1.3.21: Roadmap of CXL Memory Module.
Figure 1.3.22: Memory Pooling with CMM Solutions.
Figure 1.3.23: Enabling AI Platforms with Scalable Memory.
Figure 1.3.24: Four Key Technologies of DRAM Scaling.
Figure 1.3.25: 4F² DRAM Cell Using VCT (Vertical Channel Transistor).
Figure 1.3.26: VSDRAM (Vertically Stacked DRAM).
Figure 1.3.27: IGZO Transistor.
Figure 1.3.28: Rapid Increase in Bandwidth Requirements and FinFET Compatible with DRAM Process.
Figure 1.3.29: BVNAND.
Figure 1.3.30: Multi-BVNAND.
Figure 1.3.31: TCB, HCB, and VIMS.
Figure 1.3.32: Roadmap of Advanced Packaging Solutions.
Figure 1.3.33: Memory Hierarchy of the Present and the Future.

ISSCC 2025 / February 17, 2025 / 11:15 AM

1.4 The Crucial Role of Semiconductors in the Software-Defined Vehicle
Peter Schiefer
President & CEO, Automotive Division, Infineon Technologies, Munich, Germany

The world urgently needs new and smart forms of mobility. Pushed by the desire for ever-smarter and ever-more-connected cars, by the need to comply with ever-stricter emission standards, and by the calls for sustainable, user-centric mobility, the transformation required of the automotive industry is profound. Advancement in automotive innovation depends heavily on microelectronics: according to market experts, around 90% of all automotive innovations rely on it, a share that will remain unchanged in the years to come. Semiconductors play an essential role here; they are key to mastering the challenges of decarbonization as well as digitalization and digital transformation in the vehicle.

1.4.1 Megatrends in the automotive industry
The future car is fully connected, always online, carbon-neutral, and autonomous. This vision is reflected in the main megatrends of the automotive industry, as shown in Figure 1.4.1: green mobility, with zero emission becoming real as the key enabler for achieving global emission-reduction targets; autonomous driving, with the driver becoming a passenger, to reduce the number and impact of road accidents [1]; and the car becoming ever smarter, strongly linked to enhancements in connectivity and digitalization. Connectivity is key to providing a tailor-made user experience and updatable communication of
309、 cars within the vehicle,from vehicle to vehicle,from vehicle to infrastructure connectivity finally contributes to all vehicles being a part of the Internet of Things(IoT).Advanced security is paramount to safeguard both personal information and vehicle integrity.1.4.2 Semic onduc t or growt h in t
310、 he aut omot ive arena These megatrends drive the global automotive semiconductor market to a strong growth from about 37 billion USD in 2019 to about 81 billion USD in 2024(almost its double five years later).In 2030,the global automotive semiconductor market is anticipated to about 149 billion USD
311、,as shown in Figure 1.4.2 2.While light vehicle production volumes grow slowly,semiconductor bill-of-materials per car will grow strongly driven by the trends of green mobility,automated driving,and smarter cars.The semiconductor bill-of-materials per car is increasing in line with the strong growth
312、 of electromobility.Already today,a battery electric vehicle(BEV)can have nearly double the semiconductor content as a gasoline or diesel car.Back in 2019,the average car s semiconductor bill-of-materials(BoM)amounted to$425 USD.In 2024,the average car s semiconductor BoM amounts to$750 USD.By 2030,
313、the average battery electric car will be$1,650 USD.The majority growth of the BoM in electric cars come from semiconductors for drivetrain functions,for example,inverters,on-board chargers,battery management systems,complex drivetrain,and an increasing share of silicon carbide(SiC)and gallium nitrid
314、e(GaN).In addition,semiconductor growth is pushed by the level of vehicle automation;for example,up to about 2,500 Euros additional BoM for fully-autonomous cars as in robo-taxis.1.4.3 Value Drivers for Soft ware-Defined Vehic les (SDVs)The Software-Defined Vehicle(SDV)is the key enabler of the main
megatrends of the automotive industry. The SDV is a vehicle whose features and functions are primarily enabled through software [3]. There is no doubt that the software-defined vehicle will become a reality, but it will take some time, as it requires a change in the electrical-electronic (E/E) vehicle architecture. The SDV encompasses a range of value drivers that shape the future of the automotive industry, as shown in Figure 1.4.3. These drivers include the monetization of functions and data, allowing manufacturers to generate revenue through new features and services. In fact, car users are increasingly looking for features defined by software. SDVs aim to integrate seamlessly into users' digital lives, providing a unified experience across devices. In this context, vehicle personalization and differentiation are key, as SDVs adapt to individual preferences, creating a competitive edge. The SDV makes it possible to upgrade the vehicle on demand with new features the customer wants, or to correct deficiencies in the car's existing settings. This process of continuous vehicle improvement is facilitated by over-the-air updates, enabling swift deployment. Consequently, time-to-market is accelerated, allowing manufacturers to respond quickly to changing customer expectations. The emphasis on reducing system costs enhances competitiveness in the market. In this scenario, managing system complexity is crucial as vehicles become more software-driven, and SDVs offer tools to efficiently handle the intricate software and hardware interactions typical of embedded-systems development. By integrating DevOps with a unique in-field safe and secure trace capability, every stage of the development and operational lifecycle is ensured to be efficient, safe, and secure. Real-time accuracy for parallel multi-core processors enables developers to achieve optimal performance and reliability, which is crucial for complex high-stakes applications. Finally, the use of on-chip trace memory coupled with an integrated logic analyzer provides unparalleled insight and debugging capability, allowing thorough analysis and quick resolution of issues, thereby accelerating time-to-market and improving end-product quality. In essence, the value drivers of SDVs span technological innovation, business opportunities, and user experience. SDVs empower car manufacturers (OEMs) to regain system leadership by positioning them as industry innovators in technology development. Additionally, SDVs contribute to a circular economy through sustainable manufacturing practices and end-of-life considerations, shaping a comprehensive and forward-looking approach to vehicle development [4].

1.4.4 Trends for Software-Defined Vehicles (SDVs)
The transformative
trends in software-defined vehicles represent a paradigm shift in the automotive industry, shaping a future marked by flexibility, efficiency, and innovation, as shown in Figure 1.4.3. SDVs are characterized by their ability to undergo seamless software updates, upgrades, downgrades, and on-demand modifications, ensuring adaptability to evolving technology, security, and user needs. Digital lifecycle management is a key focus, encompassing the entire vehicle lifecycle from design to decommissioning and enhancing efficiency and sustainability. The decoupling of software from hardware, fostering a hardware-agnostic approach, allows for greater flexibility and innovation. Optimizing service, signal, and power management is crucial for improving overall performance, reliability, and energy efficiency. The trend towards “plug & play” functionality and component reusability streamlines integration, enabling easy upgrades and reducing development costs. A concerted effort to reduce the number of end-point Electronic Control Units (ECUs), the cost of wiring harnesses, and the number of component references simplifies vehicle systems, improving efficiency and maintenance. Agile development methodologies and the standardization of components accelerate software deployment and promote interoperability. Advancements in by-wire applications (by-wire braking and by-wire steering, for example) eliminate physical connections, enhancing vehicle responsiveness and enabling advanced driver-assistance and automated-driving features. The increased use of virtual prototyping accelerates development, reduces costs, and ensures high-quality products through comprehensive testing. Logistical improvements guarantee secure software updates and parts delivery, while manufacturing automation, driven by robotics and artificial intelligence, enhances precision and efficiency. Collectively, these trends define a future where SDVs embody adaptability, digitalization, and streamlined processes throughout the entire vehicle lifecycle, promising a more dynamic and sustainable automotive landscape.

1.4.5 The New Electrical-Electronic (E/E) Vehicle
Architecture Transformation
As the current vehicle system cannot manage or enable the SDV, the entire vehicle architecture must change. A holistic system approach of combined evolution in two main areas is necessary for new E/E architectures: a change in complex computing and the high-speed in-vehicle network on the one hand, and intelligent power distribution on the other. In addition, security is of paramount importance among the stakeholders of this transformation. On the computing and network side, the architectural transformation of the SDV involves a systematic evolution from a multi-domain architecture with a central gateway, encompassing domains like human-machine interface (HMI) and cockpit, autonomous control, sensing, and actuation, to a more integrated and efficient structure, as shown in Figure 1.4.4 (top). High-Performance Computing (HPC) units (green) serve Advanced Driver Assistance Systems (ADAS) and In-Vehicle Infotainment (IVI) applications, and provide a safety companion MCU for service-oriented sys