ISSCC 2025, Session 37: Design-Technology Optimization and Digital Accelerators

37.1: IBM Telum II processor design-technology co-optimizations for power, performance, area, and reliability
2025 IEEE International Solid-State Circuits Conference

David Wolpert¹, Gerry Strevig², Chris Berry¹, Leon Sigal³, Bill Huott¹, Mark Cichanowski², Matthias Pflanz⁴, Tobias Werner⁴, Philipp Salz⁴, Nick Jing¹, Michael Romain¹, Iris Leefken⁴, Richard Serton¹, Rajesh Veerabhadraiah⁵, Dureseti Chidambarrao³, Robert Arelt¹, Matt Angyal¹, Ben Trombley¹, Arvind Haran², Stefan Hougardy⁶, Ben Klotz⁶, Rahul Rao⁵
¹IBM Poughkeepsie, NY; ²IBM Austin, TX; ³IBM Yorktown Heights, NY; ⁴IBM Böblingen, Germany; ⁵IBM Bangalore, India; ⁶University of Bonn, Bonn, Germany

Outline
- Design-Technology Co-Optimization (DTCO)
- Technology optimizations
  - Enhanced library design
  - Multiple BEOL images in both IP and synthesized blocks
  - DFM+
- Design optimizations
  - Particle-aware latch placement
  - Low-power latches and local clock buffers
- Chip impacts

Telum II Overview
- Samsung 5 nm technology
- 8x 5.5 GHz cores
- 10 x 36 MB L2 caches
  - 3.6 ns access
  - 352 GB/s ring
  - 360 MB virtual L3 (chip)
  - 2.88 GB virtual L4 (drawer)
- On-die data processing unit
- Enhanced AI & security units
For more chip details, see Strevig et al., ISSCC 2025, paper 2.2.
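
The cache capacities on the overview slide are consistent with one another, which a quick calculation makes visible. The sketch below assumes that the virtual L3 is the sum of the ten L2 caches on one chip, that the virtual L4 pools the virtual L3 of every chip in a drawer, and that GB means 1000 MB. The chips-per-drawer value is derived from the slide's numbers and is not stated on the slide.

```python
# Figures quoted on the Telum II overview slide
L2_CACHES_PER_CHIP = 10
L2_CACHE_MB = 36
VIRTUAL_L4_GB = 2.88

# Assumption: virtual L3 = all L2 caches on one chip combined
virtual_l3_mb = L2_CACHES_PER_CHIP * L2_CACHE_MB

# Assumption: virtual L4 = virtual L3 of every chip in a drawer (decimal GB)
chips_per_drawer = round(VIRTUAL_L4_GB * 1000 / virtual_l3_mb)

print(f"virtual L3 per chip: {virtual_l3_mb} MB")
print(f"implied chips per drawer: {chips_per_drawer}")
```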

Design-Technology Co-Optimization
- Telum II: 5.5 GHz all-core frequency, 99.999999% system uptime
- Telum II is IBM's first Samsung 5 nm product
- Concurrent hierarchical design: library, IP, gate-level, integration
  - IBM Product Engineering engages with Samsung Foundry to achieve targets
  - IBM Physical Design + EDA pathfinding provides insights to Product Engineering
[Figure: collaboration among IBM Product Engineering, IBM Physical Design, IBM EDA, and Samsung Foundry]
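
To give a sense of scale for the 99.999999% uptime target, the sketch below converts an availability figure into allowed downtime per year. It assumes the uptime number can be read as a long-run availability fraction and uses a 365.25-day year; neither assumption comes from the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year length


def downtime_seconds_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


eight_nines = 0.99999999  # 99.999999% as quoted on the slide
downtime = downtime_seconds_per_year(eight_nines)
print(f"allowed downtime: {downtime:.3f} s/year")
```

Under these assumptions the budget comes to roughly a third of a second per year.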

Design-Technology Co-Optimization by team
- Numerous areas were touched by DTCO collaborations
  - IBM PD hierarchical challenges: new blockage layers integrated into PDK
  - IBM shape fill built upon Samsung tooling, with PE sign-off, EDA enablement
[Figure: teams involved in each area (PE = IBM Product Engineering, PD = IBM Physical Design, EDA = IBM EDA, SF = Samsung Foundry). Areas: shape fill, cell fill, DFM, reliability, library, models/abstraction, hierarchical interactions, devices, PD verification, design PPA, tech PPA]

More than "DT"CO
[Figure: contributing areas, each tagged with the letters A, L, C, E, T: cycle reach, metal stack options, library mix, DFT, system reliability, power integrity, IP design/definition, ECO strategy, chip-package interactions, device/BEOL tuning, methodology checking, arch. paths]
- Complexity within each silo can be relieved by other silos
  - Architectural paths drive logic design choices, which require circuit solutions, necessitating more robust enablement or device/BEOL technology tuning
- Cross-pollination is key: understanding where the "cliffs" are in each silo

Technology optimizations: Enhanced library image
- DTCO collaboration between IBM and Samsung Foundry enabled an additional signal track between RXP and RXN
  - Decoupled input pins from PC, allowing M2:PC scaling from 1:1 to 3:2
  - +50% M2 tracks
- 10-15% area scaling with a fixed library image height

Metal layer   Wire tracks/um (Telum II / Telum)   Metal density, %, unfilled (Telum II / Telum)
M1            1.00                                1.00
M2            1.50                                1.10
M3            1.00                                1.57
M4            1.00                                1.53
M5            1.00                                0.89
M6            1.00                                1.02
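
The +50% M2 track figure follows directly from the ratio change, as the short calculation below shows. It assumes "M2:PC" means M2 routing tracks per PC (gate) pitch, which is an interpretation of the slide rather than something it defines.

```python
from fractions import Fraction

# Assumed reading: M2 tracks available per PC pitch
old_ratio = Fraction(1, 1)  # 1:1 in the earlier library image
new_ratio = Fraction(3, 2)  # 3:2 in the Telum II library image

track_gain = new_ratio / old_ratio - 1
print(f"M2 track gain: {float(track_gain):.0%}")
```

The result matches the 1.50 wire-tracks ratio listed for M2 in the table.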

Technology optimizations: Complex cell design
- Design-academia collaboration between IBM and University of Bonn
  - Complex cell design tool: 250 complex library cells, 5.3M gate instances
  - FET permutations → pruning → routing → pruning → compare & select
For more details: P. Van Cleeff et al., IEEE TCAD, vol. 39, no. 10, Oct. 2020.

Technology optimizations: BEOL images
- DTCO between IBM PD, PE, EDA enabled multiple BEOL grids within a single gate-level block structure, reducing area overhead
  - Improved wide wire utilization for embedded high-performance memory
  - Boundary region constraints to enhance yield and reliability

Technology optimizations: DFM+
- Telum II improves robust via count by 19%, eliminates 66% of non-DFM vias
  - 99% DFM scoring in 7 out of 8 layers applied, incl. hierarchical boundary keepouts
  - In addition to vias, DTCO process learning drove 178 pattern sensitivity rules
- 10% modeled defect-driven yield benefit, considering die growth + wire length
[Figure: parent block and child block with DFM keepout rings; via robustness improvements]

Design optimizations: Particle-aware latch placement
- 99.999999% system uptime drove advances in multi-bit error detection
- Multi-bit flip protection was added to latch clusters
  - Logic + placement tools enforced spacing of elements sharing a parity group
  - Groups determined after clustering, limiting impact to optimization steps or PPA
- SRAMs exploited both circuit & logic interleaving to improve resiliency
[Figure: latch cluster around a clock buffer, latch00 to latch31 in four parity groups (parity00-07, parity08-15, parity16-23, parity24-31); a particle strike upsets neighboring latches and the errors are detected]
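
The benefit of spacing latches that share a parity group can be illustrated with a toy model. The sketch below assumes one parity bit per group of eight latches (matching the group labels in the figure), a one-dimensional row of latch sites, and a particle strike that flips every latch within a fixed radius. The coordinates, radius, and function names are hypothetical and are not taken from the IBM tools.

```python
from itertools import combinations
from math import dist

GROUP_SIZE = 8  # assumption: one parity bit protects 8 latches


def parity_group(latch_id: int) -> int:
    return latch_id // GROUP_SIZE


def min_same_group_spacing(placement: dict) -> float:
    """Smallest distance between two latches that share a parity group."""
    return min(
        dist(placement[a], placement[b])
        for a, b in combinations(placement, 2)
        if parity_group(a) == parity_group(b)
    )


def silently_corrupted_groups(placement: dict, center, radius: float) -> list:
    """Groups hit by an even, nonzero number of flips (parity cannot see them)."""
    flips = {}
    for latch_id, pos in placement.items():
        if dist(pos, center) <= radius:
            group = parity_group(latch_id)
            flips[group] = flips.get(group, 0) + 1
    return sorted(g for g, n in flips.items() if n % 2 == 0)


# Clustered: latches of one group sit next to each other along a row
clustered = {i: (float(i), 0.0) for i in range(32)}
# Interleaved: neighbors along the row always belong to different groups
interleaved = {i: (float((i % 8) * 4 + i // 8), 0.0) for i in range(32)}

strike = (1.5, 0.0)  # hypothetical strike location hitting two adjacent sites
print(min_same_group_spacing(clustered), min_same_group_spacing(interleaved))
print(silently_corrupted_groups(clustered, strike, 1.0))
print(silently_corrupted_groups(interleaved, strike, 1.0))
```

In the clustered layout the two flipped latches share a group, so their parity still checks out and the error goes unnoticed. In the interleaved layout each affected group sees a single flip, which parity detects.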

Design optimizations: Low-power latch design
- Telum II latch counts increased 1.4x, straining power envelope
- "0-state": same delay, less clock (6 Tx → 3 Tx) & logic switching (D=0)
- "Static": adds feedback to minimize power for all states
- 85% of logic latches, reduces sequential power 20+%, total power 3%
[Figure: schematics of the previous latch, the "0-state" latch, and the "static" latch, each with a scan latch (Scan_in, Scan_clk), data input D, and LCK/SLCK clocks]

Design optimizations: Low-power LCB + -clock gating
- The Telum II resonant global clock (GCKN) oscillates at 5.5 GHz
- -gate local clock buffer (LCB) enables fine-grained clock gating
  - Removes scan clock load from the clock mesh
  - Allows LCBs to have larger latch counts (meaning fewer LCBs)
- +7% core latch count, -40% sequential power, -5% total chip power
[Figure: clock chopper (ChopDisable/ScanEnable, ChopOut); histograms of latches per LCB]

Chip Impacts

Metric           Telum II / Telum
Die size         1.13x
Shape count      1.25x
Tx count         1.38x
Wire length      1.28x
Frequency        1.15x
Power envelope   1.05x

- Power envelope was a key concern for workloads exercising all cores
- Total power envelope growth was limited to 5%
  - CV²f = (1.38x + 1.28x)/2 * 1x * 1.15x = 1.53x, which was reduced to just 1.05x
- Key cross-stack wins: improved placement density reducing wire RC, improved sequential cell/clock load, voltage control loop speed/accuracy
For more details on voltage control loop, see Webel et al., ISSCC 2025, paper 8.1.
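
The power projection on this slide can be reproduced step by step. The sketch below follows the slide's own formula: switched capacitance is approximated as the average of transistor-count and wire-length growth, voltage is held at 1x, and frequency grows 1.15x. The variable names are illustrative, and the final "avoided" percentage is a derived figure that does not appear on the slide.

```python
# Growth factors from the Chip Impacts table (Telum II / Telum)
tx_growth = 1.38
wire_growth = 1.28
freq_growth = 1.15
voltage_ratio = 1.0   # the slide holds voltage at 1x
actual_power = 1.05   # power envelope growth actually achieved

# Slide's approximation: capacitance scales with the mean of Tx and wire growth
cap_growth = (tx_growth + wire_growth) / 2

# Dynamic power scales as C * V^2 * f
projected_power = cap_growth * voltage_ratio**2 * freq_growth
avoided = 1 - actual_power / projected_power

print(f"projected: {projected_power:.2f}x, actual: {actual_power:.2f}x")
print(f"power avoided versus projection: {avoided:.0%}")
```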

Summary
- Importance of DTCO is increasing
  - Not just "DT"CO
- Power: 1.05x chip power envelope
  - 15% core power reduction
  - 40% sequential power reduction
- Performance: 5.5 GHz
  - 1.15x frequency increase
- Area: 1.38x Tx, 1.13x die size
  - Core area reduced 20%
- Reliability: 99.999999% system uptime
  - +19% robust vias, -66% standard vias
  - Particle-aware latch placement

Acknowledgements
The authors would like to thank the entire IBM Enterprise Systems Z, EDA, Product Engineering and IBM Research teams for all their significant contributions to the success of this project, and Samsung for their collaboration and wafer fabrication.
Thank you!

37.2: A 2-Dimensional mm-Scale Network-on-Textiles (kNOTs) for Wearable Computing with Direct Die-to-Yarn Integration of 0.6×2.15 mm² SoC and bySPI Chiplets
2025 IEEE International Solid-State Circuits Conference

Anjali Agrawal¹, Zhenghong Chen¹, Braden E. Desman¹, Jinhua Wang¹, Akiyoshi Tanaka¹, Fahim Foysal¹, Charlie D. Hess¹, Will Farrell², Jim Owens², Daniel S. Truesdell¹, Benton H. Calhoun¹
¹University of Virginia, Charlottesville, VA; ²Nautilus Defense LLC, Pawtucket, RI

Outline
- Background and Motivation
- Prior Art
- Proposed System Architecture and Operation
  - kNOT Architecture
  - SoC and bySPI Implementation
- bySPI Protocol
- Global Bootup and Clock Synchronization
- Silicon Measurement and Textile Integration
- Comparison and Summary

Background: Why Textile?
- Easily deployable
- Large surface area
- Lightweight
- Scalable manufacturing
- Stretchable

Background: Why E-Textile?
- Distributed sensing and storage
- Real-time monitoring
- Human-computer interaction

Background: E-Textile Applications
- Sports
- Life belt

Background: Distributed Systems
- Lightweight
- mm-scale disaggregated systems
- Capable of multimodal sensing, processing, and storage
- Design requirement: retain the textile's look, feel, and comfort

Background: 2-Dimensional Network in Textiles
[Figure: network of chips in textiles (SoC, routing chips, sensor, memory); another sensing application]

Prior Art: E-textile systems
- In-fiber I2C bus (Nature [5, 9]): static power dissipation
- In-fiber 4-wire SPI bus (ISSCC 2023 [12]): number of pads scales with number of receivers
[Figure: prior e-textile systems, Nature 2020 [5], Nature 2021 [9], ISSCC 2023 [12]. Labels: e-textile sensing; e-textile ML inference; self-powered system in fiber; 1-dimensional (I2C); 1-dimensional (SPI); cm-scale filament; customized fibers; needs interposer; front and back views (6 mm, 4 mm)]

Prior Art: Customized SoCs
- Examples: ISSCC 2023 [12], JSSC 2018 [18], VLSI 2018 [19], ISSCC 2008 [11] (die photo labeled 4.7 mm, 3.7 mm)
- Pros: textile integration; mm-scale sensing; target miniaturized applications
- Cons: large number of pads; off-chip components; do not support direct die attachment

Proposed kNOT Architecture
- Scalable
- 2-dimensional
- Flexible topology
- bySPI chiplet: 1 input, 2 outputs
- SoC chiplet

kNOT (Network on Textiles) features:
- E-textile systems: direct-die attachment; scalability; multiple sensing modalities
- Customized SoC: reconfigurable pads; clock synchronization; global bootup
- Communication protocol: custom 3-wire bySPI; flexible protocol options
[Figure: system diagram]

Proposed SoC
SoC features:
- Linear reconfigurable pads array
- SoC-to-SoC clock synchronization
- Fault-tolerant global bootup
- Dynamic network reconfiguration
Blocks:
- Fast clock: 9-stage ring oscillator, 8-bit tunability
- Slow clock: 5-stage current-starved ring oscillator, 8-bit tunability
- LDO: no off-chip components required; stable 1.12 V output for Vin from 1.37 V to 3.3 V
- Core: Cortex-M0+ core with 32 kB SRAM
- 3-wire SPI Rx: custom 3-wire SPI interface to communicate with upstream chips
- PADs controller: reconfigures linear pads array

Proposed bySPI
bySPI features:
- 3-wire bypass SPI
- Built-in CS-free reset
- COTS SPI compatible
- Fully on-chip LDO
Blocks:
- On-chip oscillator: supports timeout feature to detect dropped SCLK cycles
- LDO + power-on reset: no off-chip components required
- Function & direction control: selects between 3-wire and 4-wire SPI modes; decides downstream direction
- CS generator: custom CS generation to support 4-wire SPI
- PADs controller: reconfigures linear pads array

Fully Integrated LDO
- Compact footprint of 0.33 mm × 0.34 mm
- No off-chip components required
- Able to supply up to 10 mA with a low quiescent current of 5.7 µA
- Circuit blocks: 4-transistor voltage reference; compensation circuit; 1:4 feedback network; low-temperature start-up circuit; high-density on-chip capacitor

bySPI protocol
Example network: SoC S1, bySPI chiplets b1, b2, b3, and a sensor.
- Step 1: select b1 as target (CMD: 06)
- Step 2: perform read/write operations with b1
- Step 3: unselect b1 and select b2 (CMD: 02)
- Step 4: b1 bypasses signals to b2
- Step 5: change b2 to 4-wire SPI mode
- Step 6: drive CS down, then start operation
- Step 7: release CS back after operation done
- Repeat steps 6 and 7 for each operation period
[Figure: SCLK, MOSI, and CS waveforms for each step; the step 6 slide also shows a waveform labeled CMD: 08]
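
The select, bypass, and 4-wire steps of the bySPI protocol can be modeled as a small state machine. The sketch below is a behavioral illustration only: the class names, the linear b1 to b2 chain, and the event strings are hypothetical, and the real command encodings (such as CMD 06 or 02) and pin-level timing are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    IDLE = "idle"      # not addressed, ignores traffic
    TARGET = "target"  # currently selected chiplet
    BYPASS = "bypass"  # forwards traffic to the next chiplet


@dataclass
class BySpiNode:
    name: str
    role: Role = Role.IDLE
    four_wire: bool = False  # False = 3-wire mode (no CS)
    events: list = field(default_factory=list)


class Chain:
    """Hypothetical linear chain: host SoC -> b1 -> b2 -> downstream device."""

    def __init__(self, names):
        self.nodes = [BySpiNode(n) for n in names]

    def select(self, name):
        """Steps 1 and 3: make `name` the target; upstream nodes bypass."""
        idx = [n.name for n in self.nodes].index(name)
        for i, node in enumerate(self.nodes):
            if i < idx:
                node.role = Role.BYPASS
            elif i == idx:
                node.role = Role.TARGET
            else:
                node.role = Role.IDLE

    def target(self):
        """Step 4: follow bypassing nodes until the selected one is reached."""
        for node in self.nodes:
            if node.role is Role.TARGET:
                return node
            if node.role is not Role.BYPASS:
                return None
        return None

    def transfer(self, payload):
        """Steps 2, 6 and 7: run one operation on the current target."""
        node = self.target()
        if node is None:
            raise RuntimeError("no chiplet selected")
        if node.four_wire:
            node.events.append("CS low")   # step 6: drive CS down first
        node.events.append(payload)
        if node.four_wire:
            node.events.append("CS high")  # step 7: release CS afterwards
        return node.name


chain = Chain(["b1", "b2"])
chain.select("b1")                # step 1
first = chain.transfer("read")    # step 2
chain.select("b2")                # steps 3 and 4
chain.target().four_wire = True   # step 5
second = chain.transfer("read")   # steps 6 and 7
b1, b2 = chain.nodes
print(first, second, b1.role.value, b2.events)
```

After the second selection, b1 only forwards traffic, and b2 wraps its operation in a CS low/high pair because it is in 4-wire mode.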
103、h Direct Die-to-Yarn Integration of 0.6x2.15mm2SoC and bySPI Chiplets 2025 IEEE International Solid-State Circuits Conference49 of 96SoCNon-Volatile Memory Network in TextilesOff-chip Oscillator37.2:A 2-Dimensional mm-Scale Network-on-Textiles(kNOTs)for Wearable Computing with Direct Die-to-Yarn Int
Global Bootup
[Figure sequence of seven builds. Node labels: Sensor, S1, S2, S3, b1, b2, b3. Signal labels: S1.MOSI, S1.MISO, b1.MOSI, b3.MOSI, S2.MOSI. Annotations in order of appearance: "No Signal on MISO"; "SoC is ready"; "Bypassed signals".]
Clock and Timestamp Synchronization
[Figure sequence of six builds. Node labels: Sensor, S1, S2, S3, b1, b2, b3.]

Die Photos
[Figure: die photos]
Measurement Setup: Detailed
[Figure: detailed measurement setup]

Measurement Setup: Simplified
[Figure: simplified measurement setup]

Measurement Setup: System Test 1
SoC Bootup and bySPI Link Setup

Measurements: SoC Bootup & bySPI Link Setup
Check if SoC is ready
Write 32-bit data packet and read it back
Soft reset for b1
Select DN1, CMD:02
Signals bypassed to b2
Read operation from b2
Remove Bypass, CMD:05; Select DN2, CMD:03
Signals bypassed to b3; CMD:05, CMD:06
Select all (both b2 and b3); CMD:05, CMD:06
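The link-setup sequence in these measurements can be pictured as a small state machine. The sketch below is a toy Python model and not the chip's implementation. It assumes that b1 has two downstream ports (DN1 leading to b2, DN2 leading to b3), that each CMD label pairs with the action printed next to it on the slides (02 = select DN1, 03 = select DN2, 05 = remove bypass), and that 06 selects both ports. The class, method, and constant names are hypothetical.

```python
# Toy model of bySPI link setup. Opcode meanings are assumptions read off
# the slide labels; the real command set may differ.
SELECT_DN1 = 0x02     # assumed: bypass signals to downstream port 1 (b2)
SELECT_DN2 = 0x03     # assumed: bypass signals to downstream port 2 (b3)
REMOVE_BYPASS = 0x05  # assumed: b1 stops bypassing and is the target again
SELECT_ALL = 0x06     # assumed: bypass to both downstream ports


class BypassNode:
    """Hypothetical model of b1 with downstream chips b2 and b3."""

    PORTS = {"DN1": "b2", "DN2": "b3"}

    def __init__(self):
        self.selected = set()  # downstream ports currently bypassed to

    def command(self, opcode):
        """Update the bypass state for one received opcode."""
        if opcode == SELECT_DN1:
            self.selected = {"DN1"}
        elif opcode == SELECT_DN2:
            self.selected = {"DN2"}
        elif opcode == SELECT_ALL:
            self.selected = {"DN1", "DN2"}
        elif opcode == REMOVE_BYPASS:
            self.selected = set()
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

    def targets(self):
        """Chips that currently receive the upstream SPI signals."""
        if not self.selected:
            return {"b1"}
        return {self.PORTS[port] for port in self.selected}


def run(opcodes):
    """Apply opcodes in order and record the target set after each one."""
    node = BypassNode()
    trace = []
    for opcode in opcodes:
        node.command(opcode)
        trace.append(sorted(node.targets()))
    return trace
```

Under these assumptions, `run([0x02, 0x05, 0x03, 0x05, 0x06])` follows the order of the measurement slides: signals reach b2, return to b1, reach b3, return to b1, and finally reach both b2 and b3.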
Measurement Setup: System Test 2
Program Execution: Read data from Sensor and Write to Memory

Measurements: Program Execution
Sensor Setup
Data from Sensor
Flash Setup

Measurement Setup: System Test 3
SoC-SoC Bootup

Measurements: SoC-SoC Bootup
SoC1 Bootup
Upstream instructions to SoC1
SoC2 Bootup
Readout data from SoC2
Measurements: On-chip LDO & Fast Clock
[Figure labels: ILOAD (0 A, 10 mA); VOUT (1.12 V); 45 mV; 12s]

Textile Integration
Integration Steps
[Figure labels: yarn; human hair (for reference)]

Direct-die Attachment
bySPI integrated on textile

bySPI Swatch for Testing
Jumpers for testing

In-textile bySPI Measured Waveform
[Figure labels: SCLK, MOSI, MISO; 10 ms; Setup, Write, Read]

kNOT Swatch
[Figure labels: SoC, bySPI]

Comparison with SOTA E-Textile Systems
[Comparison table not recoverable from extraction]
* SoC integration on textile not demonstrated
# Not demonstrated
* Taken from datasheet
Comparison with SOTA E-Textile Systems
[Comparison table not recoverable from extraction]
# Due to pull-up resistor when the bus is driven low

Summary
kNOT Highlights:
Distributed network in textiles
Direct die attachment of mm-scale chips
Custom 3-wire bypass SPI protocol
Reconfigurable array of pads
Global bootup of SoCs
Multimodal sensing, processing, and networking
Fully integrated LDO and clocks
Clock synchronization and timestamping

Acknowledgements
This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via N66001-23-C-4512. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. The authors would like to thank RLP-VLSI Lab at UVA, Nautilus Defense, and Brandon Reilly.
Thank you!

37.3: Monolithic In-Memory Computing Microprocessor for End-to-End DNN Inferencing in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage. 2025 IEEE International Solid-State Circuits Conference.

Monolithic In-Memory Computing Microprocessor for End-to-End DNN Inference in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage
Samsung Advanced Institute of Technology, Suwon, Korea
Soonwan Kwon, Sungmeen Myung, Jangho An, Hyunsoo Kim, Minje Kim, Hyungwoo Lee, Wooseok Yi, Seungchul Jung, Daekun Yoon, Shinhee Han, Saeyoon Chung, Kilho Lee, Jeong-Heon Park, Kangho Lee, Sang Joon Kim, Donhee Ham
Outline: Background and Motivation; In-Memory Computing Microprocessor Design (Overall System Architecture, MRAM Crossbar Array, Processing Units and Dataflow, System-wide Calibration); Measurement; Summary

Background: Always-on Inference
Two-stage AI architecture: practical solution for always-on AI
  Low-power AI system detects events in always-on mode
  Central AI system handles complex on-demand AI tasks
[Figure labels: Face Unlock, Voice Command, Health Care; Detection, 1st Stage (Always-On); Event; AP, Authentication, 2nd Stage; This Work]

Key Requirements for Always-on Inference
Low power consumption
  Dynamic power: reduce using low-power inference engine
  Static power: minimize leakage (weight memory)
Key requirements: non-volatile IMC
  1) Dynamic power reduction
  2) Static power reduction
[Figure labels: power over time, with 1. dynamic power reduction and 2. static power reduction; leakage, SRAM and MRAM, 76%; memory density, SRAM and MRAM, 23%]

Our Approach
Low power consumption
  Dynamic power reduction
    End-to-end AI inference processor without external memory access
    Energy-efficient IMC crossbar arrays (+ system-wide calibration)
    IMC-optimized dataflow
  Static power reduction
    126 NVM MRAM IMC storing all weights
    Multiple power domains
Overall Architecture
MRAM IMC Crossbar Array
  128×64 size, binary multiplication
  Analog computing engine
Matrix Multiplication Engine
  Hierarchically organized 9 XBARs
  Signal sharing among 3 XBARs
Full IMC System
  14 MM engines (126 XBARs)
  Shared digital components
  IMC-optimized dataflow
  System-wide calibration
[Figure labels: 14 MM engines (MMEs), MME1 to MME14, 126 XBARs; MCU Subsystem: MCU, Code Memory (128KB), Clock Manager, Power Manager, GPIO, External Interface; Post-engine (Pooling, Scaling, Bias-addition, Activation); Adder; Feature Map Memory Unit (8 banks): SRAM bank1 (64KB) to SRAM bank8 (64KB); MM engine (3 Processing Elements (PEs) = 9 XBARs): PE1, PE2, PE3, Shared Shift Register; MRAM XBAR (128x64): WR Peri-Circuit, Input Driver, TIA/ADC Readout, Control Circuit; Data Flow Control Unit; Input Fetch Unit; Output Store Unit; Interconnect]
TIA: transimpedance amplifier. ADC: analog-to-digital converter.
MRAM Crossbar Array
128 x 64, analog computing engine for binary MMs
8 readout electronics, each shared by 8 columns
  8 columns share readout electronics (TIA+ADC)
[Figure labels: Readout Electronics (TIA+ADC): VREF, op-amp, FET, VR, VDD, TIA; weights W1,1, W1,2, W1,3 to W1,128; inputs IN1, IN2, IN3 to IN128; 8 columns]

MRAM Crossbar Array
Reuse foundry-provided MTJ for high reliability
Stacked bit-cell structure to address relatively low resistance of MTJ
  Unit-bit cell: rH: 26 kΩ (2.0 kΩ), rL: 13 kΩ (1.6 kΩ)
[Figure labels: stacked MRAM bit-cell; unit-cell (1T-1MTJ); 8 unit-cells; rH,1 and rL,1 to rH,128 and rL,128; IN1 to IN128; source line; foundry-provided MTJ]

MRAM Crossbar Array
Readout electronics: TIA with ADC
  Op-amp offset calibration + resistor calibration
[Equation not recoverable from extraction: an expression for V containing a sum over i = 1 to 128]
[Figure labels: TIA; Column 1, Column 2 to Column 7, Column 8; Offset Calibration (3b, binary), NEG, POS; Offset Voltage Calibration; Resistor Calibration (4b, one-hot); op-amp; VDD_AMP; VREF; VDD; MRAM array; BIT_LINE; COLUMNSEL (8b); ADC; DOUT (4b)]
Processing Units based on MRAM XBAR
1st hierarchy: grouping XBARs by sharing signals
  Address high routing complexity in integrating 126 XBARs on a chip
  Compute two outputs from the same input
[Figure labels: processing element (PE); shared signal; three XBARs, each with a 4b output; 3x2 switch; IN (128b); OUT_0 (4b); OUT_1 (4b); Enable (3b); Controls (includes R/W signals); analog domain]

Processing Units based on MRAM XBAR
2nd hierarchy: grouping to reduce required memory bandwidth
  Reduce input memory bandwidth: reuse input data among XBARs; reuse input data across PEs
  Reduce output memory bandwidth: two outputs to chain post-engine directly
[Figure labels: MM engine; analog domain; IFM input (128b); shared shift register (X1, X5, X9, X13, X2, X6, X10, X14, X3); interface between digital and analog (level shifter, isolation cell, ...); PE1, PE2, PE3 with weights W1 to W9; 3:2; adder; accumulator AC1, AC2]

Overall Dataflow
The dataflow leverages the architecture of the MM engine
  4-line wave-patterned input fetch: increase data reuse
  Direct chained post-engine: remove intermediate data access
1) 4-line wave-patterned input fetch: lower SRAM access
2) Direct chained post-engine: lower SRAM access and latency
Input Pattern for a Layer
Split IFM into IFM slices, then process them one by one
  Multiple logical slices, each read in a 4-line wave pattern
  For zero-padding regions, IFU provides zero flags to skip computing
[Figure labels: input feature map (6×6), entries X1,1 to X6,6; zero-padded region; valid data; 3 slices (Slice 1, Slice 2, Slice 3); Dataflow Control Unit controls; input buffers; neural engine; time axis, "Execution flows in time". In the time sequence, Slice 1 lists rows X1 to X3, Slice 2 lists rows X2 to X5, and Slice 3 lists rows X4 to X6.]
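The slicing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the chip's control logic: it assumes a 3×3 kernel with one zero-padding row above and below the map, so that each 4-line slice yields two output rows, and it compares the line reads against a line-by-line fetch that reads three lines per output row. The function names are hypothetical.

```python
def wave_slices(height, lines=4, kernel=3):
    """Return the rows read by each slice of an IFM with `height` rows.

    Rows are 1-based; row 0 and row height + 1 are zero-padding rows.
    Each entry is (row, is_zero_padding), where the flag stands in for
    the zero flag that lets the engine skip computing.
    """
    rows_out = lines - kernel + 1  # output rows per slice (2 here)
    slices = []
    for first in range(0, height, rows_out):
        rows = range(first, first + lines)
        slices.append([(r, r < 1 or r > height) for r in rows])
    return slices


def line_reads(out_rows, lines=4, kernel=3):
    """Count input-line reads for wave-patterned and line-by-line fetch.

    Assumes out_rows is a multiple of the output rows per slice.
    """
    rows_out = lines - kernel + 1
    wave = (out_rows // rows_out) * lines
    line_by_line = out_rows * kernel
    return wave, line_by_line
```

For a 6-row map, `wave_slices(6)` gives three slices covering rows 0 to 3, 2 to 5, and 4 to 7, where rows 0 and 7 are zero padding. `line_reads(6)` gives 12 line reads against 18, one third fewer.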
Input Reuse
Reducing input data access by 33% over line-by-line access
  Compute 2-lines outputs
The wave-patterned patch sliding: overlapping in horizontal and in vertical
The line-by-line patch sliding: overlapping in horizontal
For an $N \times N$ input feature map, the number of SRAM accesses is $\frac{N-2}{2} \times 4 \times (N-2)$ for wave-patterned sliding versus $(N-2) \times 3 \times (N-2)$ for line-by-line sliding: 33% less input data access
[Figure labels: 4×6 input grid X11 to X46, shown once for each sliding pattern, with the 3×3 patches visited in sequence]
Dataflow (Cycle 1)
Feed input vector X1, X2, X3 to the MM engine
  3 XBARs ON, 6 XBARs OFF
$AC_1 = X_1 W_1 + X_2 W_2 + X_3 W_3$

Dataflow (Cycle 2)
Push X7 and feed input vector X5, X6, X7
  Two partial sums are computed (6 XBARs ON, 3 XBARs OFF)
$AC_1 = X_1 W_1 + X_2 W_2 + X_3 W_3 + X_5 W_4 + X_6 W_5 + X_7 W_6$
$AC_2 = X_5 W_1 + X_6 W_2 + X_7 W_3$

Dataflow (Cycle 3)
Complete 1st convolutional output
  6 XBARs ON, 3 XBARs OFF
$AC_1 = X_1 W_1 + X_2 W_2 + X_3 W_3 + X_5 W_4 + X_6 W_5 + X_7 W_6 + X_9 W_7 + X_{10} W_8 + X_{11} W_9$ (complete)
$AC_2 = X_5 W_1 + X_6 W_2 + X_7 W_3 + X_9 W_4 + X_{10} W_5 + X_{11} W_6$

Dataflow (Cycle 4)
Complete 2nd convolutional output
  3 XBARs ON, 6 XBARs OFF
$AC_2 = X_5 W_1 + X_6 W_2 + X_7 W_3 + X_9 W_4 + X_{10} W_5 + X_{11} W_6 + X_{13} W_7 + X_{14} W_8 + X_{15} W_9$ (complete)

[Figure labels, all four cycles: 4×4 input grid X1 to X16; IFM input; MM engine; shared shift register; PE1, PE2, PE3 with weights W1 to W9; 3:2; adder; accumulator AC1, AC2. Shared shift register contents per cycle: (1) X3 X14 X10 X6 X2 X13 X9 X5 X1; (2) X7 X3 X14 X10 X6 X2 X13 X9 X5; (3) X11 X7 X3 X14 X10 X6 X2 X13 X9; (4) X15 X11 X7 X3 X14 X10 X6 X2 X13]
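The four cycles above can be replayed with a short Python sketch. It is a simplified model under stated assumptions: scalars stand in for the 128b input vectors and for the crossbar weight matrices, the 4×4 input is stored row by row (X1 to X4 in the first row), and the count of active XBARs is taken as three per partial sum updated in a cycle. The function names are hypothetical.

```python
def simulate_dataflow(x, w):
    """Replay the 4-cycle schedule for two vertically adjacent 3x3 outputs.

    x: 4x4 input as a list of rows; w: 3x3 kernel as a list of rows.
    Returns (ac1, ac2, active), where active[c] is the assumed number of
    XBARs switched on in cycle c.
    """
    ac1 = ac2 = 0
    active = []
    for cycle in range(4):
        row = x[cycle][:3]  # X1..X3, then X5..X7, X9..X11, X13..X15
        on = 0
        if cycle <= 2:  # AC1 consumes kernel row `cycle`
            ac1 += sum(a * b for a, b in zip(row, w[cycle]))
            on += 3
        if cycle >= 1:  # AC2 lags by one input row
            ac2 += sum(a * b for a, b in zip(row, w[cycle - 1]))
            on += 3
        active.append(on)
    return ac1, ac2, active


def conv_at(x, w, top):
    """Reference 3x3 dot product whose top-left corner is (top, 0)."""
    return sum(x[top + i][j] * w[i][j] for i in range(3) for j in range(3))


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
w = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ac1, ac2, active = simulate_dataflow(x, w)
```

With these inputs the accumulators match the reference dot products for the first and second output rows, and `active` is `[3, 6, 6, 3]`, the ON counts stated for cycles 1 to 4.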
Output Pattern for a Layer
Subsequent operations are executed in a chained manner
  Reduced to a single output vector when pooling is enabled
  Output resolution options: 1b, 4b, and 8b (for final layer)
[Figure labels: MM engine outputs Y1,1 to Y6,6, grouped as outputs of MM engines for slice 1, slice 2, and slice 3; Scaling, Bias-addition; Activation; 2x2 max pool; OSU; Activation SRAM; every 4 cycles; outputs O1,1 to O3,3 when pooling is enabled; outputs O1,1 to O6,6 when pooling is disabled]
System-wide Calibration
Categorize three noise types with 7 noise sources
  Column variation: dominant at TIA; sources 1), 2)
  Layer-dependent variation: VDD variation caused by different averaged current
  Residual noise: remaining noises; sources 3), 4), 6), 7)
Noise sources:
  1) Op-amp offset voltage
  2) R variation
  3) Parasitic resistor
  4) Op-amp thermal noise
  5) MTJ variation (suppressed by stacked bit-cell structure)
  6) Comparator thermal noise
  7) Reference voltage noise
[Equation not recoverable from extraction: an expression for V containing a sum over i = 1 to 128]
[Figure labels: ADC; VREF; R; column; op-amp; VDD; activated XBAR; different averaged current from VDD; MM engine 1; MM engine 2]

Column Variation
Two calibration points in every readout electronics
  Tune the op-amp offset and R to minimize the RMS
  3b configurations for positive and negative inputs in op-amp
  4b one-hot configuration for resistor values
Each crossbar array requires a one-time calibration
[Figure labels: XBAR; 95%; 95%]
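One way to picture the column calibration above is an exhaustive search over the configuration bits. The Python sketch below is an assumption-laden illustration, not the authors' procedure: it assumes one 3-bit binary offset code for each op-amp input and one of four one-hot resistor codes, and it picks the setting with the lowest RMS error against the expected outputs. `measure` stands for a chip readout that the source does not describe, and `toy_measure` is an invented stand-in so the search can run.

```python
import itertools
import math

OFFSET_CODES = range(8)  # 3-bit binary code, assumed per op-amp input
RESISTOR_CODES = [0b0001, 0b0010, 0b0100, 0b1000]  # 4-bit one-hot code


def rms_error(measured, expected):
    """Root-mean-square difference between two equal-length sequences."""
    total = sum((m - e) ** 2 for m, e in zip(measured, expected))
    return math.sqrt(total / len(expected))


def calibrate(measure, patterns, expected):
    """Try every (pos, neg, resistor) setting and keep the lowest RMS error.

    Returns (error, pos_code, neg_code, resistor_code).
    """
    best = None
    settings = itertools.product(OFFSET_CODES, OFFSET_CODES, RESISTOR_CODES)
    for pos, neg, res in settings:
        outputs = [measure(p, pos, neg, res) for p in patterns]
        err = rms_error(outputs, expected)
        if best is None or err < best[0]:
            best = (err, pos, neg, res)
    return best


def toy_measure(pattern, pos, neg, res):
    """Hypothetical readout: the resistor sets gain, the codes shift offset."""
    gain = {0b0001: 0.8, 0b0010: 0.9, 0b0100: 1.0, 0b1000: 1.1}[res]
    offset = 0.5 + 0.25 * (pos - neg)  # invented uncalibrated offset of 0.5
    return gain * pattern + offset


patterns = [0.0, 4.0, 8.0, 12.0]
best = calibrate(toy_measure, patterns, expected=patterns)
```

In this toy model the search settles on the unity-gain resistor code and an offset-code difference that cancels the invented 0.5 offset. With 8 × 8 × 4 = 256 settings, an exhaustive sweep is cheap enough for a one-time calibration.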
In-Memory Computing Microprocessor for End-to-End DNN Inferencing in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage 2025 IEEE International Solid-State Circuits Conference28 of 38
Layer-dependent Variation
Estimate and pre-adjust errors due to VDD variation
  The variation depends on the number of active crossbar arrays
  These differences cause shifts in the MM engine outputs
After mapping, estimate the shifts using random input patterns
  Re-estimation is required whenever the mapping changes
[Figure: random pattern for calibration; expected versus measured output; bias offset combined into the bias addition; RMSE 0.33; error value in MM engine (sum of 19 XBARs)]

Residual Noise
Enhance network model robustness against residual noise
  Incorporate layer-wise noise-aware training
  Train with random noise, not specific measured noise
  Ensure no chip dependency
Tanh: better accuracy when incorporating noise during training
ReLU: effective for deep-layer training
  Train with ReLU (no noise), then gradually retrain one by one using Tanh

Outline
Background and Motivation
In-Memory Computing Microprocessor Design
  Overall Architecture
  MRAM Crossbar Array
  Processing Units and Dataflow
  System-wide Calibration
Measurement
Summary
37.3:Monolithic In-Memory Computing Microprocessor for End-to-End DNN Inferencing in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage 2025 IEEE International Solid-State Circuits Conference3
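The layer-dependent calibration described in the calibration slides (estimate output shifts with random input patterns after mapping, then combine the offset with the bias addition) can be illustrated with a minimal sketch. Everything named here is hypothetical: `expected_fn` and `measured_fn` stand in for the software reference and the chip readout, binary input patterns are assumed, and subtracting the shift from the bias is an assumed sign convention rather than the authors' implementation.

```python
import random


def estimate_bias_shift(expected_fn, measured_fn, num_inputs,
                        num_patterns=256, seed=0):
    """Average (measured - expected) output over random binary input patterns.

    expected_fn / measured_fn are hypothetical callables that map an input
    pattern (list of 0/1 values) to one MM engine output value.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_patterns):
        pattern = [rng.randint(0, 1) for _ in range(num_inputs)]
        total += measured_fn(pattern) - expected_fn(pattern)
    return total / num_patterns


def fold_into_bias(layer_bias, shift):
    """Combine the estimated shift with the layer bias (assumed convention:
    the shift is subtracted so the corrected output matches the expected one)."""
    return layer_bias - shift
```

Because the slides state that re-estimation is required whenever the mapping changes, a routine like `estimate_bias_shift` would be rerun after every remapping.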
1 of 38
Chip Micrograph and Summary
[Figure: chip micrograph. Labeled blocks: feature map memory unit (SRAM + interconnector), post engine + digital logic, MCU subsystem, MM engines; dimension labels 3513.6 µm and 5223.6 µm. MM engine detail: R/W peripheral, MRAM bit-cell array, ADC, IN, amplifier]
Technology: 28 nm
Crossbar area: 0.040 mm2
Die area: 19.95 mm2
Frequency: 62.5 MHz
Voltage: 1 V (digital), 1 V (crossbar), 1.8 V (input driver/IO)
Crossbar power efficiency: 59.8 TOPS/W
System power efficiency: 20 TOPS/W
Accuracy (MNIST): 97.62%
Accuracy (FDDB): 91.3%
37.3:Monolithic In-Memory Computing Microprocessor for End-to-End DNN Inferencing in M
RAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage 2025 IEEE International Solid-State Circuits Conference32 of 38
Environment
[Figure: experiment environment and evaluation board. The external MCU is used only for interfacing between the host PC and the IMC microprocessor.]
Default condition
  Clock frequency
    TIA/ADC clock: 62.5 MHz
    Data clock: 20.8 MHz
    Computing clock: 5.2 MHz
  Supply voltages, digital
    MCU sub-system: 1 V
    SRAM: 1 V
    Digital logic: 1 V
  Supply voltages, analog
    Input driver (level shifter): 1.8 V
    Peripheral: 1 V
    MRAM crossbar array and TIA: 1 V
    ADC: 1 V
All experiments, including power and accuracy evaluation, were performed under the default condition
This is not the maximum operating condition. For example, the TIA/ADC can operate at up to 200 MHz

Measurements of MRAM XBAR
Measured across 126 XBARs on a single chip
  95% of errors are within 1 LSB
At 1 V, the power efficiency is 59.8 TOPS/W
  For comparison, at 0.7 V it is 103 TOPS/W, but 13% of errors exceed 1 LSB
The network models are trained to be robust, assuming 90% of errors are confined within 1 LSB
[Figure: heat map of the output matrix (expected versus measured output code); errors in 126 XBARs with 95% annotation]
[Figure: XBAR power breakdown. Segment labels: read/write peripheral, ADC, input driver + level shifter, TIA, MRAM cells. Segment values: 34%, 27%, 8%, 3%, 28%]
37.3:Monolithic In-Memory Computing Microprocessor for End-to-End DNN Inferencing in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Wei
ght Storage 2025 IEEE International Solid-State Circuits Conference34 of 38
Measurement: System
20 TOPS/W, including all components on the chip
  Analog consumes 42% (MRAM bit-cell array + ADC + TIA)
  Digital logic: interconnector, post engine, and data-path logic
[Figure: power breakdown (analog 42%); measurements with 100 chips]
37.3:Mo
nolithic In-Memory Computing Microprocessor for End-to-End DNN Inferencing in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage 2025 IEEE International Solid-State Circuits Conference35 of 38
Measurement: Accuracy
8-layer VGG model (MNIST)
  Layers: CONV1(128,128), CONV2(128,128), CONV3(128,128), CONV4(128,128), CONV5(128,128), CONV6(128,128), FC7(128,128), FC8(10,128)
Face-detection model (FDDB)
  Redesigned from FaceBoxes*, 8b thermometer input
  All weights are stored, but the feature map is tiled externally (due to the large image)
  Layers: CONV1(64,3), CONV2(64,64), CONV3(64,64), CONV4(64,64), CONV5(64,64), CONV6(64,64), CONV6-1(64,64), CONV6-2(64,42), CONV7(64,64), CONV7-1(64,64), CONV7-2(64,2), CONV8(64,64), CONV8-1(64,64), CONV8-2(64,2), CONCAT
MNIST: 97.62% (HW), 99.45% (SW baseline)
Face detection: 91.3% (HW), 92.51% (SW baseline)
  Both at the same bit resolution
[Figure: mapping of each model's layers onto the MM engines, with digital block / post engine (+ extra logic), SRAM, and MCU]
* FaceBoxes: A CPU real-time face detector with high accuracy. 2017 IEEE International Joint Conference on Biometrics (IJCB). IEEE, 2017
37.3:Monolithic In-Memory Computing Microproc
essor for End-to-End DNN Inferencing in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage 2025 IEEE International Solid-State Circuits Conference36 of 38
Face Detection Model
Training: dual-path model
  Bounding-box generation path
  Confidence score generation path
Inference with our processor
  Confidence score generation path: detect faces

Comparison
Demonstrate system-level performance
37.
3:Monolithic In-Memory Computing Microprocessor for End-to-End DNN Inferencing in MRAM-Embedded 28nm CMOS Technology with 1.1Mb Weight Storage 2025 IEEE International Solid-State Circuits Conference38 of 38
Summary
First MRAM-based IMC microprocessor for end-to-end inference
  Integrates 126 MRAM XBARs
  Mitigates non-idealities with system-wide calibration
  Introduces a system-level architecture with IMC-optimized dataflow
Demonstrates power efficiency for both the XBAR and the system
  59.8 TOPS/W for the MRAM XBAR
  20 TOPS/W at the system level
  System-level efficiency includes all components within the chip
37.4:SHINSAI:A
586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference1 of 45
SHINSAI: A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory
Bo Jiao*, Haozhe Zhu*, Yuman Zeng, Yongjiang Li, Jie Liao, Siyao Jia, Zexing Chen, Jun Tao, Chixiao Chen, Qi Liu, Ming Liu
Fudan University, Shanghai, China
* Equally Credited Authors (ECAs)
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE Internation
al Solid-State Circuits Conference2 of 45
Outline
Background and Motivation
SHINSAI Active TSV Interposer Overview
SHINSAI Design
  Heterogeneous Dual-layer NoAI (Network-on-Active-Interposer)
  Programmable Horizontal Die-to-Die Link & Fabric
  Reconfigurable Vertical Link as 3D NoC Bridges
Silicon Implementation and Measurement Results
Summary
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory
2025 IEEE International Solid-State Circuits Conference4 of 45
Chiplet & Advanced Integration Technology
Chiplets and advanced integration start a new era for More-Moore and More-than-Moore systems.
[Figure: die size (MM, axis from 100 to 10000) versus year, 2006 to 2026, for GPUs and server CPUs, monolithic versus chiplet, with the reticle limit marked. Source: AMD]

From Passive Interposer to Active Interposer
Passive interposer & 2.5D integration: provides electrical interconnection and capacitors only
Active interposer & 3D integration: provides electrical interconnection and integrates active components
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE Internat
ional Solid-State Circuits Conference6 of 45
Challenge 1: Reusability Limit
Chiplet motivations: cost driven, heterogeneous integration, modularity
Interposer NRE cost: unaffordable for low-volume products, complex design flow, long design cycle
[Figure: chiplet-based package floorplan with CCD, XCD, IOD, HBM and TSV blocks. Source: AMD, ISSCC 2024]

Challenge 1: Reusability Limit
Cost motivation for an active interposer: make the active TSV interposer reusable and compatible with various top dies and IO/bump maps.
How can the Network-on-Active-Interposer (NoAI) be made reusable for diverse topologies?
[Figure: different chiplets/IOs; interposer reusability]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Sol
id-State Circuits Conference8 of 45
Challenge 2: Scalability Limit
As compute scales, memory also needs to scale.
Memory bandwidth has become a system-level bottleneck.
[Figure: system performance improvement versus increase factor for compute or bandwidth, for three cases: increase compute only, increase bandwidth only, increase compute and bandwidth. Sources: IBM Research, IEDM 2018; Shao, MICRO 2019]

Challenge 2: Scalability Limit
More embedded memory is crucial for system performance.
A higher data transfer rate boosts compute efficiency.
[Figure: insufficient memory to support compute scalability; 3D V-Cache improves system performance. Sources: AMD, ISSCC 2022; IntAct, ISSCC 2021]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D U
nderdeck Memory 2025 IEEE International Solid-State Circuits Conference10 of 45
Challenge 3: Diverse D2D Communication
Different deployment strategies lead to varying communication requirements between chiplets, resulting in diverse demands on bandwidth and computing resources.
[Figure: example communication patterns among Die0 to Die3: layer-to-layer, all-reduce, and scatter of weights from memory]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Intercon
nect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference12 of 45
SHINSAI Active TSV Interposer
Feature 1: Heterogeneous dual-layer NoAI with 3D underdeck SRAM
Feature 2: Programmable horizontal die-to-die link & fabric (H-Link)
Feature 3: Reconfigurable vertical link as 3D NoC bridges (V-Link)
[Figure: system block diagram. Top dies #0 to #15 sit on the active TSV interposer above the substrate, connected through bumps, TSVs and C4 bumps. The interposer contains V-Link interfaces #0 to #15, the programmable horizontal die-to-die interconnect fabric (H-Link), a NoC with SRAM banks #0 to #7, a CPU, IVRs, high-speed IOs and low-speed peripherals.]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interp
oser with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference14 of 45
NoC: Packet Switching vs. Circuit Switching
Circuit switching and packet switching have different advantages.

                     Packet switching          Circuit switching
Path determinism     Diverse path              Dedicated path
Path exclusivity     Non-exclusive             Exclusive
Data type            Transmission of packets   Continuous data stream
Path configuration   No                        Yes

Also compared on the slide (ratings shown graphically): flexibility, latency, scalability, complexity
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE
International Solid-State Circuits Conference15 of 45
Heterogeneous Dual-layer NoAI
NoAI: programmable circuit-switching 2.5D interconnect + packet-switching 3D stacking NoC with underdeck memory.
[Figure: system block diagram with Feature 1 highlighted: the circuit-switching H-Link layer and the packet-switching NoC layer with SRAM banks]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interpo
ser with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference16 of 45
Heterogeneous Dual-layer NoAI
Circuit switching: for straight-through tasks, e.g. a layer pipeline
[Figure: OP1, OP2, OP3 on dies A, B, C; dies A to D with H-Link, V-Link, NoC and underdeck SRAM]

Heterogeneous Dual-layer NoAI
Packet switching: for memory-bandwidth-intensive tasks, e.g. weight shuffling
[Figure: OP1 on die A with updated data; dies A to D with H-Link, V-Link, NoC and underdeck SRAM]

Heterogeneous Dual-layer NoAI
Packet + circuit switching: for multi-branch tasks, e.g. residual paths among layers
[Figure: OP1 to OP6 on dies A, B, C; dies A to D with H-Link, V-Link, NoC and underdeck SRAM]

Heterogeneous Dual-layer NoAI
ResNet-50 simulation results on NoAI
[Figure: packet + circuit switching for Res2a_branch2a, Res2a_branch2b, Res2b_branch2b and Res2c_branch2b; input, dies A to D, H-Link, V-Link, NoC and underdeck SRAM]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference2
0 of 45
37.4:SHINSAI:A 586mm2 Reusable
Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference21 of 45
Programmable Horizontal Die-to-Die Link & Fabric (H-Link)
[Figure: system block diagram with Feature 2, the H-Link fabric on the interposer, highlighted]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programma
ble Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference22 of 45
Circuit-switching Programmable Fabric
A programmable circuit-switch matrix for bump-wise routing
[Figure: H-Link fabric under the top dies (Chiplet #0, Chiplet #1). Bump arrays, Turn-over Blocks (TOB), Cross-over Blocks (COB), 128 tracks, bump integration.]

Bump-to-bump Independent Routing
Bump-to-bump independent routing addresses the limited hardware resources and diverse routing requirements.
1. High-speed multi-lane buses (D2D TX/RX interfaces between chiplets)
2. External IOs (chiplet to off-package host: SPI, high-speed links)
3. Host synchronizing (Chiplet0 to Chiplet3 on the interposer)
37.4:SHINSAI:A 586mm2 Reusable Active TSV I
nterposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference24 of 45
Tracks: Digital Channel with Buffers
Signal integrity degradation: digital waveforms experience signal integrity issues when the transmission channel distance exceeds 2 mm.
[Figure: track between TOB and COB under the top die, distance 20 mm; transmitter and receiver waveforms]

Track: Digital Channel with Buffers
Long transmission channel without active buffers
  A long channel needs impedance matching, with equalization and termination
Long transmission channel with digital buffers
  The digital channel enhances signal integrity
  The buffers are embedded in other active switching circuits
[Figure: transceiver and receiver for each of the two cases]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE
International Solid-State Circuits Conference28 of 45
Track: Digital Channel with Buffers
Tracks: high-speed parallel signal lanes, 1.5 mm per segment, with drivers and a DCDL that ensure signal integrity and drive capability and compensate for timing variations.
[Figure: TOB driver and COB driver schematics (RX, TX, EN, DCDL, PD, ESD). TOB/COB digitally controlled delay line (DCDL): Delay Cell 0 to Delay Cell 15 with in/out/to/from ports, selected by sel[15:0] from a select decoder, between DataIn and DataOut.]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits
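The digitally controlled delay line on the track slide (16 delay cells selected through sel[15:0] by a select decoder) can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit from the talk: the 4-bit binary code, the one-hot decoding, the linear delay relation, and the `cell_delay_ps` value are all hypothetical placeholders.

```python
NUM_CELLS = 16  # the slide shows Delay Cell 0 .. Delay Cell 15 and sel[15:0]


def decode_select(code):
    """Decode a 4-bit code into a one-hot 16-bit select word.

    ASSUMPTION: binary-to-one-hot decoding; the slide only names a
    'Sel Decoder' without giving its encoding.
    """
    if not 0 <= code < NUM_CELLS:
        raise ValueError("code must be in 0..15")
    return 1 << code


def dcdl_delay_ps(code, cell_delay_ps=10.0, fixed_delay_ps=0.0):
    """Delay through the line when the signal turns around at cell `code`.

    ASSUMPTION: the signal traverses cells 0..code, so the delay grows
    linearly with the code. cell_delay_ps is a placeholder, not a
    measured value from the paper.
    """
    select_word = decode_select(code)
    cells_traversed = select_word.bit_length()  # one-hot position + 1
    return fixed_delay_ps + cells_traversed * cell_delay_ps
```

With these placeholder values, stepping the code from 0 to 15 sweeps the modeled delay from one cell delay to sixteen, which is the kind of range a per-track timing adjustment would search over.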
Conference29 of 45
µ-bump Turn-out Block
TOB (Turn-out Block) achieves arbitrary connections from each bump to a track via three-stage switches.
[Figure: TOB between the bumps and tracks 0 to 127. TOB Bank 0 and TOB Bank 1 each with two-stage 8-8 switches (x8); 2-2 switches (x64); connections to the COBs.]

Wilton-based Cross-over Block
COB (Cross-over Block) handles signal direction changes, adopting the Wilton structure used in FPGAs for good routability.
Source: Wilton, FPL 1999
[Figure: COB joining the eastern, western, southern and northern tracks [127:0] through Wilton connection blocks (x16), with x8 COB drivers]

Placement and Routing Process
To efficiently implement diverse systems by reusing SHINSAI, an automatic placement and routing process is proposed.
Input: SHINSAI architecture, netlists, net constraints
Flow: top die placement (top die location), H-Link track routing, H-Link configuration generation
Output: H-Link fabric configurations
Routing sequence of all wire nets:
  High-speed multi-lane buses connecting to the H-Link boundary
  General nets connecting to the H-Link boundary
  High-speed multi-lane buses inside the H-Link fabric
  General nets including pull-up/down ports
  General nets inside the H-Link fabric
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International
Solid-State Circuits Conference32 of 45
Reconfigurable Vertical Link as 3D NoC Bridges (V-Link)
[Figure: system block diagram with Feature 3, the V-Link interfaces, highlighted]
37.4:SHINSAI:A 586mm2 Reusable Activ
e TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference34 of 45
Fully Digital 3D Interface
Ultra-short links for 3D stacking feature negligible EM loss.
A fully digital PHY is used for 3D inter-chiplet communication.
[Figure: 3D die-to-die interface between top die and base die: protocol layer, adapter layer and PHY on each die; TX flops, clock buffer and RX flops across xN bumps. Source: UCIe 2.0]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conferenc
e35 of 45
V-Link: Fully Digital Reconfigurable 3D Link
V-Link implementation
  Consists of 5 channels of 32-lane data transmission with a specific protocol
  Reconfigurable according to up/downstream requirements
[Figure: bi-directional module, transmitting module and receiving module, each built from FIFOs, DFFs and a clock delay line across lanes D0 to D31]

Reconfigure V-Link for Diverse Demands
Mode              Top die V-Link           Base die V-Link          Top die TX:RX
Downstream mode   4x TX, 1x RX channels    4x RX, 1x TX channels    4:1
Balance mode      2x TX, 3x RX channels    2x RX, 3x TX channels    2:3
Upstream mode     1x TX, 4x RX channels    1x RX, 4x TX channels    1:4

Reconfigurable Up/Down Stream Bandwidth
Varying upstream and downstream demands across different OPs can be optimized by reconfiguring the bandwidth.
[Figure: weight, input activation and output activation traffic between the top die and the base die with underdeck memory, for a compute-intensive OP and for a memory-intensive OP]
Improvement from the 3D reconfigurable V-Link bandwidth: normalized latency is 1 for 3D with fixed V-Link bandwidth and 0.728 for 3D with reconfigurable V-Link bandwidth (1.37x)
37.4:SHINSAI:A 586mm2 Reusable Active TSV Interposer with Programmable I
nterconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference39 of 45
Integrated Voltage Regulator (IVR)
IVRs convert the 1.8 V input supply into 0.6-1.2 V, reducing PDN IR drop by directly supplying 100 A through TSVs
Power switches and buck controller circuits are integrated into the interposer.
Passive components are mounted on the substrate.

Physical Layout Efforts
On-interposer underdeck memory: 512 Mb
TSV technology / pitch: TSV-middle / 223m
Bump technology / pitch: µbump / 40 µm
[Figure: cross section showing TSV, solder ball, C4 bump and µbump]

Top Die and 3D Integration
[Figure: top die layout, bump detail, 3D cross section, TSV detail]

Measurement Results
The entire interposer operates at a clock rate of 400 MHz under a 0.9 V power supply.
[Figure: voltage (0.5 to 0.9 V) versus lane bitrate (0 to 2000 MT/s) of the V-Link PHY; marked point: 1900 MT/s at 0.9 V]
[Figure: power consumption breakdown at 0.9 V, 400 MHz. Segment labels: interposer SRAM; CPU, NoC, misc.; vertical link interface; horizontal link fabric; IVR + PLL; IOs. Segment values: 2.7%, 2.6%, 10%, 5%, 46.1%, 33.6%]

Performance Summary
[Table: top die and SHINSAI]
37.4:SHINSAI:A 586mm2 Reusable Active TSV Inter
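Combining the V-Link organization from the design slides (5 channels of 32 lanes, with top-die TX:RX splits of 4:1, 2:3 and 1:4) with the lane bitrate marked in the measurement plot (1900 MT/s at 0.9 V) gives a rough per-direction bandwidth estimate. The sketch below is illustrative arithmetic only: it assumes one bit per transfer per lane and ignores protocol overhead, and the function and dictionary names are hypothetical.

```python
CHANNELS = 5            # channels per V-Link (from the slides)
LANES_PER_CHANNEL = 32  # lanes per channel (from the slides)

# Top-die (TX channels, RX channels) for each mode named on the slides.
VLINK_MODES = {
    "downstream": (4, 1),
    "balance": (2, 3),
    "upstream": (1, 4),
}


def vlink_bandwidth_gbps(mode, lane_rate_mtps=1900.0):
    """Raw bandwidth per direction for one V-Link, in Gb/s.

    ASSUMPTION: each lane carries 1 bit per transfer and there is no
    protocol overhead; neither detail is given in the slides.
    """
    top_tx, top_rx = VLINK_MODES[mode]
    if top_tx + top_rx != CHANNELS:
        raise ValueError("mode does not use all channels")
    per_channel_gbps = LANES_PER_CHANNEL * lane_rate_mtps / 1000.0
    return {
        "top_to_base": top_tx * per_channel_gbps,
        "base_to_top": top_rx * per_channel_gbps,
    }
```

Under these assumptions every mode totals the same aggregate raw rate per V-Link; reconfiguration only moves channels between the two directions.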
poser with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory 2025 IEEE International Solid-State Circuits Conference44 of 45
Summary
SHINSAI features
Heterogeneous dual-layer NoAI
  Supports both a packet-switching 3D stacking NoC and a programmable H-Link fabric to accommodate diverse communication requirements
Programmable horizontal die-to-die link & fabric
  Achieves independent bump-to-bump routing for various interconnect topologies
Reconfigurable vertical link as 3D NoC bridges
  Adapts to varying up/downstream bandwidth on stacked interfaces
1 of 52 2025 IEEE International Solid-State Circuits Conference37.5:SKADI:A 28nm Complete K-SAT Solver Featuring Dual-path SRAM-based Macro and Incremental Update with 100%Solvability 2
025 IEEE International Solid-State Circuits Conference
SKADI: A 28nm Complete K-SAT Solver Featuring Dual-path SRAM-based Macro and Incremental Update with 100% Solvability
Zihan Wu, Xiyuan Tang, Tao Zhang, Lishan Lin, Haoyang Luo, Bocheng Xu, Zhongyi Wu, Jiahao Song, Yitao Liang, Xiaochen Bo, Yuan Wang
Peking University, Beijing, China
Email: xitang,wangyuan

Outline
Introduction
Proposed Solver
  Overall Architecture
  Dual-path SRAM-based Macro
  Incremental Update
Measurement Results
Conclusion

Boolean SATisfiability Problem (SAT)
Objective of K-SAT: determine whether there exists a truth assignment for n Boolean variables X that satisfies all clauses (i.e., makes F(x) = 1).
Boolean variables: (x0, x1, ...)
Literal = a variable or its negation
Clause = a disjunction (OR) of literals; F(x) is the conjunction (AND) of all clauses
Example: F(x) = (x0 ∨ x1 ∨ x2) ∧ (x1 ∨ x3 ∨ x2) ∧ (x5 ∨ x1 ∨ x2), with some literals negated (the negation marks are not legible in this copy)
4 of 52 2025 IEEE International Solid-State Circuits Conference37.5:SKADI:A 28nm Complete K
-SAT Solver Featuring Dual-path SRAM-based Macro and Incremental Update with 100%Solvability
Example I
F(x) = (x0 ∨ x1) ∧ (x0 ∨ x1), with some literals negated (the negation marks are not legible in this copy)
With x0 = 1 and x1 = 0, all clauses can be satisfied concurrently.
F(x) is SATisfiable (SAT)

Example II
F(x) = (x0) ∧ (x1) ∧ (x0 ∨ x1), with some literals negated (the negation marks are not legible in this copy)
With x0 = 1 and x1 = 1, the clauses cannot all be satisfied concurrently.
F(x) is UNSATisfiable (UNSAT)
6 of 52 2025 IEEE International Solid-State Circuits Conference37.5:SKADI:A 28nm Complete K-SAT Solver Featuring
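The SAT and UNSAT examples above can be checked with a minimal brute-force sketch that enumerates every assignment. Note the assumption: the literal polarities in `EXAMPLE_SAT` and `EXAMPLE_UNSAT` are illustrative choices made here, because the negation marks of the slide formulas are not legible; they were picked only so that the outcomes agree with the slides (satisfiable with x0 = 1, x1 = 0; unsatisfiable). All names are hypothetical.

```python
from itertools import product

# A clause is a list of literals; a literal is (variable_index, is_positive).
# ASSUMPTION: these polarities are illustrative, not recovered from the slides.
EXAMPLE_SAT = [[(0, True), (1, True)], [(0, True), (1, False)]]
EXAMPLE_UNSAT = [[(0, True)], [(1, True)], [(0, False), (1, False)]]


def clause_satisfied(clause, assignment):
    """A clause (OR of literals) holds if any literal matches the assignment."""
    return any(assignment[var] == positive for var, positive in clause)


def brute_force_sat(clauses, num_vars):
    """Return the first satisfying assignment (tuple of bools) or None if UNSAT.

    Enumerates all 2**num_vars assignments, so it is only usable for tiny
    formulas; it is a reference check, not a solver architecture.
    """
    for assignment in product([False, True], repeat=num_vars):
        if all(clause_satisfied(clause, assignment) for clause in clauses):
            return assignment
    return None
```

Exhaustive enumeration is complete (it can report UNSAT) but exponential in the number of variables, which is why practical complete solvers prune the search instead.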
Dual-path SRAM-based Macro and Incremental Update with 100%Solvability
Complexity of SAT: NP-Complete
SAT is the first problem proven to be non-deterministic polynomial time (NP)-complete.
7 of 52 2025 IEEE International Solid-State Circuits Conference37.5:SKADI:A 28nm Complete K-SAT Solver Featuring D
ual-path SRAM-based Macro and Incremental Update with 100%Solvability
Incomplete Solvers
[Flowchart: k-SAT problem; randomly flip literals; all clauses SAT? If yes, output solutions; if no, flip again]
Features:
  Insufficient solvability
  Partial exploration of the solution space
  Cannot assert UNSAT
  Limited application scenarios
All prior ASIC solvers are incomplete.

Complete Solvers
[Flowchart: k-SAT problem; heuristic assignment; unit propagation; conflict? If yes, backtrack; if no, check SAT/UNSAT and continue until determined]
Features:
  Constant 100% solvability
  Exploration of the entire solution space
  Can prove UNSAT
  More application scenarios

The Challenge of Complete Design #1
C1: Bidirectional assignment-clause deduction.
  Complete solver: bidirectional assignment-clause deduction is required
  Incomplete solver: only needs to deduce clauses from assignments
10 of 52 2025 IEEE International Solid-State Circuits Conference37.5:SKAD
I:A 28nm Complete K-SAT Solver Featuring Dual-path SRAM-based Macro and Incremental Update with 100%Solvability
The Challenge of Complete Design #2
C2: The storage and backtracking of assignments.
  Complete solvers: must store prior assignments for backtracking
  Incomplete solvers: no need to record prior assignments
[Figure: successive assignment vectors (Assign1, Assign2, Assign3). Complete solver: assignment update, then backtrack and refresh. Incomplete solver: random literal flipping.]
11 of 52 2025 IEEE International Solid-State Circuits Conference37.5:SKADI:A 28nm Complete K-SAT Solver Featuring Dual-path SRAM-based Macro and
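The complete-solver flow described in these slides (heuristic assignment, unit propagation, conflict check, backtracking over stored prior assignments) corresponds to the classic DPLL procedure. The following is a minimal software sketch of that textbook algorithm, not the SKADI hardware: the clause encoding, the lowest-index decision heuristic, and all function names are assumptions made for illustration.

```python
def unit_propagate(clauses, assignment):
    """Assign variables forced by unit clauses until nothing changes.

    clauses: list of clauses, each a list of (variable_index, is_positive).
    assignment: dict variable_index -> bool, updated in place.
    Returns False on a conflict (a clause whose literals are all false).
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(var) == pos for var, pos in clause):
                continue  # clause already satisfied
            unassigned = [(var, pos) for var, pos in clause
                          if var not in assignment]
            if not unassigned:
                return False  # every literal is false: conflict
            if len(unassigned) == 1:
                var, pos = unassigned[0]
                assignment[var] = pos  # forced value (unit clause)
                changed = True
    return True


def dpll(clauses, num_vars, assignment=None):
    """Return a satisfying assignment dict, or None if the formula is UNSAT."""
    # Copying keeps the caller's assignment intact; that stored copy is what
    # the search falls back to when it backtracks.
    assignment = dict(assignment or {})
    if not unit_propagate(clauses, assignment):
        return None
    free = [var for var in range(num_vars) if var not in assignment]
    if not free:
        return assignment
    var = free[0]  # placeholder heuristic: lowest-index unassigned variable
    for value in (True, False):
        trial = dict(assignment)
        trial[var] = value
        result = dpll(clauses, num_vars, trial)
        if result is not None:
            return result
    return None  # both values failed: backtrack further up
```

The per-level copies of `assignment` are the software analogue of challenge C2: a complete solver has to keep earlier assignments available so that a failed branch can be undone.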
Incremental Update with 100%Solvability
The Challenge of Complete Design #3
C3: Accurate, compact computing of multi-level clause states for fine-grained complete analysis.
  Analog computation: compact but inaccurate
  Digital adder tree: accurate but huge area cost
[Figure: clause-state levels 0 to 7 between min and max for analog computation; CIM array feeding a digital adder tree]
12 of 52 2
Outline
- Introduction
- Proposed Solver
  - Overall Architecture
  - Dual-path SRAM-based Macro
  - Incremental Update
- Measurement Results
- Conclusion

Overall Architecture and Operation Flow
[Block diagram labels: SRAM-based Dual-path Macro (50 × 218), Decoder, R/W Peripherals, Backward Precharge, Forward Precharge, ML[99:0], SL[217:0], Clause Analyzer, 4b Clause States[217:0], SAT_Flag, UNSAT_Flag, Incremental Updater, Updater, Register File (8 × 100b), Assignment Mask, Backtracking Controller, AX[49:0], AXB[49:0], AX/AXB, Addr., DATA]
Operation flow (boxes numbered 1 to 6 in the figure):
- Program problems into the PIM-SAT macro
- Decide a truth value for an unassigned variable
- Forward PIM
- Backtrack and incremental update
- Backward index
- Output SAT or UNSAT flag
Decision points: Determined? Any Conflict Cla.? Any Unit Cla.? Any Conflict Var.?
Unit Propagation (UP)
[Flowchart: k-SAT problem -> heuristic assignment -> unit propagation -> conflict? (Yes: backtrack) -> SAT/UNSAT?]
F(x) = (x0 ∨ x1 ∨ x2) ∧ (¬x2 ∨ x1 ∨ x3)
- x0 = x1 = False, so x2 must be set to True.
- With x2 = True and x1 = False, x3 must be set to True.
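The complete-solver flow (heuristic assignment, unit propagation, conflict check, backtrack) can be sketched in software as a plain DPLL-style search. This is a minimal sketch under stated assumptions: the signed-integer clause encoding, the function names, and the naive "first unassigned variable" decision rule are illustrative and do not describe the SKADI hardware.

```python
def unit_propagate(clauses, assign):
    """Repeatedly assign literals forced by unit clauses.

    Returns the extended assignment, or None if some clause is falsified.
    """
    assign = dict(assign)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assign.get(abs(l)) == (l > 0) for l in clause):
                continue                      # clause already satisfied
            free = [l for l in clause if abs(l) not in assign]
            if not free:
                return None                   # conflict clause
            if len(free) == 1:                # unit clause forces a value
                assign[abs(free[0])] = free[0] > 0
                changed = True
    return assign


def solve(clauses, num_vars, assign=None):
    """Complete search: returns a model dict, or None when UNSAT is proven."""
    assign = unit_propagate(clauses, assign or {})
    if assign is None:
        return None                           # conflict: caller backtracks
    unassigned = [v for v in range(1, num_vars + 1) if v not in assign]
    if not unassigned:
        return assign                         # every variable determined
    var = unassigned[0]                       # naive decision heuristic
    for value in (True, False):
        model = solve(clauses, num_vars, {**assign, var: value})
        if model is not None:
            return model
    return None                               # both branches failed


# Variables x0..x3 are numbered 1..4 here:
# (x0 OR x1 OR x2) AND (NOT x2 OR x1 OR x3)
example = [[1, 2, 3], [-3, 2, 4]]
forced = unit_propagate(example, {1: False, 2: False})
proved_unsat = solve([[1, 2], [1, -2], [-1, 2], [-1, -2]], num_vars=2)
```

With x0 = x1 = False, `forced` also sets variables 3 and 4 (x2 and x3) to True. Unlike a random-flip search, `solve` returns `None` only after both values of every decision have failed, so that result is a proof of UNSAT.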
Unit Propagation (UP)
[Flowchart: k-SAT problem -> heuristic assignment -> unit propagation -> conflict? (Yes: backtrack) -> SAT/UNSAT?]
A bidirectional deduction between literals and clauses: literals -> clauses -> literals. This motivates the dual-path macro.

SRAM-based Dual-path Macro
- Each row is one clause in F(x).
- Each row consists of 50 PEs, representing 50 variables.
- A PE includes 2 SRAMs, a Backward-Index unit, and a Forward-PIM unit.
[Figure: clause rows with SL0, ML[1:0], ML[3:2], ..., ML[99:98] and inputs AX0/AXB0, AX1/AXB1, ..., AX49/AXB49. In row C0, each PE is labelled with a Forward PIM unit, a Backward Index unit, and two SRAM cells: SRAM0,0 and SRAM1,0 for x0, SRAM2,0 and SRAM3,0 for x1, ..., SRAM98,0 and SRAM99,0 for x49. Each row produces a 4b state.]
Forward PIM
Forward PIM: literals (AX/AXB) -> clause states.
[Figure: same clause-row array as on the previous slides, with a 4b state per row]

Forward PIM (SAT & SW)
[Schematic of one PE cell: 2 SRAMs, BW-IDX, SAT line, unassigned-literal detector, and position-encoded counter; signals AXi, AXBi, PCHF, WL, BLj, BLBj, SATj, SW]
Forward PIM (SAT & SW)
Example: the clause includes a literal of xi, with AXi = 0 and AXBi = 1 (Xi = -1). SAT is charged to 1.
[Schematic: same PE cell, with signals SW, SAT, AXi, AXBi, PCHF, WL, BLj, BLBj, SATj]
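A behavioural sketch of what one clause row of the forward path computes may help here. It assumes that the two stored bits per PE mark whether the clause contains x_i or NOT x_i, that AX = 1 encodes "assigned True", AXB = 1 encodes "assigned False", and AX = AXB = 0 encodes "unassigned". These encodings and the function name `forward_pim_row` are assumptions made for illustration; the sketch models logic only, not the precharge-based circuit.

```python
def forward_pim_row(pos, neg, ax, axb):
    """Behavioural model of one clause row (assumed logic, not the circuit).

    pos[i] / neg[i]: 1 if the clause contains x_i / NOT x_i (the two stored bits).
    ax[i] / axb[i]: 1 if x_i is assigned True / False; both 0 means unassigned.
    Returns (sat, sw): the SAT flag and the per-PE unassigned-literal flags.
    """
    sat = 0
    sw = []
    for p, n, a, ab in zip(pos, neg, ax, axb):
        sat |= (p & a) | (n & ab)                # any satisfied literal sets SAT
        sw.append((p | n) & (1 - a) & (1 - ab))  # literal present, variable unassigned
    return sat, sw


# Clause (NOT x0 OR x1 OR x2), with x0 = False and x1, x2 unassigned.
sat, sw = forward_pim_row(pos=[0, 1, 1], neg=[1, 0, 0],
                          ax=[0, 0, 0], axb=[1, 0, 0])
# Same clause with x0 = True instead: no literal is satisfied yet.
sat2, sw2 = forward_pim_row(pos=[0, 1, 1], neg=[1, 0, 0],
                            ax=[1, 0, 0], axb=[0, 0, 0])
```

In the first call the SAT flag is set by the NOT x0 literal; in the second it stays 0 while the two unassigned literals still raise their SW flags.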
Forward PIM Schematic (SAT & SW)
Example: the clause includes xi, with AXi = AXBi = 0 (Xi is unassigned). SW is charged to 1.
[Schematic: same PE cell, with signals SW, SAT, AXi, AXBi, PCHF, WL, BLj, BLBj, SATj]

Position-encoded Counter (PEC)
- SW = 0 means the literal is assigned. The 3b input is passed on.
[Figure: PEC0, ..., PECi-1, PECi, PECi+1, ..., PEC49 connected in a row-wise cascaded manner; label 100 at PEC0; 3b input and 3b output per stage; row outputs PC2j, PC1j, PC0j]
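The cascade can be modelled in a few lines. The deck describes each stage as passing its 3b input on when SW = 0 and right-shifting it when SW = 1. The starting code `0b100` is an assumption read from the "100" label at PEC0 in the figure, and the function name is hypothetical.

```python
def position_encoded_count(sw_flags, start=0b100):
    """Cascade of PEC stages: pass the 3-bit code when SW = 0, shift right when SW = 1."""
    code = start
    for sw in sw_flags:
        if sw:
            code >>= 1  # one more unassigned literal in this clause
        # sw == 0: the code is passed on unchanged
    return code


codes = [position_encoded_count(flags) for flags in
         ([0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 1, 1])]
```

Under that assumed start value, the position of the single 1 encodes the number of unassigned literals in the row: 100 for none, 010 for one, 001 for two, and 000 for three or more.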
Position-encoded Counter (PEC)
- SW = 1 means the literal is unassigned. The 3b input is right-shifted.
[Figure: same PEC cascade (PEC0, ..., PECi-1, PECi, PECi+1, ..., PEC49; label 100 at PEC0; outputs PC2j, PC1j, PC0j), shown for SW = 1]

 The PEC starts with.Position-e