ISSCC 2024
SESSION 14: Digital Techniques for System Adaptation, Power Management and Clocking

14.1 A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC
2024 IEEE International Solid-State Circuits Conference

[Slide 1 of 37] A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC
Monodeep Kar1, Joel Silberman1, Swagath Venkataramani1, Viji Srinivasan1, Bruce Fleischer1, Joshua Rubin1, JohnDavid Lancaster1, Saekyu Lee1, Matthew Cohen1, Matthew Ziegler1, Nianzheng Cao1, Sandra Woodward2, Ankur Agrawal1, Ching Zhou1, Prasanth Chatarasi1, Thomas Gooding2, Michael Guillorn1, Bahman Hekmatshoartabari1, Philip Jacob1, Radhika Jain1, Shubham Jain1, Jinwook Jung1, Kyu-Hyoun Kim1, Siyu Koswatta1, Martin Lutz1, Alberto Mannari3, Abey Mathew4, Indira Nair1, Ashish Ranjan1, Zhibin Ren1, Scot Rider5, Thomas Roewer1, David Satterfield6, Marcel Schaal1, Sanchari Sen1, Gustavo Tellez1, Hung Tran1, Wei Wang1, Vidhi Zalani1, Jintao Zhang1, Xin Zhang1, Vinay Shah7, Robert Senger1, Arvind Kumar1, Pong-Fei Lu1, Leland Chang1
1 IBM Research, Yorktown Heights, NY; 2 IBM, Rochester, MN; 3 IBM Research, Zurich, Switzerland; 4 IBM, Austin, TX; 5 IBM, Poughkeepsie, NY; 6 IBM Research, Lowell, MA; 7 IBM, Hursley, United Kingdom

[Slide 2 of 37] Outline
Introduction
  Power profile of an AI inference SoC
Dynamic power management for AI
Control enhancements for AI inference
Measurement results
Conclusion

[Slide 3 of 37] AI Inference SoC
An AI SoC deployed in the cloud processes widely varying inference requests:
  Multiple model classes
  Multiple batch sizes
  Multiple precisions: FP16, FP8, INT8, INT4
Figure labels: Vision Models, Large Language Models, Query.

[Slide 4 of 37] Power variation in AI inference SoC
The 2D compute array (convolution/matmul) consumes higher power.
Vector engines (Softmax, GeLU, etc.) and data transfer consume lower power.
Figure: AI core block diagram (AI Core 0, AI Core i, AI Core n) with local memory, activation buffer, weight buffer, MPE array, vector engine, and routing.
Figure: normalized power (0.00 to 1.20) for MatMul (FP16), MatMul (Int8), MatMul (Int4), ReLU, BatchNorm, and Idle.

PCIE card power limit (form factor dependent):
  Card form factor          Power limit (W)
  M.2                       25
  PCIE Edge                 75
  PCIE + Power Connector    150

[Slide 7 of 37] AI SoC with Power Management
Compute: 32 AI cores, ring interconnect
Memory: 2MB local memory per core, 128GB DRAM
PCIE-attached card for AI inference
A power management architecture for AI workloads to optimize performance across different peak current specifications
Figure: AI inference SoC (cores, MC interface, core global interface, memory controller, service unit, 8DMA, PCIE interface) on a PCIE card with voltage regulator, power management controller, PCIE connector, and DRAM chips. Labels: sensed card input current, control response enhancement, compiler-assisted feedforward PM, power management config.

[Slide 8 of 37] Die photo and PCIE card
Figure: 5nm die photo (18.3mm x 15.8mm) showing the 32 AI cores and the power management unit; PCIE card photo showing current sensing and four VRMs.

[Slide 9 of 37] Outline
Introduction
  AI SoC and power variation
Dynamic power management for AI
Control enhancements for AI inference
Measurement results
Conclusion

[Slide 10 of 37] Dynamic power regulation for performance
No dynamic regulation
  Static operating point (voltage, frequency) selection
  Massive power underutilization for nominal workloads
Figure: power vs. execution time at a static operating point (V0, F0) for Model 1 Batch 1, Model 2 Batch 4, Model 3 Batch 1, and Model 1 Batch 4, showing leakage, the power limit, and the underutilized power.

[Slide 11 of 37] Dynamic power regulation for performance
No dynamic regulation
  Static operating point (voltage, frequency) selection
  Massive power underutilization for nominal workloads
Dynamic power regulation
  Set an operating point optimized for the nominal workload mix
  Worst-case workload power is dynamically regulated to PLIM
Callout: preferred for SoC with high TOPS.
Figure: power vs. execution time with current regulation at (V1, F1), compared with the static operating point (V0, F0) and its underutilized power.

[Slide 12 of 37] Closed loop current regulation
Actuation: effective frequency throttling of AI cores
Sense: current sensed at the 12V input of the PCIE card
Control: compiler-driven feedforward power management
Figure: PCIE-attached card for AI inference: 12Vin, input decap, VRM controller + power stages, ADC, sensed current, stall-rate, Dynamic Power Management (DPM), VCore, AI cores, power management config, VRM2, DRAM; power labels PCARD, PSOC, PMEM.

[Slide 13 of 37] Power control in AI core
Previous approach (ISSCC21): per-layer V-F / per-layer F
  Overhead of a V-F transition might be comparable to inference time
  Design complexity
Figure: power vs. execution time for inference jobs INF 1 to INF 3 with V/F transitions between V1/F1 and V2/F2; workstream of an AI SoC deployed in the cloud.

[Slide 14 of 37] Power control in AI core
Proposed approach: effective frequency throttling
  Fixed V-F
  Fast response
  Selective pipeline throttling: no deadlock or data corruption
  Minimal energy efficiency overhead
Figure: AI core (L1 memory, MPE array, weight scratchpad, activation scratchpad, SFU1 64-way FP16 SIMD, SFU2 32-way FP32 SIMD, routing logic, ring, IF/ID stages) with stall-rate (feedback) and stall (overflow) inputs.

[Slide 15 of 37] Limitations of SoC-based current sensing
Power sensed and controlled at the SoC
Additional power margin needed for discrete card components (DRAM, voltage regulators)
Compute throttles during the short periods of high SoC power that are typical in AI
Figure: power vs. time for Network 1 and Network 2 (MatMul and conv/MatMul at high power, auxiliary at low power) against PLIM; card diagram with SoC current sensing (12Vin, VRM, VCore, DRAM chips; PCARD, PSOC, PMEM).

[Slide 16 of 37] Card level current sensing
Power sensed at the PCIE card input
Seamless shifting between compute and memory power without additional margin
Throttling avoided for short periods of peak SoC power
Figure: card diagram with 1MSPS ADC and CSENSE[11:0] at the 12Vin input; PCARD and PSOC vs. time for Network 1 and Network 2 against PLIM.
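
As a rough illustration of the actuation described on the slides above (stalling AI core cycles at a fixed V-F point so that the card stays within its power limit), the following minimal sketch computes the stall rate needed for a given card budget. It assumes that dynamic core power scales linearly with the fraction of non-stalled cycles, which the slides do not state; the function names and all numeric inputs except the 75W and 150W limits are hypothetical.

```python
def required_stall_rate(p_limit_w, p_leak_w, p_other_w, p_dyn_w):
    """Fraction of core cycles to stall so that card power stays within p_limit_w.

    Assumption (not from the slides): dynamic core power is proportional to
    the fraction of active cycles. p_other_w lumps DRAM and regulator power.
    """
    if p_dyn_w <= 0:
        return 0.0
    headroom = p_limit_w - p_leak_w - p_other_w
    stall = 1.0 - headroom / p_dyn_w
    return min(1.0, max(0.0, stall))


def effective_frequency(f_core_hz, stall_rate):
    """Effective clock frequency seen by the workload at a fixed V-F point."""
    return f_core_hz * (1.0 - stall_rate)


# Hypothetical workload on a 75 W PCIE Edge card.
stall = required_stall_rate(p_limit_w=75.0, p_leak_w=10.0, p_other_w=15.0, p_dyn_w=80.0)
print(stall, effective_frequency(1e9, stall))
```

With card-level sensing the `p_other_w` term is measured together with the SoC power, so no separate worst-case margin has to be reserved for it.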

[Slide 17 of 37] Outline
Introduction
  AI SoC power
Dynamic power management for AI
Control enhancements for AI inference
Measurement results
Conclusion

[Slide 18 of 37] Configurable Saturation
The internal state and the error can be saturated through configurable values to improve response.
Figure: second-order IIR controller. The error is the input current (from the ADC, 12 bits) minus the target (12 bits). Feedforward coefficients b0-b2, feedback coefficients a1-a2, delay elements T, configurable error saturation ErrSAT[11:0], configurable state saturation StateSAT[12:0], clip range, stall enable. The plant is the AI cores + DRAM and the voltage regulator. The power management config supplies the target and the coefficients.

[Slide 19 of 37] Response improvement using saturation
AI inferences have frequent low-to-high power transitions.
T1: core initialization delay (all cores are programmed first, before execution)
T2: loop delay (defined by the closed-loop bandwidth)
Figure: power and controller state vs. time across the Layer n to Layer n+1 transition, without CSAT. A negative IIR output means no stalling. Labels: PLIM, 0, -SATBase, final stall, T1, T2.

[Slide 20 of 37] Response improvement using saturation
AI inferences have frequent low-to-high power transitions.
Increasing the saturation value (between 0 and -2^(N-1)) improves the controller response.
T1: core initialization delay (all cores are programmed first, before execution)
T2: loop delay (defined by the closed-loop bandwidth)
Figure: power and controller state vs. time across the Layer n to Layer n+1 transition, without CSAT and with CSAT. A negative IIR output means no stalling. Labels: PLIM, 0, -SATVAL, -SATBase, final stall, T1, T2.

[Slide 21 of 37] Compiler Assisted Stalling
AI workloads are predictable at compile time, which allows proactive power management.
The compiler combines the power model and the inference graph to produce a software stall rate (SSR).
The SSR is embedded in the program binary.
Figure: AI compiler flow: input graph, work division/optimization, stall-aware code generation, program data with stall info; the power model feeds a power projection. Program data: weights, inputs, DRAM layout.

[Slide 22 of 37] Compiler Assisted Stalling
AI workloads are predictable at compile time, which allows proactive power management.
The compiler combines the power model and the inference graph to produce a software stall rate (SSR).
The SSR is embedded in the program binary.
Figure: program data + SSR (weights, inputs, DRAM layout) delivered through the service unit to the 32 AI cores. In the power management controller, FinalSR = IIRSR (signed) + SSR (unsigned), applied as a fractional stall in the compute unit.
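
The controller slides above can be sketched in software as a second-order IIR filter with a clamped error, a clamped internal state, and a final stall rate formed as FinalSR = IIRSR + SSR. This is a minimal sketch under stated assumptions, not the silicon implementation: the direct-form-II arrangement, the sign of the feedback terms, floating-point arithmetic in place of the fixed-point hardware, the open-loop test stimulus, and every name and number below are illustrative choices.

```python
from dataclasses import dataclass


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


@dataclass
class SaturatingIIR:
    b: tuple          # feedforward coefficients (b0, b1, b2)
    a: tuple          # feedback coefficients (a1, a2); sign convention is an assumption
    err_sat: float    # error saturation magnitude (ErrSAT on the slide)
    state_min: float  # lower state saturation (plays the role of -SATVAL)
    state_max: float  # upper state saturation
    w1: float = 0.0   # state delayed by one sample
    w2: float = 0.0   # state delayed by two samples

    def step(self, current, target):
        err = clamp(current - target, -self.err_sat, self.err_sat)
        w0 = err + self.a[0] * self.w1 + self.a[1] * self.w2
        w0 = clamp(w0, self.state_min, self.state_max)  # state saturation
        y = self.b[0] * w0 + self.b[1] * self.w1 + self.b[2] * self.w2
        self.w2, self.w1 = self.w1, w0
        return y  # signed IIR stall rate (IIRSR)


def final_stall_rate(iir_sr, ssr, max_stall=1.0):
    """FinalSR = IIRSR (signed) + SSR (unsigned); a negative sum means no stalling."""
    return clamp(iir_sr + ssr, 0.0, max_stall)


def steps_until_stall(state_min, low_steps=100, ssr=0.0):
    """Count samples from a low-to-high power transition until stalling begins.

    Open-loop stimulus: the current does not react to the stall in this toy.
    """
    ctrl = SaturatingIIR(b=(1.0, 0.0, 0.0), a=(1.0, 0.0),
                         err_sat=1.0, state_min=state_min, state_max=100.0)
    for _ in range(low_steps):          # long low-power layer, current below target
        ctrl.step(current=0.5, target=1.0)
    steps = 0
    while True:                         # high-power layer, current above target
        steps += 1
        if final_stall_rate(ctrl.step(current=1.5, target=1.0), ssr) > 0.0:
            return steps


print(steps_until_stall(-100.0), steps_until_stall(-2.0), steps_until_stall(-2.0, ssr=1.0))
```

In this toy setup, a state that is allowed to wind far below zero during a low-power layer takes many samples to recover, a tighter lower saturation shortens the recovery, and a positive SSR shortens it further.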

[Slide 23 of 37] Response Improvement with SSR
T1: core initialization delay
T2: loop delay
Figure: power and IIRSR vs. time across the Layer n to Layer n+1 transition with hardware control only; IIRSR starts from -CSATVAL. Labels: PLIM, T1, T2.

[Slide 24 of 37] Response Improvement with SSR
The SSR is updated before the core starts execution, proactively controlling the peak current.
T1: core initialization delay
T2: loop delay
Figure: power, FinalSR, IIRSR, and SSR vs. time across the Layer n to Layer n+1 transition, for HW control and for HW control + SW stall. FinalSR starts from -CSATVAL + Sn and -CSATVAL + Sn+1 instead of -CSATVAL. The SSR is updated before the power ramps.

[Slide 25 of 37] SSR example
Figure: transformer (input embedding, multi-head attention, add & norm, feed forward, add & norm, decoder, linear, softmax); Transformer Block 1 to Transformer Block n each contain attention + feed-forward with power regions P1 and P2; the per-block result is reused.
Example compiler computations:
  Layer          Utilization (systolic)   Utilization (auxiliary)   Estimated cycles
  Attention      0.8                      0.2                       C0
  Add & Norm     0                        0.6                       C1
  Feed forward   0.7                      0.2                       C2
  ...
  SoftMax        0                        0.2                       Cn
Example compiler computations + power model give the SSR values S0, S1, S2, ..., Sn.

[Slide 26 of 37] Outline
Introduction
  AI SoC power
Dynamic power management for AI
Control enhancements for AI inference
Measurement results
Conclusion
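
To make the SSR example table concrete, here is a minimal sketch of how a compiler could turn per-layer utilization into a software stall rate. The slides do not give the power model, so the linear model, its weights (`P_IDLE`, `P_SYSTOLIC`, `P_AUX`), the normalized limit, and the stall formula are all hypothetical; only the utilization values come from the table.

```python
# Hypothetical normalized power model: none of these weights are from the slides.
P_IDLE = 0.2       # leakage / idle power
P_SYSTOLIC = 1.0   # dynamic power of the systolic array at full utilization
P_AUX = 0.3        # dynamic power of the auxiliary units at full utilization
P_LIMIT = 0.8      # normalized power limit

# Utilization values (systolic, auxiliary) taken from the SSR example table.
LAYERS = {
    "Attention": (0.8, 0.2),
    "Add&Norm": (0.0, 0.6),
    "Feed forward": (0.7, 0.2),
    "SoftMax": (0.0, 0.2),
}


def software_stall_rate(u_systolic, u_aux):
    """Stall fraction that brings the projected layer power down to P_LIMIT.

    Assumes dynamic power scales with the fraction of non-stalled cycles.
    """
    p_dyn = u_systolic * P_SYSTOLIC + u_aux * P_AUX
    if P_IDLE + p_dyn <= P_LIMIT:
        return 0.0                       # layer already fits under the limit
    return 1.0 - (P_LIMIT - P_IDLE) / p_dyn


ssr = {name: software_stall_rate(*util) for name, util in LAYERS.items()}
for name, value in ssr.items():
    print(f"{name}: SSR = {value:.3f}")
```

Under these made-up weights, only the layers that load the systolic array receive a non-zero SSR, which matches the slide's distinction between high-power and low-power regions.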

[Slide 27 of 37] Closed loop behavior of end-to-end workload
Inference workload: model BertLarge, sequence length 384, precision Int8
Figure: normalized current vs. time (ms) over the 24 transformer blocks, unregulated without closed loop and with closed loop (runtime stretching), against PLIM. Measured at 25C, 40MSPs.

[Slide 28 of 37] Closed loop behavior of end-to-end workload
Inference workload: model BertLarge, sequence length 384, precision Int8
One transformer layer of bert-large:
  High power region (P1): dynamic regulation
  Moderate power region (P2): less stretching
Figure: normalized current vs. time (ms) against PLIM. Measured at 25C, 40MSPs.

[Slide 29 of 37] Peak current vs time scales
Normalized to the same peak current at different time scales
Peak power regulation effective at TWINDOW … TLOOP
Control enhancement techniques are critical for TWINDOW … TLOOP
Figure: normalized current (0.5 to 1.25) vs. time (0 to 20 ms) for TWINDOW = 1us and TWINDOW = 10ms.
Figure: normalized peak current (0.60 to 1.30, normalized to target current) vs. time-scale (1us to 10ms), without closed loop and with closed loop; the region within the loop time-constant is marked.

… CSAT
Figure: normalized input current (0.8 to 1.2) vs. time, and normalized current (0.6 to 1.3) vs. time-scale (1us to 10ms) for closed loop, CSAT1, and CSAT2, against PLIM. Measured at 25C, 40MSPs.
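
The peak-current-versus-time-scale plots above report, for each averaging window TWINDOW, the largest windowed average of the measured current. A minimal sketch of that metric follows; the function name and the sample trace are hypothetical, and the window is expressed in samples rather than seconds.

```python
def peak_windowed_average(samples, window):
    """Largest mean over any run of `window` consecutive samples."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])
    best = running
    for i in range(window, len(samples)):
        running += samples[i] - samples[i - window]  # slide the window by one sample
        best = max(best, running)
    return best / window


# Hypothetical normalized current trace with one short spike.
trace = [1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
for w in (1, 2, 6):
    print(w, peak_windowed_average(trace, w))
```

A short spike dominates the peak at small windows and is averaged away at large ones, which is why a specification at a long TWINDOW tolerates brief excursions above the target that a short-window specification does not.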

[Slide 31 of 37] Workload specific software stall
Closed loop + SSR compensates for the compiler's inaccuracies in generating the SSR.
P1: high SSR
P2: low SSR
Figure: normalized input current (0.8 to 1.2) vs. time (ms) for closed loop and for open loop + SSR, against PLIM.
Figure: peak averaged current vs. time-scale (1us to 10ms), normalized current 0.6 to 1.3, for closed loop and SSR, against PLIM.

[Slide 32 of 37] Performance comparison: closed loop
40% performance improvement at a 10ms time window at the same FCore
Maximum performance improvement if TSPEC … TLOOP; it diminishes quickly for TSPEC … TLOOP
Closed loop control preferred
Baseline:
  No dynamic power management
  Reduced FCore
  Power for the worst-case kernel meets PLIM

[Slide 33 of 37] Performance comparison: enhanced control
SSR jointly tuned with CSAT increases the performance improvement to 27% and 32%.
Figure: performance (0.00 to 1.60) at iso peak power for 1us, 100us, and 10ms windows; series: baseline, closed loop, CSAT only, SSR.
TSPEC …

14.2: Proactive Voltage Droop Mitigation Using Dual-Proportional-Derivative Control Based on Current and Voltage Prediction Applied to a Multicore Processor in 28nm CMOS
2024 IEEE International Solid-State Circuits Conference

… 100mV … 10ns) impacts chip reliability … voltage margin
Figure: measured droop from ISSCC 2017 [1].
[1] H. Mair et al., "3.4 A 10nm FinFET 2.8GHz tri-gear deca-core CPU complex with optimized power-delivery network for mobile SoC performance," ISSCC 2017.

[Slide 6 of 34] Previous Work
High-speed, high-resolution voltage sensor [2]
  Ring-oscillator-based digital sensor
  Converts voltage to frequency
  Samples the ring oscillator's phase
  Calculates the phase difference in a cycle
  Resolution: 10.8mV at 2.24GHz
  3-cycle latency (synchronization, computation, and calibration)
  Not fast enough for 1st-order droop
[2] P. N. Whatmough et al., "… Cortex-A57 Cluster Using an All-Digital Power Delivery Monitor," JSSC 2017.

[Slide 7 of 34] Previous Work
Digital Power Meter (DPM), machine learning: APOLLO [3]
  Estimates power from the toggling status of critical signals
  Avoids manual signal selection
  Cycle-level power monitoring, with NRMSE …

… 1-bit on/off clk_gating

[Slide 24 of 34] Dual-loop PD Control
Python simulation results of different strategies
PDN model with control strategies based on current/voltage changes
Step from idle to a heavy iload
Figure: voltage (760mV to 920mV) vs. t (0ns to 50ns) at the heavy iload occurrence for None (baseline), I Predict Only, V Predict Only, and Dual-PD control; the secondary voltage droop is marked.
  Control strategy   Worst droop   Droop suppression   Overshoot
  None (baseline)    -127mV        0%                  70mV
  I Predict Only     -93mV         27%                 43mV
  V Predict Only     -94mV         15%                 3mV
  Dual-PD            -60mV         53%                 5mV
Callouts: small overshoot, least droop.

[Slide 25 of 34] Dual-loop PD Control
Timing diagram of dual prediction and dual PD control
Current prediction 1 cycle in advance, voltage prediction at iLoad, PD control of clock gating at V
Achieves 0-cycle response latency
Figure: timing diagram over cycles Tn-4 to Tn+3 showing on-chip current (In-2, In-1, In), on-chip voltage (Vn-2, Vn-1, Vn), predicted current, predicted voltage, measured voltage, the heavy iLoad occurrence, and the I/V dual-PD regulation.
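
The dual-loop idea above (one proportional-derivative term driven by predicted current, one by predicted voltage, combined into a 1-bit clock-gating decision) can be sketched as follows. This is only an illustrative structure: the slides do not give the gains, the way the two loops are combined, or the decision threshold, so `PD`, `dual_pd_gate`, the summation of the two loop outputs, and all numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PD:
    kp: float                 # proportional gain (hypothetical)
    kd: float                 # derivative gain (hypothetical)
    prev_error: float = 0.0

    def step(self, error):
        """Discrete PD term: kp * e[n] + kd * (e[n] - e[n-1])."""
        out = self.kp * error + self.kd * (error - self.prev_error)
        self.prev_error = error
        return out


def dual_pd_gate(i_pred, v_pred, i_ref, v_ref, pd_i, pd_v, threshold):
    """Return True when the clock should be gated for this cycle.

    Current loop reacts to predicted current above its reference; voltage
    loop reacts to predicted voltage below its reference. Summing the two
    loop outputs is an assumption made for this sketch.
    """
    u = pd_i.step(i_pred - i_ref) + pd_v.step(v_ref - v_pred)
    return u > threshold


pd_i, pd_v = PD(kp=1.0, kd=2.0), PD(kp=1.0, kd=2.0)
idle = dual_pd_gate(0.1, 0.85, 0.1, 0.85, pd_i, pd_v, threshold=0.5)
step = dual_pd_gate(0.6, 0.85, 0.1, 0.85, pd_i, pd_v, threshold=0.5)
print(idle, step)
```

The derivative term is what lets the decision fire on the cycle where the predicted load jumps, before the error itself has grown large.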

[Slide 26 of 34] Outline
Background and motivation
  Introduction of supply voltage droop
  Previous work
  Challenges & solutions
Proposed proactive droop mitigation techniques
  System architecture
  Current prediction
  Voltage prediction
  Dual-loop I-V PD control
Circuits implementation & measurement results
Summary

[Slide 27 of 34] Circuits Implementation
Die photo and specifications
  CMOS process: 28nm
  PULP processor with 8 RISC-V cores
  Total size: 2.10mm x 1.99mm
  Input voltage: 0.65V to 0.9V
  CPU frequency: 200MHz to 500MHz
Control module overhead
  Area overhead: 5683um2 (0.13%)
  Power overhead: 2.2mW at 500MHz (PDPM, Vsensor, V prediction; the PDN scanner is excluded because it is turned off)

[Slide 28 of 34] Measurement Results
Both predicted and actual voltage values are recorded in SRAM.
The voltage trends match, with the predicted voltage ahead in time.
Figure: comparison of predicted voltage and sampled voltage*: voltage (840mV to 880mV) vs. t (0ns to 400ns), V_Sampled and V_Predicted; the prediction is ahead of the Vsensor.
*Vsensor: "A 28nm All-Digital, 1.92-7.32 mV/LSB, 0.5-2GS/s sample rate, 0-latency Voltage Sensor," CICC 2023

[Slide 29 of 34] Measurement Results
The curves are re-aligned by shifting the predicted voltage, and they match well.
Figure: comparison of predicted voltage and sampled voltage: voltage (840mV to 880mV) vs. t (0ns to 400ns), V_Sampled and V_Predicted_realigned.

[Slide 30 of 34] Measurement Results
On-die droop mitigation effect: running the mlDct* testcase
Max droop is reduced by 32.0% (45.2mV), from 132.9mV to 87.7mV.
16 cycles are clock gated in this period; total performance loss is 0.6%.
*mlDct: floating-point discrete cosine transform
*Waveform data exported from the on-die Vsensor
Figure: voltage (750mV to 1000mV) vs. t (0ns to 8000ns), without and with the strategy (-132.9mV vs. -87.7mV, a 45.2mV difference); zoomed view from 2500ns to 3500ns with labels 79.8mV, 39.8mV, 10ns, and 16-cycle/64ns gated.

[Slide 31 of 34] Comparison Table with SoTA work
                    | JSSC17 [2] | ISSCC17 [3] | IBM J20 [4] | VLSI20 [5] | This work
Process             | 16nm | 10nm | 14nm | 7nm | 28nm
Processor           | 4-core CPU (Cortex-A57) | 10-core CPU (Cortex A73*2 + A53*4 + A35*4) | 12-core CPU (IBM z15) | 6-core DSP (VLIW CPU*4 + Vector*2) | 8-core CPU (RISC-V RI5CY)
Frequency           | 2.02GHz | 2.5GHz | 5.2GHz | 500MHz | 500MHz
Monitoring scheme   | Voltage prediction | Analog voltage sensor | Power/timing prediction | Power prediction | Current/voltage prediction
Predictive          | Yes | No | Yes | Yes | Yes
Control scheme      | Threshold comparing | Proportional control | Threshold comparing | Threshold comparing | Dual-loop PD control
Adjusting scheme    | Clock gating | Power switch | Instruction throttling | Voltage regulation | Clock gating
Droop reduction     | 50mV | 38mV (30.0%) | 25.0% | 36.9mV | 45.2mV (32.0%)
Performance loss    | 1.5% | NA | 5 cycle + response time | 2.0% | 0.6%
Remaining row fragment: 25636 cycles

[Slide 32 of 34] Outline (same as slide 26)

[Slide 33 of 34] Summary
The first work that uses both current and voltage prediction for droop mitigation based on a real PDN, responding at the exact time the droop occurs
Applied to an 8-core processor
Droop reduction of 45.2mV (32.0%) with only 0.6% performance loss when running a test program

[Slide 34 of 34] Acknowledgements
Supported by: National Key R&D Program of China; National Natural Science Foundation of China
Thank you for your attendance

14.3: A 3nm Adaptive Clock Duty-Cycle Controller for Mitigating Aging-Induced Clock Duty-Cycle Distortion
2024 IEEE International Solid-State Circuits Conference

[Slide 1 of 52] A 3nm Adaptive Clock Duty-Cycle Controller for Mitigating Aging-Induced Clock Duty-Cycle Distortion
Daniel Yingling1, Yimai Peng1, Robert Vachon1, Dipti Pal2, Sagar Jariwala2, Felipe Cabral3, Jason Hu2, Rajan Verma2, Vamshidhar Chiranji2, Anil Kumar2, Santanu Sarma2, & Keith Bowman1
Qualcomm Technologies, Inc.: 1 Raleigh, NC; 2 San Diego, CA; 3 Cork, Ireland

[Slide 2 of 52] Outline
Motivation
Adaptive Clock Duty-Cycle Controller (DCC) Design
Measured Results
Conclusion

[Slide 3 of 52] Processor (CPU, GPU, or NPU) Clock Path
The clock path contains many inverters, MUX gates, & clock-gating circuits (CGCs).
NPU: Neural Processing Unit
PLL: Phase-Locked Loop
GF-MUX: Glitch-Free MUX
CGC: Clock-Gating Circuit
ACD: Adaptive Clock Distribution for VDD Droop Mitigation
Figure: clock path from the PLL (clk_pll) through GF-MUXes, the ACD, and CGCs to the processor flip-flops (clk).

[Slides 4-8 of 52] Aging-Induced Clock Duty-Cycle Distortion (progressive build)
Duty-cycle distortion (DCD) accumulates across the clock path due to aging when the clock path does not toggle for extended periods of time.
Automotive processors exacerbate aging-induced clock DCD.
Severe clock DCD degrades the processor minimum voltage (VMIN).
Figure: stressed clock transistors during clock gating (IN, OUT).
Figure: path delay change vs. stress time [1].
Figure: aging-induced clock duty-cycle distortion (DCD) along an IN to OUT chain; labels: 50% at each stage, slower, faster.
[1] J. Tschanz et al., Symp. VLSI Circuits, 2009.

[Slide 9 of 52] Test-Chip: NPU Matrix-Multiplication Unit (MXU)
The MXU contains 1,024 multiply-accumulate units (MACs), executing a maximum of 4,096 multiplications & 1,024 accumulations per cycle.
Figure: test-chip clock path: PLL, GF-MUXes, ACD (clk_in to clk_out), CGCs, and the MXU flip-flops.
96、ntroller for Mitigating Aging-Induced Clock Duty-Cycle Distortion 2024 IEEE International Solid-State Circuits Conference10 of 52ACDclk_inclk_outCGCDCCPLLGF-MUXGF-MUXn1FFFFFFFFFFFFFFFFFFCGCMXUn2n3DCMDCA0 DCA1 DCA Adaptive Control Adaptive Clock Duty-Cycle Controller(DCC)14.3:A 3nm Adaptive Clock Dut
97、y-Cycle Controller for Mitigating Aging-Induced Clock Duty-Cycle Distortion 2024 IEEE International Solid-State Circuits Conference11 of 52ACDclk_inclk_outdcm_seg 6:0dcm_adj 1:0CGCDCCPLLGF-MUXGF-MUXn1clk_dcmFFFFFFFFFFFFFFFFFFCGCclk_in,n1-n3,clk_outMXUn2n3DCMDCA0 DCA1 DCA Adaptive Control Adaptive Cl
98、ock Duty-Cycle Controller(DCC)Duty-cycle monitor(DCM)measures the clock DCD14.3:A 3nm Adaptive Clock Duty-Cycle Controller for Mitigating Aging-Induced Clock Duty-Cycle Distortion 2024 IEEE International Solid-State Circuits Conference12 of 52ACDclk_inclk_outdcm_seg 6:0dcm_adj 1:0CGCDCCPLLGF-MUXGF-M
[Figure: DCC block diagram. Blocks: PLL, GF-MUX (x2), CGC, DCC (DCM, DCA0, DCA1, DCA Adaptive Control), MXU with flip-flops (FF). Signals: clk_in, n1-n3, clk_out, clk_dcm, clk_leaf 0, clk_leaf 1, clk_leaf 2, clk_leaf 7.]
Adaptive Clock Duty-Cycle Controller (DCC): the duty-cycle monitor (DCM) measures the clock DCD.

(13 of 52) Adaptive Clock Duty-Cycle Controller (DCC): the duty-cycle adjuster (DCA) corrects the duty cycle.
[Figure: same DCC block diagram, with ACD and the signals dcm_seg[6:0] and dcm_adj[1:0].]

(14 of 52) Adaptive Clock Duty-Cycle Controller (DCC): adaptive control configures the DCA based on the current DCA setting and the DCM measurement.
[Figure: same DCC block diagram, adding hpe_cfg0[63:0], dca_cfg0, hpe_cfg1[63:0], dca_cfg1.]

(15 of 52) DCM & DCA Circuit Descriptions

(16 of 52) Duty-Cycle Monitor (DCM)

(17 of 52) Duty-Cycle Monitor (DCM): enables clock high-phase, low-phase, and period measurements.
[Figure: DCM block diagram. Blocks: DCM Control, TDE, TDC, CGC. Signals: dcm_en, clk_dcm, dcm_cfg[7:0], launch_edge_sel, capture_edge_sel, dcm_trigger, clk_l, din, tdc_q0, tdc_q1, tdc_q15, tdc_q[15:0], clk_c, clk_c_en, dcm_adj[1:0], dcm_seg[6:0].]

(18 of 52) Duty-Cycle Monitor (DCM): DCM control configures the settings to perform a measurement.

(19 of 52) Duty-Cycle Monitor (DCM): configurable launching and capturing clocks via an XOR, instead of a MUX, to minimize the delay difference between non-inverting and inverting clock transitions.

(20 of 52) Duty-Cycle Monitor (DCM): the tunable-delay element (TDE) provides an 8-bit (256 values) configuration range with buffer-delay resolution (back-to-back inverters with a fanout of 2).

14.3: A 3nm Adaptive Clock Duty-Cycle Controller for Mitigating Aging-Induced
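As a rough illustration of the XOR-based edge selection described for the DCM, the Python sketch below models a clock as a list of 0/1 samples. It assumes that a select value of 0 keeps the rising edge and a value of 1 picks the falling edge. The function names and the toy waveform are hypothetical and are not taken from the slides.

```python
def select_edge(clk_sample: int, edge_sel: int) -> int:
    """XOR polarity select: edge_sel=0 passes the clock, edge_sel=1 inverts it."""
    return clk_sample ^ edge_sel


def active_edges(clk, edge_sel):
    """Sample indices at which the selected (possibly inverted) clock rises."""
    sel = [select_edge(s, edge_sel) for s in clk]
    return [i for i in range(1, len(sel)) if sel[i - 1] == 0 and sel[i] == 1]


def measure(clk, launch_edge_sel, capture_edge_sel):
    """Samples between the first launch edge and the next capture edge after it."""
    launch = active_edges(clk, launch_edge_sel)[0]
    capture = next(i for i in active_edges(clk, capture_edge_sel) if i > launch)
    return capture - launch


# Toy waveform (hypothetical): 3 samples high, 2 samples low, repeating.
clk = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1]

high_phase = measure(clk, 0, 1)  # rising launch, falling capture
low_phase = measure(clk, 1, 0)   # falling launch, rising capture
period = measure(clk, 0, 0)      # rising launch, rising capture
```

Because the same XOR sits in both the inverting and the non-inverting path, the sketch needs no separate branch per polarity, which mirrors the stated reason for preferring an XOR over a MUX.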
Clock Duty-Cycle Distortion, 2024 IEEE International Solid-State Circuits Conference, 21 of 52
Duty-Cycle Monitor (DCM): the time-to-digital converter (TDC) contains 16 bits (17-value thermometer code) with buffer-delay resolution.
[Figure: DCM block diagram. Blocks: DCM Control, TDE, TDC, CGC. Signals: dcm_en, clk_dcm, dcm_cfg[7:0], launch_edge_sel, capture_edge_sel, dcm_trigger, clk_l, din, tdc_q[15:0], clk_c, clk_c_en, dcm_adj[1:0], dcm_seg[6:0].]

(22-24 of 52) DCM Clock High-Phase Measurement
Rising clock-edge launch to falling clock-edge capture (launch_edge_sel=0, capture_edge_sel=1).
[Timing diagram: clk_dcm, clk_l, din, clk_c, tdc_q[15:0]; readout as extracted: 0000.0000.0011.]

(25 of 52) DCM Clock Low-Phase Measurement
Falling clock-edge launch to rising clock-edge capture (launch_edge_sel=1, capture_edge_sel=0).
[Timing diagram: din, clk_dcm, clk_l, clk_c, tdc_q[15:0]; readout as extracted: 0000.000011.1111.]

(26 of 52) DCM Clock Period Measurement
Rising clock-edge launch to rising clock-edge capture (launch_edge_sel=0, capture_edge_sel=0).
[Timing diagram: din, clk_dcm, clk_l, clk_c, tdc_q[15:0]; readout as extracted: 0000.000111.1111.]

(27-30 of 52) DCM Measurements: (1) TDE Calibration
Optimize dcm_cfg[7:0] via a binary search while only observing tdc_q0.
State diagram: IDLE (dcm_trigger=0); on dcm_trigger=1, Measure High-Phase Delay via dcm_cfg[7:0] (launch_edge_sel=0, capture_edge_sel=1); Measure Low-Phase Delay via dcm_cfg[7:0] (launch_edge_sel=1, capture_edge_sel=0); Compare dcm_cfg Values to Set DCM Outputs.

(31-34 of 52) DCM Measurements: (2) Constant TDE
Apply a constant dcm_cfg[7:0] and directly measure tdc_q[15:0].
State diagram: IDLE (dcm_trigger=0); on dcm_trigger=1, Measure High-Phase Delay via tdc_q[15:0] (launch_edge_sel=0, capture_edge_sel=1); Measure Low-Phase Delay via tdc_q[15:0] (launch_edge_sel=1, capture_edge_sel=0); Compare tdc_q Values to Set DCM Outputs.

(35 of 52) DCM Measurements: Trade-Offs
All DCM measurements are based on (1). Future applications for (2) if processors require constant monitoring.
1) TDE Calibration:
+ Does not require TDC
+ Does not require initial dcm_cfg[7:0] starting value
+ Enables wide clock DCD measurement range
- Relatively slow measurement
2) Constant TDE:
+ Fast measurement
- Requires TDC
- Requires initial dcm_cfg[7:0] starting value
- TDC width limits clock DCD measurement range

14.3: A 3nm Adaptive Clock Duty-Cycle Controller for Mitigating Aging-Induced Clock Duty-Cycle Distortion
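The two DCM measurement methods and their trade-offs can be sketched with a toy model in Python. The assumptions are mine, not the authors': delays are expressed in whole buffer delays, the TDC reads how far the measured phase exceeds the TDE delay (clipped to 16), and tdc_q0 is the least-significant thermometer bit. All function names and numbers are hypothetical.

```python
TDE_BITS = 8    # dcm_cfg[7:0]: 256 TDE settings
TDC_BITS = 16   # tdc_q[15:0]: 17-value thermometer code


def tdc_read(phase, dcm_cfg):
    """Toy TDC: thermometer code for (phase - TDE delay), clipped to 0..16."""
    ones = max(0, min(TDC_BITS, phase - dcm_cfg))
    return (1 << ones) - 1


def thermometer_to_int(code):
    """Count the ones in a thermometer code, rejecting malformed codes."""
    if code & (code + 1):
        raise ValueError("not a thermometer code")
    return code.bit_length()


def calibrate_tde(phase):
    """Method (1): binary search on dcm_cfg while observing only tdc_q0."""
    lo, hi = 0, (1 << TDE_BITS) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if tdc_read(phase, mid) & 1:  # tdc_q0 still set: TDE delay too short
            lo = mid + 1
        else:
            hi = mid
    return lo


def constant_tde(phase, dcm_cfg):
    """Method (2): a single TDC readout at a fixed dcm_cfg."""
    return thermometer_to_int(tdc_read(phase, dcm_cfg))


high, low = 56, 44  # hypothetical high- and low-phase delays in buffer delays

diff_cal = calibrate_tde(high) - calibrate_tde(low)
diff_const = constant_tde(high, 42) - constant_tde(low, 42)
diff_clipped = constant_tde(high, 30) - constant_tde(low, 30)
```

In this model both methods agree when the fixed dcm_cfg is well chosen, while a poorly chosen starting value saturates the 16-bit TDC and under-reports the distortion. That is the range limitation listed for the constant-TDE method; the binary search instead costs up to eight TDE settings per phase.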
2024 IEEE International Solid-State Circuits Conference, 36 of 52
Duty-Cycle Adjuster (DCA)
[Figure: DCC block diagram. Blocks: ACD, PLL, GF-MUX (x2), CGC, DCC (DCM, DCA0, DCA1, DCA Adaptive Control), MXU with flip-flops (FF). Signals: hpe_cfg0[63:0], dca_cfg0, hpe_cfg1[63:0], dca_cfg1, clk_in, n1-n3, clk_out, clk_dcm, dcm_seg[6:0], dcm_adj[1:0], clk_leaf 0, clk_leaf 1, clk_leaf 2, clk_leaf 7.]

(37 of 52) Duty-Cycle Adjuster (DCA)
Automotive safety compliance requires a self-checking sequence with two DCAs.
DCA0 induces DCD to represent the aging effect, and DCA1 corrects the duty cycle.

(38 of 52) Duty-Cycle Adjuster (DCA): extends the clock high-phase or low-phase delay with a wide configuration range.
[Figure: DCA with High-Phase Extender (HPE). Signals: clk_in, dca_cfg, hpe_in, hpe_cfg[63:0], hpe_out, clk_out.]

(39 of 52) DCA Configurable High/Low-Phase Extension: XOR gates perform the MUX function to allow high-phase or low-phase extension.

(40 of 52) DCA High-Phase Extender (HPE): the HPE high-phase delay increases via hpe_cfg[63:0] (thermometer code).

(41 of 52) DCA HPE: 0 Delay Segments
The duty cycle does not change.
No high-phase extension: hpe_cfg[63:0] = 0x0.
[Timing diagram: hpe_in, hpe_out; labels: OR Delay, OR Delay.]

14.3: A 3nm Adaptive Clock Duty-Cycle Controller for Mitigating Aging-Induced Clock Duty-Cycle Distortion, 2024 IEEE International Solid-State Circuits
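Since hpe_cfg[63:0] is described as a thermometer code, the mapping between a segment count and the configuration word can be sketched in Python as follows. The helper names are hypothetical; only the 64-bit width and the thermometer encoding come from the slides.

```python
HPE_WIDTH = 64  # hpe_cfg[63:0]


def hpe_cfg_for(segments: int) -> int:
    """Thermometer-encode a number of enabled delay segments (0..64)."""
    if not 0 <= segments <= HPE_WIDTH:
        raise ValueError("segment count out of range")
    return (1 << segments) - 1


def segments_in(hpe_cfg: int) -> int:
    """Decode a thermometer-coded hpe_cfg back to a segment count."""
    if hpe_cfg < 0 or hpe_cfg >> HPE_WIDTH or hpe_cfg & (hpe_cfg + 1):
        raise ValueError("not a valid 64-bit thermometer code")
    return hpe_cfg.bit_length()


no_extension = hpe_cfg_for(0)   # 0x0
one_segment = hpe_cfg_for(1)    # 0x1
six_segments = hpe_cfg_for(6)   # 0x3F
```

A thermometer code makes each step enable exactly one more segment, so the extension grows monotonically with the code.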
Conference, 42 of 52
DCA HPE: 1 Delay Segment
The high-phase delay increases by one delay segment.
High-phase extension: hpe_cfg[63:0] = 0x1.
[Timing diagram: hpe_in, hpe_out; labels: OR Delay, 1 Segment + OR Delay.]

(43 of 52) DCA HPE: 6 Delay Segments
The high-phase delay increases by six delay segments.
Timing Constraint: hpe_in high-phase delay 1 delay segment
High-phase extension: hpe_cfg[63:0] = 0x3F.
[Timing diagram: hpe_in, hpe_out; labels: OR Delay, 6 Segments + OR Delay.]

(44 of 52) Test-Chip Characteristics
DCC area and power overheads are 0.45% and 0.03%, respectively.
Technology: 3nm FinFET CMOS
MXU Area: 270,000 µm²
DCC Area: 1,232 µm²
Measured MXU Power: 380mW at 0.65V & 1.0GHz
Simulated DCA Power: 127µW at 0.65V & 1.0GHz
[Die photo: Matrix-Multiplication Unit (MXU) and DCC; dimensions 933.84µm, 750.88µm.]

14.3: A 3nm Adaptive Clock Duty-Cycle Controller for Mitigating Aging-Induced Clock
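The quoted overheads can be cross-checked from the test-chip figures with a few lines of Python. This assumes the garbled units are µm² for the areas and µW for the simulated DCA power, which is the reading under which the arithmetic reproduces the stated 0.45% and 0.03%. The variable names are hypothetical.

```python
# Test-chip figures, with units assumed as described in the lead-in.
mxu_area_um2 = 270_000
dcc_area_um2 = 1_232
mxu_power_w = 380e-3   # 380 mW, measured at 0.65 V and 1.0 GHz
dca_power_w = 127e-6   # 127 uW, simulated at 0.65 V and 1.0 GHz

area_overhead_pct = 100 * dcc_area_um2 / mxu_area_um2
power_overhead_pct = 100 * dca_power_w / mxu_power_w

print(f"area overhead:  {area_overhead_pct:.3f}%")
print(f"power overhead: {power_overhead_pct:.3f}%")
```

Under these assumptions the ratios come out at roughly 0.456% and 0.033%, in line with the slide's figures after truncation.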
Duty-Cycle Distortion, 2024 IEEE International Solid-State Circuits Conference, 45 of 52
Measured DCM & DCA Resolutions
DCM and DCA delay resolutions are 9.8 and 10.3ps/bit (1% of clock period) at 0.65V.
Since the goal is to mitigate severe clock DCD, there are no benefits from higher resolution.

(46 of 52) Measured DCC System Validation
The DCC measures and corrects the clock duty cycle for a wide range of DCD.

(47 of 52) Measured Scope Capture of MXU Input Clock

(48 of 52) Measured MXU VMIN vs. Clock Duty Cycle
From 50%-50% to 84%-16%, VMIN does not change.
Beyond 84%-16%, severe DCD increases flip-flop setup time, thus raising VMIN.
At 0.65V, a 92%-8% duty cycle degrades VMIN by 65mV (10% increase).

(49 of 52) Measured MXU VMIN vs. FCLK
A DCD of 92%-8% degrades VMIN by 70mV (10%) at 1.2GHz and 35mV (7%) at 0.6GHz.
The DCC restores the clock duty cycle and VMIN.
[Plot annotation: Clock DCD = 92%-8%.]

(50 of 52) Automotive Processor DCC Deployment
Safety compliance: self-checking sequence at processor boot.
Protection from aging-induced clock DCD:
Duty-cycle measurement and correction at processor boot or VDD-FCLK transitions
Requires 85 clock cycles at FCLK
Assert processor interrupt if clock DCD exceeds a target threshold

(51 of 52) Measured Aging-Induced MXU VMIN Distribution
VMIN degradation ranges from 20mV-70mV at 1.2GHz and 20mV-50mV at 0.6GHz.
The DCC compensates for aging-induced clock DCD to mitigate VMIN degradation.
[Plot: VMIN (V) vs. FCLK (GHz); series: Before Aging, After Aging, After Aging with DCC Correction; annotations: VMIN range 20mV-70mV and 20mV-50mV; accelerated stress at high VDD & temp; 4 parts.]

(52 of 52) Conclusion
The adaptive clock duty-cycle controller (DCC) mitigates duty-cycle distortion (DCD).
All-digital design:
1) Duty-cycle monitor (DCM) measures the clock DCD
2) Duty-cycle adjuster (DCA) corrects the duty cycle
3) Adaptive control configures the DCA based on the DCM measurement
Silicon accelerated-stress test-chip measurements from an NPU MXU: DCC restores clock duty cycle & corresponding VMIN degradation, up to 10% at 1.2GHz & 9% at
0.6GHz

14.4: A Fully Digital Current Sensor Offering Per-Core Runtime Power for System Budgeting in a 4nm-Plus Octa-Core CPU, 2024 IEEE International Solid-State Circuits Conference, 1 of 21

Fully Digital Current Sensor Offering Per-Core Runtime Power for System Budgeting in a 4nm-Plus Octa-Core CPU
Chien-Yu Lu1, Bo-Jr Huang1, Min-Chieh Chen1, Ollie Tsai1, Alfred Tsai1, Eric Jia-Wei Fang1, Yuju Cho1, Harry H. Chen1, Ping Kao1, Ericbill Wang1, Hugh Mair2, Shih-An Hwang1
1MediaTek, Hsinchu, Taiwan
2MediaTek, Austin, TX

(2 of 21) Outline
Motivation of Current Sensor (I-sensor) in CPU
Challenge of Conventional I-sensor Design
CPU Overview
On-chip Fully Digital I-sensor Design
  Circuit and Hardware Mechanism
  Direct-Current Resistance (DCR) Calibration
  Temperature Impact on calibration errors of the I-sensor
  Silicon measurement results in the CPU
Summary

(3 of 21) CPU Performance/Power Trend
Mobile flagship CPUs increase the maximum frequency and shrink the process to upgrade performance each generation.
Multi-core applications are dominating the upgraded performance.
Higher CPU performance/power requires optimization for efficiency.
https:/ by 01/12/2024
[Chart annotations: +27%, +44%, +23%, +15%, +31%, +39%, +11%, +19%.]

(4 of 21) CPU Power Optimization by Scheduling
The CPU scheduler [1] optimizes task allocation across the multiple cores.
Balanced power and thermal distribution for sustainable performance.
Runtime power monitor for system power efficiency optimization:
[1]: power calculation with guard-banding voltage, frequency and leakage
This work: run-time measured power
[Figure labels: Predict power budget; Task; Multiple Cores; Run-time power calculation [1]; Calculated power by voltage, frequency, leakage w/ guard-band; This work: run-time measured power; Power; Run-time power; Computing task.]
[1] B.-J. Huang et al., ISSCC, 2023, pp. 40-42

(5 of 21) On-die Power Measurement Technique
The on-die current sensor (I-sensor) is emerging to monitor runtime power for system power efficiency optimization.
Shorten guard-band & task optimization in the conventional run-time power calculation of CPU.
This work presents a fully digital I-sensor to offer per-core runtime current for a heterogeneous CPU.
[Figure: heterogeneous CPU with one I-sensor per core (3 BP cores, 4 HE cores, 1 HP core); labels: Run-time power measurement; Current; Run-time power; Computing task; Remaining power budget; Maximum power budget.]

(6 of 21) CPU Overview
tsmc 4nm+ FinFET
ARMv9.2 octa-core tri-gear heterogeneous CPU
1st gear: high-performance (HP): 1x Cortex-X4 up to 3.25GHz; cache: 64KB L1$, 1MB L2$
2nd gear: balanced-performance (BP): 3x Cortex-X4 up to 2.85GHz; cache: 64KB L1$, 512KB L2$
3rd gear: high-efficiency (HE): 4x Cortex-A720 up to 2GHz; cache: 32KB L1$, 256KB L2$
8MB of shared L3$ on Hayden DynamicIQ(TM) Shared Unit (DSU)
[Die photograph]

(7 of 21) Challenge of Conventional I-sensor Design in CPU
A current shunt resistor is required in series with the CPU: ever-higher active current induces a large voltage droop at the current shunt resistor, which impacts power integrity.
An amplifier senses the voltage drop, and an analog-digital converter (ADC) converts the current: diminishing voltage headroom and enlarged offset voltage as the process shrinks degrade the accuracy.
Figure labels: CPU increasing active current induces large voltage drop; CPU voltage; Offset voltage; IO voltage; No headroom; IO voltage; CPU voltage-V; Voltage headroom; IO voltage;
CPU voltage; Degrading accuracy; Process Shrinking

(8 of 21) This Work: Fully Digital I-sensor Design in CPU
Leverage the power switch (PSH) as the direct-current resistance (DCR) of the CPU.
Sense the voltage drop between the true (TVDD) and virtual power (VVDD) of the PSH.
Calculate the current based on the sensed voltage drop divided by the direct-current resistance (DCR) of the PSH.

Current sensor technique | Analog-based I-sensor | Fully Digital I-sensor
Power integrity impact | Yes | No
Voltage headroom requirement | Yes | No
Sampling rateSlow(99%*1-690%PowerSmall(300A)*1-6Small(3x105m2)*1-6Small(103m2)
Supply voltage | Static, follows IO voltage | Dynamic, follows core voltage

*1 Z.-Tang et al., ISSCC, 2023, pp. 348-350
*2 Z.-Tang et al., ISSCC, 2022, pp. 66-67
*3 L. Xu et al., SSCL, 2018, pp. 94-97
*4 R. Zamparette et al., VLSIC, 2021, pp. 1-2
*5 C. van Vroonhoven et al., ISSCC, 2020, pp. 348-350
*6 S. H. Shalmany et al., JSSC, 2017, pp. 1034-1043

(9 of 21) Digital I-sensor: Circuit and Hardware
The PSH isolates TVDD and VVDD in the CPU.
Two high-speed ring oscillators (HS-ROSC) connect to TVDD/VVDD.
The ROSC output frequency is affine-linearly related to TVDD/VVDD.

(10 of 21) Digital I-sensor: Circuit and Hardware
High-sensitivity delay cell (HS Delay Cell): an N/PMOS-stacked delay cell is adopted to improve voltage-to-frequency sensitivity (Hz/V) for sensing resolution.

(11 of 21) Digital I-sensor: Circuit and Hardware
The two HS-ROSCs switch supply rails between TVDD and VVDD by the CP signal: CP=0, between TVDD and VVDD; CP=1, between VVDD and TVDD in order.
The two frequencies are recorded in terms of the digital counts (ROA and ROB).
ROA and ROB are quantified as the voltage of the supply rails.
(ROA-ROB) is calculated as the voltage drop of the supply rails, which includes the speed offset between the 2 HS-ROSCs.
[Figure labels: CP=0; CP=1.]

(12-13 of 21) Digital I-sensor: Circuit and Hardware
Convert ΔRO to Vdrop (TVDD-VVDD):
Vdrop = ΔRO / slope (trimmed counts vs. volt.)
Vdrop: average (TVDD-VVDD) voltage drop
Obtain current I from Vdrop / DCR
DCR: direct-current resistance of the PSH in the CPU
The chopper calculation cancels the speed offset of the two HS-ROSCs.
The delta speed count (ΔRO) is read out from the average in two sequential detect windows (CP=0 and CP=1).
Converting ΔRO to current with HW calibration.

14.4: A Fully Digital Current Sensor Offering Per-Core Runtime Power for Syst
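The chopper calculation and the count-to-current conversion above can be sketched in Python under a simple affine model, counts = slope * V + offset. The model assumes both oscillators share the same slope, and every number below is hypothetical, chosen only to make the arithmetic easy to follow.

```python
def counts(v_supply, slope, offset):
    """Assumed affine model of one HS-ROSC count in a detect window."""
    return slope * v_supply + offset


def delta_ro(tvdd, vvdd, slope, offset_a, offset_b):
    """Chopper average over two detect windows; the oscillator offsets cancel."""
    # CP=0: oscillator A runs on TVDD, oscillator B on VVDD.
    d0 = counts(tvdd, slope, offset_a) - counts(vvdd, slope, offset_b)
    # CP=1: the supply rails are swapped.
    d1 = counts(vvdd, slope, offset_a) - counts(tvdd, slope, offset_b)
    return (d0 - d1) / 2


def current_from_delta_ro(d_ro, slope, dcr_ohm):
    """Vdrop = delta_RO / slope, then I = Vdrop / DCR."""
    v_drop = d_ro / slope
    return v_drop / dcr_ohm


# Hypothetical operating point: 10 mV drop across a 2 mOhm power switch.
slope = 20_000.0  # counts per volt (assumed trimmed slope)
d_ro = delta_ro(tvdd=0.80, vvdd=0.79, slope=slope, offset_a=120.0, offset_b=95.0)
i_core = current_from_delta_ro(d_ro, slope, dcr_ohm=0.002)
```

A single window would report slope * Vdrop plus the offset difference (225 counts here rather than 200). Averaging the CP=0 and CP=1 windows removes that term, which is the stated purpose of the chopper calculation.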
em Budgeting in a 4nm-Plus Octa-Core CPU, 2024 IEEE International Solid-State Circuits Conference, 14 of 21
HS-ROSC: Non-linearity and Calibration
Given two voltages as the linear slope (Trimmed Slope) in a trim step.
Non-linearity of the HS-ROSC.
[Plot: HS-ROSC counts vs. VDD; labels: IR drop: Vdrop; ROerr (non-linearity error) at same Vdrop.]
Convert ΔRO to Vdrop with a Verr:
Vdrop = ΔRO / Trimmed Slope + Verr

(15 of 21) HS-ROSC: Non-linearity and Calibration
Hardware calibration
Proper trimming voltage step()
Ensure voltage drop error rate: Verr/Vdrop

thres) send_status(neighbor)
neighbor = neighbor.next
counter = 0
[Figure: Token FSM; token exchange between TILE0 and TILE1 over time (t0, t1, t2); activity and token values as extracted: 10/2010/010/200/020/20.]
Token exchanges are initiated continuously, not just when a tile activity starts or ends.
[Figure: per-tile power-management loop. Blocks: LUT, PID Controller, LDO Ctrl., Always On (AON) Buffer, TRO, TDC. Signals and annotated values: Token count; Ftarget (1.1GHz, 1.4GHz); Vlogic (0.8V, 1.0V); Tile CLK; activity; Ftile (1.1GHz, 1.4GHz).]

14.5: A 12nm Linux-SMP-Capable RISC-V SoC with 14 Accelerator Types, Distributed Hardware Power Management and NoC-Based Data Orchestration, 2024 IEEE International Solid-State Circuits Conference, 15 of 21
Distributed Hardware Power Management
Concurrent execution of 5 accelerators under a fixed 80mW power cap.
Without DHPM (baseline), each tile is allocated a fixed power.
With DHPM, power is dynamically reallocated among tiles.
22-38% power utilization improvement, translating to 19-27% throughput improvement with a full-hardware scalable implementation.

(16-17 of 21) NoC-Based Data Orchestration
NoC traffic with 11 accelerators executing in parallel.
"Contention" = # of cycles when a queue is full and asserts backpressure.
7 different configurations of the memory hierarchy: 1 LLC; 2 LLC; 3 LLC; 3 LLC + 1 SPAD; 3 LLC + 2 SPAD; 3 LLCs + 3 SPAD; 3 LLCs + 4 SPAD.
Scaling up the memory hierarchy alleviates contention and distributes traffic on average.
Using all tiles improves performance by 51% on average.
[Plot: Per-Application Speedup vs. 1 LLC Configuration.]

(18 of 21) NoC-Based Data Orchestration: Heterogeneous Accelerator Data Access Modes
Accelerators can access data from off-chip, LLC, SPAD, or directly from another accelerator; selectable at runtime.
Each "mode" can offer advantages depending on the parameters of the accelerator invocation and the dynamic status of the system.
Flexible data orchestration is key to consistently good performance across a variety of workloads!

(19 of 21) Summary
Managing resources in a large, heterogeneous SoC that runs multiple simultaneous applications is a difficult system-level challenge.
[Figure labels: Performance; Data Contention; Power Utilization.]

(20 of 21) In memory of Davide Giri, without whom this chip would not have been possible.

(21 of 21) Thank you
Maico Cassel dos Santos
mcasselcs.columbia.edu

14.6: A 10A Computational Digital LDO Achieving 263A/mm² Current Density with Distributed Power-Gating Switches and Time-Based Fast-Transient Controller for Mobile SoC Application in 3nm GAAFET, 2024 IEEE International Solid-State Circuits Conference, 1 of 42

A 10A Computational Digital LDO Achieving 263A/mm² Current Density with Distributed Power-Gating Switches and Time-Based Fast-Transient Controller for Mobile SoC Application in 3nm GAAFET
Dongha Lee, Seki Kim, Takahiro Nomiyama, Dong-Hoon Jung, Dongsu Kim, Jongwoo Lee, Sungung Kwak
Samsung Electronics, Hwaseong, Korea

(2 of 42) Outline
Motivation
Proposed Computational Digital LDO (CDLDO)
  Time-based Exponential Control (TEC) with Slope Detector
  Step-Back & Negative-Step Control with Pre-Computational Controller
  Overall Structure
Measurement Results
Conclusion

14.6: A 10A Computational Digital LDO Achieving 263A/mm² Current Density with Distributed Power-Gating
206、 Switches and Time-Based Fast-Transient Controller for Mobile SoC Application in 3nm GAAFET 2024 IEEE International Solid-State Circuits Conference3 of 42Outline Motivation Proposed Computational Digital LDO(CDLDO)Time-based Exponential Control(TEC)with Slope DetectorStep-Back&Negative-Step Control
207、with Pre-Computational ControllerOverall Structure Measurement Results Conclusion14.6:A 10A Computational Digital LDO Achieving 263A/mm2Current Density with Distributed Power-Gating Switches and Time-Based Fast-Transient Controller for Mobile SoC Application in 3nm GAAFET 2024 IEEE International Sol
208、id-State Circuits Conference4 of 42MotivationExynos 2400For mobile application processors,number of clusters and cores per die increasingBIG CPUMID CPULIT CPUExynos 2100 Exynos 9810BIG CPUBig CPUMID CPULIT CPUMIDCPU2 Clusters/8 Cores3 Clusters/8 Cores3 Clusters/10 CoresBIG CPULIT CPU14.6:A 10A Compu
209、tational Digital LDO Achieving 263A/mm2Current Density with Distributed Power-Gating Switches and Time-Based Fast-Transient Controller for Mobile SoC Application in 3nm GAAFET 2024 IEEE International Solid-State Circuits Conference5 of 42MotivationThe power rails of multicore CPUs have been merged b
210、y CPU cluster to simplify the PMIC-SoC power rails in limited PCB areaPower optimizing by using power gating switches(PGSs)*Power Gating Switches(PGSs)are distributed throughout the entire CPU physicallyCPU StateVCPUIdle TimePGSsActive0.5V 1.1V-Fully On(Bypass)Power Gating0V1msOff14.6:A 10A Computat
211、ional Digital LDO Achieving 263A/mm2Current Density with Distributed Power-Gating Switches and Time-Based Fast-Transient Controller for Mobile SoC Application in 3nm GAAFET 2024 IEEE International Solid-State Circuits Conference6 of 42MotivationIntegrated LDO(ILDO)supports active and data retention
modes with dynamic voltage scaling (DVS) transient.

CPU State | VCPU | Idle Time | ILDO
Active | 0.5V to 1.1V | - | Fully On (Bypass) or Partially On (LDO)
Data Retention (Idle) | 0.3V to 0.5V | 1us to 1ms | Partially On (LDO)
Power Gating | 0V | 1ms | Off

Challenges of ILDO for Mobile Application
- High Area Cost, Small Area
- Steep Load Transition (di/dt = 1A/ns)
- Small COUT [text missing in source]

Motivation (Small Area)
The buffer and RC parasitics of the routing result in a propagation delay (Tprop) that
increases the loop delay of the CDLDO.
A distributed CDLDO scheme was adopted to reduce the delay, but propagation delay still exists.
[Figure: propagation delay Tprop in distributed CDLDOs]

Motivation (Fast-transient): ADC-Based DLDO
[Figure: ADC-based DLDO with PID controller and ADC (RVDD, VOUT, VREF); waveforms of ILOAD, VOUT and IDLDO with code steps +C1 to +C5. Cn: added code]
- PID control w/ ADC
- High current consumption for
fast-transient
- Large area and design complexity
- Large current difference makes overshoot or undershoot
[Figure, second build: voltage sensing only; large difference between ILOAD and IDLDO]

Stability: Propagation Delay, Need Low Gain
Steep Load Transition (di/dt = 1A/ns), Small COUT [text missing in source]

TEC with Delay
[State diagram: Slow (IDLE) and TEC states of the computational controller. Slow Mode (VCOMP=VDIFF), Fast Mode (VCOMP!=VDIFF). TEC updates: C1=Kp, Cn=Cn-1 x2]

The control code steps back to the previous cycle to remove the current overshoot.
Computational controller states: Slow (IDLE), TEC, Step-Back (transition label as extracted: Cnt 1)
C1=Kp
Cn=Cn-1 x2
Cn=-Cn-1
Slow Mode (VCOMP=VDIFF), Fast Mode (VCOMP!=VDIFF)
[Figure: Step-Back Control. IDLDO follows ILOAD with code steps +C1, +C2, +C3, +C4, +C5 (TEC), then C6=-C5 (Step-Back); VOUT waveform]

Negative-Step Control
The negative-step control, which changes the control code by -C1 until Cnt equals 1, is used to reduce the overcurrent.
Computational controller states: Slow (IDLE), TEC, Step-Back, Negative-Step
Updates shown in the state diagram:
- Cnt=1, C1=Kp
- Cnt=Cnt x2, Cn=Cn-1 x2
- Cnt=Cnt/2, Cn=-Cn-1
- Cnt=Cnt-1, Cn=-Kp
Transition labels as extracted: Cnt=1, Cnt 1. Slow Mode (VCOMP=VDIFF), Fast Mode (VCOMP!=VDIFF).
[Figure: IDLDO vs. ILOAD with +C1 to +C5 (TEC), C6=-C5 (Step-Back), then -C1, -C1, -C1 (Negative-Step); VOUT waveform]

Pre-Computational Controller
Pre-computation scheme reduces the loop delay.
[Block diagram: pre-computational controller with two subtractors (SUB) and two adders (ADDER) feeding a selection block (pre-computation selection) driven by VCOMP (Fast/Slow); registers CODE, CNT, Cn, State clocked by CLKB; replica delay generating CLKB_D; comparator inputs VFB, VSD, VREF; CODE drives the PGSs of the CORE]
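The control-code updates listed for the computational controller (C1=Kp, Cn=Cn-1 x2 during TEC, Cn=-Cn-1 for step-back, Cn=-Kp with a decrementing Cnt for negative-step) can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not the authors' implementation: the function name `tec_code_steps`, the integer gain `kp`, and the fixed number of TEC cycles are hypothetical, and the comparator, slope-detector and droop-detector decisions that actually trigger each phase are not modeled. It assumes the negative-step phase follows the step-back directly and runs until Cnt reaches 1, as the slide text states.

```python
from typing import List


def tec_code_steps(kp: int, tec_cycles: int) -> List[int]:
    """Return per-cycle control-code increments for one fast-mode episode.

    Phases (update rules taken from the slides, sequencing assumed):
      TEC:           Cnt=1, C1=Kp, then Cnt=Cnt*2, Cn=2*Cn-1
      Step-Back:     Cnt=Cnt/2, Cn=-Cn-1
      Negative-Step: Cnt=Cnt-1, Cn=-Kp, repeated until Cnt equals 1
    """
    if kp <= 0 or tec_cycles < 1:
        raise ValueError("kp must be positive and tec_cycles at least 1")

    steps: List[int] = []

    # TEC phase: the increment doubles every cycle.
    cnt = 1
    step = kp
    steps.append(step)
    for _ in range(tec_cycles - 1):
        cnt *= 2
        step *= 2
        steps.append(step)

    # Step-back: undo the most recent TEC increment.
    cnt //= 2
    steps.append(-step)

    # Negative-step: trim the code by Kp per cycle until Cnt reaches 1.
    while cnt > 1:
        cnt -= 1
        steps.append(-kp)

    return steps


if __name__ == "__main__":
    seq = tec_code_steps(kp=1, tec_cycles=3)
    print(seq, "net code change:", sum(seq))
```

With `kp=1` and three TEC cycles the increments are +1, +2, +4, -4, -1, so the net code change is 2. In the real loop the number of TEC cycles is set by when the comparator output flips, which this sketch takes as an input instead.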
Overall Structure
[Block diagram: reference voltage generator (VREF, VREF_PRE, max. voltage selector, VDREF); slope detector (VSD); droop detector (VDROOP); main comparator (VCOMP); pre-computational controller with fast (TEC) gains 1KP, 2KP, 4KP, ... 256KP selected through a MUX, plus a slow path; replica delay and delay compensation (CLK, CLKB, CLKB_D, CLK_D); stabilizing path; PGSs and load (1A/ns); CPU PLL (0.5-1.7GHz). RVDD (=VBIG, 0.55-1.1V), VDD (0.75V). Code-update labels: UD*, OV*, UD R* (+KP), OV R* (-KP), (+2^(n-1)KP), (-2^(n-1)KP)]
*Overshoot (OV) / Undershoot (UD) / Recovery (R)
- Reference voltage generator for fast DVS (1mV/ns)
- Comparator and pre-computational controller operating at up to 1.7GHz
- VOUT slope detection for stable regulation
- Droop detector for excessive load condition response
- Distributed power TRs utilizing PGSs in
CPU core

Die micrograph
[Die photo: CDLDO1 to CDLDO8 with CPU modeling units (CAP, PGS, LOAD); each CDLDO contains a droop detector, slope detector, computation controller and comparator. Dimension labels: 597µm, 358µm, 1194µm, 1454µm, 18µm, 44µm, 68.5µm, 70µm]
- Fabricated in 3nm GAAFET process
- Distributed scheme with 8 CDLDOs and 8 CPU modeling units

CDLDO Measurement Result
[Waveforms for CDLDO, Fast Mode Only and Slow Mode Only: undershoot for a 1mA to 6.5A load step with 1ns slope at 0.82V to 0.77V (scales 50mV/100ns and 10mV/20ns); overshoot with load labels 6.5A and 0.5A, 1ns slope, at 0.85V to 0.7V (scale 50mV/400ns); ripple at 0.82V to 0.77V]
Load Transient & Ripple Voltage Summary
Metric | CDLDO | Fast Mode Only | Slow Mode Only
Undershoot | 94mV | 88mV | 248mV
Overshoot | 61.2mV | 102.9mV (Unstable) | 151mV
Ripple | 30.4mV | 94.4mV | 28mV

CDLDO Measurement Result
- Fast DVS rising (0.49V to 0.82V): 329mV in 238ns, 1.4mV/ns
- Fast DVS falling (0.8V to 0.51V): 290mV in 224ns, 1.3mV/ns

CDLDO Measurement Result
Voltage settings: 1.1V to 1.05V, 1V to 0.95V, 0.9V to 0.85V, 0.8V to 0.75V, 0.7V to 0.65V, 0.55V to
0.5V
[Plots: Load Regulation; Current Efficiency]
Peak Efficiency 99.82%

Performance Summary
Parameter | This Work | [5] | [2] | [3] | [6]
Process | 3nm GAAFET | 10nm FinFET | 5nm FinFET | 3nm GAAFET | 10nm FinFET
Control | TEC | PID w/ Flash ADC | PID w/ Flash ADC | I + Event Driven | PID w/ TDC
(row label missing in source) | Distributed | Distributed | Distributed | Centralized | Distributed
Power TR | PGS Re-Used | LDO Dedicated
VIN (V) | 0.55 to 1.1 | 0.6 to 1.6 | 0.55 to 0.8 | 0.5 to 1.1 | 0.7 to 1.05
VDROPOUT (mV) | 50 | 70 | 50 | 50 | 50
VOUT (V) | 0.5 to 1.05 | 0.55 to 1.3 | 0.5 to 0.75 | 0.45 to 1.05 | 0.65 to 0.95
DVS Rate (mV/ns) | 1.4 | -
Current Density (A/mm²) | 263.16 | 125.14 | 40 | 34.15 | 21.14
VOUT Droop (mV) | 94 | remaining values fused in source: 1982138200
ΔIL/Δt | 6.5A/1ns | 30A/2ns | 1A/1us | 1A/1ns | 1.17A/1ns
IL Range (mA) | 0 to 10000 | 0.1 to 37000 | 1 to 6400 | 0 to 1400 | 28 to 2740
Current Eff. (%) | 99.82 | 99.77 | 99.89 | 99.98 | 98.60
IQ (µA) | 18170 | 78000 | 7300 | 250 | 57000
COUT (nF) | 320 | 750 | 1000 | 40 | 0.5
Area (mm²) | 0.038 | 0.297 | 0.16 | 0.041 | 0.126
FOM* (ps) | values fused in source: 9.3210.1792.850.615.20

Conclusion
- CDLDO reuses power gating switches (PGSs) already distributed throughout the entire CPU: eliminating power-FET area and routing overhead
- Time-based exponential control (TEC) with slope detector: fast-transient response, improving stability
- Step-back & negative-step control with pre-computational controller: improving stability, reducing control delay
- Proposed CDLDO achieves:
  - 263.16 A/mm² current density
  - 94mV voltage droop at 6.5A/1ns
  - 1.4mV/ns DVS rate

14.7: A 0.45V 0.72mW 2.4GHz Bias-Current-Free Fractional-N Hybrid PLL Using a Voltage-Mode Phase Interpolator in 28nm CMOS
2024 IEEE International Solid-State Circuits Conference

A 0.45V 0.72mW
2.4GHz Bias-Current-Free Fractional-N Hybrid PLL Using a Voltage-Mode Phase Interpolator in 28nm CMOS
Liqun Feng, Xuansheng Ji, Longhao Kuang, Qianxian Liao, Su Han, Jiahao Zhao, Woogeun Rhee, Zhihua Wang
School of Integrated Circuits, Tsinghua University, Beijing, China

Outline
- Motivation
- Voltage-Mode Phase Interpolation
- Bias-Current-Free Fractional-N Hybrid PLL
  - Overall Architecture
  - Voltage-Mode Phase Interpolator
  - Other Key Building Blocks
- Measurement Results
- Conclusion

Motivation
- Lowering VDD is the most efficient way to improve energy efficiency
- Ultra-low voltage (ULV) enables energy-harvesting IoT devices
$P_{total} = f_{CK}\,(C_L V_{DD}^{2} + t_{sc} I_{peak} V_{DD}) + I_{leakage} V_{DD}$
[Figure (Vivek De, Intel): energy harvester supplying IoT devices w/o voltage booster; labels 0.2V, 0.5V, VSS, VDD]

ULV Fractional-N PLL
- PLL has become the bottleneck of entire ULV systems
- ULV Frac.-N PLL without voltage booster is highly desired
Two major issues for a 0.5V ULV Frac.-N PLL: Phase Detector, Q-Noise Reduction
Requirements: High Linearity, High Resolution, Low Noise, High Supply Rejection
[Plot: FoM (dB, -210 to -250) vs. supply voltage (0.4V to 1.2V) for prior ring PLLs and LC PLLs (JSSC, ISSCC, CICC, TCAS-I, RFIC); low-voltage Frac.-N PLL target region; no sub-0.5V Frac.-N PLL]

Phase Detector under ULV
Main PD structures suffer under ULV operation:
- PFD+CP: limited voltage headroom; degraded spur level; mismatch between iUP and iDN
- SSPD: reduced VSW and high Ron; degraded in-band PN; large on-resistance
- TDC: sensitive to VDD noise (decap noise); degraded resolution
- Voltage-mode PD (with notch filter): simple and robust; high linearity; sensitive to VDD noise; large spur

Spur Cancellation for Voltage-Mode PD
- Dual-Path Quadrature XOR (Y. Liang, T-MTT22): I/Q generation; I/Q mismatch; bias current required; ref. spur cancellation; current mismatch; w/o notch filter
- Time-Interleaving FFPD (L. Feng, A-SSCC23): rising edge only; simple structure; bias-current-free; ref. spur cancellation; high linearity; w/o notch filter
[Schematics: quadrature XOR paths (REFI, REFQ, DIVI, DIVQ, XOP, XON) and time-interleaved FFPDs (REFP, REFN, DIVP, DIVN, PDP, PDN) driving the LPF and VCO]

Q-Noise Reduction under ULV
DTC and time-to-voltage conversion are not suitable under ULV, due to dependency on absolute values.
- Time-Domain Compensation (DTC): intrinsic latency; not stable and accurate; bad resolution under low VDD
- Voltage-Domain Compensation (sampling PD with VDAC): T-to-V conversion is nonlinear and costly; switch and Isource need large headroom

Voltage-Mode Phase Interpolation
- Voltage interpolation during interpolation window AB
- Equal amount of compensated phase by Volt.-PI and Time-PI
[Block diagram: FCW accumulator (ACC) driving the voltage DAC (VREFP, VREFN, output VPI) and MMD; interpolation window generation from A and B; control logic; retiming of DIV by CKV]
[Figure: operating principle. Area 1 equals Area 2 (net V*T = 0) over the full scales VFS and TFS = TCKV, so Time-PI and Voltage-PI give the same phase variation; equation symbols partly lost in extraction]

Voltage-Mode Phase Interpolation
- Realized by resistor-based DAC (RDAC)
- Stable voltage reference + resistor-ratio-based scaling
$V_{DAC} = \frac{R_2}{R_1 + R_2} V_{FS}$, $V_{FS} = V_{DD}$, $R_{DAC} = \frac{1}{1/R_1 + 1/R_2}$
Q-noise reduction by VPI (RDAC driven by control word, compared with BGR):
- Stable reference base: full range VDD-VSS, no gain calibration
- Accuracy: resistor ratio, the most accurate in IC
- Requirement: constant output resistance

Overall Architecture
[Block diagram: frequency doubler (FD) with duty-cycle calibration (DCC, DTC); TI-FFPD with dual TIs and FFPDs; VPI with RDACs (DACP, DACN; VREFP, VREFN; VCM=VDD/2) and control logic driven by FRAC6:0; retiming DFFs; MMD; class-D D/VCO (CKV); PI controller with FCW, INT4:0, FRAC6:0 and MASH modulators. Proportional path: ref. spur cancellation, Q-noise reduction, CM ripple cancelled (VCP, VCN). Integral path: dual BBPDs (BBPDOA, BBPDOB), FSM (PDIO, clocked at CKV/4), decoder]

Overall Architecture: Proportional Path
[Same block diagram with the proportional path highlighted]

Operating Principle: Proportional Path
- Pseudo-differential paths realized by TI-FFPD
- VPI merged with duty-cycle-based phase detection
[Schematic: TI-FFPD with dual TIs, FFPDs (PDPA, PDPB, PDNA, PDNB), control logic and RDACs (DACP, DACN) between VREFP and VREFN, outputs to LPF]

RDAC Control Logic
State | PDP/NA | PDP/NB | RDAC INPUT6:0
LOW | 0 | 0 | 0000000
HIGH | 1 | 1 | 1111111
POS. | 1 | 0 | FRAC
NEG. | 0 | 1 | FRAC
[Timing diagram: REF, PDPA, PDPB, PDNA, PDNB, DACP, DACN and DACP-DACN over TCKV and TREF; interpolation window; equal-area segments; residual ripple suppressed by LPF]

Overall Architecture: Integral Path
[Same block diagram with the integral path highlighted]

Operating Principle: Integral Path
- Provides frequency tracking capability to remove static phase error
- REF always stays between A and B without frequency offset

I-Path Decision Logic
State | BBPDOA | BBPDOB | PDIO
NOR. | 0 | 1 | 0
LEAD | 1 | 1 | 1
LAG | 0 | 0 | -1
[Schematic: dual BBPDs, FSM clocked at CKV/4, decoder, MASH 1 on the LSBs, DCW; plot of VCP-VCN (mV) vs. time (us) with an initial frequency offset, P-path only and then I-path on]

Operating Principle: Integral Path
- Single BBPD in I-path keeps toggling under frac.-N operation
- Dual-BBPD structure avoids toggling and gain degradation
[Timing: conventional single BBPD output toggles; with the proposed dual BBPDs, BBPDOA stays at 0 and BBPDOB stays at 1]
[Phase-noise plots (PN in dBc/Hz vs. frequency in Hz; D/VCO, REF, BBPD-Q, DSM-Q, total calculated and simulated): Frac.-N HPLL with single BBPD in I-path, BBPD gain degraded by Q-noise; Frac.-N HPLL with dual BBPDs in I-path, BBPD Q-noise reduced by high low-frequency gain]

Overall Architecture: FD & DCC
[Same block diagram with the frequency doubler and duty-cycle calibration highlighted]

Operating Principle: FD & DCC
- Less toggling with dual-BBPD logic
- Coarse DTC with 20ps resolution realizes -80dBc ref. spur

DCC Cal. Logic (OFFSET alternates +1, -1, +1, -1, ...)
State | BBPDOA | BBPDOB | CALO (CYCLE=1) | CALO (CYCLE=-1)
NOR. | 0 | 1 | 0 | 0
POS | 1 | 1 | -1 | 1
NEG | 0 | 0 | 1 | -1
[Plot: ref. spur (dBc) vs. DTC resolution (ps), simulated and calculated, including the LPF reduction effect; the expression for Pspur in terms of KVCO, VPD, tRES, fP1, fP and fREF is not recoverable from the extraction]
[Transient waveform: PD output voltage VPD and CYCLE vs. time (us) with 5% duty-cycle error, DCC on]

Voltage-Mode Phase Interpolator (VPI)
- Combination of 4b binary R-2R DAC and 3b thermometer DAC
- Unit R value: trade-off between power, linearity, area, and noise
[Schematic: 3b thermometer MSBs (7 resistors, switches SW1 to SW7 via decoder) and 4b binary R-2R LSBs (D0 to D3) between VREFP and VREFN; control logic selects CODE6:0 from FRAC, PDP/NA and PDP/NB]
- Simple structure
- Passive-intensive
- Suitable for ULV
- Bias-current-free
- Low power consumption

Parameters
R | 5kΩ
(W/L)n | 16u/30n
(W/L)p | 24u/30n
Total R | 140kΩ
Power | 70µW
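The segmentation described for the VPI (3 thermometer-coded MSBs driving 7 unit elements, 4 binary LSBs driving an R-2R ladder) can be sketched in Python as follows. This is a minimal behavioral model under stated assumptions, not the authors' circuit: the function names `split_code` and `ideal_vdac` are hypothetical, resistor matching is assumed ideal, switch resistance is ignored, and the output is assumed to scale linearly as code/128 between VREFN and VREFP.

```python
from typing import List, Tuple


def split_code(code: int) -> Tuple[List[int], List[int]]:
    """Split a 7-bit code into 7 thermometer lines (MSBs) and 4 binary bits (LSBs)."""
    if not 0 <= code <= 127:
        raise ValueError("code must be a 7-bit value (0..127)")
    msb = code >> 4          # upper 3 bits, value 0..7
    lsb = code & 0xF         # lower 4 bits, value 0..15
    # Thermometer code: the first `msb` unit elements are switched on.
    therm = [1 if i < msb else 0 for i in range(7)]
    # Binary bits D0..D3, least significant first.
    bits = [(lsb >> i) & 1 for i in range(4)]
    return therm, bits


def ideal_vdac(code: int, vrefp: float, vrefn: float = 0.0) -> float:
    """Ideal DAC output: each thermometer element weighs 16 LSBs, full scale is 128 LSBs."""
    therm, bits = split_code(code)
    weight = 16 * sum(therm) + sum(b << i for i, b in enumerate(bits))
    return vrefn + (vrefp - vrefn) * weight / 128


if __name__ == "__main__":
    print(split_code(53))              # 53 = 0b011_0101
    print(ideal_vdac(64, vrefp=0.45))  # mid-scale with a 0.45V reference
```

For code 53 the three MSBs (011) switch on three thermometer elements and the four LSBs (0101) set D0 and D2. Thermometer coding of the MSBs is a common way to avoid large major-carry transitions, which is consistent with the linearity emphasis of the slides, although the slides do not state that motivation explicitly.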
310、14.7:A 0.45V 0.72mW 2.4GHz Bias-Current-Free Fractional-N Hybrid PLLUsing a Voltage-Mode Phase Interpolator in 28nm CMOS 2024 IEEE International Solid-State Circuits Conference23 of 40Output Resistance Constant output resistance(load independent)Variable output voltage Voltage interpolation VFS=VREF
311、P-VREFN PD gain&PLL BW controlRDACVDACRDAC=2R/8+RSWZLPF1Thevenin EquivalentVDAC=VFS(2iDi)/128i=06RLPFCLPFRt=RDAC+RLPF(RSW R)VR-2R,eqSW1SW72R2R2RZLPF1RSW,R-2R(W/L)P(W/L)N(W/L)P(W/L)NVREFPVREFNTotal Resistance to CLPFVREFPVREFNEquivalent Circuit14.7:A 0.45V 0.72mW 2.4GHz Bias-Current-Free Fractional-N
Noise Analysis
- RDAC functions the same as RLPF to realize the first pole
- Almost NO extra noise added except flicker noise from switches
$R_{DAC} = 2R/8 + R_{SW}$
$S_{R_{DAC}} = 4kT R_{DAC} + S_{SW,1/f}$
$S_{R_{DAC},0} = 4kT(2R/8)$
[Figure: phase noise model. REF, FFPD (K_FFPD), BBPD (K_BBPD), 1/N divider, Z_LPF(s), DSM and D/VCO (K_VCO/s), with noise sources S_REF, S_RDAC, S_RLPF, S_DSM, S_VCO and S_BBPD-Q; S_RDAC and S_RLPF have the same effect.]
[Figure: RDAC noise. Noise (V^2/Hz) vs. Freq. (Hz): post-sim. noise, S_SW,1/f, S_R,SW + S_R,par. and 4kT R_DAC.]
[Figure: predicted phase noise. PN (dBc/Hz) vs. Frequency (Hz), 10^4 to 10^8: D/VCO, REF, LPF, RDAC, BBPD-Q, DSM and total; reduced Q-noise.]

Linearity
- Consistent performance with supply voltage scaling down
- 0.15 LSB INL at 0.5V across P&T variations (calibration-free)
[Figures: post-layout sim. results, INL or DNL (LSB) vs. Code. INL over supply and DNL over supply (tt, 25°C, VDD varies from 0.5V to 1V, step = 0.1V); INL over corners (0.5V; ff, fs, sf, tt, ss at -40°C, 25°C and 85°C); INL over MC runs (tt, 0.5V, 25°C).]
VPI Under Supply Variation
- DTC suffers from supply variation: large decoupling capacitor
- VPI has no Q-noise folding thanks to ratio-based voltage scaling
Supply modulation: $V_{PD}(t) = V_{DD0} + V_m \sin(2\pi f_m t)$
[Figure: timing comparison of DTC and VPI under supply modulation (REF, FB, CKV, PIO). DTC: extra noise and Q-noise folding. VPI: the interpolated levels scale with V_FS according to the accumulated FCW_FRAC, so a supply drop during phases A and B keeps the same area and the Q-noise cancels exactly. A constant modulation voltage during A and B is assumed, for f_m << f_CKV.]

VPI Under Supply Variation
$V_{DD0} = 0.5$V, $V_m = 20$mV, $f_m = 80$MHz
- Degraded in-band noise without VPI
- Same in-band noise with VPI: supply noise immunity
[Figures: PN (dBc/Hz) vs. Frequency (Hz), 10^4 to 10^8, with and without supply modulation. Frac.-N PLL w/o VPI: degraded in-band PN. Frac.-N PLL with VPI: same in-band PN.]

Outline
- Motivation
- Voltage-Mode Phase Interpolation
- Bias-Current-Free Fractional-N Hybrid PLL
  - Overall Architecture
  - Voltage-Mode Phase Interpolator
  - Other Key Building Blocks
- Measurement Results
- Conclusion
Class-D D/VCO
- Class-D D/VCO with low-voltage and high efficiency
- CMRR of differential varactor affects ref. spur performance
[Schematic: class-D D/VCO with 5b PVT bank, 7b Therm. + 2b Binary bank, bias VB, control from P-path and from I-path, differential varactor driven by VCP/VCN, outputs CKVP/CKVN to BUF and to MMD.]
Spur comparison ($K_{VCO} = 200$MHz/V, 10mV ripple on VCP/VCN, 20ns period):
- Diff. ripple: $\Delta f = K_{VCO} V_{ripple} = 1$MHz, $P_{spur} = -40$dBc
- CM ripple: ideally $\Delta f = 0$; $\Delta f_{CM \to DM} \approx 30$kHz (initial cal.), $P_{spur} \approx -70$dBc
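The spur levels quoted in the spur comparison can be cross-checked with the standard narrowband-FM approximation, spur (dBc) = 20*log10(Δf / (2*f_mod)). The slides do not state which formula was used, so this is an assumed model; the ripple frequency is taken as the inverse of the 20ns period shown in the figure.

```python
import math


def fm_spur_dbc(delta_f_hz, f_mod_hz):
    """Sideband level for a small peak frequency deviation (narrowband FM)."""
    return 20 * math.log10(delta_f_hz / (2 * f_mod_hz))


F_RIPPLE_HZ = 50e6   # ripple frequency, i.e. 1 / 20 ns

diff_spur = fm_spur_dbc(1e6, F_RIPPLE_HZ)    # differential ripple, deviation 1 MHz
cm_spur = fm_spur_dbc(30e3, F_RIPPLE_HZ)     # residual CM-to-DM conversion, 30 kHz

print(round(diff_spur, 2), round(cm_spur, 2))
```

Under this model the 1MHz deviation gives -40dBc and the 30kHz residual gives roughly -70dBc, in line with the figures on the slide.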
MMD and TI-FFPD
- Customized logic cells with low-Vth transistors
- Retiming DFFs inserted to break critical paths
[Schematic: static pulse-swallow divider. 4/5 prescaler clocked by CKV, 3-bit counter (INT[4:2]), 2-bit counter (INT[1:0]), modulus control MC, retiming DFF, output DIV. Division ratio: 12-31.]
[Schematic: TI-FFPD. Time interleaver with 1-to-2 DEMUX (SEL, SELB), built-in controller, FFPD with set/reset inputs, outputs OUTP and OUTN.]

Outline
- Motivation
- Voltage-Mode Phase Interpolator
- Bias-Current-Free Hybrid PLL Circuit Implementation
- Measurement Results
- Conclusion
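The 12-31 division range of the pulse-swallow divider on the MMD slide can be reproduced with a small behavioral model. It assumes the textbook arrangement, which the slides do not spell out: INT[4:2] loads the 3-bit program counter (P), INT[1:0] loads the 2-bit swallow counter (S), and the prescaler divides by 5 for the first S prescaler cycles and by 4 for the remaining P - S.

```python
def cycles_per_output(p, s):
    """Count CKV cycles in one output period of a 4/5 pulse-swallow divider."""
    if s > p:
        raise ValueError("swallow count must not exceed program count")
    cycles = 0
    for k in range(p):                 # one iteration per prescaler output pulse
        cycles += 5 if k < s else 4    # divide-by-5 while the swallow counter runs
    return cycles


def division_ratio(int_code):
    """Map a 5-bit INT[4:0] word to the resulting division ratio."""
    p = (int_code >> 2) & 0x7   # INT[4:2], assumed program-counter value
    s = int_code & 0x3          # INT[1:0], assumed swallow-counter value
    return cycles_per_output(p, s)


print([division_ratio(code) for code in range(12, 32)])
```

With P >= 3 every swallow setting 0 to 3 is valid, so the ratio 4*P + S covers each integer from 12 to 31 without gaps, which is consistent with the range printed on the slide.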
Chip Micrograph
Implemented in 28nm CMOS; die photo labels: 400µm and 600µm.
Blocks: 1. LC D/VCO, 2. TI-FFPDs, 3. BBPDs, 4. VPI*2, 5. LPF*2, 6. MMD, 7. DCC (DTC), 8. DCC (Cal. Logic), 9. FSM, 10. PI Controller, 11. MASH3, 12. Decoder, 13. Output Buffer

Blocks     Power at 0.45V (mW)   Power at 0.5V (mW)
D/VCO      0.49*                 0.49*
VPI        0.06                  0.07
MMD        0.06                  0.075
Digital    0.075                 0.095
Others     0.035                 0.05
Total      0.72                  0.78
*D/VCO works at 0.4V.

Phase Noise
fREF = 50MHz*2, FCW = 24.200892
0.45V supply:
- Integer-N: integrated jitter 549.3fs
- Fractional-N with VPI: integrated jitter 592.7fs
- Fractional-N w/o VPI: integrated jitter 663.5fs (Q-noise reduction by VPI)
0.5V supply:
- Integer-N: integrated jitter 481.9fs
- Fractional-N with VPI: integrated jitter 517.5fs
- Fractional-N w/o VPI: integrated jitter 618.1fs (Q-noise reduction by VPI)
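The measurement settings and jitter figures on the phase noise slide can be turned into a quick arithmetic check. The sketch reads "fREF = 50MHz*2" as a doubled 50MHz reference, i.e. an effective 100MHz; that reading is an assumption. The jitter values are the ones quoted on the slide.

```python
F_REF_HZ = 2 * 50e6     # assumed effective reference after doubling
FCW = 24.200892         # frequency control word from the slide

f_out_hz = FCW * F_REF_HZ


def jitter_reduction_pct(without_vpi_fs, with_vpi_fs):
    """Relative drop in integrated jitter when the VPI is enabled."""
    return 100.0 * (without_vpi_fs - with_vpi_fs) / without_vpi_fs


red_045 = jitter_reduction_pct(663.5, 592.7)   # 0.45V supply
red_050 = jitter_reduction_pct(618.1, 517.5)   # 0.5V supply

print(f_out_hz / 1e9, round(red_045, 2), round(red_050, 2))
```

Under that reading the output lands near 2.42GHz, matching the 2.4GHz in the paper title, and enabling the VPI lowers the integrated jitter by roughly 11% at 0.45V and 16% at 0.5V.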
Phase Noise
- Measured PN and jitter across VDD and tuning range
- Jitter reduction over higher power improves FoMJIT when VDD increases
[Figure: Jitter vs. Freq. Jitter (fs), 450 to 600, across the 2200 to 2600 tuning range, for VDD = 0.45V, 0.5V and 0.55V.]
[Figure: PN vs. VDD. PN (dBc/Hz) vs. Frequency (Hz), 10^4 to 10^8: 592.7fs at 0.45V, 517.5fs at 0.50V, 470.2fs at 0.55V.]
[Figure: FoMJIT vs. VDD. Power (mW), 0.70 to 0.85, and FoMJIT (dB), -246 to -247.5, vs. VDD (V) at 0.45, 0.5 and 0.55.]
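The FoMJIT trend can be checked with the commonly used jitter figure of merit, FoM = 10*log10((jitter / 1s)^2 * (power / 1mW)). The slides do not print the definition, so its use here is an assumption. The jitter values are the measured 592.7fs and 517.5fs, and the power values are the 0.72mW and 0.78mW totals from the chip power table; the 0.55V point is left out because its power is only shown graphically.

```python
import math


def fom_jit_db(jitter_s, power_w):
    """Jitter-power figure of merit in dB (lower is better)."""
    return 10 * math.log10((jitter_s / 1.0) ** 2 * (power_w / 1e-3))


fom_045 = fom_jit_db(592.7e-15, 0.72e-3)   # VDD = 0.45V
fom_050 = fom_jit_db(517.5e-15, 0.78e-3)   # VDD = 0.5V

print(round(fom_045, 2), round(fom_050, 2))
```

Both values fall inside the -246 to -247.5dB axis range of the FoMJIT plot, and the 0.5V point is the lower of the two: the jitter drop outweighs the extra power.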
Spectrum
fREF = 50MHz*2, FCW = 24.200892
0.45V:
- In-band: -57.7dBc at 446.5kHz
- Out-band: -69.5dBc at 29.7MHz
- Ref. spur: -70.2dBc at 50MHz
0.5V:
- In-band: -61.0dBc at 446.5kHz
- Out-band: -67.3dBc at 29.7MHz
- Ref. spur: -67.3dBc at 50MHz
14.7:A 0.45V 0.72mW 2.4GHz Bias-Current-Free Fractional