ISSCC 2024 DEMONSTRATION SESSION
DS1 Monday, February 19, 2024: 5:00-7:00 PM
DS2 Tuesday, February 20, 2024: 5:00-7:00 PM

ISSCC 2024 DEMONSTRATION SESSION WELCOME

What is an ISSCC Demonstration Session?

Demonstration sessions are designed to augment the experience of all attendees by providing an opportunity to interact directly with the authors of selected papers and to view some of their concrete results. At their demonstrations, the authors will illustrate their research results face-to-face, giving attendees a more hands-on experience. Overall, these sessions will:
- Demonstrate chip operation.
- Provide an opportunity for in-depth discussion with the chip creators.

Demonstration Session 1 (DS1) will be held on Monday, February 19, from 5:00 to 7:00 pm PST, and Demonstration Session 2 (DS2) will be held on Tuesday, February 20, from 5:00 to 7:00 pm PST.

Acknowledgements: In the preparation of these demonstration sessions, I wish to first acknowledge the authors of the participating papers. Their work has been organized and structured under the Chairmanship of Patrick Mercier (University of California, San Diego), and the Demonstration Session Committee, consisting of:

Analog Subcommittee: Minkyu Je, KAIST, Daejeon, Korea; Shon-Hang Wen, MediaTek, Hsinchu, Taiwan
Data Converters Subcommittee: Ying-Zu Lin, MediaTek, Hsinchu, Taiwan; Shiyu Su, University of Waterloo, Los Angeles, CA
Digital Architectures & Systems Subcommittee: Ji-Hoon Kim, Ewha Womans University, Seoul, Korea; Mark Anders, Intel, Hillsboro, OR
Digital Circuits Subcommittee: Eric Fang, MediaTek, Hsinchu, Taiwan; Akihide Sai, Toshiba, Kawasaki, Japan
IMMD Subcommittee: Taekwang Jang, ETH Zurich, Zurich, Switzerland; Sanshiro Shishido, Panasonic, Osaka, Japan
Memory Subcommittee: Seung-Jae Lee, Samsung, Hwaseong, Korea; Juang-Ying Chueh, Etron, Taipei, Taiwan
Power Management Subcommittee: Xugang Ke, Zhejiang University, Hangzhou, China; Gaël Pillonnet, CEA-Leti, Grenoble, France
RF Subcommittee: Yves Baeyens, Nokia-Bell Labs, Murray Hill, NJ; Jeff Walling, Virginia Tech, Blacksburg, VA
Security Subcommittee: Yong-Ki Lee, Samsung, Suwon, Korea
Technology Directions Subcommittee: Guy Torfs, Ghent University, Gent, Belgium; Denis Daly, Apple, Wellesley, MA
Wireless Subcommittee: Negar Reiskarimian, Massachusetts Institute of Technology, Cambridge, MA; Yun Yin, Fudan University, Shanghai, China
Wireline Subcommittee: Tamer Ali, MediaTek, Irvine, CA; Ben Rhew, Samsung, Hwaseong, Korea

Further, I wish to recognize Brad Phillips (MiraSMART Conferencing) and Steve Bonney (S3 iPublishing) for the structuring and formatting of this handout and the tablet version (available for download from ISSCC 2024). Finally, I would like to acknowledge the vision and encouragement of the past ISSCC Conference Chair, Anantha Chandrakasan (MIT), for his leadership in the realization of the demonstration-session idea. Enjoy!

Laura Chizuko Fujino
ISSCC Director of Publications & Presentations
February 2024

2.4 ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications
Chang-Hyo Yu, Hyo-Eun Kim, Sungho Shin, Kyeongryeol Bong, Hyunsuk Kim, Yoonho Boo, Jaewan Bae, Minjae Kwon, Karim Charfi, Jinseok Kim, Hongyun Kim, Myeongbo Shim, Changsoo Ha, Wongyu Shin, Jae-Sung Yoon, Miock Chi, Byungjae Lee, Sungpill Choi, Donghan Kim, Jeongseok Woo, Seokju Yoon, Hyunje Jo, Hyunho Kim, Hyungseok Heo, Young-Jae Jin, Jiun Yu, Jaehwan Lee, Hyunsung Kim, Minhoo Kang, Seokhyeon Choi, Seung-Goo Kim, Myunghoon Choi, Jungju Oh, Yunseong Kim, Haejoon Kim, Sangeun Je, Junhee Ham, Juyeong Yoon, Jaedon Lee, Seonhyeok Park, Youngseob Park, Jaebong Lee, Boeui Hong, Jaehun Ryu, Hyunseok Ko, Kwanghyun Chung, Jongho Choi, Sunwook Jung, Yashael Faith Arthanto, Jonghyeon Kim, Heejin Cho, Hyebin Jeong, Sungmin Choi, Sujin Han, Junkyu Park, Kwangbae Lee, Sung-il Bae, Jaeho Bang, Kyeong-Jae Lee, Yeongsang Jang, Jungchul Park, Sanggyu Park, Jueon Park, Hyein Shin, Sunghyun Park, Jinwook Oh
Rebellions, Seongnam-si, Korea

2.7 BayesBB: A 9.6Gbps 1.61ms Configurable All-Message-Passing Baseband-Accelerator for B5G/6G Cell-Free Massive-MIMO in 40nm CMOS
Yi Zhang*1,2, Wenyue Zhou*1,2, Yiwei Zhang1,2, Houren Ji1,2, Yongming Huang1,2, Xiaohu You1,2, Chuan Zhang1,2
1Southeast University, Nanjing, China; 2Purple Mountain Laboratories, Nanjing, China; *Equally Credited Authors (ECAs)

3.1 A PVT-Insensitive Sub-Ranging Current Reference Achieving 11.4ppm/°C from -20°C to 125°C
Pangi Park1, Junghyup Lee2, SeongHwan Cho1
1Korea Advanced Institute of Science and Technology, Daejeon, Korea; 2Daegu Gyeongbuk Institute of Science and Technology, Daegu, Korea

3.3 A 0.5V 6.14µW Trimming-Free Single-XO Dual-Output Frequency Reference with 5.1nJ, 120µs XO Startup and 8.1nJ, 200µs Successive-Approximation-Based RTC Calibration
Rui Luo1, Ka-Meng Lei1, Rui P. Martins1,2, Pui-In Mak1
1University of Macau, Macau, China; 2Instituto Superior Técnico/Universidade de Lisboa, Lisbon, Portugal

4.1 A 79.7µW Two-Transceiver Direct-RF 7.875GHz UWB Radar SoC in 40nm CMOS
Nikolaj Andersen1, Sumit Bagga1, Jørgen Andreas Michaelsen1, Håkon A. Hjortland1, Lieuwe Leene1, Torleif Skr1, Espen Stenersen1,
Dag T. Wisland1,2
1Novelda, Oslo, Norway; 2University of Oslo, Oslo, Norway

6.2 An Ultrasound-Powering TX with a Global Charge-Redistribution Adiabatic Drive Achieving 69% Power Reduction and 53° Maximum Beam Steering Angle for Implantable Applications
Marios Gourdouparis1,2, Chengyao Shi1, Yuming He1, Stefano Stanzione1, Robert Ukropec3, Pieter Gijsenbergh3, Veronique Rochus3, Nick Van Helleputte3, Wouter Serdijn2, Yao-Hong Liu1,2
1imec, Eindhoven, The Netherlands; 2Delft University of Technology, Delft, The Netherlands; 3imec, Leuven, Belgium

6.5 A 0.5°-Resolution Hybrid Dual-Band Ultrasound Imaging SoC for UAV Applications
Jiaqi Guo1, Junwei Feng1, Silin Chen1, Liuhao Wu1, Chne-Wuen Tsai1,2, Yingna Huang1, Bochi Lin1, Jerald Yoo1,2
1National University of Singapore, Singapore, Singapore; 2The N.1 Institute for Health, Singapore, Singapore

6.11 A 320×240 CMOS LiDAR Sensor with 6-Transistor nMOS-Only SPAD Analog Front-End and Area-Efficient Priority Histogram Memory
Minkyung Kim*1, Hyeongseok Seo*1,2, Songhyeon Kim1, Jung-Hoon Chun1,2, Seong-Jin Kim3, Jaehyuk Choi1,2
1Sungkyunkwan University, Suwon, Korea; 2SolidVue, Seongnam, Korea; 3Ulsan National Institute of Science and Technology, Ulsan, Korea; *Equally Credited Authors (ECAs)

7.2 A 224Gb/s sub pJ/b PAM-4 and PAM-6 DAC-Based Transmitter in 3nm FinFET
Marco Cusmai1, Noam Familia1, Elad Kuperberg1, Mohammad Nashash1, Dovid Gottesman1, Daljeet Kumar2, Zvi Marcus1, Yeshayahu Horwitz1, Sagi Zalcman1, Jihwan Kim3, Sandipan Kundu3, Ilia Radashkevich1, Yoav Segal1, Dror Lazar1, Udi Virobnik1, Mike Peng Li4, Ariel Cohen1
1Intel, Jerusalem, Israel; 2Intel, Bangalore, India; 3Intel, Hillsboro, OR; 4Intel, San Jose, CA

7.3 A 224Gb/s 3pJ/b 40dB Insertion Loss Transceiver in 3nm FinFET CMOS
Dirk Pfaff1, Muhammad Nummer1, Noman Hai2, Peter Xia2, Kai Ge Yang2, Mohammad-Mahdi Mohsenpour1, Marc-Andre LaCroix1, Babak Zamanlooy3, Tom Eeckelaert1, Dmitry Petrov1, Mostafa Haroun1, Carson Dick2, Alif Zaman1, Haitao Mei1, Shahab Moazzeni1, Tahseen Shakir1, Carlos Carvalho1, Howard Huang1, Pratibha Kumari1, Ralph Mason1, Fahmida Brishty2, Ifrah Jaffri2
1Synopsys, Ottawa, Canada; 2Synopsys, Mississauga, Canada; 3Synopsys, Markham, Canada

7.7 A 2.16pJ/b 112Gb/s PAM-4 Transceiver with Time-Interleaved 2b/3b ADCs and Unbalanced Baud-Rate CDR for XSR Applications in 28nm CMOS
Yen-Po Lin*, Pen-Jui Peng*, Chun-Chang Lu, Po-Ting Shen, Yun-Cheng Jao, Ping-Hsuan Hsieh
National Tsing Hua University, Hsinchu, Taiwan; *Equally Credited Authors (ECAs)

8.6 An Integrated Dual-side Series/Parallel Piezoelectric Resonator-based 20-to-2.2V DC-DC Converter Achieving a 310% Loss Reduction
Wen-Chin Brian Liu1, Gaël Pillonnet2, Patrick P. Mercier1
1University of California, San Diego, CA; 2CEA-Leti, Grenoble, France

9.8 A 9.3nV/rtHz 20b 40MS/s 94.2dB DR Signal-Chain Friendly Precision SAR Converter
Rares Bodnar1,2, Henry Kennedy1, Christopher P Hurrell1, Asif Ahmad1, Mark Vickery1, Luke Smithers1, William Buckley3, Monsoon Dutt1, Pasquale Delizia4, Derek Hummerston1, Pawel Czapor3
1Analog Devices, Newbury, United Kingdom; 2University of Southampton, Southampton, United Kingdom; 3Analog Devices, Limerick, Ireland; 4now at Vodafone, Newbury, United Kingdom

10.7 An 11GHz 2nd-order DPD FMCW Chirp Generator with 0.051%rms Frequency Error under a 2.3GHz Chirp Bandwidth, 2.3GHz/µs Slope, and 50ns Idle Time in 65nm CMOS
Xuan Wang*1,2, Xujun Ma*3, Yupeng Fu1, Yuqian Zhou1, Ang Li1, Shuo Yang1, Xu Wu1,2, Dongming Wang1,2, Lianming Li1,2, Xiaohu You1,2
1Southeast University, Nanjing, China; 2Purple Mountain Laboratories, Nanjing, China; 3Télécom SudParis, Paris, France; *Equally Credited Authors (ECAs)

11.3 Metis AIPU: A 12nm 15TOPS/W 209.6TOPS SoC for Cost- and Energy-Efficient Inference at the Edge
Pascal Alexander Hager, Bert Moons, Stefan Cosemans, Ioannis A. Papistas, Bram Rooseleer, Jeroen Van Loon, Roel Uytterhoeven, Florian Zaruba, Spyridoula Koumousi, Milos Stanisavljevic, Stefan Mach, Sebastiaan Mutsaards, Riduan Khaddam Aljameh, Gua Hao Khov, Brecht Machiels, Cristian Olar, Anastasios Psarras, Sander Geursen, Jeroen Vermeeren, Yi Lu, Abhishek Maringanti, Deepak Ameta, Leonidas Katselas, Noah Hütter, Manuel Schmuck, Swetha Sivadas, Karishma Sharma, Manuel Oliveira, Ramon Aerne, Nitish Sharma, Timir Soni, Beatrice Bussolino, Djordje Pesut, Michele Pallaro, Andrei Podlesnii, Alexios Lyrakis, Yannick Ruffiner, Martino Dazzi, Johannes Thiele, Koen Goetschalckx, Nazareno Bruschi, Jonas Doevenspeck, Bram Verhoef, Stefan Linz, Giuseppe Garcea, Jonathan Ferguson, Ioannis Koltsidas, Evangelos Eleftheriou
Axelera AI, Eindhoven, The Netherlands

11.4 IBM NorthPole: An Architecture for Neural Network Inference with a 12nm Chip
Andrew S. Cassidy, John V. Arthur, Filipp Akopyan, Alexander Andreopoulos, Rathinakumar Appuswamy, Pallab Datta, Michael V. Debole, Steven K. Esser, Carlos Ortega Otero, Jun Sawada, Brian Taba, Arnon Amir, Deepika Bablani, Peter J. Carlson, Myron D. Flickner, Rajamohan Gandhasri, Guillaume J. Garreau, Megumi Ito, Jennifer L. Klamo, Jeffrey A. Kusnitz, Nathaniel J. McClatchey, Jeffrey L. McKinstry, Yutaka Nakamura, Tapan K. Nayak, William P. Risk, Kai Schleupen, Ben Shaw, Jay Sivagnaname, Daniel F. Smith, Ignacio Terrizzano, Takanori Ueda, Dharmendra Modha
IBM Research

12.3 A Scalable and Instantaneously Wideband 5GS/s RF Correlator Based on Charge Thresholding achieving 8-bit ENOB and 152 TOPS/W Compute Efficiency
Kareem Rashed1, Aswin Undavalli2, Shantanu Chakrabartty2, Aravind Nagulu2, Arun Natarajan1
1Oregon State University, Corvallis, OR; 2Washington University in St. Louis, St. Louis, MO

13.5 A 64Gb/s/pin PAM4 Single-Ended Transmitter with a Merged Pre-Emphasis Capacitive-Peaking Crosstalk-Cancellation Scheme for Memory Interfaces in 28nm CMOS
Weitao Wu*, Hongzhi Wu*, Liping Zhong, Xuxu Cheng, Xiongshi Luo, Dongfan Xu, Catherine Wang, Zhenghao Li, Quan Pan
Southern University of Science and Technology, Shenzhen, China; *Equally Credited Authors (ECAs)

14.2 Proactive Voltage Droop Mitigation Using Dual-Proportional-Derivative Control Based on Current and Voltage Prediction Applied to a Multicore Processor in 28nm CMOS
Weiwei Shan, Kaize Zhou, Keran Li, Yuxuan Du, Zhuo Chen, Junyi Qian, Haitao Ge, Jun Yang, Xin Si
Southeast University, Nanjing, China

14.5 A 12nm Linux-SMP-Capable RISC-V SoC with 14 Accelerator Types, Distributed Hardware Power Management and Flexible NoC-Based Data Orchestration
Maico Cassel dos Santos*1, Tianyu Jia*2, Joseph Zuckerman*1, Martin Cochet*3, Davide Giri1, Erik Jens Loscalzo1, Karthik Swaminathan3, Thierry Tambe2, Jeff Jun Zhang2, Alper Buyuktosunoglu3, Kuan-Lin Chiu1, Giuseppe Di Guglielmo1, Paolo Mantovani1, Luca Piccolboni1, Gabriele Tombesi1, David Trilla3, John-David Wellman3, En-Yu Yang2, Aporva Amarnath3, Ying Jing4, Bakshree Mishra4, Joshua Park2, Vignesh Suresh4, Sarita Adve4, Pradip Bose3, David Brooks2, Luca P. Carloni1, Kenneth L. Shepard1, Gu-Yeon Wei2
1Columbia University, New York, NY; 2Harvard University, Cambridge, MA; 3IBM Research, Yorktown Heights, NY; 4University of Illinois, Urbana, IL; *Equally Credited Authors (ECAs)

14.8 KASP: A 96.8% 10-Keyword Accuracy and 1.68µJ/Classification Keyword Spotting and Speaker Verification Processor Using Adaptive Beamforming and Progressive Wake-Up
Jianbiao Xiao1, Xuhui Zhang1, Shijian Zhu1, Zhengwei Yang1, Meng Du1, Chunsheng Ji1, Yu Long1, Xiao Chen2, Xiaoyu Miao2, Liang Zhou1, Liang Chang1, Shanshan Liu1, Jun Zhou1
1University of Electronic Science and Technology of China, Chengdu, China; 2China Micro Semicon, Chengdu, China

15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch
Zhiheng Yue1, Xujiang Xiang1, Fengbin Tu2, Yang Wang1, Yiming Wang1, Shaojun Wei1, Yang Hu1, Shouyi Yin1,3
1Tsinghua University, Beijing, China; 2Hong Kong University of Science and Technology, Hong Kong, China; 3Shanghai AI Lab, Shanghai, China

16.2 A 28nm 69.4kOPS 4.4µJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems
Yihong Zhu1,2, Wenping Zhu1,2, Yi Ouyang1, Junwen Sun1,2, Min Zhu3, Qi Zhao1,2, Jinjiang Yang1, Chen Chen1,2, Qichao Tao1,2, Guang Yang1,2, Aoyang Zhang1, Shaojun Wei1,2, Leibo Liu1,2
1Tsinghua University, Beijing, China; 2Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; 3Micro Innovation Integrated Circuit Design Co., Ltd, Wuxi, China

16.5 A Synthesizable Design-Agnostic Timing Fault Injection Monitor Covering 2MHz to 1.26GHz Clocks in 65nm CMOS
Yan He, Kaiyuan Yang
Rice University, Houston, TX

16.6 PACTOR: A Variation-Tolerant Probing-Attack Detector for a 2.5Gb/s 4-Channel Chip-to-Chip Interface in 28nm CMOS
Mao Li1, Zhaoqing Wang1, Sanu K. Mathew2, Vivek De2, Mingoo Seok1
1Columbia University, New York, NY; 2Intel, Hillsboro, OR

16.7 Power and EM Side-Channel-Attack-Resilient AES-128 Core with Round-Aligned Globally-Synchronous-Locally-Asynchronous Operation Based on Tunable Replica Circuits
Sirish Oruganti*1, Meizhi Wang*1, Vishnuvardhan V. Iyer1, Yipeng Wang1, Mengtian Yang1, Raghavan Kumar2, Sanu K. Mathew2, Jaydeep P. Kulkarni1
1University of Texas, Austin, TX; 2Intel, Hillsboro, OR; *Equally Credited Authors (ECAs)

18.2 A 464Gb/s NRZ 1.3pJ/b Co-Packaged and Fiber-Terminated 4-Ch VCSEL-Based Optical Transmitter
Susnata Mondal, Junyi Qiu, Sashank Krishnamurthy, Joe Kennedy, Soumya Bose, Tolga Acikalin, Shuhei Yamada, James Jaussi, Mozhgan Mansuri
Intel, Hillsboro, OR

ISSCC 2024 DEMONSTRATION PAPERS
DS1

ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications
C-H. Yu, H-E. Kim, S. Shin, K. Bong, H. Kim, Y. Boo, J. Bae, M. Kwon, K. Charfi, J. Kim, H. Kim, M. Shim, C. Ha, W. Shin, J-S. Yoon, M. Chi, B. Lee, S. Choi, D. Kim, J. Woo, S. Yoon, H. Jo, H. Kim, H. Heo, Y-J. Jin, J. Yu, J. Lee, H. Kim, M. Kang, S. Choi, S-G. Kim, M. Choi, J. Oh, Y. Kim, H. Kim, S. Je, J. Ham,
J. Yoon, J. Lee, S. Park, Y. Park, J. Lee, B. Hong, J. Ryu, H. Ko, K. Chung, J. Choi, S. Jung, Y. F. Arthanto, J. Kim, H. Cho, H. Jeong, S. Choi, S. Han, J. Park, K. Lee, S-I. Bae, J. Bang, K-J. Lee, Y. Jang, J. Park, S. Park, J. Park, H. Shin, S. Park, J. Oh
Paper No.: 2.4

[Figures: demonstration board (ATOMUS SoC); specification; hardware utilization; performance-efficiency comparison; overall SoC block diagram; neural engine; memory subsystem]

Demonstration system: power meter for the AI accelerator; GPU ×1 (A100); NPU ×1; GenAI prompt interface.

Software environment and applications:
- SDXL-turbo: a text-to-image model that generates detailed images conditioned on a description.
- T5-3b: a text-to-text model covering summarization, question answering, etc.
- Example inputs: "a photograph of an astronaut riding a horse" (SDXL-turbo); "That is good." translated to "Das ist gut." (T5).

Multi-tenancy:
[Figure: host OS with VM0, VM1, and the KMD sharing one ATOMUS; timeline of time-based context switching for two virtual functions (VFs), with T5 and SDXL-turbo context data in DRAM, a DMA engine, neural-engine clusters, and per-VF request and completion queues]
- SR-IOV enabled: supports up to 16 tenants at the same time, with a 16-channel HDMA working concurrently.
- Time-based fair scheduling for QoS of all users.
- Multi-context support: up to 60 contexts can be loaded, with no context-switching overhead.

Latency-criticality in AI:
[Figure: latency vs. utilization]
- Slow response times degrade the user experience as GenAI models become larger and more complex. (Based on a measured average of 30 tokens per sentence, from GPT-3.5 and GPT-4 response times (taivo.ai), and an average English speaking rate of 10-15 sentences per minute.)
- Contrasting spaces to conquer at the same time: utilization comes at the expense of latency.
- Tackling both spaces simultaneously: reduce the cold latency as much as possible through finer granularity. Finer granules incur dependencies in both control and data, which must be resolved as quickly as possible:
  Control dependency: fine-granule and multi-layered sync protocols.
  Data dependency: multi-layered NoC and memory-subsystem hierarchies.

BayesBB: A 9.6Gbps 1.61ms Configurable All-Message-Passing Baseband-Accelerator for B5G/6G Cell-Free Massive-MIMO in 40nm CMOS
Yi Zhang*1,2, Wenyue Zhou*1,2, Yiwei Zhang1,2, Houren Ji1,2, Yongming Huang1,2, Xiaohu You1,2, Chuan Zhang1,2
1Southeast University, Nanjing, China; 2Purple Mountain Laboratories, Nanjing, China
Paper No.: 2.7

Motivation and challenges: baseband (BB) chips should support various B5G/6G applications, in both cell-based and cell-free networks.
[Figure: B5G/6G usage scenarios (eMBB, mMTC, URLLC, immersive and massive communication, hyper-reliable low-latency communication, integrated sensing and communication, integrated AI and communication, ubiquitous connectivity) and applications (UAV communications, small cells, OAM-based communications, cooperative networks, machine-type communications, computational holographic radio, laser/mm-wave aggregation)]
Challenges of BB chips:
1 Throughput: BB chips need to provide at least 8Gbps/user throughput.
2 Latency: BB chips need to achieve less than 2ms latency.
3 Configurability: BB chips need to support flexible multi-application operation.
[Figure: latency budget (client, network, and server time; 20-30ms and 10ms latencies vs. the required <2ms); example applications: eMBB (e.g., VOLT commun.), mMTC (e.g., indoor localization), URLLC (e.g., autonomous driving)]
Unexpected growth in data traffic leads to higher throughput requirements.
[Figure: 1G to 6G (1980 to 2025-30), with data rates rising from 2.4Kbps to 1-10Gbps; mobile data traffic growth from 2010 (×24 and ×1000) under growth assumptions of ×1.5/year and ×2.1/year]

Approach: the all-message-passing baseband algorithm achieves system-performance gains, and a factor-graph model enables a unified hardware architecture.
[Figure: factor graph over OFDM symbols 1 to k, covering channel estimation & MIMO detection, soft demodulation, interleaving, and channel decoding]

Architecture:
Feature 1: customized high-throughput design. Feature 2: low-latency design beyond 5G. Feature 3: flexible and configurable architecture.
[Figure: overall architecture of BayesBB: XGE RX/TX with ETH frame parser and wrapper; preprocessor (descramble, phase compensation); channel estimator; BP MIMO detector (GAMP); soft demodulation and data conversion; channel decoder (LDPC/polar with L/R barrel shifters, CN/VN calculation, CRC); SPI system configuration; input and output buffers. Clocks: SerDes 10.3125GHz, MAC 156.25MHz, system 200MHz. Eye diagram of the 10.3125Gbps signals; simulated system FER performance of the all-message-passing baseband accelerator]
[Figure panels: 8×8 MIMO detector (prior-probability and posterior-message calculation and accumulation, interference measurement, ping-pong buffer); LDPC decoder (CNU/VNU cores with min_1st, min_2nd, XOR sign, and Index_1st; 1,024 PEs; message compression and decompression; LDPC and polar patterns); configuration registers (RAM, local, and global configuration; decoding mode, iteration number, check mode, state and interrupt monitors); preprocessor dataflow]

Test scenario:
[Figure: 128×128 cell-free MIMO-OFDM deployment across five rooms with remote antenna units (RAUs), user equipment (UEs), a server, and BayesBB]
The OFDM system's subcarrier center frequency is 3.65GHz. BayesBB's high-throughput baseband signal processing is tested with data downloaded offline into an FPGA.

Main highlights and performance comparisons:
System: commercial B5G/6G cell-free massive-MIMO system.
Core functions:
- High-speed interface: Ten-Gigabit Ethernet (XGE) per the IEEE 802.3 standard; 5G/NR protocol stack.
- Data processing: data analysis; wrapping; descrambling; phase compensation.
- Baseband processing: channel estimation; MIMO detection; LDPC/polar decoding; CRC check; SPI bus configuration; ping-pong buffer.
Configurability: all of the following can be configured: data processing for different users; decoding modes; iteration counts; CRC check modes; descrambling and phase-compensation parameters; signal-quality adjustment parameters; frame-setting parameters; multi-loopback modes; status-monitoring modes.
Scalability: a single chip supports 8×8 antennas and 800 subcarriers; multiple chips can collaborate; testing used up to 128×128 antennas and 3,200 subcarriers, for a throughput of 153.6Gbps.
Applications: 4K ultra-HD video streaming; unmanned aerial vehicle (UAV) applications; autonomous driving; virtual & augmented reality.
System interfaces: optical port; XGE including SerDes, PCS, and MAC; SPI serial bus.
New concept: all-message-passing BB signal processing.

Detector | This Work | ISSCC12 [4] | ISSCC14 [5] | JSSC19 [6]
Algorithm | BP(1) | SD | SD | MMSE
Output interface | Soft | Hard | Hard | Hard
MIMO system | 8×8(2) | 4×4 | 4×4 | 4×4
Modulation | 16-QAM | 64-QAM | 64-QAM | 256-QAM
Technology (nm) | 40 | 65 | 65 | 65
Core area (mm²) | 4.36 | 0.31 | - | 0.7
Frequency (MHz) | 200 | 333 | 445 | 517
Power (mW) | 456 | 38 | 87 | 26.5
SNR @ BER = 10^-3 (dB)(3) | 13 | 12.5 | 12.5 | 15
Throughput (Mb/s) | 9,830 | 296 / 807 | 396 | 1,379
Area efficiency (Mb/s/kGE) | 2.16 | 1.37 / 3.75 | 1.03 | 3.68
Energy efficiency (pJ/b) | 47.5 | 48 | 220 | 19.2

Decoder (this work, LDPC / polar(4) modes): block length 3,520(5) / 2,048; 40nm; core area 0.58 / 1.62(6) mm²; 200 / 200MHz; power 60.7 / 170mW; 10(7) / 20(7) iterations; throughput 3,276.8 / 3,276.8Mb/s; area efficiency 5,649.66 / 2,022.72Mb/s/mm²; energy efficiency 1.85 / 2.59pJ/b/iter.
[Table: decoder comparison with ISSCC12 [4] (LDPC), ISSCC14 [5] (LDPC), and JSSC19 [6] (NB-LDPC): block lengths 768, 768, and 416; 65nm; core areas 3.6mm², -, and 1.7mm²; 267, 500, and 307MHz; 367mW, -, and 103mW; 10, 10, and 5-10 iterations]
Notes: (1) Message-passing detection. (2) Scalable design, supporting a maximum of 128×128 MIMO in testing. (3) Scaled to the 8×8 16-QAM MIMO scenario. (4) Configurable decoding mode for LDPC codes and polar codes. (5) Composed of two LDPC decoders with code length 1,760. (6) CRC module included for early checking. (7) The number of iterations can be configured.

BayesBB is an all-message-passing, full-BB-processing chip for commercial B5G/6G systems. It has merits in high throughput (9.6Gbps), system latency (1.61ms), area efficiency, energy efficiency, and bit error rate (BER).
[Figure: die micrograph, 4.369mm × 4.370mm, showing RAM, digital logic, SerDes PMA, SPI, RX/TX, and the 10G SFP+ interface]

Specifications:
Technology: 40nm CMOS
Application: Baseband processing
Die size (mm²): 19.093
Core power (W): 3.2
Package: Flip-chip BGA
Serial interface: SPI
RAM size (Mb): 4.1
Logic gate count: 15,572,564
Digital logic voltage (V): 1.1
SerDes PMA voltage (V): 1.1 and 2.5
IO voltage (V): 3.3
System frequency (MHz): 200
MAC-side frequency (MHz): 156.25
Max. system throughput (Gbps): 9.6
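The LDPC panels of the BayesBB architecture figure label the check-node unit with min_1st, min_2nd, xor sign, and Index_1st. Those are the fields commonly used to store a compressed min-sum check-node message. The sketch below implements a textbook normalized min-sum check-node update in that compressed form, as a hedged illustration rather than the chip's actual datapath. The normalization factor alpha = 0.75, the class name CompressedCN, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CompressedCN:
    """Compressed check-node state (field names assumed, mirroring the
    min_1st / min_2nd / Index_1st / xor-sign labels in the figure)."""
    min_1st: float    # smallest |LLR| among incoming variable-to-check messages
    min_2nd: float    # second-smallest |LLR|
    index_1st: int    # edge index where min_1st occurred
    xor_sign: int     # XOR of all sign bits (1 = odd number of negative inputs)
    signs: List[int]  # per-edge sign bits (1 = negative)

def compress(v2c: List[float]) -> CompressedCN:
    """Single pass over the incoming messages, keeping only the compressed fields."""
    if len(v2c) < 2:
        raise ValueError("check-node degree must be at least 2")
    min1, min2, idx, parity, signs = float("inf"), float("inf"), -1, 0, []
    for i, llr in enumerate(v2c):
        s = 1 if llr < 0 else 0
        signs.append(s)
        parity ^= s
        mag = abs(llr)
        if mag < min1:
            min1, min2, idx = mag, min1, i
        elif mag < min2:
            min2 = mag
    return CompressedCN(min1, min2, idx, parity, signs)

def decompress(cn: CompressedCN, alpha: float = 0.75) -> List[float]:
    """Rebuild check-to-variable messages: each edge gets the minimum magnitude
    over the *other* edges (min_2nd on the min_1st edge) and the XOR of the
    other edges' signs, scaled by the normalization factor alpha."""
    out = []
    for i, s in enumerate(cn.signs):
        mag = cn.min_2nd if i == cn.index_1st else cn.min_1st
        sign = cn.xor_sign ^ s  # remove this edge's own sign from the parity
        out.append(alpha * mag * (-1.0 if sign else 1.0))
    return out

if __name__ == "__main__":
    print(decompress(compress([2.0, -0.5, 1.5, -3.0])))  # -> [0.375, -1.125, 0.375, -0.375]
```

In hardware, this form stores two magnitudes, one index, and the sign bits per check node instead of one full-width message per edge, which is presumably why such fields appear in the CNU.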
A PVT-Insensitive Sub-Ranging Current Reference Achieving 11.4ppm/°C from -20°C to 125°C
Pangi Park1, Junghyup Lee2 and SeongHwan Cho1
1KAIST, Daejeon, Korea; 2DGIST, Daegu, Korea
Paper No.: 3.1

Proposed sub-ranging current reference:
- Two currents with different TCs are generated from a single source.
- Exact sub-range sensing is guaranteed by comparing V1 and V2.
[Figures: derivation of the I_REF equation (V1 = V2 at T = T_X; the gain k is switched between k1 and k2 by SEL_K); transistor-level schematic (Q1, Q2, R1, R2, R3 = 6.5MΩ, error amplifier, current generation & mirror, bias generation, chopping and DEM); schematic of the k-selector and sub-blocks (comparator, 6b folded RDAC, row-column decoder)]

Challenges in sub-ranging temperature compensation for I_REF: a sub-ranging current reference is susceptible to the effect of process variation on the TC:
(1) variation in the two current references I_REF1,2;
(2) T_X = T_CROSS is not guaranteed due to process variation.
A process-insensitive circuit implementation of the sub-ranging I_REF is therefore necessary.
[Figure: proposed concept of the sub-ranging curvature correction]

Small TC and PVT variation of the proposed sub-ranging I_REF:
- Temperature variation: the best TC improves from 28.6ppm/°C without sub-ranging (k = 35) to 7.81ppm/°C with sub-ranging (k1 = 24, k2 = 44). [Figure: I_REF (µA) vs. temperature (-20 to 125°C) for k = 20 to 47]
- Supply-voltage variation: VDD range 1.3V-2.4V; line sensitivity (ppm/V): TT 388, FF 394, SS 339, FS 321, SF 382, average 365.
- Process variation: [Figure: I_REF (µA) vs. temperature at the TT, FF, SS, FS, and SF corners]

Measurement setup: the chip sits inside a temperature chamber (-20 to 125°C); function generator (Keysight 33600A); source meter (Keithley 2400); power supply (Keysight B2962A); NI LabVIEW DAQ (USB-6211); SPI.
Measured noise spectrum: the current noise density of I_REF is 11.5pA/√Hz (Fsamp = 20Hz, NFFT = 2500, 8× averaging, rectangular window), shown with chopping and DEM disabled and enabled (CHOP X/DEM X, CHOP O/DEM X, CHOP O/DEM O).
[Figures: chip micrograph; on-chip temperature sensor (used only for the demo), output voltage (V) vs. temperature (°C)]
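Paper 3.1 quotes its temperature stability in ppm/°C and its supply sensitivity in ppm/V, but the poster does not spell out the formula. The helper below therefore assumes the widely used box method, which normalizes the total spread of I_REF to its mean and to the swept range. The function name and the sample readings are hypothetical.

```python
from statistics import mean
from typing import Sequence

def box_coefficient_ppm(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Box-method coefficient in ppm per unit of x (assumed definition):
    (max(y) - min(y)) / (mean(y) * (max(x) - min(x))) * 1e6.

    With xs in °C this is a TC in ppm/°C; with xs in volts it is a
    line sensitivity in ppm/V."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    span = max(xs) - min(xs)
    if span == 0:
        raise ValueError("x values must span a nonzero range")
    return (max(ys) - min(ys)) / (mean(ys) * span) * 1e6

if __name__ == "__main__":
    # Made-up readings in µA over -20..125 °C, for illustration only (not measured data).
    temps_c = [-20.0, 25.0, 125.0]
    iref_ua = [10.20, 10.25, 10.22]
    print(f"TC = {box_coefficient_ppm(temps_c, iref_ua):.1f} ppm/°C")
```

The same function covers the supply sweep: pass VDD values in volts as `xs` to get a box-style line sensitivity in ppm/V.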
A 0.5V 6.14µW Trimming-Free Single-XO Dual-Output Frequency Reference with 5.1nJ, 120µs XO Startup and 8.1nJ, 200µs Successive-Approximation-Based RTC Calibration
Rui Luo1, Ka-Meng Lei1, Rui P. Martins1,2, Pui-In Mak1
1University of Macau, Macau, China; 2Instituto Superior Técnico/UL, Lisbon, Portugal
Paper No.: 3.3

Motivation:
- A multi-output frequency reference derived from a single quartz crystal reduces footprint and bill of materials.
- This requires a crystal oscillator (XO) with fast startup and low startup energy.
- A successive-approximation-based RTC calibration is proposed in this work.

Architecture: an on-demand 16MHz XO and an always-on 1MHz DCO. A successive-approximation-based frequency calibration module (SA-based FCM), with a digital comparator and control logic, calibrates the DCO's 14-bit frequency control word FCW[13:0] against f_XO/16 to remove VT-induced variation of f_DCO. A frequency multiplier (×16, built from a DLL and an edge combiner) provides injection for XO startup, and an FSM sequences the active and sleep modes.
[Figures: negative-voltage generator; digitally controlled ring oscillator (delay cells controlled by FCW[13:9], FCW[8:2], and FCW[1:0] through binary-to-thermometer decoders); delay-locked loop; frequency calibration module; edge combiner]

Verification:
- Experiment setup: oscilloscope, frequency counter, multimeter, power supply, control-logic generator, and the device under test.
- [Figures: f_DCO (ppm) vs. temperature (-40 to 120°C) with and without DCO calibration; f_DCO (%) vs. VDD (0.48-0.60V); f_DCO distribution after calibration over 1,000 runs; transient f_XO (ppm from 16MHz) showing startup within 120µs]
- [Figure: chip micrograph, with 243µm and 306µm dimensions annotated and the XO core (including C_L) marked]
- Sleep-mode power breakdown: DCO 37%, FCM 16%, FSM 14%, XO 14%, DLL 13%, EC 6%.
- Calibration energy breakdown: XO 68.3%, FCM 18.7%, DLL 9.2%, FSM 2.4%, EC 1.4%.
- Duty-cycle calibrated RTC: full calibration (ST0-ST2) is performed once after VDD is enabled.
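The Paper 3.3 flowchart initializes its search at FCW[13:0] = 8192 and, at a later step, at FCW[8:0] = 256. In both cases that is the mid-code of the bits being searched, which is characteristic of a successive-approximation (binary) search. The sketch below shows such a search under stated assumptions:

- The DCO frequency rises monotonically with the 14-bit FCW.
- The digital comparator only reports whether f_DCO exceeds f_XO/16 = 1MHz.
- The linear DCO model, the function names, and the frequency offsets are hypothetical.
- The chip's staged ST0-ST2 partition of FCW and its frequency counter are not modeled.

```python
from typing import Callable

TARGET_HZ = 16e6 / 16  # f_XO / 16 = 1 MHz (16MHz XO, 1MHz DCO per the poster)

def sar_calibrate(freq_of: Callable[[int], float], fcw: int, msb: int,
                  target_hz: float = TARGET_HZ) -> int:
    """Successive-approximation search over FCW[msb:0] (hypothetical helper).

    Bits above `msb` are kept; bits msb..0 are resolved MSB-first. Assuming
    f_DCO rises monotonically with FCW, the result is the largest code whose
    frequency does not exceed the target."""
    fcw &= ~((1 << (msb + 1)) - 1)  # clear the bits to be searched
    for bit in range(msb, -1, -1):
        trial = fcw | (1 << bit)
        if freq_of(trial) <= target_hz:  # comparator: DCO not too fast, keep the bit
            fcw = trial
    return fcw

def make_dco(offset_hz: float, step_hz: float = 1e6 / 16384) -> Callable[[int], float]:
    """Hypothetical linear DCO model: f = offset + step * FCW."""
    return lambda fcw: offset_hz + step_hz * fcw

if __name__ == "__main__":
    full = sar_calibrate(make_dco(0.45e6), fcw=0, msb=13)       # first trial: FCW = 8192
    regular = sar_calibrate(make_dco(0.46e6), fcw=full, msb=8)  # first trial sets FCW[8] (256)
    print(full, regular)  # -> 9011 8847
```

The second call keeps FCW[13:9] from the first result and re-searches only the lower bits. This loosely mirrors a shorter regular calibration after the initial full one, under the assumptions above.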
101、T2ST1ST2DoneDone1MHzVPGVDD-0.3VVDDRegular calibration(ST1-2)afterwardsi=0,FCW13:0=8192CMP=0?YYNNk=0?k=zi+1?DoneNk=zii=i+1NYYi=i+1,k=zi k=zik=k-1i=1,FCW8:0=256A 79.7W Two-Transceiver Direct-RF 7.875GHz UWB Radar SoC in 40nm CMOSN.Andersen1,S.Bagga1,J.A.Michaelsen1,H.A.Hjortland1,L.Leene1,T.Skr1,E.Ste
102、nersen1,D.T.Wisland1,21Novelda AS,Oslo,Norway,2University of Oslo,Oslo,NorwayMotivationArchitectureVerificationPaper No.:4.1Beyond ranging,UWB technology is a great fit for sensing applications!Direct-RF ReceiverADC Linearity Correction at Lower Rate(After Integration)System ImplementationCrystal-le
103、ss operation for fast-startup duty cyclingLow-Power Mode(79.7W avg.)AoA Mode(729.5W avg.)Power BreakdownRXFE PerformanceTX Time DomainChild presence detectionAccessSecurityLights and HVACSleepHealthRegulatory Mask Compliant TXTransmitterAn Ultrasound-Powering TX with a Global Charge-Redistribution A
104、diabatic Drive Achieving 69%Power Reduction and 53 Maximum Beam Steering Angle for Implantable Applications Marios Gourdouparis1,2,Chengyao Shi1,Yuming He1,Stefano Stanzione1,Robert Ukropec3,Pieter Gijsenbergh3,Veronique Rochus3,Nick Van Helleputte3,Wouter Serdijn2,Yao-Hong Liu1,21 imec,Eindhoven,Th
105、e Netherlands 2 Delft University of Technology,Delft,The Netherlands 3 imec,Leuven,BelgiumMotivationArchitectureSystem VerificationBenchmarksPaper No.:6.2Brain computer interfaces(BCI)to treat neurological disordersImplantable BCIDemand for Ultrasound(US)powering of implantable BCIHigh efficiency US
106、 TX Not heat up the tissueBeam steering to large angle Compensate for TX/RX misalignment Miniaturized system form factor Small skull hole drilled during surgery,quicker patient recoveryDelay skipping schemeSystem ArchitectureGlobal Charge Redistribution ConceptUS Powering TX Chip:power saving withou
107、t extra components&large angle steering!Electrical verification for power savingAcoustic verification for beam steeringTesting with PMUT in waterWirelessly poweredMotivationArchitectureSystem ImplementationVerification Hybrid Dual-Band Ultrasound Imaging System On-chip Feature Adaptive Frequency Con
108、troller with Dual Mode Sequence GenerationDelay&SumDM-SGTX controller LF RX data HF RX dataor HF pulse Real-time image LF pulse orHF pathLF pathFeature voxeladdr.(5b)Feature voxeladdr.(5b)FA-FCCMPCMPCMPIntensity sortingCMPCMPCMPRadius sortingFeature voxel intensityx32x32Feature voxel radiusENx32x32F
109、eature voxeladdr.(5b)Feature voxel(35b)Feature voxel(35b)Feature voxel(35b)Targeted Beamfocusing Sequence Gen.Full-Sweep Beamfocusing Sequence Gen.System Setup and ConfigurationChart Title12345DBE54.0%ADC21.8%AMP12.0%TX10.8%PMU 1.6%Chip Details Measurement Results Requirements of UAV Hybrid Dual-Ban
110、dHybrid Dual-Band 0.5 spatial resolution 7m detection range LF 21fps+HF 21fps(Typ.)Max.11.04M voxels/sTRX CH31:0TRX CH63:32HVTXIS-ICHCLDODBEFA-FCTX controllerDM-SGDelay&Sum6mm5mmPaper No.:6.5A 0.5-Resolution Hybrid Dual-Band Ultrasound Imaging SoC for UAV ApplicationsJ.Guo1,J.Feng1,S.Chen1,L.Wu1,C.T
111、sai1,2,Y.Huang1,J.Yoo1,21Nationa University of Singapore,Singapore,Singapore.2The N.1 Institute for Health,Singapore,SingaporeCameraLiDARConv.UIS ISSCC2022This workObjectResolutionDayHighHighVery lowBetterNight/Fog/Direct lightVery low(Blank)LimitedDepth sensing NoYesYesYesCostMediumHighLowLow1m7m U
112、 objectHybrid Dual-Band UIS SoC mounted on a UAV1.5m4.2mBoxWallWallWall U objectBox(LF mode)3D ImageWall(LF mode)Depth(m)U object (LF mode)U object (HF mode)HF:145kHzLF:40kHz(2)Only HFObject 1Low resolutionObject 2(3)Hybrid(LF+HF)(1)Only LFFront ViewBox(LF mode)4.2m Wall(LF mode)7m Depth(m)U object
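The 6.5 SoC forms its real-time image with a delay & sum block, but the poster does not describe how that block works internally. The following is therefore only a textbook receive delay-and-sum sketch for a linear array. The 343 m/s speed of sound, the element pitch and the sampling rate are assumptions. Nearest-sample delays stand in for whatever interpolation the chip actually uses.

```python
import math

SPEED_OF_SOUND_AIR = 343.0  # m/s, assumed (air at roughly 20 °C)


def focusing_delays(element_x, focus_x, focus_z, c=SPEED_OF_SOUND_AIR):
    """Receive delays (s) from a focal point to each element of a linear array.

    Elements sit on the x-axis at z = 0. Delays are relative to the element
    closest to the focus, so every value is >= 0.
    """
    dists = [math.hypot(focus_x - x, focus_z) for x in element_x]
    d_min = min(dists)
    return [(d - d_min) / c for d in dists]


def delay_and_sum(channels, fs, delays):
    """Align each channel by its delay (rounded to whole samples) and sum."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, tau in zip(channels, delays):
        shift = round(tau * fs)
        # The echo reaches this element `shift` samples later, so read ahead.
        for i in range(n - shift):
            out[i] += samples[i + shift]
    return out


if __name__ == "__main__":
    fs = 1e6            # hypothetical ADC rate (Hz)
    pitch = 4.3e-3      # hypothetical pitch, about half a 40 kHz wavelength in air
    xs = [(k - 3.5) * pitch for k in range(8)]
    taus = focusing_delays(xs, 0.01, 1.0)
    # Synthetic echo from the focal point: one impulse per channel.
    chans = [[1.0 if i == 50 + round(t * fs) else 0.0 for i in range(200)] for t in taus]
    beam = delay_and_sum(chans, fs, taus)
    print(beam.index(max(beam)), max(beam))  # 50 8.0
```

All eight channels add coherently only at the sample that matches the focal point. Sweeping the focal point over a grid of voxels turns this into an image. The sort-and-select logic of the FA-FC is not modelled here.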
(Measurement figure, continued: U-shaped object at 1.5m in LF mode and at 1m in HF mode, "HF zoom in".)
Comparison with prior ultrasound imaging SoCs (ISSCC2022 [3], ISSCC2022 [4], JSSC 2015 [5]):
- Prior works: 180nm standard CMOS, BCD and HV CMOS, respectively. Media: air, biomedical, air. None is dual-band. Field of view: 30°, 30°, 45°. Area: 32.5mm², 2.3mm² (active), 1.7mm². Frame rate: 24fps, 1000fps, 30fps.
- This work: 180nm standard CMOS, 64 channels, air, dual-band. Resolution: 0.5° (HF) & 2.7° (LF). Field of view: 15° (HF) & 30° (LF). Max. Pout: 0.98W. Area: 30mm². Frame rate: 21fps HF + 21fps LF (typ.), 141fps HF + 4fps LF (HF max.), 0fps HF + 24fps LF (HF min.).
System setup: AFE, PMU, DBE, the UIS ASIC (this work), and FPGA & ESP32 (for communication), with 2×8×8 HF (145kHz) & LF (40kHz) arrays (TRX outer[63:0], TRX inner[63:0]).
[Figure: LF mode (40kHz) reaches 7m detection. HF mode (145kHz) suffers from the grating-lobe effect and a low detection range. Objects 1 and 2; 141fps & clear image.]
Hybrid dual-band UIS SoC architecture: 64 TRX channels (TRX CH0 to CH63, with LNA, ADC and HVTX; power stage A, power stage B and control stage on 1-4VDDH), IS-ICHC, delay & sum, FA-FC, DM-SG (T-Seq. Gen., FS-Seq. Gen.), radius sorter, intensity sorter, TX ctrl, TGC and DBE.

A 320×240 CMOS LiDAR Sensor with 6-Transistor nMOS-Only SPAD Analog Front-End and Area-Efficient Priority Histogram Memory
M. Kim*1, H. Seo*1,2, S. Kim1, J-H. Chun1,2, S-J. Kim3, J. Choi1,2
1Sungkyunkwan University, Suwon, Korea; 2SolidVue, Seongnam; 3Ulsan National Institute of Science and Technology, Ulsan, Korea
Paper No.: 6.11
Applications: robotics, ADAS/self-driving cars.
Demonstration system: evaluation board with the TX (VCSEL), FPGA board, transmitter (TX) board, PC & display, and projector. Wavelength 940nm, FOV 17° × 15°, repetition rate 226kHz.
Sensor architecture (7mm × 5.2mm): 320(V) × 240(H) pixel array; 320(V) × 480(H) 6-transistor nMOS-only SPAD AFEs; edge detector; CLK tree & signal tree; 2-step histogramming TDCs with priority memory; column select & digital readout; row driver; bias; PLL; scan chain; active recharge/pull-up logic. Photos: LiDAR sensor and solid-state LiDAR module.
- 6-T nMOS-only AFE: minimizes the pixel size.
- Column-parallel active recharging/pull-up logic: controls the SPAD dead time.
- 2-step hTDC with reconfigurable histogram memory.
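The 2-step hTDC with priority histogram memory can be illustrated in software. The poster gives no memory sizes or bin widths, so everything below is a hypothetical sketch. A coarse histogram is built over all TDC codes, and fine bins are kept only for the most populated coarse bin. The code width, the `fine_bits` split, the 100 ps LSB and the single-pass processing are all assumptions; the chip accumulates over many laser cycles in hardware.

```python
from collections import Counter

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def two_step_histogram(codes, fine_bits, top_k=1):
    """Step 1: histogram the coarse part (code >> fine_bits) of every TDC code.
    Step 2: keep fine histograms only for the top_k most populated coarse bins,
    a stand-in for a priority memory that tracks only the most probable echoes."""
    fine_mask = (1 << fine_bits) - 1
    coarse = Counter(code >> fine_bits for code in codes)
    kept = {b for b, _ in coarse.most_common(top_k)}
    fine = {b: Counter() for b in kept}
    for code in codes:
        b = code >> fine_bits
        if b in fine:
            fine[b][code & fine_mask] += 1
    return coarse, fine


def peak_code(codes, fine_bits):
    """Full-resolution code of the most probable echo."""
    _, fine = two_step_histogram(codes, fine_bits, top_k=1)
    (coarse_bin, hist), = fine.items()
    return (coarse_bin << fine_bits) | hist.most_common(1)[0][0]


def depth_m(code, lsb_s):
    """Round-trip time of flight to distance: d = c * t / 2."""
    return SPEED_OF_LIGHT * code * lsb_s / 2


if __name__ == "__main__":
    ambient = list(range(0, 4096, 37))       # sparse background-light hits
    echo = [1234] * 20 + [1235] * 5          # laser echo cluster
    code = peak_code(ambient + echo, fine_bits=4)
    print(code, round(depth_m(code, 100e-12), 2))  # 1234 18.5 (hypothetical 100 ps LSB)
```

Consider 12-bit codes with `fine_bits = 4`. The sketch then stores 256 coarse bins plus 16 fine bins instead of 4096 full-resolution bins, which is the kind of saving that motivates a priority histogram memory.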
Demonstration environment: PC showing the 3D point-cloud image, target object, halogen lamp providing background light (BGL), projector, HDMI cable, USB cable and sensor module. Depth images are captured with background light.
- The priority memory accumulates the histogram only for the most probable echoes.
- Area-efficient histogram memory allocation (comparison of 2-step histogramming TDCs).
Motivation: the new demand for high-resolution LiDAR sensors.
- Need for high spatial resolution → 6-transistor nMOS-only SPAD analog front-end (AFE).
- Pile-up distortion under strong ambient light → SPAD dead-time control by column-parallel active recharging/pull-up logic.
- Extensive histogram memory needed for high depth resolution → reconfigurable 2-step histogramming TDC (hTDC) with priority histogram memory.
Result: a QVGA (320 × 240) CMOS LiDAR sensor.
[Figures: demo scenes at 18.1m, 12.6m, 14m and 44m; precision (cm) vs. ground truth (m); measured depth and accuracy (cm) vs. ground truth (m); real image, point-cloud depth image and display.]

A 224Gb/s sub-pJ/b PAM-4 and PAM-6 DAC-Based Transmitter in 3nm FinFET
Marco Cusmai, Noam Familia, Elad Kuperberg, Mohammad Nashash, Dovid Gottesman, Daljeet Kumar, Zvi Marcus, Yeshayahu Horwitz, Sagi Zalcman, Jihwan Kim, Sandipan Kundu, Ilia Radashkevich, Yoav Segal, Dror Lazar, Udi Virobnik, Mike Peng Li, Ariel Cohen
Paper No.: 7.2
Motivation: a higher data rate is needed to improve power.
Architecture: clocking, 7b DAC and driver.
Energy efficiency: 224Gb/s PAM-4 at 0.92pJ/b; 112Gb/s PAM-4 at 1.13pJ/b; 224Gb/s PAM-6 at 0.61pJ/b.
Measurement setup: real-time oscilloscope (256GS/s Keysight UXR1104A), channel de-embedded, 80% of baud-rate scope BW, no scope equalization, on-chip FFE active, QPRBS-13 pattern, eye opening for BER = 1e-4.
TX to RX over a 1-meter cable (key challenges): BER 1e-5, RX histogram, 34dB IL, reflective channel.
[Figures: TX eye diagrams, package technology, TX comparison and power breakdown.]

A 224Gb/s 3pJ/b 40dB Insertion Loss Transceiver in 3nm FinFET CMOS
Dirk Pfaff1, Muhammad Nummer1, Noman Hai2, Peter Xia2, Kai Ge Yang2, Mohammad-Mahdi Mohsenpour1, Marc-Andre LaCroix1, Babak Zamanlooy3, Tom Eeckelaert1, Dmitry Petrov1, Mostafa Haroun1, Carson Dick2, Alif Zaman1, Haitao Mei1, Shahab Moazzeni1, Tahseen Shakir1, Carlos Carvalho1, Howard Huang1, Pratibha Kumari1, Ralph Mason1, Fahmida Brishty2, Ifrah Jaffri2
1Synopsys, Ottawa, Canada; 2Synopsys, Mississauga, Canada; 3Synopsys, Markham, Canada
Transceiver blocks: digital, align, 2-phase and 8-phase clocks, 10-bit, PLL (10-15G), DIV 1/1.5, CTLE, VGA, LFEQ, 8× interleaved SAR ADCs, FFE, DFE, MLSD, DET, ADAPT, CDR, ILO, PMIX, skew CAL and TX CLK DIST. A novel AFE design reduces parasitics.
Synopsys 1.6TbE/800GbE system solution: control board (USB, JTAG interface), on-board LDOs, FPGA (emulation), DUT on a daughter card, Bullseye connectors, 3.5mm SMA for the reference clock, and ATB ports. TX to RX with 40dB channel loss over an asynchronous 212.5Gbps link.
[Figure: 224G link topologies: active copper cable (host SoC, retimers, flyover cable), optical module (host SoC, VSR/OR PHY, retimer, optical cable, flyover cable) and passive copper cable (DAC: host SoC, no retimer).]
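The 7.2 and Synopsys posters quote efficiency in pJ/b. Because pJ/b × Gb/s = (1e-12 J/b)(1e9 b/s) = 1e-3 W, the product of the two numbers is power in milliwatts. The sketch below applies this to the quoted figures. The symbol-rate helper is used only for PAM-4 (2 bits per symbol), because the handout does not state the PAM-6 bit mapping. Treating the 3pJ/b transceiver figure at a 224Gb/s line rate is an assumption, since its demo link runs at 212.5Gb/s.

```python
def power_mw(energy_pj_per_bit, rate_gbps):
    """pJ/b x Gb/s = (1e-12 J/b)(1e9 b/s) = 1e-3 W, so the product is in mW."""
    return energy_pj_per_bit * rate_gbps


def symbol_rate_gbaud(rate_gbps, bits_per_symbol):
    """Symbol rate for a given line rate. PAM-4 carries 2 bits per symbol."""
    return rate_gbps / bits_per_symbol


if __name__ == "__main__":
    # Energy-efficiency figures quoted on the 7.2 and Synopsys posters.
    for label, e, r in [("224Gb/s PAM-4", 0.92, 224), ("112Gb/s PAM-4", 1.13, 112),
                        ("224Gb/s PAM-6", 0.61, 224), ("224Gb/s TRX, 3pJ/b", 3.0, 224)]:
        print(f"{label}: {power_mw(e, r):.1f} mW")
    print(symbol_rate_gbaud(224, 2), symbol_rate_gbaud(212.5, 2))  # 112.0 106.25
```

This gives about 206mW for the 224Gb/s PAM-4 transmitter and 672mW for a 3pJ/b transceiver at 224Gb/s.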
Low-power 224G serial links: MAC; PCS (FEC encode/decode, auto-negotiation, Ethernet training; 16/20-bit, 32/40-bit, 64/80-bit or 128/160-bit data paths); PMA (encode/decode, PMA registers, tx/rx data); and PMD (rate-control FSM, PMD registers). These are supported by an arbiter, JTAG, APB, a microcontroller, debug-capture SRAM, and ICCM and DCCM SRAMs. Pins: RXP/RXN[3:0], TXP/TXN[3:0] and XTALP/XTALN.
212.5Gbps PAM-4 transmitter eye: high RLM, low jitter.

A 2.16pJ/b 112Gb/s PAM-4 Transceiver with Time-Interleaved 2b/3b ADCs and Unbalanced Baud-Rate CDR for XSR Applications in 28nm
Yen-Po Lin, Pen-Jui Peng, Chun-Chang Lu, Po-Ting Shen, Yun-Cheng Jao, Ping-Hsuan Hsieh
Tsing Hua University, Hsinchu City, Taiwan
Paper No.: 7.7
High-density SerDes TRXs are needed to support the total throughput. The 112-Gb/s PAM-4 transceiver in 28nm CMOS comprises:
- An unsegmented 3-tap FIR transmitter with DCC/QEC.
- A CTLE + VGA + 4×6 2b/3b ADC-based receiver.
- A DLL + ILRO + PI and an unbalanced Mueller-Muller CDR.
- Automatic adaptation for the LDO, equalizer and deskewer.
Measurement: user interface, 30-mm FR4 channel, output and loss.

…5MS/s) + high precision (16b) requires rapid and high-accuracy settling of the ADC input and reference:
- High SNR requires low kT/C noise, but larger sampling capacitors increase the settling time for acquisition → constrains speed.
- High-accuracy settling of the analogue input and reference kicks puts greater strain on external amplifiers → constrains precision and increases power consumption.
- External references and drivers must have low noise and high bandwidth → increases system power consumption.
- Parasitic impedance of the reference adds signal-dependent distortion → constrains precision (linearity).
- High reference current, due to high throughput and large DAC capacitors → parasitic impedances limit the signal-chain resolution.
ADC architectures frequently offload this burden to peripheral components, so improvements to the ADC FOM can result in signal-chain FOM degradation.
Implementation: 40nm CMOS, 2.25mm² die area (0.77mm² ADC core).
- On-chip DAC weight correction in the digital core for 20b at 40MS/s.
- The first stage uses 3.3V transistors for a 6VPP differential input range, to maximise the SNR.
- Dual quantizer slice DACs (sDAC) are fully separated from the residue-generating DAC (RDAC). The RDAC follows the quantizer sDAC with conversion results as they develop → the RDAC reference switches can be smaller → lower distortion.
- The DAC is digitally pre-charged to the previous conversion → less charge kick to VIN, hence easier to settle.
- The RDAC has multiple slices for DEM → interleaving tones are reduced, improving the SFDR and FFT noise floor.
- The RDAC is always set to the correct state → lower reference current.
- Soft switching of the reference reduces ringing artefacts.
[Figure: alternating ADC1 A / ADC1 B phases.]
Measured low- and high-frequency parameters indicate stable performance across the entire dynamic range (40MS/s).
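The first bullet of the ADC motivation is a trade-off: low kT/C noise needs large sampling capacitors, and large capacitors are slow to settle. The textbook kT/C expression puts numbers on this trade-off. The sketch assumes 300 K, a differential sampler with equal capacitance on each side, and a full-scale sine spanning the poster's 6VPP differential range. The capacitor values are hypothetical and are not the chip's.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_vrms(cap_f, temp_k=300.0, differential=False):
    """RMS sampled (kT/C) noise. A differential sampler with cap_f on each side
    has twice the noise power of a single capacitor."""
    power = BOLTZMANN * temp_k / cap_f
    return math.sqrt(2 * power if differential else power)


def ktc_limited_snr_db(vpp_diff, cap_f, temp_k=300.0):
    """SNR of a full-scale sine when kT/C noise is the only noise source."""
    signal_rms = vpp_diff / (2 * math.sqrt(2))
    noise_rms = ktc_noise_vrms(cap_f, temp_k, differential=True)
    return 20 * math.log10(signal_rms / noise_rms)


if __name__ == "__main__":
    for cap_pf in (5, 20, 80):  # hypothetical per-side sampling capacitances
        snr = ktc_limited_snr_db(6.0, cap_pf * 1e-12)
        print(f"{cap_pf} pF per side: {snr:.1f} dB kT/C-limited SNR")
```

Each 4× increase in capacitance buys only about 6dB of kT/C-limited SNR. For a fixed switch resistance, the same increase multiplies the RC settling time constant by 4. This is why the input and reference drivers become the bottleneck at high precision.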
Measured results at 40MS/s:
- Low-frequency PSD: noise corner at 40Hz.
- 1.03kHz tone at -1dBFS: SNR = 93.8dBFS, SFDR = 125dBFS, HD2 = -118dBc, HD3 = -124dBc; Fs/2 = -136dBFS.
- 1.053MHz tone at -1dBFS: SNR = 92.0dBFS, THD = -107dBc, HD2 = -115dBc, HD3 = -109dBc, HD4 = -122dBc, HD5 = -117dBc; Fs/2 = -128dBFS.
- From -40°C to 85°C: gain drift 320ppb/°C, offset drift 47ppb/°C.
- DNL = 0.27LSB and INL = +2.2/-1.8LSB over the full 20b code range.
Architecture:
- Two-stage SAR with a time-interleaved approach: while one phase is sampling, the other is converting.
  - The sampling duration is extended to the entire conversion period → lower AAF bandwidth required → better filtering of driver and AFE noise.
  - A longer settling time is available → lower-power input driver.
- At high throughput, RA power cycling is not feasible → interleaving improves RA utilization → higher power efficiency.
- A dynamic reference buffer reduces impedance effects. The pre-charged reference buffer allows 20b DAC settling and reduces the current demand from the external reference → improved linearity.

An 11GHz 2nd-Order DPD FMCW Chirp Generator with 0.051% rms Frequency Error under 2.3GHz Chirp Bandwidth, 2.3GHz/μs Slope and 50ns Idle Time in 65nm CMOS
Xuan Wang*1,2, Xujun Ma*3, Yupeng Fu1, Yuqian Zhou1, Ang Li1, Shuo Yang1, Xu Wu1,2, Dongming Wang1,2, Lianming Li1,2, Xiaohu You1,2
1Southeast University, Nanjing, China; 2Purple Mountain Laboratories, Nanjing, China; 3Télécom SudParis, Paris, France
Paper No.: 10.7
Architecture blocks:
- DTC with delay-spread calibration and DTC gain calibration, plus a zero-phase-error DTC modulator.
- 9.6-12.5GHz VCO, PFD, CP and MMD.
- Voltage tracking loop (VDAC, DSM, integral path).
- Chirp generator (FCW_total, fcw_step, polarity, chirp_clr).
- Ramp-tracker-assisted 2nd-order curve-fitting DPD (1st-order LUT, ramp integrator, ramp tracker).
- Lowpass (phase modulation) and highpass (frequency modulation) paths.
Test setup: control PC, oscilloscope, phase noise analyzer, spectrum analyzer, power supply, XTAL and test board with the DUT. The PN test uses the VCO output, and the chirp test uses the VCO/16 output, plus a debug signal.
Chip layout (1.9mm × 1.95mm): LC-VCO, SPI, IDAC, VCO BUF, MMD, FMCW digital, SS/CP, analog VTL, analog DTC, digital VTL.
[Figure: mm-wave MIMO radar system for rapid walk-through imaging: virtual TRx elements, reconstructed 3D image, lateral and depth resolution.]
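The walk-through imaging figure refers to depth resolution. For an FMCW radar, depth resolution follows the textbook relation ΔR = c / (2·BW), and the chirp slope is BW / Tchirp. The sketch below is a generic calculation, not the authors' system model. It shows that a 15GHz sweep gives just under 1cm, and it reproduces the 2.3GHz/μs slope of a 2.3GHz chirp in 1μs.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def range_resolution_m(bw_hz):
    """FMCW depth (range) resolution: dR = c / (2 * B)."""
    return SPEED_OF_LIGHT / (2.0 * bw_hz)


def chirp_slope_hz_per_s(bw_hz, t_chirp_s):
    """Linear chirp slope: swept bandwidth divided by chirp duration."""
    return bw_hz / t_chirp_s


if __name__ == "__main__":
    print(f"{range_resolution_m(15e9) * 1e3:.2f} mm")                  # 9.99 mm
    print(f"{chirp_slope_hz_per_s(2.3e9, 1e-6) / 1e15:.1f} GHz/us")    # 2.3 GHz/us
```

Depth resolution depends only on the swept bandwidth. Faster chirps shorten the snapshot but do not by themselves improve depth resolution.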
[Figure: system architecture with transmitters Tx0 to TxN (PA), receivers Rx0 to RxM (LNA, mixer, LPF, switch, ADC), a DSP, and a PLL-based FM chirp generator. The Tx chirp waveform is annotated with Tchirp, Tidle, BWchirp and the snapshot duration.]
The PLL is the key building block of an FMCW radar system. A mm-wave MIMO radar system for rapid walk-through imaging has these requirements:
A. mm-level lateral resolution → hundreds/thousands of TRx elements
B. sub-cm depth resolution → 15GHz bandwidth (BW)
C. millisecond-level snapshot duration → 1μs chirp duration, 15GHz/μs chirp slope, 50ns Tidle for a 95% duty cycle
D. better imaging quality → good chirp linearity and phase noise
This work:
- Tidle = 500ns, Tchirp = 10μs, BWchirp = 2.4GHz, chirp slope 240MHz/μs. Average rms Ferror: 15.0 × 16 = 240kHz.
- Tidle = 50ns, Tchirp = 1μs, BWchirp = 2.3GHz, chirp slope 2.3GHz/μs. Average rms Ferror: 73.5 × 16 = 1176kHz.
[Figure: benchmark of rms Ferror (%) and normalized chirp BW* (GHz) vs. normalized chirp slope* (GHz/μs), comparing prior ISSCC (I), JSSC (J) and VLSI (V) works with this work.]
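The "× 16" in the rms-error figures is there because chirps are measured at the VCO/16 output, so the frequency error scales by the divider ratio. The sketch below recomputes the quoted numbers. It also checks the duty-cycle requirement, Tchirp / (Tchirp + Tidle), for both measured settings. It only restates arithmetic implied by the poster.

```python
def vco_rms_error_hz(measured_hz, divider=16):
    """Chirps are measured at the VCO/16 output; frequency error scales by the divider."""
    return measured_hz * divider


def rms_error_percent(rms_hz, bw_hz):
    """rms frequency error as a percentage of the chirp bandwidth."""
    return 100.0 * rms_hz / bw_hz


def duty_cycle(t_chirp_s, t_idle_s):
    """Fraction of time spent chirping."""
    return t_chirp_s / (t_chirp_s + t_idle_s)


if __name__ == "__main__":
    for meas_khz, bw_ghz, t_chirp, t_idle in [(15.0, 2.4, 10e-6, 500e-9),
                                              (73.5, 2.3, 1e-6, 50e-9)]:
        err = vco_rms_error_hz(meas_khz * 1e3)
        print(f"{err / 1e3:.0f} kHz, {rms_error_percent(err, bw_ghz * 1e9):.3f}% of BW, "
              f"duty cycle {duty_cycle(t_chirp, t_idle):.1%}")
```

The script prints 240kHz / 0.010% and 1176kHz / 0.051%, matching the title and the comparison-table entries. Both settings give a 95.2% duty cycle, which meets the 95% target of requirement C.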
| | H. Shanan ISSCC22 [8] | ISSCC23 [5] | VLSI23 [4] | JSSC18 [3] | ISSCC19 [2] | ISSCC20 [1] | ISSCC21 | This work |
| Architecture | RTWO-based ADPLL | CPPLL+TPM | ADPLL+Type II TPM | BBPLL+TPM | SSPLL+TPM | SSPLL+TPM+QDAC | ADPLL+TPM | SSPLL+TPM+2nd-order curve fitting DPD |
| CMOS technology | 28nm | 28nm | 28nm | 65nm | 28nm | 28nm | 40nm | 65nm |
| Reference (MHz) | 80-200 | 80 | 76.8 | 52 | 80 | 80 | 200 | 100 |
| Frequency (GHz) | 8.8 to 12 | 15 to 18.5 | 11.4 to 16.4 | 20.4 to 24.6 | 14.7 to 17.2 | 8.3 to 11.7 | 21.8 to 25.4 | 9.6 to 12.4 |
| Chirp waveform | Saw | Saw | Tri | Saw & Tri | Saw | Saw | Saw & Tri | Saw & Tri |
| BWchirp/Tchirp (GHz/μs) | 0.5/10 | 1.5/12 | 3.43/4.8 | 0.2/1.2 | 1.5/30 | 1.21/12.8 | 3.2/10 | 2.4/10; 2.3/1.0 |
| MFR (GHz) | - | 1.43 | 2.50* | - | 1.49 | 1.14 | - | 2.35; 2.16 |
| rms Ferror, kHz (%) | 37 (0.0074) | 138 (0.0096) | 1157 (0.034) | 124 (0.06) | 103 (0.007) | 168 (0.014) | 309 (0.010) | 240 (0.010); 1176 (0.051) |
| *Norm. BWchirp, GHz (%) | 4.158 (6.5) | 7.075 (8.95) | 19.36 (24.5) | 0.714 (1.0) | 7.406 (9.375) | 9.559 (11.95) | 10.76 (13.33) | 16.93 (21.4); 16.29 (20.6) |
| *Norm. chirp slope (MHz/μs) | 416 | 589 | 5417 | 595 | 247 | 747 | 1076 | 1693; 16296 |
| *PN at 1MHz offset (dBc/Hz) | -102.6 | -91 | -85 | -90 | -92.6 | -90.7 | -83.5 | -98 |
| Core area (mm²) | 2 | 0.6 | 0.19 | 0.48 | 1 | 0.9 | 0.26 | 1.2 |
Idle time (μs) and power (mW) rows, unseparated in the source: "0.0125*1-0.22-0.50.05" and "18716.51019.74411.72850.8".
*Normalized to 79GHz. *With 2μs settling time. *Estimated from measured figure.

IBM NorthPole: An Architecture for Neural Network Inference with a 12nm Chip
Andrew S. Cassidy, John V. Arthur, Filipp Akopyan, Alexander Andreopoulos, Rathinakumar Appuswamy, Pallab Datta, Michael V. Debole, Steven K. Esser, Carlos Ortega Otero, Jun Sawada, Brian Taba, Arnon Amir, Deepika Bablani, Peter J. Carlson, Myron D. Flickner, Rajamohan Gandhasri, Guillaume J. Garreau, Megumi Ito, Jennifer L. Klamo, Jeffrey A. Kusnitz, Nathaniel J. McClatchey, Jeffrey L. McKinstry, Yutaka Nakamura, Tapan K. Nayak, William P. Risk, Kai Schleupen, Ben Shaw, Jay Sivagnaname, Daniel F. Smith, Ignacio Terrizzano, Takanori Ueda, Dharmendra Modha
IBM Research
Paper No.: 11.4
NorthPole targets efficient neural-network inference (edge AI, efficient AI, sustainable AI). The Yolo-v4 demo uses an edge server with a NorthPole card, USB cameras and a laptop. Workloads shown: Yolo-v4, ResNet50.
- Fabricated in a 12nm process: 22 billion transistors in 800mm², fully operational in first silicon.
- Implementation deployed in a PCIe research-prototype PCB, with an end-to-end software toolchain.
- Compute: 256 cores; 2,048 (4,096 and 8,192) VMM operations per core per cycle at 8-bit (4-bit and 2-bit, respectively) precision.
- Memory: 224MB of on-chip memory (192MB in the core array, 32MB framebuffer for input/output).
- Communication: 4 NoCs with over 8,192 wires crossing each core.
- Control: 2,048 fully deterministic threads.
[Figure: core architecture: unified memory, vector threads [3:0] and vector units [3:0], PS, constant memory, weight memory, VMM thread and VMM, ActFx thread and ActFx, ANoC and I/MNoC threads, ANoC, INoC, MNoC, PSNoC, activation repack.]

A Scalable and Instantaneously Wideband 5GS/s RF Correlator Based on Charge Thresholding Achieving 8-bit ENOB and 152 TOPS/W Compute Efficiency
Kareem Rashed1, Aswin Undavalli2, Shantanu Chakrabartty2, Aravind Nagulu2, Arun Natarajan1
1Oregon State University, Corvallis, OR; 2Washington University in St. Louis, St. Louis, MO
Paper No.: 12.3

A 64Gb/s/pin PAM-4 Single-Ended Transmitter with Merged Pre-Emphasis Capacitive-Peaking Crosstalk Cancellation for Memory Interfaces in 28nm CMOS
Weitao Wu, Hongzhi Wu, Liping Zhong, Xuxu Cheng, Xiongshi Luo, Dongfan Xu, Catherine Wang, Zhenghao Li, Quan Pan*
School of Microelectronics, Southern University of Science and Technology, Shenzhen, China
To achieve high data throughput, advanced DRAM applications have adopted PAM-4 to increase the transmission speed.
- PAM-4 is more sensitive to crosstalk, which destroys signal integrity and hinders further increases in channel density.
- Prior crosstalk cancellation (XTC) works are mainly based on NRZ, and their effectiveness is not guaranteed when they are extended to PAM-4. PAM-4 has more levels and transition edges, which cause SNR loss and SWJ.
- An efficient PAM-4 XTC technique is desired for high-density transmission, one that sacrifices neither pin efficiency nor the SNR of the TX output.
- An efficient bandwidth-extension technique is desired to widen the PAM-4 eye opening and achieve high-speed transmission.
Power breakdown: driver 27mW, CLK path 25mW, serializer 21mW, pre-driver 8mW. Total power: 81mW/ch (1.27pJ/bit).
Clock calibration and FS-FFE measurements:
- 32Gb/s NRZ TX output eye without/with adaptive DCC calibration: duty-cycle error 16.16 → 0.48.
- 32Gb/s NRZ TX output eye without/with QEC calibration: quadrature error 11.36 → 1.12.
- 64Gb/s PAM-4 TX output eye: eye widths 0.39/0.49/0.36UI with UI-spaced FFE vs. 0.47/0.53/0.41UI with fractional-spaced FFE (70mV and 6.25ps scales).
Measurement setup: power supply (Keysight N6705C), PC, test chip on a regulation board (I2C, VISA, PWR, Vbias, VDD), channels with crosstalk (XT), AWG (Keysight 8196A), oscilloscope (Keysight N1060), 16GHz TX CLK and 8GHz sampling CLK.
XTC measurement (32Gbaud NRZ/PAM-4 TX output eyes with and without XTC): with XT, the 32Gb/s NRZ eye width is 0.32UI without XTC and 0.6UI with XTC. The 64Gb/s PAM-4 eye width is 0.36UI with XTC.
Circuits: merged pre-emphasis C-peaking XTC (5b cap arrays, phase aligner, XTC path and main path into the driver with impedance control) and a reconfigurable fractional-spaced FFE (D0 to D7, C4_0/C4_90 clocks, TAPA/TAPB main taps, TAPC post tap).
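The 64Gb/s TX poster lists a per-block power breakdown. Its comparison table defines the FoM as energy efficiency divided by the channel insertion loss at Nyquist, with 11dB for this work. The short check below recomputes the 81mW/ch, 1.27pJ/bit and 0.115pJ/bit/dB figures from those inputs. Only the dictionary keys are invented names.

```python
POWER_BREAKDOWN_MW = {  # per channel, from the poster's power breakdown
    "driver": 27,
    "clk_path": 25,
    "serializer": 21,
    "pre_driver": 8,
}


def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = (1e-3 J/s) / (1e9 b/s) = 1e-12 J/b, i.e. pJ/b."""
    return power_mw / rate_gbps


def fom_pj_per_bit_per_db(energy_pj_per_bit, il_nyquist_db):
    """FoM as defined in the comparison table: energy efficiency / IL at Nyquist."""
    return energy_pj_per_bit / il_nyquist_db


if __name__ == "__main__":
    total = sum(POWER_BREAKDOWN_MW.values())
    e = energy_per_bit_pj(total, 64)
    print(total, round(e, 2), round(fom_pj_per_bit_per_db(round(e, 2), 11), 3))  # 81 1.27 0.115
```

Normalizing by insertion loss rewards designs that drive lossier channels. This lets transmitters measured on different channels be compared more fairly.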
The reconfigurable FS-FFE taps are adjusted by a tap-position modulator.
Conclusions, with comparison against JSSC13 [2], ISSCC20 [5], ISSCC20 [6], JSSC23 [4] and Kim CICC22: this work (28nm, 64Gb/s/pin PAM-4, 11dB IL at the Nyquist frequency) is the only design that supports PAM-4 XTC. It uses a merged C-peaking XTC, a 3-tap reconfigurable FS-FFE and 100% pin efficiency. It achieves an 87% (NRZ) / 82% (PAM-4) CIJ reduction ratio, 1.27pJ/bit energy efficiency and a FoM* of 0.115pJ/bit/dB.
*Figure of merit: energy efficiency (pJ/bit) / IL at the Nyquist frequency (dB). *Estimated from the reduction ratio of the crosstalk noise amplitude. *According to the power breakdown.
XTC off vs. on:
- 32Gb/s NRZ: horizontal eye opening 0.32 → 0.6UI; vertical eye opening 100 → 180mV.
- 64Gb/s PAM-4: horizontal eye opening 0 → 0.36UI; vertical eye opening 0 → 36mV.
Motivation: higher channel density and a higher data rate per pin give high data throughput, targeted with PAM-4 modulation and crosstalk cancellation. Challenges: PAM-4 is more sensitive to crosstalk and suffers from SWJ, and prior XTC techniques are mainly based on NRZ. Achievements: merged pre-emphasis C-peaking XTC and reconfigurable fractional-spaced FFE.

Proactive Voltage Droop Mitigation using Dual-Proportional-Derivative Control based on Current and Voltage Prediction applied to a Multi-Core Processor in 28nm CMOS
Weiwei Shan, Kaize Zhou, Keran Li, Yuxuan Du, Zhuo Chen, Junyi Qian, Haitao Ge, Jun Yang, Xin Si
Southeast University, Nanjing, China
Paper No.: 14.2
Challenges:
1. Response latency of the voltage sensor (Vsensor).
2. Prediction accuracy of proactive mitigation.
Proposed solutions:
1. Accurate current prediction by ML-assisted key-signal toggling.
2. Accurate voltage prediction by a physical PDN model.
3. Proactive dual-PD control of I & V, with negligible performance loss.
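Solution 3 combines proportional-derivative (PD) control on predicted current and predicted voltage. The handout names "dual-loop PD control of both I/V" and clock gating as the action, but gives no control law, gains or thresholds. The sketch below is therefore a hypothetical illustration of the idea, not the chip's controller. One discrete PD term acts on the current error and one on the voltage error. Their sum is compared with a threshold to decide whether to gate the clock. All gains, references and the threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PDController:
    """Discrete proportional-derivative term: u[n] = kp*e[n] + kd*(e[n] - e[n-1])."""
    kp: float
    kd: float
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        u = self.kp * error + self.kd * (error - self.prev_error)
        self.prev_error = error
        return u


def should_gate_clock(pd_i, pd_v, i_pred, i_ref, v_pred, v_ref, threshold):
    """Hypothetical dual-PD decision: a rising predicted current and a falling
    predicted voltage both push the combined output toward clock gating."""
    u = pd_i.step(i_pred - i_ref) + pd_v.step(v_ref - v_pred)
    return u > threshold


if __name__ == "__main__":
    pd_i = PDController(kp=1.0, kd=2.0)    # hypothetical gains (A units)
    pd_v = PDController(kp=10.0, kd=20.0)  # hypothetical gains (V units)
    print(should_gate_clock(pd_i, pd_v, 0.15, 0.15, 0.90, 0.90, threshold=0.5))  # False
    print(should_gate_clock(pd_i, pd_v, 0.40, 0.15, 0.88, 0.90, threshold=0.5))  # True
```

The derivative terms react to how fast the predicted current and voltage change, not only to how far they are from their references. This is what lets a predicted load step trigger mitigation before the droop fully develops.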
The power delivery network (VRM, PCB, package, die) shapes the on-die voltage VDIE. Drastic workload variations combined with a weak PDN cause severe dynamic voltage droops. Mitigation is hampered by the latency of droop monitors and the poor accuracy of droop prediction.
Voltage prediction architecture (VPDN):
- Vdp[n] is the predicted real-time die voltage.
- Id[n], Id[n-1] and Id[n-2] come from the PDPM.
- Vd[n-1] and Vd[n-2] come from the on-chip voltage sensor.
The voltage prediction circuit computes

$$V_{dp}[n] = c_1 I_d[n] + c_2 I_d[n-1] + c_3 I_d[n-2] - d_1 V_{dp}[n-1] - d_2 V_{dp}[n-2]$$

Physical model of the PDN (supply V_S, series R_S and L, on-die capacitance C and load R_L in parallel, load current I_Load):

$$Z_{PDN}(s) = \frac{a_1 s + a_2}{s^2 + b_1 s + b_2}, \qquad a_1 = \frac{1}{C}, \quad a_2 = \frac{R_S}{LC}, \quad b_1 = \frac{R_S}{L} + \frac{1}{R_L C}, \quad b_2 = \frac{R_L + R_S}{R_L L C}$$

Performance comparison:
| | JSSC17 [2] | ISSCC17 [3] | IBM J20 [4] | VLSI20 [5] | This work |
| Process | 16nm | 10nm | 14nm | 7nm | 28nm |
| Processor | 4-core CPU (Cortex-A57) | 10-core CPU (Cortex-A73×2 + A53×4 + A35×4) | 12-core CPU (IBM z15) | 6-core DSP (VLIW CPU×4 + Vector×2) | 8-core CPU (RISC-V RI5CY) |
| Frequency | 2.02GHz | 2.5GHz | 5.2GHz | 500MHz | 500MHz |
| Predictive | Yes | No | Yes | Yes | Yes |
| Monitoring scheme | Voltage prediction | Analog voltage sensor | Power/timing prediction | Power prediction | Current prediction + voltage prediction |
| Control action | Clock gating | Power switch | Instruction throttling | Voltage regulation | Clock gating |
| Control scheme | Threshold comparing | Threshold comparing | Threshold comparing | Proportional control | Dual-loop PD control of both I/V |
| Performance loss | 1.5% | NA | 5 cycles + response time | 2.0% | 0.6% |
Droop reduction, as extracted: 50mV/25.0%, 38mV (30.0%), 36.9mV, 45.2mV (32.0%).
Measured on-chip voltage under mlDct (a test case in the PULP testbench): the droop is 132.9mV without the strategy and 87.7mV with it, i.e., 45.2mV of droop suppression. The performance loss is 0.6% (157 gating cycles among 25636 cycles).
Suppression effects of different strategies:
| Strategy | Droop (mV) | Gating cycles |
| No strategy | 132.9 | 0 |
| Vsensor-based | 94 | 87 |
| DPM (APOLLO) | 92.7 | 1252 |
| VPDN | 94 | 211 |
| Ours | 87.7 | 157 |
Demonstration: the chip (voltage sensor, SRAM) runs with MCU systems deployed on an FPGA and a personal computer. The waveforms displayed on the PC are from on-chip SRAM data recorded during program execution. [Figure: measured current (mA) and voltage (mV), showing a periodic droop caused by CPU instructions, a 1st-order droop caused by an abrupt large ΔI, and a timing failure.]
Dual-PD-control-based system adjustment: on-chip voltage and current, CPU threshold voltage, heavy iLOAD occurrence. The simulation waveforms compare two approaches. Vsensor-based regulation suffers at least 2-cycle sampling latency (4-cycle latency in the waveforms). I/V dual-PD regulation on the predicted current and voltage acts with 0-cycle latency.
Current prediction (PDPM):
- Key signals are selected by machine learning. Flow: PULP signal preprocessing → pre-simulation trace → machine-learning training → post-simulation retraining.
- A coordinate-descent algorithm with coordinate pre-selection iterates with a decreasing penalty and updates the effective set until convergence.
- Feature vectors (per-cycle key-signal toggles) are labelled with the power data 3 cycles ahead.
- Clock-gating enable signals: CPU clk_en (8) and FPU clk_en (4). Signal groups: RISC-V core, FPU, I-cache, TCDM, others.
- Training results: NRMSE (%) over 17 training sets and 10 validation sets; PDPM vs. PTPX power (mW) over clock cycles.
System architecture (measured domain and predict domain): predict digital power meter (PDPM) fed by key-signal toggling through transition detectors (TD), PDN-based voltage predictor (VPDN), I/V dual-PD controller with clock gating, and on-die PDN scanner. These serve a PULP processor with 8 RISC-V cores (RISC-V #0 to #7), 4 shared FPUs, L1.5 shared I$, L1-I$, L1 SRAM, DMA, MP control, AXI and a low-latency interconnect.
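The VPDN recursion above has three feed-forward current taps and two feedback voltage taps. This is the structure obtained when a second-order transfer function such as Z_PDN(s) is discretized, for example with a bilinear transform. The handout does not say how c1 to d2 are obtained, so the sketch below only implements the recursion as written, with hypothetical coefficients. Whether the output represents the absolute die voltage or its deviation from nominal depends on how the coefficients are fitted, which is an assumption here.

```python
class PdnVoltagePredictor:
    """Second-order recursive predictor:
    V[n] = c1*I[n] + c2*I[n-1] + c3*I[n-2] - d1*V[n-1] - d2*V[n-2]
    """

    def __init__(self, c1, c2, c3, d1, d2):
        self.c = (c1, c2, c3)
        self.d = (d1, d2)
        self.i1 = self.i2 = 0.0   # I[n-1], I[n-2]
        self.v1 = self.v2 = 0.0   # V[n-1], V[n-2]

    def step(self, i_n):
        """Consume the current sample I[n] and return the predicted V[n]."""
        c1, c2, c3 = self.c
        d1, d2 = self.d
        v_n = c1 * i_n + c2 * self.i1 + c3 * self.i2 - d1 * self.v1 - d2 * self.v2
        self.i2, self.i1 = self.i1, i_n   # shift the current history
        self.v2, self.v1 = self.v1, v_n   # shift the prediction history
        return v_n


if __name__ == "__main__":
    # Hypothetical coefficients: a decaying response to a one-cycle current pulse.
    p = PdnVoltagePredictor(1.0, 0.0, 0.0, -0.5, 0.0)
    print([p.step(x) for x in (1.0, 0.0, 0.0, 0.0)])  # [1.0, 0.5, 0.25, 0.125]
```

The recursion needs only five multiply-adds per cycle, and it produces a prediction in the same cycle as the current estimate. This is consistent with the 0-cycle-latency claim, in contrast to a sensor that must first sample the voltage.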
A 12nm Linux-SMP-Capable RISC-V SoC with 14 Accelerator Types, Distributed Hardware Power Management and Flexible NoC-Based Data Orchestration
M. Cassel dos Santos1, T. Jia2, J. Zuckerman1, M. Cochet3, D. Giri1, E. J. Loscalzo1, K. Swaminathan3, T. Tambe2, J. J. Zhang2, A. Buyuktosunoglu3, K. L. Chiu1, G. Di Guglielmo1, P. Mantovani1, L. Piccolboni1, G. Tombesi1, D. Trilla3, J. D. Wellman3, E. Y. Yang2, A. Amarnath3, Y. Jing4, B. Mishra4, J. Park2, V. Suresh4, S. Adve4, P. Bose3, D. Brooks2, L. P. Carloni1, K. L. Shepard1, G. Y. Wei2
1Columbia University, New York, NY; 2Harvard University, Cambridge, MA; 3IBM Research, Yorktown Heights, NY; 4University of Illinois, Urbana, IL
Paper No.: 14.5
Motivation: heterogeneous SoCs feature a mix of many hardware accelerators and general-purpose cores that run many applications in parallel. The resources the accelerators share must therefore be managed: the memory hierarchy, communication channels and on-chip power. The SoC addresses these challenges with a tile-based architecture, NoC-based data orchestration and token-based distributed hardware power management (DHPM).
Evaluation platform: the ASIC with DRAM on an FPGA host (FMC), host PC, router, ESPLink, UART and Ethernet. A system test run prints:
    # ./fft2_stratus.exe
    = fft2_stratus.0
    = .num_samples=64 .num_ffts=1 .do_inverse=0
    * START *
    Test time: 20808016 ns
    - fft2_stratus.0 time: 4104583 ns
    * DONE *
    + TEST PASS: not exceeding error count threshold
Chip properties: technology 12nm; area 64mm²; 340 IOs; 23 power domains; 35 clock domains; 8.4MB total SRAM; power 83mW to 4.33W; max. frequency 680MHz to 1.6GHz.
Evaluation:
- Accelerator energy efficiency and speedup vs. CVA6 (e.g., AES, SHA-1, SHA-2).
- Speedup is bounded by memory or SW, so V/F can be reduced while keeping the speedup.
- NoC contention under high utilization: 51% on average. Per-application speedup is shown vs. a 1-LLC configuration (active/inactive LLC/SPAD, active accelerator, OS-reserved LLC, most/least contention).
[Figure: tile physical design (e.g., Viterbi). The logic domain contains Vtile, tile CLK, logic core, Vlogic, TRO, activity and PM unit. An always-on (AON) buffer connects it to the NoC domain (VNoC) and the NoC controller.]
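The DHPM figure names a token FSM, a LUT, a PID controller, a token count and target, and Ftarget/Ftile, but the handout does not describe the token protocol. The sketch below shows only the budget-conserving idea behind a token scheme. A fixed pool of tokens, standing for the chip-level power budget, is split among tiles in proportion to their demand, and a step LUT maps each tile's tokens to a target frequency. The allocation rule, the LUT contents and the tile names are hypothetical. The per-tile PID loop that would drive the LDO toward Ftarget is not modelled.

```python
def allocate_tokens(total_tokens, demands):
    """Split a fixed token pool among tiles in proportion to their demand,
    using largest-remainder rounding so allocations always sum to total_tokens."""
    weight = sum(demands.values())
    if weight == 0:
        return {tile: 0 for tile in demands}
    exact = {tile: total_tokens * d / weight for tile, d in demands.items()}
    alloc = {tile: int(share) for tile, share in exact.items()}  # floor (shares >= 0)
    leftover = total_tokens - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - alloc[t], reverse=True)
    for tile in by_remainder[:leftover]:
        alloc[tile] += 1
    return alloc


def target_frequency_mhz(tokens, lut):
    """Map a token count to a target frequency using a step LUT of
    (minimum_tokens, frequency_mhz) entries sorted by minimum_tokens."""
    freq = 0
    for min_tokens, f in lut:
        if tokens >= min_tokens:
            freq = f
    return freq


if __name__ == "__main__":
    HYPOTHETICAL_LUT = [(0, 0), (10, 200), (25, 400), (50, 800)]
    alloc = allocate_tokens(100, {"viterbi": 3, "fft": 1, "cva6": 0})
    print(alloc)  # {'viterbi': 75, 'fft': 25, 'cva6': 0}
    print({t: target_frequency_mhz(n, HYPOTHETICAL_LUT) for t, n in alloc.items()})
```

Because the token total never changes, an idle tile's share flows to busy tiles without the chip-level budget ever being exceeded. This property is what makes a distributed, token-based scheme attractive for per-tile DVFS.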
199、Token FSMLUTPID ControllerTDCToken countLDO Ctrl.8FtargetFtileToken targetCSRstatusupdateLogic DomainNoCSRAMNoC DomainNoCCoh.ReqCoh.RspCoh.FwdDMA ReqDMA RspLLCDirectoryController256KBSRAMWide Off-Chip Interfaceclk_incredit_invalid_indata(64x)clk_outcredit_outvalid_outOut-of-Core AcceleratorDMA.ReqDM
200、A.RspConfigConfigRegsDMACtrlCustomDatapath+ScratchpadPM CtrlCSRsTROclk_tileRISC-V CVA616KBL1 I$32 KBL1 D$FrontendDecodeIssueExec3 In-Core AcceleratorsCommit64KBL2$AXI+Coherency ExtensionsUncachedDataCoh.ReqCoh.RspCoh.FwdMMIOCSRsTROclk_tileConfigCSRsConfigTROclk_tileEthernet MACBootRAMIRQ CtrlUARTHos
201、tPCAHBCSRsTROclk_tileTROclk_nocConfigUncachedDataDMA ReqDMA RspMMIOTo/FromFPGAIOConfig/PMVglobalVglobalVglobalVglobalVtileVtileVlogicLDOLookaheadNoC RouterNESWTo TileAsync FifoLatency Insensitive Channels6x Routers NoC Power/Clock DomainTile Power/Clock DomainTile QueuesKASP:A 96.8%10-Keyword Accura
202、cy and 1.68J/Classification Keyword Spotting and Speaker Verification Processor Using Adaptive Beamforming and Progressive Wake-UpJ.Xiao1,X.Zhang1,S.Zhu1,Z.Yang1,M.Du1,C.Ji1,Y.Long1,X.Chen2,X.Miao2,L.Zhou1,L.Chang1,S.Liu1,J.Zhou11University of Electronic Science and Technology of China,Chengdu,China
203、2China Micro Semicon,Chengdu,ChinaMotivationArchitectureSystem ImplementationVerificationPaper No.:14.8 Keyword SpottingSpot target speech keywords in the audio signals Speaker VerificationVerify target speaker of the voice signalsHuman-Machine Voice InterfacePersonalized Speech RecognitionVoice Wak
204、e-UpGood Night.How can I help?Voice ControlScenariosSmart HomeWearable Smart DevicesSmart ToysKeywordSpottingSpeaker VerificationSensitive to Human Voice NoiseChallenge 1Challenge 2High Energy ConsumptionPurpose:Voice-Noise-Robust Energy-Efficient Low-Hardware-Cost OURsKWS&SV processor Reconfigurabl
205、e KASP Architecture Supporting Adaptive Beamforming Adaptive-Frequency DoA&Lightweight FDBF Technique Four-Stage Progressive Wake-Up Processing Technique Lightweight X-Vector-Based Multi-User SV Technique1.37 mm1.51 mmDRNNCEKVADMOthersDRFECE Technology:55nm Core Area:2.07mm2 Supply Voltage:0.84V Fre
206、quency:2.5M SRAM Size:25.25KB Data Precision:16bitFE,8bitNN Feature:Reconfigurable KWS&SV with On-Chip Adaptive DoA&BF Supporting KWS and SV with on-chip adaptive DoA&BF for ultra-energy-efficient and high accuracy Exploiting domain-specific features to achieve 2.24 J classification energy consumpti
207、on while being always-on Achieving highest 96.8%accuracy and over 90%under human voice noise with microjoule-level processing energyDemonstration SetupDemonstration FlowBatteryBluetoothOur ChipDisplayInterfaceLCD Screen(Keywords/Speaker ID/Noise Direction)Robot ControlMove LeftMove RightMove Forward
and Move Backward; a 3cm × 3cm test board; an MCU for configuration.
Demonstration flow:
Step 1: Real-time user voice commands and noise are captured by the MEMS microphones on the test board and sent to our chip through the SPI interface.
Step 2: Our chip sends the classification results to the MCU, which controls the toy robot via Bluetooth and displays the keywords, speaker ID, and noise direction on the LCD screen.
Demo scene: the user's voice and human-voice noise played from a mobile phone reach the MEMS microphones on the 3cm × 3cm test board; the robot is controlled by the recognized commands, and the LCD screen shows keywords, speaker ID, and noise direction.
Issue 1: sensitivity to human-voice noise (TV/radio, people talking around) causes significant KWS accuracy loss from clean to noisy conditions.
Issue 2: existing designs do not sufficiently exploit domain-specific features to improve energy efficiency and accuracy (figure: an audio timeline of silence, noise, keyword, and human-voice-noise segments, annotated with energy-efficiency and accuracy issues).
Challenge 3: multi-user SV free of speaker-specific training.
Architecture: audio signal processing consists of feature extraction, voice activity detection, and NN-based classification. Audio data from a 2×1 MEMS microphone array enters through an SPI/parallel interface, and instructions, weights, and biases are loaded into the engines.
DRFECE (Dynamically Reconfigurable FE Computation Engine): pre-emphasis, windowing, a reconfigurable FFT engine, and mel filter & log for single-channel data; DoA phase-difference calculation and Lite-FDBF for dual-channel data; a feature multiplexer selects the output.
KVADM (KWS-Aware Adaptive VAD Module, with VAD threshold tuning) and FPWC (Four-Stage Progressive Wake-up Controller): generate the wake-up, FE mode control, NN mode control, and activate/deactivate signals, using the detected keywords, noises, and configuration.
DRNNCE (Dynamically Reconfigurable NN Computation Engine): NN layer controller driven by NN instructions, KWS/SV/DoA feature data buffer, weight/bias memory controller, four 32×1 MAC arrays (#0 to #3), data memory banks #0/#1, weight and bias memory banks #0/#1, and a post-processing module (activation unit, pooling unit, data shaper); it outputs the KWS and DoA classification results.
FE and NN modes:
KWS: FE mode uses single-channel 128/256-pt FFT-based log-mel acoustic feature extraction; NN mode uses a TS-DSCNN.
SV: FE mode uses single-channel 128/256-pt FFT-based log-mel acoustic feature extraction; NN mode uses a Lite-X-Vector network.
DoA & BF: FE mode uses dual-channel 256-pt FFT-based phase-difference calculation and Lite-FDBF; NN mode uses an EP-DSCNN.
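The DoA & BF FE mode derives direction from the inter-channel phase difference of FFT bins. As a rough illustration only, a textbook far-field two-microphone model maps one bin's phase difference to an arrival angle. On the chip the phase-difference features are classified by an NN (EP-DSCNN) rather than by this closed form, and the speed of sound, microphone spacing, sample rate, and all function names below are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def dft_bin(frame, k):
    """Naive single-bin DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n_pts)
               for i, x in enumerate(frame))


def doa_from_phase(ch1, ch2, k, fs, spacing):
    """Far-field arrival angle (degrees from broadside) from the phase
    difference of DFT bin k between two mics `spacing` metres apart.
    A positive angle means the wavefront reaches ch1 before ch2."""
    freq = k * fs / len(ch1)
    # If ch2 lags ch1 by tau seconds, angle(X1 * conj(X2)) = 2*pi*freq*tau.
    dphi = cmath.phase(dft_bin(ch1, k) * dft_bin(ch2, k).conjugate())
    sin_theta = SPEED_OF_SOUND * dphi / (2 * math.pi * freq * spacing)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp rounding overshoot
    return math.degrees(math.asin(sin_theta))


def two_mic_tone(angle_deg, freq, fs, n_pts, spacing):
    """Hypothetical test input: one far-field tone seen by both mics."""
    tau = spacing * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    ch1 = [math.sin(2 * math.pi * freq * i / fs) for i in range(n_pts)]
    ch2 = [math.sin(2 * math.pi * freq * (i / fs - tau)) for i in range(n_pts)]
    return ch1, ch2


if __name__ == "__main__":
    # 1 kHz falls exactly on bin 16 of a 256-pt frame at 16 kHz; 5 cm spacing.
    c1, c2 = two_mic_tone(30.0, 1000.0, 16000, 256, 0.05)
    print(round(doa_from_phase(c1, c2, 16, 16000, 0.05), 3))
```

The estimate is unambiguous only while the microphone spacing stays below half a wavelength at the chosen bin; at 1 kHz the wavelength is about 34 cm, so a 5 cm spacing is safe.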
Challenge 3 figure: single-user SV versus multi-user SV (speaker 1 to speaker N, or non-user; yes/no decision); conventional SV relies on labeled user data and speaker-specific training.
Benchmark table (energy in µJ per classification):
Metric | JSSC20 | JSSC21 | JSSC22 | ISSCC22 | ISSCC22 | ISSCC23 | This work
Process (nm) | 65 | 65 | 40 | 40 | 65 | 28 | 55
Core area (mm²) | 2.56 | 4.13 (die) | 1.10 | 0.94 | 2.03 | 0.80 | 2.07
KWS accuracy | 90.9% (10-KWS) | 90.38% (6-KWS) | 95% (8-KWS) | 94.4% (16-KWS) | 86.03% (10-KWS) | 92.8% (5-KWS) | 95.7% (16-KWS), 96.8% (10-KWS)
SV accuracy | 99.5% (1-SV) | N/A | N/A | N/A | N/A | N/A | 94.5% (168-SV), 99.6% (1-SV)
DoA accuracy | N/A | N/A | N/A | N/A | N/A | N/A | 98.7% (13-DoA)
Energy/classification (µJ) | 5.62 (SV+KWS) | 0.37 (NN) | 3245 (BF) | 67.38 (BF) | 14.72 (KWS) | 1.73 (KWS) | 1.68 (KWS), 2.06 (KWS+BF), 1.86 (KWS+SV), 0.32 (KWS NN)
The table also compares on-chip DoA and on-chip BF support.
Note: for evaluating KWS with SV, the SV needs to use the same dataset as KWS (i.e., GSCD).
Figures: proposed four-stage progressive wake-up processing; proposed KWS-driven adaptive DoA & BF.

A 0.795-fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch
Zhiheng Yue1, Xujiang Xiang1, Fengbin Tu2, Yang Wang1, Yiming Wang1, Shaojun Wei1, Yang Hu1, Shouyi Yin1,3
1Tsinghua University, Beijing, China; 2Hong Kong University of Science and Technology, Hong Kong, China; 3Shanghai Artificial Intelligence Lab, Shanghai, China
Paper No.: 15.1
Motivation: personal devices, smart traffic, and smart homes connect through a software-defined networking (SDN) switch, whose TCAM stores flow rules as prefix, next-hop, and priority entries (priorities P0, P1, P2).
Challenge 1: area and power overhead. Challenge 2: TCAM update overhead. Challenge 3: TCAM security issues.
Conventional TCAM: a CAM array with row and column peripherals, search logic, and sense amplifiers (SA) on the match lines; the search key drives the search lines, and read/write uses the bit lines (BL) and word lines (WL).
Architecture: 16 TCAM banks with a 2Kb temporal buffer and a 2Kb output buffer, an AXI bus, PLL and power, a 512b input FIFO, and 64b links. Configuration units hold the flow-rule width, flow-rule dependency graph, rule priority, and instruction sequence. A TOP controller (address generator, mode switch, pipeline controller) selects among four modes (0: configuration, 1: TCAM search, 2: loading, 3: PUF) and issues the instruction flow (pre-charge, sense-amplify, write enable, cell activate). A voltage controller sets Vcore (TCAM core voltage), Vcell (TCAM cell voltage), VSA (sense-amplifier voltage), VREF (reference voltage), and VPRE (precharge voltage). Flow rules (Rule 1 to Rule 5) are organized by their rule dependencies.
Cell: standard 6T ×2 per bit; energy 0.7953fJ/b; area density 3.25Mb/mm²; FoM 4.088Mb/mm²/fJ. Figure labels: 1/4 array active area, matchline/BL, search/read SA, combined search/read SA, peripheral.
Test setup: test chip, FPGA (FMC, SPI), oscilloscope, logic analyzer, PC, and voltage supply.
Comparison:
TCAM | JSSC2021 [2] | VLSI2017 [3] | JSSC2019 [4] | ASSCC2019 [5] | This work
Technology | 28nm | 55nm | 28nm | 28nm | 28nm
Cell | 10T | Customized 6T | Split 6T+2T | Customized 6T | 6T
Array size | 64×64 | 128×128 | 1024×320 | 64×64 | 64×64
Supply voltage (V) | 0.9 | 0.8 | 0.9 | 1 | 0.5-0.9
Frequency (MHz) | 262 | 270 | 526 | 10 | 100-333
Search energy (fJ/bit) | 1.025 | 0.45 | 0.422 | 1.62 | 0.795
Cell area (µm²) | 2.659 | 0.9265 | 0.592 | - | 0.307
Array area (mm²) | 0.032 | 0.0233 | - | 0.0065 | 0.0015
Bit density (Mb/mm²) | 0.376 | 1.081 | 1.4479 | 0.629 | 3.25
FoM (Mb/mm²/fJ) | 0.3668 | 2.4022 | 3.431 | 0.3882 | 4.0865
Figure panels: TCAM design (row MUX, selected and non-selected rows, search key, mismatch versus all-1 match, SA with REF, matchline discharge) and PUF design (TCAM 0 and TCAM 1 built from 6T cells, whose physical variation is compared differentially).

A 28nm 69.4kOPS 4.4µJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems
Yihong Zhu1,2, Wenping Zhu1,2, Yi Ouyang1, Junwen Sun1,2, Min Zhu3, Qi Zhao1,2, Jinjiang Yang1, Chen Chen1,2, Qichao Tao1,2, Guang Yang1,2, Aoyang Zhang1, Shaojun Wei1,2, Leibo Liu1,2
1Tsinghua University, Beijing, China; 2Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; 3Micro Innovation Integrated Circuit Design Co., Ltd, Wuxi, China
Paper No.: 16.2
Post-quantum cryptography (PQC) is the future public-key scheme, which makes crypto-agility necessary in a PQC chip. The poster states the design objectives and contributions of our design.
System architecture, four blocks:
1. TOC: task-operator cluster.
2. MEM: bulk data storage and input/output.
3. BUF: communicates directly with the TOCs.
4. TP: task path, for generating and issuing tasks.
Clustered architecture: (1) reduces the complexity of the crossbar; (2) is more scalable.
Results: energy-efficiency comparison with mainstream CPUs; performance comparisons with SOTAs; power-measurement setup; a possible application scenario for PQC chips (servers); die photo and characteristics.
*Corresponding author: Leibo Liu

A Synthesizable Design-Agnostic Timing Fault Injection Monitor Covering 2MHz to 1.26GHz Clocks in 65nm CMOS
Yan He, Kaiyuan Yang
Rice University, Houston, TX
Paper No.: 16.5
A synthesizable, design-agnostic, and distributable monitor of timing FIAs.
Observation: fault injection attacks (FIAs) represent a severe threat to modern computing devices. Corrupted register contents (bit errors, computation errors) can leak cryptographic keys or bypass OS security. Localized block-level injection targets a function block (F) or its power management (buck, LDO, etc.) through voltage injection on VDD; EM/temperature injection (EM, heating, freezing); or clock fault injection (add pulse, skip cycle, change phase, change duty cycle), producing glitched waveforms T1 to T12.
Target: a timing FIA can be mounted by introducing glitches in the clock, voltage, EM, or temperature.
Monitor architecture: CLK feeds delay lines (DL, RL) that provide DMin, DMax, RMin, and RMax; configurable delay lines (CDL) are set through PL, PMin, PMax, and ACC with configuration FSMs (PW lock, ready); D flip-flops (D, Q, R) perform glitch detection.
Operation waveform: after lock (normal operation), the clock pulse width PW(CLK) is compared with PW(DMin) and PW(DMax), with margins WNeg and WPos; a glitch that causes a WNeg or WPos violation raises an alert in the same cycle.
Configurable delay line (CDL): coarse (9b), medium (8b, 16b, 16b), and fine stages with bypass options, a locking FSM, and a CDL counter ring oscillator (CDL Cnt RO).
65nm ASIC implementation: ten monitor instances (DUT 1 to 10) with a pattern generator and pulse adder (scale labels 250µm and 30µm). The fully synthesizable monitor occupies 1500µm² = 0.355MF² in 65nm.
Testing setup: temperature chamber with the testing PCB inside, voltage source, LabVIEW control, testing PCB with PGA chip, frequency counter, hot-air rework station, freeze spray, signal generator, and NI DAQ.
Attack scenarios in the demo: clock, voltage, and temperature glitch attacks.
Performance comparison:
Design | Tech. | Application | Principle | Voltage (V) | Temp. (°C) | Power (mW) | Area (MF²) | Monitor precision | Target attacks
ISSCC23 | 44nm | AES-256 | Error checking | 0.75 | 25 | - | 244.56 | - | Any fault attack on AES
VLSI22 | 55nm | Design agnostic | High-freq. sampling | 0.5-1.0 | 25 | 0.80 | 25 (a) | 192 FLL period | Low-freq. clock
This work | 65nm | Design agnostic | PW comparison | 0.4-1.4 | -40 to 125 | 0.487 (b) | 0.355 | DLL delay step | Clock, voltage, EM, temp.
Attack detection results (clock, voltage, EM, temperature):
Clock injection: across glitch types T1 to T12 at injection pulse widths of 100ps and 400ps, no alerts were missed in 100 trials per configuration.
Voltage and EMI injection: glitches were detected on VDD (1ms and 40mV scale labels; EMI off: 1.2V DC; EMI on: 60mV Vpp at 10MHz with -10dBm RF power) and for a VDD pulse with 120mV depth (9ns label).
Temperature injection:
Attack | Temperature (°C) | Average slew rate (°C/min) | Testing equipment | Monitor result
Heating attack | 25 to 122 | 1200 | Hot-air rework station | Glitch detected
Freezing attack | 25 to -11 | -600 | Freeze spray | Glitch detected
Temperature drift | -40 to 125 | 2 | Temp. chamber | No glitch

PACTOR: A Variation-Tolerant Probing-Attack Detector for a 2.5Gb/s 4-Channel Chip-to-Chip Interface in 28nm CMOS
Mao Li, Zhaoqing Wang, Sanu K. Mathew, Vivek De, Mingoo Seok
Columbia University, New York, NY, USA; Intel, Hillsboro, OR, USA
Paper No.: 16.6
Probing-attack threat: probing attacks on PCB signal traces can steal critical data between chips, take over victim systems, and enable clock-glitching attacks, side-channel attacks, etc.
System architecture: the quad-channel probing-attack detector (shown in blue) consists of capacitance comparators and a reference capacitor array.
Detector architecture: a clock-gated cross-coupled inverter pair, with the reference capacitor array and the external PCB loading.
Measurement results: 71fF minimum detectable loading at the typical condition; 0.5pF detection resolution across -20 to 105°C and 0.65 to 1.1V; a noise-suppression algorithm increases the detection margin by 5×; a one-temperature-point calibration sets the LT/HT thresholds.
Demonstration system: detects a probe applied to a PCB wire trace, using the chip (Tx, Ch1 to Ch4, scan chain, LDO, Vdd) on a PCB board, a Pmod link (3.3V, Channel_sel, Cload_sel), an FPGA, and a laptop connected over UART. The poster also includes a comparison with the state of the art.

Power and EM Side-Channel-Attack-Resilient AES-128 Core with Round-Aligned Globally-Synchronous-Locally-Asynchronous Operation Based on Tunable Replica Circuits
Sirish Oruganti*1, Meizhi Wang*1, Vishnuvardhan V Iyer1, Yipeng Wang1, Mengtian Yang1, Raghavan Kumar2, Sanu K Mathew2, Jaydeep P Kulkarni1
1University of Texas, Austin, TX; 2Intel, Hillsboro, OR
Paper No.: 16.7
Test chip summary: technology 65nm CMOS; active area 0.054mm²; package QFN56 WB; AES variant 128 bit; VDD 1V; scan chain 424 bit; unprotected: 1.602mW at CLK 62.5MHz; protected: 1.699mW at CLK 37.5MHz.
Comparison:
Metric | This work | CICC23 [6] | ISSCC22 [9] | ISSCC22 [5] | ISSCC21 [4]
Countermeasure technique | RA-GSLA with TRCs and parallel/serial ops per cycle | Noise injection by randomized clock slew | Run-time ML based | Random additive masking + address randomization | Digital signature attenuation
CMOS process technology | 65nm | 65nm | 40nm | 7nm | 65nm
Crypto-core algorithm | AES-128 (c) | AES-256 | AES-128, PRESENT | AES-128/256 | AES-256
Area overhead | 15% (a) | 11% | 93% | 120% | 52%
Power overhead | 8% | 5%-11% | 8.6% | 120% | 50%
Performance overhead | 5-40% (b)
Power MTD (gain) | 40.32M (108,973×) | 20M (1,800×) | 1.2B (120,000×) | 1B (40,000×) | 1.25B (178,000×)
EM MTD (gain) | CG: 40.32M (3,665×); FG: 40.32M (80,640×) | 20M (1,800×) | 1B (60,000×) | 1B (40,000×) | 1.25B (138,888×)
(a) Area includes TRCs, PRNG, and control blocks.
(b) Performance impact is 5% when operated with only parallel mode (exactly one operation per SMA block), and up to 40% when operated with parallel/serial/null sequencing.
(c) This chip implements KeySchedule for AES-128, but the countermeasure is compatible with all AES variants (128/192/256), thanks to the round-aligned globally-synchronous-locally-asynchronous nature of operations.
Concept: at each CLK edge the round data and key stay congruent to a synchronous AES, while the compute activity within the round runs as asynchronous operations at stochastic times, scattering side-channel information in the power/EM signature.
Key ideas and design attributes of RA-GSLA AES using TRCs:
Maintain round integrity and synchronous operation at the boundary; operate security-critical blocks asynchronously.
A lightweight TRC with randomized fire timing is used for completion detection during asynchronous computation.
Randomized sequencing of security-critical modules enables parallel/serial/null operations within one clock cycle.
Temporal dithering and dataflow shuffling apply to all computations and register updates.
Fully synthesizable, all-digital, single-supply, and technology-scaling-friendly design, without any analog components.
Compatible with any of the AES variants (128/192/256) via an external synchronous key schedule.
Flowchart of GSLA operations within a round:
1. Round start (CLK).
2. When Value_Latch is 1, latch the round sequence and data addresses.
3. Fire the active SMAs for this sequence.
4. When SMA_Done is 1, latch the SMA outputs into the temporary registers.
5. If not all 4 SMA operations are done, continue with the next sequence; otherwise latch the temporary registers into the round data registers (round done).
Complete system architecture blocks: microcontroller (TRNG, operand store) with scan chain (scan signals, DIN/OUT); Value_Latch and SMA_Fire[3:0] signals; control unit with AES round counters and round state counters; pulse generator; pseudo-random number generator; four SMAs (SubByte, MixColumn, AddRoundKey), each with a tunable replica circuit (TRC); Mask[31:0] ×4; temporary registers [256:0]; round data register [127:0]; ShiftRow; Round_Done; address decoder; reverse permutation; ciphertext output.
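To make the randomized parallel/serial/null sequencing concrete, here is a toy model of one round's schedule: the four SMA operations are shuffled, grouped into randomly sized sequences that fire in parallel on distinct SMA blocks, and the sequences run one after another within the clock period, each op carrying a random fire delay. This is not the chip's scheduler. The LCG constants, the 3-bit delay code, the one-column-per-op mapping, and every name below are assumptions, and real hardware also needs the TRC completion handshake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmaOp:
    column: int  # AES state column handled by this operation (0-3), assumed
    block: int   # SMA hardware block that executes it (0-3)
    delay: int   # hypothetical TRC fire-delay code (temporal dithering)


class Lcg32:
    """Stand-in 32-bit linear congruential PRNG. The multiplier and increment
    are common textbook constants, not the chip's."""

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next(self):
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

    def below(self, n):
        # Use the high bits: the low bits of a power-of-two LCG are weak.
        return (self.next() >> 16) % n


def shuffle(items, rng):
    """In-place Fisher-Yates shuffle driven by the PRNG."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.below(i + 1)
        items[i], items[j] = items[j], items[i]


def schedule_round(rng, n_ops=4, n_blocks=4, max_delay=8):
    """Toy schedule for one AES round (one clock period).

    Ops are shuffled (dataflow shuffling) and split into randomly sized
    sequences. Ops within a sequence fire in parallel on distinct blocks;
    sequences run serially, so a block may be reused; blocks left unused
    in a sequence model null operations."""
    columns = list(range(n_ops))
    shuffle(columns, rng)
    sequences, pos = [], 0
    while pos < n_ops:
        size = 1 + rng.below(min(n_blocks, n_ops - pos))
        blocks = list(range(n_blocks))
        shuffle(blocks, rng)
        sequences.append([SmaOp(columns[pos + i], blocks[i], rng.below(max_delay))
                          for i in range(size)])
        pos += size
    return sequences


if __name__ == "__main__":
    for seq_no, seq in enumerate(schedule_round(Lcg32(seed=2024))):
        print(seq_no, [(op.column, op.block, op.delay) for op in seq])
```

A schedule with a single four-op sequence corresponds to the pure parallel mode of footnote (b); any split into several sequences adds serial steps, which is where the up-to-40% performance cost would come from in this model.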
Further blocks: SMA_Done[3:0], round sequence, Rand[31:0], operation scheduler, plaintext, encryption key, round key [127:0], round data (128b), and permutation address encoder, forming the RA-GSLA AES with TRC-based completion detection and parallel/serial/null ops per cycle, together with the KeySchedule and a pseudo-random number generator (ReSeed; Seed(A, C) with A[11:0] and C[11:0]; output Rand_Next[31:0]).

Figure labels: 1.3mW, 1.3mW.
Demo setup (on-package interconnect, fiber termination): a co-packaged and fiber-terminated 4-channel VCSEL-based optical TX.
Hardware: electrical TX IC (12mm and 0.8mm dimension labels), VCSEL driver (VCDRV) IC, PCB, US Conec MOI 25+Gb/s, and a wire-bonded 1×4 VCSEL array.
TX block diagram: four TX channels (TX CH#1 to #4), each paired with a VCDRV channel (VCDRV CH#1 to #4). The TX IC includes pattern generators + 16:4 MUXes, 1-UI data generators + 4:1 MUXes + drivers, local clocks, a quadrature generator, and global clock distribution. Each VCDRV channel has an input stage, a complex-zero CTLE, a gain stage, and an output stage.
Measurement setup: system PCB under test, channel switch box, fiber, optical scope (DCA-M), 16GHz clock generator, DC supply, and scan.
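The TX data path (pattern generator + 16:4 MUX feeding a 1-UI data generator + 4:1 MUX + driver) can be modeled bit-accurately. This sketch assumes four quarter-rate clock phases spaced one UI apart (taken here as I, Q, Ib, Qb) and forms each lane's 1-UI window from the overlap of two adjacent phases; that gating choice, the lane ordering, and the function names are assumptions, not the chip's documented circuit.

```python
def phase_high(phase, ui):
    """Quarter-rate clock phase (0=I, 1=Q, 2=Ib, 3=Qb) with 50% duty cycle:
    phase p is high during UIs p-1 and p of every 4-UI period."""
    return (ui - phase + 1) % 4 in (0, 1)


def one_ui_window(lane, ui):
    """1-UI enable for a lane: overlap of its phase and the next one,
    which is high only in UI `lane` (mod 4)."""
    return phase_high(lane, ui) and phase_high((lane + 1) % 4, ui)


def split_16_to_4(words):
    """16:4 stage: each 16-bit word (list of bits, first-sent first) is
    spread over 4 quarter-rate lanes; bit 4*j + k goes to lane k."""
    lanes = [[], [], [], []]
    for word in words:
        for j in range(4):
            for k in range(4):
                lanes[k].append(word[4 * j + k])
    return lanes


def serialize_4to1(lanes):
    """4:1 stage: in each UI exactly one lane's 1-UI window is open, and
    that lane's current quarter-rate bit drives the output."""
    out = []
    for ui in range(4 * len(lanes[0])):
        active = [k for k in range(4) if one_ui_window(k, ui)]
        if len(active) != 1:
            raise RuntimeError("1-UI windows must not overlap")
        out.append(lanes[active[0]][ui // 4])
    return out


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    print(serialize_4to1(split_16_to_4([bits])) == bits)
```

If the 16GHz clock from the measurement setup serves as the quarter-rate clock, each lane carries 16Gb/s and the serialized output runs at 4 × 16 = 64Gb/s.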
VCDRV IC schematic (fed from the TX IC): input stage (C, R, VB1, Vterm), complex-zero CTLE (L, C, DCOC), gain stage (gm, ZL, Ro, Rd), and output stage driving the VCSEL (VCSEL bias VB2, VCathode). The VCSEL model (Cvlvl, Lvlvl, Rvlvl) has an electrical response H_VCSEL-E(s) and an optical response H_VCSEL-O(s).
Amplitude response versus frequency: the VCSEL optical response rolls off at 40 dB/decade, and the conventional CTLE is compared with the proposed CTLE H_CTLE(s). The complex-zero pair of the proposed CTLE equalizes the complex-pole pair of the VCSEL's optical response (electrical to optical).
Clocking: a coupled-resonator-based quadrature generator and a resonant-TL-based global clock distribution with 3rd-harmonic filtering feed a local resonant clock in each channel (ch#1 to ch#4), which provides the clock input (ClkI, ClkQ, ClkIb, ClkQb) to the TX serializer.
Serializer-driver: eight slices (slice #1 to #8) of 1-UI data generator + 4:1 MUX + driver, driven by a 1/4-rate 4-phase clock and 1/4-rate data, with Cpad, Cesd, and Lpeak for bandwidth extension (outputs Ytop-p and Ybot-n). Each 1-UI data-generation cell combines Data/Datab with ClkI and ClkQ (signals X, Xb, Y, Yb) ahead of an N/N driver (4X label); supply labels 0.4V, 0.85V, and 0.85V.
Multi-mode VCSEL-based co-packaged optics. Contributions of this work:
Demonstrates co-packaged and fiber-terminated 4×64Gb/s TX operation.
A new complex-zero CTLE improves data rate and energy efficiency (EE).
Low-power, low-jitter resonant clocking improves link margin and EE.
The serializer-driver architecture improves eye symmetry and link EE.
Optical I/O, VCSEL-based optical interconnects: a reach chart from 1cm to 100m spans module/package, intra-board, intra-rack, and inter-rack links, covered by electrical links, multi-mode optics (e.g., VCSEL), and single-mode optics, with 200Gb/s-per-lane and 200Gb/s-per-fiber annotations.
Electrical: limited reach at higher data rates due to channel loss.
Multi-mode optics (e.g., VCSEL): high-bandwidth connectivity, reach of tens of meters, energy efficient.
Single-mode optics: high power consumption.
On-package optical I/O extends reach: co-packaging requires only a short electrical I/O between the XPU/switch core and the optics, giving low channel loss, high data rate, high energy efficiency, and low latency. Co-packaging also helps the VCSEL's thermal reliability (compared with monolithic integration).
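The VCDRV IC's central equalization claim, that the proposed CTLE's complex-zero pair equalizes the complex-pole pair of the VCSEL's optical response, can be checked numerically. This minimal sketch assumes an idealized second-order VCSEL model H_VCSEL-O(s) = ω0²/(s² + (ω0/Q)s + ω0²), a CTLE whose zeros match those poles exactly, and two real CTLE poles; f0, Q, and the pole frequencies are hypothetical values, not measured chip parameters.

```python
import math


def vcsel_optical(f, f0, q):
    """Assumed 2nd-order VCSEL optical response: complex-pole pair at f0."""
    s = 2j * math.pi * f
    w0 = 2 * math.pi * f0
    return w0 ** 2 / (s ** 2 + (w0 / q) * s + w0 ** 2)


def complex_zero_ctle(f, f0, q, fp1, fp2):
    """CTLE with a complex-zero pair placed on the VCSEL poles (ideal match)
    and two real poles at fp1 and fp2; unity gain at DC."""
    s = 2j * math.pi * f
    w0 = 2 * math.pi * f0
    zeros = (s ** 2 + (w0 / q) * s + w0 ** 2) / w0 ** 2
    poles = (1 + s / (2 * math.pi * fp1)) * (1 + s / (2 * math.pi * fp2))
    return zeros / poles


def db(h):
    """Magnitude of a complex gain in dB."""
    return 20 * math.log10(abs(h))


if __name__ == "__main__":
    f0, q, fp = 15e9, 1.2, 40e9  # hypothetical VCSEL and CTLE parameters
    for f in (1e9, 15e9, 40e9):
        h_v = vcsel_optical(f, f0, q)
        h_eq = h_v * complex_zero_ctle(f, f0, q, fp, fp)
        print(f"{f / 1e9:5.0f} GHz  VCSEL {db(h_v):7.2f} dB  equalized {db(h_eq):7.2f} dB")
```

With these numbers the bare VCSEL model is about 16 dB down at 40GHz, while the equalized cascade is about 6 dB down, limited only by the CTLE's two real poles; with an imperfect zero placement the cancellation would leave residual peaking or droop.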