HC2022.Yale.Bhattacharjee.v5.pdf (37 pages)
HALO: A Flexible and Low Power Processing Fabric for Brain-Computer Interfaces

Abhishek Bhattacharjee, Computer Science
Rajit Manohar, Electrical Engineering
Yale University

Ioannis Karageorgos, Karthik Sriram, Ján Veselý, Michael Wu, Xiayuan Wen, Nick Lindsay, Lenny Khazan

How are implantable brain-computer interfaces implemented?
Sensors, ADCs, Logic, Radio, Stimulators, DACs

Implantable brain-computer interfaces trade off processing, power, real-time processing, and flexibility
- The FDA warns against overheating cellular tissue beyond 1 °C: 15-40 mW
- DARPA NESD targets 100s of Mbps to 10s of Gbps to read/stimulate biological neurons
- Responses within 10s of milliseconds to treat epilepsy or movement disorders
- Flexibility for new computational methods, use cases, for personalization, and to build standards for a wider computational stack

[Table: comparison of existing systems with HALO. Per-system check marks and column alignment did not survive text extraction; values are listed in slide order.]
Systems: Medtronic, Neuropace, Aziz et al., Chen et al., Kassiri et al., NURIP, HALO
Tasks: Spike Detection, Compression, Seizure Prediction, Movement Intent, Encryption
Features:
- Programmable: Limited, Limited, Limited
- Read bandwidth: 10 Kbps, 20 Kbps, 10 Mbps, 8 Kbps, 4 Mbps, 46 Mbps
- Stimulation bandwidth: 10 Kbps, 20 Kbps, 8 Mbps
- Safety (15 mW)
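As a rough worked example of where a read bandwidth such as the 46 Mbps listed in the comparison table can come from, the sketch below computes a raw ADC data rate. The channel count, sampling rate, and sample width are assumptions chosen for illustration; the slides do not state them.

```python
# Assumed recording parameters (illustrative only; not stated on the slides).
CHANNELS = 96            # electrodes in the array
SAMPLE_RATE_HZ = 30_000  # samples per second per channel
BITS_PER_SAMPLE = 16     # ADC resolution


def raw_data_rate_mbps(channels, sample_rate_hz, bits_per_sample):
    """Raw ADC output rate in megabits per second (1 Mbps = 1e6 bit/s)."""
    return channels * sample_rate_hz * bits_per_sample / 1e6


rate = raw_data_rate_mbps(CHANNELS, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(f"{rate:.2f} Mbps")  # prints "46.08 Mbps"
```

Under these assumptions, every recorded bit has to be processed or transmitted within the 15 mW safety budget, which is why the processing fabric's power efficiency matters.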
Identifying computational capabilities
- Important computational methods for both clinical and research use
- Support for reading and stimulation of biological neurons
- Supported computational kernels representative of methods used across brain regions and depths
- Some computational kernels need to meet real-time processing needs
- Support for emerging algorithms and computational methods
- Support for parameter tuning to personalize algorithms to the subject

Identifying a standard set of computational capabilities
- Widely-used algorithms amenable to specialization: Compression, Movement Intent, Seizure Treatment, Spike Detection, Encryption
- Miscellaneous algorithms: RISC-V controller (2-stage, in-order, 32-bit modified ibex, RV32E)

Building monolithic ASICs
- Compression: LZ4, LZMA, DWTMA
- Movement Intent
- Seizure Treatment
- Spike Detection: DWT, NEO
- Encryption: AES
- RISC-V controller: 2-stage, in-order, 32-bit modified ibex (RV32E)

LZ4 compression
Baseline: monolithic ASIC. HALO: processing elements (PEs) LZ and LIC.
- Baseline monolithic ASIC: 233 MHz, 15 mW
- HALO PEs: 129 MHz, 3 mW; 23 MHz, 0.4 mW
- 28nm FD-SOI CMOS process, physical synthesis flow with standard cells from STMicroelectronics
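The LZ4 figures above can be compared with a few lines of arithmetic. This is a minimal sketch that assumes the first frequency/power pair on the slide is the monolithic baseline and that the remaining pairs belong to the LZ and LIC PEs in the order listed. It ignores interconnect and controller power, which the slide does not break out.

```python
# Figures from the LZ4 slide (28nm physical synthesis estimates).
# Assumption: first pair = monolithic baseline, remaining pairs = HALO PEs
# (LZ, then LIC) in slide order.
monolithic = {"freq_mhz": 233, "power_mw": 15.0}
pes = [
    {"name": "LZ", "freq_mhz": 129, "power_mw": 3.0},
    {"name": "LIC", "freq_mhz": 23, "power_mw": 0.4},
]


def total_power_mw(elements):
    """Sum the power of a list of processing elements, in milliwatts."""
    return sum(e["power_mw"] for e in elements)


pe_power = total_power_mw(pes)
savings = 1 - pe_power / monolithic["power_mw"]
print(f"PE total: {pe_power:.1f} mW, reduction vs. monolithic: {savings:.0%}")
```

The comparison shows the effect of clocking each PE only as fast as its kernel requires, rather than running the whole pipeline at the rate of its fastest stage.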
LZMA compression
Baseline: monolithic ASIC. HALO: processing elements LZ, LIC, MA, RC.

function LZMA_COMPRESS_BLOCK(input)
    output = list(lzma header)
    while data = input.get() do
        best_match = find_best_match(data)
        match_prob = count(match_table, best_match) / count_total(match_table)
        r1 = range_encode(match_prob)
        output.push(r1)
        increment_counter(match_table, best_match)
    end while
    return output

- Baseline monolithic ASIC: 233 MHz, 22 mW
- HALO PEs: 129 MHz, 3 mW; 92 MHz, 3 mW; 90 MHz, 0.8 mW

HALO processing elements, built up task by task (baseline: one monolithic ASIC per task)
- DWTMA (compression): PE set becomes LZ, LIC, MA, RC, DWT
- Threshold NEO (spike detection): adds THR and NEO
- Threshold DWT (spike detection): uses THR and DWT, already present
- Movement intent: adds FFT
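To make the Threshold NEO spike-detection pipeline concrete, here is a minimal software sketch of a nonlinear energy operator (NEO) followed by a threshold stage. The threshold rule (a multiple of the mean NEO energy) and the `scale` parameter are assumptions for illustration; the slides do not specify how the THR PE is configured.

```python
def neo(signal):
    """Nonlinear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]


def detect_spikes(signal, scale=8.0):
    """Return sample indices whose NEO energy exceeds scale * mean energy.

    `scale` is a hypothetical tuning parameter, not a value from the slides.
    """
    if len(signal) < 3:
        return []  # NEO needs one neighbour on each side
    energy = neo(signal)
    threshold = scale * sum(energy) / len(energy)
    # energy[i] corresponds to signal[i + 1] because NEO skips the endpoints.
    return [i + 1 for i, e in enumerate(energy) if e > threshold]


# Synthetic trace: low-amplitude background with one sharp spike at index 10.
trace = [1, -1] * 5 + [20] + [-1, 1] * 5
print(detect_spikes(trace))  # prints [10]
```

NEO needs only two multiplications and one subtraction per sample, which fits the low PE clock rates reported on the slides.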
- Seizure treatment: adds SVM, XCOR, and BBF
  - FFT sizes: movement intent 1024 points; seizure treatment 25 points
  - Baseline monolithic ASIC: 160 MHz, 19 mW
  - HALO PEs: 16 MHz, 1 mW; 6 MHz, 0.1 mW; 6 MHz, 0.1 mW; 3 MHz, 0.2 mW
- Encryption: adds AES

Monolithic ASICs vs. HALO processing elements
28nm FD-SOI CMOS process, physical synthesis flow with standard cells from STMicroelectronics
- Tasks shown: LZ4, LZMA, DWTMA, DWT, NEO, Movement Intent, Encryption (AES)
- Monolithic ASICs: 233 MHz, 15 mW; 233 MHz, 22 mW; 195 MHz, 18 mW; 6 MHz, 2 mW; 6 MHz, 2 mW; 16 MHz, 3 mW; 5 MHz, 11 mW
- HALO PEs: 129 MHz, 3 mW; 23 MHz, 0.4 mW; 92 MHz, 3 mW; 90 MHz, 0.8 mW; 3 MHz, 0.006 mW; 3 MHz, 0.015 mW; 16 MHz, 0.001 mW; 5 MHz, 0.11 mW

HALO: Processing Elements
Chip tape-out in 12nm CMOS process
- PEs: LZ, LIC, MA, RC, DWT, NEO, SVM, XCOR, BBF, THR, FFT, AES
- Waiting for vendor to package chip for measurement results; physical synthesis results shown
- 155 MHz, 2.5 mW; 36 MHz, 0.05 mW; 14 MHz, 0.02 mW; 25 MHz, 0.10 mW; 180 MHz, 4.33 mW; 60 MHz, 0.45 mW; 3.5 MHz, 0.04 mW

Summary of the HALO approach
- Break each computational task into individual kernels
- Instead of a monolithic ASIC, build a hardware PE per kernel
- Clock each PE at no more than its necessary frequency
- Avoid overly fine-grained PEs to reduce communication
- Avoid overly coarse-grained PEs to facilitate sharing, reuse, and lower clock speed

Designing a module
- Computation needs are still being investigated by neuroscience researchers
- For rapid prototyping, we used a high-level synthesis (HLS) flow
- HLS structure
  - Standardized parameter settings
  - "config" interface for controller
  - Elastic I/O interface from HLS tools
- HLS optimizations
  - Fixed-point vs. floating-point
  - Choice of loop pipelining
  - Re-structuring input to make it more "HLS-friendly"

Interconnect design
- Current implementation
  - PE frequency to/from interconnect frequency adaptor
  - Standard synchronizer structure for interface to interconnect
  - Interconnect frequency selected to support "full throughput" (diagram labels: LZ, LIC)
  - Similar configuration interface to set configuration bits for switches

Handling bursty data
- Flow of data tokens is bursty and data-dependent
- Example: compression produces a variable number of output data tokens
- Each component has a peak token consumption rate (set by its frequency)
- FIFOs needed at some interfaces to buffer data tokens
- FIFOs sized based on frequency of PEs + worst-case data patterns

Management and configuration interface
- Each element of the architecture exports a standardized "config" port
  - Parameter settings
  - Pipeline configuration
  - Reading debugging information from PE
- Config module added to RISC-V core, under software control

- Evaluations using neuronal recordings of a non-human primate's motor cortex collected by the Borton Lab at Brown
- More recent evaluations using recordings from human patients with epilepsy collected by the Yale Epilepsy Research Center
- 28nm FD-SOI CMOS estimates
  - Results for worst-case variation corner at VDD_MAX, Tr_FF, RC_BEST, at VDD of 1 V
  - Standard cell and macro libraries characterized or interpolated to 40 °C
- Total power budget of 15 mW, with 2 mW devoted to ADCs, amplifiers, and radio

- Also selected for inclusion in IEEE Micro's Top Picks in Computer Architecture, article titled: "Balancing Specialized Versus Flexible Computation in Brain-Computer Interfaces"
- Our focus is on more complete tape-outs, designing an asynchronous vector processor, building support for long-term storage, and distributed BCI scenarios
- Also exploring potential in-vivo tests with swine with collaborators at Yale's Epilepsy Research Center

Karthik Sriram, Xiayuan Wen, Zach Taylor, Oliver Ye, Raghavendra Pothukuchi, Anurag Khandelwal, Hitten Zaveri, Dennis Spencer, Michal Gerasimiuk
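Returning to the FIFO sizing rule under "Handling bursty data" (FIFOs sized from PE frequencies plus worst-case data patterns), the sketch below applies a common back-of-the-envelope estimate. It assumes each side moves one token per clock cycle, that the consumer drains continuously while the producer bursts, and a hypothetical worst-case burst length. None of these parameters come from the slides, and the function name is illustrative.

```python
def fifo_depth(burst_tokens, producer_hz, consumer_hz):
    """Estimate FIFO depth needed to absorb one worst-case burst.

    Assumes one token per cycle on each side. While the producer emits
    `burst_tokens` tokens, the consumer removes a fraction
    consumer_hz / producer_hz of them; the remainder must be buffered.
    """
    if consumer_hz >= producer_hz:
        return 1  # consumer keeps up; one slot assumed for synchronization
    backlog = burst_tokens * (producer_hz - consumer_hz)
    return -(-backlog // producer_hz)  # integer ceiling division


# Hypothetical case: a 129 MHz producer bursting 64 tokens into a 23 MHz consumer.
print(fifo_depth(64, 129_000_000, 23_000_000))  # prints 53
```

In practice the worst-case burst length is data-dependent, as the slide notes for compression output, so it would have to be measured on representative recordings.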