AMD Versal AI Edge Series Gen 2 for Vision and Automotive
Tomai Knopp and Jeffrey Chu
Co-author: Sagheer Ahmad
Hot Chips 2024 | August 2024

Agenda
- Challenges and Motivation
- Silicon Features
- Vision Applications

AI Driven Embedded Processing Phases
- Preprocessing: Sensor Processing & Control, Data Conditioning
- AI Inference: Perception, Analytics
- Postprocessing: Decision Making, Control, Feedback
Diagram labels: Camera, Radar, LiDAR, Other Sensors; Sensor Processing, Sensor Fusion, Data Conditioning; Perception, Analytics; HMI, Control, Decision Making.

Superior Integration to Reduce System Power, Area & Complexity
- Multi-Chip AI-Driven Embedded System: Safety MCU, High Perf. Embedded CPU, AI Accelerator, Sensor Processing, DDR (x8)
- Single-Chip System with Versal AI Edge Series Gen 2: Versal AI Edge Gen 2 Device, DDR (x2)
Constraints:
- Limited Power Availability
- Security, Safety & Reliability
- Tight Form Factor Requirements
- Challenging Environments
- Real-Time Response
- Long Life Cycles

AMD Versal AI Edge Series Gen 2 Overview
- Next-Generation AI Engines for Efficient AI Inference: Up to 3X TOPs/Watt*
- High-Performance Integrated CPUs for Postprocessing: Up to 10X Scalar Compute*
- Enhanced Safety & Security: Support for ASIL D, SIL 3
- World-Class Programmable Logic for Flexible, Real-Time Preprocessing: Sensor Fusion, Data Conditioning
- New Hard Image/Video Processing
- User Selectable System Memory Interfaces for Data Intensive Accesses: Up to 170 GBytes/sec Bandwidth
Block diagram (2VE3858, 290 Mbit on-die memory, 36B transistors):
- AI Engines (AIE-ML v2)
- Processing System (PS): 8x Arm Cortex-A78AE Application Processor, 10x Arm Cortex-R52 Real-Time Processor, Platform Management Controller, Application Security Unit, Arm Mali G78AE GPU, PS PCIe, USB 3.2, PS 10 GbE
- Image Signal Processor, Video Codec Unit, Video Processing Pipeline
- DDR5, LPDDR5X
- Programmable Logic (PL): LUTs, DSP Engines, Block RAM, UltraRAM
- 100G Multirate Ethernet Cores, PCIe Gen5 (PL PCIE5), 32G Transceivers
- Programmable I/O: GPIO, LVDS, MIPI
- Programmable Network on Chip
*Pre-Silicon Estimated Performance. See Endnotes VER-023, VER-027.

AMD Versal AI Edge Series Gen 2 Product Table
End-to-end Acceleration for AI-driven Embedded Systems

| | 2VE3304 | 2VE3358 | 2VE3504 | 2VE3558 | 2VE3804 | 2VE3858 |
|---|---|---|---|---|---|---|
| AIE-ML v2 Tiles | 24 | 24 | 96 | 96 | 144 | 144 |
| Max. Dense INT8 TOPs | 31 | 31 | 123 | 123 | 184 | 184 |
| # of APU Cores (Arm Cortex-A78AE) | 4 | 8 | 4 | 8 | 4 | 8 |
| # of RPU Cores (Arm Cortex-R52) | 4 | 10 | 4 | 10 | 4 | 10 |
| LUT6 | 94k | 94k | 225k | 225k | 543k | 543k |
| DSP | 184 | 184 | 700 | 700 | 2,064 | 2,064 |
| 32b Memory Controllers | 3 | 3 | 4 | 4 | 5 | 5 |
| Max. Memory Bandwidth (GB/s) | 102 | 102 | 136 | 136 | 170 | 170 |
| PL PCIe (Gen4/5 x4) | 1 | 1 | 3 | 3 | 4 | 4 |
| MRMAC (10/25/100GE) | 1 | 1 | 1 | 1 | 3 | 3 |
| GPU (4k60, 200 GFLOPs+) | 1 | 1 | 1 | 1 | 1 | 1 |
| Video Codec Unit (VCU) Tiles | - | 1 | - | 1 | - | 1 |
| Image Signal Processor (ISP) Tiles | - | 1 | - | 3 | - | 3 |
| PS-Facing GTYP / PL-Facing GTYP | 4/4 | 4/4 | 4/12 | 4/12 | 4/20 | 4/20 |
| X5IO / HDIO [1] | 260/44 | 260/44 | 384/88 | 384/88 | 512/44 | 512/44 |

Video Processing Pipeline (VPP) Tiles: -, 1, - (row incomplete in the source).
Packages: A1089 (27 mm x 27 mm), A1444 (31 mm x 31 mm), A2112 (37.5 mm x 37.5 mm).
Row groups on the slide: Compute, Hard IP, IO, Package. One device is marked as the Lead Device.
[1] Maximum X5IO and HDIO counts may not be available in same package.

AI Engines: AIE-ML v2
Diagram labels: AIE-ML Array Interface (PL & NoC Interface Tiles); AI Engine-ML compute tiles, each with Local Mem.; Memory Tiles; Control Tiles.

TOPs in AMD Versal AI Edge Series Gen 2 Devices [1]
Subset of supported data types; values assume highest speed grade.

| Data type | 2VE3858 | 2VE3558 | 2VE3358 | Unit |
|---|---|---|---|---|
| MX6 | 369 | 246 | 61 | TFLOPS |
| INT8 (sparse) | 369 | 246 | 61 | TOPS |
| INT8 (dense) | 184 | 123 | 31 | TOPS |
| FP8 / MX9 | 184 | 123 | 31 | TFLOPS |
| FP16 / BF16 | 92 | 61 | 15 | TFLOPS |
| INT16 (sparse) | 92 | 61 | 15 | TOPS |
| INT16 (dense) | 46 | 31 | 8 | TOPS |

[1] Pre-Silicon Estimated Performance vs. Previous Generation. See Endnotes VER-025, VER-026.
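As a cross-check of the two tables above, the short sketch below divides the quoted memory bandwidth by the number of 32-bit controllers and compares each data type's 2VE3858 throughput with dense INT8. All inputs are copied from the slides; the per-pin transfer rate it prints is only what those figures imply, since the deck does not name a memory speed grade.

```python
# Figures copied from the product table and the TOPs table in this deck.
controllers = {"2VE3358": 3, "2VE3558": 4, "2VE3858": 5}          # 32-bit memory controllers
bandwidth_gbs = {"2VE3358": 102, "2VE3558": 136, "2VE3858": 170}  # max memory bandwidth, GB/s

BYTES_PER_TRANSFER = 32 // 8  # a 32-bit controller moves 4 bytes per transfer

per_controller = {}
for device, count in controllers.items():
    per_controller[device] = bandwidth_gbs[device] / count
    # Implied transfer rate (an inference from the table, not a quoted spec).
    implied_gtps = per_controller[device] / BYTES_PER_TRANSFER
    print(f"{device}: {per_controller[device]:.1f} GB/s per controller, "
          f"about {implied_gtps:.2f} GT/s implied")

# 2VE3858 column of the TOPs table.
tops_2ve3858 = {
    "MX6": 369, "INT8 sparse": 369, "INT8 dense": 184, "FP8/MX9": 184,
    "FP16/BF16": 92, "INT16 sparse": 92, "INT16 dense": 46,
}
ratio_to_int8 = {k: v / tops_2ve3858["INT8 dense"] for k, v in tops_2ve3858.items()}
for name, ratio in ratio_to_int8.items():
    print(f"{name}: {ratio:.2f}x dense INT8")
```

Every device works out to 34 GB/s per controller, and the data types fall on power-of-two steps relative to dense INT8 (2x, 1x, 0.5x, 0.25x) within rounding of the published values.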
AI Engine Architecture Features

| | AMD Versal AI Edge Series | AMD Versal AI Edge Series Gen 2 | Change |
|---|---|---|---|
| AIE Version | AIE-ML | AIE-ML v2 | |
| Compute (Mults/Tile): INT8 | 256 | 512 | 2x |
| Compute (Mults/Tile): BFLOAT16 | 128 | 256 | 2x |
| Compute (Mults/Tile): FP8 | N/A | 512 | New |
| Compute (Mults/Tile): FP16 | N/A | 256 | New |
| Compute (Mults/Tile): MX6* | N/A | 1024 | New |
| Compute (Mults/Tile): MX9* | N/A | 512 | New |
| Compression and Sparsity | Yes | Yes | |
| AIE Array Interconnect B/W | 1x (32b) | 2x (64b) | 2x |
| Tile Local Data Memory | 64 KB | 64 KB | |
| Memory Tile | AIE Memory Tile (512 KB/tile) | AIE Memory Tile (512 KB/tile) | |
| AIE Controller | Programmable Logic based | Hardened MicroBlaze per column | |

*MX6 and MX9 datatypes are defined in https://arxiv.org/pdf/2302.08007; see Table II there.
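The per-tile multiplier counts above line up with the device-level TOPs figures quoted in this deck if each multiply-accumulate is counted as two operations. The sketch below shows that arithmetic. The 1.25 GHz clock is an assumption inferred by working backwards from the published numbers; the slides only say the values assume the highest speed grade, so treat the clock as illustrative rather than as a specification.

```python
# Multipliers per AIE-ML v2 tile, from the architecture feature table.
MULTS_PER_TILE = {
    "INT8": 512, "FP8": 512, "MX9": 512,
    "BFLOAT16": 256, "FP16": 256, "MX6": 1024,
}
# AIE-ML v2 tile counts, from the product table.
TILES = {"2VE3358": 24, "2VE3558": 96, "2VE3858": 144}

# ASSUMPTION: not stated in the slides; chosen because it reproduces the table.
ASSUMED_CLOCK_HZ = 1.25e9
OPS_PER_MAC = 2  # one multiply plus one accumulate


def peak_tera_ops(tiles: int, mults_per_tile: int,
                  clock_hz: float = ASSUMED_CLOCK_HZ) -> float:
    """Dense peak throughput in tera-operations per second."""
    return tiles * mults_per_tile * OPS_PER_MAC * clock_hz / 1e12


for device, tiles in TILES.items():
    row = ", ".join(
        f"{dtype}={peak_tera_ops(tiles, mults):.1f}"
        for dtype, mults in MULTS_PER_TILE.items()
    )
    print(f"{device}: {row}")
```

Under that assumed clock, rounding the results gives 31, 123, and 184 for dense INT8 and 61, 246, and 369 for MX6, matching the published table. The sparse rows in the deck are twice their dense counterparts, which this dense-only model does not cover.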
MX9 and MX6 Datatypes*
*See Endnotes VER-69 and VER-70.

Processing System Overview
- Application Processing Unit (APU)
  - Arm Cortex-A78AE cores
  - Up to 2.2 GHz max frequency per core [1]
  - Up to 200.3k DMIPs in APU [2]
- Real-Time Processing Unit (RPU)
  - Arm Cortex-R52 cores
  - Up to 1.05 GHz max frequency per core [1]
  - Up to 28.5k DMIPs in RPU [2]
- Graphics Processing Unit (GPU)
  - Arm Mali G78AE GPU
  - Up to 1.05 GHz max frequency
  - Up to 268 GFLOPs
Diagram labels: Processing System; Low Power Domain; Full Power Domain; Application Security Unit; Platform Management Controller; Boot I/O; LPD I/O; Application Processing Unit (Cortex-A78AE x8); Real-Time Processing Unit (Cortex-R52 cores); GPU; USB 3.2; 10GbE; PCIe Gen5 x4; DP/eDP; Display Controller.
[1] Pre-silicon estimated performance vs. prior generation. See Endnotes VER-027, VER-030.
Source: AMD internal data, February 2024.

Embedded System Security
- Application Security Unit (ASU): Provides run-time HSM security (encryption/authentication/key management)
- Platform Management Controller (PMC): Manages device-level security services (Secure Boot, HWRoT, Physical Attack Protection, etc.)
- Memory Controller Inline Crypto (ILC): Built-in inline encryption within DDR5/LPDDR5X memory controllers (AES-XTS or AES-GCM)
Diagram labels:
- Platform Management Controller: IO, IOU, RCU, PPU, PMC Main Switch (AXI), Debug, Security, Analog, System, PL Interfaces, Battery Domain
- Memory path: Programmable NoC, XMPU, QoS/Arbiter, Crypto, LPDDR5/DDR5 Controller (DDRMC), External DDR Memory, Direct
- Application Security Unit: Platform Security, PMC, PL, ASU Switch, ECDSA/RSA, TRNG, Soft Crypto Core, DMA (x2), Secure Stream Interconnect, MicroBlaze V, Key Management, AES, SHA-2, SHA-3, Processors, I/O, PS Switch

Image and Video Processing IP

Functional Safety
Standards:
- ISO 13849: Machine Safety
- IEC 61508: Functional Safety
- ISO 26262: Automotive Safety
Integrity levels shown on the device block diagram:
- ASIL D/SIL3 Systematic Fault Integrity, including Application/Real Time/Video and Acceleration Channels
- QM Systematic & Random Fault Integrity
- Video/Acceleration channel: ASIL B/SIL2 Random Hardware Fault Integrity
- Application/Real Time channels: ASIL D/SIL3 Random Hardware Fault Integrity
Blocks in the diagram: AI Engines; Processing System (Application Processing Unit, Real-Time Processing Unit, Platform Management Controller, Processor Interfaces, Security, GPU); DDR5, LPDDR5X; Image Signal Processor; Video Encode/Decode; Video Processing; Programmable Logic; 100G Ethernet; PCIe Gen5; Serial Transceivers; Programmable I/O; Programmable Network on Chip.
Automotive AI and Vision Applications
Why Adaptable SoCs:
- Enables Differentiation: Ideal for emerging applications (OMS, LiDAR, 4D Imaging Radar)
- Futureproofing: Changes in Sensors; Transition from CV to AI
- Low Latency Processing: Parallelization and/or Isolation of critical processing pipelines
Application areas:
- DMS/OMS: Face recognition, Eye gaze, Pose estimation, Hand gesture, Health monitoring
- Exterior Imaging: Surround view monitoring, Image enhancement, Object detection, Perception
- Productivity: Smart Assistant, Video conferencing, Face detection/tracking, Background blur/replacement

AI and Vision Processing Pipeline
Diagram labels: Sensor Data Set (one per Sensor Period); Processing Period split into Preprocessing, AI Inference (AI Models), and Postprocessing; outputs Perception and Decision Making.

Sensor Preprocessing
Programmable Logic and IO enable wide range of user adaptability in hardware.

| Function | Adaptive SoC | Other SoCs or ASICs |
|---|---|---|
| Configurable Physical Interfaces | HW/SW Customizable | Fixed |
| Low Latency Control & Synchronization | HW/SW Customizable | Fixed or SW |
| Vision Pipeline | HW/SW Customizable | Fixed |
| Sensor Fusion | HW/SW Customizable | SW Only |

Diagram labels: Sensor; High Speed IOs; IOs; CPHY; DPHY; Programmable Logic (Data Routing, Conditioning, Extraction; Real-time Sensor Control and Updates; Custom Vision Processing); Hard ISP; GPU; Preprocessing.

Inference - Spatial Sharing
Run models concurrently within the AIE-ML array.
Diagram labels: AIE-ML v2 Array (AIE Control, Compute Tiles, Memory Tiles); Programmable NoC; Model A and Model B both process Frame Period "N" and produce Results.

Inference - Temporal Sharing
- Alternatively, multiple AI model workloads can timeshare the AIE-ML array
- Context switching between multiple AI models
- Enables prioritization of the order of AI model results for postprocessing
Diagram labels: over time, the AIE-ML v2 Array (AIE Control, Compute Tiles, Memory Tiles, Configurable NoC) runs Model A on Frame Period "N" to produce Model A Results, then Model B on Frame Period "N" to produce Model B Results.

Postprocessing
- ASIL-B: Sensor Control/2A, In-vehicle Communication
- ASIL-B: Perception, Localization, and Planning
- ASIL-D: Safety Critical Decision Making
Processing cluster resources are configurable based on application-specific needs.
Diagram labels: four A78-AE Clusters, each with two A78-AE cores, a private L2 per core, and a shared L3; Coherent Mesh Network; System Level Cache (x4); Programmable NoC; DDR (x4).

Automated Parking
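The spatial and temporal sharing modes described on the inference slides can be compared with a toy timing model. The sketch below is illustrative only: the `Model` class, the latency numbers, the context-switch cost, and the assumption that latency scales inversely with the fraction of the array a model receives are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    full_array_ms: float  # hypothetical latency when using the whole array
    priority: int         # lower value runs first under temporal sharing


def spatial_sharing(models, shares):
    """All models start together, each on its own fraction of the array.

    ASSUMPTION: latency scales inversely with the share of tiles.
    """
    return {m.name: m.full_array_ms / shares[m.name] for m in models}


def temporal_sharing(models, context_switch_ms=0.0):
    """Models run back to back on the whole array, in priority order."""
    finish, now = {}, 0.0
    for index, m in enumerate(sorted(models, key=lambda m: m.priority)):
        if index > 0:
            now += context_switch_ms  # cost of swapping models on the array
        now += m.full_array_ms
        finish[m.name] = now
    return finish


models = [Model("A", full_array_ms=4.0, priority=0),
          Model("B", full_array_ms=6.0, priority=1)]

spatial = spatial_sharing(models, shares={"A": 0.5, "B": 0.5})
temporal = temporal_sharing(models, context_switch_ms=0.5)
print("spatial :", spatial)
print("temporal:", temporal)
```

With these made-up numbers, temporal sharing delivers the higher-priority result sooner (4 ms instead of 8 ms), while both schemes finish the full workload at a similar time. That is the kind of result ordering the temporal-sharing slide refers to.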
AMD Versal AI Edge Series Gen 2: Adaptable and Scalable for Vision and Automotive
- Optimized for User Configurability in Vision and Automotive Applications: Customize for user application needs, considering total compute, custom processing, and functional safety (Level 2, Level 3, Level 4, Level 5).
- End-to-end acceleration of AI-driven embedded systems from a single chip: Highly integrated device with hardened compute accelerators, programmable logic, and built-in high reliability and safety.
- Single Architecture Enabling Cost Efficient to High Performance Processing: Design once and scale performance needs with the same tools, SW, ecosystem, and safety certification.

Endnotes

VER-023: Based on AMD internal pre-silicon performance estimates and power projections for the AIE-ML v2 compute tile architecture featured in the Versal AI Edge series Gen 2 using the MX6 data type, compared to production performance specifications and AMD Power Design Manager power estimates for the AIE-ML compute tile architecture featured in the first generation Versal AI Edge series using the INT8 data type. Operating conditions: 1 GHz Fmax, 0.7V AIE operating voltage, 85C junction temperature, typical process, 60% vector load, %activations=0 10%. Actual performance will vary when final products are released in market. Performance projections as of February 2024.

VER-025: Based on AMD internal pre-silicon performance estimates and power projections for the AIE-ML v2 compute tile architecture featured in the Versal AI Edge series Gen 2 for the MX6 data type compared to the INT8 data type. Operating conditions: 1 GHz Fmax, 0.7V AIE operating voltage, 85C junction temperature, typical process, 60% vector load, %activations=0 10%. Actual performance will vary when final products are released in market.

VER-026: Based on AMD internal pre-silicon performance estimates for the AIE-ML v2 compute tile architecture featured in the Versal AI Edge series Gen 2 using the MX6 data type compared to the INT8 data type. Operating conditions: 1 GHz Fmax, 0.7V AIE operating voltage. Actual performance will vary when final products are released in market.

VER-027: Based on AMD internal pre-silicon performance estimates for combined total DMIPs of the Versal AI Edge series Gen 2 and Versal Prime series Gen 2 processing system when configured with 8 Arm Cortex-A78AE application cores at 2.2 GHz and 10 Arm Cortex-R52 real-time cores at 1.05 GHz, compared to the published combined total DMIPs of the processing system in the first generation Versal AI Edge series and Versal Prime series. Versal AI Edge series Gen 2 and Prime series Gen 2 operating conditions: highest available speed grade, 0.88V PS operating voltage, split-mode operation, maximum supported operating frequency. First generation Versal AI Edge series and Prime series operating conditions: highest available speed grade, 0.88V PS operating voltage, maximum supported operating frequency. Actual DMIPs performance will vary when final products are released in market.

VER-030: Based on Arm product specifications for a Versal AI Edge series Gen 2 and a Versal Prime series Gen 2 configured with a 4-core Mali-G78AE GPU with a maximum operating frequency of 1050 MHz, 64 FP32 ops/clock/core, and 4 texels/clock/core. Actual Versal product performance will vary when final products are released in market.

VER-69: Based on AMD internal accuracy testing in July 2024, AMD evaluated the MX6 datatype as a drop-in replacement across a subset of market-relevant FP32 AI models (25/44) in various categories (speech, automotive, CNN and vision). Configuration for internal accuracy testing: AMD EPYC 73F3 CPU with 8x AMD Instinct MI250 GPUs. OS ver: Ubuntu-20.04. ML Env: PyTorch v2.1.2, ROCm 5.6.0.50600-7620.04. Results may vary and are based on several factors, including design, device, configuration, AI model, and ML software.

VER-70: Based on AMD internal accuracy testing in July 2024, AMD evaluated the MX9 datatype as a drop-in replacement across a subset of market-relevant FP32 AI models (35/44) in various categories (speech, automotive, CNN and vision). Configuration for internal accuracy testing: AMD EPYC 73F3 CPU with 8x AMD Instinct MI250 GPUs. OS ver: Ubuntu-20.04. ML Env: PyTorch v2.1.2, ROCm 5.6.0.50600-7620.04. Results may vary and are based on several factors, including design, device, configuration, AI model, and ML software.
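The inputs listed in endnotes VER-027 and VER-030 are enough to reproduce the GPU figure quoted on the Processing System slide and to express the DMIPs numbers per core and per MHz. The sketch below does that arithmetic. The per-MHz values are derived here for illustration and are not figures stated anywhere in the deck.

```python
# GPU inputs from endnote VER-030.
GPU_CORES = 4
FP32_OPS_PER_CLOCK_PER_CORE = 64
GPU_CLOCK_MHZ = 1050

gpu_gflops = GPU_CORES * FP32_OPS_PER_CLOCK_PER_CORE * GPU_CLOCK_MHZ / 1000
print(f"GPU peak: {gpu_gflops:.1f} GFLOPS (slide quotes up to 268 GFLOPs)")

# CPU inputs from endnote VER-027 and the Processing System Overview slide.
APU_DMIPS, APU_CORES, APU_CLOCK_MHZ = 200_300, 8, 2200
RPU_DMIPS, RPU_CORES, RPU_CLOCK_MHZ = 28_500, 10, 1050


def dmips_per_mhz_per_core(total_dmips: float, cores: int, clock_mhz: float) -> float:
    """Derived efficiency figure: total DMIPs spread over cores and clock."""
    return total_dmips / (cores * clock_mhz)


apu_eff = dmips_per_mhz_per_core(APU_DMIPS, APU_CORES, APU_CLOCK_MHZ)
rpu_eff = dmips_per_mhz_per_core(RPU_DMIPS, RPU_CORES, RPU_CLOCK_MHZ)
print(f"APU: {apu_eff:.2f} DMIPs/MHz per Cortex-A78AE core")
print(f"RPU: {rpu_eff:.2f} DMIPs/MHz per Cortex-R52 core")
```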
Disclaimer and Attributions

DISCLAIMER
Timelines, roadmaps, and/or product release dates shown in these slides are plans only and subject to change. The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18

© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Alveo, EPYC, Instinct, Kintex, Radeon, Ryzen, Versal, Vitis, Vivado, Zynq, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Arm, Cortex, and Mali are registered trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. DisplayPort and the DisplayPort logo are trademarks owned by the Video Electronics Standards Association (VESA) in the United States and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. PCIe is a registered trademark of PCI-SIG Corporation. Ubuntu and the Ubuntu logo are registered trademarks of Canonical Ltd. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.