INTERNAL USE
2021 Copyright MediaTek Inc.

Dimensity 9000: A Flagship Smartphone SoC
Presenter: Ericbill Wang (MediaTek)
Co-authors: Stefan Rosinger, Saurabh Pradhan (Arm)

Dimensity 9000 Overview
- CPU: 1x Arm Cortex-X2 at 3.05GHz, 3x Arm Cortex-A710 at 2.85GHz, 4x Arm Cortex-A510 at 1.8GHz; 8MB L3 + 6MB system-level cache
- APU: 4 Performance cores, 2 Flexible cores
- GPU: Arm Mali-G710 MC10
- Memory: 4-channel LPDDR5x 7500Mbps
- Display: WQHD+ 144Hz / FHD+ 180Hz; HDR10+ Adaptive
- Camera: 320MP capture; 3-core 3-exp HDR-ISP; 32MP+32MP+32MP triple cam; 4K 3-exp video HDR; AI-Video architecture
- Video: Decoder 8K30, AV1/VP9/H.265/H.264; Encoder 8K24, H.265/H.264
- Modem: 5G Release 16; DL 3CC 300MHz up to 7Gbps; UL 2CC with R16 UL enhancement; MediaTek 5G UltraSave 2.0
- Connectivity: 2x2 Wi-Fi 6E; BT5.3 / Bluetooth LE audio; GPS (L1/L5), Galileo (E1/E5a), Glonass, BeiDou (B1i/B1c/B2a), NavIC, QZSS
- Process: TSMC 4nm

What Makes a Great Smartphone SoC
- Display, camera and gaming drive higher performance
- Users demand a responsive and sustainable operating experience
- Thin and light devices limit the thermal power budget and battery size
- Required: high-performance AND low-power processors

CPU Challenge: One Size Does Not Fit All
- Workload: 1) Short-burst 2) Sustainable 3) Daily Use
- Challenge: 1) Peak Perf 2) Perf & Power 3) Low Power
- Examples shown: Benchmark, App Launch, Gaming (1hr)

Learnings on Tri-gear CPU Architecture-1
- Usable Tri-gear Power Efficiency Curve
[Figure: Energy per Operation versus Performance, with curves for the Min-gear CPU, Mid-gear CPU and Max-gear CPU; annotations mark the More Efficient and Less Efficient directions]

Learnings on Tri-gear CPU Architecture-2
- Energy Aware Scheduler
- Event Based Operating Frequency Decision
[Diagram: New Task, Task Wait Queue, Sleep, Energy Aware Scheduler, one Run Queue each for Max-gear, Mid-gear and Min-gear, Per die power model, Operating Frequency]

CPU Highlights vs. Dimensity 1200
- Arm Cortex-X2: +40% integer performance over Arm Cortex-A78
- Arm Cortex-A510: +35% integer performance over Arm Cortex-A55
- 50% lower CPU power at iso-performance
- Geekbench v5: single-thread 1278 (+36%), multi-thread 4400 (+33%)
- DynamIQ Shared Unit equipped with 8MB L3$
[Diagram: Cortex-X2 with 1MB L2; three Cortex-A710, each with 512KB L2; four Cortex-A510 in two pairs, each pair with a shared 512KB L2$ and a shared VPU]

Arm Cortex-X2
- Performance: Arm Cortex-X2 +16% over Arm Cortex-X1
- Front-End: decouple branch prediction from fetch
- Out-of-Order Core: remove a pipeline stage at dispatch; +30% out-of-order window
- Back-End: +33% load/store structure size; data prefetching enhancements

Arm Cortex-A510
- Performance: Arm Cortex-A510 +35% over Arm Cortex-A55
- Merged-Core Microarchitecture

Arm DynamIQ Shared Unit-110
- DSU-110 for cache coherency and shared L3$
- Optimized ring transport network: bi-directional, dual-ring
- 2X bandwidth and 25% lower leakage
- Supports partial SRAM and logic shutdown
- Cache partition for QoS

CPU QoS Technology
- A real system is noisy, with mixed critical and non-critical tasks
- 14% speed-up for application launch stress; 5% lower power on a 120fps game
[Diagram: Critical Tasks and Non-Critical Tasks; Android Framework (Background/Foreground/Top-app); Scheduler; CPUs; L3 Cache Allocation; Critical/Non-critical Tagging; Adaptive Algorithm for Optimal Non-Critical Tasks Ways]

GPU Highlights vs. Dimensity 1200
- 2.2X peak performance, from Arm Mali-G77 with 9 cores to Mali-G710 with 10 cores
- 1.5X power efficiency from the N4 process and the new Mali-G710 IP
- Enables Genshin at 60fps and PUBG HDR at 90fps

GPU Sustained Performance Optimization
- Lower minimum operating voltage, down to 0.5V
- 25% lower GPU bandwidth through larger GPU and system caches plus compression
- 15% lower GPU driver loading on the CPU by offloading to an embedded processor

Arm Mali-G710
- Bigger core: 2X texel and FMA rate
- Redesigned execution engine
- 1.2X performance density

Gaming Improvement vs. Dimensity 1200
[Chart: D9000 versus D1200, annotated +20fps]

AI Applications on Smartphone
- Applications: Classification, Detection, Segmentation, Depth Estimation, Noise Reduction, Super Resolution, Speech Recognition, Speech Translation
- Categories: Visual Perception, Image Quality, Object Construction, Speech Applications

APU Highlights vs. Dimensity 1200
- 4.3X performance
- 2.8X power efficiency
- Enables flagship AI features on camera capture and sustainable video

APU Feature Overview
- Versatile data types and operators: 4b/8b/16b integer, FP16, BF16
- Power-efficient MAC array: efficient MAC architecture, data reuse
- In-APU data exchanging: layer fusion, flexible tile walking
- Minimized DRAM data: data compression
- Inter-subsystem direct communication: direct control and data interface
[Diagram: APU containing CONV Engine, 1D/2D Engine, Internal Memory and Bus Interface, shown alongside ISP and DISP]

APU DRAM Bandwidth Reduction Techniques
- 65% lower bandwidth by Deeper Layer Fusion on 4K30 Video AINR
- 24% lower bandwidth by Tile-based Direct Link with ISP/DISP on FHD60 AISR
[Diagram: two data-flow variants connecting Sensor, ISP, APU, DISP, DRAM and Panel]

APU Data Type & Mixed Precision
- The optimal data type is application specific, e.g. 4/8-bit for classification, FP16 for speech
- Supports mixed-precision data types in one network architecture

Acknowledgement
- Dimensity 9000 teams, for the outstanding work
- Co-authors, Arm: Stefan Rosinger, Saurabh Pradhan
- Co-authors, MediaTek: Hugh Mair, Arthur Lin, Ted Lin
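
The tri-gear slides (Learnings on Tri-gear CPU Architecture-1 and -2) describe an energy aware scheduler that places tasks across Min-gear, Mid-gear and Max-gear CPUs using a power model. The following is only a minimal sketch of that idea, not MediaTek's scheduler: the `OperatingPoint` type, the `GEARS` table, the `place_task` helper and every number in it are hypothetical, chosen so that the three curves overlap the way the efficiency-curve figure suggests. It picks the gear and operating point with the lowest energy per operation among those that meet a task's performance demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    perf: float   # relative throughput, arbitrary units (hypothetical)
    power: float  # relative power draw, arbitrary units (hypothetical)

    @property
    def energy_per_op(self) -> float:
        # Energy per operation = power / throughput.
        return self.power / self.perf


# Hypothetical power model: a few operating points per gear.
# These values are illustrative and are not taken from the slides.
GEARS = {
    "min": [OperatingPoint(100, 10), OperatingPoint(200, 30), OperatingPoint(300, 75)],
    "mid": [OperatingPoint(250, 50), OperatingPoint(500, 150), OperatingPoint(700, 350)],
    "max": [OperatingPoint(600, 300), OperatingPoint(900, 720), OperatingPoint(1000, 1000)],
}


def place_task(required_perf: float) -> tuple[str, OperatingPoint]:
    """Return the (gear, operating point) with the lowest energy per operation
    among all points that satisfy the required performance."""
    candidates = [
        (gear, op)
        for gear, points in GEARS.items()
        for op in points
        if op.perf >= required_perf
    ]
    if not candidates:
        # Demand exceeds every point: fall back to the fastest one available.
        return max(
            ((gear, op) for gear, points in GEARS.items() for op in points),
            key=lambda item: item[1].perf,
        )
    return min(candidates, key=lambda item: item[1].energy_per_op)


if __name__ == "__main__":
    for demand in (150, 250, 400, 800, 2000):
        gear, op = place_task(demand)
        print(f"demand={demand}: gear={gear}, perf={op.perf}, power={op.power}")
```

With this made-up table, a demand of 250 lands on the Mid-gear CPU even though the Min-gear CPU could still serve it, because the Mid-gear point costs less energy per operation at that performance level. A real scheduler would also react to events such as task wake-up and sleep and would weigh migration cost, none of which is modeled here.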