OCP Regional Summit | April 19, 2023 | Prague, CZ

Could Energy Aware Runtime (EAR) be the European bet for AI/HPC Data Center energy optimization?
Julita Corbalan, Associate Professor, Barcelona Supercomputing Center (BSC) / Polytechnic University of Catalonia (UPC)

Why energy management?
- Be cost-effective: operational costs
- Be eco-responsible: limited resources
- Be energy-efficient: understand and optimize

What's EAR
System software for energy management in HPC/AI Data Centers.
EAR includes:
- System monitoring
- Job accounting
- Job optimization
- System optimization
EAR targets energy/power but includes many other metrics for understanding.

Main EAR features
System monitoring
- Extensible monitoring: power, CPU frequency, temperature, etc.
- Multiple sources of data: in-band IPMI, GPU, RAPL
- Intel, AMD, NVIDIA
- Extensible report: MariaDB, Postgres, Sysfs, Prometheus (work in progress)
- Basic alerts for power and temperature
Job accounting and dynamic monitoring
- Powerful non-intrusive application monitoring and characterization
- Runtime signatures: performance and power metrics (CPI, GB/s, power, frequency)
Job/System optimization
- Dynamic CPU/Memory/GPU frequency optimization
- Node and cluster powercap

Job energy optimization process
The job is submitted as usual (sbatch myapp.sh), with no application modifications. EAR then works during 100% of the runtime:
- Loop detection / time guided
- Runtime signature computation
- Classification: IO, GPU bound, GPU idle, CPU busy waiting, CPU-GPU computational (each class can have specific frequency settings)
- Apply energy policies and models
- CPU/Memory/GPU frequency selection
- Report runtime signature

Runtime signature includes
Performance and power metrics for energy/performance analysis and optimization:
- Time
- Power: node, DRAM, CPU, GPU
- Frequency: CPU, memory, GPU
- Cycles per instruction (CPI)
- Memory bandwidth (GB/s)
- GPU activity: utilization, memory utilization
- IO (MB/s)
- MPI activity

System power/energy control
- Hierarchical architecture
- Stateless design
- get_power/send_settings API
- Low-level knobs based on plugins

System, Workload and Job Analysis with EAR

Job Data visualization

Summary
- EAR is a European open source solution for energy management in HPC/AI Data Centers
- EAR 4.2 is the last public release
- Our roadmap includes:
  - Be as extensible and compatible with other tools as possible: optimization libraries, schedulers, monitoring systems, etc.
  - More architectures (working on ARM)
  - More use cases (workflows)
- There is still a lot of work to do, from the software side, for data centers to become energy-efficient!
- Do you want to know more? Join the technical demo!
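
Illustrative sketch (not EAR code)
The job energy optimization process described in the slides (compute a runtime signature, classify the application phase, then select a frequency) could be sketched roughly as below. This is not EAR's implementation or API. The field names, the thresholds, the classification order, and the frequency policy are all assumptions made for illustration; only the metric names and the class labels come from the slides.

```python
from dataclasses import dataclass


@dataclass
class RuntimeSignature:
    """Hypothetical container for some metrics the slides list in a signature."""
    time_s: float         # elapsed time of the measured window
    node_power_w: float   # average node power
    cpu_freq_ghz: float   # average CPU frequency
    cpi: float            # cycles per instruction
    mem_gbs: float        # memory bandwidth in GB/s
    gpu_util_pct: float   # GPU utilization in percent
    io_mbs: float         # IO rate in MB/s


# Assumed thresholds, chosen only to make the example concrete.
IO_MBS_HIGH = 500.0
GPU_UTIL_BUSY = 60.0
GPU_UTIL_IDLE = 5.0
CPI_SPIN = 0.5
MEM_GBS_LOW = 1.0


def classify(sig: RuntimeSignature) -> str:
    """Map a signature to one of the class labels named in the slides."""
    if sig.io_mbs >= IO_MBS_HIGH:
        return "IO"
    if sig.gpu_util_pct >= GPU_UTIL_BUSY:
        return "GPU bound"
    # Assumed heuristic: spinning retires many instructions with little memory traffic.
    if sig.cpi <= CPI_SPIN and sig.mem_gbs <= MEM_GBS_LOW:
        return "CPU busy waiting"
    if sig.gpu_util_pct <= GPU_UTIL_IDLE:
        return "GPU idle"
    return "CPU-GPU computational"


def select_cpu_frequency(category: str, nominal_ghz: float, min_ghz: float) -> float:
    """Hypothetical policy: lower the CPU frequency when the CPU is not the bottleneck."""
    if category in ("IO", "GPU bound", "CPU busy waiting"):
        return min_ghz
    return nominal_ghz


if __name__ == "__main__":
    sig = RuntimeSignature(time_s=10.0, node_power_w=350.0, cpu_freq_ghz=2.4,
                           cpi=0.9, mem_gbs=20.0, gpu_util_pct=95.0, io_mbs=1.0)
    category = classify(sig)
    print(category, select_cpu_frequency(category, nominal_ghz=2.4, min_ghz=1.0))
```

In this sketch the classification is a fixed rule cascade; the slides mention energy policies and models, which suggests the real selection step is more elaborate than a two-level frequency choice.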