Accelerating HPC Applications with SmartNICs
Donglai Dai, Chief Engineer, X-ScaleSolutions
contactusx-
San Jose, CA, April 26-28, 2022

Outline
- Motivation
- Basic Idea for MVAPICH2-DPU Library Design
- Main Features of MVAPICH2-DPU Library
- Performance Benefits for Benchmarks and Applications
- Conclusion

Requirements for Next-Generation Communication Libraries
- SmartNICs have the potential to take over a wide range of overhead tasks from the host CPUs in a variety of applications
- Message Passing Interface (MPI) libraries are widely used for parallel and distributed HPC and AI applications in HPC/data centers and clouds
- Requirements for a high-performance and scalable MPI library:
  - Low-latency communication
  - High-bandwidth communication
  - Minimum contention for host CPU resources to progress non-blocking collectives
  - High overlap of computation with communication
- CPU-based non-blocking communication progress can lead to sub-par performance because the main application is left with fewer CPU resources for useful application-level computation (see the sketch below)
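To make the last bullet concrete, the sketch below (illustrative C, not taken from the talk; compute_chunk and the chunk count are placeholders) shows the usual workaround when only CPU-based progress is available: the application interleaves MPI_Test calls with its computation so the non-blocking collective keeps advancing, spending host CPU cycles on communication progress instead of useful work.

#include <mpi.h>

/* Placeholder for the application's real computation on one chunk of data. */
static void compute_chunk(double *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = buf[i] * 1.0001 + 1.0;
}

/* CPU-driven progress: many MPI libraries only advance a non-blocking
 * collective when the application re-enters MPI, so MPI_Test calls are
 * sprinkled through the compute loop and consume host CPU cycles. */
void compute_with_manual_progress(double *work, int n, MPI_Request *req)
{
    int done = 0;
    for (int chunk = 0; chunk < 64; chunk++) {
        compute_chunk(work, n);                      /* useful application work */
        if (!done)
            MPI_Test(req, &done, MPI_STATUS_IGNORE); /* drive communication     */
    }
    if (!done)
        MPI_Wait(req, MPI_STATUS_IGNORE);            /* complete the collective */
}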
Can MPI Functions be Offloaded?
- The area of network offloading of MPI primitives is still nascent
- State-of-the-art BlueField DPUs bring more compute power into the network
- Exploit the additional compute capabilities of modern BlueField DPUs within existing MPI middleware to extract:
  - Peak pure communication performance
  - Overlap of communication and computation

Overview of the BlueField-2 DPU
- ConnectX-6 network adapter with 200 Gbps InfiniBand
- System-on-chip containing eight 64-bit ARMv8 A72 cores running at 2.7 GHz
- 16 GB of memory for the ARM cores
- The MVAPICH2-DPU MPI library is designed to take advantage of DPUs and accelerate scientific applications
Basic Idea for MPI Offloading to the DPU
- Use of generic and optimized asynchronous progress threads on the ARM cores for:
  - Point-to-point operations
  - Collectives
  - RMA operations
- [Figure: host ranks P0-P3 post non-blocking P2P/collective/RMA operations, perform computation, and then call MPI_Wait/MPI_Waitall; control messages are exchanged with communication processes/threads on the BlueField, which carry out the communication.]
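In code, the host-side pattern in the figure looks roughly like the sketch below (illustrative only; the do_computation kernel and buffer sizes are placeholders): each rank posts the non-blocking collective, computes without any intervening progress calls, and then waits, relying on the asynchronous progress threads on the DPU's ARM cores to move the data in the meantime.

#include <mpi.h>

/* Placeholder for the application's computation that is overlapped with
 * the collective. */
static void do_computation(double *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] += 1.0;
}

/* Host-side pattern from the figure: post a non-blocking collective, compute,
 * then wait; communication progress is expected to happen asynchronously. */
void alltoall_with_overlap(const double *sendbuf, double *recvbuf, int count,
                           double *work, int n, MPI_Comm comm)
{
    MPI_Request req;

    MPI_Ialltoall(sendbuf, count, MPI_DOUBLE,
                  recvbuf, count, MPI_DOUBLE, comm, &req);

    do_computation(work, n);            /* overlapped application computation */

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* the collective completes here      */
}

Compared with the MPI_Test-based loop shown earlier, no host cycles are spent polling: the control messages in the figure hand the operation to the communication processes/threads on the BlueField.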
High-Level Design for MPI Offloading to the DPU
- Better support for critical collective communication operations
- Enable offloading to the BlueField ARM SoC
- Performance-enhancing algorithm selection based on the communication characteristics of the application
- [Figure: the MVAPICH2-DPU library's offload decision logic routes each collective either to an offloaded path (software kernel-based collective offload on the programmable ARM cores of the high-performance, RDMA-capable BlueField HCA, with separate designs for data resident on the CPU and on the DPU, plus hardware (ASIC-based) collective offload and in-network collective communication on high-performance SHARP-enabled switches) or to generic non-offloaded collective operations.]
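The "Offload Decision Logic" box can be read as a per-collective dispatch among the three outcomes on the slide. The sketch below is purely hypothetical: the enum names, availability flags, and thresholds are invented for illustration and are not MVAPICH2-DPU internals; it only shows the shape of such a decision.

#include <stddef.h>

/* The three outcomes shown on the slide. */
typedef enum {
    COLL_PATH_DPU_ARM,      /* software kernel-based offload on BlueField ARM cores */
    COLL_PATH_SHARP_SWITCH, /* hardware (ASIC) offload on SHARP-enabled switches    */
    COLL_PATH_GENERIC       /* generic non-offloaded collective                     */
} coll_path_t;

/* Hypothetical heuristic: the real library selects algorithms based on the
 * communication characteristics of the application; the cutoffs here are
 * made up purely to illustrate the dispatch structure. */
coll_path_t choose_collective_path(size_t msg_bytes, int is_nonblocking,
                                   int sharp_available, int dpu_available)
{
    if (sharp_available && msg_bytes <= 4096)
        return COLL_PATH_SHARP_SWITCH;          /* small messages: in-network   */
    if (dpu_available && is_nonblocking && msg_bytes >= 65536)
        return COLL_PATH_DPU_ARM;               /* large non-blocking: DPU path */
    return COLL_PATH_GENERIC;                   /* otherwise: host-based path   */
}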
MVAPICH2-DPU Library 2022.02 Release
- Implemented by X-ScaleSolutions
- Based on MVAPICH2 2.3.6, compliant with the MPI 3.1 standard
- Supports all features available with the MVAPICH2 2.3.6 release (http://mvapich.cse.ohio-state.edu)
- Novel framework to offload non-blocking collectives to the DPU
- Offloads non-blocking collectives (MPI_Ialltoall, MPI_Iallgather, MPI_Ibcast, etc.) to the DPU
- Up to 100% overlap of computation with non-blocking collectives
- Accelerates scientific applications that use non-blocking collectives
Total Execution Time with osu_Ialltoall (32 nodes)
- [Charts: total execution time of osu_ialltoall on BlueField-2, 32 nodes at 32 PPN and at 16 PPN, message sizes 64K-512K, MVAPICH2 vs. MVAPICH2-DPU; annotated improvements for MVAPICH2-DPU range from 17% to 23%.]

Overlap Between Computation & Communication with osu_Ialltoall (32 nodes)
- [Charts: overlap (%) of osu_ialltoall, 32 nodes at 32 PPN and at 16 PPN, message sizes 1K-512K, MVAPICH2 vs. MVAPICH2-DPU; MVAPICH2-DPU delivers peak overlap, reaching 98-100%.]
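For reference on how the overlap numbers in this and the following overlap plots are obtained: the OSU micro-benchmarks (osu_ialltoall, osu_iallgather, osu_ibcast) report how much of the pure communication time is hidden behind an injected computation phase. The helper below restates that definition as understood from the benchmarks' output format; treat it as an approximation, not the benchmarks' exact source code.

/* Overlap metric in the style of the OSU non-blocking collective benchmarks:
 * the communication time left exposed after subtracting the compute phase,
 * normalized by the pure (non-overlapped) communication time, clamped at 0. */
double overlap_percent(double t_overall, double t_compute, double t_pure_comm)
{
    double exposed = (t_overall - t_compute) / t_pure_comm;
    double overlap = 100.0 * (1.0 - exposed);
    return overlap > 0.0 ? overlap : 0.0;
}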
Total Execution Time with osu_Iallgather (16 nodes)
- [Charts: total execution time of osu_iallgather on BlueField-2, 16 nodes at 32 PPN (128K-1M messages) and at 1 PPN (2K-16K messages), MVAPICH2 vs. MVAPICH2-DPU; annotated improvements for MVAPICH2-DPU range from 29% to 84%.]

Overlap Between Computation & Communication with osu_Iallgather (16 nodes)
- [Chart: overlap (%) of osu_iallgather, 16 nodes at 1 PPN, message sizes 128K-1M, MVAPICH2 vs. MVAPICH2-DPU; MVAPICH2-DPU delivers peak overlap, up to 97%.]

Total Execution Time with osu_Ibcast (32 nodes)
- [Charts: total execution time of osu_ibcast on BlueField-2, 32 nodes at 16 PPN and at 1 PPN, message sizes 2M-16M, MVAPICH2 vs. MVAPICH2-DPU; annotated improvements for MVAPICH2-DPU range from 8% to 58%.]

Overlap Between Computation & Communication with osu_Ibcast (32 nodes)
- [Charts: overlap (%) of osu_ibcast, 32 nodes, message sizes 2M-16M, MVAPICH2 vs. MVAPICH2-DPU; MVAPICH2-DPU delivers peak overlap, improving overlap by about 30% at 16 PPN and 38% at 1 PPN.]

P3DFFT Application Execution Time (32 nodes)
- Benefits in application-level execution time
- [Charts: P3DFFT execution time (s) versus grid size, 32 nodes, MVAPICH2 vs. MVAPICH2-DPU; MVAPICH2-DPU reduces execution time by 16-21% at 32 PPN and by 12-14% at 16 PPN.]
Conclusion
- The efficient MVAPICH2-DPU MPI library utilizes the BlueField DPU to progress MPI non-blocking collective operations
- Provides up to 100% overlap of communication and computation for non-blocking Alltoall, Allgather, Bcast, etc.
- Reduces the total execution time of the P3DFFT application by up to 21% on 1,024 processes
- Work is in progress for the MVAPICH2-DPU library to efficiently offload more types of non-blocking collective operations to DPUs

Exhibition and Live Demo
- If you are interested in more details, please come and visit our exhibit booth #8 next door
- Live demo of the MVAPICH2-DPU library at our booth: 6-7 pm today and 1-2 pm tomorrow

Thank You!
Donglai Dai
contactusx-
http:/x-