AliGraph: Analysis of Large-scale GNN System Architecture Based on AliGraph
Zhao Kun (趙昆)

Contents
01 AliGraph Overview
02 System Architecture and Key Technologies


01 AliGraph Overview

PAI: Platform of Artificial Intelligence in Alibaba
(Alibaba's machine learning platform)

Features of PAI
https:/ [link truncated in source]

[...] of GNN
Figure: a graph G = (V, E), where V = Vertex and E = Edge, is mapped by a GNN (neural network) into a low-dimensional vector space.

Challenges in Practice
- Irregular data: sparse, unstructured, attributed; homogeneous or heterogeneous; directed or undirected.
- Huge scale: hundreds of billions of edges and tens of billions of vertices; the number of samples grows exponentially. [Remainder garbled in source; the fragments "TB" and "CPU" survive.]
- Computation is hard to scale: graph operators come in many changing varieties and cannot be combined with deep learning.
Figure label: DataCleaner

Users Expect Solutions from Data to Model for GNN
(What users expect: an integrated solution)
Figure layers, in the order listed: APP; GNN Algorithms; Graph & Tensor Engine; Graph Data (large-scale, homogeneous, heterogeneous, attributed).

AliGraph: A GNN Platform on Alibaba Cloud
(A "graph + deep learning" platform on the cloud)
- Industry Level [detail garbled in source; the fragment "PB" survives]
- End to End [detail garbled in source; the fragment "IDE" survives]
- For Research and Production


02 System Architecture and Key Technologies

GNN Programming Paradigm
1. Sampling
2. Vectorization
3. Aggregate
4. Combine
Figure: vertices V0, V1, V2 and edges E1, E2, expanded over hop=1 and hop=2. Labels: Vertex Id, Vertex Attributes, Edge Id, Edge Attributes, Aggregate output, Vertex output.

An Integrated System Implementation for GNN
Figure: Server 0, Server 1, Server 2, ..., Server n. Labels: Python; TensorFlow & PyTorch; Tensor Engine; Data Bridge; Graph Engine; Graph Query; Sampler; Aggregator; Combiner; RPC; GNN. Graph properties: Large-scale, Heterogeneous, Attributed. Design goals: Friendly, Flexible, Robust, Effective.

Laconic User Interface

Graph Building:

    g = Graph()
    g.add_nodes("node_source", "node_format") \
     .add_edges("edge_source", "edge_format") \
     .init()

Graph Sampling:

    g.V("user").shuffle().batch(512) \
     .outE("user-click-item").sample(10).by("edge_weight") \
     .run()

Effective Graph Building
Figure: Server i and Server j. On each server, loading threads perform Batch Reading and Partitioning, then hand the data to Data Queues either locally (Local) or on the other server (RPC). Processing threads take data from the queues and perform Graph Parsing and Update Storage.

Cache to Empower Multi-Hop Graph Sampling
- Multi-hop caching doubles the sampling speed.
- Vertex degrees in the graph follow a power-law distribution.
- Vertex importance [remainder garbled in source].
Figure: caching of 2-hop, ..., (k-1)-hop, and k-hop neighbors across Server 0, Server 1, ..., Server n.
Chart: Time Cost (ms) against Percentage of Cached Vertices. Series: AliGraph, Random Cache, and a third cache baseline whose label is garbled in the source (possibly LRU-Cache).

Flexible and Pluggable Operators
Figure: User Interface; Distributed Runtime and Distributed Storage, each with a Programming Interface; multiple pluggable Operator blocks.

System Performance
- GNN algorithm development cycle: 80% [verb garbled in source]
- Model training time: [figure garbled in source]
- Supported graph embedding algorithms: more than 10
- Storage saved per day: 300 TB (model)
- CPU hours saved per day: [figure garbled in source] x 10,000 (model)
- Graph building at the scale of [figure garbled in source] x 100 million edges: 2 minutes
- Multi-hop batch graph sampling: millisecond level [figure garbled in source]
- Built-in graph sampling algorithms: more than 10
- Extensible operator types: user [remainder garbled in source]

Applications
(Already deployed)
- New retail
- Social relationships
- Taobao recommendation
- Search anti-cheating
- Online payment
- Taobao search
- Network security

Evolution Direction of the System
- Ecosystem diversity
- System extensibility
- Driven by GNN algorithms

Thanks!
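
Supplementary sketch for the "GNN Programming Paradigm" slide (1. Sampling, 2. Vectorization, 3. Aggregate, 4. Combine). The slide names the four steps but does not show an implementation, so the following is only a minimal illustration in plain Python, not AliGraph code. The toy graph, the function names, the mean aggregator, and the concatenation-based combiner are all assumptions chosen for illustration.

```python
import random

# Toy data (hypothetical): adjacency lists and per-vertex attribute vectors.
NEIGHBORS = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
FEATURES = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [2.0, 0.0]}


def sample_neighbors(vertex, fanout, rng):
    """Step 1, Sampling: draw `fanout` neighbors, with replacement."""
    return [rng.choice(NEIGHBORS[vertex]) for _ in range(fanout)]


def vectorize(vertex):
    """Step 2, Vectorization: map a vertex id to its attribute vector."""
    return list(FEATURES[vertex])


def aggregate(vectors):
    """Step 3, Aggregate: element-wise mean (assumed aggregator)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def combine(self_vector, aggregated):
    """Step 4, Combine: concatenate own and aggregated vectors (assumed)."""
    return self_vector + aggregated


def embed(vertex, hops, fanout, rng):
    """Apply the four steps recursively over `hops` hops."""
    self_vector = vectorize(vertex)
    if hops == 0:
        return self_vector
    sampled = sample_neighbors(vertex, fanout, rng)
    neighbor_vectors = [embed(u, hops - 1, fanout, rng) for u in sampled]
    return combine(self_vector, aggregate(neighbor_vectors))


if __name__ == "__main__":
    print(embed(0, hops=2, fanout=2, rng=random.Random(0)))
```

With two-dimensional attributes, each extra hop adds the width of the previous hop's vector, so a 2-hop embedding here has six components. A real system would replace the mean and the concatenation with learned layers in TensorFlow or PyTorch.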