A Data-Centric Perspective on Pre-Training Graph Neural Networks
Jiarong Xu, Fudan University

Negative Transfer in Graph Pre-Training
Pre-training graph neural networks (GNNs) is a promising strategy for learning from graph data without costly labels. In practice, however, graph pre-trained models can lead to negative transfer on many downstream tasks: in results of directly fine-tuning a graph pre-trained model [1], almost 45.5% of downstream tasks suffer from negative transfer!
[1] Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD'20.

Research Roadmap
1. When to Pre-Train Graph Neural Networks?
2. Better with Less: Data-Active Pre-training of Graph Neural Networks

When to Pre-train GNNs?
To avoid negative transfer, recent efforts focus on what to pre-train and how to pre-train. However, transferability from the pre-training data to the downstream data cannot be guaranteed in some cases. It is therefore necessary to understand when to pre-train, i.e., under what situations the "graph pre-train and fine-tune" paradigm should be adopted.

Existing methods
- Enumerate "pre-train and fine-tune" attempts.
- Use graph metrics to measure the similarity between pre-training and downstream data.
Proposed: W2PGNN
- Answers when to pre-train GNNs from a graph data generation perspective, before any "pre-training and fine-tuning" is performed.
- Key insight: downstream data can benefit from pre-training if it can be generated with high probability by a graph generator that summarizes the pre-training data.

When to Pre-train GNNs?
Input space
- Node-level: ego-networks.
- Graph-level: graphs (e.g., molecules).
Generator space
- W_i: a graphon (i.e., a generator) fitted from a set of (sub)graphs with similar patterns.
- α_i: the weight assigned to each basis W_i.
- g = Σ_i α_i W_i: a weighted combination of the generator basis.

How to Obtain an Appropriate Graph Generator?
- Generator space: all weighted combinations, G = { Σ_i α_i W_i : α_i ≥ 0, Σ_i α_i = 1 }.
- Possible downstream space: all the graphs produced by the generators in the generator space.
As a concrete illustration of fitting and combining graphon bases, see the sketch below.
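A minimal sketch of this idea, assuming a simple degree-sorting, block-averaging estimator that represents each graphon as a k x k step function; the function names and the k-block resolution are illustrative choices, not W2PGNN's released implementation.

```python
import numpy as np
import networkx as nx

def estimate_graphon(graphs, k=10):
    """Fit one step-function graphon (a k x k block matrix) to a set of graphs:
    degree-sort each adjacency matrix, cut the nodes into k equal blocks, and
    average the edge density of every block pair (assumes >= k nodes per graph)."""
    W = np.zeros((k, k))
    for G in graphs:
        A = nx.to_numpy_array(G)
        order = np.argsort(-A.sum(axis=1))          # sort nodes by degree
        A = A[np.ix_(order, order)]
        n = len(A)
        blk = (np.arange(n) * k) // n               # node index -> block index
        for i in range(k):
            for j in range(k):
                W[i, j] += A[np.ix_(blk == i, blk == j)].mean()
    return W / len(graphs)

def combine_basis(bases, alphas):
    """Generator g = sum_i alpha_i * W_i: a convex combination of graphon bases."""
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()
    return sum(a * W for a, W in zip(alphas, bases))
```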
Application Cases of W2PGNN
- Use case 1: Provide the application scope of a graph pre-trained model.
- Use case 2: Estimate the feasibility of performing pre-training for a downstream task.
- Use case 3: Select pre-training data to benefit the downstream task.

Feasibility of Pre-Training
Definition: the feasibility of performing pre-training is the highest probability that the downstream data is generated by a generator in the generator space. However, exhausting all possible generators to find the infimum is impractical.

Choose Graphon Basis to Approximate Feasibility
Reduce the search space of the graphon basis:
- Integrated graphon basis: estimate a single graphon from all graphs.
- Domain graphon basis: split the graphs by domain and estimate a graphon for each split.
- Topological graphon basis: split the graphs by topology and estimate a graphon for each split.
Approximated feasibility: feasibility ≈ 1 − inf_α δ(Σ_i α_i W_i, W_down), where {W_i} is the reduced search space of graphon bases and α is a learnable parameter. A sketch of this optimization follows.
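A minimal sketch of the approximated feasibility computation under the assumptions above: graphon bases are k x k step-function matrices, α is optimized over the simplex via a softmax reparameterization, and the Frobenius norm stands in for the graphon distance δ. All names here are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def feasibility(bases, W_down):
    """Approximate feasibility = 1 - min_alpha || sum_i alpha_i W_i - W_down ||,
    with alpha on the simplex (the learnable parameter) and {W_i} the reduced
    graphon basis; the norm is an illustrative stand-in for the distance delta."""
    bases = np.stack(bases)                            # shape (m, k, k)

    def loss(theta):
        alpha = np.exp(theta) / np.exp(theta).sum()    # softmax -> simplex
        W = np.tensordot(alpha, bases, axes=1)         # weighted combination
        return np.linalg.norm(W - W_down)

    res = minimize(loss, x0=np.zeros(len(bases)), method="L-BFGS-B")
    return 1.0 - res.fun
```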
Experimental Results: Evaluate Feasibility
Evaluation metric: the Pearson correlation coefficient between the best downstream performance and the estimated pre-training feasibility; the snippet below shows how this metric is computed. The feasibility estimated by W2PGNN achieves the highest overall ranking in most cases!
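For reference, the evaluation metric can be computed directly with scipy; the numbers below are made-up placeholders, not results from the paper.

```python
from scipy.stats import pearsonr

# feas[i]: estimated pre-training feasibility for (pre-training, downstream) pair i
# perf[i]: best downstream performance achieved for the same pair (illustrative values)
feas = [0.62, 0.71, 0.55, 0.80]
perf = [0.64, 0.70, 0.58, 0.77]

r, p_value = pearsonr(feas, perf)
print(f"Pearson r = {r:.3f} (p = {p_value:.3g})")  # high r => feasibility tracks performance
```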
Experimental Results: Evaluate Feasibility
Plotting the estimated feasibility (x-axis) against the best downstream performance (y-axis) for all pairs shows a strong positive correlation between the estimated pre-training feasibility and the best downstream performance!

Experimental Results: Pre-Training Data Selection
We compare the downstream performance obtained with pre-training data selected by different strategies. Observations:
- Pre-training data selected by W2PGNN ranks first.
- Using all of the pre-training data is not always a reliable choice.
Reference: When to Pre-Train Graph Neural Networks? From Data Generation Perspective! In Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'23).

Research Roadmap
1. When to Pre-Train Graph Neural Networks?
2. Better with Less: Data-Active Pre-training of Graph Neural Networks
The Curse of Big Data Phenomenon
Question: Is a massive amount of input data really necessary, or even beneficial, for pre-training GNNs?
- First, we find that scaling up the number of pre-training samples does not result in a one-model-fits-all increase in downstream performance.
- Second, we observe that adding input graphs does not improve, and sometimes even deteriorates, generalization.
The curse of big data phenomenon in graph pre-training: more training samples and more graph datasets do not necessarily lead to better downstream performance.

Data-Centric Graph Selector
Instead of training on massive data, it is more appealing to choose a few suitable samples and graphs for pre-training, based on:
- Predictive uncertainty: the model's level of confidence (or certainty) on each sample.
- Graph properties: the informativeness and representativeness of the graphs.
A selection sketch follows.
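A minimal sketch of the uncertainty side of such a selector, assuming a PyTorch model whose outputs are class logits; the graph-property terms (informativeness, representativeness) are omitted, and all names are hypothetical.

```python
import torch

def select_graphs(model, graphs, budget):
    """Rank candidate graphs by predictive uncertainty (entropy of the model's
    output distribution) and return the indices of the `budget` most uncertain
    ones -- the data the model currently knows least about."""
    model.eval()
    scores = []
    with torch.no_grad():
        for idx, g in enumerate(graphs):
            probs = torch.softmax(model(g), dim=-1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
            scores.append((entropy.item(), idx))
    scores.sort(reverse=True)                       # most uncertain first
    return [idx for _, idx in scores[:budget]]
```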
Pre-Training Model Co-Evolves with Data
Instead of swallowing the data as a whole, the pre-training model is encouraged to learn from the data in a progressive way:
- Predictive uncertainty feeds back what kind of data the model has the least knowledge of.
- The pre-training model reinforces itself on highly uncertain data in the next training iterations.

Data-Active Graph Pre-Training (APT)
The graph selector and the pre-training model actively cooperate with each other, as sketched in the loop below:
- The graph selector recognizes the most instructive data for the model.
- The pre-training model becomes well trained and in turn provides better guidance for the graph selector.
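A sketch of how this cooperation could look as a loop, reusing the select_graphs sketch above; pretrain_one_round is a hypothetical stand-in for one round of self-supervised pre-training, not APT's actual trainer.

```python
def apt_pretrain(model, pool, rounds, budget):
    """APT-style loop: the selector picks the most instructive graphs for the
    current model, the model pre-trains on them, and the improved model then
    gives the selector better uncertainty estimates in the next round."""
    for _ in range(rounds):
        chosen = select_graphs(model, pool, budget)            # see sketch above
        pretrain_one_round(model, [pool[i] for i in chosen])   # hypothetical trainer
    return model
```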
Datasets for Pre-Training and Testing
- Pre-training data: 11 datasets from 3 domains.
- Downstream data: 13 datasets from 7 domains.

Experiments: Node Classification
Our model beats the graph pre-training competitor by an average of +9.94% and +17.83% under the freezing and fine-tuning modes, respectively.

Experiments: Graph Classification
Our model is on average +7.2% and +1.3% better than the graph pre-training backbone model under the freezing and fine-tuning modes.

Take-home Messages
- Pay attention to negative transfer when pre-training GNNs!
- An answer to when to pre-train GNNs.
- The curse of big data phenomenon in graph pre-training.
- Wide application cases: provide the application scope of a graph pre-trained model; measure the feasibility of performing pre-training; select pre-training data.
Contact / Homepage: https://galina0217.github.io