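Internet Financial Fraud Detection Based on Graph Neural Networks
Xiang Ao (敖翔), Associate Researcher, Institute of Computing Technology, Chinese Academy of Sciences
2022-05-21

Contents
01 Internet financial fraud in the post-pandemic era
02 Why do graph neural networks work?
03 What are the pitfalls of using graph neural networks?
04 What are the trends in using graph neural networks?

01 Internet financial fraud in the post-pandemic era

The threat of financial fraud has grown further
- FIS reports that the dollar volume of attempted fraudulent transactions rose 35% in April 2020 from a year earlier.
- Across all financial products, fraud rates rose by 33% in April 2020 compared with previous monthly averages.
- The COVID-19 outbreak hit the world economy in an unprecedented way and further increased the risk of falling victim to financial fraud.
- Sources: Fraud rate rises 33% during Covid-19 lockdown. https:/ ; fraud attempts rise during the coronavirus crisis. https:/

Products and Services
- Traditional businesses such as banking, insurance, and securities
- Platforms such as e-commerce and lifestyle-service platforms
- Prominent financial services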
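- Remote banking struggles to obtain comprehensive customer identity verification information, which leads to frequent credit fraud.
- Alibaba has begun offering low-interest loans, such as 生意貸 (business loans), to small business owners on its platform.
- Uber provided 10 million free rides, meals, and deliveries to vulnerable communities around the world during the first wave of the COVID-19 pandemic.

Challenges of traditional fraud detection
- Class imbalance
- Concept drift
- Untrustworthy data (Trustworthy?)

Challenges of internet fraud detection
- Extreme class imbalance
- Adversarial attacks
- Scarce labels
- Features are hard to discover
- Sensitivity to sample value
- Out-of-distribution samples
- Weak guidance signals for feature learning

02 Why do graph neural networks work?

Evolution of fraud detection data
- Data types: structured, semi-structured, and unstructured data
- Examples: quantitative tables, interview forms, XML files, text, audio, video, remote-sensing data
- Figure caption: data types and examples used by fraud detection models. X. Zhu and X. Ao et al., "Intelligent Financial Fraud Detection Practices in Post-Pandemic Era." The Innovation, 2021, 2(4):100176.

Development of fraud detection methods
- 1980s: Rule-based systems
- 1990s: Traditional machine learning
- 2010s: Deep learning

Graph neural networks: a new trend
- User Profile: structured, static, missing data, high noise
- User Behavior: sequential, dynamic, high-frequency
- User Relation: non-Euclidean data
- Figure labels: heterogeneous graph representation; GNN-based methods; discovering fraudulent users, behaviors, and other activities
- New trend: convert multi-source, heterogeneous data into a (heterogeneous) graph representation, and design GNN-based methods to discover fraudulent activity.
- Feature learning ability and semi-supervised learning
- Integration of multi-source heterogeneous data

Meta-path-based feature sampling optimization
Qiwei Zhong, Yang Liu, Xiang Ao*, Binbin Hu, Jinghua Feng, Jiayu Tang*, Qing He. Financial Defaulter Detection on Online Credit Payment via Multi-view Attributed Heterogeneous Information Network. In WWW, pages 785-795, 2020. (CCF A)
- Problem: effective features for discovering fraudulent users are hard to extract.
- Contribution: risk-control rules (knowledge), expressed as meta-paths, guide node feature sampling and improve feature extraction.
- Result: in online testing, recall for identifying overdue users improved by 10.19%.
- Risk-control rules for different scenarios are represented as meta-paths.
- Figure labels: relations colleague, friend, relative; users Bob, Mike, Lily; Merchant A, Merchant B; edges transaction, transfer, login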
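To make the idea of meta-path-guided sampling concrete, here is a minimal sketch. It assumes a toy heterogeneous graph stored as (source, relation, target) triples and treats a meta-path as a sequence of relation names. The edges, the helper names `build_adjacency` and `sample_paths`, and the undirected treatment of edges are all illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

# Hypothetical toy graph reusing the names from the slide's figure.
EDGES = [
    ("Bob", "friend", "Mike"),
    ("Mike", "colleague", "Lily"),
    ("Bob", "transaction", "Merchant A"),
    ("Mike", "transaction", "Merchant A"),
    ("Lily", "transfer", "Bob"),
    ("Lily", "transaction", "Merchant B"),
]


def build_adjacency(edges):
    """Index neighbors by (node, relation); edges are treated as undirected."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        adj[(src, rel)].append(dst)
        adj[(dst, rel)].append(src)
    return adj


def sample_paths(adj, start, meta_path, num_paths, rng):
    """Random walks from `start` that follow the relations listed in `meta_path`."""
    paths = []
    for _ in range(num_paths):
        path, node = [start], start
        for rel in meta_path:
            candidates = adj.get((node, rel), [])
            if not candidates:
                break  # this walk cannot complete the meta-path
            node = rng.choice(candidates)
            path.append(node)
        else:
            paths.append(path)  # keep only walks that matched every relation
    return paths


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    rng = random.Random(0)
    # A rule such as "users who pay the same merchant" becomes a two-hop meta-path.
    print(sample_paths(adj, "Bob", ("transaction", "transaction"), 3, rng))
```

Each rule selects a different slice of a node's neighborhood, so the sampled paths reflect the rule rather than a uniform sample of all neighbors.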
- For a node, a set of paths is sampled according to each meta-path.
- Figure labels: for each meta-path, "Paths based on Meta-path" feeds a Pooling step; Node Attention, Link Attention, and Meta-path Attention are applied; the outputs go through Concat.
- Figure annotations: path encoder; meta-path importance; node and relation attributes

Experimental results
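- Data: real Alibaba data
- Number of users, training set: 1.38 million (2019/01/01 to 2019/03/31)
- Number of users, test set: 0.51 million (2019/05/01 to 2019/05/31)
- Compared with HACUD (AAAI 2019), the best method at the time: AUC improved by 2.3% and RP0.1 by 7.4%.
- Compared with GBDT, the method running online at Alibaba at the time: AUC improved by 6.2% and RP0.1 by 16.1%.
- Figure labels: number of fraudulent users; proportion of fraudulent users; Ours

03 What are the pitfalls of using graph neural networks?

Core idea of graph neural networks: message passing
- For a target node, an embedding is generated by passing messages from its surrounding neighbors.
- Examples: GCN, GraphSAGE, GAT
- However, the graphs we face are class-imbalanced.

PC-GNN: a sampling GNN for class-imbalanced graphs
- Challenge: fraudulent users make up a small share of nodes, which works against GNN message passing.
- Contribution: rework the GNN neighbor sampling mechanism to alleviate class imbalance.
- Result: AUC for fraudulent user detection improved by 2.6% to 3.6%.
Yang Liu, Xiang Ao*, Zidi Qin, Jianfeng Chi, Jinghua Feng, Hao Yang, Qing He. Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection. In WWW, pages 3168-3177, 2021. (CCF A)
- Figure labels: Pick-1, Choose-1, Pick-2, Choose-2
- Pick: sample globally according to the share of each label class.
- Choose: design an adaptive distance function and optimize the local structure.

PC-GNN: global balanced sampling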
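The Pick step samples nodes globally according to the label distribution. The sketch below assumes the simplest reading of that idea: each node's sampling probability is inversely proportional to the frequency of its label, so every class receives the same total probability mass. The exact formula on the slide is not recoverable from this text, and `pick_probabilities` and `pick` are hypothetical names.

```python
import random
from collections import Counter


def pick_probabilities(labels):
    """Per-node sampling probability, inversely proportional to label frequency."""
    freq = Counter(labels)
    weights = [1.0 / freq[y] for y in labels]
    total = sum(weights)
    return [w / total for w in weights]


def pick(labels, k, rng):
    """Draw k node indices (with replacement) using the balanced probabilities."""
    probs = pick_probabilities(labels)
    return rng.choices(range(len(labels)), weights=probs, k=k)


if __name__ == "__main__":
    # Toy graph: 3 fraud nodes and 6 benign nodes (assumed counts).
    labels = ["fraud"] * 3 + ["benign"] * 6
    sample = pick(labels, 12, random.Random(0))
    print([labels[i] for i in sample])
```

With three fraud nodes and six benign nodes, each fraud node is twice as likely to be drawn as each benign node, and the two classes are drawn about equally often overall.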
- Pick: sample globally according to the share of each label class.
- Figure labels: LF()=3 and LF()=6; Label Frequency; Sampling Probability

PC-GNN: local structure adjustment
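The Choose step adjusts each node's local neighborhood with a distance function before aggregation. The sketch below is one possible reading under stated assumptions: the distance is the gap between two nodes' predicted fraud scores, every node drops neighbors farther than a threshold `rho_minus`, and labeled minority-class nodes additionally link to other labeled minority nodes closer than `rho_plus`. The scores, thresholds, and function names are hypothetical. A mean aggregator stands in for the GNN layer.

```python
def distance(scores, u, v):
    """Hypothetical distance: gap between two nodes' predicted fraud scores."""
    return abs(scores[u] - scores[v])


def choose_neighbors(node, neighbors, scores, labels, minority, rho_minus, rho_plus):
    """Return the adjusted neighbor list for `node`."""
    # Under-sampling, applied to every node: drop neighbors that look dissimilar.
    kept = [u for u in neighbors[node] if distance(scores, node, u) < rho_minus]
    # Over-sampling, applied to labeled minority-class nodes only:
    # link in other labeled minority nodes that look similar.
    if labels.get(node) == minority:
        for u, label in labels.items():
            if (u != node and label == minority and u not in kept
                    and distance(scores, node, u) < rho_plus):
                kept.append(u)
    return kept


def mean_aggregate(node, chosen, features):
    """Mean of the node's own feature vector and those of the chosen neighbors."""
    members = [node] + chosen
    dim = len(features[node])
    return [sum(features[m][d] for m in members) / len(members) for d in range(dim)]


# Toy data: "a" and "b" are fraud nodes (label 1); "a" is surrounded by benign nodes.
neighbors = {"a": ["c", "d", "e"], "b": ["e"], "c": ["a", "d"], "d": ["a", "c"], "e": ["a", "b"]}
scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2, "e": 0.7}
labels = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0}
features = {"a": [1.0], "b": [1.0], "c": [0.0], "d": [0.0], "e": [0.0]}

if __name__ == "__main__":
    chosen = choose_neighbors("a", neighbors, scores, labels, 1, rho_minus=0.5, rho_plus=0.3)
    print(mean_aggregate("a", neighbors["a"], features))  # original neighborhood
    print(mean_aggregate("a", chosen, features))          # adjusted neighborhood
```

In the toy graph, the fraud node's aggregated feature is dominated by benign neighbors until the neighborhood is adjusted, which illustrates why plain message passing struggles on a class-imbalanced graph.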
- Choose: apply adaptive over-sampling to minority-class nodes and adaptive under-sampling to all nodes.
- [Equations lost in extraction: minority-class over-sampling; all-class under-sampling; the cases for the minority class and for the majority class]

PC-GNN: overall network architecture
- [Figure: a graph with Relation-1 and Relation-2 and with Fraud and Benign nodes passes through Pick, Choose, and Aggregate; sampling stages are labeled Pick-1, Choose-1, Pick-2, Choose-2]

PC-GNN model training
- PC-GNN loss function
- [Equations lost in extraction: distance function training; GNN network parameter training; loss function]

Experiments: datasets
- Public benchmarks (review fraud detection datasets):
  - YelpChi: Yelp reviews (Hotel & Restaurant)
  - Amazon: Amazon product reviews (Musical Instrument)
- Real-world datasets (real Alibaba data):
  - M7: user data from 2018/07/01 to 2018/07/31
  - M9: user data from 2018/09/01 to 2018/09/30
- Train/Valid/Test: 40%/20%/40%

Experiments: baselines and evaluation metrics
- Baselines:
  - GCN, GAT: conventional GNN methods
  - DR-GCN: dual-regularized GCN that alleviates class imbalance
  - GraphSAGE, GraphSAINT: sampling-based GNNs
  - GraphConsis, CARE-GNN: graph-based fraud detection SOTA
  - PC-GNNP, PC-GNNC: ablation variants of PC-GNN
- Evaluation metrics:
  - F1-macro: macro average of the F1-score of each class
  - AUC: Area Under the ROC Curve
  - GMean: Geometric Mean of True Positive Rate (TPR) and True Negative Rate (TNR)

Experimental results
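To make the three reported metrics concrete, the following sketch computes F1-macro, AUC, and GMean for a binary fraud/benign task in plain Python. The function names and the toy labels and scores are illustrative assumptions. AUC is computed by direct pair counting, which is fine for small examples but quadratic in the number of nodes.

```python
import math


def confusion(y_true, y_pred):
    """Counts with fraud (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    f1_fraud = f1_score(tp, fp, fn)
    f1_benign = f1_score(tn, fn, fp)  # roles swap when benign is the positive class
    return (f1_fraud + f1_benign) / 2


def gmean(y_true, y_pred):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        return 0.0
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


def auc(y_true, scores):
    """Share of (fraud, benign) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced example: 2 fraud nodes, 6 benign nodes.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

if __name__ == "__main__":
    print(f1_macro(y_true, y_pred), auc(y_true, scores), gmean(y_true, y_pred))
```

F1-macro and GMean depend on the decision threshold, while AUC depends only on how the scores rank fraud nodes against benign nodes.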
- RQ1: Can PC-GNN beat the existing SOTA methods? Compared with CARE-GNN (CIKM'20): AUC improvement of 3.6% to 5.2%; GMean improvement of 0.6% to 3.7%.
- RQ2: Ablation study. How much does each PC-GNN module contribute to prediction? Pick: global sampling is the foundation. Choose: the further gain comes from local structure optimization.
- RQ3: Parameter sensitivity experiments.
- RQ4: Effect of combining the Pick and Choose modules with conventional GNNs.

AO-GNN: an AUC-maximizing GNN for class-imbalanced graphs
- Training that optimizes AUC tends to produce a model that can distinguish both benign nodes and fraudulent nodes.
Mengda Huang, Yang Liu, Xiang Ao*, Kuan Li, Jianfeng Chi, Jinghua Feng, Hao Yang, Qing He. AUC-oriented Graph Neural Network for Fraud Detection. In WWW, pages 1311-1321, 2022. (CCF A)
- The AUC optimization is converted into a saddle-point search problem.
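The sketch below illustrates why an AUC-oriented objective suits imbalanced data. It uses a plain pairwise logistic surrogate for AUC, swapped in for illustration; it is not the saddle-point formulation mentioned above. The function names and toy scores are assumptions. A scorer that calls every node benign reaches high accuracy yet gets no credit from the pairwise loss, because it never ranks a fraud node above a benign one.

```python
import math


def accuracy(y_true, scores, threshold=0.0):
    """Fraction of nodes classified correctly when score > threshold means fraud."""
    y_pred = [1 if s > threshold else 0 for s in scores]
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def pairwise_auc_loss(y_true, scores):
    """Mean logistic loss over all (fraud, benign) pairs; small when fraud ranks higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one node of each class")
    total = sum(math.log1p(math.exp(n - p)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Toy data: 1 fraud node among 10.
y_true = [1] + [0] * 9
lazy_scores = [-1.0] * 10             # predicts "benign" for everyone
ranked_scores = [2.0] + [-2.0] * 9    # ranks the fraud node above all benign nodes

if __name__ == "__main__":
    print(accuracy(y_true, lazy_scores), pairwise_auc_loss(y_true, lazy_scores))
    print(accuracy(y_true, ranked_scores), pairwise_auc_loss(y_true, ranked_scores))
```

Because the loss is averaged over fraud/benign pairs rather than over nodes, the class ratio does not change how much weight the minority class receives.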
Fraudsters may actively camouflage themselves, so the graph structure is already "polluted"
- "Polluted" graph structure: fraudulent nodes often interact with other nodes to obscure their identity.
- The model includes a topology optimizer.
- Optimizing the structure changes the prediction results.
- A change in the prediction results changes the AUC.
- Idea: optimize the graph topology in the direction that increases AUC.

Topology optimization strategy
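The idea of optimizing the topology toward higher AUC can be sketched with a greedy loop: tentatively cut one edge, re-score the nodes, and keep the cut only if AUC rises. This greedy rule is a deliberately simple stand-in for a learned pruning policy, and the mean-of-neighbors scorer is a stand-in for a GNN classifier. The node names, features, and function names are all hypothetical.

```python
def auc(y_true, scores):
    """Share of (fraud, benign) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(features, edges):
    """Stand-in classifier: mean of a node's own feature and its neighbors' features."""
    nbrs = {v: [] for v in features}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return {v: (features[v] + sum(features[u] for u in nbrs[v])) / (1 + len(nbrs[v]))
            for v in features}


def graph_auc(features, labels, edges):
    scores = predict(features, edges)
    nodes = sorted(features)
    return auc([labels[v] for v in nodes], [scores[v] for v in nodes])


def prune_edges(features, labels, edges):
    """Greedy pruning: cut an edge only if doing so increases AUC."""
    kept = list(edges)
    for edge in list(edges):
        candidate = [e for e in kept if e != edge]
        reward = graph_auc(features, labels, candidate) - graph_auc(features, labels, kept)
        if reward > 0:
            kept = candidate
    return kept


# Toy graph: fraud_1 camouflages itself by linking to benign nodes.
features = {"fraud_1": 0.9, "fraud_2": 0.8, "benign_1": 0.1, "benign_2": 0.2, "benign_3": 0.3}
labels = {"fraud_1": 1, "fraud_2": 1, "benign_1": 0, "benign_2": 0, "benign_3": 0}
edges = [("fraud_1", "fraud_2"), ("fraud_1", "benign_3"), ("fraud_1", "benign_1"),
         ("fraud_1", "benign_2"), ("benign_1", "benign_2")]

if __name__ == "__main__":
    kept = prune_edges(features, labels, edges)
    print(graph_auc(features, labels, edges), graph_auc(features, labels, kept))
    print(kept)
```

The sketch uses true labels to compute the reward, so in practice the reward would have to come from labeled training nodes only.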
- Environment: an environment GNN encodes the representation of the graph.
- Action: "cut" or "keep" a given edge.
- Reward: the change in AUC-ROC obtained when the GNN classifier predicts on the graph after the edge is cut.

GNN classifier
- GNN classifier training:
  1. Use the edge-pruning policy to "purify" the graph structure.
  2. On the "purified" graph, train the parameters of the GNN classifier, using the AUC loss as the training loss.

AO-GNN network architecture

Experimental results
- Public benchmark datasets:
  - YelpChi: fraudulent Yelp reviews
  - Amazon: fraudulent Amazon product reviews
  - Books: a dataset of fake book orders
- RQ: AUC curves over training
- RQ: which types of edges are cut most often
- RQ: verifying the efficiency of the reinforcement learning
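04 What are the trends in using graph neural networks?

Future trends
- How can potential adversarial attacks be defended against? (KDD 2022)
- How can features be learned quickly and adaptively in new scenarios? (DASFAA 2022)
- How can unsupervised data be put to better use? (KDD 2022)
- Slide keywords: "scenario dependence", "adversarial attack and defense", "pre-trained models"

Summary
1. Why do graph neural networks work for internet financial fraud detection?
   - They integrate multi-source heterogeneous data.
   - They are naturally compatible with semi-supervised learning settings.
2. What are the pitfalls of using graph neural networks for internet financial fraud detection?
   - Over-smoothing in the message-passing mechanism works against class-imbalanced learning.
   - The lack of supervision signals works against effective feature extraction.
3. What other trends are there in internet financial fraud detection?
   - Efficient feature extraction under scenario dependence
   - Modeling the dynamic, adversarial nature of fraud behavior
   - Effective use of large-scale unsupervised behavioral data

Thank you very much for watching.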