On Cooperative Self-Explaining NLP Models
2023/06/17, Jun Wang

Summary
- Interpretability
- Cooperative self-explaining framework (RNP) and spurious correlations
- Our insights on cooperative games and solutions for spurious correlations in RNP
- Future work

Interpretability
- Growing concern about model interpretability in various critical fields.

Interpretability in the LLM Era (GPT-3 & Beyond, Christopher Potts, Stanford, 2023/01)
- LLMs can generate post-hoc explanations that seem highly plausible.
- LLMs remain a huge black box, which may pose a problem for scenarios that require an interpretable underlying mechanism to ensure trustworthiness.
- The processing cost and speed of LLMs when handling vast amounts of data, such as user reviews on large-scale websites, also pose a challenge.

Expectations for Interpretability
- Both faithful (reflecting the model's actual behavior) and plausible (aligning with human understanding).

Various Methods for Interpretability
- Post-hoc methods require additional surrogate models to explain the existing models being interpreted; it is difficult to ensure faithfulness, especially for black-box models.
- Ante-hoc (self-explaining) models incorporate interpretability into the model design and ensure faithfulness: model predictions are based on the informative explanations generated by the model itself.
(Lei et al., Rationalizing Neural Predictions, EMNLP-2016)

Cooperative Self-Explaining Framework: RNP and Spurious Correlations

Cooperative Self-Explaining Framework: RNP
- Rationalizing Neural Predictions (RNP) sets up a cooperative game between an explainer (or generator) and a predictor: the explainer identifies a human-interpretable subset of the input (referred to as the rationale) and passes it to the subsequent predictor for making predictions.
- Significant advantage: certification of exclusion. Any unselected part of the input is guaranteed to make no contribution to the prediction, which ensures that faithfulness is maintained.
(Lei et al., Rationalizing Neural Predictions, EMNLP-2016; Liu et al., FR: Folded Rationalization with a Unified Encoder, NeurIPS-2022)

Objective of RNP
- Cooperative prediction loss plus a regularizer on the rationale: sparsity and coherency.
- Rationale selection: Gumbel-Softmax or reinforcement learning. A minimal sketch of this objective follows below.
(Lei et al., Rationalizing Neural Predictions, EMNLP-2016; Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)
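The following is a minimal PyTorch sketch of how such an objective can be instantiated, assuming a GRU-based generator and predictor, Gumbel-Softmax mask sampling, and illustrative hyper-parameters (sparsity_target, lam_s, lam_c); it is a reconstruction for illustration, not the authors' released code.

```python
# Minimal sketch of the RNP objective (illustrative; names and sizes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Scores each token; Gumbel-Softmax turns the scores into a (soft) binary mask."""
    def __init__(self, vocab_size, emb_dim=100, hid_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hid_dim, 2)          # per-token logits: {drop, keep}

    def forward(self, x, tau=1.0):
        h, _ = self.rnn(self.emb(x))                    # (B, T, 2*hid)
        logits = self.score(h)                          # (B, T, 2)
        # Differentiable (straight-through) sampling of a binary mask per token.
        mask = F.gumbel_softmax(logits, tau=tau, hard=True)[..., 1]   # (B, T)
        return mask

class Predictor(nn.Module):
    """Predicts the label from the masked (rationale-only) input."""
    def __init__(self, vocab_size, emb_dim=100, hid_dim=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, x, mask):
        e = self.emb(x) * mask.unsqueeze(-1)            # zero out unselected tokens
        h, _ = self.rnn(e)
        return self.out(h.mean(dim=1))                  # mean-pool then classify

def rnp_loss(logits, y, mask, sparsity_target=0.2, lam_s=1.0, lam_c=1.0):
    """Cooperative prediction loss + sparsity and coherency regularizers."""
    pred_loss = F.cross_entropy(logits, y)
    sparsity = (mask.mean() - sparsity_target).abs()            # keep ~20% of tokens
    coherency = (mask[:, 1:] - mask[:, :-1]).abs().mean()       # encourage contiguous spans
    return pred_loss + lam_s * sparsity + lam_c * coherency

# Usage: one cooperative training step on a toy batch.
gen, pred = Generator(vocab_size=5000), Predictor(vocab_size=5000)
opt = torch.optim.Adam(list(gen.parameters()) + list(pred.parameters()), lr=1e-4)
x = torch.randint(0, 5000, (8, 40))                     # batch of token ids
y = torch.randint(0, 2, (8,))
mask = gen(x)
loss = rnp_loss(pred(x, mask), y, mask)
opt.zero_grad(); loss.backward(); opt.step()
```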
RNP Is Domain-Agnostic
- Rationales for texts, rationales for graphs, rationales for images.
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023; Luo et al., Parameterized Explainer for Graph Neural Network, NeurIPS-2020; Yuan et al., Interpreting Image Classifiers by Generating Discrete Masks, TPAMI, 2022)

Spurious Correlations in RNP
- Feature correlation: comes from the generation process of the original dataset. Wolves always appear together with snow! A favorable taste often correlates with an appealing aroma!
- Degeneration (mask correlation): stems from the rationale (mask) selection in the cooperative game. The predictor overfits to meaningless but distinguishable selections!
(Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, KDD-2016; Liu et al., MGR: Multi-generator Based Rationalization, ACL-2023)

Previous Methods to Alleviate Degeneration
- Use additional regularization modules to make use of the full text, such that the predictor does not rely entirely on the rationale provided by the generator.
(Liu et al., FR: Folded Rationalization with a Unified Encoder, NeurIPS-2022)

Our Insights on Cooperative Games and Solutions for Spurious Correlations in RNP
- Solution 1: Folded Rationalization (FR) for degeneration
- Solution 2: Decoupled Rationalization (DR) for degeneration
- Solution 3: Multi-Generator Rationalization (MGR) for spurious correlations
Solution 1: Folded Rationalization (FR) for Degeneration

Our Observations on RNP
- Uncoordinated learning paces: the rationale quality gets better when the learning rate of the predictor is smaller than that of the generator.
- Learning of the generator is harder!
(Liu et al., FR: Folded Rationalization with a Unified Encoder, NeurIPS-2022)

Folded Rationalization (FR)
- A frustratingly simple but effective method without additional modules: it folds the two phases of current rationalization methods into one using a unified encoding mechanism (RNP -> FR: fold).
- Mutual reinforcement between the generator and predictor: the predictor with the unified encoder has a global view of all the rationale candidates through direct access to the input text, and the predictor is enforced to keep the same learning pace as the generator (a minimal sketch follows below).
(Liu et al., FR: Folded Rationalization with a Unified Encoder, NeurIPS-2022)
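A minimal sketch of the folded design, in the same illustrative style as the RNP sketch above: one shared encoder is used both to score tokens on the full text and to encode the selected rationale for prediction. The class and parameter names are assumptions, not the released FR implementation.

```python
# Illustrative sketch of a "folded" rationalizer with one shared (unified) encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FoldedRationalizer(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid_dim=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.token_score = nn.Linear(2 * hid_dim, 2)    # per-token {drop, keep} logits
        self.classifier = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, x, tau=1.0):
        # Pass 1: encode the full text with the shared encoder and select the rationale.
        h_full, _ = self.encoder(self.emb(x))
        mask = F.gumbel_softmax(self.token_score(h_full), tau=tau, hard=True)[..., 1]
        # Pass 2: re-encode only the selected tokens with the *same* encoder and predict.
        h_rat, _ = self.encoder(self.emb(x) * mask.unsqueeze(-1))
        logits = self.classifier(h_rat.mean(dim=1))
        return logits, mask

# Both passes share one set of encoder parameters, so every gradient step updates the
# "generator" and "predictor" views together, keeping their learning paces aligned.
model = FoldedRationalizer(vocab_size=5000)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
```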
Results of FR
- FR improves the F1 score by up to 10.3% (decorrelated dataset).
(Liu et al., FR: Folded Rationalization with a Unified Encoder, NeurIPS-2022)
Solution 2: Decoupled Rationalization (DR) for Degeneration

Further Observations on RNP
- Let λ denote the ratio between the learning rate of the predictor and the learning rate of the generator. (The values in the table cells are F1 scores for different λ.)
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)

Decoupled Rationalization (DR)
- Asymmetric learning rates for the cooperative game of RNP: directly make the learning rate of the predictor lower than the learning rate of the generator, as sketched below.
- No modification to the basic framework of RNP.
- Opposite to adversarial games: they speed up the critic, while we slow down the predictor.
- A simple heuristic but empirically effective method.
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)
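A minimal sketch of the asymmetric-learning-rate idea, reusing the illustrative Generator, Predictor, and rnp_loss from the RNP sketch above; the base learning rate and the ratio lam are assumed values, not the paper's tuned settings.

```python
# DR as a training recipe: same RNP objective, but the predictor gets a smaller step size.
import torch

gen, pred = Generator(vocab_size=5000), Predictor(vocab_size=5000)
lr_gen = 1e-4
lam = 0.1                                            # lambda = lr_pred / lr_gen < 1 (assumed value)
opt_gen = torch.optim.Adam(gen.parameters(), lr=lr_gen)
opt_pred = torch.optim.Adam(pred.parameters(), lr=lam * lr_gen)   # slower predictor

def train_step(x, y):
    mask = gen(x)
    loss = rnp_loss(pred(x, mask), y, mask)          # same cooperative objective as before
    opt_gen.zero_grad(); opt_pred.zero_grad()
    loss.backward()
    opt_gen.step(); opt_pred.step()                  # only the step sizes differ
    return loss.item()
```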
Lipschitz Continuity
- A useful indicator of model stability and robustness for various tasks: robustness against adversarial examples; convergence stability of the discriminator in GANs (adversarial games); stability of closed-loop systems with reinforcement learning controllers.
- Reflects the surface smoothness of the functions corresponding to prediction models. For unstable models in optimization, the function surfaces usually have non-smooth patterns such as steep steps or spikes, where the model output may change a lot when the input changes by only a small amount.
- Measured by the Lipschitz constant: a smaller Lipschitz constant represents better Lipschitz continuity, as recalled below.
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)

Lipschitz Constant
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)
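For reference, the Lipschitz constant of a predictor f with respect to a distance d_s can be written as below (a standard definition used here for illustration; the slide's own formulation may differ in notation):

```latex
L_c \;=\; \sup_{Z_a \ne Z_b} \frac{\lvert f(Z_a) - f(Z_b)\rvert}{d_s(Z_a, Z_b)},
\qquad\text{i.e., the smallest } L \text{ such that }
\lvert f(Z_a) - f(Z_b)\rvert \le L \cdot d_s(Z_a, Z_b)\ \text{for all } Z_a, Z_b.
```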
Intuitions and Observations on Rationale Candidates
- Let Z_a and Z_b be rationale candidates selected from two reviews with clear sentiment tendencies opposite to each other.
- If Z_a and Z_b are uninformative candidates, then their semantic distance d_s(Z_a, Z_b) is generally small.
- If Z_a and Z_b are informative candidates, then their semantic distance d_s(Z_a, Z_b) is relatively larger.
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)

Correlation Between Degeneration and the Predictor's Lipschitz Continuity
- A small Lipschitz constant leads to a high likelihood of informative rationale candidates.
- Given any two rationale candidates Z_a and Z_b selected from input texts X_a and X_b with labels y_a = 0 and y_b = 1: if the predictor gives high-confidence predictions close to the true labels, the prediction errors ℓ_a and ℓ_b are very small, so 1 - ℓ_a and 1 - ℓ_b approach 1.
- If the Lipschitz constant L_c becomes sufficiently low, then d_s(Z_a, Z_b) must become sufficiently large. Only informative candidates can obtain a large d_s(Z_a, Z_b), so requiring a large d_s(Z_a, Z_b) inevitably forces the generator to select these informative candidates as rationales.
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)

Lower Bound of d_s(Z_a, Z_b)
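A minimal sketch of how such a bound can be obtained directly from the definition of the Lipschitz constant, assuming ℓ_a and ℓ_b denote prediction errors bounded in [0, 1]; the exact bound stated in the KDD-2023 paper may differ in form:

```latex
% By the definition of the Lipschitz constant L_c of the predictor f:
%   |f(Z_a) - f(Z_b)| <= L_c * d_s(Z_a, Z_b).
% With y_a = 0, y_b = 1 and confident predictions, f(Z_a) <= \ell_a and f(Z_b) >= 1 - \ell_b, so
\lvert f(Z_a) - f(Z_b)\rvert \;\ge\; (1-\ell_a) + (1-\ell_b) - 1
\quad\Longrightarrow\quad
d_s(Z_a, Z_b) \;\ge\; \frac{(1-\ell_a) + (1-\ell_b) - 1}{L_c}.
% A small L_c together with confident predictions therefore forces d_s(Z_a, Z_b) to be large,
% which only informative rationale candidates can achieve.
```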
Spectral Normalization: A Rigid Method
- Spectral normalization can restrict the Lipschitz constant with some manually selected cutoff values (see the sketch below).
- The rationale quality is improved, but the prediction performance is impaired.
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)
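As a concrete illustration of this rigid approach, the sketch below uses PyTorch's standard torch.nn.utils.spectral_norm to cap the Lipschitz constant of a single linear layer; the cutoff value is a hypothetical hyper-parameter, not the paper's setting.

```python
# torch.nn.utils.spectral_norm constrains the weight's spectral norm to 1; an explicit
# cutoff c can then be imposed by scaling the layer's output by c.
import torch
import torch.nn as nn

class SNLinear(nn.Module):
    """Linear layer whose Lipschitz constant (w.r.t. the L2 norm) is at most `cutoff`."""
    def __init__(self, in_dim, out_dim, cutoff=1.0):
        super().__init__()
        self.linear = nn.utils.spectral_norm(nn.Linear(in_dim, out_dim))
        self.cutoff = cutoff

    def forward(self, x):
        return self.cutoff * self.linear(x)

# E.g., swapping the predictor's output layer for a spectrally normalized one:
layer = SNLinear(256, 2, cutoff=5.0)   # cutoff chosen for illustration only
out = layer(torch.randn(8, 256))
```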
Correlation Between Learning Rates and the Lipschitz Constant
- The Lipschitz constant of the predictor exhibits explosive growth when λ ≥ 1, and is constrained to much smaller values when λ < 1 (with the learning rate of the generator fixed at 0.0001).
- DR obtains much smaller Lipschitz constants than RNP (DR: λ < 1; RNP: λ = 1).
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)

Lower Bound
- Optimizing the predictor's parameters would increase 1 - ℓ, and optimizing the generator's parameters would also increase d_s(Z_a, Z_b).
- To keep L_c small, d_s(Z_a, Z_b) must increase faster relative to the increase in 1 - ℓ. So we slow down the predictor and speed up the generator (i.e., λ < 1).

Results of DR
- Compared with FR, there is a further improvement.
(Liu et al., Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint, KDD-2023)

Analyses of Time Efficiency and Overfitting
- Lowering the learning rate of the predictor does not slow down the convergence of the training process.
- The training accuracy of RNP grows fast in the beginning, when only limited rationales have been sampled by the generator, which reflects that the predictor is trying to overfit to these randomly sampled rationales.
- Although RNP reaches a very high accuracy on the training dataset, it does not reach higher accuracy than our method on the development dataset, which also indicates overfitting. The prediction loss reflects a similar phenomenon.
Solution 3: Multi-Generator Rationalization (MGR) for Spurious Correlations

Feature Correlations and Degeneration
- While previous methods might be adept at addressing either feature correlations or degeneration, they are typically developed independently, failing to consider both issues simultaneously. We seek to solve the two problems simultaneously.

Multi-Generator Rationalization (MGR)
- The first method to simultaneously solve the feature-correlation and degeneration problems.
- Facilitates a broader view of the rationale candidates for the predictor by using multiple generators.
- Keeps only the first generator during inference, which is efficient in terms of both time and resource consumption.
(Liu et al., MGR: Multi-generator Based Rationalization, ACL-2023)

Diverse Training with Separate Learning Rates
- Goal: improve the diversity of rationales while guaranteeing the convergence of the rationalization model.
- Generators should be different from each other, to guarantee that the predictor continuously learns from diverse rationale candidates; yet the different generators should be able to reach the same convergence result.
- Proposal: set the learning rates of the generators separately. With base learning rate η and N generators, the learning rate of the i-th generator is i·η and the learning rate of the predictor is η/N. A minimal sketch follows below.
(Liu et al., MGR: Multi-generator Based Rationalization, ACL-2023)
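A minimal sketch of this training scheme, again reusing the illustrative Generator, Predictor, and rnp_loss from the RNP sketch above; N and the base learning rate eta are assumed values.

```python
# MGR-style training: N generators with separate learning rates, one shared predictor.
import torch

N, eta = 3, 1e-4
gens = [Generator(vocab_size=5000) for _ in range(N)]
pred = Predictor(vocab_size=5000)

# Separate learning rates: the i-th generator gets i * eta, the predictor gets eta / N.
gen_opts = [torch.optim.Adam(g.parameters(), lr=(i + 1) * eta) for i, g in enumerate(gens)]
pred_opt = torch.optim.Adam(pred.parameters(), lr=eta / N)

def train_step(x, y):
    # The predictor sees a rationale candidate from every generator in each step.
    losses = [rnp_loss(pred(x, m), y, m) for m in (g(x) for g in gens)]
    loss = torch.stack(losses).mean()
    for opt in gen_opts + [pred_opt]:
        opt.zero_grad()
    loss.backward()
    for opt in gen_opts + [pred_opt]:
        opt.step()
    return loss.item()

# At inference time, only the first generator is kept:
def predict(x):
    return pred(x, gens[0](x))
```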
- Separate learning rates improve performance.
- Keeping only one generator hardly influences the performance, indicating that the different generators finally converge to the same outputs and only one generator is required at inference time.
- Diversity evolution of rationale candidates (Appearance, Aroma, Palate aspects): the different generators achieve converged results!
(Liu et al., MGR: Multi-generator Based Rationalization, ACL-2023)

Results of MGR
- Correlated BeerAdvocate: MGR achieves an improvement of up to 20.9% over state-of-the-art rationalization methods in terms of F1 score.
- Decorrelated BeerAdvocate: MGR achieves performance comparable to the state-of-the-art methods DR and FR in terms of F1 score.
(Liu et al., MGR: Multi-generator Based Rationalization, ACL-2023)

Future Work
- Tackling both types of spurious correlations within a unified framework, founded on causal inference.
- Expanding insights from cooperative games to other domains, such as GNNs.

References
- W. Liu, J. Wang, H. Wang, R. Li, Y. Qiu, Y. K. Zhang, J. Han, Y. Zou. Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint. KDD 2023.
- W. Liu, H. Wang, J. Wang, R. Li, X. Li, Y. K. Zhang, Y. Qiu. MGR: Multi-generator Based Rationalization. ACL 2023.
- W. Liu, H. Wang, J. Wang, R. Li, C. Yue, Y. Zhang. FR: Folded Rationalization with a Unified Encoder. NeurIPS 2022.