Counterfactual No-Harm Criterion: Individual Risk and Trustworthy Policy Learning

Peng Wu (joint work with Zhi Geng, Yue Liu, Haoxuan Li, and Chunyuan Zheng)
Beijing Technology and Business University
DataFun Causal Inference Online Summit '23, October 17, 2023

Outline
1. Introduction
2. Sharp Bounds of the No-Harm Criterion
3. No-Harm Trustworthy Policy Learning
4. Experiments

Introduction

Background
1. Policy learning determines which individuals should be treated based on their covariates, and it is important that humans can trust a decision made by an algorithm.
2. A trustworthy algorithm is expected to meet various advanced requirements, including fairness, diversity, explainability, accountability, safety, etc.
3. In this talk, we discuss the "harmlessness" of policy learning.

What is No-Harm?
- Hippocratic oath: "First, do no harm."
- Isaac Asimov's Laws of Robotics: "A robot may not injure a human being or, through inaction, allow a human being to come to harm."

[Figure slide: "What is No-Harm?"; illustration only, no recoverable text.]

What is No-Harm? A Toy Example
Consider two policies:
- The first policy is useful for 70% of patients but harms 30% of patients.
- The second policy is useful for 40% of patients and harms no one.
The two policies have the same average causal effect (40%), so the second policy is clearly preferable.
However, if the second policy is useful for only 30% of patients (still with no harm), which policy is preferred: the first or the second?
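To make the comparison concrete, the strata arithmetic behind the example can be written out (a worked restatement of the slide's numbers, using the standard decomposition of the average effect into benefit minus harm for binary outcomes):
$$\mathrm{ATE}_1 = \underbrace{0.70}_{\text{benefited}} - \underbrace{0.30}_{\text{harmed}} = 0.40, \qquad \mathrm{ATE}_2 = 0.40 - 0 = 0.40,$$
$$\mathrm{ATE}_2' = 0.30 - 0 = 0.30 < \mathrm{ATE}_1 = 0.40.$$
The first two policies are indistinguishable by their average effect even though one harms 30% of patients; in the variant, preferring the harmless policy means accepting a strictly lower average effect. This trade-off is exactly what the no-harm criterion formalizes.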

Notation
Observed data: $(X_i, T_i, Y_i)$, $i = 1, \ldots, n$, where $X_i \in \mathcal{X}$ are covariates, $T_i \in \{0, 1\}$ is a binary treatment, and $Y_i \in \{0, 1\}$ is a binary outcome; $Y_i(1)$ and $Y_i(0)$ are the potential outcomes.

  T    Y    Y(1)   Y(0)
  1    1    1      ?
  0    0    ?      0

Either $Y_i(0)$ or $Y_i(1)$ can be observed for each unit, but not both.

Individual treatment effect: $Y_i(1) - Y_i(0)$.
Conditional average treatment effect (CATE): $\tau(x) = E[Y(1) - Y(0) \mid X = x]$, the average causal effect in the subpopulation $X = x$.

Identifiability of $\tau(x)$
Assumption 1 (Strong Ignorability): $(Y(1), Y(0)) \perp\!\!\!\perp T \mid X$ and $0 < P(T = 1 \mid X = x) < 1$.
Under Assumption 1, $\tau(x) = E[Y \mid T = 1, X = x] - E[Y \mid T = 0, X = x]$ is identified from the observed data.
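As a concrete illustration of this identification result, a minimal plug-in CATE estimator might look as follows (a sketch assuming scikit-learn-style binary-outcome models; this is not code from the talk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_cate(X, T, Y):
    """Plug-in CATE under Assumption 1:
    tau(x) = mu_1(x) - mu_0(x), with mu_t(x) = E[Y | T = t, X = x]."""
    mu1 = LogisticRegression(max_iter=1000).fit(X[T == 1], Y[T == 1])
    mu0 = LogisticRegression(max_iter=1000).fit(X[T == 0], Y[T == 0])
    return mu1.predict_proba(X)[:, 1] - mu0.predict_proba(X)[:, 1]
```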

Sharp Bounds of the No-Harm Criterion

[Slides 8-18 are garbled in this extraction. They define the fraction negatively affected by the policy, $\mathrm{FNA}(\pi)$, and derive its upper bounds: $w_{\mathrm{FNA}}(\pi)$ (Lemma 1) and $u_{\mathrm{FNA}}(\pi)$ (Theorem 1). Only fragments survive, relating $\tau(x)$, the cost function $c(x)$, and principal-stratum probabilities such as $P(Y(0) = 1, Y(1) = 0 \mid X = x)$.]

The cost function $c(x)$ helps to control the upper bound of $\mathrm{FNA}(\pi)$.

Corollary 1 (Relation to the cost). For the upper bounds $w_{\mathrm{FNA}}(\pi)$ in Lemma 1 and $u_{\mathrm{FNA}}(\pi)$ in Theorem 1, the optimal policy $\pi^*$ satisfies
$$w_{\mathrm{FNA}}(\pi^*) \le E\Big[\frac{1 - c(X)}{2}\, \pi^*(X)\Big], \qquad u_{\mathrm{FNA}}(\pi^*) \le E\Big[\Big(\frac{1 - c(X)}{2}\Big)^2 \pi^*(X)\Big].$$
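For intuition, specializing Corollary 1 to a constant cost $c(X) \equiv c$ (a specialization of mine, for illustration) gives
$$w_{\mathrm{FNA}}(\pi^*) \le \frac{1 - c}{2}\, E[\pi^*(X)], \qquad u_{\mathrm{FNA}}(\pi^*) \le \Big(\frac{1 - c}{2}\Big)^2 E[\pi^*(X)];$$
e.g., $c = 0.2$ yields factors $0.40$ and $0.16$ on the treated fraction, so a larger treatment cost mechanically tightens the harm ceiling of the optimal policy.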

No-Harm Trustworthy Policy Learning

Optimal Policy at a Given Level of No-Harm
Denote $\pi^*$ as the optimal target policy satisfying the no-harm criterion:
$$\max_{\pi} \; R(\pi; c, \kappa) \quad \text{subject to} \quad u_{\mathrm{FNA}}(\pi) \le \epsilon, \tag{1}$$
where $\epsilon$ is a pre-specified level of allowed harm, and
$$R(\pi; c, \kappa) = E\big[\pi(X)\{Y(1) - c(X)\} + \kappa\, Y(0)\{1 - \pi(X)\}\big] \quad \text{for } \kappa \in \{0, 1\},$$
which is a general form of policy reward covering different utility functions. For example, $R(\pi; c, 1) = R(\pi)$ for $U(X, T, Y) = Y - T c(X)$, and $R(\pi; c, 0) = R(\pi)$ for $U(X, T, Y) = TY - T c(X)$.

Learned Policy
Let $\hat{\pi}$ be the learned policy for $\pi^*$, derived by optimizing the empirical form of Eq. (1):
$$\max_{\pi} \; \hat{R}(\pi; c, \kappa) \quad \text{subject to} \quad \hat{u}_{\mathrm{FNA}}(\pi) \le \epsilon, \tag{2}$$
where $\hat{R}(\pi; c, \kappa)$ and $\hat{u}_{\mathrm{FNA}}(\pi)$ are the corresponding estimators of $R(\pi; c, \kappa)$ and $u_{\mathrm{FNA}}(\pi)$, obtained as follows. Let $e(x) := P(T = 1 \mid X = x)$, $\mu_t(x) := E[Y \mid T = t, X = x]$ for $t = 0, 1$, and
$$\psi(Z; e, \mu_0, \mu_1) = \Big\{\frac{T(Y - \mu_1(X))}{e(X)} + \mu_1(X) - c(X)\Big\}\pi(X) + \kappa\Big\{\frac{(1 - T)(Y - \mu_0(X))}{1 - e(X)} + \mu_0(X)\Big\}\{1 - \pi(X)\},$$
$$\phi(Z; e, \mu_0, \mu_1) = \Big\{\frac{(1 - T)(Y - \mu_0(X))}{1 - e(X)} + \mu_0(X)\Big\}\pi(X) - \Big\{\frac{T(Y - \mu_1(X))}{e(X)} + \mu_1(X)\Big\}\mu_0(X)\pi(X),$$
where $Z = (T, X, Y)$.

Lemma 2. $R(\pi; c, \kappa) = E[\psi(Z; e, \mu_0, \mu_1)]$ and $u_{\mathrm{FNA}}(\pi) = E[\phi(Z; e, \mu_0, \mu_1)]$.

From Lemma 2, it is natural to define the estimators of $R(\pi; c, \kappa)$ and $u_{\mathrm{FNA}}(\pi)$ as
$$\hat{R}(\pi; c, \kappa) = \frac{1}{n}\sum_{i=1}^{n} \psi(Z_i; \hat{e}, \hat{\mu}_0, \hat{\mu}_1), \qquad \hat{u}_{\mathrm{FNA}}(\pi) = \frac{1}{n}\sum_{i=1}^{n} \phi(Z_i; \hat{e}, \hat{\mu}_0, \hat{\mu}_1),$$
where $\hat{e}(x)$ and $\hat{\mu}_t(x)$ for $t = 0, 1$ are estimators of $e(x)$ and $\mu_t(x)$, respectively, fitted using the sample-splitting technique.
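A minimal sketch of these scores and the resulting sample-average estimators, assuming the cross-fitted nuisance estimates $\hat{e}, \hat{\mu}_0, \hat{\mu}_1$ are already evaluated at each unit and passed in as arrays (function and variable names are mine, not the authors'):

```python
import numpy as np

def psi_score(T, Y, pi, e, mu0, mu1, c, kappa):
    """Doubly robust score for the reward R(pi; c, kappa) (Lemma 2)."""
    dr1 = T * (Y - mu1) / e + mu1              # DR term for E[Y(1) | X]
    dr0 = (1 - T) * (Y - mu0) / (1 - e) + mu0  # DR term for E[Y(0) | X]
    return (dr1 - c) * pi + kappa * dr0 * (1 - pi)

def phi_score(T, Y, pi, e, mu0, mu1):
    """Score for the harm upper bound u_FNA(pi) (Lemma 2)."""
    dr1 = T * (Y - mu1) / e + mu1
    dr0 = (1 - T) * (Y - mu0) / (1 - e) + mu0
    return dr0 * pi - dr1 * mu0 * pi

def estimate_reward_and_harm(T, Y, pi, e, mu0, mu1, c, kappa):
    """Empirical estimators R_hat(pi; c, kappa) and u_FNA_hat(pi)."""
    return (psi_score(T, Y, pi, e, mu0, mu1, c, kappa).mean(),
            phi_score(T, Y, pi, e, mu0, mu1).mean())
```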

Asymptotic Properties of the Estimators
Theorem 2. Suppose that $\|\hat{e}(x) - e(x)\|_2\, \|\hat{\mu}_t(x) - \mu_t(x)\|_2 = o_P(n^{-1/2})$ for all $x \in \mathcal{X}$ and $t \in \{0, 1\}$. Then:
(a) $\hat{R}(\pi; c, \kappa)$ is consistent and asymptotically normal,
$$\sqrt{n}\,\{\hat{R}(\pi; c, \kappa) - R(\pi; c, \kappa)\} \rightsquigarrow N(0, \sigma_1^2), \quad \text{where } \sigma_1^2 = \mathbb{V}\{\psi(Z; e, \mu_0, \mu_1)\};$$
(b) if $\mu_0(x) = \mu_0(x; \gamma)$ is a parametric model, $\hat{u}_{\mathrm{FNA}}(\pi)$ is consistent and asymptotically normal,
$$\sqrt{n}\,\{\hat{u}_{\mathrm{FNA}}(\pi) - u_{\mathrm{FNA}}(\pi)\} \rightsquigarrow N(0, \sigma_2^2), \quad \text{where } \sigma_2^2 = \mathbb{V}\big[\phi(Z; e, \mu_0, \mu_1) - s(X)\, E\{\nabla_{\gamma}\mu_0(X; \gamma)\, \mu_1(X)\, \pi(X)\}\big],$$
and $s(X)$ is the influence function of the estimator of $\gamma$.
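Theorem 2(a) licenses a standard Wald interval for the policy reward, with $\sigma_1^2$ estimated by the sample variance of the $\psi$ scores; a sketch (reusing `psi_score` from the block above):

```python
import numpy as np

def reward_confint(psi_scores, z=1.96):
    """95% Wald CI for R(pi; c, kappa): R_hat +/- z * sigma1_hat / sqrt(n)."""
    n = len(psi_scores)
    r_hat = psi_scores.mean()
    se = psi_scores.std(ddof=1) / np.sqrt(n)
    return r_hat - z * se, r_hat + z * se
```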

Properties of the Learned Policy
Two quantities of interest are
$$R(\pi^*; c, \kappa) - R(\hat{\pi}; c, \kappa) \quad \text{and} \quad \hat{R}(\hat{\pi}; c, \kappa) - R(\hat{\pi}; c, \kappa),$$
the regret of the learned policy and the error of the estimated reward of the learned policy, respectively.

Theorem 3 (Main Result 2). Suppose that for all $\pi \in \Pi$, $\pi(x) = \pi(x; \theta)$ is continuously differentiable and convex with respect to $\theta$, where $\Theta$ is a compact set. Under the assumptions in Theorem 1, we have:
(a) the expected reward of the learned policy is consistent, and $R(\pi^*; c, \kappa) - R(\hat{\pi}; c, \kappa) = O_P(1/\sqrt{n})$;
(b) the estimated reward of the learned policy is consistent, and $\hat{R}(\hat{\pi}; c, \kappa) - R(\hat{\pi}; c, \kappa) = O_P(1/\sqrt{n})$.

Theorem 4 (Main Result 3). Suppose that $\Pi$ is a P-Glivenko-Cantelli class, $\hat{\mu}_t(x)$ and $\hat{e}(x)$ are uniformly consistent estimators of $\mu_t(x)$ and $e(x)$ for $t = 0, 1$, respectively, and $\hat{e}(x) \ge a$ for some constant $0 < a < 1$. Then:
(a) $R(\hat{\pi}; c, \kappa) - R(\pi^*; c, \kappa) \xrightarrow{P} 0$; and
(b) $\hat{R}(\hat{\pi}; c, \kappa) - R(\hat{\pi}; c, \kappa) \xrightarrow{P} 0$.
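The extracted slides do not show how the constrained program (2) is solved in practice. One simple way to prototype it, under the parametric policy class of Theorem 3, is a logistic policy $\pi(x; \theta)$ trained with a penalty on violation of the harm constraint (my illustration, not the paper's algorithm; it reuses `estimate_reward_and_harm` from the earlier block):

```python
import numpy as np
from scipy.optimize import minimize

def learn_policy(X, T, Y, e, mu0, mu1, c, kappa, eps, lam=10.0):
    """Prototype for Eq. (2): maximize R_hat(pi; c, kappa) with a penalty
    lam * max(0, u_FNA_hat(pi) - eps)^2 enforcing u_FNA_hat(pi) <= eps."""
    Xb = np.column_stack([np.ones(len(X)), X])  # design matrix with intercept

    def neg_objective(theta):
        pi = 1.0 / (1.0 + np.exp(-Xb @ theta))  # smooth policy pi(x; theta)
        r_hat, u_hat = estimate_reward_and_harm(T, Y, pi, e, mu0, mu1, c, kappa)
        return -r_hat + lam * max(0.0, u_hat - eps) ** 2

    theta0 = np.zeros(Xb.shape[1])
    return minimize(neg_objective, theta0, method="Nelder-Mead").x
```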

Experiments

Simulation Setups
Two semi-synthetic datasets:
- The IHDP (Infant Health and Development Program) dataset comprises 672 units (123 treated, 549 control) and 25 covariates measuring aspects of children and their mothers. Goal: examine the effects of specialist home visits on future cognitive test scores.
- The JOBS dataset is based on the National Supported Work program. It includes 2,570 units (237 treated, 2,333 control) and 17 covariates. Goal: examine the effects of job training on income and employment status after training.

To determine the ground truth of harm, we simulate the potential outcomes:
$$Y_i(0) \sim \mathrm{Bern}\big(\sigma(w_0^\top x_i + \epsilon_{0,i})\big), \qquad Y_i(1) \sim \mathrm{Bern}\big(\sigma(w_1^\top x_i + \epsilon_{1,i})\big),$$
where $\sigma(\cdot)$ is the sigmoid function, $w_0 \sim N_{-1,1}(0, 1)$ follows a truncated normal distribution, $w_1 \sim \mathrm{Unif}(-1, 1)$ follows a uniform distribution, $\epsilon_{0,i} \sim N(0, \delta_0)$, and $\epsilon_{1,i} \sim N(1, \delta_1)$. We set the noise parameters $\delta_0 = 1$ and $\delta_1 = 3$ for IHDP, and $\delta_0 = 0$ and $\delta_1 = 2$ for JOBS.
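A sketch of this outcome-generating process, reading $\delta_t$ as a standard deviation (the slides do not say s.d. versus variance) and using scipy's `truncnorm` for the truncated normal weights:

```python
import numpy as np
from scipy.stats import truncnorm

def simulate_potential_outcomes(X, delta0=1.0, delta1=3.0, seed=0):
    """Simulate Y(0) ~ Bern(sigmoid(w0'x + eps0)), Y(1) ~ Bern(sigmoid(w1'x + eps1));
    w0 ~ N(0,1) truncated to [-1,1], w1 ~ Unif(-1,1),
    eps0 ~ N(0, delta0), eps1 ~ N(1, delta1)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w0 = truncnorm.rvs(-1, 1, size=d, random_state=rng)  # truncated N(0,1) on [-1,1]
    w1 = rng.uniform(-1, 1, size=d)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    y0 = rng.binomial(1, sigmoid(X @ w0 + rng.normal(0.0, delta0, size=n)))
    y1 = rng.binomial(1, sigmoid(X @ w1 + rng.normal(1.0, delta1, size=n)))
    return y0, y1
```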

Goal and Evaluation Metrics
Goal. The goal of our policy learning is to maximize the reward, and the resulting change in welfare, while satisfying the no-harm criterion. In this simulation there are 65 and 252 units in the "harmful treatment" stratum on IHDP and JOBS, respectively. We define the no-harm criterion as the learned policy harming fewer than 20% of them, i.e., 13 units for IHDP and 50 units for JOBS.

Evaluation Metrics (transcribed into code after this list):
- Reward. For CATE-based policy learning: $\sum_{i=1}^{n}\{(Y_i(1) - c)\pi(x_i) + Y_i(0)(1 - \pi(x_i))\}$. For recommendation-based policy learning: $\sum_{i=1}^{n}(Y_i(1) - c)\pi(x_i)$.
- The change in welfare: $\Delta W(\pi) = \sum_{i=1}^{n}\{Y_i(1)\pi(x_i) + Y_i(0)(1 - \pi(x_i))\} - \sum_{i=1}^{n} Y_i(0) = \sum_{i=1}^{n}\{Y_i(1) - Y_i(0)\}\pi(x_i)$.
- The true harm: $\sum_{i=1}^{n} \mathbb{I}\{Y_i(0) = 1, Y_i(1) = 0\}\, \pi(x_i)$.
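A sketch of these three metrics computed on simulated potential outcomes (`pi` is the learned policy evaluated at each $x_i$; names are mine):

```python
import numpy as np

def evaluate_policy(y1, y0, pi, c):
    """Oracle evaluation metrics from the slides."""
    return {
        "reward_cate": np.sum((y1 - c) * pi + y0 * (1 - pi)),  # CATE-based reward
        "reward_rec": np.sum((y1 - c) * pi),                   # recommendation-based
        "welfare_change": np.sum((y1 - y0) * pi),              # Delta W(pi)
        "true_harm": np.sum(((y0 == 1) & (y1 == 0)) * pi),     # harmed units
    }
```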

Results
[Figure slide: experimental results on IHDP and JOBS; plots only, no recoverable text.]

Conclusion
- We formalize the no-harm criterion for policy learning from a principal stratification perspective.
- We propose a novel upper bound for the fraction negatively affected by the policy.
- We propose an estimator of the upper bound, and show the consistency and asymptotic normality of the estimator.
- Based on the estimators for the policy reward and harm rate, we further propose a policy learning approach that satisfies the no-harm criterion, and prove its consistency to the optimal policy reward for parametric and nonparametric policy classes, respectively.

Main References
[1] Kush R. Varshney (2022). Trustworthy Machine Learning. Independently published.
[2] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.
[3] Kallus, N. (2022). Treatment effect risk: Bounds and inference. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22).
[4] Kallus, N. (2022). What's the harm? Sharp bounds on the fraction negatively affected by treatment. arXiv preprint arXiv:2205.10327.
[5] Kitagawa, T. and Tetenov, A. (2018). Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica.
[6] Haoxuan Li, Chunyuan Zheng, Yixiao Cao, Zhi Geng, Yue Liu*, and Peng Wu* (2023). Trustworthy Policy Learning under the Counterfactual No-Harm Criterion. ICML '23.
[7] Peng Wu, Peng Ding, Zhi Geng, and Yue Liu. Individual Benefit and Risk: Bounds and Inference. Working paper.

Welcome to Join the Causal Inference Team at Beijing Technology and Business University
Led by Professor Zhi Geng, the causal inference team at Beijing Technology and Business University was founded in 2022. It works on the foundational theory, methods, and applications of causal inference. The team has produced results on causal effect evaluation, causal discovery, causal attribution, causal recommender systems, causal reinforcement learning, causality-based fairness evaluation, and applied causal inference in biomedicine, food safety, and the Internet/IT industry. Since its founding, the team's work has appeared in top international journals in statistics, machine learning, and artificial intelligence, including Biometrika, Journal of Machine Learning Research, Biometrics, Statistica Sinica, Statistics in Medicine, Artificial Intelligence, and TNNLS, and at top international conferences including ICML, NeurIPS, ICLR, AAAI, KDD, IJCAI, WWW, and UAI. Interested candidates are welcome to get in touch.

Thanks!
