Issue Brief
June 2024

Key Concepts in AI Safety
Reliable Uncertainty Quantification in Machine Learning

Authors
Tim G. J. Rudner
Helen Toner

This paper is the fifth installment in a series on "AI safety," an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. Other papers in the series describe three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces the idea of uncertainty quantification, i.e., training machine learning systems that "know what they don't know."

Introduction

The last decade of progress in machine learning research has given rise to systems that are surprisingly capable but also notoriously unreliable. The chatbot ChatGPT, developed by OpenAI, provides a good illustration of this tension. Users interacting with the system after its release in November 2022 quickly found that while it could adeptly find bugs in programming code and author Seinfeld scenes, it could also be confounded by simple tasks. For example, one dialogue showed the bot claiming that the fastest marine mammal was the peregrine falcon, then changing its mind to the sailfish, then back to the falcon, despite the obvious fact that neither of these choices is a mammal. This kind of uneven performance is characteristic of deep learning systems (the type of AI systems that have seen most progress in recent years) and presents a significant challenge to their deployment in real-world contexts.

An intuitive way to handle this problem is to build machine learning systems that "know what they don't know," that is, systems that can recognize and account for situations where they are more likely to make mistakes. For instance, a chatbot could display a confidence score next to its answers, or an autonomous vehicle could sound an alarm when it finds itself in a scenario it cannot handle. That way, the system could be useful in situations where it performs well, and harmless in situations where it does not. This could be especially useful for AI systems that are used in a wide range of settings, such as large language models (the technology that powers chatbots like ChatGPT), since these systems are very likely to encounter scenarios that diverge from what they were trained and tested for.

Unfortunately, designing machine learning systems that can recognize their limits is more challenging than it may appear at first glance. In fact, enabling machine learning systems to "know what they don't know" (known in technical circles as "uncertainty quantification") is an open and widely studied research problem within machine learning. This paper gives an introduction to how uncertainty quantification works, why it is difficult, and what the prospects are for the future.
The Challenge of Reliably Quantifying Uncertainty

In principle, the kind of system we would like to build sounds simple: a machine learning model that generally makes correct predictions, but that can indicate when its predictions are more likely to be incorrect. Ideally, such a model would indicate high levels of uncertainty neither too often nor too seldom. A system that constantly expresses under-confidence in situations that it could actually handle well is not very useful, but if the system sometimes does not indicate uncertainty when in fact it is about to fail, then this defeats the purpose of trying to quantify uncertainty in the first place. Experts use the idea of "calibration" to describe the desired behavior here: the level of uncertainty that a machine learning model assigns to a given prediction (its "predictive uncertainty") should be calibrated to the probability that the prediction is in fact incorrect.

Figure 1: Calibration Curves Depicting Under-Confidence, Near-Perfect Calibration, and Over-Confidence
The figures show under-confident (left), well-calibrated (center), and over-confident (right) calibration curves. Ideally, the confidence expressed by the model (on the x-axis) should correspond to the chance that the prediction is correct (on the y-axis). A model is under-confident if its predictions are more often correct than its confidence levels would imply (per the chart on the left), while the inverse is true for an over-confident model (on the right). Source: CSET.

For example, imagine a medical machine learning classification system that uses a scan of a patient's eye to predict whether the patient has a retinal disease.1 If the system is calibrated, then its predictions (typically expressed as percentages) should correspond to the true proportion of diseased retinas. That is, it should be the case that of the retina images predicted to be exhibiting signs of disease with a 50% chance, half are in fact diseased, or that eight out of ten retina images predicted to have an 80% probability of exhibiting signs of disease in fact do, and so on. The closer the assigned probabilities are to the real proportion in the evaluation data, the better calibrated the system is.

A well-calibrated system is useful because it allows users to account for how likely the prediction is to be correct. For example, a doctor would likely make different decisions about further testing and treatment for a patient whose scan indicated a 0.1% chance of disease versus one whose scan indicated a 30% chance, even though neither scan would be classified as likely diseased.
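To make the idea of calibration concrete, the sketch below shows one common way to check it empirically, and the basis of curves like those in Figure 1: group a model's predicted disease probabilities into bins and compare each bin's average predicted probability with the fraction of cases in that bin that are in fact diseased (the gap, weighted across bins, is the "expected calibration error" mentioned in the endnotes). This is an illustrative Python sketch, not code from the report, and the synthetic predictions and labels are placeholders for real model outputs.

```python
import numpy as np

def calibration_report(probs, labels, n_bins=10):
    """Compare predicted probabilities with observed outcomes, bin by bin.

    probs  -- predicted probability of disease for each scan, shape (n,)
    labels -- true outcomes, 1 = diseased, 0 = healthy, shape (n,)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0  # expected calibration error: per-bin gap, weighted by bin size
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if not in_bin.any():
            continue
        predicted = probs[in_bin].mean()   # what the model claims
        observed = labels[in_bin].mean()   # what actually happened
        ece += in_bin.mean() * abs(predicted - observed)
        print(f"bin [{lo:.1f}, {hi:.1f}): predicted={predicted:.2f}, observed={observed:.2f}")
    return ece

# Synthetic stand-in for model outputs: by construction, each label is 1 with
# exactly the predicted probability, so this "model" is well calibrated.
rng = np.random.default_rng(0)
probs = rng.uniform(size=5000)
labels = (rng.uniform(size=5000) < probs).astype(int)
print(f"expected calibration error: {calibration_report(probs, labels):.3f}")
```

An over-confident model would show observed disease rates consistently below its predicted probabilities, and an under-confident one the reverse, mirroring the right and left panels of Figure 1.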
Understanding Distribution Shift

Building a system that can express well-calibrated predictive uncertainty in the laboratory, while not straightforward, is achievable. The challenge lies in creating machine learning models that can reliably quantify uncertainty when subjected to the messiness of the real world in which they are deployed.

At the root of this challenge lies an idea called "distribution shift." This refers to the ways in which the types of data that a machine learning system encounters (the "data distribution") change from one setting to another. For instance, a self-driving car trained using data from San Francisco's roads is unlikely to encounter snow, so if the same car were deployed in Boston during the winter, it would encounter a different data distribution (one that includes snow on the roads), making it more likely to fail.

Distribution shift is easy to describe informally, but very difficult to detect, measure, or define precisely. This is because it is especially difficult to foresee and account for all the possible types of distribution shifts that a system might encounter in practice. When a particular shift can be anticipated (for instance, if the engineers that trained the self-driving car in San Francisco were planning a Boston deployment and considering weather differences), then it is relatively straightforward to manage. In most cases, however, it is impossible to know in advance what kinds of unexpected situations (what unknown unknowns) a system deployed in the messy real world may encounter.

The need to deal with distribution shifts makes quantifying uncertainty difficult, similarly to the broader problem of generalization in modern machine learning systems. While it is possible to evaluate a model's accuracy on a limited set of data points in the lab, there are no mathematical guarantees that ensure that a model will perform as well when deployed (i.e., that what the system learned will "generalize" beyond its training data). Likewise, for uncertainty quantification, there is no guarantee that a seemingly well-calibrated model will remain calibrated on data points that are meaningfully different from the training data. But while there is a vast amount of empirical and theoretical literature on how well models generalize to unseen examples, there is relatively little work on models' ability to reliably identify situations where their uncertainty should be high, making "uncertainty generalization" one of the most important and yet relatively underexplored areas of machine learning research.
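As a toy demonstration of that last point, the sketch below trains a simple classifier on data from one distribution and then evaluates it on shifted inputs. With these synthetic numbers, accuracy collapses toward chance while the model's average confidence in its predictions stays high, which is exactly the failure mode described above. The data, the shift, and the scikit-learn model are assumptions chosen for brevity rather than an example from the report.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two classes separated along one feature; `shift` moves inputs away from the training clusters."""
    y = rng.integers(0, 2, size=n)
    x = np.column_stack([
        rng.normal(loc=2.0 * y - 1.0, scale=1.0),   # informative feature
        rng.normal(loc=0.0, scale=1.0, size=n),     # noise feature
    ])
    return x + shift, y

x_train, y_train = make_data(5000)
model = LogisticRegression().fit(x_train, y_train)

for shift in (0.0, 3.0):
    x_test, y_test = make_data(2000, shift=shift)
    accuracy = model.score(x_test, y_test)
    confidence = model.predict_proba(x_test).max(axis=1).mean()  # average confidence in the predicted class
    print(f"shift={shift}: accuracy={accuracy:.2f}, average confidence={confidence:.2f}")
```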
Accurately Characterizing Uncertainty

In the medical imaging example above, we described how machine learning models used for classification produce probabilities for each class (e.g., diseased versus not diseased), but such probabilities may not be sufficient for reliable uncertainty quantification. These probability scores indicate how strongly a model predicts that a given input corresponds to a given output. For instance, an image classifier for reading zip codes takes in an image of a handwritten digit, then assigns a score to each of the ten possible outputs (corresponding to the digit in the image being a "0," "1," "2," etc.). The output with the highest score indicates the digit that the classifier thinks is most likely to be in the image.

Unfortunately, these scores are generally not useful indicators of the model's uncertainty, for two reasons. First, they are the result of a training process that was optimizing for the model to produce accurate outputs, not calibrated probabilities;2 thus, there is no particular reason to believe that a score of 99.9% reliably corresponds to a higher chance that the output is correct than a score of 95%. Second, systems designed this way have no way to express "none of the above," say, if the zip code reader encountered a bug splattered across the page. The model is mathematically forced to assign probability scores to the available outputs, and to ensure that those scores sum to one.3
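The "forced to sum to one" point is easy to see in code. The short sketch below, an illustration we add here rather than an example from the report, applies a standard softmax to two hypothetical sets of raw scores from a ten-way digit classifier: one for a clearly written digit and one for an input unlike anything the model was trained on. In both cases the outputs are valid probabilities that sum to one, and the model still places nearly all of its probability on a single digit.

```python
import numpy as np

def softmax(logits):
    """Standard softmax: exponentiate and normalize so the scores sum to one."""
    z = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return z / z.sum()

# Hypothetical raw scores (logits) for the digits 0-9.
clear_seven  = np.array([-4., -3., -2., -3., -1., -2., -3., 9., -2., -3.])
bug_splatter = np.array([ 1., -2.,  6., -1.,  0., -2.,  1., 2., -1.,  0.])  # nothing like a digit

for name, logits in [("clear '7'", clear_seven), ("bug splatter", bug_splatter)]:
    probs = softmax(logits)
    print(f"{name}: predicted digit={probs.argmax()}, "
          f"top probability={probs.max():.2f}, sum of probabilities={probs.sum():.2f}")
```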
This naturally raises the question of why adding a "none of the above" option is not possible. The reason is simple: models learn from data and, due to the challenges of distribution shift described above, AI developers typically do not have data that represents the broad range of possibilities that could fit into a "none of the above" option. This makes it infeasible to train a model that can consistently recognize inputs as being meaningfully different.

To summarize, the core problem making uncertainty quantification difficult is that in many real-world settings, we cannot cleanly articulate and prepare for every type of situation a model may need to be able to handle. The aim is to find a way for the system to identify situations when it is likely to fail, but because it is impossible to expose the system to every kind of scenario in which it might perform poorly, it is impossible to verify in advance that the system will appropriately estimate its chances of performing well under novel, untested conditions. In the next section, we discuss several approaches that try to navigate this difficulty.
Existing Approaches to Uncertainty Quantification

The key challenge of uncertainty quantification is to develop models that can accurately and reliably express how likely their predictions are to be correct. A wide range of approaches have been developed that aim to achieve this goal. Some approaches primarily treat uncertainty quantification as an engineering challenge that can be addressed with tailored algorithms and more training data. Others seek to use more mathematically grounded techniques that could, in theory, provide watertight guarantees that a model can quantify its own uncertainty well. Unfortunately, it is not currently possible to produce such mathematical guarantees without using unrealistic assumptions. Instead, the best we can do is develop models that quantify uncertainty well on carefully designed empirical tests.

Approaches to uncertainty quantification in modern machine learning fall into four different categories:

1. Deterministic Methods
2. Model Ensembling
3. Conformal Prediction
4. Bayesian Inference

Each of these approaches has distinct benefits and drawbacks, with some providing mathematical guarantees and others performing particularly well on empirical tests. We elaborate on each technique in the remainder of this section. Readers are welcome to skip to the next section if the somewhat more technical material below is not of interest.
Deterministic Methods

Deterministic methods work by explicitly encouraging the model to exhibit high uncertainty on certain input examples during training. For example, researchers might start by training a model on one dataset, then introduce a different dataset with the expectation that the model should express high uncertainty on examples from the dataset it was not trained on. Using this approach results in models that are very accurate on data similar to what they were trained on, and that indicate high uncertainty for other data.4
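One common way to set up such a training objective, known in the research literature as "outlier exposure," is sketched below: the loss rewards accurate predictions on in-distribution examples and near-uniform (maximally uncertain) predictions on a second batch of "outlier" examples. This PyTorch sketch is one possible instantiation of the idea rather than the specific method behind the results cited above, and the model, data, and weighting term are placeholders.

```python
import torch
import torch.nn.functional as F
from torch import nn, optim

def training_step(model, optimizer, x_in, y_in, x_out, out_weight=0.5):
    """One step: be accurate on in-distribution data, be maximally
    uncertain (near-uniform predictions) on the outlier batch."""
    optimizer.zero_grad()
    loss_in = F.cross_entropy(model(x_in), y_in)
    # Cross-entropy against the uniform distribution: pushing every log-probability
    # up is only possible by spreading probability evenly across the classes.
    loss_out = -F.log_softmax(model(x_out), dim=-1).mean()
    loss = loss_in + out_weight * loss_out
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny demo with random stand-in data: 10 input features, 3 classes.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = optim.Adam(model.parameters(), lr=1e-3)
x_in, y_in = torch.randn(64, 10), torch.randint(0, 3, (64,))
x_out = torch.randn(64, 10) * 5 + 10  # stand-in for the dataset the model was not trained on
print(training_step(model, optimizer, x_in, y_in, x_out))
```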
However, it is not clear how much we can rely on these research results in practice. Models trained this way are optimized to recognize that some types of input are outside the scope of what they can handle. But because the real world is complex and unpredictable, it is impossible for this training to cover all possible ways in which an input could be out of scope. For example, even if we trained the medical imaging classifier described above to have high predictive uncertainty on images that exhibit commonly known image corruptions, it may still fail at deployment if the model was trained on images obtained in one hospital with a certain type of equipment, and deployed in another hospital with a different type of equipment. As a result, this approach is prone to failure when the model is deployed, and there is no known way to guarantee that the predictive uncertainty estimates will in fact be reliable.
Model Ensembling

Model ensembling is a simple method that combines multiple trained models and averages their predictions. This approach often improves predictive accuracy compared to just using a single model. An ensemble's predictive uncertainty is expressed as the standard deviation of the different predictions, meaning that if all of the models in the ensemble make similar predictions, then uncertainty is low; if they make very different predictions, uncertainty is high.
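The sketch below illustrates these mechanics for a small, hypothetical regression ensemble (an illustration we add here, not an example from the report): several models that differ only in their random initialization are trained on the same data, the ensemble's prediction is their mean, and the spread (standard deviation) across members is reported as the uncertainty estimate.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical training data covering only inputs between 0 and 5.
x_train = rng.uniform(0, 5, size=(500, 1))
y_train = np.sin(x_train[:, 0]) + rng.normal(scale=0.1, size=500)

# Train several models that differ only in their random initialization.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=seed).fit(x_train, y_train)
    for seed in range(5)
]

def predict_with_uncertainty(x):
    preds = np.stack([member.predict(x) for member in ensemble])  # shape: (n_members, n_points)
    return preds.mean(axis=0), preds.std(axis=0)                  # ensemble prediction and disagreement

# Members tend to agree near the training data (x=2) and to disagree far from it (x=12).
for point in (2.0, 12.0):
    mean, std = predict_with_uncertainty(np.array([[point]]))
    print(f"input={point}: prediction={mean[0]:.2f}, uncertainty (std)={std[0]:.2f}")
```

Training five models instead of one is also what makes the approach comparatively expensive, as noted below.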
Ensemble methods are often successful at providing good predictive uncertainty estimates in practice, and are therefore a popular approach, though they can be expensive, given that multiple models must be trained. The underlying mechanism of using ensembling for uncertainty quantification is that different models in an ensemble will be likely to agree on input examples similar to the training data, but may disagree on input examples meaningfully different from the training data. As such, when the predictions of the ensemble components differ, this can be used as a stand-in for uncertainty.5

However, there is no way to verify that this mechanism works for any given ensemble and input example. In particular, it is possible that for some input examples, multiple models in the ensemble may all give the same incorrect answer, which would give a false impression of confidence, and it is impossible to ensure that a given ensemble will provide reliable, well-calibrated predictive uncertainty estimates across the board. For some use cases, the fact that ensembling typically provides fairly good uncertainty estimates may be sufficient to make it worth using. But in cases where the user needs to be able to trust that the system will reliably identify situations where it is likely to fail, ensembling should not be considered a reliable method.
Conformal Prediction

Conformal prediction, in contrast with deterministic methods and ensembling, is a statistically well-founded approach that provides mathematical reliability guarantees, but relies on a key assumption: that the data the model will encounter once deployed is generated by the same underlying data-generating process as the training data (i.e., that there is no distribution shift). Using this assumption, conformal prediction can provide mathematical guarantees of the probability that a given prediction range includes the correct prediction. For instance, in a weather forecasting setting, conformal prediction could guarantee a 95% chance that the day's maximum temperature will fall within a certain range. (That is, it could provide a mathematical guarantee that 95 out of 100 similar predictions would fall within the range.)6 A predicted range of, say, 82°F-88°F would imply more uncertainty than a range of 83°F-85°F.
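A minimal sketch of how this works in practice is shown below, using "split conformal prediction" for a regression task loosely modeled on the temperature example. The data, the model, and the 95% target are hypothetical, and the coverage statement only holds if deployment data comes from the same data-generating process as the calibration data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical forecasting data: one input feature, noisy target (e.g., the day's maximum temperature).
x = rng.uniform(0, 10, size=(2000, 1))
y = 70 + 1.5 * x[:, 0] + rng.normal(scale=2.0, size=2000)

# Split the data: fit the model on one half, calibrate the interval width on the other half.
x_fit, y_fit = x[:1000], y[:1000]
x_cal, y_cal = x[1000:], y[1000:]
model = LinearRegression().fit(x_fit, y_fit)

# Nonconformity scores: how far off the model was on the held-out calibration set.
residuals = np.abs(y_cal - model.predict(x_cal))

# Choose the interval half-width so that, absent distribution shift, about 95% of
# new targets fall inside [prediction - q, prediction + q].
alpha = 0.05
n = len(residuals)
q = np.quantile(residuals, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

x_new = np.array([[6.0]])
prediction = model.predict(x_new)[0]
print(f"forecast: {prediction:.1f} F, 95% prediction interval: "
      f"[{prediction - q:.1f} F, {prediction + q:.1f} F]")
```

If the deployment data were shifted, the 95% statement would no longer hold, which is the limitation discussed next.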
Conformal prediction's major advantage is that it is possible to mathematically guarantee that its predictive uncertainty estimates are correct under certain assumptions. Its major disadvantage is that those assumptions (primarily that the data the model encounters while deployed will be similar to the data it was trained on) often do not hold. Worse, it is often impossible to detect when these assumptions are violated, meaning that the same kind of changes in inputs that may trip up deterministic methods are also likely to cause conformal prediction to fail. In fact, in all of the example application problems where machine learning models are prone to fail and for which we would like to find approaches to improving uncertainty quantification, the standard assumptions of conformal prediction would be violated.
Bayesian Inference

Lastly, Bayesian uncertainty quantification uses Bayesian inference, which provides a mathematically principled framework for updating the probability of a hypothesis as more evidence or information becomes available.7 Bayesian inference can be used to train a neural network that represents each parameter in the network as a random variable, rather than a single fixed value (as is typically the case). While this approach is guaranteed to provide an accurate representation of a model's predictive uncertainty, it is computationally infeasible to carry out exact Bayesian inference on modern machine learning models such as neural networks. Instead, the best researchers can do is to use approximations, meaning that any guarantee that the model's uncertainty will be accurately represented is lost.
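As a toy illustration of the updating principle (deliberately much simpler than Bayesian inference over neural network parameters), the sketch below infers an unknown failure rate from a stream of observed outcomes using a Beta prior, for which the posterior has a closed form. For neural networks no such closed form exists, which is why the approximations mentioned above are needed. The prior and the observations are hypothetical.

```python
import numpy as np

# Beta-Bernoulli model: infer an unknown failure rate from observed outcomes.
prior_a, prior_b = 1.0, 1.0                      # Beta(1, 1): uniform prior over the failure rate
observations = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]    # hypothetical outcomes, 1 = failure

a, b = prior_a, prior_b
for i, outcome in enumerate(observations, start=1):
    a += outcome        # accumulate observed failures
    b += 1 - outcome    # accumulate observed successes
    mean = a / (a + b)  # posterior mean of the failure rate
    std = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # posterior standard deviation
    print(f"after {i} observations: failure rate = {mean:.2f} +/- {std:.2f}")
```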
Practical Considerations in Using Uncertainty Quantification

Uncertainty quantification methods for machine learning are a powerful tool for making modern machine learning systems more reliable. While no existing approach is a silver bullet and each approach has distinct practical shortcomings, research has shown that methods specifically designed to improve the ability of modern machine learning systems to quantify their uncertainty (such as the approaches described above) succeed at doing so in most settings. These methods therefore often serve as "add-ons" to standard training routines. They can be custom-designed to meet the specific challenges of a given prediction task or deployment setting and can add an additional safety layer to deployed systems.

Considering human-computer interaction is crucial for making effective use of uncertainty quantification methods. For example, being able to interpret a model's uncertainty estimates, determining the level of uncertainty in machine learning systems that human operators are comfortable with, and understanding when and why a system's uncertainty estimates may be unreliable are all extremely important for safety-critical application settings. Choices around the design of user interfaces, data visualizations, and user training can make a big difference in how useful uncertainty estimates are in practice.8

Given the limitations of existing approaches to uncertainty quantification, it is essential that the use of uncertainty estimates does not create a false sense of confidence. Systems must be designed to account for the fact that a model displaying high confidence could still be wrong if it has encountered an unknown unknown that goes beyond what it was trained and tested for.
Outlook

There is increasing interest in how uncertainty quantification could be used to mitigate the weaknesses of large language models, such as their tendency to hallucinate. While much past work in the space has focused on image classification or simple tabular datasets, some researchers are beginning to explore what it would look like for chatbots or other language-based systems to "know what they don't know."9 This research needs to grapple with challenges specific to language generation, such as the fact that there is often no single correct answer. (For instance, correct answers to the question "What is the capital of France?" could include "Paris," "It's Paris," or "The capital of France is Paris," each of which requires the language model to make different predictions about which word should come next.)

Due to the fundamental challenges of reliably quantifying uncertainty, we should not expect a perfect solution to be developed for language generation or any other type of machine learning. Just as with the broader challenge of building machine learning systems that can generalize to new contexts, the possibility of distribution shift means that we may never be able to build AI systems that "know what they don't know" with complete certainty. Nonetheless, research into reliable uncertainty quantification in challenging domains, such as computer vision or reinforcement learning, has made great strides in improving the reliability and robustness of modern machine learning systems over the past few years and will play a crucial role in improving the safety, reliability, and interpretability of large language models in the near future. Over time, uncertainty quantification in machine learning systems is likely to move from being an area of basic research to a practical engineering challenge that can be approached with the different paradigms and methods described in this paper.
Authors

Tim G. J. Rudner is a non-resident AI/ML fellow with CSET and a faculty fellow at New York University. Helen Toner is the director of strategy and foundational research grants at CSET.

Acknowledgments

For feedback and assistance, we are grateful to Alex Engler, Heather Frase, Margarita Konaev, Larry Lewis, Emelia Probasco, and Thomas Woodside.

© 2024 by the Center for Security and Emerging Technology. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/.

Document Identifier: doi: 10.51593/20220013
Endnotes

1 Neil Band et al., "Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks," Advances in Neural Information Processing Systems, 2021, https:/

2 We note that, technically, models are trained to achieve a low cross-entropy between the data labels and the predicted probabilities. This metric does discourage the model, to some extent, from being confident and wrong on the training data but does not necessarily lead to well-calibrated predictions.

3 For simplicity, we only discuss classification problems, where a model predicts class probabilities that make it easy to compute the calibration of a predictive model.

4 See, for example, this paper on predicting retinal disease (including Table 4 on expected calibration error): Joost van Amersfoort, Lewis Smith, Yee Whye Teh, and Yarin Gal, "Uncertainty Estimation Using a Single Deep Deterministic Neural Network," arXiv preprint arXiv:2003.02037 (2020), https://arxiv.org/abs/2003.02037.

5 The reason for this is that ensemble methods rely on the assumption that by initializing the neural network weights at the beginning of training to different values (and, when possible, training each ensemble component on a different subset of the training data), each trained model will make a different prediction on points that are very different from the training data, resulting in a high standard deviation (i.e., uncertainty) between predictions. However, depending on the model class, the training data, and the data point where the model is asked to make a prediction, it is possible that in fact all ensemble members make similar predictions, hence resulting in low predictive uncertainty.

6 More precisely, "similar" here means that the data the model is evaluated on must have been generated by exactly the same underlying data-generating process as the training data.

7 More precisely, Bayesian inference allows us to infer a distribution over some random variable given the data, called the posterior distribution. To compute a posterior distribution, we require an observation model that tells us how likely it is to observe the data given a specific realization of the random variable of interest and a so-called prior distribution over the random variable that reflects our beliefs about the potential values the random variable of interest could take.

8 See, for example, Malte F. Jung, David Sirkin, Turgut M. Gür, and Martin Steinert, "Displayed Uncertainty Improves Driving Experience and Behavior: The Case of Range Anxiety in an Electric Car," CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, April 2015, https://dl.acm.org/doi/10.1145/2702123.2702479; Matthew Kay, Tara Kola, Jessica R. Hullman, and Sean A. Munson, "When (ish) is My Bus?: User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems," CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, May 2016, https://dl.acm.org/doi/10.1145/2858036.2858558.

9 Cf. Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez et al., "Language Models (Mostly) Know What They Know," arXiv preprint arXiv:2207.05221v4 (2022), https://arxiv.org/pdf/2207.05221.pdf; Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar, "Semantic Uncertainty: Linguistic Invariances For Uncertainty Estimation in Natural Language Generation," arXiv preprint arXiv:2302.09664v3 (2023), https://arxiv.org/pdf/2302.09664.pdf.