Belief functions: A gentle introduction


1 Belief functions: A gentle introduction Seoul National University Professor Fabio Cuzzolin School of Engineering, Computing and Mathematics Oxford Brookes University, Oxford, UK Seoul, Korea, 30/05/18 Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 1 / 125

2 Uncertainty Outline 1 Uncertainty Second-order uncertainty Classical probability 2 Beyond probability Set-valued observations Propositional evidence Scarce data Representing ignorance Rare events Uncertain data 3 Belief theory A theory of evidence Belief functions Semantics Dempster s rule Multivariate analysis Misunderstandings 4 Reasoning with belief functions Statistical inference Combination Conditioning Belief vs Bayesian reasoning Generalised Bayes Theorem The total belief theorem Decision making 5 Theories of uncertainty Imprecise probability Monotone capacities Probability intervals Fuzzy and possibility theory Probability boxes Rough sets 6 Belief functions on reals Continuous belief functions Random sets 7 Conclusions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 2 / 125

3 Uncertainty Second-order uncertainty Orders of uncertainty the difference between predictable and unpredictable variation is one of the fundamental issues in the philosophy of probability second order uncertainty: being uncertain about our very model of uncertainty has a consequence on human behaviour: people are averse to unpredictable variations (as in Ellsberg s paradox) how good are Kolmogorov s measure-theoretic probability, or Bayesian and frequentist approaches at modelling second-order uncertainty? Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 3 / 125

4 Uncertainty Classical probability Probability measures mainstream mathematical theory of (first order) uncertainty: mathematical (measure-theoretical) probability mainly due to Russian mathematician Andrey Kolmogorov probability is an application of measure theory, the theory of assigning numbers to sets additive probability measure mathematical representation of the notion of chance assigns a probability value to every subset of a collection of possible outcomes (of a random experiment, of a decision problem, etc) collection of outcomes Ω sample space, universe subset A of the universe event Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 4 / 125

5 Uncertainty Classical probability Probability measures probability measure µ: a real-valued function on a probability space that satisfies countable additivity probability space: a triplet (Ω, F, P) formed by a universe Ω, a σ-algebra F of its subsets, and a probability measure on F not all subsets of Ω necessarily belong to F axioms of probability measures: µ(∅) = 0, µ(Ω) = 1 0 ≤ µ(A) ≤ 1 for all events A ∈ F additivity: for every countable collection of pairwise disjoint events A_i: µ(∪_i A_i) = Σ_i µ(A_i) probabilities have different interpretations: we consider frequentist and Bayesian (subjective) probability Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 5 / 125

6 Uncertainty Classical probability Frequentist inference in the frequentist interpretation, the (aleatory) probability of an event is its relative frequency over repeated trials the frequentist interpretation offers guidance in the design of practical random experiments developed by Fisher, Pearson, Neyman three main tools: statistical hypothesis testing model selection confidence interval analysis Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 6 / 125

7 Uncertainty Classical probability Statistical hypothesis testing 1 state the research hypothesis 2 state the relevant null and alternative hypotheses 3 state the statistical assumptions being made about the sample, e.g. assumptions about the statistical independence or about the form of the distributions of the observations 4 state the relevant test statistic T (a quantity derived from the sample) 5 derive the distribution of the test statistic under the null hypothesis from the assumptions 6 set a significance level (α), i.e. a probability threshold below which the null hypothesis will be rejected 7 compute from the observations the observed value t obs of the test statistic T 8 calculate the p-value, the probability (under the null hypothesis) of sampling a test statistic at least as extreme as the observed value 9 Reject the null hypothesis, in favor of the alternative hypothesis, if and only if the p-value is less than the significance level threshold Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 7 / 125
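To make steps 4-9 concrete, here is a minimal sketch (plain Python, standard library only, with illustrative function names) of a right-tailed binomial test; the 7-heads-in-10-tosses numbers anticipate the coin-toss example used later in the talk.

```python
from math import comb

def binomial_right_tail_p_value(k_obs, n, p0):
    """Probability, under the null hypothesis p = p0, of observing at least
    k_obs successes in n independent trials (a right-tailed p-value)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(k_obs, n + 1))

# Example: 7 heads in 10 tosses, null hypothesis p0 = 0.5, significance level 0.05
alpha = 0.05
p_value = binomial_right_tail_p_value(7, 10, 0.5)
print(round(p_value, 4))        # 0.1719
print(p_value < alpha)          # False: the null hypothesis is not rejected
```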

8 Uncertainty Classical probability P-values [Figure: probability density over the set of possible results, showing the observed data point and the very unlikely observations in the tail, whose total probability is the p-value] the p-value is not the probability that the null hypothesis is true or the probability that the alternative hypothesis is false: frequentist statistics does not and cannot attach probabilities to hypotheses Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 8 / 125

9 Uncertainty Classical probability Maximum Likelihood Estimation (MLE) the term likelihood was popularized in mathematical statistics by Ronald Fisher in 1922: On the mathematical foundations of theoretical statistics Fisher argued against inverse (Bayesian) probability as a basis for statistical inferences, and instead proposed inferences based on likelihood functions likelihood principle: all of the evidence in a sample relevant to model parameters is contained in the likelihood function this is still hotly debated [Mayo, Gandenberger] maximum likelihood estimation: θ̂_mle ∈ arg max_{θ∈Θ} L(θ; x_1,..., x_n), where L(θ; x_1,..., x_n) = f(x_1, x_2,..., x_n | θ) and {f(· | θ), θ ∈ Θ} is a parametric model consistency: the sequence of MLEs converges in probability, for a sufficiently large number of observations, to the (actual) value being estimated Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 9 / 125
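A minimal sketch of maximum likelihood estimation for i.i.d. Bernoulli data, using a simple grid search for illustration; the function name and the sample are assumptions of the sketch, and the closed-form answer is just the relative frequency k/n.

```python
import math

def bernoulli_log_likelihood(theta, xs):
    """Log-likelihood of i.i.d. Bernoulli (0/1) observations xs under parameter theta."""
    k, n = sum(xs), len(xs)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

xs = [1, 1, 0, 1, 0, 1, 0, 1, 1, 1]     # 7 successes out of 10 trials

# Maximise the (log-)likelihood over a fine grid of candidate parameters;
# the closed-form MLE is the relative frequency k/n = 0.7
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = max(grid, key=lambda t: bernoulli_log_likelihood(t, xs))
print(theta_mle)                         # 0.7
```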

10 Uncertainty Classical probability Subjective probability (epistemic) probability = degrees of belief of an individual assessing the state of the world Ramsey and de Finetti: subjective beliefs must follow the laws of probability if they are to be coherent (if this proof were watertight we would not be here in front of you!) also, evidence casts doubt on whether humans hold coherent beliefs or behave rationally Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 10 / 125

11 Uncertainty Classical probability Bayesian inference prior distribution: the distribution of the parameter(s) before any data is observed, i.e. p(θ | α), which depends on a vector of hyperparameters α likelihood: the distribution of the observed data conditional on its parameters, i.e. p(X | θ) marginal likelihood (sometimes also termed the evidence) is the distribution of the observed data marginalised over the parameter(s): p(X | α) = ∫_θ p(X | θ) p(θ | α) dθ posterior distribution: the distribution of the parameter(s) after taking into account the observed data, as determined by Bayes' rule: p(θ | X, α) = p(X | θ) p(θ | α) / p(X | α) ∝ p(X | θ) p(θ | α) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 11 / 125
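A minimal sketch of these quantities in the conjugate beta-binomial case, assuming a Beta prior (a choice made here only for illustration, since it gives closed-form prior, likelihood and posterior); the helper names are not from any particular library.

```python
import math

def beta_posterior(k, n, a0=1.0, b0=1.0):
    """Hyperparameters of the Beta posterior after k successes in n Bernoulli trials,
    starting from a Beta(a0, b0) prior (the conjugate prior of the binomial likelihood)."""
    return a0 + k, b0 + (n - k)

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta (log-gamma used for the normalising constant)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

# Uniform prior Beta(1, 1) and 7 successes in 10 trials give a Beta(8, 4) posterior
a, b = beta_posterior(7, 10)
print(a, b)                              # 8.0 4.0
print(round(beta_pdf(0.7, a, b), 3))     # posterior density at theta = 0.7
```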

12 Beyond probability Outline 1 Uncertainty Second-order uncertainty Classical probability 2 Beyond probability Set-valued observations Propositional evidence Scarce data Representing ignorance Rare events Uncertain data 3 Belief theory A theory of evidence Belief functions Semantics Dempster s rule Multivariate analysis Misunderstandings 4 Reasoning with belief functions Statistical inference Combination Conditioning Belief vs Bayesian reasoning Generalised Bayes Theorem The total belief theorem Decision making 5 Theories of uncertainty Imprecise probability Monotone capacities Probability intervals Fuzzy and possibility theory Probability boxes Rough sets 6 Belief functions on reals Continuous belief functions Random sets 7 Conclusions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 12 / 125

13 Beyond probability Something is wrong? measure-theoretical mathematical probability is not general enough: cannot (properly) model missing data cannot (properly) model propositional data cannot really model unusual data (second order uncertainty) the frequentist approach to probability: cannot really model pure data (without a "design") in a way, cannot even properly model continuous data models scarce data only asymptotically Bayesian reasoning has several limitations: cannot model no data (ignorance) cannot model uncertain data cannot model pure data (without a prior) again, cannot properly model scarce data (only asymptotically) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 13 / 125

14 Beyond probability Fisher has not got it all right the setting of hypothesis testing is (arguably) arguable the scope is quite narrow: rejecting or not rejecting a hypothesis (although it can provide confidence intervals) the criterion is arbitrary: who decides what an extreme realisation is (choice of α)? what is the deal with 0.05 and 0.01? the whole tail idea comes from the fact that, under measure theory, the conditional probability (p-value) of a point outcome x is zero seems trying to patch an underlying problem with the way probability is mathematically defined cannot cope with pure data, without assumptions on the process (experiment) which generated them (we will come back to this later) deals with scarce data only asymptotically Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 14 / 125

15 Beyond probability The problem(s) with Bayes pretty bad at representing ignorance Jeffrey s uninformative priors are just not adequate different results on different parameter spaces Bayes rule assumes the new evidence comes in the form of certainty: A is true in the real world, often this is not the case ( uncertain or vague evidence) beware the prior! model selection in Bayesian statistics results from a confusion between the original subjective interpretation, and the objectivist view of a rigorous objective procedure why should we pick a prior? either there is prior knowledge (beliefs) or there is not all will be fine, in the end! asymptotically, the choice of the prior does not matter (really!) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 15 / 125

16 Beyond probability Set-valued observations The die as random variable [Figure: a die whose faces face1,..., face6 are mapped by a random variable X to the numbers 1,..., 6] a die is a simple example of (discrete) random variable there is a probability space Ω = {face1, face2,..., face6} which maps to a real number: 1, 2,..., 6 (no need for measurability here) now, imagine that face1 and face2 are cloaked, and we roll the die Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 16 / 125

17 Beyond probability Set-valued observations The cloaked die: set-valued observations [Figure: the same die, with the cloaked faces 1 and 2 both mapped to the set {1, 2}] the same probability space Ω = {face1, face2,..., face6} is still there (nothing has changed in the way the die works) however, now the mapping is different: both face1 and face2 are mapped to the set of possible values {1, 2} (since we cannot observe the outcome) this is a random set [Matheron, Kendall, Nguyen, Molchanov]: a set-valued random variable whenever data are missing, observations are inherently set-valued Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 17 / 125

18 Beyond probability Propositional evidence Reliable witnesses Evidence supporting propositions suppose there is a murder, and three people are under trial for it: Peter, John and Mary our hypothesis space is therefore Θ = {Peter, John, Mary} there is a witness: he testifies that the person he saw was a man this amounts to supporting the proposition A = {Peter, John} ⊂ Θ should we take this testimony at face value? in fact, the witness was tested and the machine reported a 20% chance he was drunk when he reported the crime we should therefore partly support the (vacuous) hypothesis that any one among Peter, John and Mary could be the murderer: it is natural to assign 80% chance to proposition A, and 20% chance to proposition Θ Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 18 / 125

19 Beyond probability Propositional evidence Dealing with propositional evidence even when evidence (data) supports propositions, Kolmogorov's probability forces us to specify support for individual outcomes this is unreasonable - an artificial constraint due to a mathematical model that is not general enough we have no elements to assign this 80% probability to either Peter or John, nor to distribute it among them the cause is the additivity of probability measures: but this is not the most general type of measure for sets under a minimal requirement of monotonicity, a set measure can potentially be suitable to describe probabilities of events: these objects are called capacities in particular, random sets are capacities in which the numbers assigned to subsets are given by a probability distribution Belief functions and propositional evidence As capacities (and random sets in particular), belief functions allow us to assign mass directly to propositions. Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 19 / 125

20 Beyond probability Scarce data Machines that learn Generalising from scarce data machine learning: designing algorithms that can learn from data BUT, we train them on a ridiculously small amount of data: how can we make sure they are robust to new situations never encountered before (model adaptation)? statistical learning theory [Vapnik] is based on traditional probability theory Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 20 / 125

21 Beyond probability Scarce data Dealing with scarce data a somewhat naive objection: probability distributions assume an infinite amount of evidence, so in reality finite evidence can only provide a constraint on the true probability values unfortunately, those who believe probabilities to be limits of relative frequencies (the frequentists) never really estimate a probability from the data they only assume ("design") probability distributions for their p-values Fisher: fine, I can never compute probabilities, but I can use the data to test my hypotheses on them in opposition, those who do estimate probability distributions from the data (the Bayesians) do not think of probabilities as infinite accumulations of evidence (but as degrees of belief) Bayes: I only need to be able to model a likelihood function of the data well, actually, frequentists do estimate probabilities from scarce data when they do stochastic regression (e.g., logistic regression) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 21 / 125

22 Beyond probability Scarce data Asymptotic happiness what is true, is that both frequentists and Bayesians seem to be happy with solving their problems asymptotically limit properties of ML estimates Bernstein-von Mises theorem what about the here and now? e.g. smart cars? Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 22 / 125

23 Beyond probability Representing ignorance Modelling pure data Bayesian inference Bayesian reasoning requires modelling the data and a prior (actually, you need to pick the proper hypothesis space too!) prior is just a name for beliefs built over a long period of time, from the evidence you have observed so long a time has passed that all track record of observations is lost, and all that is left is a probability distribution why should we pick a prior? either there is prior knowledge or there is not nevertheless we are compelled to pick one, because the mathematical formalism requires it this is the result of a confusion between the original subjective interpretation (where prior beliefs always exist), and the objectivist view of a rigorous objective procedure (where in most cases we do not have any prior knowledge) Bayesians then go into damage limitation mode, and try to pick the least damaging prior (see ignorance later) all will be fine, in the end! (Bernstein-von Mises theorem) Asymptotically, the choice of the prior does not matter (really!) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 23 / 125

24 Beyond probability Representing ignorance Dangerous priors Bayesian inference the prior distribution is typically hard to determine solution: pick an uninformative probability Jeffreys' prior: proportional to the square root of the determinant of the Fisher information matrix it can be improper (unnormalised), and it violates the strong version of the likelihood principle: inferences depend not just on the data likelihood but also on the universe of all possible experimental outcomes uniform priors can lead to different results on different spaces, given the same likelihood functions the asymptotic distribution of the posterior mode depends on the Fisher information and not on the prior (Bernstein-von Mises theorem) A. W. F. Edwards: It is sometimes said, in defence of the Bayesian concept, that the choice of prior distribution is unimportant in practice, because it hardly influences the posterior distribution at all when there are moderate amounts of data. The less said about this defence the better. Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 24 / 125

25 Beyond probability Representing ignorance Modelling pure data Frequentist inference the frequentist approach is inherently unable to describe pure data, without making additional assumptions on the data-generating process in Nature one cannot design an experiment: data come your way, whether you want it or not you cannot set the stopping rules again, recalls the old image of a scientist analysing (from Greek ana + lysis, breaking up) a specific aspect of the world in their lab the same data can lead to opposite conclusions different experiments can lead to the same data, whereas the parametric model employed (family of probability distributions) is linked to a specific experiment apparently, however, frequentists are just fine with this Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 25 / 125

26 Beyond probability Representing ignorance Dealing with ignorance Shafer vs Bayes uninformative priors can be dangerous (Andrew Gelman): they violate the strong likelihood principle, may be unnormalised wrong priors can kill a Bayesian model priors in general cannot handle multiple hypothesis spaces in a coherent way (families of frames, in Shafer s terminology) Belief functions and priors Reasoning with belief functions does not require any prior. Belief functions and ignorance Belief functions naturally represent ignorance via the vacuous belief function, assigning mass 1 to the whole hypothesis space. Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 26 / 125

27 Beyond probability Rare events Extinct dinosaurs The statistics of rare events dinosaurs probably were worrying about overpopulation risks.... until it hit them! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 27 / 125

28 Beyond probability Rare events What s a rare event? what is a rare event? clearly we are interested in them because they are not so rare, after all! examples of rare events, also called tail risks or black swans, are: volcanic eruptions, meteor impacts, financial crashes.. mathematically, an event is rare when it covers a region of the hypothesis space which is seldom sampled it is an issue with the quality of the sample Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 28 / 125

29 Beyond probability Rare events Rare events and second-order uncertainty probability distributions for the system's behaviour are built in normal times (e.g. while a nuclear plant is working just fine), then used to extrapolate results at the tail of the distribution [Figure: P(Y=1 | x) versus x, with the training samples concentrated away from the region of the 'rare' event] popular statistical procedures (e.g. logistic regression) can sharply underestimate the probability of rare events Harvard's G. King [2001] has proposed corrections based on oversampling the rare events w.r.t. the normal ones the issue is really one with the reliability of the model! we need to explicitly model second-order uncertainty Belief functions and rare events Belief functions can model second-order uncertainty: rare events are a form of lack of information in certain regions of the sample space. Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 29 / 125

30 Beyond probability Uncertain data Uncertain data concepts themselves can be not well defined, e.g. dark or somewhat round object (qualitative data) fuzzy theory accounts for this via the concept of graded membership unreliable sensors can generate faulty (outlier) measurements: can we still treat these data as certain? or is it more natural to attach to them a degree of reliability, based on the past track record of the sensor (data generating process)? but then, can we still apply Bayes rule? people ("experts", e.g. doctors) tend to express themselves in terms of likelihoods directly (e.g. I think diagnosis A is most likely, otherwise either A or B) if the doctors were frequentists, and were provided with the same data, they would probably apply logistic regression and come up with the same prediction on P(disease | symptoms): unfortunately doctors are not statisticians multiple sensors can provide as output a PDF on the same space e.g., two Kalman filters, one based on color, the other on motion (optical flow), providing a normal predictive PDF on the location of the target in the image plane Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 30 / 125

31 Belief theory Outline 1 Uncertainty Second-order uncertainty Classical probability 2 Beyond probability Set-valued observations Propositional evidence Scarce data Representing ignorance Rare events Uncertain data 3 Belief theory A theory of evidence Belief functions Semantics Dempster s rule Multivariate analysis Misunderstandings 4 Reasoning with belief functions Statistical inference Combination Conditioning Belief vs Bayesian reasoning Generalised Bayes Theorem The total belief theorem Decision making 5 Theories of uncertainty Imprecise probability Monotone capacities Probability intervals Fuzzy and possibility theory Probability boxes Rough sets 6 Belief functions on reals Continuous belief functions Random sets 7 Conclusions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 31 / 125

32 Belief theory A theory of evidence A mathematical theory of evidence Shafer called his proposal A mathematical theory of evidence the mathematical objects it deals with are called belief functions where do these names come from? what interpretation of probability do they entail? [Diagram: evidence induces belief, a probabilistic representation of knowledge about the truth] it is a theory of epistemic probability: it is about probabilities as a mathematical representation of knowledge (a human's knowledge, or a machine's) it is a theory of evidential probability: such probabilities representing knowledge are induced ("elicited") by the available evidence Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 32 / 125

33 Belief theory A theory of evidence Evidence supporting hypotheses in probabilistic logic, statements such as "hypothesis H is probably true" mean that the empirical evidence E supports H to a high degree called the epistemic probability of H given E Rationale There exists evidence in the form of probabilities, which supports degrees of belief on a certain matter. the space where the evidence lives is different from the hypothesis space they are linked by a map one to many: but this is a random set! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 33 / 125

34 Belief theory Belief functions Dempster's multivalued mappings Dempster's work formalises random sets via multivalued (one-to-many) mappings Γ from a probability space (Ω, F, P) to the domain of interest Θ [Diagram: Ω = {drunk (0.2), not drunk (0.8)} mapped by Γ to subsets of Θ = {Peter, John, Mary}] the example is taken from a famous trial example [Shafer] elements of Ω are mapped to subsets of Θ: once again this is a random set in the example Γ maps {not drunk} ⊂ Ω to {Peter, John} ⊂ Θ the probability distribution P on Ω induces a mass assignment m : 2^Θ → [0, 1] on the power set 2^Θ = {A ⊆ Θ} via the multivalued mapping Γ : Ω → 2^Θ Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 34 / 125

35 Belief theory Belief functions Belief and plausibility measures the belief in A is the probability that the evidence implies A: Bel(A) = P({ω ∈ Ω : Γ(ω) ⊆ A}) the plausibility of A is the probability that the evidence does not contradict A: Pl(A) = P({ω ∈ Ω : Γ(ω) ∩ A ≠ ∅}) = 1 − Bel(Ā) originally termed by Dempster lower and upper probabilities belief and plausibility values can (but this is disputed) be interpreted as lower and upper bounds to the values of an unknown, underlying probability measure: Bel(A) ≤ P(A) ≤ Pl(A) for all A ⊆ Θ Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 35 / 125

36 Belief theory Belief functions Basic probability assignments Mass functions belief functions (BF) are functions from 2^Θ, the set of all subsets of Θ, to [0, 1], assigning values to subsets of Θ it can be proven that each belief function has the form Bel(A) = Σ_{B⊆A} m(B), where m is a mass function or basic probability assignment on Θ, defined as a function 2^Θ → [0, 1] such that: m(∅) = 0, Σ_{A⊆Θ} m(A) = 1 any subset A of Θ such that m(A) > 0 is called a focal element (FE) of m working with belief functions reduces to manipulating focal elements Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 36 / 125
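A minimal sketch of how a mass function and its induced belief and plausibility values might be represented in code, reusing the 80%/20% testimony example; the dict-of-frozensets representation and the helper names are assumptions of this sketch, not a standard API.

```python
def belief(A, mass):
    """Bel(A): total mass of the focal elements B contained in A."""
    return sum(m for B, m in mass.items() if B <= A)

def plausibility(A, mass):
    """Pl(A): total mass of the focal elements B intersecting A."""
    return sum(m for B, m in mass.items() if B & A)

# Mass function on Theta = {Peter, John, Mary} encoding the earlier testimony:
# 0.8 on {Peter, John} (the witness saw a man), 0.2 on Theta (partial ignorance)
mass = {frozenset({'Peter', 'John'}): 0.8,
        frozenset({'Peter', 'John', 'Mary'}): 0.2}

print(belief(frozenset({'Peter'}), mass), plausibility(frozenset({'Peter'}), mass))  # 0 1.0
print(belief(frozenset({'Peter', 'John'}), mass))                                    # 0.8
print(belief(frozenset({'Mary'}), mass), plausibility(frozenset({'Mary'}), mass))    # 0 0.2
```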

37 Belief theory Belief functions A generalisation of sets, fuzzy sets, probabilities belief functions generalise traditional ("crisp") sets: a logical (or "categorical") mass function has one focal set A, with m(A) = 1 belief functions generalise standard probabilities: a Bayesian mass function has as only focal sets elements (rather than subsets) of Θ complete ignorance is represented by the vacuous mass function: m(Θ) = 1 belief functions generalise fuzzy sets (see possibility theory later), which are assimilated to consonant BFs whose focal elements are nested: A_1 ⊆ ... ⊆ A_m [Figure: the focal elements of a consonant, a Bayesian and a vacuous belief function] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 37 / 125

38 Belief theory Semantics Semantics of belief functions Modelling second-order uncertainty [Figure: a belief function Bel seen as a convex set of probability distributions (a credal set) within the probability simplex] belief functions have multiple interpretations as set-valued random variables (random sets) as (completely monotone) capacities (functions from the power set to [0, 1]) as a special class of credal sets (convex sets of probability distributions) [Levi, Kyburg] as such, they are a very expressive means of modelling uncertainty on the model itself, due to lack of data quantity or quality, or both Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 38 / 125

39 Belief theory Semantics Axiomatic definition belief functions can also be defined in axiomatic terms, just like Kolmogorov's additive probability measures this is the definition proposed by Shafer in 1976 Belief function A function Bel : 2^Θ → [0, 1] from the power set 2^Θ to [0, 1] such that: Bel(∅) = 0, Bel(Θ) = 1; for every n and for every collection A_1,..., A_n ∈ 2^Θ we have that: Bel(A_1 ∪ ... ∪ A_n) ≥ Σ_i Bel(A_i) − Σ_{i<j} Bel(A_i ∩ A_j) + ... + (−1)^{n+1} Bel(A_1 ∩ ... ∩ A_n) this makes clearer that belief measures generalise standard probability measures: replace additivity with superadditivity (third axiom) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 39 / 125

40 Belief theory Dempster's rule Jeffrey's rule of conditioning belief measures include probability measures as a special case: what replaces Bayes' rule? Jeffrey's rule of conditioning: a step forward from certainty and Bayes rule an initial probability P stands corrected by a second probability P', defined only on a number of events suppose P is defined on a σ-algebra A there is a new probability measure P' on a sub-algebra B of A, and the updated probability P'' has to: 1 meet the probability values specified by P' for events in B 2 be such that, for all B ∈ B and X, Y ⊆ B with X, Y ∈ A: P''(X)/P''(Y) = P(X)/P(Y) if P(Y) > 0, and = 0 if P(Y) = 0 there is a unique solution: P''(A) = Σ_{B∈B} P(A | B) P'(B) it generalises Bayes conditioning! (obtained when P'(B) = 1 for some B) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 40 / 125
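A minimal sketch of Jeffrey's rule on a finite universe, P''(A) = Σ_{B∈B} P(A | B) P'(B); the data structures and the numbers are illustrative.

```python
def jeffrey_update(prior, partition, corrected):
    """Jeffrey's rule: P''(x) accumulates P(x | B) * P'(B) over the cells B of a partition.

    prior: dict outcome -> P(outcome)
    partition: dict cell label -> set of outcomes forming that cell
    corrected: dict cell label -> corrected probability P'(cell)
    """
    updated = {x: 0.0 for x in prior}
    for label, cell in partition.items():
        p_cell = sum(prior[x] for x in cell)
        for x in cell:
            cond = prior[x] / p_cell if p_cell > 0 else 0.0     # P(x | cell)
            updated[x] += cond * corrected[label]
    return updated

prior = {'a': 0.2, 'b': 0.3, 'c': 0.5}
partition = {'B1': {'a', 'b'}, 'B2': {'c'}}
corrected = {'B1': 0.8, 'B2': 0.2}        # P'(B1) = 0.8 replaces P(B1) = 0.5
print(jeffrey_update(prior, partition, corrected))
# approximately {'a': 0.32, 'b': 0.48, 'c': 0.2}: proportions inside each cell are preserved
# Bayes conditioning on B1 is recovered by setting corrected = {'B1': 1.0, 'B2': 0.0}
```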

41 Belief theory Dempster's rule Conditioning versus combination what if I have a new probability on the same σ-algebra A? Jeffrey's rule cannot be applied! as we saw, this happens when multiple sensors provide predictive PDFs belief functions deal with uncertain evidence by moving away from the concept of conditioning (via Bayes rule).... to that of combining pieces of evidence supporting multiple (intersecting) propositions to various degrees Belief functions and evidence Belief reasoning works by combining existing belief functions with new ones, which are able to encode uncertain evidence. Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 41 / 125

42 Belief theory Dempster's rule Dempster's combination [Diagram: two multivalued mappings into Θ = {Peter, John, Mary}, one from {drunk (0.2), not drunk (0.8)}, the other from {cleaned (0.6), not cleaned (0.4)}] new piece of evidence: a blond hair has been found; also, there is a probability 0.6 that the room has been cleaned before the crime the assumption is that pairs of outcomes in the source spaces ω_1 ∈ Ω_1 and ω_2 ∈ Ω_2 support the intersection of their images in 2^Θ: θ ∈ Γ_1(ω_1) ∩ Γ_2(ω_2) if this is done independently, then the probability that the pair (ω_1, ω_2) is selected is P_1({ω_1}) P_2({ω_2}), yielding Dempster's rule of combination: (m_1 ⊕ m_2)(A) = (1 / (1 − κ)) Σ_{B∩C=A} m_1(B) m_2(C), ∅ ≠ A ⊆ Θ, where κ is the mass assigned to the empty set (the conflict) Bayes rule is a special case of Dempster's rule Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 42 / 125

43 Belief theory Dempster's rule Dempster's combination A simple numerical example [Figure: Dempster combination of two belief functions Bel_1 and Bel_2 on a binary frame {θ_1, θ_2}; the normalised combination has masses m({θ_1}) = 0.48, m({θ_2}) = 0.31, m({θ_1, θ_2}) = 0.21] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 43 / 125
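Since the input mass values of this numerical example are not recoverable from the transcription, here is instead a minimal sketch of Dempster's rule itself, applied to two illustrative mass functions on a binary frame; the representation (dicts keyed by frozensets) is an assumption of the sketch.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: intersect focal elements, multiply masses, discard the mass
    of the empty intersection (the conflict kappa) and renormalise by 1 - kappa."""
    combined, conflict = {}, 0.0
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mB * mC
        else:
            conflict += mB * mC
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict

# Two mass functions on a binary frame {t1, t2} (illustrative values)
t1, t2 = frozenset({'t1'}), frozenset({'t2'})
theta = t1 | t2
m1 = {t1: 0.6, theta: 0.4}
m2 = {t2: 0.5, theta: 0.5}
m12, kappa = dempster_combine(m1, m2)
print(kappa)                      # 0.3: the mass 0.6 * 0.5 sent to the empty set
print({tuple(sorted(A)): round(v, 3) for A, v in m12.items()})
# {('t1',): 0.429, ('t2',): 0.286, ('t1', 't2'): 0.286}
```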

44 Belief theory Dempster's rule A generalisation of Bayesian inference belief theory generalises Bayesian probability (it contains it as a special case), in that: classical probability measures are a special class of belief functions (in the finite case) or random sets (in the infinite case) Bayes' certain evidence is a special case of Shafer's bodies of evidence (general belief functions) Bayes rule of conditioning is a special case of Dempster's rule of combination it also generalises set-theoretical intersection: if m_A and m_B are logical mass functions and A ∩ B ≠ ∅, then m_A ⊕ m_B = m_{A∩B} however, it overcomes its limitations you do not need a prior: if you are ignorant, you will use the vacuous BF m_Θ which, when combined with new BFs m encoding data, will not change the result: m_Θ ⊕ m = m however, if you do have prior knowledge you are welcome to use it! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 44 / 125

45 Belief theory Multivariate analysis Multivariate analysis Refinements and coarsenings the theory allows us to handle evidence impacting on different but related domains assume we are interested in the nature of an object in a road scene. We could describe it, e.g., in the frame Θ = {vehicle, pedestrian}, or in the finer frame Ω = {car, bicycle, motorcycle, pedestrian} other example: different image features in pose estimation a frame Ω is a refinement of a frame Θ (or, equivalently, Θ is a coarsening of Ω) if elements of Ω can be obtained by splitting some or all of the elements of Θ [Figure: a refining ρ mapping each element θ_1, θ_2, θ_3 of Θ to a set of elements of Ω] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 45 / 125

46 Belief theory Multivariate analysis Families of compatible frames Multivariate analysis when Ω is a refinement for a collection Θ_1,..., Θ_N of other frames it is called their common refinement two frames are said to be compatible if they do have a common refinement compatible frames can be associated with different variables/attributes/features: let Θ_X = {red, blue, green} and Θ_Y = {small, medium, large} be the domains of attributes X and Y describing, respectively, the color and the size of an object in such a case the common refinement is simply the Cartesian product Θ_X × Θ_Y or, they can be descriptions of the same variable at different levels of granularity (as in the road scene example) evidence can be moved from one frame to another within a family of compatible frames Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 46 / 125

47 Belief theory Multivariate analysis Families of compatible frames Pictorial illustration [Figure: pictorial illustration of a family of compatible frames Θ_1,..., Θ_n and their common refinement] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 47 / 125

48 Belief theory Multivariate analysis Marginalisation let Θ_X and Θ_Y be two compatible frames let m_XY be a mass function on Θ_X × Θ_Y it can be expressed in the coarser frame Θ_X by transferring each mass m_XY(A) to the projection of A onto Θ_X we obtain a marginal mass function on Θ_X: m_XY↓X(B) = Σ_{A ⊆ Θ_X×Θ_Y : A↓Θ_X = B} m_XY(A), for all B ⊆ Θ_X (again, it generalizes both set projection and probabilistic marginalization) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 48 / 125

49 Belief theory Multivariate analysis Vacuous extension the inverse of marginalization a mass function m_X on Θ_X can be expressed in Θ_X × Θ_Y by transferring each mass m_X(B) to the cylindrical extension of B this operation is called the vacuous extension of m_X in Θ_X × Θ_Y: m_X↑XY(A) = m_X(B) if A = B × Θ_Y, 0 otherwise a strong feature of belief theory: the vacuous belief function (our representation of ignorance) is left unchanged when moving from one space to another! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 49 / 125

50 Belief theory Misunderstandings Belief functions are not (general) credal sets [Figure: in the probability simplex, the credal set Cre induced by a belief function Bel] a belief function on Θ is in 1-1 correspondence with a convex set of probability distributions there (a credal set) however, belief functions are a special class of credal sets, those induced by a random set mapping Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 50 / 125

51 Belief theory Misunderstandings Belief functions are not parameterised families of distributions, or confidence intervals [Figure: in the probability simplex, a parameterised family of distributions Fam compared with the credal set of a belief function Bel] obviously, a parameterised family of distributions on Θ is a subset of the set of all possible distributions (just like belief functions) not all families of distributions correspond to belief functions example: the family of Gaussian PDFs with 0 mean and arbitrary variance, {N(0, σ), σ ∈ R+}, is not a belief function they are not confidence intervals either: confidence intervals are one-dimensional, and their interpretation is entirely different. Confidence intervals are interval estimates Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 51 / 125

52 Belief theory Misunderstandings Belief functions are not second-order distributions [Figure: a Dirichlet distribution compared with a belief function seen as a uniform meta-distribution over a set of PDFs] unlike hypothesis testing, general Bayesian inference leads to probability distributions over the space of parameters these are second order probabilities, i.e. probability distributions on hypotheses which are themselves probabilities belief functions can be defined on the hypothesis space Ω, or on the parameter space Θ when defined on Ω they are sets of PDFs and can then be seen as indicator second order distributions (see figure) when defined on the parameter space Θ, they amount to families of second-order distributions in the two cases they generalise MLE/MAP and general Bayesian inference, respectively Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 52 / 125

53 Reasoning with belief functions Outline 1 Uncertainty Second-order uncertainty Classical probability 2 Beyond probability Set-valued observations Propositional evidence Scarce data Representing ignorance Rare events Uncertain data 3 Belief theory A theory of evidence Belief functions Semantics Dempster s rule Multivariate analysis Misunderstandings 4 Reasoning with belief functions Statistical inference Combination Conditioning Belief vs Bayesian reasoning Generalised Bayes Theorem The total belief theorem Decision making 5 Theories of uncertainty Imprecise probability Monotone capacities Probability intervals Fuzzy and possibility theory Probability boxes Rough sets 6 Belief functions on reals Continuous belief functions Random sets 7 Conclusions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 53 / 125

54 Reasoning with belief functions Reasoning with belief functions 1 inference: building a belief function from data (either statistical or qualitative) 2 reasoning: updating belief representations when new data arrives either by combination with another belief function or by conditioning with respect to new events/observations 3 manipulating conditional belief functions via a generalisation of Bayes theorem via network propagation via a generalisation of the total probability theorem 4 using the resulting belief function(s) for: decision making regression classification etc (estimation, optimisation..) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 54 / 125

55 Reasoning with belief functions Reasoning with belief functions [Diagram: statistical data/opinions feed INFERENCE, producing belief functions; COMBINATION and CONDITIONING produce combined and conditional belief functions; MANIPULATION yields total/marginal belief functions; DECISION MAKING yields decisions; efficient computation, measuring uncertainty and a continuous formulation cut across the whole pipeline] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 55 / 125

56 Reasoning with belief functions Statistical inference Dempster's approach to statistical inference Fiducial argument consider a statistical model {f(x | θ), x ∈ X, θ ∈ Θ}, where X is the sample space and Θ the parameter space having observed x, how do we quantify the uncertainty about the parameter θ, without specifying a prior probability distribution? suppose that we know a data-generating mechanism [Fisher] X = a(θ, U), where U is an (unobserved) auxiliary variable with known probability distribution µ on U, independent of θ for instance, to generate a continuous random variable X with cumulative distribution function (CDF) F_θ, one might draw U from U([0, 1]) and set X = F_θ^{-1}(U) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 56 / 125
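A minimal sketch of such a data-generating mechanism X = a(θ, U): draw U uniformly and push it through the inverse CDF, here (an illustrative choice) for the exponential distribution with mean θ.

```python
import math
import random

def sample_via_inverse_cdf(theta, n, seed=0):
    """Data-generating mechanism X = a(theta, U): draw U ~ Uniform(0, 1) and set
    X = F_theta^{-1}(U), here for the exponential CDF F_theta(x) = 1 - exp(-x / theta)."""
    rng = random.Random(seed)
    return [-theta * math.log(1.0 - rng.random()) for _ in range(n)]

print(sample_via_inverse_cdf(theta=2.0, n=5))    # five draws with mean about 2
```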

57 Reasoning with belief functions Statistical inference Dempster's approach to statistical inference the equation X = a(θ, U) defines a multi-valued mapping Γ : U → 2^(X×Θ): Γ : u ↦ Γ(u) = {(x, θ) ∈ X × Θ : x = a(θ, u)} ⊆ X × Θ under the usual measurability conditions, the probability space (U, B(U), µ) and the multi-valued mapping Γ induce a belief function Bel_X×Θ on X × Θ conditioning it on θ yields Bel_X(· | θ) = f(· | θ) on X conditioning it on X = x gives Bel_Θ(· | x) on Θ [Diagram: the auxiliary variable U, with distribution µ : U → [0, 1], is mapped by Γ to subsets of X × Θ; conditioning on the observation x yields Bel_Θ(· | x)] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 57 / 125

58 Reasoning with belief functions Statistical inference Inference from classical likelihood [Shafer76, Denoeux] consider a statistical model {L(θ; x) = f(x | θ), x ∈ X, θ ∈ Θ}, where X is the sample space and Θ the parameter space Bel_Θ(· | x) is the consonant belief function (with nested focal elements) whose plausibility of the singletons equals the normalised likelihood: pl(θ | x) = L(θ; x) / sup_{θ'∈Θ} L(θ'; x) it takes the empirical normalised likelihood to be the upper bound to the probability density of the sought parameter! (rather than the actual PDF) the corresponding plausibility function is Pl_Θ(A | x) = sup_{θ∈A} pl(θ | x) the plausibility of a composite hypothesis A ⊆ Θ is the usual likelihood ratio statistic: Pl_Θ(A | x) = sup_{θ∈A} L(θ; x) / sup_{θ∈Θ} L(θ; x) compatible with the likelihood principle Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 58 / 125

59 Reasoning with belief functions Statistical inference Coin toss example Inference with belief functions consider a coin toss experiment we toss the coin n = 10 times, obtaining the sample X = {H, H, T, H, T, H, T, H, H, H} with k = 7 successes (heads H) and n − k = 3 fails (tails T) parameter of interest: the probability θ = p of heads in a single toss the inference problem consists then in gathering information on the value of p Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 59 / 125

60 Reasoning with belief functions Statistical inference Coin toss example General Bayesian inference trials are typically assumed to be independent (and equally distributed) the likelihood of the sample is binomial: P(X | p) = p^k (1 − p)^(n−k) apply Bayes rule to get the posterior P(p | X) = P(X | p) P(p) / P(X) as we do not have a-priori information, a uniform prior is typically assumed [Figure: the likelihood function P(X | p) = p^k (1 − p)^(n−k) as a function of p, peaking at the maximum likelihood estimate] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 60 / 125

61 Reasoning with belief functions Statistical inference Coin toss example Frequentist inference what would a frequentist do? it is reasonable to hypothesise that p be equal to p = k/n, i.e., the fraction of successes we can then test this hypothesis in the classical frequentist setting this implies assuming independent and equally distributed trials, so that the conditional distribution of the sample is the binomial we can then compute the p-value for, say, a confidence level of α = 0.05 the right-tail p-value for the hypothesis p = k/n (the integral area in pink) is equal to 1/2 >> α = 0.05. Hence, the hypothesis cannot be rejected [Figure: the likelihood function of p, with the right tail area corresponding to the p-value = 1/2] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 61 / 125

62 Reasoning with belief functions Statistical inference Coin toss example Inference with likelihood-based belief functions likelihood-based belief function inference yields the following belief measure, conditioned on the observed sample X, over Θ = [0, 1]: Pl_Θ(A | X) = sup_{p∈A} L̂(p | X); Bel_Θ(A | X) = 1 − Pl_Θ(A^c | X), A ⊆ Θ, where L̂(p | X) is the normalised version of the traditional likelihood the random set induced by the likelihood determines an entire envelope of PDFs on the parameter space Θ = [0, 1] (a belief function there) the random set associated with this belief measure is: Γ_X(ω) = {θ ∈ Θ : Pl_Θ({θ} | X) ≥ ω}, ω ∈ Ω = [0, 1], which is an interval centered around the ML estimate of p Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 62 / 125
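A minimal sketch of this construction for the coin-toss sample: the plausibility of an interval of values of p is approximated by a grid search over the normalised binomial likelihood; function names and the grid size are illustrative.

```python
def normalised_likelihood(p, k=7, n=10):
    """Binomial likelihood of p given k successes in n trials, divided by its maximum
    (which is attained at the MLE p = k/n)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0                      # for 0 < k < n the likelihood vanishes at the endpoints
    p_mle = k / n
    return (p**k * (1 - p)**(n - k)) / (p_mle**k * (1 - p_mle)**(n - k))

def interval_plausibility(a, b, steps=2001):
    """Pl([a, b] | X) = sup over p in [a, b] of the normalised likelihood (grid approximation)."""
    grid = [a + (b - a) * i / (steps - 1) for i in range(steps)]
    return max(normalised_likelihood(p) for p in grid)

print(interval_plausibility(0.0, 1.0))            # 1.0: the whole parameter space is fully plausible
print(interval_plausibility(0.6, 0.8) > 0.999)    # True: the interval contains the MLE p = 0.7
print(round(interval_plausibility(0.0, 0.4), 2))  # about 0.16: hypotheses far from the data
```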

63 Reasoning with belief functions Statistical inference Coin toss example Inference with likelihood-based belief functions the same procedure can be applied to the normalised empirical counts f̂(H) = 7/7 = 1, f̂(T) = 3/7, rather than to the normalised likelihood function imposing Pl_Ω(H) = 1, Pl_Ω(T) = 3/7 on Ω = {H, T}, and looking for the least committed belief function there with these plausibility values, we get the mass assignment: m(H) = 4/7, m(T) = 0, m(Ω) = 3/7, which corresponds to the credal set in the figure [Figure: the credal set of values of P(H), the interval between Bel(H) = 4/7 and Pl(H) = 1, containing the MLE 7/10] p = 1 needs to be excluded, as the available sample evidence reports that we had n(T) = 3 counts already, so that 1 − p > 0 this outcome (a belief function on Ω = {H, T}) robustifies classical MLE Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 63 / 125

64 Reasoning with belief functions Statistical inference Summary on inference general Bayesian inference continuous PDF on the parameter space Θ (a second-order distribution) MLE/MAP estimation a single parameter value = a single PDF on Ω generalised maximum likelihood a belief function on Ω (a convex set of PDFs on Ω) generalises MAP/MLE likelihood-based / Dempster-based belief function inference a belief function on Θ = a convex set of second-order distributions generalises general Bayesian inference Dempster s approach requires a data-generating process likelihood approach produces only consonant BFs Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 64 / 125

65 Reasoning with belief functions Combination Combining vs conditioning Reasoning with belief functions belief theory is a generalisation of Bayesian reasoning whereas in Bayesian theory evidence is of the kind A is true (e.g. a new datum is available).... in belief theory, new evidence can assume the more general form of a belief function a proposition A is a very special case of belief function with m(a) = 1 in most cases, reasoning needs then to be performed by combining belief functions, rather than by conditioning with respect to an event nevertheless, conditional belief functions are of interest, especially for statistical inference Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 65 / 125

66 Reasoning with belief functions Combination Dempster's rule under fire Zadeh's paradox question is: is Dempster's sum the only possible rule of combination? it seems to have paradoxical behaviour in certain circumstances doctors have opinions about the condition of a patient, Θ = {M, C, T}, where M stands for meningitis, C for concussion and T for tumor two doctors provide the following diagnoses: D 1 : "I am 99% sure it's meningitis, but there is a small chance of 1% that it is concussion". D 2 : "I am 99% sure it's a tumor, but there is a small chance of 1% that it is concussion". these can be encoded by the following mass functions: m_1(A) = 0.99 if A = {M}, 0.01 if A = {C}, 0 otherwise; m_2(A) = 0.99 if A = {T}, 0.01 if A = {C}, 0 otherwise (1) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 66 / 125

67 Reasoning with belief functions Combination Dempster's rule under fire Zadeh's paradox their (unnormalised) Dempster's combination is: m(A) = 0.9999 if A = ∅, 0.0001 if A = {C} as the two masses are highly conflicting, normalisation yields the belief function focussed on C: it is definitely concussion, although both experts had left it as only a fringe possibility objections: the belief functions in the example are really probabilities, so this is a problem with Bayesian representations, in case! diseases are never exclusive, so that it may be argued that Zadeh's choice of a frame of discernment is misleading open world approaches with no normalisation doctors disagree so much that any person would conclude that one of them is just wrong: reliability of sources needs to be accounted for Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 67 / 125
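A minimal numeric check of Zadeh's example, assuming the singleton-focal-element encoding above: the only non-empty intersection of focal elements is {C}, so almost all of the product mass is conflict.

```python
# Doctor 1: m1({M}) = 0.99, m1({C}) = 0.01;  Doctor 2: m2({T}) = 0.99, m2({C}) = 0.01
m1 = {'M': 0.99, 'C': 0.01}
m2 = {'T': 0.99, 'C': 0.01}

# With singleton focal elements, the only non-empty intersection is {C} with {C}
mass_on_C = m1['C'] * m2['C']               # 0.0001
conflict = 1.0 - mass_on_C                  # 0.9999: almost total conflict
normalised_mass_on_C = mass_on_C / (1.0 - conflict)
print(round(conflict, 4))                   # 0.9999
print(round(normalised_mass_on_C, 6))       # 1.0: after normalisation, concussion becomes certain
```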

68 Reasoning with belief functions Combination Dempster's rule under fire Tchamova's paradox this time, the two doctors generate the following mass assignments over Θ = {M, C, T}: m_1(A) = a if A = {M}, 1 − a if A = {M, C}, 0 otherwise; m_2(A) = b_1 if A = {M, C}, b_2 if A = Θ, 1 − b_1 − b_2 if A = {T} (2) assuming equal reliability of the two doctors, Dempster's combination yields m_1 ⊕ m_2 = m_1, i.e., Doctor 2's diagnosis is completely absorbed by that of Doctor 1! here the paradoxical behaviour is not a consequence of conflict in Dempster's combination, every source of evidence has a veto power over the hypotheses it does not believe to be possible if any of them gets it wrong, the combined belief function will never give support to the correct hypothesis Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 68 / 125

69 Reasoning with belief functions Combination Yager's and Dubois' rules first answer to Zadeh's objections, based on the view that conflict is generated by non-reliable information sources the conflicting mass m(∅) = Σ_{B∩C=∅} m_1(B) m_2(C) should be re-assigned to the whole frame Θ let m_∩(A) = Σ_{B∩C=A} m_1(B) m_2(C); then m_Y(A) = m_∩(A) for A ⊊ Θ, and m_Y(Θ) = m_∩(Θ) + m(∅) (3) Dubois and Prade's idea: similar to Yager's, BUT the conflicting mass is not transferred all the way up, but to B ∪ C (by applying the minimum specificity principle): m_D(A) = m_∩(A) + Σ_{B∪C=A, B∩C=∅} m_1(B) m_2(C) (4) the resulting BF dominates Yager's combination: m_D(A) ≥ m_Y(A) for all A Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 69 / 125

70 Reasoning with belief functions Combination Conjunctive and disjunctive rules rather than normalising (as in Dempster's rule) or re-assigning the conflicting mass m(∅) to other non-empty subsets (as in Yager's and Dubois' proposals), Smets' conjunctive rule leaves the conflicting mass with the empty set: m_∩(A) = Σ_{B∩C=A} m_1(B) m_2(C) (5) applicable to unnormalised belief functions under an open world assumption: the current frame only approximately describes the set of possible hypotheses disjunctive rule of combination: m_∪(A) = Σ_{B∪C=A} m_1(B) m_2(C) (6) consensus between two sources is expressed by the union of the supported propositions, rather than by their intersection note that (Bel_1 ∪ Bel_2)(A) = Bel_1(A) · Bel_2(A): belief values are simply multiplied! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 70 / 125
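A minimal sketch of the two rules side by side; the input masses are illustrative (they are the discounted mass functions that reappear in the object-detection example near the end of the talk), and the dict-of-frozensets representation is an assumption of the sketch.

```python
from itertools import product

def conjunctive_combine(m1, m2):
    """Smets' conjunctive rule: like Dempster's rule, but the conflicting mass is left
    on the empty set instead of being normalised away."""
    out = {}
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        A = B & C                                  # may be the empty frozenset
        out[A] = out.get(A, 0.0) + mB * mC
    return out

def disjunctive_combine(m1, m2):
    """Disjunctive rule: consensus is expressed by the union of the supported propositions."""
    out = {}
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        A = B | C
        out[A] = out.get(A, 0.0) + mB * mC
    return out

Y, N = frozenset({'Y'}), frozenset({'N'})
theta = Y | N
m1 = {Y: 0.72, N: 0.08, theta: 0.2}
m2 = {Y: 0.08, N: 0.72, theta: 0.2}

print({tuple(sorted(A)): round(v, 4) for A, v in conjunctive_combine(m1, m2).items()})
# the empty tuple () carries the conflict 0.72*0.72 + 0.08*0.08 = 0.5248
print({tuple(sorted(A)): round(v, 4) for A, v in disjunctive_combine(m1, m2).items()})
# no conflict: mass only on ('Y',), ('N',) and ('N', 'Y')
```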

71 Reasoning with belief functions Combination Combination: some conclusions Yager s rule is rather unjustified.. Dubois is kinda intermediate between conjunction and disjunction my take on this: Dempster s (conjunctive) combination and disjunctive combination are the two extrema of a spectrum of possible results Proposal: combination tubes? Meta-uncertainty on the sources generating the input belief functions (their independence and reliability) induces uncertainty on the result of the combination, represented by a bracket of combination rules, which produce a tube of BFs. fits well with belief likelihood concept, and was already hinted at by Pearl in Reasoning with belief functions: An analysis of compatibility we should probably work with intervals of belief functions then? Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 71 / 125

72 Reasoning with belief functions Conditioning Conditional belief functions Approaches in Bayesian theory conditioning is done via Bayes rule: P(A | B) = P(A ∩ B) / P(B) for belief functions, many approaches to conditioning have been proposed (just as for combination!) original Dempster's conditioning Fagin and Halpern's lower envelopes geometric conditioning [Suppes] unnormalized conditional belief functions [Smets] generalised Jeffrey's rules [Smets] sets of equivalent events under multi-valued mappings [Spies] several of them are special cases of combination rules: Dempster's, Smets'.. others are the unique solution when interpreting belief functions as convex sets of probabilities (Fagin's) once again, a duality emerges between the most and least cautious conditioning approaches Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 72 / 125

73 Reasoning with belief functions Conditioning Dempster's conditioning Dempster's rule of combination induces a conditioning operator given a new event B, the logical belief function such that m(B) = 1 is combined with the a-priori belief function Bel using Dempster's rule the resulting BF is the conditional belief function given B, Bel_⊕(A | B) in terms of belief and plausibility values, Dempster's conditioning yields Bel_⊕(A | B) = (Bel(A ∪ B̄) − Bel(B̄)) / (1 − Bel(B̄)) = (Pl(B) − Pl(B \ A)) / Pl(B), Pl_⊕(A | B) = Pl(A ∩ B) / Pl(B) obtained from Bayes rule by replacing probability with plausibility measures! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 73 / 125
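A minimal sketch of Dempster's conditioning computed directly from the plausibility formulas above; the mass function is illustrative.

```python
def plausibility(A, mass):
    """Pl(A): total mass of the focal elements intersecting A."""
    return sum(m for B, m in mass.items() if B & A)

def dempster_conditional(A, B, mass):
    """Dempster's conditioning in terms of plausibilities:
    Bel(A|B) = (Pl(B) - Pl(B minus A)) / Pl(B),  Pl(A|B) = Pl(A intersect B) / Pl(B)."""
    pl_B = plausibility(B, mass)
    bel = (pl_B - plausibility(B - A, mass)) / pl_B
    pl = plausibility(A & B, mass) / pl_B
    return bel, pl

# Illustrative mass function on Theta = {a, b, c}
mass = {frozenset('a'): 0.3, frozenset('ab'): 0.4, frozenset('abc'): 0.3}
A, B = frozenset('a'), frozenset('ab')
bel, pl = dempster_conditional(A, B, mass)
print(round(bel, 3), round(pl, 3))         # 0.3 1.0: conditional belief and plausibility of {a} given {a, b}
```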

74 Reasoning with belief functions Conditioning Lower envelopes of conditional probabilities we know that a belief function can be seen as the lower envelope of the family of probabilities consistent with it: Bel(A) = inf_{P∈P[Bel]} P(A) the conditional belief function can be defined as the lower envelope (the inf) of the family of conditional probability functions P(A | B), where P is consistent with Bel: Bel_Cr(A | B) = inf_{P∈P[Bel]} P(A | B), Pl_Cr(A | B) = sup_{P∈P[Bel]} P(A | B) quite incompatible with the random set interpretation nevertheless, whereas lower/upper envelopes of arbitrary sets of probabilities are not in general belief functions, these actually are belief functions: Bel_Cr(A | B) = Bel(A ∩ B) / (Bel(A ∩ B) + Pl(Ā ∩ B)), Pl_Cr(A | B) = Pl(A ∩ B) / (Pl(A ∩ B) + Bel(Ā ∩ B)) they provide a more conservative estimate than Dempster's conditioning: Bel_Cr(A | B) ≤ Bel_⊕(A | B) ≤ Pl_⊕(A | B) ≤ Pl_Cr(A | B) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 74 / 125

75 Reasoning with belief functions Conditioning Geometric conditioning Suppes and Zanotti proposed a geometric conditioning approach: Bel_G(A | B) = Bel(A ∩ B) / Bel(B), Pl_G(A | B) = (Bel(B) − Bel(B \ A)) / Bel(B) it retains only the masses of focal elements inside B, and normalises them: m_G(A | B) = m(A) / Bel(B) for A ⊆ B it is a consequence of the focussing approach to belief update: no new information is introduced, we merely focus on a specific subset of the original set it replaces probability with belief measures in Bayes rule, Bel_G(A | B) = Bel(A ∩ B) / Bel(B), whereas Dempster's conditioning uses plausibilities, Pl_⊕(A | B) = Pl(A ∩ B) / Pl(B) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 75 / 125

76 Reasoning with belief functions Conditioning Conjunctive rule of conditioning it is induced by the conjunctive rule of combination: the conditional mass m_∩(· | B) is obtained by conjunctively combining m with m_B, the logical BF focussed on B [Smets] its belief and plausibility values are: Bel_∩(A | B) = Bel(A ∪ B̄) if A ∩ B ≠ ∅, 0 if A ∩ B = ∅; Pl_∩(A | B) = Pl(A ∩ B) if A ∩ B ≠ ∅, 1 if A ∩ B = ∅ it is compatible with the principles of belief revision [Gilboa, Perea]: a state of belief is modified to take into account a new piece of information in probability theory, both focussing and revision are expressed by Bayes rule, but they are conceptually different operations which produce different results on BFs it is more committal than Dempster's rule! Bel_⊕(A | B) ≤ Bel_∩(A | B) ≤ Pl_∩(A | B) ≤ Pl_⊕(A | B) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 76 / 125

77 Reasoning with belief functions Conditioning Disjunctive rule of conditioning induced by the disjunctive rule of combination: m_∪(· | B) is obtained by disjunctively combining m with m_B obviously dual to conjunctive conditioning it assigns mass only to subsets containing the conditioning event B belief and plausibility values: Bel_∪(A | B) = Bel(A) if A ⊇ B, 0 if A ⊉ B; Pl_∪(A | B) = Pl(A) if A ∩ B = ∅, 1 if A ∩ B ≠ ∅ it is less committal not only than Dempster's rule, but also than credal conditioning: Bel_∪(A | B) ≤ Bel_Cr(A | B) ≤ Pl_Cr(A | B) ≤ Pl_∪(A | B) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 77 / 125

78 Reasoning with belief functions Conditioning Conditioning - an overview summary of the main operators (belief / plausibility): Dempster's: Bel_⊕(A | B) = (Pl(B) − Pl(B \ A)) / Pl(B), Pl_⊕(A | B) = Pl(A ∩ B) / Pl(B) Credal: Bel_Cr(A | B) = Bel(A ∩ B) / (Bel(A ∩ B) + Pl(Ā ∩ B)), Pl_Cr(A | B) = Pl(A ∩ B) / (Pl(A ∩ B) + Bel(Ā ∩ B)) Geometric: Bel_G(A | B) = Bel(A ∩ B) / Bel(B), Pl_G(A | B) = (Bel(B) − Bel(B \ A)) / Bel(B) Conjunctive: Bel_∩(A | B) = Bel(A ∪ B̄), Pl_∩(A | B) = Pl(A ∩ B) (for A ∩ B ≠ ∅) Disjunctive: Bel_∪(A | B) = Bel(A) (for A ⊇ B), Pl_∪(A | B) = Pl(A) (for A ∩ B = ∅) Nested conditioning operators Conditioning operators form a nested family, from the more committal to the least committal one! Bel_∪(·) ≤ Bel_Cr(·) ≤ Bel_⊕(·) ≤ Bel_∩(·) ≤ Pl_∩(·) ≤ Pl_⊕(·) ≤ Pl_Cr(·) ≤ Pl_∪(·) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 78 / 125

79 Reasoning with belief functions Belief vs Bayesian reasoning Belief vs Bayesian reasoning A toy example suppose we want to estimate the class of an object appearing in an image, based on feature measurements extracted from the image (e.g. by convolutional neural networks) we capture a training set of images, complete with annotated object labels assuming a PDF of a certain family (e.g. mixture of Gaussians) we can learn from the training data a likelihood function p(y x), where y is the object class and x the image feature vector suppose n different sensors extract n features x i from each image: x 1,..., x n let us compare how data fusion works under the Bayesian and the belief function paradigms! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 79 / 125

80 Reasoning with belief functions Belief vs Bayesian reasoning (Naive) Bayesian data fusion Belief vs Bayesian reasoning the likelihoods of the individual features are computed using the n likelihood functions learned during training: p(x_i|y), for all i = 1,..., n measurements are typically assumed to be conditionally independent, yielding the product likelihood p(x|y) = ∏_i p(x_i|y) Bayesian inference is applied, typically assuming uniform priors (for there is no reason to think otherwise), yielding p(y|x) ∝ p(x|y) = ∏_i p(x_i|y) [pipeline: each feature x_i → likelihood function p(x_i|y) → product under conditional independence → Bayes' rule with uniform prior → p(y|x) ∝ ∏_i p(x_i|y)] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 80 / 125
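A minimal sketch of the naive Bayesian fusion pipeline just described; the two classes, the Gaussian likelihood model and all numerical values are invented for illustration.

```python
import math

classes = ['cat', 'dog']
# illustrative per-feature Gaussian likelihood parameters (mean, std) per class
params = {
    'cat': [(0.0, 1.0), (2.0, 0.5)],
    'dog': [(1.5, 1.0), (0.0, 0.5)],
}

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def naive_bayes_fusion(x):
    """p(y | x_1,...,x_n) proportional to the product of p(x_i | y), uniform prior."""
    scores = {y: math.prod(gaussian(xi, mu, sg) for xi, (mu, sg) in zip(x, params[y]))
              for y in classes}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

print(naive_bayes_fusion([0.2, 1.8]))   # posterior over the two classes
```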

81 Reasoning with belief functions Belief vs Bayesian reasoning Dempster-Shafer data fusion Belief vs Bayesian reasoning with belief functions, for each feature type i a BF is learned from the individual likelihood p(x_i|y), e.g. via the likelihood-based approach by Shafer this yields n belief functions Bel(·|x_i), on the range of possible object classes Y a combination rule is applied to compute an overall BF (e.g. ⊕, ∩ or ∪), obtaining Bel(Y|x) = Bel(Y|x_1) ⊕ ... ⊕ Bel(Y|x_n), Y ⊆ Y [pipeline: each feature x_i → likelihood function p(x_i|y) → likelihood-based inference → Bel(·|x_i) → belief function combination → Bel(·|x)] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 81 / 125

82 Reasoning with belief functions Belief vs Bayesian reasoning Inference under partially reliable data Belief vs Bayesian reasoning in the fusion example we have assumed that the data are measured correctly what if the data-generating process is not completely reliable? problem: suppose we want to just detect an object (binary decision: yes Y or no N) two sensors produce image features x_1 and x_2, but we learned from the training data that both are reliable only 80% of the time at test time we get an image, measure x_1 and x_2, and unluckily sensor 2 got it wrong! the object is actually there we get the following normalised likelihoods: p(x_1|Y) = 0.9, p(x_1|N) = 0.1; p(x_2|Y) = 0.1, p(x_2|N) = 0.9 Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 82 / 125

83 Reasoning with belief functions Belief vs Bayesian reasoning Inference under partially reliable data Belief vs Bayesian reasoning how do the two fusion pipelines cope with this? the Bayesian scholar assumes the two sensors/processes are conditionally independent, and multiplies the likelihoods obtaining p(x_1, x_2|Y) = 0.9 × 0.1 = 0.09, p(x_1, x_2|N) = 0.1 × 0.9 = 0.09 so that p(Y|x_1, x_2) = 1/2, p(N|x_1, x_2) = 1/2 Shafer's faithful follower discounts the likelihoods by assigning mass 0.2 to the whole hypothesis space Θ = {Y, N}: m(Y|x_1) = 0.9 × 0.8 = 0.72, m(N|x_1) = 0.1 × 0.8 = 0.08, m(Θ|x_1) = 0.2; m(Y|x_2) = 0.1 × 0.8 = 0.08, m(N|x_2) = 0.9 × 0.8 = 0.72, m(Θ|x_2) = 0.2 Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 83 / 125

84 Reasoning with belief functions Belief vs Bayesian reasoning Inference under partially reliable data Belief vs Bayesian reasoning thus, when we combine them by Dempster's rule we get the BF Bel on {Y, N}: m(Y|x_1, x_2) = 0.458, m(N|x_1, x_2) = 0.458, m(Θ|x_1, x_2) = 0.084 when combined using the disjunctive rule (the least committal one) we get Bel′: m′(Y|x_1, x_2) = 0.09, m′(N|x_1, x_2) = 0.09, m′(Θ|x_1, x_2) = 0.82 [figure: the credal sets of probabilities P(Y|x_1, x_2) induced by Bel, Bel′ and the Bayesian posterior, shown as intervals] the credal interval for Bel is quite narrow: reliability is assumed to be 80%, and we got one faulty measurement out of two (50%)! the disjunctive rule is much more cautious about the correct inference Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 84 / 125
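The figures in this slide can be reproduced in a few lines; the sketch below hard-codes the binary frame {Y, N}, the 0.2 discount rate and the two likelihood vectors from the example, while the function names and structure are mine.

```python
def discount(likelihoods, alpha):
    """Turn normalised likelihoods into a simple BF discounted by rate alpha."""
    m = {k: (1 - alpha) * v for k, v in likelihoods.items()}
    m[frozenset({'Y', 'N'})] = alpha          # mass moved to the whole frame Θ
    return m

def dempster(m1, m2):
    """Dempster's rule on a small frame: conjunctive combination + normalisation."""
    out, conflict = {}, 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B
            if C:
                out[C] = out.get(C, 0.0) + a * b
            else:
                conflict += a * b
    return {k: v / (1 - conflict) for k, v in out.items()}

def disjunctive(m1, m2):
    out = {}
    for A, a in m1.items():
        for B, b in m2.items():
            C = A | B
            out[C] = out.get(C, 0.0) + a * b
    return out

Y, N = frozenset({'Y'}), frozenset({'N'})
lik1 = {Y: 0.9, N: 0.1}   # sensor 1 (correct)
lik2 = {Y: 0.1, N: 0.9}   # sensor 2 (faulty)

m1, m2 = discount(lik1, 0.2), discount(lik2, 0.2)
print(dempster(m1, m2))                                  # about {Y: 0.458, N: 0.458, Θ: 0.084}
print(disjunctive({Y: 0.9, N: 0.1}, {Y: 0.1, N: 0.9}))   # {Y: 0.09, N: 0.09, Θ: 0.82}
```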

85 Reasoning with belief functions Generalised Bayes Theorem Generalised Bayes Theorem Generalising full Bayesian inference in Smets' generalised Bayesian theorem setting, the input is a set of conditional belief functions on Θ, rather than likelihoods p(x|θ): Bel_X(X|θ), X ⊆ X, θ ∈ Θ, each associated with a value θ of the parameter (these are not the same conditional belief functions we saw, where a conditioning event B ⊆ Θ alters a prior belief function Bel_Θ mapping it to Bel_Θ(·|B)) they can be seen as a parameterised family of BFs on the data the desired output is another family of belief functions on Θ, parameterised by all sets of measurements X on X: Bel_Θ(A|X), X ⊆ X each piece of evidence m_X(X|θ) has an effect on our beliefs on the parameters coherent with the random set setting, as we condition on set-valued observations Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 85 / 125

86 Reasoning with belief functions Generalised Bayes Theorem Generalised Bayes Theorem Generalised Bayes Theorem Implements this inference Bel_X(X|θ) → Bel_Θ(A|X) by: 1 computing an intermediate family of BFs on X parameterised by sets of parameter values, via the disjunctive rule of combination: Bel_X(X|A) = ∏_{θ ∈ A} Bel_X(X|θ), i.e. the disjunctive combination of the Bel_X(·|θ), θ ∈ A 2 assuming that Pl_Θ(A|X) = Pl_X(X|A) for all A ⊆ Θ, X ⊆ X 3 this yields Bel_Θ(A|X) = ∏_{θ ∈ Ā} Bel_X(X̄|θ) generalises Bayes' rule (by replacing P with Pl) when priors are uniform Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 86 / 125
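One possible reading of the GBT in code, for a single observed datum x and a three-element parameter set (the plausibility values pl_X(x|θ) are invented): build the GBT mass function m(A|x) = Π_{θ∈A} pl_X(x|θ) · Π_{θ∉A} (1 - pl_X(x|θ)), which is one standard way of expressing the result above for a singleton observation, and read Bel and Pl off it. The sketch normalises à la Dempster; Smets' open-world version would keep the mass assigned to the empty set instead.

```python
from itertools import combinations

# illustrative conditional plausibilities pl_X(x | θ) of one observed datum x
pl_x = {'t1': 0.9, 't2': 0.3, 't3': 0.6}
Theta = list(pl_x)

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# GBT mass on Θ induced by the datum x:
#   m(A | x) = Π_{θ∈A} pl_X(x|θ) · Π_{θ∉A} (1 - pl_X(x|θ))
m = {}
for A in subsets(Theta):
    p = 1.0
    for t in Theta:
        p *= pl_x[t] if t in A else (1.0 - pl_x[t])
    m[A] = p

# normalise a la Dempster; the open-world TBM would instead keep the mass of the empty set
k = m.pop(frozenset())
m = {A: v / (1.0 - k) for A, v in m.items()}

def bel(A): return sum(v for B, v in m.items() if B <= A)
def pl(A): return sum(v for B, v in m.items() if B & A)

for t in Theta:
    s = frozenset({t})
    print(t, round(bel(s), 3), round(pl(s), 3))   # Pl({θ}|x) proportional to pl_X(x|θ)
```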

87 Reasoning with belief functions The total belief theorem The total belief theorem Generalising the law of total probability conditional belief functions are crucial for our approach to inference complementary link of the chain: generalisation of the law of total probability recall that a refining is a mapping from elements of one set Ω to elements of a disjoint partition of a second set Θ [diagram: a prior Bel_0 : 2^Ω → [0, 1] on Ω and conditional belief functions Bel_i : 2^{Π_i} → [0, 1] on the partition elements Π_i of Θ] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 87 / 125

88 Reasoning with belief functions The total belief theorem The total belief theorem Statement Total belief theorem Suppose Θ and Ω are two finite sets, and ρ : 2^Ω → 2^Θ the unique refining between them. Let Bel_0 be a belief function defined over Ω = {ω_1,..., ω_|Ω|}. Suppose there exists a collection of belief functions Bel_i : 2^{Π_i} → [0, 1], where Π = {Π_1,..., Π_|Ω|}, Π_i = ρ({ω_i}), is the partition of Θ induced by Ω. Then, there exists a belief function Bel : 2^Θ → [0, 1] such that: 1 Bel_0 is the marginal of Bel to Ω (Bel_0(A) = Bel(ρ(A))); 2 Bel ⊕ Bel_{Π_i} = Bel_i for all i = 1,..., |Ω|, where Bel_{Π_i} is the logical belief function with m_{Π_i}(A) = 1 if A = Π_i, 0 otherwise several distinct solutions exist, and they likely form a graph with symmetries one such solution is easily identifiable Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 88 / 125

89 Reasoning with belief functions The total belief theorem The total belief theorem Existence of a solution [Zhou & Cuzzolin, UAI 2017] assume Θ ⊆ Θ′, and m a mass function over Θ m can be identified with a mass function m^{Θ′} over the larger frame Θ′: for any E′ ⊆ Θ′, m^{Θ′}(E′) = m(E) if E′ = E ∪ (Θ′ \ Θ), and m^{Θ′}(E′) = 0 otherwise such m^{Θ′} is called the conditional embedding of m into Θ′ let Bel′_i be the conditional embedding of Bel_i into Θ for all Bel_i : 2^{Π_i} → [0, 1], and Bel′ = Bel′_1 ⊕ ... ⊕ Bel′_|Ω| Total belief theorem: existence The belief function Bel := Bel_0^Θ ⊕ Bel′ is a valid total belief function. Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 89 / 125

90 Reasoning with belief functions Decision making Decision making with belief functions a decision problem can be formalised by defining: a set Ω of possible states of the world, a set X of consequences and a set F of acts, where an act is a function f : Ω → X mapping a world state to a consequence problem: to select an act f from an available list F (i.e., to make a decision) which optimises a certain objective function various approaches to decision making with belief functions; among those: decision making in the TBM is based on expected utility via the pignistic transform generalised expected utility [Gilboa] based on classical expected utility theory [Savage, von Neumann] also a lot of interest in multicriteria decision making (based on a number of attributes) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 90 / 125

91 Reasoning with belief functions Decision making Decision making with the pignistic probability classical expected utility theory is due to Von Neumann in Smets' Transferable Belief Model, decision making is done by maximising the expected utility of actions based on the pignistic transform this maps a belief function Bel on Ω to a probability distribution there: BetP[Bel](ω) = Σ_{A ∋ ω} m(A) / |A|, ω ∈ Ω the set of possible actions F and the set Ω of possible outcomes are distinct, and the utility function u is defined on F × Ω the optimal decision maximises E[u] = Σ_{ω ∈ Ω} u(f, ω) BetP(ω) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 91 / 125
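A toy sketch of TBM-style decision making (the frame, the mass assignment, the acts and the utilities are all made up): compute BetP from the masses, then pick the act with the highest pignistic expected utility.

```python
Theta = ['rain', 'sun']
m = {frozenset({'rain'}): 0.4,
     frozenset({'sun'}): 0.2,
     frozenset({'rain', 'sun'}): 0.4}       # mass on the whole frame = ignorance

# pignistic transform: BetP(ω) = Σ_{A∋ω} m(A) / |A|
betp = {w: sum(v / len(A) for A, v in m.items() if w in A) for w in Theta}

# illustrative utility u(f, ω) for two acts
utility = {
    'take umbrella':  {'rain': 5, 'sun': 3},
    'leave umbrella': {'rain': 0, 'sun': 6},
}

expected = {f: sum(utility[f][w] * betp[w] for w in Theta) for f in utility}
best = max(expected, key=expected.get)
print(betp)                      # {'rain': 0.6, 'sun': 0.4}
print(expected, '->', best)
```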

92 Reasoning with belief functions Decision making Savage's sure thing principle let ⪰ be a preference relation on F, such that f ⪰ g means that f is at least as desirable as g Savage (1954) showed that ⪰ verifies some rationality requirements iff there exists a probability measure P on Ω and a utility function u : X → R s.t. for all f, g ∈ F: f ⪰ g iff E_P(u ∘ f) ≥ E_P(u ∘ g) does that mean that using belief functions is irrational? given f, h ∈ F and E ⊆ Ω, let fEh denote the act defined by (fEh)(ω) = f(ω) if ω ∈ E, h(ω) if ω ∉ E then the sure thing principle states that for all E, f, g, h, h′: fEh ⪰ gEh implies fEh′ ⪰ gEh′ Ellsberg's paradox: empirically the Sure Thing Principle is violated! this is because people are averse to second-order uncertainty Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 92 / 125

93 Reasoning with belief functions Decision making Ellsberg's paradox suppose you have an urn containing 30 red balls and 60 balls, either black or yellow f_1: you receive 100 euros if you draw a red ball f_2: you receive 100 euros if you draw a black ball f_3: you receive 100 euros if you draw a red or yellow ball f_4: you receive 100 euros if you draw a black or yellow ball in this example Ω = {R, B, Y}, f_i : Ω → R and X = R empirically most people strictly prefer f_1 to f_2, but they strictly prefer f_4 to f_3 payoffs:
      R    B    Y
f_1  100    0    0
f_2    0  100    0
f_3  100    0  100
f_4    0  100  100
now pick E = {R, B}: by definition f_1{R, B}0 = f_1, f_2{R, B}0 = f_2 and f_1{R, B}100 = f_3, f_2{R, B}100 = f_4 since f_1 ≻ f_2, i.e. f_1{R, B}0 ≻ f_2{R, B}0, the Sure Thing Principle would imply f_1{R, B}100 ≻ f_2{R, B}100, i.e., f_3 ≻ f_4 empirically the Sure Thing Principle is violated! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 93 / 125

94 Reasoning with belief functions Decision making Lower and upper expected utilities Gilboa (1987) proposed a modification of Savage's axioms a preference relation ⪰ meets these weaker requirements iff there exists a (not necessarily additive) measure µ and a utility function u : X → R such that, for all f, g ∈ F: f ⪰ g iff C_µ(u ∘ f) ≥ C_µ(u ∘ g), where C_µ is the Choquet integral, defined for X : Ω → R as
C_µ(X) = ∫_0^{+∞} µ({X ≥ t}) dt + ∫_{-∞}^0 [µ({X ≥ t}) - 1] dt
given a belief function Bel on Ω and a utility function u, this theorem supports making decisions based on the Choquet integral of u with respect to Bel for finite Ω, it can be shown that
C_Bel(u ∘ f) = Σ_{B ⊆ Ω} m(B) min_{ω ∈ B} u(f(ω)),   C_Pl(u ∘ f) = Σ_{B ⊆ Ω} m(B) max_{ω ∈ B} u(f(ω))
(lower and upper expectations of u ∘ f with respect to Bel) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 94 / 125

95 Reasoning with belief functions Decision making Decision making Possible strategies let P(Bel) as usual be the set of probability measures P compatible with Bel, i.e., such that Bel ≤ P. Then, it can be shown that C_Bel(u ∘ f) = min_{P ∈ P(Bel)} E_P(u ∘ f) =: E_*(u ∘ f) and C_Pl(u ∘ f) = max_{P ∈ P(Bel)} E_P(u ∘ f) =: E^*(u ∘ f) two expected utilities, the lower E_*(f) and the upper E^*(f): how do we make a decision? possible decision criteria based on interval dominance: 1 f ⪰ g iff E_*(u ∘ f) ≥ E^*(u ∘ g) (conservative strategy) 2 f ⪰ g iff E_*(u ∘ f) ≥ E_*(u ∘ g) (pessimistic strategy) 3 f ⪰ g iff E^*(u ∘ f) ≥ E^*(u ∘ g) (optimistic strategy) 4 f ⪰ g iff α E_*(u ∘ f) + (1 - α) E^*(u ∘ f) ≥ α E_*(u ∘ g) + (1 - α) E^*(u ∘ g) for some α ∈ [0, 1] called a pessimism index (Hurwicz criterion) the conservative strategy yields only a partial preorder: f and g are not comparable if E_*(u ∘ f) < E^*(u ∘ g) and E_*(u ∘ g) < E^*(u ∘ f) Ellsberg's paradox is actually explained by the pessimistic strategy Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 95 / 125
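To see how the pessimistic (maximin) strategy accounts for the Ellsberg preferences, the sketch below computes the lower and upper expected utilities C_Bel and C_Pl of the four acts under the natural belief function for the urn, m({R}) = 1/3 and m({B, Y}) = 2/3; the code organisation and names are mine, the numbers follow the example.

```python
R, B, Y = 'R', 'B', 'Y'
m = {frozenset({R}): 1/3, frozenset({B, Y}): 2/3}   # urn: 30 red, 60 black-or-yellow

acts = {
    'f1': {R: 100, B: 0,   Y: 0},     # 100 on red
    'f2': {R: 0,   B: 100, Y: 0},     # 100 on black
    'f3': {R: 100, B: 0,   Y: 100},   # 100 on red or yellow
    'f4': {R: 0,   B: 100, Y: 100},   # 100 on black or yellow
}

def lower_upper(u):
    """C_Bel(u) = Σ_A m(A) min_{ω∈A} u(ω);  C_Pl(u) = Σ_A m(A) max_{ω∈A} u(ω)."""
    lo = sum(v * min(u[w] for w in A) for A, v in m.items())
    hi = sum(v * max(u[w] for w in A) for A, v in m.items())
    return lo, hi

for name, u in acts.items():
    lo, hi = lower_upper(u)
    print(f'{name}: lower E = {lo:6.2f}, upper E = {hi:6.2f}')
# pessimistic (maximin) ordering: f1 (33.3) > f2 (0.0) and f4 (66.7) > f3 (33.3),
# exactly the empirically observed pattern of preferences
```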

96 Theories of uncertainty Outline 1 Uncertainty Second-order uncertainty Classical probability 2 Beyond probability Set-valued observations Propositional evidence Scarce data Representing ignorance Rare events Uncertain data 3 Belief theory A theory of evidence Belief functions Semantics Dempster s rule Multivariate analysis Misunderstandings 4 Reasoning with belief functions Statistical inference Combination Conditioning Belief vs Bayesian reasoning Generalised Bayes Theorem The total belief theorem Decision making 5 Theories of uncertainty Imprecise probability Monotone capacities Probability intervals Fuzzy and possibility theory Probability boxes Rough sets 6 Belief functions on reals Continuous belief functions Random sets 7 Conclusions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 96 / 125

97 Theories of uncertainty Theories of uncertainty several different mathematical theories of uncertainty compete to be adopted by practitioners the consensus is that there is no such thing as the best mathematical description of uncertainty random sets are not the most general framework; however, we argue here, they naturally arise from set-valued observations scholars have extensively discussed and compared the various approaches to uncertainty theory [Klir, Destercke] theoretical and empirical comparisons between belief functions and other theories were conducted [Lee, Yager, Helton, Regan..] some attempts have been made to unify most approaches to uncertainty theory [Klir, Zadeh, Walley] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 97 / 125

98 Theories of uncertainty A hierarchy of uncertainty theories [figure: two diagrams relating the main uncertainty formalisms: lower/upper previsions, credal sets, monotone and 2-monotone capacities, infinitely-monotone capacities (belief functions / random sets), feasible probability intervals, normalised sum functions, generalised p-boxes, p-boxes, probabilities and possibilities] Left: relation between BFs and other uncertainty measures Right: Destercke's partial hierarchies of uncertainty theories Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 98 / 125

99 Theories of uncertainty Imprecise probability Coherent lower probabilities Walley's Imprecise Probability behavioural approach to probability a lower probability P̲ is a function from a sigma-algebra to the unit interval [0, 1] such that: P̲(A ∪ B) ≥ P̲(A) + P̲(B) whenever A ∩ B = ∅ (super-additivity) a lower probability P̲ avoids sure loss if the set P(P̲) := { P : P(A) ≥ P̲(A), for all A ⊆ Ω } is non-empty (the lower bound constraints P̲(A) can be satisfied by some probability measure) it is coherent if inf_{P ∈ P(P̲)} P(A) = P̲(A) for all A (P̲ is the lower envelope of P(P̲)) not all convex sets of probabilities can be described by merely focusing on events [Walley]: notion of gamble following de Finetti, imprecise probability equates belief with inclination to act: an agent believes in an outcome to the extent it is willing to accept a bet on it Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 99 / 125

100 Theories of uncertainty Imprecise probability Desirable gambles Gamble A gamble is a bounded real-valued function on Θ: X : Θ → R, θ ↦ X(θ). [figure: a coherent set D of desirable gambles as a convex cone in the plane of gambles (X, Y)] a lower probability can be seen as a functional defined on the class of all indicator functions of sets (the traditional events) an agent's set of desirable gambles is denoted by D ⊆ L(Ω), where L(Ω) is the set of all bounded real-valued functions on Ω since whether a gamble is desirable depends on the agent's belief about the outcome, D can be used as a model of the agent's uncertainty about the problem Coherence of desirable gambles A set D of desirable gambles is coherent iff it is a convex cone (it is closed under positive linear combinations). Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 100 / 125

101 Theories of uncertainty Imprecise probability Lower and upper previsions suppose the agent buys a gamble X for a price µ: this yields a new gamble X - µ lower prevision P̲(X) of a gamble X: P̲(X) := sup{µ : X - µ ∈ D}, the supremum acceptable price for buying X selling a gamble X for a price µ also yields a new gamble µ - X upper prevision P̄(X) of a gamble X: P̄(X) := inf{µ : µ - X ∈ D}, the infimum acceptable price for selling X when lower and upper prevision coincide, P(X) = P̲(X) = P̄(X) is called the precise prevision of X (what de Finetti called the fair price) for prices in [P̲(X), P̄(X)] we are undecided as to whether to buy or sell gamble X Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 101 / 125

102 Theories of uncertainty Imprecise probability Rules of rational behaviour Rational behaviour the agent does not specify betting rates such that they lose utility whatever the outcome (avoiding sure loss) the agent is fully aware of the consequences of its betting rates (coherence) if the first condition is not met, there exists a positive combination of gambles, each individually desirable to the agent, which is not desirable to them one consequence of avoiding sure loss is that P̲(A) ≤ P̄(A) a consequence of coherence is that lower previsions are super-additive a precise prevision P is coherent iff: (i) P(λX + µY) = λP(X) + µP(Y); (ii) if X ≥ 0 then P(X) ≥ 0; (iii) P(Ω) = 1, and coincides with de Finetti's notion of coherent prevision A powerful theory Generalises probability measures, de Finetti previsions, 2-monotone capacities, Choquet capacities, possibility/necessity measures, belief/plausibility measures, random sets, but also probability boxes, credal sets, and robust Bayesian models. Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 102 / 125

103 Theories of uncertainty Monotone capacities Monotone capacities Choquet [1953], Sugeno [1974] the theory of capacities is a generalisation of classical measure theory Monotone capacity Given a domain Θ and a non-empty family F of subsets of Θ, a monotone capacity or fuzzy measure is a function µ : F → [0, 1] such that µ(∅) = 0 and, if A ⊆ B, then µ(A) ≤ µ(B) for every A, B ∈ F (monotonicity) for any nonnegative measurable function f on (Θ, F), the Choquet integral of f on any A ∈ F is defined as: C_µ(f) := ∫_0^∞ µ(F_α ∩ A) dα, where F_α = {x ∈ Θ : f(x) ≥ α}, α ∈ [0, ∞) both the Choquet integral of monotone capacities and the natural extension of lower probabilities are generalisations of the Lebesgue integral Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 103 / 125
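For a finite domain the Choquet integral reduces to a sorted sum over upper level sets; the sketch below implements that definition directly (the capacity, built here from a belief-function-like mass assignment, and the integrand are arbitrary illustrations).

```python
def choquet(f, mu, domain):
    """Discrete Choquet integral of a nonnegative function f w.r.t. capacity mu.

    Sort the points by decreasing value and sum the layer thicknesses times the
    capacity of the corresponding upper level sets {x : f(x) >= alpha}.
    """
    pts = sorted(domain, key=lambda x: f[x], reverse=True)
    values = [f[x] for x in pts] + [0.0]
    total = 0.0
    for i in range(len(pts)):
        level_set = frozenset(pts[:i + 1])
        total += (values[i] - values[i + 1]) * mu(level_set)
    return total

# illustrative capacity on Θ = {a, b, c}: a belief-function-like set function
weights = {frozenset({'a'}): 0.2, frozenset({'b', 'c'}): 0.5,
           frozenset({'a', 'b', 'c'}): 0.3}

def mu(A):
    return sum(v for F, v in weights.items() if F <= A)

f = {'a': 3.0, 'b': 1.0, 'c': 2.0}
print(choquet(f, mu, ['a', 'b', 'c']))   # 1.4
```

Because the capacity used here is a belief function, the result (1.4) coincides with the lower expectation Σ_B m(B) min_{x∈B} f(x), as the previous section anticipated.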

104 Theories of uncertainty Monotone capacities Order of a capacity Special types of capacities Order of a capacity A capacity µ is said to be of order k if
µ(∪_{j=1}^k A_j) ≥ Σ_{∅ ≠ K ⊆ {1,...,k}} (-1)^{|K|+1} µ(∩_{j ∈ K} A_j)
for all collections of k subsets A_1,..., A_k of Θ if k′ > k, capacities of order k′ form a special case of capacities of order k, so the resulting theory is less general than a theory of capacities of order k Capacities and belief functions Belief functions are infinitely monotone capacities: just compare the definition of order with the third axiom of belief functions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 104 / 125

105 Theories of uncertainty Probability intervals Probability intervals Set of probability intervals A system of constraints on a probability distribution p : Θ → [0, 1] of the form: P(l, u) := { p : l(x) ≤ p(x) ≤ u(x), for all x ∈ Θ } probability intervals typically arise through measurement errors, or measurements inherently of interval nature a set of probability intervals also determines a credal set, a sub-class of all credal sets generated by lower and upper probabilities each belief function induces a set of probability intervals Belief functions and probability intervals The minimal probability interval containing a pair of belief/plausibility functions is the one whose lower bounds are the beliefs of the singletons and whose upper bounds are their plausibilities: l(x) = Bel({x}), u(x) = Pl({x}) for all x ∈ Θ Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 105 / 125
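A quick illustration of this statement (the mass assignment is invented): the interval bounds are read off the belief and plausibility of each singleton.

```python
m = {frozenset({'a'}): 0.5, frozenset({'a', 'b'}): 0.3, frozenset({'a', 'b', 'c'}): 0.2}

def bel(A): return sum(v for F, v in m.items() if F <= A)
def pl(A): return sum(v for F, v in m.items() if F & A)

intervals = {x: (bel(frozenset({x})), pl(frozenset({x}))) for x in ('a', 'b', 'c')}
print(intervals)   # {'a': (0.5, 1.0), 'b': (0.0, 0.5), 'c': (0.0, 0.2)}
```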

106 Theories of uncertainty Fuzzy and possibility theory Fuzzy sets and possibility theory Zadeh, Dubois and Prade the concept of a fuzzy set was introduced by Lotfi A. Zadeh [1965]: elements belong to a set with a certain degree of membership the theory was further developed by Didier Dubois and Henri Prade into a mathematical theory of partial belief, called possibility theory a possibility measure on Θ is a function Π : 2^Θ → [0, 1] such that Π(∅) = 0, Π(Θ) = 1 and Π(∪_i A_i) = sup_i Π(A_i) for every family of subsets {A_i ∈ 2^Θ} each possibility measure is uniquely characterised by a membership function π : Θ → [0, 1] s.t. π(x) := Π({x}) via the formula Π(A) = sup_{x ∈ A} π(x) the dual quantity N(A) = 1 - Π(A^c) is called a necessity measure Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 106 / 125

107 Theories of uncertainty Fuzzy and possibility theory Possibility and belief measures call plausibility assignment pl the restriction of the plausibility function to singletons, pl(x) = Pl({x}); then [Shafer]: Bel is a necessity measure iff Bel is consonant; in that case the membership function coincides with the plausibility assignment; a finite fuzzy set is thus equivalent to a consonant belief function Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 107 / 125
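The correspondence can be made concrete in a few lines (the membership grades are invented): the level cuts of a finite membership function are nested, and weighting each cut by the drop between consecutive levels yields a consonant mass function whose singleton plausibilities recover the membership function.

```python
pi = {'a': 1.0, 'b': 0.7, 'c': 0.4}   # illustrative membership / possibility values

# distinct levels in decreasing order; the cuts {x : pi(x) >= level} are nested
levels = sorted(set(pi.values()), reverse=True) + [0.0]
m = {}
for hi, lo in zip(levels, levels[1:]):
    cut = frozenset(x for x, v in pi.items() if v >= hi)
    m[cut] = hi - lo

def pl(A):
    return sum(v for F, v in m.items() if F & A)

print(m)                                                 # nested focal elements
print({x: round(pl(frozenset({x})), 3) for x in pi})     # recovers pi
```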

108 Theories of uncertainty Fuzzy and possibility theory Belief functions on fuzzy sets belief functions defined on fuzzy sets have also been proposed basic idea: belief measures are generalised to fuzzy sets as follows: Bel(X) = Σ_{A ∈ M} I(A ⊆ X) m(A) where X is a fuzzy set defined on Θ, m is a mass function defined on the collection M of fuzzy sets on Θ, and I(A ⊆ X) is a measure of how much fuzzy set A is included in fuzzy set X various measures of inclusion in [0, 1] can be proposed, based on fuzzy implications: Lukasiewicz: I(x, y) = min{1, 1 - x + y} [Ishizuka] Kleene-Dienes: I(x, y) = max{1 - x, y} [Yager] from which one can get: I(A ⊆ B) = inf_{x ∈ Θ} I(A(x), B(x)) [Wu 2009] Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 108 / 125

109 Theories of uncertainty Probability boxes Probability boxes and random sets a probability box or p-box [Ferson and Hajagos] (F_*, F^*) is a class of cumulative distribution functions (CDFs): {F CDF : F_* ≤ F ≤ F^*} every pair Bel, Pl defined on the real line R (a random set) generates a unique p-box: F_*(x) = Bel((-∞, x]), F^*(x) = Pl((-∞, x]) conversely, every p-box generates an entire equivalence class of random intervals, e.g. the one with focal elements Γ(α) = [ (F^*)^{-1}(α), (F_*)^{-1}(α) ], α ∈ [0, 1], where (F^*)^{-1}(α) := inf{x : F^*(x) ≥ α} and (F_*)^{-1}(α) := inf{x : F_*(x) ≥ α} are the quasi-inverses of the upper and lower CDFs Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 109 / 125
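A sketch of the first direction, for a finite random interval (focal intervals and masses invented): the lower CDF accumulates the mass of intervals lying entirely below x, the upper CDF the mass of intervals whose left endpoint is below x.

```python
# illustrative finite random interval: focal intervals [u, v] with masses
focal = [((0.0, 2.0), 0.3), ((1.0, 3.0), 0.5), ((2.5, 4.0), 0.2)]

def F_lower(x):
    """F_*(x) = Bel((-inf, x]): mass of intervals entirely to the left of x."""
    return sum(w for (u, v), w in focal if v <= x)

def F_upper(x):
    """F^*(x) = Pl((-inf, x]): mass of intervals that intersect (-inf, x]."""
    return sum(w for (u, v), w in focal if u <= x)

for x in (0.5, 2.0, 3.0, 4.0):
    print(x, F_lower(x), F_upper(x))
```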

110 Theories of uncertainty Rough sets Rough sets first described by Polish computer scientist Zdzislaw I. Pawlak [1991] strongly linked to the idea of a partition of the universe of hypotheses they provide a formal approximation of a traditional set in terms of a pair of lower and upper approximating sets let R ⊆ Θ × Θ be an equivalence relation which partitions Θ into a family of disjoint subsets Θ/R, called elementary sets measurable sets σ(Θ/R): the unions of one or more elementary sets, plus the empty set we can then approximate any subset A of Θ using those measurable sets X: apr(A) = ∪{ X ∈ σ(Θ/R) : X ⊆ A }, apr̄(A) = ∪{ X ∈ σ(Θ/R) : X ∩ A ≠ ∅ } Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 110 / 125

111 Theories of uncertainty Rough sets Rough sets and belief functions [figure: a universe partitioned into elementary sets, with an event A, its lower approximation and its upper approximation] any probability P on F = σ(Θ/R) can be extended to 2^Θ using inner measures: P_*(A) = sup{ P(X) : X ∈ σ(Θ/R), X ⊆ A } = P(apr(A)) these are belief functions! (as was recognised before) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 111 / 125
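A compact sketch (universe, partition, probabilities and the event are invented): the lower and upper approximations are unions of elementary sets, and the probability of the lower approximation behaves as a belief value, its dual as a plausibility.

```python
# universe partitioned into elementary sets by an equivalence relation R
elementary = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({6})]
prob = {frozenset({1, 2}): 0.3, frozenset({3, 4, 5}): 0.5, frozenset({6}): 0.2}

A = {2, 3, 4, 5}   # an arbitrary event, not measurable w.r.t. the partition

lower = frozenset().union(*(E for E in elementary if E <= A))   # lower approximation
upper = frozenset().union(*(E for E in elementary if E & A))    # upper approximation

bel_A = sum(prob[E] for E in elementary if E <= A)   # P(lower approx) = inner measure
pl_A = sum(prob[E] for E in elementary if E & A)     # P(upper approx) = outer measure

print(sorted(lower), sorted(upper), bel_A, pl_A)
```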

112 Belief functions on reals Outline 1 Uncertainty Second-order uncertainty Classical probability 2 Beyond probability Set-valued observations Propositional evidence Scarce data Representing ignorance Rare events Uncertain data 3 Belief theory A theory of evidence Belief functions Semantics Dempster s rule Multivariate analysis Misunderstandings 4 Reasoning with belief functions Statistical inference Combination Conditioning Belief vs Bayesian reasoning Generalised Bayes Theorem The total belief theorem Decision making 5 Theories of uncertainty Imprecise probability Monotone capacities Probability intervals Fuzzy and possibility theory Probability boxes Rough sets 6 Belief functions on reals Continuous belief functions Random sets 7 Conclusions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/ / 125

113 Belief functions on reals Continuous formulations of the theory of belief functions in the original formulation by Shafer [1976], belief functions are defined on finite sets only the need for generalising this to arbitrary domains was soon recognised main approaches to continuous formulation: Shafer's allocations of probability [1982] continuous belief functions on Borel intervals of the real line [Strat90, Smets] belief functions as random sets [Nguyen78, Molchanov06] other approaches, with limited (so far) impact: generalised evidence theory, MV algebras, several others Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 113 / 125

114 Belief functions on reals Continuous belief functions Continuous belief functions [Strat, Smets] take as frame of discernment Θ the set of possible closed intervals [x, y] contained in a domain [0, N] [figure: intervals represented as points (left extremum x, right extremum y); the shaded regions show the intervals contributing to Bel([a,b]) and to Pl([a,b])]
Bel([a, b]) = ∫_a^b ∫_x^b m(x, y) dy dx,   Pl([a, b]) = ∫_0^b ∫_{max(a,x)}^N m(x, y) dy dx
Dempster's rule generalises in terms of double integrals continuous pignistic PDF: Bet(a) := lim_{ε→0} ∫_0^a ∫_{a+ε}^N [m(x, y) / (y - x)] dy dx Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 114 / 125
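A rough numerical check of these integrals, assuming a uniform interval mass density m(x, y) = 2 on the triangle 0 ≤ x ≤ y ≤ 1 (density, interval [a, b] and grid resolution are all illustrative); for this density one expects Bel([a,b]) = (b-a)² and Pl([a,b]) = 2b - a² - b².

```python
N = 400                     # grid resolution; coarser is faster, finer is more accurate
h = 1.0 / N

def m(x, y):
    """Illustrative interval mass density: uniform on the triangle 0 <= x <= y <= 1."""
    return 2.0 if x <= y else 0.0

def bel(a, b):
    # Bel([a,b]) = integral over a <= x <= y <= b of m(x, y)
    return sum(m(i * h, j * h) * h * h
               for i in range(N) for j in range(N)
               if a <= i * h and j * h <= b)

def pl(a, b):
    # Pl([a,b]) = integral over x <= b, y >= max(a, x) of m(x, y)
    return sum(m(i * h, j * h) * h * h
               for i in range(N) for j in range(N)
               if i * h <= b and j * h >= max(a, i * h))

print(round(bel(0.2, 0.6), 2), round(pl(0.2, 0.6), 2))   # approximately 0.16 and 0.8
```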

115 Belief functions on reals Continuous belief functions Special cases of random closed intervals Fuzzy sets and p-boxes [figure: left, the consonant random interval Γ(·) induced by a fuzzy membership function; right, the random interval induced by a p-box (F_*, F^*); in both cases the focal elements are intervals [U(·), V(·)]] a fuzzy set on the real line induces a mapping to a collection of nested intervals, parameterised by the level c a p-box, i.e. upper and lower bounds to a cumulative distribution function, also induces a family of intervals (as we already saw) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 115 / 125

116 Belief functions on reals Random sets Belief functions as random sets [Nguyen, 1978], [Hestir, 1991], [Shafer, 1987] given a multi-valued mapping Γ, a straightforward step is to consider the probability value P(ω) as attached to the subset Γ(ω) ⊆ Θ: this is a random set in Θ, i.e., a probability measure on a collection of subsets the degree of belief Bel(A) of an event A becomes the cumulative distribution function (CDF) of the open interval of sets {B ⊆ A} in 2^Θ the lower inverse and upper inverse of Γ are: Γ_*(A) := { ω ∈ Ω : Γ(ω) ⊆ A, Γ(ω) ≠ ∅ }, Γ^*(A) := { ω ∈ Ω : Γ(ω) ∩ A ≠ ∅ } given two σ-fields A, B on Ω, Θ respectively, Γ is said to be strongly measurable iff, for all B ∈ B, Γ^*(B) ∈ A the lower probability measure on B is defined as P_*(B) := P(Γ_*(B)) for all B ∈ B: this is nothing but a belief function! Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 116 / 125

117 Belief functions on reals Random sets Belief functions as random sets Molchanov's work recently, strong renewed interest in a theory of random sets, thanks to Molchanov [2006, 2017] and others: a theory of calculus with capacities and random sets Radon-Nikodym theorems for capacities and random sets, and derivatives of capacities (conditional) expectations of random sets limit theorems: strong law of large numbers, central limit theorem, Gaussian RSs examined set-valued random processes a powerful mathematical framework! the way forward for the theory, in my view no mention of conditioning and combination yet connections with mathematical statistics to develop special case of a random element [Frechet], a random variable with structured output Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 117 / 125

118 Belief functions on reals Random sets Random closed sets the family of all sets is too large; we typically restrict ourselves to random elements in the space of closed subsets of a certain topological space E the family of closed subsets of E is denoted by C, and K denotes the family of all compact subsets of E let (Ω, F, P) be a probability space a map X : Ω → C is called a random closed set if, for every compact set K in E: {ω : X(ω) ∩ K ≠ ∅} ∈ F this is equivalent to strong measurability, whenever the σ-field on Θ is replaced by the family K of compact subsets of Θ the consequence is that the upper probability of K exists for all K ∈ K Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 118 / 125

119 Belief functions on reals Random sets Random closed sets Some examples if ξ is a random variable, then X = (-∞, ξ] is a random closed set if ξ_1, ξ_2 and ξ_3 are three random vectors in R^d, then the triangle with vertices ξ_1, ξ_2 and ξ_3 is a random closed set Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 119 / 125

120 Belief functions on reals Random sets Capacity functionals Random closed sets a functional T_X : K → [0, 1] given by T_X(K) = P({X ∩ K ≠ ∅}), K ∈ K, is said to be the capacity functional of X in particular, if X = {ξ} is a classical random variable, then T_X(K) = P({ξ ∈ K}) is the probability distribution of the random variable ξ the name capacity functional follows from the fact that T_X is a functional on K which takes values in [0, 1], equals 0 on the empty set, is monotone and upper semicontinuous (i.e., T_X is a capacity, and also completely alternating on K) T_X(K) is the plausibility measure induced by the multivalued mapping X, restricted to compact subsets the links between random closed sets and belief/plausibility functions, upper and lower probabilities, and contaminated models in statistics are very briefly hinted at in [Molchanov 2005], Chapter 1, Section 9 Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 120 / 125
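For the earlier example X = (-∞, ξ], the capacity functional of a compact interval K = [s, t] is T_X(K) = P(ξ ≥ s); the Monte-Carlo sketch below (standard normal ξ, sample size and interval chosen arbitrarily) checks the empirical hitting frequency against the exact value.

```python
import math
import random

random.seed(0)
n = 100_000
xi = [random.gauss(0.0, 1.0) for _ in range(n)]    # ξ ~ N(0, 1)

s, t = 0.5, 2.0                                    # compact set K = [s, t]
# X(ω) = (-inf, ξ(ω)] hits K iff ξ(ω) >= s
hits = sum(1 for x in xi if x >= s) / n

# exact value: T_X(K) = P(ξ >= s) = 1 - Φ(s)
exact = 0.5 * math.erfc(s / math.sqrt(2))
print(round(hits, 3), round(exact, 3))
```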

121 Conclusions Outline 1 Uncertainty Second-order uncertainty Classical probability 2 Beyond probability Set-valued observations Propositional evidence Scarce data Representing ignorance Rare events Uncertain data 3 Belief theory A theory of evidence Belief functions Semantics Dempster s rule Multivariate analysis Misunderstandings 4 Reasoning with belief functions Statistical inference Combination Conditioning Belief vs Bayesian reasoning Generalised Bayes Theorem The total belief theorem Decision making 5 Theories of uncertainty Imprecise probability Monotone capacities Probability intervals Fuzzy and possibility theory Probability boxes Rough sets 6 Belief functions on reals Continuous belief functions Random sets 7 Conclusions Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/ / 125

122 Conclusions A summary the theory of belief functions is grounded in the beautiful mathematics of random sets has strong relationships with other theories of uncertainty (can be efficiently implemented by Monte-Carlo approximation) statistical evidence may be represented in several ways: by likelihood-based belief functions, generalizing both likelihood-based and Bayesian inference; by Dempster's idea of using auxiliary variables; in the framework of the Generalised Bayes Theorem (propagation on graphical models can be performed) decision making strategies based on intervals of expected utilities can be formulated that are more cautious than traditional ones the extension to continuous domains can be tackled via the Borel interval representation, in the more general case using the theory of random sets (a toolbox of estimation, classification, regression tools based on the theory of belief functions is available) Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 122 / 125

123 Conclusions What still needs to be resolved clarify once and for all the epistemic interpretation of belief function theory: random variables for set-valued observations the mechanism for evidence combination is still debated, as it depends on meta-information on the sources which is hardly accessible working with intervals of belief functions may be the way forward: it acknowledges the meta-uncertainty on the nature of the sources generating the evidence the same holds for conditioning (as we showed) what about computational complexity? not an issue, just apply sampling for approximate inference we do not need to assign mass to all subsets, but we need to be allowed to do so when observations are indeed sets belief functions on reals: Borel intervals are nice, but the way forward is grounding the theory in the mathematics of random sets Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 123 / 125

124 Conclusions Future of random set/belief function theory a fully developed theory of statistical inference with random sets: generalised likelihood, logistic regression limit theorems, total probability for random sets random set random variables and processes frequentist inference with random sets propose solutions to high-impact problems: rare event prediction robust foundations for machine learning robust climate change predictions further development of machine learning tools: random set random forests generalised max entropy classification robust statistical learning theory Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 124 / 125

125 Appendix For Further Reading For Further Reading I G. Shafer. A mathematical theory of evidence. Princeton University Press, 1976. I. Molchanov. Theory of Random Sets. Springer, 2005. F. Cuzzolin. Visions of a generalized probability theory. Lambert Academic Publishing. F. Cuzzolin. The geometry of uncertainty: The geometry of imprecise probabilities. Springer-Verlag (in press). Professor Fabio Cuzzolin Belief functions: A gentle introduction Seoul, Korea, 30/05/18 125 / 125


More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV

More information

A NEW CLASS OF FUSION RULES BASED ON T-CONORM AND T-NORM FUZZY OPERATORS

A NEW CLASS OF FUSION RULES BASED ON T-CONORM AND T-NORM FUZZY OPERATORS A NEW CLASS OF FUSION RULES BASED ON T-CONORM AND T-NORM FUZZY OPERATORS Albena TCHAMOVA, Jean DEZERT and Florentin SMARANDACHE Abstract: In this paper a particular combination rule based on specified

More information

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January 2008 1 / 35 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections 2.1-2.4) AD ()

More information

A generic framework for resolving the conict in the combination of belief structures E. Lefevre PSI, Universite/INSA de Rouen Place Emile Blondel, BP

A generic framework for resolving the conict in the combination of belief structures E. Lefevre PSI, Universite/INSA de Rouen Place Emile Blondel, BP A generic framework for resolving the conict in the combination of belief structures E. Lefevre PSI, Universite/INSA de Rouen Place Emile Blondel, BP 08 76131 Mont-Saint-Aignan Cedex, France Eric.Lefevre@insa-rouen.fr

More information

Naive Bayes classification

Naive Bayes classification Naive Bayes classification Christos Dimitrakakis December 4, 2015 1 Introduction One of the most important methods in machine learning and statistics is that of Bayesian inference. This is the most fundamental

More information

Probability. Lecture Notes. Adolfo J. Rumbos

Probability. Lecture Notes. Adolfo J. Rumbos Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

2. A Basic Statistical Toolbox

2. A Basic Statistical Toolbox . A Basic Statistical Toolbo Statistics is a mathematical science pertaining to the collection, analysis, interpretation, and presentation of data. Wikipedia definition Mathematical statistics: concerned

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Review: Probability. BM1: Advanced Natural Language Processing. University of Potsdam. Tatjana Scheffler

Review: Probability. BM1: Advanced Natural Language Processing. University of Potsdam. Tatjana Scheffler Review: Probability BM1: Advanced Natural Language Processing University of Potsdam Tatjana Scheffler tatjana.scheffler@uni-potsdam.de October 21, 2016 Today probability random variables Bayes rule expectation

More information

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017 CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class

More information

E. Santovetti lesson 4 Maximum likelihood Interval estimation

E. Santovetti lesson 4 Maximum likelihood Interval estimation E. Santovetti lesson 4 Maximum likelihood Interval estimation 1 Extended Maximum Likelihood Sometimes the number of total events measurements of the experiment n is not fixed, but, for example, is a Poisson

More information

Introductory Econometrics. Review of statistics (Part II: Inference)

Introductory Econometrics. Review of statistics (Part II: Inference) Introductory Econometrics Review of statistics (Part II: Inference) Jun Ma School of Economics Renmin University of China October 1, 2018 1/16 Null and alternative hypotheses Usually, we have two competing

More information

Introduction to Bayesian Inference

Introduction to Bayesian Inference Introduction to Bayesian Inference p. 1/2 Introduction to Bayesian Inference September 15th, 2010 Reading: Hoff Chapter 1-2 Introduction to Bayesian Inference p. 2/2 Probability: Measurement of Uncertainty

More information

Deep Learning for Computer Vision

Deep Learning for Computer Vision Deep Learning for Computer Vision Lecture 3: Probability, Bayes Theorem, and Bayes Classification Peter Belhumeur Computer Science Columbia University Probability Should you play this game? Game: A fair

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten

More information

Where are we? Knowledge Engineering Semester 2, Reasoning under Uncertainty. Probabilistic Reasoning

Where are we? Knowledge Engineering Semester 2, Reasoning under Uncertainty. Probabilistic Reasoning Knowledge Engineering Semester 2, 2004-05 Michael Rovatsos mrovatso@inf.ed.ac.uk Lecture 8 Dealing with Uncertainty 8th ebruary 2005 Where are we? Last time... Model-based reasoning oday... pproaches to

More information

Outline. On Premise Evaluation On Conclusion Entailment. 1 Imperfection : Why and What. 2 Imperfection : How. 3 Conclusions

Outline. On Premise Evaluation On Conclusion Entailment. 1 Imperfection : Why and What. 2 Imperfection : How. 3 Conclusions Outline 1 Imperfection : Why and What 2 Imperfection : How On Premise Evaluation On Conclusion Entailment 3 Conclusions Outline 1 Imperfection : Why and What 2 Imperfection : How On Premise Evaluation

More information

Analyzing the Combination of Conflicting Belief Functions.

Analyzing the Combination of Conflicting Belief Functions. Analyzing the Combination of Conflicting Belief Functions. Philippe Smets IRIDIA Université Libre de Bruxelles 50 av. Roosevelt, CP 194-6, 1050 Bruxelles, Belgium psmets@ulb.ac.be http://iridia.ulb.ac.be/

More information

P (E) = P (A 1 )P (A 2 )... P (A n ).

P (E) = P (A 1 )P (A 2 )... P (A n ). Lecture 9: Conditional probability II: breaking complex events into smaller events, methods to solve probability problems, Bayes rule, law of total probability, Bayes theorem Discrete Structures II (Summer

More information

Review: Statistical Model

Review: Statistical Model Review: Statistical Model { f θ :θ Ω} A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced the data. The statistical model

More information

Combining Belief Functions Issued from Dependent Sources

Combining Belief Functions Issued from Dependent Sources Combining Belief Functions Issued from Dependent Sources MARCO E.G.V. CATTANEO ETH Zürich, Switzerland Abstract Dempster s rule for combining two belief functions assumes the independence of the sources

More information

Bayesian data analysis using JASP

Bayesian data analysis using JASP Bayesian data analysis using JASP Dani Navarro compcogscisydney.com/jasp-tute.html Part 1: Theory Philosophy of probability Introducing Bayes rule Bayesian reasoning A simple example Bayesian hypothesis

More information

MATH MW Elementary Probability Course Notes Part I: Models and Counting

MATH MW Elementary Probability Course Notes Part I: Models and Counting MATH 2030 3.00MW Elementary Probability Course Notes Part I: Models and Counting Tom Salisbury salt@yorku.ca York University Winter 2010 Introduction [Jan 5] Probability: the mathematics used for Statistics

More information