Belief functions: past, present and future


1 Belief functions: past, present and future. CSA 2016, Algiers. Fabio Cuzzolin, Department of Computing and Communication Technologies, Oxford Brookes University, Oxford, UK. Algiers, 13/12/2016.

2 IJCAI tutorial web site.

3 Outline
1. Beyond probability: Uncertainty; The cloaked die; The murder trial; Uncertain evidence
2. A theory of evidence: Multivalued mappings; Belief functions; Dempster's combination; Dempster's conditioning; Bayes generalised; Misunderstandings
3. Reasoning: Inference; Combination; Belief vs Bayesian reasoning; Partially reliable data; Making decisions
4. Applications: Recent trends; Climate change prediction; Pose estimation
5. New horizons: Upper and lower likelihood; Generalising logistic regression; A new machine learning
6. Summarising

4 Beyond probability / Uncertainty
Uncertainty. Uncertainty is widespread; however, there is a difference between predictable and unpredictable variation. Second-order uncertainty means being uncertain about our very model of uncertainty. This has consequences for human behaviour: people are averse to unpredictable variation (Ellsberg's paradox in decision making).

5 Beyond probability / Uncertainty
The problem(s) with Bayes. The most common mathematical representation of uncertainty, Bayesian reasoning, uses a very special measure of uncertainty: Kolmogorov's additive probability. It is pretty bad at representing ignorance: uninformative priors are just not adequate, and give different results on different parameter spaces. Applying Bayes' rule P(B|A) = P(A|B)P(B)/P(A) assumes the new evidence comes in the form of certainty ("A is true"); in the real world this is often not the case (uncertain evidence). Beware the prior! Why should we pick a prior? Either there is prior knowledge (beliefs) or there is not; "asymptotically, the choice of the prior does not matter" (really!).

6 Beyond probability / The cloaked die
The die as a random variable. A die is a simple example of a (discrete) random variable: a measurable mapping from a probability space to the real numbers. There is a probability space Ω = {face1, face2, ..., face6} whose elements map to the real numbers 1, 2, ..., 6 (no need for measurability here, since the space is finite).

7 Beyond probability / The cloaked die
The cloaked die: observations which are sets. Now imagine that face1 and face4 are cloaked, and we roll the die. The same probability space Ω = {face1, face2, ..., face6} is still there (nothing has changed in the way the die works); however, the mapping is now different: both face1 and face4 are mapped to the set of possible values {1, 4}, since we cannot observe the outcome. Mathematically, this is called a random set (a set-valued random variable).

8 Beyond probability / The cloaked die
Occluded dice: a more realistic scenario. A more realistic scenario is one in which we roll, say, four dice. For some of them the top face might be occluded, but some of the side faces will still be visible, providing information. For example, I see the top face of the Red, Green and Purple dice, but I cannot see the outcome of the Blue die; however, I can see two of its side faces, so the outcome of Blue is the set {2, 4, 5, 6}. The bottom line is: whenever data are missing, observations are inherently set-valued. Mathematically, we are not sampling a (scalar) random variable; we are sampling a set-valued random variable: a random set.

9 Beyond probability / The murder trial
A murder trial: evidence supporting propositions. Suppose there is a murder, and three people are on trial for it: Θ = {Peter, John, Mary}. There is a witness, who testifies that the person he saw was a man; this amounts to supporting the proposition A = {Peter, John} ⊂ Θ. Should we take this testimony at face value? In fact, the witness was tested, and the machine reported an 80% chance that he was drunk when he reported the crime. It is then natural to assign an 80% chance to proposition A, and a 20% chance to the whole of Θ. Can we do that with probabilities? No.

10 Beyond probability / The murder trial
Dealing with propositional evidence. When data are missing, or evidence comes in the form of a probability on a related space, data directly support propositions. Even when evidence (data) supports propositions, Kolmogorov's probability forces us to specify support for individual outcomes. This is unreasonable: an artificial constraint due to a mathematical model that is not general enough. We have no elements to assign the 80% probability to either Peter or John, nor to distribute it among them. The cause is the additivity of probability measures: but this is not the most general type of measure on sets.
Belief functions and propositional evidence: as random sets, belief functions allow us to assign mass directly to propositions.

11 Beyond probability / Uncertain evidence
Uncertain data. Concepts themselves can be ill defined, e.g. "a dark or somewhat round object" (qualitative data); fuzzy theory accounts for this via the concept of graded membership. Unreliable sensors can generate faulty (outlier) measurements: can we still treat these data as certain, or is it more natural to attach to them a degree of reliability, based on the past track record of the sensor? But then, can we still apply Bayes' rule? People (experts, e.g. doctors) tend to express themselves directly in terms of likelihoods (e.g. "I think diagnosis A is most likely, otherwise either A or B"). Multiple sensors can provide as output a PDF on the same space: e.g. two Kalman filters, one based on colour and the other on motion (optical flow), each providing a normal predictive PDF on the location of a target in the image plane.

12 Beyond probability / Uncertain evidence
Belief functions and uncertain evidence: conditioning versus combination. Belief functions deal with uncertain evidence by moving away from the concept of conditioning (via Bayes' rule) towards that of combining pieces of evidence supporting multiple (intersecting) propositions to various degrees.
Belief functions and evidence: belief reasoning works by combining existing belief functions with new ones, which are able to encode uncertain evidence. In addition, belief functions can represent fuzzy concepts as consonant (nested) belief functions, and unreliable measurements as discounted probabilities (by assigning mass to the entire hypothesis set).

13 A theory of evidence / Outline (section overview; same outline as slide 3).

14 A theory of evidence / Multivalued mappings
Dempster's original setting. Rationale: there exists evidence E in the form of probabilities, which supports degrees of belief on a certain hypothesis space. In the murder trial example, Ω is the space where the evidence E lives, in the form of a probability distribution P; Θ is the hypothesis space, the set of outcomes of the trial. Elements of Ω are mapped to subsets of Θ (e.g. Γ maps {not drunk} ⊂ Ω to {Peter, John} ⊂ Θ). The probability distribution P induces a mass assignment m : 2^Θ → [0, 1] via the multi-valued (one-to-many) mapping Γ : Ω → 2^Θ. The corresponding mass function is m({Peter, John}) = 0.8, m(Θ) = 0.2.

15 A theory of evidence / Belief functions
Belief and plausibility functions (Dempster's upper and lower probabilities).
Belief value: the probability that the evidence implies A: Bel(A) = P({ω ∈ Ω : Γ(ω) ⊆ A}) = Σ_{B⊆A} m(B).
Plausibility value: the probability that the evidence does not contradict A: Pl(A) = P({ω ∈ Ω : Γ(ω) ∩ A ≠ ∅}) = Σ_{B∩A≠∅} m(B) = 1 − Bel(Ā).
Belief and plausibility values can (but this is disputed) be interpreted as lower and upper bounds on the values of an unknown, underlying probability measure: Bel(A) ≤ P(A) ≤ Pl(A) for all A ⊆ Θ.
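As a minimal sketch (not part of the original slides), the two formulas above can be evaluated directly by summing masses over focal elements; the mass values below are those of the murder-trial example, with focal elements represented as Python frozensets:

```python
# Bel and Pl computed by summing masses over focal elements (a sketch),
# using the murder-trial mass assignment.

def bel(m, A):
    """Belief of A: total mass of focal elements contained in A."""
    return sum(v for B, v in m.items() if B <= A)

def pl(m, A):
    """Plausibility of A: total mass of focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)

theta = frozenset({"Peter", "John", "Mary"})
m = {frozenset({"Peter", "John"}): 0.8, theta: 0.2}   # witness evidence

A = frozenset({"Peter", "John"})
print(bel(m, A), pl(m, A))                                       # 0.8 1.0
print(bel(m, frozenset({"Mary"})), pl(m, frozenset({"Mary"})))   # 0.0 0.2
```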

16 A theory of evidence / Dempster's combination
Dempster's combination. A blond hair has been found, but there is a probability 0.6 that the room was cleaned before the crime. The outcomes compatible with both ω₁ ∈ Ω₁ and ω₂ ∈ Ω₂ are those θ ∈ Γ₁(ω₁) ∩ Γ₂(ω₂). If the sources of evidence are independent, the probability of (ω₁, ω₂) is P₁({ω₁}) · P₂({ω₂}); if Γ₁(ω₁) ∩ Γ₂(ω₂) = ∅, the pair (ω₁, ω₂) cannot be selected.
Dempster's rule: the combination of two mass functions m₁, m₂ is defined as
(m₁ ⊕ m₂)(A) = (1 / (1 − κ)) Σ_{B∩C=A} m₁(B) m₂(C), for ∅ ≠ A ⊆ Θ,
where κ = Σ_{B∩C=∅} m₁(B) m₂(C) is the conflict. Subsets with non-zero mass are called focal elements.
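A hedged sketch of the rule in code, using the same frozenset representation as above; the second mass function in the usage example is purely illustrative and not taken from the slides:

```python
# Dempster's rule for mass functions stored as {frozenset: mass} (a sketch).

def dempster(m1, m2):
    combined, conflict = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:
                combined[A] = combined.get(A, 0.0) + v1 * v2
            else:
                conflict += v1 * v2            # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

theta = frozenset({"Peter", "John", "Mary"})
m1 = {frozenset({"Peter", "John"}): 0.8, theta: 0.2}    # witness evidence
m2 = {frozenset({"Mary"}): 0.4, theta: 0.6}             # hypothetical second item
print(dempster(m1, m2))   # {Peter, John}: ~0.706, {Mary}: ~0.118, theta: ~0.176
```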

17 A theory of evidence / Dempster's combination
Dempster's rule: example (figure). Two mass functions are combined graphically; the resulting masses shown in the figure (0.48, 0.31 and 0.21) sum to one over the focal elements of the combination.

18 A theory of evidence / Dempster's conditioning
Dempster's conditioning. Dempster's rule of combination induces a conditioning operator: given a new event B, the logical (categorical) belief function with m(B) = 1 is combined with the a-priori belief function Bel using Dempster's rule; the resulting BF is the conditional belief function given B, Bel⊕(A|B). In terms of belief and plausibility values, Dempster's conditioning yields
Bel⊕(A|B) = [Bel(A ∪ B̄) − Bel(B̄)] / [1 − Bel(B̄)] = [Pl(B) − Pl(B\A)] / Pl(B),
Pl⊕(A|B) = Pl(A ∩ B) / Pl(B),
i.e. it is obtained from Bayes' rule by replacing probability with plausibility measures!
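Since Dempster's conditioning is just combination with a categorical ("logical") mass function, it can be sketched by reusing the dempster() function above; the conditioning event chosen below is illustrative:

```python
# Dempster's conditioning as combination with a categorical mass function
# assigning all mass to the conditioning event B (reuses dempster() above).

def condition(m, B):
    return dempster(m, {frozenset(B): 1.0})

theta = frozenset({"Peter", "John", "Mary"})
m = {frozenset({"Peter", "John"}): 0.8, theta: 0.2}
print(condition(m, {"John", "Mary"}))   # {John}: 0.8, {John, Mary}: 0.2
```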

19 A theory of evidence / Bayes generalised
A generalisation of Bayesian inference. Belief theory generalises Bayesian probability: classical probability measures are a special class of belief functions (in the finite case) or random sets (in the infinite case); Bayes' certain evidence is a special case of belief functions (the belief function m_A which assigns mass 1 to the single subset A); Bayes' rule of conditioning is a special case of Dempster's rule of combination. However, belief theory overcomes Bayes' limitations: you do not need a prior. If you are ignorant, you use the vacuous BF m_Θ which, when combined with new BFs m encoding the data, does not change the result: m_Θ ⊕ m = m. However, if you do have prior knowledge you are welcome to use it!

20 A theory of evidence / Misunderstandings
Belief functions are not (general) credal sets. A belief function on Θ is in 1-1 correspondence with a convex set of probability distributions there: a credal set. This is the set of probabilities which have belief and plausibility values as lower and upper bounds: Bel(A) ≤ P(A) ≤ Pl(A). However, belief functions are a special class of credal sets: those induced by a random-set mapping.

27 A theory of evidence / Misunderstandings
Belief functions are not second-order distributions. General Bayesian inference leads to probability distributions over the space of parameters: these are second-order probabilities, i.e. probability distributions on hypotheses which are themselves probabilities. Belief functions can be defined on the hypothesis space Ω, or on the parameter space Θ. When defined on Ω they are sets of PDFs, and can then be seen as indicator second-order distributions (see figure); when defined on the parameter space Θ, they amount to families of second-order distributions. In the two cases they generalise MLE/MAP and general Bayesian inference, respectively.

28 Reasoning / Outline (section overview; same outline as slide 3).

29 Reasoning / Inference
Reasoning with belief functions. Working with belief functions involves a number of natural steps. This section covers inference, combination and decision making; we are not going to cover conditioning, propagation, or the generalised theorems.

30 Reasoning / Inference
Dempster's approach to statistical inference (1). Consider a statistical model {f(x|θ), x ∈ X, θ ∈ Θ}, where X is the observation space and Θ the parameter space. Having observed x, how do we quantify the uncertainty about the parameter θ without specifying a prior probability distribution? Assume that the samples X = {x₁, ..., xₙ} are generated as a function X = a(θ, U) of an (unobserved) auxiliary variable U with known probability distribution, independent of θ. For instance, to generate a continuous random variable X with cumulative distribution function (CDF) F_θ, one might draw U from U([0, 1]) and set X = F_θ⁻¹(U).

31 Reasoning / Inference
Dempster's approach to statistical inference (2). The data-generation equation X = a(θ, U) defines a multi-valued mapping Γ : U ↦ Γ(U) = {(X, θ) ∈ X × Θ : X = a(θ, U)}. The probability space (U, B(U), µ) and the multi-valued mapping Γ induce a belief function Bel_{X×Θ} on X × Θ. Conditioning Bel_{X×Θ} on θ yields Bel_X(·|θ) ≡ f(·|θ) on X; conditioning it on X = x gives Bel_Θ(·|x) on Θ.

32 Reasoning / Inference
Likelihood-based inference. Compatible with the likelihood principle: Bel_Θ(·|x) should be based only on the likelihood function L(θ|x) = f(x|θ). It generates a consonant belief function, a BF whose focal elements are nested: A₁ ⊆ A₂ ⊆ A₃ ⊆ ... Bel_Θ(·|x) is the consonant BF whose plausibility of singleton elements equals the normalised likelihood: pl(θ|x) = L(θ|x) / sup_{θ'∈Θ} L(θ'|x). It takes the empirical normalised likelihood to be an upper bound on the probability density of the sought parameter (rather than the actual PDF). The corresponding plausibility function is Pl_Θ(A|x) = sup_{θ∈A} pl(θ|x).
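For a finite parameter space this construction is easy to sketch in code: the contour function is the normalised likelihood and the plausibility of a subset is its supremum over that subset. The likelihood values below are illustrative assumptions:

```python
# Likelihood-based inference on a finite parameter space (a sketch).

def contour(likelihood):
    """Normalised likelihood pl(theta | x) = L(theta | x) / max L."""
    top = max(likelihood.values())
    return {t: L / top for t, L in likelihood.items()}

def plausibility(pl_contour, A):
    """Pl(A | x) = sup of the contour function over A."""
    return max(pl_contour[t] for t in A)

L = {"theta1": 0.05, "theta2": 0.20, "theta3": 0.10}   # illustrative values
pl_c = contour(L)                                # theta1: 0.25, theta2: 1.0, theta3: 0.5
print(plausibility(pl_c, {"theta1", "theta3"}))  # 0.5
```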

33 Reasoning / Inference
Coin toss example. Consider a coin-toss experiment: we toss the coin n = 10 times, obtaining the sample X = {H, H, T, H, T, T, T, H, H, H}. The parameter of interest is the probability θ = p of heads in a single toss; the likelihood of the sample is binomial: P(X|p) = p^k (1−p)^(n−k), where k is the number of heads. Likelihood-based belief function inference determines an entire envelope of PDFs on the parameter space Θ = [0, 1]. We can apply the same criterion to the normalised empirical counts f̂(H) = 1, f̂(T) = 4/6 = 2/3, obtaining the mass assignment m({H}) = 1/3, m({T}) = 0, m(Ω) = 2/3. This robustifies the ML estimate, which is one PDF compatible with the inferred BF.
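The numbers on this slide can be checked with a few lines of code (a sketch; the consonant mass assignment below assumes, as in the slide, that pl(H) ≥ pl(T)):

```python
# Coin-toss check: normalised empirical counts as a contour function,
# then the corresponding consonant mass assignment on {H, T}.

sample = ["H", "H", "T", "H", "T", "T", "T", "H", "H", "H"]
counts = {v: sample.count(v) for v in ("H", "T")}                 # H: 6, T: 4
pl_c = {v: c / max(counts.values()) for v, c in counts.items()}   # H: 1.0, T: 2/3

m = {frozenset({"H"}): 1 - pl_c["T"],        # m({H})   = 1/3
     frozenset({"H", "T"}): pl_c["T"],       # m(Omega) = 2/3
     frozenset({"T"}): 0.0}                  # m({T})   = 0
print(pl_c, m)
```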

34 Reasoning / Combination
Dempster's rule under fire: Zadeh's paradox. The question is: is Dempster's sum the only possible rule of combination? It seems to have paradoxical behaviour in certain circumstances. Example: two doctors have opinions about the condition of a patient, Θ = {M, C, T}, where M stands for meningitis, C for concussion and T for tumour. The two doctors provide the following diagnoses:
D1: "I am 99% sure it's meningitis, but there is a small chance of 1% that it is concussion."
D2: "I am 99% sure it's tumour, but there is a 1% chance that it's concussion."
These can be encoded by the following mass functions:
m₁(A) = 0.99 for A = {M}, 0.01 for A = {C}, 0 otherwise;
m₂(A) = 0.99 for A = {T}, 0.01 for A = {C}, 0 otherwise. (1)

35 Reasoning / Combination
Dempster's rule under fire: Zadeh's paradox (continued). Their (unnormalised) Dempster combination is m(∅) = 0.9999, m({C}) = 0.0001. As the two masses are highly conflicting, normalisation yields the belief function focused on C: "it is definitely concussion", although both experts had left it as only a fringe possibility! Objections: the belief functions in the example are really probabilities, so this is a problem with Bayesian representations, if anything; diseases are never exclusive, so it may be argued that Zadeh's choice of a frame of discernment is misleading (open-world approaches drop the normalisation); the doctors disagree so much that any person would conclude that one of them is just wrong, so the reliability of sources needs to be accounted for.
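Zadeh's example can be reproduced with the dempster() sketch given earlier; virtually all the mass ends up on the empty set before normalisation:

```python
# Zadeh's paradox run through the dempster() sketch above.

m1 = {frozenset({"M"}): 0.99, frozenset({"C"}): 0.01}
m2 = {frozenset({"T"}): 0.99, frozenset({"C"}): 0.01}
print(dempster(m1, m2))   # {C}: 1.0, with conflict kappa = 0.9999
```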

36 Reasoning / Combination
Proposed combination rules. A number of alternative combination mechanisms have been proposed:
- Yager's rule: the conflict mass is assigned to the whole frame Θ
- Dubois' rule: the conflict mass of each pair B ∩ C = ∅ is assigned to B ∪ C
- conjunctive rule: Dempster's rule without normalisation
- disjunctive rule: dual of the conjunctive (and of Dempster's)
- Denoeux's cautious rule: minimum weight after canonical decomposition
- bold rule: dual of the cautious rule
- Murphy's averaging idea; Deng's distance-weighted averaging; Lefevre's weighting factors.
We will see only the first four. My position: working with intervals of belief functions.

37 Reasoning / Combination
Yager's and Dubois' rules. Conflict is generated by non-reliable information sources; the conflicting mass m(∅) = Σ_{B∩C=∅} m₁(B)m₂(C) should be re-assigned to the whole frame Θ. Let m∩(A) = Σ_{B∩C=A} m₁(B)m₂(C); then
m_Y(A) = m∩(A) for A ⊊ Θ, and m_Y(Θ) = m∩(Θ) + m(∅). (2)
Dubois and Prade's idea is similar to Yager's, BUT the conflicting mass is not transferred all the way up to Θ; it goes to B ∪ C instead (by the minimum specificity principle):
m_D(A) = m∩(A) + Σ_{B∪C=A, B∩C=∅} m₁(B)m₂(C). (3)
The resulting BF dominates Yager's combination: m_D(A) ≥ m_Y(A) for every A ⊊ Θ.

38 Reasoning / Combination
Conjunctive and disjunctive rules. Rather than normalising (as in Dempster's rule) or re-assigning the conflicting mass m(∅) to other non-empty subsets (as in Yager's and Dubois' rules), Smets' conjunctive rule leaves the conflicting mass with the empty set:
m∩(A) = Σ_{B∩C=A} m₁(B)m₂(C), for A ⊆ Θ.
It is applicable to unnormalised belief functions, under an open-world assumption: the current frame only approximately describes the set of possible hypotheses. The disjunctive rule of combination is
m∪(A) = Σ_{B∪C=A} m₁(B)m₂(C):
consensus between two sources is expressed by the union of the supported propositions, rather than by their intersection. Note that (Bel₁ ∪ Bel₂)(A) = Bel₁(A) · Bel₂(A): belief values are simply multiplied!
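Both rules are easy to sketch with the same {frozenset: mass} representation used above (the conjunctive rule simply keeps the mass of the empty intersection); the usage example reuses Zadeh's mass functions:

```python
# Conjunctive (unnormalised Dempster) and disjunctive rules (sketches).

def conjunctive(m1, m2):
    out = {}
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C                       # may be the empty frozenset
            out[A] = out.get(A, 0.0) + v1 * v2
    return out                              # mass on frozenset() is the conflict

def disjunctive(m1, m2):
    out = {}
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B | C
            out[A] = out.get(A, 0.0) + v1 * v2
    return out

m1 = {frozenset({"M"}): 0.99, frozenset({"C"}): 0.01}
m2 = {frozenset({"T"}): 0.99, frozenset({"C"}): 0.01}
print(conjunctive(m1, m2))   # empty set: 0.9999, {C}: 0.0001
print(disjunctive(m1, m2))   # {M,T}: 0.9801, {M,C}: 0.0099, {C,T}: 0.0099, {C}: 0.0001
```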

39 Reasoning / Combination
Combination: moving forward. Yager's rule is rather unjustified; Dubois' is somewhat intermediate between the conjunctive and disjunctive rules; the cautious and bold rules are rather inspired by possibility theory's min rule. My take on this: Dempster's (conjunctive) combination and disjunctive combination are the two extrema of a spectrum of possible results.
Proposal: combination "tubes"? Meta-uncertainty on the sources generating the input belief functions (their independence and reliability) induces uncertainty on the result of the combination, represented by a bracket of combination rules, which produce a tube of BFs. We will encounter this idea again when generalising the concept of likelihood. We should probably work with intervals of belief functions, then.

40 Reasoning / Belief vs Bayesian reasoning
Belief vs Bayesian reasoning: image data fusion for object classification. Suppose we want to estimate the class of an object appearing in an image, based on feature measurements extracted from the image. We capture a training set of images, complete with annotated object labels. Assuming a PDF of a certain family (e.g. a mixture of Gaussians), we can learn from the training data a likelihood function p(x|y), where y is the object class and x the image feature vector. Suppose n different sensors extract n features x_i from each image: x₁, ..., xₙ. Let us compare how data fusion works under the Bayesian and the belief function paradigms!

41 Reasoning / Belief vs Bayesian reasoning
Bayesian data fusion. The likelihoods of the individual features are computed using the n likelihood functions learned during training: p(x_i|y), for all i = 1, ..., n. Measurements are typically assumed to be conditionally independent, yielding the product likelihood p(x|y) = Π_i p(x_i|y). Bayesian inference is then applied, typically assuming uniform priors (for there is no reason to think otherwise), yielding p(y|x) ∝ p(x|y) = Π_i p(x_i|y).

42 Reasoning / Belief vs Bayesian reasoning
Dempster-Shafer data fusion. For each feature type i a BF is learned from the individual likelihood p(x_i|y) (e.g. via the likelihood-based approach). This yields n belief functions Bel(·|x_i) on the range of possible object classes Y. A combination rule (e.g. Dempster's, conjunctive or disjunctive) is applied to compute an overall BF:
Bel(Y|x) = Bel(Y|x₁) ⊗ ... ⊗ Bel(Y|xₙ), for Y ⊆ Y.
(An empirical comparison of this kind is shown under pose estimation later.)

43 Reasoning / Partially reliable data
Inference under partially reliable data. In the fusion example we have assumed that the data are measured correctly: what if the data-generating process is not completely reliable? Problem: suppose we just want to detect an object (binary decision: yes Y or no N). Two sensors produce image features x₁ and x₂, but we learned from the training data that both are reliable only 80% of the time. At test time we get an image, measure x₁ and x₂, and unluckily sensor 2 got it wrong (the object is actually there). We get the following normalised likelihoods: p(x₁|Y) = 0.9, p(x₁|N) = 0.1; p(x₂|Y) = 0.1, p(x₂|N) = 0.9.

44 Reasoning / Partially reliable data
How do the two fusion pipelines cope with this? The Bayesian scholar assumes the two sensors/processes are conditionally independent and multiplies the likelihoods, obtaining p(x₁, x₂|Y) = 0.9 · 0.1 = 0.09 and p(x₁, x₂|N) = 0.1 · 0.9 = 0.09, so that p(Y|x₁, x₂) = 1/2, p(N|x₁, x₂) = 1/2. Shafer's faithful follower discounts the likelihoods by assigning mass 0.2 to the whole hypothesis space Θ = {Y, N}:
m(Y|x₁) = 0.9 · 0.8 = 0.72, m(N|x₁) = 0.1 · 0.8 = 0.08, m(Θ|x₁) = 0.2;
m(Y|x₂) = 0.1 · 0.8 = 0.08, m(N|x₂) = 0.9 · 0.8 = 0.72, m(Θ|x₂) = 0.2.

45 Reasoning / Partially reliable data
Thus, when we combine them by Dempster's rule we get the BF Bel on {Y, N}: m(Y|x₁, x₂) = 0.458, m(N|x₁, x₂) = 0.458, m(Θ|x₁, x₂) = 0.084. When combined using the disjunctive rule (the least committal one) we get Bel∪: m∪(Y|x₁, x₂) = 0.09, m∪(N|x₁, x₂) = 0.09, m∪(Θ|x₁, x₂) = 0.82. The corresponding (credal) sets of probabilities are shown in the figure. The credal interval for Bel is quite narrow, even though reliability is assumed to be 80% and we got a faulty measurement in one of two (50%)! The disjunctive rule is much more cautious about the correct inference.
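A sketch checking these numbers, reusing the dempster() and disjunctive() functions above; note that the disjunctive values quoted on the slide coincide with what one obtains by combining the undiscounted likelihood BFs disjunctively, which is what is reproduced here:

```python
# Detection example: discounted likelihood BFs combined by Dempster's rule,
# and undiscounted likelihood BFs combined disjunctively.

Y, N = frozenset({"Y"}), frozenset({"N"})
Theta = Y | N

def discount(m, alpha):
    """Transfer a fraction alpha of each mass to the whole frame."""
    out = {A: (1 - alpha) * v for A, v in m.items()}
    out[Theta] = out.get(Theta, 0.0) + alpha
    return out

m1 = discount({Y: 0.9, N: 0.1}, 0.2)    # Y: 0.72, N: 0.08, Theta: 0.2
m2 = discount({Y: 0.1, N: 0.9}, 0.2)    # Y: 0.08, N: 0.72, Theta: 0.2
print(dempster(m1, m2))                  # Y: ~0.458, N: ~0.458, Theta: ~0.084
print(disjunctive({Y: 0.9, N: 0.1}, {Y: 0.1, N: 0.9}))   # Y: 0.09, N: 0.09, Theta: 0.82
```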

46 Reasoning / Making decisions
Decision making with belief functions: a natural application of the belief function representation of uncertainty. Problem: selecting an act f from an available list F (making a "decision") which optimises a certain objective function. Various approaches to decision making exist: decision making in the TBM is based on expected utility via the pignistic transform; Strat has proposed something similar in his "cloaked carnival wheel" scenario; generalised expected utility [Gilboa] is based on classical expected utility theory [Savage, von Neumann]; there is also a lot of interest in multicriteria decision making (based on a number of attributes).

47 Reasoning / Making decisions
Decision making in the TBM. Savage (1954) showed that a preference relation ⪰ verifies some rationality requirements iff there exists a probability measure P on Ω and a utility function u : X → R such that, for all f, g ∈ F, f ⪰ g ⇔ E_P(u ∘ f) ≥ E_P(u ∘ g); the best choice is the one that maximises the expected utility. Does that mean that using belief functions is irrational? In Smets' Transferable Belief Model (TBM), decisions are made by maximising the expected utility of actions, E[u] = Σ_{θ∈Θ} u(f, θ) BetP(θ), based on the pignistic transform
BetP(θ) = Σ_{A⊇{θ}} m(A)/|A|,
the centre of mass of the credal set of probabilities consistent with m.
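A sketch of the pignistic transform and of the resulting expected utility; the mass function and the utilities below are illustrative only:

```python
# Pignistic transform BetP and the expected utility it induces (a sketch).

def pignistic(m):
    betp = {}
    for A, v in m.items():
        for theta in A:
            betp[theta] = betp.get(theta, 0.0) + v / len(A)
    return betp

def expected_utility(u, betp):
    """u: dict theta -> u(f, theta) for a fixed act f."""
    return sum(u[t] * p for t, p in betp.items())

m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, frozenset({"a", "b", "c"}): 0.2}
betp = pignistic(m)                                              # a: ~0.72, b: ~0.22, c: ~0.07
print(expected_utility({"a": 1.0, "b": 0.0, "c": -1.0}, betp))   # ~0.65
```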

48 Reasoning / Making decisions
Lower and upper expected utilities. A preference relation ⪰ meets Gilboa's weaker axioms iff there exists a (not necessarily additive) measure µ and a utility function u : X → R such that, for all f, g ∈ F, f ⪰ g ⇔ C_µ(u ∘ f) ≥ C_µ(u ∘ g), where C_µ is the Choquet integral, defined for X : Ω → R as
C_µ(X) = ∫_0^∞ µ(X ≥ t) dt + ∫_{-∞}^0 [µ(X ≥ t) − 1] dt.
Given a belief function Bel on Ω and a utility function u, this theorem supports making decisions based on the Choquet integral of u with respect to Bel. For finite Ω, it can be shown that
C_Bel(u ∘ f) = Σ_{B⊆Ω} m(B) min_{ω∈B} u(f(ω)), C_Pl(u ∘ f) = Σ_{B⊆Ω} m(B) max_{ω∈B} u(f(ω)).
Let P(Bel), as usual, be the set of probability measures P compatible with Bel, i.e. such that Bel ≤ P. Then it can be shown that
C_Bel(u ∘ f) = min_{P∈P(Bel)} E_P(u ∘ f) = E̲(u ∘ f), C_Pl(u ∘ f) = max_{P∈P(Bel)} E_P(u ∘ f) = Ē(u ∘ f).
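For a finite frame the two Choquet integrals above reduce to mass-weighted minima and maxima, which is trivial to sketch (same illustrative mass and utility as before):

```python
# Lower and upper expected utilities for a finite frame (a sketch).

def lower_upper_expectation(m, u):
    low = sum(v * min(u[t] for t in A) for A, v in m.items())
    up = sum(v * max(u[t] for t in A) for A, v in m.items())
    return low, up

m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, frozenset({"a", "b", "c"}): 0.2}
u_f = {"a": 1.0, "b": 0.0, "c": -1.0}    # utility of act f in each state (illustrative)
print(lower_upper_expectation(m, u_f))   # (0.3, 1.0): the interval [E_low, E_up]
```

The resulting intervals of expected utility are then compared using the criteria listed on the next slide.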

49 Reasoning / Making decisions
Decision making using intervals of expected utilities. For each act f we have two expected utilities, E̲(f) and Ē(f): how do we make a decision? We need to compare intervals. Possible decision criteria based on interval dominance:
1. f ⪰ g iff E̲(u ∘ f) ≥ Ē(u ∘ g) (conservative strategy);
2. f ⪰ g iff E̲(u ∘ f) ≥ E̲(u ∘ g) (pessimistic strategy);
3. f ⪰ g iff Ē(u ∘ f) ≥ Ē(u ∘ g) (optimistic strategy);
4. f ⪰ g iff α E̲(u ∘ f) + (1 − α) Ē(u ∘ f) ≥ α E̲(u ∘ g) + (1 − α) Ē(u ∘ g) for some α ∈ [0, 1] called a pessimism index (Hurwicz criterion).
It can be shown that the observed behaviour in Ellsberg's paradox is explained by the pessimistic strategy.

50 Applications / Outline (section overview; same outline as slide 3).

51 Applications / Recent trends
A new wave of applications. Sensor fusion has always been a stronghold of belief calculus, mainly about merging different sensors using Dempster's rule; typical applications: tracking and data association, reliability in engineering, image processing, robotics, medical imaging and diagnosis, business and finance (audit). A new wave of applications is emerging, on geographical information systems (GIS), communication networks and security, earth sciences, and more. Here we present one (or two!) in more detail: climate change prediction and motion capture in computer vision.

52 Applications / Recent trends
Most popular applications of belief functions:
- information quality in financial accounting [A conceptual framework and belief-function approach to assessing overall information quality (158)]
- auditing [The Bayesian and belief-function formalisms: A general perspective for auditing (148)]
- reputation and trust management in telecoms [An evidential model of distributed reputation management (615)]
- security [An information systems security risk assessment model under the DS theory of belief functions (137)]
- DoS detection [Towards multisensor data fusion for DoS detection (137)]

53 Applications / Recent trends
Most popular applications of belief functions (continued):
- robotics and navigation [An evidential approach to map-building for autonomous vehicles (229)], [Dempster-Shafer theory for sensor fusion in autonomous mobile robots (192)]
- tracking and data association [Shafer-Dempster reasoning with applications to multisensor target identification systems (317)]
- image processing and computer vision [Image annotations by combining multiple evidence & Wordnet (231)], [Evidence-based recognition of 3-D objects (176)]
- biometrics [Image quality assessment for iris biometric (160)]

54 Applications / Climate change prediction
Climate change: adaptation of flood defence structures. Climate change is expected to have an enormous economic impact: damage or destruction from extreme events, coastal flooding and inundation from sea level rise, etc. Adaptation of infrastructure to climate change is a major issue. Engineering design processes and standards are based on the analysis of historical climate data (using, e.g., Extreme Value Theory), under the assumption of a stable climate. Commonly, flood defences in coastal areas are designed to withstand at least 100-year return period events; however, due to climate change, they will be subject during their lifetime to higher loads than the design estimations. The main impact is related to the increase of the mean sea level, which affects the frequency and intensity of surges. For adaptation purposes, statistics of extreme sea levels derived from historical data should be combined with projections of the future sea level rise (SLR).

55 Applications / Climate change prediction
Assumptions and approach. The annual maximum sea level Z at a given location is often assumed to have a Gumbel distribution,
P(Z ≤ z) = exp[−exp(−(z − µ)/σ)],
with mode µ and scale parameter σ. Procedures are based on the return level z_T associated with a return period T, defined as
z_T = µ − σ log[−log(1 − 1/T)].
Because of climate change, it is assumed that the distribution of the annual maximum sea level at the end of the century will be shifted to the right, with shift equal to the SLR: z'_T = z_T + SLR. Approach:
1. represent the evidence on z_T by a likelihood-based belief function using past sea level measurements;
2. represent the evidence on SLR by a belief function describing expert opinions;
3. combine these two items of evidence to get a belief function on z'_T = z_T + SLR.
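A sketch of the return-level formula above, for illustrative (assumed) Gumbel parameters and a 100-year return period:

```python
# Gumbel return level z_T = mu - sigma * log(-log(1 - 1/T)) (a sketch).

import math

def return_level(mu, sigma, T):
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / T))

z_100 = return_level(mu=4.0, sigma=0.2, T=100)   # illustrative parameters
print(z_100)                                      # ~4.92, in the units of the data
print(z_100 + 0.5)                                # shifted by an assumed SLR of 0.5 m
```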

56 Applications / Climate change prediction
Expert evidence on sea level rise. Future SLR projections provided by the last IPCC Assessment Report (2007) give [0.18 m, 0.79 m] as a likely range of values for SLR over the period considered; however, it is indicated that higher values cannot be excluded. Based on a simple statistical model, Rahmstorf (2007) suggests [0.5 m, 1.4 m]. Recent studies indicate that the threshold of 2 m cannot be exceeded by the end of this century, due to physical constraints. The interval [0.5, 0.79] = [0.18, 0.79] ∩ [0.5, 1.4] seems to be fully supported, as it is considered highly plausible by all three sources, while values outside the interval [0, 2] are considered impossible. How do we encode this evidence using belief functions?

57 Applications / Climate change prediction
Combination of expert and statistical evidence. Expert evidence: consonant random intervals with core [0.5, 0.79], support [0, 2] and different plausibility ("contour") functions. (Figure: contour functions π(SLR) and cumulative Bel and Pl of SLR.) Let [U_zT, V_zT] and [U_SLR, V_SLR] be the independent random intervals representing the evidence on z_T and SLR, respectively. The random interval for z'_T = z_T + SLR is
[U_zT, V_zT] + [U_SLR, V_SLR] = [U_zT + U_SLR, V_zT + V_SLR].

58 Applications / Climate change prediction
Some results of combining expert and historical belief functions. The corresponding belief and plausibility functions are, for all A ∈ B(R):
Bel(A) = P([U_zT + U_SLR, V_zT + V_SLR] ⊆ A), Pl(A) = P([U_zT + U_SLR, V_zT + V_SLR] ∩ A ≠ ∅).
Bel(A) and Pl(A) can be estimated by Monte Carlo simulation. (Figure: contour functions pl(z'_T) and cumulative curves Bel(z'_T ≤ z), Pl(z'_T ≤ z) for the linear, convex, concave and constant expert contour functions.)
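A Monte Carlo sketch of this estimation for the event {z'_T ≤ z}; the two interval samplers below are illustrative stand-ins (a rough likelihood-based interval for z_T, and a consonant interval with the core and support quoted above and a linear contour), not the models used in the study:

```python
# Monte Carlo estimation of Bel and Pl of {z'_T <= z}, with z'_T = z_T + SLR,
# from two independent random closed intervals (illustrative samplers only).

import random

def sample_interval_zT():
    c = random.gauss(4.9, 0.1)                 # assumed centre of the z_T interval
    w = random.uniform(0.0, 0.2)               # assumed half-width
    return (c - w, c + w)

def sample_interval_SLR():
    a = random.uniform(0.0, 1.0)                       # random cut level
    return (0.5 * a, 0.79 + (2.0 - 0.79) * (1.0 - a))  # core [0.5, 0.79] at a=1, support [0, 2] at a=0

def bel_pl_leq(z, n=100000):
    bel = pl = 0
    for _ in range(n):
        u1, v1 = sample_interval_zT()
        u2, v2 = sample_interval_SLR()
        lo, hi = u1 + u2, v1 + v2      # [U_zT + U_SLR, V_zT + V_SLR]
        bel += hi <= z                 # interval entirely inside (-inf, z]
        pl += lo <= z                  # interval intersects (-inf, z]
    return bel / n, pl / n

print(bel_pl_leq(6.0))
```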

59 Applications / Pose estimation
Belief Modelling Regression for example-based pose estimation. The available evidence comes in the form of a training set of images containing sample poses of an unspecified object; a configuration is a vector q ∈ Q ⊂ R^D. An "oracle" provides for each training image I_k the configuration q_k of the object portrayed in the image (source of ground truth: a motion capture system); the object location within each training image is known as a bounding box. In training, the object explores its range of possible configurations, and both sample poses Q̃ = {q_k, k = 1, ..., T} and N features Ỹ = {y_i(k), k = 1, ..., T, i = 1, ..., N} are collected. In testing, a supervised localisation algorithm is employed to locate the object within the test image, and such features are exploited to produce an estimate of the object's configuration.

60 Applications / Pose estimation
Learning evidential models. We learn from the training data an approximation ρ of the unknown mapping between each feature space Y_i and the pose space Q. We apply EM clustering to the N training sequences of feature values, obtaining a Mixture of Gaussians (MoG) {Γ_i^j, j = 1, ..., n_i}, with Γ_i^j ~ N(µ_i^j, Σ_i^j), and an approximate feature-pose map ρ_i : Y_i^j ↦ Q_i^j = {q_k ∈ Q̃ : y_i(k) ∈ Y_i^j}.

61 Applications / Pose estimation
Computing belief pose estimates. New visual features y_1, ..., y_N are mapped to a collection of belief functions Bel_1, ..., Bel_N on the set of sample poses Q̃. Belief functions also allow us to take into account the scarcity of the training samples, by assigning some mass m_i(Θ_i) to the whole feature space:
m_i : 2^{Θ_i} → [0, 1], m_i(Y_i^j) = Γ_i^j(y_i) (1 − m_i(Θ_i)) / Σ_k Γ_i^k(y_i).
They are combined by conjunctive combination; this yields the belief estimate of the pose on the set of sample poses Q̃. Then:
1. we can compute the expected pose associated with each vertex p of the credal set, q̂ = Σ_{k=1}^{T} p(q_k) q_k;
2. or, we can approximate the belief estimate b̂ with a probability p̂ on Q̃ (e.g. the pignistic function).

62 Applications / Pose estimation
Two human pose estimation experiments. A person is filmed by two uncalibrated DV cameras. Arm experiment: the subject moves his arm while standing in a fixed floor location. Legs experiment: the person walks normally on the floor; the training set was collected by sampling a random walk on a section of the floor. Length of the training sequences: 1726 frames for the arm and 1952 for the legs. Pose vector: 3D coordinates of the markers. Quite a challenging setup: the background was highly non-static, with people coming in and out of the scene and flickering monitors; self-occlusions.

63 Applications / Pose estimation
Comparison with RVMs and GPR. (Figure: BMR results for components of the pose vector, component 9 on top, components 1 and 6 at the bottom; blue = ground truth, red = pignistic estimate.) Average Euclidean errors for the Relevance Vector Machine (RVM): 25.0, 10.6, 18.6, 7.0 cm; for Gaussian Process Regression (GPR): 31.2, 13.6, 23.0 and 4.5 cm. Our belief-theoretical approach outperforms both competitors!

64 New horizons / Outline (section overview; same outline as slide 3).

65 New horizons
A research programme. We made the case that non-additive probabilities arise from real issues with the way standard probability models the data (or their absence), and showed that random sets are the most natural representation of uncertainty; they are also a straightforward generalisation of mathematical statistics. How should the theory develop? Some modest proposals:
- generalised logistic regression for dealing with rare events
- parameterised families of random sets, which would allow frequentist hypothesis testing and MAP-like estimation; in particular, Gaussian random sets, and how the central limit theorem generalises to random sets
- generalising the total probability theorem and the concept of random variable.
Where can its full impact be felt? New, robust foundations for machine learning; a novel understanding of quantum mechanics; robust models of climate change; a geometry of uncertainty as a general framework for uncertainty theory.

66 New horizons / Upper and lower likelihood
Belief likelihood function: generalising the sample likelihood. The traditional likelihood function is a conditional probability of the data given a parameter θ ∈ Θ, i.e. a family of PDFs over X parameterised by θ. A different take: instead of using the conventional likelihood to build a belief function, can we define a belief likelihood function of a sample x ∈ X? It is natural to define a belief (set-)likelihood function as a family of belief functions on X, Bel_X(·|θ), parameterised by θ ∈ Θ; this is the input of Smets' Generalised Bayesian Theorem, a collection of conditional belief functions. Note that a belief likelihood takes values on sets of outcomes, of which individual outcomes are a special case: this seems a natural setting for computing likelihoods of set-valued observations, coherent with the random set philosophy.

67 New horizons / Upper and lower likelihood
Belief likelihood function: series of trials. What can we say about the belief likelihood function of a series of trials? Observations are a tuple x = (x₁, ..., xₙ) ∈ X₁ × ... × Xₙ, where X_i = X denotes the space of quantities observed at time i. By definition the belief likelihood function is Bel_{X₁×...×Xₙ}(A|θ), where A is any subset of X₁ × ... × Xₙ.
Belief likelihood function of repeated trials:
Bel_{X₁×...×Xₙ}(A|θ) ≐ [Bel_{X₁↑X₁×...×Xₙ} ⊗ ... ⊗ Bel_{Xₙ↑X₁×...×Xₙ}](A|θ),
where Bel_{Xⱼ↑X₁×...×Xₙ} is the vacuous extension of Bel_{Xⱼ} to the Cartesian product X₁ × ... × Xₙ where the observed tuples live, and ⊗ is a combination rule.

68 New horizons / Upper and lower likelihood
Belief likelihood function: series of trials, individual tuples. Can we reduce this to the belief values of the individual trials? Yes, if we wish to compute likelihood values of tuples of individual outcomes rather than sets of them.
Decomposition for individual tuples: when using either Dempster's rule ⊕ or the conjunctive rule ∩ as the combination rule in the definition of the belief likelihood function, the following holds:
L̲(x = (x₁, ..., xₙ)) ≐ Bel_{X₁×...×Xₙ}({(x₁, ..., xₙ)}|θ) = Π_{i=1}^{n} Bel_{Xᵢ}({xᵢ}),
L̄(x = (x₁, ..., xₙ)) ≐ Pl_{X₁×...×Xₙ}({(x₁, ..., xₙ)}|θ) = Π_{i=1}^{n} Pl_{Xᵢ}({xᵢ}).
We can call these the lower and upper likelihoods of the sample x = (x₁, ..., xₙ). The second line amounts to conditional conjunctive independence (but only for individual samples x). This is a new, yet unpublished result; similar regularities hold when using the more cautious disjunctive combination. Open question: does this hold for arbitrary subsets of samples A ⊆ X₁ × ... × Xₙ?

69 New horizons / Upper and lower likelihood
Lower and upper likelihoods: Bernoulli trials. Let us go back to the Bernoulli trials example: X_i = X = {H, T}. Under conditional independence and equidistribution, the traditional likelihood for a series of Bernoulli trials reads as p^k (1−p)^(n−k), where k is the number of successes and n the number of trials. Let us compute the belief likelihood function for Bernoulli trials! We seek the belief function on X = {H, T}, parameterised by p = m({H}), q = m({T}) (with p + q ≤ 1 this time), which best describes the observed sample. If we apply the previous result, since all the Bel_i are equally distributed, the lower and upper likelihoods of the sample x = (x₁, ..., xₙ) are:
L̲(x) = Bel_X({x₁}) · ... · Bel_X({xₙ}) = p^k q^(n−k),
L̄(x) = Pl_X({x₁}) · ... · Pl_X({xₙ}) = (1−q)^k (1−p)^(n−k).
After normalisation, these are PDFs over the space B of all belief functions definable on X!
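A sketch evaluating these lower and upper likelihoods on a grid over the belief functions on {H, T} (i.e. the pairs (p, q) with p + q ≤ 1), using the coin-toss counts from the earlier example; it confirms the behaviour described on the next slide:

```python
# Lower and upper likelihoods of a Bernoulli sample, maximised over a grid
# of belief functions on {H, T} (k = 6 heads out of n = 10 trials).

def lower_likelihood(p, q, k, n):
    return p ** k * q ** (n - k)

def upper_likelihood(p, q, k, n):
    return (1 - q) ** k * (1 - p) ** (n - k)

k, n = 6, 10
grid = [(p / 100, q / 100) for p in range(101) for q in range(101 - p)]  # p + q <= 1
best_low = max(grid, key=lambda pq: lower_likelihood(*pq, k, n))
best_up = max(grid, key=lambda pq: upper_likelihood(*pq, k, n))
print(best_low)   # (0.6, 0.4): on the simplex p + q = 1, at the ML estimate p = k/n
print(best_up)    # (0.0, 0.0): the vacuous belief function
```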

70 New horizons / Upper and lower likelihood
Lower and upper likelihoods (Bernoulli trials). The lower likelihood (left in the figure) reduces to the traditional likelihood p^k (1−p)^(n−k) for p + q = 1, and its maximum is the traditional ML estimate. This makes sense: the lower likelihood is highest for the most committed belief functions (i.e. probabilities). The upper likelihood (right) has its maximum at p = q = 0 (the vacuous BF on {H, T}). The interval of BFs joining the maximum of L̲ with the maximum of L̄ is the set of belief functions such that p/q = k/(n−k), those which preserve the ratio between the empirical counts. Once again the maths leads us to think in terms of intervals of belief functions, rather than individual ones.

71 New horizons / Generalising logistic regression
Generalising logistic regression (1). Bernoulli trials are central in statistics: generalising their likelihood allows us to represent uncertainty in a number of regression problems. In logistic regression,
π_i = P(Y_i = 1|x_i) = 1 / (1 + e^(−(β₀+β₁x_i))), 1 − π_i = P(Y_i = 0|x_i) = e^(−(β₀+β₁x_i)) / (1 + e^(−(β₀+β₁x_i))).
The parameters β₀, β₁ are estimated by maximum likelihood of the sample, where
L(β₀, β₁|Y) = Π_{i=1}^{n} π_i^{Y_i} (1 − π_i)^{1−Y_i},
with Y_i ∈ {0, 1} and π_i a function of β₀, β₁, yielding a single conditional PDF. As in the Bernoulli series experiment, we can replace the conditional probability (π_i, 1 − π_i) on X = {0, 1} with a belief function there.

72 New horizons / Generalising logistic regression
Generalising logistic regression (2). Upper and lower likelihoods can then be computed as
L̲(β|Y) = Π_{i=1}^{n} π_i^{Y_i} q_i^{1−Y_i}, L̄(β|Y) = Π_{i=1}^{n} (1 − q_i)^{Y_i} (1 − π_i)^{1−Y_i},
where this time the Bel_i are not equally distributed. How do we generalise the logit link between observations x and outputs y? We need to enforce an analytical dependency for q_i first. A simple proposal: add a parameter β₂ such that
q_i = m(Y_i = 0|x_i) = β₂ e^(−(β₀+β₁x_i)) / (1 + e^(−(β₀+β₁x_i))).
We can then find lower and upper optimal estimates for the parameters β, as arg max_β L̲ and arg max_β L̄ respectively. Plugging these optimal parameters into the logit expressions for π_i, 1 − π_i, q_i yields an upper and a lower family of conditional belief functions given x, Bel_X(·|β, x).

73 New horizons / Generalising logistic regression
Rare events with belief functions. How do we use belief functions to be cautious about rare-event prediction? When we measure a new observation x, we plug it into the lower and upper families Bel_X(·|β, x) and get a lower and an upper belief function on Y. Note that each belief function is really an envelope of logistic functions: a robust estimate of rare events. How does this relate to the results of classical logit regression? More to come in the near future!

74 New horizons / A new machine learning
What's wrong with machine learning. The unfortunate (but predictable) Tesla accident: we are unable to predict how a system will behave in a radically new setting (e.g. how does a smart car cope with driving through extreme weather conditions?). Most systems have no way of detecting whether their underlying assumptions have been violated: they will happily continue to predict and act even on inputs that are completely outside the scope of what they have actually learned. It is imperative to ensure that these algorithms behave predictably "in the wild".

75 New horizons / A new machine learning
Vapnik's statistical learning theory. Classical statistical learning theory [Vapnik] makes predictions on the reliability of a training set based on simple quantities, such as the number of samples N. Generalisation issue: the training error is different from the expected error,
E_{x∼D}[δ(h(x) ≠ y(x))] vs (1/N) Σ_{n=1}^{N} δ(h(x_n) ≠ y(x_n)),
where the training data are assumed drawn from a distribution D, h(x) is the predicted label for input x and y(x) the actual label.
Probably Approximately Correct (PAC) learning: the learning algorithm finds, with probability at least 1 − δ, a model h ∈ H which is approximately correct, i.e. whose error is no more than ε. The main result of PAC learning is that we can relate the required size N of a training sample to the size of the model space H:
N ≥ (1/ε) (log|H| + log(1/δ)).
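A sketch of the PAC sample-complexity bound quoted above, for a finite hypothesis space; the values of |H|, ε and δ are illustrative:

```python
# PAC sample-complexity bound N >= (1/eps) * (ln|H| + ln(1/delta)) (a sketch).

import math

def pac_sample_size(H_size, epsilon, delta):
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / epsilon)

print(pac_sample_size(H_size=10**6, epsilon=0.05, delta=0.01))   # 369
```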

76 New horizons / A new machine learning
Vapnik's statistical learning theory: Vapnik-Chervonenkis dimension. The VC dimension of H is the maximum number of points that can be successfully shattered by hypotheses in H (i.e. they can be correctly classified by some h ∈ H for all possible binary labellings of these points). Example: 4 points in R² with H the space of linear separators: however we arrange the 4 points, there is a labelling that we cannot shatter (correctly reproduce); therefore the VC dimension of linear separators in R² is 3. The VC dimension is pretty useless for model selection, for the bounds are too wide: people do cross-validation instead. However, it provides the only justification for max-margin linear SVMs:
VC_SVM = min{D, 4R²/m²},
where D is the dimension of the data, R the radius of the ball containing them and m the margin.

77 New horizons / A new machine learning
Imprecise-theoretical foundations for machine learning: a modest proposal. Issues with Vapnik's traditional statistical learning theory have recently been recognised by many researchers. What about deep learning? Nobody really has a clue why it works. Approaches should provide worst-case guarantees: at present it is not possible to rule out completely unexpected behaviours or catastrophic failures. Liang's proposal: use minimax optimisation to learn models that are suitable for any target distribution within a "safe" family. Minimax models similar to Liang's are naturally associated with convex sets of probabilities: uncertainty theory may be able to provide worst-case, cautious predictions, delivering AI agents aware of their own limitations. Research programme: a generalisation of the concept of Probably Approximately Correct (where does the probability distribution of the data come from?).

78 Summarising / Outline (section overview; same outline as slide 3).

79 Summarising
A summary. The theory of belief functions:
- is grounded in the beautiful mathematics of random sets
- has strong relationships with other theories of uncertainty
- can be efficiently implemented by Monte Carlo approximation and local propagation.
Statistical evidence may be represented in several ways: by likelihood-based belief functions (generalising both likelihood-based and Bayesian inference), or by Dempster's idea of using auxiliary variables. Decision-making strategies based on intervals of expected utilities can be formulated that are more cautious than traditional ones. Propagation on graphical models can be performed. The extension to continuous domains can be tackled via the Borel-interval representation, or in the more general case using the theory of random sets. A toolbox of estimation, classification and regression tools based on the theory of belief functions is available.

80 Summarising
Recent trends in the theory and application of belief functions. In 2014 alone, almost 1200 papers were published on belief functions. New applications are gaining ground beyond sensor fusion or expert systems: earth sciences, telecoms, etc.

81 Summarising
What still needs to be resolved.
- Clarify once and for all the epistemic interpretation of belief function theory: random variables for set-valued observations.
- The mechanism for evidence combination is still debated; it depends on meta-information on the sources which is hardly accessible. Working with intervals of belief functions may be the way forward, as it acknowledges the meta-uncertainty on the nature of the sources generating the evidence. The same holds for conditioning (although we did not show that).
- What about computational complexity? Not an issue: just apply sampling for approximate inference. We do not need to assign mass to all subsets, but we need to be allowed to do so when necessary (e.g. with missing data).
- Belief functions on the reals: Borel intervals are nice, but the way forward is grounding the theory in the mathematics of random sets.

82 Summarising
Future of random set / belief function theory.
- Further development of machine learning tools: e.g. random-set random forests for multilabel classification; tackling current trends such as transfer learning and deep learning.
- A fully developed theory of statistical inference with random sets: generalised likelihood and logistic regression; limit theorems and total probability for random sets; random-set random variables and processes; frequentist inference with random sets.
- Proposing solutions to high-impact problems: rare-event prediction, robust foundations for machine learning, robust climate change predictions.
- The mathematics and geometry of random sets and other uncertainty measures.

83 Appendix / For Further Reading I
F. Cuzzolin. The geometry of uncertainty: The geometry of imprecise probabilities. Artificial Intelligence: Foundations, Theory, and Algorithms, Springer-Verlag (2017).

84 Appendix / For Further Reading I
G. Shafer. A mathematical theory of evidence. Princeton University Press, 1976.
F. Cuzzolin. Visions of a generalized probability theory. Lambert Academic Publishing, 2014.
F. Cuzzolin (Ed.). Belief functions: theory and applications. LNCS Volume 8764, Springer, 2014.
