Reasoning Under Uncertainty


Reasoning Under Uncertainty, Part (1): Certainty Factors
Kostas Kontogiannis, E&CE 457 (Chapters 14 & 15)

Objectives

This unit aims to investigate techniques that allow an algorithmic process to deduce new facts from a knowledge base with a level of confidence or a measure of belief. These techniques are of particular importance when:
1. The rules in the knowledge base do not produce a conclusion that is certain, even though the rule premises are known to be certain, and/or
2. The premises of the rules are not known to be certain.

The three parts of this unit deal with:
1. Techniques related to certainty factors and their application in rule-based systems
2. Techniques related to measures of belief, their relationship to probabilistic reasoning, and their application in rule-based systems
3. The Dempster-Shafer model for reasoning under uncertainty in rule-based systems

Uncertainty and Evidential Support

In its simplest case, a knowledge base contains rules of the form:

    A & B & C => D

where facts A, B, C are considered to be true (that is, these facts hold with probability 1), and D is asserted in the knowledge base as being true (also with probability 1). For realistic cases, however, domain knowledge has to be modeled in a way that accommodates uncertainty. In other words, we would like to encode domain knowledge using rules of the form:

    A & B & C => D (CF: x1)

where A, B, C are not necessarily certain (i.e. their CF is not necessarily 1).

Issues in Rule-Based Reasoning Under Uncertainty

Many rules may support the same conclusion with various degrees of certainty:

    A1 & A2 & A3 => H (CF = 0.5)
    B1 & B2 & B3 => H (CF = 0.6)

If we assume all of A1, A2, A3, B1, B2, B3 hold, then H is supported with CF(H) = CFcombine(0.5, 0.6).

The premises of a rule to be applied may not hold with absolute certainty (the CF, or probability, associated with a premise is not equal to 1):

    Rule: A1 => H (CF = 0.5)

If during a consultation A1 holds with CF(A1) = 0.3, then H holds with CF(H) = 0.5 * 0.3 = 0.15.
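The two issues above can be sketched in a few lines of Python. This is an illustration only, not code from the lecture: the function names are ours, and taking the minimum over conjunct CFs is an assumption in the MYCIN style.

```python
def apply_rule(rule_cf, premise_cfs):
    """CF contributed by one rule: the rule's CF attenuated by its
    weakest premise (a conjunction is as certain as its weakest part)."""
    return rule_cf * min(premise_cfs)

def cf_combine(x, y):
    """Combine two positive CFs that support the same hypothesis."""
    return x + y * (1 - x)

# Rule A1 => H (CF = 0.5) fired with CF(A1) = 0.3:
print(apply_rule(0.5, [0.3]))   # ~0.15

# Two rules support H with CF 0.5 and 0.6 (all premises certain):
print(cf_combine(0.5, 0.6))     # ~0.8
```

Note that each extra supporting rule moves the combined CF toward 1 without ever exceeding it.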

The Certainty Factor Model

The potential for a single piece of negative evidence should not overwhelm several pieces of positive evidence, and vice versa. Also, the computational expense of storing MBs and MDs should be avoided; instead, a cumulative CF value is maintained.

Simple model:

    CF = MB - MD
    CFcombine(X, Y) = X + Y(1 - X)

The problem is that a single piece of negative evidence overwhelms several pieces of positive evidence.

The Revised CF Model

    CF = (MB - MD) / (1 - min(MB, MD))

    CFcombine(X, Y) =
        X + Y(1 - X)                      if X, Y > 0
        (X + Y) / (1 - min(|X|, |Y|))     if exactly one of X, Y < 0
        -CFcombine(-X, -Y)                if X, Y < 0
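The revised combining function can be sketched directly from the three cases above (the function name is ours; the behavior follows the piecewise definition):

```python
def cf_combine(x, y):
    """Revised combination of two certainty factors in [-1, 1]."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return -cf_combine(-x, -y)
    # Exactly one of x, y is negative:
    return (x + y) / (1 - min(abs(x), abs(y)))

# Several pieces of positive evidence ...
positives = cf_combine(cf_combine(0.8, 0.7), 0.6)
# ... are no longer wiped out by a single negative piece:
print(cf_combine(positives, -0.5))   # still strongly positive (~0.95)
```

This is exactly the property the revision was designed for: in the simple model, CF = MB - MD would let one strong disconfirmation cancel an arbitrarily long run of confirmations.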

Additional Uses of CFs

CFs provide a method for search termination.

    (Figure: a chain of inferences over A, B, C, D, E via rules R1-R4, with CFs 0.8, 0.4, 0.7, 0.7.)

In the case of branching in the inference sequencing, paths should be kept distinct.

Cutoff in Complex Inferences

    (Figure: a branching inference network over A, B, C, D, E, F via rules R1-R5, with CFs 0.8, 0.4, 0.9, 0.7, 0.7.)

We should maintain two paths for the cutoff (0.2): one being (E, D, C, B, A) and the other (F, C, B, A). If we kept only one path, then E, D, C would drop to 0.19 and make C unusable later in the path F, C, B, A.

Reasoning Under Uncertainty, Part (2): Measures of Belief
Kostas Kontogiannis, E&CE 457

Terminology

The units of belief are the same as in probability theory. If the sum of all evidence is represented by e and d is the diagnosis (hypothesis) under consideration, then the probability P(d|e) is interpreted as the probabilistic measure of belief, or strength, that the hypothesis d holds given the evidence e. In this context:

    P(d)   : the a-priori probability that hypothesis d occurs
    P(e|d) : the probability that the evidence represented by e is present, given that the hypothesis (i.e. disease) d holds

Analyzing and Using Sequential Evidence

Let e1 be the set of observations to date, and s1 be some new piece of data. Furthermore, let e be the new set of observations once s1 has been added to e1. Then:

    P(di|e) = P(s1 | di & e1) P(di|e1) / Sum_j [ P(s1 | dj & e1) P(dj|e1) ]

P(d|e) = x is interpreted as: IF you observe symptom e THEN conclude hypothesis d with probability x.

Requirements

It is practically impossible to obtain measurements of P(sk|dj) for each of the pieces of data sk in e, and for the inter-relationships of the sk within each possible hypothesis dj. Instead, we would like to obtain a measurement of P(di|e) in terms of P(di|sk), where e is the composite of all the observed sk.
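One step of this sequential update can be sketched as follows. This is a hypothetical example: the hypothesis count and all numbers are invented, and the likelihoods P(s1 | dj & e1) are simply given as inputs.

```python
def bayes_update(priors, likelihoods):
    """One step of sequential updating: given the current P(d_j | e1)
    for each hypothesis and the likelihoods P(s1 | d_j & e1) of the
    new datum s1, return the posteriors P(d_j | e)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                 # the normalizing sum in the formula
    return [j / total for j in joint]

# Three candidate diagnoses; a new symptom s1 strongly favors the first:
posteriors = bayes_update([0.5, 0.3, 0.2], [0.9, 0.2, 0.1])
print(posteriors)   # the first hypothesis gains mass; the list sums to 1
```

Applying the function repeatedly, one datum at a time, implements the "serial" use of evidence described above.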

Advantages of Using Rules in Uncertainty Reasoning

- The use of general knowledge and abstractions in the problem domain
- The use of judgmental knowledge
- Ease of modification and fine-tuning
- Facilitated search for potential inconsistencies and contradictions in the knowledge base
- Straightforward mechanisms for explaining decisions
- An augmented instructional capability

Measuring Uncertainty

- Probability theory
- Confirmation
  - Classificatory: the evidence e confirms the hypothesis h
  - Comparative: e1 confirms h more strongly than e2 confirms h, or e confirms h1 more strongly than e confirms h2
  - Quantitative: e confirms h with strength x, usually denoted C[h,e]. In this context C[h,e] is not equal to 1 - C[~h,e]
- Fuzzy sets

Model of Evidential Strength

This is a quantification scheme for modeling inexact reasoning, using the concepts of belief and disbelief as units of measurement. The terminology is based on:

    MB[h,e] = x : the measure of increased belief in the hypothesis h, based on the evidence e, is x
    MD[h,e] = y : the measure of increased disbelief in the hypothesis h, based on the evidence e, is y

The evidence e need not be an observed event, but may be a hypothesis subject to confirmation. For example, MB[h,e] = 0.7 reflects the extent to which the expert's belief that h is true is increased by the knowledge that e is true. In this sense, MB[h,e] = 0 means that the expert has no reason to increase his/her belief in h on the basis of e.

Probability and the Evidential Model

In accordance with subjective probability theory, P(h) reflects the expert's belief in h at any given time; thus 1 - P(h) reflects the expert's disbelief regarding the truth of h. If P(h|e) > P(h), then the observation of e increases the expert's belief in h, while decreasing his/her disbelief regarding the truth of h. In fact, the proportionate decrease in disbelief is given by the ratio:

    (P(h|e) - P(h)) / (1 - P(h))

This ratio is called the measure of increased belief in h resulting from the observation of e (i.e. MB[h,e]).

Probability and the Evidential Model (Cont.)

On the other hand, if P(h|e) < P(h), then the observation of e decreases the expert's belief in h, while increasing his/her disbelief regarding the truth of h. The proportionate decrease in this case is:

    (P(h) - P(h|e)) / P(h)

Note that one piece of evidence cannot both favor and disfavor a single hypothesis: when MB[h,e] > 0, then MD[h,e] = 0, and when MD[h,e] > 0, then MB[h,e] = 0. Furthermore, when P(h|e) = P(h), the evidence is independent of the hypothesis and MB[h,e] = MD[h,e] = 0.

Definitions of the Evidential Model

    MB[h,e] = 1                                         if P(h) = 1
              (max[P(h|e), P(h)] - P(h)) / (1 - P(h))   otherwise

    MD[h,e] = 1                                         if P(h) = 0
              (min[P(h|e), P(h)] - P(h)) / (0 - P(h))   otherwise

    CF[h,e] = MB[h,e] - MD[h,e]
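These definitions translate directly into code. A minimal sketch (function and parameter names are ours):

```python
def mb(p_h, p_h_given_e):
    """MB[h,e]: measure of increased belief in h given e."""
    if p_h == 1:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1 - p_h)

def md(p_h, p_h_given_e):
    """MD[h,e]: measure of increased disbelief in h given e."""
    if p_h == 0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0 - p_h)

def cf(p_h, p_h_given_e):
    """CF[h,e] = MB[h,e] - MD[h,e]."""
    return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)

# Evidence that raises P(h) from 0.3 to 0.8 confirms h (MD = 0):
print(cf(0.3, 0.8))    # ~0.71
# Evidence that lowers P(h) from 0.3 to 0.1 disconfirms h (MB = 0):
print(cf(0.3, 0.1))    # ~ -0.67
```

Because of the max/min in the definitions, at most one of MB and MD is nonzero for a given piece of evidence, matching the remark above.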

Characteristics of Belief Measures

Range of degrees:

    0 <= MB[h,e] <= 1
    0 <= MD[h,e] <= 1
    -1 <= CF[h,e] <= +1

Evidential strength of mutually exclusive hypotheses. If h is shown to be certain, P(h|e) = 1:

    MB[h,e] = 1    MD[h,e] = 0    CF[h,e] = 1

If the negation of h is shown to be certain, P(~h|e) = 1:

    MB[h,e] = 0    MD[h,e] = 1    CF[h,e] = -1

Suggested limits and ranges:

    -1 = CF[h,~h] <= CF[h,e] <= CF[h,h] = +1

Note: MB[~h,e] = 1 if and only if MD[h,e] = 1. For mutually exclusive hypotheses h1 and h2, if MB[h1,e] = 1, then MD[h2,e] = 1.

Lack of evidence:

    MB[h,e] = 0 if h is not confirmed by e (i.e. e and h are independent, or e disconfirms h)
    MD[h,e] = 0 if h is not disconfirmed by e (i.e. e and h are independent, or e confirms h)
    CF[h,e] = 0 if e neither confirms nor disconfirms h (i.e. e and h are independent)

More Characteristics of Belief Measures

    CF[h,e] + CF[~h,e] =/= 1
    MB[h,e] = MD[~h,e]

The Belief Measure Model as an Approximation

Suppose e = s1 & s2 and that evidence e confirms d. Then:

    CF[d,e] = MB[d,e] - 0
            = (P(d|e) - P(d)) / (1 - P(d))
            = (P(d|s1&s2) - P(d)) / (1 - P(d))

which means we would still need to keep probability measurements and, moreover, we would need to keep MBs and MDs.

Defining Criteria for the Approximation

MB[h, e+] increases toward 1 as confirming evidence is found, equaling 1 if and only if a piece of evidence logically implies h with certainty. MD[h, e-] increases toward 1 as disconfirming evidence is found, equaling 1 if and only if a piece of evidence logically implies ~h with certainty.

    CF[h, e-] <= CF[h, e- & e+] <= CF[h, e+]
    If MB[h, e+] = 1, then MD[h, e-] = 0 and CF[h, e+] = 1
    If MD[h, e-] = 1, then MB[h, e+] = 0 and CF[h, e-] = -1

The case where MB[h, e+] = MD[h, e-] = 1 is contradictory, and hence CF is undefined.

If s1 & s2 indicates an ordered observation of evidence, first s1 then s2, then the order must not matter:

    MB[h, s1&s2] = MB[h, s2&s1]
    MD[h, s1&s2] = MD[h, s2&s1]
    CF[h, s1&s2] = CF[h, s2&s1]

If s2 denotes a piece of potential evidence, the truth or falsity of which is unknown:

    MB[h, s1&s2] = MB[h, s1]
    MD[h, s1&s2] = MD[h, s1]
    CF[h, s1&s2] = CF[h, s1]

Combining Functions

    MB[h, s1&s2] = 0                                     if MD[h, s1&s2] = 1
                   MB[h, s1] + MB[h, s2](1 - MB[h, s1])  otherwise

    MD[h, s1&s2] = 0                                     if MB[h, s1&s2] = 1
                   MD[h, s1] + MD[h, s2](1 - MD[h, s1])  otherwise

    MB[h1 or h2, e] = max(MB[h1, e], MB[h2, e])
    MD[h1 or h2, e] = min(MD[h1, e], MD[h2, e])

When the evidence s1 is itself uncertain (established with certainty CF[s1, e]):

    MB'[h, s1] = MB[h, s1] * max(0, CF[s1, e])
    MD'[h, s1] = MD[h, s1] * max(0, CF[s1, e])

Probabilistic Reasoning and Certainty Factors (Revisited)

Of the methods for utilizing evidence to select diagnoses or decisions, probability theory has the firmest appeal. The usefulness of Bayes' theorem, however, is limited by practical difficulties related to the volume of data required to compute the a-priori probabilities used in the theorem. On the other hand, CFs, MBs, and MDs offer an intuitive, yet informal, way of dealing with reasoning under uncertainty. The MYCIN model tries to combine these two areas (probabilistic reasoning and CFs) by providing a semi-formal bridge (theory) between them.
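The incremental combination and the attenuation by uncertain evidence can be sketched as follows (an illustration with invented numbers; function names are ours):

```python
def mb_seq(mb1, mb2):
    """MB[h, s1&s2] = MB[h,s1] + MB[h,s2] * (1 - MB[h,s1]):
    cumulative belief after observing s1 and then s2 (both confirming)."""
    return mb1 + mb2 * (1 - mb1)

def mb_attenuated(mb_rule, cf_evidence):
    """MB'[h, s1] = MB[h, s1] * max(0, CF[s1, e]):
    a belief contribution scaled by how certain the evidence s1 itself is."""
    return mb_rule * max(0.0, cf_evidence)

print(mb_seq(0.4, 0.5))           # ~0.7
print(mb_attenuated(0.6, 0.5))    # ~0.3
print(mb_attenuated(0.6, -0.2))   # 0.0: disconfirmed evidence contributes nothing
```

Note that mb_seq(a, b) equals mb_seq(b, a), so the combination is commutative, as the ordering criterion requires.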

A Simple Probability Model (The MYCIN Model Prelude)

Consider a finite population of n members. Members of the population may possess one or more of several properties that define subpopulations, or sets. Properties of interest might be e1 or e2, which may be evidence for or against a diagnosis h. The number of individuals with a certain property, say e, will be denoted n(e), and the number with two properties e1 and e2 will be denoted n(e1 & e2). Probabilities can be computed as ratios:

    P(h) = n(h)/n    P(e) = n(e)/n    P(h|e) = n(e & h)/n(e)    P(e|h) = n(e & h)/n(h)

From the above we observe that P(h|e) P(e) = P(e|h) P(h), since both sides equal n(e & h)/n. So a convenient form of Bayes' theorem is:

    P(h|e) = P(e|h) P(h) / P(e)

A Simple Probability Model (Cont.)

If we consider that two pieces of evidence e1 and e2 bear on a hypothesis h, and we assume e1 and e2 are independent, then the following ratios hold:

    n(e1 & e2)/n = (n(e1)/n) * (n(e2)/n)

and

    n(e1 & e2 & h)/n(h) = (n(e1 & h)/n(h)) * (n(e2 & h)/n(h))
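The count-ratio view makes Bayes' theorem an arithmetic identity, which a toy population can verify. All numbers below are invented for illustration:

```python
# A toy population check that count ratios obey Bayes' theorem exactly.
n = 100          # population size
n_h = 20         # individuals with disease h
n_e = 30         # individuals showing evidence e
n_e_and_h = 15   # individuals with both

p_h = n_h / n
p_e = n_e / n
p_e_given_h = n_e_and_h / n_h
p_h_given_e = n_e_and_h / n_e

# P(h|e) = P(e|h) P(h) / P(e) holds, since both sides reduce to n(e&h)/n(e):
assert abs(p_h_given_e - p_e_given_h * p_h / p_e) < 1e-12
print(p_h_given_e)   # 0.5
```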

Simple Probability Model (Cont.)

With the above, the right-hand side of Bayes' theorem becomes:

    P(e1 & e2 | h) / P(e1 & e2) = (P(e1|h) / P(e1)) * (P(e2|h) / P(e2))

The idea is to ask the experts to estimate the ratios P(ei|h)/P(ei) and the prior P(h), and from these compute P(h | e1 & e2 & ... & en). Since P(ei|h)/P(ei) = P(h|ei)/P(h), these ratios should be in the range [0, 1/P(h)]. In this context, MB[h,e] = 1 when all individuals with e have disease h, and MD[h,e] = 1 when no individual with e has h.

Adding New Evidence

The probability of a hypothesis is adjusted serially; let e' denote the evidence set after a new piece of evidence ei has been added to e. With new evidence against the hypothesis:

    P(h|e') = (P(ei|h) / P(ei)) * P(h|e)

or with new evidence favoring the hypothesis:

    P(h|e') = 1 - (P(ei|~h) / P(ei)) * (1 - P(h|e))

Measures of Belief and Probabilities

We can then define MB and MD as:

    MB[h,e] = 1 - P(e|~h) / P(e)

and

    MD[h,e] = 1 - P(e|h) / P(e)

The MYCIN Model

    MB[h1 & h2, e] = min(MB[h1,e], MB[h2,e])
    MD[h1 & h2, e] = max(MD[h1,e], MD[h2,e])
    MB[h1 or h2, e] = max(MB[h1,e], MB[h2,e])
    MD[h1 or h2, e] = min(MD[h1,e], MD[h2,e])

    1 - MD[h, e1 & e2] = (1 - MD[h,e1]) * (1 - MD[h,e2])
    1 - MB[h, e1 & e2] = (1 - MB[h,e1]) * (1 - MB[h,e2])

    CF(h, ef & ea) = MB[h, ef] - MD[h, ea]
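The MYCIN combination rules above can be sketched directly (function names are ours; the product form for accumulating evidence is algebraically the same as the incremental combining function given earlier):

```python
def mb_and(mb1, mb2): return min(mb1, mb2)   # MB[h1 & h2, e]
def md_and(md1, md2): return max(md1, md2)   # MD[h1 & h2, e]
def mb_or(mb1, mb2):  return max(mb1, mb2)   # MB[h1 or h2, e]
def md_or(md1, md2):  return min(md1, md2)   # MD[h1 or h2, e]

def mb_acc(mb1, mb2):
    """From 1 - MB[h, e1 & e2] = (1 - MB[h,e1]) * (1 - MB[h,e2])."""
    return 1 - (1 - mb1) * (1 - mb2)

# A conjunction of hypotheses is only as believed as its weaker conjunct:
print(mb_and(0.8, 0.5))   # 0.5
# Accumulating two pieces of evidence for the same hypothesis:
print(mb_acc(0.4, 0.5))   # ~0.7
```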

Reasoning Under Uncertainty, Part (3): Dempster-Shafer Model
Kostas Kontogiannis, E&CE 457

The Dempster-Shafer Model

So far we have described techniques, all of which consider an individual hypothesis (proposition) and assign to it a point estimate in the form of a CF. An alternative technique is to consider sets of propositions and assign to each of them an interval of the form:

    [Belief, Plausibility], that is, [Bel(p), 1 - Bel(~p)]

Belief and Plausibility

Belief (denoted Bel) measures the strength of the evidence in favor of a set of hypotheses. It ranges from 0 (indicating no support) to 1 (indicating certainty).

Plausibility (denoted Pl) is defined as Pl(s) = 1 - Bel(~s). Plausibility also ranges from 0 to 1, and measures the extent to which evidence in favor of ~s leaves room for belief in s. In particular, if we have certain evidence in favor of ~s, then Bel(~s) = 1 and Pl(s) = 0, which tells us that the only possible value for Bel(s) is 0.

Objectives for Belief and Plausibility

To define Belief and Plausibility more formally, we need to start with an exhaustive universe of mutually exclusive hypotheses in our diagnostic domain. We call this set the frame of discernment and denote it Theta. Our goal is to attach some measure of belief to elements of Theta. In addition, since the elements of Theta are mutually exclusive, evidence in favor of some may have an effect on our belief in the others. The key function we use to measure the belief in elements of Theta is a probability density function, which we denote m.

The Probability Density Function in the Dempster-Shafer Model

The probability density function m used in the Dempster-Shafer model is defined not just for the elements of Theta but for all subsets of it. The quantity m(p) measures the amount of belief that is currently assigned to exactly the set p of hypotheses. If Theta contains n elements, there are 2^n subsets of Theta. We must assign m so that the sum of all the m values assigned to subsets of Theta is equal to 1. Although dealing with 2^n hypothesis sets may appear intractable, it usually turns out that many of the subsets never need to be considered, because they have no significance in a particular consultation and so their m value is 0.

Defining Belief in Terms of the Function m

Having defined m, we can now define Bel(p) for a set p as the sum of the values of m for p and for all its subsets. Thus Bel(p) is our overall belief that the correct answer lies somewhere in the set p. In order to use m, and thus Bel and Pl, in reasoning programs, we need to define functions that enable us to combine m's that arise from multiple sources of evidence. The combination of belief functions m1 and m2 is supported by the Dempster-Shafer model and results in a new belief function m3.

Combining Belief Functions

To combine the belief functions m1 and m2, defined over focal sets X and Y respectively, we use the following formula:

    m3(Z) = [ Sum over X intersect Y = Z of m1(X) * m2(Y) ]
            / [ 1 - Sum over X intersect Y = empty of m1(X) * m2(Y) ]

If none of the intersections of X and Y is empty, then m3 is computed using only the numerator of the fraction above (i.e. we normalize by dividing by 1). If some intersections of X and Y are empty, the numerator is normalized by 1 - k, where k is the sum of the products m1(X) * m2(Y) over the pairs X, Y whose intersection is empty.
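Dempster's rule of combination can be sketched in a few lines. The hypothesis names and masses below are invented for illustration; focal elements are represented as frozensets of hypotheses.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination. m1 and m2 map focal elements
    (frozensets of hypotheses) to masses summing to 1."""
    numerator = {}
    k = 0.0   # mass falling on empty intersections (the conflict)
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            numerator[z] = numerator.get(z, 0.0) + mx * my
        else:
            k += mx * my
    # Normalize by 1 - k; if no intersection was empty, k = 0 and we divide by 1.
    return {z: v / (1 - k) for z, v in numerator.items()}

# Toy frame of discernment {flu, cold} and two sources of evidence:
FLU, COLD = frozenset({"flu"}), frozenset({"cold"})
THETA = FLU | COLD
m1 = {FLU: 0.6, THETA: 0.4}
m2 = {COLD: 0.5, THETA: 0.5}
m3 = dempster_combine(m1, m2)
print(m3)   # masses on {flu}, {cold}, and Theta; they sum to 1
```

Here the two sources conflict with mass k = 0.6 * 0.5 = 0.3 (their singleton focal elements are disjoint), so the surviving masses are rescaled by 1 / (1 - 0.3).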