Uncertainty (Chapter 13, Russell & Norvig)

Uncertainty (Chapter 13, Russell & Norvig) Introduction to Artificial Intelligence CS 150

Administration
Midterm next Tuesday!!! I will try to find an old one to post. The MT will cover chapters 1-6, with emphasis on 3-6. The programming assignment is posted - start early and often! ;-) Read Chapter 13 - I will not quiz you on it today!

Last Lecture: CSP Summary
- Many problems can be cast as Constraint Satisfaction Problems (CSPs).
- One main difference between CSPs and State Space Search (SSS) is that we have access to the structure of the state - constraints between variables.
- A second main difference between CSPs and SSS is that domain-independent heuristics work well!
- CSPs can be solved by Depth First Search with backtracking.

Last Lecture: CSP Summary
- Problems can be made smaller by preprocessing to remove values of variables that are inconsistent:
  - Unary constraints
  - Binary constraints (arc consistency - AC-3)
  - Ternary constraints (path consistency)
- This kind of inference can be applied during the search, interleaving it with variable assignment. One version of this is forward checking, which is essentially AC-3 applied after the assignment.

Last Lecture: CSP Summary
Two main general heuristics:
- When assigning a new variable, choose the one with the fewest legal values (Minimum Remaining Values heuristic: MRV). This tends to prune the search tree, and it works well with forward checking.
- Then assign the least constraining value (LCV): this is most likely to lead to a solution.

Last Lecture: CSP Summary
And some stuff I didn't talk about:
- Local search with min-conflicts works surprisingly well.
- Minimizing the problem by cutset conditioning, which tries to break the constraint graph into trees, is useful because tree-structured constraint graphs can be solved in linear time!

This Lecture (probably won't get to everything here today...)
- Uncertainty
- Probability
- Syntax
- Semantics
- Inference rules
- Independence and Bayes rule
Reading: Chapter 13

The real world: Uncertainty
Consider the simple problem of going home: I plan to go home today. In GOFAI (Good Old Fashioned AI), I had to prove my plan would work! So I have to deal with every eventuality - what if my car has been stolen, towed, or has a flat tire? What if I have lost my keys? What if I left my lights on and my battery is dead? There are many unknowns here!!!
This is called the Qualification Problem: I can never finish listing all the required preconditions and exceptional conditions in order to complete the task.

Possible Solutions
- Conditional planning
  - Plan to obtain information (observation actions)
  - Subplan for each contingency, e.g., [Check(Tire1), IF Intact(Tire1) THEN [Inflate(Tire1)] ELSE [CallAAA]]
  - Expensive because it plans for many unlikely cases
- Monitoring/Replanning
  - Assume normal states and outcomes
  - Check progress during execution, replan if necessary
  - Unanticipated outcomes may lead to failure (e.g., no AAA card)
In general, some monitoring is unavoidable.

Dealing with Uncertainty Head-on: Probability
Let action A_t = leave for airport t minutes before flight from Lindbergh Field. Will A_t get me there on time?
Problems:
1. Partial observability (road state, other drivers' plans, etc.)
2. Noisy sensors (traffic reports)
3. Uncertainty in action outcomes (turn key, car doesn't start, etc.)
4. Immense complexity of modeling and predicting traffic
Hence a purely logical approach either
1) risks falsehood: "A_90 will get me there on time", or
2) leads to conclusions that are too weak for decision making: "A_90 will get me there on time if there's no accident on I-5 and it doesn't rain and my tires remain intact", etc., etc.
(A_1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport...)

Methods for handling uncertainty
Default or nonmonotonic logic: if the Antecedent holds and I can't prove NOT(Justification), then infer the Conclusion. Assume A_90 works if I can't prove that I have a flat tire:
Have_Keys : NOT(Flat_Tire) / DO(A_90)
(Antecedent : Justification / Conclusion)
Issues: What things do I need to put to the right of the ":"? The qualification problem again. Lack of a good semantic model for nonmonotonic logic.

Methods for handling uncertainty
Rules with fudge factors:
A_90 → (0.3) get there on time
Sprinkler → (0.99) WetGrass
WetGrass → (0.7) Rain
Issues: problems with rule combination, e.g., does Sprinkler cause Rain??
Probability: given the available evidence, A_90 will get me there on time with probability 0.95.
(Fuzzy logic handles degree of truth, NOT uncertainty; e.g., WetGrass is true to degree 0.2.)

Fuzzy Logic in the Real World

Probability
Probabilistic assertions summarize the effects of:
- Ignorance: lack of relevant facts, initial conditions, etc.
- Laziness: failure to enumerate exceptions, qualifications, etc.
Subjective or Bayesian probability: probabilities relate propositions to one's own state of knowledge, e.g., P(A_90 succeeds | no reported accidents) = 0.97. These are NOT assertions about the world, but represent belief about whether the assertion is true.
Probabilities of propositions change with new evidence, e.g., P(A_90 succeeds | no reported accidents, 5 a.m.) = 0.99.
(Analogous to logical entailment status, i.e., does KB ⊨ α?)

Making decisions under uncertainty
Suppose I believe the following:
P(A_30 gets me there on time | ...) = 0.05
P(A_60 gets me there on time | ...) = 0.70
P(A_100 gets me there on time | ...) = 0.95
P(A_1440 gets me there on time | ...) = 0.9999
Which action to choose? Depends on my preferences for missing the flight vs. airport cuisine, etc. Utility theory is used to represent and infer preferences. Decision theory = utility theory + probability theory.

Making decisions under uncertainty
Suppose I believe the following:
P(A_30 gets me there on time | ...) = 0.05
P(A_60 gets me there on time | ...) = 0.70
P(A_100 gets me there on time | ...) = 0.95
P(A_1440 gets me there on time | ...) = 0.9999
Decision theory = utility theory + probability theory. E.g., the best action is:
argmax_i P(A_i | evidence) * Utility(being at the airport A_i - drive_time minutes ahead of time)
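
A minimal sketch of this argmax in Python. The on-time probabilities are the slide's; the utility numbers and the drive_time value are hypothetical stand-ins for the preferences discussed above.

```python
# Choose the departure action that maximizes expected utility.
# P(on time | evidence) values are from the slide; the utility
# function below is a made-up illustration of one's preferences.

p_on_time = {30: 0.05, 60: 0.70, 100: 0.95, 1440: 0.9999}

def utility(minutes_early, on_time):
    if not on_time:
        return -1000            # missing the flight is very bad
    return -minutes_early       # waiting at the airport is mildly bad

def expected_utility(t, drive_time=25):
    p = p_on_time[t]
    slack = t - drive_time      # minutes ahead of time if we make it
    return p * utility(slack, True) + (1 - p) * utility(slack, False)

best = max(p_on_time, key=expected_utility)
print(best)   # 100, under these made-up utilities
```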

Unconditional Probability
Let A be a proposition. P(A) denotes the unconditional probability that A is true.
Example: if Male denotes the proposition that a particular person is male, then P(Male) = 0.5 means that without any other information, the probability of that person being male is 0.5 (a 50% chance). Alternatively, if a population is sampled, then 50% of the people will be male. Of course, with additional information (e.g., that the person is a CS150 student), the posterior probability will likely be different.

Axioms of probability
For any propositions A, B:
1. 0 ≤ P(A) ≤ 1
2. P(True) = 1 and P(False) = 0
3. P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
de Finetti (1931): an agent who bets according to probabilities that violate these axioms can be forced to bet so as to lose money regardless of outcome.
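
A quick numeric sanity check of axiom 3 by brute force over possible worlds; the world probabilities are arbitrary illustration values that sum to 1.

```python
# Check inclusion-exclusion: P(a or b) = P(a) + P(b) - P(a and b).
from itertools import product

probs = [0.3, 0.2, 0.1, 0.4]   # arbitrary, sums to 1
worlds = {w: p for w, p in zip(product([True, False], repeat=2), probs)}

P_a       = sum(p for (a, b), p in worlds.items() if a)
P_b       = sum(p for (a, b), p in worlds.items() if b)
P_a_and_b = sum(p for (a, b), p in worlds.items() if a and b)
P_a_or_b  = sum(p for (a, b), p in worlds.items() if a or b)

assert abs(P_a_or_b - (P_a + P_b - P_a_and_b)) < 1e-12
```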

The unbeliever loses
Note that Agent 1 - the unbeliever - believes that P(a ∨ b) > P(a) + P(b). Agent 2 bets according to the odds that Agent 1 believes, betting against Agent 1's belief that P(a ∨ b) > P(a) + P(b).

Syntax
Similar to PROPOSITIONAL LOGIC: possible worlds are defined by assignments of values to random variables.
Note: to make things confusing, variables have an upper-case first letter, and variable values are lower-case.
- Propositional or Boolean random variables, e.g., Cavity (do I have a cavity?). (P(cavity) is shorthand for P(Cavity = true).)
- Include propositional logic expressions, e.g., burglary ∨ earthquake
- Multivalued random variables, e.g., Weather is one of <sunny, rain, cloudy, snow>. Values must be exhaustive and mutually exclusive.
A proposition is constructed by assignment of a value: e.g., Weather = sunny; also Cavity = true.

Priors, Distribution, Joint
Prior or unconditional probabilities of propositions, e.g.,
P(cavity) = P(Cavity = true) = 0.1
P(Weather = sunny) = 0.72
correspond to belief prior to the arrival of any (new) evidence.
A probability distribution gives the probabilities of all possible values of a random variable. Weather is one of <sunny, rain, cloudy, snow>:
P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)

Joint probability distribution
The joint probability distribution for a set of variables gives a value for each possible assignment to all the variables. P(Toothache, Cavity) is a 2 by 2 matrix:

                  Cavity=true   Cavity=false
Toothache=true        0.04          0.01
Toothache=false       0.06          0.89

NOTE: the elements in the table sum to 1, so the table is specified by 3 independent numbers.

The importance of independence
P(Weather, Cavity) is a 4 by 2 matrix of values:

               Weather=sunny   rain   cloudy   snow
Cavity=true        .072        .01     .008     .01
Cavity=false       .648        .09     .072     .09

But these are independent (usually). Recall that if X and Y are independent then:
P(X=x, Y=y) = P(X=x) * P(Y=y)
Weather is one of <sunny, rain, cloudy, snow>, P(Weather) = <0.72, 0.1, 0.08, 0.1>, and P(Cavity) = .1

The importance of independence
If we compute the marginals (sums of rows or columns), we see:

               Weather=sunny   rain   cloudy   snow   marginals
Cavity=true        .072        .01     .008     .01       .1
Cavity=false       .648        .09     .072     .09       .9
marginals           .72        .1      .08      .1

Each entry is just the product of the marginals (because they're independent)! So, instead of using 7 numbers, we can represent these by just recording 4 numbers, 3 from Weather and 1 from Cavity:
Weather is one of <sunny, rain, cloudy, snow>, P(Weather) = <0.72, 0.1, 0.08, (1 - sum of the others)>
P(Cavity) = <.1, (1 - .1)>
This can be much more efficient (more later...).
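
A small sketch of this check in Python: compute the row and column marginals of the joint table above and confirm each entry factors as their product.

```python
# Joint distribution P(Weather, Cavity) from the slide, and a check
# that every entry equals the product of its marginals.
joint = {
    ('sunny', True): 0.072, ('rain', True): 0.01,
    ('cloudy', True): 0.008, ('snow', True): 0.01,
    ('sunny', False): 0.648, ('rain', False): 0.09,
    ('cloudy', False): 0.072, ('snow', False): 0.09,
}

p_weather, p_cavity = {}, {}   # marginals P(Weather), P(Cavity)
for (w, c), p in joint.items():
    p_weather[w] = p_weather.get(w, 0) + p
    p_cavity[c] = p_cavity.get(c, 0) + p

# Independence: P(w, c) == P(w) * P(c) for every cell.
assert all(abs(joint[(w, c)] - p_weather[w] * p_cavity[c]) < 1e-9
           for (w, c) in joint)
```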

Conditional Probabilities
Conditional or posterior probabilities, e.g., P(Cavity | Toothache) = 0.8: what is the probability of having a cavity given that the patient has a toothache? (As another example: P(Male | CS150Student) = ??)
If we know more, e.g., the dental probe catches, then we have P(Cavity | Toothache, Catch) = 0.9 (say).
If we know even more, e.g., Cavity is also given, then we have P(Cavity | Toothache, Cavity) = 1.
Note: the less specific belief remains valid after more evidence arrives, but is not always useful.
New evidence may be irrelevant, allowing simplification, e.g., P(Cavity | Toothache, PadresWin) = P(Cavity | Toothache) = 0.8 (again, this is because cavities and baseball are independent).

Conditional probability in pictures
[Venn diagram: the universe of all possible events, with regions A and B overlapping in A ∧ B]
P(A | B) = P(A ∧ B) / P(B)
Note that P(A) is the ratio of the size of the event space in which A is true to the size of the whole space. P(A | B) means: if we know B, what is the probability of A now? Another way of saying it is: our universe has shrunk to B.

Conditional Probabilities
Conditional or posterior probabilities, e.g., P(Cavity | Toothache) = 0.8: what is the probability of having a cavity given that the patient has a toothache?
Computation of a conditional probability using the complete joint:
P(A | B) = P(A, B) / P(B) if P(B) ≠ 0

                  Cavity=true   Cavity=false
Toothache=true        0.04          0.01
Toothache=false       0.06          0.89

P(Cavity | Toothache) = .04 / (.04 + .01) = 0.8
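
The same computation in Python, using a dict-based joint table; a sketch, not any particular library's API.

```python
# P(Toothache, Cavity) from the slide, keyed by (toothache, cavity).
joint = {(True, True): 0.04, (True, False): 0.01,
         (False, True): 0.06, (False, False): 0.89}

def p_cavity_given_toothache(joint):
    # P(Cavity | Toothache) = P(Cavity, Toothache) / P(Toothache)
    p_both = joint[(True, True)]
    p_toothache = joint[(True, True)] + joint[(True, False)]
    return p_both / p_toothache

print(p_cavity_given_toothache(joint))   # 0.8
```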

Conditional probability cont.
Definition of conditional probability: P(A | B) = P(A, B) / P(B) if P(B) ≠ 0
The product rule gives an alternative formulation:
P(A, B) = P(A | B) P(B) = P(B | A) P(A)
Example:
P(Male) = 0.5
P(CS151Student) = 0.0037
P(Male | CS151Student) = 0.9
What is the probability of being a male CS151 student?
P(Male, CS151Student) = P(Male | CS151Student) P(CS151Student) = 0.9 * 0.0037 = 0.0033

Conditional probability cont.
A general version holds for whole distributions, e.g.,
P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View this as a 4 x 2 set of equations, not matrix multiplication - it is a pointwise product.)
The chain rule is derived by successive application of the product rule:
P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
= P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
= ...
= ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})
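
A brute-force check of the chain rule on three Boolean variables, using an arbitrary illustrative joint; each conditional is computed from the joint by marginalizing and dividing.

```python
# Verify P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2).
from itertools import product

joint = {w: p for w, p in zip(
    product([True, False], repeat=3),
    [0.1, 0.05, 0.2, 0.15, 0.05, 0.1, 0.05, 0.3])}   # sums to 1

def marginal(assign):
    # Sum of joint entries consistent with a partial assignment
    # {variable index: value}.
    return sum(p for w, p in joint.items()
               if all(w[i] == v for i, v in assign.items()))

for x1, x2, x3 in joint:
    chain = (marginal({0: x1})
             * marginal({0: x1, 1: x2}) / marginal({0: x1})
             * marginal({0: x1, 1: x2, 2: x3}) / marginal({0: x1, 1: x2}))
    assert abs(chain - joint[(x1, x2, x3)]) < 1e-12
```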

Bayes Rule
From the product rule P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A), we can obtain Bayes' rule:
P(A | B) = P(B | A) P(A) / P(B)
Why is this useful??? For assessing diagnostic probability from causal probability, e.g.:
- Cavities as the cause of toothaches
- Sleeping late as the cause of missing class
- Meningitis as the cause of a stiff neck
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

Bayes Rule: Example
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
Let M be meningitis, S be stiff neck:
P(M) = 0.0001
P(S) = 0.1
P(S | M) = 0.8
P(M | S) = P(S | M) P(M) / P(S) = (0.8 * 0.0001) / 0.1 = 0.0008
Note: the posterior probability of meningitis is still very small!
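
This packages naturally as a one-line helper; the numbers below are the slide's meningitis example.

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """P(Cause | Effect) = P(Effect | Cause) * P(Cause) / P(Effect)."""
    return p_effect_given_cause * p_cause / p_effect

# Meningitis example from the slide:
print(bayes(p_effect_given_cause=0.8, p_cause=0.0001, p_effect=0.1))
# 0.0008
```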

Bayes Rule: Another Example
Let A be AIDS, T be a positive test result for AIDS.
P(T | A) = 0.99          true positive rate of the test (aka a "hit")
P(T | ~A) = 0.005        false positive rate of the test (aka a "false alarm")
P(~T | A) = 1 - 0.99 = .01   false negative, or "miss"
P(~T | ~A) = 0.995       correct rejection
Seems like a pretty good test, right?

Bayes Rule: Another Example
Let A be AIDS, T be a positive test result for AIDS.
P(T | A) = 0.99          true positive rate of the test (aka a "hit")
P(T | ~A) = 0.005        false positive rate of the test (aka a "false alarm")
P(~T | A) = 1 - 0.99 = .01   false negative, or "miss"
P(~T | ~A) = 0.995       correct rejection
Now, suppose you are from a low-risk group, so P(A) = .001, and you get a positive test result. Should you worry?
Let's compute it using Bayes rule: P(A | T) = P(T | A) * P(A) / P(T)
Hmmm. Wait a minute: how do we know P(T)?

Bayes Rule: Another Example
Hmmm. Wait a minute: how do we know P(T)? Recall there are two possibilities: either you have AIDS or you don't. Therefore: P(A | T) + P(~A | T) = 1.
P(A | T) = P(T | A) * P(A) / P(T)
P(~A | T) = P(T | ~A) * P(~A) / P(T)
So, P(A | T) + P(~A | T) = P(T | A) * P(A) / P(T) + P(T | ~A) * P(~A) / P(T)
= [P(T | A) * P(A) + P(T | ~A) * P(~A)] / P(T) = 1
=> P(T | A) * P(A) + P(T | ~A) * P(~A) = P(T)
I.e., the sum of the numerators is P(T).

Bayes Rule: Another Example
P(A | T) = P(T | A) * P(A) / P(T)
Hmmm. Wait a minute: how do we know P(T)?
P(T | A) * P(A) + P(T | ~A) * P(~A) = P(T)
=> The denominator is always the sum of the numerators if they are exhaustive of the possibilities - a normalizing constant that makes the probabilities sum to 1!
=> We can compute the numerators and then normalize afterwards!

Bayes Rule: Back to the Example
Let A be AIDS, T be a positive test result for AIDS.
P(T | A) = 0.99          true positive rate ("hit")
P(T | ~A) = 0.005        false positive rate ("false alarm")
P(~T | A) = .01          false negative ("miss")
P(~T | ~A) = 0.995       correct rejection
Now, suppose you are from a low-risk group, so P(A) = .001, and you get a positive test result. Should you worry?
P(A | T) ∝ P(T | A) * P(A) = 0.99 * .001 = .00099
P(~A | T) ∝ P(T | ~A) * P(~A) = 0.005 * .999 = .004995
P(A | T) = .00099 / (.00099 + .004995) = 0.165
Maybe you should worry, or maybe you should get re-tested.
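
The compute-numerators-then-normalize trick, sketched in Python with the slide's low-risk numbers.

```python
def posterior(prior, p_pos_given_a=0.99, p_pos_given_not_a=0.005):
    """P(A | T) via unnormalized numerators, then normalization."""
    num_a = p_pos_given_a * prior                 # P(T|A) * P(A)
    num_not_a = p_pos_given_not_a * (1 - prior)   # P(T|~A) * P(~A)
    return num_a / (num_a + num_not_a)            # normalize

print(posterior(0.001))   # ~0.165 for the low-risk group
```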

Bayes Rule: Back to the Example
Recall P(A) = 0.001, the prior probability of having AIDS in your group. And now we know P(A | T) = 0.165, the posterior probability of having AIDS given a positive AIDS test.
What if you get re-tested with a different, independent AIDS test, call it T2? How do we combine the new results with the prior results? P(A | T) is the new prior, with everything conditioned on the prior result:
P(A | T, T2) = P(T2 | A, T) * P(A | T) / P(T2 | T)
But since T2 is an independent test, P(A | T, T2) = P(T2 | A) * P(A | T) / P(T2).
This is called Bayesian updating.
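
A sketch of that updating loop, assuming the second test has the same error rates as the first (the slide does not specify T2's rates).

```python
def update(prior, p_pos_given_a=0.99, p_pos_given_not_a=0.005):
    """One Bayesian update on a positive result; the posterior becomes
    the prior for the next test (tests assumed conditionally
    independent given A, as on the slide)."""
    num_a = p_pos_given_a * prior
    num_not_a = p_pos_given_not_a * (1 - prior)
    return num_a / (num_a + num_not_a)

belief = 0.001            # prior for the low-risk group
belief = update(belief)   # after first positive test: ~0.165
belief = update(belief)   # after second positive test: ~0.975
print(belief)
```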

Bayes Rule: Back to the Example II
Let A be AIDS, T be a positive test result for AIDS.
P(T | A) = 0.99          true positive rate ("hit")
P(T | ~A) = 0.005        false positive rate ("false alarm")
P(~T | A) = .01          false negative ("miss")
P(~T | ~A) = 0.995       correct rejection
Now, suppose you are from a high-risk group, so P(A) = .01, and you get a positive test result. Should you worry? I.e., what is P(A | T) now? The answer is:
A: 0.99   B: 0.05   C: 0.67   D: 0.01

Bayes Rule: Back to the Example II
Let A be AIDS, T be a positive test result for AIDS.
P(T | A) = 0.99          true positive rate ("hit")
P(T | ~A) = 0.005        false positive rate ("false alarm")
P(~T | A) = .01          false negative ("miss")
P(~T | ~A) = 0.995       correct rejection
Now, suppose you are from a high-risk group, so P(A) = .01, and you get a positive test result. Should you worry?
P(A | T) ∝ P(T | A) * P(A) = 0.99 * .01 = .0099
P(~A | T) ∝ P(T | ~A) * P(~A) = 0.005 * .99 = .00495
P(A | T) = .0099 / (.0099 + .00495) = 0.667
=> You should worry!

A generalized Bayes Rule
A more general version, conditionalized on some background evidence E:
P(A | B, E) = P(B | A, E) P(A | E) / P(B | E)

Full joint distributions
A complete probability model specifies every entry in the joint distribution for all the variables X = X_1, ..., X_n, i.e., a probability for each possible world X_1 = x_1, ..., X_n = x_n.
E.g., suppose Toothache and Cavity are the random variables:

                  Cavity=true   Cavity=false
Toothache=true        0.04          0.01
Toothache=false       0.06          0.89

Possible worlds are mutually exclusive: P(w_1 ∧ w_2) = 0
Possible worlds are exhaustive: w_1 ∨ ... ∨ w_n is True, hence Σ_i P(w_i) = 1

Using the full joint distribution

                  Cavity=true   Cavity=false
Toothache=true        0.04          0.01
Toothache=false       0.06          0.89

What is the unconditional probability of having a cavity?
P(Cavity) = P(Cavity ∧ Toothache) + P(Cavity ∧ ~Toothache) = 0.04 + 0.06 = 0.1
What is the probability of having either a cavity or a toothache?
P(Cavity ∨ Toothache) = P(Cavity, Toothache) + P(Cavity, ~Toothache) + P(~Cavity, Toothache) = 0.04 + 0.06 + 0.01 = 0.11

Using the full joint distribution

                  Cavity=true   Cavity=false
Toothache=true        0.04          0.01
Toothache=false       0.06          0.89

What is the probability of having a cavity given that you already have a toothache?
P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache) = 0.04 / (0.04 + 0.01) = 0.8

Normalization
Suppose we wish to compute a posterior distribution over random variable A given B = b, and suppose A has possible values a_1, ..., a_m.
We can apply Bayes' rule for each value of A:
P(A = a_1 | B = b) = P(B = b | A = a_1) P(A = a_1) / P(B = b)
...
P(A = a_m | B = b) = P(B = b | A = a_m) P(A = a_m) / P(B = b)
Adding these up, and noting that Σ_i P(A = a_i | B = b) = 1:
P(B = b) = Σ_i P(B = b | A = a_i) P(A = a_i)
This is the normalization factor, denoted α = 1/P(B = b):
P(A | B = b) = α P(B = b | A) P(A)
Typically, we compute an unnormalized distribution and normalize at the end, e.g., suppose P(B = b | A) P(A) = <0.4, 0.2, 0.2>; then
P(A | B = b) = α <0.4, 0.2, 0.2> = <0.4, 0.2, 0.2> / (0.4 + 0.2 + 0.2) = <0.5, 0.25, 0.25>
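
The slide's <0.4, 0.2, 0.2> example as a tiny helper:

```python
def normalize(unnormalized):
    """Scale a vector of unnormalized values so it sums to 1."""
    alpha = 1 / sum(unnormalized)   # alpha = 1 / P(B=b)
    return [alpha * p for p in unnormalized]

print(normalize([0.4, 0.2, 0.2]))   # [0.5, 0.25, 0.25]
```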

Marginalization
Given a conditional distribution P(X | Z), we can create the unconditional distribution P(X) by marginalization:
P(X) = Σ_z P(X | Z = z) P(Z = z) = Σ_z P(X, Z = z)
In general, given a joint distribution over a set of variables, the distribution over any subset (called a marginal distribution for historical reasons) can be calculated by summing out the other variables.

Conditioning
Introducing a variable as an extra condition:
P(X | Y) = Σ_z P(X | Y, Z = z) P(Z = z | Y)
Why is this useful?? Intuition: it is often easier to assess each specific circumstance, e.g.,
P(RunOver | Cross) = P(RunOver | Cross, Light = green) P(Light = green | Cross)
+ P(RunOver | Cross, Light = yellow) P(Light = yellow | Cross)
+ P(RunOver | Cross, Light = red) P(Light = red | Cross)
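
A sketch of that conditioning sum; all the numbers here are invented for illustration, since the slide gives none.

```python
# P(RunOver | Cross) by conditioning on the light color.
# All probabilities below are hypothetical illustration values.
p_runover_given_light = {'green': 0.01, 'yellow': 0.05, 'red': 0.20}
p_light_given_cross   = {'green': 0.6,  'yellow': 0.1,  'red': 0.3}

p_runover_given_cross = sum(
    p_runover_given_light[z] * p_light_given_cross[z]
    for z in ('green', 'yellow', 'red'))
print(p_runover_given_cross)   # 0.071
```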

Absolute Independence
Two random variables A and B are (absolutely) independent iff P(A, B) = P(A) P(B).
Using the product rule for independent A and B, we can show:
P(A, B) = P(A | B) P(B) = P(A) P(B)
Therefore P(A | B) = P(A).
If n Boolean variables are independent, the full joint is:
P(X_1, ..., X_n) = ∏_i P(X_i)
The full joint is generally specified by 2^n - 1 numbers, but when the variables are independent only n numbers are needed.
Absolute independence is a very strong requirement, seldom met!!

Conditional Independence
Some evidence may be irrelevant, allowing simplification, e.g.,
P(Cavity | Toothache, PadresWin) = P(Cavity | Toothache) = 0.8
This property is known as conditional independence and can be expressed as:
P(X | Y, Z) = P(X | Z)
which says that X and Y are independent given Z.
If I have a cavity, the probability that the dentist's probe catches in it doesn't depend on whether I have a toothache:
1. P(Catch | Toothache, Cavity) = P(Catch | Cavity)
i.e., Catch is conditionally independent of Toothache given Cavity. (This doesn't say anything about P(Catch | Toothache).)
The same independence holds if I haven't got a cavity:
2. P(Catch | Toothache, ~Cavity) = P(Catch | ~Cavity)

Conditional independence: some useful equivalences
Equivalent statements to: P(Catch | Toothache, Cavity) = P(Catch | Cavity)   (*)
1.a P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache | Catch, Cavity)
= P(Catch | Toothache, Cavity) P(Toothache | Cavity) / P(Catch | Cavity)   (Bayes rule, conditioned on Cavity)
= P(Catch | Cavity) P(Toothache | Cavity) / P(Catch | Cavity)   (from *)
= P(Toothache | Cavity)
1.b P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
P(Toothache, Catch | Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity)   (product rule)
= P(Toothache | Cavity) P(Catch | Cavity)   (from 1.a)

Using Conditional Independence
The full joint distribution can now be written as:
P(Toothache, Catch, Cavity) = P(Toothache, Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)   (from 1.b)
Specified by 2 + 2 + 1 = 5 independent numbers, compared to 7 for the general joint or 3 for unconditionally independent variables.
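
A closing sketch: build the 8-entry joint from those 5 numbers. The particular conditional probability values are hypothetical; only the factorization itself is the slide's equation 1.b.

```python
# Build the full joint P(Toothache, Catch, Cavity) from 5 parameters
# using the factored form of equation 1.b. Values are hypothetical.
from itertools import product

p_cavity = 0.1
p_toothache_given_cavity = {True: 0.6, False: 0.05}  # P(Toothache=true | Cavity)
p_catch_given_cavity     = {True: 0.9, False: 0.2}   # P(Catch=true | Cavity)

joint = {}
for toothache, catch, cavity in product([True, False], repeat=3):
    pcav = p_cavity if cavity else 1 - p_cavity
    pt = p_toothache_given_cavity[cavity]
    pc = p_catch_given_cavity[cavity]
    joint[(toothache, catch, cavity)] = (
        (pt if toothache else 1 - pt) * (pc if catch else 1 - pc) * pcav)

assert abs(sum(joint.values()) - 1.0) < 1e-12   # 8 entries, 5 parameters
```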