COMP5211 Lecture Note on Reasoning under Uncertainty

Similar documents
Uncertainty. Logic and Uncertainty. Russell & Norvig. Readings: Chapter 13. One problem with logical-agent approaches: C:145 Artificial

Quantifying uncertainty & Bayesian networks

COMP9414: Artificial Intelligence Reasoning Under Uncertainty

Probabilistic Reasoning. (Mostly using Bayesian Networks)

Directed Graphical Models

COMP538: Introduction to Bayesian Networks

Informatics 2D Reasoning and Agents Semester 2,

Bayesian Networks. Motivation

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016

Bayesian networks. Chapter 14, Sections 1 4

Uncertainty. Introduction to Artificial Intelligence CS 151 Lecture 2 April 1, CS151, Spring 2004

Ch.6 Uncertain Knowledge. Logic and Uncertainty. Representation. One problem with logical approaches: Department of Computer Science

Reasoning Under Uncertainty

Uncertainty. 22c:145 Artificial Intelligence. Problem of Logic Agents. Foundations of Probability. Axioms of Probability

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018

PROBABILISTIC REASONING SYSTEMS

Probabilistic Reasoning Systems

Probability. CS 3793/5233 Artificial Intelligence Probability 1

Bayesian Network. Outline. Bayesian Network. Syntax Semantics Exact inference by enumeration Exact inference by variable elimination

Defining Things in Terms of Joint Probability Distribution. Today s Lecture. Lecture 17: Uncertainty 2. Victor R. Lesser

Cognitive Systems 300: Probability and Causality (cont.)

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II.

Foundations of Artificial Intelligence

Review: Bayesian learning and inference

Introduction to Artificial Intelligence Belief networks

Introduction to Artificial Intelligence. Unit # 11

Probabilistic Reasoning. Kee-Eung Kim KAIST Computer Science

Bayesian Reasoning. Adapted from slides by Tim Finin and Marie desjardins.

Probabilistic representation and reasoning

Reasoning Under Uncertainty

Probabilistic representation and reasoning

Axioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks.

14 PROBABILISTIC REASONING

Learning Bayesian Networks (part 1) Goals for the lecture

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Bayesian networks. Chapter Chapter

CS 380: ARTIFICIAL INTELLIGENCE UNCERTAINTY. Santiago Ontañón

Graphical Models - Part I

Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari

Uncertainty. Chapter 13

Probabilistic Representation and Reasoning

Probabilistic Models

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine

Resolution or modus ponens are exact there is no possibility of mistake if the rules are followed exactly.

Y. Xiang, Inference with Uncertain Knowledge 1

Another look at Bayesian. inference

Bayesian Approaches Data Mining Selected Technique

Artificial Intelligence Bayesian Networks

Bayesian belief networks

Basic Probability and Statistics

Bayesian Networks. Semantics of Bayes Nets. Example (Binary valued Variables) CSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-III

Intelligent Systems: Reasoning and Recognition. Reasoning with Bayesian Networks

Quantifying Uncertainty

Introduction to Bayes Nets. CS 486/686: Introduction to Artificial Intelligence Fall 2013

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem

Uncertainty and Belief Networks. Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2!

Uncertainty and Bayesian Networks

This lecture. Reading. Conditional Independence Bayesian (Belief) Networks: Syntax and semantics. Chapter CS151, Spring 2004

Outline. CSE 573: Artificial Intelligence Autumn Bayes Nets: Big Picture. Bayes Net Semantics. Hidden Markov Models. Example Bayes Net: Car

12. Vagueness, Uncertainty and Degrees of Belief

Probabilistic Robotics

Bayesian Networks. Vibhav Gogate The University of Texas at Dallas

Bayes Nets: Independence

CSE 473: Artificial Intelligence Autumn 2011

Reasoning Under Uncertainty: Belief Network Inference

CS 188: Artificial Intelligence Spring Announcements

Where are we? Knowledge Engineering Semester 2, Reasoning under Uncertainty. Probabilistic Reasoning

Probabilistic Models. Models describe how (a portion of) the world works

Bayesian Networks. Vibhav Gogate The University of Texas at Dallas

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation

CS 188: Artificial Intelligence Fall 2009

Introduction to Probabilistic Reasoning. Image credit: NASA. Assignment

CS 5522: Artificial Intelligence II

Implementing Machine Reasoning using Bayesian Network in Big Data Analytics

Bayesian Networks BY: MOHAMAD ALSABBAGH

CS 5522: Artificial Intelligence II

Bayes Networks. CS540 Bryan R Gibson University of Wisconsin-Madison. Slides adapted from those used by Prof. Jerry Zhu, CS540-1

Brief Intro. to Bayesian Networks. Extracted and modified from four lectures in Intro to AI, Spring 2008 Paul S. Rosenbloom

Artificial Intelligence

Bayesian belief networks. Inference.

Bayesian networks. Chapter AIMA2e Slides, Stuart Russell and Peter Norvig, Completed by Kazim Fouladi, Fall 2008 Chapter 14.

CS 343: Artificial Intelligence

Reasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule

Why Probability? It's the right way to look at the world.

Bayesian networks. Chapter Chapter

UNCERTAINTY. In which we see what an agent should do when not all is crystal-clear.

Bayesian networks. Chapter Chapter Outline. Syntax Semantics Parameterized distributions. Chapter

Bayes Networks 6.872/HST.950

Uncertainty. Chapter 13

Probability Basics. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Lecture 8. Probabilistic Reasoning CS 486/686 May 25, 2006

An AI-ish view of Probability, Conditional Probability & Bayes Theorem

10/18/2017. An AI-ish view of Probability, Conditional Probability & Bayes Theorem. Making decisions under uncertainty.

Bayesian Networks and Decision Graphs

School of EECS Washington State University. Artificial Intelligence

Discrete Random Variables

Recall from last time. Lecture 3: Conditional independence and graph structure. Example: A Bayesian (belief) network.

Computer Science CPSC 322. Lecture 18 Marginalization, Conditioning

Reasoning Under Uncertainty: Bayesian networks intro

Uncertainty. Outline

Transcription:

COMP5211 Lecture Note on Reasoning under Uncertainty Fangzhen Lin Department of Computer Science and Engineering Hong Kong University of Science and Technology Fangzhen Lin (HKUST) Uncertainty 1 / 33

Uncertainty Ordinary commonsense knowledge quickly moves away from categorical absolute statements like lawyers are always rich: x Lawyer(x) Rich(x). A lawyer might not be rich because of lack of customers bankrupted business addiction to gambling frequent divorces... There are many ways in which we can come to less than categorical information: things usually (occasionally, seldomly) are in a certain way. fuzzy judgments e.g. barely rich, a poor example of chair, not very tall, etc. imprecision of sensors. reliability of sources of information, e.g. most of time he s right. Fangzhen Lin (HKUST) Uncertainty 2 / 33

Probabilities There are many possible ways to address the difficulties, a reasonable first resort would be to use probabilities: quantifying things using probability The probability that John is rich is 0.1. The probability that John, who is a lawyer, is rich is 0.8. Fangzhen Lin (HKUST) Uncertainty 3 / 33

Utility Probability alone is not enough for rational decision making. Example: The oil wildcatter problem (Raffia 1968): Probabilities of outcome if drill. dry wet soaking.500.300.200 Decision: Drill? We also need utilities: dry wet soaking -$100K $100K $500K Decision theory: Combining probability theory with utility theory Principle of maximum expected utility: An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. 1 If not drill: $0K. 2 If drill: 100 0.5 + 100 0.3 + 500 0.2 = $80K 3 Rational decision: drill. Fangzhen Lin (HKUST) Uncertainty 4 / 33

Decision-Theoretic Agents function DT-AGENT( percept) returns an action static: a set probabilistic beliefs about the state of the world calculate updated probabilities for current state based on available evidence including current percept and previous action calculate outcome probabilities for actions, given action descriptions and probabilities of current states select action with highest expected utility given probabilities of outcomes and utility information return action Fangzhen Lin (HKUST) Uncertainty 5 / 33

Language use an extension of propositional logic to represent facts. If A is a proposition standing for John is rich, then P(A) = 0.3 means that there is a 0.3 chance that John is rich, and P( A) = 0.7 means that there is a 0.7 chance that John is not rich. If B is a proposition standing for John is a miser, then P(A B) = 0.2 means that there is a 0.2 chance that John is both rich and a miser. A random variable is a variable with a set of possible values. If X is a random variable, and v is one of its possible values, then P(X = v) is the probability that X is equals to v: P(Weather = Sunny) = 0.7, P(Weather = Rain) = 0.2, P(Weather = Snow) = 0.1 Fangzhen Lin (HKUST) Uncertainty 6 / 33

Axioms for Probability All probabilities are between 0 and 1: 0 P(A) 1 Necessarily true propositions have probability 1: P(True) = 1 Necessarily false propositions have probability 0: P(False) = 0 The probability of a disjunction: Set-theoretic view: P(A B) = P(A) + P(B) P(A B) True A A > B B Fangzhen Lin (HKUST) Uncertainty 7 / 33

Prior and Posterior Probabilities Before some evidence is obtained, the probability assessment of a proposition gives the prior probability (or unconditional probability). After the evidence is obtained, the probability assessment gives the posterior probability (or conditional probability). Prior probability: P(A) denotes the probability that proposition A is true. Example: P(Tomorrow = Rainy) = 0.1 Posterior probability: P(A B) denotes the probability that proposition A is true given that proposition B is true. Example: P(Tomorrow = Rainy Today = Rainy) = 0.7 Fangzhen Lin (HKUST) Uncertainty 8 / 33

Product Rule Product rule (chain rule): P(A B) = P(A B)P(B) = P(B A)P(A) Thus if P(A), P(B) > 0, then P(A B) = P(A B) P(B) P(B A) = More general form for multivalued random variables: P(B A) P(A) P(X = x i, Y = y j ) = P(X = x i Y = y j )P(Y = y j ) = P(Y = y j X = x i )P(X = x i ) Or simply P(X, Y ) = P(X Y )P(Y ) = P(Y X )P(X ). The above equations express conditional probabilties in terms of joint probabilties, which are hard to come by. Modern probabilistic reasoning systems try to sidestep joint probabilities and work directly with conditional probabilities. Fangzhen Lin (HKUST) Uncertainty 9 / 33

Bayes Rule Bayes rule: P(A E) = P(E A)P(A) P(E) More general form for multivalued random variables: Or simply, P(Y = y j X = x i ) = P(X = x i Y = y j )P(Y = y j ) P(X = x i ) P(Y X ) = P(X Y )P(Y ) P(X ) Conditioning on some background evidence E: P(Y X, E) = P(X Y, E)P(Y E) P(X E) Fangzhen Lin (HKUST) Uncertainty 10 / 33

Medical Diagnosis Example Given: Find: P(LungCancer Cough) Applying Bayes rule: P(LungCancer Cough) = P(Cough LungCancer) = 0.8 P(LungCancer) = 0.005 P(Cough) = 0.05 = P(Cough LungCancer)P(LungCancer) P(Cough) 0.8 0.005 = 0.08. 0.05 P(LungCancer Cough), as diagnostic knowledge, is usually not directly available in domain knowledge. Instead, P(Cough LungCancer), as causal knowledge, is the more commonly available form. Fangzhen Lin (HKUST) Uncertainty 11 / 33

Combining Evidence Find: P(A E 1, E 2 ) One approach: Absorbing all evidence at once. P(A E 1, E 2 ) = P(E 1, E 2 A)P(A) P(E 1, E 2 ) Another approach: Absorbing evidence one piece at a time. When only E 1 is available: When E 2 also becomes available: P(A E 1 ) = P(A) P(E 1 A) P(E 1 ) P(A E 1, E 2 ) = P(A E 1 ) P(E 2 E 1, A) P(E 2 E 1 ) This process is order independent. = P(A) P(E 1 A) P(E 1 ) P(E 2 E 1, A) P(E 2 E 1 ) (1) Fangzhen Lin (HKUST) Uncertainty 12 / 33

Conditional Independence Estimating P(E 1, E 2 A) or P(E 2 E 1, A) may be computationally unattractive, especially when many pieces of evidence have to be combined. Conditional independence can help. Independence: Two random variables X and Y are independent if P(X Y ) = P(X ), or equivalently P(Y X ) = P(Y ). Knowledge about one variable contains no information about the other. Conditional independence: Two random variables X and Y are conditionally independent given a third variable Z if P(X Y, Z) = P(X Z), or equivalently P(Y X, Z) = P(Y Z). Given background knowledge Z, knowledge about one variable contains no information about the other. Fangzhen Lin (HKUST) Uncertainty 13 / 33

Making Use of Conditional Independence If E 1 and E 2 are conditionally independent given A, then P(A E 1, E 2 ) = P(E 1, E 2 A)P(A) P(E 1, E 2 ) or 1 P(E 1,E 2 ) P(A E 1, E 2 ) = P(A) P(E 1 A) P(E 1 ) = P(E 1 A)P(E 2 A)P(A) P(E 1, E 2 ) P(E 2 E 1, A) P(E 2 E 1 ) = P(A) P(E 1 A)P(E 2 A). P(E 1, E 2 ) is the normalization constant and P(E 1, E 2 ) = P(E 1, E 2 A)P(A) + P(E 1, E 2 A)P( A) = P(E 1 A)P(E 2 A)P(A) + P(E 1 A)P(E 2 A)P( A) Fangzhen Lin (HKUST) Uncertainty 14 / 33

Reasoning with Joint Probability Example: Alarm Story: In LA, burglary and earthquake are not uncommon. They both can cause alarm. In case of alarm, two neighbors John and Mary may call. Variables: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M). Problem: Estimate the probability of a burglary based on who has or has not called. Problem can be solved if joint probability is available. P(B, E, A, J, M) Fangzhen Lin (HKUST) Uncertainty 15 / 33

Reasoning with Joint Probability B E A J M Prob B E A J M Prob y y y y y.00001 n y y y y.0002 y y y y n.000025 n y y y n.0004 y y y n y.000025 n y y n y.0004 y y y n n.00000 n y y n n.0002 y y n y y.00001 n y n y y.0002 y y n y n.000015 n y n y n.0002 y y n n y.000015 n y n n y.0002 y y n n n.0000 n y n n n.0002 y n y y y.00001 n n y y y.0001 y n y y n.000025 n n y y n.0002 y n y n y.000025 n n y n y.0002 y n y n n.0000 n n y n n.0001 y n n y y.00001 n n n y y.0001 y n n y n.00001 n n n y n.0001 y n n n y.00001 n n n n y.0001 y n n n n.00000 n n n n n.996 Fangzhen Lin (HKUST) Uncertainty 16 / 33

Reasoning with Joint Probability P(B=y M=y)? Compute marginal probability: P(B, M) = P(B, E, A, J, M) E,A,J B M Prob y y.000115 y n.000075 n y.00015 n n.99971 P(B=y M=y) = = P(B=y, M=y) P(M=y).000115.000115 +.00015 = 0.43 Fangzhen Lin (HKUST) Uncertainty 17 / 33

Reasoning with Joint Probability Advantages: Clear semantics In theory, can perform arbitrary inference among the variables. Difficulties: Complexity In the Alarm example: 31 numbers needed, quite unnatural to assess: e.g. Many additions in inference. P(B = y, E = y, A = y, J = y, M = y) In general, P(X 1, X 2,..., X n ) needs at least 2 n 1 numbers to specify the joint probabilities. Knowledge acquisition (complex, unnatural) Storage Inference Fangzhen Lin (HKUST) Uncertainty 18 / 33

Reasoning with Joint Probability Solution: Use product rule: P(B, E, A, J, M) = P(B, E, A, J)P(M B, E, A, J) =... = P(B)P(E B)P(A B, E) P(J B, E, A)P(M B, E, A, J). Conditional independences: P(E B) = P(E). P(J B, E, A) = P(J A). P(M B, E, A, J) = P(M A). Thus P(B, E, A, J, M) = P(B)P(E)P(A B, E)P(J A)P(M A). Fangzhen Lin (HKUST) Uncertainty 19 / 33

Probabilities: B P(B) E P(E) A J P(J A) A M P(M A) y.001 y.002 y y.9 y y.7 n y.05 n y.01 B E A P(A B, E) B E A P(A B, E) y y y.95 n y y.29 y n y.94 n n y.001 Only 10 numbers are needed. These numbers are natural to assess. Fangzhen Lin (HKUST) Uncertainty 20 / 33

Belief Networks Draw an arc to each variable from each of its conditioning variables. Burglary P(B).001 Earthquake P(E).002 Alarm B T T F F E T F T F P(A).95.94.29.001 JohnCalls A T F P(J).90.05 MaryCalls A T F P(M).70.01 Attach to each variable its conditional probability table (CPT). Result: A belief network. Also know as Bayesian networks, Bayesian belief networks, probabilistic influence diagrams. Fangzhen Lin (HKUST) Uncertainty 21 / 33

Belief Networks A belief network is a directed acyclic graph (DAG): Nodes: variables Links: direct probabilistic dependencies between variables Node attributes: conditional probability tables (CPT) It is a compact graphical representation of the joint probability distribution of the random variables: The global joint probability is expressed as the product of local conditional probabilities: n P(X 1,..., X n ) = P(X i Parents(X i )) where Parent(X i ) is the set of parent nodes of X i in the network. Example: i=1 P(J M A B E) = P(J A)P(M A)P(A B E)P( B)P( E) = 0.90 0.70 0.001 0.999 0.998 = 0.00062 Fangzhen Lin (HKUST) Uncertainty 22 / 33

Belief Networks - Qualitative Level Qualitative level: Network structure Burglary Earthquake Alarm JohnCalls MaryCalls A depends on B and E. J and M are conditionally independent given the event A. Fangzhen Lin (HKUST) Uncertainty 23 / 33

Belief Networks - Numerical Level Numerical level: Conditional probabilities Burglary P(B).001 Earthquake P(E).002 Alarm B T T F F E T F T F P(A).95.94.29.001 JohnCalls A T F P(J).90.05 MaryCalls A T F P(M).70.01 The CPT for a Boolean variable with n Boolean parents contains 2 n rows. B and E have no parent nodes. The conditional probabilities thus degenerate to prior probabilities. Fangzhen Lin (HKUST) Uncertainty 24 / 33

Conditional Independence in BN BN expresses the conditional independence of a node and its predecessors, given its parents. A more general question is: given any nodes X and Y, are they independent given a set of nodes E? If X and Y are d-seperated by E, then they are conditionally independent given E. Fangzhen Lin (HKUST) Uncertainty 25 / 33

D-Seperation X and Y are d-seperated by E if for every undirected path from X to Y, there is a node Z on the path such that one of the following conditions holds: 1 Both path arrows lead in to Z, and neither Z nor any descendants of Z are in E. 2 Z is in E, and it is not the case that both path arrows lead in to Z. (1) X Z E Y (2) Z (3) Z Fangzhen Lin (HKUST) Uncertainty 26 / 33

An Example of Conditonal Independence Consider the following example BN: Battery Radio Ignition Gas Starts Moves Gas and Radio are conditionally independent given Ignition. Gas and Radio are conditionally independent given Battery. Gas and Radio are marginally independent (conditional independent given no evidence). Gas and Radio are not conditionally independent given Starts. Fangzhen Lin (HKUST) Uncertainty 27 / 33

Network Construction Procedure 1 Choose the set of relevant variables X i that describe the domain. 2 Choose an ordering for the variables. 3 While there are variables left: 1 Pick a variable X i and add the corresponding node to the network under construction. 2 Set Parents(X i ) to some minimal set of nodes already in the network such that the following conditional independence property holds: P(X i X 1,..., X i 1 ) = P(X i Parents(X i )) where Parents(X i ) {X 1,..., X i 1 }. 3 Define the CPT for X i. Fangzhen Lin (HKUST) Uncertainty 28 / 33

Node Ordering The resulting network structure depends on the order: Example 1 (left): in the order M, J, A, B, E Example 2 (right): in the order M, J, E, B, A MaryCalls MaryCalls JohnCalls JohnCalls Alarm Earthquake Burglary Burglary Earthquake Alarm Which order? Fangzhen Lin (HKUST) Uncertainty 29 / 33

Which Order? Some guidelines: 1 Naturalness of probability assessment. The order M, J, E, B, A is not good because, for instance, P(B J, M, E) is unnatural and hence difficult to assess directly. 2 Use causal relationships: causes come before their effects. The order M, J, E, B, A is not good because, for instance, M and J are effects of A but come before A. 3 Minimize number of arcs. The order M, J, E, B, A is bad because there are too many arcs. The order B, E, A, J, M is good. Fangzhen Lin (HKUST) Uncertainty 30 / 33

Causality and Bayesian Networks Arcs in a Bayesian network can usually (but not always) be interpreted as indicating cause-effect relationships. They are from causes to effects (causal network). Making use of cause-effect relationships in Bayesian network construction: Choose a set of variables that describes the domain. Draw an arc to a variable from each of its DIRECT causes. Assess the conditional probability of each node given its parents. Examples 1 Poole s example: tampering, fire, smoke, alarm, people-leaving. 2 Wet shoes wet-grass, rain, sprinkler, faulty-sprinkler, walking-on-grass, wet-shoe Fangzhen Lin (HKUST) Uncertainty 31 / 33

Inference Any node can serve as either a query variable or an evidence variable. Four inference types: Diagnostic inference: From effects to causes. Example: P(Burglary JohnCalls) Causal inference: From causes to effects. Example: P(JohnCalls Burglary) Intercausal inference: Between causes of a common effect. Example: P(Burglary Alarm, Earthquake) (Even though burglaries and earthquakes are independent, the presence of one makes the other less likely.) Mixed inference: Combining two or more of the above. Example: P(Alarm JohnCalls, Earthquake) (Simultaneous use of diagnostic and causal inferences) Example: P(Burglary JohnCalls, Earthquake) (Simultaneous use of diagnostic and intercausal inferences) Fangzhen Lin (HKUST) Uncertainty 32 / 33

Inference (cont d) Examples: Q E Q E E Q E Q E (Explaining Away) Diagnostic Causal Intercausal Mixed There exist efficient algorithms for all types of inference. Fangzhen Lin (HKUST) Uncertainty 33 / 33