Uncertainty and Bayesian Networks

Tutorial 3

Outline

Uncertainty
Probability: syntax and semantics
Inference
Independence and Bayes' rule
Bayesian networks: syntax and semantics
Parameterized distributions
Exact inference by enumeration
Exact inference by variable elimination

Uncertainty

Let action A_t = leave for airport t minutes before flight.
Will A_t get me there on time?

Problems:
1) partial observability (road state, other drivers' plans, etc.)
2) noisy sensors (KCBS traffic reports)
3) uncertainty in action outcomes (flat tire, etc.)
4) immense complexity of modelling and predicting traffic

Hence a purely logical approach either
1) risks falsehood: "A_25 will get me there on time", or
2) leads to conclusions that are too weak for decision making:
   "A_25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc., etc."
(A_1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport...)

Methods for handling uncertainty

Probabilistic assertions summarize effects of
  laziness: failure to enumerate exceptions, qualifications, etc.
  ignorance: lack of relevant facts, initial conditions, etc.

Subjective or Bayesian probability:
Probabilities relate propositions to one's own state of knowledge,
e.g., P(A_25 | no reported accidents) = 0.06

These are not claims of a probabilistic tendency in the current situation
(but might be learned from past experience of similar situations).

Probabilities of propositions change with new evidence:
e.g., P(A_25 | no reported accidents, 5 a.m.) = 0.15

(Analogous to logical entailment status KB ⊨ α, not truth.)

Making decisions under uncertainty

Suppose I believe the following:
  P(A_25 gets me there on time | ...) = 0.04
  P(A_90 gets me there on time | ...) = 0.70
  P(A_120 gets me there on time | ...) = 0.95
  P(A_1440 gets me there on time | ...) = 0.9999

Which action to choose?
Depends on my preferences for missing the flight vs. airport cuisine, etc.

Utility theory is used to represent and infer preferences.
Decision theory = utility theory + probability theory
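As a concrete illustration of the decision-theoretic combination, here is a minimal Python sketch that picks among the A_t actions by expected utility. The slide gives only the probabilities; the utility of making/missing the flight and the per-minute waiting cost below are hypothetical numbers invented for illustration.

p_on_time = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}   # probabilities from the slide
U_MAKE, U_MISS = 1000.0, 0.0    # hypothetical utilities of making / missing the flight
WAIT_COST = 0.5                 # hypothetical cost per minute of slack at the airport

def expected_utility(t):
    """EU(A_t) = P(on time) * U(make) + P(late) * U(miss) - waiting cost."""
    p = p_on_time[t]
    return p * U_MAKE + (1 - p) * U_MISS - WAIT_COST * t

best = max(p_on_time, key=expected_utility)
print({t: round(expected_utility(t), 1) for t in p_on_time}, "best:", best)
# With these (made-up) utilities, A_120 comes out best.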

Probability basics

Begin with a set Ω, the sample space, e.g., the 6 possible rolls of a die.
ω ∈ Ω is a sample point / possible world / atomic event.

A probability space or probability model is a sample space with an assignment P(ω) for every ω ∈ Ω s.t.
  0 ≤ P(ω) ≤ 1
  Σ_ω P(ω) = 1
e.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.

An event A is any subset of Ω:
  P(A) = Σ_{ω ∈ A} P(ω)
E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
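A minimal Python sketch of this probability model for the die: the event probability is computed by summing P(ω) over the sample points in the event, exactly as in the definition above.

omega = {1, 2, 3, 4, 5, 6}              # sample space: the six rolls of a die
P = {w: 1 / 6 for w in omega}           # P(w) for every sample point

def prob(event):
    """P(A) = sum of P(w) over the sample points w in A."""
    return sum(P[w] for w in event)

roll_less_than_4 = {w for w in omega if w < 4}
print(prob(roll_less_than_4))           # ~0.5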

Propositions

Think of a proposition as the event (set of sample points) where the proposition is true.

Given Boolean random variables A and B:
  event a = set of sample points where A(ω) = true
  event ¬a = set of sample points where A(ω) = false
  event a ∧ b = points where A(ω) = true and B(ω) = true

Often in AI applications, the sample points are defined by the values of a set of random variables,
i.e., the sample space is the Cartesian product of the ranges of the variables.

With Boolean variables, sample point = propositional logic model,
e.g., A = true, B = false, or a ∧ ¬b.

Proposition = disjunction of atomic events in which it is true,
e.g., (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)

Syntax for propositions

Propositional or Boolean random variables,
e.g., Cavity (do I have a cavity?).
Cavity = true is a proposition, also written cavity.

Discrete random variables (finite or infinite),
e.g., Weather is one of ⟨sunny, rain, cloudy, snow⟩.
Weather = rain is a proposition.
Values must be exhaustive and mutually exclusive.

Continuous random variables (bounded or unbounded),
e.g., Temp = 21.6; also allow, e.g., Temp < 22.0.

Arbitrary Boolean combinations of basic propositions.

Prior probability

Prior or unconditional probabilities of propositions,
e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72,
correspond to belief prior to arrival of any (new) evidence.

Probability distribution gives values for all possible assignments:
  P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩ (normalized, i.e., sums to 1)

Joint probability distribution for a set of r.v.s gives the probability of every atomic event on those r.v.s (i.e., every sample point).
P(Weather, Cavity) is a 4 × 2 matrix of values:

                   Weather = sunny   rain   cloudy   snow
  Cavity = true             0.144    0.02    0.016   0.02
  Cavity = false            0.576    0.08    0.064   0.08

Every question about a domain can be answered by the joint distribution,
because every event is a sum of sample points.
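A sketch of that idea in Python: the Weather × Cavity joint from the table above is stored as a dict over sample points, and any query is answered by summing the matching entries, i.e., "every event is a sum of sample points".

joint = {  # (weather, cavity) -> probability, values from the table above
    ("sunny", True): 0.144, ("rain", True): 0.02, ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08, ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def prob(pred):
    """Sum P over the sample points satisfying the predicate."""
    return sum(p for (w, c), p in joint.items() if pred(w, c))

print(prob(lambda w, c: c))                   # P(cavity) ~ 0.2
print(prob(lambda w, c: w == "sunny" or c))   # P(sunny or cavity) ~ 0.776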

Conditional probability

Definition of conditional probability:
  P(a | b) = P(a ∧ b) / P(b)   if P(b) ≠ 0

Product rule gives an alternative formulation:
  P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

A general version holds for whole distributions, e.g.,
  P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View as a 4 × 2 set of equations, not matrix multiplication.)

Chain rule is derived by successive application of the product rule:
  P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
                   = P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
                   = ...
                   = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})
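A quick numeric check of the definition and the product rule, sketched on the same Weather × Cavity joint as above (restated here so the snippet is self-contained).

joint = {
    ("sunny", True): 0.144, ("rain", True): 0.02, ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08, ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def prob(pred):
    return sum(p for (w, c), p in joint.items() if pred(w, c))

p_cavity = prob(lambda w, c: c)                            # P(cavity) ~ 0.2
p_rain_and_cavity = prob(lambda w, c: w == "rain" and c)   # P(rain ∧ cavity) ~ 0.02
p_rain_given_cavity = p_rain_and_cavity / p_cavity         # P(rain | cavity) ~ 0.1

# product rule: P(rain ∧ cavity) = P(rain | cavity) P(cavity)
assert abs(p_rain_given_cavity * p_cavity - p_rain_and_cavity) < 1e-12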

Independence

A and B are independent iff
  P(A | B) = P(A)  or  P(B | A) = P(B)  or  P(A, B) = P(A) P(B)

e.g., P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

Catch is conditionally independent of Toothache given Cavity:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)

In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.

Conditional independence is our most basic and robust form of knowledge about uncertain environments.
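The conditional independence of Catch and Toothache given Cavity can be checked numerically from a full joint. The sketch below uses the standard AIMA dentist-domain joint as illustrative numbers; those values are an assumption here and do not appear on the slide.

joint = {  # (toothache, catch, cavity) -> probability; assumed AIMA dentist-domain values
    (True, True, True): 0.108,   (True, True, False): 0.016,
    (True, False, True): 0.012,  (True, False, False): 0.064,
    (False, True, True): 0.072,  (False, True, False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

def prob(pred):
    return sum(p for k, p in joint.items() if pred(*k))

for cav in (True, False):
    # P(catch | toothache, Cavity = cav) vs. P(catch | Cavity = cav)
    lhs = prob(lambda t, c, v: c and t and v == cav) / prob(lambda t, c, v: t and v == cav)
    rhs = prob(lambda t, c, v: c and v == cav) / prob(lambda t, c, v: v == cav)
    assert abs(lhs - rhs) < 1e-9   # equal for both values of Cavity: Catch ⟂ Toothache | Cavity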

Bayes' Rule

Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)

or in distribution form:
  P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)

Useful for assessing diagnostic probability from causal probability:
  P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

E.g., let M be meningitis, S be stiff neck:
  P(m | s) = P(s | m) P(m) / P(s) = 0.8 × 0.0001 / 0.1 = 0.0008

Note: the posterior probability of meningitis is still very small!
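The meningitis example as a one-line application of Bayes' rule, using only the numbers quoted above.

p_s_given_m = 0.8     # P(s | m): stiff neck given meningitis
p_m = 0.0001          # P(m): prior probability of meningitis
p_s = 0.1             # P(s): prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s   # Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
print(p_m_given_s)                      # 0.0008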

Bayesian networks

A simple, graphical notation for conditional independence assertions,
and hence for compact specification of full joint distributions.

Syntax:
  a set of nodes, one per variable
  a directed, acyclic graph (link ≈ "directly influences")
  a conditional distribution for each node given its parents: P(X_i | Parents(X_i))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT)
giving the distribution over X_i for each combination of parent values.

Compactness

A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values.

Each row requires one number p for X_i = true
(the number for X_i = false is just 1 − p).

If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers,
i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.

[Figure: the burglary network with nodes B, E, A, J, M]

For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
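A small sketch of the parameter bookkeeping, assuming the burglary-network structure named above (B and E with no parents, A with parents B and E, J and M with parent A).

parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

cpt_rows = {x: 2 ** len(ps) for x, ps in parents.items()}   # one number per CPT row
print(sum(cpt_rows.values()))     # 1 + 1 + 4 + 2 + 2 = 10 numbers for the network
print(2 ** len(parents) - 1)      # 31 numbers for the full joint over 5 Boolean variables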

Global semantics

Global semantics defines the full joint distribution as the product of the local conditional distributions:
  P(x_1, ..., x_n) = Π_{i=1}^{n} P(x_i | parents(X_i))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    ≈ 0.00063

[Figure: the burglary network with nodes B, E, A, J, M]
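The same product, evaluated directly from the CPT entries quoted on the slide.

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) as a product of local conditional probabilities
p_j_given_a, p_m_given_a = 0.90, 0.70
p_a_given_not_b_not_e = 0.001
p_not_b, p_not_e = 0.999, 0.998

p = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(p)   # ~0.00063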


Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.

[Figure: node X with parents U_1 ... U_m, children Y_1 ... Y_n, and the children's other parents Z_1j ... Z_nj]

Theorem: Local semantics ⇔ global semantics

Markov blanket

Each node is conditionally independent of all others given its Markov blanket:
parents + children + children's parents.

[Figure: the Markov blanket of X — parents U_1 ... U_m, children Y_1 ... Y_n, children's other parents Z_1j ... Z_nj]

Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n:
     add X_i to the network
     select parents from X_1, ..., X_{i-1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i-1})

This choice of parents guarantees the global semantics:
  P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})   (chain rule)
                   = Π_{i=1}^{n} P(X_i | Parents(X_i))         (by construction)

Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.

Simple query on the burglary network:
  P(B | j, m) = P(B, j, m) / P(j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, e, a, j, m)

Rewrite full joint entries using products of CPT entries:
  P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
              = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

Recursive depth-first enumeration: O(n) space, O(d^n) time.
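A compact Python sketch of this enumeration for P(B | j, m) on the burglary network. The full CPT values used below are the standard AIMA ones and are an assumption here; the slide itself quotes only a few of them.

P_B, P_E = 0.001, 0.002   # assumed priors P(b), P(e)
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}  # P(a | b, e)
P_J = {True: 0.90, False: 0.05}   # P(j | a)
P_M = {True: 0.70, False: 0.01}   # P(m | a)

def joint(b, e, a, j, m):
    """Global semantics: product of the local conditional probabilities."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

def posterior_b(j=True, m=True):
    """P(B | j, m) = alpha * sum_e sum_a P(B, e, a, j, m)."""
    unnorm = [sum(joint(b, e, a, j, m) for e in (True, False) for a in (True, False))
              for b in (True, False)]
    alpha = 1 / sum(unnorm)
    return [alpha * u for u in unnorm]

print(posterior_b())   # roughly [0.284, 0.716] with these CPT values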

Inference by variable elimination

Variable elimination: carry out summations right-to-left,
storing intermediate results (factors) to avoid recomputation.

  P(B | j, m) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
              = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) f_M(a)
              = α P(B) Σ_e P(e) Σ_a P(a | B, e) f_J(a) f_M(a)
              = α P(B) Σ_e P(e) Σ_a f_A(a, b, e) f_J(a) f_M(a)
              = α P(B) Σ_e P(e) f_ĀJM(b, e)      (sum out A)
              = α P(B) f_ĒĀJM(b)                 (sum out E)
              = α f_B(b) × f_ĒĀJM(b)
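The same query computed right-to-left with stored factors, mirroring the derivation above (same assumed CPT values as in the enumeration sketch).

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

# f_AJM(b, e) = sum_a P(a | b, e) P(j | a) P(m | a)   (sum out A)
f_ajm = {(b, e): sum((P_A[(b, e)] if a else 1 - P_A[(b, e)]) * P_J[a] * P_M[a]
                     for a in (True, False))
         for b in (True, False) for e in (True, False)}

# f_EAJM(b) = sum_e P(e) f_AJM(b, e)   (sum out E)
f_eajm = {b: sum((P_E if e else 1 - P_E) * f_ajm[(b, e)] for e in (True, False))
          for b in (True, False)}

unnorm = {b: (P_B if b else 1 - P_B) * f_eajm[b] for b in (True, False)}
alpha = 1 / sum(unnorm.values())
print({b: alpha * p for b, p in unnorm.items()})   # same answer as enumeration, ~{True: 0.284, False: 0.716}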

Complexity of exact inference

Singly connected networks (or polytrees):
  any two nodes are connected by at most one (undirected) path
  time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
  can reduce 3SAT to exact inference ⇒ NP-hard
  equivalent to counting 3SAT models ⇒ #P-complete

[Figure: 3SAT reduction example — root variables A, B, C, D each with prior 0.5, three clause nodes over these variables, feeding an AND node]

Summary

Probability is a rigorous formalism for uncertain knowledge.
Joint probability distribution specifies probability of every atomic event.
Queries can be answered by summing over atomic events.
For nontrivial domains, we must find a way to reduce the joint size.
Independence and conditional independence provide the tools.

Bayes nets provide a natural representation for (causally induced) conditional independence.
Topology + CPTs = compact representation of joint distribution.
Generally easy for (non)experts to construct.
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs.
Continuous variables ⇒ parameterized distributions (e.g., linear Gaussian).

Exercises

1. Show from first principles that P(a | b ∧ a) = 1.

2. Prove that any 3-SAT problem can be reduced to exact inference in a Bayesian network constructed to represent the particular problem, and hence that exact inference is NP-hard.

Answers

1. The first principles needed here are the definition of conditional probability, P(X | Y) = P(X ∧ Y) / P(Y), and the definitions of the logical connectives. It is not enough to say that if B ∧ A is given then A must be true! From the definition of conditional probability, and the facts that A ∧ A ⇔ A and that conjunction is commutative and associative, we have

  P(A | B ∧ A) = P(A ∧ (B ∧ A)) / P(B ∧ A) = P(B ∧ A) / P(B ∧ A) = 1

2. Consider a SAT problem such as the following:

  (¬A ∨ B) ∧ (¬B ∨ C) ∧ (¬C ∨ D) ∧ (¬C ∨ D ∨ E)

The idea is to encode this as a Bayes net, such that doing inference in the Bayes net gives the answer to the SAT problem.

The figure shows the Bayes net corresponding to this SAT problem. The general construction method is as follows:

The root nodes correspond to the logical variables of the SAT problem. They have a prior probability of 0.5.

Each clause C_i is a node. Its parents are the variables in the clause. The CPT is deterministic and implements the disjunction given in the clause. (Negative literals in the clause are indicated by negation symbols on the links in the figure.)

A single sentence node S has all the clauses as parents and a CPT that implements deterministic conjunction. It is clear that P(S) > 0 iff the SAT problem is satisfiable. Hence, we have reduced SAT to Bayes net inference. Since SAT is NP-complete, we have shown that Bayes net inference is NP-hard (even without evidence).
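A sketch of the whole reduction in Python, using the clause list of the example formula as reconstructed above (the exact placement of negations is an assumption): root variables with prior 0.5, deterministic clause nodes, and a sentence node S whose probability is the fraction of satisfying root assignments, so that P(S) > 0 iff the formula is satisfiable.

from itertools import product

variables = ["A", "B", "C", "D", "E"]
# each clause is a list of (variable, polarity) literals; these match the formula above
clauses = [[("A", False), ("B", True)],
           [("B", False), ("C", True)],
           [("C", False), ("D", True)],
           [("C", False), ("D", True), ("E", True)]]

def prob_S():
    """P(S): sum of 2^-n over the root assignments in which every clause is satisfied."""
    p = 0.0
    for values in product([True, False], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(any(world[v] == pol for v, pol in clause) for clause in clauses):
            p += 0.5 ** len(variables)
    return p

print(prob_S())   # 0.3125 for this clause list: > 0, so the formula is satisfiable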