BAYESIAN NETWORKS
AIMA2e Chapter 14.1-5 (some topics excluded)



Outline

- Syntax
- Semantics
- Parameterized distributions
- Inference
  - Exact inference (enumeration, variable elimination)
  - Approximate inference (stochastic simulation)

Bayesian networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.

Syntax:
- Random variables: a set of nodes, one per variable
- Topology: a directed, acyclic graph (link $\approx$ "directly influences")
- Probabilities: a conditional distribution for each node given its parents: $\mathbf{P}(X_i \mid Parents(X_i))$

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.

Example

Topology of network encodes conditional independence assertions:

[Figure: Weather (no links); Cavity -> Toothache; Cavity -> Catch]

- Weather is independent of the other variables
- Toothache and Catch are conditionally independent given Cavity

$P(Toothache, Catch, Cavity, Weather) = P(Toothache \mid Cavity)\, P(Catch \mid Cavity)\, P(Cavity)\, P(Weather)$

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

Example contd.

[Figure: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls; Alarm -> MaryCalls]

P(B) = .001          P(E) = .002

B  E  | P(A | B, E)
T  T  | .95
T  F  | .94
F  T  | .29
F  F  | .001

A  | P(J | A)        A  | P(M | A)
T  | .90             T  | .70
F  | .05             F  | .01

Compactness

A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values.

Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is just $1 - p$).

If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers.

I.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution.

For burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).
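The count above can be checked mechanically. The following is a minimal sketch, assuming the burglary topology from the example slides; the names `PARENTS`, `cpt_numbers`, and `full_joint_numbers` are illustrative and not part of the slides.

```python
# Parent lists for the burglary network (all variables Boolean).
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}


def cpt_numbers(parents):
    """One independent number per CPT row; a node with k Boolean parents has 2**k rows."""
    return sum(2 ** len(ps) for ps in parents.values())


def full_joint_numbers(n):
    """Independent entries in a full joint table over n Boolean variables."""
    return 2 ** n - 1


network_count = cpt_numbers(PARENTS)
joint_count = full_joint_numbers(len(PARENTS))
print(network_count, joint_count)
```

The gap widens quickly: the network count grows with the largest parent set, while the joint count doubles with every added variable.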

Global semantics

Global semantics defines the full joint distribution as the product of the local conditional distributions:

$\mathbf{P}(X_1, \ldots, X_n) = \prod_{i=1}^{n} \mathbf{P}(X_i \mid Parents(X_i))$

e.g., $P(j \land m \land a \land \lnot b \land \lnot e) = P(j \mid a)\, P(m \mid a)\, P(a \mid \lnot b, \lnot e)\, P(\lnot b)\, P(\lnot e)$
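As a numeric illustration of the product formula, the sketch below evaluates the example entry on the burglary network. The CPT values follow the example slide, except P(j | a) = 0.90, which is taken from the textbook version of this network and should be treated as an assumption here. The helper names are hypothetical.

```python
# CPT entries for the burglary network: probability that each variable is true.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # keyed by (b, e)
P_J = {True: 0.90, False: 0.05}  # keyed by a; 0.90 assumed from the textbook network
P_M = {True: 0.70, False: 0.01}  # keyed by a


def pr(p_true, value):
    """Probability of `value` for a Boolean variable whose P(true) is p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Full joint entry as the product of the local conditional probabilities."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))


# P(j, m, a, not b, not e) = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.6f}")
```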

Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.

[Figure: node $X$ with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$]

Theorem: Local semantics $\Leftrightarrow$ global semantics

Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

[Figure: node $X$ with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$]
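The definition translates directly into a graph lookup. This is a small sketch, assuming the burglary topology from the earlier example; `markov_blanket` is an illustrative helper, not something defined in the slides.

```python
# Parent lists for the burglary network.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}


def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # the node itself is not part of its own blanket
    return blanket


print(markov_blanket("Burglary", PARENTS))
print(markov_blanket("Alarm", PARENTS))
```

For Burglary the blanket contains Earthquake even though the two are not linked directly, because they share the child Alarm.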

Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that $\mathbf{P}(X_i \mid Parents(X_i)) = \mathbf{P}(X_i \mid X_1, \ldots, X_{i-1})$

This choice of parents guarantees the global semantics:

$\mathbf{P}(X_1, \ldots, X_n) = \prod_{i=1}^{n} \mathbf{P}(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} \mathbf{P}(X_i \mid Parents(X_i))$ (by construction)

Example

Suppose we choose the ordering M, J, A, B, E

$P(J \mid M) = P(J)$? No
$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
$P(B \mid A, J, M) = P(B \mid A)$? Yes
$P(B \mid A, J, M) = P(B)$? No
$P(E \mid B, A, J, M) = P(E \mid A)$? No
$P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes

[Figure: resulting network, built up node by node: MaryCalls -> JohnCalls; MaryCalls, JohnCalls -> Alarm; Alarm -> Burglary; Alarm, Burglary -> Earthquake]

Example contd.

[Figure: network over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake for the ordering M, J, A, B, E]

Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)

Assessing conditional probabilities is hard in noncausal directions.

Network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed

Example: Car diagnosis

Initial evidence: car won't start
Testable variables (green), "broken, so fix it" variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters

[Figure: network over battery age, alternator broken, fanbelt broken, battery dead, no charging, battery meter, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, car won't start, dipstick]

Example: Car insurance

[Figure: network over SocioEcon, Age, GoodStudent, Mileage, RiskAversion, SeniorTrain, ExtraCar, VehicleYear, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident, Theft, OwnDamage, Cushioning, OtherCost, OwnCost, MedicalCost, LiabilityCost, PropertyCost]

Compact conditional distributions

CPT grows exponentially with no. of parents
CPT becomes infinite with continuous-valued parent or child

Solution: canonical distributions that are defined compactly

Deterministic nodes are the simplest case:
$X = f(Parents(X))$ for some function $f$

E.g., Boolean functions
$NorthAmerican \Leftrightarrow Canadian \lor US \lor Mexican$

E.g., numerical relationships among continuous variables
$\frac{\partial Level}{\partial t} = \text{inflow} + \text{precipitation} - \text{outflow} - \text{evaporation}$

Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes
1) Parents $U_1, \ldots, U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone

$\Rightarrow P(X \mid U_1, \ldots, U_j, \lnot U_{j+1}, \ldots, \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$

Cold  Flu  Malaria | P(Fever) | P(¬Fever)
F     F    F       | 0.0      | 1.0
F     F    T       | 0.9      | 0.1
F     T    F       | 0.8      | 0.2
F     T    T       | 0.98     | 0.02 = 0.2 × 0.1
T     F    F       | 0.4      | 0.6
T     F    T       | 0.94     | 0.06 = 0.6 × 0.1
T     T    F       | 0.88     | 0.12 = 0.6 × 0.2
T     T    T       | 0.988    | 0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
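The whole table follows from three numbers. The sketch below regenerates it from the failure probabilities read off the single-cause rows (0.6 for Cold, 0.2 for Flu, 0.1 for Malaria); the names `Q`, `noisy_or`, and `table` are illustrative.

```python
from itertools import product

# Failure probabilities q_i: chance that a present cause fails to produce Fever.
Q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or(active_causes, q=Q):
    """P(effect = true) given the list of causes that are present (no leak node)."""
    p_all_fail = 1.0
    for cause in active_causes:
        p_all_fail *= q[cause]
    return 1.0 - p_all_fail


# Rebuild the full CPT; keys are (Cold, Flu, Malaria) truth values.
table = {}
for values in product([False, True], repeat=len(Q)):
    active = [cause for cause, present in zip(Q, values) if present]
    table[values] = noisy_or(active)

for values, p_fever in table.items():
    print(values, round(p_fever, 3), round(1.0 - p_fever, 3))
```

Eight rows come out of three parameters, which is the linear growth the slide refers to.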

Continuous nodes

Networks may have discrete RVs, continuous RVs, or a mix of the two.

- All continuous (e.g., conditional Gaussian). Linear dynamic systems (Kalman filter).
- Discrete parents, continuous children (e.g., conditional Gaussian). Gaussian mixture models.
- Continuous parents, discrete children (e.g., probit and logit functions). Difficult to deal with.

Inference tasks

Simple queries: compute posterior marginal $\mathbf{P}(X_i \mid \mathbf{E} = \mathbf{e})$
e.g., $P(NoGas \mid Gauge = empty, Lights = on, Starts = false)$

Conjunctive queries: $\mathbf{P}(X_i, X_j \mid \mathbf{E} = \mathbf{e}) = \mathbf{P}(X_i \mid \mathbf{E} = \mathbf{e})\, \mathbf{P}(X_j \mid X_i, \mathbf{E} = \mathbf{e})$

Optimal decisions: decision networks include utility information; probabilistic inference required for $P(outcome \mid action, evidence)$

Value of information: which evidence to seek next?

Sensitivity analysis: which probability values are most critical?

Explanation: why do I need a new starter motor?

Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation

Simple query on the burglary network:

$\mathbf{P}(B \mid j, m) = \mathbf{P}(B, j, m) / P(j, m)$
$= \alpha\, \mathbf{P}(B, j, m)$
$= \alpha \sum_e \sum_a \mathbf{P}(B, e, a, j, m)$

Rewrite full joint entries using product of CPT entries:

$\mathbf{P}(B \mid j, m) = \alpha \sum_e \sum_a \mathbf{P}(B)\, P(e)\, \mathbf{P}(a \mid B, e)\, P(j \mid a)\, P(m \mid a)$
$= \alpha\, \mathbf{P}(B) \sum_e P(e) \sum_a \mathbf{P}(a \mid B, e)\, P(j \mid a)\, P(m \mid a)$

Recursive depth-first enumeration: $O(n)$ space, $O(d^n)$ time

Enumeration algorithm

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
   inputs: X, the query variable
           e, observed values for variables E
           bn, a Bayesian network with variables {X} ∪ E ∪ Y

   Q(X) ← a distribution over X, initially empty
   for each value x_i of X do
       extend e with value x_i for X
       Q(x_i) ← ENUMERATE-ALL(VARS[bn], e)
   return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
   if EMPTY?(vars) then return 1.0
   Y ← FIRST(vars)
   if Y has value y in e
       then return P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
       else return Σ_y P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e_y)
            where e_y is e extended with Y = y
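The pseudocode above can be sketched in Python roughly as follows. This is one possible rendering, restricted to Boolean variables and run on the burglary network; the dictionary layout of `BN` and the helper `prob` are assumptions made for this sketch, and P(j | a) = 0.90 is taken from the textbook version of the network.

```python
# variable -> (parents, CPT mapping tuples of parent values to P(variable = True)).
# Variables are listed in topological order, which ENUMERATE-ALL relies on.
BN = {
    "B": ((), {(): 0.001}),
    "E": ((), {(): 0.002}),
    "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    "J": (("A",), {(True,): 0.90, (False,): 0.05}),
    "M": (("A",), {(True,): 0.70, (False,): 0.01}),
}


def prob(var, value, e, bn):
    """P(var = value | parents(var)), with parent values read from the event e."""
    parents, cpt = bn[var]
    p_true = cpt[tuple(e[p] for p in parents)]
    return p_true if value else 1.0 - p_true


def enumerate_all(variables, e, bn):
    """Sum the joint over all variables in `variables` that e leaves unassigned."""
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in e:
        return prob(first, e[first], e, bn) * enumerate_all(rest, e, bn)
    return sum(prob(first, y, e, bn) * enumerate_all(rest, {**e, first: y}, bn)
               for y in (True, False))


def enumeration_ask(x, e, bn):
    """Posterior distribution over Boolean query variable x given evidence e."""
    q = {xi: enumerate_all(list(bn), {**e, x: xi}, bn) for xi in (True, False)}
    total = sum(q.values())
    return {xi: p / total for xi, p in q.items()}


posterior = enumeration_ask("B", {"J": True, "M": True}, BN)
print({value: round(p, 3) for value, p in posterior.items()})
```

With both neighbors calling, the posterior probability of a burglary comes out near 0.284.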

Evaluation tree

Enumeration is inefficient: repeated computation
e.g., computes $P(j \mid a)\, P(m \mid a)$ for each value of $e$

[Figure: evaluation tree for the query. Root: P(b) = .001. Branches: P(e) = .002, P(¬e) = .998. Under e: P(a | b, e) = .95, P(¬a | b, e) = .05. Under ¬e: P(a | b, ¬e) = .94, P(¬a | b, ¬e) = .06. Below every branch: P(j | a) = .90, P(j | ¬a) = .05, then P(m | a) = .70, P(m | ¬a) = .01]

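Inference by variable elimination

Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation

$\mathbf{P}(B \mid j, m) = \alpha \underbrace{\mathbf{P}(B)}_{B} \sum_e \underbrace{P(e)}_{E} \sum_a \underbrace{\mathbf{P}(a \mid B, e)}_{A} \underbrace{P(j \mid a)}_{J} \underbrace{P(m \mid a)}_{M}$
$= \alpha\, \mathbf{P}(B) \sum_e P(e) \sum_a \mathbf{P}(a \mid B, e)\, P(j \mid a)\, f_M(a)$
$= \alpha\, \mathbf{P}(B) \sum_e P(e) \sum_a \mathbf{P}(a \mid B, e)\, f_J(a)\, f_M(a)$
$= \alpha\, \mathbf{P}(B) \sum_e P(e) \sum_a f_A(a, b, e)\, f_J(a)\, f_M(a)$
$= \alpha\, \mathbf{P}(B) \sum_e P(e)\, f_{\bar{A}JM}(b, e)$ (sum out A)
$= \alpha\, \mathbf{P}(B)\, f_{\bar{E}\bar{A}JM}(b)$ (sum out E)
$= \alpha\, f_B(b) \times f_{\bar{E}\bar{A}JM}(b)$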

Variable elimination: Basic operations

Summing out a variable from a product of factors:
- move any constant factors outside the summation
- add up submatrices in pointwise product of remaining factors

$\sum_x f_1 \times \cdots \times f_k = f_1 \times \cdots \times f_i \sum_x f_{i+1} \times \cdots \times f_k = f_1 \times \cdots \times f_i \times f_{\bar{X}}$

assuming $f_1, \ldots, f_i$ do not depend on $X$

Pointwise product of factors $f_1$ and $f_2$:

$f_1(x_1, \ldots, x_j, y_1, \ldots, y_k) \times f_2(y_1, \ldots, y_k, z_1, \ldots, z_l) = f(x_1, \ldots, x_j, y_1, \ldots, y_k, z_1, \ldots, z_l)$

E.g., $f_1(a, b) \times f_2(b, c) = f(a, b, c)$

Variable elimination algorithm

function ELIMINATION-ASK(X, e, bn) returns a distribution over X
   inputs: X, the query variable
           e, evidence specified as an event
           bn, a belief network specifying joint distribution P(X_1, ..., X_n)

   factors ← []; vars ← REVERSE(VARS[bn])
   for each var in vars do
       factors ← [MAKE-FACTOR(var, e) | factors]
       if var is a hidden variable then factors ← SUM-OUT(var, factors)
   return NORMALIZE(POINTWISE-PRODUCT(factors))
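A compact Python sketch of this procedure, including the pointwise product and sum-out operations, might look like the following. It assumes Boolean variables and represents a factor as a pair `(variables, table)`; that representation, the `BN` layout, and the value P(j | a) = 0.90 (from the textbook network) are assumptions of the sketch, not details given in the slides.

```python
from itertools import product

# variable -> (parents, CPT mapping tuples of parent values to P(variable = True)),
# listed in topological order.
BN = {
    "B": ((), {(): 0.001}),
    "E": ((), {(): 0.002}),
    "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    "J": (("A",), {(True,): 0.90, (False,): 0.05}),
    "M": (("A",), {(True,): 0.70, (False,): 0.01}),
}


def make_factor(var, e, bn):
    """Factor for var's CPT, with evidence variables fixed to their observed values."""
    parents, cpt = bn[var]
    free = tuple(v for v in (var,) + parents if v not in e)
    table = {}
    for values in product([True, False], repeat=len(free)):
        row = {**e, **dict(zip(free, values))}
        p_true = cpt[tuple(row[p] for p in parents)]
        table[values] = p_true if row[var] else 1.0 - p_true
    return free, table


def pointwise_product(f1, f2):
    """Factor over the union of the variables of f1 and f2."""
    (vars1, t1), (vars2, t2) = f1, f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for values in product([True, False], repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        table[values] = (t1[tuple(row[v] for v in vars1)]
                         * t2[tuple(row[v] for v in vars2)])
    return out_vars, table


def sum_out(var, factors):
    """Multiply the factors that mention var, sum var out, leave the others alone."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    vars_, table = touching[0]
    for f in touching[1:]:
        vars_, table = pointwise_product((vars_, table), f)
    i = vars_.index(var)
    summed = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        summed[key] = summed.get(key, 0.0) + p
    return rest + [(vars_[:i] + vars_[i + 1:], summed)]


def elimination_ask(x, e, bn):
    """Posterior over Boolean query variable x, eliminating hidden variables in reverse order."""
    factors = []
    for var in reversed(list(bn)):
        factors.append(make_factor(var, e, bn))
        if var != x and var not in e:  # hidden variable
            factors = sum_out(var, factors)
    result = factors[0]
    for f in factors[1:]:
        result = pointwise_product(result, f)
    total = sum(result[1].values())
    return {values[0]: p / total for values, p in result[1].items()}


posterior = elimination_ask("B", {"J": True, "M": True}, BN)
print({value: round(p, 3) for value, p in posterior.items()})
```

On this query the result agrees with enumeration (about 0.284 for burglary), but each of the factors over A is computed once rather than once per value of E.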

Irrelevant variables

Consider the query $P(JohnCalls \mid Burglary = true)$

$P(J \mid b) = \alpha\, P(b) \sum_e P(e) \sum_a P(a \mid b, e)\, P(J \mid a) \sum_m P(m \mid a)$

Sum over $m$ is identically 1; $M$ is irrelevant to the query

Thm 1: $Y$ is irrelevant unless $Y \in Ancestors(\{X\} \cup \mathbf{E})$

Here, $X = JohnCalls$, $\mathbf{E} = \{Burglary\}$, and
$Ancestors(\{X\} \cup \mathbf{E}) = \{Alarm, Earthquake\}$
so $M$ is irrelevant
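Theorem 1 suggests a simple pruning step before inference. The sketch below, assuming the burglary topology, collects the ancestors of the query and evidence variables and reports everything else as irrelevant; `ancestors` and `irrelevant` are hypothetical helper names. The ancestor set computed here also contains Burglary itself (it is an ancestor of JohnCalls), which does not change the conclusion.

```python
# Parent lists for the burglary network.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}


def ancestors(nodes, parents):
    """All proper ancestors of any node in `nodes`."""
    found = set()
    stack = list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def irrelevant(query, evidence, parents):
    """Variables that are neither query, evidence, nor an ancestor of either."""
    keep = {query} | set(evidence)
    keep |= ancestors(keep, parents)
    return set(parents) - keep


print(irrelevant("JohnCalls", {"Burglary"}, PARENTS))
```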

Complexity of exact inference

Singly connected networks (or polytrees):
- any two nodes are connected by at most one (undirected) path
- time and space cost of variable elimination are $O(d^k n)$

Multiply connected networks:
- can reduce 3SAT to exact inference $\Rightarrow$ NP-hard
- equivalent to counting 3SAT models $\Rightarrow$ #P-complete

Inference by stochastic simulation

Basic idea:
1) Draw $N$ samples from a sampling distribution $S$
2) Compute an approximate posterior probability $\hat{P}$
3) Show this converges to the true probability $P$

[Figure: a coin with probability 0.5]

Outline:
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing with evidence
- Likelihood weighting: use evidence to weight samples

Sampling from an empty network

function PRIOR-SAMPLE(bn) returns an event sampled from bn
   inputs: bn, a belief network specifying joint distribution P(X_1, ..., X_n)

   x ← an event with n elements
   for i = 1 to n do
       x_i ← a random sample from P(X_i | Parents(X_i))
   return x
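A minimal Python sketch of PRIOR-SAMPLE follows, run on the Cloudy/Sprinkler/Rain/WetGrass network used in the examples. The CPT values are those of the standard textbook version of that network; entries not legible in this transcript (such as P(S | ¬c) = 0.50) should be read as assumptions. The `SPRINKLER` layout and the explicit `rng` argument are choices made for this sketch.

```python
import random

# variable -> (parents, CPT mapping tuples of parent values to P(variable = True)),
# listed in topological order so parents are sampled before their children.
SPRINKLER = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}


def prior_sample(bn, rng):
    """Sample every variable in order from P(X_i | parents(X_i))."""
    event = {}
    for var, (parents, cpt) in bn.items():
        p_true = cpt[tuple(event[p] for p in parents)]
        event[var] = rng.random() < p_true
    return event


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [prior_sample(SPRINKLER, rng) for _ in range(20000)]
cloudy_rate = sum(s["Cloudy"] for s in samples) / len(samples)
print(samples[0], round(cloudy_rate, 3))
```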

Example

[Figure: Cloudy -> Sprinkler; Cloudy -> Rain; Sprinkler, Rain -> Wet Grass. The same network is shown on seven consecutive slides.]

P(C) = .50

C  | P(S | C)        C  | P(R | C)
T  | .10             T  | .80
F  | .50             F  | .20

S  R  | P(W | S, R)
T  T  | .99
T  F  | .90
F  T  | .90
F  F  | .01

Sampling from an empty network contd.

Probability that PRIOR-SAMPLE generates a particular event:

$S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) = P(x_1, \ldots, x_n)$

i.e., the true prior probability

E.g., $S_{PS}(t, f, t, t) = 0.5 \times 0.9 \times 0.8 \times 0.9 = 0.324 = P(t, f, t, t)$

Let $N_{PS}(x_1, \ldots, x_n)$ be the number of samples generated for event $x_1, \ldots, x_n$

Then we have

$\lim_{N \to \infty} \hat{P}(x_1, \ldots, x_n) = \lim_{N \to \infty} N_{PS}(x_1, \ldots, x_n) / N$
$= S_{PS}(x_1, \ldots, x_n)$
$= P(x_1, \ldots, x_n)$

That is, estimates derived from PRIOR-SAMPLE are consistent

Shorthand: $\hat{P}(x_1, \ldots, x_n) \approx P(x_1, \ldots, x_n)$
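The consistency claim can be checked empirically. The sketch below computes the exact probability of the event (t, f, t, t), read as (Cloudy, Sprinkler, Rain, WetGrass), and compares it with the fraction of prior samples that match it. The network values are those of the textbook sprinkler network (entries not legible in this transcript are assumptions), and the sample size and seed are arbitrary.

```python
import random

# variable -> (parents, CPT mapping tuples of parent values to P(variable = True)).
SPRINKLER = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}


def event_probability(event, bn):
    """S_PS(event): product of P(x_i | parents(X_i)) over all variables."""
    p = 1.0
    for var, (parents, cpt) in bn.items():
        p_true = cpt[tuple(event[q] for q in parents)]
        p *= p_true if event[var] else 1.0 - p_true
    return p


def prior_sample(bn, rng):
    """Sample every variable in topological order."""
    event = {}
    for var, (parents, cpt) in bn.items():
        event[var] = rng.random() < cpt[tuple(event[q] for q in parents)]
    return event


target = {"Cloudy": True, "Sprinkler": False, "Rain": True, "WetGrass": True}
exact = event_probability(target, SPRINKLER)  # 0.5 * 0.9 * 0.8 * 0.9

rng = random.Random(0)
n = 50000
hits = sum(prior_sample(SPRINKLER, rng) == target for _ in range(n))
estimate = hits / n  # N_PS(target) / N
print(round(exact, 3), round(estimate, 3))
```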

Rejection sampling

$\hat{\mathbf{P}}(X \mid \mathbf{e})$ estimated from samples agreeing with $\mathbf{e}$

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X | e)
   local variables: N, a vector of counts over X, initially zero

   for j = 1 to N do
       x ← PRIOR-SAMPLE(bn)
       if x is consistent with e then
           N[x] ← N[x] + 1 where x is the value of X in x
   return NORMALIZE(N[X])

E.g., estimate $\mathbf{P}(Rain \mid Sprinkler = true)$ using 100 samples
27 samples have $Sprinkler = true$
Of these, 8 have $Rain = true$ and 19 have $Rain = false$.

$\hat{\mathbf{P}}(Rain \mid Sprinkler = true) = \text{NORMALIZE}(\langle 8, 19 \rangle) = \langle 0.296, 0.704 \rangle$

Similar to a basic real-world empirical estimation procedure
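A Python sketch of rejection sampling on the sprinkler network is given below, together with the NORMALIZE step applied to the slide's counts. The network values are those of the textbook version (entries not legible in this transcript are assumptions); the function signatures, seed, and sample size are choices made for the sketch.

```python
import random

# variable -> (parents, CPT mapping tuples of parent values to P(variable = True)).
SPRINKLER = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}


def prior_sample(bn, rng):
    """Sample every variable in topological order."""
    event = {}
    for var, (parents, cpt) in bn.items():
        event[var] = rng.random() < cpt[tuple(event[q] for q in parents)]
    return event


def normalize(counts):
    """Scale counts to sum to 1 (raises ZeroDivisionError if every sample was rejected)."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def rejection_sampling(x, e, bn, n, rng):
    """Estimate P(x | e) from the prior samples that agree with the evidence e."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample(bn, rng)
        if all(sample[var] == val for var, val in e.items()):
            counts[sample[x]] += 1
    return normalize(counts)


# The slide's worked numbers: 8 samples with Rain = true, 19 with Rain = false.
slide_estimate = normalize({True: 8, False: 19})

estimate = rejection_sampling("Rain", {"Sprinkler": True}, SPRINKLER,
                              20000, random.Random(0))
print(slide_estimate, estimate)
```

Under these CPT values the exact answer is P(Rain = true | Sprinkler = true) = 0.09 / 0.30 = 0.3, so both estimates are close; roughly 70% of the samples are discarded along the way.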

Analysis of rejection sampling

$\hat{\mathbf{P}}(X \mid \mathbf{e}) = \alpha\, \mathbf{N}_{PS}(X, \mathbf{e})$ (algorithm defn.)
$= \mathbf{N}_{PS}(X, \mathbf{e}) / N_{PS}(\mathbf{e})$ (normalized by $N_{PS}(\mathbf{e})$)
$\approx \mathbf{P}(X, \mathbf{e}) / P(\mathbf{e})$ (property of PRIOR-SAMPLE)
$= \mathbf{P}(X \mid \mathbf{e})$ (defn. of conditional probability)

Hence rejection sampling returns consistent posterior estimates

Problem: hopelessly expensive if $P(\mathbf{e})$ is small

$P(\mathbf{e})$ drops off exponentially with number of evidence variables!

Likelihood weighting

Idea: fix evidence variables, sample only nonevidence variables, and weight each sample by the likelihood it accords the evidence

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X | e)
   local variables: W, a vector of weighted counts over X, initially zero

   for j = 1 to N do
       x, w ← WEIGHTED-SAMPLE(bn, e)
       W[x] ← W[x] + w where x is the value of X in x
   return NORMALIZE(W[X])

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
   x ← an event with n elements; w ← 1
   for i = 1 to n do
       if X_i has a value x_i in e
           then w ← w × P(X_i = x_i | Parents(X_i))
           else x_i ← a random sample from P(X_i | Parents(X_i))
   return x, w
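The two functions above can be sketched in Python as follows, again on the sprinkler network with the textbook CPT values (entries not legible in this transcript are assumptions). The query, seed, and sample size are illustrative choices.

```python
import random

# variable -> (parents, CPT mapping tuples of parent values to P(variable = True)).
SPRINKLER = {
    "Cloudy": ((), {(): 0.50}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain": (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass": (("Sprinkler", "Rain"),
                 {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}),
}


def weighted_sample(bn, e, rng):
    """Fix evidence variables, sample the rest, and accumulate the evidence likelihood."""
    event, w = dict(e), 1.0
    for var, (parents, cpt) in bn.items():
        p_true = cpt[tuple(event[q] for q in parents)]
        if var in e:
            w *= p_true if e[var] else 1.0 - p_true
        else:
            event[var] = rng.random() < p_true
    return event, w


def likelihood_weighting(x, e, bn, n, rng):
    """Estimate P(x | e) from n weighted samples."""
    weights = {True: 0.0, False: 0.0}
    for _ in range(n):
        event, w = weighted_sample(bn, e, rng)
        weights[event[x]] += w
    total = sum(weights.values())
    return {value: w / total for value, w in weights.items()}


evidence = {"Sprinkler": True, "WetGrass": True}
rng = random.Random(0)
one_event, one_weight = weighted_sample(SPRINKLER, evidence, rng)
estimate = likelihood_weighting("Rain", evidence, SPRINKLER, 20000, rng)
print(one_event, one_weight, estimate)
```

Every sample is used, unlike rejection sampling, but samples drawn with Cloudy = true carry a weight five times smaller than those with Cloudy = false because P(s | c) = 0.1 versus P(s | ¬c) = 0.5.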

Likelihood weighting example

[Figure: the sprinkler network (Cloudy, Sprinkler, Rain, Wet Grass) with its CPTs, shown on seven consecutive slides as one weighted sample is built up]

Weight as the sample is generated:

$w = 1.0$
$w = 1.0 \times 0.1$
$w = 1.0 \times 0.1 \times 0.99 = 0.099$

Likelihood weighting analysis

Sampling probability for WEIGHTED-SAMPLE is

$S_{WS}(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$

Note: pays attention to evidence in ancestors only
$\Rightarrow$ somewhere "in between" prior and posterior distribution

[Figure: Cloudy -> Sprinkler; Cloudy -> Rain; Sprinkler, Rain -> Wet Grass]

Weight for a given sample $\mathbf{z}, \mathbf{e}$ is

$w(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$

Weighted sampling probability is

$S_{WS}(\mathbf{z}, \mathbf{e})\, w(\mathbf{z}, \mathbf{e}) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
$= P(\mathbf{z}, \mathbf{e})$ (by standard global semantics of network)

Hence likelihood weighting returns consistent estimates but performance still degrades with many evidence variables because a few samples have nearly all the total weight
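As a worked instance of the identity above, take the sprinkler network with evidence Sprinkler = true, WetGrass = true and sampled values Cloudy = true, Rain = true. The probabilities used are the ones that appear in the sampling examples (P(c) = 0.5, P(r | c) = 0.8, P(s | c) = 0.1, P(w | s, r) = 0.99); the choice of this particular sample is only for illustration.

```latex
\begin{align*}
S_{WS}(\mathbf{z}, \mathbf{e}) &= P(c)\, P(r \mid c) = 0.5 \times 0.8 = 0.4 \\
w(\mathbf{z}, \mathbf{e}) &= P(s \mid c)\, P(w \mid s, r) = 0.1 \times 0.99 = 0.099 \\
S_{WS}(\mathbf{z}, \mathbf{e})\, w(\mathbf{z}, \mathbf{e}) &= 0.4 \times 0.099 = 0.0396 \\
P(c, s, r, w) &= P(c)\, P(s \mid c)\, P(r \mid c)\, P(w \mid s, r)
  = 0.5 \times 0.1 \times 0.8 \times 0.99 = 0.0396
\end{align*}
```

The last two lines agree, as the global semantics requires.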