Bayesian Networks: Independencies and Inference
Scott Davies and Andrew Moore

Note to other teachers and users of these slides: Andrew and Scott would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

What Independencies does a Bayes Net Model?
In order for a Bayesian network to model a probability distribution, the following must be true by definition: each variable is conditionally independent of all its non-descendants in the graph, given the value of all its parents. This implies

P(X1, ..., Xn) = Π_{i=1..n} P(Xi | parents(Xi))

But what else does it imply?

Quick proof that independence is symmetric.
Example: a chain Z -> Y -> X. Given Y, does learning the value of Z tell us nothing new about X? I.e., is P(X | Y, Z) equal to P(X | Y)? Yes: since we know the value of all of X's parents (namely, Y), and Z is not a descendant of X, X is conditionally independent of Z. Also, since independence is symmetric, P(Z | Y, X) = P(Z | Y).

Assume: P(X | Y, Z) = P(X | Y). Prove: P(Z | X, Y) = P(Z | Y).
P(Z | X, Y) = P(X, Y, Z) / P(X, Y)                 (Bayes's rule)
            = P(X | Y, Z) P(Z | Y) P(Y) / P(X, Y)  (chain rule)
            = P(X | Y) P(Z | Y) P(Y) / P(X, Y)     (by assumption)
            = P(Z | Y)                             (Bayes's rule)

What Independencies does a Bayes Net Model? (cont'd)
Let I<X, Y, Z> represent X and Z being conditionally independent given Y.
(The slide's example graph over nodes U, V, X, Y, Z is not reproduced in this transcription.)
I<X, {Y}, Z>? Yes, just as in the previous example: all of X's parents are given, and Z is not a descendant of X.
I<X, {U}, Z>? No.
I<X, {U, V}, Z>? Yes.
Maybe I<X, S, Z> iff S acts as a cutset between X and Z in an undirected version of the graph?
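Independence claims like the ones above can always be checked by brute force against the joint distribution that the factorization defines. Here is a minimal sketch (mine, not from the slides): it builds the joint from the CPTs using the factorization formula, then compares P(X | Y, Z) with P(X | Y) for every assignment. The chain Z -> Y -> X, the function names, and all CPT numbers are invented for illustration.

from itertools import product

# Toy chain Z -> Y -> X, all variables binary. CPT numbers are invented.
P_Z = {1: 0.3, 0: 0.7}
P_Y_given_Z = {(1, 1): 0.8, (0, 1): 0.2, (1, 0): 0.1, (0, 0): 0.9}   # P(Y=y | Z=z), keyed by (y, z)
P_X_given_Y = {(1, 1): 0.6, (0, 1): 0.4, (1, 0): 0.25, (0, 0): 0.75} # P(X=x | Y=y), keyed by (x, y)

def joint(x, y, z):
    # Bayes net factorization: P(X, Y, Z) = P(Z) P(Y | Z) P(X | Y)
    return P_Z[z] * P_Y_given_Z[(y, z)] * P_X_given_Y[(x, y)]

def cond(num_vars, given_vars):
    """P(num_vars | given_vars): both are dicts {name: value} over 'X', 'Y', 'Z'."""
    names = ['X', 'Y', 'Z']
    def total(fixed):
        return sum(joint(a['X'], a['Y'], a['Z'])
                   for vals in product([0, 1], repeat=3)
                   for a in [dict(zip(names, vals))]
                   if all(a[k] == v for k, v in fixed.items()))
    return total({**given_vars, **num_vars}) / total(given_vars)

# Check I<X, {Y}, Z>: P(X=1 | Y=y, Z=z) should equal P(X=1 | Y=y) for all y, z.
for y in (0, 1):
    for z in (0, 1):
        lhs = cond({'X': 1}, {'Y': y, 'Z': z})
        rhs = cond({'X': 1}, {'Y': y})
        print(f"y={y} z={z}: P(X=1|Y,Z)={lhs:.4f}  P(X=1|Y)={rhs:.4f}")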

Things get a little more confusing.
Consider X -> Y <- Z. X has no parents, so we know all its parents' values trivially, and Z is not a descendant of X. So I<X, {}, Z>, even though there is an undirected path from X to Z through an unknown variable Y. What if we do know the value of Y, though? Or one of its descendants?

The Burglar Alarm example: Burglar -> Alarm <- Earthquake, Alarm -> Phone Call.
Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes. Earth arguably doesn't care whether your house is currently being burgled. While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing. Uh oh!

Things get a lot more confusing.
But now suppose you learn that there was a medium-sized earthquake in your neighborhood. Oh, whew! Probably not a burglar after all. Earthquake "explains away" the hypothetical burglar. But then it must not be the case that I<Burglar, {Phone Call}, Earthquake>, even though I<Burglar, {}, Earthquake>!

d-separation to the rescue.
Fortunately, there is a relatively simple algorithm for determining whether two variables in a Bayesian network are conditionally independent: d-separation. Definition: X and Z are d-separated by a set of evidence variables E iff every undirected path from X to Z is "blocked", where a path is "blocked" iff one or more of the following conditions is true:

A path is "blocked" when...
- there exists a variable V on the path such that it is in the evidence set E, and the arcs putting V in the path are "tail-to-tail"; or
- there exists a variable V on the path such that it is in the evidence set E, and the arcs putting V in the path are "tail-to-head"; or
- (the "funky case") there exists a variable V on the path such that it is NOT in the evidence set E, neither are any of its descendants, and the arcs putting V in the path are "head-to-head".
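The three blocking conditions translate almost directly into code. Below is a brute-force sketch of my own (not from the slides, and not the linear-time algorithm mentioned next): it enumerates every undirected simple path between two nodes and applies the three rules to each internal node. Function and variable names are mine; it is only meant for small graphs like the burglar-alarm example.

from itertools import chain

def d_separated(parents, x, z, evidence):
    """Brute-force d-separation test.
    parents: dict mapping each node to the list of its parents.
    Returns True iff every undirected simple path from x to z is blocked
    by the evidence set, per the three rules above."""
    nodes = set(parents) | set(chain.from_iterable(parents.values()))
    children = {n: [c for c in nodes if n in parents.get(c, [])] for n in nodes}
    neighbors = {n: set(parents.get(n, [])) | set(children[n]) for n in nodes}
    evidence = set(evidence)

    def descendants(v):
        out, stack = set(), list(children[v])
        while stack:
            c = stack.pop()
            if c not in out:
                out.add(c)
                stack.extend(children[c])
        return out

    def blocked(path):
        for prev, v, nxt in zip(path, path[1:], path[2:]):
            into_v = [u for u in (prev, nxt) if v in children.get(u, [])]  # arcs u -> v
            if len(into_v) == 2:                       # head-to-head at v (the "funky case")
                if v not in evidence and not (descendants(v) & evidence):
                    return True
            else:                                      # tail-to-tail or tail-to-head at v
                if v in evidence:
                    return True
        return False

    def all_paths(cur, path, seen):
        if cur == z:
            yield path
            return
        for nb in neighbors[cur]:
            if nb not in seen:
                yield from all_paths(nb, path + [nb], seen | {nb})

    return all(blocked(p) for p in all_paths(x, [x], {x}))

# Burglar alarm example: Burglar -> Alarm <- Earthquake, Alarm -> Call.
g = {'Burglar': [], 'Earthquake': [], 'Alarm': ['Burglar', 'Earthquake'], 'Call': ['Alarm']}
print(d_separated(g, 'Burglar', 'Earthquake', set()))       # True: I<Burglar, {}, Earthquake>
print(d_separated(g, 'Burglar', 'Earthquake', {'Call'}))    # False: observing Call un-blocks the path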

d-separation to the rescue (cont'd).
Theorem [Verma & Pearl, 1998]: If a set of evidence variables E d-separates X and Z in a Bayesian network's graph, then I<X, E, Z>.
d-separation can be computed in linear time using a depth-first-search-like algorithm.
Great! We now have a fast algorithm for automatically inferring whether learning the value of one variable might give us any additional hints about some other variable, given what we already know.
"Might": variables may actually be independent when they're not d-separated, depending on the actual probabilities involved.

d-separation example.
(The slide's example graph, with nodes labelled A through J, is not reproduced in this transcription. It asks a series of practice questions of the form I<., S, D>: is some node, whose label is lost here, d-separated from D by successively larger evidence sets S, starting from the empty set and growing to include B, J, and other nodes?)

Bayesian Network Inference.
Inference: calculating P(X | Y) for some variables or sets of variables X and Y.
Inference in Bayesian networks is #P-hard! It reduces to counting satisfying assignments: encode a Boolean formula as a network whose inputs I1, I2, I3, I4, I5 each have prior probability 0.5 and whose output node O computes the formula; then P(O) must be (#satisfying assignments) * (0.5^#inputs).

Bayesian Network Inference (cont'd).
But inference is still tractable in some cases. Let's look at a special class of networks: trees / forests, in which each node has at most one parent.

Decomposing the probabilities.
Suppose we want P(Xi | E), where E is some set of evidence variables. Let's split E into two parts: Ei^- is the part consisting of assignments to variables in the subtree rooted at Xi; Ei^+ is the rest of it.
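Before the decomposition is developed further, it is worth having the naive baseline in hand: P(Xi | E) can always be computed by summing the joint over every assignment consistent with E, which is exponential in the number of variables, which is exactly why the #P-hardness bites and why the tree algorithm below matters. A small sketch of mine, applied to the burglar-alarm example with invented CPT numbers; it also shows the "explaining away" effect numerically.

from itertools import product

# Burglar alarm network; all CPT numbers below are invented for illustration.
def joint(b, e, a, c):
    p_b = 0.01 if b else 0.99                 # P(Burglar)
    p_e = 0.02 if e else 0.98                 # P(Earthquake)
    p_alarm = {(1, 1): 0.95, (1, 0): 0.90, (0, 1): 0.30, (0, 0): 0.01}[(b, e)]
    p_a = p_alarm if a else 1 - p_alarm       # P(Alarm | Burglar, Earthquake)
    p_call = 0.8 if a else 0.05
    p_c = p_call if c else 1 - p_call         # P(Call | Alarm)
    return p_b * p_e * p_a * p_c

def prob(query, given):
    """P(query | given); both are dicts over 'B', 'E', 'A', 'C'."""
    def total(fixed):
        return sum(joint(*vals)
                   for vals in product([0, 1], repeat=4)
                   if all(vals['BEAC'.index(k)] == v for k, v in fixed.items()))
    return total({**given, **query}) / total(given)

print("P(Burglar=1 | Call=1)          =", round(prob({'B': 1}, {'C': 1}), 4))
print("P(Burglar=1 | Call=1, Quake=1) =", round(prob({'B': 1}, {'C': 1, 'E': 1}), 4))
# The second number comes out smaller: the earthquake "explains away" the alarm.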

Decomposing the probabilities (cont'd).
P(Xi | E) = P(Xi | Ei^-, Ei^+)
          = P(Ei^- | Xi, Ei^+) P(Xi | Ei^+) / P(Ei^- | Ei^+)   (Bayes's rule)
          = P(Ei^- | Xi) P(Xi | Ei^+) / P(Ei^- | Ei^+)         (Xi d-separates Ei^- from Ei^+)
          = α π(Xi) λ(Xi)
Where:
α is a constant independent of Xi,
π(Xi) = P(Xi | Ei^+),
λ(Xi) = P(Ei^- | Xi).

Using the decomposition for inference.
We can use this decomposition to do inference as follows. First, compute λ(Xi) = P(Ei^- | Xi) for all Xi recursively, using the leaves of the tree as the base case.
If Xi is a leaf:
If Xi is in E: λ(Xi) = 1 if Xi matches E, 0 otherwise.
If Xi is not in E: Ei^- is the null set, so P(Ei^- | Xi) = 1 (a constant).

Quick aside: "Virtual evidence".
For theoretical simplicity, but without loss of generality, let's assume that all variables in E (the evidence set) are leaves in the tree. Why can we do this WLOG: observing Xi is equivalent to observing a new child Xi' of Xi, where P(Xi' | Xi) = 1 if Xi' = Xi, 0 otherwise.

Calculating λ(Xi) for non-leaves.
Suppose Xi has one child, Xc. Then:
λ(Xi) = P(Ei^- | Xi)
      = Σ_j P(Ei^-, Xc = j | Xi)
      = Σ_j P(Xc = j | Xi) P(Ei^- | Xc = j, Xi)
      = Σ_j P(Xc = j | Xi) P(Ei^- | Xc = j)     (Xc d-separates Xi from Ei^-)
      = Σ_j P(Xc = j | Xi) λ(Xc = j)
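Looking back at the base case and the virtual-evidence aside for a moment, here is a tiny sketch of mine (names and structure are invented, not from the slides): evidence on an internal node is pushed into a dummy child whose CPT is the identity, and every leaf then gets a λ vector that is either an indicator or all ones.

def add_virtual_evidence(children, evidence):
    """For each observed non-leaf X, add a dummy child X' with P(X' | X) = identity,
    and move the observation onto X'.  children: dict node -> list of children."""
    children = {n: list(cs) for n, cs in children.items()}
    new_evidence = {}
    for node, value in evidence.items():
        if children.get(node):                 # observed internal node
            dummy = node + "_obs"
            children[node].append(dummy)
            children[dummy] = []
            new_evidence[dummy] = value        # observe the copy instead
        else:
            new_evidence[node] = value
    return children, new_evidence

def leaf_lambda(node, num_values, evidence):
    """Base case: lambda(X) over the values of a leaf X."""
    if node in evidence:
        return [1.0 if v == evidence[node] else 0.0 for v in range(num_values)]
    return [1.0] * num_values                  # E_i^- empty: lambda is the constant 1

children = {'Y': ['X'], 'X': []}
children, ev = add_virtual_evidence(children, {'Y': 1})
print(children, ev)                            # Y gains a dummy observed child Y_obs
print(leaf_lambda('Y_obs', 2, ev))             # indicator vector [0.0, 1.0]
print(leaf_lambda('X', 2, ev))                 # unobserved leaf: [1.0, 1.0]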

Calculating λ(Xi) for non-leaves (cont'd).
Now suppose Xi has a set of children, C. Since Xi d-separates each of its subtrees, the contribution of each subtree to λ(Xi) is independent:
λ(Xi) = P(Ei^- | Xi) = Π_{Xj in C} λ_j(Xi)
where λ_j(Xi) is the contribution to P(Ei^- | Xi) of the part of the evidence lying in the subtree rooted at one of Xi's children Xj; each λ_j(Xi) is computed from Xj exactly as in the one-child case above.

We are now λ-happy.
So now we have a way to recursively compute all the λ(Xi)'s, starting from the root and using the leaves of the tree as the base case. If we want, we can think of each node in the network as an autonomous processor that passes a little "λ message" to its parent.

The other half of the problem.
Remember, P(Xi | E) = α π(Xi) λ(Xi). Now that we have all the λ(Xi)'s, what about the π(Xi)'s? π(Xi) = P(Xi | Ei^+).
What about the root of the tree, Xr? In that case, Er^+ is the null set, so π(Xr) = P(Xr). No sweat. Since we also know λ(Xr), we can compute the final P(Xr | E).
So, for an arbitrary Xi with parent Xp, let's inductively assume we know π(Xp) and/or P(Xp | E). How do we get π(Xi)?
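Before answering that, here is the upward half in code. A compact sketch of my own (not from the slides): each node's λ is the product over its children of Σ_j P(child = j | node) λ(child = j), with indicator or constant vectors at the leaves. The three-node chain and its CPT numbers are invented.

# Toy tree: Y -> X -> W (a chain), all variables binary; evidence W = 1.
children = {'Y': ['X'], 'X': ['W'], 'W': []}
# cpt[child][(child_value, parent_value)] = P(child = child_value | parent = parent_value)
cpt = {'X': {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.2, (0, 0): 0.8},
       'W': {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.4, (0, 0): 0.6}}
evidence = {'W': 1}

def lam(node):
    """lambda(node) = P(evidence below node | node), as a list indexed by node's value."""
    if not children[node]:                                   # leaf base case
        if node in evidence:
            return [1.0 if v == evidence[node] else 0.0 for v in (0, 1)]
        return [1.0, 1.0]
    result = [1.0, 1.0]
    for c in children[node]:
        lc = lam(c)
        for v in (0, 1):                                      # lambda message from child c
            result[v] *= sum(cpt[c][(j, v)] * lc[j] for j in (0, 1))
    return result

for n in ('W', 'X', 'Y'):
    print(n, lam(n))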

Computing π(Xi).
π(Xi) = P(Xi | Ei^+)
      = Σ_j P(Xi, Xp = j | Ei^+)
      = Σ_j P(Xi | Xp = j, Ei^+) P(Xp = j | Ei^+)
      = Σ_j P(Xi | Xp = j) P(Xp = j | Ei^+)     (Xp d-separates Xi from Ei^+)
      = Σ_j P(Xi | Xp = j) π_i(Xp = j)
Where π_i(Xp) is the message Xp sends down to Xi: the parent's posterior with the contribution of the evidence in Xi's own subtree removed, i.e. π_i(Xp) ∝ P(Xp | E) / λ_i(Xp).
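And a matching sketch of the downward pass and the final beliefs (my code, same invented chain as the previous sketch; the λ pass is repeated in compact form so this snippet runs on its own): π at the root is its prior, each child receives π_i(Xp) ∝ P(Xp | E) / λ_i(Xp), and the answer at every node is the normalized product λ(Xi) π(Xi). The division assumes nonzero λ messages, which holds in this toy example.

# Same toy chain as above: Y -> X -> W, binary, evidence W = 1.
children = {'Y': ['X'], 'X': ['W'], 'W': []}
parent   = {'X': 'Y', 'W': 'X'}
prior_Y  = [0.6, 0.4]                                    # P(Y), invented
cpt = {'X': {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.2, (0, 0): 0.8},
       'W': {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.4, (0, 0): 0.6}}
evidence = {'W': 1}

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

# Upward pass: lam[n] = lambda(n); lam_msg[c] = lambda message child c sends to its parent.
lam, lam_msg = {}, {}
for node in ('W', 'X', 'Y'):                             # hand-ordered leaves-to-root for this chain
    if not children[node]:
        lam[node] = [1.0 if node not in evidence or v == evidence[node] else 0.0
                     for v in (0, 1)]
    else:
        lam[node] = [1.0, 1.0]
        for c in children[node]:
            for v in (0, 1):
                lam[node][v] *= lam_msg[c][v]
    if node in parent:
        lam_msg[node] = [sum(cpt[node][(j, v)] * lam[node][j] for j in (0, 1))
                         for v in (0, 1)]

# Downward pass: pi at the root is its prior; each child gets P(Xp | E) / lambda message.
pi, belief = {'Y': prior_Y}, {}
for node in ('Y', 'X', 'W'):                             # hand-ordered root-to-leaves
    belief[node] = normalize([lam[node][v] * pi[node][v] for v in (0, 1)])
    for c in children[node]:
        pi_msg = normalize([belief[node][v] / lam_msg[c][v] for v in (0, 1)])
        pi[c] = [sum(cpt[c][(v, j)] * pi_msg[j] for j in (0, 1)) for v in (0, 1)]

print({n: [round(p, 4) for p in belief[n]] for n in belief})   # P(node | W = 1) for each node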

We're done. Yay!
Thus, we can compute all the π(Xi)'s and, in turn, all the P(Xi | E)'s. We can think of the nodes as autonomous processors passing λ and π messages to their neighbors.

Conjunctive queries.
What if we want, e.g., P(A, B | C) instead of just the marginal distributions P(A | C) and P(B | C)? Just use the chain rule: P(A, B | C) = P(A | C) P(B | A, C). Each of the latter probabilities can be computed using the technique just discussed.

Polytrees.
The technique can be generalized to polytrees: undirected versions of the graphs are still trees, but nodes can have more than one parent.

Dealing with cycles.
Can deal with undirected cycles in the graph by:
- clustering variables together (the slide shows a four-node diamond whose two middle variables are merged into a single cluster node), or
- conditioning (instantiate a variable: set it to 0 and solve, set it to 1 and solve, then combine the results).

Join trees.
An arbitrary Bayesian network can be transformed, via some evil graph-theoretic magic, into a join tree in which a similar method can be employed. (The slide's figure, showing a seven-node network and its corresponding join tree of cluster nodes, is not reproduced in this transcription.) In the worst case the join tree nodes must take on exponentially many combinations of values, but this often works well in practice.
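For reference, the two identities sketched above written out in full. The first is the chain-rule decomposition behind conjunctive queries; the second is the conditioning sum used when a loop-cutting variable (generically called A here, my notation) is instantiated to each of its values and the results are recombined. Both are standard identities rather than anything specific to these slides.

P(A, B | C) = P(A | C) P(B | A, C)

P(X | E) = Σ_a P(X | E, A = a) P(A = a | E)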