Bayesian Networks Representation

3/8/2017
Bayesian Networks Representation
Emily Fox, University of Washington, March 6, 2017

Learning from structured data

TrueSkill: Bayesian Skill Rating System [Herbrich et al., 2007]
[Figure: skill → player performance → team performance → observed team performance difference]

ICU Monitoring
[Figure: ICU-monitoring Bayesian network with nodes such as MinVolSet, VentMach, Intubation, KinkedTube, Hypovolemia, LVFailure, Catechol, HR, CVP, BP] [Beinlich et al., 1989] […leks, Russell, et al.]

Digging in: Learning with and without context/structure

Without context: Handwriting recognition
Character recognition, e.g., kernel SVMs
[Figure: handwritten characters (a, c, z, r, b, ...) classified one at a time]

Without context: Webpage classification
[Figure: example pages labeled Company website, University website, Personal website]

With context: Handwriting recognition

With context: Webpage classification

Modeling structured relationships via Bayesian networks

Today: Bayesian networks
- Provided a huge advancement in AI/ML
- Generalize naïve Bayes and logistic regression
- Compact representation for exponentially large probability distributions
- Exploit conditional independencies

Bayesian network representation
- Compact representation of a probability distribution
- Directed acyclic graph (DAG)
- Vertices: random variables
- Edges: conditional dependencies (probabilistic relationships)

Bayesian network probability factorization
- One CPT (conditional probability table) for each variable: P(variable | parents of variable), here $P(A)$, $P(B)$, $P(C \mid A, B)$ and $P(D \mid C)$
- This implies the factorization $P(A, B, C, D) = P(A)\,P(B)\,P(C \mid A, B)\,P(D \mid C)$

What does a Bayesian network represent (in detail), and what does it buy you?
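The factorization can be checked numerically. Below is a minimal sketch. It assumes binary variables and made-up CPT values (none of these numbers come from the slides). It builds each joint entry as a product of one CPT entry per variable and confirms that the result is a valid distribution.

```python
import itertools

# Hypothetical CPTs for binary A, B, C, D (True/False), following
# P(A) P(B) P(C | A, B) P(D | C). All numbers are illustrative only.
p_a = 0.3                                   # P(A = t)
p_b = 0.6                                   # P(B = t)
p_c = {(True, True): 0.95, (True, False): 0.7,
       (False, True): 0.4, (False, False): 0.05}  # P(C = t | A, B)
p_d = {True: 0.8, False: 0.1}               # P(D = t | C)


def bern(p_true, value):
    """Probability of `value` for a binary variable with P(t) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d): one CPT lookup per variable, multiplied."""
    return bern(p_a, a) * bern(p_b, b) * bern(p_c[(a, b)], c) * bern(p_d[c], d)


total = sum(joint(*v) for v in itertools.product([True, False], repeat=4))
print(joint(True, True, True, True))  # about 0.1368
print(total)                          # 1.0 up to floating-point rounding
```

Only 1 + 1 + 4 + 2 = 8 numbers specify all 16 joint entries here. An unstructured table over four binary variables would need 15.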

Causal structure
Suppose we know the following:
- The flu causes sinus inflammation
- Allergies cause sinus inflammation
- Sinus inflammation causes a runny nose
- Sinus inflammation causes headaches
How are these connected?

Possible queries
[Figure: network over Flu, Allergy, Sinus, Headache, Nose]
- Inference
- Most probable explanation
- Active data collection
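One way to answer "how are these connected?" is to write each causal statement as a parent → child edge. The sketch below stores the graph as a dictionary from each variable to its parents. That representation is our own choice for illustration. The sketch then prints the implied one-CPT-per-variable factorization.

```python
# Parents of each variable, read off the four causal statements.
parents = {
    "Flu": [],
    "Allergy": [],
    "Sinus": ["Flu", "Allergy"],
    "Headache": ["Sinus"],
    "Nose": ["Sinus"],
}


def factorization(parents):
    """Product of P(variable | parents), one factor per node (dict order)."""
    terms = []
    for var, pa in parents.items():
        terms.append(f"P({var} | {', '.join(pa)})" if pa else f"P({var})")
    return " ".join(terms)


print(factorization(parents))
# P(Flu) P(Allergy) P(Sinus | Flu, Allergy) P(Headache | Sinus) P(Nose | Sinus)
```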

Car starts? Bayesian network
- 18 binary attributes
- Inference: $P(\text{BatteryAge} \mid \text{Starts} = f)$ naively sums $2^{16}$ terms, so why is it so fast?
- Not impressed? The Hailfinder BN: more than $3^{54}$ terms

Factored joint distribution preview
[Figure: Flu, Allergy → Sinus → Headache, Nose]
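The term counts are plain arithmetic. BatteryAge is the query and Starts is the evidence, so brute-force summation runs over the remaining 18 − 2 = 16 binary variables. The $3^{54}$ bound is taken from the slide as stated.

```python
n_vars = 18
# BatteryAge is the query, Starts is observed: the other 16 binary
# variables are summed out, one joint term per assignment.
n_summed = n_vars - 2
brute_force_terms = 2 ** n_summed
print(brute_force_terms)  # 65536

hailfinder_terms_lower_bound = 3 ** 54
print(f"{hailfinder_terms_lower_bound:.2e}")  # 5.81e+25
```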

What are these probabilities?
Conditional probability tables (CPTs) for Flu, Allergy, Sinus, Headache and Nose
Number of parameters: each CPT has one row per configuration of the variable's parents
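A short sketch of the parameter count, assuming every variable is binary. Each CPT row needs (arity − 1) free numbers, and there is one row per parent configuration. The full joint over 5 binary variables needs $2^5 - 1$ numbers.

```python
# Flu network; every variable assumed binary.
parents = {
    "Flu": [],
    "Allergy": [],
    "Sinus": ["Flu", "Allergy"],
    "Headache": ["Sinus"],
    "Nose": ["Sinus"],
}


def num_cpt_params(parents, arity=2):
    """Free parameters: (arity - 1) per CPT row, arity**|parents| rows per CPT."""
    return sum((arity - 1) * arity ** len(ps) for ps in parents.values())


# Unstructured joint: one number per entry, minus one for normalization.
full_joint_params = 2 ** len(parents) - 1
print(num_cpt_params(parents), full_joint_params)  # 10 31
```

The gap grows exponentially with the number of variables when each node has a bounded number of parents.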

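Factorization speeds up inference
Exploit distributivity: push each sum inside the product, past every factor that does not depend on the summed variable.
[Figure: Flu, Allergy → Sinus → Headache, Nose]
Key: independence assumptions. Knowing Sinus separates the variables from each other: given Sinus, {Flu, Allergy}, Headache and Nose no longer influence one another.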
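A minimal sketch of distributivity on the flu network. It assumes binary variables and hypothetical CPT values. It computes $P(\text{Flu}=t, \text{Nose}=t)$ once by summing the full joint and once with the sums pushed inward. Both give the same number, but the nested form never builds a full joint term.

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers, not from the slides).
P_FLU = 0.1                                            # P(Flu = t)
P_ALLERGY = 0.2                                        # P(Allergy = t)
P_SINUS = {(True, True): 0.9, (True, False): 0.8,
           (False, True): 0.7, (False, False): 0.05}   # P(Sinus = t | Flu, Allergy)
P_HEADACHE = {True: 0.6, False: 0.1}                   # P(Headache = t | Sinus)
P_NOSE = {True: 0.8, False: 0.05}                      # P(Nose = t | Sinus)
VALUES = (True, False)


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(f, a, s, h, n):
    return (bern(P_FLU, f) * bern(P_ALLERGY, a) * bern(P_SINUS[(f, a)], s)
            * bern(P_HEADACHE[s], h) * bern(P_NOSE[s], n))


# Brute force: one full-joint term per (Allergy, Sinus, Headache) value.
naive = sum(joint(True, a, s, h, True) for a, s, h in product(VALUES, repeat=3))

# Distributivity: each sum moves inward past factors that do not depend on it.
# The innermost sum over Headache equals 1 and could be dropped entirely.
factored = bern(P_FLU, True) * sum(
    bern(P_ALLERGY, a) * sum(
        bern(P_SINUS[(True, a)], s) * bern(P_NOSE[s], True)
        * sum(bern(P_HEADACHE[s], h) for h in VALUES)
        for s in VALUES)
    for a in VALUES)

print(naive, factored)  # both about 0.0665
```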

Marginal and conditional independence
(Marginal) independence: Flu and Allergy are (marginally) independent, i.e., $P(\text{Flu}, \text{Allergy}) = P(\text{Flu})\,P(\text{Allergy})$
[Table: joint P(Flu, Allergy) with rows Flu = t, f and columns Allergy = t, f]
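A sketch of checking marginal independence directly on a 2×2 joint table: compute both marginals and compare every cell with the product. Both tables are hypothetical. The first is built as $P(\text{Flu})\,P(\text{Allergy})$ with $P(\text{Flu}=t)=0.1$ and $P(\text{Allergy}=t)=0.2$. The second has the same marginals but is not a product.

```python
def is_independent(joint, tol=1e-9):
    """joint[(x, y)] = P(X=x, Y=y). True if P(x, y) == P(x) P(y) in every cell."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol for x in xs for y in ys)


# Keys are (Flu, Allergy). Hypothetical tables, each summing to 1.
independent = {(True, True): 0.02, (True, False): 0.08,
               (False, True): 0.18, (False, False): 0.72}
dependent = {(True, True): 0.05, (True, False): 0.05,
             (False, True): 0.15, (False, False): 0.75}

print(is_independent(independent), is_independent(dependent))  # True False
```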

Conditional independence
- Flu and Headache are not (marginally) independent
- Flu and Headache are independent given Sinus infection
- More generally, X is independent of Y given Z when $P(X \mid Y, Z) = P(X \mid Z)$

Conditional independence statements encoded by Bayesian networks
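Both bullets can be checked by brute-force enumeration on a toy version of the flu network. The sketch assumes hypothetical CPT values. $P(\text{Headache}=t \mid \text{Flu})$ changes with Flu. Once Sinus is fixed, it no longer does.

```python
from itertools import product

# Hypothetical CPTs for a toy Flu/Allergy/Sinus/Headache/Nose network.
P_FLU, P_ALLERGY = 0.1, 0.2
P_SINUS = {(True, True): 0.9, (True, False): 0.8,
           (False, True): 0.7, (False, False): 0.05}  # P(Sinus = t | Flu, Allergy)
P_HEADACHE = {True: 0.6, False: 0.1}                  # P(Headache = t | Sinus)
P_NOSE = {True: 0.8, False: 0.05}                     # P(Nose = t | Sinus)
NAMES = ("flu", "allergy", "sinus", "headache", "nose")


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(flu, allergy, sinus, headache, nose):
    return (bern(P_FLU, flu) * bern(P_ALLERGY, allergy)
            * bern(P_SINUS[(flu, allergy)], sinus)
            * bern(P_HEADACHE[sinus], headache) * bern(P_NOSE[sinus], nose))


def prob(**fixed):
    """Marginal probability of the partial assignment `fixed`."""
    total = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(*values)
    return total


def cond(query, given):
    """P(query | given), both given as dicts of variable assignments."""
    return prob(**query, **given) / prob(**given)


# Not marginally independent: the answer depends on Flu.
print(cond({"headache": True}, {"flu": True}))    # about 0.51
print(cond({"headache": True}, {"flu": False}))   # about 0.19
# Independent given Sinus: the answer no longer depends on Flu.
print(cond({"headache": True}, {"flu": True, "sinus": True}))   # about 0.6
print(cond({"headache": True}, {"flu": False, "sinus": True}))  # about 0.6
```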

What is a Bayes net assuming?
Local Markov Assumption: A variable X is independent of its non-descendants given its parents.
[Figure: example network]
- Allows you to read off some simple conditional independence relationships

Explaining away example
[Figure: Flu, Allergy → Sinus → Headache, Nose]
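A numeric sketch of explaining away, assuming hypothetical CPT values. Flu and Allergy are marginally independent in this network. Once Sinus = t is observed, also learning Allergy = t gives an alternative explanation and lowers the probability of Flu. Headache and Nose are left out because they sum out to 1 in these queries.

```python
# Hypothetical numbers for the Flu/Allergy -> Sinus part of the network.
P_FLU, P_ALLERGY = 0.1, 0.2                           # P(Flu = t), P(Allergy = t)
P_SINUS = {(True, True): 0.9, (True, False): 0.8,
           (False, True): 0.7, (False, False): 0.05}  # P(Sinus = t | Flu, Allergy)


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint_fas(f, a, s):
    """P(Flu=f, Allergy=a, Sinus=s); Headache and Nose have summed out to 1."""
    return bern(P_FLU, f) * bern(P_ALLERGY, a) * bern(P_SINUS[(f, a)], s)


def p_flu_given(sinus, allergy=None):
    """P(Flu=t | Sinus=sinus), optionally also conditioning on Allergy."""
    allergies = (True, False) if allergy is None else (allergy,)
    num = sum(joint_fas(True, a, sinus) for a in allergies)
    den = sum(joint_fas(f, a, sinus) for f in (True, False) for a in allergies)
    return num / den


print(p_flu_given(sinus=True))                # about 0.336 (prior was 0.1)
print(p_flu_given(sinus=True, allergy=True))  # about 0.125: allergy explains it away
```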

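Naïve Bayes revisited
Local Markov Assumption: A variable X is independent of its non-descendants given its parents.
Factorization of the joint distribution (class Y, features $X_1, \ldots, X_d$):
$P(Y, X_1, \ldots, X_d) = P(Y) \prod_{i=1}^{d} P(X_i \mid Y)$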
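A minimal naive Bayes sketch built directly from this factorization. The class labels reuse the webpage example. The three binary word-presence features and all probabilities are hypothetical. The posterior comes from normalizing the joint over the classes.

```python
from itertools import product

# Hypothetical model: class Y, binary features X_1..X_3 (made-up numbers).
prior = {"company": 0.4, "university": 0.6}                          # P(Y)
p_word = {"company": [0.8, 0.6, 0.3], "university": [0.1, 0.4, 0.5]}  # P(X_i = 1 | Y)


def joint(y, x):
    """P(Y=y, X=x) = P(y) * prod_i P(x_i | y): the naive Bayes factorization."""
    p = prior[y]
    for xi, p_i in zip(x, p_word[y]):
        p *= p_i if xi else 1.0 - p_i
    return p


def posterior(x):
    """P(Y | X=x): normalize the joint over the classes."""
    scores = {y: joint(y, x) for y in prior}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}


print(posterior((1, 0, 1)))  # company about 0.681, university about 0.319
# Summing the joint over all feature vectors recovers the class prior.
print(sum(joint("company", x) for x in product((0, 1), repeat=3)))  # about 0.4
```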

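Joint distribution
[Figure: Flu, Allergy → Sinus → Headache, Nose]
$P(\text{Flu}, \text{Allergy}, \text{Sinus}, \text{Headache}, \text{Nose}) = P(\text{Flu})\,P(\text{Allergy})\,P(\text{Sinus} \mid \text{Flu}, \text{Allergy})\,P(\text{Headache} \mid \text{Sinus})\,P(\text{Nose} \mid \text{Sinus})$
Why can we decompose? Markov assumption!

The chain rule of probabilities
$P(A, B) = P(A)\,P(B \mid A)$, e.g., with A = Flu and B = Sinus
More generally:
$P(X_1, \ldots, X_d) = P(X_1)\,P(X_2 \mid X_1) \cdots P(X_d \mid X_1, \ldots, X_{d-1})$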
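The chain rule holds for any joint distribution, not only for Bayes nets. The sketch below checks it on an arbitrary, randomly generated joint over three binary variables (seeded, purely hypothetical). Each conditional is computed as a ratio of marginals.

```python
import itertools
import random

# An arbitrary, hypothetical joint distribution over three binary variables.
rng = random.Random(0)
assignments = list(itertools.product((0, 1), repeat=3))
weights = [rng.random() + 1e-3 for _ in assignments]  # strictly positive
total = sum(weights)
joint = {a: w / total for a, w in zip(assignments, weights)}


def marginal(prefix):
    """P(X_1..X_k = prefix), summing out the remaining variables."""
    return sum(p for a, p in joint.items() if a[:len(prefix)] == prefix)


def chain_rule(a):
    """P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2), each factor a ratio of marginals."""
    p = 1.0
    for k in range(len(a)):
        p *= marginal(a[:k + 1]) / marginal(a[:k])
    return p


for a in assignments:
    print(a, round(joint[a], 4), round(chain_rule(a), 4))  # the two columns agree
```

The product telescopes, which is why no assumptions are needed. A Bayes net then shortens each conditioning set to the variable's parents, using the Markov assumption.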

Chain rule & joint distribution
Local Markov Assumption: A variable X is independent of its non-descendants given its parents.
[Figure: Flu, Allergy → Sinus → Headache, Nose]
Order of expansion matters! Use a topological order.

The Representation Theorem: joint distribution to BN
- The BN encodes independence assumptions
- If the conditional independencies in the BN are a subset of the conditional independencies in P, then the joint distribution factorizes:
$P(X_1, \ldots, X_d) = \prod_{i=1}^{d} P(X_i \mid \mathrm{Pa}_{X_i})$
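A sketch of both ingredients on the flu network. The first is a topological order, computed here with Kahn's algorithm (our choice; any valid topological sort works). The second is the list of local Markov statements "X is independent of its non-descendants given its parents". The dictionary is deliberately listed out of order so the sort has work to do.

```python
from collections import deque

# Flu network as node -> parents, deliberately listed out of order.
parents = {
    "Nose": ["Sinus"],
    "Headache": ["Sinus"],
    "Sinus": ["Flu", "Allergy"],
    "Flu": [],
    "Allergy": [],
}


def children_of(parents):
    return {v: [c for c, ps in parents.items() if v in ps] for v in parents}


def topological_order(parents):
    """Kahn's algorithm: emit a node once all of its parents have been emitted."""
    children = children_of(parents)
    remaining = {v: len(ps) for v, ps in parents.items()}
    queue = deque(v for v, n in remaining.items() if n == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    if len(order) != len(parents):
        raise ValueError("graph has a directed cycle")
    return order


def descendants(v, parents):
    children, found, stack = children_of(parents), set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in found:
                found.add(c)
                stack.append(c)
    return found


def local_markov(parents):
    """For each X: (non-descendants of X other than its parents, parents of X)."""
    return {v: (sorted(set(parents) - {v} - descendants(v, parents) - set(ps)), sorted(ps))
            for v, ps in parents.items()}


print(topological_order(parents))  # ['Flu', 'Allergy', 'Sinus', 'Nose', 'Headache']
for v, (others, pa) in local_markov(parents).items():
    if others:
        print(f"{v} is independent of {others} given {pa}")
```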

Bayesian networks recap
Representation benefits
- Compact representation for probability distributions
- Exponential reduction in number of parameters
- Lower-variance parameter estimates from limited data
Inference benefits
- Efficient computation of P(X | e) (i.e., fast probabilistic inference)
- Involves variable elimination algorithms
Other important topics
- Structure learning: What graph structure to use?
- Understanding how evidence can be incorporated and how this changes conditional independence statements (d-separation)

Hidden Markov models: Bayesian network for time series

Example: Motion Capture Segmentation
[Figure: motion segments labeled Jumping jacks, Run, Squats, Side twists]

Hidden Markov model
Tutorial: Rabiner, Proc. IEEE 1989
- Markov transition dynamics among the states jumping jacks, squats and side twists
[Figure: state sequence over time, switching among jumping jacks, squats and side twists]
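A sketch of the latent Markov chain alone, with no observations yet. The three states come from the slide. The uniform initial distribution and the "sticky" transition matrix that favors staying in the same exercise are hypothetical. The code samples a state sequence and scores one with $P(z_1)\prod_t P(z_t \mid z_{t-1})$.

```python
import random

STATES = ["jumping jacks", "squats", "side twists"]
PI0 = [1 / 3, 1 / 3, 1 / 3]   # hypothetical P(z_1)
A = [                         # hypothetical A[i][j] = P(z_t = j | z_{t-1} = i)
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]


def sample_states(T, rng):
    """Draw z_1 from PI0, then each z_t from row A[z_{t-1}]."""
    z = [rng.choices(range(len(STATES)), weights=PI0)[0]]
    for _ in range(T - 1):
        z.append(rng.choices(range(len(STATES)), weights=A[z[-1]])[0])
    return z


def sequence_prob(z):
    """P(z_1) * prod_t P(z_t | z_{t-1}) under the Markov assumption."""
    p = PI0[z[0]]
    for prev, cur in zip(z, z[1:]):
        p *= A[prev][cur]
    return p


rng = random.Random(0)
z = sample_states(20, rng)
print([STATES[k] for k in z])
print(sequence_prob([0, 0, 1]))  # about 0.015
```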

Hidden Markov model
Tutorial: Rabiner, Proc. IEEE 1989
- Markov transition dynamics among the states (jumping jacks, squats, side twists)
- Conditionally independent emissions: each observation $y_t$ (e.g., body position) depends only on the current state $z_t$
- Joint distribution factorization:
$P(z_{1:T}, y_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1}) \prod_{t=1}^{T} P(y_t \mid z_t)$
The latent Markov chain structure enables
- Efficient computation of marginals via the forward-backward algorithm
- Most-probable state sequence via Viterbi
- Parameter learning via Baum-Welch (EM for HMMs)

GMMs vs. HMMs
[Figure: observations and true mode sequence under a Gaussian mixture model and under a hidden Markov model]
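A small sketch of two of these computations for a discrete-observation HMM. The 2-state, 2-symbol model and all its numbers are hypothetical. `forward` is the forward half of forward-backward and returns the likelihood $P(y_{1:T})$. `viterbi` returns the most probable state path. Both reuse the factorization above and can be checked against brute-force enumeration over all state paths.

```python
from itertools import product

# Hypothetical 2-state HMM with observation symbols 0 and 1.
PI = [0.6, 0.4]                # P(z_1 = k)
A = [[0.9, 0.1], [0.2, 0.8]]   # A[j][k] = P(z_t = k | z_{t-1} = j)
B = [[0.7, 0.3], [0.1, 0.9]]   # B[k][o] = P(y_t = o | z_t = k)


def forward(obs):
    """Forward recursion: returns P(y_1..y_T), summing over state paths."""
    K = len(PI)
    alpha = [PI[k] * B[k][obs[0]] for k in range(K)]
    for o in obs[1:]:
        alpha = [B[k][o] * sum(alpha[j] * A[j][k] for j in range(K)) for k in range(K)]
    return sum(alpha)


def viterbi(obs):
    """Most probable state path and its joint probability with obs."""
    K = len(PI)
    delta = [PI[k] * B[k][obs[0]] for k in range(K)]
    backpointers = []
    for o in obs[1:]:
        best_prev = [max(range(K), key=lambda j: delta[j] * A[j][k]) for k in range(K)]
        delta = [delta[best_prev[k]] * A[best_prev[k]][k] * B[k][o] for k in range(K)]
        backpointers.append(best_prev)
    path = [max(range(K), key=lambda k: delta[k])]
    best = delta[path[0]]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1], best


def joint(states, obs):
    """P(z_1..z_T, y_1..y_T) from the HMM factorization."""
    p = PI[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


obs = [0, 0, 1, 1, 1]
paths = list(product(range(len(PI)), repeat=len(obs)))
print(forward(obs), sum(joint(z, obs) for z in paths))  # equal up to rounding
print(viterbi(obs))
```

The recursions cost $O(TK^2)$, whereas the brute-force check visits all $K^T$ paths.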

HMM applications
Example applications:
- Parsing EEG recordings
- Discovering behaviors in videos
- Speech segmentation
- Volatility regimes in financial time series
- Genomics

Incorporating evidence: Bayes ball algorithm for analyzing conditional independencies

Conditional independence in Bayes nets
Consider 4 different junction configurations:
[Figure: junction configurations (a)–(d) over nodes x, y, z]
Conditional versus unconditional independence

Bayes ball algorithm
Consider 4 different junction configurations:
[Figure: Bayes ball behavior for junction configurations (a)–(d) over nodes x, y, z]
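The Bayes ball rules can be written as a reachability search over (node, direction) pairs. The sketch below is one standard way to implement d-separation, not necessarily the exact procedure on the slides. A chain or fork passes the ball through an unobserved middle node and blocks it at an observed one. A v-structure passes the ball only when the collider or one of its descendants is observed. The flu network serves as a test case.

```python
from collections import deque


def d_separated(x, y, observed, parents):
    """True if no active trail connects x and y given `observed`.

    `parents` maps every node of the DAG to the list of its parents.
    """
    observed = set(observed)
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)

    # Observed nodes plus their ancestors: a collider lets the ball through
    # exactly when it is in this set (it or a descendant is observed).
    active_colliders, stack = set(), list(observed)
    while stack:
        v = stack.pop()
        if v not in active_colliders:
            active_colliders.add(v)
            stack.extend(parents[v])

    # "up" = ball arrived from a child, "down" = ball arrived from a parent.
    visited, reached = set(), set()
    queue = deque([(x, "up")])
    while queue:
        v, direction = queue.popleft()
        if (v, direction) in visited:
            continue
        visited.add((v, direction))
        if v not in observed:
            reached.add(v)
        if direction == "up" and v not in observed:
            # Chain or fork through an unobserved node: the ball passes.
            queue.extend((p, "up") for p in parents[v])
            queue.extend((c, "down") for c in children[v])
        elif direction == "down":
            if v not in observed:
                # Chain through an unobserved node: keep moving down.
                queue.extend((c, "down") for c in children[v])
            if v in active_colliders:
                # V-structure with collider or a descendant observed: bounce up.
                queue.extend((p, "up") for p in parents[v])
    return y not in reached


parents = {"Flu": [], "Allergy": [], "Sinus": ["Flu", "Allergy"],
           "Headache": ["Sinus"], "Nose": ["Sinus"]}
print(d_separated("Flu", "Allergy", [], parents))           # True
print(d_separated("Flu", "Allergy", ["Sinus"], parents))    # False
print(d_separated("Flu", "Allergy", ["Nose"], parents))     # False
print(d_separated("Headache", "Nose", ["Sinus"], parents))  # True
```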

Bayes ball example
A path from one node to another is active if the Bayes ball can get from the first node to the second.
[Figure: example network, traversed step by step]
- V-structure, middle node not observed: the ball bounces away
- V-structure, middle node observed: the ball can pass through
- The ball gets stuck here
- V-structure, descendant of the middle node observed: the ball can pass through
