Outline

- Syntax
- Semantics
- Parameterized distributions

Chapter 14.1-3

Bayesian networks

A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.

Syntax:
- a set of nodes, one per variable
- a directed, acyclic graph (link ~ "directly influences")
- a conditional distribution for each node given its parents: P(X_i | Parents(X_i))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.

Example

Topology of the network encodes conditional independence assertions:

[Figure: Weather stands alone; Cavity is the parent of Toothache and Catch]

Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

CPTs:

P(B) = .001    P(E) = .002

P(A | B, E):
  B  E   P(A)
  T  T   .95
  T  F   .94
  F  T   .29
  F  F   .001

P(J | A):        P(M | A):
  A  P(J)          A  P(M)
  T  .90           T  .70
  F  .05           F  .01
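The "topology + CPTs" specification above can be written down directly as data. A minimal sketch in plain Python (the dict layout `{"parents": ..., "cpt": ...}` is a hypothetical representation, not from the slides): each CPT maps a tuple of parent values to P(node = true).

```python
# Burglary network: node -> parents + CPT rows giving P(node = true).
# Parent-value tuples follow the order of the "parents" list.
burglary_net = {
    "Burglary":   {"parents": [], "cpt": {(): 0.001}},
    "Earthquake": {"parents": [], "cpt": {(): 0.002}},
    "Alarm": {
        "parents": ["Burglary", "Earthquake"],
        "cpt": {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001},
    },
    "JohnCalls": {"parents": ["Alarm"], "cpt": {(True,): 0.90, (False,): 0.05}},
    "MaryCalls": {"parents": ["Alarm"], "cpt": {(True,): 0.70, (False,): 0.01}},
}

# Each row stores only P(X = true); P(X = false) is 1 - p.
# Total numbers needed: 1 + 1 + 4 + 2 + 2 = 10.
n_params = sum(len(node["cpt"]) for node in burglary_net.values())
```

Storing only P(X = true) per row is exactly the compactness argument made on the next slide.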
Compactness

A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values.

Each row requires one number p for X_i = true (the number for X_i = false is just 1 - p).

If each variable has no more than k parents, the complete network requires O(n * 2^k) numbers.

I.e., grows linearly with n, vs. O(2^n) for the full joint distribution.

For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31).

Global semantics

Global semantics defines the full joint distribution as the product of the local conditional distributions:

    P(x_1, ..., x_n) = prod_{i=1}^{n} P(x_i | parents(X_i))

e.g., P(j ^ m ^ a ^ ~b ^ ~e)
    = P(j | a) P(m | a) P(a | ~b, ~e) P(~b) P(~e)
    = 0.9 x 0.7 x 0.001 x 0.999 x 0.998
    ~ 0.00063

Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.

[Figure: node X with parents U_1 ... U_m, children Y_1 ... Y_n, and nondescendants Z_1j ... Z_nj]

Theorem: Local semantics <=> global semantics

Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

[Figure: node X with its Markov blanket shaded: parents U_1 ... U_m, children Y_1 ... Y_n, and the children's other parents Z_1j ... Z_nj]

Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n:
     add X_i to the network
     select parents from X_1, ..., X_{i-1} such that
     P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i-1})

This choice of parents guarantees the global semantics:

    P(X_1, ..., X_n) = prod_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})   (chain rule)
                     = prod_{i=1}^{n} P(X_i | Parents(X_i))        (by construction)
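The global semantics can be checked mechanically: multiply one local CPT entry per variable. A minimal sketch using the burglary CPTs above (the helper names `joint_probability` and `pick` are illustrative, not from the slides):

```python
from itertools import product

def joint_probability(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of local CPT entries."""
    def pick(p_true, value):
        # P(X = value) given the stored number P(X = true)
        return p_true if value else 1.0 - p_true
    p_alarm_true = {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}[(b, e)]
    return (pick(0.001, b) * pick(0.002, e) * pick(p_alarm_true, a)
            * pick(0.90 if a else 0.05, j) * pick(0.70 if a else 0.01, m))

# The slide's example: P(j ^ m ^ a ^ ~b ^ ~e) ~ 0.00063
p = joint_probability(b=False, e=False, a=True, j=True, m=True)

# Sanity check: summing over all 2^5 assignments must give 1.
total = sum(joint_probability(*vals) for vals in product([False, True], repeat=5))
```

The normalization check is a useful habit: any product-of-CPTs factorization sums to 1 automatically, which is what makes the local numbers a valid joint distribution.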
Example

Suppose we choose the ordering M, J, A, B, E.

P(J | M) = P(J)?  No
P(A | J, M) = P(A | J)?  No
P(A | J, M) = P(A)?  No
P(B | A, J, M) = P(B | A)?  Yes
P(B | A, J, M) = P(B)?  No
P(E | B, A, J, M) = P(E | A)?  No
P(E | B, A, J, M) = P(E | A, B)?  Yes

Example contd.

Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions.
Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.
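The compactness penalty of a bad ordering is easy to verify by counting CPT rows. A small sketch (structures written as node -> parent-list dicts; `num_parameters` is an illustrative helper) comparing the causal ordering with the M, J, A, B, E ordering above:

```python
def num_parameters(structure):
    """One number per CPT row for Boolean variables: 2^(#parents) per node."""
    return sum(2 ** len(parents) for parents in structure.values())

# Causal ordering: B, E, A, J, M  ->  1 + 1 + 4 + 2 + 2 = 10
causal = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

# Noncausal ordering M, J, A, B, E  ->  1 + 2 + 4 + 2 + 4 = 13
noncausal = {"M": [], "J": ["M"], "A": ["M", "J"], "B": ["A"], "E": ["A", "B"]}
```

Both structures represent the same joint distribution; the noncausal one simply needs more numbers because its parent sets are larger.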
Example: Car diagnosis

Initial evidence: car won't start.
Testable variables (green), "broken, so fix it" variables (orange).
Hidden variables (gray) ensure sparse structure, reduce parameters.

[Figure: diagnosis network with nodes battery age, battery dead, alternator broken, fanbelt broken, no charging, battery meter, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, dipstick, car won't start]

Example: Car insurance

[Figure: insurance network with nodes Age, GoodStudent, SocioEcon, RiskAversion, SeniorTrain, DrivingSkill, DrivingHist, DrivQuality, MakeModel, Mileage, VehicleYear, ExtraCar, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident, OwnDamage, Theft, Cushioning, MedicalCost, LiabilityCost, PropertyCost, OtherCost, OwnCost]

Compact conditional distributions

CPT grows exponentially with number of parents.
CPT becomes infinite with continuous-valued parent or child.

Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case:
    X = f(Parents(X)) for some function f

E.g., Boolean functions:
    NorthAmerican <=> Canadian v US v Mexican

E.g., numerical relationships among continuous variables:
    dLevel/dt = inflow + precipitation - outflow - evaporation

Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes:
1) Parents U_1 ... U_k include all causes (can add leak node)
2) Independent failure probability q_i for each cause alone

    P(X | U_1 ... U_j, ~U_{j+1} ... ~U_k) = 1 - prod_{i=1}^{j} q_i

  Cold  Flu  Malaria   P(Fever)   P(~Fever)
   F     F     F         0.0        1.0
   F     F     T         0.9        0.1
   F     T     F         0.8        0.2
   F     T     T         0.98       0.02 = 0.2 x 0.1
   T     F     F         0.4        0.6
   T     F     T         0.94       0.06 = 0.6 x 0.1
   T     T     F         0.88       0.12 = 0.6 x 0.2
   T     T     T         0.988      0.012 = 0.6 x 0.2 x 0.1

Number of parameters linear in number of parents.

Hybrid (discrete+continuous) networks

Discrete (Subsidy? and Buys?); continuous (Harvest and Cost).

[Figure: Subsidy? and Harvest are parents of Cost; Cost is the parent of Buys?]

Option 1: discretization -- possibly large errors, large CPTs
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)
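The noisy-OR rule above takes one failure probability per cause and multiplies the q_i of the causes that are present. A minimal sketch reproducing the fever table (the function name `noisy_or` is illustrative):

```python
def noisy_or(present_causes, q):
    """P(effect | exactly these causes present) = 1 - product of their q_i."""
    p_fail = 1.0
    for cause in present_causes:
        p_fail *= q[cause]   # effect fails only if every present cause fails
    return 1.0 - p_fail

# Failure probabilities from the fever example
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}
```

With k causes this needs only k numbers instead of 2^k CPT rows, which is the "linear in number of parents" claim.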
Continuous child variables

Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents.

Most common is the linear Gaussian model, e.g.,:

    P(Cost = c | Harvest = h, Subsidy? = true)
        = N(a_t h + b_t, sigma_t)(c)
        = (1 / (sigma_t sqrt(2 pi))) exp( -(1/2) ((c - (a_t h + b_t)) / sigma_t)^2 )

Mean Cost varies linearly with Harvest; the variance is fixed.

Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow.
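The linear Gaussian density is just a normal pdf whose mean is a linear function of the parent. A minimal sketch (the function name and the particular parameter values are illustrative, not from the slides):

```python
import math

def linear_gaussian(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h: N(a*h + b, sigma) evaluated at c."""
    mean = a * h + b          # mean shifts linearly with the parent value
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# e.g. with a=1.5, b=2.0, sigma=1.0 and Harvest h=2.0, the mean cost is 5.0
peak = linear_gaussian(5.0, 2.0, 1.5, 2.0, 1.0)
```

Note that sigma does not depend on h: only the location of the bell curve moves, its width is fixed, which is exactly the "variance is fixed" remark above.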
Continuous child variables contd.

[Figure: surface plot of P(Cost | Harvest, Subsidy? = true), a Gaussian ridge whose mean rises with Harvest]

All-continuous network with LG distributions => the full joint distribution is a multivariate Gaussian.

A discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.

Discrete variable w/ continuous parents

Probability of Buys? given Cost should be a "soft" threshold:

[Figure: P(Buys? = false | Cost = c) rising smoothly from 0 to 1 as c increases]

Probit distribution uses the integral of the Gaussian:

    Phi(x) = integral_{-inf}^{x} N(0, 1)(t) dt
    P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)

Why the probit?

1. It's sort of the right shape
2. Can view as a hard threshold whose location is subject to noise

Discrete variable contd.

Sigmoid (or logit) distribution also used in neural networks:

    P(Buys? = true | Cost = c) = 1 / (1 + exp(-2 (-c + mu) / sigma))

Sigmoid has a similar shape to the probit but much longer tails.

[Figure: sigmoid curve for P(Buys? = false | Cost = c)]

Summary

Bayes nets provide a natural representation for (causally induced) conditional independence.
Topology + CPTs = compact representation of joint distribution.
Generally easy for (non)experts to construct.
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs.
Continuous variables => parameterized distributions (e.g., linear Gaussian).
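The probit and sigmoid soft thresholds can be compared directly; Phi is available via the error function. A minimal sketch (function names are illustrative; mu and sigma values are arbitrary):

```python
import math

def probit(c, mu, sigma):
    """P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma), Phi via erf."""
    x = (-c + mu) / sigma
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid(c, mu, sigma):
    """Logit version: 1 / (1 + exp(-2 (-c + mu) / sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))
```

Both cross 0.5 at c = mu and fall as cost rises; far from mu the sigmoid decays only exponentially while the probit decays like a Gaussian, which is the "much longer tails" remark.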