Probabilistic Graphical Models (Cmput 651): Hybrid Network. Matthew Brown 24/11/2008
1 Reading: Handout on Hybrid Networks (Ch. 13 from an older version of Koller & Friedman)
2 Space of topics: Semantics / Inference / Learning; Continuous / Discrete; Directed / Undirected
3 Outline Inference in purely continuous nets Hybrid network semantics Inference in hybrid networks 3
4 Linear Gaussian Bayesian networks (KF Definition 6.2.1) Definition: A linear Gaussian Bayesian network satisfies: all variables continuous; all CPDs are linear Gaussians. Example over A, B, C, D, E: P(A) = N(µ_A; σ²_A), P(B) = N(µ_B; σ²_B), P(C) = N(µ_C; σ²_C), P(D | A, B) = N(β_{D,0} + β_{D,1} A + β_{D,2} B; σ²_D), P(E | C, D) = N(β_{E,0} + β_{E,1} C + β_{E,2} D; σ²_E)
5 Inference in linear Gaussian Bayes nets Recall: a linear Gaussian Bayes net (LGBN) is equivalent to a multivariate Gaussian distribution. To marginalize, one could convert the LGBN to a Gaussian, since marginalization is trivial for a Gaussian. But that ignores structure. Example chain LGBN X_1 → X_2 → … → X_n with p(X_i | X_{i−1}) = N(β_i + α_i X_{i−1}; σ²_i): the LGBN has 3n − 1 parameters, the Gaussian n² + n parameters; bad for large n.
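The LGBN-to-Gaussian conversion mentioned above can be sketched for the chain network on this slide (a minimal numpy sketch, assuming the CPDs p(X_i | X_{i−1}) = N(β_i + α_i X_{i−1}; σ²_i); the function name and list-based parameterization are mine, not the handout's):

```python
import numpy as np

def chain_lgbn_to_gaussian(beta, alpha, var):
    """Convert a chain LGBN X1 -> X2 -> ... -> Xn into its equivalent
    joint Gaussian N(mu, Sigma).
    beta[i], var[i]: offset and variance of X_{i+1}'s CPD;
    alpha[i]: coefficient on the parent (alpha[0] is unused)."""
    n = len(beta)
    mu = np.zeros(n)
    Sigma = np.zeros((n, n))
    mu[0] = beta[0]
    Sigma[0, 0] = var[0]
    for i in range(1, n):
        # E[Xi] = beta_i + alpha_i * E[X_{i-1}]
        mu[i] = beta[i] + alpha[i] * mu[i - 1]
        # Cov(Xi, Xj) = alpha_i * Cov(X_{i-1}, Xj) for j < i
        for j in range(i):
            Sigma[i, j] = Sigma[j, i] = alpha[i] * Sigma[i - 1, j]
        # Var(Xi) = sigma_i^2 + alpha_i^2 * Var(X_{i-1})
        Sigma[i, i] = var[i] + alpha[i] ** 2 * Sigma[i - 1, i - 1]
    return mu, Sigma
```

The chain with β = 0, α = 1, σ² = 1 is a random walk, whose covariance Σ_ij = min(i, j) + 1 makes the recursion easy to check by hand.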
6 Variable elimination Marginalize out unwanted X using integration rather than summation as in the discrete case. Note: variable elimination gives exact answers for continuous nets (not for hybrid nets).
7 Variable elimination example Network: X_1, X_2 → X_3 → X_4.
p(X_4) = ∫_{X_1,X_2,X_3} P(X_1, X_2, X_3, X_4)
= ∫_{X_1,X_2,X_3} P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_3)
= ∫_{X_3} P(X_4 | X_3) ∫_{X_2} P(X_2) ∫_{X_1} P(X_1) P(X_3 | X_1, X_2)
Need a way to represent intermediate factors; they are not Gaussian (e.g. conditional probabilities are not jointly Gaussian). Need elimination, product, etc. on this representation.
8 Canonical forms (KF Handout Def'n) Definition: a canonical form over x is C(x; K, h, g) = exp(−½ xᵀK x + hᵀx + g), also written C(K, h, g).
9 Canonical forms and Gaussians (KF Handout) Canonical forms can represent Gaussians: N(µ; Σ) = C(K, h, g) with K = Σ⁻¹, h = Σ⁻¹µ. So: g = −½ µᵀΣ⁻¹µ − log((2π)^{n/2} |Σ|^{1/2}).
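The Gaussian-to-canonical-form mapping above can be sketched in numpy (the helper names are mine, not the handout's; `canonical_density` just evaluates exp(−½ xᵀKx + hᵀx + g)):

```python
import numpy as np

def gaussian_to_canonical(mu, Sigma):
    """Represent N(mu; Sigma) as a canonical form C(x; K, h, g)
    = exp(-0.5 x'Kx + h'x + g)."""
    K = np.linalg.inv(Sigma)
    h = K @ mu
    n = len(mu)
    # g absorbs the Gaussian normalization constant
    g = -0.5 * mu @ K @ mu - 0.5 * (n * np.log(2 * np.pi)
                                    + np.log(np.linalg.det(Sigma)))
    return K, h, g

def canonical_density(x, K, h, g):
    """Evaluate the canonical form at x."""
    return np.exp(-0.5 * x @ K @ x + h @ x + g)
```

Because g carries the normalizer, the canonical form evaluates to exactly the Gaussian density, not just something proportional to it.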
10 Canonical forms and Gaussians (KF Handout) Canonical forms can represent Gaussians, and other things (when K⁻¹ is not defined), e.g. linear Gaussian CPDs. Can also use conditional forms (multivariate linear Gaussian P(X | Y)) to represent linear Gaussian CPDs or Gaussians.
11 Operations on canonical forms (KF Handout) Factor product: C(K_1, h_1, g_1) · C(K_2, h_2, g_2) = C(K_1 + K_2, h_1 + h_2, g_1 + g_2). When the scopes don't overlap, must first extend both forms to a common scope (zero-padding K and h), then take the product as above.
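A minimal sketch of the factor product with scope extension, assuming scopes are given as lists of variable names (the helper names and calling convention are mine, not the handout's):

```python
import numpy as np

def extend(K, h, scope, full_scope):
    """Zero-pad a canonical form (K, h) over `scope` onto `full_scope`."""
    idx = [full_scope.index(v) for v in scope]
    n = len(full_scope)
    K_ext = np.zeros((n, n))
    h_ext = np.zeros(n)
    K_ext[np.ix_(idx, idx)] = K
    h_ext[idx] = h
    return K_ext, h_ext

def product(K1, h1, g1, s1, K2, h2, g2, s2):
    """Factor product: extend both forms to the union scope, then
    add the (K, h, g) parameters componentwise."""
    scope = s1 + [v for v in s2 if v not in s1]
    K1e, h1e = extend(K1, h1, s1, scope)
    K2e, h2e = extend(K2, h2, s2, scope)
    return K1e + K2e, h1e + h2e, g1 + g2, scope
```

Division for belief-update message passing is the same operation with subtraction in place of addition.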
12 Operations on canonical forms (KF Handout) Factor division (for belief-update message passing): C(K_1, h_1, g_1) / C(K_2, h_2, g_2) = C(K_1 − K_2, h_1 − h_2, g_1 − g_2). Note multiplying or dividing by the vacuous canonical form C(0, 0, 0) has no effect.
13 Operations on canonical forms (KF Handout) Marginalization: given C(K, h, g) over the set of variables {X, Y}, want to integrate out Y. Require K_YY positive definite so that the integral is finite. The marginal is C(K', h', g') with K' = K_XX − K_XY K_YY⁻¹ K_YX, h' = h_X − K_XY K_YY⁻¹ h_Y, g' = g + ½ (|Y| log(2π) − log |K_YY| + h_Yᵀ K_YY⁻¹ h_Y).
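The marginalization formulas can be sketched directly, with index lists `keep` and `drop` playing the roles of X and Y (a numpy sketch under those conventions, not the handout's code):

```python
import numpy as np

def marginalize(K, h, g, keep, drop):
    """Integrate the variables at indices `drop` out of the canonical
    form C(K, h, g); requires K[drop, drop] positive definite."""
    Kxx = K[np.ix_(keep, keep)]
    Kxy = K[np.ix_(keep, drop)]
    Kyy = K[np.ix_(drop, drop)]
    hy = h[drop]
    Kyy_inv = np.linalg.inv(Kyy)
    # Schur complement gives the marginal precision
    K_m = Kxx - Kxy @ Kyy_inv @ Kxy.T
    h_m = h[keep] - Kxy @ Kyy_inv @ hy
    g_m = g + 0.5 * (len(drop) * np.log(2 * np.pi)
                     - np.log(np.linalg.det(Kyy))
                     + hy @ Kyy_inv @ hy)
    return K_m, h_m, g_m
```

For a normalized joint Gaussian, marginalizing this way must reproduce the known Gaussian marginal, precision 1/Σ_XX included, which is an easy sanity check.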
14 Operations on canonical forms (KF Handout) Conditioning: given C(K, h, g) over the set of variables {X, Y}, want to condition on Y = y. Result: C(K_XX, h_X − K_XY y, g + h_Yᵀ y − ½ yᵀ K_YY y). Notice: Y is no longer part of the canonical form after conditioning (unlike with tables).
15 Inference on linear Gaussian Bayesian nets (KF Handout) Factor operations are simple and closed form → variable elimination, sum-product message passing, and belief-update message passing all apply. Note on conditioning: conditioned variables disappear from the canonical form, unlike with factor reduction on table factors → must restrict all factors relevant to inference based on the evidence Y = y before doing inference.
16 Inference on linear Gaussian Bayesian nets (KF Handout) Computational performance: canonical form operations are polynomial in the factor scope size n: product & division O(n²); marginalization → matrix inversion O(n³) → inference in LGBNs is linear in the number of cliques and cubic in the maximum clique size. For discrete networks, factor operations on table factors are exponential in scope size.
17 Inference on linear Gaussian Bayesian nets (KF Handout) Computational performance (cont'd): for low dimensionality (small number of variables), the Gaussian representation can be more efficient; for high dimensionality and low treewidth, message passing on the LGBN is much more efficient.
18 Summary Inference on linear Gaussian Bayesian nets: use canonical forms variable elimination or clique tree calibration exact efficient 18
19 Outline Inference in purely continuous nets Hybrid network semantics Inference in hybrid networks 19
20 Hybrid networks (KF 5.5.1) Hybrid networks combine discrete and continuous variables 20
21 Conditional linear Gaussian (CLG) models (KF 5.1) Definition: Given a continuous variable X with discrete parents U and continuous parents X_1, …, X_k, X has a conditional linear Gaussian CPD if for each assignment u to U there are coefficients a_{u,0}, …, a_{u,k} and a variance σ²_u such that p(X | u, x_1, …, x_k) = N(a_{u,0} + Σ_i a_{u,i} x_i; σ²_u).
22 Conditional linear Gaussian (CLG) models (KF 5.1) Definition: A Bayesian network is a conditional linear Gaussian network if: discrete nodes have only discrete parents; continuous nodes have conditional linear Gaussian CPDs; so continuous parents cannot have discrete children. The joint distribution is a mixture (weighted average) of Gaussians, one per assignment to the discrete variables; weight = probability of that discrete assignment.
23 CLG example Weight is CLG with continuous parent height and discrete parents country and gender: p(w | h, c, g) = N(β_{c,g,0} + β_{c,g,1} h; σ²_{c,g})
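A small sketch of sampling from such a CLG CPD: the discrete parent assignment selects the linear-Gaussian parameters, then the continuous parent enters linearly. The coefficient values below are made up for illustration, not taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: one (beta0, beta1, sigma) triple per
# assignment to the discrete parents (country, gender).
params = {
    ("Canada", "F"): (8.0, 0.55, 6.0),
    ("Canada", "M"): (10.0, 0.62, 7.0),
}

def sample_weight(country, gender, height_cm):
    """Draw weight from the CLG CPD N(b0 + b1 * height; sigma^2)
    selected by the discrete parent assignment."""
    b0, b1, sigma = params[(country, gender)]
    return rng.normal(b0 + b1 * height_cm, sigma)
```

Averaging such a CPD over the discrete parents is exactly what produces the mixture-of-Gaussians joint described on the previous slide.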
24 Discrete nodes with continuous parents Option 1, hard threshold: e.g. continuous X → discrete Y with Y = 0 if X < 3.4 and 1 otherwise. A hard threshold is not differentiable (no gradient learning) and often not realistic. Option 2, soft threshold: linear sigmoid (logistic) or multivariate logit. NOTE: nonlinearity!
25 Linear sigmoid (logistic or soft threshold) p(Y = 1 | x) = exp(θᵀx) / (1 + exp(θᵀx)) (plot: P(Y = 1 | x) against x)
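The logistic CPD above in code, computed in the numerically stable form 1 / (1 + exp(−θᵀx)), which is algebraically equal to the slide's expression:

```python
import math

def sigmoid_prob(theta, x):
    """P(Y = 1 | x) = exp(theta.x) / (1 + exp(theta.x))."""
    a = sum(t * xi for t, xi in zip(theta, x))
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)          # avoid overflow for very negative a
    return e / (1.0 + e)
```

The two branches avoid evaluating exp on a large positive argument, which would overflow for extreme θᵀx.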
26 Multivariate logit E.g. stock trading: P(trade | price) for buy (red), hold (green), sell (blue) as a function of stock price, with linear scores ℓ_buy = −3·(price − 18), ℓ_hold = 1, ℓ_sell = 3·(price − 22).
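A sketch of the multivariate logit for this example: exponentiate the three linear scores and normalize. The signs on the buy/sell scores are an assumption (chosen so that buy dominates at low prices and sell at high prices, matching the plot's description):

```python
import math

def trade_probs(price):
    """Multinomial logit over {buy, hold, sell} with linear scores
    in price (assumed signs: buy favoured low, sell favoured high)."""
    scores = {"buy": -3.0 * (price - 18.0),
              "hold": 1.0,
              "sell": 3.0 * (price - 22.0)}
    m = max(scores.values())                  # subtract max for stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}
```

Between the two thresholds the constant hold score wins, reproducing the buy/hold/sell bands in the plot.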
27 Discrete node with discrete & continuous parents Continuous parents input filtered through multivariate logit Assignment to discrete parents determines coefficients for logit 27
28 Example hybrid net Stock trade (discrete) = {buy, hold, sell}; parents: price (continuous), strategy (discrete) = {1, 2}. P(trade | price, strategy): strategy 1 (reddish): ℓ_buy = −3·(price − 18), ℓ_hold = 1, ℓ_sell = 3·(price − 22); strategy 2 (blue/green): ℓ_buy = −3·(price − 16), ℓ_hold = 1, ℓ_sell = 1·(price − 26).
29 Outline Inference in purely continuous nets Hybrid network semantics Inference in hybrid networks Issues Non linear dependencies in continuous nets Discrete & continuous nodes: CLGs General hybrid networks 29
30 Variable elimination example (Handout Example) Discrete D_1 … D_n, continuous X_1 … X_n:
p(D_1 … D_n, X_1 … X_n) = (∏_{i=1}^n p(D_i)) · p(X_1 | D_1) · ∏_{i=2}^n p(X_i | D_i, X_{i−1})
p(X_2) = Σ_{D_1,D_2} ∫_{X_1} p(D_1, D_2, X_1, X_2)
= Σ_{D_1,D_2} ∫_{X_1} p(D_1) p(D_2) p(X_1 | D_1) p(X_2 | D_2, X_1)
= Σ_{D_2} p(D_2) ∫_{X_1} p(X_2 | D_2, X_1) Σ_{D_1} p(X_1 | D_1) p(D_1)
→ simple in principle (but see next slide)
31 Difficulties with inference in hybrid nets 1. Must restrict the representation (i.e. the factors); implicit in the choice to use CLGs, for example. 2. Marginalization is difficult with arbitrary hybrid nets, especially with non-linear dependencies among nodes; a continuous parent → discrete node requires non-linearity! 3. Intermediate factors are hard to represent and work with, e.g. the mixture of Gaussians arising from the conditional linear Gaussian (CLG) representation. → approximation is necessary with hybrid nets
32 Difficult marginalization (KF Handout Example) Y → X with P(Y) = N(0; 1), P(X | Y) = N(Y²; 1): X is non-linear in Y. Joint: p(x, y) = (1/Z) exp(−y² − (x − y²)²). Marginal: p(x) = ∫_y (1/Z) exp(−y² − (x − y²)²) dy → no analytic (closed-form) solution!
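One can still evaluate this marginal numerically; a trapezoid-rule sketch of the unnormalized integral (the heavy right tail relative to the left, coming from x centering on y² ≥ 0, is exactly what no Gaussian can reproduce):

```python
import math

def unnorm_marginal(x, n=4000, lo=-6.0, hi=6.0):
    """Trapezoid-rule integral over y of the unnormalized joint
    exp(-y^2 - (x - y^2)^2) from the slide."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * dy
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-y * y - (x - y * y) ** 2)
    return total * dy
```

Since y² is never negative, p(x) at x = −3 forces (x − y²)² ≥ 9 everywhere, so the left tail dies off far faster than the right: a strongly skewed, non-Gaussian shape.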
33 Variable elimination example (Handout Example) Discrete binary D_1 … D_n; continuous X_1 … X_n with p(X_1 | d_1) = N(β_{1,d_1}; σ²_{1,d_1}) and p(X_i | d_i, x_{i−1}) = N(β_{i,d_i} + α_{i,d_i} x_{i−1}; σ²_{i,d_i}). Want P(X_2). P(X_1, X_2) is a mixture of four Gaussians, one per assignment to {D_1, D_2}. Can show P(X_2) is also a mixture of four Gaussians: not trivial to represent and work with.
34 Discretization (KF Handout) What about discretizing continuous variables? Usually no: typically need a fine-grained representation of continuous X (i.e. a large number of bins), especially where P(X) is large; but we need inference to find where P(X) is large, so discretizing efficiently defeats the purpose. → number of bins usually excessively huge, AND table factors suffer from the curse of dimensionality: exponential in |Val(X)|.
35 Summary Inference in hybrid networks: difficulties with variable elimination come from non-linear dependencies → non-Gaussian intermediate factors, and from mixing discrete & continuous variables → mixtures of Gaussians. General approach = approximate difficult intermediate factors with Gaussians.
36 Outline Inference in purely continuous nets Hybrid network semantics Inference in hybrid networks Issues Non linear dependencies in continuous nets Discrete & continuous nodes: CLGs General hybrid networks 36
37 Approximating intermediate factors in VE (KF Handout) General approach: during variable elimination, when a difficult intermediate factor is encountered, approximate it with a Gaussian. BUT Gaussians cannot represent conditional distributions (CPDs) or general (unnormalized) factors → must make sure to approximate only valid distributions with Gaussians. E.g. to eliminate X from P(X | Y), must first multiply in a factor P(Y) to give P(X, Y) → CPDs must be multiplied into factors in a topological ordering, i.e. an ordering with parents always before children.
38 Example (KF Handout Example) Cliques: C_1 = {X, Y, Z}, C_2 = {Z, W}. Want P(Z | W = w_1). Variable elimination: Step 0: initialize all cliques to the vacuous canonical form C(0, 0, 0), i.e. initial potentials are not the product of initial factors → C_1's initial factors: P(X), P(Y), P(Z | X, Y).
39 Example cont'd (KF Handout Example) Cliques: C_1 = {X, Y, Z}, C_2 = {Z, W}. Want P(Z | W = w_1). Variable elimination: Step 1: linearize P(X), i.e. approximate it with a Gaussian, represent it as a canonical form, then multiply it into C_1's potential (C(0, 0, 0) initially). Step 2: same for P(Y) (could equally do P(Y) in step 1, then P(X)) → C_1's potential = P̂(X, Y).
40 Example cont'd (KF Handout Example) Cliques: C_1 = {X, Y, Z}, C_2 = {Z, W}. Want P(Z | W = w_1). Variable elimination: C_1 has P̂(X, Y) P(Z | X, Y). Step 3: estimate P̂(X, Y, Z) ≈ P(X, Y, Z) = P(X, Y) P(Z | X, Y) with a Gaussian (represented as a canonical form); eliminate X, Y: P̂(Z) = ∫_{X,Y} P̂(X, Y, Z); pass P̂(Z) as a message to C_2. Note: P̂(Z) is a distribution.
41 Example cont'd (KF Handout Example) Cliques: C_1 = {X, Y, Z}, C_2 = {Z, W}. Want P(Z | W = w_1). Variable elimination: C_2 has P̂(Z) P(W | Z). Step 4: estimate P̂(W, Z) ≈ P(W, Z) = P̂(Z) P(W | Z) with a Gaussian (represented as a canonical form). Step 5: set W = w_1; pass the message P̂(W = w_1, Z) to C_1 (a canonical form). Step 6: P̂(Z | W = w_1) obtained by normalizing P̂(W = w_1, Z). Note: the result is a distribution.
42 Definition (KF Handout Def'n) Definition: A clique tree T with a root clique C_r allows topological incorporation if, for any variable X, the clique to which X's CPD is assigned is upstream of or equal to the cliques to which X's parents' CPDs are assigned.
43 Approximating with Gaussians (KF Handout , ) Local approximations: Taylor series Numerical integration Global approximation 43
44 Outline Inference in purely continuous nets Hybrid network semantics Inference in hybrid networks Issues Non linear dependencies in continuous nets Discrete & continuous nodes: CLGs General hybrid networks 44
45 Inference in general hybrid nets (KF Handout) NP-hard even for polytrees: a mixture of exponentially many Gaussians, one per assignment to the discrete variables, e.g. 2ⁿ assignments for n binary variables. Hard even in the easiest case (continuous nodes with at most one discrete binary parent, i.e. a mixture of at most two Gaussians), and even for approximate inference on discrete binary nodes with relative error < 0.5 (relative error 0.5 is chance).
46 Canonical tables (KF Handout Def'n) Definition: A canonical table ϕ over discrete D and continuous X has one entry ϕ(d) per assignment D = d, with ϕ(d) = canonical form C(X; K_d, h_d, g_d). Can represent: table factors, linear Gaussians, CLGs.
47 Canonical table example Discrete: country, gender; continuous: height, weight.
          Female                                Male
Canada    C(K_{Can,F}, h_{Can,F}, g_{Can,F})    C(K_{Can,M}, h_{Can,M}, g_{Can,M})
USA       C(K_{USA,F}, h_{USA,F}, g_{USA,F})    C(K_{USA,M}, h_{USA,M}, g_{USA,M})
China     C(K_{Chi,F}, h_{Chi,F}, g_{Chi,F})    C(K_{Chi,M}, h_{Chi,M}, g_{Chi,M})
India     C(K_{Ind,F}, h_{Ind,F}, g_{Ind,F})    C(K_{Ind,M}, h_{Ind,M}, g_{Ind,M})
Germany   C(K_{Ger,F}, h_{Ger,F}, g_{Ger,F})    C(K_{Ger,M}, h_{Ger,M}, g_{Ger,M})
48 Operations on canonical tables (KF Handout) Extensions of the canonical form operations: product, division, marginalization over continuous variables. Marginalization over discrete variables → the resulting factor is not necessarily representable with a canonical table → approximate with Gaussians (in the form of a canonical table) whenever marginalizing (see next slide).
49 Marginalization example (KF Handout) Binary D, continuous X. Canonical table: two Gaussians (blue, green). Red: their sum (marginalization over D) → not Gaussian! Cannot be represented by a canonical table (see next slide).
50 Marginalization example cont d (KF Handout ) Binary D, continuous X Canonical table: Two Gaussians (blue, green) Red: Gaussian approximation to sum over blue and green 50
51 Marginalization on canonical tables (KF Handout) Weak marginalization: approximate the marginal as a Gaussian; necessary when marginalizing across a mixture of Gaussians. Note: the canonical table MUST represent a valid mixture. Strong marginalization (exact) applies when marginalizing out continuous variables only, when the factor is over discrete variables only, or when all entries hold identical canonical forms.
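In one dimension, weak marginalization is just moment matching; a minimal sketch (collapsing a mixture of 1-D Gaussians to the single Gaussian with the same mean and variance):

```python
def collapse_mixture(weights, means, variances):
    """Weak marginalization in 1-D: moment-match a Gaussian mixture
    with a single Gaussian (same overall mean and variance)."""
    mu = sum(w * m for w, m in zip(weights, means))
    # law of total variance: E[Var] + Var[E]
    var = sum(w * (v + (m - mu) ** 2)
              for w, m, v in zip(weights, means, variances))
    return mu, var
```

Note the matched variance includes a between-component term, so widely separated components produce a single very broad Gaussian, which is where the approximation error lives.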
52 Inference in hybrid nets (KF Handout) Cannot always marginalize discrete variables → must restrict the elimination order. KF Handout Example: A, B, C discrete; X, Y, Z continuous. In one possible clique tree, neither leaf clique can start message passing; e.g. {B, X, Y} has CPDs for P(B), P(Y | B, X) but not P(X) → its canonical forms over {X, Y} are linear Gaussian CPDs, not Gaussians → cannot marginalize out B.
53 Strong rooted clique trees Definition: A clique C_r in a clique tree is a strong root if for each clique C_1 and its upstream neighbour C_2: C_1 − C_2 ⊆ {continuous variables}, or C_1 ∩ C_2 ⊆ {discrete variables}. In a strongly rooted clique tree, the upward pass toward the strong root does not require any weak marginalization; in the downward pass, all required factors are present for weak marginalization to proceed. Example strongly rooted clique tree (from the example on the previous slide): the middle clique is the strong root.
54 Strong root Sometimes there exist non-strongly-rooted clique trees that still allow inference (refer to the example two slides previous). Also, there is the issue of building strongly rooted trees; see the KF Handout.
55 Outline Inference in purely continuous nets Hybrid network semantics Inference in hybrid networks Issues Non linear dependencies in continuous nets Discrete & continuous nodes: CLGs General hybrid networks 55
56 Inference in general hybrid nets (KF Handout) Two issues: non-linear dependencies, and intermediate factors from marginalization on canonical tables → factors not representable as canonical tables. Solution: approximate with Gaussians (in the form of canonical tables) → applies to both issues, as discussed above → allows discrete nodes with continuous parents, e.g. can model a thermostat.
57 Approximate methods Above, we discussed variable elimination based methods. Also: particle-based methods (KF Handout 13.5) and global approximate methods.
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models Features (Ising,
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationMachine Learning Lecture 14
Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de
More informationDEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY
DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain
More informationGraphical Models. Lecture 3: Local Condi6onal Probability Distribu6ons. Andrew McCallum
Graphical Models Lecture 3: Local Condi6onal Probability Distribu6ons Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Condi6onal Probability Distribu6ons
More informationUC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics. EECS 281A / STAT 241A Statistical Learning Theory
UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics EECS 281A / STAT 241A Statistical Learning Theory Solutions to Problem Set 2 Fall 2011 Issued: Wednesday,
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationLecture 17: May 29, 2002
EE596 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 2000 Dept. of Electrical Engineering Lecture 17: May 29, 2002 Lecturer: Jeff ilmes Scribe: Kurt Partridge, Salvador
More informationLecture 6: Graphical Models
Lecture 6: Graphical Models Kai-Wei Chang CS @ Uniersity of Virginia kw@kwchang.net Some slides are adapted from Viek Skirmar s course on Structured Prediction 1 So far We discussed sequence labeling tasks:
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationUndirected graphical models
Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical
More informationInformatics 2D Reasoning and Agents Semester 2,
Informatics 2D Reasoning and Agents Semester 2, 2017 2018 Alex Lascarides alex@inf.ed.ac.uk Lecture 23 Probabilistic Reasoning with Bayesian Networks 15th March 2018 Informatics UoE Informatics 2D 1 Where
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More informationCS281A/Stat241A Lecture 19
CS281A/Stat241A Lecture 19 p. 1/4 CS281A/Stat241A Lecture 19 Junction Tree Algorithm Peter Bartlett CS281A/Stat241A Lecture 19 p. 2/4 Announcements My office hours: Tuesday Nov 3 (today), 1-2pm, in 723
More informationGraphical Models. Lecture 10: Variable Elimina:on, con:nued. Andrew McCallum
Graphical Models Lecture 10: Variable Elimina:on, con:nued Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Last Time Probabilis:c inference is
More informationMarkov Networks.
Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in
More informationBayesian Networks. instructor: Matteo Pozzi. x 1. x 2. x 3 x 4. x 5. x 6. x 7. x 8. x 9. Lec : Urban Systems Modeling
12735: Urban Systems Modeling Lec. 09 Bayesian Networks instructor: Matteo Pozzi x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 1 outline example of applications how to shape a problem as a BN complexity of the inference
More informationBasic Sampling Methods
Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationCS 5522: Artificial Intelligence II
CS 5522: Artificial Intelligence II Bayes Nets: Independence Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu.]
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More informationInference in Hybrid Bayesian Networks with Mixtures of Truncated Exponentials
In J. Vejnarova (ed.), Proceedings of 6th Workshop on Uncertainty Processing (WUPES-2003), 47--63, VSE-Oeconomica Publishers. Inference in Hybrid Bayesian Networks with Mixtures of Truncated Exponentials
More informationConditional Independence and Factorization
Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationWhat you need to know about Kalman Filters
Readings: K&F: 4.5, 12.2, 12.3, 12.4, 18.1, 18.2, 18.3, 18.4 Switching Kalman Filter Dynamic Bayesian Networks Graphical Models 10708 Carlos Guestrin Carnegie Mellon University November 27 th, 2006 1 What
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More information