CMPE 58K Bayesian Statistics and Machine Learning Lecture 5
1 CMPE 58K Bayesian Statistics and Machine Learning, Lecture 5
Multivariate distributions: Gaussian, Bernoulli, probability tables
Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
Instructor: A. Taylan Cemgil, Fall 2009

Cemgil, CMPE 58K Bayesian Statistics and Machine Learning, Lecture 5, Fall 2009, Boğaziçi University, Istanbul
2 Multivariate Bernoulli

A probability table with binary states, p(x_1, x_2):

             x_2 = 0   x_2 = 1
  x_1 = 0    p_{00}    p_{01}
  x_1 = 1    p_{10}    p_{11}

p(x_1, x_2) = p_{00}^{(1-x_1)(1-x_2)} \, p_{10}^{x_1(1-x_2)} \, p_{01}^{(1-x_1)x_2} \, p_{11}^{x_1 x_2}
            = \exp\{ (1-x_1)(1-x_2) \log p_{00} + x_1(1-x_2) \log p_{10} + (1-x_1)x_2 \log p_{01} + x_1 x_2 \log p_{11} \}
            = \exp\{ \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00} p_{11}}{p_{10} p_{01}} \}
3 Multivariate Bernoulli

Independent case, p(x_1, x_2) = p(x_1) p(x_2):

             x_2 = 0    x_2 = 1
  x_1 = 0    p_0 q_0    p_0 q_1
  x_1 = 1    p_1 q_0    p_1 q_1

p(x_1, x_2) = \exp\{ \log p_0 q_0 + x_1 \log \frac{p_1 q_0}{p_0 q_0} + x_2 \log \frac{p_0 q_1}{p_0 q_0} + x_1 x_2 \log \frac{p_0 q_0 \, p_1 q_1}{p_1 q_0 \, p_0 q_1} \}
            = \exp\{ \log p_0 q_0 + x_1 \log \frac{p_1}{p_0} + x_2 \log \frac{q_1}{q_0} + x_1 x_2 \log 1 \}

The coupling parameter is zero.
4 Multivariate Bernoulli

Canonical parameters:

p(x_1, x_2) = \exp\{ g + h_1 x_1 + h_2 x_2 + \theta_{1,2} x_1 x_2 \}

h_1 = \log \frac{p_{10}}{p_{00}}    h_2 = \log \frac{p_{01}}{p_{00}}    \theta_{1,2} = \log \frac{p_{00} p_{11}}{p_{10} p_{01}}

h_i: threshold; \theta_{i,j}: pairwise coupling. The p's are the moment parameters. Coupling parameters are harder to interpret but provide direct information about the conditional independence structure.
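As a quick sanity check, the table-to-canonical conversion above can be coded directly. This is an illustrative sketch of ours, not part of the slides; the function name and the example table are invented:

```python
import numpy as np

def bernoulli_canonical(P):
    """Convert a 2x2 table P[x1, x2] into the canonical parameters of
    p(x1, x2) = exp(g + h1*x1 + h2*x2 + theta12*x1*x2)."""
    g = np.log(P[0, 0])
    h1 = np.log(P[1, 0] / P[0, 0])
    h2 = np.log(P[0, 1] / P[0, 0])
    theta12 = np.log(P[0, 0] * P[1, 1] / (P[1, 0] * P[0, 1]))
    return g, h1, h2, theta12

# An arbitrary (normalized) probability table.
P = np.array([[0.3, 0.2],
              [0.1, 0.4]])
g, h1, h2, t12 = bernoulli_canonical(P)

# Rebuild the table from the canonical form and compare with the original.
Q = np.array([[np.exp(g + h1 * x1 + h2 * x2 + t12 * x1 * x2)
               for x2 in (0, 1)] for x1 in (0, 1)])
print(np.allclose(P, Q))  # -> True
```

A nonzero theta12 (here log 6) signals coupling between x_1 and x_2; for an independent table it would be exactly zero.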
5 Example

x_i \in \{0, 1\}

p(x_1, x_2) \propto \exp\{ 2 x_1 - 0.5 x_2 - 3 x_1 x_2 \}
q(x_1, x_2) \propto \exp\{ 2 x_1 - 0.5 x_2 + 0 \cdot x_1 x_2 \}

Are x_1 and x_2 independent, given p or given q? What are p(x_1) and q(x_1)?
6 Odds and the Cross Product Ratio (cpr)

For two events A and B:

\mathrm{odds}(A) = \frac{p(A)}{p(A^c)}
\mathrm{odds}(A \mid B) = \frac{p(A \mid B)}{p(A^c \mid B)}
\mathrm{cpr}(A, B) = \frac{p(A \cap B) \, p(A^c \cap B^c)}{p(A^c \cap B) \, p(A \cap B^c)} = \frac{\mathrm{odds}(A \mid B)}{\mathrm{odds}(A \mid B^c)}

When the events A and B are independent, we have p(A \mid B) = p(A), so

\mathrm{cpr}(A, B) = \frac{\mathrm{odds}(A)}{\mathrm{odds}(A)} = 1
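A small numerical illustration (ours, not from the slides): for an independent table P[x1, x2] = p(x1) q(x2), the cross product ratio is exactly 1, so the coupling parameter log cpr vanishes:

```python
import numpy as np

# Arbitrary marginals; the product table is independent by construction.
p = np.array([0.7, 0.3])   # p(x1)
q = np.array([0.4, 0.6])   # q(x2)
P = np.outer(p, q)         # P[x1, x2] = p(x1) q(x2)

# cpr(x1=1, x2=1) = P[1,1] P[0,0] / (P[0,1] P[1,0])
cpr = (P[1, 1] * P[0, 0]) / (P[0, 1] * P[1, 0])
print(cpr)  # -> 1.0, so the coupling parameter log(cpr) is 0
```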
7 Odds and the Cross Product Ratio (cpr)

The univariate Bernoulli distribution is parametrised by the event A : \{x_1 = 1\}:

p(x_1) = \exp\{ \log p_0 + x_1 \log \frac{p_1}{p_0} \} = \exp\{ \log p_0 + x_1 \log \mathrm{odds}(x_1 = 1) \}

Bivariate Bernoulli distribution:

p(x_1, x_2) = \exp\{ \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00} p_{11}}{p_{10} p_{01}} \}
            = \exp\{ \log p_{00} + x_1 \log \mathrm{odds}(x_1 = 1 \mid x_2 = 0) + x_2 \log \mathrm{odds}(x_2 = 1 \mid x_1 = 0) + x_1 x_2 \log \mathrm{cpr}(x_1 = 1, x_2 = 1) \}
8 Multivariate Bernoulli

For the trivariate case, we have pairwise and triple interactions:

p(x_1, x_2, x_3) = p_{000}^{(1-x_1)(1-x_2)(1-x_3)} \, p_{100}^{x_1(1-x_2)(1-x_3)} \, p_{010}^{(1-x_1)x_2(1-x_3)} \, p_{110}^{x_1 x_2 (1-x_3)} \, p_{001}^{(1-x_1)(1-x_2)x_3} \, p_{101}^{x_1(1-x_2)x_3} \, p_{011}^{(1-x_1)x_2 x_3} \, p_{111}^{x_1 x_2 x_3}

\log p(x) = \log p_{000} + \log\frac{p_{100}}{p_{000}} x_1 + \log\frac{p_{010}}{p_{000}} x_2 + \log\frac{p_{001}}{p_{000}} x_3
          + \log\frac{p_{000} p_{110}}{p_{010} p_{100}} x_1 x_2 + \log\frac{p_{000} p_{101}}{p_{001} p_{100}} x_1 x_3 + \log\frac{p_{000} p_{011}}{p_{001} p_{010}} x_2 x_3
          + \log\frac{p_{001} p_{010} p_{100} p_{111}}{p_{000} p_{011} p_{101} p_{110}} x_1 x_2 x_3
9 In terms of odds and cross product ratios:

\log p(x) = \log p_{000} + \log \mathrm{odds}(x_1 = 1 \mid x_2 = 0, x_3 = 0) \, x_1 + \log \mathrm{odds}(x_2 = 1 \mid x_1 = 0, x_3 = 0) \, x_2 + \log \mathrm{odds}(x_3 = 1 \mid x_1 = 0, x_2 = 0) \, x_3
          + \log \mathrm{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0) \, x_1 x_2 + \log \mathrm{cpr}(x_1 = 1, x_3 = 1 \mid x_2 = 0) \, x_1 x_3 + \log \mathrm{cpr}(x_2 = 1, x_3 = 1 \mid x_1 = 0) \, x_2 x_3
          + \log \frac{\mathrm{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 1)}{\mathrm{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0)} \, x_1 x_2 x_3
10 Canonical Parameters versus Moment Parameters

Canonical parameters are harder to interpret but give direct information about the independence structure. Inference in exponential family models turns out to be just conversion from a canonical representation to a moment representation. This sounds simple but can easily be intractable.
11 The Multivariate Gaussian Distribution

N(s; \mu, P): \mu is the mean and P is the covariance:

N(s; \mu, P) = |2\pi P|^{-1/2} \exp\{ -\tfrac{1}{2} (s - \mu)^\top P^{-1} (s - \mu) \}
             = \exp\{ -\tfrac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s - \tfrac{1}{2} \mu^\top P^{-1} \mu - \tfrac{1}{2} \log|2\pi P| \}

\log N(s; \mu, P) = -\tfrac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s + \text{const}
                  = -\tfrac{1}{2} \mathrm{Tr}(P^{-1} s s^\top) + \mu^\top P^{-1} s + \text{const}
                  =^+ -\tfrac{1}{2} \mathrm{Tr}(P^{-1} s s^\top) + \mu^\top P^{-1} s

Notation: \log f(x) =^+ g(x) \iff f(x) \propto \exp(g(x)) \iff \exists c \in \mathbb{R} : f(x) = c \exp(g(x))
12 Gaussian potentials

Consider a Gaussian potential with mean \mu and covariance \Sigma on x:

\phi(x) = \alpha N(\mu, \Sigma) = \alpha |2\pi\Sigma|^{-1/2} \exp\{ -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \}

where \int dx \, \phi(x) = \alpha, and |2\pi\Sigma| is a short notation for (2\pi)^d \det \Sigma, where \Sigma is d \times d. If \alpha = 1 the potential is normalised. A general Gaussian potential \phi need not be normalised, so \alpha is in fact an arbitrary positive constant. The exponent is just a quadratic form.
13 Canonical Form

\phi(x) = \exp\{ \log \alpha - \tfrac{1}{2} \log|2\pi\Sigma| - \tfrac{1}{2} \mu^\top \Sigma^{-1} \mu + \mu^\top \Sigma^{-1} x - \tfrac{1}{2} x^\top \Sigma^{-1} x \}
        = \exp\{ g + h^\top x - \tfrac{1}{2} x^\top K x \}

An alternative to the conventional and intuitive moment form. Here we represent the potential by the polynomial coefficients h and K, the natural parameters.
14 Canonical and Moment parametrisations

The moment parameters and canonical parameters are related by

K = \Sigma^{-1}
h = \Sigma^{-1} \mu
g = \log \alpha - \tfrac{1}{2} \log|2\pi\Sigma| - \tfrac{1}{2} \mu^\top \Sigma^{-1} \Sigma \Sigma^{-1} \mu
  = \log \alpha + \tfrac{1}{2} \log\left|\frac{K}{2\pi}\right| - \tfrac{1}{2} h^\top K^{-1} h
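These relations are easy to verify numerically. The following sketch is ours (the function names are invented); it converts between the moment and canonical parametrisations and round-trips:

```python
import numpy as np

def to_canonical(mu, Sigma, alpha=1.0):
    """Moment form (alpha, mu, Sigma) -> canonical form (g, h, K)."""
    K = np.linalg.inv(Sigma)
    h = K @ mu
    g = (np.log(alpha)
         - 0.5 * np.linalg.slogdet(2 * np.pi * Sigma)[1]  # -1/2 log|2 pi Sigma|
         - 0.5 * mu @ K @ mu)
    return g, h, K

def to_moment(g, h, K):
    """Canonical form (g, h, K) -> moment form (alpha, mu, Sigma)."""
    Sigma = np.linalg.inv(K)
    mu = Sigma @ h
    # mu^T Sigma^{-1} mu = h^T K^{-1} h, so invert the g relation:
    log_alpha = g + 0.5 * np.linalg.slogdet(2 * np.pi * Sigma)[1] + 0.5 * h @ Sigma @ h
    return np.exp(log_alpha), mu, Sigma

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
g, h, K = to_canonical(mu, Sigma)
alpha2, mu2, Sigma2 = to_moment(g, h, K)
print(np.allclose(mu, mu2), np.allclose(Sigma, Sigma2), np.isclose(alpha2, 1.0))
```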
15 Jointly Gaussian Vectors

Moment form:

\phi(x_1, x_2) = \alpha N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)
\phi = \alpha |2\pi\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right\}

Canonical form:

\phi(x_1, x_2) = \exp\left\{ g + \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}^\top \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \tfrac{1}{2} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}^\top \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right\}

We need to find a parametric representation of K = \Sigma^{-1} in terms of the partitions \Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}.
16 Partitioned Matrix Inverse

Strategy: we will find two matrices L and R such that W = L \Sigma R becomes block diagonal:

L \Sigma R = W
\Sigma = L^{-1} W R^{-1}
\Sigma^{-1} = R W^{-1} L = K
17 Gauss Transformations

Add a multiple of row s to row t: premultiply \Sigma with L(s, t), where

L_{i,j}(s, t) = \begin{cases} 1, & i = j \\ \gamma, & i = t \text{ and } j = s \\ 0, & \text{o/w} \end{cases}

Example: s = 2, t = 1:

\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + \gamma c & b + \gamma d \\ c & d \end{pmatrix}

The inverse just subtracts what is added:

\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & -\gamma \\ 0 & 1 \end{pmatrix}
18 Gauss Transformations

Given \Sigma, add a multiple of column s to column t: postmultiply \Sigma with R(s, t), where

R_{i,j}(s, t) = \begin{cases} 1, & i = j \\ \gamma, & i = s \text{ and } j = t \\ 0, & \text{o/w} \end{cases}

Example: s = 2, t = 1:

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \gamma & 1 \end{pmatrix} = \begin{pmatrix} a + \gamma b & b \\ c + \gamma d & d \end{pmatrix}
19 Scalar example

\Sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

L \Sigma = \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix}

L \Sigma R = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ 0 & d \end{pmatrix} = W
20 Scalar example (cont.)

\Sigma = L^{-1} W R^{-1}, so

\Sigma^{-1} = R W^{-1} L
            = \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} \begin{pmatrix} (a - b d^{-1} c)^{-1} & 0 \\ 0 & d^{-1} \end{pmatrix} \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix}
            = \begin{pmatrix} (a - b d^{-1} c)^{-1} & -(a - b d^{-1} c)^{-1} b d^{-1} \\ -d^{-1} c (a - b d^{-1} c)^{-1} & d^{-1} + d^{-1} c (a - b d^{-1} c)^{-1} b d^{-1} \end{pmatrix}
21 Scalar example

We could also use

L \Sigma = \begin{pmatrix} 1 & 0 \\ -c a^{-1} & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix}

L \Sigma R = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix} \begin{pmatrix} 1 & -a^{-1} b \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d - c a^{-1} b \end{pmatrix} = W

\Sigma^{-1} = R W^{-1} L = \begin{pmatrix} a^{-1} + a^{-1} b (d - c a^{-1} b)^{-1} c a^{-1} & -a^{-1} b (d - c a^{-1} b)^{-1} \\ -(d - c a^{-1} b)^{-1} c a^{-1} & (d - c a^{-1} b)^{-1} \end{pmatrix}
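The elimination can be checked numerically. A small sketch of ours, with arbitrarily chosen scalar entries, verifies that L \Sigma R is diagonal and that R W^{-1} L reproduces \Sigma^{-1}:

```python
import numpy as np

a, b, c, d = 4.0, 1.0, 2.0, 3.0
Sigma = np.array([[a, b],
                  [c, d]])

# Eliminate b with a row operation, then c with a column operation.
L = np.array([[1.0, -b / d], [0.0, 1.0]])
R = np.array([[1.0, 0.0], [-c / d, 1.0]])
W = L @ Sigma @ R

print(np.allclose(W, np.diag([a - b * c / d, d])))            # -> True
print(np.allclose(R @ np.linalg.inv(W) @ L,
                  np.linalg.inv(Sigma)))                      # -> True
```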
22 Partitioned Matrix Inverse

In the matrix case, this leads to the following dual factorisations of \Sigma:

\Sigma = \begin{pmatrix} I & \Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{pmatrix} \begin{pmatrix} I & 0 \\ \Sigma_{22}^{-1} \Sigma_{21} & I \end{pmatrix}
       = \begin{pmatrix} I & 0 \\ \Sigma_{21} \Sigma_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \end{pmatrix} \begin{pmatrix} I & \Sigma_{11}^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix}
23 The Schur Complement

We will introduce the notation

\Sigma / \Sigma_{22} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}
\Sigma / \Sigma_{11} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}

Determinant:

|\Sigma| = |\Sigma / \Sigma_{11}| \, |\Sigma_{11}| = |\Sigma / \Sigma_{22}| \, |\Sigma_{22}|

\Sigma^{-1} = \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1} \Sigma_{21} & I \end{pmatrix} \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I \end{pmatrix}
            = \begin{pmatrix} I & -\Sigma_{11}^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -\Sigma_{21} \Sigma_{11}^{-1} & I \end{pmatrix}
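The determinant identity can be checked on a random positive definite matrix. This is an illustrative sketch of ours, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # a random symmetric positive definite matrix

# Partition into 2x2 blocks.
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Schur complement Sigma / Sigma_22 and the identity |Sigma| = |Sigma/Sigma_22| |Sigma_22|.
schur22 = S11 - S12 @ np.linalg.inv(S22) @ S21
print(np.isclose(np.linalg.det(Sigma),
                 np.linalg.det(schur22) * np.linalg.det(S22)))  # -> True
```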
24 Partitioned Matrix Inverse

\begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1}
= \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & -(\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \\ -\Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} & \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \end{pmatrix}
= \begin{pmatrix} \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & -\Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \\ -(\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix}

Quite complicated-looking formulas, but straightforward to implement. Caution: \Sigma_{11}^{-1} \neq K_{11} in general!
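The partitioned inverse is indeed straightforward to implement. A numerical sketch of ours (variable names invented), which also illustrates the caution that \Sigma_{11}^{-1} differs from K_{11}:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)          # a random symmetric positive definite matrix
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

inv = np.linalg.inv
schur22 = S11 - S12 @ inv(S22) @ S21     # Sigma / Sigma_22

# Assemble K block by block from the partitioned-inverse formulas.
K11 = inv(schur22)
K12 = -K11 @ S12 @ inv(S22)
K21 = -inv(S22) @ S21 @ K11
K22 = inv(S22) + inv(S22) @ S21 @ K11 @ S12 @ inv(S22)
K = np.block([[K11, K12], [K21, K22]])

print(np.allclose(K, inv(Sigma)))        # -> True
print(np.allclose(inv(S11), K11))        # differs in general (coupling!)
```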
25 Matrix Inversion Lemma

Read the top-left entries of the two expressions for the partitioned inverse:

(\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21})^{-1} = \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} (\Sigma/\Sigma_{11})^{-1} \Sigma_{21} \Sigma_{11}^{-1}

(A - B C^{-1} D)^{-1} = A^{-1} + A^{-1} B (C - D A^{-1} B)^{-1} D A^{-1}
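The lemma is easy to confirm numerically. An illustrative check of ours, with arbitrary conformable matrices (diagonal offsets added only to keep everything safely invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 2)) + 4 * np.eye(2)
D = rng.standard_normal((2, 4))

inv = np.linalg.inv
# (A - B C^{-1} D)^{-1} = A^{-1} + A^{-1} B (C - D A^{-1} B)^{-1} D A^{-1}
lhs = inv(A - B @ inv(C) @ D)
rhs = inv(A) + inv(A) @ B @ inv(C - D @ inv(A) @ B) @ D @ inv(A)
print(np.allclose(lhs, rhs))  # -> True
```

Note that the right-hand side only inverts matrices of size 4 and 2; this is the usual reason to apply the lemma, when the correction term is low-rank.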
26 Factorisation of Multivariate Gaussians

Consider the joint distribution over the variable

x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}

where the joint distribution is Gaussian, p(x) = N(x; \mu, \Sigma), with

\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}    \Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}
27 Factorisation of Multivariate Gaussians

Find the following:

1. Conditionals: (a) p(x_1 | x_2), (b) p(x_2 | x_1)
2. Marginals: (a) p(x_1), (b) p(x_2)
28 Factorisation of Multivariate Gaussians

Using the partitioned inverse equations, we rearrange

p(x_1, x_2) \propto \exp\left\{ -\tfrac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right\}

to bring the expression into the form p(x_1) p(x_2 | x_1) or p(x_2) p(x_1 | x_2), where the marginal and conditional can easily be identified. See also Bishop, section 2.3.
29 Factorisation of Multivariate Gaussians

We have the two decompositions

\begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1}
= \begin{pmatrix} I & 0 \\ -\Sigma_2^{-1} \Sigma_{12}^\top & I \end{pmatrix} \begin{pmatrix} (\Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top)^{-1} & 0 \\ 0 & \Sigma_2^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12} \Sigma_2^{-1} \\ 0 & I \end{pmatrix}
= \begin{pmatrix} I & -\Sigma_1^{-1} \Sigma_{12} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & (\Sigma_2 - \Sigma_{12}^\top \Sigma_1^{-1} \Sigma_{12})^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -\Sigma_{12}^\top \Sigma_1^{-1} & I \end{pmatrix}

We let s_i = x_i - \mu_i and use the first decomposition:

p(s_1, s_2) \propto \exp\left\{ -\tfrac{1}{2} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} \right\}
30 Substituting the first decomposition and completing the square gives

p(s_1, s_2) \propto \exp\left\{ -\tfrac{1}{2} (s_1 - \Sigma_{12} \Sigma_2^{-1} s_2)^\top (\Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top)^{-1} (s_1 - \Sigma_{12} \Sigma_2^{-1} s_2) - \tfrac{1}{2} s_2^\top \Sigma_2^{-1} s_2 \right\}
\propto N(s_1; \Sigma_{12} \Sigma_2^{-1} s_2, \, \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top) \, N(s_2; 0, \Sigma_2)
= N(x_1; \mu_1 + \Sigma_{12} \Sigma_2^{-1} (x_2 - \mu_2), \, \Sigma_1 - \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^\top) \, N(x_2; \mu_2, \Sigma_2)

This leads to a factorisation of the form p(x_2) p(x_1 | x_2). The second decomposition leads to the other factorisation, p(x_1) p(x_2 | x_1).
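The resulting conditional p(x_1 | x_2) can be computed directly from the moment parameters. A scalar sketch of ours (the function name is invented), using the conditional mean and covariance read off from the factorisation:

```python
import numpy as np

def condition_on_x2(mu1, mu2, S1, S12, S2, x2):
    """Return mean and covariance of p(x1 | x2) for a joint Gaussian with
    mean (mu1, mu2) and covariance blocks S1, S12, S2."""
    mu_c = mu1 + S12 @ np.linalg.inv(S2) @ (x2 - mu2)
    S_c = S1 - S12 @ np.linalg.inv(S2) @ S12.T
    return mu_c, S_c

# A 1-D example in each block, so the numbers are easy to check by hand.
mu1, mu2 = np.array([0.0]), np.array([1.0])
S1 = np.array([[2.0]])
S12 = np.array([[0.8]])
S2 = np.array([[1.0]])

mu_c, S_c = condition_on_x2(mu1, mu2, S1, S12, S2, x2=np.array([2.0]))
print(mu_c, S_c)  # mean 0 + 0.8*(2-1) = 0.8, variance 2 - 0.8*0.8 = 1.36
```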
31 Summary

Keywords:
- Multivariate Bernoulli
- Multivariate Gaussian
- Partitioned inverse equations, matrix inversion lemma
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV
More information2.3. The Gaussian Distribution
78 2. PROBABILITY DISTRIBUTIONS Figure 2.5 Plots of the Dirichlet distribution over three variables, where the two horizontal axes are coordinates in the plane of the simplex and the vertical axis corresponds
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationGaussian Processes for Machine Learning
Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of
More informationCOMPUTATIONAL MULTI-POINT BME AND BME CONFIDENCE SETS
VII. COMPUTATIONAL MULTI-POINT BME AND BME CONFIDENCE SETS Until now we have considered the estimation of the value of a S/TRF at one point at the time, independently of the estimated values at other estimation
More informationReview (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology
Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna
More informationCS 559: Machine Learning Fundamentals and Applications 2 nd Set of Notes
1 CS 559: Machine Learning Fundamentals and Applications 2 nd Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview
More informationDynamic System Identification using HDMR-Bayesian Technique
Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in
More informationChapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations
Chapter 1: Systems of linear equations and matrices Section 1.1: Introduction to systems of linear equations Definition: A linear equation in n variables can be expressed in the form a 1 x 1 + a 2 x 2
More informationMultivariate Gaussian Analysis
BS2 Statistical Inference, Lecture 7, Hilary Term 2009 February 13, 2009 Marginal and conditional distributions For a positive definite covariance matrix Σ, the multivariate Gaussian distribution has density
More informationContinuous Random Variables
1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables
More informationCS Lecture 13. More Maximum Likelihood
CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models Features (Ising,
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More information7.5 Operations with Matrices. Copyright Cengage Learning. All rights reserved.
7.5 Operations with Matrices Copyright Cengage Learning. All rights reserved. What You Should Learn Decide whether two matrices are equal. Add and subtract matrices and multiply matrices by scalars. Multiply
More information{ p if x = 1 1 p if x = 0
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationCMPE 547 Bayesian Statistics and Machine Learning
CMPE 547 Bayesian Statistics and Machine Learning A. Taylan Cemgil Dept. of Computer Engineering Boğaziçi University Cemgil Bayesian Machine Learning. Spring 2017, Istanbul A simple problem Die 1: λ {,,,,,
More information