CMPE 58K Bayesian Statistics and Machine Learning Lecture 5

Multivariate distributions: Gaussian, Bernoulli, Probability tables

Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
Instructor: A. Taylan Cemgil
Fall 2009

Multivariate Bernoulli

A probability table with binary states:

p(x_1, x_2)      x_2 = 0    x_2 = 1
x_1 = 0          p_00       p_01
x_1 = 1          p_10       p_11

$$p(x_1, x_2) = p_{00}^{(1-x_1)(1-x_2)} \, p_{10}^{x_1(1-x_2)} \, p_{01}^{(1-x_1)x_2} \, p_{11}^{x_1 x_2}$$

$$= \exp\Big( (1 - x_1 - x_2 + x_1 x_2) \log p_{00} + (x_1 - x_1 x_2) \log p_{10} + (x_2 - x_1 x_2) \log p_{01} + x_1 x_2 \log p_{11} \Big)$$

$$= \exp\Big( \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00}\, p_{11}}{p_{10}\, p_{01}} \Big)$$

Multivariate Bernoulli: Independent case

p(x_1, x_2) = p(x_1) p(x_2):

p(x_1, x_2)      x_2 = 0    x_2 = 1
x_1 = 0          p_0 q_0    p_0 q_1
x_1 = 1          p_1 q_0    p_1 q_1

$$p(x_1, x_2) = \exp\Big( \log p_0 q_0 + x_1 \log \frac{p_1 q_0}{p_0 q_0} + x_2 \log \frac{p_0 q_1}{p_0 q_0} + x_1 x_2 \log \frac{p_0 q_0 \, p_1 q_1}{p_1 q_0 \, p_0 q_1} \Big)$$

$$= \exp\Big( \log p_0 q_0 + x_1 \log \frac{p_1}{p_0} + x_2 \log \frac{q_1}{q_0} + x_1 x_2 \log 1 \Big)$$

The coupling parameter is zero.

Multivariate Bernoulli: Canonical parameters

$$p(x_1, x_2) = \exp\big( g + h_1 x_1 + h_2 x_2 + \theta_{1,2}\, x_1 x_2 \big)$$

$$h_1 = \log \frac{p_{10}}{p_{00}} \qquad h_2 = \log \frac{p_{01}}{p_{00}} \qquad \theta_{1,2} = \log \frac{p_{00}\, p_{11}}{p_{10}\, p_{01}}$$

h_i: threshold. θ_{i,j}: pairwise coupling. The p's are the moment parameters.

Coupling parameters are harder to interpret but provide direct information about the conditional independence structure.

Example

x_i ∈ {0, 1}

$$p(x_1, x_2) \propto \exp( 2 x_1 - 0.5\, x_2 - 3\, x_1 x_2 )$$
$$q(x_1, x_2) \propto \exp( 2 x_1 - 0.5\, x_2 + 0.01\, x_1 x_2 )$$

Are x_1 and x_2 independent, given p or given q? What are the marginals p(x_1) and q(x_1)?
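A quick numerical sanity check of this example (not part of the original slides): the sketch below tabulates both potentials, normalises them, and reads off the coupling via the cross product ratio together with the marginals of x_1. The helper name `table_from_canonical` is our own.

```python
import numpy as np

def table_from_canonical(h1, h2, theta):
    """Normalised 2x2 table with entries proportional to
    exp(h1*x1 + h2*x2 + theta*x1*x2)."""
    T = np.array([[np.exp(h1 * a + h2 * b + theta * a * b) for b in (0, 1)]
                  for a in (0, 1)])
    return T / T.sum()

p = table_from_canonical(2.0, -0.5, -3.0)   # coupling theta = -3
q = table_from_canonical(2.0, -0.5, 0.01)   # coupling theta = 0.01

for name, T in (("p", p), ("q", q)):
    # cpr = exp(theta); it equals 1 exactly when x1 and x2 are independent
    cpr = T[0, 0] * T[1, 1] / (T[1, 0] * T[0, 1])
    print(name, ": marginal of x1 =", T.sum(axis=1), ", cpr =", cpr)
```

Under p the coupling is strong (cpr = e^{-3}), while under q it is nearly but not exactly 1, so x_1 and x_2 are dependent in both cases, only very weakly so under q.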

Odds and the Cross Product Ratio (cpr)

For two events A and B:

$$\text{odds}(A) = \frac{p(A)}{p(A^c)} \qquad \text{odds}(A \mid B) = \frac{p(A \mid B)}{p(A^c \mid B)}$$

$$\text{cpr}(A, B) = \frac{p(A \mid B)\, p(A^c \mid B^c)}{p(A^c \mid B)\, p(A \mid B^c)}$$

When the events A and B are independent, we have p(A | B) = p(A), so

$$\text{cpr}(A, B) = \frac{\text{odds}(A)}{\text{odds}(A)} = 1$$

Odds and the Cross Product Ratio (cpr)

The univariate Bernoulli distribution is parametrised by the event A : x_1 = 1:

$$p(x_1) = \exp\Big( \log p_0 + x_1 \log \frac{p_1}{p_0} \Big) \propto \exp\big( x_1 \log \text{odds}(x_1 = 1) \big)$$

Bivariate Bernoulli distribution:

$$p(x_1, x_2) = \exp\Big( \log p_{00} + x_1 \log \frac{p_{10}}{p_{00}} + x_2 \log \frac{p_{01}}{p_{00}} + x_1 x_2 \log \frac{p_{00}\, p_{11}}{p_{10}\, p_{01}} \Big)$$

$$\propto \exp\Big( x_1 \log \text{odds}(x_1 = 1 \mid x_2 = 0) + x_2 \log \text{odds}(x_2 = 1 \mid x_1 = 0) + x_1 x_2 \log \text{cpr}(x_1 = 1, x_2 = 1) \Big)$$
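A minimal sketch, assuming an arbitrary strictly positive 2x2 table: the log-linear coefficients read off as above coincide with the log-odds and the log cross product ratio, and rebuilding the table from them reproduces it exactly.

```python
import numpy as np

P = np.array([[0.40, 0.10],    # p00, p01
              [0.20, 0.30]])   # p10, p11

h1    = np.log(P[1, 0] / P[0, 0])                        # log odds(x1=1 | x2=0)
h2    = np.log(P[0, 1] / P[0, 0])                        # log odds(x2=1 | x1=0)
theta = np.log(P[0, 0] * P[1, 1] / (P[1, 0] * P[0, 1]))  # log cpr(x1=1, x2=1)

# Rebuild the table from (g, h1, h2, theta) and compare with the original.
g = np.log(P[0, 0])
R = np.exp([[g, g + h2], [g + h1, g + h1 + h2 + theta]])
assert np.allclose(R, P)
```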

Multivariate Bernoulli

For the trivariate case, we have pairwise and triple interactions:

$$p(x_1, x_2, x_3) = p_{000}^{(1-x_1)(1-x_2)(1-x_3)} \, p_{100}^{x_1(1-x_2)(1-x_3)} \, p_{010}^{(1-x_1)x_2(1-x_3)} \, p_{110}^{x_1 x_2(1-x_3)} \, p_{001}^{(1-x_1)(1-x_2)x_3} \, p_{101}^{x_1(1-x_2)x_3} \, p_{011}^{(1-x_1)x_2 x_3} \, p_{111}^{x_1 x_2 x_3}$$

$$\log p(x) = \log p_{000} + \log\frac{p_{100}}{p_{000}}\, x_1 + \log\frac{p_{010}}{p_{000}}\, x_2 + \log\frac{p_{001}}{p_{000}}\, x_3 + \log\frac{p_{000}\, p_{110}}{p_{010}\, p_{100}}\, x_1 x_2 + \log\frac{p_{000}\, p_{101}}{p_{001}\, p_{100}}\, x_1 x_3 + \log\frac{p_{000}\, p_{011}}{p_{001}\, p_{010}}\, x_2 x_3 + \log\frac{p_{001}\, p_{010}\, p_{100}\, p_{111}}{p_{000}\, p_{011}\, p_{101}\, p_{110}}\, x_1 x_2 x_3$$

In terms of odds and cross product ratios:

$$\log p(x) = \log p_{000} + \log \text{odds}(x_1 = 1 \mid x_2 = 0, x_3 = 0)\, x_1 + \log \text{odds}(x_2 = 1 \mid x_1 = 0, x_3 = 0)\, x_2 + \log \text{odds}(x_3 = 1 \mid x_1 = 0, x_2 = 0)\, x_3$$
$$+ \log \text{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0)\, x_1 x_2 + \log \text{cpr}(x_1 = 1, x_3 = 1 \mid x_2 = 0)\, x_1 x_3 + \log \text{cpr}(x_2 = 1, x_3 = 1 \mid x_1 = 0)\, x_2 x_3$$
$$+ \log \frac{\text{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 1)}{\text{cpr}(x_1 = 1, x_2 = 1 \mid x_3 = 0)}\, x_1 x_2 x_3$$
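The same bookkeeping extends to any number of binary variables. The sketch below (the helper `loglinear_coeffs` is our own name, not from the lecture) recovers all log-linear coefficients of a 2x2x2 table by inclusion-exclusion over subsets and checks the triple-interaction formula above.

```python
import numpy as np
from itertools import product

def loglinear_coeffs(P):
    """theta[S] such that log p(x) = sum_S theta[S] * prod_{i in S} x_i,
    where S ranges over 0/1 indicator tuples of variable subsets."""
    n = P.ndim
    logp = np.log(P)
    theta = {}
    for S in product((0, 1), repeat=n):
        t = 0.0
        for T in product((0, 1), repeat=n):
            if all(ti <= si for ti, si in zip(T, S)):   # T is a subset of S
                t += (-1) ** (sum(S) - sum(T)) * logp[T]
        theta[S] = t
    return theta

P = np.random.dirichlet(np.ones(8)).reshape(2, 2, 2)   # random joint table
theta = loglinear_coeffs(P)

# Check: theta[(1,1,1)] = log (p001 p010 p100 p111) / (p000 p011 p101 p110)
triple = np.log(P[0, 0, 1] * P[0, 1, 0] * P[1, 0, 0] * P[1, 1, 1]
                / (P[0, 0, 0] * P[0, 1, 1] * P[1, 0, 1] * P[1, 1, 0]))
assert np.isclose(theta[(1, 1, 1)], triple)
```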

Canonical Parameters versus Moment Parameters

Canonical parameters are harder to interpret but give information about the independence structure.

Inference in exponential family models turns out to be just conversion from a canonical representation to a moment representation. This sounds simple, but can easily be intractable.

The Multivariate Gaussian Distribution

N(s; µ, P), where µ is the mean and P is the covariance:

$$\mathcal{N}(s; \mu, P) = |2\pi P|^{-1/2} \exp\Big( -\frac{1}{2} (s - \mu)^\top P^{-1} (s - \mu) \Big)$$
$$= \exp\Big( -\frac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s - \frac{1}{2} \mu^\top P^{-1} \mu - \frac{1}{2} \log|2\pi P| \Big)$$

$$\log \mathcal{N}(s; \mu, P) = -\frac{1}{2} s^\top P^{-1} s + \mu^\top P^{-1} s + \text{const} = -\frac{1}{2} \text{Tr}\big( P^{-1} s s^\top \big) + \mu^\top P^{-1} s + \text{const}$$
$$=^+ -\frac{1}{2} \text{Tr}\big( P^{-1} s s^\top \big) + \mu^\top P^{-1} s$$

Notation: $\log f(x) =^+ g(x) \iff f(x) \propto \exp(g(x)) \iff \exists c \in \mathbb{R} : f(x) = c \exp(g(x))$

Gaussian potentials

Consider a Gaussian potential with mean µ and covariance Σ on x:

$$\phi(x) = \alpha \, \mathcal{N}(\mu, \Sigma) = \alpha \, |2\pi\Sigma|^{-\frac{1}{2}} \exp\Big( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \Big)$$

where $\int dx\, \phi(x) = \alpha$ and $|2\pi\Sigma|$ is a short notation for $(2\pi)^d \det \Sigma$, where Σ is d × d.

If α = 1 the potential is normalised. A general Gaussian potential φ need not be normalised, so α is in fact an arbitrary positive constant. The exponent is just a quadratic form.

Canonical Form

$$\phi(x) = \exp\Big\{ \log\alpha - \frac{1}{2}\log|2\pi\Sigma| - \frac{1}{2}\mu^\top\Sigma^{-1}\mu + \mu^\top\Sigma^{-1}x - \frac{1}{2}x^\top\Sigma^{-1}x \Big\} \equiv \exp\Big( g + h^\top x - \frac{1}{2}x^\top K x \Big)$$

This is an alternative to the conventional and intuitive moment form. Here we represent the potential by the polynomial coefficients h and K, known as the natural parameters.

Canonical and Moment parametrisations

The moment parameters and canonical parameters are related by

$$K = \Sigma^{-1}$$
$$h = \Sigma^{-1}\mu$$
$$g = \log\alpha - \frac{1}{2}\log|2\pi\Sigma| - \frac{1}{2}\mu^\top\Sigma^{-1}\Sigma\Sigma^{-1}\mu = \log\alpha + \frac{1}{2}\log\Big|\frac{K}{2\pi}\Big| - \frac{1}{2}h^\top K^{-1} h$$
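These relations are direct to implement. A minimal sketch, assuming a normalised potential (α = 1), converting between the two parametrisations and checking that both forms assign the same log density to an arbitrary point:

```python
import numpy as np

def moment_to_canonical(mu, Sigma):
    K = np.linalg.inv(Sigma)
    h = K @ mu
    # g = (1/2) log |K / 2pi| - (1/2) h^T K^{-1} h   (log alpha = 0)
    g = 0.5 * np.linalg.slogdet(K / (2 * np.pi))[1] \
        - 0.5 * h @ np.linalg.solve(K, h)
    return g, h, K

def canonical_to_moment(h, K):
    Sigma = np.linalg.inv(K)
    return Sigma @ h, Sigma           # mu, Sigma

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
g, h, K = moment_to_canonical(mu, Sigma)

# Both forms give the same log density at any x.
x = np.array([0.5, 0.5])
moment = -0.5 * np.linalg.slogdet(2 * np.pi * Sigma)[1] \
         - 0.5 * (x - mu) @ np.linalg.solve(Sigma, x - mu)
canonical = g + h @ x - 0.5 * x @ K @ x
assert np.isclose(moment, canonical)
```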

Jointly Gaussian Vectors

Moment form:

$$\phi(x_1, x_2) = \alpha\, \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$
$$\phi = \alpha\, |2\pi\Sigma|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right)$$

Canonical form:

$$\phi(x_1, x_2) = \exp\left( g + \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}^\top \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}^\top \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right)$$

We need to find a parametric representation of $K = \Sigma^{-1}$ in terms of the partitions $\Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}$.

Partitioned Matrix Inverse

Strategy: we will find two matrices L and R such that W = LΣR becomes block diagonal. Then

$$L\Sigma R = W \qquad \Sigma = L^{-1} W R^{-1} \qquad \Sigma^{-1} = R W^{-1} L = K$$

Gauss Transformations

To add a multiple of row s to row t, premultiply Σ with L(s, t), where

$$L_{i,j}(s,t) = \begin{cases} 1, & i = j \\ \gamma, & i = t \text{ and } j = s \\ 0, & \text{otherwise} \end{cases}$$

Example: s = 2, t = 1:

$$\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + \gamma c & b + \gamma d \\ c & d \end{pmatrix}$$

The inverse just subtracts what is added:

$$\begin{pmatrix} 1 & \gamma \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & -\gamma \\ 0 & 1 \end{pmatrix}$$

Gauss Transformations

Given Σ, to add a multiple of column s to column t, postmultiply Σ with R(s, t), where

$$R_{i,j}(s,t) = \begin{cases} 1, & i = j \\ \gamma, & i = s \text{ and } j = t \\ 0, & \text{otherwise} \end{cases}$$

Example: s = 2, t = 1:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \gamma & 1 \end{pmatrix} = \begin{pmatrix} a + \gamma b & b \\ c + \gamma d & d \end{pmatrix}$$

Scalar example

$$\Sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

$$L\Sigma = \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix}$$

$$L\Sigma R = \begin{pmatrix} a - b d^{-1} c & 0 \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} = \begin{pmatrix} a - b d^{-1} c & 0 \\ 0 & d \end{pmatrix} = W$$

Scalar example, continued

$$\Sigma = L^{-1} W R^{-1} \qquad \Sigma^{-1} = R W^{-1} L$$

$$\Sigma^{-1} = \begin{pmatrix} 1 & 0 \\ -d^{-1} c & 1 \end{pmatrix} \begin{pmatrix} (a - b d^{-1} c)^{-1} & 0 \\ 0 & d^{-1} \end{pmatrix} \begin{pmatrix} 1 & -b d^{-1} \\ 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} (a - b d^{-1} c)^{-1} & -(a - b d^{-1} c)^{-1} b d^{-1} \\ -d^{-1} c\, (a - b d^{-1} c)^{-1} & d^{-1} + d^{-1} c\, (a - b d^{-1} c)^{-1} b d^{-1} \end{pmatrix}$$

Scalar example

We could also use

$$L\Sigma = \begin{pmatrix} 1 & 0 \\ -c a^{-1} & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix}$$

$$L\Sigma R = \begin{pmatrix} a & b \\ 0 & d - c a^{-1} b \end{pmatrix} \begin{pmatrix} 1 & -a^{-1} b \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d - c a^{-1} b \end{pmatrix} = W$$

$$\Sigma^{-1} = R W^{-1} L = \begin{pmatrix} a^{-1} + a^{-1} b\, (d - c a^{-1} b)^{-1} c a^{-1} & -a^{-1} b\, (d - c a^{-1} b)^{-1} \\ -(d - c a^{-1} b)^{-1} c a^{-1} & (d - c a^{-1} b)^{-1} \end{pmatrix}$$
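The whole procedure is easy to replay numerically. A small sketch for the 2x2 case, using the first pair of Gauss transformations above:

```python
import numpy as np

S = np.array([[4.0, 1.0],
              [1.0, 3.0]])
a, b, c, d = S[0, 0], S[0, 1], S[1, 0], S[1, 1]

L = np.array([[1.0, -b / d], [0.0, 1.0]])   # add -b/d times row 2 to row 1
R = np.array([[1.0, 0.0], [-c / d, 1.0]])   # add -c/d times column 2 to column 1
W = L @ S @ R                               # diagonal: diag(a - b d^{-1} c, d)

# Sigma^{-1} = R W^{-1} L
assert np.allclose(R @ np.linalg.inv(W) @ L, np.linalg.inv(S))
```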

Partitioned Matrix Inverse

In the matrix case, this leads to the following dual factorisations of Σ:

$$\Sigma = \begin{pmatrix} I & \Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{pmatrix} \begin{pmatrix} I & 0 \\ \Sigma_{22}^{-1}\Sigma_{21} & I \end{pmatrix}$$

$$\Sigma = \begin{pmatrix} I & 0 \\ \Sigma_{21}\Sigma_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \end{pmatrix} \begin{pmatrix} I & \Sigma_{11}^{-1}\Sigma_{12} \\ 0 & I \end{pmatrix}$$

The Schur Complement

We will introduce the notation

$$\Sigma/\Sigma_{22} \equiv \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \qquad \Sigma/\Sigma_{11} \equiv \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$

Determinant:

$$|\Sigma| = |\Sigma/\Sigma_{11}|\,|\Sigma_{11}| = |\Sigma/\Sigma_{22}|\,|\Sigma_{22}|$$

$$\Sigma^{-1} = \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1}\Sigma_{21} & I \end{pmatrix} \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{pmatrix}$$

$$\Sigma^{-1} = \begin{pmatrix} I & -\Sigma_{11}^{-1}\Sigma_{12} \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -\Sigma_{21}\Sigma_{11}^{-1} & I \end{pmatrix}$$
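The determinant identity is easy to verify numerically. A small sketch, assuming a 2+1 partitioning of a 3x3 matrix:

```python
import numpy as np

S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
S11, S12 = S[:2, :2], S[:2, 2:]
S21, S22 = S[2:, :2], S[2:, 2:]

schur11 = S22 - S21 @ np.linalg.solve(S11, S12)   # Sigma / Sigma_11
schur22 = S11 - S12 @ np.linalg.solve(S22, S21)   # Sigma / Sigma_22

det = np.linalg.det
assert np.isclose(det(S), det(schur11) * det(S11))
assert np.isclose(det(S), det(schur22) * det(S22))
```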

Partitioned Matrix Inverse

$$\begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1}$$

$$= \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & -(\Sigma/\Sigma_{22})^{-1}\Sigma_{12}\Sigma_{22}^{-1} \\ -\Sigma_{22}^{-1}\Sigma_{21}(\Sigma/\Sigma_{22})^{-1} & \Sigma_{22}^{-1} + \Sigma_{22}^{-1}\Sigma_{21}(\Sigma/\Sigma_{22})^{-1}\Sigma_{12}\Sigma_{22}^{-1} \end{pmatrix}$$

$$= \begin{pmatrix} \Sigma_{11}^{-1} + \Sigma_{11}^{-1}\Sigma_{12}(\Sigma/\Sigma_{11})^{-1}\Sigma_{21}\Sigma_{11}^{-1} & -\Sigma_{11}^{-1}\Sigma_{12}(\Sigma/\Sigma_{11})^{-1} \\ -(\Sigma/\Sigma_{11})^{-1}\Sigma_{21}\Sigma_{11}^{-1} & (\Sigma/\Sigma_{11})^{-1} \end{pmatrix}$$

Quite complicated looking formulas, but straightforward to implement.

Caution: $\Sigma_{11}^{-1} \neq K_{11}$ in general!
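A sketch implementing the first block formula and confirming both the result and the caution; the function name, the test matrix, and the 2+1 partitioning are our own choices:

```python
import numpy as np

def block_inverse(S11, S12, S21, S22):
    """Partitioned inverse via the Schur complement Sigma / Sigma_22."""
    schur22 = S11 - S12 @ np.linalg.solve(S22, S21)
    K11 = np.linalg.inv(schur22)
    S22inv = np.linalg.inv(S22)
    K12 = -K11 @ S12 @ S22inv
    K21 = -S22inv @ S21 @ K11
    K22 = S22inv + S22inv @ S21 @ K11 @ S12 @ S22inv
    return np.block([[K11, K12], [K21, K22]])

S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
K = block_inverse(S[:2, :2], S[:2, 2:], S[2:, :2], S[2:, 2:])
assert np.allclose(K, np.linalg.inv(S))

# The caution in action: K11 differs from Sigma_11^{-1} when Sigma_12 != 0.
assert not np.allclose(K[:2, :2], np.linalg.inv(S[:2, :2]))
```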

Matrix Inversion Lemma

Reading off the top-left entries of the two forms gives

$$\big( \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \big)^{-1} = \Sigma_{11}^{-1} + \Sigma_{11}^{-1}\Sigma_{12}(\Sigma/\Sigma_{11})^{-1}\Sigma_{21}\Sigma_{11}^{-1}$$

or, in generic form,

$$\big( A - B C^{-1} D \big)^{-1} = A^{-1} + A^{-1} B \big( C - D A^{-1} B \big)^{-1} D A^{-1}$$
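A quick numeric check of the lemma, with arbitrary well-conditioned blocks of assumed sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
B = 0.3 * rng.standard_normal((4, 2))
C = np.eye(2) + 0.1 * rng.standard_normal((2, 2))
D = 0.3 * rng.standard_normal((2, 4))

inv = np.linalg.inv
lhs = inv(A - B @ inv(C) @ D)
rhs = inv(A) + inv(A) @ B @ inv(C - D @ inv(A) @ B) @ D @ inv(A)
assert np.allclose(lhs, rhs)
```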

Factorisation of Multivariate Gaussians

Consider the joint distribution over the variable

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

where the joint distribution is Gaussian, p(x) = N(x; µ, Σ), with

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}$$

Factorisation of Multivariate Gaussians

Find the following:

1. Conditionals: (a) p(x_1 | x_2), (b) p(x_2 | x_1)
2. Marginals: (a) p(x_1), (b) p(x_2)

Factorisation of Multivariate Gaussians

Using the partitioned inverse equations, we rearrange

$$p(x_1, x_2) \propto \exp\left( -\frac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right)$$

to bring the expression into the form p(x_1) p(x_2 | x_1) or p(x_2) p(x_1 | x_2), where the marginal and conditional can be easily identified. See also Bishop, section 2.3.

Factorisation of Multivariate Gaussians

We have the two decompositions

$$\begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} = \begin{pmatrix} (\Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top)^{-1} & -(\Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top)^{-1}\Sigma_{12}\Sigma_2^{-1} \\ -\Sigma_2^{-1}\Sigma_{12}^\top(\Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top)^{-1} & \Sigma_2^{-1} + \Sigma_2^{-1}\Sigma_{12}^\top(\Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top)^{-1}\Sigma_{12}\Sigma_2^{-1} \end{pmatrix}$$

$$= \begin{pmatrix} \Sigma_1^{-1} + \Sigma_1^{-1}\Sigma_{12}(\Sigma_2 - \Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12})^{-1}\Sigma_{12}^\top\Sigma_1^{-1} & -\Sigma_1^{-1}\Sigma_{12}(\Sigma_2 - \Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12})^{-1} \\ -(\Sigma_2 - \Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12})^{-1}\Sigma_{12}^\top\Sigma_1^{-1} & (\Sigma_2 - \Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12})^{-1} \end{pmatrix}$$

We let $s_i = x_i - \mu_i$ and use the first decomposition:

$$p(s_1, s_2) \propto \exp\left( -\frac{1}{2} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}^\top \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}^{-1} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} \right)$$

Completing the square in $s_1$,

$$p(s_1, s_2) \propto \exp\Big( -\frac{1}{2}\big(s_1 - \Sigma_{12}\Sigma_2^{-1}s_2\big)^\top \big(\Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top\big)^{-1} \big(s_1 - \Sigma_{12}\Sigma_2^{-1}s_2\big) - \frac{1}{2}\, s_2^\top \Sigma_2^{-1} s_2 \Big)$$

$$\propto \mathcal{N}\big(s_1;\ \Sigma_{12}\Sigma_2^{-1}s_2,\ \Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top\big)\ \mathcal{N}\big(s_2;\ 0,\ \Sigma_2\big)$$

$$= \mathcal{N}\big(x_1;\ \mu_1 + \Sigma_{12}\Sigma_2^{-1}(x_2 - \mu_2),\ \Sigma_1 - \Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top\big)\ \mathcal{N}\big(x_2;\ \mu_2,\ \Sigma_2\big)$$

This leads to a factorisation of the form p(x_2) p(x_1 | x_2). The second decomposition leads to the other factorisation, p(x_1) p(x_2 | x_1).
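Putting the result to work: a sketch of Gaussian conditioning implementing the formulas above (the function name `gaussian_condition` and the test numbers are ours). The marginal is read off directly as p(x_2) = N(µ_2, Σ_2).

```python
import numpy as np

def gaussian_condition(mu, Sigma, d, x2):
    """Condition N(mu, Sigma) on the trailing coordinates x2; d = dim(x1).
    Returns the mean and covariance of p(x1 | x2)."""
    mu1, mu2 = mu[:d], mu[d:]
    S1, S12, S2 = Sigma[:d, :d], Sigma[:d, d:], Sigma[d:, d:]
    gain = S12 @ np.linalg.inv(S2)      # Sigma_12 Sigma_2^{-1}
    cond_mean = mu1 + gain @ (x2 - mu2)
    cond_cov = S1 - gain @ S12.T        # Schur complement Sigma_1 - Sigma_12 Sigma_2^{-1} Sigma_12^T
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
m, C = gaussian_condition(mu, Sigma, d=2, x2=np.array([0.5]))
print("E[x1 | x2] =", m, "\nCov[x1 | x2] =\n", C)
```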

Summary: Keywords

- Multivariate Bernoulli
- Multivariate Gaussian
- Partitioned Inverse Equations, Matrix Inversion Lemma
