Multivariate Gaussians. Sargur Srihari

Multivariate Gaussians
Sargur Srihari (srihari@cedar.buffalo.edu)

Topics
1. Multivariate Gaussian: Basic Parameterization
2. Covariance and Information Form
3. Operations on Gaussians
4. Independencies in Gaussians

Three Representations of the Multivariate Gaussian
Equivalent representations:
1. Basic parameterization
   a. Covariance form
   b. Information (or precision) form
2. Gaussian Bayesian network
   - The information form captures the conditional independencies needed for a BN
3. Gaussian Markov Random Field
   - Easiest conversion

Parameterizing the Multivariate Gaussian
Basic parameterization: the covariance form
- A multivariate Gaussian distribution over X_1,..,X_n is characterized by an n-dimensional mean vector μ and a symmetric n×n covariance matrix Σ
- The density function is defined as

  p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)
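As a quick numerical illustration (my addition, not part of the slides), here is a minimal numpy/scipy sketch that evaluates this density directly and cross-checks it against scipy.stats.multivariate_normal; the evaluation point x is arbitrary, and μ and Σ are borrowed from the example used later in these slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Example mean and covariance (the 3-variable example used later in the slides)
mu = np.array([1.0, 3.0, 4.0])
Sigma = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0,  5.0, -5.0],
                  [-2.0, -5.0,  8.0]])

x = np.array([0.0, 2.0, 5.0])   # arbitrary evaluation point
n = len(mu)

# Direct evaluation of the covariance-form density
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
p_direct = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

# Cross-check against scipy's implementation
p_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(p_direct, p_scipy)        # the two values should agree
```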

Covariance Intuition in a Gaussian
- The pdf specifies a set of ellipsoidal contours around the mean vector μ
- The contours are parallel; each corresponds to a value of the density function
- The shape of the ellipsoid and the steepness of the contours are determined by Σ
- μ = E[x],  Σ = E[x x^T] − E[x] E[x]^T
[Figure: contour plots of two bivariate Gaussians, one with X, Y uncorrelated and one with X, Y correlated]
Example:

  \mu = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}, \qquad
  \Sigma = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 5 & -5 \\ -2 & -5 & 8 \end{pmatrix}

- X_1 is negatively correlated (−2) with X_3
- X_2 is negatively correlated (−5) with X_3
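To connect the covariance entries to correlations, here is a small numpy sketch (my addition) that converts the example Σ into a correlation matrix, using ρ_ij = Σ_ij / √(Σ_ii Σ_jj):

```python
import numpy as np

Sigma = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0,  5.0, -5.0],
                  [-2.0, -5.0,  8.0]])

# Correlation matrix: rho_ij = Sigma_ij / sqrt(Sigma_ii * Sigma_jj)
std = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(std, std)
print(np.round(rho, 3))
# The (1,3) and (2,3) entries are negative, matching the statements above,
# while the (1,2) entry is positive.
```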

Multivariate Gaussian examples [figures only]

Deriving the Information Form
The covariance form of the multivariate Gaussian is

  p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

Using the inverse covariance (precision) matrix J = Σ^{-1}:

  -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)
    = -\frac{1}{2} (x - \mu)^T J (x - \mu)
    = -\frac{1}{2} \left( x^T J x - 2 x^T J \mu + \mu^T J \mu \right)

Since the last term is a constant, we get the information form

  p(x) \propto \exp\left( -\frac{1}{2} x^T J x + (J\mu)^T x \right)

Jμ is called the potential vector.
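The algebra above can be checked numerically. Below is a small sketch (my own, reusing the slide's example) that converts the covariance form to the information form and verifies that the two exponents differ only by the constant term −½ μ^T J μ:

```python
import numpy as np

mu = np.array([1.0, 3.0, 4.0])
Sigma = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0,  5.0, -5.0],
                  [-2.0, -5.0,  8.0]])

J = np.linalg.inv(Sigma)        # information (precision) matrix J = Sigma^-1
h = J @ mu                      # potential vector J*mu

x = np.array([0.5, 2.0, 3.0])   # arbitrary point

# Exponent of the covariance form: -1/2 (x - mu)^T J (x - mu)
exp_cov = -0.5 * (x - mu) @ J @ (x - mu)
# Exponent of the information form: -1/2 x^T J x + (J mu)^T x
exp_info = -0.5 * x @ J @ x + h @ x

# They differ only by the constant -1/2 mu^T J mu, independent of x
const = -0.5 * mu @ J @ mu
print(np.isclose(exp_cov, exp_info + const))   # True
```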

Information Form of the Gaussian
- Since Σ is invertible, we can also define the Gaussian in terms of the inverse covariance matrix J = Σ^{-1}
- J is called the information matrix or precision matrix
- It induces an alternate form of the Gaussian density

  p(x) \propto \exp\left( -\frac{1}{2} x^T J x + (J\mu)^T x \right)

- Jμ is called the potential vector

Operations on Gaussians
There are two main operations we wish to perform on a distribution:
1. Computing the marginal distribution over some subset Y
2. Conditioning the distribution on some assignment Z = z
Each operation is very easy in one of the two ways of encoding a Gaussian:
1. Marginalization is trivial in the covariance form
2. Conditioning is very easy to perform in the information form

Marginalization in Covariance Form
- The marginal distribution over any subset of the variables can be read off trivially from the mean vector and covariance matrix:

  p(X_1, X_2, X_3) = \mathcal{N}(\mu, \Sigma), \qquad
  p(X_2, X_3) = \int p(X_1, X_2, X_3)\, dX_1 = \mathcal{N}(\mu', \Sigma')

  With

  \mu = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}, \qquad
  \Sigma = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 5 & -5 \\ -2 & -5 & 8 \end{pmatrix}

  we get

  \mu' = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \qquad
  \Sigma' = \begin{pmatrix} 5 & -5 \\ -5 & 8 \end{pmatrix}

- More generally, if we have a joint distribution over {X, Y} with X ∈ R^n, Y ∈ R^m, we can decompose the mean and covariance of the joint as

  p(X, Y) = \mathcal{N}\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix},
            \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix} \right)

  where μ_X ∈ R^n, μ_Y ∈ R^m, Σ_XX is n×n, Σ_XY is n×m, Σ_YX = Σ_XY^T is m×n, and Σ_YY is m×m
- Lemma: If {X, Y} have a joint normal distribution, then the marginal distribution of Y is the normal distribution N(μ_Y, Σ_YY)
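In code, marginalization in covariance form is just index selection. A minimal numpy sketch (my addition), where keep = [1, 2] selects X_2 and X_3 in 0-based indexing:

```python
import numpy as np

mu = np.array([1.0, 3.0, 4.0])
Sigma = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0,  5.0, -5.0],
                  [-2.0, -5.0,  8.0]])

keep = [1, 2]                            # marginalize out X_1, keep X_2, X_3
mu_marg = mu[keep]                       # -> [3. 4.]
Sigma_marg = Sigma[np.ix_(keep, keep)]   # -> [[ 5. -5.], [-5.  8.]]
print(mu_marg)
print(Sigma_marg)
```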

Conditioning in Information Form
- Conditioning on an observation Z = z is very easy to perform in the information form
- Simply substitute Z = z into the equation

  p(x) \propto \exp\left( -\frac{1}{2} x^T J x + (J\mu)^T x \right)

- This turns some of the quadratic terms into linear or even constant terms
- The resulting expression has the same form, with fewer terms
- Conditioning is not so straightforward in the covariance form
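To make this concrete, here is a sketch (my own partitioning convention, not from the slides) of conditioning in the information form: splitting x into a retained block X and an observed block Z, substituting Z = z leaves the quadratic term J_XX unchanged and shifts the potential vector to h_X − J_XZ z.

```python
import numpy as np

mu = np.array([1.0, 3.0, 4.0])
Sigma = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0,  5.0, -5.0],
                  [-2.0, -5.0,  8.0]])
J = np.linalg.inv(Sigma)   # information matrix
h = J @ mu                 # potential vector

X_idx, Z_idx = [0, 1], [2]            # condition on X_3 = z
z = np.array([6.0])

# Information form of p(X | Z = z): quadratic term unchanged, potential shifted
J_cond = J[np.ix_(X_idx, X_idx)]
h_cond = h[X_idx] - J[np.ix_(X_idx, Z_idx)] @ z

# If needed, recover the mean and covariance of the conditional distribution
Sigma_cond = np.linalg.inv(J_cond)
mu_cond = Sigma_cond @ h_cond
print(mu_cond)
print(Sigma_cond)
```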

Summary of Marginalization & Conditioning
- To marginalize a Gaussian over a subset of variables:
  - Compute the pairwise covariances of those variables, which is precisely generating the distribution in its covariance form
- To condition a Gaussian on an observation:
  - Invert the covariance matrix to obtain the information form
  - Inversion is feasible for small matrices, but too costly in high-dimensional spaces

Independencies in Gaussians
- Theorem: If X = (X_1,..,X_n) has a joint normal distribution N(μ, Σ), then X_i and X_j are independent if and only if Σ_{i,j} = 0
- This property does not hold in general: if p(X, Y) is not Gaussian, it is possible that Cov[X; Y] = 0 while X and Y are still dependent in p
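A tiny illustration of the caveat (my addition): take X standard normal and Y = X², so Cov[X, Y] = E[X³] = 0, yet Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)
Y = X ** 2                                 # a deterministic function of X

print(np.cov(X, Y)[0, 1])                  # approximately 0: uncorrelated
print(Y[np.abs(X) > 2].mean(), Y.mean())   # very different: X and Y are dependent
```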

Independencies from the Information Matrix
- Theorem: Consider a Gaussian distribution p(X_1,..,X_n) = N(μ, Σ) and let J = Σ^{-1} be the information matrix. Then J_{ij} = 0 if and only if p satisfies (X_i ⊥ X_j | 𝒳 − {X_i, X_j})
- Example: with

  \mu = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}, \qquad
  \Sigma = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 5 & -5 \\ -2 & -5 & 8 \end{pmatrix}

  we get

  J = \begin{pmatrix} 0.3125 & -0.125 & 0 \\ -0.125 & 0.5833 & 0.333 \\ 0 & 0.333 & 0.333 \end{pmatrix}

  Since J_{13} = 0, X_1 and X_3 are conditionally independent given X_2
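The example can be reproduced numerically with a short sketch (my addition):

```python
import numpy as np

Sigma = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0,  5.0, -5.0],
                  [-2.0, -5.0,  8.0]])

J = np.linalg.inv(Sigma)
print(np.round(J, 4))
# approximately:
# [[ 0.3125 -0.125   0.    ]
#  [-0.125   0.5833  0.3333]
#  [ 0.      0.3333  0.3333]]

# J[0, 2] == 0, so X_1 and X_3 are conditionally independent given X_2
print(np.isclose(J[0, 2], 0.0))   # True
```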

The Information Matrix Defines an I-map
- The information matrix J captures independencies between pairs of variables conditioned on all the remaining variables
- We can use J to construct the unique minimal I-map for p
- Definition of I-map:
  - Let I be a set of independence assertions of the form (X ⊥ Y | Z) that are true of the distribution P
  - G is an I-map of P if I(G) ⊆ I
  - Minimal I-map: removal of a single edge renders the graph no longer an I-map
- Construction: introduce an edge between X_i and X_j whenever (X_i ⊥ X_j | 𝒳 − {X_i, X_j}) does not hold in p, which is precisely when J_{ij} ≠ 0
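A minimal sketch of this construction (my addition): build the undirected minimal I-map by adding an edge wherever the corresponding off-diagonal entry of J is numerically nonzero.

```python
import numpy as np

Sigma = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0,  5.0, -5.0],
                  [-2.0, -5.0,  8.0]])
J = np.linalg.inv(Sigma)

n = J.shape[0]
tol = 1e-9                               # treat |J_ij| below tol as zero
edges = [(i + 1, j + 1)                  # 1-based variable indices X_1..X_n
         for i in range(n) for j in range(i + 1, n)
         if abs(J[i, j]) > tol]
print(edges)   # [(1, 2), (2, 3)]: no edge between X_1 and X_3
```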