The Multivariate Normal Distribution


Paul Johnson
June 2013

1 Introduction

A one-dimensional Normal variable should be very familiar to students who have completed one course in statistics. The multivariate Normal is a generalization of the Normal which describes the probability of observing n-tuples. The multivariate Normal distribution looks just like you might expect from your experience with the univariate Normal: symmetric, single-peaked, basically pleasant in every way!

The vectors described by the multivariate Normal distribution can have as many elements as we choose to assign. Let's work with an example in which each randomly selected case has two attributes: a two-dimensional model. A point drawn from a two-dimensional Normal would be a pair of real-valued numbers (you can call it a vector or a point if you like). Given any vector, such as (0.43, 0.1) or (-0.43, -0.1), the multivariate Normal density function gives back an indicator of how likely we are to draw that particular pair of values.

The Multivariate Normal (MVN) distribution has two parameters, but each of these two parameters contains more than one piece of information. One parameter, \mu, describes the location or center point of the distribution, while the other parameter, \Sigma, determines the dispersion: the width of the spread from side to side in the observed points. Figure 1 illustrates the density function of a Normal distribution on two dimensions.

2 Mathematical Description

In the multivariate Normal density, there are two parameters, \mu and \Sigma. The first, \mu, is an n x 1 column vector of location parameters; the second, \Sigma, is an n x n dispersion matrix.

Figure 1: Multivariate Normal Distribution. [Surface plot of the density ("probability" axis) over y_1 and y_2, each axis running from -3 to 3.]

f(y i µ,σ) = = f((y i,y i,...,y in ) µ = (µ,...,µ n ),Σ = = σ σ n... σ n (π) n Σ exp( (y i µ) Σ (y i µ)) () σ n ) The symbol Σ refers to the determinant of the matrix Σ. The determinant is a single real number. The symbol Σ is the inverse of Σ, a matrix for which ΣΣ = I. This equation assumes that Σ can be inverted, and one sufficient condition for the existence of an inverse is that the determinant is not. The matrix Σ must be positive semi-definite in order to assure that the most likely point is µ = (µ,µ,...,µ n ) and that, as y i moves away from µ in any direction, then the probability of observing y i declines. The denominator in the formula for the MVN is a normalizing constant, one which assures us that the distribution integrates to.. In the last section of this writeup, I have included a little bit to help with the matrix algebra and the importance of a weighted cross product like (y i µ)σ (y i µ). If there is only one dimension, this gives the same value as the ordinary Normal distribution, which one will recall is f(y i µ,σ ) = [ exp (yi µ) ] πσ σ In the one dimensional case, Σ = σ, meaning Σ = σ and Σ = /σ. In the example with two dimensions, the location parameter has two components µ = (µ,µ ) and (again for the two dimensional case) the scale parameter is [ ] σ Σ = σ σ σ For the case, the determinant and inverse are known: Σ = σ σ σ and Σ = σ σ σ σ σ σ σ σ σ σ σ σ σ σ σ σ so you can actually calculate an explicit value by hand if you have enough patience. 3

f(y i = (y,y ) µ,σ) = (π) n σ σ exp( σ [y i µ,y i µ ] σ σ σ σ σ σ σ σ σ σ σ σ σ σ σ σ [ yi µ y i µ ] At this time, I m unwilling to write out any more of that. But I encourage you to do it. 3 The Expected Value and Variance Recall that expected value is defined as a probability weighted sum of possible outcomes. E(y) = µ V ar(y) = Σ Because we are considering a multivariate distribution here, it is probably lost in the shuffle that the expected value is calculated by integrating over all dimensions. If we had a discrete distribution, say p(x,x ), then the expected value would be calculated by looping over all values of x and x and calculating a probability weighted sum n m E(x ) = p(x i,x j )x i i= j= With a probability model defined on continuous variables, the calculation of the expected value requires the use of integrals, but except for the change from notation (Σ to ), there is not major conceptual change. E(y ) = + + f(y,y )y Because the covariance between two variables is the same, whether we calculate σ = Covariance(y,y ) or σ = Coviariance(y,y ), the matrix Σ has to be same on either side of the diagonal. As you can see, the matrix is symmetrical, so that the bottom-left element is the same as the top-right. The matrix is the same as its transpose. Σ = [ σ σ σ σ ] = Σ 4

4 Illustrations

4.1 Contour Plots

The parameters that were used to create Figure 1 were \mu = (0, 0) and \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. That is the bivariate equivalent of the standardized Normal distribution. A contour plot offers one way to view this distribution. The equally-probable points are connected by lines. The idea will be familiar to economists as indifference curves and to geographers as topographical maps. Consider the contours in Figure 2.

4.2 Change the Variance

If we change one of the diagonal elements in \Sigma, the effect is to spread out the observations on that dimension. See Figure 3.

4.3 The Effect of the Covariance Parameter

Until this point, we have been considering distributions in which the two variables y_1 and y_2 were statistically independent, in the sense that the expected value of y_1 was not influenced by y_2. Figure 4 shows an MVN in which the off-diagonal element \sigma_{12} is 0.8 rather than 0, and Figure 5 gives the corresponding contour plot.

5 Cross-sections

A conditional density gives the probability of the outcome y = (y_1, y_2) when one of these variables is taken as given. For example, f((y_1, y_2) | y_2 = 0.75) represents the probability of observing y_1 when it is known that y_2 = 0.75. The Multivariate Normal distribution has the interesting property that the conditional distributions are basically Normal (Normal up to a constant of proportionality). Figure 6 illustrates the MVN when y_2 = 0.75. The total area under that curve will not equal 1.0, but if we normalize it by finding the area (call that A), then

f(y_1 | y_2 = 0.75) / A

is a valid probability density function.

6 Appendix: Matrix Algebra

If you have never studied matrix algebra, I would urge you to do a little study right now. Find any book or one of my handouts.
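The normalization argument of Section 5 can be verified numerically. The sketch below (grid and variable names are mine, not from the text) evaluates the cross-section of the standard bivariate Normal at y_2 = 0.75 on a fine grid, measures its area A with a Riemann sum, and confirms that dividing by A produces a curve that integrates to 1.0:

```python
import numpy as np

def std_bvn(y1, y2):
    """Standard bivariate Normal density: mu = (0, 0), Sigma = identity."""
    return np.exp(-0.5 * (y1 ** 2 + y2 ** 2)) / (2 * np.pi)

y1 = np.linspace(-8.0, 8.0, 4001)
dy = y1[1] - y1[0]
cross = std_bvn(y1, 0.75)       # the cross-section at y2 = 0.75
A = np.sum(cross) * dy          # its area (a Riemann sum)
print(A)                        # less than 1.0, so not yet a valid density
cond = cross / A                # f(y1 | y2 = 0.75): normalized
print(np.sum(cond) * dy)        # integrates to 1.0
```

For the standard bivariate case the area A works out to the univariate Normal density evaluated at 0.75, which is why the conditional curve is Normal "up to a constant of proportionality."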

Figure 2: The View from Above, Corresponding to Figure 1

#image(x1, x2, probx, col = terrain.colors(), axes = FALSE)
contour(x1, x2, probx)

[Contour plot of the standard bivariate Normal; both axes run from -3 to 3.]

Figure 3: MV Normal(\mu = (0, 0), \Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}). [Surface plot; the y_1 axis runs from -5 to 5.]

Figure 4: MV Normal(\mu = (0, 0), \Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}). [Surface plot.]

Figure 5: The View from Above

#image(x1, x2, probx, col = terrain.colors(), axes = FALSE)
contour(x1, x2, probx)

[Contour plot corresponding to Figure 4; the contours are tilted ellipses because \sigma_{12} = 0.8.]

Figure 6: Cross Section of the Standard MVN for y_2 = 0.75. [Surface plot with the cross-section at y_2 = 0.75 highlighted.]

Your objective is to understand enough about multiplication of matrices and vectors so that you can see the meaning of an expression like the following:

x' M x

Here x is an n x 1 column vector and M is an n x n matrix. The result is a single number, and if you write it out term-for-term, you find it is a weighted sum of the squares and cross-products of the elements of x.

(x_1, x_2, \ldots, x_n) \begin{bmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & \ddots & \vdots \\ m_{n1} & \cdots & m_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
= (x_1, x_2, \ldots, x_n) \begin{bmatrix} m_{11} x_1 + m_{12} x_2 + \cdots + m_{1n} x_n \\ m_{21} x_1 + m_{22} x_2 + \cdots + m_{2n} x_n \\ \vdots \\ m_{n1} x_1 + m_{n2} x_2 + \cdots + m_{nn} x_n \end{bmatrix}
= m_{11} x_1 x_1 + m_{12} x_2 x_1 + \cdots + m_{1n} x_n x_1
  + m_{21} x_1 x_2 + m_{22} x_2 x_2 + \cdots + m_{2n} x_n x_2
  + \cdots
  + m_{n1} x_1 x_n + m_{n2} x_2 x_n + \cdots + m_{nn} x_n x_n
= \sum_{i=1}^{n} \sum_{j=1}^{n} m_{ij} x_i x_j

If M is diagonal, meaning

M = \begin{bmatrix} m_{11} & & & \\ & m_{22} & & \\ & & \ddots & \\ & & & m_{nn} \end{bmatrix}    (2)

then there are no cross-products and the result of x' M x is simply a sum of weighted x-squared terms:

x' M x = \sum_{i=1}^{n} m_{ii} x_i^2    (3)

A matrix M is positive semi-definite if, for all x,

x' M x \geq 0    (4)
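The term-for-term expansion above can be confirmed in a few lines of Python (variable names mine): the double sum \sum_i \sum_j m_{ij} x_i x_j equals the matrix product x' M x, and with a diagonal M the cross-product terms vanish, leaving equation (3):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.normal(size=n)
M = rng.normal(size=(n, n))

# x' M x as a matrix product ...
quad = x @ M @ x
# ... and as the explicit double sum of weighted squares and cross-products
double_sum = sum(M[i, j] * x[i] * x[j] for i in range(n) for j in range(n))
print(np.isclose(quad, double_sum))       # True

# With a diagonal M, only the weighted x-squared terms remain
D = np.diag([2.0, 3.0, 4.0, 5.0])
diag_sum = sum(D[i, i] * x[i] ** 2 for i in range(n))
print(np.isclose(x @ D @ x, diag_sum))    # True
```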

If M is positive semi-definite, so is M^{-1}. In the Normal distribution's density formula, we have

\exp\left( -\tfrac{1}{2} (y_i - \mu)' \Sigma^{-1} (y_i - \mu) \right)    (5)

In that formula, the role of x is played by (y_i - \mu) and the role of M is played by \Sigma^{-1}. The Normal distribution requires that \Sigma^{-1} is positive semi-definite:

(y_i - \mu)' \Sigma^{-1} (y_i - \mu) \geq 0    (6)

Tidbit: The eigenvectors (e_1, e_2, \ldots, e_n) of \Sigma are known as its principal components. The principal components give the axes of symmetry of the MVN. If the i-th eigenvalue (the one associated with the i-th eigenvector) is \lambda_i, the length of the ellipsoid on the i-th axis is proportional to \sqrt{\lambda_i} (B. Walsh and M. Lynch, Appendix 2, "The Multivariate Normal", http://nitro.biosci.arizona.edu/zbook/volume_/chapters/vol_a.html).
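The tidbit about principal components can be illustrated numerically. In this sketch the 2 x 2 covariance matrix is my own example (not from the text); the eigen-decomposition recovers positive eigenvalues (so \Sigma is positive definite), orthonormal axes of symmetry, and axis lengths proportional to \sqrt{\lambda_i}:

```python
import numpy as np

# A hypothetical 2 x 2 covariance matrix, chosen for illustration
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# eigh is the right routine for symmetric matrices;
# eigenvalues are returned in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)              # both positive: Sigma is positive definite
print(np.sqrt(eigvals))     # relative lengths of the ellipse's principal axes

# The eigenvectors (principal components) are orthonormal: V'V = I
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))    # True
```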