Visualizing the Multivariate Normal, Lecture 9
Rebecca C. Steorts
September 15, 2015

Last class Class was done on the board; get notes if you missed lecture. Make sure to go through the Markdown example I posted on the webpage.

Spectral Decomposition P is orthogonal if $P^T P = I$ and $P P^T = I$. Theorem: let $A$ be a symmetric $n \times n$ matrix. Then we can write $A = P D P^T$, where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $P$ is orthogonal. The $\lambda_i$ are the eigenvalues of $A$, and the $i$th column of $P$ is an eigenvector corresponding to $\lambda_i$. Orthogonal matrices represent rotations of the coordinates; diagonal matrices represent stretchings/shrinkings of coordinates.
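The following is a minimal sketch (not from the lecture) of how to check the spectral decomposition numerically in R with eigen(); the matrix A below is just an arbitrary symmetric example.

# verify A = P D P^T for an arbitrary symmetric matrix
A <- matrix(c(4, 2, 2, 3), nrow = 2)   # symmetric 2 x 2 example
eig <- eigen(A, symmetric = TRUE)
P <- eig$vectors                       # columns: orthonormal eigenvectors
D <- diag(eig$values)                  # diagonal matrix of eigenvalues
round(t(P) %*% P, 10)                  # identity, so P is orthogonal
round(P %*% D %*% t(P), 10)            # recovers A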

Properties The covariance matrix $\Sigma$ is symmetric and positive definite, so we know from the spectral decomposition theorem that it can be written as $\Sigma = P \Lambda P^T$. $\Lambda$ is the diagonal matrix of the eigenvalues of $\Sigma$. $P$ is the matrix whose columns are the orthonormal eigenvectors of $\Sigma$ (hence $P$ is an orthogonal matrix). Geometrically, orthogonal matrices represent rotations. Multiplying by $P$ rotates the coordinate axes so that they are parallel to the eigenvectors of $\Sigma$. Probabilistically, this tells us that the axes of the probability-contour ellipse are parallel to those eigenvectors. The radii of those axes are proportional to the square roots of the eigenvalues.
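As a quick sketch (using the same $\Sigma$ that appears later in the lecture), we can confirm in R that rotating the coordinates by $P$ diagonalizes the covariance, i.e. $P^T \Sigma P = \Lambda$:

Sigma <- matrix(c(2, 1, 1, 1), nrow = 2)
eig <- eigen(Sigma, symmetric = TRUE)
P <- eig$vectors                      # orthonormal eigenvectors of Sigma
round(t(P) %*% Sigma %*% P, 10)       # diagonal matrix Lambda of eigenvalues
sqrt(eig$values)                      # proportional to the ellipse axis radii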

Can we view $\det(\Sigma)$ as a variance? It plays a role analogous to the variance $\sigma^2$ of a one-dimensional Gaussian. From the spectral decomposition theorem, $\det(\Sigma) = \prod_i \lambda_i$. The eigenvalues $\lambda_i$ tell us how stretched or compressed the distribution is, so we can view $\det(\Sigma)$ as a stretching/compressing factor for the MVN density. We will see this from the contour plots later.
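A one-line sketch in R (again using the lecture's $\Sigma$) confirms the identity $\det(\Sigma) = \prod_i \lambda_i$:

Sigma <- matrix(c(2, 1, 1, 1), nrow = 2)
det(Sigma)                                   # 1
prod(eigen(Sigma, symmetric = TRUE)$values)  # same value: product of eigenvalues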

Our focus is visualizing MVN distributions in R.

What is a Contour Plot? A contour plot is a graphical technique for representing a 3-dimensional surface: we plot constant-z slices (contours) in a 2-D format, as an alternative to a 3-D surface plot. The contour plot is formed by: vertical axis: independent variable 2; horizontal axis: independent variable 1; lines: iso-response values.

Contour Plot The lines of the contour plot denote places of equal probability density for the MVN distribution. Each line represents the points of both variables that lead to the same height on the z-axis (the height of the density surface). These contours can be constructed from the eigenvalues and eigenvectors of the covariance matrix: the directions of the ellipse axes are the directions of the eigenvectors, and the lengths of the ellipse axes are proportional to the square roots of the eigenvalues. More specifically, the set of points satisfying $(x - \mu)^T \Sigma^{-1} (x - \mu) = c^2$ is an ellipsoid centered at $\mu$ with axes $\pm c \sqrt{\lambda_i}\, v_i$, where $(\lambda_i, v_i)$ are the eigenvalue-eigenvector pairs of $\Sigma$.
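As a sketch of this construction (using the mean and covariance from the next slide), the $c = 1$ contour can be traced in R directly from the eigendecomposition, without evaluating the density at all:

mu <- c(1, 1)
Sigma <- matrix(c(2, 1, 1, 1), nrow = 2)
eig <- eigen(Sigma, symmetric = TRUE)
cc <- 1                                  # the constant c in the contour equation
theta <- seq(0, 2 * pi, length.out = 200)
# unit circle, stretched by c * sqrt(lambda_i), rotated by P, shifted to mu
ellipse <- t(mu + eig$vectors %*% (cc * sqrt(eig$values) * rbind(cos(theta), sin(theta))))
plot(ellipse, type = "l", asp = 1, xlab = "x1", ylab = "x2")
points(mu[1], mu[2], pch = 19)           # center at mu (the solid dot)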

Visualizing the MVN Distribution Using Contour Plots The figure below shows a contour plot of the joint pdf of a bivariate normal distribution. Note: we are plotting the theoretical contour plot. This particular distribution has mean $\mu = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ (the solid dot) and covariance matrix $\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$.

Code to construct plot

library(mvtnorm)
x.points <- seq(-3, 3, length.out = 100)
y.points <- x.points
z <- matrix(0, nrow = 100, ncol = 100)
mu <- c(1, 1)
sigma <- matrix(c(2, 1, 1, 1), nrow = 2)
for (i in 1:100) {
  for (j in 1:100) {
    z[i, j] <- dmvnorm(c(x.points[i], y.points[j]),
                       mean = mu, sigma = sigma)
  }
}
contour(x.points, y.points, z)

Contour plot [Figure: theoretical contour plot of the bivariate normal density over (-3, 3) x (-3, 3), with contour levels from 0.02 to 0.14.]

Our findings Probability contours are ellipses. Density changes comparatively slowly along the major axis and quickly along the minor axis. The two points marked + in the figure are at equal geometric distance from $\mu$, but the one to the right of $\mu$ lies on a higher probability contour than the one above it, because of the directions of their displacements from the mean.

Another example Now consider the following samples drawn from bivariate normal densities:
$$X_1, \ldots, X_n \overset{\text{iid}}{\sim} N_2\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \right)
\quad \text{and} \quad
Y_1, \ldots, Y_n \overset{\text{iid}}{\sim} N_2\!\left( \begin{pmatrix} -2 \\ 2 \end{pmatrix}, \begin{pmatrix} 1.5 & 1.5 \\ 1.5 & 1.5 \end{pmatrix} \right).$$

Kernel density estimation (KDE) KDE allows us to estimate the density from which each sample was drawn. This method (which you will learn about in other classes) allows us to approximate the density using only a sample. R has built-in support for KDEs, such as density() for univariate data and kde2d() in the MASS package for bivariate data.
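For intuition, here is a minimal sketch (not from the lecture) of the univariate case with density(); the two-dimensional analogue, kde2d(), is what the following slides use:

set.seed(1)
x <- rnorm(500, mean = 1, sd = sqrt(2))                       # simulated normal sample
plot(density(x), main = "KDE of a univariate normal sample")
curve(dnorm(x, mean = 1, sd = sqrt(2)), add = TRUE, lty = 2)  # true density for comparison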

Generate two bivariate normals

library(MASS)
bivn <- mvrnorm(1000, mu = c(0, 0),
                Sigma = matrix(c(1, .5, .5, 1), 2))
bivn2 <- mvrnorm(1000, mu = c(-2, 2),
                 Sigma = matrix(c(1.5, 1.5, 1.5, 1.5), 2))

Applying KDE and plotting

# now we do a kernel density estimate
bivn.kde <- kde2d(bivn[, 1], bivn[, 2], n = 50)
# now we do a mixture
bivn.mix <- kde2d(c(bivn[, 1], bivn2[, 1]),
                  c(bivn[, 2], bivn2[, 2]), n = 50)

## creating contour plots and saving them to files
pdf(file = "contour1.pdf")
image(bivn.kde); contour(bivn.kde, add = T)
dev.off()
## pdf
##   2

pdf(file = "contour_both.pdf")
image(bivn.mix); contour(bivn.mix, add = T)
dev.off()

Contour plot of X [Figure: image and contour plot of the kernel density estimate of the X sample.]

Contour plot of X and mixture of X, Y [Figure: image and contour plot of the kernel density estimate of the combined X and Y samples.]

What did we learn? In the contour plot of X (a bivariate density): the color gives the estimated probability density at each point (red is low density and white is high density); the contour lines define regions of probability density, from high to low; there is a single point where the density is highest (in the white region), and the contours are approximately ellipses, which is what you expect from a Gaussian.

What can we say in general about the MVN density? The spectral decomposition theorem tells us that the contours of the multivariate normal distribution are ellipsoids. The axes of the ellipsoids correspond to eigenvectors of the covariance matrix. The radii of the ellipsoids are proportional to the square roots of the eigenvalues of the covariance matrix.