MAHALANOBIS DISTANCE

Def. The Euclidean distance between two points $x = (x_1, \dots, x_p)^t$ and $y = (y_1, \dots, y_p)^t$ in the $p$-dimensional space $\mathbb{R}^p$ is defined as

$$d_E(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_p - y_p)^2} = \sqrt{(x - y)^t (x - y)}$$

and $d_E(x, 0) = \|x\|_2 = \sqrt{x_1^2 + \cdots + x_p^2} = \sqrt{x^t x}$ is the Euclidean norm of $x$.

It follows immediately that all points at the same distance from the origin, $\|x\|_2 = c$, satisfy

$$x_1^2 + \cdots + x_p^2 = c^2,$$

which is the equation of a sphere. This means that all components of an observation $x$ contribute equally to the Euclidean distance of $x$ from the center. In statistics, however, we prefer a distance that takes the variability of each component (each variable) into account when determining its distance from the center. Components with high variability should receive less weight than components with low variability. This can be obtained by rescaling the components. Denote $u = (x_1/s_1, \dots, x_p/s_p)^t$ and $v = (y_1/s_1, \dots, y_p/s_p)^t$ and define the distance between $x$ and $y$ as

$$d(x, y) = d_E(u, v) = \sqrt{\left(\frac{x_1 - y_1}{s_1}\right)^2 + \cdots + \left(\frac{x_p - y_p}{s_p}\right)^2} = \sqrt{(x - y)^t D^{-1} (x - y)}$$

where $D = \mathrm{diag}(s_1^2, \dots, s_p^2)$. Now the norm of $x$ equals

$$\|x\| = d(x, 0) = d_E(u, 0) = \|u\|_2 = \sqrt{\left(\frac{x_1}{s_1}\right)^2 + \cdots + \left(\frac{x_p}{s_p}\right)^2} = \sqrt{x^t D^{-1} x}.$$
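
As a small numerical illustration of the rescaled distance just defined, the following is a minimal sketch assuming NumPy; the two points and the spreads $s_j$ are made-up values, not taken from the notes.

```python
import numpy as np

# Hypothetical points and component-wise spreads s_j (illustrative values only).
x = np.array([2.0, 1.0])
y = np.array([0.0, 0.0])
s = np.array([4.0, 0.5])

# Euclidean distance: every component contributes equally.
d_E = np.sqrt((x - y) @ (x - y))

# Rescaled distance with D = diag(s_1^2, ..., s_p^2): the high-variability
# first component is down-weighted, the low-variability second one is up-weighted.
D_inv = np.diag(1.0 / s**2)
d = np.sqrt((x - y) @ D_inv @ (x - y))

print(d_E)  # sqrt(2^2 + 1^2) ~ 2.24
print(d)    # sqrt((2/4)^2 + (1/0.5)^2) ~ 2.06
```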

All points at the same distance from the origin, $\|x\| = c$, satisfy

$$\left(\frac{x_1}{s_1}\right)^2 + \cdots + \left(\frac{x_p}{s_p}\right)^2 = c^2,$$

which is the equation of an ellipsoid centered at the origin with principal axes along the coordinate axes.

Finally, we also want to take the correlation between variables into account when computing statistical distances. Correlation means that there are associations between the variables; therefore, we want the axes of the ellipsoid to reflect this correlation. This is obtained by allowing the axes of the ellipsoid of constant distance to rotate. This yields the following general form for the statistical distance between two points.

Def. The statistical distance or Mahalanobis distance between two points $x = (x_1, \dots, x_p)^t$ and $y = (y_1, \dots, y_p)^t$ in the $p$-dimensional space $\mathbb{R}^p$ is defined as

$$d_S(x, y) = \sqrt{(x - y)^t S^{-1} (x - y)}$$

and $d_S(x, 0) = \|x\|_S = \sqrt{x^t S^{-1} x}$ is the corresponding norm of $x$.

Points at the same distance from the origin, $\|x\|_S = c$, satisfy $x^t S^{-1} x = c^2$, which is the general equation of an ellipsoid centered at the origin. In general the center of the observations will differ from the origin, and we will be interested in the distance of an observation from its center $\bar{x}$, given by

$$d_S(x, \bar{x}) = \sqrt{(x - \bar{x})^t S^{-1} (x - \bar{x})}.$$
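
The following is a minimal sketch, assuming NumPy, of how $d_S(x, \bar{x})$ is typically computed from data; the simulated data matrix, its mean and its covariance are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: n = 200 observations of p = 3 correlated variables.
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[4.0, 1.5, 0.0],
                                 [1.5, 2.0, 0.3],
                                 [0.0, 0.3, 1.0]],
                            size=200)

xbar = X.mean(axis=0)            # sample center \bar{x}
S = np.cov(X, rowvar=False)      # sample covariance matrix S
S_inv = np.linalg.inv(S)

# d_S(x, xbar) = sqrt((x - xbar)^t S^{-1} (x - xbar)) for every observation.
centered = X - xbar
d_S = np.sqrt(np.einsum('ij,jk,ik->i', centered, S_inv, centered))
print(d_S[:5])
```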

Result 1. Consider any three $p$-dimensional observations $x$, $y$ and $z$ of a $p$-dimensional random variable $X = (X_1, \dots, X_p)^t$. The Mahalanobis distance satisfies the following properties:

$d_S(x, y) = d_S(y, x)$
$d_S(x, y) > 0$ if $x \neq y$
$d_S(x, y) = 0$ if $x = y$
$d_S(x, y) \leq d_S(x, z) + d_S(z, y)$ (triangle inequality)

MATRIX ALGEBRA

Def. A $p$-dimensional square matrix $Q$ is orthogonal if

$$Q Q^t = Q^t Q = I_p,$$

or equivalently $Q^t = Q^{-1}$. This implies that the rows of $Q$ have unit norm and are orthogonal to each other. The columns have the same property.

Def. A $p$-dimensional square matrix $A$ has an eigenvalue $\lambda$ with corresponding eigenvector $x \neq 0$ if

$$A x = \lambda x.$$

If the eigenvector $x$ is normalized, which means that $\|x\| = 1$, then we denote the normalized eigenvector by $e$.

Result 1. A symmetric $p$-dimensional square matrix $A$ has $p$ pairs of eigenvalues and eigenvectors $(\lambda_1, e_1), \dots, (\lambda_p, e_p)$. The eigenvectors can be chosen to be normalized ($e_1^t e_1 = \cdots = e_p^t e_p = 1$) and orthogonal ($e_i^t e_j = 0$ if $i \neq j$). If all eigenvalues are different, then the eigenvectors are unique (up to sign).
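
A quick numerical check of Result 1 (a sketch assuming NumPy; the symmetric matrix is arbitrary): for a symmetric matrix, `numpy.linalg.eigh` returns real eigenvalues together with normalized, mutually orthogonal eigenvectors.

```python
import numpy as np

# An arbitrary symmetric matrix (illustrative).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, E = np.linalg.eigh(A)      # columns of E are the normalized eigenvectors e_i

print(lam)                                   # eigenvalues: [1. 3.]
print(np.allclose(E.T @ E, np.eye(2)))       # e_i^t e_j = 0 (i != j), e_i^t e_i = 1
print(np.allclose(A @ E, E @ np.diag(lam)))  # A e_i = lambda_i e_i for every i
```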

Result 2. Spectral decomposition
The spectral decomposition of a $p$-dimensional symmetric square matrix $A$ is given by

$$A = \lambda_1 e_1 e_1^t + \cdots + \lambda_p e_p e_p^t$$

where $(\lambda_1, e_1), \dots, (\lambda_p, e_p)$ are the eigenvalue/normalized eigenvector pairs of $A$.

Example. Consider the symmetric matrix

$$A = \begin{pmatrix} 13 & -4 & 2 \\ -4 & 13 & -2 \\ 2 & -2 & 10 \end{pmatrix}.$$

From the characteristic equation $|A - \lambda I_3| = 0$ we obtain the eigenvalues $\lambda_1 = 9$, $\lambda_2 = 9$, and $\lambda_3 = 18$. The corresponding normalized eigenvectors are solutions of the equations $A e_i = \lambda_i e_i$ for $i = 1, 2, 3$. For example, with $e_3 = (e_{13}, e_{23}, e_{33})^t$ the equation $A e_3 = \lambda_3 e_3$ gives

$$13 e_{13} - 4 e_{23} + 2 e_{33} = 18\, e_{13}$$
$$-4 e_{13} + 13 e_{23} - 2 e_{33} = 18\, e_{23}$$
$$2 e_{13} - 2 e_{23} + 10 e_{33} = 18\, e_{33}$$

Solving this system of equations yields the normalized eigenvector $e_3 = (2/3, -2/3, 1/3)^t$. For the other eigenvalue $\lambda_1 = \lambda_2 = 9$ the corresponding eigenvectors are not unique. An orthogonal pair is given by $e_1 = (1/\sqrt{2}, 1/\sqrt{2}, 0)^t$ and $e_2 = (1/\sqrt{18}, -1/\sqrt{18}, -4/\sqrt{18})^t$. With these solutions it can now easily be verified that

$$A = \lambda_1 e_1 e_1^t + \lambda_2 e_2 e_2^t + \lambda_3 e_3 e_3^t.$$

Def. A symmetric $p \times p$ matrix $A$ is called nonnegative definite if $x^t A x \geq 0$ for all $x \in \mathbb{R}^p$. $A$ is called positive definite if $x^t A x > 0$ for all $x \neq 0$.
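
A short NumPy check of the example above (a sketch; note that `numpy.linalg.eigh` orders the eigenvalues in ascending order and may return eigenvectors with the opposite sign to the ones chosen in the example). It rebuilds $A$ from its eigenvalue/eigenvector pairs and, anticipating the definition just given, confirms that all eigenvalues are positive, so $A$ is positive definite.

```python
import numpy as np

# The symmetric matrix from the example above.
A = np.array([[13., -4.,  2.],
              [-4., 13., -2.],
              [ 2., -2., 10.]])

lam, E = np.linalg.eigh(A)        # eigenvalues in ascending order: [9., 9., 18.]

# Rebuild A from its spectral decomposition sum_i lambda_i e_i e_i^t.
A_rebuilt = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(3))
print(np.allclose(A, A_rebuilt))  # True

# All eigenvalues are strictly positive, so A is positive definite.
print(np.all(lam > 0))            # True

# The eigenvector for lambda = 18 equals (2/3, -2/3, 1/3)^t up to sign.
print(E[:, 2])
```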

It follows (from the spectral decomposition) that $A$ is positive definite if and only if all eigenvalues of $A$ are strictly positive, and $A$ is nonnegative definite if and only if all eigenvalues are greater than or equal to zero.

Remark. The squared Mahalanobis distance of a point was defined as $d_S^2(x, 0) = x^t S^{-1} x$; for this to be positive for every $x \neq 0$, all eigenvalues of the symmetric matrix $S^{-1}$ have to be positive, i.e. $S^{-1}$ (and hence $S$) must be positive definite.

From the spectral decomposition we obtain that a symmetric positive definite $p$-dimensional square matrix $A$ equals

$$A = \sum_{i=1}^{p} \lambda_i e_i e_i^t = P \Lambda P^t$$

with $P = (e_1, \dots, e_p)$ a $p$-dimensional square matrix whose columns are the normalized eigenvectors of $A$, and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ a $p$-dimensional diagonal matrix whose diagonal elements are the eigenvalues of $A$. Note that $P$ is an orthogonal matrix. It follows that

$$A^{-1} = P \Lambda^{-1} P^t = \sum_{i=1}^{p} \frac{1}{\lambda_i} e_i e_i^t$$

and we define the square root of $A$ by

$$A^{1/2} = \sum_{i=1}^{p} \sqrt{\lambda_i}\, e_i e_i^t = P \Lambda^{1/2} P^t.$$

Result 3. The square root of a symmetric, positive definite $p \times p$ matrix $A$ has the following properties:

$(A^{1/2})^t = A^{1/2}$ (that is, $A^{1/2}$ is symmetric)
$A^{1/2} A^{1/2} = A$
$A^{-1/2} = (A^{1/2})^{-1} = \sum_{i=1}^{p} \frac{1}{\sqrt{\lambda_i}}\, e_i e_i^t = P \Lambda^{-1/2} P^t$
$A^{1/2} A^{-1/2} = A^{-1/2} A^{1/2} = I_p$
$A^{-1/2} A^{-1/2} = A^{-1}$
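
A hedged NumPy sketch that builds $A^{1/2}$ and $A^{-1/2}$ from the spectral decomposition of the example matrix and verifies the properties listed in Result 3:

```python
import numpy as np

A = np.array([[13., -4.,  2.],
              [-4., 13., -2.],
              [ 2., -2., 10.]])

lam, P = np.linalg.eigh(A)                            # A = P Lambda P^t
A_half     = P @ np.diag(np.sqrt(lam))     @ P.T      # A^{1/2}  = P Lambda^{1/2}  P^t
A_neg_half = P @ np.diag(1 / np.sqrt(lam)) @ P.T      # A^{-1/2} = P Lambda^{-1/2} P^t

print(np.allclose(A_half, A_half.T))                  # A^{1/2} is symmetric
print(np.allclose(A_half @ A_half, A))                # A^{1/2} A^{1/2} = A
print(np.allclose(A_half @ A_neg_half, np.eye(3)))    # A^{1/2} A^{-1/2} = I_p
print(np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A)))  # A^{-1/2} A^{-1/2} = A^{-1}
```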

Result 4. Cauchy-Schwarz inequality
Let $b, d \in \mathbb{R}^p$ be two $p$-dimensional vectors; then

$$(b^t d)^2 \leq (b^t b)(d^t d)$$

with equality if and only if there exists a constant $c \in \mathbb{R}$ such that $b = c\, d$.

Result 5. Extended Cauchy-Schwarz inequality
Let $b, d \in \mathbb{R}^p$ be two $p$-dimensional vectors and $B$ a $p$-dimensional positive definite matrix; then

$$(b^t d)^2 \leq (b^t B b)(d^t B^{-1} d)$$

with equality if and only if there exists a constant $c \in \mathbb{R}$ such that $b = c\, B^{-1} d$.

Proof. The result is obvious if $b = 0$ or $d = 0$. For the other cases we use that

$$b^t d = b^t B^{1/2} B^{-1/2} d = (B^{1/2} b)^t (B^{-1/2} d)$$

and apply the previous result to $B^{1/2} b$ and $B^{-1/2} d$.

Result 6. Maximization lemma
Let $B$ be a $p$-dimensional positive definite matrix and $d \in \mathbb{R}^p$ a $p$-dimensional vector; then

$$\max_{x \neq 0} \frac{(x^t d)^2}{x^t B x} = d^t B^{-1} d$$

with the maximum attained when there exists a constant $c \neq 0$ such that $x = c\, B^{-1} d$.

Proof. From the previous result, we have that

$$(x^t d)^2 \leq (x^t B x)(d^t B^{-1} d).$$

Because $x \neq 0$ and $B$ is positive definite, $x^t B x > 0$, which yields

$$\frac{(x^t d)^2}{x^t B x} \leq d^t B^{-1} d \qquad \text{for all } x \neq 0.$$

From the extended Cauchy-Schwarz inequality we know that the maximum is attained for $x = c\, B^{-1} d$.

Result 7. Maximization of quadratic forms
Let $B$ be a $p$-dimensional positive definite matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p > 0$ and associated normalized eigenvectors $e_1, \dots, e_p$. Then

$$\max_{x \neq 0} \frac{x^t B x}{x^t x} = \lambda_1 \quad \text{(attained when } x = e_1\text{)}$$

$$\min_{x \neq 0} \frac{x^t B x}{x^t x} = \lambda_p \quad \text{(attained when } x = e_p\text{)}$$

More generally,

$$\max_{x \perp e_1, \dots, e_k} \frac{x^t B x}{x^t x} = \lambda_{k+1} \quad \text{(attained when } x = e_{k+1},\ k = 1, \dots, p-1\text{).}$$

Proof. We prove the first result. Write $B = P \Lambda P^t$ and denote $y = P^t x$. Then $x \neq 0$ implies $y \neq 0$ and

$$\frac{x^t B x}{x^t x} = \frac{y^t \Lambda y}{y^t y} = \frac{\sum_{i=1}^{p} \lambda_i y_i^2}{\sum_{i=1}^{p} y_i^2} \leq \lambda_1.$$

Now take $x = e_1$; then $y = P^t e_1 = (1, 0, \dots, 0)^t$, so that $y^t \Lambda y / y^t y = \lambda_1$.

Remark. Note that since

$$\max_{x \neq 0} \frac{x^t B x}{x^t x} = \max_{\|x\| = 1} x^t B x,$$

the previous result shows that $\lambda_1$ is the maximal value and $\lambda_p$ is the smallest value of the quadratic form $x^t B x$ on the unit sphere.
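
A small numerical check of Results 6 and 7 (a sketch assuming NumPy; the positive definite matrix $B$, the vector $d$ and the trial vectors are randomly generated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative positive definite B and arbitrary vector d.
M = rng.standard_normal((5, 5))
B = M @ M.T + np.eye(5)                  # positive definite by construction
d = rng.standard_normal(5)
lam, E = np.linalg.eigh(B)               # eigenvalues ascending: lam[-1] = lambda_1

# Result 6: (x^t d)^2 / (x^t B x) is maximized at x = c B^{-1} d, with maximum d^t B^{-1} d.
x_star = np.linalg.solve(B, d)           # take c = 1
print(np.isclose((x_star @ d) ** 2 / (x_star @ B @ x_star),
                 d @ np.linalg.solve(B, d)))          # True

# Result 7: the quotient x^t B x / x^t x lies in [lambda_p, lambda_1] for any x != 0
# and attains the bounds at the eigenvectors e_p and e_1.
X = rng.standard_normal((1000, 5))
q = np.einsum('ij,jk,ik->i', X, B, X) / np.einsum('ij,ij->i', X, X)
print(lam[0] <= q.min() and q.max() <= lam[-1])       # True
print(np.isclose(E[:, -1] @ B @ E[:, -1], lam[-1]))   # lambda_1 attained at x = e_1
print(np.isclose(E[:,  0] @ B @ E[:,  0], lam[0]))    # lambda_p attained at x = e_p
```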

Result 8. Singular value decomposition
Let $A$ be an $m \times k$ matrix. Then there exist an $m \times m$ orthogonal matrix $U$, a $k \times k$ orthogonal matrix $V$, and an $m \times k$ matrix $\Lambda$ with entries $(i, i)$ equal to $\lambda_i \geq 0$ for $i = 1, \dots, r = \min(m, k)$ and all other entries zero, such that

$$A = U \Lambda V^t = \sum_{i=1}^{r} \lambda_i u_i v_i^t.$$

The positive constants $\lambda_i$ are called the singular values of $A$. The $(\lambda_i^2, u_i)$ are the eigenvalue/eigenvector pairs of $A A^t$, with $\lambda_{r+1} = \cdots = \lambda_m = 0$ if $m > k$, and then $v_i = \lambda_i^{-1} A^t u_i$. Alternatively, the $(\lambda_i^2, v_i)$ are the eigenvalue/eigenvector pairs of $A^t A$, with $\lambda_{r+1} = \cdots = \lambda_k = 0$ if $k > m$.
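
A brief NumPy illustration of Result 8 (a sketch; the $5 \times 3$ matrix is randomly generated):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))                   # m = 5, k = 3, r = min(m, k) = 3

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # A = U Lambda V^t; s holds lambda_1 >= ... >= lambda_r

# Rebuild A as the sum of the rank-one terms lambda_i u_i v_i^t.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))                  # True

# The squared singular values are the eigenvalues of A^t A (and the nonzero ones of A A^t).
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s**2, eig_AtA))                 # True
```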

RANDOM VECTORS

Suppose that $X = (X_1, \dots, X_p)^t$ is a $p$-dimensional vector of random variables, also called a random vector. Each of the components of $X$ is a univariate random variable $X_j$ ($j = 1, \dots, p$) with its own marginal distribution having expected value $\mu_j = E[X_j]$ and variance $\sigma_j^2 = E[(X_j - \mu_j)^2]$. The expected value of $X$ is then defined as the vector of expected values of its components, that is,

$$E[X] = (E[X_1], \dots, E[X_p])^t = (\mu_1, \dots, \mu_p)^t = \mu.$$

The population covariance matrix of $X$ is defined as

$$\mathrm{Cov}[X] = E[(X - \mu)(X - \mu)^t] = \Sigma.$$

That is, the diagonal elements of $\Sigma$ equal $E[(X_j - \mu_j)^2] = \sigma_j^2$. The off-diagonal elements of $\Sigma$ equal $E[(X_j - \mu_j)(X_k - \mu_k)] = \mathrm{Cov}(X_j, X_k) = \sigma_{jk}$ ($j \neq k$), the covariance between the variables $X_j$ and $X_k$. Note that $\sigma_{jj} = \sigma_j^2$.

The population correlation between two variables $X_j$ and $X_k$ is defined as

$$\rho_{jk} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}$$

and measures the amount of linear association between the two variables. The population correlation matrix of $X$ is then defined as

$$\rho = \begin{pmatrix} 1 & \rho_{12} & \dots & \rho_{1p} \\ \rho_{21} & 1 & \dots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \dots & 1 \end{pmatrix} = V^{-1} \Sigma V^{-1}$$

with $V = \mathrm{diag}(\sigma_1, \dots, \sigma_p)$. It follows that $\Sigma = V \rho V$.
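
A minimal NumPy sketch of the relations $\rho = V^{-1} \Sigma V^{-1}$ and $\Sigma = V \rho V$; the covariance matrix below is made up for illustration.

```python
import numpy as np

# An illustrative population covariance matrix Sigma.
Sigma = np.array([[ 4.0, 1.2, -0.8],
                  [ 1.2, 9.0,  0.6],
                  [-0.8, 0.6,  1.0]])

V = np.diag(np.sqrt(np.diag(Sigma)))       # V = diag(sigma_1, ..., sigma_p)
V_inv = np.linalg.inv(V)

rho = V_inv @ Sigma @ V_inv                # correlation matrix: rho_jk = sigma_jk / (sigma_j sigma_k)
print(rho)                                 # unit diagonal, off-diagonal entries in [-1, 1]
print(np.allclose(V @ rho @ V, Sigma))     # Sigma = V rho V
```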

Result 1. Linear combinations of random vectors
Consider a $p$-dimensional random vector $X$ and $c \in \mathbb{R}^p$; then $c^t X$ is a one-dimensional random variable with

$$E[c^t X] = c^t \mu \qquad \mathrm{Var}[c^t X] = c^t \Sigma c.$$

In general, if $C \in \mathbb{R}^{q \times p}$, then $C X$ is a $q$-dimensional random vector with

$$E[C X] = C \mu \qquad \mathrm{Cov}[C X] = C \Sigma C^t.$$
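
A hedged Monte-Carlo check of Result 1 (assuming NumPy; the mean $\mu$, covariance $\Sigma$, matrix $C$, normal sampling and sample size are illustrative choices, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[ 4.0, 1.2, -0.8],
                  [ 1.2, 9.0,  0.6],
                  [-0.8, 0.6,  1.0]])
C = np.array([[1.0,  1.0, 0.0],           # q = 2 linear combinations of the p = 3 variables
              [0.5, -1.0, 2.0]])

# Sample many realizations of X and form Y = C X for each one.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ C.T

# Sample mean and covariance of CX should be close to C mu and C Sigma C^t.
print(np.round(Y.mean(axis=0), 2), np.round(C @ mu, 2))
print(np.round(np.cov(Y, rowvar=False), 2))
print(np.round(C @ Sigma @ C.T, 2))
```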