MT07: Multivariate Statistical Methods

Mike Tso: email mike.tso@manchester.ac.uk
Webpage for notes: http://www.maths.manchester.ac.uk/~mkt/new_teaching.htm

1. Introduction to multivariate data

1.1 Books

Chatfield, C. and A.J. Collins, Introduction to multivariate analysis. Chapman & Hall, 1980.
Krzanowski, W.J., Principles of multivariate analysis. Oxford, 2000.
Johnson, R.A. and D.W. Wichern, Applied multivariate statistical analysis. Prentice Hall, 2007 (6th Ed.).
Rencher, Alvin D., Methods of multivariate analysis [e-book]. Wiley, 2002 (2nd Ed.).
Timm, N.H., Applied multivariate analysis [e-book]. Springer, 2002.

1.2 Applications

The need often arises in science, medicine and social science (business, management) to analyse data on $p$ variables (note that $p = 2$ gives bivariate data). Suppose we have a simple random sample of size $n$, i.e. $n$ measurements on $p$ variates. The sample consists of $n$ vectors of measurements (by convention column vectors) $x_1, \ldots, x_n$, which are inserted as rows $x_1^T, \ldots, x_n^T$ into an $(n \times p)$ data matrix $X$. When $p = 2$ we can plot the rows in 2-dimensional space, but in higher dimensions, $p > 2$, other techniques are needed.

Example: Classification of plants (taxonomy)
Variables ($p = 3$): leaf size ($x_1$), colour of flower ($x_2$), height of plant ($x_3$)
Sample items: $n = 4$ plants from a single species
Aims of analysis: 1) understand within-species variability; 2) classify a new plant species

The data matrix has one row per plant (item) and one column per variable:

[4 x 3 table of measurements; the numerical entries are illegible in the source]
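Since the entries of the example table were lost, here is a minimal sketch (not part of the original notes) of how such a data matrix is arranged in code; the numerical values below are hypothetical stand-ins:

```python
import numpy as np

# hypothetical 4 x 3 data matrix: rows = plants (items), columns = variables
# x1 = leaf size, x2 = colour of flower (coded), x3 = height of plant
X = np.array([[6.1, 1.0, 18.0],
              [6.4, 2.0, 20.0],
              [5.9, 1.0, 19.0],
              [6.2, 3.0, 21.0]])

n, p = X.shape  # n = 4 items, p = 3 variables
```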

Example: Credit scoring
Variables: personal data held by bank
Items: sample of good/bad customers
Aims of analysis: 1) predict potential defaulters (CRM); 2) risk assessment for a new applicant

Example: Image processing, e.g. for quality control
Variables: "features" extracted from an image
Items: sampled from a production line
Aims of analysis: 1) quantify "normal" variability; 2) reject faulty (off-specification) batches

1.3 Sample mean and covariance matrix

We shall adopt the following notation:

$x$ $(p \times 1)$: a random (column) vector of observations on $p$ variables.

$X$ $(n \times p)$: a data matrix whose rows contain an independent random sample $x_1^T, \ldots, x_n^T$ of observations on $x$.

$\bar{x}$ $(p \times 1)$: the sample mean vector $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$.

$S$ $(p \times p)$: the sample covariance matrix, containing the sample covariances defined as $s_{jk} = \frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$.

$R$ $(p \times p)$: the sample correlation matrix, containing the sample correlations defined as $r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj} s_{kk}}} = \frac{s_{jk}}{s_j s_k}$, say.

Notes

1. $\bar{x}_j$ is defined as the $j$-th component of $\bar{x}$ (the mean of variable $j$).

2. The covariance matrix $S$ is square, symmetric ($S = S^T$), and holds the sample variances $s_{jj} = s_j^2 = \frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2$ along its main diagonal.

3. The diagonal elements of $R$ are $r_{jj} = 1$ and, by the Cauchy-Schwarz inequality, $|r_{jk}| \le 1$ for each $j, k$.

1.4 Matrix-vector representations

Given an $(n \times p)$ data matrix $X$, define the $n \times 1$ vector of ones $\mathbf{1} = (1, 1, \ldots, 1)^T$. The row sums of $X$ are obtained by pre-multiplying $X$ by $\mathbf{1}^T$:

$$\mathbf{1}^T X = \left( \sum_{i=1}^n x_{i1}, \ldots, \sum_{i=1}^n x_{ip} \right) = (n\bar{x}_1, \ldots, n\bar{x}_p) = n\bar{x}^T$$

Hence

$$\bar{x} = \frac{1}{n} X^T \mathbf{1} \qquad (1.1)$$

The centred data matrix $X_0$ is derived from $X$ by subtracting the variable mean from each element of $X$, i.e. $x_{0,ij} = x_{ij} - \bar{x}_j$, or, equivalently, by subtracting the constant vector $\bar{x}^T$ from each row of $X$:

$$X_0 = X - \mathbf{1}\bar{x}^T = X - \frac{1}{n}\mathbf{1}\mathbf{1}^T X = \left( I - \frac{1}{n}\mathbf{1}\mathbf{1}^T \right) X = HX \qquad (1.2)$$

where $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ is known as the centring matrix. We now define the sample covariance matrix $S$ in terms of the centred sum of squares and products (SSP) matrix:

$$S = \frac{1}{n} X_0^T X_0 \qquad (1.3a)$$
$$= \frac{1}{n} \sum_{i=1}^n x_{0i} x_{0i}^T \qquad (1.3b)$$

where $x_{0i} = x_i - \bar{x}$ denotes the $i$-th mean-corrected data point.
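The matrix-vector forms above can be checked directly. A minimal numpy sketch (an addition, not from the notes; the data are arbitrary) verifying that (1.1)-(1.3a) agree with the elementwise definitions of $\bar{x}_j$ and $s_{jk}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.normal(size=(n, p))            # arbitrary (n x p) data matrix

ones = np.ones((n, 1))
xbar = (X.T @ ones / n).ravel()        # (1.1): sample mean vector
H = np.eye(n) - ones @ ones.T / n      # centring matrix H
X0 = H @ X                             # (1.2): centred data matrix
S = X0.T @ X0 / n                      # (1.3a): sample covariance, divisor n

# elementwise definition s_jk = (1/n) sum_i (x_ij - xbar_j)(x_ik - xbar_k)
S_direct = np.array([[np.mean((X[:, j] - xbar[j]) * (X[:, k] - xbar[k]))
                      for k in range(p)] for j in range(p)])
assert np.allclose(S, S_direct)
assert np.allclose(xbar, X.mean(axis=0))
```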

For any real $p$-vector $y$ we then have

$$y^T S y = \frac{1}{n} y^T X_0^T X_0 y = \frac{1}{n} z^T z \quad \text{where } z = X_0 y$$
$$= \frac{1}{n} \|z\|^2 \ge 0$$

Hence, from the definition of a p.s.d. matrix, we have:

Proposition: The sample covariance matrix $S$ is positive semi-definite (p.s.d.).

Example: Two measurements $x_1, x_2$ made at the same position on each of 3 cans of food resulted in the following $X$ matrix:

$$X = \begin{bmatrix} 4 & 6 \\ 3 & 2 \\ 8 & 4 \end{bmatrix}$$

Find the sample mean vector $\bar{x}$ and covariance matrix $S$.

Solution: $X^T = [x_1, x_2, x_3]$, so

$$\bar{x} = \frac{1}{3}(x_1 + x_2 + x_3) = \frac{1}{3}\begin{bmatrix} 4 + 3 + 8 \\ 6 + 2 + 4 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$$

$$X_0 = X - \mathbf{1}\bar{x}^T = \begin{bmatrix} -1 & 2 \\ -2 & -2 \\ 3 & 0 \end{bmatrix}$$

$$S = \frac{1}{3} X_0^T X_0 = \frac{1}{3}\begin{bmatrix} 14 & 2 \\ 2 & 8 \end{bmatrix} = \begin{bmatrix} 4.67 & 0.67 \\ 0.67 & 2.67 \end{bmatrix}$$

Note also that $S$ is built up from individual data points:

$$S = \frac{1}{3}\sum_{i=1}^3 x_{0i} x_{0i}^T = \frac{1}{3}\left( \begin{bmatrix} 1 & -2 \\ -2 & 4 \end{bmatrix} + \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix} + \begin{bmatrix} 9 & 0 \\ 0 & 0 \end{bmatrix} \right) = \begin{bmatrix} 4.67 & 0.67 \\ 0.67 & 2.67 \end{bmatrix}$$

and

$$R = \begin{bmatrix} 1 & 0.189 \\ 0.189 & 1 \end{bmatrix}$$
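The example is easy to verify mechanically. A short numpy sketch (added here, not in the original notes) reproducing $\bar{x}$, $S$ and $R$ for the three cans, and confirming the Proposition by inspecting the eigenvalues of $S$:

```python
import numpy as np

X = np.array([[4.0, 6.0],
              [3.0, 2.0],
              [8.0, 4.0]])    # 3 cans (rows), 2 measurements (columns)
n = X.shape[0]

xbar = X.mean(axis=0)          # [5. 4.]
X0 = X - xbar                  # centred data matrix
S = X0.T @ X0 / n              # [[4.667 0.667] [0.667 2.667]]

s = np.sqrt(np.diag(S))
R = S / np.outer(s, s)         # off-diagonal entries 0.189

# S is positive semi-definite: its eigenvalues are non-negative
print(np.linalg.eigvalsh(S))
```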

1.5 Measures of multivariate scatter

It is useful to have a single number as a measure of spread in the data. Based on $S$ we define two scalar quantities.

The total variation is

$$\mathrm{tr}(S) = \mathrm{trace}(S) = \sum_{j=1}^p s_{jj} = \text{sum of diagonal elements} = \text{sum of eigenvalues of } S$$

The generalized variance is

$$|S| = \text{product of eigenvalues of } S \qquad (1.5)$$

In the above example, $\mathrm{tr}(S) = \frac{14}{3} + \frac{8}{3} = \frac{22}{3} \approx 7.33$ and $|S| = \frac{14}{3} \cdot \frac{8}{3} - \left(\frac{2}{3}\right)^2 = 12$.

1.6 Random vectors

We will in this course generally regard the data as an independent random sample from some continuous population distribution with a probability density function

$$f(x) = f(x_1, \ldots, x_p) \qquad (1.6)$$

Here $x = (x_1, \ldots, x_p)$ is regarded as a (row or column) vector of $p$ random variables. Independence here refers to the rows of the data matrix. If two of the variables (columns) are, for example, the height and weight of individuals (rows), then knowing one individual's weight says nothing about any measurement on another individual. However, the height and weight of any one individual are correlated. For any region $D$ in the $p$-dimensional space of the variables,

$$\Pr(x \in D) = \int_D f(x)\, dx$$

Mean vector

For any $j$, the population mean of $x_j$ is given by the $p$-fold integral

$$E(x_j) = \mu_j = \int x_j f(x)\, dx$$

where the region of integration is $\mathbb{R}^p$. In vector form,

$$\mu = E(x) = E\begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix} \qquad (1.7)$$

Notice that, as expectation is a linear operator,

$$E(Ax + b) = AE(x) + b = A\mu + b$$

Also, for any random matrix $X$ and conformable matrices $A, B, C$ of constants, we have

$$E(AXB + C) = A\,E(X)\,B + C$$

i.e. constants are in a sense transparent as far as the operator $E(\cdot)$ is concerned (a property of linear operators).

Covariance matrix

The covariance between $x_j, x_k$ is defined as

$$\sigma_{jk} = \mathrm{Cov}(x_j, x_k) = E\left[(x_j - \mu_j)(x_k - \mu_k)\right] = E[x_j x_k] - \mu_j \mu_k$$

When $j = k$ we obtain the variance of $x_j$:

$$\sigma_{jj} = E\left[(x_j - \mu_j)^2\right]$$

The covariance matrix is a $p \times p$ matrix

$$\Sigma = (\sigma_{jk}) = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix}$$

The alternative notations $V(x) = \mathrm{Cov}(x) = \Sigma$ are used.
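The identity $\Sigma = E[xx^T] - \mu\mu^T$ can be illustrated by simulation. A quick numpy sketch (an addition; $\mu$ and $\Sigma$ are arbitrary choices) showing that the empirical version of $E[xx^T] - \mu\mu^T$ is close to $\Sigma$ for a large sample:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])              # arbitrary population mean
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])          # arbitrary population covariance

x = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are draws of x

Exx = x.T @ x / len(x)                  # empirical E[x x^T]
mu_hat = x.mean(axis=0)                 # empirical mean
Sigma_hat = Exx - np.outer(mu_hat, mu_hat)
print(np.round(Sigma_hat, 3))           # close to Sigma
```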

In matrix form,

$$\Sigma = E\left[(x - \mu)(x - \mu)^T\right] \qquad (1.8a)$$
$$= E\left[xx^T\right] - \mu\mu^T \qquad (1.8b)$$

More generally, we define the covariance between two random vectors $x$ $(p \times 1)$ and $y$ $(q \times 1)$ as the $(p \times q)$ matrix

$$\mathrm{Cov}(x, y) = E\left[(x - \mu_x)(y - \mu_y)^T\right] = E\left[xy^T\right] - \mu_x \mu_y^T$$

In particular, note that

i) $\mathrm{Cov}(x, x) = E\left[(x - \mu_x)(x - \mu_x)^T\right] = V(x)$
ii) $V(x + y) = V(x) + V(y) + \mathrm{Cov}(x, y) + \mathrm{Cov}(y, x)$
iii) $\mathrm{Cov}(x + y, z) = \mathrm{Cov}(x, z) + \mathrm{Cov}(y, z)$
iv) $\mathrm{Cov}(Ax, By) = A\,\mathrm{Cov}(x, y)\,B^T \qquad (1.9)$

Important property of $\Sigma$: $\Sigma$ is a positive semi-definite matrix.

Proof: Let $a$ $(p \times 1)$ be a constant vector. Then $E(a^T x) = a^T E(x) = a^T \mu$ and

$$V(a^T x) = E\left[\left(a^T x - a^T \mu\right)^2\right] = a^T E\left[(x - \mu)(x - \mu)^T\right] a = a^T \Sigma a$$

Since a variance is always a non-negative quantity, we find $a^T \Sigma a \ge 0$. From the definition (see handout), $\Sigma$ is a positive semi-definite (p.s.d.) matrix.

Suppose we have an independent random sample $x_1, x_2, \ldots, x_n$ from a distribution with mean $\mu$ and covariance matrix $\Sigma$. What is the relation between (a) the sample and population means, and (b) the sample and population covariance matrices?

Result 1: We first establish the mean and covariance of the sample mean $\bar{x}$:

$$E(\bar{x}) = \mu \qquad (1.10a)$$
$$V(\bar{x}) = \frac{1}{n}\Sigma \qquad (1.10b)$$

Proof:

$$E(\bar{x}) = \frac{1}{n}\sum_{i=1}^n E(x_i) = \frac{1}{n}\, n\mu = \mu$$

$$V(\bar{x}) = \mathrm{Cov}\left(\frac{1}{n}\sum_{i=1}^n x_i,\ \frac{1}{n}\sum_{j=1}^n x_j\right) = \frac{1}{n^2}(n\Sigma)$$

noting that $\mathrm{Cov}(x_i, x_i) = \Sigma$ and $\mathrm{Cov}(x_i, x_j) = 0$ for $i \ne j$. Hence $V(\bar{x}) = \frac{1}{n}\Sigma$.

Result 2: We now examine $S$ and derive an unbiased estimator for $\Sigma$:

$$E(S) = \frac{n-1}{n}\Sigma \qquad (1.11)$$

Proof:

$$S = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{n}\sum_{i=1}^n x_i x_i^T - \bar{x}\bar{x}^T$$

since $\frac{1}{n}\sum_{i=1}^n x_i \bar{x}^T = \left(\frac{1}{n}\sum_{i=1}^n x_i\right)\bar{x}^T = \bar{x}\bar{x}^T$. From (1.8b) and (1.10b) we see that

$$E\left[x_i x_i^T\right] = \Sigma + \mu\mu^T, \qquad E\left[\bar{x}\bar{x}^T\right] = \frac{1}{n}\Sigma + \mu\mu^T$$

hence

$$E(S) = \Sigma + \mu\mu^T - \left(\frac{1}{n}\Sigma + \mu\mu^T\right) = \frac{n-1}{n}\Sigma$$

Therefore an unbiased estimate of $\Sigma$ is

$$S_u = \frac{n}{n-1} S = \frac{1}{n-1} X_0^T X_0 \qquad (1.12)$$

1.7 Linear transformations

Let $x = (x_1, \ldots, x_p)^T$ be a random $p$-vector. It is often natural and useful to consider linear combinations of the components of $x$, such as, for example, $y = x_1 + x_2$ or $y = x_1 + x_2 - 2x_4$. In general we consider a transformation from the $p$-component vector $x$ to a $q$-component vector $y$ ($q < p$) given by

$$y = Ax + b \qquad (1.13)$$

where $A$ $(q \times p)$ and $b$ $(q \times 1)$ are constant matrices. Suppose that $E(x) = \mu$ and $V(x) = \Sigma$; the corresponding expressions for $y$ are

$$E(y) = A\mu + b \qquad (1.14a)$$
$$V(y) = A\Sigma A^T \qquad (1.14b)$$

These follow from the linearity of the expectation operator:

$$E(y) = E(Ax + b) = AE(x) + E(b) = A\mu + b = \mu_y, \text{ say}$$

and

$$V(y) = E\left[yy^T\right] - \mu_y \mu_y^T$$
$$= E\left[(Ax + b)(Ax + b)^T\right] - (A\mu + b)(A\mu + b)^T$$
$$= A\,E\left[xx^T\right] A^T + A\,E(x)\,b^T + b\,E(x^T)\,A^T + bb^T - A\mu\mu^T A^T - A\mu b^T - b\mu^T A^T - bb^T$$
$$= A\left( E\left[xx^T\right] - \mu\mu^T \right) A^T = A\Sigma A^T$$

as required.

1.8 The Mahalanobis transformation

Given a $p$-variate random variable $x$ with $E(x) = \mu$ and $V(x) = \Sigma$, a transformation to a standardized set of uncorrelated variates is given by the Mahalanobis transformation. Suppose $\Sigma$ is positive definite, i.e. there is no exact linear dependence in $x$. Then the inverse covariance matrix $\Sigma^{-1}$ has a "square root" given by

$$\Sigma^{-1/2} = V \Lambda^{-1/2} V^T \qquad (1.15)$$

where $\Sigma = V \Lambda V^T$ is the spectral decomposition (see handout), i.e. $V$ is an orthogonal matrix ($V^T V = V V^T = I_p$) whose columns are the eigenvectors of $\Sigma$, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ holds the corresponding eigenvalues. The Mahalanobis transformation takes the form

$$z = \Sigma^{-1/2}(x - \mu) \qquad (1.16)$$

Using results (1.14a) and (1.14b) we can show that $E(z) = 0$ and $V(z) = I_p$.

Proof:

$$E(z) = E\left[\Sigma^{-1/2}(x - \mu)\right] = \Sigma^{-1/2}\left[E(x) - \mu\right] = 0$$
$$V(z) = \Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = I_p$$
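A minimal numpy sketch of (1.15)-(1.16) (added for illustration; $\mu$ and $\Sigma$ are arbitrary): build $\Sigma^{-1/2}$ from the spectral decomposition and check that transformed draws have mean close to $0$ and covariance close to $I_p$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([3.0, -1.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 2.0]])          # positive definite

# spectral decomposition Sigma = V Lambda V^T, then (1.15)
lam, V = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = V @ np.diag(lam ** -0.5) @ V.T

x = rng.multivariate_normal(mu, Sigma, size=100_000)
z = (x - mu) @ Sigma_inv_sqrt           # (1.16), using symmetry of Sigma^{-1/2}

print(np.round(z.mean(axis=0), 3))          # ~ [0 0]
print(np.round(np.cov(z.T, bias=True), 3))  # ~ identity matrix
```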

1.8.1 Sample Mahalanobis transformation

Given a data matrix $X^T = (x_1, \ldots, x_n)$, the sample Mahalanobis transformation

$$z_i = S^{-1/2}(x_i - \bar{x}) \quad \text{for } i = 1, \ldots, n$$

where $S = S_x$ is the sample covariance matrix $\frac{1}{n}X^T H X$, creates a transformed data matrix $Z^T = (z_1, \ldots, z_n)$. The two data matrices are related by

$$Z^T = S^{-1/2} X^T H \quad \text{or} \quad Z = HXS^{-1/2} \qquad (1.17)$$

where $H$ is the centring matrix. We may easily show (Ex.) that $Z$ is centred and that $S_z = I_p$.

1.8.2 Sample scaling transformation

A transformation of the data that scales each variable to have mean zero and variance one, but preserves the correlation structure, is given by

$$y_i = D^{-1/2}(x_i - \bar{x}) \quad \text{for } i = 1, \ldots, n$$

where $D = \mathrm{diag}(s_1^2, \ldots, s_p^2)$. Now

$$Y^T = D^{-1/2} X^T H \quad \text{or} \quad Y = HXD^{-1/2} \qquad (1.18)$$

Ex. Show that $S_y = R_x$.

1.8.3 A useful matrix identity

Let $u, v$ be $n$-vectors and form the $n \times n$ matrix $A = uv^T$. Then

$$\left| I + uv^T \right| = 1 + v^T u \qquad (1.19)$$

Proof: First observe that $A$ and $I + A$ share a common set of eigenvectors, since $Aw = \lambda w \Rightarrow (I + A)w = (1 + \lambda)w$. Moreover, the eigenvalues of $I + A$ are $1 + \lambda_i$, where $\lambda_i$ are the eigenvalues of $A$. Now $uv^T$ is a rank-one matrix and therefore has a single nonzero eigenvalue (see handout). Since $(uv^T)u = u(v^T u) = \alpha u$ where $\alpha = v^T u$, the eigenvalues of $I + uv^T$ are $1 + \alpha, 1, \ldots, 1$. The determinant of $I + uv^T$ is the product of its eigenvalues, hence the result.
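The two exercises and identity (1.19) can also be checked numerically. A closing numpy sketch (an addition to the notes; the data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 3
X = rng.normal(size=(n, p))

H = np.eye(n) - np.ones((n, n)) / n     # centring matrix
S = X.T @ H @ X / n                     # sample covariance, divisor n

lam, V = np.linalg.eigh(S)
S_inv_sqrt = V @ np.diag(lam ** -0.5) @ V.T

Z = H @ X @ S_inv_sqrt                  # (1.17)
print(np.allclose(Z.T @ Z / n, np.eye(p)))   # S_z = I_p -> True

D_inv_sqrt = np.diag(np.diag(S) ** -0.5)
Y = H @ X @ D_inv_sqrt                  # (1.18)
s = np.sqrt(np.diag(S))
R = S / np.outer(s, s)
print(np.allclose(Y.T @ Y / n, R))      # S_y = R_x -> True

u, v = rng.normal(size=n), rng.normal(size=n)
print(np.isclose(np.linalg.det(np.eye(n) + np.outer(u, v)), 1 + v @ u))  # True
```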