MT07: Multivariate Statistical Methods
Mike Tso: email mike.tso@manchester.ac.uk
Webpage for notes: http://www.maths.manchester.ac.uk/~mkt/new_teaching.htm

1. Introduction to multivariate data

Books
Chatfield, C. and A. J. Collins, Introduction to Multivariate Analysis. Chapman & Hall, 1980.
Krzanowski, W. J., Principles of Multivariate Analysis. Oxford, 2000.
Johnson, R. A. and D. W. Wichern, Applied Multivariate Statistical Analysis. Prentice Hall, 2007 (6th Ed.).
Rencher, Alvin D., Methods of Multivariate Analysis [e-book]. Wiley, 2002 (2nd Ed.).
Timm, Applied Multivariate Analysis [e-book]. Springer, 2002.

1.2 Applications

The need often arises in science, medicine and social science (business, management) to analyse data on p variables (note that p = 2 means the data are bivariate). Suppose we have a simple random sample of size n, i.e. n × p measurements on p variates. The sample consists of n vectors of observations (by convention column vectors) x_1, ..., x_n, which are inserted as rows x_1^T, ..., x_n^T into an (n × p) data matrix X. When p = 2 we can plot the rows in 2-dimensional space, but in higher dimensions, p > 2, other techniques are needed.

Example 1: Classification of plants (taxonomy)
Variables (p = 3): leaf size (x_1), colour of flower (x_2), height of plant (x_3)
Sample items: n = 4 plants from a single species
Aims of analysis: 1) understand within-species variability; 2) classify a new plant species
The data matrix X then has 4 rows (the plants, i.e. items) and 3 columns (the variables).
Example 2: Credit scoring
Variables: personal data held by the bank
Items: sample of good/bad customers
Aims of analysis: 1) predict potential defaulters (CRM); 2) risk assessment for a new applicant

Example 3: Image processing, e.g. for quality control
Variables: "features" extracted from an image
Items: sampled from a production line
Aims of analysis: 1) quantify "normal" variability; 2) reject faulty (off-specification) batches

1.3 Sample mean and covariance matrix

We shall adopt the following notation:

x (p × 1): a random (column) vector of observations on p variables
X (n × p): a data matrix whose rows contain an independent random sample x_1^T, ..., x_n^T of observations on x
x̄ (p × 1): the sample mean vector, x̄ = (1/n) ∑_{i=1}^n x_i
S (p × p): the sample covariance matrix, containing the sample covariances defined as s_jk = (1/n) ∑_{i=1}^n (x_ij − x̄_j)(x_ik − x̄_k)
R (p × p): the sample correlation matrix, containing the sample correlations defined as r_jk = s_jk / √(s_jj s_kk) = s_jk / (s_j s_k), say

Notes
1. x̄_j is defined as the j-th component of x̄ (the mean of variable j)
2. the covariance matrix S is square and symmetric (S = S^T), and holds the sample variances s_jj = s_j^2 = (1/n) ∑_{i=1}^n (x_ij − x̄_j)^2 along its main diagonal
3. the diagonal elements of R are r_jj = 1 and, by the Cauchy-Schwarz inequality, |r_jk| ≤ 1 for each j, k

1.4 Matrix-vector representations

Given an (n × p) data matrix X, define the n × 1 vector of ones, 1 = (1, 1, ..., 1)^T. The column totals of X (the totals of each variable over the sample) are obtained by pre-multiplying X by 1^T:

1^T X = (∑_{i=1}^n x_i1, ..., ∑_{i=1}^n x_ip) = (n x̄_1, ..., n x̄_p) = n x̄^T

Hence

x̄ = (1/n) X^T 1    (1.1)

The centred data matrix X_0 is derived from X by subtracting the variable mean from each element of X, i.e. x_ij − x̄_j, or, equivalently, by subtracting the constant vector x̄^T from each row of X:

X_0 = X − 1 x̄^T = X − (1/n) 1 1^T X = (I − (1/n) 1 1^T) X = HX    (1.2)

where H = I − (1/n) 1 1^T is known as the centring matrix. We now define the sample covariance matrix, S, in terms of the centred sum of squares and products (SSP) matrix, as

S = (1/n) X_0^T X_0    (1.3a)
  = (1/n) ∑_{i=1}^n x_i^0 (x_i^0)^T    (1.3b)

where x_i^0 = x_i − x̄ denotes the i-th mean-corrected data point.
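As a quick numerical sketch of (1.1)-(1.3a) (the data matrix below is made up purely for illustration, and NumPy is assumed):

```python
import numpy as np

# Illustrative data matrix: n = 4 items (rows), p = 3 variables (columns).
X = np.array([[6.0, 1.0, 8.2],
              [8.1, 2.0, 9.0],
              [5.3, 1.0, 7.4],
              [6.4, 2.0, 10.0]])
n, p = X.shape
one = np.ones((n, 1))

xbar = (X.T @ one / n).ravel()      # (1.1): xbar = (1/n) X^T 1
H = np.eye(n) - one @ one.T / n     # centring matrix H = I - (1/n) 1 1^T
X0 = H @ X                          # (1.2): centred data matrix
S = X0.T @ X0 / n                   # (1.3a): sample covariance (divisor n)
s = np.sqrt(np.diag(S))             # sample standard deviations s_j
R = S / np.outer(s, s)              # sample correlation matrix
```

Note the divisor n rather than n - 1 here, matching definition (1.3a); the unbiased version appears later as S_u.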
For any real p × 1 vector y we then have

y^T S y = (1/n) y^T X_0^T X_0 y = (1/n) z^T z, where z = X_0 y
        = (1/n) ‖z‖^2 ≥ 0

Hence, from the definition of a p.s.d. matrix, we have

Proposition: The sample covariance matrix S is positive semi-definite (p.s.d.).

Example: Two measurements x_1, x_2, made at the same position on each of n = 3 cans of food, were recorded in a (3 × 2) data matrix X. Find the sample mean vector x̄ and the covariance matrix S.

Solution: x̄ = (1/3) X^T 1 = (x̄_1, x̄_2)^T, and with the centred matrix X_0 = X − 1 x̄^T,

S = (1/3) X_0^T X_0

Note also that S is built up from the individual data points:

S = (1/3) ∑_{i=1}^3 x_i^0 (x_i^0)^T
and the correlation matrix R follows by scaling: r_12 = s_12 / (s_1 s_2).

1.5 Measures of multivariate scatter

It is useful to have a single number as a measure of spread in the data. Based on S we define two scalar quantities.

The total variation is

tr(S) = trace(S) = ∑_{j=1}^p s_jj = sum of diagonal elements of S = sum of eigenvalues of S

The generalized variance is

|S| = product of eigenvalues of S    (1.5)

In the two-variable example above, tr(S) = s_11 + s_22 and |S| = s_11 s_22 − s_12^2.

1.6 Random vectors

We will in this course generally regard the data as an independent random sample from some continuous population distribution with probability density function

f(x) = f(x_1, ..., x_p)    (1.6)

Here x = (x_1, ..., x_p) is regarded as a (row or column) vector of p random variables. Independence here refers to the rows of the data matrix. If two of the variables (columns) are, for example, the height and weight of individuals (rows), then knowing one individual's weight says nothing about any measurement on another individual. However, the height and weight of any one individual are correlated. For any region D in the p-dimensional space of the variables,

Pr(x ∈ D) = ∫_D f(x) dx

Mean vector
For any j the population mean of x_j is given by the p-fold integral

E(x_j) = μ_j = ∫ x_j f(x) dx

where the region of integration is R^p. In vector form

μ = E(x) = (E(x_1), ..., E(x_p))^T = (μ_1, ..., μ_p)^T    (1.7)

Notice that, as expectation is a linear operator,

E(Ax + b) = A E(x) + b = Aμ + b

Also, for any random matrix X and conformable matrices A, B, C of constants, we have

E(AXB + C) = A E(X) B + C

i.e. constants are in a sense transparent as far as the operator E(·) is concerned (a property of linear operators).

Covariance matrix

The covariance between x_j and x_k is defined as

σ_jk = Cov(x_j, x_k) = E[(x_j − μ_j)(x_k − μ_k)] = E[x_j x_k] − μ_j μ_k

When j = k we obtain the variance of x_j,

σ_jj = E[(x_j − μ_j)^2]

The covariance matrix is the p × p matrix

Σ = (σ_jk) =
[ σ_11 ... σ_1p ]
[   .    .    . ]
[ σ_p1 ... σ_pp ]

The alternative notations V(x) = Cov(x) = Σ are used.
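For a distribution with finite support the defining expectations become finite sums, so the elementwise identity σ_jk = E[x_j x_k] − μ_j μ_k can be checked directly. A small sketch, with made-up support points and probabilities:

```python
import numpy as np

# Toy discrete distribution on 3 support points in R^2 (made-up numbers).
pts = np.array([[0.0, 1.0],
                [2.0, 0.0],
                [1.0, 3.0]])
prob = np.array([0.2, 0.5, 0.3])    # probabilities, summing to one

mu = prob @ pts                      # population mean vector mu = E(x)

# Sigma via the definition E[(x - mu)(x - mu)^T] ...
Sigma = sum(pr * np.outer(x - mu, x - mu) for pr, x in zip(prob, pts))
# ... and via the shortcut E(x x^T) - mu mu^T
Sigma2 = sum(pr * np.outer(x, x) for pr, x in zip(prob, pts)) - np.outer(mu, mu)
```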
In matrix form,

Σ = E[(x − μ)(x − μ)^T]    (1.8a)
  = E(xx^T) − μμ^T    (1.8b)

More generally, we define the covariance between two random vectors x (p × 1) and y (q × 1) as the (p × q) matrix

Cov(x, y) = E[(x − μ_x)(y − μ_y)^T] = E(xy^T) − μ_x μ_y^T    (1.9)

In particular, note that

i) Cov(x, x) = E[(x − μ)(x − μ)^T] = V(x)
ii) V(x + y) = V(x) + V(y) + Cov(x, y) + Cov(y, x)
iii) Cov(x + y, z) = Cov(x, z) + Cov(y, z)
iv) Cov(Ax, By) = A Cov(x, y) B^T

Important property of Σ: Σ is a positive semi-definite matrix.

Proof: Let a (p × 1) be a constant vector; then E(a^T x) = a^T E(x) = a^T μ and

V(a^T x) = E[(a^T x − a^T μ)^2] = a^T E[(x − μ)(x − μ)^T] a = a^T Σ a

Since a variance is always a non-negative quantity, we find a^T Σ a ≥ 0. From the definition (see handout), Σ is a positive semi-definite (p.s.d.) matrix.

Suppose we have an independent random sample x_1, x_2, ..., x_n from a distribution with mean μ and covariance matrix Σ. What is the relation between (a) the sample and population means, and (b) the sample and population covariance matrices?
Result 1: We first establish the mean and covariance of the sample mean x̄:

E(x̄) = μ    (1.10a)
V(x̄) = (1/n) Σ    (1.10b)

Proof:

E(x̄) = (1/n) ∑_{i=1}^n E(x_i) = (1/n) nμ = μ

V(x̄) = Cov((1/n) ∑_{i=1}^n x_i, (1/n) ∑_{j=1}^n x_j) = (1/n^2)(nΣ) = (1/n) Σ

noting that Cov(x_i, x_i) = Σ and Cov(x_i, x_j) = 0 for i ≠ j.

Result 2: We now examine S and derive an unbiased estimator for Σ:

E(S) = ((n − 1)/n) Σ    (1.11)

Proof:

S = (1/n) ∑_{i=1}^n (x_i − x̄)(x_i − x̄)^T = (1/n) ∑_{i=1}^n x_i x_i^T − x̄ x̄^T

since (1/n) ∑_{i=1}^n x_i x̄^T = ((1/n) ∑_{i=1}^n x_i) x̄^T = x̄ x̄^T. From (1.8b) and (1.10b) we see that

E(x_i x_i^T) = Σ + μμ^T
E(x̄ x̄^T) = (1/n) Σ + μμ^T
hence

E(S) = (Σ + μμ^T) − ((1/n) Σ + μμ^T) = ((n − 1)/n) Σ

Therefore an unbiased estimate of Σ is

S_u = (n/(n − 1)) S = (1/(n − 1)) X_0^T X_0    (1.12)

1.7 Linear transformations

Let x = (x_1, ..., x_p)^T be a random p × 1 vector. It is often natural and useful to consider linear combinations of the components of x, such as, for example, y = x_1 + x_2 or y = x_1 + x_2 − x_4. In general we consider a transformation from the p-component vector x to a q-component vector y (q < p) given by

y = Ax + b    (1.13)

where A (q × p) and b (q × 1) are constant matrices. Suppose that E(x) = μ and V(x) = Σ; the corresponding expressions for y are

E(y) = Aμ + b    (1.14a)
V(y) = AΣA^T    (1.14b)

These follow from the linearity of the expectation operator:

E(y) = E(Ax + b) = A E(x) + E(b) = Aμ + b = μ_y, say
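The identities (1.14a) and (1.14b) also hold exactly for sample moments: if each row of Y is y_i = A x_i + b, then ȳ = A x̄ + b and S_y = A S_x A^T for any data set. A quick check with made-up data:

```python
import numpy as np

# Sample analogue of (1.14a)-(1.14b), using arbitrary illustrative data.
rng = np.random.default_rng(1)
n, p, q = 8, 4, 2
X = rng.normal(size=(n, p))          # rows are the x_i
A = rng.normal(size=(q, p))
b = rng.normal(size=q)

Y = X @ A.T + b                      # rows are y_i = A x_i + b

def cov(M):
    """Sample covariance with divisor n, as in (1.3a)."""
    M0 = M - M.mean(axis=0)
    return M0.T @ M0 / len(M)
```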
Similarly,

V(y) = E(yy^T) − μ_y μ_y^T
     = E[(Ax + b)(Ax + b)^T] − (Aμ + b)(Aμ + b)^T
     = A E(xx^T) A^T + A E(x) b^T + b E(x^T) A^T + bb^T − Aμμ^T A^T − Aμb^T − bμ^T A^T − bb^T
     = A [E(xx^T) − μμ^T] A^T
     = AΣA^T

as required.

1.8 The Mahalanobis transformation

Given a p-variate random variable x with E(x) = μ and V(x) = Σ, a transformation to a standardized set of uncorrelated variates is given by the Mahalanobis transformation. Suppose Σ is positive definite, i.e. there is no exact linear dependence in x. Then the inverse covariance matrix Σ^{-1} has a "square root" given by

Σ^{-1/2} = V Λ^{-1/2} V^T    (1.15)

where Σ = V Λ V^T is the spectral decomposition (see handout), i.e. V is an orthogonal matrix (V^T V = V V^T = I_p) whose columns are the eigenvectors of Σ, and Λ = diag(λ_1, ..., λ_p) holds the corresponding eigenvalues. The Mahalanobis transformation takes the form

z = Σ^{-1/2} (x − μ)    (1.16)

Using results (1.14a) and (1.14b) we can show that

E(z) = 0 and V(z) = I_p

Proof:

E(z) = Σ^{-1/2} E(x − μ) = Σ^{-1/2} [E(x) − μ] = 0
V(z) = Σ^{-1/2} Σ Σ^{-1/2} = I_p
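A minimal numerical sketch of (1.15) and of V(z) = I_p, assuming a made-up positive definite Σ; the last two checks restate the Section 1.5 measures of scatter in terms of the eigenvalues:

```python
import numpy as np

# Made-up positive definite covariance matrix Sigma.
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

lam, V = np.linalg.eigh(Sigma)       # spectral decomposition: Sigma = V diag(lam) V^T
Sigma_inv_half = V @ np.diag(lam ** -0.5) @ V.T    # (1.15)

# V(z) = Sigma^{-1/2} Sigma Sigma^{-1/2} should equal I_p,
# i.e. the Mahalanobis-transformed variates are standardized and uncorrelated.
Vz = Sigma_inv_half @ Sigma @ Sigma_inv_half
```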
1.8.1 Sample Mahalanobis transformation

Given a data matrix X^T = (x_1, ..., x_n), the sample Mahalanobis transformation

z_i = S^{-1/2} (x_i − x̄) for i = 1, ..., n

where S = S_x is the sample covariance matrix (1/n) X^T H X, creates a transformed data matrix Z^T = (z_1, ..., z_n). The data matrices are related by

Z^T = S^{-1/2} X^T H, or Z = HXS^{-1/2}    (1.17)

where H is the centring matrix. We may easily show (Ex.) that Z is centred and that S_z = I_p.

1.8.2 Sample scaling transformation

A transformation of the data that scales each variable to have mean zero and variance one but preserves the correlation structure is given by

y_i = D^{-1} (x_i − x̄) for i = 1, ..., n

where D = diag(s_1, ..., s_p). Now

Y^T = D^{-1} X^T H, or Y = HXD^{-1}    (1.18)

Ex.: Show that S_y = R_x.

1.8.3 A useful matrix identity

Let u, v be n × 1 vectors and form the n × n matrix A = uv^T. Then

|I + uv^T| = 1 + v^T u    (1.19)

Proof: First observe that A and I + A share a common set of eigenvectors, since Aw = λw implies (I + A)w = (1 + λ)w. Moreover, the eigenvalues of I + A are 1 + λ_i, where the λ_i are the eigenvalues of A. Now uv^T is a rank-one matrix and therefore has a single nonzero eigenvalue (see handout). Since (uv^T) u = u (v^T u) = λu with λ = v^T u, the eigenvalues of I + uv^T are 1 + λ, 1, ..., 1. The determinant of I + uv^T is the product of its eigenvalues, hence the result.
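The two exercises above (S_z = I_p and S_y = R_x) and identity (1.19) can all be verified numerically. The sketch below uses made-up data and the spectral square root from (1.15):

```python
import numpy as np

def inv_sqrt(M):
    """Inverse symmetric square root via the spectral decomposition (1.15)."""
    lam, V = np.linalg.eigh(M)
    return V @ np.diag(lam ** -0.5) @ V.T

rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.normal(size=(n, p))               # illustrative data matrix
H = np.eye(n) - np.ones((n, n)) / n       # centring matrix
S = X.T @ H @ X / n                       # S = (1/n) X^T H X

Z = H @ X @ inv_sqrt(S)                   # (1.17): sample Mahalanobis transform
D_inv = np.diag(np.diag(S) ** -0.5)
Y = H @ X @ D_inv                         # (1.18): sample scaling transform

S_z = Z.T @ Z / n                         # should be I_p
S_y = Y.T @ Y / n                         # should be R_x
s = np.sqrt(np.diag(S))
R = S / np.outer(s, s)

# (1.19): |I + u v^T| = 1 + v^T u
u, v = rng.normal(size=n), rng.normal(size=n)
det_lhs = np.linalg.det(np.eye(n) + np.outer(u, v))
```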