Multivariate Parametric Methods. Steven J Zeil, Old Dominion University. Fall 2010.


Outline

1. Multivariate Data
2. Multivariate Normal Distribution
3. Multivariate Classification: Discriminants, Tuning Complexity, Discrete Features
4. Multivariate Regression

Multivariate Data: Basic Multivariate Statistics

We have $d$ inputs (a.k.a. features, attributes) and $N$ instances (a.k.a. observations, examples), collected in the data matrix

$$X = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_d^1 \\ x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & & & \vdots \\ x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}$$

(Later we will consider what happens if some gaps are allowed in the observations.)

Mean: $E[\vec{x}] = \vec{\mu} = [\mu_1, \mu_2, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)] = E[x_i x_j] - \mu_i \mu_j$

Correlation: $\mathrm{Corr}(x_i, x_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

Covariance matrix:

$$\Sigma \equiv \mathrm{Cov}(\vec{x}) = E[(\vec{x} - \vec{\mu})(\vec{x} - \vec{\mu})^T] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$
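As a concrete illustration (added here, not part of the original slides), a minimal NumPy sketch computing the sample versions of these statistics from an $N \times d$ data matrix; the toy data and variable names are made up, and the $1/N$ covariance convention matches the estimation formulas on the next slide:

```python
import numpy as np

# Toy data matrix X: N = 5 instances (rows), d = 3 features (columns).
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 1.0],
              [3.0, 3.5, 1.5],
              [4.0, 2.5, 2.0],
              [5.0, 4.0, 2.5]])

mu = X.mean(axis=0)               # sample mean vector m
Xc = X - mu                       # centered data
S = (Xc.T @ Xc) / X.shape[0]      # sample covariance matrix (1/N convention)
sd = np.sqrt(np.diag(S))          # per-feature standard deviations
R = S / np.outer(sd, sd)          # correlation matrix: r_ij = s_ij / (s_i * s_j)

print(mu, S, R, sep="\n")
```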

Multivariate Parameter Estimation

Sample mean $\vec{m}$: $\quad m_i = \dfrac{1}{N}\sum_{t=1}^N x_i^t, \quad i = 1, \ldots, d$

Covariance matrix $S$: $\quad s_{ij} = \dfrac{1}{N}\sum_{t=1}^N (x_i^t - m_i)(x_j^t - m_j)$

Correlation matrix $R$: $\quad r_{ij} = \dfrac{s_{ij}}{s_i s_j}$

Imputation

What if certain instances have missing attributes?

Throw out the entire instance? That is a problem if the sample is small.

Imputation: fill in the missing value.
Mean imputation: use the expected value.
Imputation by regression: predict the missing value based on the other attributes.

Multivariate Normal Distribution

$$p(\vec{x}) = N_d(\vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right]$$

Slicing

Any slice (projection) along a single direction $\vec{w}$ is normal: $\vec{w}^T \vec{x} \sim N(\vec{w}^T \vec{\mu}, \vec{w}^T \Sigma \vec{w})$.

Any projection onto a linearly transformed set of axes of dimension $d' \le d$ is multivariate normal.
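A small sketch (my own illustration, not from the slides) of mean imputation and of evaluating the $N_d(\vec{\mu}, \Sigma)$ density; `scipy.stats.multivariate_normal` is used as a reference implementation of the density formula above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Mean imputation: replace NaNs in each column with that column's observed mean.
X = np.array([[1.0, 2.0],
              [np.nan, 1.5],
              [3.0, np.nan],
              [4.0, 2.5]])
col_means = np.nanmean(X, axis=0)                 # means computed over non-missing entries
X_imputed = np.where(np.isnan(X), col_means, X)   # fill gaps with the expected value

# Evaluate the multivariate normal density at a point.
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, -1.0])
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```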

Effects of Covariance: Normalized Distance

$z = \dfrac{x - \mu}{\sigma}$ can be seen as a distance from $\mu$ to $x$ in normalized, $\sigma$-sized units. Generalizing to $d$ dimensions gives the Mahalanobis distance

$$(\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})$$

If $x_i$ has larger variance than $x_j$, then $x_i$ gets lower weight in this distance. If $x_i$ and $x_j$ are highly correlated, they get less weight than two less-correlated variables would.

A small $|\Sigma|$ indicates that the samples are close to $\vec{\mu}$ and/or the variables are highly correlated. If $|\Sigma|$ is zero, then some of the variables are constant or there is a linear dependency among the variables. Either way, reduce the dimensionality by removing the unneeded variables.

Special Cases of Mahalanobis Distance

$$d(\vec{x}) = (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})$$

If the $x_i$ are independent, the off-diagonal elements of $\Sigma$ are zero and

$$d(\vec{x}) = \sum_{i=1}^d \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$

If the variances are also equal ($\sigma_i = \sigma$ for all $i$), this reduces to the squared Euclidean distance scaled by $1/\sigma^2$.
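A short sketch (added for illustration, with made-up numbers) of the Mahalanobis distance and its special cases:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))  # solve() avoids an explicit inverse

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])           # independent features, unequal variances
x = np.array([2.0, 2.0])

print(mahalanobis_sq(x, mu, Sigma))      # (2/2)^2 + (2/1)^2 = 5.0
print(mahalanobis_sq(x, mu, np.eye(2)))  # identity covariance: squared Euclidean = 8.0
```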

Multivariate Classification

If $p(\vec{x} \mid C_i) \sim N(\vec{\mu}_i, \Sigma_i)$, then

$$p(\vec{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2} (\vec{x} - \vec{\mu}_i)^T \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i)\right]$$

The discriminants are

$$g_i(\vec{x}) = \log p(\vec{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log |\Sigma_i| - \frac{1}{2}(\vec{x} - \vec{\mu}_i)^T \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i) + \log P(C_i)$$

Estimate this from the sample as

$$g_i(\vec{x}) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log |S_i| - \frac{1}{2}(\vec{x} - \vec{m}_i)^T S_i^{-1} (\vec{x} - \vec{m}_i) + \log \hat{P}(C_i)$$

Quadratic Discriminant

Dropping the constant $-\frac{d}{2}\log 2\pi$ (identical for all classes) and expanding the quadratic form:

$$g_i(\vec{x}) = -\frac{1}{2}\log |S_i| - \frac{1}{2}(\vec{x} - \vec{m}_i)^T S_i^{-1} (\vec{x} - \vec{m}_i) + \log \hat{P}(C_i)$$
$$= -\frac{1}{2}\log |S_i| - \frac{1}{2}\left(\vec{x}^T S_i^{-1} \vec{x} - 2\,\vec{x}^T S_i^{-1} \vec{m}_i + \vec{m}_i^T S_i^{-1} \vec{m}_i\right) + \log \hat{P}(C_i)$$
$$= \vec{x}^T W_i \vec{x} + \vec{w}_i^T \vec{x} + w_{i0}$$

where $W_i = -\frac{1}{2} S_i^{-1}$, $\vec{w}_i = S_i^{-1} \vec{m}_i$, and $w_{i0} = -\frac{1}{2} \vec{m}_i^T S_i^{-1} \vec{m}_i - \frac{1}{2}\log |S_i| + \log \hat{P}(C_i)$. This is a quadratic in $\vec{x}$.

Simplification: Shared Covariance

Share a common sample covariance $S$:

$$S = \sum_i \hat{P}(C_i)\, S_i$$

The discriminant simplifies to

$$g_i(\vec{x}) = -\frac{1}{2}(\vec{x} - \vec{m}_i)^T S^{-1} (\vec{x} - \vec{m}_i) + \log \hat{P}(C_i)$$

Although this function is quadratic in $\vec{x}$, it yields a linear discriminant, because the $\vec{x}^T S^{-1} \vec{x}$ quadratic term is identical across all $i$.

(The original slides show figures of the class likelihoods and the posterior for $C_i$, omitted here.)
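A compact sketch (my illustration; the class data and helper names are made up) of the quadratic discriminant computed from per-class sample statistics:

```python
import numpy as np

def fit_class(Xc):
    """Per-class sample mean and covariance (1/N convention)."""
    m = Xc.mean(axis=0)
    D = Xc - m
    return m, (D.T @ D) / Xc.shape[0]

def quadratic_discriminant(x, m, S, log_prior):
    """g_i(x) = -0.5 log|S_i| - 0.5 (x - m_i)^T S_i^{-1} (x - m_i) + log P(C_i)."""
    diff = x - m
    return (-0.5 * np.linalg.slogdet(S)[1]
            - 0.5 * diff @ np.linalg.solve(S, diff)
            + log_prior)

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 1.0, size=(50, 2))   # class 1 sample
X2 = rng.normal([3, 3], 1.5, size=(40, 2))   # class 2 sample
(m1, S1), (m2, S2) = fit_class(X1), fit_class(X2)
p1, p2 = np.log(50 / 90), np.log(40 / 90)    # priors from class frequencies

x = np.array([2.0, 2.0])
label = 1 if quadratic_discriminant(x, m1, S1, p1) > quadratic_discriminant(x, m2, S2, p2) else 2
print("predicted class:", label)
```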

Linear Discriminant

With the shared covariance $S$, expanding the quadratic form and dropping the $\vec{x}^T S^{-1} \vec{x}$ term common to all classes gives a linear discriminant:

$$g_i(\vec{x}) = \vec{m}_i^T S^{-1} \vec{x} - \frac{1}{2} \vec{m}_i^T S^{-1} \vec{m}_i + \log \hat{P}(C_i) = \vec{w}_i^T \vec{x} + w_{i0}$$

Further Simplification: Independence

If we share a common sample covariance $S$ and the variables are independent, then the off-diagonal elements of $S$ are zero. The discriminant simplifies to

$$g_i(\vec{x}) = -\frac{1}{2} \sum_{j=1}^d \left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

This is the Naive Bayes classifier: each variable is an independent Gaussian, and distance is measured in standard-deviation units.

(The original slides show a figure of the contours for diagonal $S$, omitted here.)

Further Simplification: Equal Variances

If the variances are also equal, the discriminant simplifies to

$$g_i(\vec{x}) = -\frac{1}{2} \sum_{j=1}^d \left(\frac{x_j - m_{ij}}{s}\right)^2 + \log \hat{P}(C_i)$$

This is the nearest-mean classifier.
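As an illustration (not from the slides), a nearest-mean classifier is just an argmax over these simplified discriminants; the helper below assumes equal priors, so the $\log \hat{P}(C_i)$ term drops out:

```python
import numpy as np

def nearest_mean_predict(x, class_means):
    """Assign x to the class whose mean is closest in Euclidean distance
    (equivalent to maximizing g_i when variances and priors are equal)."""
    dists = [np.sum((x - m) ** 2) for m in class_means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
print(nearest_mean_predict(np.array([1.0, 0.5]), means))  # -> 0
```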

Model Selection

Assumption              Covariance matrix          # Parameters
Equal variances         S_i = S = s^2 I            1
Independent             S_i = S, s_ij = 0 (i != j) d
Shared covariance       S_i = S                    d(d+1)/2
Different covariances   S_i                        K d(d+1)/2

Binary Features

$x_j \in \{0, 1\}$, with $p_{ij} = p(x_j = 1 \mid C_i)$.

If the $x_j$ are independent (Naive Bayes),

$$p(\vec{x} \mid C_i) = \prod_{j=1}^d p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}$$

The discriminant is linear:

$$g_i(\vec{x}) = \sum_j \left[x_j \log \hat{p}_{ij} + (1 - x_j) \log (1 - \hat{p}_{ij})\right] + \log \hat{P}(C_i)$$

Discrete Features

$x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$, with $p_{ijk} = p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$, where $z_{jk}$ indicates that $x_j = v_k$.

If the $x_j$ are independent,

$$p(\vec{x} \mid C_i) = \prod_{j=1}^d \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\vec{x}) = \sum_j \sum_k z_{jk} \log \hat{p}_{ijk} + \log \hat{P}(C_i)$$

Multivariate Regression

Multivariate linear model:

$$r^t = g(\vec{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t + \varepsilon$$

Error:

$$E(\vec{w} \mid X) = \frac{1}{2} \sum_t \left[r^t - (w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t)\right]^2$$

Collect the data into the design matrix $D$ and target vector $\vec{r}$:

$$D = \begin{bmatrix} 1 & x_1^1 & x_2^1 & \cdots & x_d^1 \\ 1 & x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & & & & \vdots \\ 1 & x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}, \qquad \vec{r} = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}$$

Minimizing the error yields the normal equations:

$$(D^T D)\, \vec{w} = D^T \vec{r} \quad\Rightarrow\quad \vec{w} = (D^T D)^{-1} D^T \vec{r}$$
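A small sketch (my own, with made-up data) of the Bernoulli Naive Bayes discriminant above; the Laplace smoothing is an added assumption, used only to keep the logarithms finite:

```python
import numpy as np

def fit_bernoulli_nb(X, y, n_classes):
    """Estimate p_ij = p(x_j = 1 | C_i) with Laplace smoothing, plus log priors."""
    P = np.zeros((n_classes, X.shape[1]))
    log_prior = np.zeros(n_classes)
    for i in range(n_classes):
        Xi = X[y == i]
        P[i] = (Xi.sum(axis=0) + 1) / (len(Xi) + 2)   # smoothed estimate of p_ij
        log_prior[i] = np.log(len(Xi) / len(X))
    return P, log_prior

def discriminants(x, P, log_prior):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    return (x * np.log(P) + (1 - x) * np.log(1 - P)).sum(axis=1) + log_prior

X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]])
y = np.array([0, 0, 0, 1, 1])
P, log_prior = fit_bernoulli_nb(X, y, n_classes=2)
print(np.argmax(discriminants(np.array([1, 0, 1]), P, log_prior)))  # -> 0
```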

Multivariate Regression (continued)

$$(D^T D)\, \vec{w} = D^T \vec{r} \quad\Rightarrow\quad \vec{w} = (D^T D)^{-1} D^T \vec{r}$$

The solution has the same form as for univariate polynomial regression, but uses the $d$ distinct variables instead of different powers of a single variable.
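To close, an illustration (added here, not in the slides) of solving the normal equations with NumPy; `np.linalg.lstsq` is used rather than forming $(D^T D)^{-1}$ explicitly, which is numerically safer:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 3
X = rng.normal(size=(N, d))
true_w = np.array([0.5, 1.0, -2.0, 3.0])                        # [w_0, w_1, w_2, w_3]
r = true_w[0] + X @ true_w[1:] + rng.normal(scale=0.1, size=N)  # targets with noise

D = np.column_stack([np.ones(N), X])       # design matrix with a leading column of 1s
w, *_ = np.linalg.lstsq(D, r, rcond=None)  # solves (D^T D) w = D^T r in the least-squares sense
print(w)                                   # close to true_w
```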