Some multivariate methods

Outline

Some multivariate methods. VALERIE CARDENAS, PH.D., ASSOCIATE ADJUNCT PROFESSOR, DEPARTMENT OF RADIOLOGY AND BIOMEDICAL IMAGING

- Useful linear algebra
- Principal Components Analysis (PCA)
- Independent Components Analysis (ICA)
- Joint ICA
- Parallel ICA
- Partial Least Squares (PLS)
- Canonical Correlation Analysis (CCA)
- Ridge regression

Vector

A vector is defined as an ordered array of numbers, of dimensions p,1. Notation: vectors are typically denoted by lowercase bold letters, e.g. a p,1 vector c with elements c_1, ..., c_p.

Matrix

A matrix is defined as an ordered array of numbers, of dimensions p,q (p rows, q columns). Notation: matrices are typically denoted by uppercase bold letters.

More matrix notation

You can think of a matrix as a collection of q column vectors, each of dimension p,1: A = [c1 c2 ... cq]. You can also think of a matrix as a collection of p row vectors, each of dimension 1,q: A = [r1; r2; ...; rp].

The elements of a matrix A are denoted by a_ij, where i refers to the row position and j to the column position. The elements of a vector c are denoted by c_i, where i refers to the row position.

Types of matrices

- rectangular: p ≠ q
- square: p = q
- diagonal: square, with a_ij = 0 for i ≠ j; the elements a_ii form the diagonal
- symmetric: a_ij = a_ji

Transpose of a matrix/vector

The matrix A is composed of elements a_ij; the transpose of A, denoted A', has elements a_ji, so the rows of A become the columns of A'. Transposing a column vector v = [v1 v2 v3]' gives the corresponding row vector.
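To make the notation concrete, here is a minimal NumPy sketch (not from the slides; the array values are arbitrary examples):

```python
import numpy as np

# A 3,1 column vector c and a 2,3 matrix A (values are arbitrary)
c = np.array([[1.0], [2.0], [3.0]])
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(A.shape)    # (2, 3): p = 2 rows, q = 3 columns
print(A[0, 2])    # element a_13 (NumPy indexes from 0)
print(A.T)        # transpose: rows of A become columns of A.T
print(A.T[2, 0] == A[0, 2])   # a'_ji equals a_ij
```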

Vector/Matrix Addition and Scalar Multiplication

If vectors and matrices have the same number of rows and columns, they can be added (or subtracted) element by element. Vectors and matrices can also be multiplied by a scalar, element by element.

Matrix multiplication: Linear Combination of Columns

For AB, each column of B generates a column of the product AB. Each column of B contains a set of linear weights; these weights are applied to the columns of A to produce a single column of numbers.

Matrix multiplication: Linear Combination of Rows

For AB, each row of A generates a row of the product AB. Each row of A contains a set of linear weights; these weights are applied to the rows of B to produce a single row vector of numbers.
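Both views describe the same product, which is easy to check numerically; a small NumPy sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

AB = A @ B

# Column view: column j of AB is A's columns weighted by B[:, j]
col0 = B[0, 0] * A[:, 0] + B[1, 0] * A[:, 1]
print(np.allclose(AB[:, 0], col0))   # True

# Row view: row i of AB is B's rows weighted by A[i, :]
row0 = A[0, 0] * B[0, :] + A[0, 1] * B[1, :]
print(np.allclose(AB[0, :], row0))   # True
```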

Determinant of a square matrix

The determinant of a matrix A, denoted |A|, is a scalar function that is zero if and only if the matrix is of deficient rank. The rank is the number of linearly independent rows and columns of A. A linearly independent column is one that is not a linear combination of other columns in the matrix. If any columns of A are a linear combination of some other columns of A, then A is not full rank.

Eigenvalues and eigenvectors

For a square matrix A, a scalar c and a non-zero vector v are an eigenvalue and associated eigenvector if and only if they satisfy the equation

Av = cv

Interpretation: multiplication of an eigenvector by the matrix does not change the direction, but only the magnitude, of the original vector. The eigenvalue is the factor by which the eigenvector changes when multiplied by the matrix.

Eigenvectors of a symmetric matrix

For symmetric A, distinct eigenvalues c_i, c_j with associated eigenvectors v_i, v_j satisfy v_i'v_j = 0: v_i and v_j are orthogonal, and v_i and v_j are linearly independent.
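A quick NumPy check of these facts on an arbitrary symmetric example (np.linalg.eigh is NumPy's eigensolver for symmetric matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # small symmetric example

print(np.linalg.det(A))          # nonzero, so A is full rank
print(np.linalg.matrix_rank(A))  # 2

vals, vecs = np.linalg.eigh(A)
v0 = vecs[:, 0]
print(np.allclose(A @ v0, vals[0] * v0))          # Av = cv
print(np.isclose(vecs[:, 0] @ vecs[:, 1], 0.0))   # eigenvectors orthogonal
```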

Eigendecomposition of a symmetric matrix

Let A be real and symmetric. There exists a matrix Q such that

A = QΛQ'

where Q is the square n×n matrix whose i-th column is the basis eigenvector q_i of A, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues.

Matrix approximation

Suppose the eigenvectors and eigenvalues of A are ordered in the matrices Λ and Q in descending order, such that the first element in Λ is the largest eigenvalue of A and the first column in Q is its corresponding eigenvector. Define Q* as the first m columns of Q, and D* as an m×m diagonal matrix with the corresponding m eigenvalues as diagonal entries. Then

A* = Q*D*Q*'

i.e., a matrix of rank m that is the best rank-m approximation of A.

Singular Value Decomposition

A matrix factorization, good for any real or complex matrix. Let A be an m×n matrix; the SVD takes the form

A = UΣV*

U is an m×m real or complex unitary matrix, V* is the conjugate transpose of an n×n real or complex unitary matrix, and the diagonal entries σ_ii of Σ are the singular values. If A is positive semi-definite, then the SVD is an eigendecomposition of A. The SVD is often used to compute the pseudoinverse of A, or as a low-rank approximation of A. Given rectangular A and the SVD of A, the following holds:

A'A = V(Σ'Σ)V'    AA' = U(ΣΣ')U'

Our imaging problem

Statistical analysis of medical images is a commonly underdetermined problem: thousands to millions of spatial variables (voxels), and usually far fewer observations (subjects). The typical solution is to divide it into subproblems, each relating a single voxel to a clinical variable. This is known as the voxel-wise, univariate, or pointwise regression approach, popularized by SPM (Friston, 1995). Dependencies between spatial variables are neglected!
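A minimal NumPy sketch of the SVD-based rank-m approximation described above (random placeholder matrix; m chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # arbitrary rectangular example matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-m approximation: keep the m largest singular values/vectors
m = 2
A_m = U[:, :m] @ np.diag(s[:m]) @ Vt[:m, :]
print(np.linalg.matrix_rank(A_m))                        # m

# A'A = V (S'S) V': an eigendecomposition of the Gram matrix
print(np.allclose(A.T @ A, Vt.T @ np.diag(s**2) @ Vt))   # True
```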

Example problem: 3 observations, 4 spatial variables

With an outcome y measured on 3 subjects and 4 spatial variables x1, ..., x4, the pointwise approach fits a separate regression y = x_j β_j for each spatial variable, yielding a coefficient map and a t-statistic map, with a p-value for each variable.

Solutions?

- Reduce dimensionality: PCA, ICA, CCA
- Add a constraint to the sum of squares: ridge regression, LASSO techniques

Methods to reduce dimensionality

PCA, ICA, Joint ICA, Parallel ICA, PLS, CCA

PCA: Principal Components Analysis

A procedure to convert a set of observations of possibly correlated variables into a set of uncorrelated variables called principal components. We know the voxels in our images are spatially correlated. PCA aims to transform millions of variables (voxels) into a few: a projection of the data from high to low dimension. Example: an n×p data matrix X (n subjects, images with p voxels) is transformed by PCA into a matrix Y of principal component scores.
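A minimal sketch of the pointwise approach on a toy version of this problem, assuming random placeholder data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 3, 4                      # 3 observations, 4 spatial variables
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Pointwise (voxel-wise) approach: one simple regression per variable,
# ignoring any dependence between the spatial variables
for j in range(p):
    slope, intercept, r, pval, se = stats.linregress(X[:, j], y)
    print(f"variable {j}: beta={slope:.2f}, p={pval:.2f}")
```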

What are principal components?

Principal components are linear combinations of the observed variables:

y = b1 x1 + b2 x2 + ... + bp xp

where x_i is a column vector of the original data matrix X. In our imaging examples, they are linear combinations of the data columns (voxels). The coefficients of these principal components are chosen to meet three criteria. What are the three criteria?

3 criteria of Principal Components

- There are exactly p principal components (PCs), each being a linear combination of the observed variables; p is the number of variables (columns) in the original data.
- The PCs are mutually orthogonal (i.e., perpendicular and uncorrelated).
- The components are extracted in order of decreasing variance: the first PC explains as much of the variability in the full data set as possible; the second PC explains as much variability as possible after the variability from the first PC has been removed, etc.

Usual steps in PCA (see the sketch below)

- Have at least two variables (usually you think that these variables are inter-related).
- Generate the correlation or variance-covariance matrix: mean-center the data matrix X (subtract the mean from each variable); then (1/(n-1))X'X is the variance-covariance matrix.
- Obtain eigenvalues and eigenvectors, using the SVD or another matrix decomposition. The first eigenvector is the direction explaining the most variance in the data matrix X; the first eigenvalue is the amount of variance explained.
- Select a subset of the eigenvectors (principal components), e.g. summing eigenvalues until a threshold (90%?) is reached.
- Generate PC scores: a reduced space that can be used in subsequent regression/visualization.

Eigenfaces example

Face recognition, efficient storage. Prepare a training set of face images: same resolution, same lighting, normalized so that features align. Store the training set in a matrix F, where each row is a training face (the rows of each image concatenated):

F = [ image 1, row 1   image 1, row 2   ...   image 1, row n
      image 2, row 1   image 2, row 2   ...   image 2, row n
      ...
      image m, row 1   image m, row 2   ...   image m, row n ]
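The steps above can be sketched in a few lines of NumPy (random placeholder data; the 90% threshold follows the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))   # n=100 subjects, p=5 variables

# 1. Mean-center each variable
Xc = X - X.mean(axis=0)

# 2. Variance-covariance matrix: (1/(n-1)) X'X
n = Xc.shape[0]
C = (Xc.T @ Xc) / (n - 1)

# 3. Eigenvalues/eigenvectors (equivalently, the SVD of Xc)
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]      # sort by decreasing variance
vals, vecs = vals[order], vecs[:, order]

# 4. Keep enough components to explain ~90% of the variance
keep = np.searchsorted(np.cumsum(vals) / vals.sum(), 0.90) + 1

# 5. PC scores: project the centered data onto the kept eigenvectors
scores = Xc @ vecs[:, :keep]
print(keep, scores.shape)
```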

Training faces

[Figure: examples of training face images.]

Eigenfaces, cont.

PCA of F: the principal components, v_i, of F are the eigenfaces. The principal component scores are obtained by FV, where V is the matrix of principal components; the score is the contribution of each principal component to the original face. We can store the eigenfaces and scores instead of the entire image: if an image has N voxels and 4 eigenfaces describe 98% of the variability in faces, then for each new face we need only record 4 scores (not N voxel values).

Principal component regression

Principal components can be used for data reduction prior to regression Y = Xβ: do PCA on X, then do the regression on the scores, Y = X_pc β_pc, where X_pc holds the PC scores.

Independent components analysis

In PCA, the PCs are orthogonal (uncorrelated). In ICA, the components are defined to be maximally statistically independent, a stronger requirement. Independence: knowing x gives you no information about y. If the data are Gaussian, then uncorrelated implies independent.

Uncorrelated but not independent

[Figure: example of two variables that are uncorrelated but not independent, with equal variances Var(x1) = Var(x2).]
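A minimal NumPy sketch of principal component regression (random placeholder data; keeping k = 10 components is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 200                           # more voxels than subjects
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

# PCA via SVD of the centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
scores = Xc @ Vt[:k].T                   # n x k PC scores

# Regress y on the k scores instead of the p voxels
beta_pc, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
print(beta_pc.shape)                     # (10,): now a solvable problem
```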

ICA, cont.

ICA tends to do better for extracting useful patterns in sets of images, because high-dimensional datasets typically have strong non-Gaussianity. It is computationally more challenging (not a simple matrix decomposition), there is no inherent order of components, and components may also be scaled.

ICA is also known as BSS: blind source separation. Formal statement of the problem: N independent sources Z (m×n) and a mixing matrix A (n×n) produce a set of observations X (m×n):

X = AZ

We want to demix the observations X into Y = WX, with Y ≈ Z and W ≈ A⁻¹. ICA is trying to estimate W.

PCA vs. ICA

PCA solution: the PCs explain maximum variance. [Figure: PCA applied to a mixture of two sinusoids.]

ICA solution: projecting the data onto IC1 and IC2 gives back the two sinusoids.

Principal and Independent components

ICA exploits the non-Gaussianity of the source signals.

ICA: the basic idea

Assume the underlying source signals (Z) are independent, and assume a linear mixing matrix (A), X = AZ; in order to find Y (≈ Z), find W (≈ A⁻¹): Y = WX. This requires a measure of statistical independence that we maximize between each of the components:

- Non-Gaussianity (maximize kurtosis)
- Mutual information (minimize between components)
- Entropy (maximize randomness)
- Maximum log likelihood

How? Initialise W and iteratively update W to minimise or maximise a cost function that measures the (statistical) independence between the columns of Y. This cannot be solved using a matrix decomposition.
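A sketch of this idea using scikit-learn's FastICA, which performs exactly this kind of iterative estimation of W; the two-sinusoid mixture mirrors the slides' example, and all numbers are made up:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sinusoidal sources, linearly mixed
t = np.linspace(0, 8, 2000)
Z = np.c_[np.sin(2 * t), np.sin(7 * t + 1)]   # sources
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                    # mixing matrix
X = Z @ A.T                                   # observations X = ZA'

# FastICA iteratively estimates the unmixing matrix W
ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)    # Y ~ Z, but order and scale are arbitrary,
print(Y.shape)              # as the slides note: (2000, 2)
```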

Joint ICA

A variation on ICA that looks for components that appear jointly across features or modalities: discover independent components from two modalities, in addition to the relationship between them. The observed data X stacks the features/images across subjects, with each row holding one subject's two modalities side by side (e.g. MRI image and fMRI image, for controls and patients). Estimating Y = WX yields the joint independent components (sources) and the component weights/profiles.

Parallel ICA

Add a constraint to independence:

max { H(Y1) + H(Y2) + Corr(A1, A2) }

i.e., maximize the entropy of the sources in each modality and the correlation between columns of the mixing matrices.

Partial least squares (PLS)

Related to PCA regression (Y = Xβ: PCA of X, keep some PCs and predict Y). But those PCs explain variability in X only; these components may not explain Y at all. PLS instead finds components of X that are also relevant to Y: latent vectors are components that simultaneously decompose X and Y, and they explain the covariance between X and Y. Find two sets of weights that create linear combinations of the columns of X and Y so as to maximize their covariance.

PLS steps (sketched in code below)

- Compute X'Y, the covariance of X and Y.
- Do the SVD of X'Y; the first latent vector and loadings can be calculated from this.
- Subtract (partial out) the effect of the first latent vector from X and Y to create X1 and Y1.
- Repeat until X is null.

Similar to PCA, one can choose a subset of latent vectors to approximate the prediction of Y and achieve substantial data reduction.

PLS Example

6 cognitive measures on patients with mild cognitive impairment; T1-weighted images normalized to an atlas; PLS between the cognitive measures and the deformation maps.

[Figure: first latent variable and latent variable scores; regions of relative contraction (blue) and expansion (red) related to LV1.]

Scores can be computed for each subject, perhaps on a reduced number of latent variables, and used in regression analysis.
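A minimal NumPy sketch of the first PLS step as described above (random placeholder data; only the first latent vector and one deflation are shown):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 30, 8, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

# Center both blocks
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# SVD of X'Y gives the first pair of PLS weight vectors
U, s, Vt = np.linalg.svd(Xc.T @ Yc)
w, c = U[:, 0], Vt[0, :]

t = Xc @ w                  # first latent vector (X scores)
u = Yc @ c                  # corresponding Y scores
print(np.cov(t, u)[0, 1])   # this covariance is what PLS maximizes

# Deflate: partial out t from X and Y, then repeat for the next vector
p_load = Xc.T @ t / (t @ t)
X1 = Xc - np.outer(t, p_load)
Y1 = Yc - np.outer(t, (Yc.T @ t) / (t @ t))
```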

Canonical Correlation Analysis

Investigate the relationship between two sets of variables measured on the same subjects, e.g. X holding MRI image features and Y holding fMRI image features for each control and patient. Find pairs of linear combinations of the variables, with successive pairs mutually uncorrelated; these pairs are the canonical variates.

The data: a set of p independent variables X1, X2, ..., Xp and q dependent variables Y1, Y2, ..., Yq, measured on a sample of N objects, from which we can derive a (p + q) × (p + q) correlation matrix.

CCA correlation matrix

R = [ R_XX  R_XY ]
    [ R_YX  R_YY ]

R_XX is the within-set (X) correlation, R_YY the within-set (Y) correlation, and R_XY the between-set (X,Y) correlation.

What are canonical variates?

Canonical variates are obtained from the eigenvectors of the corresponding correlation matrix. They are orthogonal and span the variability in either X or Y.

Estimating canonical variates

The first canonical variate pair is obtained by finding the coefficients of the linear functions

U1 = Σ_{j=1..p} a_j X_j,    V1 = Σ_{j=1..q} b_j Y_j

which maximize the correlation between U1 and V1: r(U1, V1) = max r(U, V).

Estimating canonical variates, cont.

The second canonical variate pair is obtained by finding the coefficients of the linear functions U2 = Σ a_j X_j and V2 = Σ b_j Y_j which maximize the correlation r(U2, V2), subject to the following constraints:

r(U1, U2) = r(V1, V2) = 0,    r(U1, V2) = r(U2, V1) = 0

Calculating canonical variates

The end result: a is an eigenvector of Σ_XX⁻¹ Σ_XY Σ_YY⁻¹ Σ_YX, and b is an eigenvector of Σ_YY⁻¹ Σ_YX Σ_XX⁻¹ Σ_XY. The squared canonical correlation r_i² is the corresponding eigenvalue. This gives a set of r = min(p, q) canonical variates, one for the dependent variable set {V} and the other for the independent variable set {U}, and a set of r canonical correlations C_i = r(U_i, V_i), each representing the correlation between a pair of canonical variates.

[Figure: scatterplots showing a high first canonical correlation (U1 vs V1) and a low second canonical correlation (U2 vs V2).]
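A minimal NumPy sketch of this eigenproblem (random placeholder data with built-in correlation between the blocks):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 200, 4, 3
X = rng.standard_normal((n, p))
Y = 0.5 * X[:, :3] + rng.standard_normal((n, q))   # correlated blocks

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

# a is an eigenvector of Sxx^-1 Sxy Syy^-1 Syx; the eigenvalue is r^2
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
vals, vecs = np.linalg.eig(M)
order = np.argsort(vals.real)[::-1]
r = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
print(r[: min(p, q)])    # canonical correlations, largest first
```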

Significance testing

Each CV (canonical variate) is tested in a hierarchical fashion, by first testing the significance of all CVs combined. If all CVs combined are not significant, then no CV is significant. If all CVs combined are significant, remove the first CV, recalculate the test statistic, and test again. Continue until the test statistic is not significant.

Interpretation: canonical coefficients

Examine the standardized coefficients of the canonical variates. Inference: variables with large (in absolute value) coefficients are most important. For example, with

U1 = .09X1 − .9X2 + .48X3 + .9X4

U1 is mainly a contrast between X3 and X4 on the one hand, and X2 on the other.

Interpretation: canonical loadings

Examine the correlations of the original variables with the canonical variates. Inference: variables with large (in absolute value) correlations are most important.

Variable   U1
X1        -.9
X2        -.77
X3         .9
X4         .09

X4 is not related to U1, despite its large canonical coefficient.

Considerations

The variance of U and V will be influenced by the scaling adopted, but the canonical correlations will be unaffected. The ratio of sample size to total number of variables should be large, and larger still when estimating two canonical variates.

Constrained sum of squares

RIDGE REGRESSION, LASSO TECHNIQUES

Ridge regression

A well-established, widespread method (Hoerl and Kennard, 1970; Marquardt, 1970), with applications to neuroimaging (Valdés-Sosa). It regularizes an underdetermined problem by adding a constraint to the parameter sum-of-squares, and is a generalization of pointwise regression.

Ordinary least squares:

y = Xβ + ε,    min_β ε'ε = min_β ||y − Xβ||²,    β̂ = (X'X)⁻¹X'y

where y: n observations (subjects); X: n×p independent variables; β: p regression coefficients; ε: n residuals. The solution is valid if X'X is full rank.

Ridge regression solution:

min_β ( ||y − Xβ||² + λ||β||² ),  λ ≥ 0
β̂_ridge = (X'X + λI)⁻¹X'y

λ shrinks the absolute size of the coefficients β. Shrinkage introduces bias: as λ grows, the coefficients are driven toward 0. If p > n, X'X will never be full rank!
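A small NumPy illustration of the ridge solution and its shrinkage behavior (random placeholder data):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 20, 50                   # p > n: X'X is rank deficient
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Ridge solution (X'X + lambda I)^-1 X'y exists even when p > n
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge.shape)

# Larger lambda shrinks the coefficient vector toward zero
for lam in (0.1, 1.0, 10.0, 100.0):
    b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    print(lam, np.linalg.norm(b))
```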

Ridge trace

[Figure: ridge trace, showing the coefficient paths β̂_i(λ) as λ increases.]

Computation

Take the thin SVD X = UDV', with X n×p, U n×k, D k×k, V p×k; U and V have orthonormal columns, k is the rank of X and the number of nonzero d_i, and D is diagonal with elements d_i. Then

β̂_ridge = V(D² + λI)⁻¹R'y,  where R = UD (n×k)

(D² + λI) is k×k and diagonal, easily inverted. The complexity is of order pk² instead of p³.

Ridge Z

z_i(λ) = β̂_i(λ) / σ̂(β̂_i(λ)) = ω_i'X'y / ( σ̂_ε √(ω_i'X'Xω_i) )

where z_i(λ) is the i-th z-statistic and ω_i is the i-th column of Ω = (X'X + λI)⁻¹. As λ → ∞,

lim_{λ→∞} z_i(λ) = x_i'y / ( σ̂_ε √(x_i'x_i) )

This is equivalent to the pointwise estimation of z_i!
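A sketch verifying that the SVD route gives the same coefficients as the direct formula (random placeholder data):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 1.0

# Direct formula: inverts a p x p matrix, O(p^3)
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# SVD route: (D^2 + lambda I) is diagonal, so inversion is trivial
U, d, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD, k = min(n, p)
beta_svd = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))

print(np.allclose(beta_direct, beta_svd))          # True
```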

Application to Deformation Morphometry

- 8 cognitively impaired and 7 control subjects
- Verbal memory assessed at baseline and 1 yr
- Baseline MRI deformation maps created using B-spline-based nonlinear registration (Studholme 2004)

Pointwise regression: the dependent variables were the deformation maps, and the independent variable was change in verbal memory. Convenient and computationally efficient.

Ridge regression: the dependent variable was change in verbal memory, and the spatial variables from the maps were the independent variables (p 86,68).

Spatial correlation structure

[Figure: deformation associated with delayed memory; color scale −.6 to .6.]

LASSO

Least Absolute Shrinkage and Selection Operator. Definition: a coefficient-shrunken version of the ordinary Least Squares estimate, obtained by minimizing the Residual Sum of Squares subject to the constraint that the sum of the absolute values of the coefficients is no greater than a constant.

Solving LASSO

There is no solution by matrix decomposition or simple linear algebra; iterative methods must be used.
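For illustration, scikit-learn's Lasso fits this by coordinate descent, one such iterative method (random placeholder data):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

# Coordinate descent under the L1 constraint zeroes out most coefficients
model = Lasso(alpha=0.1, max_iter=10000).fit(X, y)
print(np.sum(model.coef_ != 0))   # only a few nonzero coefficients survive
```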