Some multivariate methods

Outline

Some multivariate methods. VALERIE CARDENAS, PH.D., ASSOCIATE ADJUNCT PROFESSOR, DEPARTMENT OF RADIOLOGY AND BIOMEDICAL IMAGING

- Useful linear algebra
- Principal Components Analysis (PCA)
- Independent Components Analysis (ICA)
- Joint ICA
- Parallel ICA
- Partial Least Squares (PLS)
- Canonical Correlation Analysis (CCA)
- Ridge regression

Vector

A vector is defined as an ordered array of numbers, of dimensions p,1. Notation: vectors are typically denoted by lowercase bold letters, e.g. a p,1 vector c with elements c_1, ..., c_p.

Matrix

A matrix is defined as an ordered array of numbers, of dimensions p,q (p rows, q columns). Notation: matrices are typically denoted by uppercase bold letters.

More matrix notation

You can think of a matrix as a collection of q column vectors, each of dimension p,1: A = [c1 c2 ... cq]. You can also think of a matrix as a collection of p row vectors, each of dimension 1,q: A = [r1; r2; ...; rp].

The elements of a matrix A are denoted by a_ij, where i refers to the row position and j to the column position. The elements of a vector c are denoted by c_i, where i refers to the row position.

Types of matrices

- rectangular: p ≠ q
- square: p = q
- diagonal: square, with a_ij = 0 for i ≠ j; the elements a_ii form the diagonal
- symmetric: a_ij = a_ji

Transpose of a matrix/vector

The matrix A is composed of elements a_ij; the transpose of A, denoted A', has elements a_ji, so the rows of A become the columns of A'. Transposing a column vector v = [v1 v2 v3]' gives the corresponding row vector.
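To make the notation concrete, here is a minimal NumPy sketch (not from the slides; the array values are arbitrary examples):

```python
import numpy as np

# A 3,1 column vector c and a 2,3 matrix A (values are arbitrary)
c = np.array([[1.0], [2.0], [3.0]])
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(A.shape)    # (2, 3): p = 2 rows, q = 3 columns
print(A[0, 2])    # element a_13 (NumPy indexes from 0)
print(A.T)        # transpose: rows of A become columns of A.T
print(A.T[2, 0] == A[0, 2])   # a'_ji equals a_ij
```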

Vector/Matrix Addition and Scalar Multiplication

If vectors and matrices have the same number of rows and columns, they can be added (or subtracted) element by element. Vectors and matrices can also be multiplied by a scalar, element by element.

Matrix multiplication: Linear Combination of Columns

For AB, each column of B generates a column of the product AB. Each column of B contains a set of linear weights; these weights are applied to the columns of A to produce a single column of numbers.

Matrix multiplication: Linear Combination of Rows

For AB, each row of A generates a row of the product AB. Each row of A contains a set of linear weights; these weights are applied to the rows of B to produce a single row vector of numbers.
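Both views describe the same product, which is easy to check numerically; a small NumPy sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

AB = A @ B

# Column view: column j of AB is A's columns weighted by B[:, j]
col0 = B[0, 0] * A[:, 0] + B[1, 0] * A[:, 1]
print(np.allclose(AB[:, 0], col0))   # True

# Row view: row i of AB is B's rows weighted by A[i, :]
row0 = A[0, 0] * B[0, :] + A[0, 1] * B[1, :]
print(np.allclose(AB[0, :], row0))   # True
```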

Determinant of a square matrix

The determinant of a matrix A, denoted |A|, is a scalar function that is zero if and only if the matrix is of deficient rank. The rank is the number of linearly independent rows and columns of A. A linearly independent column is one that is not a linear combination of other columns in the matrix. If any columns of A are a linear combination of some other columns of A, then A is not full rank.

Eigenvalues and eigenvectors

For a square matrix A, a scalar c and a non-zero vector v are an eigenvalue and associated eigenvector if and only if they satisfy the equation

Av = cv

Interpretation: multiplication of an eigenvector by the matrix does not change the direction, but only the magnitude, of the original vector. The eigenvalue is the factor by which the eigenvector changes when multiplied by the matrix.

Eigenvectors of a symmetric matrix

For symmetric A, distinct eigenvalues c_i, c_j with associated eigenvectors v_i, v_j satisfy v_i'v_j = 0: v_i and v_j are orthogonal, and v_i and v_j are linearly independent.
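A quick NumPy check of these facts on an arbitrary symmetric example (np.linalg.eigh is NumPy's eigensolver for symmetric matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # small symmetric example

print(np.linalg.det(A))          # nonzero, so A is full rank
print(np.linalg.matrix_rank(A))  # 2

vals, vecs = np.linalg.eigh(A)
v0 = vecs[:, 0]
print(np.allclose(A @ v0, vals[0] * v0))          # Av = cv
print(np.isclose(vecs[:, 0] @ vecs[:, 1], 0.0))   # eigenvectors orthogonal
```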

Eigendecomposition of a symmetric matrix

Let A be real and symmetric. There exists a matrix Q such that

A = QΛQ'

where Q is the square n×n matrix whose i-th column is the basis eigenvector q_i of A, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues.

Matrix approximation

Suppose the eigenvectors and eigenvalues of A are ordered in the matrices Λ and Q in descending order, such that the first element in Λ is the largest eigenvalue of A and the first column in Q is its corresponding eigenvector. Define Q* as the first m columns of Q, and D* as an m×m diagonal matrix with the corresponding m eigenvalues as diagonal entries. Then

A* = Q*D*Q*'

i.e., a matrix of rank m that is the best rank-m approximation of A.

Singular Value Decomposition

A matrix factorization, good for any real or complex matrix. Let A be an m×n matrix; the SVD takes the form

A = UΣV*

U is an m×m real or complex unitary matrix, V* is the conjugate transpose of an n×n real or complex unitary matrix, and the diagonal entries σ_ii of Σ are the singular values. If A is positive semi-definite, then the SVD is an eigendecomposition of A. The SVD is often used to compute the pseudoinverse of A, or as a low-rank approximation of A. Given rectangular A and the SVD of A, the following holds:

A'A = V(Σ'Σ)V'    AA' = U(ΣΣ')U'

Our imaging problem

Statistical analysis of medical images is a commonly underdetermined problem: thousands to millions of spatial variables (voxels), and usually far fewer observations (subjects). The typical solution is to divide it into subproblems, each relating a single voxel to a clinical variable. This is known as the voxel-wise, univariate, or pointwise regression approach, popularized by SPM (Friston, 1995). Dependencies between spatial variables are neglected!
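A minimal NumPy sketch of the SVD-based rank-m approximation described above (random placeholder matrix; m chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # arbitrary rectangular example matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-m approximation: keep the m largest singular values/vectors
m = 2
A_m = U[:, :m] @ np.diag(s[:m]) @ Vt[:m, :]
print(np.linalg.matrix_rank(A_m))                        # m

# A'A = V (S'S) V': an eigendecomposition of the Gram matrix
print(np.allclose(A.T @ A, Vt.T @ np.diag(s**2) @ Vt))   # True
```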

Example problem: 3 observations, 4 spatial variables

With an outcome y measured on 3 subjects and 4 spatial variables x1, ..., x4, the pointwise approach fits a separate regression y = x_j β_j for each spatial variable, yielding a coefficient map and a t-statistic map, with a p-value for each variable.

Solutions?

- Reduce dimensionality: PCA, ICA, CCA
- Add a constraint to the sum of squares: ridge regression, LASSO techniques

Methods to reduce dimensionality

PCA, ICA, Joint ICA, Parallel ICA, PLS, CCA

PCA: Principal Components Analysis

A procedure to convert a set of observations of possibly correlated variables into a set of uncorrelated variables called principal components. We know the voxels in our images are spatially correlated. PCA aims to transform millions of variables (voxels) into a few: a projection of the data from high to low dimension. Example: an n×p data matrix X (n subjects, images with p voxels) is transformed by PCA into a matrix Y of principal component scores.
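A minimal sketch of the pointwise approach on a toy version of this problem, assuming random placeholder data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 3, 4                      # 3 observations, 4 spatial variables
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Pointwise (voxel-wise) approach: one simple regression per variable,
# ignoring any dependence between the spatial variables
for j in range(p):
    slope, intercept, r, pval, se = stats.linregress(X[:, j], y)
    print(f"variable {j}: beta={slope:.2f}, p={pval:.2f}")
```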

What are principal components?

Principal components are linear combinations of the observed variables:

y = b1 x1 + b2 x2 + ... + bp xp

where x_i is a column vector of the original data matrix X. In our imaging examples, they are linear combinations of the data columns (voxels). The coefficients of these principal components are chosen to meet three criteria. What are the three criteria?

3 criteria of Principal Components

- There are exactly p principal components (PCs), each being a linear combination of the observed variables; p is the number of variables (columns) in the original data.
- The PCs are mutually orthogonal (i.e., perpendicular and uncorrelated).
- The components are extracted in order of decreasing variance: the first PC explains as much of the variability in the full data set as possible; the second PC explains as much variability as possible after the variability from the first PC has been removed, etc.

Usual steps in PCA (see the sketch below)

- Have at least two variables (usually you think that these variables are inter-related).
- Generate the correlation or variance-covariance matrix: mean-center the data matrix X (subtract the mean from each variable); then (1/(n-1))X'X is the variance-covariance matrix.
- Obtain eigenvalues and eigenvectors, using the SVD or another matrix decomposition. The first eigenvector is the direction explaining the most variance in the data matrix X; the first eigenvalue is the amount of variance explained.
- Select a subset of the eigenvectors (principal components), e.g. summing eigenvalues until a threshold (90%?) is reached.
- Generate PC scores: a reduced space that can be used in subsequent regression/visualization.

Eigenfaces example

Face recognition, efficient storage. Prepare a training set of face images: same resolution, same lighting, normalized so that features align. Store the training set in a matrix F, where each row is a training face (the rows of each image concatenated):

F = [ image 1, row 1   image 1, row 2   ...   image 1, row n
      image 2, row 1   image 2, row 2   ...   image 2, row n
      ...
      image m, row 1   image m, row 2   ...   image m, row n ]
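The steps above can be sketched in a few lines of NumPy (random placeholder data; the 90% threshold follows the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))   # n=100 subjects, p=5 variables

# 1. Mean-center each variable
Xc = X - X.mean(axis=0)

# 2. Variance-covariance matrix: (1/(n-1)) X'X
n = Xc.shape[0]
C = (Xc.T @ Xc) / (n - 1)

# 3. Eigenvalues/eigenvectors (equivalently, the SVD of Xc)
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]      # sort by decreasing variance
vals, vecs = vals[order], vecs[:, order]

# 4. Keep enough components to explain ~90% of the variance
keep = np.searchsorted(np.cumsum(vals) / vals.sum(), 0.90) + 1

# 5. PC scores: project the centered data onto the kept eigenvectors
scores = Xc @ vecs[:, :keep]
print(keep, scores.shape)
```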

Training faces

[Figure: examples of training face images.]

Eigenfaces, cont.

PCA of F: the principal components, v_i, of F are the eigenfaces. The principal component scores are obtained by FV, where V is the matrix of principal components; the score is the contribution of each principal component to the original face. We can store the eigenfaces and scores instead of the entire image: if an image has N voxels and 4 eigenfaces describe 98% of the variability in faces, then for each new face we need only record 4 scores (not N voxel values).

Principal component regression

Principal components can be used for data reduction prior to regression Y = Xβ: do PCA on X, then do the regression on the scores, Y = X_pc β_pc, where X_pc holds the PC scores.

Independent components analysis

In PCA, the PCs are orthogonal (uncorrelated). In ICA, the components are defined to be maximally statistically independent, a stronger requirement. Independence: knowing x gives you no information about y. If the data are Gaussian, then uncorrelated implies independent.

Uncorrelated but not independent

[Figure: example of two variables that are uncorrelated but not independent, with equal variances Var(x1) = Var(x2).]
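A minimal NumPy sketch of principal component regression (random placeholder data; keeping k = 10 components is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 200                           # more voxels than subjects
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

# PCA via SVD of the centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
scores = Xc @ Vt[:k].T                   # n x k PC scores

# Regress y on the k scores instead of the p voxels
beta_pc, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
print(beta_pc.shape)                     # (10,): now a solvable problem
```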

ICA, cont.

ICA tends to do better for extracting useful patterns in sets of images, because high-dimensional datasets typically have strong non-Gaussianity. It is computationally more challenging (not a simple matrix decomposition), there is no inherent order of components, and components may also be scaled.

ICA is also known as BSS: blind source separation. Formal statement of the problem: N independent sources Z (m×n) and a mixing matrix A (n×n) produce a set of observations X (m×n):

X = AZ

We want to demix the observations X into Y = WX, with Y ≈ Z and W ≈ A⁻¹. ICA is trying to estimate W.

PCA vs. ICA

PCA solution: the PCs explain maximum variance. [Figure: PCA applied to a mixture of two sinusoids.]

ICA solution: projecting the data onto IC1 and IC2 gives back the two sinusoids.

Principal and Independent components

ICA exploits the non-Gaussianity of the source signals.

ICA: the basic idea

Assume the underlying source signals (Z) are independent, and assume a linear mixing matrix (A), X = AZ; in order to find Y (≈ Z), find W (≈ A⁻¹): Y = WX. This requires a measure of statistical independence that we maximize between each of the components:

- Non-Gaussianity (maximize kurtosis)
- Mutual information (minimize between components)
- Entropy (maximize randomness)
- Maximum log likelihood

How? Initialise W and iteratively update W to minimise or maximise a cost function that measures the (statistical) independence between the columns of Y. This cannot be solved using a matrix decomposition.
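A sketch of this idea using scikit-learn's FastICA, which performs exactly this kind of iterative estimation of W; the two-sinusoid mixture mirrors the slides' example, and all numbers are made up:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sinusoidal sources, linearly mixed
t = np.linspace(0, 8, 2000)
Z = np.c_[np.sin(2 * t), np.sin(7 * t + 1)]   # sources
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                    # mixing matrix
X = Z @ A.T                                   # observations X = ZA'

# FastICA iteratively estimates the unmixing matrix W
ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)    # Y ~ Z, but order and scale are arbitrary,
print(Y.shape)              # as the slides note: (2000, 2)
```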

Joint ICA

A variation on ICA that looks for components that appear jointly across features or modalities: discover independent components from two modalities, in addition to the relationship between them. The observed data X stacks the features/images across subjects, with each row holding one subject's two modalities side by side (e.g. MRI image and fMRI image, for controls and patients). Estimating Y = WX yields the joint independent components (sources) and the component weights/profiles.

Parallel ICA

Add a constraint to independence:

max { H(Y1) + H(Y2) + Corr(A1, A2) }

i.e., maximize the entropy of the sources in each modality and the correlation between columns of the mixing matrices.

Partial least squares (PLS)

Related to PCA regression (Y = Xβ: PCA of X, keep some PCs and predict Y). But those PCs explain variability in X only; these components may not explain Y at all. PLS instead finds components of X that are also relevant to Y: latent vectors are components that simultaneously decompose X and Y, and they explain the covariance between X and Y. Find two sets of weights that create linear combinations of the columns of X and Y so as to maximize their covariance.

PLS steps (sketched in code below)

- Compute X'Y, the covariance of X and Y.
- Do the SVD of X'Y; the first latent vector and loadings can be calculated from this.
- Subtract (partial out) the effect of the first latent vector from X and Y to create X1 and Y1.
- Repeat until X is null.

Similar to PCA, one can choose a subset of latent vectors to approximate the prediction of Y and achieve substantial data reduction.

PLS Example

6 cognitive measures on patients with mild cognitive impairment; T1-weighted images normalized to an atlas; PLS between the cognitive measures and the deformation maps.

[Figure: first latent variable and latent variable scores; regions of relative contraction (blue) and expansion (red) related to LV1.]

Scores can be computed for each subject, perhaps on a reduced number of latent variables, and used in regression analysis.
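A minimal NumPy sketch of the first PLS step as described above (random placeholder data; only the first latent vector and one deflation are shown):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 30, 8, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

# Center both blocks
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# SVD of X'Y gives the first pair of PLS weight vectors
U, s, Vt = np.linalg.svd(Xc.T @ Yc)
w, c = U[:, 0], Vt[0, :]

t = Xc @ w                  # first latent vector (X scores)
u = Yc @ c                  # corresponding Y scores
print(np.cov(t, u)[0, 1])   # this covariance is what PLS maximizes

# Deflate: partial out t from X and Y, then repeat for the next vector
p_load = Xc.T @ t / (t @ t)
X1 = Xc - np.outer(t, p_load)
Y1 = Yc - np.outer(t, (Yc.T @ t) / (t @ t))
```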

Canonical Correlation Analysis

Investigate the relationship between two sets of variables measured on the same subjects, e.g. X holding MRI image features and Y holding fMRI image features for each control and patient. Find pairs of linear combinations of the variables, with successive pairs mutually uncorrelated; these pairs are the canonical variates.

The data: a set of p independent variables X1, X2, ..., Xp and q dependent variables Y1, Y2, ..., Yq, measured on a sample of N objects, from which we can derive a (p + q) × (p + q) correlation matrix.

CCA correlation matrix

R = [ R_XX  R_XY ]
    [ R_YX  R_YY ]

R_XX is the within-set (X) correlation, R_YY the within-set (Y) correlation, and R_XY the between-set (X,Y) correlation.

What are canonical variates?

Canonical variates are obtained from the eigenvectors of the corresponding correlation matrix. They are orthogonal and span the variability in either X or Y.

Estimating canonical variates

The first canonical variate pair is obtained by finding the coefficients of the linear functions

U1 = Σ_{j=1..p} a_j X_j,    V1 = Σ_{j=1..q} b_j Y_j

which maximize the correlation between U1 and V1: r(U1, V1) = max r(U, V).

Estimating canonical variates, cont.

The second canonical variate pair is obtained by finding the coefficients of the linear functions U2 = Σ a_j X_j and V2 = Σ b_j Y_j which maximize the correlation r(U2, V2), subject to the following constraints:

r(U1, U2) = r(V1, V2) = 0,    r(U1, V2) = r(U2, V1) = 0

Calculating canonical variates

The end result: a is an eigenvector of Σ_XX⁻¹ Σ_XY Σ_YY⁻¹ Σ_YX, and b is an eigenvector of Σ_YY⁻¹ Σ_YX Σ_XX⁻¹ Σ_XY. The squared canonical correlation r_i² is the corresponding eigenvalue. This gives a set of r = min(p, q) canonical variates, one for the dependent variable set {V} and the other for the independent variable set {U}, and a set of r canonical correlations C_i = r(U_i, V_i), each representing the correlation between a pair of canonical variates.

[Figure: scatterplots showing a high first canonical correlation (U1 vs V1) and a low second canonical correlation (U2 vs V2).]
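A minimal NumPy sketch of this eigenproblem (random placeholder data with built-in correlation between the blocks):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 200, 4, 3
X = rng.standard_normal((n, p))
Y = 0.5 * X[:, :3] + rng.standard_normal((n, q))   # correlated blocks

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

# a is an eigenvector of Sxx^-1 Sxy Syy^-1 Syx; the eigenvalue is r^2
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
vals, vecs = np.linalg.eig(M)
order = np.argsort(vals.real)[::-1]
r = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
print(r[: min(p, q)])    # canonical correlations, largest first
```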

Significance testing

Each CV (canonical variate) is tested in a hierarchical fashion, by first testing the significance of all CVs combined. If all CVs combined are not significant, then no CV is significant. If all CVs combined are significant, remove the first CV, recalculate the test statistic, and test again. Continue until the test statistic is not significant.

Interpretation: canonical coefficients

Examine the standardized coefficients of the canonical variates. Inference: variables with large (in absolute value) coefficients are most important. For example, with

U1 = .09X1 − .9X2 + .48X3 + .9X4

U1 is mainly a contrast between X3 and X4 on the one hand, and X2 on the other.

Interpretation: canonical loadings

Examine the correlations of the original variables with the canonical variates. Inference: variables with large (in absolute value) correlations are most important.

Variable   U1
X1        -.9
X2        -.77
X3         .9
X4         .09

X4 is not related to U1, despite its large canonical coefficient.

Considerations

The variance of U and V will be influenced by the scaling adopted, but the canonical correlations will be unaffected. The ratio of sample size to total number of variables should be large, and larger still when estimating two canonical variates.

Constrained sum of squares

RIDGE REGRESSION, LASSO TECHNIQUES

Ridge regression

A well-established, widespread method (Hoerl and Kennard, 1970; Marquardt, 1970), with applications to neuroimaging (Valdés-Sosa). It regularizes an underdetermined problem by adding a constraint to the parameter sum-of-squares, and is a generalization of pointwise regression.

Ordinary least squares:

y = Xβ + ε,    min_β ε'ε = min_β ||y − Xβ||²,    β̂ = (X'X)⁻¹X'y

where y: n observations (subjects); X: n×p independent variables; β: p regression coefficients; ε: n residuals. The solution is valid if X'X is full rank.

Ridge regression solution:

min_β ( ||y − Xβ||² + λ||β||² ),  λ ≥ 0
β̂_ridge = (X'X + λI)⁻¹X'y

λ shrinks the absolute size of the coefficients β. Shrinkage introduces bias: as λ grows, the coefficients are driven toward 0. If p > n, X'X will never be full rank!
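A small NumPy illustration of the ridge solution and its shrinkage behavior (random placeholder data):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 20, 50                   # p > n: X'X is rank deficient
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Ridge solution (X'X + lambda I)^-1 X'y exists even when p > n
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge.shape)

# Larger lambda shrinks the coefficient vector toward zero
for lam in (0.1, 1.0, 10.0, 100.0):
    b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    print(lam, np.linalg.norm(b))
```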

Ridge trace

[Figure: ridge trace, showing the coefficient paths β̂_i(λ) as λ increases.]

Computation

Take the thin SVD X = UDV', with X n×p, U n×k, D k×k, V p×k; U and V have orthonormal columns, k is the rank of X and the number of nonzero d_i, and D is diagonal with elements d_i. Then

β̂_ridge = V(D² + λI)⁻¹R'y,  where R = UD (n×k)

(D² + λI) is k×k and diagonal, easily inverted. The complexity is of order pk² instead of p³.

Ridge Z

z_i(λ) = β̂_i(λ) / σ̂(β̂_i(λ)) = ω_i'X'y / ( σ̂_ε √(ω_i'X'Xω_i) )

where z_i(λ) is the i-th z-statistic and ω_i is the i-th column of Ω = (X'X + λI)⁻¹. As λ → ∞,

lim_{λ→∞} z_i(λ) = x_i'y / ( σ̂_ε √(x_i'x_i) )

This is equivalent to the pointwise estimation of z_i!
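A sketch verifying that the SVD route gives the same coefficients as the direct formula (random placeholder data):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 1.0

# Direct formula: inverts a p x p matrix, O(p^3)
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# SVD route: (D^2 + lambda I) is diagonal, so inversion is trivial
U, d, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD, k = min(n, p)
beta_svd = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))

print(np.allclose(beta_direct, beta_svd))          # True
```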

Application to Deformation Morphometry

- 8 cognitively impaired and 7 control subjects
- Verbal memory assessed at baseline and 1 yr
- Baseline MRI deformation maps created using B-spline-based nonlinear registration (Studholme 2004)

Pointwise regression: the dependent variables were the deformation maps, and the independent variable was change in verbal memory. Convenient and computationally efficient.

Ridge regression: the dependent variable was change in verbal memory, and the spatial variables from the maps were the independent variables (p 86,68).

Spatial correlation structure

[Figure: deformation associated with delayed memory; color scale −.6 to .6.]

LASSO

Least Absolute Shrinkage and Selection Operator. Definition: a coefficient-shrunken version of the ordinary Least Squares estimate, obtained by minimizing the Residual Sum of Squares subject to the constraint that the sum of the absolute values of the coefficients is no greater than a constant.

Solving LASSO

There is no solution by matrix decomposition or simple linear algebra; iterative methods must be used.
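For illustration, scikit-learn's Lasso fits this by coordinate descent, one such iterative method (random placeholder data):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

# Coordinate descent under the L1 constraint zeroes out most coefficients
model = Lasso(alpha=0.1, max_iter=10000).fit(X, y)
print(np.sum(model.coef_ != 0))   # only a few nonzero coefficients survive
```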