Standardization and Singular Value Decomposition in Canonical Correlation Analysis
Melinda Borello
Johanna Hardin, Advisor
David Bachman, Reader

Submitted to Pitzer College in Partial Fulfillment of the Degree of Bachelor of Arts

April 24, 2013
Department of Mathematics
Abstract

Canonical correlation analysis (CCA) is a type of multivariate analysis based on the correlation between linear combinations of variables in two data sets. Biological applications of CCA include large-scale genomic studies that can have multiple phenotypic or genotypic data sets. In these cases CCA can lead to results that lack interpretability, because CCA considers all variables and such analyses usually involve an enormous number of variables, with the number of genes exceeding tens of thousands. Sparse canonical correlation analysis (SCCA) aims to solve the problem of interpretability by providing sparse solutions. In this paper, I examine the relationship between running CCA and SCCA on raw, unstandardized data and running both methods on data that have been standardized to have mean zero and standard deviation one. I also show how both CCA and SCCA relate to the singular value decomposition (SVD) by examining an algorithm for the SVD in the context of CCA and SCCA.
Contents

Abstract
Acknowledgments
1 Canonical Correlation Analysis
2 Sparse Canonical Correlation Analysis
3 Singular Value Decomposition Algorithm
Bibliography
Acknowledgments

I'd like to thank Johanna Hardin for offering to be my thesis adviser at the last minute, when I thought it was too late for me to write a thesis. Her guidance throughout the research and writing process has been invaluable. I would also like to thank Associate Professor of Mathematics Stephen Garcia for his help with SVD and other linear algebra questions that we came upon throughout the semester. Without some of his explanations, Chapter 3 wouldn't exist. Of course, I would like to thank my mother, for all of my accomplishments are really hers. Thanks, Mom, for everything.
Chapter 1
Canonical Correlation Analysis

Canonical correlation analysis (CCA) measures the relationship between two sets of variables. CCA accomplishes this by focusing on the correlation between a linear combination of the variables in one set and a linear combination of the variables in the other set. Canonical correlation can also be thought of as an extension of bivariate correlation, allowing more than two continuous variables in each set. CCA seeks to answer how the best linear combination of one set of variables relates to the best linear combination of the other set. The following presentation of canonical correlation analysis follows that of Johnson and Wichern (1992).

Consider a group of $p$ variables represented by the $(p \times 1)$ random vector $X$, and a second group of $q$ variables represented by the $(q \times 1)$ random vector $Y$. Assume that $p \le q$. The random vectors have $\operatorname{Cov}(X) = \Sigma_{11}$, $\operatorname{Cov}(Y) = \Sigma_{22}$, and $\operatorname{Cov}(X, Y) = \Sigma_{12}$, with $E(X) = \mu_x$ and $E(Y) = \mu_y$. For coefficient vectors
$a$ and $b$ we can form the linear combinations $G = a'X$ and $H = b'Y$. Then
$$\max_{a,b} \operatorname{Corr}(G, H)$$
is attained by the linear combinations (the first canonical variate pair) $G_1 = a_1'X$ and $H_1 = b_1'Y$. The $k$th pair of canonical variates, $k = 2, 3, \ldots, p$,
$$G_k = a_k'X \quad \text{and} \quad H_k = b_k'Y,$$
where $a_k = \Sigma_{11}^{-1/2} u_k$ and $b_k = \Sigma_{22}^{-1/2} v_k$ for $k = 1, \ldots, p$, maximizes $\operatorname{Corr}(G, H)$ among those linear combinations uncorrelated with the preceding $1, 2, \ldots, k-1$ canonical variables. That is, the second canonical variate pair consists of the linear combinations $G_2$ and $H_2$ that maximize the correlation among all linear combinations uncorrelated with the first canonical variate pair. Then we have
$$\max_{a,b} \operatorname{Corr}(a'X, b'Y) = \operatorname{Corr}(a_k'X, b_k'Y) = \operatorname{Corr}(G_k, H_k) = \rho_k.$$
Here the $(p \times 1)$ vectors $u_1, u_2, \ldots, u_p$ and the $(q \times 1)$ vectors $v_1, v_2, \ldots, v_p$ are the left and right singular vectors, respectively, of the matrix
$$K = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2}.$$
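To make this construction concrete, the following is a minimal numerical sketch (not part of the original thesis) of how one might compute canonical correlations from data via the SVD of $K$. It assumes NumPy, uses sample covariance matrices in place of the population quantities, and the helper names inv_sqrt and canonical_correlations are my own.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

def canonical_correlations(X, Y):
    """Classical CCA via the SVD of K = S11^{-1/2} S12 S22^{-1/2}.

    X is an (n x p) data matrix and Y is (n x q); sample covariances stand in
    for the population quantities Sigma_11, Sigma_12, Sigma_22.
    """
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)   # center each column
    S11 = Xc.T @ Xc / (n - 1)                         # Cov(X)
    S22 = Yc.T @ Yc / (n - 1)                         # Cov(Y)
    S12 = Xc.T @ Yc / (n - 1)                         # Cov(X, Y)
    K = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    U, d, Vt = np.linalg.svd(K)        # singular values d are the canonical correlations
    A = inv_sqrt(S11) @ U              # columns a_k = Sigma_11^{-1/2} u_k
    B = inv_sqrt(S22) @ Vt.T           # columns b_k = Sigma_22^{-1/2} v_k
    return d, A, B
```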
The singular values $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_p$ of the matrix $K$ are the canonical correlations.

In some instances, one may want to work with standardized variables, which allows the variables to be compared to each other more easily. There may be other motivations for standardization as well; for example, standardization can simplify computations. Suppose we standardize the original variables as follows:
$$Z_X = V_{11}^{-1/2}(X - \mu_x) \quad \text{and} \quad Z_Y = V_{22}^{-1/2}(Y - \mu_y),$$
where $V_{11}^{-1/2}$ is the $(p \times p)$ diagonal matrix with one over the standard deviation on its diagonal, i.e.,
$$V_{11}^{-1/2} = \operatorname{diag}\!\left(\frac{1}{\sqrt{\sigma_{x_{11}}}},\, \frac{1}{\sqrt{\sigma_{x_{22}}}},\, \ldots,\, \frac{1}{\sqrt{\sigma_{x_{pp}}}}\right).$$
The matrix $V_{22}^{-1/2}$ is similarly defined as a $(q \times q)$ diagonal matrix with one over the standard deviations of $Y$, $1/\sqrt{\sigma_{y_{ii}}}$, on its diagonal. Here $\rho_{11} = \operatorname{Cov}(Z_X)$, $\rho_{22} = \operatorname{Cov}(Z_Y)$, and $\rho_{12} = \operatorname{Cov}(Z_X, Z_Y)$, with $E(Z_X) = E(Z_Y) = 0$. Let the $(p \times 1)$ vectors $e_1, e_2, \ldots, e_p$ and the $(q \times 1)$ vectors $f_1, f_2, \ldots, f_p$ be the left and right singular vectors of $L$, respectively, where
$$L = \rho_{11}^{-1/2}\, \rho_{12}\, \rho_{22}^{-1/2}.$$
Coefficient vectors $\alpha_k$ and $\beta_k$ form the $k$th pair of canonical variates
$$M_k = \alpha_k' Z_X, \qquad N_k = \beta_k' Z_Y.$$
Then we have
$$\max_{\alpha,\beta} \operatorname{Corr}(\alpha' Z_X, \beta' Z_Y) = \operatorname{Corr}(\alpha_k' Z_X, \beta_k' Z_Y) = \operatorname{Corr}(M_k, N_k) = \rho_k.$$
Here $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_p$ are the singular values of the matrix $L$, and $\alpha_k = \rho_{11}^{-1/2} e_k$ and $\beta_k = \rho_{22}^{-1/2} f_k$.

Theorem 1.1. Given coefficient vectors $\alpha = \rho_{11}^{-1/2} e$ and $\beta = \rho_{22}^{-1/2} f$ from scaled data $Z_X$ and $Z_Y$, and coefficient vectors $a = \Sigma_{11}^{-1/2} u$ and $b = \Sigma_{22}^{-1/2} v$ from raw data $X$ and $Y$, if $V_{11}^{1/2}$ and $V_{22}^{1/2}$ are diagonal matrices with $i$th diagonal elements $\sqrt{\sigma_{x_{ii}}}$ and $\sqrt{\sigma_{y_{ii}}}$, respectively, then $\alpha_k = V_{11}^{1/2} a_k$ and $\beta_k = V_{22}^{1/2} b_k$; i.e., the coefficients from scaled data are equal to scaled coefficients from raw data in classical canonical correlation analysis.

Proof: We will show that $e_k' \rho_{11}^{-1/2} = u_k' \Sigma_{11}^{-1/2} V_{11}^{1/2}$, or equivalently, that $\Sigma_{11}^{-1/2} V_{11}^{1/2} = \rho_{11}^{-1/2}$ and that $e_k = u_k$. By showing that these two equalities hold, we will prove the theorem, since (without loss of generality, arguing for $\alpha_k$)
$$\alpha_k' = e_k' \rho_{11}^{-1/2} = u_k' \Sigma_{11}^{-1/2} V_{11}^{1/2},$$
so that
$$\alpha_k' = a_k' V_{11}^{1/2} = \bigl(V_{11}^{1/2} a_k\bigr)', \qquad \text{i.e.,} \qquad \alpha_k = V_{11}^{1/2} a_k.$$
Consider $\rho_{11}$:
$$\begin{aligned}
\rho_{11} = \operatorname{Cov}(Z_X) &= \operatorname{Cov}\!\left(V_{11}^{-1/2}(X - \mu_x)\right) \\
&= V_{11}^{-1/2} \operatorname{Cov}(X - \mu_x)\, V_{11}^{-1/2} \qquad (1.1) \\
&= V_{11}^{-1/2} \operatorname{Cov}(X)\, V_{11}^{-1/2} \qquad (1.2) \\
&= V_{11}^{-1/2} \Sigma_{11} V_{11}^{-1/2} \\
&= V_{11}^{-1/2} \Sigma_{11}^{1/2} \Sigma_{11}^{1/2} V_{11}^{-1/2}.
\end{aligned}$$
Line (1.1) uses the fact that $V_{11}^{-1/2}$ is diagonal and thus symmetric, so it is equal to its transpose; line (1.2) follows from (1.1) because subtracting the constant $\mu_x$ does not change the covariance. Since $\rho_{11}$ is positive semi-definite, taking the square root of $\rho_{11}$ results in
$$\rho_{11}^{1/2} = V_{11}^{-1/2} \Sigma_{11}^{1/2}, \qquad \text{so} \qquad \rho_{11}^{-1/2} = \Sigma_{11}^{-1/2} V_{11}^{1/2}.$$
A similar argument for $\rho_{22}$ shows that $\rho_{22}^{-1/2} = \Sigma_{22}^{-1/2} V_{22}^{1/2}$.

Next, does $u_k = e_k$? In other words, does the left singular vector of $K$ equal the left singular vector of $L$? We just proved that $\rho_{11}^{-1/2} = \Sigma_{11}^{-1/2} V_{11}^{1/2}$ and $\rho_{22}^{-1/2} = \Sigma_{22}^{-1/2} V_{22}^{1/2}$. Recall that $\rho_{12} = \operatorname{Cov}(Z_X, Z_Y)$. Thus we can
write $L$ as
$$\begin{aligned}
L &= \rho_{11}^{-1/2}\, \rho_{12}\, \rho_{22}^{-1/2} \\
&= \Sigma_{11}^{-1/2} V_{11}^{1/2}\, \operatorname{Cov}(Z_X, Z_Y)\, \Sigma_{22}^{-1/2} V_{22}^{1/2} \\
&= \Sigma_{11}^{-1/2} V_{11}^{1/2}\, \operatorname{Cov}\!\left(V_{11}^{-1/2}(X - \mu_x),\, V_{22}^{-1/2}(Y - \mu_y)\right) \Sigma_{22}^{-1/2} V_{22}^{1/2} \\
&= \Sigma_{11}^{-1/2} V_{11}^{1/2} V_{11}^{-1/2}\, \operatorname{Cov}(X - \mu_x, Y - \mu_y)\, V_{22}^{-1/2} \Sigma_{22}^{-1/2} V_{22}^{1/2} \qquad (1.3) \\
&= \Sigma_{11}^{-1/2}\, \operatorname{Cov}(X, Y)\, V_{22}^{-1/2} \bigl(\Sigma_{22}^{-1/2} V_{22}^{1/2}\bigr)' \qquad (1.4) \\
&= \Sigma_{11}^{-1/2}\, \operatorname{Cov}(X, Y)\, V_{22}^{-1/2} \bigl(V_{22}^{1/2}\bigr)' \bigl(\Sigma_{22}^{-1/2}\bigr)' \qquad (1.5) \\
&= \Sigma_{11}^{-1/2}\, \operatorname{Cov}(X, Y)\, V_{22}^{-1/2} V_{22}^{1/2} \Sigma_{22}^{-1/2} \\
&= \Sigma_{11}^{-1/2}\, \operatorname{Cov}(X, Y)\, \Sigma_{22}^{-1/2} \\
&= \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2} \\
&= K.
\end{aligned}$$
Moving from line (1.3) to (1.4), note that, as proven above, $\Sigma_{22}^{-1/2} V_{22}^{1/2} = \rho_{22}^{-1/2}$, which is positive definite and symmetric, so it may be replaced by its transpose. Line (1.5) follows from (1.4) since $\Sigma_{22}^{-1/2}$ and $V_{22}^{1/2}$ are both symmetric. Since $L = K$, it is clear that $L$ and $K$ have the same singular vectors, since they have the same singular value decomposition. Therefore $V_{11}^{1/2} a_k$ is in fact the coefficient vector $\alpha_k$ for the $k$th canonical variate $M_k$ constructed from the standardized variable $Z_X$, and $V_{22}^{1/2} b_k$ is the coefficient vector $\beta_k$ for the $k$th canonical variate $N_k$ constructed from the standardized variable $Z_Y$.

Claim 1.2. If (i) $a_k$ and $b_k$ maximize $\operatorname{Corr}(a'X, b'Y)$ over all $a$ and $b$ subject to being uncorrelated with $a_i$ and $b_i$ for $i = 1, 2, \ldots, k-1$, and (ii) $\operatorname{Corr}(G_k, H_k) =$
$\rho_k$, then (i) $\max_{\alpha,\beta} \operatorname{Corr}(\alpha' Z_X, \beta' Z_Y) = \operatorname{Corr}(\alpha_k' Z_X, \beta_k' Z_Y)$ subject to $\alpha$ and $\beta$ being uncorrelated with $\alpha_i$ and $\beta_i$ for $i = 1, 2, \ldots, k-1$, and (ii) $\operatorname{Corr}(M_k, N_k) = \rho_k$ as well; i.e., the canonical correlations are unchanged by the standardization.

We will now show that raw and standardized canonical variates produce the same maximum correlation. From the raw data, $G_k$ and $H_k$ are the $k$th pair of canonical variates that maximize $\operatorname{Corr}(G, H)$. If this correlation is equal to $\rho_k$, then
$$\begin{aligned}
\rho_k = \operatorname{Corr}(G_k, H_k) &= \operatorname{Corr}(a_k' X, b_k' Y) \\
&= \operatorname{Corr}\!\left(a_k'(X - \mu_x),\, b_k'(Y - \mu_y)\right) \\
&= \operatorname{Corr}\!\left(a_k' V_{11}^{1/2} V_{11}^{-1/2} (X - \mu_x),\, b_k' V_{22}^{1/2} V_{22}^{-1/2} (Y - \mu_y)\right) \\
&= \operatorname{Corr}\!\left(a_k' V_{11}^{1/2} Z_X,\, b_k' V_{22}^{1/2} Z_Y\right) \\
&= \operatorname{Corr}(\alpha_k' Z_X, \beta_k' Z_Y) \\
&= \operatorname{Corr}(M_k, N_k).
\end{aligned}$$
Thus the canonical correlations are unchanged by the standardization. We know that $G_k$ and $H_k$ maximize the correlation for the raw data, but do $M_k$ and $N_k$ maximize the correlation for the standardized data? Suppose not. Suppose there were a larger correlation, $\rho_k^*$, attainable by a pair of standardized canonical variates. Then $\rho_k^*$ would also be attainable by raw canonical variates, since, as shown above, the correlation of a standardized pair equals the correlation of the corresponding raw pair; this contradicts the fact that $\rho_k$ is the maximum correlation attained by $G_k$ and $H_k$. Hence $M_k$ and $N_k$ do maximize the correlation for the standardized data.
Consider the matrices $K$ and $L$. As shown above, $K = L$, and thus they have the same singular vectors. It follows that they also have the same singular values, $\rho_k$. Furthermore, these singular values are the correlations for the canonical variate pairs (detailed in Chapter 3). Hence the pairs $G_k$ and $H_k$, and $M_k$ and $N_k$, have the same (maximum) correlation value. Ultimately, CCA provides the same results whether one uses scaled data or uses raw data and scales the coefficients by $V_{11}^{1/2}$ and $V_{22}^{1/2}$ (respectively) at the end. Both raw and standardized variables maximize the correlation of linear combinations of the variables from the two data sets.
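As a quick numerical sanity check of Theorem 1.1 and Claim 1.2 (my own illustration, not part of the thesis), the sketch below compares CCA on raw and on standardized data. It assumes NumPy and reuses the hypothetical canonical_correlations helper from earlier in this chapter; the canonical correlations should agree, and the raw-data coefficients, scaled by the sample standard deviations, should match the standardized-data coefficients up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 3, 4
X = rng.normal(size=(n, p)) * np.array([1.0, 5.0, 0.2])   # columns on very different scales
Y = 0.5 * X[:, [0, 1, 2, 0]] + rng.normal(size=(n, q))

d_raw, A_raw, B_raw = canonical_correlations(X, Y)

Z_X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)          # Z_X = V11^{-1/2}(X - mu_x)
Z_Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
d_std, A_std, B_std = canonical_correlations(Z_X, Z_Y)

print(np.allclose(d_raw, d_std))                             # canonical correlations unchanged
A_scaled = X.std(axis=0, ddof=1)[:, None] * A_raw            # alpha_k = V11^{1/2} a_k
print(np.allclose(np.abs(A_scaled), np.abs(A_std)))          # equal up to sign of singular vectors
```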
Chapter 2
Sparse Canonical Correlation Analysis

As presented above, canonical correlation analysis uses all variables from both sets, $X$ and $Y$, to create the canonical vectors. Yet when applied to real data, CCA often fails to produce interpretable results. Data used in microarray analysis and genome-wide linkage analysis usually have an enormous number of variables, with the number of genes exceeding tens of thousands, so results from CCA lack biological interpretability. One way to combat this issue is sparse canonical correlation analysis (SCCA). SCCA helps solve the problem of interpretability by providing sparse sets of associated variables; i.e., the canonical vectors contain sparse loadings. Following a method to select appropriate sparseness parameters, the sparse solution contains the variables that are deemed more
important than others. Thus the solution reduces the dimensionality, which improves interpretability.

The iterative algorithm below is the one presented by Parkhomenko et al. (2009). It uses soft-thresholding as its penalty function. Note that there are a number of different penalty functions one can use for SCCA, which have been outlined and compared by Chalise and Fridley (2012). Following classical CCA, consider two sets of variables $X$ and $Y$, with $p$ variables in $X$ and $q$ variables in $Y$. As before, let $K = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2}$, where $\operatorname{Cov}(X) = \Sigma_{11}$, $\operatorname{Cov}(Y) = \Sigma_{22}$, and $\operatorname{Cov}(X, Y) = \Sigma_{12}$. The first sparse canonical vectors are identified using the following algorithm (a code sketch of this iteration appears after the discussion below):

1. Select sparseness parameters $\lambda_u$ and $\lambda_v$.
2. Select initial values $u^0$ and $v^0$ and set $i = 0$.
3. Update $u$:
   (a) $u^{i+1} \leftarrow K v^i$
   (b) Normalize: $u^{i+1} \leftarrow u^{i+1} / \|u^{i+1}\|$
   (c) Apply soft-thresholding to obtain a sparse solution: $u_j^{i+1} \leftarrow \left(|u_j^{i+1}| - \tfrac{1}{2}\lambda_u\right)_+ \operatorname{Sign}(u_j^{i+1})$ for $j = 1, \ldots, p$
   (d) Normalize: $u^{i+1} \leftarrow u^{i+1} / \|u^{i+1}\|$
4. Update $v$:
   (a) $v^{i+1} \leftarrow K' u^i$
   (b) Normalize: $v^{i+1} \leftarrow v^{i+1} / \|v^{i+1}\|$
   (c) Apply soft-thresholding to obtain a sparse solution: $v_j^{i+1} \leftarrow \left(|v_j^{i+1}| - \tfrac{1}{2}\lambda_v\right)_+ \operatorname{Sign}(v_j^{i+1})$ for $j = 1, \ldots, q$
   (d) Normalize: $v^{i+1} \leftarrow v^{i+1} / \|v^{i+1}\|$
5. $i \leftarrow i + 1$
6. Repeat steps 3–5 until convergence.

Here $(x)_+$ is equal to $x$ if $x \ge 0$ and $0$ if $x < 0$, and
$$\operatorname{sign}(x) = \begin{cases} -1 & \text{if } x < 0 \\ \phantom{-}1 & \text{if } x > 0 \\ \phantom{-}0 & \text{if } x = 0. \end{cases}$$

Parkhomenko et al. (2009) replace $\Sigma_{11}$ and $\Sigma_{22}$ by $\operatorname{diag}(\Sigma_{11})$ and $\operatorname{diag}(\Sigma_{22})$; thus $K$ becomes an approximation of the sample correlation matrix of $X$ and $Y$ that carries none of the information about how the variables within $X$ are correlated or how the variables within $Y$ are correlated. Using $\operatorname{diag}(\Sigma_{11})$ avoids computational problems: the computation of $K$ would otherwise require $(X'X)^{-1}$ and $(Y'Y)^{-1}$, which may not exist when the number of variables is greater than the number of observations (when $p$ or $q$ is larger than $n$). This is common in the biological applications discussed above, where one can have tens of thousands of genes but only a few hundred observations. The authors also select initial values so that $u^0$ is the vector of row means of $K$ and $v^0$ is the vector of column means of $K$, both standardized to have unit length.
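The sketch below (not from the original thesis) is a minimal implementation of the iteration above, assuming NumPy and assuming $\lambda_u$ and $\lambda_v$ are small enough that the thresholded vectors remain nonzero. The function names soft_threshold and sparse_cca, and the convergence tolerance, are my own choices; $K$ is taken as given, for example built with the diagonal-covariance approximation just described.

```python
import numpy as np

def soft_threshold(w, lam):
    """Componentwise (|w_j| - lam/2)_+ * sign(w_j)."""
    return np.maximum(np.abs(w) - 0.5 * lam, 0.0) * np.sign(w)

def sparse_cca(K, lam_u, lam_v, max_iter=1000, tol=1e-6):
    """First pair of sparse canonical vectors via the soft-thresholded iteration."""
    u = K.mean(axis=1)                        # row means of K as starting value
    v = K.mean(axis=0)                        # column means of K
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    for _ in range(max_iter):
        u_new = K @ v                         # step 3(a)
        u_new /= np.linalg.norm(u_new)        # 3(b)
        u_new = soft_threshold(u_new, lam_u)  # 3(c)
        u_new /= np.linalg.norm(u_new)        # 3(d)
        v_new = K.T @ u                       # step 4(a), using the previous u as in the listing
        v_new /= np.linalg.norm(v_new)
        v_new = soft_threshold(v_new, lam_v)
        v_new /= np.linalg.norm(v_new)
        converged = (np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol)
        u, v = u_new, v_new
        if converged:
            break
    return u, v
```

In practice this iteration would be wrapped in the cross-validation loop described next to choose $\lambda_u$ and $\lambda_v$.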
Although the authors assume that $X$ and $Y$ have been standardized to $Z_X$ and $Z_Y$, SCCA applied to raw data produces the same results as SCCA applied to standardized data. As in classical CCA, using scaled and centered data or using raw data does not affect $K$, nor does it affect the left and right singular vectors of $K$. Since the initial values $u^0$ and $v^0$ come from $K$, and since $K = L$, there is no difference in the algorithm between raw and standardized variables. In this context, we are interested in the effect of scaling and centering the variables on the soft-thresholding step.

The sparseness parameters $\lambda_u$ and $\lambda_v$ are chosen using $k$-fold cross-validation, as outlined below (a code sketch follows the list):

1. Choose $\lambda_u$ and $\lambda_v$.
2. Remove $\frac{1}{k}$ of the data (the testing sample).
3. Find canonical coefficients from the remaining $\frac{k-1}{k}$ of the data (the training sample).
4. Find the canonical correlation using the testing sample and the coefficients from the training sample.
5. Repeat steps 2–4 $k$ times and average across the $k$ correlations, keeping this value.
6. Repeat steps 1–5 for new $\lambda_u$ and $\lambda_v$.

From this process, we find the optimal combination of $\lambda_u$ and $\lambda_v$: out of all candidate pairs of sparseness parameters, we keep the pair that corresponds to the highest average test-sample correlation.
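A minimal sketch of this cross-validation loop (my own illustration, not code from Parkhomenko et al.) could look as follows. It assumes NumPy, reuses the hypothetical sparse_cca helper above, and builds $K$ from each split using the diagonal-covariance approximation; for simplicity the test-sample variates are formed directly from the raw test data.

```python
import numpy as np
from itertools import product

def build_K(X, Y):
    """Approximate K: cross-correlations of X and Y columns (diag(S11), diag(S22) in place of S11, S22)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return Xs.T @ Ys / (X.shape[0] - 1)

def choose_lambdas(X, Y, grid_u, grid_v, k=5, seed=0):
    """Pick (lambda_u, lambda_v) with the highest average test-sample correlation."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    best, best_score = None, -np.inf
    for lam_u, lam_v in product(grid_u, grid_v):
        scores = []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            u, v = sparse_cca(build_K(X[train_idx], Y[train_idx]), lam_u, lam_v)
            g, h = X[test_idx] @ u, Y[test_idx] @ v      # canonical variates on the testing sample
            scores.append(abs(np.corrcoef(g, h)[0, 1]))
        if np.mean(scores) > best_score:
            best, best_score = (lam_u, lam_v), np.mean(scores)
    return best, best_score
```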
This process is not affected by scaling the variables, since it looks at the correlation between canonical variates in the testing and training samples, and we have already proven that this correlation is the same for raw and standardized canonical vectors. Thus the choices of sparseness parameters are not affected by the scaling of the variables. (For more on the selection of sparseness parameters, consult Parkhomenko et al. (2009).)

The relationship between the coefficients from raw data and the coefficients from standardized data is the same in the sparse setting as it was in CCA. The difference is that $u$ and $v$ will have sparse loadings, so our coefficient vectors will also be sparse. As before, if $\alpha_k$ and $\beta_k$ are coefficients for the standardized data $Z_X$ and $Z_Y$, and $a_k$ and $b_k$ are coefficients for the raw data $X$ and $Y$, then $\alpha_k = V_{11}^{1/2} a_k$ and $\beta_k = V_{22}^{1/2} b_k$, where $\operatorname{Corr}(\alpha_k' Z_X, \beta_k' Z_Y) = \operatorname{Corr}(a_k' X, b_k' Y)$. That is, Theorem 1.1 holds for SCCA.
Chapter 3
Singular Value Decomposition Algorithm

One may notice that the sparse algorithm is nothing more than the standard iterative algorithm for the singular value decomposition (SVD) with an added soft-thresholding step to obtain a sparse vector. The standard algorithm is:

1. Select initial values $u^0$ and $v^0$ and set $i = 0$.
2. Update $u$:
   (a) $u^{i+1} \leftarrow K v^i$
   (b) Normalize: $u^{i+1} \leftarrow u^{i+1} / \|u^{i+1}\|$
3. Update $v$:
   (a) $v^{i+1} \leftarrow K' u^i$
   (b) Normalize: $v^{i+1} \leftarrow v^{i+1} / \|v^{i+1}\|$
4. $i \leftarrow i + 1$
5. Repeat steps 2–4 until convergence.

Why does this provide the singular vectors of $K$? Recall that the SVD of $K$ is $K = UDV'$, where we choose $U$ and $V$ to be orthogonal matrices, i.e., $U' = U^{-1}$ and $V' = V^{-1}$. Thus the column vectors of $U$ and $V$, $u_i$ and $v_i$, are unit vectors and are the left and right singular vectors of $K$. The matrix $D$ is a diagonal matrix with the singular values of $K$ on its diagonal, i.e., the square roots of the eigenvalues of $K'K$ or $KK'$. Alternately multiplying $v$ by $K$ and $u$ by $K'$ produces the singular vectors and singular values of $K$, once the following is noted:
$$K v_i = U D V' v_i = d_i u_i.$$
Observe that $V'v_i$ is a column vector of zeros except for a $1$ in the $i$th position, since this is equivalent to taking the inner product of $v_i$ with each of the columns of $V$. This then results in $DV'v_i$, a column vector of zeros except for $d_i$ in the $i$th position. Thus $UDV'v_i$ is the $i$th column of $U$ multiplied by the singular value $d_i$. Next we normalize to get a unit vector, giving us
$$\frac{d_i u_i}{\|d_i u_i\|} = \frac{d_i u_i}{d_i \|u_i\|} = \frac{u_i}{\|u_i\|} = u_i.$$
Similarly, $K'u_i = VDU'u_i = d_i v_i$, which we normalize to get $v_i$. This iterative process will produce the largest diagonal entry of $D$, which is the largest singular value of $K$; in the context of CCA and SCCA, this is the largest correlation, $\rho_1$, between the canonical variates. By alternately multiplying $v$ by $K$ and $u$ by $K'$, the starting vectors $u^0$ and $v^0$ are pulled toward the direction associated with the largest singular value of $K$, until $u$ and $v$ become the corresponding singular vectors of $K$.
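As an illustration (not part of the thesis), a bare-bones version of this power iteration for the leading singular triple might look as follows, assuming NumPy; the function name power_iteration_svd is my own. The only deviation from the listing above is that the $v$ update uses the freshly updated $u$ rather than the previous one, a common variant that keeps the signs of the two vectors aligned.

```python
import numpy as np

def power_iteration_svd(K, max_iter=1000, tol=1e-10):
    """Leading singular value and vectors of K by alternating multiplication by K and K'."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=K.shape[0])
    v = rng.normal(size=K.shape[1])
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    for _ in range(max_iter):
        u_new = K @ v                       # u <- Kv, then normalize
        u_new /= np.linalg.norm(u_new)
        v_new = K.T @ u_new                 # v <- K'u, using the freshly updated u
        v_new /= np.linalg.norm(v_new)
        converged = (np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol)
        u, v = u_new, v_new
        if converged:
            break
    d = u @ K @ v                           # largest singular value of K
    return d, u, v
```

Applied to the CCA matrix $K$ from Chapter 1, the returned $d$ is the first canonical correlation $\rho_1$, and $u$ and $v$ are the singular vectors from which $a_1$ and $b_1$ are obtained.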
Notice that if we construct a symmetric matrix $A$ such that
$$A = \begin{pmatrix} 0 & K' \\ K & 0 \end{pmatrix},$$
the problem is reduced to an eigenvalue problem. We wish to solve $Ax = dx$, where $d$ is the dominant singular value of $K$ (which is also the dominant eigenvalue of $A$) and $x$ is the vector $\begin{pmatrix} v \\ u \end{pmatrix}$. Note that applying $A$ to $x$ leads to
$$A \begin{pmatrix} v \\ u \end{pmatrix} = \begin{pmatrix} K'u \\ Kv \end{pmatrix} = \begin{pmatrix} dv \\ du \end{pmatrix}$$
if $u$ and $v$ are the corresponding singular vectors of $K$. Thus putting $K$ and $K'$ into a block matrix results in a symmetric matrix $A$ with dominant eigenvalue $d$ and eigenvector $\begin{pmatrix} v \\ u \end{pmatrix}$. Now, instead of finding singular vectors and singular values, we just need to find the eigenvectors and eigenvalues of $A$.
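A tiny check of this reformulation (my own illustration, assuming NumPy) builds $A$ from a random $K$ and compares its largest eigenvalue with the largest singular value of $K$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 6
K = rng.normal(size=(p, q))

# A = [[0, K'], [K, 0]] acting on x = (v; u)
A = np.block([[np.zeros((q, q)), K.T],
              [K, np.zeros((p, p))]])

eigvals, eigvecs = np.linalg.eigh(A)            # A is symmetric
d_top = np.linalg.svd(K, compute_uv=False)[0]   # largest singular value of K
print(np.isclose(eigvals[-1], d_top))           # dominant eigenvalue of A equals d_1

x = eigvecs[:, -1]                              # eigenvector (v; u), up to scaling
v, u = x[:q], x[q:]
print(np.allclose(K @ v, d_top * u))            # Kv = d u
```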
The above algorithm for the SVD can be used for CCA and SCCA: it produces the components needed for the canonical variates (the singular vectors) and it produces the canonical correlations (the singular values). How does the SVD produce canonical correlations? Recall that in CCA we seek coefficient vectors $a$ and $b$ such that
$$\operatorname{Corr}(G, H) = \frac{a' \Sigma_{12} b}{\sqrt{a' \Sigma_{11} a}\, \sqrt{b' \Sigma_{22} b}}$$
is maximized. Notice that we can reduce this expression as follows:
$$\begin{aligned}
\max_{a,b} \operatorname{Corr}(G, H) &= \max_{a,b} \frac{a' \Sigma_{12} b}{\sqrt{a' \Sigma_{11} a}\, \sqrt{b' \Sigma_{22} b}} \qquad (3.1) \\
&= \max_{u,v} \frac{u' \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2} v}{\sqrt{u' \Sigma_{11}^{-1/2} \Sigma_{11} \Sigma_{11}^{-1/2} u}\, \sqrt{v' \Sigma_{22}^{-1/2} \Sigma_{22} \Sigma_{22}^{-1/2} v}} \qquad (3.2) \\
&= \max_{u,v} \frac{u' K v}{\sqrt{u' I_{p \times p}\, u}\, \sqrt{v' I_{q \times q}\, v}} \\
&= \max_{u,v} \frac{u' K v}{\|u\|\, \|v\|}. \qquad (3.3)
\end{aligned}$$
Line (3.2) follows from (3.1) by the change of variables $u = \Sigma_{11}^{1/2} a$ and $v = \Sigma_{22}^{1/2} b$. Line (3.3) is equivalent to finding $u$ and $v$ that are unit vectors and maximizing over those vectors. Under this condition, (3.3) boils down to $\max_{u,v} u' K v$. Substituting the singular value decomposition for $K$ gives
$$\max_{u,v}\, u' U D V' v. \qquad (3.4)$$
When $u$ and $v$ are singular vectors of $K$, the calculations above show that (3.4) is attained at the first pair:
$$\max_{u,v}\, u' U D V' v = u_1' U D V' v_1 = u_1'\, d_1 u_1 = d_1\, u_1' u_1 = d_1.$$
The first singular vectors of $K$ produce the largest singular value, as we saw in the above algorithm: the iterative process used to find $u_1$ and $v_1$ is always pulled toward the direction associated with the largest singular value of $K$. Thus the SVD produces the canonical correlations.

To summarize, the SVD is a helpful tool that can be used to carry out CCA and SCCA. It provides an algorithm that finds the linear combinations of the variables in each data set that produce the maximum correlation. In both CCA and SCCA, standardizing the variables to have mean zero and standard deviation one does not change the canonical correlations, and the coefficients from the standardized data are equal to the coefficients from the raw data scaled by their standard deviations. Therefore, it is at the discretion of the researcher whether to use raw data or standardized data in CCA or SCCA.
Bibliography

Chalise, P. and Fridley, B. L. (2012). Comparison of penalty functions for sparse canonical correlation analysis. Comput. Statist. Data Anal., 56(2).
Johnson, R. A. and Wichern, D. W. (1992). Applied Multivariate Statistical Analysis. Prentice Hall Inc., Englewood Cliffs, NJ, third edition.
Parkhomenko, E., Tritchler, D., and Beyene, J. (2009). Sparse canonical correlation analysis with application to genomic data integration. Stat. Appl. Genet. Mol. Biol., 8: Art. 1, 36.