EDAMI DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS
Mario Romanazzi

October 29

1 Introduction

An important task in multidimensional data analysis is reduction of complexity. Recalling that data are usually characterized as a set of n objects whose relevant features are described by a set of p variables, complexity reduction can be achieved either by reduction of variables or by reduction of objects. Here we consider the first problem, assuming the variables to be numerical in nature. A first, and obvious, method for simplification of the features is to select a subset of q <= p features able to retain the desired information. A second, more general, method is to look for q <= p transformations of the observed features able to retain the desired information. Principal component analysis (PCA) belongs to this second family of methods and is a typical step when trying to understand the structure of multidimensional numerical data.

2 Preliminaries: redundancy, rank and linear transformations

If we want to preserve the main information given by the observed features, it is clear that complexity reduction is only possible when there is some redundancy in the data. Mathematics offers a useful first notion of redundancy, which is rank. Since p is usually much lower than n, the rank of a data frame can be taken to be the number of linearly independent variables (columns), meaning that, if some variables are linear combinations of the others, then they do not offer new information and can be dropped without losing anything. The rank can be computed as the number of strictly positive singular values of the data frame. Below we examine the rank of the Swiss notes data set.

> bn <- read.table(file=
+   header=TRUE)
> str(bn)
'data.frame': 200 obs. of 7 variables:
 $ Id      : Factor w/ 200 levels "BN1","BN10","BN",..:
 $ Length  : num
 $ Left    : num
 $ Right   : num
 $ Bottom  : num
 $ Top     : num
 $ Diagonal: num
> SVD <- svd(bn[, -1])
> str(SVD)
List of 3
 $ d: num [1:6]
 $ u: num [1:200, 1:6]
 $ v: num [1:6, 1:6]
> # Singular values
> SVD$d
[1]

Here the rank is 6 and equals the number of variables. Let us consider two more variables, that is, the perimeter and the area of each note.

> peri <- 2*bn$Length + bn$Left + bn$Right
> area <- bn$Length * (bn$Left + bn$Right)/2
> bn1 <- data.frame(bn[, -1], peri, area)
> names(bn1) <- c(names(bn[, -1]), "Perimeter", "Area")
> SVD1 <- svd(bn1)
> # Singular values
> SVD1$d
[1]  e  e  e  e  e+00
[6]  e  e  e-13

Now the last singular value is zero, up to precision tolerance, reflecting the fact that the perimeter is a linear combination of the side lengths of the notes. Hence the rank is 7 = p - 1. Note that the addition of the area, being a non-linear transformation of the side lengths, does not contribute to redundancy.

Another useful notion of redundancy is offered by the correlation matrix R = R(X) of a data frame X. An observed p x p correlation matrix R(X) varies between two extreme correlation matrices: the identity matrix I_p = diag(1, ..., 1) and the all-ones matrix J_p = 1_p 1_p^T. The identity matrix corresponds to the situation where the variables are linearly independent; hence there is no redundancy and it is not possible to reduce complexity. The J-matrix corresponds to the seemingly opposite situation where just one variable carries real information, all the others being perfect linear transformations of it. The ranks of I_p and J_p are p and 1, respectively.

Underlying the previous notions of redundancy and rank there is a particular class of data transformations, called linear combinations. We introduce a useful notation to represent a general linear combination and recall some properties. Let X be the n x p numerical matrix of the data and let a = (a_1, ..., a_p)^T be a general p-vector. The linear combination z = (z_1, z_2, ..., z_n)^T associated to the vector a is the transformation

    z = z(a) = Xa.                                                  (1)
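The rank behaviour described above can be reproduced on synthetic data (the values below are simulated for illustration, not the banknote measurements): appending a linear combination of existing columns does not raise the rank, while a non-linear transformation adds a genuinely new direction.

```r
set.seed(4)
X <- matrix(rnorm(20 * 3), ncol = 3)        # 20 objects, 3 independent variables
X1 <- cbind(X,
            lin    = X[, 1] + X[, 2],       # linear combination: redundant
            nonlin = X[, 1] * X[, 2])       # non-linear transformation: not redundant
d <- svd(X1)$d                              # singular values, largest first
r <- sum(d > max(d) * 1e-8)                 # rank = number of non-negligible singular values
r                                           # 4, not 5: only the linear column is redundant
```

The tolerance max(d) * 1e-8 plays the role of the "precision tolerance" mentioned above: the fifth singular value is not exactly zero in floating point, only negligibly small.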
As z_i = a^T x_i = sum_{j=1}^p a_j x_ij, the values of the linear combination can be interpreted as generalized means, the generalization being that the a_j are arbitrary real numbers, whereas for means they must be non-negative numbers summing to one. As an example, the perimeter of the Swiss notes is the linear combination associated to the vector a = (2, 1, 1, 0, 0, 0)^T, and the perimeter of the generic note is z_i = 2 x_i1 + x_i2 + x_i3, i = 1, ..., n. The properties of a linear combination depend on the transformation vector a and on the reference data frame. In particular, the mean and the variance are
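As a small check of the notation z = Xa, the two hypothetical notes below (invented measurements, not the actual banknote data) show the perimeter arising as a matrix-vector product:

```r
# Two hypothetical notes with Length, Left and Right measurements (mm);
# the values are invented for illustration.
X <- matrix(c(215, 130, 130,
              214, 129, 131),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("Length", "Left", "Right")))
a <- c(2, 1, 1)          # coefficients of the perimeter combination
z <- X %*% a             # z_i = 2 x_i1 + x_i2 + x_i3
drop(z)                  # 690 688
```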
    \bar{z} = a^T \bar{x} = \sum_{j=1}^p a_j \bar{x}_j,             (2)

    s_z^2 = a^T S a = \sum_{i,j=1}^p s_{ij} a_i a_j.                (3)

The expression of the variance is particularly important. We regard the variance as the information content (proportional to the L2 norm of the errors about the mean) of the corresponding variable. From (3), the variance of a linear combination is a quadratic form depending on the underlying data through the variances and pairwise covariances of the observed variables. Hence, the variance of a linear combination incorporates the information about spread as well as linear interdependence, filtered by the coefficients a_j. To avoid variance explosion, normalized linear combinations are often considered, where the coefficients a_j satisfy the constraint sum_{j=1}^p a_j^2 = 1. In this case, the range of variation of a is the boundary of the unit-radius hypersphere centered at the origin, instead of the entire Euclidean space. Another property of linear combinations is that they preserve normality, when the data are normally distributed.

3 Principal components

In the previous examples, the vector a of a linear combination was given. But in data analysis the vector a is often determined so as to achieve specific goals. In these situations it is typically a function of the observed data. This is exactly the case of principal components. The principal components of a numerical data frame X are p uncorrelated normalized linear combinations Z_1, Z_2, ..., Z_p, ordered according to an information criterion. The first principal component is the normalized linear combination with maximum variance, the second principal component is the normalized linear combination with maximum variance subject to the constraint of being uncorrelated with the first one, and so on. The last principal component can also be characterized as the normalized linear combination with minimum variance.
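The maximum-variance property of the first principal component can be checked numerically on simulated data: equation (3) gives the variance of any normalized combination a as a^T S a, and no random point on the unit sphere beats the first eigenvector.

```r
set.seed(1)
X <- matrix(rnorm(50 * 3), ncol = 3)        # simulated data, for illustration only
S <- cov(X)
E <- eigen(S)
a1 <- E$vectors[, 1]                        # first principal axis
v1 <- drop(t(a1) %*% S %*% a1)              # its variance = largest eigenvalue
# Compare with 1000 random normalized combinations
vmax <- max(replicate(1000, {
  a <- rnorm(3); a <- a / sqrt(sum(a^2))    # random point on the unit sphere
  drop(t(a) %*% S %*% a)                    # variance of the combination, a' S a
}))
c(v1, vmax)                                 # v1 is never exceeded
```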
The computation of principal components is simple because (a) the vectors a_1, a_2, ..., a_p of the optimal linear combinations are known to be the orthonormal eigenvectors of the covariance matrix of X, and (b) their variances are the corresponding eigenvalues l_1 >= l_2 >= ... >= l_p >= 0. A geometrical interpretation is also available. The principal component transformation is a rotation of p-dimensional space to new axes, called principal axes, corresponding to the directions of maximum variation of the data. For normal data, the principal axes are the axes of the ellipsoids of concentration, that is, the contours of the normal density function. In typical applications, PCA includes three steps:

1. preliminary data transformations, always including column centering of the data frame and sometimes column standardization to unit variance,

2. rotation to principal axes and computation of principal component scores,

3. selection of an optimal subset of q <= p principal components.

The optimality criterion used in the last step is an information criterion. Recalling that principal components are ordered according to decreasing variance, we have to retain enough pc's that their cumulated variance approximates the total variance of the original variables. In practice, we consider the information ratio

    R^2(q) = \sum_{j=1}^q l_j / \sum_{j=1}^p l_j,                   (4)

and choose q so that R^2(q) is sufficiently high. Values in the range 70-80% are considered satisfactory.
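The three steps above can be sketched in a few lines of R; the data are simulated and the 80% threshold is an arbitrary choice for illustration.

```r
set.seed(2)
X  <- matrix(rnorm(100 * 4), ncol = 4)            # simulated 4-variable data
Xc <- scale(X, center = TRUE, scale = FALSE)      # step 1: column centering
E  <- eigen(cov(Xc))                              # step 2: principal axes ...
Z  <- Xc %*% E$vectors                            # ... and pc scores
l  <- E$values                                    # l_1 >= ... >= l_p >= 0
R2 <- cumsum(l) / sum(l)                          # information ratio R^2(q)
q  <- which(R2 >= 0.8)[1]                         # step 3: smallest q with R^2(q) >= 80%
```

By construction the scores in Z are uncorrelated and their variances are the eigenvalues, so R^2(p) = 1 always holds.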
Figure 1: Swiss banknotes. Ordered variances of principal components.

3.1 Worked example: PCA of centered Swiss notes data

Below, PCA of the Swiss notes is performed. We consider first the analysis of column-centered data.

> class <- rep(c(0,1), c(, ))
> col <- rep(c("black", "red"), c(, ))
> pc <- princomp(bn[, -1], cor=FALSE)
> str(pc)
List of 7
 $ sdev    : Named num [1:6]
  ..- attr(*, "names")= chr [1:6] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
 $ loadings: loadings [1:6, 1:6]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "Length" "Left" "Right" "Bottom" ...
  .. ..$ : chr [1:6] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
 $ center  : Named num [1:6]
  ..- attr(*, "names")= chr [1:6] "Length" "Left" "Right" "Bottom" ...
 $ scale   : Named num [1:6]
  ..- attr(*, "names")= chr [1:6] "Length" "Left" "Right" "Bottom" ...
 $ n.obs   : int 200
 $ scores  : num [1:200, 1:6]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
 $ call    : language princomp(x = bn[, -1], cor = FALSE)
 - attr(*, "class")= chr "princomp"
> # pc$loadings: p x p orthogonal matrix of ordered eigenvectors,
> # whose columns are the optimal normalized linear combinations
> # pc$sdev: standard deviations of pc scores (coincident with square roots
> # of the eigenvalues of the covariance matrix)
> # pc$scores: n x p matrix of pc scores or coordinates (linear combination values)
> summary(pc)
Importance of components:
                       Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Standard deviation
Proportion of Variance
Cumulative Proportion
                       Comp.6
Standard deviation
Proportion of Variance
Cumulative Proportion
> plot(pc)
> # Statistical summaries of pc scores
> round(colMeans(pc$scores), 2)
> round(cov(pc$scores), 2)
        Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1
Comp.2
Comp.3
Comp.4
Comp.5
Comp.6
> round(cor(pc$scores), 2)
        Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1
Comp.2
Comp.3
Comp.4
Comp.5
Comp.6
> round(cor(bn[,-1], pc$scores[, 1:6]), 2)
Length
Left
Right
Bottom
Top
Diagonal
> plot(pc$scores[,1:2], pch=20, col=col,
+   xlab="PC1 (66.8%)", ylab="PC2 (20.8%)", main="PCA of Swiss Banknotes Data")
> abline(h=0, v=0, lty="dotted", col="grey")
> text(pc$scores[,1:2], labels=c(1:, 1:), cex=0.6, pos=3)
> pairs(pc$scores, pch=20, col=col)
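A detail worth knowing when reading this output: princomp uses the divisor n (not n - 1) for variances, so pc$sdev^2 equals the eigenvalues of the sample covariance matrix only up to the factor (n - 1)/n. A sketch on simulated data:

```r
set.seed(5)
X <- matrix(rnorm(50 * 4), ncol = 4)              # simulated data
n <- nrow(X)
pc <- princomp(X, cor = FALSE)
l <- eigen(cov(X))$values                         # eigenvalues, divisor n - 1
# princomp variances use divisor n, so rescaling recovers the eigenvalues:
round(unname(pc$sdev^2) * n / (n - 1) - l, 10)    # all (near) zero
```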
Figure 2: Swiss banknotes. Scatter plot of the first two principal components of centered data (black/red: genuine/forged bills).

A discussion of the results is given below.

1. The first two pc's provide a very good approximation of the 6-dimensional data: 66.8% of the total variance is absorbed by the first pc and an additional 20.8% by the second one, corresponding to a cumulative value of 87.6%. Therefore the visualization of the sample on the cartesian plane of the first two pc's is a reliable picture of the original 6-dimensional configuration.

2. The scatter plot of the first two pc's shows some remarkable features. The two classes appear as separate swarms of points, which is important because the information about class composition was NOT explicitly included in the principal component transformation. Moreover, the elongated shape of both clusters (more accentuated for forged bills) suggests within-class negative correlation of the pc scores (recall that, by definition, pc scores are uncorrelated). Finally, outliers are clearly displayed (e.g., observations no. 5 and 70 from the class of genuine bills).

3. The correlation matrix of the principal components with the observed variables is the main tool to obtain an interpretation of the transformation. In the present case the first pc is correlated mainly with Bottom and Diagonal, whereas the second pc is mainly correlated with Top. While this gives a clue about the interpretation of the principal axes, it also suggests that the estimation of the principal axes may be distorted by unbalanced variances of the original variables. This can be the case here, because these three variables have the highest variances. This is why it is generally recommended to apply PCA after data standardization.

3.2 Worked example: PCA of standardized Swiss notes data

For the sake of completeness, we also study the pc transformation of the standardized Swiss notes data.
Note that in this case we look for the stationary points of the function b^T R b = \sum_{i,j=1}^p r_{ij} b_i b_j, where R = (r_{ij}) is the correlation matrix of the data, subject to the constraint b^T b = 1. The solution is given by the orthonormal eigenvectors of R and the corresponding eigenvalues.
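This equivalence between the correlation-matrix eigenproblem and princomp(..., cor=TRUE) can be verified on simulated data; the divisor used for the covariance cancels when passing to correlations, so the variances of the standardized pc's match the eigenvalues of R directly.

```r
set.seed(3)
X <- matrix(rnorm(60 * 3), ncol = 3)     # simulated data
R <- cor(X)
E <- eigen(R)                            # eigenvectors b: stationary points of b' R b, b'b = 1
b1 <- E$vectors[, 1]
drop(t(b1) %*% R %*% b1)                 # equals the largest eigenvalue
pc <- princomp(X, cor = TRUE)
unname(pc$sdev^2)                        # same values as E$values
sum(E$values)                            # eigenvalues of R sum to p = 3
```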
Figure 3: Swiss banknotes. Scatter plot of the first two principal components of standardized data (black/red: genuine/forged bills).

> pc1 <- princomp(bn[, -1], cor=TRUE)
> summary(pc1)
Importance of components:
                       Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Standard deviation
Proportion of Variance
Cumulative Proportion
                       Comp.6
Standard deviation
Proportion of Variance
Cumulative Proportion
> plot(pc1)
> # Statistical summaries of pc scores
> round(colMeans(pc1$scores), 2)
> round(cov(pc1$scores), 2)
        Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1
Comp.2
Comp.3
Comp.4
Comp.5
Comp.6
> round(cor(pc1$scores), 2)
        Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Comp.1
Comp.2
Comp.3
Comp.4
Comp.5
Comp.6
> round(cor(bn[,-1], pc1$scores[, 1:6]), 2)
Length
Left
Right
Bottom
Top
Diagonal
> plot(pc1$scores[,1:2], pch=20, col=col,
+   xlab="PC1 (49.1%)", ylab="PC2 (21.3%)", main="PCA of Swiss Banknotes Data",
+   sub="Standardized Data")
> abline(h=0, v=0, lty="dotted", col="grey")
> text(pc1$scores[,1:2], labels=c(1:, 1:), cex=0.6, pos=3)
> pairs(pc1$scores, pch=20, col=col)

For standardized data, we observe a drop in the variance explained by the first two pc's: 70.4% against the 87.6% obtained on centered data. The interpretation is also different. The first pc depends heavily on all the observed variables except Length. The correlations are positive, except for Diagonal. The second pc depends almost only on Length. Again, class discrimination is good and there is evidence of positive within-class correlation of the pc scores.

3.3 Worked example: PCA of Swiss notes data augmented with perimeter and area of bills

As a final application, we study the effect on the pc transformation of the inclusion of linear and non-linear transformations of the variables. Here we consider the addition of the perimeter and the area of the notes.

> apply(bn1, 2, sd)
   Length      Left     Right    Bottom       Top  Diagonal Perimeter      Area

> pc2 <- princomp(bn1, cor=TRUE)
> summary(pc2)
Importance of components:
                       Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Standard deviation
Proportion of Variance
Cumulative Proportion
                       Comp.6 Comp.7 Comp.8
Standard deviation             e-03  0
Proportion of Variance         e-07  0
Cumulative Proportion          e+00  1
> round(cor(bn1, pc2$scores[, 1:8]), 2)
          Comp.7 Comp.8
Length
Left
Right
Bottom
Top
Diagonal
Perimeter
Area
> plot(pc2$scores[,1:2], pch=20, col=col,
+   xlab="PC1 (52.8%)", ylab="PC2 (24.9%)", main="PCA of Swiss Banknotes Data",
+   sub="Data set augmented with Perimeter and Area")
> abline(h=0, v=0, lty="dotted", col="grey")
> text(pc2$scores[,1:2], labels=c(1:, 1:), cex=0.6, pos=3)
> pairs(pc2$scores, pch=20, col=col)

The results show remarkable differences with respect to the previous versions.

1. Here it is necessary to apply the pc transformation to standardized data, because the variance of the Area variable is clearly dominant.

2. The last eigenvalue is zero because the rank of the augmented data matrix is 7, not 8, as Perimeter is a linear transformation of a subset of the observed variables.

3. The cumulated variance explained by the first two pc's is 77.7%, an intermediate value between the previous results, and the cumulated variance explained by the first three pc's is 88.6%, a very good value.

4. Let us try to interpret the first three pc's, using the correlations with the observed variables. The first, and most important, pc mainly depends on Left, Right, Perimeter and Area (absolute correlations all higher than 0.8, the highest value corresponding to Area), the correlations with the remaining variables being non-negligible but clearly of minor importance. The second pc mainly depends on Length (correlation equal to 0.81), Diagonal (correlation equal to 0.68), Bottom, Top and Perimeter. The third pc can be interpreted as a contrast between Bottom (correlation equal to 0.55) and Top (correlation equal to 0.73).

5. Class separation remains good. Observe that the genuine bills generally lie above the line PC2 = PC1, that is, the bisector of the first and third quadrants.
4 Beyond principal components

Taking linear combinations of the observed variables is a powerful method to explore multidimensional space. Principal components are characterized by the maximum variance property, but different solutions can be obtained by changing the function to be optimized. Two more classical examples arise in multiple linear regression and in discriminant analysis.
Figure 4: Swiss banknotes. Scatter plot of the first two principal components from data augmented with area and perimeter of bills (black/red: genuine/forged bills).

In multiple linear regression we are given p explanatory variables X_1, ..., X_p and a dependent variable Y, and we look for the optimal linear predictor of Y based on X_1, ..., X_p. It turns out that the well-known least squares solution is the linear combination of the (centered) X_1, ..., X_p with maximum squared correlation with the (centered) Y.

In discriminant analysis we are given a partition of the n objects into G classes and we look for the linear combination of the observed features X_1, ..., X_p producing the best separation of the classes. A criterion suggested in 1936 by R. A. Fisher is to maximize the ratio of the between-group variance to the within-group variance. Recall that in the scalar case the within-group variance s^2_W is the weighted mean of the class variances, and the between-group variance s^2_B is the variance of the weighted class means about the overall mean. An important result is that the overall variance is identically equal to the sum of the between-group and the within-group components. The resulting optimally separating linear combinations, called canonical variates, are related to linear discriminant analysis. We illustrate the canonical variate method using the Swiss banknotes data.

> library(MASS)
> ld <- lda(scale(bn[, -1]), grouping=class)
> str(ld)
List of 8
 $ prior  : Named num [1:2]
  ..- attr(*, "names")= chr [1:2] "0" "1"
 $ counts : Named int [1:2]
  ..- attr(*, "names")= chr [1:2] "0" "1"
 $ means  : num [1:2, 1:6]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "0" "1"
  .. ..$ : chr [1:6] "Length" "Left" "Right" "Bottom" ...
 $ scaling: num [1:6, 1]
  ..- attr(*, "dimnames")=List of 2
Figure 5: Swiss banknotes. Scatter plot of the first principal component (horizontal axis) and the first canonical variate (vertical axis) obtained from standardized data (black/red: genuine/forged bills).

  .. ..$ : chr [1:6] "Length" "Left" "Right" "Bottom" ...
  .. ..$ : chr "LD1"
 $ lev    : chr [1:2] "0" "1"
 $ svd    : num 49.1
 $ N      : int 200
 $ call   : language lda(x = scale(bn[, -1]), grouping = class)
 - attr(*, "class")= chr "lda"
> # $scaling: optimal linear combination(s)
> cv <- scale(bn[, -1]) %*% ld$scaling
> plot(pc1$scores[,1], cv, pch=20, col=col,
+   xlab="PC1", ylab="CV1", main="PCA and CVA of Swiss Banknotes Data",
+   sub="Standardized Data")
> abline(h=0, v=0, lty="dotted", col="grey")
> text(pc1$scores[,1], cv, labels=c(1:, 1:), cex=0.6, pos=3)
> round(cor(bn[, -1], cv), 2)
          LD1
Length
Left     0.52
Right    0.61
Bottom   0.80
Top      0.63
Diagonal
> cor(pc1$scores[,1], cv)
     LD1
[1,]
Some remarks are given below.

1. It is clear that the canonical variate achieves optimal separation, with genuine bills assuming negative scores and forged bills assuming positive scores.

2. The interpretation is again obtainable from the correlations with the observed variables. The maximum absolute correlations of the canonical variate are with Diagonal (0.94) and Bottom (0.80).

3. In this case PCA and CVA produce similar results, as shown by the strong linear relation between the first principal component and the canonical variate. But in general it need not be so.
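The scalar variance decomposition underlying Fisher's criterion (overall variance equal to between-group plus within-group, with divisor n throughout) is easy to verify on a toy example with invented numbers:

```r
x <- c(1, 2, 3, 10, 11, 12)                 # invented scores, two well-separated classes
g <- rep(c(0, 1), each = 3)                 # class labels
n <- length(x); m <- mean(x)
ng <- tapply(x, g, length)                  # class sizes
mg <- tapply(x, g, mean)                    # class means
s2  <- mean((x - m)^2)                      # overall variance (divisor n)
s2W <- sum(ng * tapply(x, g, function(v) mean((v - mean(v))^2))) / n   # within-group
s2B <- sum(ng * (mg - m)^2) / n                                        # between-group
c(s2, s2W + s2B)                            # identical: the decomposition holds
```

A large ratio s2B/s2W, as here, is exactly what Fisher's criterion rewards when choosing the canonical variate.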
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationDiscriminant analysis and supervised classification
Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationYORK UNIVERSITY. Faculty of Science Department of Mathematics and Statistics MATH M Test #1. July 11, 2013 Solutions
YORK UNIVERSITY Faculty of Science Department of Mathematics and Statistics MATH 222 3. M Test # July, 23 Solutions. For each statement indicate whether it is always TRUE or sometimes FALSE. Note: For
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationMATH 1553 PRACTICE FINAL EXAMINATION
MATH 553 PRACTICE FINAL EXAMINATION Name Section 2 3 4 5 6 7 8 9 0 Total Please read all instructions carefully before beginning. The final exam is cumulative, covering all sections and topics on the master
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationTable of Contents. Multivariate methods. Introduction II. Introduction I
Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation
More informationPrincipal Component Analysis (PCA) Principal Component Analysis (PCA)
Recall: Eigenvectors of the Covariance Matrix Covariance matrices are symmetric. Eigenvectors are orthogonal Eigenvectors are ordered by the magnitude of eigenvalues: λ 1 λ 2 λ p {v 1, v 2,..., v n } Recall:
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two
More informationData Mining Lecture 4: Covariance, EVD, PCA & SVD
Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems
More informationCollinearity: Impact and Possible Remedies
Collinearity: Impact and Possible Remedies Deepayan Sarkar What is collinearity? Exact dependence between columns of X make coefficients non-estimable Collinearity refers to the situation where some columns
More information(v, w) = arccos( < v, w >
MA322 Sathaye Notes on Inner Products Notes on Chapter 6 Inner product. Given a real vector space V, an inner product is defined to be a bilinear map F : V V R such that the following holds: For all v
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions
More informationLecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26
Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1
More informationLinear Algebra in Actuarial Science: Slides to the lecture
Linear Algebra in Actuarial Science: Slides to the lecture Fall Semester 2010/2011 Linear Algebra is a Tool-Box Linear Equation Systems Discretization of differential equations: solving linear equations
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationSTAT 730 Chapter 1 Background
STAT 730 Chapter 1 Background Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 27 Logistics Course notes hopefully posted evening before lecture,
More informationBasics of Multivariate Modelling and Data Analysis
Basics of Multivariate Modelling and Data Analysis Kurt-Erik Häggblom 6. Principal component analysis (PCA) 6.1 Overview 6.2 Essentials of PCA 6.3 Numerical calculation of PCs 6.4 Effects of data preprocessing
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Linear Discriminant Analysis Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Principal
More informationPreprocessing & dimensionality reduction
Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016
More informationShort Answer Questions: Answer on your separate blank paper. Points are given in parentheses.
ISQS 6348 Final exam solutions. Name: Open book and notes, but no electronic devices. Answer short answer questions on separate blank paper. Answer multiple choice on this exam sheet. Put your name on
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationNeuroscience Introduction
Neuroscience Introduction The brain As humans, we can identify galaxies light years away, we can study particles smaller than an atom. But we still haven t unlocked the mystery of the three pounds of matter
More informationLecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides
Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationR in Linguistic Analysis. Wassink 2012 University of Washington Week 6
R in Linguistic Analysis Wassink 2012 University of Washington Week 6 Overview R for phoneticians and lab phonologists Johnson 3 Reading Qs Equivalence of means (t-tests) Multiple Regression Principal
More informationBasic Concepts in Matrix Algebra
Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1
More informationLecture Notes 2: Matrices
Optimization-based data analysis Fall 2017 Lecture Notes 2: Matrices Matrices are rectangular arrays of numbers, which are extremely useful for data analysis. They can be interpreted as vectors in a vector
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationNumerical Methods I Singular Value Decomposition
Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)
More informationLinear Algebra & Geometry why is linear algebra useful in computer vision?
Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia
More informationCOMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare
COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of
More informationCOMP 558 lecture 18 Nov. 15, 2010
Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to
More informationFACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING
FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING Vishwanath Mantha Department for Electrical and Computer Engineering Mississippi State University, Mississippi State, MS 39762 mantha@isip.msstate.edu ABSTRACT
More informationReview problems for MA 54, Fall 2004.
Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on
More informationExercises * on Principal Component Analysis
Exercises * on Principal Component Analysis Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 207 Contents Intuition 3. Problem statement..........................................
More informationj=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.
Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u
More informationLecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016
Lecture 8 Principal Component Analysis Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 13, 2016 Luigi Freda ( La Sapienza University) Lecture 8 December 13, 2016 1 / 31 Outline 1 Eigen
More informationThe Singular Value Decomposition
The Singular Value Decomposition Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) SVD Fall 2015 1 / 13 Review of Key Concepts We review some key definitions and results about matrices that will
More informationVector Space Models. wine_spectral.r
Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 2017-2018 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI
More informationLinear Algebra. Session 12
Linear Algebra. Session 12 Dr. Marco A Roque Sol 08/01/2017 Example 12.1 Find the constant function that is the least squares fit to the following data x 0 1 2 3 f(x) 1 0 1 2 Solution c = 1 c = 0 f (x)
More informationLinear Algebra & Geometry why is linear algebra useful in computer vision?
Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia
More informationPrincipal Component Analysis
I.T. Jolliffe Principal Component Analysis Second Edition With 28 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition Acknowledgments List of Figures List of Tables
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More informationPrincipal Components Theory Notes
Principal Components Theory Notes Charles J. Geyer August 29, 2007 1 Introduction These are class notes for Stat 5601 (nonparametrics) taught at the University of Minnesota, Spring 2006. This not a theory
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 6 1 / 22 Overview
More information1 9/5 Matrices, vectors, and their applications
1 9/5 Matrices, vectors, and their applications Algebra: study of objects and operations on them. Linear algebra: object: matrices and vectors. operations: addition, multiplication etc. Algorithms/Geometric
More information