Linear Algebra in a Nutshell: PCA
Baroni & Evert


PCA in a Nutshell
Marco Baroni & Stefan Evert
Institute of Cognitive Science, University of Osnabrück, Germany
stefan.evert@uos.de
Rovereto, 27 March 2007

What is PCA?

PCA can be seen as a dimensionality reduction technique that finds the inherent underlying dimensions of a data set. It exploits correlations between the variables (coordinates). PCA is essentially the same as SVD and LSA, but the rationale behind the procedure becomes clearer in the PCA approach.

Example data set: a term-term word space. Co-occurrence data were extracted from the BNC for nouns as direct objects of two verbs. The data set has k = 111 nouns with f ≥ 20 (nouns that occur with either verb). The vector coordinates are association scores (modified logarithmic Dice coefficient), so there are n = 2 dimensions. Sample nouns from the data table: bond, cigarette, dress, freehold, land, number, per, pub, share, system.

Intuitive expectation: the associations of a noun with the two verbs should be correlated. Commodities tend to have high associations with both verbs, and non-commodities have low associations with both. The main inherent dimension should therefore be a combination of the two association scores. The secondary dimension has a less clear interpretation and will typically be omitted from a semantic space (dimensionality reduction). Real-life word spaces, of course, have many more dimensions and more than a single interesting one.
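PCA exploits the correlation between the coordinates. As a rough illustration, the sketch below computes the Pearson correlation between two columns of association scores. The numbers are invented for illustration and are not the BNC scores from the example. `pearson` is a hypothetical helper that uses only the Python standard library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical association scores of five nouns with two verbs (made-up numbers).
verb1 = [2.0, 1.5, 0.2, 0.1, 1.8]
verb2 = [1.9, 1.2, 0.4, 0.0, 2.1]
r = pearson(verb1, verb2)
print(round(r, 3))
```

A value close to +1 means that nouns with a high score for one verb tend to have a high score for the other. In that situation, a single combined dimension captures most of the spread.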

The variance of a data set

The rationale behind PCA is to find the dimensions that give the best explanation for the spread, or variance, of the data. The variance of a set of k vectors $\mathbf{x}_1, \dots, \mathbf{x}_k$ follows the equations for one-dimensional data (you remember those, right?):

$$\sigma^2 = \frac{1}{k-1} \sum_{i=1}^{k} \lVert \mathbf{x}_i - \boldsymbol{\mu} \rVert^2, \qquad \boldsymbol{\mu} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{x}_i$$

Centering the data

The variance is easier to calculate if we center the data so that $\boldsymbol{\mu} = \mathbf{0}$.

[Figure: the example data before and after centering, and the variance of the centered data]

For centered data, and for our example:

$$\sigma^2 = \frac{1}{k-1} \sum_{i=1}^{k} \lVert \mathbf{x}_i \rVert^2 = 1.26$$
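The two variance formulas can be checked against each other. This is a minimal sketch that assumes data points are plain Python lists of coordinates. The data are hypothetical numbers, not the 111-noun example, and the helper names are illustrative.

```python
def mean_vector(points):
    """mu = (1/k) * sum of the k points, computed coordinate by coordinate."""
    k = len(points)
    return [sum(p[j] for p in points) / k for j in range(len(points[0]))]

def center(points):
    """Subtract the mean vector so that the centered data have mu = 0."""
    mu = mean_vector(points)
    return [[x - m for x, m in zip(p, mu)] for p in points]

def variance(points):
    """sigma^2 = 1/(k-1) * sum_i ||x_i - mu||^2 (works for any data)."""
    k = len(points)
    mu = mean_vector(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in points) / (k - 1)

def variance_of_centered(points):
    """Shortcut 1/(k-1) * sum_i ||x_i||^2, valid only when mu = 0."""
    k = len(points)
    return sum(sum(x * x for x in p) for p in points) / (k - 1)

# Hypothetical 2-D data (made-up numbers, not the BNC example).
data = [[2.0, 1.9], [1.5, 1.2], [0.2, 0.4], [0.1, 0.0], [1.8, 2.1]]
centered = center(data)
print(variance(data), variance_of_centered(centered))
```

Centering does not change the variance. It only lets us drop $\boldsymbol{\mu}$ from every later formula.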

The PCA approach

We want to reduce the dimensionality of the data without losing variance. Intuitively, we want to preserve the distances between the points as far as possible. If we reduced the data set to just a single dimension, which dimension would still have the highest variance? Mathematically, we project the points onto a line through the origin and calculate the variance on this line. We'll see in a moment how to calculate the projections, but first, let us look at a few examples.

Projection and preserved variance: examples

[Figure: projections of the data onto three lines through the origin, preserving variance 0.36, 0.72 and 0.9 respectively]
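Before the closed-form solution, a brute-force version of the same idea can help: parametrize each line through the origin by an angle, project the centered points onto it, and keep the angle with the highest preserved variance. This is only a sketch for 2-D data with made-up, already centered points. The angle scan is an illustrative stand-in, not part of PCA itself.

```python
import math

def projected_variance(centered, angle):
    """Variance preserved when centered 2-D points are projected onto the line
    through the origin with unit direction v = (cos angle, sin angle)."""
    v = (math.cos(angle), math.sin(angle))
    k = len(centered)
    # <x_i, v> is the position of the projected point on the line
    return sum((x * v[0] + y * v[1]) ** 2 for x, y in centered) / (k - 1)

def best_direction(centered, steps=3600):
    """Brute-force scan of angles in [0, pi); returns (angle, preserved variance)."""
    angles = [math.pi * s / steps for s in range(steps)]
    angle = max(angles, key=lambda a: projected_variance(centered, a))
    return angle, projected_variance(centered, angle)

# Hypothetical centered data (made-up; each column sums to zero).
centered = [[0.88, 0.78], [0.38, 0.08], [-0.92, -0.72], [-1.02, -1.12], [0.68, 0.98]]
total = sum(x * x + y * y for x, y in centered) / (len(centered) - 1)
angle, best = best_direction(centered)
print(math.degrees(angle), best, total)
```

Angles in [0, π) cover every line through the origin, because the angles θ and θ + π describe the same line. The preserved variance never exceeds the total variance.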

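The mathematics of projections

A line through the origin can be described by a unit vector $\mathbf{v}$ with $\lVert \mathbf{v} \rVert = 1$. Given a point $\mathbf{x}$ and the corresponding unit vector $\mathbf{x}' = \mathbf{x} / \lVert \mathbf{x} \rVert$, the angle $\varphi$ between $\mathbf{x}$ and the line satisfies

$$\cos\varphi = \langle \mathbf{x}', \mathbf{v} \rangle = \frac{\langle \mathbf{x}, \mathbf{v} \rangle}{\lVert \mathbf{x} \rVert}$$

[Figure: point x, unit vectors x' and v (length 1) at angle φ, and the projection P_v x = ⟨x, v⟩ v]

By trigonometry, the position of the projected point on the line is $\lVert \mathbf{x} \rVert \cos\varphi = \lVert \mathbf{x} \rVert \, \langle \mathbf{x}', \mathbf{v} \rangle = \langle \mathbf{x}, \mathbf{v} \rangle$. The projected point in the original space is $\langle \mathbf{x}, \mathbf{v} \rangle \, \mathbf{v}$. The amount of preserved variance is the one-dimensional variance on the line, where the data are still centered:

$$\sigma_{\mathbf{v}}^2 = \frac{1}{k-1} \sum_{i=1}^{k} \langle \mathbf{x}_i, \mathbf{v} \rangle^2$$

The covariance matrix

We want to find the direction $\mathbf{v}$ with maximal $\sigma_{\mathbf{v}}^2$. To simplify the repeated calculation of $\sigma_{\mathbf{v}}^2$, we rewrite it as

$$\sigma_{\mathbf{v}}^2 = \frac{1}{k-1} \sum_{i=1}^{k} \langle \mathbf{x}_i, \mathbf{v} \rangle^2 = \frac{1}{k-1} \sum_{i=1}^{k} (\mathbf{x}_i^T \mathbf{v})^T (\mathbf{x}_i^T \mathbf{v}) = \frac{1}{k-1} \sum_{i=1}^{k} \mathbf{v}^T \mathbf{x}_i \mathbf{x}_i^T \mathbf{v} = \mathbf{v}^T \Bigl( \underbrace{\frac{1}{k-1} \sum_{i=1}^{k} \mathbf{x}_i \mathbf{x}_i^T}_{=:\,C} \Bigr) \mathbf{v} = \mathbf{v}^T C \mathbf{v}$$

C is the covariance matrix of the data points. It is a square n × n matrix (2 × 2 in our example). The variance preserved after projection onto a line $\mathbf{v}$ can therefore be calculated as $\sigma_{\mathbf{v}}^2 = \mathbf{v}^T C \mathbf{v}$. The original variance of the data set is the trace of C, the sum of its diagonal entries $C_{jj} = \sigma_j^2$ (the variances of the individual coordinates):

$$\sigma^2 = \operatorname{tr}(C) = C_{11} + C_{22} + \dots + C_{nn}, \qquad C = \begin{bmatrix} \sigma_1^2 & C_{12} & \cdots & C_{1n} \\ C_{21} & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & C_{n-1,n} \\ C_{n1} & \cdots & C_{n,n-1} & \sigma_n^2 \end{bmatrix}$$

Maximizing the preserved variation

For our example data, we want to find the axis $\mathbf{v}_1$ that preserves the largest amount of variation by maximizing $\mathbf{v}_1^T C \mathbf{v}_1$. For higher-dimensional data, we also want to find the axis $\mathbf{v}_2$ of second highest variation, and so on. This search has to be constrained: $\mathbf{v}_2$ must be orthogonal to $\mathbf{v}_1$, i.e. $\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = 0$, and the same holds for $\mathbf{v}_3$ etc. A result from linear algebra solves this problem easily. Since C is a symmetric matrix ($C^T = C$), it has an eigenvalue decomposition with orthogonal eigenvectors $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ and corresponding eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$.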
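The identity $\sigma_{\mathbf{v}}^2 = \mathbf{v}^T C \mathbf{v}$ and the trace formula can be verified numerically. The sketch below builds C as the average of the outer products $\mathbf{x}_i \mathbf{x}_i^T$, using the $k-1$ divisor of the slides. It then compares the quadratic form with the variance computed directly from the projections. The centered data and all helper names are hypothetical.

```python
import math

def covariance_matrix(centered):
    """C = 1/(k-1) * sum_i x_i x_i^T for centered points x_i (an n x n matrix)."""
    k, n = len(centered), len(centered[0])
    return [[sum(p[a] * p[b] for p in centered) / (k - 1) for b in range(n)]
            for a in range(n)]

def quadratic_form(C, v):
    """v^T C v"""
    n = len(v)
    return sum(v[a] * C[a][b] * v[b] for a in range(n) for b in range(n))

def projected_variance(centered, v):
    """1/(k-1) * sum_i <x_i, v>^2, computed directly from the projections."""
    k = len(centered)
    return sum(sum(x * w for x, w in zip(p, v)) ** 2 for p in centered) / (k - 1)

def trace(C):
    """Sum of the diagonal entries, i.e. the total variance sigma^2."""
    return sum(C[j][j] for j in range(len(C)))

# Hypothetical centered data (made-up; each column sums to zero).
centered = [[0.88, 0.78], [0.38, 0.08], [-0.92, -0.72], [-1.02, -1.12], [0.68, 0.98]]
C = covariance_matrix(centered)
v = [math.cos(0.3), math.sin(0.3)]  # an arbitrary unit vector
print(quadratic_form(C, v), projected_variance(centered, v), trace(C))
```

Once C is known, evaluating a new direction costs O(n²) operations instead of a pass over all k data points.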

The eigenvalue decomposition of C

The eigenvalue decomposition of C can also be written in the form

$$C = U D U^T$$

where U is an orthogonal matrix containing the eigenvectors as columns and $D = \operatorname{Diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix of eigenvalues:

$$U = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}$$

Note that both U and D are n × n square matrices. Now we have

$$\sigma_{\mathbf{v}}^2 = \mathbf{v}^T C \mathbf{v} = \mathbf{v}^T U D U^T \mathbf{v} = (U^T \mathbf{v})^T D \, (U^T \mathbf{v}) = \mathbf{y}^T D \mathbf{y}$$

Here $\mathbf{y} = U^T \mathbf{v} = [y_1, y_2, \dots, y_n]^T$ are the coordinates of $\mathbf{v}$ in the basis of eigenvectors of C. We have $\lVert \mathbf{y} \rVert = 1$ because the orthogonal matrix $U^T$ is an isometry. We want to maximize

$$\mathbf{v}^T C \mathbf{v} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \dots + \lambda_n y_n^2$$

under the constraint $y_1^2 + y_2^2 + \dots + y_n^2 = 1$. Since $\lambda_1$ is the largest eigenvalue, $\sum_j \lambda_j y_j^2 \leq \lambda_1 \sum_j y_j^2 = \lambda_1$. The obvious solution is therefore $\mathbf{y} = [1, 0, \dots, 0]^T$. This corresponds to $\mathbf{v} = \mathbf{a}_1$, the first eigenvector of C, with preserved variance $\sigma_{\mathbf{v}}^2 = \mathbf{a}_1^T C \mathbf{a}_1 = \lambda_1$.

The second principal component

To find the dimension of second highest variance, we look for an axis $\mathbf{v}$ orthogonal to $\mathbf{a}_1$. Since $U^T$ is an orthogonal matrix, the coordinates $\mathbf{y} = U^T \mathbf{v}$ have to be orthogonal to the first axis $[1, 0, \dots, 0]^T$, i.e. $\mathbf{y} = [0, y_2, \dots, y_n]^T$. In other words, we have to maximize

$$\mathbf{v}^T C \mathbf{v} = \lambda_2 y_2^2 + \dots + \lambda_n y_n^2$$

under the constraints $y_1 = 0$ and $y_2^2 + \dots + y_n^2 = 1$. Again, the obvious solution is $\mathbf{y} = [0, 1, 0, \dots, 0]^T$. This corresponds to $\mathbf{v} = \mathbf{a}_2$, the second eigenvector of C, with preserved variance $\sigma_{\mathbf{v}}^2 = \lambda_2$. The third, fourth, ... axes are found in the same way.

The principal components

The eigenvectors $\mathbf{a}_i$ of the covariance matrix C are called the principal components of the data set. The amount of variance preserved (or "explained") by the i-th principal component is given by the eigenvalue $\lambda_i$. Since $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$, the first principal component preserves the largest amount of variation, the second the next largest, and so on. The coordinates of a point $\mathbf{x}$ in PCA space are given by $U^T \mathbf{x}$; these are the projections onto the principal components. For dimensionality reduction, only the first l principal components (those with the highest variance) are retained, and the other dimensions in PCA space are dropped.
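In practice, the eigenvalue decomposition is left to a linear algebra library. To show that nothing beyond orthogonal rotations is involved, here is a sketch of the classical cyclic Jacobi method for symmetric matrices. This textbook algorithm is swapped in for illustration; the slides do not prescribe it. `jacobi_eigen` is a hypothetical helper. It returns the eigenvalues in decreasing order and the matrix U with the eigenvectors as columns, so that $C = U D U^T$.

```python
import math

def jacobi_eigen(C, tol=1e-20, max_sweeps=100):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix C.
    Returns (lambdas, U) with lambda_1 >= ... >= lambda_n and the unit
    eigenvector a_j in column j of U, so that C = U D U^T."""
    n = len(C)
    A = [list(map(float, row)) for row in C]          # working copy, becomes D
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                                 # (almost) diagonal: done
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation angle (|theta| <= pi/4) that makes the new A[p][q] zero
                diff = A[q][q] - A[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, A[p][q])
                else:
                    theta = 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                    # A <- A G (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                    # A <- G^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                    # U <- U G accumulates eigenvectors
                    ukp, ukq = U[k][p], U[k][q]
                    U[k][p], U[k][q] = c * ukp - s * ukq, s * ukp + c * ukq
    order = sorted(range(n), key=lambda j: A[j][j], reverse=True)
    lambdas = [A[j][j] for j in order]
    U_sorted = [[U[i][j] for j in order] for i in range(n)]
    return lambdas, U_sorted

C = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # small symmetric example
lambdas, U = jacobi_eigen(C)
print(lambdas)
```

Each rotation is orthogonal, so the trace (the total variance) is unchanged throughout. Only the off-diagonal mass shrinks until $U^T C U$ is diagonal.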
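For n = 2, as in the word-space example, the whole procedure fits in a few lines. The sketch below computes C, solves the 2 × 2 eigenproblem in closed form, and maps each centered point to its PCA coordinates $U^T \mathbf{x}$, keeping only the first l components. The data and helper names are hypothetical. The closed form applies only to symmetric 2 × 2 matrices.

```python
import math

def covariance_matrix(centered):
    """C = 1/(k-1) * sum_i x_i x_i^T for centered points."""
    k, n = len(centered), len(centered[0])
    return [[sum(p[a] * p[b] for p in centered) / (k - 1) for b in range(n)]
            for a in range(n)]

def eigen_2x2_symmetric(C):
    """Closed-form eigen-decomposition of a symmetric 2x2 matrix C.
    Returns ([lambda_1, lambda_2], U) with lambda_1 >= lambda_2 and the unit
    eigenvectors a_1 = (c, s), a_2 = (-s, c) as the columns of U."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    mid = (a + d) / 2
    rad = math.hypot((a - d) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - d)   # direction of maximal variance
    c, s = math.cos(theta), math.sin(theta)
    return [mid + rad, mid - rad], [[c, -s], [s, c]]

def pca_coordinates(centered, U, l):
    """First l coordinates of U^T x for each centered point x, i.e. the
    projections onto the first l principal components."""
    n = len(U)
    return [[sum(U[i][j] * x[i] for i in range(n)) for j in range(l)]
            for x in centered]

# Hypothetical centered data (made-up; each column sums to zero).
centered = [[0.88, 0.78], [0.38, 0.08], [-0.92, -0.72], [-1.02, -1.12], [0.68, 0.98]]
C = covariance_matrix(centered)
lambdas, U = eigen_2x2_symmetric(C)
reduced = pca_coordinates(centered, U, 1)      # keep only the first component
explained = lambdas[0] / sum(lambdas)          # share of the total variance
print(lambdas, explained)
```

The variance of the one-dimensional coordinates equals $\lambda_1$. Keeping both components preserves the full variance $\operatorname{tr}(C)$, which illustrates that dropping components only discards the variance carried by the smaller eigenvalues.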

PCA example in R

[Figure: the nouns of the data set plotted in the two-dimensional word space, with labels such as liquor, advertising, insurance, food, stock, share, property, land, ticket and house]

> pca <- prcomp(m)
> print(summary(pca))
Importance of components:
                         PC1  PC2
Standard deviation
Proportion of Variance
Cumulative Proportion
> print(pca)
Standard deviations:
[1]
Rotation:
   PC1 PC2
> head(pca$x)
            PC1 PC2
acre
advertising
amount
arm
asset
bag
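Here is a rough Python counterpart of the rows that `summary(prcomp(m))` reports, for two-column data. It relies on the documented behaviour of prcomp. prcomp centers the data by default, and its standard deviations are the square roots of the eigenvalues of the covariance matrix. prcomp actually computes them through a singular value decomposition, so results should agree up to rounding. The function name and the toy matrix are hypothetical and are not the data behind the transcript.

```python
import math
from itertools import accumulate

def importance_of_components(points):
    """Standard deviation, proportion of variance and cumulative proportion of
    the two principal components of 2-column data (cf. R's summary(prcomp(m)))."""
    k = len(points)
    mx = sum(p[0] for p in points) / k          # center the data, as prcomp does by default
    my = sum(p[1] for p in points) / k
    xc = [(p[0] - mx, p[1] - my) for p in points]
    c11 = sum(x * x for x, _ in xc) / (k - 1)
    c22 = sum(y * y for _, y in xc) / (k - 1)
    c12 = sum(x * y for x, y in xc) / (k - 1)
    mid, rad = (c11 + c22) / 2, math.hypot((c11 - c22) / 2, c12)
    lambdas = [mid + rad, mid - rad]            # eigenvalues of C, largest first
    sdev = [math.sqrt(max(lam, 0.0)) for lam in lambdas]  # guard against tiny negative rounding errors
    total = sum(lambdas)
    proportion = [lam / total for lam in lambdas]
    cumulative = list(accumulate(proportion))
    return sdev, proportion, cumulative

# Hypothetical noun-by-verb scores (made-up numbers, not the BNC data).
m = [[2.0, 1.9], [1.5, 1.2], [0.2, 0.4], [0.1, 0.0], [1.8, 2.1]]
sdev, proportion, cumulative = importance_of_components(m)
print(sdev, proportion, cumulative)
```

"Proportion of Variance" is simply $\lambda_i / \operatorname{tr}(C)$, the share of the total variance explained by the i-th principal component.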
