Principal Component Analysis (PCA)

PCA is a widely used statistical tool for dimension reduction. The objective of PCA is to find common factors, the so-called principal components, in the form of linear combinations of the variables under investigation, and to rank them according to their importance.

Our starting point consists of T observations on N variables, which will be arranged in a T × N matrix R,

    R = \begin{pmatrix}
          r_{11} & r_{21} & \cdots & r_{N1} \\
          r_{12} & r_{22} & \cdots & r_{N2} \\
          \vdots & \vdots &        & \vdots \\
          r_{1T} & r_{2T} & \cdots & r_{NT}
        \end{pmatrix}.

That is, r_{it} is the return of asset i at time t. Usually centered data are used, so that R'R/(T-1) is the sample covariance matrix (or correlation matrix) of the returns under study.
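As a minimal sketch of this setup (assuming a NumPy array `returns` of shape T × N filled with hypothetical data), centering the columns and forming R'R/(T-1) reproduces the sample covariance matrix:

```python
import numpy as np

# Hypothetical data: T = 60 monthly returns on N = 24 assets.
rng = np.random.default_rng(0)
returns = rng.normal(size=(60, 24))

T, N = returns.shape
R = returns - returns.mean(axis=0)        # centered (demeaned) returns

# With centered data, R'R/(T-1) is the sample covariance matrix.
S = R.T @ R / (T - 1)
assert np.allclose(S, np.cov(returns, rowvar=False))
```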

The First Principal Component

Let us start with one variable, say p. Variable p takes T values, to be arranged in a column vector p = [p_1, ..., p_T]'. p is not yet determined, but let us proceed as if it were. Then our approximation takes the form R ≈ pa', where a is an N-dimensional column vector, i.e.,

    \begin{pmatrix}
      r_{11} & r_{21} & \cdots & r_{N1} \\
      r_{12} & r_{22} & \cdots & r_{N2} \\
      \vdots & \vdots &        & \vdots \\
      r_{1T} & r_{2T} & \cdots & r_{NT}
    \end{pmatrix}
    \approx
    \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_T \end{pmatrix}
    \begin{pmatrix} a_1 & \cdots & a_N \end{pmatrix}
    =
    \begin{pmatrix}
      p_1 a_1 & p_1 a_2 & \cdots & p_1 a_N \\
      p_2 a_1 & p_2 a_2 & \cdots & p_2 a_N \\
      \vdots  & \vdots  &        & \vdots  \\
      p_T a_1 & p_T a_2 & \cdots & p_T a_N
    \end{pmatrix}.

Thus, r_{it} is approximated by p_t a_i. The matrix of discrepancies is R - pa'. Our criterion for choosing p and a will be to select these vectors such that the sum of squares of all T·N discrepancies is minimized, i.e.,

    \sum_{i=1}^{N} \sum_{t=1}^{T} (r_{it} - p_t a_i)^2 = \mathrm{tr}[(R - pa')'(R - pa')],    (1)

using property (14) of the trace (see Appendix). Note that the product pa' remains unchanged when p is multiplied by some scalar c ≠ 0 and a by 1/c. By imposing

    \sum_{t=1}^{T} p_t^2 = p'p = 1,    (2)

we obtain uniqueness except for sign.

Then our objective function (1) becomes

    S = \mathrm{tr}[(R - pa')'(R - pa')]
      = \mathrm{tr}(R'R) - \mathrm{tr}(ap'R) - \mathrm{tr}(R'pa') + \mathrm{tr}(a \underbrace{p'p}_{=1} a')
      = \mathrm{tr}(R'R) - 2p'Ra + a'a,    (3)

using that, from (13),

    \mathrm{tr}(ap'R) = \mathrm{tr}(p'Ra) = p'Ra,
    \mathrm{tr}(R'pa') = \mathrm{tr}(pa'R') = \mathrm{tr}(a'R'p) = a'R'p = p'Ra,

and tr(aa') = tr(a'a) = a'a.

Differentiating (3) with respect to a (for given p) and setting the derivative equal to zero,

    \frac{\partial S}{\partial a} = -2R'p + 2a = 0,

gives

    a = R'p.    (4)

Now substitute (4) in the objective function (3) to obtain

    S = \mathrm{tr}(R'R) - p'RR'p,

showing that our new task is to maximize p'RR'p with respect to p, subject to (2). The Lagrangian is

    L = p'RR'p - \lambda(p'p - 1).

The first order condition requires that

    \frac{\partial L}{\partial p} = 2RR'p - 2\lambda p = 0, \quad\text{i.e.,}\quad (RR' - \lambda I)p = 0,    (5)

where I is the identity matrix. For (5) to have a nontrivial solution (p ≠ 0), we must have that

    \det(RR' - \lambda I) = 0,    (6)

which means that p is an eigenvector of the T × T positive semidefinite matrix RR' corresponding to the eigenvalue (or root) λ. As RR' has, in general, N nonzero eigenvalues (if the sample covariance matrix is of full rank), we have to determine which eigenvalue is to be taken.

To do so, multiply (5) by p', resulting in

    p'RR'p = \lambda p'p = \lambda,    (7)

which, as we want to maximize p'RR'p, means that we should take the largest root of RR'. Note that all roots of RR' are nonnegative, and the positive roots are those of R'R, which is T-1 times the sample covariance matrix of the returns under consideration.

Note that by multiplying (5) by R' we also obtain

    (R'R - \lambda I)\underbrace{R'p}_{=a} = (R'R - \lambda I)a = 0,    (8)

which means that a is an eigenvector of R'R corresponding to the largest root of R'R (note that R'R and RR' have the same nonzero eigenvalues). Furthermore, (4) and (5) imply

    \lambda p \overset{(5)}{=} RR'p \overset{(4)}{=} Ra \quad\Longrightarrow\quad p = \frac{1}{\lambda} Ra.    (9)

Vector p given by (9), which is a linear combination of the original variables in R, is the first principal component of the N variables in R.
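As a small numerical sketch (continuing the hypothetical `R` from above; not part of the original derivation), the first component can be obtained from the eigendecomposition of R'R, with a scaled so that a'a = λ, as derived below for the general component; the checks mirror equations (2) and (4):

```python
import numpy as np

# Continuing the sketch above: R is the T x N matrix of centered returns.
lam, V = np.linalg.eigh(R.T @ R)        # eigenvalues ascending, orthonormal eigenvectors
lam, V = lam[::-1], V[:, ::-1]          # reorder so the largest root comes first

a1 = np.sqrt(lam[0]) * V[:, 0]          # coefficient vector (sign arbitrary), a1'a1 = lambda_1
p1 = (R @ a1) / lam[0]                  # first principal component, eq. (9)

assert np.isclose(p1 @ p1, 1.0)         # normalization (2)
assert np.allclose(R.T @ p1, a1)        # a = R'p, eq. (4)
```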

Other Principal Components

Let us use subscripts for the first principal component, i.e., p_1, a_1, λ_1, and similarly for the second, third, ... principal component. Currently, our matrix is approximated by p_1 a_1'. The residual matrix is R - p_1 a_1', which in turn will be approximated by another principal component, p_2, with corresponding coefficient vector a_2. As before, for identification, put p_2'p_2 = 1. Then we want to minimize

    S_2 = \mathrm{tr}[(R - p_1 a_1' - p_2 a_2')'(R - p_1 a_1' - p_2 a_2')].

It turns out that the second principal component p_2 is equal to the unit-length eigenvector of RR' corresponding to the second largest eigenvalue, λ_2, of RR', or, equivalently, of R'R.

Moreover, a_2 is the corresponding eigenvector of R'R, and

    p_2 = \frac{1}{\lambda_2} R a_2.

We can go on in this way, deriving further principal components. The ith such component minimizes the sum of squares of the discrepancies that are left after the earlier components have done their work. The result is that p_i is the unit-length characteristic vector of RR' corresponding to the ith largest eigenvalue, λ_i. To find the length of vector a_i, use p_i = Ra_i/λ_i, which gives

    p_i'p_i = 1 = a_i'R'Ra_i/\lambda_i^2 = a_i'a_i \lambda_i/\lambda_i^2 \quad\Longrightarrow\quad a_i'a_i = \lambda_i.

As R'R and RR' have the same nonzero eigenvalues, one may also work in terms of the sample covariance matrix R'R/(T-1), which is of primary interest in our context. This means that we perform a PCA on the variables R/√(T-1), where R contains the centered (demeaned) returns.

In general, if we use r principal components to approximate the variables under study, the approximation is given by

    R/\sqrt{T-1} \approx \sum_{i=1}^{r} p_i a_i' = P A',

where P = [p_1, ..., p_r] and A = [a_1, ..., a_r], and an approximation for the covariance matrix is

    R'R/(T-1) \approx A P'P A' = A A', \quad\text{as } P'P = I.    (10)

P'P = I follows from our normalization p_i'p_i = 1 and the fact that eigenvectors corresponding to different eigenvalues of symmetric matrices are orthogonal (see Appendix). Note that this means that the principal components are uncorrelated.

Note that this approximation will be singular as long as r < N. A full-rank covariance matrix can be obtained, however, quite similarly to the Single Index Model, by adding a diagonal matrix of asset-specific error variance terms (which are assumed to be uncorrelated). The easiest way to do so is simply to replace the diagonal elements of (10) with the sample variances of the individual assets.
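A brief sketch of this covariance construction, reusing `R`, `T`, `lam`, and `V` from the sketches above (the variable names and the choice r = 6 are illustrative): the rank-r approximation (10) followed by the diagonal replacement.

```python
import numpy as np

# Continuing the sketch above (R centered; lam, V from eigh of R'R, largest first).
r = 6
A = V[:, :r] * np.sqrt(lam[:r] / (T - 1))    # columns a_i for the scaled data R/sqrt(T-1)
S_lowrank = A @ A.T                          # rank-r approximation (10); singular if r < N

S_factor = S_lowrank.copy()
sample_var = (R ** 2).sum(axis=0) / (T - 1)  # sample variances of the individual assets
np.fill_diagonal(S_factor, sample_var)       # full-rank factor-model covariance matrix
```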

The rationale behind this procedure is that we want to reduce the number of risk factors to a lower dimension. That is, we hope to capture the systematic part of asset covariation by using just a few principal components, while the covariation in the sample covariance matrix that is not captured by these first few principal components is attributed to random noise, i.e., it will not improve, and may even considerably deteriorate, forecasts of future asset covariances. As this is a statistical factor model, the factors need not have an economic or financial interpretation.

The discussion of principal component analysis given here closely follows Henri Theil (1971), Principles of Econometrics, Amsterdam: John Wiley & Sons; see, in particular, pp. 46-56.

Choosing the Number of Principal Components

The eigenvalues may be used to measure the relative importance of the corresponding components. The argument is based on the criterion used: the sum of squares of all T·N discrepancies. Before any component is used, the discrepancies are the elements of R, and their sum of squares is

    \sum_{i=1}^{N} \sum_{t=1}^{T} r_{it}^2 = \mathrm{tr}(R'R).

The residual sum of squared discrepancies with r principal components is given by

    S = \mathrm{tr}\Big[\Big(R - \sum_{i=1}^{r} p_i a_i'\Big)'\Big(R - \sum_{i=1}^{r} p_i a_i'\Big)\Big]
      = \mathrm{tr}(R'R) - 2\sum_{i=1}^{r} \mathrm{tr}(R'p_i a_i') + \sum_{i}\sum_{j} \mathrm{tr}(a_i p_i'p_j a_j')
      = \mathrm{tr}(R'R) - 2\sum_{i=1}^{r} \mathrm{tr}(R'p_i a_i') + \sum_{i=1}^{r} a_i'a_i
      = \mathrm{tr}(R'R) - 2\sum_{i} p_i'RR'p_i + \sum_{i} p_i'RR'p_i
      = \mathrm{tr}(R'R) - \sum_{i} p_i'RR'p_i
      = \mathrm{tr}(R'R) - \sum_{i} p_i'p_i \lambda_i
      = \mathrm{tr}(R'R) - \sum_{i=1}^{r} \lambda_i,

where the third equality uses p_i'p_j = 0 for i ≠ j and p_i'p_i = 1, and the fourth uses a_i = R'p_i.

Thus, component i accounts for a reduction of the sum of squared discrepancies equal to λ_i. Expressed as a fraction, component i accounts for

    \frac{\lambda_i}{\mathrm{tr}(R'R)} = \frac{\lambda_i}{\sum_{j=1}^{N} \lambda_j}

of the total variation, and the first r principal components account for

    \frac{\sum_{j=1}^{r} \lambda_j}{\mathrm{tr}(R'R)} = \frac{\sum_{j=1}^{r} \lambda_j}{\sum_{j=1}^{N} \lambda_j}

of the total variation.
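As a quick numerical check of these claims (continuing the hypothetical sketch above), the residual sum of squares after r components equals tr(R'R) minus the r largest eigenvalues, and the eigenvalue shares give the explained fractions:

```python
import numpy as np

# Continuing the sketch above (R centered; lam, V from eigh of R'R, largest first).
r = 3
A_r = V[:, :r] * np.sqrt(lam[:r])             # a_i with a_i'a_i = lambda_i
P_r = (R @ V[:, :r]) / np.sqrt(lam[:r])       # p_i = R a_i / lambda_i

residual = R - P_r @ A_r.T
assert np.isclose((residual ** 2).sum(), (R ** 2).sum() - lam[:r].sum())

share = lam / lam.sum()                       # fraction of total variation per component
cumulative = np.cumsum(share)                 # fraction captured by the first i components
```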

The following selection methods are frequently used in practical work (a sketch of the first two rules is given below):

Percent of variance: For a fixed fraction δ, choose r as the smallest number for which

    \frac{\sum_{j=1}^{r} \lambda_j}{\mathrm{tr}(R'R)} \geq \delta.

Average eigenvalue: Keep all principal components whose eigenvalues exceed the average eigenvalue, N^{-1} \sum_j \lambda_j.

Scree graphs: This is named after the geological term scree (Geröllfeld), referring to the rock debris at the foot of a rocky cliff. Here, the eigenvalues of the relevant components form the cliff, while the unimportant components are represented by the smaller eigenvalues forming the scree.

Clearly these methods do not represent formal statistical tests but rather rules of thumb.
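A minimal sketch of the first two rules, assuming `lam` is a one-dimensional array of eigenvalues sorted in decreasing order; both rules only compare eigenvalues with each other, so eigenvalues of R'R or of the sample covariance matrix give the same answer:

```python
import numpy as np

def percent_of_variance(lam, delta):
    """Smallest r whose cumulative eigenvalue share reaches delta."""
    cumulative = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cumulative, delta) + 1)

def average_eigenvalue(lam):
    """Number of eigenvalues exceeding the average eigenvalue."""
    return int((lam > lam.mean()).sum())
```

Applied to the eigenvalues in the table of the example below, percent_of_variance(lam, 0.75) returns 7 and average_eigenvalue(lam) returns 6, matching the choices discussed there.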

Example

Consider our 24 stocks from the DAX, monthly returns over the period 1996-2001, 60 observations for each stock. The eigenvalues are shown in the table below. The average eigenvalue is 91.6254. Thus, when we use the average eigenvalue rule to determine the number of components, we use the first 6 principal components. When we want to employ the percent of variance rule with, for example, δ = 0.75, we use the first 7 principal components. The scree graph also points in this direction, though less clearly.

     i        λ_i    λ_i / Σ_{j=1}^{24} λ_j    Σ_{j=1}^{i} λ_j / Σ_{j=1}^{24} λ_j
     1   746.2738                    0.3394                                0.3394
     2   305.8800                    0.1391                                0.4785
     3   183.0373                    0.0832                                0.5617
     4   134.0729                    0.0610                                0.6227
     5   115.0188                    0.0523                                0.6750
     6    98.9506                    0.0450                                0.7200
     7    82.4595                    0.0375                                0.7575
     8    69.9632                    0.0318                                0.7893
     9    66.0017                    0.0300                                0.8193
    10    60.7800                    0.0276                                0.8469
    11    54.2673                    0.0247                                0.8716
    12    46.9439                    0.0213                                0.8930
    13    42.5606                    0.0194                                0.9123
    14    35.6098                    0.0162                                0.9285
    15    27.8244                    0.0127                                0.9412
    16    24.1203                    0.0110                                0.9521
    17    23.3074                    0.0106                                0.9627
    18    20.6172                    0.0094                                0.9721
    19    15.4306                    0.0070                                0.9791
    20    12.3780                    0.0056                                0.9848
    21    11.6064                    0.0053                                0.9900
    22     9.4735                    0.0043                                0.9943
    23     7.7192                    0.0035                                0.9979
    24     4.7125                    0.0021                                1.0000

[Figure: scree graph of the eigenvalues of the sample covariance matrix, plotted in decreasing order against the component index (1 to 24); the vertical axis runs from 0 to 800.]

Economic Interpretation of the Components

Compared to approaches using financial or macroeconomic variables as factors, the factors extracted using a purely statistical procedure such as PCA are more difficult to interpret (at least for equity portfolios). An exception is the first factor, which is usually highly correlated with an appropriate market index. That is, the first principal component captures the common trend.

For our example, suppose we use the first 6 principal components. Then the correlations between these 6 components and the DAX index are as follows:

    Component:              1      2      3      4       5       6
    Correlation with DAX:   0.888  0.366  0.106  0.060  -0.081  -0.005
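A small illustrative sketch of this computation, continuing the hypothetical data above; since no actual DAX series is available here, an equally weighted average of the asset returns serves as a stand-in for the index:

```python
import numpy as np

# Continuing the sketch above: P_r holds the component scores (T x r).
index_returns = R.mean(axis=1)      # hypothetical stand-in for the DAX index returns
corr = [np.corrcoef(P_r[:, i], index_returns)[0, 1] for i in range(P_r.shape[1])]
```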

Appendix

The Trace of a Square Matrix

The trace of an n × n matrix A is the sum of its diagonal elements:

    \mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.    (11)

Clearly tr(A + B) = tr(A) + tr(B). Moreover, for A of dimension m × n and B of dimension n × m,

    \mathrm{tr}(AB) = \mathrm{tr}(BA) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ji}.    (12)

It follows from (12) that, for conformable matrices A, B and C (permutation rule),

    \mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB).    (13)

The sum of squares of all elements a_{ij} of an m × n matrix A can be written as the trace of A'A:

    \mathrm{tr}(A'A) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2.    (14)
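A quick numerical check of properties (12)-(14), using small random matrices (a sketch, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))
B = rng.normal(size=(5, 3))
C = rng.normal(size=(3, 3))

assert np.isclose(np.trace(A @ B), np.trace(B @ A))            # (12)
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))    # (13)
assert np.isclose(np.trace(A.T @ A), (A ** 2).sum())           # (14)
```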

Eigenvalues and Eigenvectors

An eigenvalue (or root) of an n × n matrix A is a real or complex scalar λ satisfying the equation

    Ax = \lambda x    (15)

for some nonzero vector x, which is an eigenvector corresponding to λ. Note that an eigenvector is only determined up to a scalar multiple. Equation (15) can be written as (A - λI)x = 0, which requires that the matrix A - λI is singular, or, equivalently,

    \det(A - \lambda I) = 0.    (16)

As det(a λi), which is known as the characteristic polynomial of matrix A, is a polynomial of degree n in λ, an n n matrix has n eigenvalues (counting multiplicities). For illustration, consider the 2 2 matrix [ ] a11 a A = 12. a 21 a 22 Matrix A s characteristic equation is [ ] λ a11 a P (λ) = det(λi 2 A) = det 12 a 21 λ a 22 = (λ a 11 )(λ a 22 ) a 12 a 21 = λ 2 (a 11 + a 22 )λ + a 11 a 22 a 12 a 21 = λ 2 tr(a)λ + det A = 0, which is polynomial of degree 2 in λ, i.e., a quadratic. Thus, A has eigenvalues λ 1 2 = tr(a) ± tr(a) 2 4 det A. (17) 2 25

A general property is that the sum λ_1 + ... + λ_n of the eigenvalues of an n × n matrix A is equal to its trace, i.e.,

    \mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} \lambda_i.    (18)

For our example, it is directly observable from (17) that λ_1 + λ_2 = a_{11} + a_{22} = tr(A).

In general, the eigenvalues of a matrix may be real or complex. However, for positive definite symmetric matrices (e.g., covariance matrices), we have the following results:

(i) The eigenvalues of a positive definite matrix are positive. To see this, recall that, for such a matrix, x'Ax > 0 for all x ≠ 0.

Then, using the definition of an eigenvalue,

    0 < x'Ax = \lambda x'x

for a positive definite matrix, and since x'x > 0, it follows that λ > 0.

(ii) The eigenvectors of any symmetric matrix are orthogonal if they correspond to different roots. Write λ_1 and λ_2 (λ_1 ≠ λ_2) for the two roots, and x and y for the corresponding eigenvectors:

    Ax = \lambda_1 x    (19)
    Ay = \lambda_2 y.    (20)

Multiply (19) by y' and (20) by x'. Since A = A' for a symmetric matrix A, x'Ay = y'Ax, and it follows that

    0 = y'Ax - x'Ay = (\lambda_1 - \lambda_2) x'y.

Hence x'y = 0.
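A numerical illustration of property (ii), using a random symmetric test matrix and the general eigensolver (which does not impose orthogonality by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
A = M + M.T                              # symmetric; eigenvalues almost surely distinct
lam, X = np.linalg.eig(A)                # general solver: unit-length eigenvectors only
assert np.allclose(X.T @ X, np.eye(4))   # they nevertheless come out orthonormal
```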

(iii) For any n × m matrix A, A'A and AA' have the same nonzero eigenvalues. (The number of nonzero eigenvalues is equal to the rank of A.) To see this, note that premultiplication by A' shows that (AA' - λI)x = 0 implies (A'A - λI)A'x = 0.
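A short numerical illustration of property (iii) with a random rectangular matrix (the 6 × 4 shape is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 4))
w_small = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # 4 eigenvalues
w_large = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # 6 eigenvalues

assert np.allclose(w_small, w_large[:4])               # shared nonzero eigenvalues
assert np.allclose(w_large[4:], 0.0)                   # the rest are (numerically) zero
```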