Eigenvalues and diagonalization


Patrick Breheny
November 15
BST 764: Applied Statistical Modeling

Introduction

The next topic in our course, principal components analysis, revolves around decomposing large, complicated, multivariate relationships into simple uncorrelated elements. Today's lecture is a primer on the mathematical tools that provide the basis for these methods. The proofs of most of today's results are beyond the scope of this course, but we will use the results themselves going forward to derive statistical methods and establish theoretical results.

Direction and length

One can think about decomposing a vector v into two separate pieces of information, its direction d and its length λ:

λ = ||v|| = sqrt(Σ_j v_j^2),    d = v / λ

This makes it easier to work with the magnitude of a vector (λ is a scalar) while ignoring its direction, and vice versa (e.g., rotating a vector).
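To make the length/direction split concrete, here is a minimal sketch assuming NumPy; the vector [3, 4] is just an illustrative choice, not part of the lecture.

```python
# A minimal sketch (assuming NumPy) of splitting a vector into length and direction.
import numpy as np

v = np.array([3.0, 4.0])
lam = np.linalg.norm(v)         # length: lam = sqrt(sum_j v_j^2) = 5.0
d = v / lam                     # direction: a unit vector, so np.linalg.norm(d) == 1
print(lam, d)                   # 5.0 [0.6 0.8]
assert np.allclose(lam * d, v)  # v is recovered as length times direction
```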

Eigenvalues and eigenvectors

This idea can be extended to matrices as well; after all, what is a matrix A but a collection of vectors? This is the idea behind eigenvalues and eigenvectors, which are defined according to the equation

Av = λv

Any vector of unit length v which satisfies this equation is special to A: when A operates in that direction, it acts merely to elongate or shrink the vector. In other words, the vector is not rotated; its direction is preserved.
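A minimal sketch of the defining equation, assuming NumPy; the 2 × 2 symmetric matrix below is illustrative only.

```python
# A minimal sketch (assuming NumPy) of the eigenvalue equation A v = lambda * v.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # a small symmetric matrix chosen for illustration
eigvals, eigvecs = np.linalg.eigh(A)  # eigh is the routine for symmetric matrices

for lam, v in zip(eigvals, eigvecs.T):
    # A acting on an eigenvector only rescales it; the direction is preserved
    assert np.allclose(A @ v, lam * v)
print(eigvals)  # [1. 3.]
```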

Eigenvalues and eigenvectors (cont'd)

The v's which satisfy this equation are called eigenvectors; the λ's are the eigenvalues. We will begin by considering the case where A is symmetric; later in the lecture, we will consider more general cases.

Number of solutions

Theorem: If A is a symmetric n × n matrix, then it has exactly n pairs {λ_j, v_j} which satisfy the equation Av_j = λ_j v_j. The n eigenvalues are sometimes referred to as the spectrum of A.

Eigenvalues, traces, and determinants

Theorem: If A has eigenvalues λ_1, λ_2, ..., λ_n, then tr(A) = Σ_{i=1}^n λ_i.

Theorem: If A has eigenvalues λ_1, λ_2, ..., λ_n, then |A| = Π_{i=1}^n λ_i.
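A quick numerical check of both theorems, assuming NumPy; the random symmetric matrix is illustrative.

```python
# A minimal sketch (assuming NumPy): trace = sum of eigenvalues,
# determinant = product of eigenvalues, for a symmetric matrix.
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = B + B.T                      # symmetrize so the theorems above apply cleanly

eigvals = np.linalg.eigvalsh(A)
assert np.isclose(np.trace(A), eigvals.sum())
assert np.isclose(np.linalg.det(A), eigvals.prod())
```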

Eigenvalues and positive definite matrices

Theorem: A is positive definite if and only if all eigenvalues of A are positive.

Theorem: A is positive semidefinite if and only if all eigenvalues of A are nonnegative.
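As a sketch of how this characterization can be used in practice, assuming NumPy; the helper name is_positive_definite and the tolerance are hypothetical choices, not from the lecture.

```python
# A minimal sketch (assuming NumPy): testing positive definiteness via eigenvalues.
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """A symmetric matrix is positive definite iff all its eigenvalues are positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])       # eigenvalues 1 and 3, so positive definite
print(is_positive_definite(A))    # True
print(is_positive_definite(-A))   # False
```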

Decomposition

Perhaps the most useful aspect of eigenvalues is that they can be used to factor a matrix into simpler elements. This general notion is referred to as matrix factorization or matrix decomposition, and it is the main idea behind principal component analysis. We now introduce a factorization for symmetric matrices called the eigendecomposition; a more general factorization, the singular value decomposition, will be introduced later and applies to all matrices.

Eigendecomposition

Lemma: If A is a symmetric matrix, its eigenvectors can be chosen to be orthonormal.

Theorem: Any symmetric matrix A can be factored as A = QΛQ^T, where Λ is a diagonal matrix containing the eigenvalues of A, and the columns of Q contain its orthonormal eigenvectors.
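A minimal sketch of the factorization, assuming NumPy; the 2 × 2 matrix is again illustrative.

```python
# A minimal sketch (assuming NumPy) of the eigendecomposition A = Q Lambda Q^T.
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
eigvals, Q = np.linalg.eigh(A)    # Q holds orthonormal eigenvectors as columns
Lam = np.diag(eigvals)

assert np.allclose(Q @ Q.T, np.eye(2))   # Q is orthogonal: Q Q^T = I
assert np.allclose(Q @ Lam @ Q.T, A)     # A is recovered from its factors
```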

Inverses

Theorem: Suppose A has eigenvectors Q and eigenvalues {λ_i}. Then A^{-1} has eigenvectors Q and eigenvalues {λ_i^{-1}}. In other words, if A = QΛQ^T, then A^{-1} = QΛ^{-1}Q^T.
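A sketch of inverting through the eigendecomposition, assuming NumPy and the same illustrative matrix as above.

```python
# A minimal sketch (assuming NumPy): inverting a symmetric matrix via A^{-1} = Q Lambda^{-1} Q^T.
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
eigvals, Q = np.linalg.eigh(A)

A_inv = Q @ np.diag(1.0 / eigvals) @ Q.T   # invert the eigenvalues, keep the eigenvectors
assert np.allclose(A_inv, np.linalg.inv(A))
```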

Rank

As you may have supposed from the fact that eigenvalues are intricately tied up with determinants and inverses, they are also related to the rank of a matrix. Theorem: Suppose A has rank r. Then A has r nonzero eigenvalues, and the remaining n − r eigenvalues are equal to zero.

Eigenvalues in action: Ridge regression

To get a sense for how these facts are useful in statistics, let's look back at some unproven theorems from our lecture on ridge regression.

Theorem: For any design matrix X and any λ > 0, the quantity X^T X + λI is always invertible; thus, there is always a unique solution β̂_ridge.

Theorem: The degrees of freedom for a ridge regression estimate are Σ_i λ_i / (λ_i + λ), where {λ_i} are the eigenvalues of X^T X.
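A sketch of the degrees-of-freedom formula in action, assuming NumPy; the design matrix and penalty value are illustrative, and the check against the trace of the ridge hat matrix is an added verification, not part of the slides.

```python
# A minimal sketch (assuming NumPy): ridge degrees of freedom from the eigenvalues of X^T X.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))             # illustrative design matrix
lam = 2.0                                # illustrative ridge penalty

eigvals = np.linalg.eigvalsh(X.T @ X)
df = np.sum(eigvals / (eigvals + lam))   # df(lambda) = sum_i lambda_i / (lambda_i + lambda)

# The same quantity is the trace of the ridge hat matrix X (X^T X + lam I)^{-1} X^T.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
assert np.isclose(df, np.trace(H))
print(df)
```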

Diagonalization

We have seen that eigenvalues are useful because they allow you, when performing certain tasks, to do away with all the complexities of matrix algebra and work on the more familiar level of scalars. This was possible because all the relevant computations took place on a diagonal matrix, Λ. This idea is known as diagonalization. Perhaps the most amazing result in all of linear algebra is that any matrix can be diagonalized, in what is known as the singular value decomposition.

Theorem: For any matrix A, we can write A = UDV^T, where D is a diagonal matrix with nonnegative entries, and both U and V are orthogonal (i.e., U^T U = V^T V = I). Non-square matrices do not have eigenvalues, but the elements of D (called the singular values of A) are the square roots of the eigenvalues of A^T A (or AA^T).
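A sketch of the theorem and of the link to the eigenvalues of A^T A, assuming NumPy; the 6 × 3 random matrix is illustrative.

```python
# A minimal sketch (assuming NumPy): the SVD A = U D V^T, and the connection between
# singular values and the eigenvalues of A^T A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))

U, d, Vt = np.linalg.svd(A, full_matrices=False)  # "thin" SVD: U is 6x3, d has 3 entries
assert np.allclose(U @ np.diag(d) @ Vt, A)

# Singular values are the square roots of the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]       # sort descending to match d
assert np.allclose(d, np.sqrt(eigvals))
```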

Dimensions of the SVD

If A is an n × n matrix, then U, D, and V are all n × n matrices. If A is an n × p matrix with n > p, then U is n × p, while both D and V are p × p matrices. If A is an n × p matrix with n < p, then V^T is n × p, while both D and U are n × n matrices. In other words, D is square, with dimension equal to the minimum of n and p.

SVD, applied to ridge regression

As before, let's go back to ridge regression for an example of the singular value decomposition in action.

Theorem: Let X = UDV^T. Then

X^T X = V D^2 V^T
X(X^T X)^{-1} X^T y = U U^T y
X^T X + λI = V(D^2 + λI) V^T
X(X^T X + λI)^{-1} X^T y = U S U^T y,

where S is a diagonal matrix with elements S_jj = d_j^2 / (d_j^2 + λ).
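A sketch verifying the last identity numerically, assuming NumPy; the data and penalty below are illustrative.

```python
# A minimal sketch (assuming NumPy): ridge fitted values computed directly and via the SVD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)
lam = 1.5

# Direct computation: X (X^T X + lam I)^{-1} X^T y
fit_direct = X @ np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Via the SVD: U S U^T y with S_jj = d_j^2 / (d_j^2 + lam)
U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d**2 / (d**2 + lam)
fit_svd = U @ (shrink * (U.T @ y))

assert np.allclose(fit_direct, fit_svd)
```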

Remarks

Once again we see the shrinkage aspect of ridge regression. Note, however, that small values of d_j are shrunk quite a lot, while large values are shrunk very little. What do small values of d_j mean? Recall the connection between the singular values and eigenvalues; in particular, if X has rank r, then D has r nonzero elements.

Remarks (cont'd)

The preceding remark holds in a more qualitative sense as well; if X is nearly singular (i.e., has problems with multicollinearity, but is still full rank), then it will have one or more d_j very close to zero. Thus we have another view of how ridge regression stabilizes estimation in the presence of multicollinearity: the quantity d_j / (d_j^2 + λ) is highly variable and approaches being undefined as d_j → 0 if λ is not present, but with ridge regression it is much more stable.

Dimension reduction

Instead of shrinkage, a different approach is to eliminate all the small singular values. In other words, suppose that X contains 10 explanatory variables, but that two of its singular values are near zero. We can eliminate those two values and deal with a more stable, rank-8 version of X. This is the main idea behind principal components analysis, and we will get into specifics in the next lecture; for now, let me mention that this idea, of approximating a matrix with a low-rank version of it that hopefully captures all the important information, goes far beyond statistics and is widely used to reduce the storage and computations involved in working with large matrices.
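A sketch of the low-rank idea, assuming NumPy; here the two smallest singular values are simply dropped from an illustrative 100 × 10 matrix.

```python
# A minimal sketch (assuming NumPy): a rank-8 approximation by truncating the SVD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
k = 8                                          # keep the 8 largest singular values

U, d, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]    # low-rank reconstruction of X

print(np.linalg.matrix_rank(X_k))              # 8
print(np.linalg.norm(X - X_k))                 # error, driven by the dropped singular values
```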