Linear Algebra
Shan-Hung Wu (shwu@cs.nthu.edu.tw)
Department of Computer Science, National Tsing Hua University, Taiwan
Large-Scale ML, Fall 2016
Outline
1. Span & Linear Dependence
2. Norms
3. Eigendecomposition
4. Singular Value Decomposition
5. Traces and Determinant
Matrix Representation of Linear Functions
- A linear function (or map, or transformation) f: ℝⁿ → ℝᵐ can be represented by a matrix A ∈ ℝ^{m×n} such that
    f(x) = Ax = y, ∀x ∈ ℝⁿ, y ∈ ℝᵐ
- span(A_{:,1}, …, A_{:,n}) is called the column space of A
- rank(A) = dim(span(A_{:,1}, …, A_{:,n}))
System of Linear Equations
- Given A and y, solve for x in Ax = y
- What kind of A makes Ax = y always have a solution?
  - Since Ax = Σᵢ xᵢ A_{:,i}, the column space of A must contain ℝᵐ, i.e., ℝᵐ ⊆ span(A_{:,1}, …, A_{:,n})
  - This implies n ≥ m
- When does Ax = y always have exactly one solution?
  - A can have at most m columns; otherwise there is more than one x parametrizing some y
  - This implies n = m and the columns of A are linearly independent
  - In this case A^{-1} exists, and x = A^{-1}y
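A minimal numpy sketch of the n = m case above (the matrix and right-hand side are illustrative values, not from the slides):

```python
import numpy as np

# A is square with linearly independent columns, so A^{-1} exists
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([5.0, 10.0])

# Solve Ax = y; np.linalg.solve is preferred over forming inv(A) explicitly
x = np.linalg.solve(A, y)
assert np.allclose(A @ x, y)                    # x really solves the system
assert np.allclose(x, np.linalg.inv(A) @ y)     # and agrees with x = A^{-1} y
```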
Vector Norms
- A norm of vectors is a function ‖·‖ that maps vectors to non-negative values and satisfies
  - ‖x‖ = 0 ⇒ x = 0
  - ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality)
  - ‖cx‖ = |c| ‖x‖, ∀c ∈ ℝ
- E.g., the L^p norm: ‖x‖_p = (Σᵢ |xᵢ|^p)^{1/p}
  - L² (Euclidean) norm: ‖x‖ = (x^T x)^{1/2}
  - L¹ norm: ‖x‖_1 = Σᵢ |xᵢ|
  - Max norm: ‖x‖_∞ = maxᵢ |xᵢ|
- x^T y = ‖x‖ ‖y‖ cos θ, where θ is the angle between x and y
- x and y are orthonormal iff x^T y = 0 (orthogonal) and ‖x‖ = ‖y‖ = 1 (unit vectors)
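A quick check of the L^p norms above in numpy (the vector is an illustrative value):

```python
import numpy as np

x = np.array([3.0, -4.0])
l2 = np.linalg.norm(x)            # (x^T x)^{1/2} = sqrt(9 + 16) = 5
l1 = np.linalg.norm(x, 1)         # sum of |x_i| = 7
linf = np.linalg.norm(x, np.inf)  # max_i |x_i| = 4
assert l2 == 5.0 and l1 == 7.0 and linf == 4.0
```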
Matrix Norms
- Frobenius norm: ‖A‖_F = (Σ_{i,j} A_{i,j}²)^{1/2}
  - Analogous to the L² norm of a vector
- An orthogonal matrix is a square matrix whose columns (resp. rows) are mutually orthonormal, i.e.,
    A^T A = I = A A^T
  - This implies A^{-1} = A^T
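A numpy sketch of both points: a 2-D rotation as an example orthogonal matrix, and the Frobenius norm checked against its elementwise definition (the angle and matrix are illustrative):

```python
import numpy as np

# A rotation matrix is orthogonal: its columns are mutually orthonormal
t = np.pi / 3
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
assert np.allclose(Q.T @ Q, np.eye(2))      # Q^T Q = I
assert np.allclose(np.linalg.inv(Q), Q.T)   # so Q^{-1} = Q^T

# Frobenius norm agrees with the elementwise definition
A = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt((A**2).sum()))
```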
Decomposition
- Integers can be decomposed into prime factors
  - E.g., 12 = 2 × 2 × 3
  - This helps identify useful properties; e.g., 12 is not divisible by 5
- Can we decompose matrices to identify information about their functional properties more easily?
Eigenvectors and Eigenvalues
- An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v:
    Av = λv,
  where λ ∈ ℝ is called the eigenvalue corresponding to this eigenvector
- If v is an eigenvector, so is any scaling cv, c ∈ ℝ, c ≠ 0
  - cv has the same eigenvalue
  - Thus, we usually look for unit eigenvectors
Eigendecomposition I
- Every real symmetric matrix A ∈ ℝ^{n×n} can be decomposed into
    A = Q diag(λ) Q^T
  - λ ∈ ℝⁿ consists of the real-valued eigenvalues (usually sorted in descending order)
  - Q = [v^{(1)}, …, v^{(n)}] is an orthogonal matrix whose columns are the corresponding eigenvectors
- The eigendecomposition may not be unique
  - When two or more eigenvectors share the same eigenvalue, any set of orthogonal vectors lying in their span are also eigenvectors with that eigenvalue
- What can we tell after decomposition?
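The decomposition can be verified directly with numpy (a small illustrative symmetric matrix; note `eigh` returns the eigenvalues in ascending rather than descending order):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # real and symmetric
lam, Q = np.linalg.eigh(A)           # eigh is for symmetric/Hermitian matrices

assert np.allclose(Q @ np.diag(lam) @ Q.T, A)   # A = Q diag(lambda) Q^T
assert np.allclose(Q.T @ Q, np.eye(2))          # Q is orthogonal
for i in range(2):                              # each column of Q satisfies Av = lambda v
    assert np.allclose(A @ Q[:, i], lam[i] * Q[:, i])
```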
Eigendecomposition II
- Because Q = [v^{(1)}, …, v^{(n)}] is an orthogonal matrix, we can think of A as scaling space by λᵢ in direction v^{(i)}
Rayleigh's Quotient
Theorem (Rayleigh's Quotient). Given a symmetric matrix A ∈ ℝ^{n×n}, then ∀x ∈ ℝⁿ,
    λ_min ≤ (x^T A x)/(x^T x) ≤ λ_max,
where λ_min and λ_max are the smallest and largest eigenvalues of A.
- (x^T A x)/(x^T x) = λᵢ when x is the eigenvector corresponding to λᵢ
Singularity
- Suppose A = Q diag(λ) Q^T; then A^{-1} = Q diag(λ)^{-1} Q^T
- A is non-singular (invertible) iff none of its eigenvalues is zero
Positive Definite Matrices I
- A is positive semidefinite (denoted A ⪰ O) iff its eigenvalues are all non-negative
  - Equivalently, x^T A x ≥ 0 for any x
- A is positive definite (denoted A ≻ O) iff its eigenvalues are all positive
  - This further ensures that x^T A x = 0 ⇒ x = 0
- Why do these matter?
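Definiteness can be tested via the eigenvalues, as the slide suggests. A minimal sketch (the helper name and tolerance are my own; `eigvalsh` assumes the input is symmetric):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """True iff all eigenvalues of the symmetric matrix A exceed tol."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

assert is_positive_definite(np.array([[2.0, 0.0],
                                      [0.0, 1.0]]))        # eigenvalues 2, 1 > 0
assert not is_positive_definite(np.array([[1.0, 0.0],
                                          [0.0, 0.0]]))    # eigenvalue 0: only semidefinite
```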
Positive Definite Matrices II
- A function f is quadratic iff it can be written as
    f(x) = (1/2) x^T A x − b^T x + c,
  where A is symmetric
- x^T A x is called the quadratic form
Figure: Graph of a quadratic form when A is a) positive definite; b) negative definite; c) positive semidefinite (singular); d) an indefinite matrix.
Singular Value Decomposition (SVD)
- Eigendecomposition requires square matrices. What if A is not square?
- Every real matrix A ∈ ℝ^{m×n} has a singular value decomposition:
    A = U D V^T,
  where U ∈ ℝ^{m×m}, D ∈ ℝ^{m×n}, and V ∈ ℝ^{n×n}
  - U and V are orthogonal matrices, and their columns are called the left- and right-singular vectors, respectively
  - The elements along the diagonal of D are called the singular values
- The left-singular vectors of A are eigenvectors of A A^T
- The right-singular vectors of A are eigenvectors of A^T A
- The non-zero singular values of A are the square roots of the eigenvalues of A A^T (or A^T A)
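A numpy sketch for a non-square matrix, also checking the link between the singular values and the eigenvalues of A A^T (the matrix is an illustrative value; note numpy returns the singular values as a vector, so D must be rebuilt as an m×n matrix):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])                 # m = 2, n = 3
U, s, Vt = np.linalg.svd(A)                     # s: singular values, descending

D = np.zeros_like(A)                            # m x n, diagonal carries s
D[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ D @ Vt, A)               # A = U D V^T

# Non-zero singular values are square roots of the eigenvalues of A A^T
eig = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
assert np.allclose(s**2, eig)
```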
Moore-Penrose Pseudoinverse I
- Matrix inversion is not defined for matrices that are not square
- Suppose we want a left-inverse B ∈ ℝ^{n×m} of a matrix A ∈ ℝ^{m×n}, so that we can solve the linear equation Ax = y by left-multiplying each side to obtain x = By
  - If m > n, it is possible that no such B exists
  - If m < n, there could be multiple B's
- By letting B = A^+, the Moore-Penrose pseudoinverse, we can make headway in these cases:
  - When m = n and A^{-1} exists, A^+ degenerates to A^{-1}
  - When m > n, A^+ returns the x for which Ax is closest to y in terms of the Euclidean norm ‖Ax − y‖
  - When m < n, A^+ returns the solution x = A^+ y with minimal Euclidean norm ‖x‖ among all possible solutions
Moore-Penrose Pseudoinverse II
- The Moore-Penrose pseudoinverse is defined as:
    A^+ = lim_{α↘0} (A^T A + αI_n)^{-1} A^T
  - When the columns of A are linearly independent, A^+ A = I_n
- In practice, it is computed via the SVD: A^+ = V D^+ U^T, where U D V^T = A
  - D^+ ∈ ℝ^{n×m} is obtained by taking the reciprocals of the non-zero elements of D and then taking the transpose
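A sketch of the SVD-based construction, compared against `np.linalg.pinv` and, for the m > n case, against the least-squares solution (the tall matrix is an illustrative value):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])                 # m = 3 > n = 2, full column rank
U, s, Vt = np.linalg.svd(A)

Dp = np.zeros((A.shape[1], A.shape[0]))    # D^+ is n x m
Dp[:len(s), :len(s)] = np.diag(1.0 / s)    # reciprocals of the non-zero singular values
Ap = Vt.T @ Dp @ U.T                       # A^+ = V D^+ U^T
assert np.allclose(Ap, np.linalg.pinv(A))

# m > n: A^+ y is the least-squares solution of Ax = y
y = np.array([1.0, 2.0, 3.0])
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(Ap @ y, x_ls)
```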
Traces
- tr(A) = Σᵢ A_{i,i}
- tr(A) = tr(A^T)
- tr(αA + βB) = α tr(A) + β tr(B)
- ‖A‖_F² = tr(A A^T) = tr(A^T A)
- tr(ABC) = tr(BCA) = tr(CAB)
  - This holds even if the individual products have different shapes
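Quick numerical checks of the identities above, including the "different shapes" point with a 2×3 and a 3×2 factor (all matrices are illustrative values):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [1.0, 1.0]])

assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.linalg.norm(A, 'fro')**2, np.trace(A @ A.T))
# Cyclic property: tr(ABC) = tr(BCA) = tr(CAB)
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))

# Holds even when the factors are not square: A2 is 2x3, B2 is 3x2
A2 = np.arange(6.0).reshape(2, 3)
B2 = np.arange(6.0).reshape(3, 2)
assert np.isclose(np.trace(A2 @ B2), np.trace(B2 @ A2))
```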
Determinant I
- The determinant det(·) is a function that maps a square matrix A ∈ ℝ^{n×n} to a real value:
    det(A) = Σᵢ (−1)^{i+1} A_{1,i} det(A_{−1,−i}),
  where A_{−1,−i} is the (n−1)×(n−1) matrix obtained by deleting the first row and the i-th column of A
- det(A^T) = det(A)
- det(A^{-1}) = 1/det(A)
- det(AB) = det(A) det(B)
- det(A) = Πᵢ λᵢ. What does it mean?
- det(A) can also be regarded as the signed area of the image of the unit square
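The eigenvalue and product identities can be verified directly (a small illustrative symmetric matrix, so the eigenvalues are real):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                  # eigenvalues 1 and 3
lam = np.linalg.eigvalsh(A)

assert np.isclose(np.linalg.det(A), np.prod(lam))            # det(A) = product of eigenvalues = 3
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))      # det(A^T) = det(A)

B = np.array([[0.0, 1.0], [2.0, 3.0]])
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))       # det(AB) = det(A) det(B)
```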
Determinant II
- Let
    A = [ a  b ]
        [ c  d ]
  We have [1, 0]A = [a, b], [0, 1]A = [c, d], and det(A) = ad − bc
Figure: The area of the parallelogram is the absolute value of the determinant of the matrix formed by the images of the standard basis vectors representing the parallelogram's sides.
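A quick numeric check of the 2×2 formula and the basis-vector picture (the entries a, b, c, d are illustrative values):

```python
import numpy as np

a, b, c, d = 3.0, 1.0, 1.0, 2.0
A = np.array([[a, b],
              [c, d]])
assert np.isclose(np.linalg.det(A), a * d - b * c)

# The standard basis vectors map to the rows [a, b] and [c, d];
# |det(A)| is the area of the parallelogram they span.
assert np.allclose(np.array([1.0, 0.0]) @ A, [a, b])
assert np.allclose(np.array([0.0, 1.0]) @ A, [c, d])
```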
Determinant III
- The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space
- If det(A) = 0, then space is contracted completely along at least one dimension
  - A is invertible iff det(A) ≠ 0
- If |det(A)| = 1, then the transformation is volume-preserving