STA141C: Big Data & High Performance Statistical Computing
Numerical Linear Algebra Background
Cho-Jui Hsieh, UC Davis
May 15, 2018

Linear Algebra Background

Vectors. A vector has a direction and a magnitude (norm). Example (2-norm): $x = [1, 2]^T$, $\|x\|_2 = \sqrt{1^2 + 2^2} = \sqrt{5}$. Properties satisfied by a vector norm $\|\cdot\|$: $\|x\| \geq 0$, and $\|x\| = 0$ if and only if $x = 0$ (positivity); $\|\alpha x\| = |\alpha|\,\|x\|$ (homogeneity); $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality).

Examples of vector norms. For $x = [x_1, x_2, \dots, x_n]^T$: $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$ (2-norm); $\|x\|_1 = |x_1| + |x_2| + \dots + |x_n|$ (1-norm); $\|x\|_p = (|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$ (p-norm); $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$ ($\infty$-norm).
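A quick check of these norms with NumPy (a minimal sketch; the vector is chosen arbitrarily):

import numpy as np

x = np.array([1.0, 2.0, -3.0])

# 2-norm, 1-norm, p-norm (p = 3), and infinity-norm of x
print(np.linalg.norm(x, 2))        # sqrt(1 + 4 + 9)
print(np.linalg.norm(x, 1))        # 1 + 2 + 3
print(np.linalg.norm(x, 3))        # (1 + 8 + 27) ** (1/3)
print(np.linalg.norm(x, np.inf))   # max |x_i| = 3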

Distances. $x = [1, 2]$, $y = [2, 1]$, so $x - y = [1 - 2, 2 - 1] = [-1, 1]$. Distances: $\|x - y\|_2 = \|[-1, 1]\|_2 = \sqrt{2}$, $\|x - y\|_1 = 2$, $\|x - y\|_\infty = 1$.

Metrics. A metric $d(x, y)$ satisfies the following properties: $d(x, y) \geq 0$, and $d(x, y) = 0$ if and only if $x = y$; $d(x, y) = d(y, x)$ (symmetry); $d(x, z) \leq d(x, y) + d(y, z)$ (triangle inequality). Is $d(x, y) = \|x - y\|$ a valid metric if $\|\cdot\|$ is a norm? Yes: $d(x, z) = \|x - z\| = \|(x - y) + (y - z)\| \leq \|x - y\| + \|y - z\| = d(x, y) + d(y, z)$.

Inner Product between Vectors. For $x = [x_1, \dots, x_n]^T$ and $y = [y_1, \dots, y_n]^T$, the inner product is $x^T y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n = \sum_{i=1}^n x_i y_i$. Note that $x^T x = \|x\|_2^2$ and $\|x - y\|_2^2 = (x - y)^T (x - y)$. Orthogonality: $x \perp y \iff x^T y = 0$ ($x$ and $y$ are orthogonal to each other).
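A short NumPy illustration of these identities (a sketch; the vectors are arbitrary):

import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, -1.0, 0.0])

print(np.dot(x, y))                          # inner product x^T y = 0, so x and y are orthogonal
print(np.dot(x, x), np.linalg.norm(x)**2)    # x^T x equals ||x||_2^2 = 9
print(np.dot(x - y, x - y),
      np.linalg.norm(x - y)**2)              # (x-y)^T (x-y) equals ||x-y||_2^2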

Projection onto a vector. Since $x^T y = \|x\|_2 \|y\|_2 \cos(\theta)$, we have $\cos\theta = \frac{x^T y}{\|x\|_2 \|y\|_2}$, so $\|x\|_2 \cos\theta = x^T \left(\frac{y}{\|y\|_2}\right) = x^T \hat{y}$, where $\hat{y} = y / \|y\|_2$. Projection of $x$ onto $y$: $(\|x\|_2 \cos\theta)\,\hat{y} = \hat{y}\hat{y}^T x$, where $\hat{y}\hat{y}^T$ is the projection matrix.
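A minimal sketch of the projection matrix at work (vectors chosen arbitrarily):

import numpy as np

x = np.array([3.0, 4.0])
y = np.array([1.0, 0.0])

y_hat = y / np.linalg.norm(y)        # unit vector in the direction of y
P = np.outer(y_hat, y_hat)           # projection matrix y_hat y_hat^T

proj = P @ x                         # projection of x onto y: [3, 0]
print(proj)
print(P @ proj)                      # projecting twice changes nothing (P is idempotent)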

Linearly independent. Suppose we have three vectors $x_1, x_2, x_3$ with $x_1 = \alpha_2 x_2 + \alpha_3 x_3$; then $x_1$ is linearly dependent on $x_2$ and $x_3$. When are $x_1, \dots, x_n$ linearly independent? When $\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_n x_n = 0$ if and only if $\alpha_1 = \alpha_2 = \dots = \alpha_n = 0$. A vector space is a set of vectors that is closed under vector addition and scalar multiplication: if $x_1, x_2 \in V$, then $\alpha_1 x_1 + \alpha_2 x_2 \in V$. A basis of the vector space is a maximal set of vectors in the space that are linearly independent of each other. An orthogonal basis is a basis where all basis vectors are orthogonal to each other. The dimension of the vector space is the number of vectors in a basis.
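One way to check linear independence numerically is to stack the vectors into a matrix and look at its rank (a sketch; numpy.linalg.matrix_rank uses an SVD-based tolerance, and the vectors here are illustrative):

import numpy as np

x2 = np.array([1.0, 0.0, 1.0])
x3 = np.array([0.0, 1.0, 1.0])
x1 = 2 * x2 + 3 * x3                 # x1 is a linear combination of x2 and x3

A = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(A))      # 2: the three vectors are linearly dependent

B = np.column_stack([x2, x3, np.array([1.0, 0.0, 0.0])])
print(np.linalg.matrix_rank(B))      # 3: these three vectors are linearly independent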

Matrices. An m-by-n matrix $A \in \mathbb{R}^{m \times n}$: $A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$. A matrix is also a linear transform $A : \mathbb{R}^n \to \mathbb{R}^m$, $x \mapsto Ax$, satisfying $A(\alpha x + \beta y) = \alpha A x + \beta A y$ (linear transform).
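A quick numerical check of linearity (a sketch; the matrix, vectors, and scalars are arbitrary):

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])      # A maps R^3 to R^2
x = np.array([1.0, -1.0, 2.0])
y = np.array([0.5, 2.0, 1.0])
alpha, beta = 2.0, -3.0

lhs = A @ (alpha * x + beta * y)
rhs = alpha * (A @ x) + beta * (A @ y)
print(np.allclose(lhs, rhs))         # True: A(alpha*x + beta*y) = alpha*Ax + beta*Ay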

Matrix Norms. A popular matrix norm is the Frobenius norm: $\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}$. Matrix norms satisfy the following properties: $\|A\| \geq 0$, and $\|A\| = 0$ if and only if $A = 0$ (positivity); $\|\alpha A\| = |\alpha|\,\|A\|$ (homogeneity); $\|A + B\| \leq \|A\| + \|B\|$ (triangle inequality).

Induced Norm. Given a vector norm $\|\cdot\|$, we can define the corresponding induced norm (or operator norm) by $\|A\| = \sup\{\|Ax\| : \|x\| = 1\} = \sup\{\|Ax\| / \|x\| : x \neq 0\}$. Induced p-norm: $\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}$. Examples: $\|A\|_2 = \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_{\max}(A)$ (the induced 2-norm is the maximum singular value); $\|A\|_1 = \max_j \sum_{i=1}^m |a_{ij}|$ (maximum absolute column sum); $\|A\|_\infty = \max_i \sum_{j=1}^n |a_{ij}|$ (maximum absolute row sum).
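These characterizations (and the Frobenius norm above) are easy to verify numerically (a sketch; the matrix is arbitrary):

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 'fro'))                                # Frobenius norm
print(np.linalg.norm(A, 2), np.linalg.svd(A)[1][0])            # induced 2-norm = largest singular value
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # induced 1-norm = max column sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # induced inf-norm = max row sum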

Rank of a matrix. Column rank of $A$: the dimension of the column space (the vector space formed by the column vectors). Row rank of $A$: the dimension of the row space. Column rank = row rank := rank (always true). Examples: a rank-2 matrix: $\begin{bmatrix} 1 & 1 & 0 \\ 2 & 3 & 1 \\ 0 & 3 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$; a rank-1 matrix: $\begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 2 & 1 \\ 1 & 0 & 2 & 1 \end{bmatrix}$.

Singular Value Decomposition

Low-rank Approximation. Big matrix $A$: we want to approximate it by a simple matrix $\hat{A}$. The goodness of the approximation $\hat{A}$ can be measured by $\|A - \hat{A}\|$ (e.g., using $\|\cdot\|_F$). Low-rank approximation: $A \approx BC$.

Important Question. Given $A$ and $k$, what is the best rank-$k$ approximation? $\min_{B, C \text{ of rank } k} \|A - BC\|_F$. The optimal $B$ and $C$ are obtained from the SVD (Singular Value Decomposition) of $A$.

Singular Value Decomposition. Any real matrix $A \in \mathbb{R}^{m \times n}$ has the following Singular Value Decomposition (SVD): $A = U \Sigma V^T$, where $U$ is an $m \times m$ unitary matrix ($U^T U = I$) whose columns are orthogonal, $V$ is an $n \times n$ unitary matrix ($V^T V = I$) whose columns are orthogonal, and $\Sigma$ is a diagonal $m \times n$ matrix with non-negative real numbers on the diagonal. Usually, we assume the diagonal entries are organized in descending order: $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_n)$ with $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n$.

Singular Value Decomposition. $u_1, u_2, \dots, u_m$: left singular vectors, a basis of the column space; $u_i^T u_j = 0$ for $i \neq j$, and $u_i^T u_i = 1$ for $i = 1, \dots, m$. $v_1, v_2, \dots, v_n$: right singular vectors, a basis of the row space; $v_i^T v_j = 0$ for $i \neq j$, and $v_i^T v_i = 1$ for $i = 1, \dots, n$. The SVD is unique (up to permutations and rotations).

Linear Transformations. Matrix $A$ is a linear transformation. For the right singular vectors $v_1, \dots, v_n$, what are the vectors after transformation? Since $A = U \Sigma V^T$, we have $AV = U \Sigma V^T V = U \Sigma$. Therefore, $A v_i = \sigma_i u_i$ for $i = 1, \dots, n$.
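This relationship is easy to verify numerically (a minimal sketch using scipy, as in the examples later in these slides; the matrix is random):

import numpy as np
from scipy import linalg

A = np.random.randn(5, 3)
U, S, Vt = linalg.svd(A, full_matrices=False)     # thin SVD: A = U diag(S) Vt

# A v_i = sigma_i u_i for each right singular vector v_i
for i in range(3):
    v_i = Vt[i, :]                                # i-th right singular vector (rows of Vt)
    print(np.allclose(A @ v_i, S[i] * U[:, i]))   # True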

Geometry of SVD. Given an input vector $x$, how do we describe the linear transform $Ax$? Assume $A = U \Sigma V^T$ is the SVD; then $Ax$ is computed as follows. Step 1: represent $x$ as $x = a_1 v_1 + \dots + a_n v_n$, where $a_i = v_i^T x$. Step 2: transform $v_i \mapsto \sigma_i u_i$ for all $i$: $Ax = a_1 \sigma_1 u_1 + \dots + a_n \sigma_n u_n$.

Reduced SVD (Thin SVD). If $m > n$, then only the first $n$ left singular vectors $u_1, \dots, u_n$ are useful. Therefore, when $m > n$, we often just use the following thin SVD: $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times n}$ and $\Sigma, V \in \mathbb{R}^{n \times n}$; $U$ and $V$ have orthonormal columns ($U^T U = V^T V = I$), and $\Sigma$ is diagonal.

Best Rank-k approximation. Given a matrix $A$, how do we form the best rank-$k$ approximation? $\arg\min_{X: \mathrm{rank}(X) = k} \|X - A\|_F$, or equivalently $\arg\min_{W \in \mathbb{R}^{m \times k}, H \in \mathbb{R}^{n \times k}} \|W H^T - A\|_F$. Solution: the truncated SVD, $A \approx U_k \Sigma_k V_k^T$, where $U_k$, $V_k$ consist of the first $k$ columns of $U$, $V$ and $\Sigma_k$ contains the top $k$ singular values. Why? It keeps the $k$ most important components of the SVD.
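A sketch of forming the truncated-SVD approximation and checking its error (the matrix and k are arbitrary; the last line uses the fact that the optimal Frobenius error equals the root sum of squares of the discarded singular values):

import numpy as np
from scipy import linalg

np.random.seed(0)
A = np.random.randn(100, 50)
k = 10

U, S, Vt = linalg.svd(A, full_matrices=False)

# keep only the top-k singular triplets
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))        # k
print(np.linalg.norm(A - A_k, 'fro'))    # best achievable rank-k error (Frobenius norm)
print(np.sqrt(np.sum(S[k:] ** 2)))       # sqrt of the sum of discarded sigma_i^2: same value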

Application: Data Approximation/Compression. Given a data matrix $A \in \mathbb{R}^{m \times n}$, approximate it by its best rank-$k$ approximation $A \approx W H^T$. This compresses the data from $O(mn)$ memory to $O(mk + nk)$ memory, and (sometimes) de-noises it. Example: if each column of $A$ is a data point, then $w_i$ (the $i$-th column of $W$) is a dictionary element and $H$ holds the coefficients: $a_i \approx \sum_{j=1}^k H_{ij} w_j$.

Eigenvalue Decomposition

Eigen Decomposition. For an $n$-by-$n$ matrix $H$, if $Hy = \lambda y$ (with $y \neq 0$), then we say $\lambda$ is an eigenvalue of $H$ and $y$ is the corresponding eigenvector.

Eigen Decomposition. Consider $A \in \mathbb{R}^{n \times n}$ to be a square, symmetric matrix. The eigenvalue decomposition of $A$ is $A = V \Lambda V^T$, where $V^T V = I$ ($V$ is unitary) and $\Lambda$ is diagonal. From $A = V \Lambda V^T$ we get $AV = V \Lambda$, i.e. $A v_i = \lambda_i v_i$ for $i = 1, \dots, n$: each $v_i$ is an eigenvector, and each $\lambda_i$ is an eigenvalue. Usually, we assume the diagonal entries are organized in descending order: $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ with $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$. The eigenvalue decomposition is unique when there are $n$ distinct eigenvalues.
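For a symmetric matrix, scipy.linalg.eigh returns exactly this decomposition (a small sketch; the matrix is random and symmetrized, and eigh returns eigenvalues in ascending order):

import numpy as np
from scipy import linalg

B = np.random.randn(4, 4)
A = (B + B.T) / 2                                   # make A symmetric

lam, V = linalg.eigh(A)                             # eigenvalues (ascending) and orthonormal eigenvectors
print(np.allclose(V @ np.diag(lam) @ V.T, A))       # A = V Lambda V^T
print(np.allclose(V.T @ V, np.eye(4)))              # V is orthogonal
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))   # A v_i = lambda_i v_i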

Eigen Decomposition. Each eigenvector $v_i$ is mapped to $A v_i = \lambda_i v_i$ by the linear transform: scaling without changing the direction of the eigenvector. In general, $Ax = \sum_{i=1}^n \lambda_i v_i (v_i^T x)$: project $x$ onto the eigenvectors, then scale each component.

Eigen Decomposition

How to compute SVD/eigen decomposition?

Eigen Decomposition & SVD. The SVD can be transformed into an eigenvalue decomposition! Given $A \in \mathbb{R}^{m \times n}$ with SVD $A = U \Sigma V^T$: we have $A A^T = U \Sigma V^T V \Sigma U^T = U \Sigma^2 U^T$ (the eigenvalue decomposition of $A A^T$); after getting $U$ and $\Sigma$, we can compute $V$ by $V^T = \Sigma^{-1} U^T A$. Similarly, $A^T A = V \Sigma^2 V^T$ is the eigenvalue decomposition of $A^T A$; after getting $V$ and $\Sigma$, we can compute $U = A V \Sigma^{-1}$.

Eigen Decomposition & SVD. Which one should we use: the eigenvalue decomposition of $A A^T$ or of $A^T A$? It depends on the dimensionality ($m$ vs. $n$). If $A \in \mathbb{R}^{m \times n}$ and $m > n$: Step 1: compute the eigenvalue decomposition of $A^T A = V \Sigma^2 V^T$; Step 2: compute $U = A V \Sigma^{-1}$. If $A \in \mathbb{R}^{m \times n}$ and $m < n$: Step 1: compute the eigenvalue decomposition of $A A^T = U \Sigma^2 U^T$; Step 2: compute $V = A^T U \Sigma^{-1}$.
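A minimal sketch of the m > n case, recovering the SVD from the eigen decomposition of A^T A (assumes all singular values are nonzero; the matrix is random):

import numpy as np
from scipy import linalg

np.random.seed(1)
A = np.random.randn(200, 5)          # m > n, so work with the small n x n matrix A^T A

lam, V = linalg.eigh(A.T @ A)        # eigenvalues in ascending order
idx = np.argsort(lam)[::-1]          # reorder to descending
lam, V = lam[idx], V[:, idx]

sigma = np.sqrt(lam)                 # singular values
U = A @ V / sigma                    # U = A V Sigma^{-1} (broadcasting divides each column)

print(np.allclose(U @ np.diag(sigma) @ V.T, A))             # reconstructs A
print(np.allclose(sigma, linalg.svd(A, compute_uv=False)))  # matches scipy's singular values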

Computing Eigenvalue Decomposition. If $A$ is a dense matrix: call scipy.linalg.eig. If $A$ is a (large) sparse matrix: usually we only compute the top-$k$ eigenvectors with a small $k$, using the power method or the Lanczos algorithm (implemented in built-in packages); call scipy.sparse.linalg.eigs.

Dense Eigenvalue Decomposition. Only for dense matrices; computes all the eigenvalues and eigenvectors.

>>> import numpy as np
>>> from scipy import linalg
>>> A = np.array([[1, 2], [3, 4]])
>>> S, V = linalg.eig(A)
>>> S
array([-0.37228132+0.j,  5.37228132+0.j])
>>> V
array([[-0.82456484, -0.41597356],
       [ 0.56576746, -0.90937671]])

Dense SVD. Only for dense matrices; specify full_matrices=False for the thin SVD.

>>> A = np.array([[1, 2], [3, 4], [5, 6]])
>>> U, S, V = linalg.svd(A)
>>> U
array([[-0.2298477 ,  0.88346102,  0.40824829],
       [-0.52474482,  0.24078249, -0.81649658],
       [-0.81964194, -0.40189603,  0.40824829]])
>>> S
array([ 9.52551809,  0.51430058])
>>> U, S, V = linalg.svd(A, full_matrices=False)
>>> U
array([[-0.2298477 ,  0.88346102],
       [-0.52474482,  0.24078249],
       [-0.81964194, -0.40189603]])

Sparse Eigenvalue Decomposition. Usually we only compute the top-$k$ eigenvalues and eigenvectors (since $U$, $V$ will be dense). Use iterative methods to compute $U_k$ (the top-$k$ eigenvectors): initialize $U_k$ by an $n$-by-$k$ random matrix $U_k^{(0)}$, then iteratively update it, $U_k^{(0)} \to U_k^{(1)} \to U_k^{(2)} \to \dots$, converging to the solution: $\lim_{t \to \infty} U_k^{(t)} = U_k$.
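A small sketch of the sparse top-k routines mentioned above (the random sparse matrix and k are illustrative):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs, svds

np.random.seed(2)
A = sp.random(1000, 1000, density=0.01, format='csr')

# top-5 eigenvalues/eigenvectors by largest magnitude (Arnoldi/Lanczos under the hood)
vals, vecs = eigs(A, k=5, which='LM')
print(vals.shape, vecs.shape)        # (5,), (1000, 5)

# top-5 singular triplets of a sparse matrix
U, S, Vt = svds(A, k=5)
print(S)                             # the 5 largest singular values (not necessarily in descending order)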

Power Iteration for the top eigenvector. Main idea: compute $A^T v / \|A^T v\|$ for a large $T$ (here $A^T$ denotes the $T$-th power of $A$, not the transpose); $A^T v / \|A^T v\|$ converges to the top eigenvector as $T \to \infty$. Power Iteration for the top eigenvector: Input: $A \in \mathbb{R}^{d \times d}$, number of iterations $T$. Initialize a random vector $v \in \mathbb{R}^d$. For $t = 1, 2, \dots, T$: $v \leftarrow Av$; $v \leftarrow v / \|v\|$. Output $v$ (the top eigenvector). The top eigenvalue is $v^T A v$ ($= v^T \lambda v = \lambda \|v\|^2 = \lambda$). Only one matrix-vector product is needed at each iteration, so the time complexity is $O(\mathrm{nnz}(A))$ per iteration. If $A = X X^T$, then $Av = X (X^T v)$: no need to explicitly form $A$.
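A direct Python sketch of this pseudocode (the helper name is illustrative; it assumes A is symmetric, and the power method targets the eigenvalue of largest magnitude):

import numpy as np

def power_iteration(A, T=100, seed=0):
    """Illustrative power iteration: (top eigenvalue, top eigenvector) of a symmetric A."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    v = rng.standard_normal(d)           # random initial vector
    for _ in range(T):
        v = A @ v                        # one matrix-vector product per iteration
        v = v / np.linalg.norm(v)        # re-normalize
    return v @ A @ v, v                  # top eigenvalue is v^T A v

B = np.random.randn(6, 6)
A = (B + B.T) / 2
lam, v = power_iteration(A)
print(lam, np.max(np.abs(np.linalg.eigvalsh(A))))   # agree up to the sign of the eigenvalue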

Power Iteration. If $A = U \Lambda U^T$ is the eigenvalue decomposition of $A$, what is the eigenvalue decomposition of $A^t$? $A^t = (U \Lambda U^T)(U \Lambda U^T) \cdots (U \Lambda U^T) = U\,\mathrm{diag}(\lambda_1^t, \lambda_2^t, \dots, \lambda_d^t)\,U^T$. Therefore $A^t v = \sum_{i=1}^d \lambda_i^t (u_i^T v) u_i \propto u_1 + \left(\frac{\lambda_2}{\lambda_1}\right)^t \frac{u_2^T v}{u_1^T v} u_2 + \dots + \left(\frac{\lambda_d}{\lambda_1}\right)^t \frac{u_d^T v}{u_1^T v} u_d$. If $v$ is a random vector, then $u_1^T v \neq 0$ with probability 1, so $\frac{A^t v}{\|A^t v\|} \to u_1$ as $t \to \infty$.

Power iteration for top k eigenvectors. Power Iteration for the top-$k$ eigenvectors: Input: $A \in \mathbb{R}^{d \times d}$, number of iterations $T$, rank $k$. Initialize a random matrix $V \in \mathbb{R}^{d \times k}$. For $t = 1, 2, \dots, T$: $V \leftarrow AV$; $[Q, R] \leftarrow \mathrm{qr}(V)$; $V \leftarrow Q$. Then $\tilde{S} \leftarrow V^T A V$ and $[\bar{U}, S] \leftarrow \mathrm{eig}(\tilde{S})$. Output eigenvectors $U = V \bar{U}$ and eigenvalues $S$. This is sometimes faster than the default functions (linalg.eigs and linalg.svds), since those aim to compute very accurate solutions.
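A sketch of this block power iteration in Python (the helper name is illustrative; it assumes A is symmetric, and the test matrix is positive semi-definite so the largest-magnitude eigenvalues are also the largest ones):

import numpy as np

def block_power_iteration(A, k, T=100, seed=0):
    """Illustrative block power iteration: approximate top-k eigenpairs of a symmetric A."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    V = rng.standard_normal((d, k))          # random d-by-k start
    for _ in range(T):
        V = A @ V                            # multiply the whole block
        V, _ = np.linalg.qr(V)               # re-orthogonalize the columns
    S_small = V.T @ A @ V                    # small k-by-k projected matrix
    lam, U_bar = np.linalg.eigh(S_small)     # eigen decomposition of the small matrix
    return lam, V @ U_bar                    # eigenvalues and eigenvectors U = V * U_bar

B = np.random.randn(50, 20)
A = B @ B.T                                  # symmetric PSD test matrix
lam, U = block_power_iteration(A, k=3)
print(np.sort(lam))
print(np.sort(np.linalg.eigvalsh(A))[-3:])   # should closely match the 3 largest exact eigenvalues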

Coming up Numerical Linear Algebra for Machine Learning Questions?