Mathematical foundations - linear algebra

Mathematical foundations - linear algebra
Andrea Passerini (passerini@disi.unitn.it)
Machine Learning

Vector space

Definition (over reals): a set $X$ is called a vector space over $\mathbb{R}$ if addition and scalar multiplication are defined and satisfy, for all $x, y, z \in X$ and $\lambda, \mu \in \mathbb{R}$:

Addition:
- associative: $x + (y + z) = (x + y) + z$
- commutative: $x + y = y + x$
- identity element: $\exists\, 0 \in X : x + 0 = x$
- inverse element: $\forall x \in X\ \exists\, x' \in X : x + x' = 0$

Scalar multiplication:
- distributive over elements: $\lambda(x + y) = \lambda x + \lambda y$
- distributive over scalars: $(\lambda + \mu)x = \lambda x + \mu x$
- associative over scalars: $\lambda(\mu x) = (\lambda\mu)x$
- identity element: $\exists\, 1 \in \mathbb{R} : 1x = x$

Properties and operations in vector spaces

subspace: any non-empty subset of $X$ that is itself a vector space (e.g. a projection)

linear combination: given $\lambda_i \in \mathbb{R}$, $x_i \in X$: $\sum_{i=1}^{n} \lambda_i x_i$

span: the span of vectors $x_1, \dots, x_n$ is defined as the set of their linear combinations $\left\{ \sum_{i=1}^{n} \lambda_i x_i,\ \lambda_i \in \mathbb{R} \right\}$

Basis in vector space

Linear independence: a set of vectors $x_i$ is linearly independent if none of them can be written as a linear combination of the others.

Basis: a set of vectors $x_i$ is a basis for $X$ if any element in $X$ can be uniquely written as a linear combination of the vectors $x_i$. A necessary condition is that the vectors $x_i$ are linearly independent.

All bases of $X$ have the same number of elements, called the dimension of the vector space.
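
A minimal sketch (assuming NumPy) of the linear-independence check: stack the vectors as columns of a matrix and compare its rank to the number of vectors.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Return True if the given 1-D vectors are linearly independent."""
    M = np.column_stack(vectors)                     # vectors become columns of M
    return np.linalg.matrix_rank(M) == len(vectors)

# Example: two independent vectors in R^2, plus a dependent third one
v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(is_linearly_independent([v1, v2]))             # True
print(is_linearly_independent([v1, v2, v1 + v2]))    # False: only 2 of the 3 are independent
```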

Linear maps

Definition: given two vector spaces $X, Z$, a function $f : X \to Z$ is a linear map if for all $x, y \in X$, $\lambda \in \mathbb{R}$:
- $f(x + y) = f(x) + f(y)$
- $f(\lambda x) = \lambda f(x)$

Linear maps as matrices

A linear map between two finite-dimensional spaces $X, Z$ of dimensions $n, m$ can always be written as a matrix.

Let $\{x_1, \dots, x_n\}$ and $\{z_1, \dots, z_m\}$ be some bases for $X$ and $Z$ respectively. For any $x \in X$ we have:

$f(x) = f\left(\sum_{i=1}^{n} \lambda_i x_i\right) = \sum_{i=1}^{n} \lambda_i f(x_i)$

$f(x_i) = \sum_{j=1}^{m} a_{ji} z_j$

$f(x) = \sum_{i=1}^{n} \lambda_i \sum_{j=1}^{m} a_{ji} z_j = \sum_{j=1}^{m} \left(\sum_{i=1}^{n} \lambda_i a_{ji}\right) z_j = \sum_{j=1}^{m} \mu_j z_j$

Linear maps as matrices

Matrix of basis transformation:
$M \in \mathbb{R}^{m \times n} = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}$

Mapping from basis coefficients to basis coefficients: $M\lambda = \mu$

Change of coordinate matrix

2D example:
- let $B = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$ be the standard basis in $\mathbb{R}^2$
- let $B' = \left\{ \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}$ be an alternative basis

The change of coordinate matrix from $B'$ to $B$ is:
$P = \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}$

so that $[v]_B = P\,[v]_{B'}$ and $[v]_{B'} = P^{-1}\,[v]_B$.

Note: for arbitrary $B$ and $B'$, the columns of $P$ must be the $B'$ vectors written in terms of the $B$ ones (straightforward here, since $B$ is the standard basis).
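
A small NumPy sketch of this example; the vector $v$ is an arbitrary illustrative choice.

```python
import numpy as np

# Columns of P are the alternative-basis vectors expressed in the standard basis
P = np.array([[3.0, 2.0],
              [1.0, 1.0]])

v_Bprime = np.array([2.0, -1.0])   # coordinates of some vector v in the basis B'
v_B = P @ v_Bprime                 # its coordinates in the standard basis B
back = np.linalg.solve(P, v_B)     # recover [v]_{B'} = P^{-1} [v]_B

print(v_B)    # [4. 1.]
print(back)   # [ 2. -1.]
```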

Matrix properties

transpose: the matrix obtained by exchanging rows with columns (indicated with $M^T$). Property: $(MN)^T = N^T M^T$

trace: the sum of the diagonal elements of a matrix: $\operatorname{tr}(M) = \sum_{i=1}^{n} M_{ii}$

inverse: the matrix which multiplied with the original matrix gives the identity: $MM^{-1} = I$

rank: the rank of an $n \times m$ matrix is the dimension of the space spanned by its columns
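
A quick NumPy illustration of these properties on arbitrary example matrices:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [0.0, 3.0]])
N = np.array([[1.0, 4.0],
              [2.0, 1.0]])

print(np.allclose((M @ N).T, N.T @ M.T))              # (MN)^T = N^T M^T -> True
print(np.trace(M))                                     # 2 + 3 = 5.0
print(np.allclose(M @ np.linalg.inv(M), np.eye(2)))    # M M^{-1} = I -> True
print(np.linalg.matrix_rank(M))                        # 2: columns span R^2
```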

Matrix derivatives

$\frac{\partial\, Mx}{\partial x} = M$

$\frac{\partial\, y^T Mx}{\partial x} = M^T y$

$\frac{\partial\, x^T Mx}{\partial x} = (M^T + M)x$

$\frac{\partial\, x^T Mx}{\partial x} = 2Mx$ if $M$ is symmetric

$\frac{\partial\, x^T x}{\partial x} = 2x$

Note: results are column vectors. Transpose them if row vectors are needed instead.
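
These identities can be sanity-checked numerically; a sketch assuming NumPy, using a central-difference gradient on random data:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# d(x^T M x)/dx = (M^T + M) x
analytic = (M.T + M) @ x
numeric = numerical_gradient(lambda v: v @ M @ v, x)
print(np.allclose(analytic, numeric, atol=1e-4))   # True
```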

Metric structure

Norm: a function $\|\cdot\| : X \to \mathbb{R}^+_0$ is a norm if for all $x, y \in X$, $\lambda \in \mathbb{R}$:
- $\|x + y\| \le \|x\| + \|y\|$
- $\|\lambda x\| = |\lambda|\,\|x\|$
- $\|x\| > 0$ if $x \ne 0$

Metric: a norm defines a metric $d : X \times X \to \mathbb{R}^+_0$: $d(x, y) = \|x - y\|$

Note: the concept of norm is stronger than that of metric: not every metric gives rise to a norm.

Dot product

Bilinear form: a function $Q : X \times X \to \mathbb{R}$ is a bilinear form if for all $x, y, z \in X$, $\lambda, \mu \in \mathbb{R}$:
- $Q(\lambda x + \mu y, z) = \lambda Q(x, z) + \mu Q(y, z)$
- $Q(x, \lambda y + \mu z) = \lambda Q(x, y) + \mu Q(x, z)$

A bilinear form is symmetric if for all $x, y \in X$: $Q(x, y) = Q(y, x)$

Dot product

Dot product: a dot product $\langle \cdot, \cdot \rangle : X \times X \to \mathbb{R}$ is a symmetric bilinear form which is positive semi-definite:
$\langle x, x \rangle \ge 0 \quad \forall x \in X$

A positive definite dot product satisfies $\langle x, x \rangle = 0$ iff $x = 0$.

Norm: any dot product defines a corresponding norm via: $\|x\| = \sqrt{\langle x, x \rangle}$

Properties of dot product

angle: the angle $\theta$ between two vectors is defined as: $\cos\theta = \frac{\langle x, z \rangle}{\|x\|\,\|z\|}$

orthogonal: two vectors are orthogonal if $\langle x, y \rangle = 0$

orthonormal: a set of vectors $\{x_1, \dots, x_n\}$ is orthonormal if $\langle x_i, x_j \rangle = \delta_{ij}$, where $\delta_{ij} = 1$ if $i = j$, 0 otherwise.

Note: if $x$ and $y$ are $n$-dimensional column vectors, their dot product is computed as:
$\langle x, y \rangle = x^T y = \sum_{i=1}^{n} x_i y_i$
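
A minimal NumPy sketch of the angle and orthogonality computations (the example vectors are arbitrary):

```python
import numpy as np

def angle_between(x, z):
    """Angle (radians) between two vectors via cos(theta) = <x,z> / (||x|| ||z||)."""
    cos_theta = (x @ z) / (np.linalg.norm(x) * np.linalg.norm(z))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])
print(x @ y)                            # 0.0 -> orthogonal
print(np.degrees(angle_between(x, y)))  # 90.0
```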

Eigenvalues and eigenvectors

Definition: given an $n \times n$ matrix $M$, the real value $\lambda$ and the (non-zero) vector $x$ are an eigenvalue and corresponding eigenvector of $M$ if
$Mx = \lambda x$

Cardinality:
- an $n \times n$ matrix has $n$ eigenvalues (the roots of its characteristic polynomial)
- an $n \times n$ matrix can have fewer than $n$ distinct eigenvalues
- an $n \times n$ matrix can have fewer than $n$ linearly independent eigenvectors (but never fewer than the number of distinct eigenvalues, since eigenvectors of distinct eigenvalues are linearly independent)
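
A quick NumPy check of the definition on an arbitrary symmetric example:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(M)
for lam, v in zip(eigenvalues, eigenvectors.T):   # eigenvectors are the columns
    print(lam, np.allclose(M @ v, lam * v))       # M v = lambda v -> True
# eigenvalues of this matrix: 3.0 and 1.0
```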

Eigenvalues and eigenvectors

Singular matrices: a matrix is singular if it has a zero eigenvalue:
$Mx = 0x = 0$

A singular matrix has linearly dependent columns:

$\begin{bmatrix} M_1 & \dots & M_{n-1} & M_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix} = 0$

$M_1 x_1 + \dots + M_{n-1} x_{n-1} + M_n x_n = 0$

$M_n = -M_1 \frac{x_1}{x_n} - \dots - M_{n-1} \frac{x_{n-1}}{x_n}$

Determinant:
- the determinant $|M|$ of an $n \times n$ matrix $M$ is the product of its eigenvalues
- a matrix is invertible if its determinant is not zero (i.e. it is not singular)
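
A short NumPy sketch of these facts on an arbitrary singular example (the third column is the sum of the first two):

```python
import numpy as np

M = np.array([[2.0, 1.0, 3.0],
              [1.0, 2.0, 3.0],
              [3.0, 3.0, 6.0]])

eigenvalues = np.linalg.eigvals(M)
print(np.isclose(np.linalg.det(M), np.prod(eigenvalues)))  # det = product of eigenvalues -> True
print(np.isclose(np.linalg.det(M), 0.0))                   # singular -> determinant is 0 -> True
print(np.linalg.matrix_rank(M))                            # 2: columns are linearly dependent
```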

Eigenvalues and eigenvectors

Symmetric matrices: eigenvectors corresponding to distinct eigenvalues are orthogonal:

$\lambda \langle x, z \rangle = \langle Ax, z \rangle = (Ax)^T z = x^T A^T z = x^T A z = \langle x, Az \rangle = \mu \langle x, z \rangle$

Since $\lambda \ne \mu$, this implies $\langle x, z \rangle = 0$.

Eigen-decomposition

Rayleigh quotient: if $Ax = \lambda x$, then
$\frac{x^T A x}{x^T x} = \frac{\lambda x^T x}{x^T x} = \lambda$

Finding the eigenvector (a numerical sketch follows below):
1. Maximize the eigenvalue: $x = \arg\max_v \frac{v^T A v}{v^T v}$
2. Normalize the eigenvector (the solution is invariant to rescaling): $x \leftarrow \frac{x}{\|x\|}$
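
In practice, one standard way to carry out this maximization is power iteration; a minimal sketch assuming NumPy and a symmetric matrix A (the example matrix is the one used in the eigenbasis slides further below):

```python
import numpy as np

def dominant_eigenpair(A, n_iter=1000):
    """Power iteration: approximate the eigenvector of A with largest |eigenvalue|."""
    v = np.random.default_rng(0).normal(size=A.shape[0])
    for _ in range(n_iter):
        v = A @ v
        v /= np.linalg.norm(v)          # renormalize at every step
    lam = v @ A @ v / (v @ v)           # Rayleigh quotient of the converged vector
    return lam, v

A = np.array([[13.0, 4.0],
              [4.0, 7.0]])
lam, v = dominant_eigenpair(A)
print(lam)   # ~15.0, the largest eigenvalue of this example matrix
```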

Eigen-decomposition

Deflating matrix: $\tilde{A} = A - \lambda x x^T$

Deflation turns $x$ into a zero-eigenvalue eigenvector:
$\tilde{A}x = Ax - \lambda x x^T x = Ax - \lambda x = 0$ ($x$ is normalized)

Other eigenvalues are unchanged, as eigenvectors with distinct eigenvalues are orthogonal (symmetric matrix):
$\tilde{A}z = Az - \lambda x x^T z = Az$ ($x$ and $z$ orthonormal)
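
A minimal NumPy sketch of deflation on the same example matrix (np.linalg.eigh is used only to obtain the dominant eigenpair):

```python
import numpy as np

A = np.array([[13.0, 4.0],
              [4.0, 7.0]])

# Dominant eigenpair of A (eigenvalue 15, normalized eigenvector)
eigenvalues, eigenvectors = np.linalg.eigh(A)       # ascending eigenvalues
lam, x = eigenvalues[-1], eigenvectors[:, -1]

A_tilde = A - lam * np.outer(x, x)                  # deflation: remove the lambda component

print(np.allclose(A_tilde @ x, 0))                  # x now has eigenvalue 0 -> True
print(np.round(np.linalg.eigvalsh(A_tilde), 6))     # remaining eigenvalues: approx. [0, 5]
```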

Eigen-decomposition

Iterating:
- the maximization procedure is repeated on the deflated matrix (until the solution is zero)
- minimization is iterated to get the eigenvectors with negative eigenvalues
- eigenvectors with zero eigenvalues are obtained by extending the obtained set to an orthonormal basis

Eigen-decomposition

Eigen-decomposition:
- let $V = [v_1 \dots v_n]$ be the matrix with the orthonormal eigenvectors as columns
- let $\Lambda$ be the diagonal matrix of the corresponding eigenvalues

A square symmetric matrix can be diagonalized as:
$V^T A V = \Lambda$

(proof follows)

Note: a diagonalized matrix is much simpler to manage and has the same properties as the original one (e.g. the same eigen-decomposition). E.g. a change of coordinate system.

Eigen-decomposition

Proof:

$A[v_1 \dots v_n] = [v_1 \dots v_n] \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$

$AV = V\Lambda$
$V^{-1}AV = V^{-1}V\Lambda$
$V^T A V = \Lambda$

Note: $V$ is an orthogonal matrix (orthonormal columns), for which: $V^{-1} = V^T$
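
A short NumPy check of the diagonalization on the running example matrix:

```python
import numpy as np

A = np.array([[13.0, 4.0],
              [4.0, 7.0]])

eigenvalues, V = np.linalg.eigh(A)        # V has orthonormal eigenvectors as columns
Lambda = np.diag(eigenvalues)

print(np.allclose(V.T @ A @ V, Lambda))   # V^T A V = Lambda -> True
print(np.allclose(V.T @ V, np.eye(2)))    # V^{-1} = V^T (orthonormal columns) -> True
print(np.allclose(A, V @ Lambda @ V.T))   # reconstruction A = V Lambda V^T -> True
```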

Positive semi-definite matrix

Definition: an $n \times n$ symmetric matrix $M$ is positive semi-definite if all its eigenvalues are non-negative.

Alternative necessary and sufficient conditions:
- for all $x \in \mathbb{R}^n$: $x^T M x \ge 0$
- there exists a real matrix $B$ s.t. $M = B^T B$
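
A minimal sketch (assuming NumPy) illustrating these equivalent characterizations:

```python
import numpy as np

def is_positive_semidefinite(M, tol=1e-10):
    """Check PSD for a symmetric matrix via its (real) eigenvalues."""
    return np.all(np.linalg.eigvalsh(M) >= -tol)

B = np.array([[1.0, 2.0],
              [0.0, 3.0]])
M = B.T @ B                                            # any matrix B^T B is positive semi-definite
print(is_positive_semidefinite(M))                     # True
print(is_positive_semidefinite(np.diag([1.0, -2.0])))  # False: it has eigenvalue -2
```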

Understanding eigendecomposition

[Figure: a vector $x$ and its image $Ax$ in the standard basis, stretched by 5 along $x_1$ and by 15 along $x_2$]

$A = \begin{pmatrix} 5 & 0 \\ 0 & 15 \end{pmatrix}$

Scaling transformation in the standard basis:
- let $x_1 = [1, 0]$, $x_2 = [0, 1]$ be the standard orthonormal basis in $\mathbb{R}^2$
- let $x = [x_1, x_2]$ be an arbitrary vector in $\mathbb{R}^2$
- a linear transformation is a scaling transformation if it only stretches $x$ along its directions

Understanding eigendecomposition

[Figure: the eigenvectors $v_1, v_2$, a vector $v$ and its image $Av$, stretched by 5 along $v_1$ and by 15 along $v_2$]

$A = \begin{pmatrix} 13 & 4 \\ 4 & 7 \end{pmatrix} \qquad v_1 = \begin{pmatrix} 1 \\ -2 \end{pmatrix} \qquad v_2 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$

$Av_1 = \begin{pmatrix} 13 - 8 \\ 4 - 14 \end{pmatrix} = \begin{pmatrix} 5 \\ -10 \end{pmatrix} = 5v_1 \qquad Av_2 = \begin{pmatrix} 26 + 4 \\ 8 + 7 \end{pmatrix} = \begin{pmatrix} 30 \\ 15 \end{pmatrix} = 15v_2$

Scaling transformation in the eigenbasis:
- let $A$ be a non-scaling linear transformation in $\mathbb{R}^2$
- let $\{v_1, v_2\}$ be an eigenbasis for $A$
- by representing vectors in $\mathbb{R}^2$ in terms of the $\{v_1, v_2\}$ basis (instead of the standard $\{x_1, x_2\}$), $A$ becomes a scaling transformation
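
Checking the slide's computations with NumPy:

```python
import numpy as np

A = np.array([[13.0, 4.0],
              [4.0, 7.0]])
v1 = np.array([1.0, -2.0])
v2 = np.array([2.0, 1.0])

print(A @ v1)                                                      # [  5. -10.] =  5 * v1
print(A @ v2)                                                      # [ 30.  15.] = 15 * v2
print(np.allclose(A @ v1, 5 * v1), np.allclose(A @ v2, 15 * v2))   # True True
```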

Principal Component Analysis (PCA)

Description: let $X$ be a data matrix with correlated coordinates. PCA is a linear transformation mapping the data to a system of uncorrelated coordinates. It corresponds to fitting an ellipsoid to the data, whose axes are the coordinates of the new space.

Principal Component Analysis (PCA)

Procedure (1): given a dataset $X \in \mathbb{R}^{n \times d}$ in $d$ dimensions:
1. Compute the mean of the data ($X_i$ is the $i$-th row vector of $X$):
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i$
2. Center the data at the origin: $X_i \leftarrow X_i - \bar{x}$ for each row $i$
3. Compute the data covariance:
$C = \frac{1}{n} X^T X$

Principal Component Analysis (PCA)

Procedure (2):
4. Compute the (orthonormal) eigendecomposition of $C$:
$V^T C V = \Lambda$
5. Use it as the new coordinate system:
$x' = V^{-1}x = V^T x$ ($V^{-1} = V^T$ as $V$ is orthogonal)

Warning: PCA assumes linear correlations (and Gaussian distributions).

Principal Component Analysis (PCA)

Dimensionality reduction:
- each eigenvalue corresponds to the amount of variance in that direction
- select only the $k$ eigenvectors with largest eigenvalues for dimensionality reduction (e.g. visualization)

Procedure (a full sketch follows below):
1. $W = [v_1, \dots, v_k]$
2. $x' = W^T x$
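
Putting the whole procedure together as a minimal NumPy sketch (the toy dataset is an arbitrary illustration, not from the slides):

```python
import numpy as np

def pca(X, k):
    """PCA following the slides: center, covariance, eigendecomposition, project.

    X: (n, d) data matrix, one sample per row. Returns (projected data, W, eigenvalues).
    """
    X_centered = X - X.mean(axis=0)               # steps 1-2: center the data
    C = X_centered.T @ X_centered / X.shape[0]    # step 3: covariance C = (1/n) X^T X
    eigenvalues, V = np.linalg.eigh(C)            # step 4: eigendecomposition (ascending)
    order = np.argsort(eigenvalues)[::-1]         # sort by decreasing variance
    eigenvalues, V = eigenvalues[order], V[:, order]
    W = V[:, :k]                                  # keep the k largest-variance directions
    return X_centered @ W, W, eigenvalues         # step 5: projection x' = W^T x, per sample

# Toy data: strongly correlated 2-D points, reduced to 1 dimension
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t + 0.1 * rng.normal(size=(200, 1))])
X_proj, W, eigenvalues = pca(X, k=1)
print(eigenvalues)   # the first eigenvalue dominates: most variance lies along one direction
```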