
Singular Value Decomposition and its Applications in Computer Vision
Subhashis Banerjee
Department of Computer Science and Engineering, IIT Delhi
October 24, 2013

Overview
- Linear algebra basics
- Singular value decomposition
- Linear equations and least squares
- Principal component analysis
- Latent semantics and topic discovery
- Clustering?

Linear systems
m equations in n unknowns: Ax = b, with A ∈ R^{m×n}, x ∈ R^n and b ∈ R^m.
Two reasons usually offered for the importance of linearity:
- Superposition: if f_1 produces a_1 and f_2 produces a_2, then the combined force f_1 + αf_2 produces a_1 + αa_2.
- Pragmatics: f(x, y) = 0 and g(x, y) = 0 yield F(x) = 0 by elimination, with degree of F = degree of f × degree of g. A system of m quadratic equations thus gives a polynomial of degree 2^m. The only case in which the exponential is harmless is when the base is 1 (linear).

Linear (in)dependence
Given vectors a_1, ..., a_n and scalars x_1, ..., x_n, the vector b = Σ_{j=1}^n x_j a_j is a linear combination of the vectors.
The vectors a_1, ..., a_n are linearly dependent iff at least one of them is a linear combination of the others (of the ones that precede it).
A set of vectors a_1, ..., a_n is a basis for a set B of vectors if they are linearly independent and every vector in B can be expressed as a linear combination of a_1, ..., a_n. Two different bases for the same vector space B have the same number of vectors (the dimension).

Inner product and orthogonality
2-norm: ‖b‖² = b_1² + ‖Σ_{j=2}^m b_j e_j‖² = Σ_{j=1}^m b_j² = b^T b
Inner product: b^T c = ‖b‖ ‖c‖ cos θ
Orthogonality: b^T c = 0
Projection of b onto c: (c c^T / c^T c) b
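A quick numerical check can make these formulas concrete. The sketch below (not from the lecture; the vectors are arbitrary choices) verifies with NumPy that the norm, the cosine formula and the projection behave as stated, in particular that the residual of the projection is orthogonal to c.

```python
import numpy as np

# Illustrative check of the formulas above; the vectors are arbitrary choices.
b = np.array([3.0, 4.0, 0.0])
c = np.array([1.0, 1.0, 1.0])

print(np.isclose(b @ b, np.linalg.norm(b) ** 2))     # ||b||^2 = b^T b
cos_theta = (b @ c) / (np.linalg.norm(b) * np.linalg.norm(c))
print(cos_theta)                                     # cosine of the angle between b and c

proj = (np.outer(c, c) / (c @ c)) @ b                # (c c^T / c^T c) b
print(np.isclose((b - proj) @ c, 0.0))               # the residual b - proj is orthogonal to c
```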

Orthogonal subspaces and rank
Any basis a_1, ..., a_n for a subspace A of R^m can be extended to a basis for R^m by adding m − n vectors a_{n+1}, ..., a_m.
If the vector space A is a subspace of R^m for some m, then the orthogonal complement A^⊥ of A is the set of all vectors in R^m that are orthogonal to all the vectors in A. dim(A) + dim(A^⊥) = m.
null(A) = {x : Ax = 0}; dim(null(A)) = h (the nullity).
range(A) = {b : Ax = b for some x}; dim(range(A)) = r (the rank). r = n − h.
The number of linearly independent rows of A equals its number of linearly independent columns.

Solutions of a linear system Ax = b
- range(A): dimension r = rank(A)
- null(A): dimension h = nullity(A)
- range(A)^⊥: dimension m − r
- null(A)^⊥: dimension n − h
- null(A)^⊥ = range(A^T); range(A)^⊥ = null(A^T)
- b ∉ range(A) ⟹ no solutions
- b ∈ range(A):
  - r = n = m: invertible; unique solution
  - r = n, m > n: redundant; unique solution
  - r < n: underdetermined; ∞^{n−r} solutions

Orthogonal matrices
A set of vectors V is orthogonal if its elements are pairwise orthogonal, and orthonormal if in addition ‖x‖ = 1 for each x ∈ V. Vectors in an orthonormal set are linearly independent.
If v_1, ..., v_n are orthonormal, V = [v_1, ..., v_n] is an orthogonal matrix: V^{-1} = V^T and V^T V = V V^T = I.
The norm of a vector x is not changed by multiplication by an orthogonal matrix:
‖Vx‖² = x^T V^T V x = x^T x = ‖x‖²
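As a small illustration (the random matrix is an arbitrary choice, not from the notes), an orthogonal matrix obtained from a QR factorisation preserves the 2-norm exactly as the identity above states.

```python
import numpy as np

# Illustrative check that an orthogonal matrix preserves the 2-norm.
rng = np.random.default_rng(9)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q is orthogonal
x = rng.standard_normal(4)

print(np.allclose(Q.T @ Q, np.eye(4)))             # Q^T Q = I
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # ||Qx|| = ||x||
```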

Vector norms

Matrix norms
Example: A = [1 2; 0 2]

Overview
- Linear algebra basics
- Singular value decomposition (Golub and Van Loan, 1996; Golub and Kahan, 1965)
- Linear equations and least squares
- Principal component analysis
- Latent semantics and topic discovery
- Clustering?

Singular value decomposition
Geometric view: an m × n matrix A of rank r maps the r-dimensional unit hypersphere in rowspace(A) into an r-dimensional hyperellipse in range(A).
Algebraic view: if A is a real m × n matrix then there exist orthogonal matrices
U = [u_1 ⋯ u_m] ∈ R^{m×m} and V = [v_1 ⋯ v_n] ∈ R^{n×n}
such that
U^T A V = Σ = diag(σ_1, ..., σ_p) ∈ R^{m×n},
where p = min(m, n) and σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. Equivalently, A = U Σ V^T.
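As a minimal numerical sketch (not part of the original slides; the random matrix is an arbitrary example), NumPy's np.linalg.svd returns exactly these factors, with the singular values in non-increasing order:

```python
import numpy as np

# Compute the SVD of a small random matrix and verify A = U Sigma V^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=True)    # U: 5x5, s: (3,), Vt: 3x3
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)               # embed diag(s) in a 5x3 matrix

print(np.allclose(A, U @ Sigma @ Vt))              # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(5)), np.allclose(Vt @ Vt.T, np.eye(3)))
print(np.all(np.diff(s) <= 0))                     # sigma_1 >= sigma_2 >= ... >= 0
```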

Proof (sketch): Consider all vectors of the form Ax = b for x on the unit hypersphere ‖x‖ = 1, and the scalar function ‖Ax‖. Let v_1 be a vector on the unit sphere in R^n where this function is maximised, and let σ_1 u_1 be the corresponding image, σ_1 u_1 = A v_1 with ‖u_1‖ = 1. Extend u_1 and v_1 to orthonormal bases of R^m and R^n respectively, with corresponding matrices U_1 and V_1. We have
S_1 = U_1^T A V_1 = [σ_1 w^T; 0 A_1].
Consider the length of the vector S_1 [σ_1; w] = [σ_1² + w^T w; A_1 w]: it is at least σ_1² + w^T w, while ‖[σ_1; w]‖ = √(σ_1² + w^T w), so ‖S_1‖_2 ≥ √(σ_1² + w^T w). Since ‖S_1‖_2 = ‖A‖_2 = σ_1, conclude that w = 0 and induct on A_1.

SVD geometry:
1. ξ = V^T x, where V = [v_1 v_2]
2. η = Σ ξ, where Σ = [σ_1 0; 0 σ_2; 0 0]
3. Finally, b = U η.

SVD: structure of a matrix
Suppose σ_1 ≥ ... ≥ σ_r > σ_{r+1} = 0. Then
- rank(A) = r
- null(A) = span{v_{r+1}, ..., v_n}
- range(A) = span{u_1, ..., u_r}
Setting U_r = U(:, 1:r), Σ_r = Σ(1:r, 1:r), and V_r = V(:, 1:r), we have
A = U_r Σ_r V_r^T = Σ_{i=1}^r σ_i u_i v_i^T.
‖A‖_2 = σ_1 and ‖A‖_F = √(Σ_{i=1}^m Σ_{j=1}^n a_ij²) = √(σ_1² + ... + σ_p²).
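The following sketch, using an illustrative rank-deficient matrix (not from the slides), shows how rank, range, null space and the two norms can be read off the SVD as described above:

```python
import numpy as np

# Reading rank, range and null space off the SVD of a rank-2 example matrix.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],     # row 2 = 2 * row 1, so rank(A) = 2
              [1.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))                     # numerical rank

range_basis = U[:, :r]                       # span{u_1, ..., u_r} = range(A)
null_basis = Vt[r:, :].T                     # span{v_{r+1}, ..., v_n} = null(A)

print(r, range_basis.shape, np.allclose(A @ null_basis, 0))
print(np.isclose(np.linalg.norm(A, 2), s[0]))                       # ||A||_2 = sigma_1
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))  # Frobenius norm
```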

SVD: low-rank approximation
For any ν with 0 ≤ ν ≤ r, define A_ν = Σ_{i=1}^ν σ_i u_i v_i^T (if ν = p = min(m, n), define σ_{ν+1} = 0). Then
‖A − A_ν‖_2 = inf_{B ∈ R^{m×n}, rank(B) ≤ ν} ‖A − B‖_2 = σ_{ν+1}.
Proof (sketch): ‖Aw‖ is maximised by the w that is closest in direction to most of the rows of A. The projections of the rows of A onto v_1 are given by A v_1 v_1^T, and this is indeed the best rank-1 approximation: ‖A − A v_1 v_1^T‖_2 = ‖A − σ_1 u_1 v_1^T‖_2 is the smallest value of ‖A − B‖_2 over all rank-1 matrices B.
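A short numerical check (the random matrix and ν = 2 are arbitrary choices, not from the lecture) that the truncated SVD attains the bound ‖A − A_ν‖_2 = σ_{ν+1}:

```python
import numpy as np

# Best rank-nu approximation via the truncated SVD.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
nu = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_nu = U[:, :nu] @ np.diag(s[:nu]) @ Vt[:nu, :]    # sum_{i<=nu} sigma_i u_i v_i^T

print(np.isclose(np.linalg.norm(A - A_nu, 2), s[nu]))   # ||A - A_nu||_2 = sigma_{nu+1}
```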

Overview
- Linear algebra basics
- Singular value decomposition
- Linear equations and least squares
- Principal component analysis
- Latent semantics and topic discovery
- Clustering?

Least squares
The minimum-norm least squares solution to a linear system Ax = b, that is, the shortest vector x that achieves min_x ‖Ax − b‖, is unique and is given by
x = V Σ^+ U^T b,
where Σ^+ = diag(1/σ_1, ..., 1/σ_r, 0, ..., 0) is an n × m diagonal matrix. The matrix A^+ = V Σ^+ U^T is called the pseudoinverse of A.
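A minimal sketch (an illustrative random system, assuming A has full column rank so that every 1/σ_i is defined) comparing the explicit V Σ^+ U^T b construction with NumPy's built-in pinv and lstsq:

```python
import numpy as np

# Minimum-norm least squares via the SVD-based pseudoinverse.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))    # full column rank with probability 1
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma_plus = np.diag(1.0 / s)                  # all sigma_i > 0 here
x = Vt.T @ Sigma_plus @ U.T @ b                # x = V Sigma^+ U^T b

print(np.allclose(x, np.linalg.pinv(A) @ b))
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```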

Pseudoinverse proof (sketch):
min_x ‖Ax − b‖ = min_x ‖UΣV^T x − b‖ = min_x ‖U(ΣV^T x − U^T b)‖ = min_x ‖ΣV^T x − U^T b‖.
Setting y = V^T x and c = U^T b, we have
min_x ‖Ax − b‖ = min_y ‖Σy − c‖,
where Σy − c has entries σ_i y_i − c_i for i = 1, ..., r and −c_i for i = r+1, ..., m. The residual is minimised by choosing y_i = c_i/σ_i for i ≤ r; the remaining y_i do not affect the residual, and setting them to 0 gives the minimum-norm solution y = Σ^+ c, i.e. x = V Σ^+ U^T b.

Least squares for homogeneous systems
The solution to min_{‖x‖=1} ‖Ax‖ is given by v_n, the last column of V.
Proof: min_{‖x‖=1} ‖Ax‖ = min_{‖x‖=1} ‖UΣV^T x‖ = min_{‖x‖=1} ‖ΣV^T x‖ = min_{‖y‖=1} ‖Σy‖, where y = V^T x. Clearly this is minimised by the vector y = [0, ..., 0, 1]^T, i.e. by x = V y = v_n.
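A small numerical sketch (random A; the sizes and the comparison against random unit vectors are arbitrary illustrative choices) confirming that the last right singular vector solves the constrained problem and that the minimum value is σ_n:

```python
import numpy as np

# min ||Ax|| subject to ||x|| = 1 is attained at the last right singular vector.
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))

U, s, Vt = np.linalg.svd(A)
x = Vt[-1, :]                                   # v_n, the last column of V

# No random unit vector should do better than v_n.
trials = rng.standard_normal((1000, 4))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
print(np.linalg.norm(A @ x) <= np.linalg.norm(trials @ A.T, axis=1).min() + 1e-12)
print(np.isclose(np.linalg.norm(A @ x), s[-1]))  # the minimum value is sigma_n
```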

A couple of other least squares problems
- Given an m × n matrix A with m ≥ n, find the vector x that minimises ‖Ax‖ subject to ‖x‖ = 1 and Cx = 0.
- Given an m × n matrix A with m ≥ n, find the vector x that minimises ‖Ax‖ subject to ‖x‖ = 1 and x ∈ range(G).

Overview
- Linear algebra basics
- Singular value decomposition
- Linear equations and least squares
- Principal component analysis (Pearson, 1901; Shlens, 2003)
- Latent semantics and topic discovery
- Clustering?

PCA: a toy problem
X(t) = [x_A(t) y_A(t) x_B(t) y_B(t) x_C(t) y_C(t)]^T, X = [X(1) X(2) ⋯ X(n)].
Is there another basis, a linear combination of the original basis, that best expresses our data set? PX = Y

PCA issues: noise and redundancy
Noise: SNR = σ²_signal / σ²_noise ≫ 1
Redundancy

Covariance
Consider zero-mean vectors a = [a_1 a_2 ... a_n] and b = [b_1 b_2 ... b_n].
Variance: σ_a² = ⟨a_i a_i⟩_i and σ_b² = ⟨b_i b_i⟩_i
Covariance: σ_ab² = ⟨a_i b_i⟩_i = (1/(n−1)) a b^T
If X = [x_1 x_2 ... x_m]^T (m × n), then the covariance matrix is S_X = (1/(n−1)) X X^T; the ij-th entry of S_X is obtained by substituting x_i for a and x_j for b.
S_X is square, symmetric, m × m. The diagonal entries of S_X are the variances of particular measurement types; the off-diagonal entries are the covariances between measurement types.
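As a numerical sanity check (synthetic zero-mean data, not from the lecture), the matrix S_X = X X^T/(n − 1) built this way agrees with NumPy's np.cov:

```python
import numpy as np

# Covariance matrix of an m x n data matrix whose rows are measurement types.
rng = np.random.default_rng(4)
m, n = 3, 500
X = rng.standard_normal((m, n))
X -= X.mean(axis=1, keepdims=True)     # make each measurement type zero mean

S_X = X @ X.T / (n - 1)                # m x m, symmetric
print(np.allclose(S_X, np.cov(X)))     # matches numpy's covariance (rows = variables)
```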

Solving PCA
S_Y = (1/(n−1)) Y Y^T = (1/(n−1)) (PX)(PX)^T = (1/(n−1)) P X X^T P^T
Writing X = U Σ V^T, we have X X^T = U Σ² U^T. Setting P = U^T, we have
S_Y = (1/(n−1)) Σ².
The data are maximally uncorrelated (S_Y is diagonal). The effective rank r of Σ gives the dimensionality reduction.
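The following sketch (synthetic data; the injected correlation is an arbitrary choice) performs PCA through the SVD of the centered data matrix and checks that S_Y comes out diagonal:

```python
import numpy as np

# PCA via the SVD of the centered data matrix: with P = U^T, S_Y is diagonal.
rng = np.random.default_rng(5)
m, n = 4, 1000
X = rng.standard_normal((m, n))
X[1] += 0.9 * X[0]                         # introduce some correlation
X -= X.mean(axis=1, keepdims=True)         # center each measurement type

U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U.T                                    # principal directions as rows
Y = P @ X

S_Y = Y @ Y.T / (n - 1)
print(np.allclose(S_Y, np.diag(s**2) / (n - 1)))   # diagonal: data decorrelated
```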

PCA: tacit assumptions
- Linearity.
- Mean and variance are sufficient statistics ⟹ Gaussian distribution.
- Large variances have important dynamics.
- The principal components are orthogonal.

Application: eigenfaces (Turk and Pentland, 1991)
Obtain a set S of M face images: S = {Γ_1, ..., Γ_M}.
Obtain the mean image Ψ = (1/M) Σ_{j=1}^M Γ_j.

Application: eigenfaces (Turk and Pentland, 1991)
Compute the centered images Φ_i = Γ_i − Ψ.
The covariance matrix is C = (1/M) Σ_{j=1}^M Φ_j Φ_j^T = A A^T, where A = [Φ_1 ⋯ Φ_M].
Its size is N² × N². Intractable.
If v_i is an eigenvector of A^T A (M × M), then A v_i is an eigenvector of A A^T:
A^T A v_i = μ_i v_i ⟺ A A^T (A v_i) = μ_i (A v_i).
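A minimal sketch of this trick with random stand-ins for the centered face vectors Φ_i (the image size and M below are arbitrary, not from the lecture): the small M × M eigenproblem is solved and the eigenvectors are lifted by A, never forming the N² × N² matrix.

```python
import numpy as np

# Eigenface trick: eigenvectors of A^T A (M x M) give eigenvectors of A A^T.
rng = np.random.default_rng(6)
N2, M = 4096, 20                           # N^2 pixels, M images
A = rng.standard_normal((N2, M))           # columns stand in for the Phi_i

mu, v = np.linalg.eigh(A.T @ A)            # cheap M x M eigenproblem
u = A @ v                                  # columns A v_i are eigenvectors of A A^T
u /= np.linalg.norm(u, axis=0, keepdims=True)

# Verify for the largest eigenpair: (A A^T) u = mu u, without forming A A^T.
print(np.allclose(A @ (A.T @ u[:, -1]), mu[-1] * u[:, -1]))
```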

Application: eigenfaces (Turk and Pentland, 1991)
Recognition: ω_k = u_k^T (Γ − Ψ); compute the minimum distance to the database of faces.

Overview
- Linear algebra basics
- Singular value decomposition
- Linear equations and least squares
- Principal component analysis
- Latent semantics and topic discovery (Deerwester et al., 1990; Papadimitriou et al., 1998)
- Clustering?

Latent semantics and topic discovery
Consider an m × n matrix A whose ij-th entry denotes the marks obtained by the i-th student in the j-th test (Naveen Garg, Abhiram Ranade).
- Are the marks obtained by the i-th student in various tests correlated?
- What are the capabilities of the i-th student?
- What does the j-th test evaluate?
- What is the expected rank of A?

Latent semantics and topic discovery
Suppose there are really only three abilities (topics) that determine a student's marks in tests: verbal, logical and quantitative. Suppose v_i, l_i and q_i characterise these abilities for the i-th student, and let V_j, L_j and Q_j characterise the extent to which the j-th test evaluates these abilities. A generative model for the ij-th entry of A may be given as
v_i V_j + l_i L_j + q_i Q_j.

Latent semantics and topic discovery
A new m × 1 term vector t can be projected into the LSI space as t̂ = t^T U_k Σ_k^{-1}.
A new 1 × n document vector d can be projected into the LSI space as d̂ = d V_k Σ_k^{-1}.
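A small illustration of this folding-in step (a tiny random term-document count matrix and k = 2 topics; none of the numbers are from the lecture):

```python
import numpy as np

# Fold new term/document vectors into a k-dimensional LSI space via truncated SVD.
rng = np.random.default_rng(7)
m, n, k = 8, 12, 2                             # terms, documents, topics
A = rng.integers(0, 4, size=(m, n)).astype(float)   # toy term-document counts

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k_inv, V_k = U[:, :k], np.diag(1.0 / s[:k]), Vt[:k, :].T

t = rng.integers(0, 4, size=m).astype(float)   # new m x 1 term vector
d = rng.integers(0, 4, size=n).astype(float)   # new 1 x n document vector

t_hat = t @ U_k @ S_k_inv                      # t^T U_k Sigma_k^{-1}
d_hat = d @ V_k @ S_k_inv                      # d V_k Sigma_k^{-1}
print(t_hat.shape, d_hat.shape)                # both live in the k-dimensional space
```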

Topic discovery example
An example with more than 2000 images and 12 topics (LDA).

Overview
- Linear algebra basics
- Singular value decomposition
- Linear equations and least squares
- Principal component analysis
- Latent semantics and topic discovery
- Clustering (Drineas et al., 1999)

Clustering
Partition the rows of a matrix so that similar rows (points in n-dimensional space) are clustered together.
Given points a_1, ..., a_m ∈ R^n, find centres c_1, ..., c_k ∈ R^n so as to minimise
Σ_i d(a_i, {c_1, ..., c_k})²,
where d(a, S) is the smallest distance from the point a to any of the points in S (k-means).
k is a constant; consider k = 2 for simplicity. Even then the problem is NP-complete for arbitrary n. We have k centres; if n = k then the problem can be solved in polynomial time.

Clustering
The points belonging to the two clusters can be separated by the perpendicular bisector of the line joining the two centres, and the centre selected for a group must be its centroid. There are only a polynomial number of lines to consider. (Each set of cluster centres defines a Voronoi diagram; each cell is a polyhedron, and the total number of faces in the k cells is no more than (k choose 2). Enumerate all sets of hyperplanes (faces), each of which contains k independent points of A, such that they define exactly k cells, and assign each point of A lying on a hyperplane to one of the sides.)
The best k-dimensional subspace can be found using the SVD. This gives a 2-approximation.
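The sketch below illustrates only the "project with the SVD, then cluster" idea: rows are projected onto the top-k right singular vectors and a plain Lloyd's k-means is run in that subspace. The data, initialisation and iteration count are arbitrary choices, and this is not the exact algorithm analysed by Drineas et al.

```python
import numpy as np

# Project rows onto the best k-dimensional subspace, then run Lloyd's k-means there.
rng = np.random.default_rng(8)
k = 2
A = np.vstack([rng.standard_normal((50, 10)) + 4.0,
               rng.standard_normal((50, 10)) - 4.0])   # two well-separated clouds

U, s, Vt = np.linalg.svd(A, full_matrices=False)
P = A @ Vt[:k, :].T                                    # m x k projected points

centres = P[rng.choice(len(P), size=k, replace=False)]  # initial centres from the data
for _ in range(20):                                     # Lloyd iterations
    dists = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centres = np.array([P[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])

print(np.bincount(labels))                              # roughly a 50/50 split expected
```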

Other applications
- High-dimensional matching
- Graph partitioning
- Metric embedding
- Image compression
- ...
Learn SVD well.