Linear Algebra Methods for Data Mining
Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi
Spring 2007

1. Basic Linear Algebra

Example 1: term-document matrices

        Doc1  Doc2  Doc3  Doc4  Query
Term1     1     0     1     0     1
Term2     0     0     1     1     1
Term3     0     1     1     0     0

The documents and the query are represented by vectors in R^n (here n = 3). In applications the matrices may be large: the number of terms can be of the order 10^4 and the number of documents of the order 10^6.

Example 1 continued: tasks

- Find document vectors close to the query, using some distance measure in R^n.
- Use linear algebra methods for data compression and retrieval enhancement.
- Find topics or concepts in the term-document matrix.

Example 2: measurement data

[Figure: three stacked time-series panels labelled T672, NOx672, and csink, plotted over some 1.6 x 10^5 time points.]

Example 2 continued

At the Hyytiälä Forestry Field Station, 30-minute averages of some 100+ variables have been measured for 10+ years: some 175,000 time points and 100 variables, a lot of data!

Possible questions:
- How do days vary?
- How do the measured variables depend on each other?
- What separates days when phenomenon X occurs from those when it doesn't?
- Are there (independent) (pollution) sources present?

Matrices

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}

A matrix is a rectangular array of data; the elements are real numbers.

Basic concepts

- vectors
- norms and distances
- eigenvalues, eigenvectors
- linearly independent vectors, basis
- orthogonal bases
- matrices, orthogonal matrices
- orthogonal matrix decompositions: SVD

Next: a quick review of the following concepts:

- matrix-vector multiplication, matrix-matrix multiplication
- vector norms, matrix norms
- distances between vectors
- eigenvalues, eigenvectors
- linear independence
- basis
- orthogonality

Matrix-vector multiplication

Symbolically,

y = Ax = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^n a_{1j} x_j \\ \sum_{j=1}^n a_{2j} x_j \\ \vdots \\ \sum_{j=1}^n a_{mj} x_j \end{pmatrix}

In practice

\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = ?

\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) \\ 6 \cdot 5 + 4 \cdot (-2) \\ 1 \cdot 5 + 0 \cdot (-2) \end{pmatrix} = \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}

Or

\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \end{pmatrix} = 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} = \begin{pmatrix} 4 \\ 22 \\ 5 \end{pmatrix}

An alternative presentation of matrix-vector multiplication: denote the column vectors of the matrix A by a_j. Then

y = Ax = (a_1 \; a_2 \; \cdots \; a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \sum_{j=1}^n x_j a_j,

so the vector y is a linear combination of the columns of A. This is often a useful way to think about matrix-vector multiplication.
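To make the two viewpoints concrete, here is a minimal Matlab/Octave sketch (variable names are illustrative) computing y = Ax both row by row and column by column:

A = [2 3; 6 4; 1 0];
x = [5; -2];

% Row-oriented: each y(i) is an inner product of a row of A with x
y_row = zeros(3,1);
for i = 1:3
    y_row(i) = A(i,:) * x;
end

% Column-oriented: y is a linear combination of the columns of A
y_col = zeros(3,1);
for j = 1:2
    y_col = y_col + x(j) * A(:,j);
end

% Both agree with the built-in product: A*x = (4; 22; 5)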

Example

Let the columns of A be different "topics":

        Topic1  Topic2  Topic3
Term1     1       0       0
Term2     1       0       0
Term3     0       1       0
Term4     0       0       1

Then if we multiply A by the vector w = (2, 0, 1)^T, we get...

Aw = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} + 1 \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \\ 1 \end{pmatrix} = y,

which represents a document dealing primarily with topic 1 and secondarily with topic 3.

A note on computational aspects

The column-oriented approach is also good for computational efficiency: modern computing devices can exploit the fact that a vector operation is a very regular sequence of scalar operations. This approach is embedded in packages like Matlab and LAPACK (among others), where the basic building blocks are known as SAXPY and GAXPY operations.
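For illustration, a minimal sketch of the SAXPY kernel (y <- alpha*x + y); the column-oriented loop above performs exactly one such update per column of A:

% SAXPY: y <- alpha*x + y, a single regular vector operation
alpha = 5;
x = [2; 6; 1];
y = [-6; -8; 0];      % running sum, here -2 times the second column of A
y = alpha*x + y;      % one vectorized statement, no elementwise loop
% y is now (4; 22; 5); a GAXPY (y = Ax) chains one SAXPY per column of A.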

Matrix-matrix multiplication

Let A ∈ R^{m×s} and B ∈ R^{s×n}. Then, by definition, C = AB ∈ R^{m×n} with C = (c_{ij}),

c_{ij} = \sum_{k=1}^{s} a_{ik} b_{kj}, \quad i = 1, \ldots, m, \; j = 1, \ldots, n.

Note: each column vector of B is multiplied by A.

In practice

\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = ?

\begin{pmatrix} 2 & 3 \\ 6 & 4 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 2 \cdot 5 + 3 \cdot (-2) & 2 \cdot 1 + 3 \cdot 1 \\ 6 \cdot 5 + 4 \cdot (-2) & 6 \cdot 1 + 4 \cdot 1 \\ 1 \cdot 5 + 0 \cdot (-2) & 1 \cdot 1 + 0 \cdot 1 \end{pmatrix} = \begin{pmatrix} 4 & 5 \\ 22 & 10 \\ 5 & 1 \end{pmatrix}

Column by column: the first column of the product is 5 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix} and the second is 1 \begin{pmatrix} 2 \\ 6 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} 3 \\ 4 \\ 0 \end{pmatrix}.

Matrix multiplication code

for i = 1:m
  for j = 1:n
    for k = 1:s
      c(i,j) = c(i,j) + a(i,k)*b(k,j);
    end
  end
end

Note: the loops may be permuted in 6 different ways!
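A quick sanity check of the triple loop against the built-in product (a sketch, using the matrices from the worked example):

a = [2 3; 6 4; 1 0];
b = [5 1; -2 1];
[m, s] = size(a);
n = size(b, 2);
c = zeros(m, n);
for i = 1:m
  for j = 1:n
    for k = 1:s
      c(i,j) = c(i,j) + a(i,k)*b(k,j);
    end
  end
end
% max(max(abs(c - a*b))) is 0: the loop reproduces a*b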

How to measure the size of a vector?

Vector norms

The most common vector norms are:

- 1-norm: \|x\|_1 = \sum_{i=1}^n |x_i|
- Euclidean norm: \|x\|_2 = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}
- max-norm: \|x\|_\infty = \max_{1 \le i \le n} |x_i|

All of the above are special cases of the L_p-norm (or p-norm):

\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}
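In Matlab/Octave all of these are available through the built-in norm function; a small sketch:

x = [3; -4];
norm(x, 1)      % 1-norm:         |3| + |-4| = 7
norm(x, 2)      % Euclidean norm: sqrt(9 + 16) = 5 (also plain norm(x))
norm(x, Inf)    % max-norm:       max(|3|, |-4|) = 4
norm(x, 3)      % a general p-norm, here p = 3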

General definition of a vector norm

Generally, a vector norm is a mapping \|\cdot\| : R^n → R with the properties

- \|x\| \ge 0 for all x, and \|x\| = 0 if and only if x = 0,
- \|\alpha x\| = |\alpha| \, \|x\| for all \alpha ∈ R,
- \|x + y\| \le \|x\| + \|y\| (the triangle inequality).

How to measure the distance between vectors?

The obvious answer: the distance between two vectors x and y is \|x - y\|, where \|\cdot\| is some vector norm. Frequently the distance is measured in the Euclidean norm, \|x - y\|_2; usually, if the index is dropped, this is what is meant.

Alternative: use the angle between two vectors x and y to measure the distance between them. How do we calculate the angle between two vectors?

Angle between vectors

The inner product of two vectors is defined by (x, y) = x^T y. It is associated with the Euclidean norm: \|x\|_2 = (x^T x)^{1/2}. The angle θ between two vectors x and y satisfies

\cos \theta = \frac{x^T y}{\|x\|_2 \, \|y\|_2}.

The cosine of the angle between two vectors x and y can be used to measure their similarity: if x and y are close, the angle between them is small and \cos\theta \approx 1. The vectors x and y are orthogonal if θ = π/2, i.e. x^T y = 0.

Why not just use the Euclidean distance?

[Figure: vectors a, b, and c.]

Example: term-document matrix

Each entry tells how many times a term appears in a document:

        Doc1  Doc2  Doc3
Term1    10     1     0
Term2    10     1     0
Term3     0     0     1

Using the Euclidean distance, Documents 1 and 2 look dissimilar and Documents 2 and 3 look similar. This is just due to the lengths of the documents! Using the cosine of the angle between the document vectors, Documents 1 and 2 are similar to each other and dissimilar to Document 3.
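The effect is easy to reproduce numerically; a minimal sketch using the document vectors above:

d1 = [10; 10; 0];  d2 = [1; 1; 0];  d3 = [0; 0; 1];

% Euclidean distances: Doc1 and Doc2 look far apart, Doc2 and Doc3 close
norm(d1 - d2)                      % 9*sqrt(2), approx 12.7
norm(d2 - d3)                      % sqrt(3), approx 1.7

% Cosines: Doc1 and Doc2 point in the same direction, Doc3 is orthogonal
(d1'*d2) / (norm(d1)*norm(d2))     % 1
(d2'*d3) / (norm(d2)*norm(d3))     % 0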

Eigenvalues and eigenvectors

Let A be an n × n matrix. A nonzero vector v that satisfies Av = λv for some scalar λ is called an eigenvector of A, and λ is the eigenvalue corresponding to the eigenvector v.

In practice

Av = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} v = \lambda v.

\det(A - \lambda I) = \begin{vmatrix} 2-\lambda & 1 \\ 1 & 3-\lambda \end{vmatrix} = (2-\lambda)(3-\lambda) - 1 = 0

\lambda_1 = 3.62, \quad \lambda_2 = 1.38

v_1 \approx \begin{pmatrix} 0.52 \\ 0.85 \end{pmatrix}, \quad v_2 \approx \begin{pmatrix} -0.85 \\ 0.52 \end{pmatrix}
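The same computation in Matlab/Octave; eig returns the eigenvectors as columns of V and the eigenvalues on the diagonal of D:

A = [2 1; 1 3];
[V, D] = eig(A);
diag(D)    % approx (1.38; 3.62), i.e. (5 -+ sqrt(5))/2
V          % unit eigenvectors as columns, each determined only up to sign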


Matrix norms

Let \|\cdot\| be a vector norm and A ∈ R^{m×n}. The corresponding matrix norm is

\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}.

- \|A\|_2 = \left( \max_{1 \le i \le n} \lambda_i(A^T A) \right)^{1/2}, the square root of the largest eigenvalue of A^T A. Expensive to compute!
- \|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^n |a_{ij}| (maximum absolute row sum)
- \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^m |a_{ij}| (maximum absolute column sum)
- \|A\|_F = \left( \sum_{i=1}^m \sum_{j=1}^n a_{ij}^2 \right)^{1/2}, the Frobenius norm: it does not correspond to any vector norm, but it is still related to the Euclidean vector norm.
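All four matrix norms are also available through norm; a sketch:

A = [2 3; 6 4; 1 0];
norm(A, 1)      % max absolute column sum: max(9, 7) = 9
norm(A, Inf)    % max absolute row sum:    max(5, 10, 1) = 10
norm(A, 2)      % sqrt of the largest eigenvalue of A'*A (the costly one)
norm(A, 'fro')  % Frobenius norm: sqrt(66)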

Linear independence

Given a set of vectors (v_j)_{j=1}^n in R^m, m ≥ n, consider the set of linear combinations y = \sum_{j=1}^n \alpha_j v_j for arbitrary coefficients \alpha_j. The vectors (v_j)_{j=1}^n are linearly independent if \sum_{j=1}^n \alpha_j v_j = 0 only when \alpha_j = 0 for all j = 1, ..., n.

A set of m linearly independent vectors in R^m is called a basis of R^m: any vector in R^m can be expressed as a linear combination of the basis vectors.

Example

The column vectors of the matrix

[v_1 \; v_2 \; v_3 \; v_4] = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}

are not linearly independent, as \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 = 0 holds for \alpha_1 = \alpha_3 = 1, \alpha_2 = \alpha_4 = -1.

Rank of a matrix

The rank of a matrix is the maximum number of linearly independent column vectors. A square matrix A ∈ R^{n×n} with rank n is called nonsingular, and it has an inverse A^{-1} satisfying AA^{-1} = A^{-1}A = I.

The (outer product) matrix xy^T has rank 1: all columns of xy^T = (y_1 x \; y_2 x \; \cdots \; y_n x) are linearly dependent (and so are all the rows).

Example

The 4 × 4 matrix

\begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}

has rank 3.
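Both rank statements are easy to verify numerically; a minimal sketch:

M = [1 0 0 1; 1 1 0 0; 0 1 1 0; 0 0 1 1];
rank(M)        % 3: the four columns are linearly dependent
x = [1; 2; 3];
y = [4; 5];
rank(x*y')     % 1: every column of the outer product is a multiple of x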

Example

Consider an m × n term-document matrix A = [a_1 \; a_2 \; \cdots \; a_n], where the columns a_j ∈ R^m are the documents. If A has rank 3, then all the documents can be expressed as linear combinations of only three vectors v_1, v_2, v_3 ∈ R^m:

a_j = w_{1j} v_1 + w_{2j} v_2 + w_{3j} v_3, \quad j = 1, \ldots, n.

The term-document matrix can then be written as A = VW, where V = (v_1 \; v_2 \; v_3) ∈ R^{m×3} and W = (w_{ij}) ∈ R^{3×n}.

Condition number

For any a ≠ 1 the matrix

A = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}

is nonsingular and has the inverse

A^{-1} = \frac{1}{a-1} \begin{pmatrix} a & -1 \\ -1 & 1 \end{pmatrix}.

As a → 1, the norm of A^{-1} tends to infinity. Nonsingularity is not always enough!

Define the condition number of a matrix to be \kappa(A) = \|A\| \, \|A^{-1}\|. A large condition number means trouble!
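The blow-up is visible directly with Matlab's cond, which computes κ in the 2-norm; a sketch using the 2 × 2 example above:

for a = [2, 1.1, 1.01, 1.001]
    A = [1 1; 1 a];
    fprintf('a = %6.3f   cond(A) = %9.2e\n', a, cond(A));
end
% cond(A) grows without bound as a -> 1,
% even though A remains nonsingular for every a ~= 1.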

Orthogonality

Two vectors x and y are orthogonal if x^T y = 0.

Let q_j, j = 1, ..., n, be orthogonal, i.e. q_i^T q_j = 0 for i ≠ j. Then they are linearly independent. (Proof?)

Let the set of orthogonal vectors q_j, j = 1, ..., m, in R^m be normalized, \|q_j\|_2 = 1. Then they are orthonormal and constitute an orthonormal basis of R^m.

A matrix Q = [q_1 \; q_2 \; \cdots \; q_m] ∈ R^{m×m} with orthonormal columns is called an orthogonal matrix.

Why we like orthogonal matrices

- An orthogonal matrix Q ∈ R^{m×m} has rank m (since its columns are linearly independent).
- Q^T Q = I and QQ^T = I. (Proofs?)
- The inverse of an orthogonal matrix Q is Q^{-1} = Q^T.
- The Euclidean length of a vector is invariant under an orthogonal transformation Q: \|Qx\|_2^2 = (Qx)^T Qx = x^T x = \|x\|_2^2.
- The product of two orthogonal matrices Q and P is orthogonal: with X = PQ, X^T X = (PQ)^T PQ = Q^T P^T P Q = Q^T Q = I.
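These properties can be checked on a random orthogonal matrix, for instance the Q factor of a QR factorization; a minimal sketch:

[Q, R] = qr(randn(5));    % Q is a (random) 5 x 5 orthogonal matrix
norm(Q'*Q - eye(5))       % approx 0: Q'Q = I
norm(inv(Q) - Q')         % approx 0: the inverse equals the transpose
x = randn(5, 1);
abs(norm(Q*x) - norm(x))  % approx 0: Euclidean length is preserved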
