Linear Algebra Methods for Data Mining


Linear Discriminant Analysis
Saara Hyvönen, Spring 2007

Principal components analysis. Idea: look for the direction such that the data projected onto it has maximal variance. When found, continue by seeking the next direction, which is orthogonal to this one (i.e. uncorrelated), and which explains as much of the remaining variance in the data as possible. Ergo: we are seeking linear combinations of the original variables. If we are lucky, we can find a few such linear combinations, or directions, or (principal) components, which describe the data fairly accurately. The aim is to capture the intrinsic variability in the data.

[Figure: a data cloud with the 1st and 2nd principal component directions indicated.]

How to compute the PCA. Data matrix A, rows = data points, columns = variables (attributes, parameters).
1. Center the data by subtracting the mean of each column.
2. Compute the SVD of the centered matrix Â (or its k first singular values and vectors): Â = UΣV^T.
3. The principal components are the columns of V; the coordinates of the data in the basis defined by the principal components are UΣ.
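A minimal Matlab sketch of these steps (a sketch, not from the original slides; it assumes the data matrix A has rows = data points, as above):

[m, n] = size(A);
Ahat = A - repmat(mean(A, 1), m, 1);   % step 1: center each column
[U, S, V] = svd(Ahat, 'econ');         % step 2: thin SVD of the centered matrix
pcs = V;                               % step 3: principal components (columns of V)
coords = U * S;                        % coordinates of the data in the PC basis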

But the PCs are not always what we want! [Figure: a two-group point set for which the principal components do not separate the groups.]

Example: Atmospheric data. Data: 1500 days, and for each day we have the means and stds of around 30 measured variables (temperature, wind speed and direction, rainfall, UV-A radiation, concentration of CO2, etc.). Therefore, our data matrix has 1500 rows and about 60 columns. Visualizing things in a 60-dimensional space is challenging! Instead, do PCA, and project the days onto the plane defined by the first two principal components.

[Figure: days projected onto the plane defined by the first two principal components, colored by month; axes: 1st and 2nd principal component.]

But this is not really what we are interested in! Instead, we are interested in distinguishing the days when new particles spontaneously form from the days with no such formation. Principal components are not very good at this!

[Figure: days projected onto the plane defined by the first two principal components, colored according to particle formation; axes: 1st and 2nd principal component.]

What to do? Look instead for a direction which minimizes the within-group variance and maximizes the between-group variance. Project the data onto this direction: the groups should be well separated! This is what Linear Discriminant Analysis does.

[Figure: days projected onto the plane defined by the first two linear discriminants, colored according to particle formation; axes: 1st and 2nd linear discriminant.]

Linear Discriminant Analysis. We are given the data matrix together with class labels: each data point belongs to one of the classes 1, ..., k. Goal: map the original data into features that most effectively discriminate between the classes. In other words, reduce the dimension of the data in a way that best preserves its cluster structure.

Assume the columns of A ∈ R^{m×n} are grouped into k clusters: A = [A_1 A_2 ... A_k], A_i ∈ R^{m×n_i}, with Σ_{i=1}^k n_i = n. The centroid of each cluster is computed by taking the average of the columns in A_i: c_i = (1/n_i) A_i e_i, where e_i = (1, ..., 1)^T ∈ R^{n_i×1}, and the global centroid is defined as c = (1/n) A e, where e = (1, ..., 1)^T ∈ R^{n×1}.

Let N_i denote the set of column indices that belong to cluster A_i. Then the within-cluster, between-cluster and mixture (or total) scatter matrices are defined as follows:
S_w = Σ_{i=1}^k Σ_{j∈N_i} (a_j - c_i)(a_j - c_i)^T,
S_b = Σ_{i=1}^k Σ_{j∈N_i} (c_i - c)(c_i - c)^T = Σ_{i=1}^k n_i (c_i - c)(c_i - c)^T,
S_m = Σ_{j=1}^n (a_j - c)(a_j - c)^T.
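A minimal Matlab sketch of these definitions (an illustration only; it assumes the data points are the columns of A and that a vector labels with values 1, ..., k gives the cluster of each column):

[m, n] = size(A);
c = mean(A, 2);                        % global centroid
Sw = zeros(m); Sb = zeros(m);
for i = 1:k
    Ai = A(:, labels == i);            % columns belonging to cluster i
    ni = size(Ai, 2);
    ci = mean(Ai, 2);                  % centroid of cluster i
    D = Ai - repmat(ci, 1, ni);
    Sw = Sw + D*D';                    % within-cluster scatter
    Sb = Sb + ni*(ci - c)*(ci - c)';   % between-cluster scatter
end
Sm = Sw + Sb;                          % mixture (total) scatter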

Let w be the vector along which we shall project our data. Now we achieve our goal by maximizing the objective
J(w) = (w^T S_b w) / (w^T S_w w).
In doing so, we minimize the within-cluster scatter while maximizing the between-cluster scatter.

Note: one can show that S_m = S_w + S_b, so
J(w) = (w^T S_b w) / (w^T S_w w)
can be written as
J(w) = (w^T S_m w) / (w^T S_w w) - 1,
which means we are maximizing the total scatter while minimizing the within-cluster scatter.
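For completeness, a short derivation of S_m = S_w + S_b (not spelled out on the slides): split each deviation from the global centroid as a_j - c = (a_j - c_i) + (c_i - c) for j ∈ N_i. Then
S_m = Σ_{i=1}^k Σ_{j∈N_i} [(a_j - c_i) + (c_i - c)][(a_j - c_i) + (c_i - c)]^T = S_w + S_b + cross terms,
and the cross terms vanish because Σ_{j∈N_i} (a_j - c_i) = 0 by the definition of the centroid c_i.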

Note that the value of J(w) = (w^T S_b w) / (w^T S_w w) is the same regardless of how we scale w → αw. Before (when discussing PCA) we chose w^T w = 1. This time, let us require w to be such that w^T S_w w = 1.

Now our problem can be stated as follows: we wish to maximize J(w) = (w^T S_b w) / (w^T S_w w) subject to the constraint w^T S_w w = 1. Optimization problem: maximize
f = w^T S_b w - λ(w^T S_w w - 1),
where λ is the Lagrange multiplier.

Again we solve the optimization problem
max_w f = max_w ( w^T S_b w - λ(w^T S_w w - 1) )
by differentiating with respect to w; this yields
∂f/∂w = 2 S_b w - 2λ S_w w = 0.
This leads to the generalized eigenvalue problem S_b w = λ S_w w.

If S_w is invertible, the generalized eigenproblem S_b w = λ S_w w can be written as S_w^{-1} S_b w = λ w. The solutions of this are the eigenvalues and eigenvectors of the matrix S_w^{-1} S_b.
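A small Matlab check of this equivalence, using made-up symmetric test matrices (not from the slides):

m = 5;
R1 = randn(m); Sb = R1*R1';            % symmetric positive semidefinite stand-in for S_b
R2 = randn(m); Sw = R2*R2' + eye(m);   % symmetric positive definite stand-in for S_w
lam1 = sort(eig(Sb, Sw));              % generalized eigenvalues of the pair (S_b, S_w)
lam2 = sort(real(eig(Sw \ Sb)));       % eigenvalues of inv(S_w)*S_b
norm(lam1 - lam2)                      % should be of the order of rounding error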

Denote the eigenvalues and eigenvectors by λ_k and w_k. Remembering that S_b w_k = λ_k S_w w_k, insert these into J(w):
J(w_k) = (w_k^T S_b w_k) / (w_k^T S_w w_k) = (w_k^T λ_k S_w w_k) / (w_k^T S_w w_k) = λ_k.
So the direction w which maximizes the value of J(w) is the eigenvector corresponding to the largest eigenvalue of S_w^{-1} S_b. The largest eigenvalue tells how well the classes separate.

Generalized eigenvalue problems. Let A, B ∈ R^{n×n} and consider Ax = λBx, x ≠ 0. There are n generalized eigenvalues λ if and only if rank(B) = n. If rank(B) < n, then the number of λ may be zero, finite, or infinite:
A = [1 2; 0 3], B = [1 0; 0 0]: exactly one eigenvalue (λ = 1);
A = [1 2; 0 3], B = [0 1; 0 0]: no eigenvalues;
A = [1 2; 0 0], B = [1 0; 0 0]: every λ is an eigenvalue.

Symmetric-definite generalized eigenproblems. Let the matrices A, B ∈ R^{n×n} be such that A is symmetric and B is symmetric positive definite. Find λ and x ≠ 0 such that Ax = λBx.

Theorem. If A, B ∈ R^{n×n}, A symmetric, B symmetric positive definite, then there exists a nonsingular X = [x_1, ..., x_n] such that
X^T A X = diag(a_1, ..., a_n), X^T B X = diag(b_1, ..., b_n).
Moreover, A x_i = λ_i B x_i for i = 1, ..., n, where λ_i = a_i / b_i.
Note: the matrix X can be chosen in such a way that
X^T A X = diag(λ_1, ..., λ_n), X^T B X = diag(1, ..., 1).

Example. For a small numerical illustration of the theorem, see the sketch below.
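A minimal Matlab sketch with made-up 2-by-2 matrices (not the original slide's numbers), showing the simultaneous diagonalization:

A = [2 1; 1 2];          % symmetric
B = [4 0; 0 1];          % symmetric positive definite
[X, D] = eig(A, B);      % generalized eigenvectors x_i and eigenvalues lambda_i
X' * A * X               % diagonal (entries a_i)
X' * B * X               % diagonal (entries b_i), with a_i / b_i = lambda_i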

Our generalized eigenvalue problem was S_b w = λ S_w w. We know that both S_b and S_w are symmetric and positive semidefinite. Assuming S_w is invertible, it is also positive definite. So there is a matrix X such that
X^T S_b X = diag(λ_1, ..., λ_n), X^T S_w X = diag(1, ..., 1),
and S_b x_i = λ_i S_w x_i. Since S_b is positive semidefinite and x_i^T S_b x_i = λ_i, we see that λ_i ≥ 0 for all i.

So, the generalized eigenvalues of our problem S_b w = λ S_w w are all nonnegative. Moreover, only the largest r eigenvalues are nonzero, where r = rank(S_b). Remember that
S_b = Σ_{i=1}^k n_i (c_i - c)(c_i - c)^T,
which is a sum of k rank-1 matrices, so the rank of S_b is at most k.

Here we only considered looking for the first linear discriminant, which is the eigenvector corresponding to the largest eigenvalue of S_w^{-1} S_b. The l first linear discriminants are (of course!) the eigenvectors corresponding to the l largest eigenvalues.

So how did we get the linear discriminants?
Step 1: Compute the scatter matrices S_b and S_w.
Step 2: Solve the generalized eigenvalue problem S_b w = λ S_w w. (In Matlab you can use eigs!)
Step 3: Order the eigenvalues from largest to smallest, and the eigenvectors accordingly. These are your linear discriminants.
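A minimal Matlab sketch of Steps 2 and 3 (assuming Sw and Sb have been computed as above and that Sw is nonsingular):

[W, D] = eig(Sb, Sw);                  % Step 2: generalized eigenvalue problem Sb*w = lambda*Sw*w
[lam, p] = sort(diag(D), 'descend');   % Step 3: order the eigenvalues from largest to smallest
W = W(:, p);                           % reorder the eigenvectors accordingly
L = W(:, 1:2);                         % e.g. the first two linear discriminants
Y = L' * A;                            % data (columns of A) projected onto the discriminants

For large problems one could use eigs(Sb, Sw, l) to compute only the l leading discriminants, as the slide suggests.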

Step 4: In classification: use a training set to decide where the boundaries are, and use a test set to evaluate the performance. Note: in reality, one should use N-fold cross-validation: divide all the data into N parts, and use one part as the test set and the rest as the training set. Report the average performance across the test sets. Why? To get a more reliable estimate of the performance (avoiding the situation where the training set and the test set are "too good").
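A minimal N-fold cross-validation sketch in Matlab (the helpers train_lda and classify_lda are hypothetical placeholders standing in for the training and classification steps described above; A and labels are as in the earlier scatter-matrix sketch):

N = 10;
n = size(A, 2);                          % data points are the columns of A
folds = mod(randperm(n), N) + 1;         % assign each column to one of N folds at random
acc = zeros(N, 1);
for f = 1:N
    test = (folds == f);
    train = ~test;
    model = train_lda(A(:, train), labels(train));   % hypothetical: Steps 1-4 on the training part
    pred = classify_lda(model, A(:, test));          % hypothetical: column vector of predicted labels
    acc(f) = mean(pred == reshape(labels(test), [], 1));   % fraction correct in this fold
end
mean(acc)                                % average performance across the test sets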

Example: 2 classes in 2D (Matlab). With the two classes stored as the columns of A1 and A2:

n1 = size(A1, 2); n2 = size(A2, 2);
c1 = mean(A1, 2); c2 = mean(A2, 2);              % class centroids
A = [A1 A2]; c = mean(A, 2);                     % all data and the global centroid
Sb = n1*(c1-c)*(c1-c)' + n2*(c2-c)*(c2-c)';      % between-class scatter
tmp1 = A1 - repmat(c1, 1, n1);
tmp2 = A2 - repmat(c2, 1, n2);
Sw = tmp1*tmp1' + tmp2*tmp2';                    % within-class scatter
[V, D] = eig(Sb, Sw);                            % small dense problem: eig suffices (eigs(Sb, Sw, 1) for large, sparse problems)

[Figure: the two-class example data in the original coordinates.]

[Figure: the two-class example data in PC coordinates (1st vs 2nd principal component).]

[Figure: the two-class example data projected onto the plane defined by the first two linear discriminants.]

[Figure: the two-class example data with the 1st principal component (solid line) and the 1st linear discriminant (dashed line).]

Two-class case.
S_w = Σ_{j∈N_1} (a_j - c_1)(a_j - c_1)^T + Σ_{j∈N_2} (a_j - c_2)(a_j - c_2)^T = n_1 Σ_1 + n_2 Σ_2,
where Σ_i is the covariance matrix of class i. Also, we can use the fact that c = (1/n)(n_1 c_1 + n_2 c_2) to get
S_b = n_1 (c_1 - c)(c_1 - c)^T + n_2 (c_2 - c)(c_2 - c)^T = (n_1 n_2 / n)(c_2 - c_1)(c_2 - c_1)^T.
This is a rank-1 matrix!

We have, for nonsingular S_w,
S_w^{-1} S_b w_1 = S_w^{-1} (n_1 n_2 / n)(c_2 - c_1)(c_2 - c_1)^T w_1 = λ_1 w_1,
which yields (for some α)
w_1 = α S_w^{-1} (c_2 - c_1),
and
λ_1 = (n_1 n_2 / n)(c_2 - c_1)^T S_w^{-1} (c_2 - c_1)  ( = trace(S_w^{-1} S_b) ).
Good class separation?
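A Matlab sketch of this two-class shortcut (assuming Sw, Sb, c1, c2, n1, n2 from the two-class example above):

n = n1 + n2;
w1 = Sw \ (c2 - c1);                                % first discriminant direction, up to the scaling alpha
lam1 = (n1*n2/n) * (c2 - c1)' * (Sw \ (c2 - c1));   % the corresponding (and only nonzero) eigenvalue
% Sanity check: lam1 should equal trace(Sw \ Sb) for the rank-1 Sb of the two-class case.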

Remember: the largest eigenvalue tells how well the classes separate. Since
λ_1 = (n_1 n_2 / n)(c_2 - c_1)^T S_w^{-1} (c_2 - c_1),
we get better separation of the two classes if the difference of the class means, c_2 - c_1, is large relative to the weighted sum of the class covariance matrices, n_1 Σ_1 + n_2 Σ_2 = S_w.

Atmospheric data again. [Figure: days projected onto the plane defined by the first two linear discriminants, colored according to particle formation; axes: 1st and 2nd linear discriminant.]

Could we look at the weights of the variables in the first linear discriminant to see which variables are important in separating the red dots from the blue?

Fisher discriminant analysis (R. A. Fisher, The use of multiple measurements in taxonomic problems, 1936). Uses a slightly different criterion: maximize
J(w) = (w^T S_b w) / (w^T S_w w),
where the scatter matrices are S_b = (c_2 - c_1)(c_2 - c_1)^T and S_w = Σ_1 + Σ_2. If the classes are of equal size (n_1 = n_2), then this is the same as what we discussed above.

Other criteria. Several measures of cluster quality which involve the three scatter matrices have been suggested, including J = trace(S_w^{-1} S_b) and J = trace(S_w^{-1} S_m). For more discussion on these and others, see e.g. [2] and the references therein.

What if S_w is singular? Then this approach will not work, as it is based on finding the eigenvalues of S_w^{-1} S_b! This is typically the case in undersampled problems, where the number of samples is small compared to the dimension of the data points: for example, microarray data, text data, image data. Answer: instead of solving the generalized eigenproblem we can formulate the problem in terms of the generalized SVD.

References
[1] Lars Eldén: Matrix Methods in Data Mining and Pattern Recognition, SIAM.
[2] P. Howland and H. Park: Extension of Discriminant Analysis based on the Generalized Singular Value Decomposition.
[3] B. G. Ripley: Pattern Recognition and Neural Networks, Cambridge University Press.
