Linear Algebra Methods for Data Mining
1 Linear Algebra Methods for Data Mining. Saara Hyvönen, Spring 2007. Linear Discriminant Analysis.
2 Principal components analysis Idea: look for a direction such that the data projected onto it has maximal variance. When found, continue by seeking the next direction, which is orthogonal to this (i.e. uncorrelated), and which explains as much of the remaining variance in the data as possible. Ergo: we are seeking linear combinations of the original variables. If we are lucky, we can find a few such linear combinations, or directions, or (principal) components, which describe the data fairly accurately. The aim is to capture the intrinsic variability in the data.
3 [Figure: data cloud with the 1st and 2nd principal component directions drawn in.]
4 How to compute the PCA: Data matrix A, rows = data points, columns = variables (attributes, parameters). 1. Center the data by subtracting the mean of each column. 2. Compute the SVD of the centered matrix $\hat{A}$ (or its k first singular values and vectors): $\hat{A} = U \Sigma V^T$. 3. The principal components are the columns of V; the coordinates of the data in the basis defined by the principal components are $U\Sigma$.
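For concreteness, a minimal Matlab sketch of these steps (the variable names and the use of the economy-size SVD are illustrative choices, not part of the slides):

% A: data matrix, rows = data points, columns = variables
Ahat = A - repmat(mean(A,1), size(A,1), 1);   % 1. center by subtracting column means
[U, S, V] = svd(Ahat, 'econ');                % 2. SVD of the centered matrix
PCs = V;                                      % 3. principal components = columns of V
coords = U*S;                                 % coordinates in the principal component basis
proj2d = coords(:, 1:2);                      % e.g. projection onto the first two PCs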
5 But the PCs are not always what we want! [Figure: scatter plot of data points illustrating this.]
6 Example: Atmospheric data Data: 1500 days, and for each day the means and standard deviations of around 30 measured variables (temperature, wind speed and direction, rainfall, UV-A radiation, concentration of CO2, etc.). Therefore, our data matrix is roughly 1500 × 60. Visualizing things in a 60-dimensional space is challenging! Instead, do PCA and project the days onto the plane defined by the first two principal components.
7 [Figure: days projected onto the plane defined by the first two principal components, colored by month; axes: 1st and 2nd principal component.]
8 But this is not really what we are interested in! Instead, we are interested in distinguishing days when new particles spontaneously form from days with no such formation. Principal components are not very good at this!
9 [Figure: days projected onto the plane defined by the first two principal components, colored according to particle formation; axes: 1st and 2nd principal component.]
10 What to do? Look instead for a direction which minimizes the within-group variance and maximizes the between-group variance. Project the data onto this direction: the groups (should be) well separated! This is what Linear Discriminant Analysis does.
11 [Figure: days projected onto the plane defined by the first two linear discriminants, colored according to particle formation; axes: 1st and 2nd linear discriminant.]
12 Linear Discriminant Analysis We are given the data matrix together with class labels: each data point belongs to one of the classes 1, ..., k. Goal: map the original data into features that most effectively discriminate between classes. In other words, reduce the dimension of the data in a way that best preserves its cluster structure.
13 Assume the columns of $A \in \mathbb{R}^{m \times n}$ are grouped into $k$ clusters: $A = [A_1\ A_2\ \ldots\ A_k]$, $A_i \in \mathbb{R}^{m \times n_i}$, $\sum_{i=1}^k n_i = n$. The centroid of each cluster is computed by taking the average of the columns of $A_i$: $c_i = \frac{1}{n_i} A_i e_i$, where $e_i = (1, \ldots, 1)^T \in \mathbb{R}^{n_i \times 1}$, and the global centroid is defined as $c = \frac{1}{n} A e$, where $e = (1, \ldots, 1)^T \in \mathbb{R}^{n \times 1}$.
14 Let $N_i$ denote the set of column indices that belong to cluster $A_i$. Then the within-cluster, between-cluster and mixture (or total) scatter matrices are defined as follows: $$S_w = \sum_{i=1}^k \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^T,$$ $$S_b = \sum_{i=1}^k \sum_{j \in N_i} (c_i - c)(c_i - c)^T = \sum_{i=1}^k n_i (c_i - c)(c_i - c)^T,$$ $$S_m = \sum_{j=1}^n (a_j - c)(a_j - c)^T.$$
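As a hedged Matlab sketch of these definitions (here A is m-by-n with columns as data points, and labels is a length-n vector of class indices 1..k; both names are assumptions made for illustration):

% A: m-by-n data matrix, columns = data points; labels(j) = class index of column j
[m, n] = size(A);
k = max(labels);
c = mean(A, 2);                        % global centroid
Sw = zeros(m); Sb = zeros(m);
for i = 1:k
    Ai = A(:, labels == i);            % columns belonging to cluster i
    ni = size(Ai, 2);
    ci = mean(Ai, 2);                  % cluster centroid
    D  = Ai - repmat(ci, 1, ni);
    Sw = Sw + D*D';                    % within-cluster scatter
    Sb = Sb + ni*(ci - c)*(ci - c)';   % between-cluster scatter
end
D  = A - repmat(c, 1, n);
Sm = D*D';                             % mixture (total) scatter; Sm = Sw + Sb up to rounding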
15 Let $w$ be the vector along which we shall project our data. We achieve our goal by maximizing the objective $$J(w) = \frac{w^T S_b w}{w^T S_w w}.$$ In doing so, we minimize the within-cluster scatter while maximizing the between-cluster scatter.
16 Note: one can show that $S_m = S_w + S_b$, so $$J(w) = \frac{w^T S_b w}{w^T S_w w}$$ can be written as $$J(w) = \frac{w^T S_m w}{w^T S_w w} - 1,$$ which means we are maximizing the total scatter while minimizing the within-cluster scatter.
17 Note that the value of $$J(w) = \frac{w^T S_b w}{w^T S_w w}$$ is the same regardless of how we scale $w \mapsto \alpha w$. Before (when discussing PCA) we chose $w^T w = 1$. This time, let us require $w$ to be such that $w^T S_w w = 1$.
18 Now our problem can be stated as follows: we wish to maximize $$J(w) = \frac{w^T S_b w}{w^T S_w w}$$ subject to the constraint $w^T S_w w = 1$. Optimization problem: maximize $$f = w^T S_b w - \lambda (w^T S_w w - 1),$$ where $\lambda$ is the Lagrange multiplier.
19 Again we solve the optimization problem $$\max_w f = \max_w \left( w^T S_b w - \lambda (w^T S_w w - 1) \right)$$ by differentiating with respect to $w$; this yields $$\frac{\partial f}{\partial w} = 2 S_b w - 2 \lambda S_w w = 0.$$ This leads to the generalized eigenvalue problem $S_b w = \lambda S_w w$.
20 If $S_w$ is invertible, the generalized eigenproblem $S_b w = \lambda S_w w$ can be written as $S_w^{-1} S_b w = \lambda w$. The solutions of this are the eigenvalues and eigenvectors of the matrix $S_w^{-1} S_b$.
21 Denote the eigenvalues and eigenvectors by $\lambda_k$ and $w_k$. Remembering that $S_b w_k = \lambda_k S_w w_k$, insert these into $J(w)$: $$J(w_k) = \frac{w_k^T S_b w_k}{w_k^T S_w w_k} = \frac{w_k^T \lambda_k S_w w_k}{w_k^T S_w w_k} = \lambda_k.$$ So the direction $w$ which maximizes the value of $J(w)$ is the eigenvector corresponding to the largest eigenvalue of $S_w^{-1} S_b$. The largest eigenvalue tells how well the classes separate.
22 Generalized eigenvalue problems Let $A, B \in \mathbb{R}^{n \times n}$: find $\lambda$ and $x \neq 0$ such that $Ax = \lambda Bx$. The problem has $n$ generalized eigenvalues $\lambda$ if and only if $\operatorname{rank}(B) = n$. If $\operatorname{rank}(B) < n$, then the number of eigenvalues $\lambda$ may be zero, finite, or infinite: $$A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix},\ B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \ (\text{one eigenvalue}); \quad A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix},\ B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \ (\text{none}); \quad A = \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix},\ B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \ (\text{infinitely many}).$$
23 Symmetric-definite generalized eigenproblems Let the matrices $A, B \in \mathbb{R}^{n \times n}$ be such that $A$ is symmetric and $B$ is symmetric positive definite. Find $\lambda$ and $x \neq 0$ such that $Ax = \lambda Bx$.
24 Theorem. If $A, B \in \mathbb{R}^{n \times n}$, $A$ symmetric, $B$ symmetric positive definite, then there exists a nonsingular $X = [x_1, \ldots, x_n]$ such that $$X^T A X = \operatorname{diag}(a_1, \ldots, a_n), \qquad X^T B X = \operatorname{diag}(b_1, \ldots, b_n).$$ Moreover, $A x_i = \lambda_i B x_i$ for $i = 1, \ldots, n$, where $\lambda_i = a_i / b_i$. Note: the matrix $X$ can be chosen in such a way that $$X^T A X = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), \qquad X^T B X = \operatorname{diag}(1, \ldots, 1).$$
25 Example: a small numerical illustration of the theorem (the matrices A, B and X on this slide were not recoverable from the transcription).
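Since the numerical example above did not survive transcription, here is instead a hedged Matlab sketch of one standard way to construct such an X via a Cholesky factor of B (assuming A symmetric and B symmetric positive definite; this construction is a substitute, not the slide's original example):

G = chol(B, 'lower');           % B = G*G', with G lower triangular
C = G \ A / G';                 % C = inv(G)*A*inv(G') is symmetric
[Q, Lambda] = eig((C + C')/2);  % symmetrize against rounding, then eigendecompose
X = G' \ Q;                     % X = inv(G')*Q
% Then X'*A*X = Lambda = diag(lambda_1,...,lambda_n) and X'*B*X = I, up to rounding.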
26 Our generalized eigenvalue problem was $S_b w = \lambda S_w w$. We know that both $S_b$ and $S_w$ are symmetric and positive semidefinite. Assuming $S_w$ is invertible, it is also positive definite. So there is a matrix $X$ such that $$X^T S_b X = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), \qquad X^T S_w X = \operatorname{diag}(1, \ldots, 1),$$ and $S_b x_i = \lambda_i S_w x_i$. Since $S_b$ is positive semidefinite and $x_i^T S_b x_i = \lambda_i$, we see that $\lambda_i \geq 0$ for all $i$.
27 So, the generalized eigenvalues of our problem $S_b w = \lambda S_w w$ are all nonnegative. Moreover, only the largest $r$ eigenvalues are nonzero, where $r = \operatorname{rank}(S_b)$. Remember that $$S_b = \sum_{i=1}^k n_i (c_i - c)(c_i - c)^T,$$ which is a sum of $k$ rank-1 matrices, so the rank of $S_b$ is at most $k$.
28 Here we only considered looking for the first linear discriminant, which is the eigenvector corresponding to the largest eigenvalue of $S_w^{-1} S_b$. The $l$ first linear discriminants are (of course!) the eigenvectors corresponding to the $l$ largest eigenvalues.
29 So how did we get the linear discriminants? Step 1: Compute the scatter matrices $S_b$ and $S_w$. Step 2: Solve the generalized eigenvalue problem $S_b w = \lambda S_w w$. (In Matlab you can use eigs!) Step 3: Order the eigenvalues from largest to smallest, and the eigenvectors accordingly. These are your linear discriminants.
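A hedged Matlab sketch of Steps 1-3, reusing Sb and Sw as computed in the earlier scatter-matrix sketch (the full eig solve and the choice of l are illustrative; eigs works similarly when only the largest eigenvalues are wanted):

% Sb, Sw as above; A has columns as data points
[W, D] = eig(Sb, Sw);                       % Step 2: generalized eigenproblem Sb*w = lambda*Sw*w
[lambda, idx] = sort(diag(D), 'descend');   % Step 3: order eigenvalues from largest to smallest
W = W(:, idx);                              % columns = linear discriminants, best first
l = 2;                                      % keep the first l discriminants
Y = W(:, 1:l)' * A;                         % data projected onto the first l discriminants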
30 Step 4: In classification: use a training set to decide where the boundaries are. Use a test set to evaluate performance. Note: in reality, one should use N-fold cross-validation: divide all data into N parts, and use one part as the test set and the rest as the training set. Report the average performance across the test sets. Why? To get a more reliable estimate of performance (avoiding the situation where the training set and test set are "too good").
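One possible Matlab sketch of N-fold cross-validation (the random fold assignment and the helper evaluateOnFold are hypothetical placeholders for the train/evaluate steps, which the slides leave unspecified):

N = 10;                                % number of folds
fold = mod(randperm(n), N) + 1;        % assign each of the n points to one of N folds
perf = zeros(N, 1);
for i = 1:N
    testIdx  = (fold == i);            % one part as test set
    trainIdx = ~testIdx;               % the rest as training set
    perf(i) = evaluateOnFold(trainIdx, testIdx);   % hypothetical: train, classify, score
end
meanPerf = mean(perf);                 % report the average performance across folds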
31 Example: 2 classes in 2D

c1 = mean(A1,2); c2 = mean(A2,2);              % class centroids (columns of A1, A2 are points)
A = [A1 A2]; c = mean(A,2);                    % pooled data and global centroid
sb = n1*(c1-c)*(c1-c)' + n2*(c2-c)*(c2-c)';    % between-cluster scatter
tmp1 = A1 - repmat(c1,1,n1);
tmp2 = A2 - repmat(c2,1,n2);
sw = tmp1*tmp1' + tmp2*tmp2';                  % within-cluster scatter
[v,d] = eigs(sb,sw);                           % generalized eigenvectors and eigenvalues
32 [Figure: the two-class 2-D data in the original (x, y) coordinates.]
33 [Figure: the same data in principal component coordinates; axes: 1st and 2nd pc.]
34 [Figure: the data projected onto the plane defined by the first two linear discriminants; axes: 1st and 2nd linear discriminant.]
35 [Figure: the data with the 1st principal component (solid line) and the 1st linear discriminant (dashed line) drawn in.]
36 Two-class case $$S_w = \sum_{j \in N_1} (a_j - c_1)(a_j - c_1)^T + \sum_{j \in N_2} (a_j - c_2)(a_j - c_2)^T = n_1 \Sigma_1 + n_2 \Sigma_2,$$ where $\Sigma_i$ is the covariance matrix of class $i$. Also, we can use the fact that $c = \frac{1}{n}(n_1 c_1 + n_2 c_2)$ to get $$S_b = n_1 (c_1 - c)(c_1 - c)^T + n_2 (c_2 - c)(c_2 - c)^T = \frac{n_1 n_2}{n} (c_2 - c_1)(c_2 - c_1)^T.$$ This is a rank-1 matrix!
37 We have, for nonsingular $S_w$, $$S_w^{-1} S_b x_1 = S_w^{-1} \frac{n_1 n_2}{n} (c_2 - c_1)(c_2 - c_1)^T x_1 = \lambda_1 x_1,$$ which yields (for some $\alpha$) $$x_1 = \alpha S_w^{-1}(c_2 - c_1),$$ and $$\lambda_1 = \frac{n_1 n_2}{n} (c_2 - c_1)^T S_w^{-1} (c_2 - c_1) \quad \left( = \operatorname{trace}(S_w^{-1} S_b) \right).$$ Good class separation?
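In Matlab, using the variables of the earlier two-class example (sw, c1, c2, n1, n2), this closed form is simply (a sketch; the scaling of the direction is arbitrary):

n = n1 + n2;
x1 = sw \ (c2 - c1);                                 % 1st linear discriminant direction (up to alpha)
lambda1 = n1*n2/n * (c2 - c1)' * (sw \ (c2 - c1));   % the single nonzero eigenvalue, = trace(inv(sw)*sb)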
38 Remember: the largest eigenvalue tells how well the classes separate. $$\lambda_1 = \frac{n_1 n_2}{n} (c_2 - c_1)^T S_w^{-1} (c_2 - c_1)$$ So we get better separation of the two classes if the difference of the class means $(c_2 - c_1)$ is large relative to the weighted sum of the class covariance matrices $n_1 \Sigma_1 + n_2 \Sigma_2 = S_w$.
39 Atmospheric data again [Figure: days projected onto the plane defined by the first two linear discriminants, colored according to particle formation; axes: 1st and 2nd linear discriminant.]
40 Could we look at the weights of the variables in the first linear discriminant to see which variables are important in separating the red dots from the blue?
41 Fisher discriminant analysis (R. A. Fisher, The use of multiple measurements in taxonomic problems, 1936) uses a slightly different criterion: maximize $$J(w) = \frac{w^T S_b w}{w^T S_w w},$$ where the scatter matrices are $$S_b = (c_2 - c_1)(c_2 - c_1)^T, \qquad S_w = \Sigma_1 + \Sigma_2.$$ If the classes are of equal size ($n_1 = n_2$), then this is the same as what we discussed above.
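A hedged Matlab sketch of Fisher's criterion, with the same two-class variables as in the earlier example (normalizing the class covariances by n_i follows the convention used above and is an assumption; the normalization does not change the direction):

D1 = A1 - repmat(c1,1,n1); Sigma1 = D1*D1'/n1;   % class covariance matrices
D2 = A2 - repmat(c2,1,n2); Sigma2 = D2*D2'/n2;
SbF = (c2 - c1)*(c2 - c1)';                      % Fisher's between-class scatter
SwF = Sigma1 + Sigma2;                           % Fisher's within-class scatter
wF  = SwF \ (c2 - c1);                           % Fisher's discriminant direction (up to scaling)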
42 Other criteria Several measures of cluster quality, which involve the three scatter matrices, have been suggested, including $J = \operatorname{trace}(S_w^{-1} S_b)$ and $J = \operatorname{trace}(S_w^{-1} S_m)$. For more discussion on these and others, see e.g. [2] and the references therein.
43 What if $S_w$ is singular? Then this approach will not work, as it is based on finding the eigenvalues of $S_w^{-1} S_b$! This is typically the case in undersampled problems, where the number of samples is small compared to the dimension of the data points: for example, microarray data, text data, and image data. Answer: instead of solving the generalized eigenproblem we can formulate the problem in terms of the generalized SVD.
44 References [1] Lars Eldén: Matrix Methods in Data Mining and Pattern Recognition, SIAM. [2] P. Howland and H. Park: Extension of Discriminant Analysis based on the Generalized Singular Value Decomposition. [3] B. D. Ripley: Pattern Recognition and Neural Networks, Cambridge University Press.