Pattern Recognition 2
1 Pattern Recognition 2: KNN, PCA, LDA. Dr. Terence Sim, School of Computing, National University of Singapore
2 Outline
3 Outline
4 The Bayes Classifier is theoretically optimal. That is, the probability of error (misclassification), P(error), is the smallest possible, given the set of features. But this requires knowing the true class-conditional pdfs. In reality, we don't know them, so we estimate them from training data. This may break the optimality, because our training data may not be representative of all possible data. Nevertheless, the Bayes Classifier is commonly used. See also: [1]
5 Estimating pdfs One common method is to use histograms. Normalized histogram = pdf.
6 Estimating pdfs Multidimensional histograms are possible, but they require a lot of training data to estimate. Heuristic: a D-dimensional histogram needs about 10^D training samples. This is the Curse of Dimensionality.
7 Naive Bayes To overcome the Curse of Dimensionality, one way is to assume that the D features are statistically independent. Recap: two random variables X, Y are statistically independent iff P(X, Y) = P(X) P(Y). Suppose the feature vector is x = [x_1, ..., x_D]^T; then: ω* = arg max_{ω_j} P(x | ω_j) P(ω_j) = arg max_{ω_j} P(ω_j) ∏_{i=1}^{D} P(x_i | ω_j). In practice, use log probabilities to avoid underflow. This is called the Naive Bayes assumption. Surprisingly, it works well in practice!
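To make the Naive Bayes assumption concrete, here is a minimal Python sketch assuming discrete (binned) features, per-feature normalized histograms as the class-conditional pdfs, and Laplace smoothing; the function names, binning, and smoothing constant are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def train_naive_bayes(X, y, n_bins):
    """Estimate per-class priors and per-feature histograms (1-D pdfs).

    X : (N, D) integer array of binned feature values in [0, n_bins)
    y : (N,)   integer array of class labels
    """
    priors, likelihoods = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        # One normalized histogram per feature dimension (Laplace-smoothed).
        likelihoods[c] = np.stack([
            (np.bincount(Xc[:, d], minlength=n_bins) + 1.0) / (len(Xc) + n_bins)
            for d in range(X.shape[1])
        ])  # shape (D, n_bins)
    return priors, likelihoods

def classify_naive_bayes(x, priors, likelihoods):
    """Return arg max_j log P(w_j) + sum_i log P(x_i | w_j); logs avoid underflow."""
    best_c, best_score = None, -np.inf
    for c, prior in priors.items():
        score = np.log(prior) + sum(
            np.log(likelihoods[c][d, x[d]]) for d in range(len(x)))
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```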
8 Face Detection using Naive Bayes [2]
9 Outline
10 K-Nearest Neighbor Instead of using pdfs, why not simply estimate the decision boundary? The K-Nearest Neighbor (KNN) method estimates this implicitly.
11 K-Nearest Neighbor All training data are simply stored, with their class labels. Given a point x to be classified: select the K nearest neighbors of x, then assign to x the majority label of these K neighbors. Usually K is odd. Ties can still occur: e.g. 3 classes and K = 3. This is not the optimal classifier, but given enough training data, P(error_KNN) ≤ 2 P(error_Bayes). Notion of "nearness": the distance metric is important; the choice of distance metric affects accuracy. Pros: easy to implement. Cons: does not scale well.
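A minimal KNN sketch in Python/NumPy, assuming Euclidean distance (any other metric could be substituted) and illustrative function names:

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Label x by majority vote among its k nearest stored training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]             # majority label (ties broken arbitrarily)
```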
12 Implicit Decision Boundary Each Voronoi cell is the region of points whose nearest training sample is the one enclosed in the cell.
13 Outline
14 How Good is the Classifier? Test it with another set of labelled data (not the training set). [4 × 4 confusion matrix over classes ω_1–ω_4; the individual counts are not recoverable from the transcription.] The row labels are the true labels of the test data; the column labels are the ones assigned by the classifier. Diagonal entries are the numbers of correct classifications. Off-diagonal entries are misclassifications. Ideally, the matrix should be diagonal, meaning 0 error. Accuracy = trace/total = 63/96 = 65.6%
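A small sketch of how the confusion matrix and its accuracy could be computed, assuming integer class labels; names are illustrative:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true labels of the test data; columns: labels assigned by the classifier."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    return M

def accuracy(M):
    """Accuracy = trace / total, as in the slide (e.g. 63/96 ≈ 65.6%)."""
    return np.trace(M) / M.sum()
```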
15 Cross Validation To guard against a fluke test result. Divide the T training samples into N equal bins. Leave out the 1st bin; train using the rest; test using the 1st bin. Repeat by leaving out each bin in turn, training using the rest, and testing using the omitted bin. Average the accuracies over all runs. This is called N-fold Cross Validation. Leave-one-out cross validation is the case N = T.
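A possible sketch of N-fold cross validation, assuming hypothetical train_fn/test_fn callables supplied by the caller; with n_folds equal to the number of samples this reduces to leave-one-out:

```python
import numpy as np

def cross_validate(X, y, n_folds, train_fn, test_fn):
    """N-fold cross validation: leave each bin out in turn and average the accuracies.

    train_fn(X_tr, y_tr) -> model;  test_fn(model, X_te, y_te) -> accuracy.
    """
    idx = np.arange(len(X))            # no shuffling here; shuffle first if data are ordered
    bins = np.array_split(idx, n_folds)
    accs = []
    for i in range(n_folds):
        te = bins[i]
        tr = np.concatenate([bins[j] for j in range(n_folds) if j != i])
        model = train_fn(X[tr], y[tr])
        accs.append(test_fn(model, X[te], y[te]))
    return float(np.mean(accs))
```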
16 Outline
17 What to Use? Choosing features is more of an art than a science. No theory says which feature is best for a given problem. Experience, insight, and familiarity help. Why not try all possible sets of features? NP-hard! Greedy approach: from the F possible features, select the 1 that gives the best accuracy; from the remaining F − 1 features, select the 1 that, when combined with the previously chosen feature, gives the best accuracy; etc. Common features: edges, lines, color histogram, wavelets, Fourier transform, shape, derivatives.
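One way the greedy forward-selection idea above could look in code; score_fn is a hypothetical callable (e.g. cross-validated KNN accuracy on the candidate feature subset):

```python
import numpy as np

def greedy_forward_selection(X, y, n_features, score_fn):
    """Greedily add, one at a time, the feature that most improves score_fn."""
    remaining = list(range(X.shape[1]))
    selected = []
    for _ in range(n_features):
        best_f, best_score = None, -np.inf
        for f in remaining:
            s = score_fn(X[:, selected + [f]], y)   # accuracy using the candidate subset
            if s > best_score:
                best_f, best_score = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```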
18 Desirable Properties Easy to compute (efficient) Compact (less storage) Good discriminative power Efficient distance metric Robust to image distortions (e.g. rotation, illumination)
19 Principal Components Analysis (PCA) a.k.a. Discrete Karhunen-Loève Transform, Hotelling Transform. Let y ∈ R^k be a feature vector computed from an image (or another feature vector) x ∈ R^d, where k ≤ d: y = W^T x (1) W is d × k and orthogonal, and is to be determined from the statistics of x. Let's suppose the mean and covariance matrix are: E[x] = 0, E[xx^T] = C_x. We want the expected (reconstruction) error to be small. How to compute W?
20 Recap Expectation is the mean (average) of a random variable x: E[x] = ∫ x p(x) dx. Variance is the expected squared difference from the mean m: Var[x] = E[(x − m)^2] = E[x^2] − (E[x])^2. For vectors, Mean: m = E[x] = [E[x_1] E[x_2] ... E[x_d]]^T. Covariance matrix: Var[x] = E[(x − m)(x − m)^T] = E[xx^T] − mm^T. Note: the covariance matrix is symmetric and positive semi-definite.
21 Recovered vector: x_r = W y. Error: ε = x − x_r = x − WW^T x. We want a small expected error: ‖ε‖^2 = ε^T ε = (x − WW^T x)^T (x − WW^T x) = x^T x − x^T WW^T x − x^T WW^T x + x^T WW^T WW^T x = x^T x − x^T WW^T x. Note that W^T W = I.
22 Let k = 1, i.e. W is a vector w and y is a scalar. Then E[ε^T ε] = E[x^T x] − E[(x^T w)(w^T x)] = E[x^T x] − E[w^T x x^T w] = E[x^T x] − w^T E[x x^T] w = E[x^T x] − w^T C_x w.
23 We need to find the w that minimizes E[ε^T ε]. This is the same as maximizing the 2nd term on the right-hand side: max_w w^T C_x w. But we should normalize by the length of w, so define J = (w^T C_x w) / (w^T w) (2) Goal: find w to maximize J.
24 Take derivatives and set to 0: dJ/dw = [(w^T w) 2 C_x w − (w^T C_x w) 2 w] / (w^T w)^2 = 0, i.e. 0 = 2 C_x w / (w^T w) − [ (w^T C_x w) / (w^T w) ] · 2 w / (w^T w). Note: the term in brackets is J, so we rearrange to get: C_x w = J w. An eigenvalue problem! Thus, w is the eigenvector of C_x corresponding to the largest eigenvalue (= J).
25 In general, PCA is: y = W^T (x − m), where m = E[x] is the mean, and W is the d × k matrix containing the k eigenvectors of Var[x] (the covariance matrix) corresponding to the top k eigenvalues: W = [w_1 w_2 ... w_k]. w_1: First principal component, w_2: Second principal component, etc.
26 PCA notes PCA is a shift and rotation of the axes. w_1: direction of greatest elongation; w_2: direction of next greatest elongation, orthogonal to the previous eigenvector; etc. W is orthogonal because C_x is symmetric. WW^T ≠ I unless k = d.
27 PCA notes Best compression of the r.v. x, in the "least-squares error" sense. Guaranteed by the derivation of PCA. De-correlation: for y = W^T (x − m), Var[y] = C_y = W^T C_x W = Λ, the diagonal eigenvalue matrix; i.e. the elements of y = [y_1 ... y_k]^T are uncorrelated.
28 PCA notes Dimensionality Reduction: x ∈ R^d, y ∈ R^k, k ≤ d. Typically, choose k so that the "energy" ratio (∑_{i=1}^k λ_i) / (∑_{j=1}^d λ_j) > 90%. But how to get C_x and m? Statistics of x: given data x_1, x_2, ..., x_N, estimate m and C_x. Sample mean: m = (1/N) ∑_{i=1}^N x_i. Sample covariance matrix: Ĉ_x = (1/N) ∑_{i=1}^N (x_i − m)(x_i − m)^T, or with 1/(N − 1) instead of 1/N. Scatter matrix: S = ∑_{i=1}^N (x_i − m)(x_i − m)^T.
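A minimal PCA sketch in NumPy, assuming the data are the rows of a matrix and that k is chosen by the >90% energy heuristic above; variable names are illustrative:

```python
import numpy as np

def pca_fit(X, energy=0.90):
    """Estimate mean and principal components from data rows X (N x d)."""
    m = X.mean(axis=0)
    C = np.cov(X, rowvar=False)                         # sample covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(C)                 # ascending for symmetric C
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending
    ratio = np.cumsum(eigvals) / eigvals.sum()           # cumulative "energy"
    k = int(np.searchsorted(ratio, energy) + 1)          # smallest k exceeding the threshold
    W = eigvecs[:, :k]                                   # d x k, columns = principal components
    return m, W

def pca_project(X, m, W):
    """y = W^T (x - m), applied to each row of X."""
    return (X - m) @ W
```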
29 PCA notes Usually N ≪ d, so Ĉ_x is rank-deficient; also Ĉ_x is too large (a d × d matrix). Computational trick: use inner products. Let A = [x_1 − m, x_2 − m, ..., x_N − m], so A is d × N. Then A^T A is N × N. Find the eigenvectors/eigenvalues of A^T A: A^T A v = λ v implies (A A^T)(A v) = λ (A v), so A v is an eigenvector of A A^T. Note that A A^T = ∑_{i=1}^N (x_i − m)(x_i − m)^T = S, the scatter matrix. Thus we avoid computing with the large matrix A A^T directly.
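A sketch of this inner-product trick, assuming N ≪ d; it diagonalizes the small N × N matrix and maps its eigenvectors back through A v:

```python
import numpy as np

def pca_snapshot(X):
    """PCA when N << d: diagonalize the N x N matrix A^T A instead of the d x d A A^T.

    X is N x d (each row a vectorized image). Returns the mean, the eigenvector
    matrix (d columns of unit length), and the corresponding eigenvalues.
    """
    m = X.mean(axis=0)
    A = (X - m).T                        # d x N matrix of centred samples
    small = A.T @ A                      # N x N inner-product matrix
    lam, V = np.linalg.eigh(small)       # small eigenproblem: A^T A v = lam v
    lam, V = lam[::-1], V[:, ::-1]       # descending eigenvalues
    keep = len(lam) - 1                  # last eigenvalue is ~0 after centring
    U = A @ V[:, :keep]                  # A v is an eigenvector of A A^T (the scatter matrix)
    U /= np.linalg.norm(U, axis=0)       # normalize columns to unit length
    return m, U, lam[:keep]
```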
30 PCA example 1 Perhaps the most famous use of PCA is in face recognition, as exemplified in the classic paper by Turk and Pentland [3]. 2 Start with a set of face images, vectorize them, then compute the PCA. [figure: the mean face and the leading eigenfaces; images not reproduced in the transcription]
31 PCA example 1 A face image is then a weighted sum of these eigenfaces, plus the mean. 2 Using more eigenfaces leads to a better approximation. [figure panels: original image, and reconstructions using 1, 5, 15, and 25 eigenfaces]
32 PCA example 1 Each face is thus represented by its weights. 2 Recognition can be done by finding the closest weight vector. 3 What about images of non-faces? [figure panels: image and its eigenface approximation]
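A hypothetical sketch of recognition in weight space, and of using the reconstruction error to flag non-faces, assuming a mean m and eigenface matrix W computed as above; the gallery arrays and names are illustrative:

```python
import numpy as np

def face_weights(x, m, W):
    """Project a vectorized face onto the eigenfaces; the weights represent the face."""
    return W.T @ (x - m)

def recognize(x, m, W, gallery_weights, gallery_labels):
    """Nearest-neighbour match in weight space against a gallery of known faces."""
    w = face_weights(x, m, W)
    dists = np.linalg.norm(gallery_weights - w, axis=1)
    return gallery_labels[int(np.argmin(dists))]

def reconstruction_error(x, m, W):
    """A large error suggests x is not well explained by the face subspace (a non-face)."""
    w = face_weights(x, m, W)
    x_rec = m + W @ w
    return np.linalg.norm(x - x_rec)
```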
33 Linear Discriminant Analysis (LDA) a.k.a. Fisher Linear Discriminant. PCA has been used to provide features for e.g. face recognition; see [3]. But PCA is good for pattern representation rather than for pattern discrimination. Nevertheless, PCA is still useful for Dimensionality Reduction.
34 Linear Discriminant Analysis (LDA) PCA will result in the w shown in the left figure; LDA will result in the w shown in the right figure. [figures not reproduced in the transcription]
35 Suppose we have C = 2 classes: ω_1, ω_2. Define the within-class scatter matrix as: S_W = ∑_{i=1}^{C} ∑_{x ∈ ω_i} (x − m_i)(x − m_i)^T, where m_i is the mean of class ω_i. Think of this as the sum of the covariance matrices of the classes. It measures the spread of each class.
36 Also define the between-class scatter matrix as: S_B = (m_1 − m_2)(m_1 − m_2)^T. This measures the separation of the two classes.
37 Fisher's Criterion is defined as: J_F = (w^T S_B w) / (w^T S_W w) (3) Goal: maximize J_F, i.e. find a vector w such that different classes are well separated, while the spread within each class is reduced. Compare this with the PCA criterion in Equation (2). The total scatter matrix is S = S_W + S_B, so PCA is maximizing the spread in the total scatter matrix.
38 It can be shown that the solution to Equation (3) satisfies: S_B w = λ S_W w (4) This is called the Generalized Eigenvalue problem. If S_W is invertible, then S_W^{-1} S_B w = λ w, which is the regular eigenvalue problem involving S_W^{-1} S_B.
39 Once w is found, the feature is computed as: y = w^T x. Using this feature, we can then classify using, say, KNN. If there are more than 2 classes, then re-define the between-class scatter matrix as: S_B = ∑_{i=1}^{C} n_i (m_i − m)(m_i − m)^T, where n_i is the number of training samples in ω_i, and m is the mean of all training samples (ignoring classes).
40 There will be C − 1 eigenvectors w_i, i = 1, ..., C − 1, each solved using Equation (4). These eigenvectors are not necessarily orthogonal, because S_W^{-1} S_B may not be symmetric. These eigenvectors span a linear subspace in which Fisher's Criterion is maximized. Put them into a matrix W, and compute the feature as before: y = W^T x.
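A possible multi-class LDA sketch following Equation (4), solved as the regular eigenproblem on S_W^{-1} S_B as described above; it assumes S_W is invertible (in practice PCA is often applied first to guarantee this):

```python
import numpy as np

def lda_fit(X, y, n_components=None):
    """Multi-class LDA: solve S_B w = lam S_W w via the eigenproblem on S_W^{-1} S_B.

    Returns at most C-1 projection directions as the columns of W; y = W^T x.
    """
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)                                   # overall mean, ignoring classes
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)                   # within-class scatter
        S_B += len(Xc) * np.outer(mc - m, mc - m)        # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]               # largest Fisher criterion first
    k = n_components or (len(classes) - 1)
    return eigvecs.real[:, order[:k]]
```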
41 Outline
42 1 The Bayes Classifier is the theoretical best, but estimating the pdfs is a problem. 2 The K-nearest neighbor classifier is easy to implement, but doesn't scale up well. Also, the choice of distance metric is important. 3 Deciding which features are best for a given pattern recognition problem is an art, not a science. 4 Two common techniques for computing useful features are PCA and LDA.
43 References
[1] R. Duda, P. Hart, and D. Stork. Pattern Classification, 2nd edition. John Wiley and Sons, 2001.
[2] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In IEEE Conference on Computer Vision and Pattern Recognition, 2000.
[3] M. Turk and A. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1), 1991.