PCA FACE RECOGNITION

The slides are drawn from several sources, including James Hays (Brown), Srinivasa Narasimhan (CMU), Silvio Savarese (U. of Michigan), and Shree Nayar (Columbia), as well as the lecturer's own slides.

Goal of Principal Components Analysis We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.

Rotate Coordinate Axes. Measure M random points X_1, ..., X_M in the N-dimensional Cartesian coordinate system (M > N or M < N may hold). Find N orthogonal axes in the directions of greatest variability. This is accomplished by rotating the original axes.

Algebraic Interpretation (1D). Given M points in an N-dimensional space, how does one project onto a (say) one-dimensional space? Choose the line that fits the data so that the points are maximally spread out along the line.

Assume the line passes through zero, which means the mean of all points has already been subtracted. We want axes x such that the variance of the given (zero-mean) points decreases as we go along successive axes; Bx is an (MxN)(Nx1) product. As we go from the first x to the N-th x, each axis corresponds to a smaller variance of the points in its direction. The last x corresponds to the total least squares (TLS) solution, minimizing the distance to the rest of the space.

Algebraic Solution. The algebraic solution starts from the problem below and has N solutions in x: maximize x^T B^T B x subject to x^T x = 1 (L2 norm), with the resulting x-s mutually orthogonal. B is the MxN matrix with the points along its rows (M = number of points, N = coordinates per point); x is the unknown line (column vector), Nx1.

Algebraic Solution. Rewriting this: x^T B^T B x = e = e x^T x = x^T (e x)  <=>  x^T (B^T B x - e x) = 0, where e is a scalar. The value of x^T B^T B x is obtained each time x satisfies B^T B x = e x with x^T x = 1. Find the e-s and associated x-s such that the matrix B^T B, when applied to x, yields the same x scaled by e: the x are eigenvectors and the e are eigenvalues. All eigenvectors are mutually orthogonal and, if the eigenvalues are distinct, they form a new N-dimensional basis.
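
A small sketch of this eigen-formulation in NumPy (the data matrix B here is a synthetic, zero-mean stand-in): the columns of X below satisfy B^T B x = e x with unit norm, and the eigenvalues e are the variances along the new axes.

    import numpy as np

    # Hypothetical zero-mean data matrix B: M points (rows) in N dimensions (columns).
    rng = np.random.default_rng(0)
    B = rng.normal(size=(100, 3))
    B -= B.mean(axis=0)              # subtract the mean so the line passes through zero

    # Eigen-decomposition of B^T B; eigh returns eigenvalues in increasing order.
    e, X = np.linalg.eigh(B.T @ B)
    order = np.argsort(e)[::-1]      # reorder so variance decreases along successive axes
    e, X = e[order], X[:, order]

    # Each column x of X satisfies B^T B x = e x with x^T x = 1,
    # and x^T B^T B x = e is the (unnormalized) variance along that axis.
    print(e)
    print(X)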

Problem: Size of the Covariance Matrix A. Each data point has N coordinates, and the covariance matrix is A = B^T B, so the size of A is NxN and the number of eigenvectors is N. Example: N = 256x256 pixels = 65536 in vector form, so A is 65536 x 65536 and there are 65536 eigenvectors. Typically, only 20-30 eigenvectors suffice. So, this method is very inefficient!

Efficient Computation of Eigenvectors. If B is MxN and M << N, then A = B^T B is NxN, which is much larger than MxM (M = number of images, N = number of coordinates per point). Use BB^T instead; an eigenvector of BB^T is easily converted to one of B^T B: (BB^T) y = e y  =>  B^T (BB^T) y = e (B^T y)  =>  (B^T B)(B^T y) = e (B^T y)  =>  B^T y is an eigenvector of B^T B.
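
A sketch of this trick in NumPy (hypothetical sizes, M = 10 images of 64x64 pixels): eigenvectors of the small MxM matrix BB^T are mapped back to eigenvectors of the NxN matrix B^T B by multiplying with B^T.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 10, 64 * 64                      # few images, many pixels (M << N)
    B = rng.normal(size=(M, N))
    B -= B.mean(axis=0)

    # Small M x M eigenproblem instead of the huge N x N one.
    e, Y = np.linalg.eigh(B @ B.T)          # (B B^T) y = e y

    # Map each y back: B^T y is an eigenvector of B^T B with the same eigenvalue.
    V = B.T @ Y                             # N x M matrix of (unnormalized) eigenvectors
    V /= np.linalg.norm(V, axis=0)          # normalize columns (near-zero eigenvalues are noisy)

    # Check on the largest eigenvalue: (B^T B) v ~= e v
    v, lam = V[:, -1], e[-1]
    print(np.allclose(B.T @ (B @ v), lam * v))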

PCA: Ignoring Eigenvectors. You can decide to ignore the components of lesser significance. You will lose some information, but if the eigenvalues are small, you don't lose more than 2-5%. With N dimensions in your data: calculate the N eigenvectors and eigenvalues, choose only the first p eigenvectors, and the final data set has only p dimensions. The matrix B goes from M x N to M x p, where M is the number of points (see the sketch below).
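
A minimal sketch of this truncation (NumPy; the data and the choice p = 5 are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    M, N, p = 200, 50, 5
    B = rng.normal(size=(M, N))
    B -= B.mean(axis=0)

    e, X = np.linalg.eigh(B.T @ B)
    X = X[:, np.argsort(e)[::-1]]     # eigenvectors sorted by decreasing eigenvalue

    Xp = X[:, :p]                     # keep only the first p eigenvectors
    B_reduced = B @ Xp                # data goes from M x N to M x p
    print(B_reduced.shape)            # (200, 5)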

2D example of PCA: what we have to achieve.

Step 1: Subtract the mean. Covariance values are not affected by subtracting the mean values.

Step 2: Calculate the 2x2 covariance matrix: [0.616555556 0.615444444; 0.615444444 0.716555556]. Since the off-diagonal elements of this covariance matrix are positive, we should expect that the x and y variables increase together.

Step 3: Calculate the eigenvectors and eigenvalues of the covariance matrix. Eigenvalues: 1.28402771 and 0.049083398 (listed in decreasing order). Eigenvectors: the first is (0.677873399, 0.735178656) and the second is (-0.735178656, 0.677873399).
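
These numbers can be reproduced directly from the covariance matrix above, e.g. with NumPy (eigenvector signs may come out flipped, which does not change the principal directions):

    import numpy as np

    C = np.array([[0.616555556, 0.615444444],
                  [0.615444444, 0.716555556]])

    e, V = np.linalg.eigh(C)   # eigenvalues in increasing order
    print(e)                   # ~ [0.049083, 1.284028]
    print(V)                   # columns are the unit-length eigenvectors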

Principal components overlaid on the data. Here the mean is still subtracted.

1D Reconstruction: along the eigenvector with the larger eigenvalue.
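
A sketch of that reconstruction (NumPy; the 2D points here are synthetic stand-ins, not the slide's data): project each zero-mean point onto the dominant eigenvector and map it back.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.multivariate_normal([0.0, 0.0],
                                   [[0.62, 0.62], [0.62, 0.72]], size=10)
    mean = data.mean(axis=0)
    D = data - mean                               # subtract the mean

    e, V = np.linalg.eigh(np.cov(D.T))
    v1 = V[:, -1]                                 # eigenvector with the largest eigenvalue

    coeffs = D @ v1                               # 1D coordinates along v1
    reconstruction = np.outer(coeffs, v1) + mean  # back to 2D; the minor direction is lost
    print(reconstruction)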

Face Recognition: applications include digital photography, surveillance, album organization, person tracking/identification, emotions and expressions, security/warfare, teleconferencing, etc.

Space of Faces. An image is a point in a high-dimensional space. For example, an N x M image is a point in R^(NM), i.e., a point in the vectorized space. [Thanks to Chuck Dyer, Steve Seitz, Nishino]

Image space to face space: a linear approach. Compute the k-dimensional subspace such that the projection of the data points onto the subspace has the largest variance among all k-dimensional subspaces; equivalently, maximize the scatter of the training images in face space.

Eigenfaces [Turk and Pentland 91]. The images in the possible set {x̂} are highly correlated; the original vector space is Z-dimensional. Compress them to a low-dimensional subspace that captures the key appearance characteristics of the visual features, using PCA to estimate the subspace. Two faces are then compared in this subspace by measuring the Euclidean distance between them. Among the first successful algorithms used outside computer vision; it is a linear approach and was improved later.

Projecting onto the Eigenfaces. Each eigenface v_i is Zx1 dimensional. The eigenfaces v_1, ..., v_K span the space of faces. A face x is converted to eigenface coordinates by projecting it onto each eigenface, a_i = v_i^T (x - u), where u is the average face.

Training Algorithm (here N images and a Z-dimensional vector space, not M images!). 1. Align the training images x_1, x_2, ..., x_N; note that each image is reshaped into a long vector. 2. Compute the average face u = (1/N) Σ x_i. 3. Compute the difference images φ_i = x_i - u, i = 1, ..., N.

Algorithm (continued; each of the N "points" is a column, not a row!). 4. Compute the covariance matrix (total scatter matrix) S_T = (1/N) Σ φ_i φ_i^T = BB^T, where B = [φ_1, φ_2, ..., φ_N]. 5. Compute the eigenvectors of the covariance matrix S_T. 6. Compute the training projections in the subspace, a_1, a_2, ..., each of dimension k << Z. Testing: 1. Take a query image X. 2. Project X into the eigenface space W = {eigenfaces} and compute the projection ω = W(1...k)(X - u). 3. Compare the projection ω with all N training projections.
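
Putting the training and testing steps together, a compact sketch (NumPy; the face data, image size, and the number of eigenfaces k are placeholders, and the small-matrix trick from earlier is used for the eigenvectors):

    import numpy as np

    def train_eigenfaces(images, k):
        # images: N x Z array, one vectorized face per row
        u = images.mean(axis=0)                     # step 2: average face
        Phi = images - u                            # step 3: difference images
        e, Y = np.linalg.eigh(Phi @ Phi.T)          # steps 4-5 via the small N x N matrix
        V = Phi.T @ Y[:, ::-1][:, :k]               # top-k eigenfaces, Z x k
        V /= np.linalg.norm(V, axis=0)
        A = Phi @ V                                 # step 6: N x k training projections
        return u, V, A

    def recognize(x, u, V, A):
        # return the index of the closest training face for the query image x
        omega = (x - u) @ V                         # project into eigenface space
        return int(np.argmin(np.linalg.norm(A - omega, axis=1)))

    # Hypothetical usage with random stand-in data (N = 40 faces of 32x32 pixels).
    rng = np.random.default_rng(0)
    faces = rng.random((40, 32 * 32))
    u, V, A = train_eigenfaces(faces, k=10)
    print(recognize(faces[7], u, V, A))             # -> 7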

Reconstruction and Errors (reconstructions shown with k = 4, k = 200, and k = 400 eigenfaces). Selecting only the top k eigenfaces reduces the dimensionality. Fewer eigenfaces result in more information loss and less discrimination between faces.

Limitations. PCA assumes that the data follow a distribution characterized by its mean µ and covariance matrix Σ. Example: the shape of this dataset is not well described by its principal components. Credit slide: S. Lazebnik

The space of faces is not convex: the average of two faces is not another face.

How Do Humans Detect Faces? We do not know yet! Some conjectures: a memory-prediction model (match faces against a face model in memory) and parallel computing (detect faces at multiple location/scale combinations).

Face Detection in Computers. Basic idea: slide windows of different sizes across the image and, at each location, match the window to a face model.

Basic Framework. For each window: extract features F and match them against a face model, outputting Yes/No. Features: which features represent faces well? Classifier: how do we construct/match the face model?

Characteristics of Good Features. They must discriminate face from non-face, and be extremely fast to compute: tens of thousands of windows need to be evaluated in an image.

The Viola/Jones Face Detector. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001. P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004. A paradigmatic method for real-time object detection; training is slow, but detection is very fast. Three ideas interact: integral images for fast feature evaluation, boosting for feature selection, and an attentional cascade for fast rejection of non-face windows.

Integral Image. A table that holds the sum of all pixel values to the left and top of a given pixel, inclusive. For example:

Image:
98 110 121 125 122 129
99 110 120 116 116 129
97 109 124 111 123 134
98 112 132 108 123 133
97 113 147 108 125 142
95 111 168 122 130 137
96 104 172 130 126 130

Integral Image:
98 208 329 454 576 705
197 417 658 899 1137 1395
294 623 988 1340 1701 2093
392 833 1330 1790 2274 2799
489 1043 1687 2255 2864 3531
584 1249 2061 2751 3490 4294
680 1449 2433 3253 4118 5052

Summation Within a Rectangle. Fast summation of an arbitrary rectangle using the integral image (II), illustrated with the example Image and Integral Image above.

The sum can be computed in constant time with only 4 references into the integral image. With P, Q, R, S the corner references of the rectangle: Sum = II_P - II_Q - II_S + II_R = 3490 - 1137 - 1249 + 417 = 1521.
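
A sketch of this computation in NumPy, using the example image above (the chosen rectangle indices reproduce the 1521 result):

    import numpy as np

    img = np.array([[ 98, 110, 121, 125, 122, 129],
                    [ 99, 110, 120, 116, 116, 129],
                    [ 97, 109, 124, 111, 123, 134],
                    [ 98, 112, 132, 108, 123, 133],
                    [ 97, 113, 147, 108, 125, 142],
                    [ 95, 111, 168, 122, 130, 137],
                    [ 96, 104, 172, 130, 126, 130]])

    # Integral image: cumulative sum down the rows, then across the columns.
    ii = img.cumsum(axis=0).cumsum(axis=1)

    def rect_sum(ii, r0, c0, r1, c1):
        # Sum of img[r0..r1, c0..c1] (inclusive) from at most 4 integral-image references.
        total = ii[r1, c1]
        if r0 > 0:
            total -= ii[r0 - 1, c1]
        if c0 > 0:
            total -= ii[r1, c0 - 1]
        if r0 > 0 and c0 > 0:
            total += ii[r0 - 1, c0 - 1]
        return total

    print(int(ii[4, 4]))                   # 2864, matching the integral-image table
    print(int(rect_sum(ii, 2, 2, 5, 4)))   # 1521, the example above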

Boosting: designing a strong classifier from a set of weak classifiers. (Illustration: a decision boundary in some feature space separating an object, e.g., a computer screen, from the background.)

Boosting defines a classifier using an additive model, F(x) = α_1 f_1(x) + α_2 f_2(x) + α_3 f_3(x) + ..., where F is the strong classifier, x is the feature vector, the α_t are weights, and the f_t are weak classifiers. We need to define a family of weak classifiers to select from. Boosting is a simple algorithm for learning robust classifiers.

Boosting - mathematics. Example of a weak learner: h_j(x) = 1 if f_j(x) > θ_j, and 0 otherwise, where f_j(x) is the value of a rectangle feature and θ_j is a threshold. Final strong classifier: h(x) = 1 if Σ_{t=1..T} α_t h_t(x) >= (1/2) Σ_{t=1..T} α_t, and 0 otherwise.

A weak classifier. Four kinds of rectangle filters: value = (sum of pixels in white area) - (sum of pixels in black area). These are called Haar filters (features). Credit slide: S. Lazebnik

Haar Response using the Integral Image (using the example Image and Integral Image above, with corner references O, T, R, S, P, Q): V_A = (pixels in white area) - (pixels in black area) = (II_O - II_T + II_R - II_S) - (II_P - II_Q + II_T - II_O) = (2061 - 329 + 98 - 584) - (3490 - 576 + 329 - 2061) = 1246 - 1182 = 64.
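
A small sketch of such a two-rectangle Haar response (NumPy; the 4x4 patch and the rectangle placement are hypothetical, not the regions drawn on the slide):

    import numpy as np

    patch = np.array([[10, 12, 11, 13],
                      [ 9, 11, 10, 12],
                      [10, 13, 12, 14],
                      [11, 12, 13, 15]])        # hypothetical image window
    ii = np.pad(patch, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

    def rect(r0, c0, r1, c1):
        # inclusive rectangle sum from 4 integral-image references (zero-padded variant)
        return ii[r1 + 1, c1 + 1] - ii[r0, c1 + 1] - ii[r1 + 1, c0] + ii[r0, c0]

    # Two-rectangle Haar feature: left half (white) minus right half (black).
    print(int(rect(0, 0, 3, 1) - rect(0, 2, 3, 3)))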

Face Detection at Different Scales. Use filters of different sizes to find faces at the corresponding scales.

A weak classifier behaves this way: evaluate each rectangle filter on each window and on each example. The training examples are labeled pairs (x_1, 1), (x_2, 1), (x_3, 0), (x_4, 0), (x_5, 0), (x_6, 0), ..., (x_n, y_n), with feature responses such as 0.8, 0.7, 0.2, 0.3, 0.8, 0.1, .... A weak classifier thresholds one feature, h_j(x) = 1 if f_j(x) > θ_j and 0 otherwise; there are T weak classifiers in total.

Viola-Jones detector: features. Considering all possible filter parameters (position, scale with a factor of 1.25, and type), there are 180,000+ possible features over about 12 scales, with a base window of 24 x 24 pixels. At learning time a 24x24 window is a face if it is a positive example and a non-face if it is a negative example. Which subset of these features should we use to determine whether a window contains a face?

The Viola-Jones detector used a simple boosting method, the AdaBoost process (Freund and Schapire, 1995). Each weak classifier is based on a single feature; there are T weak classifiers in total. Learning: take negative (more numerous) and positive image examples, n images in total. For t = 1, ..., T, select the weak classifier with the minimum weighted training error. At each iteration, the weights of incorrectly classified examples are increased and those of correctly classified examples are decreased, so step t+1 concentrates on correcting the wrongly classified images; the error decreases at almost every step. The final strong classifier combines the T weak classifiers h_t(x) with weights inversely related to their training errors. Testing: apply the strong classifier to new images.
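
A minimal AdaBoost sketch in this spirit (NumPy; the feature responses and labels below are random stand-ins for rectangle-filter responses on face/non-face windows, and each weak classifier is a single-feature threshold as above):

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_feat, T = 200, 20, 10
    X = rng.normal(size=(n, n_feat))                  # stand-in feature responses f_j(x)
    y = np.where(X[:, 3] + 0.5 * X[:, 7] > 0, 1, -1)  # stand-in face (+1) / non-face (-1) labels

    w = np.full(n, 1.0 / n)                           # example weights
    stumps = []                                       # (alpha, feature j, threshold, polarity)
    for t in range(T):
        best = None
        for j in range(n_feat):                       # pick the single-feature classifier
            for theta in X[:, j]:                     # with minimum weighted error
                for p in (1, -1):
                    pred = np.where(p * X[:, j] > p * theta, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, theta, p)
        err, j, theta, p = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = np.where(p * X[:, j] > p * theta, 1, -1)
        w *= np.exp(-alpha * y * pred)                # misclassified weights go up, correct go down
        w /= w.sum()
        stumps.append((alpha, j, theta, p))

    # Final strong classifier: sign of the weighted vote of the T weak classifiers.
    H = np.sign(sum(a * np.where(p * X[:, j] > p * th, 1, -1) for a, j, th, p in stumps))
    print((H == y).mean())                            # training accuracy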

Boosting for face detection. A 200-feature classifier can yield a 95% detection rate with a false positive rate of 1 in 14084. Not good enough! We want roughly 1 in 1,000,000. (Figure: receiver operating characteristic (ROC) curve.)

Boosting: pros and cons. Advantages: integrates classification with feature selection; the complexity of training is linear in the number of training examples; flexibility in the choice of weak learners and boosting schemes; easy to implement. Disadvantages: needs many training (positive/negative) examples; often found to work less well than alternative discriminative classifiers such as support vector machines (SVMs), especially for many-class problems. Slide credit: S. Lazebnik

Cascading classifiers for detection Form a cascade with low false negative rates early on. Apply less accurate but faster classifiers first to discard windows that clearly appear to be negative. Kristen Grauman

Attentional cascade. We start with simple classifiers which reject many of the negative windows while detecting almost all positive windows. A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on. The classifiers in the chain have progressively lower false positive rates. The detection and false positive rates of the cascade are found by multiplying the individual stage rates. Example: a detection rate of 0.9 and a false positive rate of about 10^-6 can be achieved by a 10-stage cascade where each stage has a detection rate of 0.99 (0.99^10 ~ 0.9) and a false positive rate of about 0.3 (0.3^10 ~ 6 x 10^-6).
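
The arithmetic behind that example (a quick check, assuming the stage rates simply multiply):

    stages = 10
    print(0.99 ** stages)   # overall detection rate, ~0.904
    print(0.30 ** stages)   # overall false positive rate, ~5.9e-06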

Viola-Jones detector: summary. Train a cascade of classifiers with AdaBoost on face and non-face examples; training selects the features, thresholds, and weights. A new image (e.g., 384x288) is then scanned, and each window is classified as face or non-face. [Implementation available in OpenCV: http://www.intel.com/technology/computing/opencv/] Kristen Grauman
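
A minimal usage sketch of the OpenCV implementation mentioned above (assuming the opencv-python package, its bundled frontal-face cascade file, and a hypothetical input image photo.jpg):

    import cv2

    # Pretrained Viola-Jones (Haar cascade) frontal-face detector shipped with OpenCV.
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("photo.jpg")                     # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # scaleFactor ~1.25 matches the scale step discussed earlier; minNeighbors prunes overlapping hits.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.25, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("faces.jpg", img)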

The implemented system. Training data: 4916 faces, all frontal, rescaled to 24x24 pixels per face, and 350 million non-face windows from 9500 non-face images. Faces are normalized for scale and translation; there are many variations across individuals, illumination, and pose. The real-time detector uses a 38-layer cascade with a total of 6060 features. Training took about a week (~2002, 466 MHz machine). (Most slides from Paul Viola)

The two curves correspond to different numbers of windows examined (75 million vs. 18 million). In each layer at most 6000 non-faces were collected. First layer: 2 features; rejects 50% of non-faces while accepting close to 100% of faces. Second layer: 10 features; rejects 80% of non-faces, ~100% of faces. Third and fourth layers: 25 features each... On the test set, an average of 10 features is evaluated per window.

Output of VJ Face Detector: Test Images

Related applications: facial feature localization, profile detection, and male vs. female classification.

Face recognition is far from perfect. If a face is rotated, say, 30 degrees off frontal, performance decreases a lot. There are many face recognition systems by now, e.g., face recognition at secure entrances. They are much faster and work with many more faces in the database, but they are not perfect: for a query, say, the first 20 frontal face matches may have to be examined.