Some Interesting Problems in Pattern Recognition and Image Processing
JEN-MEI CHANG
Department of Mathematics and Statistics, California State University, Long Beach
jchang9@csulb.edu
University of Southern California Women in Math

Outline
1. Face Recognition - PCA
2. Bankruptcy Prediction - LDA
3. Cocktail Party Problem - BSS
4. Speech Recognition - DFT
5. Handwritten Digit Classification - Tangent
6. Image Compression - SVD and DWT
7. Smoothing and Sharpening - Convolution and Filtering

Background: Data Matrix An r-by-c grayscale digital image corresponds to an r-by-c matrix where each entry records one of the 256 possible gray levels of the corresponding pixel.

Background: Data Vector Realize the data matrix by its columns and concatenate the columns into a single column vector. (Figure: IMAGE → MATRIX → VECTOR.)
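
As a rough illustration of this vectorization step, here is a minimal sketch assuming NumPy; the array below is a made-up stand-in for a real grayscale image.

```python
import numpy as np

# Toy stand-in for an r-by-c grayscale image (entries are gray levels 0-255).
r, c = 4, 3
image = np.arange(r * c, dtype=np.uint8).reshape(r, c)

# Concatenate the columns into a single column vector (column-major order).
x = image.flatten(order="F").reshape(-1, 1)

print(image.shape)  # (4, 3)
print(x.shape)      # (12, 1)
```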

Classification/Recognition Problem
1. Data collection: database creation, a.k.a. the Gallery.
1.5 Preprocessing: geometric normalization, feature extraction, etc.
2. Present novel data, a.k.a. the Probe.
3. Classification: assign a label to the probe and assess accuracy.

Face Recognition Problem
(Figure: "Who is it?" A probe image is matched against gallery images, illustrating a true positive and a false positive.)

Architectures
Historically: single-to-single comparisons. Currently: subspace-to-subspace.
(Figure: single-to-many matching via distances d(P, x^(1)), d(P, x^(2)), ..., d(P, x^(N)) between a probe P and gallery images x^(1), ..., x^(N) in G; many-to-many matching of Image Set 1 and Image Set 2, represented by subspaces X^(1), ..., X^(N), as points p* and q* on a Grassmann manifold. Project: Eigenfaces.)

Commercial Applications

A Possible Mathematical Approach: Eigenfaces
(Figure: mapping from image space to feature space. Adapted from Vladimir Bondarenko, University of Konstanz; http://www.inf.unikonstanz.de/cgip/lehre/na 08/Lab2/5 FaceRecognition/html/myFaceRecognition.html)

A Possible Mathematical Approach: Classification in Feature Space
(Figure: classification in feature space; database in feature space. Adapted from Vladimir Bondarenko, University of Konstanz; http://www.inf.unikonstanz.de/cgip/lehre/na 08/Lab2/5 FaceRecognition/html/myFaceRecognition.html)

X = U S V^T (columns of U: left singular vectors; diagonal of S: singular values; columns of V: right singular vectors), where
V = [v_1, ..., v_r, v_{r+1}, ..., v_n] is orthogonal, with the v_i eigenvectors of X^T X;
S = diag(s_1, ..., s_r, 0, ..., 0) is diagonal, with the s_i square roots of the eigenvalues of X^T X;
U = X V S^{-1} (i.e., u_i = X v_i / s_i) is orthogonal, with the u_i eigenvectors of X X^T.
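
A minimal sketch of the eigenface computation via this SVD, assuming NumPy; the data matrix, sizes, and variable names below are illustrative stand-ins, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1024, 40              # d = pixels per image, n = number of gallery images
faces = rng.random((d, n))   # stand-in for vectorized face images (one per column)

mean_face = faces.mean(axis=1, keepdims=True)
X = faces - mean_face        # mean-centered data matrix

# Thin SVD: X = U S V^T.  The columns of U are the eigenfaces,
# and S**2 gives the nonzero eigenvalues of X X^T.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 10                       # keep the top-k eigenfaces
eigenfaces = U[:, :k]

# Project a probe image into the k-dimensional feature space.
probe = rng.random((d, 1))
features = eigenfaces.T @ (probe - mean_face)
print(features.shape)        # (10, 1)
```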

Bankruptcy Prediction Bankruptcy prediction is the art of predicting bankruptcy and various measures of financial distress of public firms. It is a vast area of finance and accounting research, important in part because of its relevance to creditors and investors in evaluating the likelihood that a firm may go bankrupt. (Adapted from Wikipedia.) Form a feature vector for each firm where each entry is a numerical value for a certain characteristic, e.g., number of customers or annual profit. This is a two-class (bankrupt or non-bankrupt) classification problem.

A Possible Mathematical Approach
(Figure: two-class data projected onto a line; a bad projection vs. a good projection.)
Question: What are the characteristics of a good projection?

Two-Class LDA

    m_1 = (1/n_1) Σ_{x ∈ D_1} w^T x,    m_2 = (1/n_2) Σ_{y ∈ D_2} w^T y

Look for a projection w that maximizes the (inter-class) distance in the projected space and minimizes the (intra-class) distances in the projected space.

Two-Class LDA Namely, we desire a w such that

    w = arg max_w (m_1 - m_2)^2 / (S_1 + S_2),

where S_1 = Σ_{x ∈ D_1} (w^T x - m_1)^2 and S_2 = Σ_{y ∈ D_2} (w^T y - m_2)^2.

Alternatively (with scatter matrices),

    w = arg max_w (w^T S_B w) / (w^T S_W w),        (1)

with S_W = Σ_{i=1}^{2} Σ_{x ∈ D_i} (x - m_i)(x - m_i)^T (here m_i is the class mean in the original space) and S_B = (m_2 - m_1)(m_2 - m_1)^T.

LDA The criterion in Equation (1) is commonly known as the generalized Rayleigh quotient, whose solution can be found via the generalized eigenvalue problem S_B w = λ S_W w. Once we have the good projection w, we can predict whether a given firm will go bankrupt by projecting its feature vector onto the real line using w. Multi-class LDA follows similarly.
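
A small sketch of this two-class LDA recipe, assuming NumPy/SciPy; the firm data here is synthetic and the variable names are my own.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
D1 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))   # non-bankrupt firms (toy data)
D2 = rng.normal(loc=2.0, scale=1.0, size=(50, 2))   # bankrupt firms (toy data)

m1, m2 = D1.mean(axis=0), D2.mean(axis=0)

# Within-class and between-class scatter matrices.
S_W = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)
S_B = np.outer(m2 - m1, m2 - m1)

# Generalized eigenvalue problem S_B w = lambda S_W w; take the top eigenvector.
vals, vecs = eigh(S_B, S_W)
w = vecs[:, -1]

# Classify a new firm by projecting onto w and comparing with the projected means.
x_new = np.array([1.0, 1.0])
proj = w @ x_new
label = "bankrupt" if abs(proj - w @ m2) < abs(proj - w @ m1) else "non-bankrupt"
print(label)
```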

Cocktail Party Problem (adapted from André Mouraux)

A Similar Problem: EEG (adapted from André Mouraux)

Commercial Applications (http://computer.howstuffworks.com/brain-computer-interface1.htm)

A Possible Mathematical Approach Decompose observed data into its noise and signal components: x^(µ) = s^(µ) + n^(µ), or, in terms of data matrices, X = S + N (S = signal, N = noise). The optimal first basis vector, φ, is taken as a superposition of the data, i.e., φ = ψ_1 x^(1) + ... + ψ_P x^(P) = Xψ. We may decompose φ into signal and noise components φ = φ_n + φ_s, where φ_s = Sψ and φ_n = Nψ.

MNF/BSS The basis vector φ is said to have maximum noise fraction (MNF) if the ratio D(φ) = (φ_n^T φ_n) / (φ^T φ) is a maximum. A steepest descent method yields the symmetric definite generalized eigenproblem N^T N ψ = µ² X^T X ψ. This problem may be solved without actually forming the product matrices N^T N and X^T X, using the generalized SVD (GSVD). Note that the same orthonormal basis vectors φ also optimize the signal-to-noise ratio. This technique is called Blind Source Separation (BSS).
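
A rough sketch of this generalized eigenproblem, assuming NumPy/SciPy and synthetic signal/noise matrices; it forms N^T N and X^T X directly for brevity, whereas the slide notes the GSVD avoids that.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
P, d = 200, 5                               # P observations, d channels (toy sizes)
S = np.sin(np.linspace(0, 20, P))[:, None] * rng.random((1, d))   # smooth "signal"
N = 0.3 * rng.normal(size=(P, d))           # noise matrix (toy estimate)
X = S + N                                   # observed data, X = S + N

# Symmetric definite generalized eigenproblem  N^T N psi = mu^2 X^T X psi.
mu2, Psi = eigh(N.T @ N, X.T @ X)           # eigenvalues mu^2 in ascending order

phi_best = X @ Psi[:, 0]                    # basis vector with the best SNR
phi_mnf = X @ Psi[:, -1]                    # basis vector with maximum noise fraction
print(mu2)                                  # noise fractions of the basis vectors
```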

Audio (adapted from AT&T Lab Inc. - http://www.research.att.com/viewproject.cfm?prjid=49)

Audio-Visual (adapted from Project MUSSLAP - http://musslap.zcu.cz/img/audiovizualni-zpracovani-reci/schema.jpg)

Commercial Applications

A Possible Mathematical Approach
Continuous: F(ω) = ∫ f(t) e^{-iωt} dt.    Discrete: f(ω) = Σ_{k ∈ Z} c_k e^{ikω}.
(Figure: a windowed speech waveform, amplitude vs. time, and its frequency content over time.)

Discrete Fourier Transform (DFT) Fourier analysis is applied to the speech waveform to discover which frequencies are present at any given moment in the speech signal, giving a graph with time on the horizontal axis and frequency on the vertical. The speech recognizer has a database of several thousand such graphs (called a codebook) that identify the different types of sounds the human voice can make. A sound is identified by matching it to its closest entry in the codebook.
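
A small sketch of the windowed DFT behind such graphs, assuming NumPy; the signal, sampling rate, and window sizes are made-up values.

```python
import numpy as np

fs = 8000                                   # sampling rate (Hz), toy value
t = np.arange(0, 1.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

win_len, hop = 256, 128
window = np.hanning(win_len)

frames = []
for start in range(0, len(signal) - win_len, hop):
    frame = signal[start:start + win_len] * window
    spectrum = np.abs(np.fft.rfft(frame))   # magnitude DFT of the windowed frame
    frames.append(spectrum)

spectrogram = np.array(frames).T            # rows = frequency bins, columns = time
print(spectrogram.shape)
```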

Handwritten Digits

Handwritten Digit Classification How do we tell whether a new digit is a 4 or a 9?

Commercial Applications Santa thought to himself, "If only these mails could go to the right place according to their zip code."

A Possible Mathematical Approach Imagine a high-dimensional surface (red curve) on which all the 4's live and a high-dimensional surface (blue curve) on which all the 9's live.

Manifold Learning Create a tangent space of the 4's at F and a tangent space of the 9's at N.

Which Distance?

Classification So, is it a 4 or a 9?
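
As a hedged sketch of the idea (a simplified stand-in, not the exact tangent-distance algorithm from the talk), one can classify a new digit by its distance to the affine tangent plane at each prototype; the prototypes, tangent bases, and sizes below are synthetic and illustrative, assuming NumPy.

```python
import numpy as np

def distance_to_tangent_plane(x, p, T):
    """Distance from x to the affine plane through p spanned by the
    orthonormal columns of T (a simplified, one-sided tangent distance)."""
    r = x - p
    return np.linalg.norm(r - T @ (T.T @ r))

rng = np.random.default_rng(3)
d, k = 256, 5                      # 16x16 images, 5 tangent directions (toy sizes)

F = rng.random(d)                  # prototype "4" (stand-in)
N = rng.random(d)                  # prototype "9" (stand-in)
T4, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal tangent basis at F
T9, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal tangent basis at N

x = rng.random(d)                  # new digit to classify
d4 = distance_to_tangent_plane(x, F, T4)
d9 = distance_to_tangent_plane(x, N, T9)
print("4" if d4 < d9 else "9")
```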

High-Resolution Image The objective of image compression is to reduce the redundancy of the image data so that it can be stored or transmitted efficiently.

Low-Resolution Image

Compression with SVD If we know the correct rank of A, e.g., by inspecting the singular values, then we can remove the noise and compress the data by approximating A with a matrix of the correct rank. One way to do this is to truncate the singular value expansion.

Theorem (Approximation Theorem) If A_k = Σ_{i=1}^{k} σ_i u_i v_i^T (1 ≤ k ≤ r), then

    ||A - A_k||_2 = min_{rank(B)=k} ||A - B||_2   and   ||A - A_k||_F = min_{rank(B)=k} ||A - B||_F.

Compression with SVD The 2-norm error of the rank-k approximation is given by the (k+1)-st singular value σ_{k+1}.
(Figure: (a) full rank (rank 480); (b) rank 10, rel. err. = 0.0551; (c) rank 50, rel. err. = 0.0305; (d) rank 170, rel. err. = 0.0126.)
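
A minimal sketch of rank-k compression by truncating the SVD, assuming NumPy; a random matrix stands in for the 480 × 480 image.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.random((480, 480))          # stand-in for a grayscale image matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 50
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated singular value expansion

# By the approximation theorem, the 2-norm error equals sigma_{k+1}.
err = np.linalg.norm(A - A_k, 2)
print(np.isclose(err, s[k]))                  # True (s[k] is sigma_{k+1}, 0-indexed)
rel_err = err / np.linalg.norm(A, 2)
print(rel_err)
```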

Compression with DWT

    X(b, a) = (1/√a) ∫ x(t) ψ((t - b)/a) dt

(Figure: original image, its dwt decomposition at level 1, the approximation coefficients at level 1, and the synthesized image after idwt.)
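
A small sketch of a one-level 2-D DWT and reconstruction, assuming the PyWavelets package (`pywt`); the image is a random stand-in, and discarding all detail coefficients is only a crude illustration of compression.

```python
import numpy as np
import pywt

rng = np.random.default_rng(5)
image = rng.random((128, 128))                 # stand-in for a grayscale image

# One-level 2-D DWT: approximation + (horizontal, vertical, diagonal) details.
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

# Crude compression: discard the detail coefficients, keep the approximation.
compressed = (cA, (np.zeros_like(cH), np.zeros_like(cV), np.zeros_like(cD)))
synthesized = pywt.idwt2(compressed, "haar")

print(cA.shape)            # (64, 64): one quarter of the original coefficients
print(synthesized.shape)   # (128, 128)
```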

Convolution as Filters

    f(x, y) * s(x, y) = Σ_{m=-a}^{a} Σ_{n=-b}^{b} s(m, n) f(x - m, y - n)

(Worked example: the 3×3 array f = [1 2 3; 4 5 6; 7 8 9] is convolved with a 5×5 array s containing a single unit impulse at its center: s is zero-padded, f is rotated by 180°, slid across s, and the result is cropped back to the original 5×5 size.)
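
A sketch of the impulse example above using SciPy's 2-D convolution (assuming NumPy/SciPy); convolving the impulse image with the mask reproduces the mask at the impulse location.

```python
import numpy as np
from scipy.signal import convolve2d

f = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])            # 3x3 mask from the slide

s = np.zeros((5, 5))
s[2, 2] = 1                          # image: a single unit impulse at the center

# 'same' zero-pads s, rotates f by 180 degrees, slides it over the image,
# and crops the result back to the original 5x5 size.
result = convolve2d(s, f, mode="same", boundary="fill", fillvalue=0)
print(result)                        # a copy of f centered at the impulse
```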

Smoothing with Low-pass Filters Filtering with the k × k low-pass (averaging) filter whose entries are all 1/k².
(Figure: (a) original; (b) 3 × 3; (c) 5 × 5; (d) 9 × 9; (e) 15 × 15; (f) 35 × 35.)

Smoothing with Median Filter Figure: (a) X-ray image of a circuit board corrupted by salt-and-pepper noise. (b) Noise reduction with a 3 × 3 averaging filter. (c) Noise reduction with a 3 × 3 median filter.
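
A small sketch comparing the two smoothing filters on synthetic salt-and-pepper noise, assuming NumPy/SciPy; a flat test image stands in for the X-ray image.

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

rng = np.random.default_rng(6)
image = np.full((64, 64), 128.0)                       # flat gray test image
noisy = image.copy()
mask = rng.random(image.shape)
noisy[mask < 0.05] = 0                                 # pepper
noisy[mask > 0.95] = 255                               # salt

averaged = uniform_filter(noisy, size=3)               # 3x3 low-pass (averaging)
medianed = median_filter(noisy, size=3)                # 3x3 median

# The median filter removes the impulses almost exactly; averaging only blurs them.
print(np.abs(averaged - image).max(), np.abs(medianed - image).max())
```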

Sharpening with High-pass Filters The simplest isotropic (direction-independent) derivative operator is the discrete Laplacian of two variables:

    ∇²f(x, y) = f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4 f(x, y).

This equation can be implemented using the filter mask

    0   1   0
    1  -4   1
    0   1   0

centered at (x, y), with the unit coefficients at the neighbors (x ± 1, y) and (x, y ± 1).
(Figure: blurred image, filter mask, Laplacian-enhanced image.)
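
A minimal sketch of Laplacian sharpening with the mask above, assuming NumPy/SciPy and a random array standing in for the blurred image; with the -4 center coefficient, sharpening subtracts the filtered result from the image.

```python
import numpy as np
from scipy.signal import convolve2d

laplacian_mask = np.array([[0,  1, 0],
                           [1, -4, 1],
                           [0,  1, 0]], dtype=float)

rng = np.random.default_rng(7)
blurred = rng.random((32, 32))                     # stand-in for a blurred image

lap = convolve2d(blurred, laplacian_mask, mode="same", boundary="symm")

# With a negative center coefficient, sharpening subtracts the Laplacian.
sharpened = blurred - lap
print(sharpened.shape)
```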