Announcements. Recognition II. Computer Vision I. Example: Face Detection. Evaluating a binary classifier

Announcements
HW3 extended to tonight. HW4 to be announced today; due Friday 12/8. Note it will take a while to run some things. Final Exam: Thursday 12/14, 7pm-10pm.

Recognition II
CSE252A Lecture 17

Example: Face Detection
Scan a window over the image and classify each window as either Face or Non-face. [Diagram: image window -> classifier -> Face / Non-face.] See, for example, the Viola-Jones face detector in OpenCV.

Evaluating a binary classifier
For a detector, there are two types of errors:
False Positives / false accepts (e.g., a non-face is detected as a face)
False Negatives / false rejects (e.g., a face is missed)
ROC Curve (Receiver Operating Characteristic): a plot of the tradeoff between false positives and false negatives.
See also the definitions of precision and recall: https://en.wikipedia.org/wiki/Precision_and_recall

Evaluating multi-class classifiers
CIFAR-10: 60,000 32x32 color images, 10 classes.
Overall accuracy; confusion matrix.
Example from coral reef classification: [confusion matrix figure; rows are ground truth, columns are estimated labels for the classes CCA, Turf, Macro, Sand, Acrop, Pavo, Mont, Pocill, Port; overall accuracy 83.1%.]
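To make these metrics concrete, here is a minimal Matlab/Octave sketch (not from the slides; the function and variable names are assumptions) that builds a confusion matrix and computes overall accuracy from ground-truth and predicted labels. For a binary detector the two off-diagonal cells are exactly the false positives and false negatives discussed above.

% Minimal sketch: confusion matrix and overall accuracy (assumed names).
% yTrue, yPred: vectors of integer class labels in 1..numClasses.
function [C, acc] = confusionAndAccuracy(yTrue, yPred, numClasses)
  C = zeros(numClasses, numClasses);          % rows: ground truth, columns: estimated
  for i = 1:numel(yTrue)
    C(yTrue(i), yPred(i)) = C(yTrue(i), yPred(i)) + 1;
  end
  acc = sum(diag(C)) / sum(C(:));             % overall accuracy
  C = bsxfun(@rdivide, C, max(sum(C, 2), 1)); % row-normalize to per-class rates
end

With classes 1 = face and 2 = non-face, the raw count C(2,1) is the number of false positives (non-faces called faces) and C(1,2) is the number of false negatives (missed faces).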

Nearest Neighbor Classifier
{R_j} is the set of training images. ID = arg min_j dist(R_j, I). (A code sketch appears after this block of notes.)
K-th Nearest Neighbor Classification: [figure: a query point and its k nearest training images.]

Maximum a posteriori classifier (MAP)
g_j(x) = P(ω_j | x) = P(x | ω_j) P(ω_j) / P(x)
Classification: ĵ = arg max_j g_j(x)
P(x | ω_j): class conditional density
P(ω_j): prior of class j
P(ω_j | x): posterior

Curse of Dimensionality
If we want to build a minimum-error-rate classifier, then we need a very good estimate of P(ω_j | x). How do we do this?
Let's say our feature space is just 1-dimensional and our feature x ∈ [0,1], and let's say we have 10,000 training samples from which to estimate our a posteriori probabilities. We could estimate these probabilities using a histogram in which we divide the interval into 100 evenly spaced bins.
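Returning to the nearest-neighbor rule at the top of this block, here is a minimal Matlab/Octave sketch of ID = arg min_j dist(R_j, I); the matrix of vectorized training images, the label vector, and the function name are assumptions for illustration, not the slides' code.

% Minimal sketch: 1-nearest-neighbor classification of a test image.
% R: n-by-N matrix whose columns are vectorized training images R_j.
% labels: 1-by-N vector of class labels; I: n-by-1 vectorized test image.
function id = nearestNeighbor(R, labels, I)
  d = sum((R - repmat(I, 1, size(R, 2))).^2, 1);  % squared distance to every R_j
  [~, j] = min(d);                                % index of the closest training image
  id = labels(j);                                 % report its label
end

Extending this to k-nearest neighbors amounts to sorting d and taking a majority vote over the labels of the k smallest entries.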

On average each bin would have 100 samples. We could estimate P(ω_j | x) as the number of samples from class j that fall in the same bin that x falls into, divided by the total number of samples in that bin.
But this plan does not scale as we increase the dimensionality of the feature space! Let's say our feature space is 3-dimensional and our feature x ∈ [0,1]^3, and let's say we still have 10,000 training samples from which to estimate our a posteriori probabilities. If we estimate these probabilities using a histogram in which we divide the volume into bins of the same width as before, on average each bin would have only 0.01 samples! We're not going to be able to estimate probabilities well.

Dimensionality Reduction
Dimensionality reduction by linear projection: an n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = W^T x, where W is an n by m matrix. Recognition is performed using nearest neighbor in R^m.
How do we choose a good W?
Drop dimensions (feature selection)
Random projections
Principal component analysis
Linear discriminant analysis
Independent component analysis
Or even non-linear dimensionality reduction
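As one concrete (if naive) choice of W from the list above, the following sketch uses a random projection followed by nearest neighbor in R^m. The data and variable names are placeholders; a learned W (PCA, LDA, ...) would simply be swapped in.

% Minimal sketch: recognition after a linear projection y = W'*x (assumed names).
n = 1024; m = 32; N = 200;               % pixels, reduced dimension, training images
Xtrain = rand(n, N);                     % stand-in training images (columns)
labels = randi(10, 1, N);                % stand-in labels
W = randn(n, m);                         % random n-by-m projection; replace with a PCA/LDA basis
Ytrain = W' * Xtrain;                    % training images in R^m
xTest = rand(n, 1);
yTest = W' * xTest;                      % project the test image the same way
d = sum((Ytrain - repmat(yTest, 1, N)).^2, 1);
[~, j] = min(d);
predictedLabel = labels(j);              % nearest neighbor in the reduced space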

Eigenfaces: Principal Component Analysis (PCA)
First principal component: the direction of maximum variance (through the mean).
Some details: use the Singular Value Decomposition trick described in the text to compute the basis when n << d.

Singular Value Decomposition
Any m by n matrix A may be factored such that A = U Σ V^T, with sizes [m x n] = [m x m][m x n][n x n].
U: m by m orthogonal matrix; columns of U are the eigenvectors of A A^T.
V: n by n orthogonal matrix; columns are the eigenvectors of A^T A.
Σ: m by n, diagonal with non-negative entries (σ_1, σ_2, ..., σ_s), s = min(m,n), called the singular values.
Singular values are the square roots of the eigenvalues of both A A^T and A^T A, and the columns of U are the corresponding eigenvectors.

SVD Properties
In Matlab, [u s v] = svd(A), and you can verify that A = u*s*v'.
r = rank(A) = number of non-zero singular values.
U and V give us orthonormal bases for the subspaces of A:
First r columns of U: column space of A
Last m − r columns of U: left nullspace of A
First r columns of V: row space of A
Last n − r columns of V: nullspace of A
For d ≤ r, the first d columns of U provide the best d-dimensional basis for the columns of A in a least-squares sense.
Result of the SVD algorithm: σ_1 ≥ σ_2 ≥ ... ≥ σ_s.

Performing PCA with SVD
Singular values of A are the square roots of the eigenvalues of both A A^T and A^T A, and the columns of U are the corresponding eigenvectors. And
Σ_i a_i a_i^T = [a_1 a_2 ... a_n][a_1 a_2 ... a_n]^T = A A^T.
The covariance matrix is Σ = (1/n) Σ_{i=1}^{n} (x_i − µ)(x_i − µ)^T.
So, ignoring the 1/n: subtract the mean image µ from each input image, create the data matrix, and perform thin SVD on the data matrix.

Comment on image collections
A = U Σ V^T, [m x n] = [m x m][m x n][n x n]. The matrix A is sometimes called the data matrix; its columns are vectorized images, so we have m pixels and n images. For large images (e.g., 1k x 1k) we often have more pixels than images (m > n), and using SVD is preferred. For CIFAR we have m = 3072 and n = 60k, so either the explicit form with the covariance matrix or SVD can work.
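Here is a minimal Matlab/Octave sketch of the "PCA via SVD" recipe just described: subtract the mean image, form the data matrix, and read the principal directions off the leading left singular vectors. The data and variable names are stand-ins, not the course code.

% Minimal sketch: PCA of an image collection via SVD (assumed names and data).
X  = rand(3072, 500);                    % columns are vectorized images (stand-in)
mu = mean(X, 2);                         % mean image
A  = X - repmat(mu, 1, size(X, 2));      % data matrix of mean-subtracted images
[U, S, V] = svd(A, 'econ');              % economy-size SVD
k = 20;
B = U(:, 1:k);                           % top-k principal directions ("eigen-images")
coeffs = B' * A;                         % k-dimensional representation of each image
% diag(S).^2 are, up to the ignored 1/n factor, the eigenvalues of the covariance matrix.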

Thin SVD
Any m by n matrix A may be factored such that A = U Σ V^T, [m x n] = [m x m][m x n][n x n].
If m > n, then one can view Σ as [Σ'; 0], where Σ' = diag(σ_1, σ_2, ..., σ_s) with s = min(m,n) and the lower block is an (m − n) by n matrix of zeros. Alternatively, you can write A = U' Σ' V^T, where U' keeps only the first n columns of U.
In Matlab, thin SVD is [U S V] = svds(A).

PCA for recognition (Eigenfaces)
Modeling:
1. Given a collection of labeled training images,
2. Compute the mean image and covariance matrix.
3. Compute the k eigenvectors (note that these are images) of the covariance matrix corresponding to the k largest eigenvalues. (Or perform PCA using SVD!!)
4. Project the training images to the k-dimensional eigenspace.
Recognition:
1. Given a test image, project it to the eigenspace.
2. Perform classification against the projected training images.
(A code sketch of this pipeline appears after this block of notes.)

Eigenfaces: Training Images
Eigenfaces [Turk & Pentland '91]: mean image, basis images. Accuracy of PCA + K-NN.

Difficulties with PCA
Projection may suppress important detail: the smallest-variance directions may not be unimportant.
The method does not take the discriminative task into account: typically, we wish to compute features that allow good discrimination, which is not the same as largest variance or minimizing reconstruction error.
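As referenced above, here is a compact Matlab/Octave sketch of the Eigenfaces modeling and recognition steps, using thin SVD (svds) for step 3; all data and variable names are assumptions for illustration.

% Minimal sketch: Eigenfaces modeling and recognition (assumed names and data).
% --- Modeling ---
Xtrain = rand(3072, 300); labels = randi(10, 1, 300);  % stand-in labeled training images
mu = mean(Xtrain, 2);                                  % step 2: mean image
A  = Xtrain - repmat(mu, 1, size(Xtrain, 2));
k  = 25;
[U, ~, ~] = svds(A, k);                                % step 3: k basis images via thin SVD
Ytrain = U' * A;                                       % step 4: project training images
% --- Recognition ---
xTest = rand(3072, 1);                                 % stand-in test image
y = U' * (xTest - mu);                                 % project test image to eigenspace
d = sum((Ytrain - repmat(y, 1, size(Ytrain, 2))).^2, 1);
[~, j] = min(d);                                       % nearest neighbor in eigenspace
predictedID = labels(j);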

Fisherfaces: Class-Specific Linear Projection
P. Belhumeur, J. Hespanha, D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", PAMI, 1997.
An n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = W^T x, where W is an n by m matrix. Recognition is performed using nearest neighbor in R^m. How do we choose a good W?

PCA & Fisher's Linear Discriminant
Between-class scatter: S_B = Σ_{i=1}^{c} |χ_i| (µ_i − µ)(µ_i − µ)^T
Within-class scatter: S_W = Σ_{i=1}^{c} Σ_{x_k ∈ χ_i} (x_k − µ_i)(x_k − µ_i)^T
Total scatter: S_T = Σ_k (x_k − µ)(x_k − µ)^T = S_W + S_B
where c is the number of classes, µ_i is the mean of class χ_i, and |χ_i| is the number of samples in χ_i.
If the data points are projected by y = W^T x and the scatter of the points is S, then the scatter of the projected points is W^T S W.

Computing the Fisher Projection Matrix
PCA (Eigenfaces): W_pca = arg max_W |W^T S_T W| — maximizes the projected total scatter.
Fisher's Linear Discriminant (FLD): W_fld = arg max_W |W^T S_B W| / |W^T S_W W| — maximizes the ratio of projected between-class scatter to projected within-class scatter.
The w_i are orthonormal. There are at most c − 1 non-zero generalized eigenvalues, so m ≤ c − 1. Can be computed with eig in Matlab.

Fisherfaces
W^T = W_fld^T W_pca^T. Since S_W has rank N − c, project the training set to the subspace spanned by the first N − c principal components of the training set, then apply FLD to the (N − c)-dimensional subspace, yielding a (c − 1)-dimensional feature space (see the sketch below).
W_pca = arg max_W |W^T S_T W|
W_fld = arg max_W |W^T W_pca^T S_B W_pca W| / |W^T W_pca^T S_W W_pca W|
Fisher's Linear Discriminant projects away the within-class variation (lighting, expressions) found in the training set, and it preserves the separability of the classes.

Harvard Face Database
10 individuals, 66 images per person. Train on 6 images at 15°; test on the remaining images (15°, 30°, 45°, 60°).
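The Fisherface construction above can be sketched in Matlab/Octave as follows: PCA down to N − c dimensions, build the scatter matrices there, and solve the generalized eigenvalue problem S_B w = λ S_W w with eig. This is an illustrative sketch under assumed variable names (X, y, etc.), not the authors' code.

% Minimal sketch: Fisherfaces (PCA to N-c dims, then FLD), assumed names.
% X: n-by-N data matrix (columns are images), y: 1-by-N labels in 1..c.
[n, N] = size(X);
c  = numel(unique(y));
mu = mean(X, 2);
A  = X - repmat(mu, 1, N);
[Wpca, ~, ~] = svds(A, N - c);            % first N-c principal components
Z = Wpca' * A;                            % project to the (N-c)-dim subspace
Sw = zeros(N - c); Sb = zeros(N - c);
for i = 1:c
  Zi = Z(:, y == i);
  mi = mean(Zi, 2);                       % class mean (overall mean of Z is zero)
  D  = Zi - repmat(mi, 1, size(Zi, 2));
  Sw = Sw + D * D';                       % within-class scatter
  Sb = Sb + size(Zi, 2) * (mi * mi');     % between-class scatter
end
[V, L] = eig(Sb, Sw);                     % generalized eigenvectors of (S_B, S_W)
[~, idx] = sort(real(diag(L)), 'descend');
Wfld = V(:, idx(1:c - 1));                % at most c-1 useful directions
W = Wpca * Wfld;                          % overall n-by-(c-1) Fisherface projection

New images are then compared by nearest neighbor on W' * (x − mu), exactly as in the Eigenfaces sketch.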

Recognition Results: Lighting Extrapolation
[Plot: error rate vs. light direction (0-15 degrees, 30 degrees, 45 degrees) for Correlation, Eigenfaces, Eigenfaces (without the first 3 components), and Fisherfaces.]