LECTURE 9: Principal Components Analysis


Outline:
- The curse of dimensionality
- Dimensionality reduction
- Feature selection vs. feature extraction
- Signal representation vs. signal classification
- Principal Components Analysis

Introduction to Pattern Analysis, Ricardo Gutierrez-Osuna, Texas A&M University

The curse of dimensionality (1)

The curse of dimensionality, a term coined by Bellman in 1961, refers to the problems associated with multivariate data analysis as the dimensionality increases. We will illustrate these problems with a simple example.

Consider a 3-class pattern recognition problem. A simple approach would be to:
- Divide the feature space into uniform bins
- Compute the ratio of examples for each class at each bin and, for a new example, find its bin and choose the predominant class in that bin

In our toy problem we decide to start with one single feature and divide the real line into 3 segments. After doing this, we notice that there is too much overlap among the classes, so we decide to incorporate a second feature to try and improve separability.

The curse of dimensionality (2)

We decide to preserve the granularity of each axis, which raises the number of bins from 3 (in 1D) to 3^2 = 9 (in 2D). At this point we need to make a decision: do we maintain the density of examples per bin, or do we keep the number of examples we had for the one-dimensional case?
- Choosing to maintain the density increases the number of examples from 9 (in 1D) to 27 (in 2D)
- Choosing to maintain the number of examples results in a 2D scatter plot that is very sparse

[Figure: 2D scatter plots, constant density vs. constant number of examples]

Moving to three features makes the problem worse:
- The number of bins grows to 3^3 = 27
- For the same density of examples, the number of needed examples becomes 81
- For the same number of examples, the 3D scatter plot is almost empty
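The growth above can be checked with a few lines of arithmetic, using the slide's numbers (3 bins per axis, a density of 3 examples per bin):

```python
# Bin and example counts as dimensionality grows: 3 bins per axis,
# and a constant density of 3 examples per bin.
bins_per_axis = 3
examples_per_bin = 3

for d in (1, 2, 3):
    n_bins = bins_per_axis ** d              # 3, 9, 27 bins
    n_examples = n_bins * examples_per_bin   # 9, 27, 81 examples at constant density
    print(f"D={d}: {n_bins} bins, {n_examples} examples")
```

The exponential factor `bins_per_axis ** d` is the whole story: the sample size must track the bin count to keep the density constant.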

The curse of dimensionality (3)

Obviously, our approach of dividing the sample space into equally spaced bins was quite inefficient. There are other approaches that are much less susceptible to the curse of dimensionality, but the problem still exists.

How do we beat the curse of dimensionality?
- By incorporating prior knowledge
- By providing increasing smoothness of the target function
- By reducing the dimensionality

In practice, the curse of dimensionality means that, for a given sample size, there is a maximum number of features above which the performance of our classifier will degrade rather than improve. In most cases, the information that is lost by discarding some features is (more than) compensated by a more accurate mapping in the lower-dimensional space.

[Figure: classifier performance vs. dimensionality, rising to a peak and then degrading]

The curse of dimensionality (4)

There are many implications of the curse of dimensionality:
- Exponential growth in the number of examples required to maintain a given sampling density. For a density of n examples/bin and D dimensions, the total number of examples is n^D.
- Exponential growth in the complexity of the target function (a density estimate) with increasing dimensionality. "A function defined in high-dimensional space is likely to be much more complex than a function defined in a lower-dimensional space, and those complications are harder to discern" [Friedman]. This means that, in order to learn it well, a more complex target function requires denser sample points!
- What to do if it ain't Gaussian? For one dimension a large number of density functions can be found in textbooks, but for high dimensions only the multivariate Gaussian density is available. Moreover, for larger values of D the Gaussian density can only be handled in a simplified form!
- Humans have an extraordinary capacity to discern patterns and clusters in 1, 2 and 3 dimensions, but these capabilities degrade drastically for 4 or higher dimensions.

Dimensionality reduction (1)

Two approaches are available to perform dimensionality reduction:
- Feature extraction: creating a subset of new features by combinations of the existing features
- Feature selection: choosing a subset of all the features (the most informative ones)

The problem of feature extraction can be stated as follows. Given a feature space x ∈ R^N, find a mapping y = f(x): R^N → R^M with M < N, such that the transformed feature vector y ∈ R^M preserves (most of) the information or structure in R^N. An optimal mapping y = f(x) will be one that results in no increase in the minimum probability of error; that is, a Bayes decision rule applied to the initial space R^N and to the reduced space R^M yields the same classification rate.

[Figure: feature selection picks a subset of the original components x_i; feature extraction computes y = f(x)]

Dimensionality reduction (2)

In general, the optimal mapping y = f(x) will be a non-linear function. However, there is no systematic way to generate non-linear transforms; the selection of a particular subset of transforms is problem dependent. For this reason, feature extraction is commonly limited to linear transforms, y = Wx. That is, y is a linear projection of x:

  [y_1]   [w_11 w_12 ... w_1N] [x_1]
  [ . ] = [ .    .  .     .  ] [ . ]
  [y_M]   [w_M1 w_M2 ... w_MN] [x_N]

NOTE: when the mapping is a non-linear function, the reduced space is called a manifold.

We will focus on linear feature extraction for now, and revisit non-linear techniques when we cover multi-layer perceptrons.
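A minimal numeric sketch of such a linear projection y = Wx; the weight values below are arbitrary illustrations, not from the lecture:

```python
import numpy as np

# Linear feature extraction y = W x: W is M x N, projecting N=3 inputs
# down to M=2 outputs. The weights are arbitrary, for illustration only.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])    # shape (M, N) = (2, 3)
x = np.array([2.0, -1.0, 3.0])     # N-dimensional feature vector
y = W @ x                          # M-dimensional projection
print(y)                           # [2. 2.]
```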

Signal representation versus classification

The selection of the feature extraction mapping y = f(x) is guided by an objective function that we seek to maximize (or minimize). Depending on the criterion used in the objective function, feature extraction techniques are grouped into two categories:
- Signal representation: the goal of the feature extraction mapping is to represent the samples accurately in a lower-dimensional space
- Classification: the goal of the feature extraction mapping is to enhance the class-discriminatory information in the lower-dimensional space

Within the realm of linear feature extraction, two techniques are commonly used:
- Principal Components Analysis (PCA) uses a signal representation criterion
- Linear Discriminant Analysis (LDA) uses a signal classification criterion

[Figure: two-class scatter plot on axes Feature 1 / Feature 2, contrasting the signal-representation direction with the classification direction]

Principal Components Analysis, PCA (1)

The objective of PCA is to perform dimensionality reduction while preserving as much of the randomness (variance) in the high-dimensional space as possible.

Let x be an N-dimensional random vector, represented as a linear combination of orthonormal basis vectors [φ_1, φ_2, ..., φ_N]:

  x = Σ_{i=1}^{N} y_i φ_i,   where φ_i^T φ_j = δ_ij

Suppose we choose to represent x with only M (M < N) of the basis vectors. We can do this by replacing the components [y_{M+1}, ..., y_N]^T with some pre-selected constants b_i:

  x̂(M) = Σ_{i=1}^{M} y_i φ_i + Σ_{i=M+1}^{N} b_i φ_i

The representation error is then

  Δx(M) = x − x̂(M) = Σ_{i=M+1}^{N} (y_i − b_i) φ_i

We can measure this representation error by the mean-squared magnitude of Δx. Our goal is to find the basis vectors φ_i and constants b_i that minimize this mean-square error:

  ε²(M) = E[|Δx(M)|²] = E[Σ_{i=M+1}^{N} Σ_{j=M+1}^{N} (y_i − b_i)(y_j − b_j) φ_i^T φ_j] = Σ_{i=M+1}^{N} E[(y_i − b_i)²]
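In the last identity the cross terms vanish by orthonormality; this holds pointwise (not just in expectation) and is easy to verify numerically. The basis, vector, and constants below are arbitrary illustrations:

```python
import numpy as np

# Verify that, for an orthonormal basis, the squared representation error
# |x - x_hat(M)|^2 equals the sum over the discarded components of (y_i - b_i)^2.
rng = np.random.default_rng(0)
N, M = 4, 2
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))   # columns = orthonormal basis phi_i
x = rng.normal(size=N)
y = Q.T @ x                                    # expansion coefficients y_i = phi_i^T x
b = rng.normal(size=N)                         # arbitrary pre-selected constants

x_hat = Q[:, :M] @ y[:M] + Q[:, M:] @ b[M:]    # keep M coefficients, fix the rest
lhs = np.sum((x - x_hat) ** 2)                 # squared representation error
rhs = np.sum((y[M:] - b[M:]) ** 2)             # sum over discarded terms
assert np.isclose(lhs, rhs)
```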

Principal Components Analysis, PCA (2)

As we have done earlier in the course, the optimal values of b_i can be found by computing the partial derivative of the objective function and equating it to zero:

  ∂/∂b_i E[(y_i − b_i)²] = −2 (E[y_i] − b_i) = 0  ⇒  b_i = E[y_i]

Therefore, we will replace the discarded dimensions y_i by their expected value (an intuitive solution). Since y_i = φ_i^T x, the mean-square error can then be written as

  ε²(M) = Σ_{i=M+1}^{N} E[(y_i − E[y_i])²]
        = Σ_{i=M+1}^{N} φ_i^T E[(x − E[x])(x − E[x])^T] φ_i
        = Σ_{i=M+1}^{N} φ_i^T Σ_x φ_i

where Σ_x is the covariance matrix of x. We seek the solution that minimizes this expression subject to the orthonormality constraint, which we incorporate into the expression using a set of Lagrange multipliers λ_i:

  ε²(M) = Σ_{i=M+1}^{N} φ_i^T Σ_x φ_i + Σ_{i=M+1}^{N} λ_i (1 − φ_i^T φ_i)

Computing the partial derivative with respect to the basis vectors:

  ∂ε²(M)/∂φ_i = 2 (Σ_x φ_i − λ_i φ_i) = 0  ⇒  Σ_x φ_i = λ_i φ_i

So φ_i and λ_i are the eigenvectors and eigenvalues of the covariance matrix Σ_x.

NOTE: d(x^T A x)/dx = (A + A^T) x = 2Ax if A is symmetric.

Principal Components Analysis, PCA (3)

We can then express the sum-square error as

  ε²(M) = Σ_{i=M+1}^{N} φ_i^T Σ_x φ_i = Σ_{i=M+1}^{N} λ_i

In order to minimize this measure, the λ_i will have to be the smallest eigenvalues. Therefore, to represent x with minimum sum-square error, we will choose the eigenvectors φ_i corresponding to the largest eigenvalues λ_i.

PCA dimensionality reduction: the optimal* approximation of a random vector x ∈ R^N by a linear combination of M (M < N) independent vectors is obtained by projecting the random vector x onto the eigenvectors φ_i corresponding to the largest eigenvalues λ_i of the covariance matrix Σ_x.

*Optimality is defined as the minimum of the sum-square magnitude of the approximation error.
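The result above can be turned into a small NumPy sketch; the function name and the data are illustrative, not from the lecture:

```python
import numpy as np

def pca(X, M):
    """Project the rows of X (n_samples x N) onto the M eigenvectors of the
    covariance matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # N x N covariance estimate
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-order: largest first
    return Xc @ eigvecs[:, order[:M]], eigvals[order]

# Illustrative data: 3 features with very different variances
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
Y, eigvals = pca(X, M=2)
print(Y.shape)                                # (200, 2)
print(eigvals[0] > eigvals[1] > eigvals[2])   # True
```

`np.linalg.eigh` is used (rather than `eig`) because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.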

Principal Components Analysis, PCA (4)

NOTES:
- Since PCA uses the eigenvectors of the covariance matrix Σ_x, it is able to find the independent axes of the data under the unimodal Gaussian assumption. For non-Gaussian or multi-modal Gaussian data, PCA simply de-correlates the axes.
- The main limitation of PCA is that it does not consider class separability, since it does not take into account the class label of the feature vector. PCA simply performs a coordinate rotation that aligns the transformed axes with the directions of maximum variance. There is no guarantee that the directions of maximum variance will contain good features for discrimination.

Historical remarks:
- Principal Components Analysis is the oldest technique in multivariate analysis
- PCA is also known as the Karhunen-Loève transform (communication theory)
- PCA was first introduced by Pearson in 1901, and it experienced several modifications until it was generalized by Loève in 1963

PCA example (1)

In this example we have a three-dimensional Gaussian distribution with the following parameters:

  µ = [0 5 2]^T   and   Σ = [25 −1 7; −1 4 −4; 7 −4 10]

The three pairs of principal component projections are shown below. Notice that the first projection has the largest variance, followed by the second projection. Also notice that the PCA projections de-correlate the axes (we knew this since Lecture 3, though).

[Figure: scatter plots of the three pairs of principal component projections]
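A quick NumPy check of the de-correlation claim. The 3×3 covariance below is the example's matrix as reconstructed here (some digits were lost in transcription, so treat the entries as illustrative); the identity holds for any covariance matrix:

```python
import numpy as np

# After projecting onto the eigenvectors Phi of Sigma, the covariance of the
# projections y = Phi^T x is Phi^T Sigma Phi = diag(lambda_i): the projections
# are de-correlated, with the eigenvalues as their variances.
Sigma = np.array([[25.0, -1.0,  7.0],
                  [-1.0,  4.0, -4.0],
                  [ 7.0, -4.0, 10.0]])
eigvals, Phi = np.linalg.eigh(Sigma)           # ascending eigenvalues
projected_cov = Phi.T @ Sigma @ Phi
assert np.allclose(projected_cov, np.diag(eigvals))
print(np.round(eigvals[::-1], 2))              # projection variances, largest first
```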

PCA example (2)

This example shows a projection of a three-dimensional data set onto two dimensions. Initially, except for the elongation of the cloud, there is no apparent structure in the set of points. Choosing an appropriate rotation allows us to unveil the underlying structure. (You can think of this rotation as "walking around" the three-dimensional set, looking for the best viewpoint.)

PCA can help find such underlying structure: it selects a rotation such that most of the variability within the data set is represented in the first few dimensions of the rotated data. In our three-dimensional case, this may seem of little use. However, when the data is highly multidimensional (10s of dimensions), this analysis is quite powerful.

PCA example (3)

Compute the principal components for the following two-dimensional dataset:

  X = (x_1, x_2) = {(1,2), (3,3), (3,5), (5,4), (5,6), (6,5), (8,7), (9,8)}

Let's first plot the data to get an idea of which solution we should expect.

SOLUTION (by hand). The (biased) covariance estimate of the data is:

  Σ_x = [6.25 4.25; 4.25 3.5]

The eigenvalues are the zeros of the characteristic equation:

  Σ_x v = λv  ⇒  |Σ_x − λI| = |6.25−λ 4.25; 4.25 3.5−λ| = 0  ⇒  λ_1 = 9.34, λ_2 = 0.41

The eigenvectors are the solutions of the systems:

  [6.25 4.25; 4.25 3.5] [v_11; v_12] = λ_1 [v_11; v_12]  ⇒  v_1 = [0.81; 0.59]
  [6.25 4.25; 4.25 3.5] [v_21; v_22] = λ_2 [v_21; v_22]  ⇒  v_2 = [−0.59; 0.81]

HINT: to solve each system manually, first assume that one of the variables is equal to one (i.e. v_11 = 1), then find the other one and finally normalize the vector to make it unit-length.
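The hand computation can be cross-checked with NumPy. One caveat: the first data point's digits were lost in transcription; (1,2) is assumed here because it reproduces the slide's covariance entries 6.25, 4.25, 3.5 exactly. `np.cov` with `ddof=0` gives the biased (divide-by-n) estimate used above:

```python
import numpy as np

# Cross-check of the worked example: biased covariance, eigenvalues, eigenvectors.
# First point (1, 2) is reconstructed; it matches the stated covariance exactly.
X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)],
             dtype=float)
cov = np.cov(X, rowvar=False, ddof=0)          # biased estimate (divide by n)
print(cov)                                     # [[6.25 4.25], [4.25 3.5]]

eigvals, eigvecs = np.linalg.eigh(cov)         # ascending order
print(np.round(eigvals[::-1], 2))              # [9.34 0.41]
print(np.round(np.abs(eigvecs[:, -1]), 2))     # first principal axis, up to sign: [0.81 0.59]
```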