LEC 3: Fisher Discriminant Analysis (FDA)


1 LEC 3: Fisher Discriminant Analysis (FDA)
A Supervised Dimensionality Reduction Approach
Dr. Guangliang Chen, February 18, 2016

2 Outline
- Motivation: PCA is unsupervised and does not use the training labels; variance is not always useful for classification
- FDA: a supervised dimensionality reduction approach
- 2-class FDA
- Multiclass FDA
- Comparison between PCA and FDA

3 Two-class FDA

See Prof. Olga Veksler's slides.

4 Two-class FDA (a summary)

The optimal discriminatory direction is $v = S_w^{-1}(\mu_1 - \mu_2)$ (plus normalization). It is the solution of
$$\max_{v:\,\|v\|=1} \frac{v^T S_b v}{v^T S_w v} = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$$
(here $\tilde{\mu}_i$ and $\tilde{s}_i^2$ denote the projected class means and scatters), where
$$S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T, \qquad S_w = S_1 + S_2, \qquad S_i = \sum_{x \in \text{Class } i}(x - \mu_i)(x - \mu_i)^T.$$
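The closed-form direction above is simple to compute directly. Here is a minimal NumPy sketch (not from the slides), assuming the two classes are given as hypothetical arrays X1 and X2 of shape (n_i, d):

import numpy as np

def two_class_fda(X1, X2):
    """Two-class Fisher direction v = S_w^{-1}(mu_1 - mu_2), normalized.

    X1, X2: (n_1, d) and (n_2, d) arrays holding the training points of
    the two classes (hypothetical inputs for this sketch).
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # S_w = S_1 + S_2, with S_i = sum over class i of (x - mu_i)(x - mu_i)^T
    Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    v = np.linalg.solve(Sw, mu1 - mu2)   # S_w^{-1}(mu_1 - mu_2)
    return v / np.linalg.norm(v)         # plus normalization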

5 Experiment (2 digits): MNIST handwritten digits 0 and 1. (Figure: projections obtained by PCA (95%) + FDA versus plain PCA.)

6 How to extend to $c \ge 3$ classes?

Let's start by finding the most discriminatory direction. For any $v$, the total within-class scatter in the $v$-space is
$$\sum_i \tilde{s}_i^2 = \sum_i v^T S_i v = v^T\Big(\sum_i S_i\Big)v = v^T S_w v,$$
where the $S_i$ are defined in the same way as before.

7 To define the between-class scatter in the $v$-space, we need to introduce the global center of the training data,
$$\mu = \frac{1}{n}\sum_i x_i = \frac{1}{n}\sum_i n_i \mu_i,$$
and its projection onto $v$:
$$\tilde{\mu} = v^T\mu = \frac{1}{n}\sum_i y_i = \frac{1}{n}\sum_i n_i \tilde{\mu}_i.$$

8 The between-class scatter in the $v$-space is defined as
$$\sum_i n_i(\tilde{\mu}_i - \tilde{\mu})^2 = \sum_i n_i\, v^T(\mu_i - \mu)(\mu_i - \mu)^T v = v^T\Big(\sum_i n_i(\mu_i - \mu)(\mu_i - \mu)^T\Big)v = v^T S_b v.$$
We have thus arrived at the same kind of problem:
$$\max_{v:\,\|v\|=1} \frac{v^T S_b v}{v^T S_w v} = \frac{\sum_i n_i(\tilde{\mu}_i - \tilde{\mu})^2}{\sum_i \tilde{s}_i^2}.$$

9 The solution is given by the eigenvector of $S_w^{-1} S_b$ corresponding to its largest eigenvalue $\lambda_1$:
$$S_w^{-1} S_b\, v = \lambda_1 v.$$
It is also a generalized eigenvector:
$$S_b v = \lambda_1 S_w v.$$
However, the two-class formula $v = S_w^{-1}(\mu_1 - \mu_2)$ is no longer valid.
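Numerically it is preferable to solve the generalized problem $S_b v = \lambda S_w v$ directly rather than forming $S_w^{-1} S_b$. A minimal sketch, assuming Sw and Sb are the scatter matrices defined above (with $S_w$ nonsingular):

from scipy.linalg import eigh

def top_discriminant_direction(Sw, Sb):
    """Most discriminatory direction: largest generalized eigenvector of
    S_b v = lambda S_w v (sketch; Sw must be symmetric positive definite)."""
    eigvals, eigvecs = eigh(Sb, Sw)   # generalized problem, eigenvalues ascending
    return eigvecs[:, -1]             # eigenvector for the largest eigenvalue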


11 The connection to 2-class FDA

Proposition. When $c = 2$, we have
$$\sum_i n_i(\tilde{\mu}_i - \tilde{\mu})^2 = \frac{n_1 n_2}{n}(\tilde{\mu}_1 - \tilde{\mu}_2)^2
\qquad \text{and} \qquad
S_b = \frac{n_1 n_2}{n}(\mu_2 - \mu_1)(\mu_2 - \mu_1)^T.$$
This implies that the criterion $\dfrac{\sum_i n_i(\tilde{\mu}_i - \tilde{\mu})^2}{\sum_i \tilde{s}_i^2}$ is a generalization of that of two-class FDA.

12 Proof: We prove the first identity below:
$$\sum_i n_i(\tilde{\mu}_i - \tilde{\mu})^2
= n_1\Big(\tilde{\mu}_1 - \frac{n_1\tilde{\mu}_1 + n_2\tilde{\mu}_2}{n}\Big)^2
+ n_2\Big(\tilde{\mu}_2 - \frac{n_1\tilde{\mu}_1 + n_2\tilde{\mu}_2}{n}\Big)^2
= \frac{n_1 n_2^2}{n^2}(\tilde{\mu}_1 - \tilde{\mu}_2)^2 + \frac{n_2 n_1^2}{n^2}(\tilde{\mu}_2 - \tilde{\mu}_1)^2
= \frac{n_1 n_2}{n}(\tilde{\mu}_2 - \tilde{\mu}_1)^2.$$
The proof of the second identity is very similar:
$$S_b = \sum_i n_i(\mu_i - \mu)(\mu_i - \mu)^T = \cdots$$
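For completeness, one way to fill in the elided second identity (my own fill-in, not spelled out on the slide), using $\mu_1 - \mu = \frac{n_2}{n}(\mu_1 - \mu_2)$ and $\mu_2 - \mu = \frac{n_1}{n}(\mu_2 - \mu_1)$:
$$S_b = n_1\Big(\frac{n_2}{n}\Big)^2(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T
+ n_2\Big(\frac{n_1}{n}\Big)^2(\mu_2 - \mu_1)(\mu_2 - \mu_1)^T
= \frac{n_1 n_2(n_1 + n_2)}{n^2}(\mu_2 - \mu_1)(\mu_2 - \mu_1)^T
= \frac{n_1 n_2}{n}(\mu_2 - \mu_1)(\mu_2 - \mu_1)^T.$$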

13 How many discriminatory directions can/should we use?

The answer is: at most $c - 1$. The discriminatory directions all satisfy the equation
$$S_w^{-1} S_b\, v = \lambda v,$$
with the corresponding eigenvalues representing the magnitudes of separation. Therefore, we only need to count the number of eigenvectors with nonzero eigenvalues.

14 The within-class scatter matrix $S_w$ is assumed to be nonsingular. However, the between-class scatter matrix $S_b$ has low rank:
$$S_b = \sum_i n_i(\mu_i - \mu)(\mu_i - \mu)^T = M M^T, \qquad M = \big[\sqrt{n_1}\,(\mu_1 - \mu) \;\cdots\; \sqrt{n_c}\,(\mu_c - \mu)\big].$$
Observe that the columns of $M$ are linearly dependent:
$$\sqrt{n_1}\cdot\sqrt{n_1}\,(\mu_1 - \mu) + \cdots + \sqrt{n_c}\cdot\sqrt{n_c}\,(\mu_c - \mu) = \sum_i n_i(\mu_i - \mu) = 0,$$
so the column rank of $M$, and hence the rank of $S_b$, is at most $c - 1$.

15 Multiclass FDA: A summary

Input: $c$ training classes
Output: at most $c - 1$ discriminatory directions

Steps:
1. Form $S_w = \sum_i \sum_{x \in \text{Class } i}(x - \mu_i)(x - \mu_i)^T$ and $S_b = \sum_i n_i(\mu_i - \mu)(\mu_i - \mu)^T$.
2. Solve the eigenvalue problem $S_w^{-1} S_b v = \lambda v$.
3. Return the eigenvectors $v_1, \ldots, v_k$ ($k \le c - 1$) with nonzero eigenvalues, in decreasing order of eigenvalue (a minimal implementation sketch is given below).
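A minimal NumPy/SciPy sketch of the three steps above (an illustration, not the course's reference code); X is assumed to be an (n, d) data matrix and y the corresponding integer class labels:

import numpy as np
from scipy.linalg import eigh

def multiclass_fda(X, y):
    """Steps 1-3: build S_w and S_b, solve S_b v = lambda S_w v, and return
    the directions sorted by decreasing eigenvalue (at most c - 1 of them)."""
    classes = np.unique(y)
    d = X.shape[1]
    mu = X.mean(axis=0)                                 # global center
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)               # within-class scatter
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter
    eigvals, eigvecs = eigh(Sb, Sw)                     # generalized eigenproblem
    order = np.argsort(eigvals)[::-1][:len(classes) - 1]
    return eigvecs[:, order]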

16 Multiclass FDA: illustration (figure).

17 An important practical issue

For high dimensional data, the within-class scatter matrix $S_w \in \mathbb{R}^{d \times d}$ is often singular due to a lack of observations (in certain dimensions). Two common fixes:
- Apply PCA before FDA.
- Regularize $S_w$, i.e., use $\tilde{S}_w = S_w + \beta I_d$ instead.
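Both fixes can be expressed with scikit-learn; the sketch below is my own illustration under stated assumptions (sklearn's small digits set stands in for MNIST, and sklearn's shrinkage option plays a role analogous to $S_w + \beta I_d$ rather than reproducing it exactly):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)            # stand-in for the MNIST digits

# Fix 1: PCA first (keep 95% of the variance) so that S_w is nonsingular
# in the reduced space, then FDA/LDA on the projected data.
Z = PCA(n_components=0.95).fit_transform(X)
W = LinearDiscriminantAnalysis().fit_transform(Z, y)

# Fix 2: regularize the within-class covariance via shrinkage toward a
# scaled identity (akin to adding beta * I_d to S_w).
W_reg = LinearDiscriminantAnalysis(solver='eigen', shrinkage='auto').fit_transform(X, y)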

18 Experiment (3 digits): MNIST handwritten digits 0, 1, and 2. (Figure: projections obtained by PCA (95%) + FDA versus plain PCA.)

19 Comparison between PCA and FDA

                            PCA                    FDA
  Use labels?               no (unsupervised)      yes (supervised)
  Criterion                 variance               discriminability
  Linear separation?        yes                    yes
  Nonlinear separation?     no                     no
  #Dimensions               any                    at most c - 1
  Solution                  SVD                    eigenvalue problem

Remark. In the case of nonlinear separation, PCA (applied conservatively) often works better than FDA, since the latter can only find at most $c - 1$ directions (which may be insufficient to preserve all the separation in the training data).

20 HW2b (due Friday, March 4)

First apply PCA 95% + FDA to all 10 classes of the MNIST digits and then do the following.

4. Apply the plain knn classifier to the reduced data with $k = 1, \ldots, 10$ and display the test error curve. Compare with that of PCA 50 + knn (for each $k$). What is your conclusion?

5. Repeat Question 4 with local kmeans instead of knn (everything else being the same).

Note. Be sure to project the test data onto the same PCA 95% and FDA bases learned on the training data, in the same order!
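As an illustration of the note above (fit the PCA 95% and FDA bases on the training split only, then push the test split through the same fitted transforms in the same order), here is a hedged pipeline sketch; sklearn's small digits set stands in for MNIST, and the course's own knn/local-kmeans scripts would replace the classifier:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)              # stand-in for the MNIST digits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in range(1, 11):
    pipe = make_pipeline(PCA(n_components=0.95),        # PCA 95% basis ...
                         LinearDiscriminantAnalysis(),  # ... then FDA basis,
                         KNeighborsClassifier(n_neighbors=k))
    pipe.fit(X_train, y_train)                   # bases learned on training data only
    print(k, 1 - pipe.score(X_test, y_test))     # test error for each k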
