Fisher's Linear Discriminant Analysis
Fisher's Linear Discriminant Analysis
Seungjin Choi
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 37673, Korea
seungjin@postech.ac.kr
FLD or LDA

- Introduced by Fisher (1936); one of the most widely used linear discriminant analysis (LDA) methods.
- Motivation: the curse of dimensionality.
- Linear dimensionality reduction: PCA, ICA, FLD, MDS.
- Nonlinear dimensionality reduction: Isomap, LLE, Laplacian eigenmap.
- FLD aims at achieving an optimal linear dimensionality reduction for classification.
An Example: Isotropic Case

[figure: two isotropic classes with means $\mu_1$ and $\mu_2$ in the $(x_1, x_2)$ plane]
FLD: A Graphical Illustration

[figure: the two classes with means $\mu_1$ and $\mu_2$ in the $(x_1, x_2)$ plane, illustrating the FLD projection]
Two Classes

Given a set of data points $\{x \in \mathbb{R}^D\}$, one wishes to find a linear projection of the data onto a 1-dimensional space, $y = w^\top x$.

Sample means for $x$: $\mu_i = \frac{1}{N_i} \sum_{x \in C_i} x$.

Sample means for the projected points: $\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in Y_i} y = \frac{1}{N_i} \sum_{x \in C_i} w^\top x = w^\top \mu_i$.

We know that the difference between sample means is not always a good measure of the separation between projected points: $\tilde{\mu}_1 - \tilde{\mu}_2 = w^\top (\mu_1 - \mu_2)$, and simply scaling $w$ scales $|\tilde{\mu}_1 - \tilde{\mu}_2|$ (not desirable!).
FLD: Two Classes

Define the within-class scatter for projected samples by $\tilde{s}_1^2 + \tilde{s}_2^2$, where
$$\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{\mu}_i)^2 = w^\top \underbrace{\left[ \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^\top \right]}_{S_i} w.$$

FLD finds
$$w^* = \arg\max_w \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \arg\max_w \frac{w^\top S_B w}{w^\top S_W w},$$
where $S_W = S_1 + S_2$ (within-class scatter matrix) and $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top$ (between-class scatter matrix).

Maximizing $\frac{w^\top S_B w}{w^\top S_W w}$ leads to $S_B w = \lambda S_W w$ (generalized eigenvalue problem).
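A minimal numpy sketch of the two-class case (illustrative, not part of the slides): because $S_B w$ is always parallel to $\mu_1 - \mu_2$, the maximizer can be taken as $w \propto S_W^{-1}(\mu_1 - \mu_2)$; the data and function names below are hypothetical.

```python
import numpy as np

def fld_two_class(X1, X2):
    """Two-class Fisher discriminant direction w ∝ S_W^{-1} (mu1 - mu2).

    X1, X2: arrays of shape (N_i, D), rows are samples of class i.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_W = S_1 + S_2 (unnormalized covariance sums).
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    S_W = S1 + S2
    # Since S_B w is always parallel to (mu1 - mu2), the generalized
    # eigenproblem reduces to w ∝ S_W^{-1} (mu1 - mu2).
    w = np.linalg.solve(S_W, mu1 - mu2)
    return w / np.linalg.norm(w)

# Toy usage on synthetic data (hypothetical):
rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 1.0, size=(100, 2))
X2 = rng.normal([3, 1], 1.0, size=(100, 2))
w = fld_two_class(X1, X2)
y1, y2 = X1 @ w, X2 @ w   # 1-D projections of the two classes
```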
Multiple Discriminant Functions

For the case of $K$ classes, FLD involves $K-1$ discriminant functions, i.e., the projection is from $\mathbb{R}^D$ to $\mathbb{R}^{K-1}$.

Given a set of data $\{x \in \mathbb{R}^D\}$, one wishes to find a linear lower-dimensional embedding $W \in \mathbb{R}^{(K-1) \times D}$ such that $\{y = W x\}$ are classified as well as possible in the lower-dimensional space:
$$\underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_{K-1} \end{bmatrix}}_{y} = \underbrace{\begin{bmatrix} w_1^\top \\ \vdots \\ w_{K-1}^\top \end{bmatrix}}_{W} \underbrace{\begin{bmatrix} x_1 \\ \vdots \\ x_D \end{bmatrix}}_{x}.$$
Scatter Matrices

Within-class scatter matrix:
$$S_W = \sum_{i=1}^{K} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^\top.$$

Between-class scatter matrix:
$$S_B = \sum_{i=1}^{K} \sum_{x \in C_i} (\mu_i - \mu)(\mu_i - \mu)^\top = \sum_{i=1}^{K} N_i (\mu_i - \mu)(\mu_i - \mu)^\top.$$

Total scatter matrix:
$$S_T = \sum_{x} (x - \mu)(x - \mu)^\top = S_W + S_B.$$

$\mathrm{Rank}(S_B) \le K-1$, $\mathrm{Rank}(S_W) \le N-K$, $\mathrm{Rank}(S_T) \le N-1$.
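These definitions translate directly into a short numpy sketch (illustrative; the function name is an assumption), which also checks the identity $S_T = S_W + S_B$.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-, between-, and total scatter matrices for labeled data.

    X: (N, D) array of samples; labels: (N,) array of class indices.
    """
    mu = X.mean(axis=0)
    D = X.shape[1]
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mu_k = Xk.mean(axis=0)
        S_W += (Xk - mu_k).T @ (Xk - mu_k)               # sum over x in C_k
        S_B += len(Xk) * np.outer(mu_k - mu, mu_k - mu)  # N_k (mu_k - mu)(mu_k - mu)^T
    S_T = (X - mu).T @ (X - mu)
    assert np.allclose(S_T, S_W + S_B)                   # S_T = S_W + S_B
    return S_W, S_B, S_T
```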
Total Scatter Matrix

Define $X = [X_1, \ldots, X_K]$, where $X_i$ is a matrix whose columns are the data vectors belonging to $C_i$. Define
$$H_W = [X_1 - \mu_1 e_1^\top, \ldots, X_K - \mu_K e_K^\top], \quad
H_B = [(\mu_1 - \mu) e_1^\top, \ldots, (\mu_K - \mu) e_K^\top], \quad
H_T = [x_1 - \mu, \ldots, x_N - \mu],$$
where $e_i$ denotes the $N_i$-dimensional vector of all ones.

One can easily see that $H_T = X - \mu e^\top = H_W + H_B$. We also have
$$S_W = H_W H_W^\top, \quad S_B = H_B H_B^\top, \quad S_T = H_T H_T^\top.$$

Since $H_W H_B^\top = 0$, we have $S_T = (H_W + H_B)(H_W + H_B)^\top = S_W + S_B$.

The column vectors of $S_W$ and $S_B$ are linear combinations of centered data samples.
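A hedged sketch of this factorized view, building $H_W$ and $H_B$ column-wise and checking $H_T = H_W + H_B$ and $H_W H_B^\top = 0$ (the function name and data layout are assumptions):

```python
import numpy as np

def centering_factors(X, labels):
    """Build H_W, H_B, H_T with samples as columns, as on the slide.

    X: (N, D) rows are samples; internally transposed to D x N.
    """
    Xt = X.T                                   # D x N, columns are data vectors
    mu = Xt.mean(axis=1, keepdims=True)
    H_W = np.zeros(Xt.shape)
    H_B = np.zeros(Xt.shape)
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        mu_k = Xt[:, idx].mean(axis=1, keepdims=True)
        H_W[:, idx] = Xt[:, idx] - mu_k        # X_k - mu_k e_k^T
        H_B[:, idx] = mu_k - mu                # (mu_k - mu) e_k^T
    H_T = Xt - mu
    assert np.allclose(H_T, H_W + H_B)
    assert np.allclose(H_W @ H_B.T, 0, atol=1e-8)   # H_W H_B^T = 0
    return H_W, H_B, H_T
```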
FLD: Multiple Classes

Define the scatter matrices of the projected samples:
$$\tilde{S}_W = \sum_{i=1}^{K} \sum_{y \in Y_i} (y - \tilde{\mu}_i)(y - \tilde{\mu}_i)^\top, \qquad
\tilde{S}_B = \sum_{i=1}^{K} N_i (\tilde{\mu}_i - \tilde{\mu})(\tilde{\mu}_i - \tilde{\mu})^\top.$$

One can easily show that $\tilde{S}_W = W S_W W^\top$ and $\tilde{S}_B = W S_B W^\top$.
FLD seeks $K-1$ discriminant functions $W$ such that $y = Wx$:
$$W^* = \arg\max_W J_{\mathrm{FLD}}
= \arg\max_W \operatorname{tr}\left\{ \tilde{S}_W^{-1} \tilde{S}_B \right\}
= \arg\max_W \operatorname{tr}\left\{ \left( W S_W W^\top \right)^{-1} \left( W S_B W^\top \right) \right\},$$
leading to
$$S_B w_i = \lambda_i S_W w_i \qquad \text{(generalized eigenvalue problem).}$$
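Assuming the scatter_matrices helper from the earlier sketch is in scope, the multi-class problem can be solved with SciPy's generalized symmetric eigensolver; this is an illustrative sketch, with a small ridge added in case $S_W$ is singular.

```python
import numpy as np
from scipy.linalg import eigh

def fld(X, labels, reg=1e-8):
    """Multi-class FLD: top K-1 generalized eigenvectors of (S_B, S_W).

    Reuses scatter_matrices() from the earlier sketch; `reg` regularizes S_W.
    Returns W of shape (K-1, D), rows are discriminant directions.
    """
    S_W, S_B, _ = scatter_matrices(X, labels)
    K = len(np.unique(labels))
    D = X.shape[1]
    # Solve S_B w = lambda S_W w; eigh returns eigenvalues in ascending order.
    evals, evecs = eigh(S_B, S_W + reg * np.eye(D))
    order = np.argsort(evals)[::-1][: K - 1]
    return evecs[:, order].T

# Usage (assuming X, labels exist): Y = X @ fld(X, labels).T projects to R^{K-1}.
```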
Rayleigh Quotient

Definition. Let $A \in \mathbb{R}^{m \times m}$ be symmetric. The Rayleigh quotient $R(x, A)$ is defined by
$$R(x, A) = \frac{x^\top A x}{x^\top x}.$$

Theorem. Let $A \in \mathbb{R}^{m \times m}$ be symmetric with its eigenvalues being $\lambda_1 \ge \cdots \ge \lambda_m$. For $x \neq 0 \in \mathbb{R}^m$, we have
$$\lambda_m \le \frac{x^\top A x}{x^\top x} \le \lambda_1,$$
and in particular,
$$\lambda_m = \min_{x \neq 0} \frac{x^\top A x}{x^\top x}, \qquad \lambda_1 = \max_{x \neq 0} \frac{x^\top A x}{x^\top x}.$$
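A quick numerical illustration of the theorem (synthetic, assumed setup): Rayleigh-quotient values of random vectors stay between the smallest and largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
# Random symmetric matrix; check the Rayleigh-quotient bounds empirically.
M = rng.normal(size=(5, 5))
A = (M + M.T) / 2
evals = np.linalg.eigvalsh(A)            # ascending: lambda_m, ..., lambda_1

rayleigh = lambda x: (x @ A @ x) / (x @ x)
samples = [rayleigh(rng.normal(size=5)) for _ in range(10_000)]
assert evals[0] - 1e-12 <= min(samples) and max(samples) <= evals[-1] + 1e-12
```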
An Extremal Property of Generalized Eigenvalues

Theorem. Let $A$ and $B$ be $m \times m$ matrices, with $A$ nonnegative definite and $B$ positive definite. For $h = 1, \ldots, m$, define
$$X_h = [x_1, \ldots, x_h], \qquad Y_h = [x_h, \ldots, x_m],$$
where $x_1, \ldots, x_m$ are linearly independent eigenvectors of $B^{-1}A$ corresponding to the eigenvalues
$$\lambda_1\left( B^{-1} A \right) \ge \cdots \ge \lambda_m\left( B^{-1} A \right).$$
Then
$$\lambda_h\left( B^{-1} A \right) = \min_{Y_{h+1}^\top B x = 0} \frac{x^\top A x}{x^\top B x}
= \max_{X_{h-1}^\top B x = 0} \frac{x^\top A x}{x^\top B x},$$
where $x = 0$ is excluded.
Relation to Least Squares Regression: Binary Class

Given a training set $\{x_i, y_i\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^D$ and $y_i \in \{1, -1\}$, consider a linear discriminant function $f(x_i) = w^\top x_i + b$.

Partition the data matrix into two groups, each of which contains the examples in class 1 or class 2, i.e., $X = [X_1, X_2]$, where $X_1 \in \mathbb{R}^{D \times N_1}$ and $X_2 \in \mathbb{R}^{D \times N_2}$.

Define the binary label vector $y \in \mathbb{R}^N$; then LS regression is formulated as
$$\arg\min_{w, b} \left\| y - X^\top w - b\, 1_N \right\|^2,$$
where $1_N$ is the $N$-dimensional vector of all ones, which can be re-written as
$$\arg\min_{w, b} \left\| \begin{bmatrix} X_1^\top & 1_{N_1} \\ X_2^\top & 1_{N_2} \end{bmatrix} \begin{bmatrix} w \\ b \end{bmatrix} - \begin{bmatrix} 1_{N_1} \\ -1_{N_2} \end{bmatrix} \right\|^2.$$
The solution to this LS problem satisfies the normal equation:
$$\begin{bmatrix} X_1 & X_2 \\ 1_{N_1}^\top & 1_{N_2}^\top \end{bmatrix}
\begin{bmatrix} X_1^\top & 1_{N_1} \\ X_2^\top & 1_{N_2} \end{bmatrix}
\begin{bmatrix} w \\ b \end{bmatrix}
= \begin{bmatrix} X_1 & X_2 \\ 1_{N_1}^\top & 1_{N_2}^\top \end{bmatrix}
\begin{bmatrix} 1_{N_1} \\ -1_{N_2} \end{bmatrix},$$
which is written as
$$\begin{bmatrix} X_1 X_1^\top + X_2 X_2^\top & X_1 1_{N_1} + X_2 1_{N_2} \\ 1_{N_1}^\top X_1^\top + 1_{N_2}^\top X_2^\top & 1_{N_1}^\top 1_{N_1} + 1_{N_2}^\top 1_{N_2} \end{bmatrix}
\begin{bmatrix} w \\ b \end{bmatrix}
= \begin{bmatrix} X_1 1_{N_1} - X_2 1_{N_2} \\ 1_{N_1}^\top 1_{N_1} - 1_{N_2}^\top 1_{N_2} \end{bmatrix}.$$

Recall
$$S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top,$$
$$S_W = \left( X_1 - \mu_1 1_{N_1}^\top \right)\left( X_1 - \mu_1 1_{N_1}^\top \right)^\top + \left( X_2 - \mu_2 1_{N_2}^\top \right)\left( X_2 - \mu_2 1_{N_2}^\top \right)^\top
= X_1 X_1^\top - N_1 \mu_1 \mu_1^\top + X_2 X_2^\top - N_2 \mu_2 \mu_2^\top.$$
With $S_B$ and $S_W$, the normal equation is written as
$$\begin{bmatrix} S_W + N_1 \mu_1 \mu_1^\top + N_2 \mu_2 \mu_2^\top & N_1 \mu_1 + N_2 \mu_2 \\ (N_1 \mu_1 + N_2 \mu_2)^\top & N_1 + N_2 \end{bmatrix}
\begin{bmatrix} w \\ b \end{bmatrix}
= \begin{bmatrix} N_1 \mu_1 - N_2 \mu_2 \\ N_1 - N_2 \end{bmatrix}.$$

Solve the 2nd equation for $b$ to obtain
$$b = \frac{(N_1 - N_2) - (N_1 \mu_1 + N_2 \mu_2)^\top w}{N_1 + N_2}.$$

Substitute this into the 1st equation to obtain
$$\left[ S_W + \frac{N_1 N_2}{N_1 + N_2} S_B \right] w = \frac{2 N_1 N_2}{N_1 + N_2} (\mu_1 - \mu_2).$$
Note that the vector $S_B w$ is in the direction of $\mu_1 - \mu_2$ for any $w$, since $S_B w = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top w$. Thus we can write
$$\frac{N_1 N_2}{N_1 + N_2} S_B w = \left( \frac{2 N_1 N_2}{N_1 + N_2} - \alpha \right) (\mu_1 - \mu_2)$$
for some scalar $\alpha$. Then we have
$$w = \alpha\, S_W^{-1} (\mu_1 - \mu_2),$$
which is identical to the FLD solution except for a scaling factor.
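A small numerical check of this equivalence on synthetic data (the toy data and names below are hypothetical): the least-squares weight vector and $S_W^{-1}(\mu_1 - \mu_2)$ agree up to scale.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two-class toy data in D = 3 dimensions.
X1 = rng.normal([0, 0, 0], 1.0, size=(40, 3))
X2 = rng.normal([2, 1, -1], 1.0, size=(60, 3))
X = np.vstack([X1, X2])
y = np.concatenate([np.ones(len(X1)), -np.ones(len(X2))])   # targets +1 / -1

# Least-squares fit of f(x) = w^T x + b.
A = np.hstack([X, np.ones((len(X), 1))])
sol, *_ = np.linalg.lstsq(A, y, rcond=None)
w_ls = sol[:3]

# FLD direction w = S_W^{-1} (mu1 - mu2).
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
w_fld = np.linalg.solve(S_W, mu1 - mu2)

# The two directions agree up to scale.
cos = w_ls @ w_fld / (np.linalg.norm(w_ls) * np.linalg.norm(w_fld))
assert np.isclose(abs(cos), 1.0)
```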
Simultaneous Diagonalization

The goal: given two symmetric matrices $\Sigma_1$ and $\Sigma_2$, find a linear transformation $W$ such that
$$W^\top \Sigma_1 W = I, \qquad W^\top \Sigma_2 W = \Lambda \ \text{(diagonal)}.$$

Methods: it turns out that simultaneous diagonalization involves the generalized eigen-decomposition.
- Two-stage method: 1. whitening, 2. unitary transformation.
- Single-stage method: generalized eigenvalue decomposition.
Simultaneous Diagonalization: Algorithm Outline

1. First, whiten $\Sigma_1$, i.e.,
$$D^{-\frac{1}{2}} U_1^\top \Sigma_1 U_1 D^{-\frac{1}{2}} = I, \qquad
D^{-\frac{1}{2}} U_1^\top \Sigma_2 U_1 D^{-\frac{1}{2}} = K \ \text{(not diagonal)},$$
where $\Sigma_1 = U_1 D U_1^\top$.

2. Second, apply a unitary transformation to diagonalize $K$, i.e.,
$$U_2^\top I U_2 = I, \qquad U_2^\top K U_2 = \Lambda,$$
where $K = U_2 \Lambda U_2^\top$.

Then the transformation $W$ which simultaneously diagonalizes $\Sigma_1$ and $\Sigma_2$ is given by
$$W = U_1 D^{-\frac{1}{2}} U_2,$$
such that $W^\top \Sigma_1 W = I$ and $W^\top \Sigma_2 W = \Lambda$.
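A direct numpy sketch of this two-stage outline (illustrative; assumes $\Sigma_1$ is symmetric positive definite and $\Sigma_2$ symmetric):

```python
import numpy as np

def simultaneous_diag(Sigma1, Sigma2):
    """Two-stage simultaneous diagonalization.

    Returns W with W^T Sigma1 W = I and W^T Sigma2 W = diag(Lam).
    """
    # Stage 1: whiten Sigma1 = U1 D U1^T.
    D, U1 = np.linalg.eigh(Sigma1)
    Whiten = U1 @ np.diag(D ** -0.5)           # U1 D^{-1/2}
    K = Whiten.T @ Sigma2 @ Whiten             # whitened Sigma2, not yet diagonal
    # Stage 2: unitary transform diagonalizing K = U2 Lam U2^T.
    Lam, U2 = np.linalg.eigh(K)
    W = Whiten @ U2                            # W = U1 D^{-1/2} U2
    return W, Lam

# Quick check on random matrices (hypothetical example).
rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4)); Sigma1 = A @ A.T + 4 * np.eye(4)
B = rng.normal(size=(4, 4)); Sigma2 = B @ B.T
W, Lam = simultaneous_diag(Sigma1, Sigma2)
assert np.allclose(W.T @ Sigma1 @ W, np.eye(4), atol=1e-8)
assert np.allclose(W.T @ Sigma2 @ W, np.diag(Lam), atol=1e-8)
```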
Simultaneous Diagonalization: Generalized Eigen-Decomposition

Alternatively, we can diagonalize two symmetric matrices $\Sigma_1$ and $\Sigma_2$ as
$$W^\top \Sigma_1 W = I, \qquad W^\top \Sigma_2 W = \Lambda \ \text{(diagonal)},$$
where $\Lambda$ and $W$ are the eigenvalues and eigenvectors of $\Sigma_1^{-1} \Sigma_2$, i.e.,
$$\Sigma_1^{-1} \Sigma_2 W = W \Lambda.$$
Prove it!
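For comparison, a single-stage sketch using scipy.linalg.eigh on the pair $(\Sigma_2, \Sigma_1)$; for the symmetric-definite problem its eigenvectors come out $\Sigma_1$-orthonormal, which is exactly the requirement $W^\top \Sigma_1 W = I$ (illustrative example on assumed random matrices).

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4)); Sigma1 = A @ A.T + 4 * np.eye(4)   # SPD
B = rng.normal(size=(4, 4)); Sigma2 = B @ B.T                   # symmetric

Lam, W = eigh(Sigma2, Sigma1)   # solves Sigma2 w = lambda Sigma1 w
assert np.allclose(W.T @ Sigma1 @ W, np.eye(4), atol=1e-8)      # W^T Sigma1 W = I
assert np.allclose(W.T @ Sigma2 @ W, np.diag(Lam), atol=1e-8)   # W^T Sigma2 W = Lambda
```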
Example: Multi-Modal Data

[figure: example with multi-modal data]
Alternative Expressions of S_W and S_B

Alternatively, $S_W$ and $S_B$ are expressed as
$$S_W = \frac{1}{2} \sum_{i,j} A^W_{ij} (x_i - x_j)(x_i - x_j)^\top, \qquad
S_B = \frac{1}{2} \sum_{i,j} A^B_{ij} (x_i - x_j)(x_i - x_j)^\top,$$
where
$$A^W_{ij} = \begin{cases} \frac{1}{N_k} & \text{if } x_i \in C_k \text{ and } x_j \in C_k, \\ 0 & \text{if } x_i \text{ and } x_j \text{ are in different classes}, \end{cases}$$
$$A^B_{ij} = \begin{cases} \frac{1}{N} - \frac{1}{N_k} & \text{if } x_i \in C_k \text{ and } x_j \in C_k, \\ \frac{1}{N} & \text{if } x_i \text{ and } x_j \text{ are in different classes}. \end{cases}$$
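These pairwise expressions can be checked directly against the earlier definitions with a brute-force double loop (an O(N^2) illustrative sketch; the function name is hypothetical):

```python
import numpy as np

def pairwise_scatter(X, labels):
    """S_W and S_B via the pairwise (adjacency-weighted) expressions above.

    X: (N, D) array of samples; labels: (N,) array of class indices.
    """
    N, D = X.shape
    counts = {k: np.sum(labels == k) for k in np.unique(labels)}
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for i in range(N):
        for j in range(N):
            d = np.outer(X[i] - X[j], X[i] - X[j])
            if labels[i] == labels[j]:
                nk = counts[labels[i]]
                S_W += 0.5 * (1.0 / nk) * d
                S_B += 0.5 * (1.0 / N - 1.0 / nk) * d
            else:
                S_B += 0.5 * (1.0 / N) * d
    return S_W, S_B

# The results agree with the direct definitions (e.g., scatter_matrices above).
```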
Derivation of the pairwise expression for $S_W$:
$$\begin{aligned}
S_W &= \sum_{i=1}^{K} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^\top \\
&= \sum_{i=1}^{K} \sum_{j \in C_i} \left( x_j - \frac{1}{N_i} \sum_{u \in C_i} x_u \right) \left( x_j - \frac{1}{N_i} \sum_{v \in C_i} x_v \right)^\top \\
&= \sum_{i=1}^{K} \left( \sum_{j \in C_i} x_j x_j^\top - \frac{1}{N_i} \sum_{u \in C_i} \sum_{v \in C_i} x_u x_v^\top \right) \\
&= \sum_{i=1}^{N} \left( \sum_{j=1}^{N} A^W_{ij} \right) x_i x_i^\top - \sum_{i=1}^{N} \sum_{j=1}^{N} A^W_{ij} x_i x_j^\top \\
&= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} A^W_{ij} \left( x_i x_i^\top + x_j x_j^\top - x_i x_j^\top - x_j x_i^\top \right) \\
&= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} A^W_{ij} (x_i - x_j)(x_i - x_j)^\top.
\end{aligned}$$
Derivation of the pairwise expression for $S_B$:
$$\begin{aligned}
S_B &= S_T - S_W = \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^\top - S_W \\
&= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{N} \left( x_i x_i^\top + x_j x_j^\top - x_i x_j^\top - x_j x_i^\top \right)
 - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} A^W_{ij} \left( x_i x_i^\top + x_j x_j^\top - x_i x_j^\top - x_j x_i^\top \right) \\
&= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{1}{N} - A^W_{ij} \right) (x_i - x_j)(x_i - x_j)^\top.
\end{aligned}$$
Local Within-Class and Between-Class Scatter

Given a weighted adjacency matrix $[A_{ij}]$, introduce the local within-class scatter and local between-class scatter:
$$\bar{S}_W = \frac{1}{2} \sum_{i,j} \bar{A}^W_{ij} (x_i - x_j)(x_i - x_j)^\top, \qquad
\bar{S}_B = \frac{1}{2} \sum_{i,j} \bar{A}^B_{ij} (x_i - x_j)(x_i - x_j)^\top,$$
where
$$\bar{A}^W_{ij} = \begin{cases} \frac{A_{ij}}{N_k} & \text{if } x_i \in C_k \text{ and } x_j \in C_k, \\ 0 & \text{if } x_i \text{ and } x_j \text{ are in different classes}, \end{cases}$$
$$\bar{A}^B_{ij} = \begin{cases} A_{ij} \left( \frac{1}{N} - \frac{1}{N_k} \right) & \text{if } x_i \in C_k \text{ and } x_j \in C_k, \\ \frac{1}{N} & \text{if } x_i \text{ and } x_j \text{ are in different classes}. \end{cases}$$
Local Fisher Discriminant Analysis (LFDA)

Proposed by M. Sugiyama (ICML-2006). LFDA seeks $K-1$ discriminant functions $W$ such that $y = Wx$:
$$\arg\max_W \operatorname{tr}\left\{ \left( W \bar{S}_W W^\top \right)^{-1} \left( W \bar{S}_B W^\top \right) \right\},$$
with the local within-class scatter matrix
$$\bar{S}_W = \frac{1}{2} \sum_{i,j} \bar{A}^W_{ij} (x_i - x_j)(x_i - x_j)^\top$$
and the local between-class scatter matrix
$$\bar{S}_B = \frac{1}{2} \sum_{i,j} \bar{A}^B_{ij} (x_i - x_j)(x_i - x_j)^\top.$$
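A hedged sketch of the local scatter computation: the affinity $A_{ij}$ below is a plain heat kernel, which is only a stand-in for the local-scaling affinity used in Sugiyama's paper; the function name and the bandwidth parameter are assumptions.

```python
import numpy as np

def lfda_scatter(X, labels, sigma=1.0):
    """Local within/between scatter matrices for LFDA (illustrative sketch).

    Affinity A_ij = exp(-||x_i - x_j||^2 / sigma^2) is an assumed choice.
    """
    N, D = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-sq / sigma ** 2)
    counts = {k: np.sum(labels == k) for k in np.unique(labels)}
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for i in range(N):
        for j in range(N):
            d = np.outer(X[i] - X[j], X[i] - X[j])
            if labels[i] == labels[j]:
                nk = counts[labels[i]]
                S_W += 0.5 * (A[i, j] / nk) * d
                S_B += 0.5 * A[i, j] * (1.0 / N - 1.0 / nk) * d
            else:
                S_B += 0.5 * (1.0 / N) * d
    return S_W, S_B

# The embedding is then obtained as in FLD, from the leading generalized
# eigenvectors of the local (S_B, S_W) pair, e.g. via scipy.linalg.eigh.
```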