High Dimensional Discriminant Analysis
1 High Dimensional Discriminant Analysis
Charles Bouveyron, LMC-IMAG & INRIA Rhône-Alpes
Joint work with S. Girard and C. Schmid
2 Introduction
High-dimensional data: many scientific domains need to analyze data that are increasingly complex; modern data are made up of many variables: imagery (MRI, vision), biology (DNA micro-arrays), ...
Classification is very difficult in high-dimensional spaces: many learning methods suffer from the curse of dimensionality [Bel61], since the number n of observations is generally not sufficient to learn a high-dimensional model.
The empty space phenomenon [ST83] lets us assume that the data live in subspaces of lower dimensionality.
3 Introduction
Classification:
supervised classification (discriminant analysis) requires labeled examples of the classes,
unsupervised classification (clustering) aims to organize data into homogeneous classes.
Two families of methods:
generative methods: QDA, LDA, GMM,
discriminative methods: logistic regression, SVM.
Generative models can be used in both supervised and unsupervised classification.
4 Outline of the talk
Discriminant analysis framework
A new model for high-dimensional data
High dimensional discriminant analysis (HDDA): construction of the decision rule, a posteriori probability and reformulation
Particular rules
Estimators and intrinsic dimension estimation
Numerical results: application to image categorization, application to object recognition
Extension to unsupervised classification
5 Part 1: Discriminant analysis framework
6 Discriminant analysis framework
Discriminant analysis is the supervised part of classification, i.e. it requires a teacher!
Goals of discriminant analysis:
descriptive aspect: find a data representation that allows the groups to be interpreted through the explanatory variables,
decisional aspect: the main goal is to find the correct class membership of a new observation x.
Of course, HDDA favours the decisional aspect!
7 Discrimination problem
The basic problem: assign an observation $x = (x_1, \dots, x_p) \in \mathbb{R}^p$ with unknown class membership to one of $k$ classes $C_1, \dots, C_k$ known a priori.
We have a learning dataset $A$:
$A = \{(x_1, y_1), \dots, (x_n, y_n) : x_j \in \mathbb{R}^p \text{ and } y_j \in \{1, \dots, k\}\}$,
where the vector $x_j$ contains the $p$ explanatory variables and $y_j$ indicates the index of the class of $x_j$.
We have to construct a decision rule $\delta$:
$\delta : \mathbb{R}^p \to \{1, \dots, k\}, \quad x \mapsto y$.
8 Bayes decision rule
The optimal decision rule $\delta^*$, called the Bayes decision rule, is:
$\delta^* : x \to C_{i^*}$, with $i^* = \operatorname{argmax}_{i=1,\dots,k} \{p(C_i \mid x)\} = \operatorname{argmin}_{i=1,\dots,k} \{-2 \log(\pi_i f_i(x))\}$,
where $\pi_i$ is the a priori probability of class $C_i$ and $f_i(x)$ denotes the class-conditional density of $x$.
Generative methods usually assume that the class distributions are Gaussian $\mathcal{N}(\mu_i, \Sigma_i)$.
9 Classical discriminant analysis methods
Quadratic discriminant analysis (QDA):
$i^* = \operatorname{argmin}_{i=1,\dots,k} \{(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) + \log(\det \Sigma_i) - 2 \log(\pi_i)\}$.
Linear discriminant analysis (LDA), with the assumption that $\forall i, \ \Sigma_i = \Sigma$:
$i^* = \operatorname{argmin}_{i=1,\dots,k} \{\mu_i^t \Sigma^{-1} \mu_i - 2 \mu_i^t \Sigma^{-1} x - 2 \log(\pi_i)\}$.
QDA and LDA behave disappointingly when the size n of the training dataset is small compared to the number p of variables.
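As an aside, a minimal NumPy sketch of the QDA rule above (not code from the talk; the function names are hypothetical):

```python
import numpy as np

def qda_cost(x, mu, Sigma, prior):
    # Cost -2 log(pi_i f_i(x)) up to an additive constant shared by all classes.
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return diff @ np.linalg.solve(Sigma, diff) + logdet - 2 * np.log(prior)

def qda_classify(x, mus, Sigmas, priors):
    # Assign x to the class minimizing the quadratic cost.
    return int(np.argmin([qda_cost(x, m, S, p)
                          for m, S, p in zip(mus, Sigmas, priors)]))
```

When n is small compared to p, the empirical $\hat{\Sigma}_i$ is singular and the `solve` call fails; this is exactly the degeneracy that motivates HDDA.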
10 Discriminant analysis regularization
Dimension reduction: PCA, FDA, feature selection, ...
Fisher discriminant analysis (FDA) combines a dimension reduction step (projection on the $k - 1$ discriminant axes) with one of the previous methods (usually LDA).
Parsimonious models:
Regularized discriminant analysis (RDA, [Fri89]) is an intermediate classifier between QDA and LDA,
Eigenvalue decomposition discriminant analysis (EDDA, [BC96]) is based on a re-parametrization of the class covariance matrices: $\Sigma_i = \lambda_i D_i A_i D_i^t$.
11 Dimension reduction for classification
[Figure 1: PCA axes vs. discriminant axes. High-dimensional data whose classes live in different subspaces of lower dimensionality.]
12 Part 2: A new model
13 A new model
The empty space phenomenon enables us to assume that high-dimensional data live in subspaces of low dimensionality.
The main idea of the new model is:
the space of each class is decomposed into two subspaces of low dimensionality,
and the classes are assumed to be spherical in each of these subspaces.
14 A new model
We assume that the class-conditional densities are Gaussian $\mathcal{N}(\mu_i, \Sigma_i)$ with means $\mu_i$ and covariance matrices $\Sigma_i$.
Let $Q_i$ be the orthogonal matrix of eigenvectors of the covariance matrix $\Sigma_i$, and let $B_i$ be the basis of $\mathbb{R}^p$ made of the eigenvectors of $\Sigma_i$.
The class-conditional covariance matrix $\Delta_i$ is defined in the basis $B_i$ by:
$\Delta_i = Q_i^t \Sigma_i Q_i$.
15 A new model
We assume in addition that $\Delta_i$ contains only two different eigenvalues $a_i > b_i$.
Let $E_i$ be the affine space generated by the eigenvectors associated with the eigenvalue $a_i$ and such that $\mu_i \in E_i$.
We also define $E_i^\perp$ such that $E_i \oplus E_i^\perp = \mathbb{R}^p$ and $\mu_i \in E_i^\perp$.
Let $P_i$ and $P_i^\perp$ be the projection operators on $E_i$ and $E_i^\perp$ respectively.
16 A new model
Thus, we assume that $\Delta_i$ has the following block-diagonal form:
$\Delta_i = \operatorname{diag}(\underbrace{a_i, \dots, a_i}_{d_i}, \underbrace{b_i, \dots, b_i}_{p - d_i})$.
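To make the model concrete, here is a small sketch (assuming NumPy; not from the talk) that rebuilds $\Sigma_i = Q_i \Delta_i Q_i^t$ from the two eigenvalues:

```python
import numpy as np

def hdda_covariance(Q, a, b, d):
    # Sigma_i = Q diag(a,...,a, b,...,b) Q^t, with d copies of a and p-d of b.
    # Q is assumed orthogonal (its columns are the eigenvectors of Sigma_i).
    p = Q.shape[0]
    delta = np.concatenate([np.full(d, a), np.full(p - d, b)])
    return (Q * delta) @ Q.T   # scales the columns of Q, then recombines
```

Only $a_i$, $b_i$ and the $d_i$ leading eigenvectors carry class-specific covariance information, instead of a full unconstrained $\Sigma_i$.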
17 New model: illustration
18 Part 3: High Dimensional Discriminant Analysis
19 High Dimensional Discriminant Analysis
Under the preceding assumptions, the Bayes decision rule yields a new decision rule $\delta^+$:
Theorem 1: The new decision rule $\delta^+$ consists in classifying $x$ to the class $C_{i^*}$ with:
$i^* = \operatorname{argmin}_{i=1,\dots,k} \Big\{ \frac{1}{a_i} \|\mu_i - P_i(x)\|^2 + \frac{1}{b_i} \|x - P_i(x)\|^2 + d_i \log(a_i) + (p - d_i) \log(b_i) - 2 \log(\pi_i) \Big\}$.
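A possible NumPy rendering of the cost being minimized (a sketch under the model's assumptions, not the authors' code; the projection onto the affine space $E_i$ uses the $d_i$ leading eigenvectors):

```python
import numpy as np

def hdda_cost(x, mu, Q, a, b, d, prior):
    # K_i(x): cost of delta+ for one class. Q[:, :d] spans E_i through mu.
    p = x.shape[0]
    Qd = Q[:, :d]
    Px = mu + Qd @ (Qd.T @ (x - mu))     # projection P_i(x) onto E_i
    return (np.sum((mu - Px) ** 2) / a   # distance to the mean inside E_i
            + np.sum((x - Px) ** 2) / b  # distance of x to the subspace E_i
            + d * np.log(a) + (p - d) * np.log(b) - 2 * np.log(prior))

def hdda_classify(x, params):
    # params: list of (mu, Q, a, b, d, prior) tuples, one per class.
    return int(np.argmin([hdda_cost(x, *theta) for theta in params]))
```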
20 HDDA: illustration
$K_i(x) = \frac{1}{a_i} \|\mu_i - P_i(x)\|^2 + \frac{1}{b_i} \|x - P_i(x)\|^2 + d_i \log(a_i) + (p - d_i) \log(b_i) - 2 \log(\pi_i)$.
21 HDDA: a posteriori probability
In many applications, it is useful to have the a posteriori probability $p(C_i \mid x)$ that $x$ belongs to $C_i$. The Bayes formula yields:
$p(C_i \mid x) = \exp\left(-\frac{1}{2} K_i(x)\right) \Big/ \sum_{j=1}^{k} \exp\left(-\frac{1}{2} K_j(x)\right)$,
where $K_i$ is the cost function of $\delta^+$ conditionally on the class $C_i$:
$K_i(x) = \frac{1}{a_i} \|\mu_i - P_i(x)\|^2 + \frac{1}{b_i} \|x - P_i(x)\|^2 + d_i \log(a_i) + (p - d_i) \log(b_i) - 2 \log(\pi_i)$.
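Given the costs $K_i$, the posterior is a softmax of $-K_i/2$; a short sketch (reusing the hypothetical `hdda_cost` above), shifted by the minimum cost to keep the exponentials numerically stable:

```python
import numpy as np

def hdda_posterior(x, params):
    # p(C_i | x) for every class, from the costs K_i(x).
    K = np.array([hdda_cost(x, *theta) for theta in params])
    w = np.exp(-(K - K.min()) / 2)   # the shift cancels in the ratio
    return w / w.sum()
```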
22 HDDA: reformulation
In order to interpret the decision rule $\delta^+$ more easily, we introduce $\alpha_i$ and $\sigma_i$:
$a_i = \frac{\sigma_i^2}{\alpha_i}$ and $b_i = \frac{\sigma_i^2}{1 - \alpha_i}$, with $\alpha_i \in \,]0, 1[$ and $\sigma_i > 0$.
Thus, the decision rule $\delta^+$ consists in classifying $x$ to the class $C_{i^*}$ with:
$i^* = \operatorname{argmin}_{i=1,\dots,k} \Big\{ \frac{1}{\sigma_i^2} \left( \alpha_i \|\mu_i - P_i(x)\|^2 + (1 - \alpha_i) \|x - P_i(x)\|^2 \right) + 2p \log(\sigma_i) + d_i \log\left(\frac{1 - \alpha_i}{\alpha_i}\right) - p \log(1 - \alpha_i) - 2 \log(\pi_i) \Big\}$.
Notation: HDDA is the model $[a_i b_i Q_i d_i]$ or $[\alpha_i \sigma_i Q_i d_i]$.
23 Part 4: Particular rules
24 Particular rules
By allowing some but not all of the HDDA parameters to vary, we obtain 24 particular rules:
they correspond to different regularizations,
some of them have a simple geometric interpretation,
9 of them have explicit formulations.
HDDA reduces to a classical discriminant analysis in particular cases:
if $\forall i, \ \alpha_i = \frac{1}{2}$: $\delta^+$ is QDA with spherical classes,
if in addition $\forall i, \ \sigma_i = \sigma$: $\delta^+$ is LDA with spherical classes.
25 Links with classical methods
[Diagram relating the models. Labeled relations include: QDA with $\Sigma_i = \lambda_i D_i A_i D_i^t$ (EDDA) and with $\Sigma_i = Q_i \Delta_i Q_i^t$ (HDDA); $\Sigma_i = \lambda D A D^t$ (LDA); $A_i = \mathrm{Id}$ and $\alpha_i = \frac{1}{2}$ leading to spherical QDA (QDAs); $\Sigma_i = \sigma_i^2 \mathrm{Id}$, then $\sigma_i = \sigma$ (spherical LDA, LDAs), then $\pi_i = \pi$ (geometric LDA).]
26 Model [ασq i d i ] The decision rule δ + consists in classifying x to the class C i if: i = argmin{α µ i P i (x) 2 + (1 α) x P i (x) 2 }. i=1,...,k High Dimensional Discriminant Analysis - Lear seminar p.26/43
27 Part 5: Estimation
28 HDDA estimators
The estimators are computed by maximum likelihood from the learning set $A$.
Common estimators:
$\hat{\pi}_i = \frac{n_i}{n}$, with $n_i = \#(C_i)$,
$\hat{\mu}_i = \frac{1}{n_i} \sum_{x_j \in C_i} x_j$,
$\hat{\Sigma}_i = \frac{1}{n_i} \sum_{x_j \in C_i} (x_j - \hat{\mu}_i)(x_j - \hat{\mu}_i)^t$.
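These are the usual empirical estimates; a minimal sketch (assuming NumPy, with the rows of `X` as observations):

```python
import numpy as np

def class_estimates(X, y, i):
    # pi_i, mu_i and Sigma_i for class i from the labeled sample (X: n x p).
    Xi = X[y == i]
    pi_hat = Xi.shape[0] / X.shape[0]
    mu_hat = Xi.mean(axis=0)
    centered = Xi - mu_hat
    Sigma_hat = centered.T @ centered / Xi.shape[0]  # mean of (x-mu)(x-mu)^t
    return pi_hat, mu_hat, Sigma_hat
```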
29 Estimators of the model $[a_i b_i Q_i d_i]$
Assuming $d_i$ is known, the ML estimators are:
$\hat{Q}_i$ is made of the eigenvectors of $\hat{\Sigma}_i$ associated with its ordered eigenvalues,
$\hat{a}_i$ is the mean of the $d_i$ largest eigenvalues of $\hat{\Sigma}_i$: $\hat{a}_i = \frac{1}{d_i} \sum_{l=1}^{d_i} \lambda_{il}$,
$\hat{b}_i$ is the mean of the $(p - d_i)$ smallest eigenvalues of $\hat{\Sigma}_i$: $\hat{b}_i = \frac{1}{p - d_i} \sum_{l=d_i+1}^{p} \lambda_{il}$.
30 Estimation trick
The decision rule $\delta^+$ does not require computing the last $(p - d_i)$ eigenvectors of $\hat{\Sigma}_i$. Thus, in order to minimize the number of parameters to estimate, we use the following relation:
$\sum_{l=d_i+1}^{p} \lambda_{il} = \operatorname{tr}(\hat{\Sigma}_i) - \sum_{l=1}^{d_i} \lambda_{il}$.
Number of parameters to estimate with $p = 100$, $d_i = 10$ and $k = 4$: QDA: … HDDA: …
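The estimators of the previous slide, with the trace trick so that only the $d_i$ leading eigenvalues and eigenvectors are ever used; a sketch, not the authors' implementation:

```python
import numpy as np

def hdda_class_params(Sigma_hat, d):
    # a_i, b_i and the first d eigenvectors of Sigma_hat.
    p = Sigma_hat.shape[0]
    vals, vecs = np.linalg.eigh(Sigma_hat)    # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]    # reorder to descending
    a_hat = vals[:d].mean()
    # Trace trick: sum of the p-d smallest eigenvalues without computing them.
    b_hat = (np.trace(Sigma_hat) - vals[:d].sum()) / (p - d)
    return a_hat, b_hat, vecs[:, :d]
```

(`eigh` still computes the full spectrum; a truncated solver such as `scipy.sparse.linalg.eigsh` would realize the saving for large p.)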
31 Intrinsic dimension estimation
We base our approach for choosing the values of $d_i$ on the eigenvalues of $\Sigma_i$.
We use two empirical methods:
common thresholding on the cumulative variance:
$d_i = \operatorname{argmin}_{d=1,\dots,p-1} \Big\{ d : \sum_{j=1}^{d} \lambda_j \Big/ \sum_{j=1}^{p} \lambda_j \geq s \Big\}$,
scree test of Cattell: analyse the differences between successive eigenvalues in order to find a break in the scree of eigenvalues.
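Both empirical criteria are easy to sketch (assuming the eigenvalues are sorted in descending order; the thresholds `s` and `t` are illustrative, not values from the talk):

```python
import numpy as np

def dim_by_threshold(eigvals, s=0.9):
    # Smallest d whose cumulative variance fraction reaches the threshold s.
    ratios = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratios, s) + 1)

def dim_by_cattell(eigvals, t=0.2):
    # Scree test: the break is after the last gap between successive
    # eigenvalues that is still large relative to the biggest gap.
    gaps = -np.diff(eigvals)
    return int(np.where(gaps >= t * gaps.max())[0].max() + 1)
```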
32 Intrinsic dimension estimation
[Figure: ordered eigenvalues of $\Sigma_i$ and their cumulative sum (common thresholding); differences between successive eigenvalues (scree test of Cattell).]
33 Part 6: Numerical results
34 Results: artificial data
Method                     Classification rate
HDDA ($[a_i b_i Q_i d_i]$)  …
HDDA ($[a_i b_i Q_i d]$)    …
LDA                         …
FDA                         0.51
SVM                         …
Gaussian densities in $\mathbb{R}^{15}$, with $d_1 = 3$, $d_2 = 4$ and $d_3 = 5$.
In addition, the proportions are very different: $\pi_1 = \frac{1}{2}$, $\pi_2 = \frac{1}{3}$ and $\pi_3 = \frac{1}{6}$.
35 Results: image categorization
A recent study [LBGGDH03] proposes an approach based on human perception to categorize natural images.
An image is represented by a vector in 49 dimensions: each of the 49 components is the response of the image to a Gabor filter.
36 Results: image categorization
Data: 328 descriptors in 49 dimensions.
Results (leave-one-out):
Method                     Classification rate
HDDA ($[a_i b_i Q_i d_i]$)  …
HDDA ($[a_i b Q_i d]$)      …
QDA                         …
LDA                         …
FDA ($d = k - 1$)           0.79
SVM                         …
Classification results for the image categorization experiment.
37 Results: object recognition
Our approach uses local descriptors (Harris-Laplace + SIFT).
We consider 3 object classes (wheels, seat and handlebars) and 1 background class.
The dataset is made of 1000 descriptors in 128 dimensions: learning dataset: 500, test dataset: 500.
38 Results: object recognition
[Figure: ROC curves (true positives vs. false positives) comparing HDDA classifiers with SVM classifiers, LDA and FDA; one panel shows HDDA at error probability $< 10^{-5}$. Classification results for the object recognition experiment.]
39 Results: object recognition
[Images: recognition using HDDA vs. recognition using SVM.]
40 Part 7: Unsupervised classification
41 Extension to unsupervised classification
Unsupervised classification aims to organize data into homogeneous classes.
Gaussian mixture models (GMM) are an efficient tool for unsupervised classification:
in a Gaussian mixture model, the density of the mixture is
$f(x; \theta) = \sum_{i=1}^{k} \pi_i f_i(x; \mu_i, \Sigma_i)$,
where $\theta = \{\pi_1, \dots, \pi_k, \mu_1, \dots, \mu_k, \Sigma_1, \dots, \Sigma_k\}$;
the parameter estimation is generally done with the EM algorithm.
42 Extension to unsupervised classification
Using our model for high-dimensional data, the two main steps of the EM algorithm at iteration $q$ are:
E step: compute
$t_{ij}^{(q)} = t_i^{(q)}(x_j) = \exp\left(-K_i^{(q)}(x_j)/2\right) \Big/ \sum_{l=1}^{k} \exp\left(-K_l^{(q)}(x_j)/2\right)$,
where
$K_i^{(q)}(x_j) = \frac{\|\mu_i^{(q)} - P_i^{(q)}(x_j)\|^2}{a_i^{(q)}} + \frac{\|x_j - P_i^{(q)}(x_j)\|^2}{b_i^{(q)}} + d_i^{(q)} \log(a_i^{(q)}) + (p - d_i^{(q)}) \log(b_i^{(q)}) - 2 \log(\pi_i^{(q)})$.
M step: classical estimation of $\pi_i$, $\mu_i$ and $\Sigma_i$; the estimators of $a_i$, $b_i$ and $Q_i$ are the same as those of HDDA.
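A sketch of the E step (reusing the hypothetical `hdda_cost` from the supervised part; the M step then applies the HDDA estimators with the responsibilities as weights):

```python
import numpy as np

def e_step(X, params):
    # Responsibilities t_ij = p(C_i | x_j) under the current parameters.
    K = np.array([[hdda_cost(x, *theta) for theta in params] for x in X])
    K -= K.min(axis=1, keepdims=True)     # stabilize the exponentials
    T = np.exp(-K / 2)
    return T / T.sum(axis=1, keepdims=True)
```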
43 References
[BC96] H. Bensmail and G. Celeux. Regularized Gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91, 1996.
[Bel61] R. Bellman. Adaptive Control Processes. Princeton University Press, 1961.
[Fri89] J.H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84, 1989.
[LBGGDH03] H. Le Borgne, N. Guyader, A. Guérin-Dugué, and J. Hérault. Classification of images: ICA filters vs human perception. In 7th International Symposium on Signal Processing and its Applications, number 2, 2003.
[ST83] D. Scott and J. Thompson. Probability density estimation in higher dimensions. In Proceedings of the Fifteenth Symposium on the Interface, North-Holland/Elsevier Science Publishers, 1983.