Lecture 9: Classification, LDA

Reading: Chapter 4
STATS 202: Data mining and analysis
October 13, 2017

Review: Main strategy in Chapter 4

Find an estimate $\hat{P}(Y \mid X)$. Then, given an input $x_0$, we predict the response as in a Bayes classifier:

$$\hat{y}_0 = \arg\max_y \hat{P}(Y = y \mid X = x_0).$$

Linear Discriminant Analysis (LDA)

Instead of estimating $P(Y \mid X)$ directly, we will estimate:

1. $\hat{P}(X \mid Y)$: Given the response, what is the distribution of the inputs?
2. $\hat{P}(Y)$: How probable is each category?

Then, we use Bayes' rule to obtain the estimate:

$$\hat{P}(Y = k \mid X = x) = \frac{\hat{P}(X = x \mid Y = k)\,\hat{P}(Y = k)}{\sum_j \hat{P}(X = x \mid Y = j)\,\hat{P}(Y = j)}$$

Linear Discriminant Analysis (LDA)

Instead of estimating $P(Y \mid X)$, we compute the estimates:

1. $\hat{P}(X = x \mid Y = k) = \hat{f}_k(x)$, where each $\hat{f}_k(x)$ is a multivariate normal density.
2. $\hat{P}(Y = k) = \hat{\pi}_k$, estimated by the fraction of training samples of class $k$.

[Figure: contour plots of the fitted class densities $\hat{f}_k(x)$ over the predictors $X_1$ and $X_2$.]
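
To make the two ingredients concrete, here is a minimal Python sketch (not from the slides; the arrays `X` of shape (n, p) and integer labels `y` are illustrative). For simplicity it fits one covariance per class, whereas LDA proper pools a single covariance across classes, as derived below.

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior(x, X, y):
    """Estimate P(Y = k | X = x) by Bayes' rule with normal f_k."""
    classes = np.unique(y)
    pi = np.array([np.mean(y == k) for k in classes])        # pi_k
    f = np.array([
        multivariate_normal(X[y == k].mean(axis=0),
                            np.cov(X[y == k], rowvar=False)).pdf(x)
        for k in classes
    ])                                                        # f_k(x)
    return dict(zip(classes, pi * f / np.sum(pi * f)))        # Bayes' rule
```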

LDA has linear decision boundaries

Suppose that:

- We know $P(Y = k) = \pi_k$ exactly.
- $P(X = x \mid Y = k)$ is multivariate normal with density:
  $$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k)}$$
  where $\mu_k$ is the mean of the inputs for category $k$, and $\Sigma$ is the covariance matrix (common to all categories).

Then, what is the Bayes classifier?

LDA has linear decision boundaries

By Bayes' rule, the probability of category $k$, given the input $x$, is:

$$P(Y = k \mid X = x) = \frac{\text{likelihood} \cdot \text{prior}}{\text{marginal}} = \frac{f_k(x)\,\pi_k}{P(X = x)}$$

The denominator does not depend on the response $k$, so we can write it as a constant:

$$P(Y = k \mid X = x) = C\, f_k(x)\,\pi_k$$

Plugging in the formula for $f_k(x)$:

$$P(Y = k \mid X = x) = \frac{C\,\pi_k}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k)}$$

LDA has linear decision boundaries

$$P(Y = k \mid X = x) = \frac{C\,\pi_k}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k)}$$

Now, let us absorb everything that does not depend on $k$ into a constant $C'$:

$$P(Y = k \mid X = x) = C'\,\pi_k\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k)}$$

and take the logarithm of both sides:

$$\log P(Y = k \mid X = x) = \log C' + \log \pi_k - \frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k).$$

The constant $\log C'$ is the same for every category $k$, so we want to find the maximum of the remaining terms over $k$.

LDA has linear decision boundaries

Goal: maximize the following over $k$:

$$\log \pi_k - \frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k)$$
$$= \log \pi_k - \frac{1}{2}\left[ x^T \Sigma^{-1} x + \mu_k^T \Sigma^{-1} \mu_k \right] + x^T \Sigma^{-1} \mu_k$$
$$= C'' + \log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k$$

We define the objective:

$$\delta_k(x) = \log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k$$

At an input $x$, we predict the response with the highest $\delta_k(x)$.
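
A minimal numpy sketch of this decision rule (not from the slides), assuming known parameters: `pis` a length-K vector of priors, `mus` a K×p array of class means, and a shared p×p covariance `Sigma`:

```python
import numpy as np

def lda_discriminants(x, pis, mus, Sigma):
    """delta_k(x) = log pi_k - 0.5 mu_k' Sigma^-1 mu_k + x' Sigma^-1 mu_k."""
    Sigma_inv = np.linalg.inv(Sigma)
    return np.array([
        np.log(pi) - 0.5 * mu @ Sigma_inv @ mu + x @ Sigma_inv @ mu
        for pi, mu in zip(pis, mus)
    ])

# Predict the class with the highest discriminant:
# y_hat = np.argmax(lda_discriminants(x, pis, mus, Sigma))
```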

LDA has linear decision boundaries

What is the decision boundary? It is the set of points at which two classes are equally probable:

$$\delta_k(x) = \delta_l(x)$$
$$\log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k = \log \pi_l - \frac{1}{2}\mu_l^T \Sigma^{-1} \mu_l + x^T \Sigma^{-1} \mu_l$$

This is a linear equation in $x$.

[Figure: class density contours over $X_1$ and $X_2$ with the linear decision boundaries between classes.]

Estimating $\pi_k$

$$\hat{\pi}_k = \frac{\#\{i : y_i = k\}}{n}$$

In words, the fraction of training samples of class $k$.

Estimating the parameters of $f_k(x)$

Estimate the mean vector $\mu_k$ for each class:

$$\hat{\mu}_k = \frac{1}{\#\{i : y_i = k\}} \sum_{i : y_i = k} x_i$$

Estimate the common covariance matrix $\Sigma$:

- One predictor ($p = 1$):
  $$\hat{\sigma}^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i : y_i = k} (x_i - \hat{\mu}_k)^2.$$
- Many predictors ($p > 1$): Compute the vectors of deviations $(x_1 - \hat{\mu}_{y_1}), (x_2 - \hat{\mu}_{y_2}), \dots, (x_n - \hat{\mu}_{y_n})$ and use an unbiased estimate of their covariance matrix, $\hat{\Sigma}$.
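
Putting these estimates together in numpy (a sketch, not from the slides; `X` has shape (n, p) and `y` holds the class labels):

```python
import numpy as np

def fit_lda(X, y):
    """Return (pi_hat, mu_hat, Sigma_hat) with a pooled covariance."""
    classes = np.unique(y)
    n, p = X.shape
    K = len(classes)
    pi_hat = np.array([np.mean(y == k) for k in classes])
    mu_hat = np.array([X[y == k].mean(axis=0) for k in classes])
    # Deviations of each sample from its own class mean, pooled with
    # the unbiased 1/(n - K) scaling.
    dev = X - mu_hat[np.searchsorted(classes, y)]
    Sigma_hat = dev.T @ dev / (n - K)
    return pi_hat, mu_hat, Sigma_hat
```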

LDA prediction

For an input $x$, predict the class with the largest:

$$\hat{\delta}_k(x) = \log \hat{\pi}_k - \frac{1}{2}\hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + x^T \hat{\Sigma}^{-1} \hat{\mu}_k$$

The decision boundaries are defined by:

$$\log \hat{\pi}_k - \frac{1}{2}\hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + x^T \hat{\Sigma}^{-1} \hat{\mu}_k = \log \hat{\pi}_l - \frac{1}{2}\hat{\mu}_l^T \hat{\Sigma}^{-1} \hat{\mu}_l + x^T \hat{\Sigma}^{-1} \hat{\mu}_l$$

[Figure: fitted class densities over $X_1$ and $X_2$; the decision boundaries are the solid lines.]
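
In practice this is rarely coded by hand; scikit-learn's `LinearDiscriminantAnalysis` fits the same model. A self-contained sketch on toy data (all names here are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy two-class data sharing one covariance, matching LDA's assumption.
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0, 0], np.eye(2), size=100)
X1 = rng.multivariate_normal([2, 2], np.eye(2), size=100)
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)   # pooled covariance
y_hat = lda.predict(X)                         # argmax_k of delta_k(x)
probs = lda.predict_proba(X)                   # P(Y = k | X = x)
```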

Quadratic discriminant analysis (QDA)

The assumption that the inputs of every class have the same covariance $\Sigma$ can be quite restrictive.

[Figure: two-class example over $X_1$ and $X_2$; boundaries for Bayes (dashed), LDA (dotted), and QDA (solid).]

Quadratic discriminant analysis (QDA)

In quadratic discriminant analysis we estimate a mean $\hat{\mu}_k$ and a covariance matrix $\hat{\Sigma}_k$ for each class separately.

Given an input, it is easy to derive an objective function:

$$\delta_k(x) = \log \pi_k - \frac{1}{2}\mu_k^T \Sigma_k^{-1} \mu_k + x^T \Sigma_k^{-1} \mu_k - \frac{1}{2} x^T \Sigma_k^{-1} x - \frac{1}{2}\log |\Sigma_k|$$

This objective is now quadratic in $x$, and so are the decision boundaries.
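
The scikit-learn counterpart is `QuadraticDiscriminantAnalysis`. A sketch on toy data whose classes have different covariances (names illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Two classes with different covariances, violating LDA's shared-Sigma
# assumption; QDA fits one Sigma_k per class.
rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=100)
X1 = rng.multivariate_normal([2, 2], [[1.0, -0.5], [-0.5, 1.0]], size=100)
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

qda = QuadraticDiscriminantAnalysis().fit(X, y)
y_hat = qda.predict(X)   # decision boundaries are quadratic in x
```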

Quadratic discriminant analysis (QDA)

[Figure: two scenarios over $X_1$ and $X_2$ comparing the Bayes boundary, LDA, and QDA.]

Evaluating a classification method

We have talked about the 0-1 loss:

$$\frac{1}{m} \sum_{i=1}^{m} 1(y_i \neq \hat{y}_i).$$

It is possible to make the wrong prediction for some classes more often than others, and the 0-1 loss doesn't tell you anything about this. A much more informative summary of the error is a confusion matrix.
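
A confusion matrix tabulates true classes against predicted classes; scikit-learn computes it directly. A tiny sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))
# [[2 1]
#  [1 2]]
```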

Example: Predicting default

Used LDA to predict credit card default in a dataset of 10K people. Predicted yes if $P(\text{default} = \text{yes} \mid X) > 0.5$.

- The error rate among people who do not default (the false positive rate) is very low.
- However, the error rate among people who do default (the false negative rate) is 76%.
- False negatives may be a bigger source of concern!
- One possible solution: change the threshold.

Example: Predicting default

Changing the threshold to 0.2 makes it easier to classify to yes: predicted yes if $P(\text{default} = \text{yes} \mid X) > 0.2$.

Note that the rate of false positives became higher! That is the price to pay for fewer false negatives.
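
Changing the threshold amounts to comparing the predicted probability of the positive class against a cutoff other than 0.5. A sketch, assuming a fitted classifier `lda` (e.g., from the earlier example) and inputs `X`, with classes no/yes coded as 0/1:

```python
# Column 1 of predict_proba holds P(default = yes | X).
p_yes = lda.predict_proba(X)[:, 1]

y_hat_50 = (p_yes > 0.5).astype(int)  # default threshold
y_hat_20 = (p_yes > 0.2).astype(int)  # flags "yes" more readily:
                                      # fewer false negatives,
                                      # more false positives
```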

Example: Predicting default

Let's visualize the dependence of the error on the threshold:

[Figure: error rate as a function of the threshold, showing the false negative rate (error for defaulting customers), the false positive rate (error for non-defaulting customers), and the 0-1 loss or total error rate.]

Example: The ROC curve

[Figure: ROC curve, plotting the true positive rate against the false positive rate.]

The ROC curve displays the performance of the method for any choice of threshold. The area under the curve (AUC) measures the quality of the classifier:

- 0.5 is the AUC for a random classifier.
- The closer the AUC is to 1, the better.
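
scikit-learn traces the ROC curve by sweeping the threshold over the predicted scores. A self-contained sketch with toy labels and probabilities:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]             # true 0/1 labels
p_yes = [0.1, 0.4, 0.35, 0.8]     # predicted P(Y = 1 | X)

fpr, tpr, thresholds = roc_curve(y_true, p_yes)  # one point per threshold
print(roc_auc_score(y_true, p_yes))              # 0.75; 0.5 would be random
```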

Next time

Comparison of logistic regression, LDA, QDA, and KNN classification. Start Chapter 5: Resampling.
