Lecture 9: Classification, LDA
Reading: Chapter 4. STATS 202: Data mining and analysis. October 13, 2017.
Review: Main strategy in Chapter 4

Find an estimate $\hat{P}(Y \mid X)$. Then, given an input $x_0$, we predict the response as in a Bayes classifier:

$$\hat{y}_0 = \operatorname{argmax}_y \, \hat{P}(Y = y \mid X = x_0).$$
Linear Discriminant Analysis (LDA)

Instead of estimating $P(Y \mid X)$ directly, we will estimate:

1. $\hat{P}(X \mid Y)$: Given the response, what is the distribution of the inputs?
2. $\hat{P}(Y)$: How probable is each category?

Then, we use Bayes' rule to obtain the estimate:

$$\hat{P}(Y = k \mid X = x) = \frac{\hat{P}(X = x \mid Y = k)\,\hat{P}(Y = k)}{\sum_j \hat{P}(X = x \mid Y = j)\,\hat{P}(Y = j)}$$
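The following is a minimal sketch of this Bayes-rule step, not code from the lecture: given estimated class-conditional densities and priors, it computes the posterior over classes at a point. The one-dimensional Gaussian densities and all parameter values are made-up illustrations.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical estimates for K = 2 classes and a single predictor.
pi_hat = np.array([0.7, 0.3])            # P(Y = k), the priors
f_hat = [norm(loc=0.0, scale=1.0).pdf,   # P(X = x | Y = 0)
         norm(loc=2.0, scale=1.0).pdf]   # P(X = x | Y = 1)

def posterior(x):
    """Bayes rule: P(Y=k|X=x) = f_k(x) pi_k / sum_j f_j(x) pi_j."""
    numer = np.array([f(x) for f in f_hat]) * pi_hat
    return numer / numer.sum()

p = posterior(1.2)
print(p, "-> predict class", p.argmax())
```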
Linear Discriminant Analysis (LDA)

Instead of estimating $P(Y \mid X)$, we compute the estimates:

1. $\hat{P}(X = x \mid Y = k) = \hat{f}_k(x)$, where each $\hat{f}_k(x)$ is a multivariate normal density.
2. $\hat{P}(Y = k) = \hat{\pi}_k$, estimated by the fraction of training samples of class $k$.

[Figure: contours of the class-conditional multivariate normal densities, plotted over the predictors $X_1$ and $X_2$.]
LDA has linear decision boundaries

Suppose that:

- We know $P(Y = k) = \pi_k$ exactly.
- $P(X = x \mid Y = k)$ is multivariate normal with density:

$$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)}$$

where $\mu_k$ is the mean of the inputs for category $k$, and $\Sigma$ is the covariance matrix (common to all categories).

Then, what is the Bayes classifier?
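As a quick sanity check on this density, here is a sketch (assuming scipy is available) that evaluates $f_k(x)$ both from the formula above and with a library call; the mean and covariance values are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu_k = np.array([1.0, -1.0])     # class-k mean vector (illustrative)
Sigma = np.array([[1.0, 0.3],    # covariance shared by all classes
                  [0.3, 2.0]])
x = np.array([0.5, 0.0])

# Hand-coded: (2 pi)^{-p/2} |Sigma|^{-1/2} exp(-0.5 (x-mu)' Sigma^{-1} (x-mu))
p = len(x)
d = x - mu_k
quad = d @ np.linalg.solve(Sigma, d)
f_manual = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2)
                                  * np.sqrt(np.linalg.det(Sigma)))

# The library version agrees with the hand-coded density.
f_scipy = multivariate_normal(mean=mu_k, cov=Sigma).pdf(x)
print(f_manual, f_scipy)
```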
LDA has linear decision boundaries

By Bayes' rule, the probability of category $k$, given the input $x$, is:

$$P(Y = k \mid X = x) = \frac{\text{likelihood} \times \text{prior}}{\text{marginal}} = \frac{f_k(x)\,\pi_k}{P(X = x)}$$

The denominator does not depend on the response $k$, so we can write it as a constant:

$$P(Y = k \mid X = x) = C \, f_k(x)\,\pi_k$$

Plugging in the formula for $f_k(x)$:

$$P(Y = k \mid X = x) = \frac{C\,\pi_k}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)}$$
LDA has linear decision boundaries

Now, let us absorb everything that does not depend on $k$ into a constant $C'$:

$$P(Y = k \mid X = x) = C' \pi_k \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)}$$

and take the logarithm of both sides:

$$\log P(Y = k \mid X = x) = \log C' + \log \pi_k - \frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k).$$

The constant is the same for every category $k$, so we want to find the maximum of the remaining terms over $k$.
LDA has linear decision boundaries

Goal: maximize the following over $k$:

$$\log \pi_k - \frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)$$
$$= \log \pi_k - \frac{1}{2}\left[ x^T \Sigma^{-1} x + \mu_k^T \Sigma^{-1} \mu_k \right] + x^T \Sigma^{-1} \mu_k$$
$$= C'' + \log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k$$

We define the objective:

$$\delta_k(x) = \log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k$$

At an input $x$, we predict the response with the highest $\delta_k(x)$.
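A sketch of this rule in code, assuming the true $\pi_k$, $\mu_k$, and $\Sigma$ are known (all numbers here are invented): compute $\delta_k(x)$ for every class and take the argmax.

```python
import numpy as np

pi = np.array([0.5, 0.5])              # class priors
mu = np.array([[0.0, 0.0],             # mu_1
               [2.0, 1.0]])            # mu_2
Sigma = np.array([[1.0, 0.2],
                  [0.2, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def delta(x):
    # delta_k(x) = log pi_k - 0.5 mu_k' Sigma^{-1} mu_k + x' Sigma^{-1} mu_k
    quad = np.einsum('kp,pq,kq->k', mu, Sigma_inv, mu)
    return np.log(pi) - 0.5 * quad + mu @ Sigma_inv @ x

scores = delta(np.array([1.0, 0.5]))
print(scores, "-> predict class", scores.argmax())
```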
LDA has linear decision boundaries

What is the decision boundary? It is the set of points at which two classes are equally probable:

$$\delta_k(x) = \delta_l(x)$$
$$\log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k = \log \pi_l - \frac{1}{2}\mu_l^T \Sigma^{-1} \mu_l + x^T \Sigma^{-1} \mu_l$$

This is a linear equation in $x$.

[Figure: the resulting linear decision boundaries, plotted over $X_1$ and $X_2$.]
Estimating $\pi_k$

$$\hat{\pi}_k = \frac{\#\{i \,;\, y_i = k\}}{n}$$

In words, the fraction of training samples of class $k$.
Estimating the parameters of $f_k(x)$

Estimate the mean vector $\mu_k$ for each class:

$$\hat{\mu}_k = \frac{1}{\#\{i \,;\, y_i = k\}} \sum_{i \,;\, y_i = k} x_i$$

Estimate the common covariance matrix $\Sigma$:

- One predictor ($p = 1$):

$$\hat{\sigma}^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i \,;\, y_i = k} (x_i - \hat{\mu}_k)^2.$$

- Many predictors ($p > 1$): compute the vectors of deviations $(x_1 - \hat{\mu}_{y_1}), (x_2 - \hat{\mu}_{y_2}), \ldots, (x_n - \hat{\mu}_{y_n})$ and use an unbiased estimate of their covariance matrix, $\hat{\Sigma}$.
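The estimates above are a few lines of numpy. This is a sketch on synthetic stand-in data, not the lecture's code; the pooled covariance uses the $n - K$ normalization from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 100, 2, 2
y = rng.integers(0, K, size=n)             # synthetic labels
X = rng.normal(size=(n, p)) + y[:, None]   # class 1 shifted by +1

pi_hat = np.array([(y == k).mean() for k in range(K)])         # fractions
mu_hat = np.array([X[y == k].mean(axis=0) for k in range(K)])  # class means

# Deviations of each sample from its own class mean, pooled over classes.
dev = X - mu_hat[y]
Sigma_hat = dev.T @ dev / (n - K)          # unbiased pooled covariance
print(pi_hat, mu_hat, Sigma_hat, sep="\n")
```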
LDA prediction

For an input $x$, predict the class with the largest:

$$\hat{\delta}_k(x) = \log \hat{\pi}_k - \frac{1}{2}\hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + x^T \hat{\Sigma}^{-1} \hat{\mu}_k$$

The decision boundaries are defined by:

$$\log \hat{\pi}_k - \frac{1}{2}\hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + x^T \hat{\Sigma}^{-1} \hat{\mu}_k = \log \hat{\pi}_l - \frac{1}{2}\hat{\mu}_l^T \hat{\Sigma}^{-1} \hat{\mu}_l + x^T \hat{\Sigma}^{-1} \hat{\mu}_l$$

These are the solid lines in the figure. [Figure: estimated LDA decision boundaries (solid lines), plotted over $X_1$ and $X_2$.]
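In practice this whole procedure is available off the shelf. Here is a sketch assuming scikit-learn is installed; the training data are synthetic Gaussians, not the lecture's dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),   # class 0
               rng.normal(2.0, 1.0, (50, 2))])  # class 1, shifted mean
y = np.repeat([0, 1], 50)

lda = LinearDiscriminantAnalysis().fit(X, y)    # shared-covariance model
print(lda.predict([[1.0, 1.0]]))                # hard class label
print(lda.predict_proba([[1.0, 1.0]]))          # posterior P(Y = k | X = x)
```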
Quadratic discriminant analysis (QDA)

The assumption that the inputs of every class have the same covariance $\Sigma$ can be quite restrictive.

[Figure: boundaries for Bayes (dashed), LDA (dotted), and QDA (solid), plotted over $X_1$ and $X_2$.]
Quadratic discriminant analysis (QDA)

In quadratic discriminant analysis we estimate a mean $\hat{\mu}_k$ and a covariance matrix $\hat{\Sigma}_k$ for each class separately.

Given an input, it is easy to derive an objective function:

$$\delta_k(x) = \log \pi_k - \frac{1}{2}\mu_k^T \Sigma_k^{-1} \mu_k + x^T \Sigma_k^{-1} \mu_k - \frac{1}{2} x^T \Sigma_k^{-1} x - \frac{1}{2} \log |\Sigma_k|$$

This objective is now quadratic in $x$, and so are the decision boundaries.
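A parallel sketch for QDA, again assuming scikit-learn and synthetic data; giving the second class a visibly larger covariance is exactly the situation where LDA's shared-$\Sigma$ assumption breaks down.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),   # class 0: tight cloud
               rng.normal(2.0, 3.0, (50, 2))])  # class 1: much more spread
y = np.repeat([0, 1], 50)

qda = QuadraticDiscriminantAnalysis().fit(X, y)  # per-class covariances
print(qda.predict([[1.0, 1.0]]))
print(qda.predict_proba([[1.0, 1.0]]))
```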
Quadratic discriminant analysis (QDA)

[Figure: Bayes boundary (dashed), LDA (dotted), and QDA (solid), plotted over $X_1$ and $X_2$.]
Evaluating a classification method

We have talked about the 0-1 loss:

$$\frac{1}{m} \sum_{i=1}^{m} \mathbf{1}(y_i \neq \hat{y}_i).$$

It is possible to make the wrong prediction for some classes more often than others, and the 0-1 loss doesn't tell you anything about this. A much more informative summary of the error is a confusion matrix.
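A small sketch of computing one, assuming scikit-learn; the label vectors are tiny made-up examples.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])   # actual classes
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 0])   # classifier's predictions

# Rows are true classes, columns are predictions; off-diagonal
# entries show which classes get confused with which.
cm = confusion_matrix(y_true, y_pred)
print(cm)
print("per-class error rates:", 1 - np.diag(cm) / cm.sum(axis=1))
```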
Example. Predicting default

We used LDA to predict credit card default in a dataset of 10K people, predicting "yes" if $P(\text{default} = \text{yes} \mid X) > 0.5$.

- The error rate among people who do not default (the false positive rate) is very low.
- However, the error rate among people who do default (the false negative rate) is 76%.

False negatives may be a bigger source of concern! One possible solution: change the threshold.
Example. Predicting default

Changing the threshold to 0.2 makes it easier to classify to "yes": predict "yes" if $P(\text{default} = \text{yes} \mid X) > 0.2$.

Note that the rate of false positives became higher! That is the price to pay for fewer false negatives.
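A sketch of this re-thresholding, assuming scikit-learn; the imbalanced synthetic data below stand in for the default dataset, which is not reproduced here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),   # non-defaulters
               rng.normal(1.5, 1.0, (20, 2))])   # defaulters (rare class)
y = np.repeat([0, 1], [200, 20])

lda = LinearDiscriminantAnalysis().fit(X, y)
p_yes = lda.predict_proba(X)[:, 1]               # P(default = yes | X)

yhat_05 = (p_yes > 0.5).astype(int)  # usual Bayes-classifier cutoff
yhat_02 = (p_yes > 0.2).astype(int)  # lower cutoff -> more "yes" predictions
print("flagged at 0.5:", yhat_05.sum(), " flagged at 0.2:", yhat_02.sum())
```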
Example. Predicting default

Let's visualize the dependence of the error on the threshold.

[Figure: error rate as a function of the threshold, showing the false negative rate (error for defaulting customers), the false positive rate (error for non-defaulting customers), and the 0-1 loss or total error rate.]
Example. The ROC curve

[Figure: ROC curve, plotting the true positive rate against the false positive rate.]

The ROC curve displays the performance of the method for any choice of threshold. The area under the curve (AUC) measures the quality of the classifier:

- 0.5 is the AUC of a random classifier.
- The closer the AUC is to 1, the better.
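A sketch of tracing the curve and computing the AUC, assuming scikit-learn and reusing the synthetic setup from the previous snippet.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
               rng.normal(1.5, 1.0, (20, 2))])
y = np.repeat([0, 1], [200, 20])
scores = LinearDiscriminantAnalysis().fit(X, y).predict_proba(X)[:, 1]

# One (false positive rate, true positive rate) point per threshold.
fpr, tpr, thresholds = roc_curve(y, scores)
print("AUC:", roc_auc_score(y, scores))   # 0.5 = random; closer to 1 = better
```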
Next time

Comparison of logistic regression, LDA, QDA, and KNN classification. Start Chapter 5: Resampling.