Supervised Learning: Linear Methods (1/2)
Applied Multivariate Statistics, Spring 2012


Overview
- Review: Conditional probability
- LDA / QDA: Theory
- Fisher's discriminant analysis
- LDA: Example
- Quality control: Test set and cross-validation
- Case study: Text recognition

Conditional Probability
T: medical test is positive; C: patient has cancer.
(Marginal) probability: P(T), P(C), defined on the full sample space (all people).
Conditional probability: P(T|C), P(C|T). Conditioning restricts the sample space: P(T|C) lives on the new sample space "people with cancer" and is large, while P(C|T) lives on the new sample space "people with a positive test" and is small.
Bayes' theorem:

  P(C|T) = P(T|C) P(C) / P(T)

where P(C|T) is the posterior, P(T|C) the class-conditional probability, and P(C) the prior.
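A quick numeric illustration (the numbers are hypothetical, not from the slides): with prevalence P(C) = 0.01, sensitivity P(T|C) = 0.9 and false-positive rate P(T|no C) = 0.05, we get P(T) = 0.9 * 0.01 + 0.05 * 0.99 = 0.0585, hence P(C|T) = 0.9 * 0.01 / 0.0585 ≈ 0.15: the posterior probability of cancer stays small even after a positive test.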

One approach to supervised learning
Bayes' theorem gives

  P(C|X) = P(C) P(X|C) / P(X) ∝ P(C) P(X|C)

Prior / prevalence P(C): fraction of samples in that class.
Assume X|C ~ N(μ_C, Σ_C).
Bayes rule: choose the class where P(C|X) is maximal (this rule is optimal if all types of error are equally costly).
Special case, two classes (0/1): choose c = 1 if P(C=1|X) > 0.5, or equivalently if the posterior odds P(C=1|X) / P(C=0|X) > 1.
In practice: estimate P(C), μ_C, Σ_C from the training sample.

QDA: Doing the math
P(C|X) ∝ P(C) P(X|C). Since the logarithm is monotone, maximizing P(C|X) is the same as maximizing log P(C|X). With the Gaussian density

  P(X=x|C) = (2π)^(-d/2) |Σ_C|^(-1/2) exp( -1/2 (x - μ_C)^T Σ_C^(-1) (x - μ_C) )

the discriminant function becomes

  δ_c(x) = log P(C) + log P(X=x|C)
         = log P(C) - 1/2 log|Σ_C| - 1/2 (x - μ_C)^T Σ_C^(-1) (x - μ_C) + const

i.e. prior + additional covariance term + squared Mahalanobis distance. Choose the class where δ_c(x) is maximal.
Special case, two classes: the decision boundary consists of the values of x where δ_0(x) = δ_1(x), which is quadratic in x. Quadratic Discriminant Analysis (QDA).
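A minimal R sketch of this computation (my own illustrative code, not from the slides; in practice one would use «qda» from package MASS):

  # Evaluate the QDA discriminant delta_c(x) for each class.
  # X: numeric matrix of training data, y: factor of class labels, x: new point.
  qda_delta <- function(x, X, y) {
    sapply(levels(y), function(cl) {
      Xc    <- X[y == cl, , drop = FALSE]
      mu    <- colMeans(Xc)               # estimated class mean mu_C
      Sigma <- cov(Xc)                    # estimated class covariance Sigma_C
      prior <- mean(y == cl)              # P(C): fraction of samples in class
      d     <- x - mu
      log(prior) - 0.5 * log(det(Sigma)) -
        0.5 * drop(t(d) %*% solve(Sigma) %*% d)  # squared Mahalanobis term
    })
  }
  # Classify to the class with maximal delta_c(x):
  # names(which.max(qda_delta(x_new, X_train, y_train)))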

Simplification
Assume the same covariance matrix in all classes, i.e. X|C ~ N(μ_C, Σ). Then

  δ_c(x) = log P(C) - 1/2 log|Σ| - 1/2 (x - μ_C)^T Σ^(-1) (x - μ_C) + const
         = log P(C) - 1/2 (x - μ_C)^T Σ^(-1) (x - μ_C) + const'
         = log P(C) + x^T Σ^(-1) μ_C - 1/2 μ_C^T Σ^(-1) μ_C + const''

since log|Σ| is now fixed for all classes, and after expanding the squared Mahalanobis distance the quadratic term x^T Σ^(-1) x is also the same for every class and can be dropped. The decision boundary is linear in x: Linear Discriminant Analysis (LDA).
[Figure: a point at equal physical (Euclidean) distance from the means of classes 0 and 1, with equal priors, is classified to class 0, since its Mahalanobis distance to class 0 is smaller.]
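The last line is a linear score in x and can be coded directly; a hedged R sketch with a pooled covariance estimate (function and variable names are mine):

  # Evaluate the LDA score log P(C) + x' Sigma^-1 mu_C - 0.5 mu_C' Sigma^-1 mu_C.
  lda_delta <- function(x, X, y) {
    K      <- length(levels(y))
    pooled <- Reduce(`+`, lapply(levels(y), function(cl) {
      Xc <- X[y == cl, , drop = FALSE]
      (nrow(Xc) - 1) * cov(Xc)
    })) / (nrow(X) - K)                       # pooled covariance Sigma
    Sinv <- solve(pooled)
    sapply(levels(y), function(cl) {
      mu    <- colMeans(X[y == cl, , drop = FALSE])
      prior <- mean(y == cl)
      log(prior) + drop(t(x) %*% Sinv %*% mu) -
        0.5 * drop(t(mu) %*% Sinv %*% mu)     # linear in x
    })
  }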

LDA vs. QDA
LDA: + only few parameters to estimate, hence accurate estimates; - inflexible (linear decision boundary).
QDA: - many parameters to estimate, hence less accurate estimates; + more flexible (quadratic decision boundary).
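To make the parameter count concrete (standard counting, not from the slides): a covariance matrix in d dimensions has d(d+1)/2 free parameters, so QDA with K classes estimates K * d(d+1)/2 of them against a single d(d+1)/2 for LDA; for K = 3 classes in d = 10 dimensions that is 165 vs. 55 covariance parameters.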

Fisher's Discriminant Analysis: Idea
Find the direction(s) in which the groups are separated best.
[Figure: the 1st principal component vs. the 1st linear discriminant (= 1st canonical variable) as projection directions for two groups.]
Class Y, predictors X = (X_1, ..., X_d). Project onto a direction w: U = w^T X. Find w so that the groups are separated best along U.
Measure of separation: the Rayleigh coefficient

  J(w) = D(U) / Var(U), where D(U) = (E[U|Y=0] - E[U|Y=1])^2

With E[X|Y=j] = μ_j and Var(X|Y=j) = Σ, this becomes E[U|Y=j] = w^T μ_j and Var(U) = w^T Σ w.
The concept extends to more than two groups.
[Figure: a direction with large J(w), group means far apart relative to Var(U), vs. a direction with small J(w).]
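For two classes the maximizer of J(w) has the well-known closed form w ∝ Σ^(-1)(μ_1 - μ_0); a small R sketch (illustrative only, names are mine), estimating Σ by the pooled covariance:

  # Fisher direction for two classes: w proportional to solve(Sigma) (mu1 - mu0).
  fisher_direction <- function(X, y) {        # y: factor with exactly 2 levels
    cls <- levels(y)
    X0  <- X[y == cls[1], , drop = FALSE]
    X1  <- X[y == cls[2], , drop = FALSE]
    Sp  <- ((nrow(X0) - 1) * cov(X0) + (nrow(X1) - 1) * cov(X1)) /
           (nrow(X0) + nrow(X1) - 2)          # pooled covariance estimate
    drop(solve(Sp) %*% (colMeans(X1) - colMeans(X0)))  # LD 1, up to scaling
  }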

LDA and Linear Discriminants
- The direction with largest J(w): 1st linear discriminant (LD 1).
- Orthogonal to LD 1, again with largest J(w): LD 2.
- etc.
At most min(number of dimensions, number of groups - 1) LDs, e.g. 3 groups in 10 dimensions need 2 LDs.
Computed using an eigenvalue decomposition or a singular value decomposition.
"Proportion of trace": percentage of the variance between the group means captured by each LD.
R: the function «lda» in package MASS does LDA and computes the linear discriminants (a function «qda» is also available).

Example: Classification of Iris flowers
Three species: Iris setosa, Iris versicolor, Iris virginica. Classify according to sepal/petal length/width.
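In R this example is a single call, since the iris data ship with R (output fields as returned by MASS::lda):

  library(MASS)
  fit <- lda(Species ~ ., data = iris)  # 3 groups, 4 predictors -> 2 LDs
  fit                    # priors, group means, LD coefficients, proportion of trace
  pred <- predict(fit)   # $class, $posterior, $x = scores on LD 1 and LD 2
  head(pred$class)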

Quality of classification
Using the training data also as test data leads to overfitting: the estimated error is too optimistic for new data.
Better: a separate test set (split the data into a training part and a test part), or cross-validation (CV; e.g. leave-one-out cross-validation), where every row is the test case once and the remaining rows form the training data.
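Leave-one-out cross-validation is built into MASS::lda via its CV argument; a short example on the iris data:

  library(MASS)
  cv <- lda(Species ~ ., data = iris, CV = TRUE)  # cv$class: held-out predictions
  mean(cv$class != iris$Species)                  # LOOCV error rate estimate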

Measures for prediction error
Confusion matrix (e.g. 100 samples):

                 Truth = 0   Truth = 1   Truth = 2
  Estimate = 0       23          7           6
  Estimate = 1        3         27           4
  Estimate = 2        3          1          26

Error rate: 1 - sum(diagonal entries) / (number of samples) = 1 - 76/100 = 0.24.
We expect the classifier to predict 24% of new observations incorrectly (this is just a rough estimate).
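Given true and predicted labels, both quantities take two lines in R (continuing the LOOCV example above):

  tab <- table(Estimate = cv$class, Truth = iris$Species)  # confusion matrix
  1 - sum(diag(tab)) / sum(tab)                            # error rate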

Example: Digit recognition
7129 hand-written digits. Each (centered) digit was put on a 16*16 grid and the grey value was measured in each cell of the grid, giving 256 grey values per digit.
[Figure: a sample of digits; an example digit on an 8*8 grid.]

Concepts to know
- Idea of LDA / QDA
- Meaning of linear discriminants
- Cross-validation
- Confusion matrix, error rate

R functions to know
- lda (package MASS)