Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

[Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain a reasonable level of consistency and simplicity we ask that you do not use other software tools.]

If you collaborated with other members of the class, please write their names at the end of the assignment. Moreover, you will need to write and sign the following statement: "In preparing my solutions, I did not look at any old homeworks, copy anybody's answers, or let them copy mine."

Problem 1: [10 Points]

In many pattern classification problems we have the option to either assign a pattern to one of c classes, or reject it as unrecognizable, if the cost to reject is not too high. Let the cost of classification be defined as:

λ(ω_i | ω_j) = 0     if ω_i = ω_j (correct classification)
λ(ω_i | ω_j) = λ_r   if ω_i = ω_0 (rejection)
λ(ω_i | ω_j) = λ_s   otherwise (substitution error)

Show that for minimum-risk classification, the decision rule should associate a test vector x with class ω_i if P(ω_i | x) ≥ P(ω_j | x) for all j and P(ω_i | x) ≥ 1 − λ_r/λ_s, and reject otherwise.

Solution:

The average risk of choosing class ω_i is

R(ω_i | x) = Σ_{j=1}^{c} λ(ω_i | ω_j) P(ω_j | x) = 0 · P(ω_i | x) + Σ_{j≠i} λ_s P(ω_j | x),

where λ(ω_i | ω_j) denotes the cost of choosing class ω_i when the true class is ω_j. Hence

R(ω_i | x) = λ_s (1 − P(ω_i | x)).

Associate x with the class ω_i that has the highest posterior probability, provided the average risk is no greater than the cost of rejection:

λ_s (1 − P(ω_i | x)) ≤ λ_r,   i.e.   P(ω_i | x) ≥ 1 − λ_r/λ_s.

Otherwise, reject.
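As an aside, the resulting rule is easy to state in Matlab. The sketch below (not part of the original assignment) applies the classify-or-reject decision to a single vector of posterior probabilities; the variable names and the example numbers are hypothetical.

% Minimal sketch of the minimum-risk classify-or-reject rule.
% posteriors: 1 x c vector of P(omega_i | x); lambda_r, lambda_s: costs.
posteriors = [0.15 0.70 0.15];   % example posteriors (hypothetical)
lambda_r   = 0.2;                % cost of rejection (hypothetical)
lambda_s   = 1.0;                % cost of a substitution error (hypothetical)

[pmax, i] = max(posteriors);     % candidate class: highest posterior
if pmax >= 1 - lambda_r/lambda_s
    fprintf('Assign x to class omega_%d (P = %.2f)\n', i, pmax);
else
    fprintf('Reject x (max posterior %.2f < 1 - lambda_r/lambda_s = %.2f)\n', ...
            pmax, 1 - lambda_r/lambda_s);
end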

Problem 2: [16 Points]

Use signal detection theory, as well as the notation and basic Gaussian assumptions described in the text, to address the following. Let the two classes have means μ_1 and μ_2, common standard deviation σ, and decision threshold x*.

a. Prove that P(x > x* | x ∈ ω_2) and P(x > x* | x ∈ ω_1), taken together, uniquely determine the discriminability d'.

Based on the Gaussian assumption, we see that

P(x > x* | x ∈ ω_i) = P( (x − μ_i)/σ > (x* − μ_i)/σ | x ∈ ω_i ),   where (x − μ_i)/σ ~ N(0, 1) for i = 1, 2.

Thus we know (x* − μ_2)/σ from the hit rate P(x > x* | x ∈ ω_2) and (x* − μ_1)/σ from the false alarm rate P(x > x* | x ∈ ω_1), and these let us calculate the discriminability:

(x* − μ_1)/σ − (x* − μ_2)/σ = (μ_2 − μ_1)/σ = d'.

Therefore d' is uniquely determined by the hit and false alarm rates.

b. Use error functions erf(·) to express d' in terms of the hit and false alarm rates. Estimate d' if P(x > x* | x ∈ ω_1) = 0.7 and P(x > x* | x ∈ ω_2) = 0.5. Repeat for P(x > x* | x ∈ ω_1) = 0.85 and P(x > x* | x ∈ ω_2) = 0.15.

There are a couple of ways in which you can proceed; this document will detail one of them. Let y be a standard normal random variable, y ~ N(0, 1). For ỹ ≥ 0,

P(0 < y < ỹ) = (1/√(2π)) ∫_0^ỹ e^(−t²/2) dt
            = (1/√π) ∫_0^(ỹ/√2) e^(−u²) du        (substituting u = t/√2, so du = dt/√2)
            = (1/2) erf(ỹ/√2),

so the cumulative distribution of a standard normal is P(y < ỹ) = 1/2 + (1/2) erf(ỹ/√2).
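The identity just derived can be verified numerically. The following Matlab fragment (an illustration added here, not part of the original solution) compares the direct integral of the standard normal density against the erf expression at an arbitrary test point:

% Quick numerical check of P(0 < y < ytilde) = (1/2) erf(ytilde/sqrt(2)) for y ~ N(0,1).
ytilde = 1.3;                                              % arbitrary test point (hypothetical)
lhs = integral(@(t) exp(-t.^2/2)/sqrt(2*pi), 0, ytilde);   % direct integral of the N(0,1) density
rhs = 0.5*erf(ytilde/sqrt(2));
fprintf('lhs = %.6f, rhs = %.6f\n', lhs, rhs);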

Then, given that (x − μ_i)/σ_i ~ N(0, 1) under ω_i for i = 1, 2, the relation above gives

P(x > x* | x ∈ ω_i) = P( (x − μ_i)/σ_i > (x* − μ_i)/σ_i | x ∈ ω_i )
                    = 1 − P( (x − μ_i)/σ_i < (x* − μ_i)/σ_i | x ∈ ω_i )
                    = 1/2 − (1/2) erf( (x* − μ_i)/(√2 σ_i) )
                    = 1/2 + (1/2) erf( (μ_i − x*)/(√2 σ_i) ).

Therefore:

(x* − μ_i)/(√2 σ_i) = erf⁻¹[ 1 − 2 P(x > x* | x ∈ ω_i) ]   if P(x > x* | x ∈ ω_i) < 0.5   (x* lies above μ_i),
(μ_i − x*)/(√2 σ_i) = erf⁻¹[ 2 P(x > x* | x ∈ ω_i) − 1 ]   if P(x > x* | x ∈ ω_i) > 0.5   (x* lies below μ_i).

To find the values of erf⁻¹, use a table of erf, a table of the cumulative normal distribution, or Matlab's erf and erfinv functions.

For the first pair of rates, the false alarm rate P(x > x* | x ∈ ω_1) = 0.7 puts x* below μ_1, and the hit rate P(x > x* | x ∈ ω_2) = 0.5 puts x* exactly at μ_2. Since the false alarm rate exceeds the hit rate, μ_1 > μ_2 here, and d' measures the separation |μ_1 − μ_2|/σ (taking σ_1 = σ_2 = σ):

d'_1 = (μ_1 − x*)/σ + (x* − μ_2)/σ = √2 { erf⁻¹[2(0.7) − 1] + erf⁻¹[1 − 2(0.5)] } = √2 { erf⁻¹(0.4) + erf⁻¹(0) } ≈ 0.52

For the second pair of rates, P(x > x* | x ∈ ω_1) = 0.85 and P(x > x* | x ∈ ω_2) = 0.15, so again μ_1 > μ_2 and

d'_2 = (μ_1 − x*)/σ + (x* − μ_2)/σ = √2 { erf⁻¹[2(0.85) − 1] + erf⁻¹[1 − 2(0.15)] } = 2√2 erf⁻¹(0.7) ≈ 2.07
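These values can also be checked numerically in Matlab. The fragment below (an illustration, not part of the original solution) converts a hit rate and a false alarm rate into a signed d' with erfinv; the helper name d_prime is introduced here only for this example.

% Sketch: discriminability d' from hit and false alarm rates.
% hit = P(x > x* | x in omega_2), fa = P(x > x* | x in omega_1).
% Uses (mu_i - x*)/(sqrt(2)*sigma) = erfinv(2*P_i - 1), valid for any P_i in (0,1).
d_prime = @(hit, fa) sqrt(2) * (erfinv(2*hit - 1) - erfinv(2*fa - 1));  % signed (mu_2 - mu_1)/sigma

abs(d_prime(0.5,  0.7))    % first pair of rates  -> approx 0.52
abs(d_prime(0.15, 0.85))   % second pair of rates -> approx 2.07

The sign of d_prime indicates which mean is larger; both pairs above give negative signed values, i.e. μ_1 > μ_2, which is why the magnitudes are reported.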

c. Given that the Gaussian assumption is valid, calculate the Bayes error for both of the cases in (b).

According to Problem 3,

Pr[error] = P(ω_1) ε_1 + P(ω_2) ε_2,   where ε_1 = ∫_{χ_2} p(x | ω_1) dx and ε_2 = ∫_{χ_1} p(x | ω_2) dx,

and the regions χ_1 = {x > x*} and χ_2 = {x < x*} are defined by our decision boundary x*. We can see that

ε_1 = ∫_{χ_2} p(x | ω_1) dx = P(x < x* | x ∈ ω_1) = 1 − P(x > x* | x ∈ ω_1),

and similarly

ε_2 = ∫_{χ_1} p(x | ω_2) dx = P(x > x* | x ∈ ω_2).

Therefore,

Pr[error_1] = (1 − 0.7) P(ω_1) + 0.5 P(ω_2)
Pr[error_2] = (1 − 0.85) P(ω_1) + 0.15 P(ω_2)

and, if the priors are equally likely,

Pr[error_1] = 0.4,   Pr[error_2] = 0.15.
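The equal-prior arithmetic can be confirmed with a two-line Matlab computation (illustrative only):

% Error under equal priors: 0.5*(1 - false alarm rate) + 0.5*(hit rate),
% with omega_1 decided for x > x* as above.
err1 = 0.5*(1 - 0.70) + 0.5*0.50   % = 0.40
err2 = 0.5*(1 - 0.85) + 0.5*0.15   % = 0.15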

d. Using a trivial one-line computation, or a graph, determine which case has the higher d', and explain your logic:

Case A: P(x > x* | x ∈ ω_1) = 0.7 and P(x > x* | x ∈ ω_2) = 0.4.
Case B: P(x > x* | x ∈ ω_1) = 0.3 and P(x > x* | x ∈ ω_2) = 0.8.

From part (b) we can calculate the discriminability given the hit and false alarm rates, and we see that Case B has the higher discriminability. In Case A the false alarm rate exceeds the hit rate, so μ_1 > μ_2; in Case B the hit rate exceeds the false alarm rate, so μ_2 > μ_1. In each case d' is the distance between the two means in units of σ:

d'_A = (μ_1 − x*)/σ + (x* − μ_2)/σ = √2 { erf⁻¹[2(0.7) − 1] + erf⁻¹[1 − 2(0.4)] } = √2 { erf⁻¹(0.4) + erf⁻¹(0.2) } ≈ 0.778

d'_B = (x* − μ_1)/σ + (μ_2 − x*)/σ = √2 { erf⁻¹[1 − 2(0.3)] + erf⁻¹[2(0.8) − 1] } = √2 { erf⁻¹(0.4) + erf⁻¹(0.6) } ≈ 1.366

Key point of this problem: the error function erf(·) is related to the cumulative normal distribution. Assuming that the class-conditional densities of x are Gaussian, knowing the hit and false alarm rates at an arbitrary threshold x* is enough to compute the discriminability d'. Moreover, if the Gaussian assumption holds, a determination of the discriminability allows us to calculate the Bayes error rate.
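As a quick check of part (d), the d_prime helper sketched after part (b) can be applied to the two cases (again an illustrative computation, not part of the original solution):

% Compare discriminability for the two cases of part (d) using the helper above.
abs(d_prime(0.4, 0.7))   % Case A: hit 0.4, false alarm 0.7 -> approx 0.78
abs(d_prime(0.8, 0.3))   % Case B: hit 0.8, false alarm 0.3 -> approx 1.37

Case B is clearly larger, confirming the comparison above.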

Problem 3: [16 Points]

a. Show that the maximum likelihood (ML) estimate of the mean of a Gaussian is unbiased, but that the ML estimate of the variance is biased (i.e., slightly wrong). Show how to correct this variance estimate so that it is unbiased.

b. For this part you'll write a Matlab program to explore the biased and unbiased ML estimates of the variance of a Gaussian distribution. Find the data for this problem on the class webpage as ps.dat. This file contains n = 5000 samples from a 1-dimensional Gaussian distribution.

(a) Write a program to calculate the ML estimate of the mean, and report the output.

(b) Write a program to calculate both the biased and unbiased ML estimates of the variance of this distribution. For n = 1 to 5000, plot the biased and unbiased estimates of the variance of this Gaussian. This is as if you are being given the samples sequentially, and each time you get a new sample you are asked to re-evaluate your estimate of the variance. Give some interpretation of your plot. (A sketch of one possible approach follows this problem.)
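For part (b) of Problem 3, a minimal Matlab sketch of one way to organize the computation is shown below. This is an illustration rather than the official solution; it assumes ps.dat is a plain ASCII file that load can read, and the running-estimate loop is simply one possible approach.

% Sketch: biased vs. unbiased variance estimates on growing prefixes of the data.
x = load('ps.dat');            % samples of a 1-D Gaussian (file name from the problem statement)
n = numel(x);

mu_hat = mean(x);              % ML estimate of the mean over all samples
fprintf('ML estimate of the mean: %.4f\n', mu_hat);

var_biased   = zeros(n, 1);    % (1/k)     * sum (x_i - mean)^2, the ML estimate
var_unbiased = zeros(n, 1);    % (1/(k-1)) * sum (x_i - mean)^2, the corrected estimate
for k = 1:n
    xk = x(1:k);
    m  = mean(xk);
    s  = sum((xk - m).^2);
    var_biased(k)   = s / k;
    var_unbiased(k) = s / max(k - 1, 1);   % guard k = 1 to avoid dividing by zero
end

plot(1:n, var_biased, 1:n, var_unbiased);
xlabel('number of samples k'); ylabel('variance estimate');
legend('biased (1/k)', 'unbiased (1/(k-1))');

Matlab's built-ins var(xk) (normalizing by k − 1) and var(xk, 1) (normalizing by k) compute the same two quantities directly.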

Problem 4: [16 Points]

Suppose x and y are random variables. Their joint density, depicted in Figure 1 below, is constant in the shaded area and 0 elsewhere.

a. Let ω_1 be the case when x ≤ 0, and ω_2 the case when x > 0. Determine the a priori probabilities of the two classes, P(ω_1) and P(ω_2). y is the observation from which we infer whether ω_1 or ω_2 has occurred. Find the likelihood functions, namely the two conditional distributions p(y | ω_1) and p(y | ω_2).

b. Find the decision rule that minimizes the probability of error, and calculate what that probability of error is. Please note that there will be ambiguities at the decision boundaries, but how you classify when y falls on a decision boundary doesn't affect the probability of error.

Figure 1: The joint distribution of x and y.