Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

[Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain a reasonable level of consistency and simplicity, we ask that you do not use other software tools.]

If you collaborated with other members of the class, please write their names at the end of the assignment. Moreover, you will need to write and sign the following statement: "In preparing my solutions, I did not look at any old homeworks, copy anybody's answers, or let them copy mine."

Problem 1: [10 Points]

In many pattern classification problems we have the option either to assign a pattern to one of c classes, or to reject it as unrecognizable, if the cost of rejection is not too high. Let the cost of classification be defined as

    λ(ω_i | ω_j) = 0      if ω_i = ω_j   (i.e., correct classification)
                 = λ_r    if ω_i = ω_0   (i.e., rejection)
                 = λ_s    otherwise      (i.e., substitution error)

Show that for minimum-risk classification, the decision rule should associate a test vector x with class ω_i if P(ω_i | x) ≥ P(ω_j | x) for all j and P(ω_i | x) ≥ 1 − λ_r/λ_s, and reject otherwise.
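For intuition, here is a minimal Matlab sketch of the classify-or-reject rule that Problem 1 asks you to derive. The posterior vector and the costs are made-up illustrative values, not part of the assignment.

```matlab
% Sketch: minimum-risk classification with a reject option.
% Assumed example values: posteriors over c = 3 classes and reject/substitution costs.
posteriors = [0.55 0.30 0.15];   % P(omega_i | x); must sum to 1
lambda_r   = 0.2;                % cost of a rejection
lambda_s   = 1.0;                % cost of a substitution error

[pmax, ihat] = max(posteriors);  % candidate class: largest posterior
if pmax >= 1 - lambda_r/lambda_s
    fprintf('Assign x to class %d (posterior %.2f)\n', ihat, pmax);
else
    fprintf('Reject x as unrecognizable\n');
end
```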

Problem 2:

In a particular binary hypothesis testing application, the conditional density for a scalar feature x given class ω_1 is

    p(x | ω_1) = k_1 exp(−x^2 / 20)

Given class ω_2, the conditional density is

    p(x | ω_2) = k_2 exp(−(x − 6)^2 / 12)

a. Find k_1 and k_2, and plot the two densities on a single graph using Matlab.

b. Assume that the prior probabilities of the two classes are equal, and that the cost for choosing correctly is zero. If the costs for choosing incorrectly are C_12 = 3 and C_21 = 5 (where C_ij corresponds to predicting class i when the pattern belongs to class j), what is the expression for the conditional risk?

c. Find the decision regions which minimize the Bayes risk, and indicate them on the plot you made in part (a).

d. For the decision regions in part (c), what is the numerical value of the Bayes risk?
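As a starting point for part (a), here is a minimal Matlab sketch for plotting the two class-conditional densities. The constants k1 and k2 are placeholders set to 1; substitute the normalizing constants you derive.

```matlab
% Sketch for Problem 2(a): plot the two class-conditional densities.
% k1 and k2 are placeholders -- replace with the values found in part (a).
k1 = 1;  k2 = 1;
x  = linspace(-15, 20, 1000);
p1 = k1 * exp(-x.^2 / 20);         % p(x | w1)
p2 = k2 * exp(-(x - 6).^2 / 12);   % p(x | w2)

figure; hold on;
plot(x, p1, 'b-', x, p2, 'r--');
xlabel('x'); ylabel('density');
legend('p(x | \omega_1)', 'p(x | \omega_2)');
title('Class-conditional densities');
```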

Problem 3:

Let's consider a simple communication system. The transmitter sends out messages m = 0 or m = 1, occurring with a priori probabilities 3/4 and 1/4, respectively. The message is contaminated by a noise n, which is independent of m and takes on the values −1, 0, 1 with probabilities 1/8, 5/8, and 2/8, respectively. The received signal, or the observation, can be represented as r = m + n. From r, we wish to infer what the transmitted message m was (the estimated state), denoted m̂. m̂ also takes values in {0, 1}. When m = m̂, the detector correctly receives the original message; otherwise an error occurs.

Figure 1: A simple receiver

a. Find the decision rule that achieves the maximum probability of correct decision. Compute the probability of error for this decision rule.

b. Now let the noise n be a continuous random variable, uniformly distributed between −3/4 and 2/4, and still statistically independent of m. First, plot the pdf of n. Then, find a decision rule that achieves the minimum probability of error, and compute the probability of error.

Problem 4:

[Note: Use Matlab for the computations, but make sure to explicitly construct every transformation required, that is, either type it or write it out. Do not use Matlab if you are asked to explain/show something.]

Consider the three-dimensional normal distribution p(x | ω) with mean µ and covariance matrix Σ, where

    µ = (3, 1, 2)^T,    Σ = [ 1  0  0
                              0  5  4
                              0  4  5 ]

Compute the matrices representing the eigenvectors and eigenvalues, Φ and Λ, to answer the following questions.

a. Find the probability density at the point x_0 = (5, 6, 3)^T.

b. Construct an orthonormal transformation y = Φ^T x. Show that for orthonormal transformations, Euclidean distances are preserved (i.e., ‖y‖^2 = ‖x‖^2).

c. After applying the orthonormal transformation, add another transformation Λ^(−1/2) and convert the distribution to one centered on the origin with covariance matrix equal to the identity matrix. Remember that A_w = ΦΛ^(−1/2) is a linear transformation (i.e., A_w(ax + by) = aA_w x + bA_w y).

d. Apply the same overall transformation to x_0 to yield a transformed point x_w.

e. Calculate the Mahalanobis distance from x_0 to the mean µ and from x_w to 0. Are they different or are they the same? Why?
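For the computational parts of Problem 4, here is a hedged Matlab sketch of the eigendecomposition and whitening steps. The variable names (Phi, Lambda, Aw) are illustrative, centering by µ is assumed when forming the transformed point, and the explanations the problem asks for are still up to you.

```matlab
% Sketch for Problem 4: eigendecomposition and whitening of the given Gaussian.
mu    = [3; 1; 2];
Sigma = [1 0 0; 0 5 4; 0 4 5];
x0    = [5; 6; 3];

[Phi, Lambda] = eig(Sigma);        % columns of Phi: eigenvectors; Lambda: eigenvalues
y  = Phi' * x0;                    % orthonormal transformation (part b)
Aw = Phi * Lambda^(-1/2);          % whitening transform A_w (part c)
xw = Aw' * (x0 - mu);              % origin-centered, whitened point (part d)

norm(x0)^2, norm(y)^2                          % Euclidean norms before/after (part b)
d_x0 = sqrt((x0 - mu)' / Sigma * (x0 - mu))    % Mahalanobis distance of x0 from mu (part e)
d_xw = norm(xw)                                % Euclidean distance of xw from the origin
```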

Problem 5: [16 Points]

Use signal detection theory, as well as the notation and basic Gaussian assumptions described in the text, to address the following.

a. Prove that P(x > x* | x ∈ ω_2) and P(x > x* | x ∈ ω_1), taken together, uniquely determine the discriminability d′.

b. Use error functions erf(·) to express d′ in terms of the hit and false alarm rates. Estimate d′ if P(x > x* | x ∈ ω_1) = 0.7 and P(x > x* | x ∈ ω_2) = 0.5. Repeat for d′ if P(x > x* | x ∈ ω_1) = 0.9 and P(x > x* | x ∈ ω_2) = 0.15.

c. Given that the Gaussian assumption is valid, calculate the Bayes error for both of the cases in (b).

d. Using a trivial one-line computation or a graph, determine which case has the higher d′, and explain your logic:
   Case A: P(x > x* | x ∈ ω_1) = 0.75 and P(x > x* | x ∈ ω_2) = 0.35.
   Case B: P(x > x* | x ∈ ω_1) = 0.8 and P(x > x* | x ∈ ω_2) = 0.25.

Problem 6: [16 Points]

a. Show that the maximum likelihood (ML) estimate of the mean of a Gaussian is unbiased but the ML estimate of the variance is biased (i.e., slightly wrong). Show how to correct this variance estimate so that it is unbiased.

b. For this part you'll write a Matlab program to explore the biased and unbiased ML estimates of the variance of a Gaussian distribution. Find the data for this problem on the class webpage as ps2.mat. This file contains n = 5000 samples from a 1-dimensional Gaussian distribution.
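For Problem 6(b), here is a minimal Matlab sketch of the basic comparison; it assumes ps2.mat contains a single vector of samples (the variable name inside the file is not known here, so the first field of the loaded struct is used generically), and which quantities you plot or report is up to you.

```matlab
% Sketch for Problem 6(b): biased vs. unbiased estimates of the variance.
% Assumes ps2.mat holds one vector of samples; its variable name is unknown,
% so the first field of the loaded struct is used.
S  = load('ps2.mat');
fn = fieldnames(S);
x  = S.(fn{1})(:);                            % the n = 5000 samples, as a column
n  = numel(x);

mu_ml        = mean(x);                       % ML estimate of the mean
var_biased   = sum((x - mu_ml).^2) / n;       % ML (biased) variance estimate
var_unbiased = sum((x - mu_ml).^2) / (n - 1); % bias-corrected variance estimate

fprintf('biased: %.4f   unbiased: %.4f\n', var_biased, var_unbiased);
```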

Problem 7: [10 Points]

Suppose x and y are random variables. Their joint density, depicted below, is constant in the shaded area and 0 elsewhere.

Figure 2: The joint distribution of x and y.

a. Let ω_1 be the case when x ≤ 0, and ω_2 be the case when x > 0. Determine the a priori probabilities of the two classes, P(ω_1) and P(ω_2). Let y be the observation from which we infer whether ω_1 or ω_2 occurred. Find the likelihood functions, namely, the two conditional distributions p(y | ω_1) and p(y | ω_2).

b. Find the decision rule that minimizes the probability of error, and calculate what the probability of error is. Please note that there will be ambiguities at the decision boundaries, but how you classify when y falls on a decision boundary doesn't affect the probability of error.
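Because Figure 2 is not reproduced in this text version, the support used in the sketch below is a purely hypothetical placeholder (a uniform density on the rectangle [−1, 2] × [0, 1]); it only illustrates how the class priors could be sanity-checked numerically once the actual shaded region from the figure is substituted.

```matlab
% Sketch for Problem 7: numerically checking the class priors for a
% piecewise-constant joint density. The support below is a HYPOTHETICAL
% placeholder, NOT the shaded area of Figure 2 -- substitute the real region.
inRegion = @(x, y) (x >= -1 & x <= 2 & y >= 0 & y <= 1);   % assumed support

[xg, yg] = meshgrid(linspace(-5, 5, 2001));   % evaluation grid covering the support
mask = inRegion(xg, yg);
P1 = nnz(mask & xg <= 0) / nnz(mask);   % P(w1): fraction of the (constant) mass with x <= 0
P2 = 1 - P1;                            % P(w2)
fprintf('P(w1) = %.3f, P(w2) = %.3f\n', P1, P2);
```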