Bayesian Decision Theory

1 Bayesian Decision Theory
Selim Aksoy, Department of Computer Engineering, Bilkent University. CS 551, Fall 2017.

2 Bayesian Decision Theory
Bayesian decision theory is a fundamental statistical approach that quantifies the tradeoffs between various decisions using probabilities and the costs that accompany such decisions. First, we will assume that all probabilities are known. Then, we will study the cases where the probabilistic structure is not completely known.

3 Fish Sorting Example Revisited
The state of nature is a random variable. Define $w$ as the type of fish we observe (state of nature, class), where $w = w_1$ for sea bass and $w = w_2$ for salmon. $P(w_1)$ is the a priori probability that the next fish is a sea bass, and $P(w_2)$ is the a priori probability that the next fish is a salmon.

4 Prior Probabilities
Prior probabilities reflect our knowledge of how likely each type of fish will appear before we actually see it. How can we choose $P(w_1)$ and $P(w_2)$? Set $P(w_1) = P(w_2)$ if they are equiprobable (uniform priors), or use different values depending on the fishing area, the time of the year, etc. Assume there are no other types of fish (exclusivity and exhaustivity), so that $P(w_1) + P(w_2) = 1$.

5 Making a Decision
How can we make a decision with only the prior information? Decide $w_1$ if $P(w_1) > P(w_2)$, otherwise decide $w_2$. What is the probability of error for this decision?
$$P(\text{error}) = \min\{P(w_1), P(w_2)\}$$

6 Class-Conditional Probabilities
Let's try to improve the decision using the lightness measurement $x$. Let $x$ be a continuous random variable. Define $p(x \mid w_j)$ as the class-conditional probability density (the probability density of $x$ given that the state of nature is $w_j$) for $j = 1, 2$. $p(x \mid w_1)$ and $p(x \mid w_2)$ describe the difference in lightness between populations of sea bass and salmon.

7 Class-Conditional Probabilities
Figure 1: Hypothetical class-conditional probability density functions for two classes.

8 Posterior Probabilities
Suppose we know $P(w_j)$ and $p(x \mid w_j)$ for $j = 1, 2$, and we measure the lightness of a fish as the value $x$. Define $P(w_j \mid x)$ as the a posteriori probability (the probability of the state of nature being $w_j$ given the measurement of feature value $x$). We can use the Bayes formula to convert the prior probability to the posterior probability
$$P(w_j \mid x) = \frac{p(x \mid w_j)\,P(w_j)}{p(x)}, \qquad \text{where } p(x) = \sum_{j=1}^{2} p(x \mid w_j)\,P(w_j).$$
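To make the formula concrete, here is a small sketch (not part of the original slides) that computes the posteriors for the two-class fish example; the priors and the Gaussian class-conditional densities for the lightness feature are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Assumed priors and class-conditional densities p(x | w_j) for the lightness
# feature; the means and standard deviations are illustrative only.
priors = np.array([2/3, 1/3])                    # P(w_1), P(w_2)
likelihoods = [norm(loc=4.0, scale=1.0),         # p(x | w_1): sea bass
               norm(loc=7.0, scale=1.2)]         # p(x | w_2): salmon

def posteriors(x):
    """Return [P(w_1 | x), P(w_2 | x)] via the Bayes formula."""
    px_given_w = np.array([l.pdf(x) for l in likelihoods])
    joint = px_given_w * priors                  # p(x | w_j) P(w_j)
    evidence = joint.sum()                       # p(x) = sum_j p(x | w_j) P(w_j)
    return joint / evidence

x = 5.5
post = posteriors(x)
print(post, post.sum())                          # posteriors sum to 1 at every x
print("decide", "w_1" if post[0] > post[1] else "w_2")
```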

9 Making a Decision
$p(x \mid w_j)$ is called the likelihood and $p(x)$ is called the evidence. How can we make a decision after observing the value of $x$? Decide $w_1$ if $P(w_1 \mid x) > P(w_2 \mid x)$, otherwise decide $w_2$. Rewriting the rule gives: decide $w_1$ if
$$\frac{p(x \mid w_1)}{p(x \mid w_2)} > \frac{P(w_2)}{P(w_1)},$$
otherwise decide $w_2$. Note that, at every $x$, $P(w_1 \mid x) + P(w_2 \mid x) = 1$.

10 Probability of Error
What is the probability of error for this decision?
$$P(\text{error} \mid x) = \begin{cases} P(w_1 \mid x) & \text{if we decide } w_2 \\ P(w_2 \mid x) & \text{if we decide } w_1 \end{cases}$$
What is the average probability of error?
$$P(\text{error}) = \int p(\text{error}, x)\, dx = \int P(\text{error} \mid x)\,p(x)\, dx$$
The Bayes decision rule minimizes this error because $P(\text{error} \mid x) = \min\{P(w_1 \mid x), P(w_2 \mid x)\}$.

11 Bayesian Decision Theory
How can we generalize to more than one feature? Replace the scalar $x$ by the feature vector $x$. To more than two states of nature? It is just a difference in notation. To actions other than just decisions? Allow the possibility of rejection. To different risks in the decision? Define how costly each action is.

12 Bayesian Decision Theory
Let $\{w_1, \ldots, w_c\}$ be the finite set of $c$ states of nature (classes, categories). Let $\{\alpha_1, \ldots, \alpha_a\}$ be the finite set of $a$ possible actions. Let $\lambda(\alpha_i \mid w_j)$ be the loss incurred for taking action $\alpha_i$ when the state of nature is $w_j$. Let $x$ be the $d$-component vector-valued random variable called the feature vector.

13 Bayesian Decision Theory
$p(x \mid w_j)$ is the class-conditional probability density function. $P(w_j)$ is the prior probability that nature is in state $w_j$. The posterior probability can be computed as
$$P(w_j \mid x) = \frac{p(x \mid w_j)\,P(w_j)}{p(x)}, \qquad \text{where } p(x) = \sum_{j=1}^{c} p(x \mid w_j)\,P(w_j).$$

14 Conditional Risk
Suppose we observe $x$ and take action $\alpha_i$. If the true state of nature is $w_j$, we incur the loss $\lambda(\alpha_i \mid w_j)$. The expected loss of taking action $\alpha_i$ is
$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid w_j)\,P(w_j \mid x),$$
which is also called the conditional risk.
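A minimal sketch of evaluating the conditional risk, assuming an illustrative loss matrix and assumed posteriors for some observed x:

```python
import numpy as np

# lam[i, j] = lambda(alpha_i | w_j), the loss for taking action alpha_i when
# the true state is w_j (an assumed, illustrative matrix for c = a = 3).
lam = np.array([[0.0, 2.0, 3.0],
                [1.0, 0.0, 1.5],
                [2.0, 1.0, 0.0]])

posterior = np.array([0.5, 0.3, 0.2])    # assumed P(w_j | x) for the observed x

# R(alpha_i | x) = sum_j lam[i, j] * P(w_j | x)
conditional_risk = lam @ posterior
best_action = int(np.argmin(conditional_risk))   # Bayes choice: minimize conditional risk
print(conditional_risk, "-> take action alpha_%d" % (best_action + 1))
```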

15 Minimum-Risk Classification
The general decision rule $\alpha(x)$ tells us which action to take for observation $x$. We want to find the decision rule that minimizes the overall risk
$$R = \int R(\alpha(x) \mid x)\,p(x)\, dx.$$
The Bayes decision rule minimizes the overall risk by selecting the action $\alpha_i$ for which $R(\alpha_i \mid x)$ is minimum. The resulting minimum overall risk is called the Bayes risk and is the best performance that can be achieved.

16 Two-Category Classification
Define $\alpha_1$: deciding $w_1$, $\alpha_2$: deciding $w_2$, and $\lambda_{ij} = \lambda(\alpha_i \mid w_j)$. The conditional risks can be written as
$$R(\alpha_1 \mid x) = \lambda_{11} P(w_1 \mid x) + \lambda_{12} P(w_2 \mid x),$$
$$R(\alpha_2 \mid x) = \lambda_{21} P(w_1 \mid x) + \lambda_{22} P(w_2 \mid x).$$

17 Two-Category Classification
The minimum-risk decision rule becomes: decide $w_1$ if
$$(\lambda_{21} - \lambda_{11})\,P(w_1 \mid x) > (\lambda_{12} - \lambda_{22})\,P(w_2 \mid x),$$
otherwise decide $w_2$. This corresponds to deciding $w_1$ if
$$\frac{p(x \mid w_1)}{p(x \mid w_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}}\,\frac{P(w_2)}{P(w_1)},$$
i.e., comparing the likelihood ratio to a threshold that is independent of the observation $x$.
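A sketch of this likelihood-ratio rule with an assumed 2x2 loss matrix, assumed priors, and hypothetical Gaussian class-conditional densities:

```python
import numpy as np
from scipy.stats import norm

# Illustrative losses lam[i, j] = lambda(alpha_i | w_j) and priors.
lam = np.array([[0.0, 5.0],    # deciding w_1: no loss if true w_1, loss 5 if true w_2
                [1.0, 0.0]])   # deciding w_2: loss 1 if true w_1
priors = np.array([0.6, 0.4])
p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)   # assumed p(x | w_1), p(x | w_2)

# Threshold on the likelihood ratio, independent of x.
threshold = ((lam[0, 1] - lam[1, 1]) / (lam[1, 0] - lam[0, 0])) * (priors[1] / priors[0])

def decide(x):
    ratio = p1.pdf(x) / p2.pdf(x)
    return "w_1" if ratio > threshold else "w_2"

for x in (-1.0, 0.5, 1.5, 3.0):
    print(x, decide(x))
```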

18 Minimum-Error-Rate Classification
Actions are decisions on classes ($\alpha_i$ is deciding $w_i$). If action $\alpha_i$ is taken and the true state of nature is $w_j$, then the decision is correct if $i = j$ and in error if $i \neq j$. We want to find a decision rule that minimizes the probability of error.

19 Minimum-Error-Rate Classification
Define the zero-one loss function
$$\lambda(\alpha_i \mid w_j) = \begin{cases} 0 & \text{if } i = j \\ 1 & \text{if } i \neq j \end{cases} \qquad i, j = 1, \ldots, c$$
(all errors are equally costly). The conditional risk becomes
$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid w_j)\,P(w_j \mid x) = \sum_{j \neq i} P(w_j \mid x) = 1 - P(w_i \mid x).$$

20 Minimum-Error-Rate Classification
Minimizing the risk requires maximizing $P(w_i \mid x)$ and results in the minimum-error decision rule: decide $w_i$ if $P(w_i \mid x) > P(w_j \mid x)$ for all $j \neq i$. The resulting error is called the Bayes error and is the best performance that can be achieved.

21 Minimum-Error-Rate Classification
Figure 2: The likelihood ratio $p(x \mid w_1)/p(x \mid w_2)$. The threshold $\theta_a$ is computed using the priors $P(w_1) = 2/3$ and $P(w_2) = 1/3$ and a zero-one loss function. If we penalize mistakes in classifying $w_2$ patterns as $w_1$ more than the converse, we should increase the threshold to $\theta_b$.

22 Discriminant Functions
A useful way of representing classifiers is through discriminant functions $g_i(x)$, $i = 1, \ldots, c$, where the classifier assigns a feature vector $x$ to class $w_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$. For the classifier that minimizes the conditional risk, $g_i(x) = -R(\alpha_i \mid x)$. For the classifier that minimizes the error, $g_i(x) = P(w_i \mid x)$.

23 Discriminant Functions
These functions divide the feature space into $c$ decision regions ($R_1, \ldots, R_c$), separated by decision boundaries. Note that the results do not change even if we replace every $g_i(x)$ by $f(g_i(x))$, where $f(\cdot)$ is a monotonically increasing function (e.g., the logarithm). This may lead to significant analytical and computational simplifications.

24 The Gaussian Density
The Gaussian can be considered as a model where the feature vectors for a given class are continuous-valued, randomly corrupted versions of a single typical or prototype vector. Some properties of the Gaussian: it is analytically tractable; it is completely specified by the first and second moments; it has the maximum entropy of all distributions with a given mean and variance; many processes are asymptotically Gaussian (central limit theorem); linear transformations of a Gaussian are also Gaussian; for jointly Gaussian variables, uncorrelatedness implies independence.

25 Univariate Gaussian
For $x \in \mathbb{R}$:
$$p(x) = N(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$$
where
$$\mu = E[x] = \int x\,p(x)\, dx, \qquad \sigma^2 = E[(x - \mu)^2] = \int (x - \mu)^2\,p(x)\, dx.$$

26 Univariate Gaussian
Figure 3: A univariate Gaussian distribution has roughly 95% of its area in the range $|x - \mu| \leq 2\sigma$.

27 Multivariate Gaussian
For $x \in \mathbb{R}^d$:
$$p(x) = N(\mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right]$$
where
$$\mu = E[x] = \int x\,p(x)\, dx, \qquad \Sigma = E[(x - \mu)(x - \mu)^T] = \int (x - \mu)(x - \mu)^T\,p(x)\, dx.$$
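A direct sketch of evaluating this density; in practice one would typically rely on scipy.stats.multivariate_normal, which is used here only as a cross-check (all numbers are illustrative):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Evaluate the d-dimensional Gaussian N(mu, sigma) at x."""
    d = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
    maha2 = diff @ np.linalg.solve(sigma, diff)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha2) / norm_const

mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 2.5])
print(mvn_pdf(x, mu, sigma))

# Cross-check against scipy:
from scipy.stats import multivariate_normal
print(multivariate_normal(mean=mu, cov=sigma).pdf(x))
```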

28 Multivariate Gaussian
Figure 4: Samples drawn from a two-dimensional Gaussian lie in a cloud centered on the mean $\mu$. The loci of points of constant density are the ellipses for which $(x - \mu)^T \Sigma^{-1} (x - \mu)$ is constant, where the eigenvectors of $\Sigma$ determine the directions and the corresponding eigenvalues determine the lengths of the principal axes. The quantity $r^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$ is called the squared Mahalanobis distance from $x$ to $\mu$.

29 Linear Transformations
Recall that, given $x \in \mathbb{R}^d$ and $A \in \mathbb{R}^{d \times k}$, $y = A^T x \in \mathbb{R}^k$, and if $x \sim N(\mu, \Sigma)$, then $y \sim N(A^T \mu, A^T \Sigma A)$. As a special case, the whitening transform
$$A_w = \Phi \Lambda^{-1/2},$$
where $\Phi$ is the matrix whose columns are the orthonormal eigenvectors of $\Sigma$ and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues, gives a transformed covariance matrix equal to the identity matrix $I$.
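A sketch of the whitening transform on simulated data, assuming an illustrative covariance matrix; the sample covariance of the transformed samples should be close to the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])
x = rng.multivariate_normal(mu, sigma, size=50000)   # samples from N(mu, Sigma)

# Whitening transform A_w = Phi Lambda^{-1/2}
eigvals, eigvecs = np.linalg.eigh(sigma)             # Lambda (diagonal), Phi (columns)
A_w = eigvecs @ np.diag(eigvals ** -0.5)

y = x @ A_w                                          # y = A_w^T x, applied row-wise
print(np.cov(y, rowvar=False))                       # approximately the identity I
print(A_w.T @ sigma @ A_w)                           # exactly I up to numerical error
```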

30 Discriminant Functions for the Gaussian Density
Discriminant functions for minimum-error-rate classification can be written as
$$g_i(x) = \ln p(x \mid w_i) + \ln P(w_i).$$
For $p(x \mid w_i) = N(\mu_i, \Sigma_i)$,
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1}(x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(w_i).$$
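A sketch of minimum-error-rate classification with this log-discriminant, using hypothetical class means, covariances, and priors:

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - d/2 ln(2 pi) - 1/2 ln|Sigma| + ln P(w_i)."""
    d = mu.shape[0]
    diff = x - mu
    maha2 = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * maha2 - 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)

# Assumed parameters (mu_i, Sigma_i, P(w_i)) for two classes, illustrative only.
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.7),
    (np.array([2.0, 2.0]), np.array([[1.5, 0.3], [0.3, 0.8]]), 0.3),
]

x = np.array([1.2, 0.9])
scores = [gaussian_discriminant(x, mu, sigma, p) for mu, sigma, p in params]
print(scores, "-> decide w_%d" % (int(np.argmax(scores)) + 1))
```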

31 Case 1: $\Sigma_i = \sigma^2 I$
Discriminant functions are
$$g_i(x) = w_i^T x + w_{i0} \quad \text{(linear discriminant)}$$
where
$$w_i = \frac{1}{\sigma^2}\mu_i, \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^T \mu_i + \ln P(w_i)$$
($w_{i0}$ is the threshold or bias for the $i$'th category).

32 Case 1: $\Sigma_i = \sigma^2 I$
Decision boundaries are the hyperplanes $g_i(x) = g_j(x)$, and can be written as $w^T(x - x_0) = 0$ where
$$w = \mu_i - \mu_j, \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\,\ln\frac{P(w_i)}{P(w_j)}\,(\mu_i - \mu_j).$$
The hyperplane separating $R_i$ and $R_j$ passes through the point $x_0$ and is orthogonal to the vector $w$.

33 Case 1: $\Sigma_i = \sigma^2 I$
Figure 5: If the covariance matrices of two distributions are equal and proportional to the identity matrix, then the distributions are spherical in $d$ dimensions, and the boundary is a generalized hyperplane of $d - 1$ dimensions, perpendicular to the line separating the means. The decision boundary shifts as the priors are changed.

34 Case 1: $\Sigma_i = \sigma^2 I$
A special case, when the priors $P(w_i)$ are the same for $i = 1, \ldots, c$, is the minimum-distance classifier that uses the decision rule: assign $x$ to $w_{i^*}$ where $i^* = \arg\min_{i=1,\ldots,c} \|x - \mu_i\|$.
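A minimal sketch of the minimum-distance (nearest-mean) classifier with hypothetical class means:

```python
import numpy as np

means = np.array([[0.0, 0.0],     # mu_1
                  [3.0, 0.0],     # mu_2
                  [0.0, 3.0]])    # mu_3 (illustrative class means)

def nearest_mean(x):
    """Assign x to the class with the closest mean (equal priors, Sigma_i = sigma^2 I)."""
    distances = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(distances)) + 1     # 1-based class index

print(nearest_mean(np.array([2.5, 0.4])))    # -> 2
print(nearest_mean(np.array([0.2, 1.0])))    # -> 1
```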

35 Case 2: $\Sigma_i = \Sigma$
Discriminant functions are
$$g_i(x) = w_i^T x + w_{i0} \quad \text{(linear discriminant)}$$
where
$$w_i = \Sigma^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^T \Sigma^{-1}\mu_i + \ln P(w_i).$$

36 Case 2: $\Sigma_i = \Sigma$
Decision boundaries can be written as $w^T(x - x_0) = 0$ where
$$w = \Sigma^{-1}(\mu_i - \mu_j), \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left(P(w_i)/P(w_j)\right)}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j).$$
The hyperplane passes through $x_0$ but is not necessarily orthogonal to the line between the means.
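A sketch that builds $w$ and $x_0$ for two hypothetical classes sharing a covariance matrix and verifies numerically that $x_0$ lies on the decision boundary (equal discriminant values):

```python
import numpy as np

# Illustrative parameters for two classes with a shared covariance matrix.
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
P1, P2 = 0.7, 0.3

sigma_inv = np.linalg.inv(sigma)

def g(x, mu, prior):
    """Linear discriminant for the shared-covariance case."""
    w_i = sigma_inv @ mu
    w_i0 = -0.5 * mu @ sigma_inv @ mu + np.log(prior)
    return w_i @ x + w_i0

# Boundary parameters: w^T (x - x_0) = 0
diff = mu1 - mu2
w = sigma_inv @ diff
x0 = 0.5 * (mu1 + mu2) - (np.log(P1 / P2) / (diff @ sigma_inv @ diff)) * diff

print(np.allclose(g(x0, mu1, P1), g(x0, mu2, P2)))   # True: x_0 lies on the boundary
print("w =", w, " x_0 =", x0)
```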

37 Case 2: $\Sigma_i = \Sigma$
Figure 6: Probability densities for equal but asymmetric Gaussian distributions. The decision hyperplanes are not necessarily perpendicular to the line connecting the means.

38 Case 3: $\Sigma_i$ arbitrary
Discriminant functions are
$$g_i(x) = x^T W_i x + w_i^T x + w_{i0} \quad \text{(quadratic discriminant)}$$
where
$$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(w_i).$$
Decision boundaries are hyperquadrics.
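A sketch that forms the quadratic coefficients $W_i$, $w_i$, and $w_{i0}$ explicitly for hypothetical class parameters and uses them to classify a point:

```python
import numpy as np

def quadratic_coeffs(mu, sigma, prior):
    """Return (W_i, w_i, w_i0) for the arbitrary-covariance case."""
    sigma_inv = np.linalg.inv(sigma)
    W = -0.5 * sigma_inv
    w = sigma_inv @ mu
    _, logdet = np.linalg.slogdet(sigma)
    w0 = -0.5 * mu @ sigma_inv @ mu - 0.5 * logdet + np.log(prior)
    return W, w, w0

def g(x, coeffs):
    W, w, w0 = coeffs
    return x @ W @ x + w @ x + w0

# Two classes with different (illustrative) covariance matrices.
c1 = quadratic_coeffs(np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 0.5]]), 0.5)
c2 = quadratic_coeffs(np.array([2.0, 1.0]), np.array([[2.0, 0.8], [0.8, 2.0]]), 0.5)

x = np.array([1.0, 0.2])
print("decide w_1" if g(x, c1) > g(x, c2) else "decide w_2")
```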

39 Case 3: $\Sigma_i$ arbitrary
Figure 7: Arbitrary Gaussian distributions lead to Bayes decision boundaries that are general hyperquadrics.

40 Case 3: $\Sigma_i$ arbitrary
Figure 8: Arbitrary Gaussian distributions lead to Bayes decision boundaries that are general hyperquadrics.

41 Error Probabilities and Integrals
For the two-category case,
$$P(\text{error}) = P(x \in R_2, w_1) + P(x \in R_1, w_2) = P(x \in R_2 \mid w_1)P(w_1) + P(x \in R_1 \mid w_2)P(w_2) = \int_{R_2} p(x \mid w_1)\,P(w_1)\, dx + \int_{R_1} p(x \mid w_2)\,P(w_2)\, dx.$$

42 Error Probabilities and Integrals
For the multicategory case,
$$P(\text{error}) = 1 - P(\text{correct}) = 1 - \sum_{i=1}^{c} P(x \in R_i, w_i) = 1 - \sum_{i=1}^{c} P(x \in R_i \mid w_i)P(w_i) = 1 - \sum_{i=1}^{c} \int_{R_i} p(x \mid w_i)\,P(w_i)\, dx.$$
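These integrals are rarely available in closed form; a common alternative, sketched below with assumed one-dimensional densities, is a Monte Carlo estimate of the Bayes error:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
priors = np.array([0.5, 0.5])
densities = [norm(0.0, 1.0), norm(2.0, 1.0)]      # assumed p(x | w_i), illustrative

n = 200_000
# Draw (class, x) pairs from the joint distribution P(w_i) p(x | w_i).
labels = rng.choice(2, size=n, p=priors)
means = np.array([0.0, 2.0])
x = rng.normal(loc=means[labels], scale=1.0)      # sample x given the drawn class

# Bayes (minimum-error-rate) decisions: pick the class maximizing p(x | w_i) P(w_i).
scores = np.stack([d.pdf(x) * p for d, p in zip(densities, priors)])
decisions = np.argmax(scores, axis=0)

# For these densities the true Bayes error is Phi(-1), roughly 0.159.
print("estimated P(error) =", np.mean(decisions != labels))
```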

43 Error Probabilities and Integrals
Figure 9: Components of the probability of error for equal priors and a non-optimal decision point $x^*$. The optimal point $x_B$ minimizes the total shaded area and gives the Bayes error rate.

44 Receiver Operating Characteristics
Consider the two-category case and define $w_1$: target is present, $w_2$: target is not present.

Table 1: Confusion matrix.
            | Assigned w_1       | Assigned w_2
True w_1    | correct detection  | mis-detection
True w_2    | false alarm        | correct rejection

Mis-detection is also called a false negative or Type II error. A false alarm is also called a false positive or Type I error.

45 Receiver Operating Characteristics
If we use a parameter (e.g., a threshold) in our decision, the plot of these rates for different values of the parameter is called the receiver operating characteristic (ROC) curve.
Figure 10: Example receiver operating characteristic (ROC) curves for different settings of the system.
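A sketch of tracing ROC points by sweeping a decision threshold over simulated target and non-target scores; the score distributions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed (illustrative) score distributions: x | target ~ N(1.5, 1), x | no target ~ N(0, 1).
x_target = rng.normal(1.5, 1.0, size=5000)     # samples where the target is present (w_1)
x_clutter = rng.normal(0.0, 1.0, size=5000)    # samples where it is not (w_2)

def roc_points(thresholds):
    """For each threshold t, decide 'target present' when x > t.
    Returns (false alarm rate, correct detection rate) pairs."""
    pts = []
    for t in thresholds:
        far = np.mean(x_clutter > t)     # false alarm (false positive) rate
        det = np.mean(x_target > t)      # correct detection (true positive) rate
        pts.append((far, det))
    return pts

for far, det in roc_points(np.linspace(-2.0, 4.0, 7)):
    print(f"FAR = {far:.3f}  detection rate = {det:.3f}")
```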

46 Summary
To minimize the overall risk, choose the action that minimizes the conditional risk $R(\alpha \mid x)$. To minimize the probability of error, choose the class that maximizes the posterior probability $P(w_j \mid x)$. If there are different penalties for misclassifying patterns from different classes, the posteriors must be weighted according to such penalties before taking action. Do not forget that these decisions are the optimal ones under the assumption that the true values of the probabilities are known.
