Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory
Jussi Tohka, jussi.tohka@tut.fi
Institute of Signal Processing, Tampere University of Technology

Bayesian decision theory is a fundamental statistical approach to the problem of pattern classification. It is based on quantifying the tradeoffs between the various classification decisions and their costs. The problem must be posed in probabilistic terms, and all of the associated probabilities must be completely known. The theory is just a formalization of some common-sense procedures; however, it gives a solid foundation on which various pattern classification methods can build.

The fish example: Terminology

Separate between two kinds of fish: sea bass and salmon. No other kinds of fish are possible. ω is the true class, a random variable (RV). Two classes are possible: ω = ω_1 for sea bass and ω = ω_2 for salmon. If the sea contains more sea bass than salmon, it is natural to assume (even when no data or features are available) that a caught fish is a sea bass. This is modeled with the prior probabilities P(ω_1) and P(ω_2), which are positive and sum to one. If there are more sea bass, P(ω_1) > P(ω_2).

The fish example continued

If we must decide at this point (for some curious reason) which fish we have, how would we decide? Because there are more sea bass, we would say that the fish is a sea bass. In other words, our decision rule becomes: Decide ω_1 if P(ω_1) > P(ω_2); otherwise decide ω_2. To develop better rules, we must extract some information, or features, from the data. This means, for instance, making lightness measurements of the fish.
The fish example continued

Suppose we have a lightness reading, say x, from the fish. What to do next? We know every probability relevant to the classification problem. In particular, we know P(ω_1), P(ω_2) and the class conditional probability densities p(x | ω_1) and p(x | ω_2). Based on these, we can compute the probability that the class is ω_1 given that the lightness reading is x, and similarly for salmon. Just use the Bayes formula:

P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x),

where the evidence p(x) = Σ_{j=1}^{2} p(x | ω_j) P(ω_j). The decision rule: Decide ω_1 if P(ω_1 | x) > P(ω_2 | x) and otherwise decide ω_2. Remember that we do not know P(ω_j | x) directly; the posteriors must be computed through the Bayes rule.

The fish example continued

The justification for the rule: P(error | x) = P(ω_1 | x) if we decide ω_2, and P(error | x) = P(ω_2 | x) if we decide ω_1. The average error is

P(error) = ∫ P(error, x) dx = ∫ P(error | x) p(x) dx.

Thus, if P(error | x) is minimal for every x, the average error is also minimized. The decision rule "Decide ω_1 if P(ω_1 | x) > P(ω_2 | x) and otherwise decide ω_2" guarantees just that. An equivalent decision rule is obtained by multiplying the posteriors P(ω_j | x) in the previous rule by p(x). Because p(x) is a constant with respect to the decision, this obviously does not affect the decision itself. We obtain the rule: Decide ω_1 if p(x | ω_1) P(ω_1) > p(x | ω_2) P(ω_2), otherwise decide ω_2.
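As a sketch, the posterior computation and the resulting decision can be written out directly; the priors and density values below are illustrative assumptions, not from the slides.

```python
# Illustrative numbers (not from the slides): priors and class-conditional
# densities evaluated at a single lightness reading x.
priors = {"sea_bass": 0.7, "salmon": 0.3}          # P(w_1), P(w_2)
likelihoods = {"sea_bass": 0.10, "salmon": 0.30}   # p(x|w_1), p(x|w_2)

# Evidence: p(x) = sum_j p(x|w_j) P(w_j)
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posteriors by the Bayes formula: P(w_j|x) = p(x|w_j) P(w_j) / p(x)
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Decide the class with the larger posterior
decision = max(posteriors, key=posteriors.get)
```

With these numbers the larger likelihood of the reading under the salmon class outweighs the sea bass prior, so the rule picks salmon.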
BDT - features

The purpose is to decide on an action based on the sensed object with the measured feature vector x. Each object to be classified has a corresponding feature vector, and we identify the feature vector x with the object to be classified. The set of all possible feature vectors is called the feature space, which we denote by F. Feature spaces correspond to sample spaces. Examples of feature spaces are R^d, {0, 1}^d, and R × {0, 1}^d. For the moment, we assume that the feature space is R^d.

BDT - classes

We denote by ω the (unknown) class or category of the object x. We use the symbols ω_1, ..., ω_c for the c categories, or classes, to which x can belong. At this stage, c is fixed. Each category ω_i has a prior probability P(ω_i). The prior probability tells us how likely a particular class is before making any observations. In addition, we know the probability density functions (pdfs) of feature vectors drawn from a certain class. These are called class conditional density functions (ccdfs) and denoted by p(x | ω_i) for i = 1, ..., c. Ccdfs tell us how probable the feature vector x is, provided that the class of the object is ω_i.

BDT - actions

We write α_1, ..., α_a for the a possible actions and α(x) for the action taken after observing x. Thought of as a function from the feature space to {α_1, ..., α_a}, α(x) is called a decision rule. In fact, any function from the feature space to {α_1, ..., α_a} is a decision rule. The number of actions a need not equal c, the number of classes. But if a = c and the actions α_i read "Assign x to class i", we often forget about the actions and talk about assigning x to a certain class.

BDT - loss function

We will develop the optimal decision rule based on statistics, but before that we need to tie actions and classes together.
This is done with a loss function. The loss function, denoted by λ, tells how costly each action is, and it is used to convert a probability determination into a decision of an action. λ is a function from action/class pairs to the set of non-negative real numbers. λ(α_i | ω_j) describes the loss incurred for taking action α_i when the true class is ω_j: λ(α | ω_j) is low for good actions given the class ω_j and high for bad actions given the class ω_j.
BDT - Bayes decision rule

If the true class is ω_j, by definition we incur the loss λ(α_i | ω_j) when taking the action α_i after observing x. The expected loss, or conditional risk, of taking action α_i after observing x is

R(α_i | x) = Σ_{j=1}^{c} λ(α_i | ω_j) P(ω_j | x).

The total expected loss for the decision rule α, termed the overall risk, is

R_total(α) = ∫ R(α(x) | x) p(x) dx.

Bayes decision rule

Now we would like to derive a decision rule α(x) that minimizes the overall risk. This decision rule is: select the action α_i that gives the minimum conditional risk R(α_i | x), i.e.

α(x) = arg min_{α_i} R(α_i | x).

This is called the Bayes decision rule. The classifier built upon this rule is called the Bayes (minimum risk) classifier. The overall risk R_total for the Bayes decision rule is called the Bayes risk. It is the smallest overall risk that is possible.

Bayes decision rule

We now prove that the Bayes decision rule indeed minimizes the overall risk.

THEOREM. Let α : F → {α_1, ..., α_a} be an arbitrary decision rule and α_bayes : F → {α_1, ..., α_a} the Bayes decision rule. Then R_total(α_bayes) ≤ R_total(α).

PROOF. By the definition of the Bayes decision rule, R(α_bayes(x) | x) ≤ R(α(x) | x) for every x. Hence, because p(x) ≥ 0, R(α_bayes(x) | x) p(x) ≤ R(α(x) | x) p(x). Therefore

R_total(α_bayes) = ∫ R(α_bayes(x) | x) p(x) dx ≤ ∫ R(α(x) | x) p(x) dx = R_total(α).
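The rule above can be sketched with a loss matrix; the λ values and posteriors below are illustrative assumptions, not from the slides.

```python
import numpy as np

# Illustrative loss matrix and posteriors (assumed, not from the slides).
# loss[i, j] = lambda(alpha_i | omega_j): cost of action i when the class is j.
loss = np.array([[0.0, 2.0],
                 [1.0, 0.0]])
posteriors = np.array([0.3, 0.7])    # P(omega_1|x), P(omega_2|x)

# Conditional risks R(alpha_i|x) = sum_j lambda(alpha_i|omega_j) P(omega_j|x)
risks = loss @ posteriors

# Bayes decision rule: take the action with minimum conditional risk
best_action = int(np.argmin(risks))  # index of the chosen alpha_i
```

Here the risk of action α_1 is 0·0.3 + 2·0.7 = 1.4 and that of α_2 is 1·0.3 + 0·0.7 = 0.3, so the rule takes α_2.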
Two category classification

The possible classes are ω_1 and ω_2. The action α_1 corresponds to deciding that the true class is ω_1, and α_2 to deciding that it is ω_2. Write λ(α_i | ω_j) = λ_ij. Then

R(α_1 | x) = λ_11 P(ω_1 | x) + λ_12 P(ω_2 | x),
R(α_2 | x) = λ_21 P(ω_1 | x) + λ_22 P(ω_2 | x).

The Bayes decision rule: decide that the true class is ω_1 if R(α_1 | x) < R(α_2 | x), and ω_2 otherwise. Equivalently, we decide (that the true class is) ω_1 if

(λ_21 − λ_11) P(ω_1 | x) > (λ_12 − λ_22) P(ω_2 | x).

Ordinarily λ_21 − λ_11 and λ_12 − λ_22 are positive; that is, the loss is greater when making a mistake. Assuming λ_21 > λ_11, the Bayes decision rule can be written as: Decide ω_1 if

p(x | ω_1) / p(x | ω_2) > [(λ_12 − λ_22) / (λ_21 − λ_11)] · [P(ω_2) / P(ω_1)].

Example: SPAM filtering

We have two actions: α_1 stands for "keep the mail" and α_2 stands for "delete as SPAM". There are two classes: ω_1 (normal mail) and ω_2 (SPAM, i.e. junk mail). Let P(ω_1) = 0.4, P(ω_2) = 0.6 and λ_11 = 0, λ_21 = 3, λ_12 = 1, λ_22 = 0. That is, deleting important mail as SPAM is more costly than keeping a SPAM mail. We get a message with the feature vector x, and p(x | ω_1) = 0.35, p(x | ω_2) = 0.65. How does the Bayes minimum risk classifier act?

P(ω_1 | x) = 0.35 · 0.4 / (0.35 · 0.4 + 0.65 · 0.6) ≈ 0.264; P(ω_2 | x) ≈ 0.736.
R(α_1 | x) = 0 · 0.264 + 1 · 0.736 = 0.736; R(α_2 | x) = 3 · 0.264 + 0 · 0.736 ≈ 0.792.

Since R(α_1 | x) < R(α_2 | x): don't delete the mail!

Zero-one loss

An important loss function is the zero-one loss:

λ(α_i | ω_j) = 0 if i = j, and 1 if i ≠ j.

In matrix form, this is the c × c matrix with zeros on the diagonal and ones everywhere else.
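The SPAM example can be checked directly; the value p(x | ω_2) = 0.65 used below is consistent with the stated posterior P(ω_1 | x) ≈ 0.264.

```python
# SPAM example: priors, losses and likelihoods as in the slides
# (p(x|omega_2) = 0.65 is inferred from the stated posterior 0.264).
P1, P2 = 0.4, 0.6                  # P(omega_1) normal, P(omega_2) SPAM
l11, l21, l12, l22 = 0.0, 3.0, 1.0, 0.0
px1, px2 = 0.35, 0.65              # p(x|omega_1), p(x|omega_2)

evidence = px1 * P1 + px2 * P2
post1 = px1 * P1 / evidence        # P(omega_1|x) ~ 0.264
post2 = px2 * P2 / evidence        # P(omega_2|x) ~ 0.736

R1 = l11 * post1 + l12 * post2     # risk of keeping the mail
R2 = l21 * post1 + l22 * post2     # risk of deleting it
action = "keep" if R1 < R2 else "delete"
```

The smaller conditional risk belongs to keeping the mail, matching the slide's conclusion.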
Minimum error rate classification

Assume that our loss function is the zero-one loss and the actions α_i read "Decide that the true class is ω_i". We can then identify the action α_i with the class ω_i. The Bayes decision rule applied to this case leads to the minimum error rate classification rule and the Bayes (minimum error) classifier. The minimum error rate classification rule is: Decide ω_i if P(ω_i | x) > P(ω_j | x) for all j ≠ i.

Recap: The Bayes Classifier

Given a feature vector x, compute the conditional risk R(α_i | x) for each action α_i, i = 1, ..., a, and select the action that gives the smallest conditional risk. Classification with the zero-one loss: compute the probability P(ω_i | x) for all categories ω_1, ..., ω_c and select the category that gives the largest probability. Remember to use the Bayes rule in computing the probabilities.

Discriminant functions

Note: in what follows we will assume that a = c and use ω_i and α_i interchangeably. There are many ways to represent pattern classifiers. One of the most useful is in terms of a set of discriminant functions g_1(x), ..., g_c(x) for a c-category classifier. The classifier assigns a feature vector x to class ω_i if g_i(x) > g_j(x) for all j ≠ i. (See Figure 2.5 from Duda, Hart, Stork: Pattern Classification, Wiley, 2001.) For the Bayes classifier, g_i(x) = −R(α_i | x).
Equivalent discriminant functions

The choice of discriminant functions is not unique. Many distinct sets of discriminant functions lead to the same classifier, that is, to the same decision rule. We say that two sets of discriminant functions are equivalent if they lead to the same classifier; to put it more formally, two sets of discriminant functions are equivalent if their corresponding decision rules give equal decisions for all x. The following holds: let f be a monotonically increasing function (f(x) < f(y) whenever x < y) and let g_i(x), i = 1, ..., c, be the discriminant functions representing a classifier. Then the discriminant functions f(g_i(x)), i = 1, ..., c, represent essentially the same classifier as g_i(x), i = 1, ..., c.

Example: equivalent sets of discriminant functions for the minimum error rate (Bayes) classifier:

g_i(x) = P(ω_i | x) = p(x | ω_i) P(ω_i) / Σ_{j=1}^{c} p(x | ω_j) P(ω_j)
g_i(x) = p(x | ω_i) P(ω_i)
g_i(x) = ln p(x | ω_i) + ln P(ω_i)

Linear discriminant functions

A discriminant function is linear if it can be written in the form g_i(x) = w_i^T x + w_i0, where w_i = [w_i1, ..., w_id]^T. The term w_i0 is called the threshold or bias for the ith category. Note that a linear discriminant function is linear with respect to w_i and w_i0, but actually affine with respect to x. Don't let this bother you too much! A classifier is linear if it can be represented entirely by linear discriminant functions. Linear classifiers have some important properties which we will study later during this course.

Discriminant functions: Two categories

In the two-category case we may combine the two discriminant functions into a single discriminant function. The decision rule: Decide ω_1 if g_1(x) > g_2(x) and otherwise decide ω_2. Define g(x) = g_1(x) − g_2(x).
We obtain an equivalent decision rule: Decide ω_1 if g(x) > 0 and otherwise decide ω_2.
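The equivalence of discriminant-function sets can be checked numerically; the priors and likelihood values below are illustrative assumptions.

```python
import math

# Illustrative priors and class-conditional density values at a fixed x.
priors = [0.5, 0.3, 0.2]
likes = [0.1, 0.4, 0.2]            # p(x|omega_i)

evidence = sum(l * p for l, p in zip(likes, priors))
g_post = [l * p / evidence for l, p in zip(likes, priors)]    # P(omega_i|x)
g_prod = [l * p for l, p in zip(likes, priors)]               # p(x|w_i) P(w_i)
g_log = [math.log(l) + math.log(p) for l, p in zip(likes, priors)]

def argmax(g):
    return max(range(len(g)), key=g.__getitem__)

# All three sets of discriminant functions give the same decision,
# since division by p(x) and taking ln are both monotone.
assert argmax(g_post) == argmax(g_prod) == argmax(g_log)
```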
Decision regions

The effect of any decision rule is to divide the feature space (in this case R^d) into c disjoint decision regions R_1, ..., R_c. Decision rules can be written with the help of decision regions: if x ∈ R_i, decide ω_i. (Therefore decision regions form a representation for a classifier.) Decision regions can be derived from discriminant functions:

R_i = {x : g_i(x) > g_j(x) for all j ≠ i}.

Note that decision regions are properties of the classifier, and they are not affected if the discriminant functions are changed to equivalent ones. Boundaries of decision regions, i.e. places where two or more discriminant functions yield the same value, are called decision boundaries. (See Figure 2.6 from Duda, Hart, Stork: Pattern Classification, Wiley, 2001.)

Decision regions example

Consider a two-category classification problem with P(ω_1) = 0.6, P(ω_2) = 0.4 and

p(x | ω_1) = (1/√(2π)) exp[−0.5 x²],  p(x | ω_2) = (1/√(2π)) exp[−0.5 (x − 1)²].

Find the decision regions and boundaries for the minimum error rate Bayes classifier. The decision region R_1 is the set of points where P(ω_1 | x) > P(ω_2 | x); the decision region R_2 is the set of points where P(ω_2 | x) > P(ω_1 | x). The decision boundary is the set of points where P(ω_1 | x) = P(ω_2 | x). (Figure: the class conditional densities of class 1 and class 2.)
Decision regions example

Let us begin with the decision boundary:

P(ω_1 | x) = P(ω_2 | x) ⇔ p(x | ω_1) P(ω_1) = p(x | ω_2) P(ω_2),

where we used the Bayes formula and multiplied by p(x). Taking logarithms,

p(x | ω_1) P(ω_1) = p(x | ω_2) P(ω_2) ⇔ ln[p(x | ω_1) P(ω_1)] = ln[p(x | ω_2) P(ω_2)]
⇔ −0.5 x² + ln 0.6 = −0.5 (x − 1)² + ln 0.4
⇔ x² − 2 ln 0.6 = x² − 2x + 1 − 2 ln 0.4.

The decision boundary is x* = 0.5 + ln 0.6 − ln 0.4 ≈ 0.91, and

R_1 = {x : x < x*},  R_2 = {x : x > x*}.

(Figure: P(ω_1 | x) and P(ω_2 | x) and the decision regions; the class 1 region lies to the left of the boundary, the class 2 region to the right.)

The normal density

Univariate case:

p(x) = (1/(√(2π) σ)) exp[−0.5 ((x − µ)/σ)²],

where the parameter σ > 0. Multivariate case:

p(x) = (1/((2π)^{d/2} det(Σ)^{1/2})) exp[−0.5 (x − µ)^T Σ^{−1} (x − µ)],

where x is a d-component column vector and Σ is a positive definite matrix. (For a positive definite matrix Σ, x^T Σ x > 0 for all x ≠ 0.)

The normal density - properties

The normal density has several properties which give it a special position among probability densities. To a large extent this is due to analytical tractability, as we shall soon see, but there are also other reasons for favoring normal densities. X ~ N(µ, Σ) stands for "X is an RV having the normal density with parameters µ, Σ". The expected value of X is E[X] = µ. The (co)variance of X is Var[X] = Σ.
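For this worked example the boundary has the closed form x* = 1/2 + ln 0.6 − ln 0.4, which a quick numerical sketch can confirm:

```python
import math

# Decision boundary check for the example: N(0,1) vs N(1,1),
# priors P(omega_1) = 0.6, P(omega_2) = 0.4.
def g(x, mu, prior):
    # log discriminant ln p(x|omega) + ln P(omega), shared constants dropped
    return -0.5 * (x - mu) ** 2 + math.log(prior)

x_star = 0.5 + math.log(0.6 / 0.4)   # closed-form boundary, ~0.905

# the two discriminants agree at the boundary, and class 1 wins to its left
assert abs(g(x_star, 0.0, 0.6) - g(x_star, 1.0, 0.4)) < 1e-9
assert g(x_star - 0.1, 0.0, 0.6) > g(x_star - 0.1, 1.0, 0.4)
```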
The normal density - properties

Let X = [X_1, ..., X_d]^T ~ N(µ, Σ). Then X_i ~ N(µ_i, σ_ii). Let A, B be d × d matrices. Then AX ~ N(Aµ, AΣA^T), and AX and BX are independent if and only if AΣB^T is the zero matrix. The sum of two normally distributed RVs is also normally distributed. Central Limit Theorem: the sum of n independent, identically distributed (i.i.d.) RVs tends to a normally distributed RV as n approaches infinity.

DFs for the normal density

The minimum error rate classifier can be represented by the discriminant functions (DFs)

g_i(x) = ln p(x | ω_i) + ln P(ω_i), i = 1, ..., c.

Letting p(x | ω_i) = N(µ_i, Σ_i) we obtain

g_i(x) = −0.5 (x − µ_i)^T Σ_i^{−1} (x − µ_i) − (d/2) ln 2π − 0.5 ln det(Σ_i) + ln P(ω_i).

DFs for the normal density: Σ_i = σ² I

Σ_i = σ² I: features are independent and each feature has the same variance σ². Geometrically, this corresponds to the situation in which the samples fall in equally sized (hyper)spherical clusters, with the cluster for the ith class centered around µ_i. We will now derive linear DFs equivalent to the DFs above. 1) All constant terms, like −(d/2) ln 2π, can be dropped; dropping them does not affect the classification result. 2) In this particular case the determinants of the covariance matrices all have the same value (σ^{2d}), so the determinant term can be dropped as well. 3) Σ^{−1} = (1/σ²) I and hence

g_i(x) = −‖x − µ_i‖² / (2σ²) + ln P(ω_i).

We (sloppily) use the = sign to indicate that discriminant functions are equivalent.
DFs for the normal density: Σ_i = σ² I

g_i(x) = −‖x − µ_i‖² / (2σ²) + ln P(ω_i)
       = −(1/(2σ²)) (x − µ_i)^T (x − µ_i) + ln P(ω_i)
       = −(1/(2σ²)) (x^T x − 2 µ_i^T x + µ_i^T µ_i) + ln P(ω_i).

From the last expression we see that the quadratic term x^T x is the same for all categories and can be dropped. Hence, we obtain equivalent linear discriminant functions:

g_i^linear(x) = (1/σ²) (µ_i^T x − 0.5 µ_i^T µ_i) + ln P(ω_i).

Minimum distance classifier

If we assume that all P(ω_i) are equal (that is, P(ω_i) = 1/c) and Σ_i = σ² I, we obtain the minimum distance classifier. The name of the classifier follows from the set of discriminant functions used:

g_i(x) = −‖x − µ_i‖.

Hence, a feature vector is assigned to the category with the nearest mean. Note that a minimum distance classifier can also be implemented as a linear classifier.

DFs for the normal density: Σ_i = Σ

Consider a slightly more complicated model: all covariance matrices still have the same value, but there is some correlation between the features. Also in this case we have a linear classifier:

g_i(x) = w_i^T x + w_i0,

where

w_i = Σ^{−1} µ_i and w_i0 = −0.5 µ_i^T Σ^{−1} µ_i + ln P(ω_i).

DFs for the normal density: Σ_i arbitrary

It is time to consider the most general Gaussian model, where the features from each category are assumed to be normally distributed, but nothing more is assumed. In this case the discriminant functions

g_i(x) = −0.5 (x − µ_i)^T Σ_i^{−1} (x − µ_i) − (d/2) ln 2π − 0.5 ln det(Σ_i) + ln P(ω_i)

cannot be simplified much; only the constant terms can be dropped. The discriminant functions are now necessarily quadratic, which means that the decision regions may have more complicated shapes than in the linear case.
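The reduction from the quadratic to the linear form in the Σ_i = σ² I case can be sanity-checked numerically; the means, variance and priors below are illustrative assumptions.

```python
import numpy as np

# Illustrative two-class setup with Sigma_i = sigma^2 I.
means = np.array([[0.0, 0.0], [3.0, 3.0]])   # mu_1, mu_2
priors = np.array([0.5, 0.5])
sigma2 = 1.0

def g_quadratic(x):
    # g_i(x) = -||x - mu_i||^2 / (2 sigma^2) + ln P(omega_i)
    return -np.sum((x - means) ** 2, axis=1) / (2 * sigma2) + np.log(priors)

def g_linear(x):
    # g_i(x) = (mu_i^T x - mu_i^T mu_i / 2) / sigma^2 + ln P(omega_i)
    return (means @ x - 0.5 * np.sum(means ** 2, axis=1)) / sigma2 + np.log(priors)

# Both discriminant sets make the same decisions, since the dropped
# x^T x term is shared by every class.
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=2) * 3.0
    assert np.argmax(g_quadratic(x)) == np.argmax(g_linear(x))
```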
BDT - Discrete features

In many practical applications the feature space is discrete; the components of the feature vectors are then binary or higher-integer valued. This simply means that the integrals of the continuous case must be replaced with sums, and probability densities must be replaced with probabilities. For example, the minimum error rate classification rule is: Decide ω_i if

P(ω_i | x) = P(x | ω_i) P(ω_i) / P(x) > P(ω_j | x) for all j ≠ i.

Independent binary features

Consider the two-category problem where the feature vectors x = [x_1, ..., x_d]^T are binary, i.e. each x_i is either 0 or 1. Assume further that the features are (conditionally) independent, that is,

P(x | ω_j) = Π_{i=1}^{d} P(x_i | ω_j).

Denote p_i = P(x_i = 1 | ω_1) and q_i = P(x_i = 1 | ω_2). Then

P(x | ω_1) = Π_{i=1}^{d} p_i^{x_i} (1 − p_i)^{1 − x_i},
P(x | ω_2) = Π_{i=1}^{d} q_i^{x_i} (1 − q_i)^{1 − x_i}.

Independent binary features continued

Use the discriminant functions g_1(x) = ln P(ω_1 | x) and g_2(x) = ln P(ω_2 | x). Recall that with g(x) = g_1(x) − g_2(x), the classifier decides by the sign of g(x). By the Bayes rule, this leads to

g(x) = ln[P(x | ω_1) / P(x | ω_2)] + ln[P(ω_1) / P(ω_2)]
     = Σ_{i=1}^{d} [x_i ln(p_i / q_i) + (1 − x_i) ln((1 − p_i)/(1 − q_i))] + ln[P(ω_1) / P(ω_2)]
     = Σ_{i=1}^{d} x_i ln[p_i (1 − q_i) / (q_i (1 − p_i))] + Σ_{i=1}^{d} ln[(1 − p_i)/(1 − q_i)] + ln[P(ω_1) / P(ω_2)].

Note that we have a linear machine: g(x) = Σ_{i=1}^{d} w_i x_i + w_0. The magnitude of w_i determines the importance of a "yes" (1) answer for x_i. If p_i = q_i, the value of x_i gives no information about the class. The prior probabilities appear only in the bias term w_0; increasing P(ω_1) biases the decision in favor of ω_1.
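The linear machine for independent binary features can be sketched directly; the p_i, q_i and priors below are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters for d = 3 binary features.
p = np.array([0.8, 0.6, 0.7])      # p_i = P(x_i = 1 | omega_1)
q = np.array([0.3, 0.5, 0.2])      # q_i = P(x_i = 1 | omega_2)
P1, P2 = 0.5, 0.5                  # priors

# Weights and bias of the linear machine g(x) = w^T x + w_0
w = np.log(p * (1 - q) / (q * (1 - p)))
w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(P1 / P2)

def g(x):
    return w @ x + w0

x = np.array([1, 0, 1])
decision = 1 if g(x) > 0 else 2    # decide omega_1 when g(x) > 0
```

By construction g(x) equals the log ratio ln[P(x | ω_1) P(ω_1) / (P(x | ω_2) P(ω_2))], so the sign test reproduces the Bayes rule.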
Receiver operating characteristic

For signal detection theory and the receiver operating characteristic (ROC), see milos/courses/cs2750/lectures/class9.pdf.

BDT - context

So far we have assumed that our interest is in classifying a single object at a time. However, in applications we may need to classify several objects at the same time; image segmentation is an example. If we assume that the class of one object is independent of the remaining ones, nothing changes. If there is some dependence, the basic principles remain the same, i.e. we assign objects to the most probable category. But now we also need to take the categories of the other objects into account, and place all objects in such categories that the probability of the whole ensemble is maximized. Computational difficulties follow.
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationBayesian Decision Theory
Introduction to Pattern Recognition [ Part 4 ] Mahdi Vasighi Remarks It is quite common to assume that the data in each class are adequately described by a Gaussian distribution. Bayesian classifier is
More informationBayes Decision Theory
Bayes Decision Theory Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 16
More informationp(d θ ) l(θ ) 1.2 x x x
p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 1
CS434a/541a: Pattern Recognition Prof. Olga Veksler Lecture 1 1 Outline of the lecture Syllabus Introduction to Pattern Recognition Review of Probability/Statistics 2 Syllabus Prerequisite Analysis of
More information2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?
ECE 830 / CS 76 Spring 06 Instructors: R. Willett & R. Nowak Lecture 3: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Elias Tragas tragas@cs.toronto.edu October 3, 2016 Elias Tragas Naive Bayes and Gaussian Bayes Classifier October 3, 2016 1 / 23 Naive Bayes Bayes Rules: Naive
More informationLecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher
Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and
More informationp(x ω i 0.4 ω 2 ω
p( ω i ). ω.3.. 9 3 FIGURE.. Hypothetical class-conditional probability density functions show the probability density of measuring a particular feature value given the pattern is in category ω i.if represents
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationBayes Decision Theory - I
Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find
More informationApplication: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth
Application: Can we tell what people are looking at from their brain activity (in real time? Gaussian Spatial Smooth 0 The Data Block Paradigm (six runs per subject Three Categories of Objects (counterbalanced
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Mengye Ren mren@cs.toronto.edu October 18, 2015 Mengye Ren Naive Bayes and Gaussian Bayes Classifier October 18, 2015 1 / 21 Naive Bayes Bayes Rules: Naive Bayes
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 3: Probability, Bayes Theorem, and Bayes Classification Peter Belhumeur Computer Science Columbia University Probability Should you play this game? Game: A fair
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Ladislav Rampasek slides by Mengye Ren and others February 22, 2016 Naive Bayes and Gaussian Bayes Classifier February 22, 2016 1 / 21 Naive Bayes Bayes Rule:
More informationClassification 2: Linear discriminant analysis (continued); logistic regression
Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationContinuous Random Variables
1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables
More informationLecture 8: Classification
1/26 Lecture 8: Classification Måns Eriksson Department of Mathematics, Uppsala University eriksson@math.uu.se Multivariate Methods 19/5 2010 Classification: introductory examples Goal: Classify an observation
More informationLecture 8: Signal Detection and Noise Assumption
ECE 830 Fall 0 Statistical Signal Processing instructor: R. Nowak Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(0, σ I n n and S = [s, s,..., s n ] T
More informationLecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary
ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 6 Computing Announcements Machine Learning Lecture 2 Course webpage http://www.vision.rwth-aachen.de/teaching/ Slides will be made available on
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationLearning Methods for Linear Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2011/2012 Lesson 20 27 April 2012 Contents Learning Methods for Linear Detectors Learning Linear Detectors...2
More informationThe generative approach to classification. A classification problem. Generative models CSE 250B
The generative approach to classification The generative approach to classification CSE 250B The learning process: Fit a probability distribution to each class, individually To classify a new point: Which
More informationIntro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation
Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor
More informationThe Bayes classifier
The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal
More informationExample - basketball players and jockeys. We will keep practical applicability in mind:
Sonka: Pattern Recognition Class 1 INTRODUCTION Pattern Recognition (PR) Statistical PR Syntactic PR Fuzzy logic PR Neural PR Example - basketball players and jockeys We will keep practical applicability
More informationMachine Learning 1. Linear Classifiers. Marius Kloft. Humboldt University of Berlin Summer Term Machine Learning 1 Linear Classifiers 1
Machine Learning 1 Linear Classifiers Marius Kloft Humboldt University of Berlin Summer Term 2014 Machine Learning 1 Linear Classifiers 1 Recap Past lectures: Machine Learning 1 Linear Classifiers 2 Recap
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision
More informationGaussian discriminant analysis Naive Bayes
DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationStephen Scott.
1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training
More informationECE 4400:693 - Information Theory
ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential
More informationSDS 321: Introduction to Probability and Statistics
SDS 321: Introduction to Probability and Statistics Lecture 14: Continuous random variables Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin www.cs.cmu.edu/
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationL5: Quadratic classifiers
L5: Quadratic classifiers Bayes classifiers for Normally distributed classes Case 1: Σ i = σ 2 I Case 2: Σ i = Σ (Σ diagonal) Case 3: Σ i = Σ (Σ non-diagonal) Case 4: Σ i = σ 2 i I Case 5: Σ i Σ j (general
More informationLinear Discrimination Functions
Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationLinear Classifiers as Pattern Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationBayes Classifiers. CAP5610 Machine Learning Instructor: Guo-Jun QI
Bayes Classifiers CAP5610 Machine Learning Instructor: Guo-Jun QI Recap: Joint distributions Joint distribution over Input vector X = (X 1, X 2 ) X 1 =B or B (drinking beer or not) X 2 = H or H (headache
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationLecture 1 October 9, 2013
Probabilistic Graphical Models Fall 2013 Lecture 1 October 9, 2013 Lecturer: Guillaume Obozinski Scribe: Huu Dien Khue Le, Robin Bénesse The web page of the course: http://www.di.ens.fr/~fbach/courses/fall2013/
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationMachine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier
Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Theory of Classification and Nonparametric Classifier Eric Xing Lecture 2, January 16, 2006 Reading: Chap. 2,5 CB and handouts Outline What is theoretically
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More information12 Discriminant Analysis
12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationp(x ω i 0.4 ω 2 ω
p(x ω i ).4 ω.3.. 9 3 4 5 x FIGURE.. Hypothetical class-conditional probability density functions show the probability density of measuring a particular feature value x given the pattern is in category
More informationWhy study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables
ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging
More information