CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 3


1 CS434a/541a: Pattern Recognition Prof. Olga Veksler Lecture 3

2 Announcements Link to error data in the book. Reading assignment. Assignment 1 handed out, due Oct. 4. Please send me an email with your name and course number in the subject, and in the body: whether you are graduate or undergraduate, your department, and the year you are in (if undergraduate).

3 Announcements Information Session on "HOW TO APPLY FOR EXTERNAL AWARDS". Location: Social Science Centre, Room 2036, Tuesday, September 21, 2004 from 4 pm to 5 pm. Lutfiyya is

4 Today Finish Matlab Introduction. Course Roadmap. Change in Notation (for consistency with textbook). Conditional distributions (forgot to review). Bayesian Decision Theory: two category classification, multiple category classification, Discriminant Functions.

5 Course Road Map a lot is known easier little is known harder

6 Bayesian Decision Theory Know probability distribution of the categories (never happens in the real world). Do not even need training data. Can design optimal classifier. Example: a respected fish expert says that salmon's length has distribution N(5,1) and sea bass's length has distribution N(10,4).

7 ML and Bayesian Parameter Estimation Shape of the probability distribution is known (happens sometimes). Labeled training data: salmon, bass, salmon, salmon, ... Need to estimate the parameters of the probability distribution from the training data. Example: a respected fish expert says salmon's length has distribution N(µ1, σ1²) and sea bass's length has distribution N(µ2, σ2²). Need to estimate the parameters µ1, σ1², µ2, σ2². Then we can use the methods from Bayesian decision theory.

8 Linear Discriminant Functions and Neural Nets No probability distribution (no shape or parameters are known). Labeled data: salmon, bass, salmon, salmon, ... The shape of the discriminant functions is known (figure: salmon and bass samples in the lightness vs. length plane, separated by a linear discriminant function). Need to estimate the parameters of the discriminant function (the parameters of the line in the case of a linear discriminant).

9 Non-Parametric Methods Neither the probability distribution nor the discriminant function is known (happens quite often). All we have is labeled data: salmon, bass, salmon, salmon, ... Estimate the probability distribution from the labeled data.

10 Unsupervised Learning and Clustering Data is not labeled (happens quite often). 1. Estimate the probability distribution from the unlabeled data. 2. Cluster the data.

11 Course Road Map 1. Bayesian Decision Theory (rare case): know the probability distribution of the categories; do not even need training data; can design optimal classifier. 2. ML and Bayesian parameter estimation: need to estimate parameters of the probability dist.; need training data. 3. Non-Parametric Methods: no probability distribution, labeled data. 4. Linear discriminant functions and Neural Nets: the shape of the discriminant functions is known; need to estimate parameters of the discriminant functions. 5. Unsupervised Learning and Clustering: no probability distribution and unlabeled data.

12 Notation Change (for consistency with textbook)
Pr[A]: probability of event A
P(x): probability mass function of discrete r.v. x
p(x): probability density function of continuous r.v. x
p(x,y): joint probability density of r.v. x and y
p(x|y): conditional density of x given y
P(x|y): conditional mass of x given y

13 More on Probability For events A and B, we have defined the conditional probability Pr(A|B) = Pr(A ∩ B) / Pr(B). Law of total probability: Pr(A) = Σ_{k=1..n} Pr(A|B_k) Pr(B_k). Bayes rule: Pr(B_i|A) = Pr(A|B_i) Pr(B_i) / Σ_{k=1..n} Pr(A|B_k) Pr(B_k). Usually we model with random variables, not events. We need equivalents of these laws for mass and density functions (we could go from random variables back to events, but that is time consuming).

14 Conditional Mass Function: Discrete RV For a discrete RV there is nothing new, because the mass function is really a probability law. Define the conditional mass function of X given Y=y by P(x|y) = P(x,y) / P(y). This is a probability mass function because (y is fixed): Σ_x P(x|y) = Σ_x P(x,y) / P(y) = P(y) / P(y) = 1. This is really nothing new because P(x|y) = P(x,y) / P(y) = Pr[X=x, Y=y] / Pr[Y=y] = Pr[X=x | Y=y].

15 Conditional Mass Function: Bayes Rule The law of total probability: P(x) = Σ_y P(x,y) = Σ_y P(x|y) P(y). The Bayes rule: P(y|x) = P(y,x) / P(x) = P(x|y) P(y) / Σ_y P(x|y) P(y).
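
To make the discrete-case formulas above concrete, here is a small Python sketch (with made-up numbers for the joint mass P(x,y)) that checks the conditional mass, the law of total probability, and Bayes rule numerically:

    import numpy as np

    # joint mass P(x, y) for x in {0, 1, 2} (rows) and y in {0, 1} (columns)
    P_xy = np.array([[0.10, 0.15],
                     [0.20, 0.25],
                     [0.10, 0.20]])

    P_y = P_xy.sum(axis=0)             # marginal P(y) = sum over x of P(x, y)
    P_x = P_xy.sum(axis=1)             # law of total probability: P(x) = sum over y of P(x, y)

    P_x_given_y = P_xy / P_y           # conditional mass P(x|y) = P(x, y) / P(y)
    print(P_x_given_y.sum(axis=0))     # each column sums to 1, as claimed above

    # Bayes rule: P(y|x) = P(x|y) P(y) / sum over y of P(x|y) P(y)
    P_y_given_x = (P_x_given_y * P_y) / (P_x_given_y * P_y).sum(axis=1, keepdims=True)
    print(P_y_given_x.sum(axis=1))     # each row sums to 1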

16 Conditional Density Function: Continuous RV Does it make sense to talk about the conditional density p(x|y) if Y is a continuous random variable? After all, Pr[Y=y] = 0, so we will never see Y=y in practice. But measurements have limited accuracy: we can interpret an observation y as an observation in the interval [y-ε, y+ε], and an observation x as an observation in the interval [x-ε, x+ε].

17 Conditional Density Function: Continuous RV Let B(x) denote the interval [x-ε, x+ε]. Then Pr[X ∈ B(x)] = ∫ over B(x) of p(x) dx ≈ 2ε p(x). Similarly Pr[Y ∈ B(y)] ≈ 2ε p(y) and Pr[X ∈ B(x), Y ∈ B(y)] ≈ 4ε² p(x,y). Thus we should have p(x|y) ≈ Pr[X ∈ B(x) | Y ∈ B(y)] / (2ε) = Pr[X ∈ B(x), Y ∈ B(y)] / (2ε Pr[Y ∈ B(y)]), which simplifies to p(x|y) ≈ 4ε² p(x,y) / (2ε · 2ε p(y)) = p(x,y) / p(y).

18 Conditional Density Function: Continuous RV Define the conditional density function of X given Y=y by p(x|y) = p(x,y) / p(y). This is a probability density function because (y is fixed): ∫ p(x|y) dx = ∫ p(x,y)/p(y) dx = (∫ p(x,y) dx) / p(y) = p(y) / p(y) = 1. The law of total probability: p(x) = ∫ p(x,y) dy = ∫ p(x|y) p(y) dy.
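
A quick numerical sanity check of these definitions, using an assumed joint density p(x,y) = x + y on the unit square (not from the slides), confirms that p(x|y) integrates to 1 for a fixed y:

    import numpy as np

    xs = np.linspace(0.0, 1.0, 2001)
    dx = xs[1] - xs[0]
    y = 0.3                                # fix y, as in the definition above

    p_xy = xs + y                          # the joint density evaluated along x for this fixed y
    p_y = np.sum(p_xy) * dx                # p(y) = integral of p(x, y) dx  (about y + 1/2)
    p_x_given_y = p_xy / p_y               # conditional density p(x|y) = p(x, y) / p(y)

    print(np.sum(p_x_given_y) * dx)        # about 1.0: p(x|y) integrates to one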

19 Conditional Density Function: Bayes Rule The Bayes rule: p(y|x) = p(y,x) / p(x) = p(x|y) p(y) / ∫ p(x|y) p(y) dy.

20 Mixed Discrete and Continuous X discrete, Y continuous: Bayes rule P(x|y) = p(y|x) P(x) / p(y). X continuous, Y discrete: Bayes rule p(x|y) = P(y|x) p(x) / P(y).

21 Bayesian Decision Theory Know the probability distribution of the categories. Almost never the case in real life! Nevertheless useful, since other cases can be reduced to this one after some work. Do not even need training data. Can design optimal classifier.

22 Cats and Dogs Suppose we have these conditional probability mass functions for cats and dogs: P(small ears | dog) = 0.1, P(large ears | dog) = 0.9, P(small ears | cat) = 0.8, P(large ears | cat) = 0.2. We observe an animal with large ears. Dog or cat? It makes sense to say dog, because the probability of observing large ears in a dog is much larger than the probability of observing large ears in a cat: Pr[large ears | dog] = 0.9 > 0.2 = Pr[large ears | cat]. We choose the event of larger probability, i.e. the maximum likelihood event.
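
A minimal Python sketch of this maximum likelihood decision on the slide's numbers (the dictionary layout is just one possible encoding):

    likelihood = {
        "dog": {"small ears": 0.1, "large ears": 0.9},
        "cat": {"small ears": 0.8, "large ears": 0.2},
    }

    def ml_decide(observation):
        # choose the class under which the observation is most probable
        return max(likelihood, key=lambda c: likelihood[c][observation])

    print(ml_decide("large ears"))   # dog, since 0.9 > 0.2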

23 Example: Fish Sorting A respected fish expert says that salmon length has distribution N(5,1) and sea bass's length has distribution N(10,4). Recall that if an r.v. has distribution N(µ, σ²) then its density is p(l) = 1/(√(2π) σ) · exp(-(l-µ)² / (2σ²)). Thus the class conditional densities are p(l|salmon) = 1/√(2π) · exp(-(l-5)²/2) and p(l|bass) = 1/(2√(2π)) · exp(-(l-10)²/(2·4)).

24 Likelihood Function Thus the class conditional densities are p(l|salmon) = 1/√(2π) · exp(-(l-5)²/2) and p(l|bass) = 1/(2√(2π)) · exp(-(l-10)²/8). Now fix the length l and let the fish class vary. Then we get the likelihood function (it is not a density and not a probability mass): p(l_fixed | class) = 1/√(2π) · exp(-(l-5)²/2) if class = salmon, and 1/(2√(2π)) · exp(-(l-10)²/8) if class = bass.

25 Likelihood vs. Class Conditional Density (figure: the two class conditional densities p(l|class) plotted against length, with the observation at length 7 marked) Suppose a fish has length 7. How do we classify it?

26 ML (Maximum Likelihood) Classifier We would like to choose salmon if Pr[length = 7 | salmon] > Pr[length = 7 | bass]. However, since length is a continuous r.v., Pr[length = 7 | salmon] = Pr[length = 7 | bass] = 0. Instead, we choose the class which maximizes the likelihood: p(l|salmon) = 1/√(2π) · exp(-(l-5)²/2), p(l|bass) = 1/(2√(2π)) · exp(-(l-10)²/(2·4)). ML classifier: for an observed l, compare p(l|salmon) against p(l|bass). In words: if p(l|salmon) > p(l|bass), classify as salmon, else classify as bass.
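
A small Python sketch of the ML classifier with the slide's class conditional densities (salmon ~ N(5,1), sea bass ~ N(10,4)); for length 7 it picks bass, consistent with the next slide:

    import math

    def p_l_given_salmon(l):
        # salmon length ~ N(5, 1)
        return math.exp(-(l - 5) ** 2 / 2) / math.sqrt(2 * math.pi)

    def p_l_given_bass(l):
        # sea bass length ~ N(10, 4), i.e. variance 4
        return math.exp(-(l - 10) ** 2 / (2 * 4)) / (2 * math.sqrt(2 * math.pi))

    def ml_classify(l):
        # decide salmon if p(l|salmon) > p(l|bass), else bass
        return "salmon" if p_l_given_salmon(l) > p_l_given_bass(l) else "bass"

    print(ml_classify(7))   # bass: p(7|bass) > p(7|salmon), as on the next slide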

27 Interval Justification Pr[l ∈ B(7) | bass] ≈ 2ε p(7|bass) > 2ε p(7|salmon) ≈ Pr[l ∈ B(7) | salmon]. Thus we choose the class (bass) which is more likely to have given the observation.

28 Decision Boundary (figure: for lengths below the decision boundary at 6.70, classify as salmon; for lengths above it, classify as sea bass)

29 Priors A prior comes from prior knowledge; no data has been seen yet. Suppose a fish expert says: in the fall, there are twice as many salmon as sea bass. Prior for our fish sorting problem: P(salmon) = 2/3, P(bass) = 1/3. With the addition of the prior to our model, how should we classify a fish of length 7?

30 How Does the Prior Change the Decision Boundary? Without priors: salmon below the boundary at 6.70, sea bass above it. How should this change with the prior P(salmon) = 2/3, P(bass) = 1/3? (figure: the same plot with the new boundary location marked "??")

31 Bayes Decision Rule 1. Have likelihood functions p(length|salmon) and p(length|bass). 2. Have priors P(salmon) and P(bass). Question: having observed a fish of a certain length, do we classify it as salmon or bass? Natural idea: salmon if P(salmon|length) > P(bass|length), bass if P(bass|length) > P(salmon|length).

32 Posterior P(salmon|length) and P(bass|length) are called posterior distributions, because the data (length) has been revealed (post data). How to compute the posteriors? Not obvious. From Bayes rule: P(salmon|length) = p(salmon, length) / p(length) = p(length|salmon) P(salmon) / p(length). Similarly: P(bass|length) = p(length|bass) P(bass) / p(length).

33 MAP (Maximum A Posteriori) Classifier Compare P(salmon|length) against P(bass|length): decide salmon if P(salmon|length) > P(bass|length), else bass. Equivalently, compare p(length|salmon) P(salmon) / p(length) against p(length|bass) P(bass) / p(length). Since p(length) cancels, the rule is: decide salmon if p(length|salmon) P(salmon) > p(length|bass) P(bass), else bass.

34 Back to the Fish Sorting Example Likelihoods: p(l|salmon) = 1/√(2π) · exp(-(l-5)²/2), p(l|bass) = 1/(2√(2π)) · exp(-(l-10)²/8). Priors: P(salmon) = 2/3, P(bass) = 1/3. Solve the inequality (2/3) · 1/√(2π) · exp(-(l-5)²/2) > (1/3) · 1/(2√(2π)) · exp(-(l-10)²/8). This gives a new decision boundary at 7.18: salmon below it, sea bass above it (the old boundary was 6.70). The new decision boundary makes sense, since we expect to see more salmon.
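
One way to check the 7.18 boundary in Python (a sketch, not from the slides): taking logs of the inequality above and simplifying reduces it to the quadratic 3l² - 20l - 16 ln 2 = 0, whose positive root is the boundary.

    import math

    def p_salmon(l): return math.exp(-(l - 5) ** 2 / 2) / math.sqrt(2 * math.pi)
    def p_bass(l):   return math.exp(-(l - 10) ** 2 / 8) / (2 * math.sqrt(2 * math.pi))

    # boundary: positive root of 3 l^2 - 20 l - 16 ln 2 = 0
    a, b, c = 3.0, -20.0, -16 * math.log(2)
    boundary = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    print(round(boundary, 2))          # 7.18, matching the slide

    def map_classify(l, prior_salmon=2/3, prior_bass=1/3):
        # decide the class with the larger (unnormalized) posterior p(l|c) P(c)
        return "salmon" if p_salmon(l) * prior_salmon > p_bass(l) * prior_bass else "bass"

    print(map_classify(7))             # salmon: 7 is below the new boundary 7.18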

35 Prior P(s) = 2/3 and P(b) = 1/3 vs. prior P(s) = 0.999 and P(b) = 0.001 (figure: with the much stronger salmon prior, the decision boundary moves much further toward sea bass).

36 Likelihood vs. Posteriors (figure: P(salmon|l), p(l|salmon), P(bass|l), p(l|bass) plotted against length) The likelihood p(l|fish class) is a density with respect to length: the area under each curve is 1. The posterior P(fish class|l) is a mass function with respect to fish class, so for each l, P(salmon|l) + P(bass|l) = 1.

37 More on the Posterior The posterior (our goal) is P(c|l) = p(l|c) P(c) / p(l), where p(l|c) is the likelihood (given), P(c) is the prior (given), and p(l) is the normalizing factor. We often do not even need p(l) for classification, since it does not depend on the class c. If we do need it, from the law of total probability: p(l) = p(l|salmon) P(salmon) + p(l|bass) P(bass). Notice this formula consists of likelihoods and priors, which are given.
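
A small sketch of computing the posteriors for an observed length, using the law of total probability above for the normalizing factor p(l) (priors 2/3 and 1/3 as in the example):

    import math

    def p_salmon(l): return math.exp(-(l - 5) ** 2 / 2) / math.sqrt(2 * math.pi)
    def p_bass(l):   return math.exp(-(l - 10) ** 2 / 8) / (2 * math.sqrt(2 * math.pi))

    prior = {"salmon": 2 / 3, "bass": 1 / 3}

    def posterior(l):
        # normalizing factor p(l) from the law of total probability
        p_l = p_salmon(l) * prior["salmon"] + p_bass(l) * prior["bass"]
        return {"salmon": p_salmon(l) * prior["salmon"] / p_l,
                "bass":   p_bass(l) * prior["bass"] / p_l}

    post = posterior(7)
    print(post)                         # the two posteriors for length 7
    print(sum(post.values()))           # 1.0: the posteriors sum to one for each l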

38 More on the Posterior In P(c|l) = p(l|c) P(c) / p(l), c is the cause (class) and l is the effect (length). If the cause c is present, it is easy to determine the probability of the effect l with the likelihood p(l|c). Usually we observe the effect l without knowing the cause c. It is hard to determine the cause c directly, because there may be several causes which could produce the same effect l. Bayes rule makes it easy to determine the posterior P(c|l), if we know the likelihood p(l|c) and the prior P(c).

39 More on Priors A prior comes from prior knowledge; no data has been seen yet. If there is a reliable source of prior knowledge, it should be used. Some problems cannot even be solved reliably without a good prior. However, the prior alone is not enough; we still need the likelihood. P(salmon) = 2/3, P(sea bass) = 1/3: if I don't let you see the data, but ask you to guess, will you choose salmon or sea bass?

40 More on the MAP Classifier Posterior: P(c|l) = p(l|c) P(c) / p(l) (likelihood times prior, over the evidence). We do not care about p(l) when maximizing P(c|l): P(c|l) is proportional to p(l|c) P(c). If P(salmon) = P(bass) (uniform prior), the MAP classifier becomes the ML classifier: P(c|l) ∝ p(l|c). If for some observation l we have p(l|salmon) = p(l|bass), then this observation is uninformative and the decision is based solely on the prior: P(c|l) ∝ P(c).

41 Justification for the MAP Classifier Let's compute the probability of error for the MAP decision rule: compare P(salmon|l) against P(bass|l). For any particular l, the probability of error is Pr[error|l] = P(bass|l) if we decide salmon, and P(salmon|l) if we decide bass. The MAP rule picks the class with the larger posterior, so it incurs the smaller of these two errors. Thus the MAP classifier is optimal for each individual l!

42 Justification for the MAP Classifier We want to minimize the error not just for one l; we really want to minimize the average error over all l: Pr[error] = ∫ p(error, l) dl = ∫ Pr[error|l] p(l) dl. If Pr[error|l] is as small as possible for every l, the integral is as small as possible. But the Bayes (MAP) decision rule makes Pr[error|l] as small as possible. Thus the MAP classifier minimizes the probability of error!

43 Today Bayesian Decision theory Multiple Classes General loss functions Multivariate Normal Random Variable Classifiers Discriminant Functions 43

44 More General Case Let's generalize a little bit. Have more than one feature: x = [x1, x2, ..., xd]. Have more than 2 classes: {c1, c2, ..., cm}.

45 More General Case As before, for each j we have: p(x|c_j) is the likelihood of observation x given that the true class is c_j; P(c_j) is the prior probability of class c_j; P(c_j|x) is the posterior probability of class c_j given that we observed the data x. Evidence, or probability density for the data: p(x) = Σ_{j=1..m} p(x|c_j) P(c_j).

46 Minimum Error Rate Classification We want to minimize the average probability of error: Pr[error] = ∫ p(error, x) dx = ∫ Pr[error|x] p(x) dx, so we need to make Pr[error|x] as small as possible for every x. If we decide class c_i, then Pr[error|x] = 1 - P(c_i|x). Pr[error|x] is minimized with the MAP classifier: decide on class c_i if P(c_i|x) > P(c_j|x) for all j ≠ i. Thus the MAP classifier is optimal if we want to minimize the probability of error.
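
A generic sketch of this multi-class MAP decision in Python; the likelihood and prior numbers below are hypothetical, only the argmax structure matters:

    import numpy as np

    def map_classify(likelihoods, priors):
        # likelihoods: p(x|c_j) evaluated at the observed x; priors: P(c_j)
        unnormalized = likelihoods * priors      # proportional to the posteriors P(c_j|x)
        return int(np.argmax(unnormalized))      # index of the MAP class

    # hypothetical numbers for a 3-class problem
    print(map_classify(np.array([0.05, 0.20, 0.10]), np.array([0.5, 0.2, 0.3])))   # class 1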

47 General Bayesian Decision Theory In close cases we may want to refuse to make a decision (let a human expert handle the tough case), so we allow actions {α1, α2, ..., αk}. Suppose some mistakes are more costly than others (classifying a benign tumor as cancer is not as bad as classifying cancer as a benign tumor). Allow loss functions λ(α_i|c_j) describing the loss incurred when taking action α_i when the true class is c_j.

48 Conditional Risk Suppose we observe x and wish to take action α_i. If the true class is c_j, by definition we incur the loss λ(α_i|c_j). The probability that the true class is c_j after observing x is P(c_j|x). The expected loss associated with taking action α_i is called the conditional risk, and it is R(α_i|x) = Σ_{j=1..m} λ(α_i|c_j) P(c_j|x).

49 Conditional Risk In R(α_i|x) = Σ_{j=1..m} λ(α_i|c_j) P(c_j|x), the sum is over disjoint events (the different classes); P(c_j|x) is the probability of class c_j given observation x; each term λ(α_i|c_j) P(c_j|x) is the part of the overall penalty which comes from the event that the true class is c_j; and R(α_i|x) is the penalty for taking action α_i if we observe x. (figure: classes c_1, c_2, c_3, c_4 with losses λ(α_i|c_1), λ(α_i|c_2), λ(α_i|c_3), λ(α_i|c_4))

50 Example: Zero-One Loss Function Action α_i is the decision that the true class is c_i. Zero-one loss: λ(α_i|c_j) = 0 if i = j (no mistake) and 1 otherwise (mistake). Then R(α_i|x) = Σ_{j=1..m} λ(α_i|c_j) P(c_j|x) = Σ_{j≠i} P(c_j|x) = 1 - P(c_i|x) = Pr[error if we decide c_i]. R(α_i|x) is minimized by the MAP classifier: decide c_i if P(c_i|x) > P(c_j|x) for all j ≠ i. Thus the MAP classifier is the Bayes decision rule under the zero-one loss function.
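
A short numerical check (with hypothetical posteriors) that under the zero-one loss the conditional risk equals 1 - P(c_i|x), so minimizing risk picks the same class as the MAP classifier:

    import numpy as np

    posteriors = np.array([0.2, 0.5, 0.3])           # hypothetical P(c_j|x)

    # zero-one loss: lambda(alpha_i | c_j) = 0 if i == j, 1 otherwise
    loss = np.ones((3, 3)) - np.eye(3)

    risk = loss @ posteriors                          # R(alpha_i|x) = sum_j lambda(i|j) P(c_j|x)
    print(risk)                                       # equals 1 - posteriors
    print(np.argmin(risk) == np.argmax(posteriors))   # True: minimum risk = MAP decision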

51 Overall Risk A decision rule is a function α(x) which for every x specifies an action out of {α1, α2, ..., αk}. The average risk for α(x) is R(α) = ∫ R(α(x)|x) p(x) dx, and we need to make R(α(x)|x) as small as possible for every x. The Bayes decision rule chooses, for every x, the action which minimizes the conditional risk R(α_i|x) = Σ_{j=1..m} λ(α_i|c_j) P(c_j|x). The Bayes decision rule α(x) is optimal, i.e. it gives the minimum possible overall risk R*.

52 Bayes Risk: Example Salmon is more tasty and expensive than sea bass. Losses: λ_sb = λ(salmon|bass) = 2 (classify bass as salmon), λ_bs = λ(bass|salmon) = 1 (classify salmon as bass), λ_ss = λ_bb = 0 (no mistake, no loss). Likelihoods: p(l|salmon) = 1/√(2π) · exp(-(l-5)²/2), p(l|bass) = 1/(2√(2π)) · exp(-(l-10)²/(2·4)). Priors: P(salmon) = P(bass) = 1/2. Risk: R(α|l) = Σ_j λ(α|c_j) P(c_j|l) = λ(α|s) P(s|l) + λ(α|b) P(b|l). So R(salmon|l) = λ_ss P(s|l) + λ_sb P(b|l) = λ_sb P(b|l), and R(bass|l) = λ_bs P(s|l) + λ_bb P(b|l) = λ_bs P(s|l).

53 Bayes Risk: Example R(salmon|l) = λ_sb P(b|l) and R(bass|l) = λ_bs P(s|l). Bayes decision rule (optimal for our loss function): decide salmon if λ_sb P(b|l) < λ_bs P(s|l), else bass. Since P(b|l) = p(l|b) P(b) / p(l) and P(s|l) = p(l|s) P(s) / p(l), and the priors are equal, this is equivalent to λ_sb p(l|b) < λ_bs p(l|s).

54 Bayes Risk: Example Need to solve λ_sb p(l|b) < λ_bs p(l|s). Substituting the likelihoods and losses: 2 · 1/(2√(2π)) · exp(-(l-10)²/8) < 1/√(2π) · exp(-(l-5)²/2), i.e. exp(-(l-10)²/8) / exp(-(l-5)²/2) < 1. Taking logs: -(l-10)²/8 + (l-5)²/2 < ln(1) = 0, so 4(l-5)² - (l-10)² < 0, i.e. 3l² - 20l < 0, i.e. l < 20/3 ≈ 6.67. So the new decision boundary is at 6.67: classify as salmon below it and as sea bass above it.
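
A sketch of this minimum-risk rule in Python, with λ_sb = 2, λ_bs = 1 and equal priors as in the example; lengths just below and above 20/3 fall on opposite sides of the boundary:

    import math

    def p_salmon(l): return math.exp(-(l - 5) ** 2 / 2) / math.sqrt(2 * math.pi)
    def p_bass(l):   return math.exp(-(l - 10) ** 2 / 8) / (2 * math.sqrt(2 * math.pi))

    def risk_classify(l, lam_sb=2.0, lam_bs=1.0):
        # with equal priors, R(salmon|l) is proportional to lam_sb * p(l|bass)
        # and R(bass|l) to lam_bs * p(l|salmon); take the action of smaller risk
        return "salmon" if lam_sb * p_bass(l) < lam_bs * p_salmon(l) else "bass"

    print(20 / 3)                                   # ~6.67, the boundary derived above
    print(risk_classify(6.5), risk_classify(7.0))   # salmon bass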

55 Likelihood Ratio Rule In the two category case, use the likelihood ratio rule: p(x|c1) / p(x|c2) > [(λ12 - λ22) / (λ21 - λ11)] · P(c2) / P(c1). The left side is the likelihood ratio; the right side is a fixed number, independent of x. If the above inequality holds, decide c1; otherwise decide c2.

56 Discriminant Functions All decision rules have the same structure: at observation x, choose the class c_i such that g_i(x) > g_j(x) for all j ≠ i, where g_i is a discriminant function. ML decision rule: g_i(x) = p(x|c_i). MAP decision rule: g_i(x) = P(c_i|x). Bayes decision rule: g_i(x) = -R(c_i|x).

57 Discriminant Functions A classifier can be viewed as a network which computes m discriminant functions g_1(x), g_2(x), ..., g_m(x) from the features x_1, x_2, x_3, ..., x_d and selects the category corresponding to the largest discriminant. g_i(x) can be replaced with f(g_i(x)) for any monotonically increasing function f; the results will be unchanged.
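
A minimal sketch of this discriminant-function view: the classifier just evaluates every g_i(x) and takes the argmax. The linear discriminants below are hypothetical placeholders:

    import numpy as np

    def classify(x, discriminants):
        # evaluate every g_i(x) and return the index of the largest one
        return int(np.argmax([g(x) for g in discriminants]))

    # hypothetical linear discriminants g_i(x) = w_i . x + b_i on a 2D feature vector
    g = [lambda x: 1.0 * x[0] - 0.5 * x[1] + 0.1,
         lambda x: -0.3 * x[0] + 0.8 * x[1],
         lambda x: 0.2 * x[0] + 0.2 * x[1] - 0.4]

    print(classify(np.array([1.0, 2.0]), g))   # picks the class with the largest g_i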

58 Decision Regions The discriminant functions split the feature vector space X into decision regions (figure: regions labeled c_1, c_2, c_3; inside the region for class c_i we have g_i(x) = max{g_j(x)}).

59 Important Points If we know the probability distributions for the classes, we can design the optimal classifier. The definition of optimal depends on the chosen loss function. Under the minimum error rate (zero-one loss function): with no prior, the ML classifier is optimal; with a prior, the MAP classifier is optimal. Under a more general loss function, the general Bayes classifier is optimal.
