CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 3
1 CS434a/541a: Pattern Recognition. Prof. Olga Veksler. Lecture 3
2 Announcements Link to errata for the textbook Reading assignment Assignment 1 handed out, due Oct. 4 Please send me an email with your name and course number in the subject and, in the body: whether you are a graduate or an undergraduate student, your department, and, if undergraduate, the year you are in
3 Announcements Information session on "HOW TO APPLY FOR EXTERNAL AWARDS" Location: Social Science Centre, Room 036 Tuesday, September 1, 2004, from 4 pm to 5 pm (Lutfiyya)
4 Today Finish Matlab Introduction Course Roadmap Change in Notation (for consistency with the textbook) Conditional distributions (forgot to review) Bayesian Decision Theory Two-category classification Multiple-category classification Discriminant Functions
5 Course Road Map (figure: a spectrum from "a lot is known / easier" to "little is known / harder")
6 Bayesian Decision Theory Know the probability distribution of the categories Never happens in the real world Do not even need training data Can design the optimal classifier Example: a respected fish expert says that salmon length has distribution N(5,1) and sea bass length has distribution N(10,4)
7 ML and Bayesian Parameter Estimation The shape of the probability distribution is known Happens sometimes Have labeled training data (salmon, bass, salmon, salmon, ...) Need to estimate the parameters of the probability distribution from the training data Example: a respected fish expert says salmon length has distribution N(µ1, σ1²) and sea bass length has distribution N(µ2, σ2²) Need to estimate the parameters µ1, σ1, µ2, σ2 Then can use the methods from Bayesian decision theory
8 Linear Discriminant Functions and Neural Nets No probability distribution (no shape or parameters are known) Have labeled data (salmon, bass, salmon, salmon, ...) The shape of the discriminant function is known (e.g. a linear discriminant function separating salmon from bass in the length–lightness feature space) Need to estimate the parameters of the discriminant function (the parameters of the line, in the case of a linear discriminant)
9 Non-Parametric Methods Neither the probability distribution nor the discriminant function is known Happens quite often All we have is labeled data (salmon, bass, salmon, salmon, ...) Estimate the probability distribution from the labeled data
10 Unsupervised Learning and Clustering Data is not labeled Happens quite often 1. Estimate the probability distribution from the unlabeled data 2. Cluster the data
11 Course Road Map 1. Bayesian decision theory (rare case): know the probability distribution of the categories, do not even need training data, can design the optimal classifier 2. ML and Bayesian parameter estimation: need to estimate the parameters of the probability distribution, need training data 3. Non-parametric methods: no probability distribution, labeled data 4. Linear discriminant functions and neural nets: the shape of the discriminant functions is known, need to estimate the parameters of the discriminant functions 5. Unsupervised learning and clustering: no probability distribution and unlabeled data
12 Notation Change (for consistency with the textbook) Pr[A] — probability of event A P(x) — probability mass function of discrete r.v. X p(x) — probability density function of continuous r.v. X p(x,y) — joint probability density of r.v.s X and Y p(x|y) — conditional density of X given Y P(x|y) — conditional mass of X given Y
13 More on Probability For events A and B, we have defined conditional probability Pr(A|B) = Pr(A ∩ B) / Pr(B) Law of total probability: Pr(A) = Σ_{k=1..n} Pr(A|B_k) Pr(B_k) Bayes rule: Pr(B_i|A) = Pr(A|B_i) Pr(B_i) / Σ_{k=1..n} Pr(A|B_k) Pr(B_k) Usually we model with random variables, not events. Need equivalents of these laws for mass and density functions (we could go from random variables back to events, but that is time consuming)
14 Conditional Mass Function: Discrete RV For a discrete RV nothing is new, because the mass function really is a probability law Define the conditional mass function of X given Y = y by P(x|y) = P(x,y) / P(y) (y is fixed) This is a probability mass function because Σ_x P(x|y) = Σ_x P(x,y) / P(y) = P(y) / P(y) = 1 This is really nothing new because P(x|y) = P(x,y) / P(y) = Pr[X = x, Y = y] / Pr[Y = y] = Pr[X = x | Y = y]
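The definition above can be checked numerically. Below is a minimal Python sketch (the course itself uses Matlab); the joint table is hypothetical, with numbers chosen to agree with the cats-and-dogs slide later in the lecture, and `marginal` and `cond` are illustrative helper names, not anything from the course.

```python
# Hypothetical joint mass P(x, y) over ear size and animal.
joint = {
    ("small", "dog"): 0.05, ("large", "dog"): 0.45,
    ("small", "cat"): 0.40, ("large", "cat"): 0.10,
}

def marginal(y):
    # P(y) = sum over x of P(x, y)
    return sum(p for (_, y2), p in joint.items() if y2 == y)

def cond(x, y):
    # conditional mass P(x | y) = P(x, y) / P(y)
    return joint[(x, y)] / marginal(y)

# For fixed y, the conditional mass is a genuine mass function: it sums to 1.
total = cond("small", "dog") + cond("large", "dog")
print(cond("large", "dog"), total)
```

Running this shows P(large ears | dog) = 0.9, and the conditional masses for a fixed y sum to 1, as the slide argues.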
15 Conditional Mass Function: Bayes Rule Law of total probability: P(x) = Σ_y P(x,y) = Σ_y P(x|y) P(y) Bayes rule: P(y|x) = P(y,x) / P(x) = P(x|y) P(y) / Σ_y' P(x|y') P(y')
16 Conditional Density Function: Continuous RV Does it make sense to talk about the conditional density p(x|y) if Y is a continuous random variable? After all, Pr[Y = y] = 0, so we will never see Y = y in practice Measurements have limited accuracy. Can interpret observation y as an observation in the interval [y-ε, y+ε], and observation x as an observation in the interval [x-ε, x+ε]
17 Conditional Density Function: Continuous RV Let B(x) denote the interval [x-ε, x+ε] Pr[X ∈ B(x)] = ∫_{x-ε}^{x+ε} p(x) dx ≈ 2ε p(x) Similarly Pr[Y ∈ B(y)] ≈ 2ε p(y) and Pr[X ∈ B(x), Y ∈ B(y)] ≈ 4ε² p(x,y) Thus we should have p(x|y) ≈ Pr[X ∈ B(x) | Y ∈ B(y)] / (2ε) = Pr[X ∈ B(x), Y ∈ B(y)] / (2ε Pr[Y ∈ B(y)]) which simplifies to p(x|y) ≈ p(x,y) / p(y)
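The key approximation Pr[X ∈ B(x)] ≈ 2ε p(x) can be checked numerically. Here is a small Python sketch (not from the lecture): it compares a midpoint-rule integral of the standard normal density over [x-ε, x+ε] against 2ε p(x); `prob_interval` is an illustrative name of my own.

```python
import math

def p(x):
    # standard normal N(0,1) density, assumed here just as a concrete example
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(x, eps, n=10000):
    # crude midpoint-rule approximation of Pr[X in [x-eps, x+eps]]
    h = 2 * eps / n
    return sum(p(x - eps + (i + 0.5) * h) for i in range(n)) * h

x, eps = 0.5, 1e-3
exact = prob_interval(x, eps)
approx = 2 * eps * p(x)
print(abs(exact - approx) < 1e-8)   # True: the approximation is tight for small ε
```

The error of the 2ε p(x) approximation shrinks like ε³, which is why the slide can treat it as an equality in the limit.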
18 Conditional Density Function: Continuous RV Define the conditional density function of X given Y = y by p(x|y) = p(x,y) / p(y) (y is fixed) This is a probability density function because ∫ p(x|y) dx = ∫ p(x,y) / p(y) dx = p(y) / p(y) = 1 Law of total probability: p(x) = ∫ p(x,y) dy = ∫ p(x|y) p(y) dy
19 Conditional Density Function: Bayes Rule Bayes rule: p(y|x) = p(y,x) / p(x) = p(x|y) p(y) / ∫ p(x|y') p(y') dy'
20 Mixed Discrete and Continuous X discrete, Y continuous — Bayes rule: P(x|y) = p(y|x) P(x) / p(y) X continuous, Y discrete — Bayes rule: p(x|y) = P(y|x) p(x) / P(y)
21 Bayesian Decision Theory Know the probability distribution of the categories Almost never the case in real life! Nevertheless useful, since the other cases can be reduced to this one after some work Do not even need training data Can design the optimal classifier
22 Cats and Dogs Suppose we have these conditional probability mass functions for cats and dogs: P(small ears | dog) = 0.1, P(large ears | dog) = 0.9 P(small ears | cat) = 0.8, P(large ears | cat) = 0.2 Observe an animal with large ears. Dog or cat? Makes sense to say dog, because the probability of observing large ears on a dog is much larger than the probability of observing large ears on a cat: Pr[large ears | dog] = 0.9 > 0.2 = Pr[large ears | cat] We choose the event of larger probability, i.e. the maximum likelihood event
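The maximum likelihood decision on this slide is a one-liner in code. A minimal Python sketch (the course uses Matlab; `ml_class` is an illustrative name, not from the lecture), with the slide's own numbers:

```python
# Conditional mass functions from the slide: P(observation | class).
cond_mass = {
    "dog": {"small": 0.1, "large": 0.9},
    "cat": {"small": 0.8, "large": 0.2},
}

def ml_class(observation):
    # maximum likelihood: pick the class with the largest P(observation | class)
    return max(cond_mass, key=lambda c: cond_mass[c][observation])

print(ml_class("large"))   # dog  (0.9 > 0.2)
print(ml_class("small"))   # cat  (0.8 > 0.1)
```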
23 Example: Fish Sorting A respected fish expert says that Salmon length has distribution N(5,1) Sea bass length has distribution N(10,4) Recall that if an r.v. is N(µ,σ²), then its density is p(l) = 1/(√(2π) σ) exp(-(l-µ)²/(2σ²)) Thus the class conditional densities are p(l|salmon) = 1/√(2π) exp(-(l-5)²/2) p(l|bass) = 1/(2√(2π)) exp(-(l-10)²/(2·4))
24 Likelihood Function Thus the class conditional densities are p(l|salmon) = 1/√(2π) exp(-(l-5)²/2) and p(l|bass) = 1/(2√(2π)) exp(-(l-10)²/8) Fix the length l, let the fish class vary. Then we get the likelihood function (it is not a density and not a probability mass): p(l = fixed | class) = 1/√(2π) exp(-(l-5)²/2) if class = salmon, 1/(2√(2π)) exp(-(l-10)²/8) if class = bass
25 Likelihood vs. Class Conditional Density (figure: p(l|class) plotted against length, with the observation at length 7 marked) Suppose a fish has length 7. How do we classify it?
26 ML (Maximum Likelihood) Classifier We would like to choose salmon if Pr[length = 7 | salmon] > Pr[length = 7 | bass] However, since length is a continuous r.v., Pr[length = 7 | salmon] = Pr[length = 7 | bass] = 0 Instead, we choose the class which maximizes the likelihood: p(l|salmon) = 1/√(2π) exp(-(l-5)²/2), p(l|bass) = 1/(2√(2π)) exp(-(l-10)²/(2·4)) ML classifier: for an observed l, decide salmon if p(l|salmon) > p(l|bass), else decide bass In words: if p(l|salmon) > p(l|bass), classify as salmon, else classify as bass
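The ML classifier for the fish example is easy to sketch. A minimal Python version (the course uses Matlab; `gauss` and `ml_classify` are illustrative names), using the slide's N(5,1) and N(10,4) densities:

```python
import math

def gauss(l, mu, var):
    # N(mu, var) density evaluated at l
    return math.exp(-(l - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ml_classify(l):
    # ML rule: pick the class with the larger likelihood at the observed length
    return "salmon" if gauss(l, 5, 1) > gauss(l, 10, 4) else "bass"

print(ml_classify(6.5))   # salmon
print(ml_classify(7.0))   # bass
```

At l = 7 the bass likelihood is the larger one, matching the slide's interval justification for choosing bass.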
27 Interval Justification p(7|bass) > p(7|salmon), so Pr[l ∈ B(7) | bass] ≈ 2ε p(7|bass) > 2ε p(7|salmon) ≈ Pr[l ∈ B(7) | salmon] Thus we choose the class (bass) which is more likely to have given the observation
28 Decision Boundary (figure: classify as salmon for lengths below 6.70, as sea bass above)
29 Priors A prior comes from prior knowledge; no data has been seen yet Suppose a fish expert says: in the fall, there are twice as many salmon as sea bass Prior for our fish sorting problem: P(salmon) = 2/3, P(bass) = 1/3 With the addition of the prior to our model, how should we classify a fish of length 7?
30 How Does the Prior Change the Decision Boundary? Without priors: salmon below 6.70, sea bass above How should this change with the prior P(salmon) = 2/3, P(bass) = 1/3?
31 Bayes Decision Rule 1. Have likelihood functions p(length|salmon) and p(length|bass) 2. Have priors P(salmon) and P(bass) Question: having observed a fish of a certain length, do we classify it as salmon or bass? Natural idea: salmon if P(salmon|length) > P(bass|length), bass if P(bass|length) > P(salmon|length)
32 Posterior P(salmon|length) and P(bass|length) are called posterior distributions, because the data (length) was revealed (post data) How to compute the posteriors? Not obvious. From Bayes rule: P(salmon|length) = p(length|salmon) P(salmon) / p(length) Similarly: P(bass|length) = p(length|bass) P(bass) / p(length)
33 MAP (Maximum A Posteriori) Classifier P(salmon|length) >?< P(bass|length) ⇔ p(length|salmon) P(salmon) / p(length) >?< p(length|bass) P(bass) / p(length) ⇔ p(length|salmon) P(salmon) >?< p(length|bass) P(bass) (salmon if >, bass if <)
34 Back to the Fish Sorting Example Likelihoods: p(l|salmon) = 1/√(2π) exp(-(l-5)²/2), p(l|bass) = 1/(2√(2π)) exp(-(l-10)²/8) Priors: P(salmon) = 2/3, P(bass) = 1/3 Solve the inequality 2/3 · 1/√(2π) exp(-(l-5)²/2) > 1/3 · 1/(2√(2π)) exp(-(l-10)²/8) This gives the new decision boundary 7.18: salmon below, sea bass above The new decision boundary makes sense, since we expect to see more salmon
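Extending the earlier ML sketch to a MAP classifier only requires multiplying each likelihood by its prior. A minimal Python sketch (illustrative names, not from the lecture), with the slide's priors P(salmon) = 2/3, P(bass) = 1/3:

```python
import math

def gauss(l, mu, var):
    # N(mu, var) density evaluated at l
    return math.exp(-(l - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior = {"salmon": 2 / 3, "bass": 1 / 3}
params = {"salmon": (5, 1), "bass": (10, 4)}   # (mean, variance) per class

def map_classify(l):
    # MAP rule: maximize likelihood * prior (the evidence p(l) cancels)
    score = {c: gauss(l, *params[c]) * prior[c] for c in prior}
    return max(score, key=score.get)

print(map_classify(7.0))   # salmon: the 2/3 prior moves the boundary to the right
```

A fish of length 7 is now classified as salmon (7 is below the new 7.18 boundary), whereas the ML rule without priors called it bass.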
35 Prior P(s) = 2/3 and P(b) = 1/3 vs. prior P(s) = 0.999 and P(b) = 0.001 (figure: the salmon region grows as the salmon prior grows)
36 Likelihood vs. Posteriors Likelihood p(l|fish class): a density with respect to length; the area under each curve is 1 Posterior P(fish class|l): a mass function with respect to fish class, so for each l, P(salmon|l) + P(bass|l) = 1
37 More on the Posterior posterior (our goal): P(c|l) = p(l|c) P(c) / p(l), with likelihood p(l|c) and prior P(c) given The normalizing factor p(l) is often not even needed for classification, since p(l) does not depend on the class c. If we do need it, from the law of total probability: p(l) = p(l|salmon) P(salmon) + p(l|bass) P(bass) Notice this formula consists of likelihoods and priors, which are given
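Computing the full posterior, evidence included, is a small extension of the MAP score. A minimal Python sketch (illustrative names), which also checks the property from the previous slide that the posteriors sum to 1 for any l:

```python
import math

def gauss(l, mu, var):
    # N(mu, var) density evaluated at l
    return math.exp(-(l - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior = {"salmon": 2 / 3, "bass": 1 / 3}
params = {"salmon": (5, 1), "bass": (10, 4)}

def posterior(l):
    # P(c | l) = p(l | c) P(c) / p(l), with p(l) from the law of total probability
    joint = {c: gauss(l, *params[c]) * prior[c] for c in prior}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

post = posterior(7.0)
print(round(post["salmon"] + post["bass"], 10))   # 1.0: posteriors sum to 1
```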
38 More on the Posterior P(c|l) = p(l|c) P(c) / p(l), where c is the cause (class) and l is the effect (length) If cause c is present, it is easy to determine the probability of effect l with the likelihood p(l|c) Usually we observe the effect l without knowing the cause c. It is hard to determine the cause c directly, because there may be several causes which could produce the same effect l Bayes rule makes it easy to determine the posterior P(c|l), if we know the likelihood p(l|c) and the prior P(c)
39 More on Priors A prior comes from prior knowledge; no data has been seen yet If there is a reliable source of prior knowledge, it should be used Some problems cannot even be solved reliably without a good prior However, the prior alone is not enough; we still need the likelihood P(salmon) = 2/3, P(sea bass) = 1/3 If I don't let you see the data, but ask you to guess, will you choose salmon or sea bass?
40 More on the MAP Classifier P(c|l) = p(l|c) P(c) / p(l) (posterior = likelihood × prior / evidence) Do not care about p(l) when maximizing P(c|l): P(c|l) ∝ p(l|c) P(c) If P(salmon) = P(bass) (uniform prior), the MAP classifier becomes the ML classifier: P(c|l) ∝ p(l|c) If for some observation l, p(l|salmon) = p(l|bass), then this observation is uninformative and the decision is based solely on the prior: P(c|l) ∝ P(c)
41 Justification for the MAP Classifier Let's compute the probability of error for the MAP decision P(salmon|l) >?< P(bass|l): For any particular l, the probability of error is Pr[error|l] = P(bass|l) if we decide salmon, P(salmon|l) if we decide bass Thus the MAP classifier is optimal for each individual l!
42 Justification for the MAP Classifier We are interested in minimizing the error not just for one l; we really want to minimize the average error over all l: Pr[error] = ∫ p(error,l) dl = ∫ Pr[error|l] p(l) dl If Pr[error|l] is as small as possible, the integral is as small as possible But the MAP classifier makes Pr[error|l] as small as possible Thus the MAP classifier minimizes the probability of error!
43 Today Bayesian decision theory Multiple classes General loss functions Multivariate normal random variables Classifiers Discriminant functions
44 More General Case Let's generalize a little bit: Have more than one feature: x = [x1, x2, ..., xd] Have more than 2 classes: {c1, c2, ..., cm}
45 More General Case As before, for each j we have p(x|cj), the likelihood of observation x given that the true class is cj P(cj), the prior probability of class cj P(cj|x), the posterior probability of class cj given that we observed data x Evidence, or probability density for the data: p(x) = Σ_{j=1..m} p(x|cj) P(cj)
46 Minimum Error Rate Classification Want to minimize the average probability of error: Pr[error] = ∫ p(error,x) dx = ∫ Pr[error|x] p(x) dx (need to make Pr[error|x] as small as possible) Pr[error|x] = 1 - P(ci|x) if we decide class ci Pr[error|x] is minimized by the MAP classifier: decide ci if P(ci|x) > P(cj|x) for all j ≠ i The MAP classifier is optimal if we want to minimize the probability of error
47 General Bayesian Decision Theory In close cases we may want to refuse to make a decision (let a human expert handle the tough cases): allow actions {α1, α2, ..., αk} Suppose some mistakes are more costly than others (classifying a benign tumor as cancer is not as bad as classifying cancer as a benign tumor): allow loss functions λ(αi|cj) describing the loss incurred when taking action αi when the true class is cj
48 Conditional Risk Suppose we observe x and wish to take action αi If the true class is cj, by definition we incur loss λ(αi|cj) The probability that the true class is cj after observing x is P(cj|x) The expected loss associated with taking action αi is called the conditional risk, and it is: R(αi|x) = Σ_{j=1..m} λ(αi|cj) P(cj|x)
49 Conditional Risk R(αi|x) = Σ_{j=1..m} λ(αi|cj) P(cj|x) is the penalty for taking action αi if we observe x: a sum over disjoint events (the different classes), where each term λ(αi|cj) P(cj|x) is the part of the overall penalty which comes from the event that the true class is cj
50 Example: Zero-One Loss Function The action αi is the decision that the true class is ci: λ(αi|cj) = 0 if i = j (no mistake), 1 otherwise (mistake) R(αi|x) = Σ_{j=1..m} λ(αi|cj) P(cj|x) = Σ_{j≠i} P(cj|x) = 1 - P(ci|x) = Pr[error if we decide ci] Thus the MAP classifier optimizes R(αi|x): the MAP classifier is the Bayes decision rule under the zero-one loss function
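The identity on this slide, that the zero-one conditional risk equals 1 - P(ci|x), is easy to verify in code. A minimal Python sketch with hypothetical posteriors at some fixed x (the class names and numbers are made up for illustration):

```python
# Hypothetical posteriors P(c | x) at one observation x.
posterior = {"c1": 0.5, "c2": 0.3, "c3": 0.2}

def risk_zero_one(decide):
    # conditional risk under zero-one loss: sum of posteriors of all other classes
    return sum(p for c, p in posterior.items() if c != decide)

risks = {c: risk_zero_one(c) for c in posterior}
print(min(risks, key=risks.get))   # c1: smallest risk = largest posterior (MAP)
```

For every class, risk = 1 - posterior, so minimizing risk and maximizing the posterior pick the same class, exactly as the slide claims.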
51 Overall Risk The decision rule is a function α(x) which for every x specifies an action out of {α1, α2, ..., αk} The average risk for α(x): R = ∫ R(α(x)|x) p(x) dx (need to make R(α(x)|x) as small as possible for every x) The Bayes decision rule: for every x, α(x) is the action which minimizes the conditional risk R(αi|x) = Σ_{j=1..m} λ(αi|cj) P(cj|x) The Bayes decision rule α(x) is optimal, i.e. gives the minimum possible overall risk R*
52 Bayes Risk: Example Salmon is more tasty and expensive than sea bass: λ_sb = λ(salmon|bass) = 2 (classify bass as salmon) λ_bs = λ(bass|salmon) = 1 (classify salmon as bass) λ_ss = λ_bb = 0 (no mistake, no loss) Likelihoods: p(l|salmon) = 1/√(2π) exp(-(l-5)²/2), p(l|bass) = 1/(2√(2π)) exp(-(l-10)²/(2·4)) Priors: P(salmon) = P(bass) = 1/2 Risk: R(α|l) = λ(α|s) P(s|l) + λ(α|b) P(b|l) R(salmon|l) = λ_ss P(s|l) + λ_sb P(b|l) = λ_sb P(b|l) R(bass|l) = λ_bs P(s|l) + λ_bb P(b|l) = λ_bs P(s|l)
53 Bayes Risk: Example R(salmon|l) = λ_sb P(b|l), R(bass|l) = λ_bs P(s|l) Bayes decision rule (optimal for our loss function): decide salmon if λ_sb P(b|l) < λ_bs P(s|l) Need to solve P(b|l) λ_sb < P(s|l) λ_bs Using P(s|l) = p(l|s) P(s) / p(l) and P(b|l) = p(l|b) P(b) / p(l), and since the priors are equal, this is equivalent to p(l|b) λ_sb < p(l|s) λ_bs
54 Bayes Risk: Example Need to solve p(l|b) λ_sb < p(l|s) λ_bs Substituting the likelihoods and losses: 2 · 1/(2√(2π)) exp(-(l-10)²/8) < 1 · 1/√(2π) exp(-(l-5)²/2) exp(-(l-10)²/8) < exp(-(l-5)²/2) -(l-10)²/8 < -(l-5)²/2 4(l-5)² - (l-10)² < 0 3l² - 20l < 0 l < 20/3 ≈ 6.67 New decision boundary: salmon below, sea bass above
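The boundary the slide derives algebraically can also be found numerically. A minimal Python sketch (illustrative names; losses and densities taken from the slides): it bisects for the length where the two loss-weighted likelihoods are equal, under the slide's equal priors.

```python
import math

def gauss(l, mu, var):
    # N(mu, var) density evaluated at l
    return math.exp(-(l - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

lam_sb, lam_bs = 2.0, 1.0   # losses: bass->salmon costs 2, salmon->bass costs 1

def decide(l):
    # equal priors, so compare loss-weighted likelihoods directly
    return "salmon" if lam_sb * gauss(l, 10, 4) < lam_bs * gauss(l, 5, 1) else "bass"

# Bisect for the boundary where the two weighted likelihoods cross.
lo, hi = 5.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    if decide(mid) == "salmon":
        lo = mid
    else:
        hi = mid
print(round(lo, 3))   # 6.667 = 20/3, matching the algebra on the slide
```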
55 Likelihood Ratio Rule In the 2-category case, use the likelihood ratio rule: decide c1 if p(x|c1) / p(x|c2) > [(λ12 - λ22) / (λ21 - λ11)] · P(c2) / P(c1) The left side is the likelihood ratio; the right side is a fixed number, independent of x If the above inequality holds, decide c1; otherwise decide c2
56 Discriminant Functions All decision rules have the same structure: at observation x, choose the class ci s.t. gi(x) > gj(x) for all j ≠ i, where gi(x) is a discriminant function ML decision rule: gi(x) = p(x|ci) MAP decision rule: gi(x) = P(ci|x) Bayes decision rule: gi(x) = -R(αi|x)
57 Discriminant Functions The classifier can be viewed as a network which computes m discriminant functions g1(x), g2(x), ..., gm(x) from the features x1, x2, ..., xd and selects the category corresponding to the largest discriminant gi(x) can be replaced with any monotonically increasing function of gi(x); the results will be unchanged
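The claim that a monotonically increasing transform leaves the decisions unchanged can be checked directly. A minimal Python sketch (illustrative names; the fish densities and priors come from the earlier slides): it compares the MAP discriminant gi(x) = p(x|ci) P(ci) against its logarithm, a monotone transform, on a few observations.

```python
import math

def gauss(l, mu, var):
    # N(mu, var) density evaluated at l
    return math.exp(-(l - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior = {"salmon": 2 / 3, "bass": 1 / 3}
params = {"salmon": (5, 1), "bass": (10, 4)}

def g_map(c, l):
    # MAP discriminant: likelihood * prior (unnormalized posterior)
    return gauss(l, *params[c]) * prior[c]

def g_log(c, l):
    # monotone transform of g_map: its logarithm, expanded in closed form
    mu, var = params[c]
    return (-(l - mu) ** 2 / (2 * var)
            - 0.5 * math.log(2 * math.pi * var)
            + math.log(prior[c]))

for l in (4.0, 7.0, 9.0):
    a = max(prior, key=lambda c: g_map(c, l))
    b = max(prior, key=lambda c: g_log(c, l))
    assert a == b   # the log transform never changes the decision
print("decisions agree")
```

Working with log-discriminants is the usual practical choice, since products of small densities underflow while their logs do not.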
58 Decision Regions Discriminant functions split the feature vector space X into decision regions (figure: regions for c1, c2, c3; in each region the decided class is the one with g(x) = max_i {gi(x)})
59 Important Points If we know the probability distributions of the classes, we can design the optimal classifier The definition of "optimal" depends on the chosen loss function Under the minimum error rate (zero-one loss function): No prior: the ML classifier is optimal Have a prior: the MAP classifier is optimal More general loss function: the general Bayes classifier is optimal
Nearest Neighbor Pattern Classification T. M. Cover and P. E. Hart May 15, 2018 1 The Intro The nearest neighbor algorithm/rule (NN) is the simplest nonparametric decisions procedure, that assigns to unclassified
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?
Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what
More informationBayes Rule for Minimizing Risk
Bayes Rule for Minimizing Risk Dennis Lee April 1, 014 Introduction In class we discussed Bayes rule for minimizing the probability of error. Our goal is to generalize this rule to minimize risk instead
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLecture 2. Bayes Decision Theory
Lecture 2. Bayes Decision Theory Prof. Alan Yuille Spring 2014 Outline 1. Bayes Decision Theory 2. Empirical risk 3. Memorization & Generalization; Advanced topics 1 How to make decisions in the presence
More informationLecture 1: Bayesian Framework Basics
Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationDensity Estimation: ML, MAP, Bayesian estimation
Density Estimation: ML, MAP, Bayesian estimation CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Maximum-Likelihood Estimation Maximum
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationBayes Decision Theory - I
Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationBayesian decision making
Bayesian decision making Václav Hlaváč Czech Technical University in Prague Czech Institute of Informatics, Robotics and Cybernetics 166 36 Prague 6, Jugoslávských partyzánů 1580/3, Czech Republic http://people.ciirc.cvut.cz/hlavac,
More information1 Probabilities. 1.1 Basics 1 PROBABILITIES
1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability
More informationClassification: The rest of the story
U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationCS 188: Artificial Intelligence Spring Today
CS 188: Artificial Intelligence Spring 2006 Lecture 9: Naïve Bayes 2/14/2006 Dan Klein UC Berkeley Many slides from either Stuart Russell or Andrew Moore Bayes rule Today Expectations and utilities Naïve
More informationMachine Learning Lecture 2
Machine Perceptual Learning and Sensory Summer Augmented 15 Computing Many slides adapted from B. Schiele Machine Learning Lecture 2 Probability Density Estimation 16.04.2015 Bastian Leibe RWTH Aachen
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationCOS402- Artificial Intelligence Fall Lecture 10: Bayesian Networks & Exact Inference
COS402- Artificial Intelligence Fall 2015 Lecture 10: Bayesian Networks & Exact Inference Outline Logical inference and probabilistic inference Independence and conditional independence Bayes Nets Semantics
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 3: Probability, Bayes Theorem, and Bayes Classification Peter Belhumeur Computer Science Columbia University Probability Should you play this game? Game: A fair
More informationNon-parametric Methods
Non-parametric Methods Machine Learning Alireza Ghane Non-Parametric Methods Alireza Ghane / Torsten Möller 1 Outline Machine Learning: What, Why, and How? Curve Fitting: (e.g.) Regression and Model Selection
More informationCOM336: Neural Computing
COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk
More informationMachine Learning Lecture 2
Announcements Machine Learning Lecture 2 Eceptional number of lecture participants this year Current count: 449 participants This is very nice, but it stretches our resources to their limits Probability
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationLecture 23 Maximum Likelihood Estimation and Bayesian Inference
Lecture 23 Maximum Likelihood Estimation and Bayesian Inference Thais Paiva STA 111 - Summer 2013 Term II August 7, 2013 1 / 31 Thais Paiva STA 111 - Summer 2013 Term II Lecture 23, 08/07/2013 Lecture
More informationMachine Learning (CS 567) Lecture 2
Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationUniversity of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries
University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent
More informationComputer Vision Group Prof. Daniel Cremers. 3. Regression
Prof. Daniel Cremers 3. Regression Categories of Learning (Rep.) Learnin g Unsupervise d Learning Clustering, density estimation Supervised Learning learning from a training data set, inference on the
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More informationBayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington
Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The
More informationMidterm, Fall 2003
5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are
More information7 Gaussian Discriminant Analysis (including QDA and LDA)
36 Jonathan Richard Shewchuk 7 Gaussian Discriminant Analysis (including QDA and LDA) GAUSSIAN DISCRIMINANT ANALYSIS Fundamental assumption: each class comes from normal distribution (Gaussian). X N(µ,
More informationProblem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30
Problem Set MAS 6J/1.16J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain
More information1 Probabilities. 1.1 Basics 1 PROBABILITIES
1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability
More informationClassification & Information Theory Lecture #8
Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing
More information