Image Processing 1 (IP1) Bildverarbeitung 1

Size: px

Start display at page:

Download "Image Processing 1 (IP1) Bildverarbeitung 1"

Dinah Fields
5 years ago
Views:

1 MIN-Fakultät Fachbereich Informatik Arbeitsbereich SAV/BV (KOGS) Image Processing 1 (IP1) Bildverarbeitung 1 Lecture 16 Decision Theory Winter Semester 014/15 Dr. Benjamin Seppke Prof. Siegfried SKehl

2 Sta$s$cal Decision Theory Genera$ng decision func$ons from a sta$s$cal characteriza$on of classes (as opposed to a characteriza$on by prototypes) Advantages: 1. The classificakon scheme may be designed to saksfy an objeckve opkmality criterion: OpKmal decisions minimize the probability of error.. StaKsKcal descripkons may be much more compact than a colleckon of prototypes. 3. Some phenomena may only be adequately described using stakskcs, e.g. noise.

3 Eample: Medical Screening I Health test based on some measurement (e.g. ECG evaluakon) It is known that every 10th person is sick (prior probability): ω 1 class of healthy people P(ω 1 ) = 9/10 ω class of sick people P(ω ) = 1/10 Task 1: Classify without taking any measurements (to save money) Decision rule 1a: Classify every 10th person as sick P(error) = P(decide sick if healthy) + P(decide healthy if sick) = 1/10 9/10 + 9/10 1/10 = 0.18 Decision rule 1b: Classify all persons as healthy P(error) = P(decide healthy if sick) = 1/10 = 0.1 Decision rule 1b is bexer because it gives lower probability of error Decision rule 1b is opkmal because no other decision rule can give a lower probability of error (try "every n- th" in 1a and minimize over n) 3

neg P(e ) = P(error given ) = P(ω ω' ) = 1 - P(ω ) where ω' is the class assigned to by the decision rule.

4 Eample: Medical Screening II Task : Classify a[er taking a measurement Assume that the stakskcs of prototypes are given as p( ω i ), i = 1, Person No. indicakon neg neg pos neg pos neg P(e ) = P(error given ) = P(ω ω' ) = 1 - P(ω ) where ω' is the class assigned to by the decision rule. How do we get the "posterior" probabilikes P(ω i )? p( ω i ) ω : sick P(e ) is minimized by choosing the class which maimizes P(ω ). Hence g i () = P(ω i ) are discriminant func$ons. ω 1 : healthy

5 Eample: Medical Screening (3) The posterior probabilikes P(ω i ) can be computed from the "likelihood" p( ω i ) using Bayes formula: P( ω i ) = p ( ω i)p ω i p() = p ( ω i)p( ω i ) p( ω i )P ω i For the eample, using Bayes Formula, one could get: i p( ω i ) P(ω i ) decision boundary

General Framework for Bayes Classifica$on Sta$s$cal decision theory minimizes the probability of error for classifica$ons based on uncertain evidence ω 1... ω K P(ω k ) T = 1.

6 General Framework for Bayes Classifica$on Sta$s$cal decision theory minimizes the probability of error for classifica$ons based on uncertain evidence ω 1... ω K P(ω k ) T = 1... N p ω k P ω k K classes prior probability that an object of class k will be observed N- dimensional feature vector of an object condikonal probability ("likelihood") of observing given that the object belongs to class ω K condikonal probability ("posterior probability") that an object belongs to class ω K given is observed Bayes decision rule: Classify given evidence as class ω such that ω minimizes the probability of error P ω ω " g i à Choose ω which maimizes the posterior probability are discriminant funckons. = P ω i P( ω ) 6

7 Bayes - class Decisions If the decision is between classes ω 1 and ω, the decision rule can be simplified: Choose ω 1 if > P ω P ω 1 p ω 1 p ω p ω 1 p ω is called the "likelihood rako" Several alternakve forms are possible for a discriminant funckon: g ( ) = P( ω 1 ) P( ω ) g For eponenkal and Gaussian distribukons it is useful to take the logarithm: g( ) = log P ω 1 # " P ω $ & = log p ( ω 1 )P ω 1 # % " p ( ω )P ω = P ω 1 P ω $ & = log p ω 1 # % " p ω $ & log P ω # % " P ω 1 $ & % 7

8 Normal Distribu$ons Gaussian ("normal") mulkvariate distribukon: with: Σ = E # $ ( µ) T ( µ) % & µ N N covariance matri mean vector p ( ) = 1 ( π ) N Σ N e 1 ( µ ) T Σ 1 ( µ ) For decision problems, loci of points of constant density are intereskng. For Gaussian mulkvariate distribukons, these are hyperellipsoids: ( µ) T Σ 1 ( µ) = constant Eigenvectors of Σ determine direckons of principal aes of the ellipsoids, Eigenvalues determine lengths of the principal aes. d = ( µ) T Σ 1 ( µ) is called "squared Mahalanobis distance" of from. µ 1 8

9 Discriminant Func$on for Normal Distribu$ons General form: g i ( ) = log p ( ω i ) log P ω i ( ) For p ( ω i ) N ( µ i, Σ i ) : g i ( ) = 1 ( µ) T Σ 1 ( µ) N log( π ) 1 log Σ i irrelevant for decisions + log( P( ω i )) We consider the discriminant funckons for three intereskng special cases: univariate distribukon N=1 stakskcally independent, equal variance variables i equal covariance matrices Σ i = Σ 9

10 Univariate Distribu$on AssumpKon: p( ω i ) are univariate Gaussian distribukons. Eample: classes p( ω 1 ) = 1 e ( µ 1 ) σ 1 πσ 1 p( ω i ) P(ω i ) p( ω ) = 1 e ( µ ) σ πσ ω 1 ω Decision rule: g i ( ) ( ) = log P ω i = 1 σ i ( µ) 1 log σ i + log P( ω i ) 10

11 Sta$s$cally Independent, Equal Variance Variables In case of insufficient stakskcal data, variables are somekmes assumed to be stakskcally independent and of equal variance. Σ i = σ I g i ( ) = 1 σ i If P(ω i ) = 1/N, then the decision rule is equivalent to the minimum- distance classificakon rule. By epanding g i and dropping the T term, one gets the decision rule: g i which is linear in g i ( ) = 1 µ + log P ω i ( ) " µ T + µ T # µ $ % σ + log P ω i i ( ) = ( w i ) T + w i0 and can be wrixen as: The decision surface is composed of hyperplanes. ( ) 1 11

Equal Covariance Matrices If Σ i = Σ, the decision rule can be simplified: g i ( ) = 1 σ i µ T Σ 1 µ + log P( ω i ) By epanding the quadrakc form and dropping T Σ 1 decision rule which can (again) be

12 Equal Covariance Matrices If Σ i = Σ, the decision rule can be simplified: g i ( ) = 1 σ i µ T Σ 1 µ + log P( ω i ) By epanding the quadrakc form and dropping T Σ 1 decision rule which can (again) be wrixen as: one gets another linear g i ( ) = ( w i ) T + w i0 If the a- priori probabili$es are equal, the decision rule assigns to the class where the Mahalanobis distance to the mean is minimal. µ i 1 1

13 Es$ma$ng Probability Densi$es Let R be a region in feature space with volume V. Let k out of N samples lie in R. p ( ) d k N p ( )V R 3 R 1 p ( N ) k V rela$ve frequency of samples per volume A sequence of approimakons p n ( ) may be obtained by changing the volume V n as the number of samples n increases. Eamples: V n 1/ n Parzen Windows k n n adjust volume for k nearest neighbours Conditions for a converging sequence of estimates p n (): 1. limv n = 0 n. lim k n = n k 3. lim n n n = 0 13

14 Given: Es$ma$ng the Mean in a Univariate Normal Density p( µ) = N(µ, σ ) known normal probability density for ecept of unknown mean µ p(µ) = N(µ 0, σ 0 ) prior knowledge about µ: a normal density with known µ 0 and σ 0 X = { 1... n } samples drawn from p() Es$ma$on using Bayes Rule: p(µ X) = = α n p(x µ)p(µ) = α p( k µ)p(µ) p(x µ)p(µ)dµ n k=1 1 πσ e 1$ k µ ' & ) % σ ( k=1 1 πσ 0 e 1$ & % µ µ 0 σ 0 ' ) ( = α is scale factor independent of µ 1 πσ n e 1$ & % µ µ n σ n ' ) ( with µ n = " n nσ 0 1 $ nσ 0 +σ k # n k=1 % '+ & σ nσ 0 +σ µ 0 and σ n = σ 0σ nσ 0 +σ Best es$mate of mean µ axer observing n samples 14

Bayesian Decision Theory

Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian