Bayesian Decision Theory



Bayesian Decision Theory
- Assumes we know the probability distribution of the categories
- This is almost never the case in real life! Nevertheless it is useful, since other cases can be reduced to this one after some work
- Does not even need training data
- Lets us design the optimal classifier

Bayesian Decision Theory: Fish Example
- Each fish is in one of two states: sea bass or salmon
- Let ω denote the state of nature: ω = ω1 for sea bass, ω = ω2 for salmon
- The state of nature is unpredictable, so ω is a variable that must be described probabilistically
- If the catch produced as much salmon as sea bass, the next fish is equally likely to be sea bass or salmon
- Define P(ω1), the a priori probability that the next fish is sea bass, and P(ω2), the a priori probability that the next fish is salmon

Bayesian Decision Theory
- If other types of fish are irrelevant: P(ω1) + P(ω2) = 1
- Prior probabilities reflect our prior knowledge (e.g. time of year, fishing area, ...)
- Simple decision rule: make a decision without seeing the fish. Decide ω1 if P(ω1) > P(ω2); decide ω2 otherwise
- This is OK if we are deciding for one fish; if there are several fish, all get assigned to the same class
- In general, we have some features and more information

Cats and Dogs
Suppose we have these conditional probability mass functions for cats and dogs:
P(small ears | dog) = 0.1, P(large ears | dog) = 0.9
P(small ears | cat) = 0.8, P(large ears | cat) = 0.2
We observe an animal with large ears. Is it a dog or a cat?
It makes sense to say dog, because the probability of observing large ears in a dog is much larger than the probability of observing large ears in a cat:
Pr[large ears | dog] = 0.9 > 0.2 = Pr[large ears | cat]
We choose the event of larger probability, i.e. the maximum likelihood event.
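A minimal Python sketch of this maximum-likelihood choice, with the probability table copied from the slide (the function name is only for illustration):

```python
# Conditional probabilities P(ear size | animal) from the slide.
p_ears = {
    "dog": {"small": 0.1, "large": 0.9},
    "cat": {"small": 0.8, "large": 0.2},
}

def ml_animal(ears):
    # Maximum-likelihood choice: pick the animal under which the observation is most probable.
    return max(p_ears, key=lambda animal: p_ears[animal][ears])

print(ml_animal("large"))  # -> "dog", since 0.9 > 0.2
```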

Example: Fish Sorting
A respected fish expert says that
- salmon's length has distribution N(5, 1)
- sea bass's length has distribution N(10, 4)
Recall that if a r.v. is N(μ, σ²), then its density is
p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))

Class Conditional Densities (fix the class, let the length vary):
p(l | salmon) = (1/√(2π)) exp(−(l − 5)²/2)
p(l | bass) = (1/√(2π·4)) exp(−(l − 10)²/(2·4))

Likelihood Function
Fix the length, let the fish class vary. Then we get the likelihood function (it is not a density and not a probability mass):
p(l fixed | class) = (1/√(2π)) exp(−(l − 5)²/2) if class = salmon
p(l fixed | class) = (1/√(2π·4)) exp(−(l − 10)²/8) if class = bass

Likelihood vs. Class Conditional Density
[Figure: the two class-conditional densities p(length | class) plotted against length, with the observation length = 7 marked.]
Suppose a fish has length 7. How do we classify it?

ML (Maximum Likelihood) Classifier
We would like to choose salmon if Pr[length = 7 | salmon] > Pr[length = 7 | bass]. However, since length is a continuous r.v.,
Pr[length = 7 | salmon] = Pr[length = 7 | bass] = 0
Instead, we choose the class which maximizes the likelihood:
p(l | salmon) = (1/√(2π)) exp(−(l − 5)²/2),  p(l | bass) = (1/√(2π·4)) exp(−(l − 10)²/(2·4))
ML classifier: for an observed length l, compare p(l | salmon) and p(l | bass). In words: if p(l | salmon) > p(l | bass), classify as salmon, else classify as bass.

ML (Maximum Likelihood) Classifier
For the observed length 7, p(7 | bass) > p(7 | salmon). Thus we choose the class (bass) which is more likely to have produced the observation 7.
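The fish example can be sketched the same way in Python. This assumes the class-conditional densities stated above (salmon ~ N(5, 1), sea bass ~ N(10, 4)); scipy is used only for the Gaussian density:

```python
from scipy.stats import norm

def likelihood(length, cls):
    # Class-conditional densities from the slides: salmon ~ N(5, 1), sea bass ~ N(10, 4).
    # scipy's norm.pdf takes the standard deviation, so variance 4 means scale = 2.
    return norm.pdf(length, loc=5, scale=1) if cls == "salmon" else norm.pdf(length, loc=10, scale=2)

def ml_classify(length):
    # ML rule: choose the class whose likelihood of the observed length is larger.
    return "salmon" if likelihood(length, "salmon") > likelihood(length, "bass") else "bass"

print(ml_classify(7.0))  # -> "bass": p(7 | bass) > p(7 | salmon), as on the slide
```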

Decision Boundary
[Figure: along the length axis, lengths below the boundary 6.70 are classified as salmon, lengths above as sea bass.]

How Does the Prior Change the Decision Boundary?
Without priors: salmon for lengths below 6.70, sea bass above.
How should this boundary change with the prior P(salmon) = 2/3, P(bass) = 1/3?

Bayes Decision Rule
- We have likelihood functions p(length | salmon) and p(length | bass)
- We have priors P(salmon) and P(bass)
- Question: having observed a fish of a certain length, do we classify it as salmon or bass?
- Natural idea: decide salmon if P(salmon | length) > P(bass | length), and bass if P(bass | length) > P(salmon | length)

Posterior
P(salmon | length) and P(bass | length) are called posterior distributions, because the data (length) was revealed (post data).
How do we compute the posteriors? Not obvious. From Bayes rule:
P(salmon | length) = p(length | salmon) P(salmon) / p(length)
Similarly:
P(bass | length) = p(length | bass) P(bass) / p(length)
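A short sketch of this posterior computation via Bayes rule, reusing the same assumed densities; the priors passed in are illustrative arguments, not values fixed by the slide:

```python
from scipy.stats import norm

def likelihood(length, cls):
    # Same class-conditional densities as above: salmon ~ N(5, 1), bass ~ N(10, 4).
    return norm.pdf(length, loc=5, scale=1) if cls == "salmon" else norm.pdf(length, loc=10, scale=2)

def posterior(cls, length, priors):
    # Bayes rule: P(c | length) = p(length | c) P(c) / p(length),
    # where the evidence p(length) comes from the law of total probability.
    evidence = sum(likelihood(length, c) * priors[c] for c in priors)
    return likelihood(length, cls) * priors[cls] / evidence

priors = {"salmon": 0.5, "bass": 0.5}  # illustrative equal priors
print(posterior("salmon", 7.0, priors), posterior("bass", 7.0, priors))
```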

MAP (Maximum A Posteriori) Classifier
Decide salmon if P(salmon | length) > P(bass | length), bass otherwise. Equivalently, compare
p(length | salmon) P(salmon) / p(length)  vs.  p(length | bass) P(bass) / p(length)
Since p(length) appears on both sides, it is enough to compare
p(length | salmon) P(salmon)  vs.  p(length | bass) P(bass)

Back to the Fish Sorting Example
Likelihoods: p(l | salmon) = (1/√(2π)) exp(−(l − 5)²/2),  p(l | bass) = (1/√(2π·4)) exp(−(l − 10)²/8)
Priors: P(salmon) = 2/3, P(bass) = 1/3
Solve the inequality (2/3) p(l | salmon) > (1/3) p(l | bass).
The boundary moves from 6.70 to the new decision boundary 7.18 (salmon below, sea bass above).
The new decision boundary makes sense, since we expect to see more salmon.
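As a small numerical check of the shifted boundary, one can solve P(salmon) p(l | salmon) = P(bass) p(l | bass) for l with a root finder; this is a sketch assuming the same densities and priors as above:

```python
from scipy.optimize import brentq
from scipy.stats import norm

def post_diff(length):
    # Difference of the two prior-weighted likelihoods with priors 2/3 (salmon) and 1/3 (bass);
    # the decision boundary is where this difference crosses zero.
    return (2 / 3) * norm.pdf(length, 5, 1) - (1 / 3) * norm.pdf(length, 10, 2)

print(brentq(post_diff, 6.0, 9.0))  # about 7.18, the new boundary quoted above
```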

Prior P(s) = 2/3, P(b) = 1/3 vs. Prior P(s) = 0.999, P(b) = 0.001
[Figure: the salmon/bass decision boundary on the length axis moves further toward sea bass as the salmon prior grows, from about 7.18 to about 8.9.]

Likelihood vs. Posteriors
- Likelihood p(l | fish class): a density with respect to length, so the area under each curve is 1
- Posterior P(fish class | l): a mass function with respect to the fish class, so for each l, P(salmon | l) + P(bass | l) = 1

More on the Posterior
posterior (our goal):  P(c | l) = p(l | c) P(c) / p(l)
Here p(l | c) is the likelihood (given), P(c) is the prior (given), and p(l) is a normalizing factor. We often do not even need p(l) for classification, since it does not depend on the class c. If we do need it, it follows from the law of total probability:
p(l) = p(l | salmon) P(salmon) + p(l | bass) P(bass)
Notice this formula consists of likelihoods and priors, which are given.

More on Priors
- The prior comes from prior knowledge; no data has been seen yet
- If there is a reliable source of prior knowledge, it should be used
- Some problems cannot even be solved reliably without a good prior

More on the MAP Classifier
posterior: P(c | l) = p(l | c) P(c) / p(l)   (likelihood × prior / evidence)
- We do not care about p(l) when maximizing P(c | l): P(c | l) ∝ p(l | c) P(c)
- If P(salmon) = P(bass) (uniform prior), the MAP classifier becomes the ML classifier: P(c | l) ∝ p(l | c)
- If for some observation l we have p(l | salmon) = p(l | bass), then this observation is uninformative and the decision is based solely on the prior: P(c | l) ∝ P(c)

Justification for the MAP Classifier
Let's compute the probability of error for the MAP estimate. The rule decides salmon if P(salmon | l) > P(bass | l), and bass otherwise.
For any particular l, the probability of error is
Pr[error | l] = P(bass | l) if we decide salmon, and P(salmon | l) if we decide bass.
The MAP rule always picks the class with the larger posterior, so it incurs the smaller of these two values. Thus the MAP classifier is optimal for each individual l!

Justification for the MAP Classifier
We are interested in minimizing the error not just for one l; we really want to minimize the average error over all l:
Pr[error] = ∫ p(error, l) dl = ∫ Pr[error | l] p(l) dl
If Pr[error | l] is as small as possible for every l, the integral is as small as possible. But the Bayes rule makes Pr[error | l] as small as possible. Thus the MAP classifier minimizes the probability of error!

More General Case
Let's generalize a little bit:
- more than one feature: x = [x1, x2, ..., xd]
- more than two classes: {c1, c2, ..., cm}

More General Case
As before, for each class ci we have:
- p(x | ci), the likelihood of observation x given that the true class is ci
- P(ci), the prior probability of class ci
- P(ci | x), the posterior probability of class ci given that we observed data x
- the evidence, or probability density for the data: p(x) = Σ_{i=1..m} p(x | ci) P(ci)

Minimum Error Rate Classification
We want to minimize the average probability of error:
Pr[error] = ∫ p(error, x) dx = ∫ Pr[error | x] p(x) dx
We need to make Pr[error | x] as small as possible for every x. If we decide class ci, then
Pr[error | x] = 1 − P(ci | x)
Pr[error | x] is minimized by the MAP classifier, which decides class ci such that P(ci | x) > P(cj | x) for all j ≠ i.
The MAP classifier is therefore optimal if we want to minimize the probability of error.
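A generic sketch of the minimum-error-rate (MAP) rule for m classes and a feature vector x. The Gaussian feature models and priors below are placeholder assumptions chosen only to make the example runnable:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative class-conditional models p(x | ci) and priors P(ci) for m = 3 classes
# and a 2-dimensional feature vector x.
models = [
    multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)),
    multivariate_normal(mean=[3.0, 0.0], cov=np.eye(2)),
    multivariate_normal(mean=[0.0, 3.0], cov=2.0 * np.eye(2)),
]
priors = np.array([0.5, 0.3, 0.2])

def map_classify(x):
    # Decide the class ci that maximizes P(ci | x), i.e. maximizes p(x | ci) P(ci).
    scores = np.array([m.pdf(x) for m in models]) * priors
    return int(np.argmax(scores))

print(map_classify([2.0, 0.5]))  # index of the chosen class
```

Dropping the multiplication by `priors` turns this into the ML rule, the special case of uniform priors.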

General Bayesian Decision Theory
- In close cases we may want to refuse to make a decision (let a human expert handle the tough case), so we allow actions {α1, α2, ..., αk}
- Suppose some mistakes are more costly than others (classifying a benign tumor as cancer is not as bad as classifying cancer as a benign tumor)
- Allow loss functions λ(αi | cj) describing the loss incurred when taking action αi while the true class is cj

Conditional Risk
Suppose we observe x and wish to take action αi. If the true class is cj, by definition we incur loss λ(αi | cj). The probability that the true class is cj after observing x is P(cj | x).
The expected loss associated with taking action αi is called the conditional risk, and it is:
R(αi | x) = Σ_{j=1..m} λ(αi | cj) P(cj | x)

Conditional Risk
R(αi | x) = Σ_{j=1..m} λ(αi | cj) P(cj | x)
- the sum is over disjoint events (the different classes)
- P(cj | x) is the probability of class cj given observation x
- R(αi | x) is the penalty for taking action αi if we observe x
- each term λ(αi | cj) P(cj | x) is the part of the overall penalty which comes from the event that the true class is cj
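In code, the conditional risk is just a matrix-vector product of the loss table with the posterior vector; the numbers below are illustrative, not taken from the slides:

```python
import numpy as np

# Illustrative loss matrix: loss[i, j] = lambda(alpha_i | c_j),
# the loss for taking action alpha_i when the true class is c_j.
loss = np.array([[0.0, 2.0],
                 [1.0, 0.0]])

def conditional_risk(posteriors):
    # R(alpha_i | x) = sum_j lambda(alpha_i | c_j) P(c_j | x)
    return loss @ posteriors

posteriors = np.array([0.3, 0.7])       # P(c_1 | x), P(c_2 | x) for some observed x
risks = conditional_risk(posteriors)
print(risks, "-> take action", int(np.argmin(risks)))
```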

Example: Zero-One Loss Function
Action αi is the decision that the true class is ci. The loss is
λ(αi | cj) = 0 if i = j (no mistake), 1 otherwise (mistake)
Then the conditional risk reduces to the probability of error:
R(αi | x) = Σ_j λ(αi | cj) P(cj | x) = Σ_{j ≠ i} P(cj | x) = 1 − P(ci | x) = Pr[error if we decide ci]
R(αi | x) is minimized by the MAP classifier, which decides ci such that P(ci | x) > P(cj | x) for all j.
Thus the MAP classifier is the Bayes decision rule under the zero-one loss function.

Overall Risk
A decision rule is a function α(x) which for every x specifies an action out of {α1, α2, ..., αk}.
The average risk for α(x) is
R(α) = ∫ R(α(x) | x) p(x) dx
We need to make R(α(x) | x) as small as possible for every x. The Bayes decision rule chooses, for every x, the action which minimizes the conditional risk
R(αi | x) = Σ_{j=1..m} λ(αi | cj) P(cj | x)
The Bayes decision rule α(x) is optimal, i.e. it gives the minimum possible overall risk R*.
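A numerical sketch of the overall risk for the fish example under zero-one loss, assuming the Gaussian likelihoods and equal priors; it approximates the integral of Pr[error | l] p(l) on a grid and compares the MAP rule with an arbitrary fixed threshold:

```python
import numpy as np
from scipy.stats import norm

priors = {"salmon": 0.5, "bass": 0.5}   # illustrative equal priors

def likelihood(length, cls):
    return norm.pdf(length, 5, 1) if cls == "salmon" else norm.pdf(length, 10, 2)

def overall_error(decide, grid):
    # R = integral of Pr[error | l] p(l) dl, approximated on a grid of lengths;
    # under zero-one loss, Pr[error | l] is the posterior of the class we did NOT pick.
    dl = grid[1] - grid[0]
    total = 0.0
    for l in grid:
        p_l = sum(likelihood(l, c) * priors[c] for c in priors)
        post = {c: likelihood(l, c) * priors[c] / p_l for c in priors}
        total += (1.0 - post[decide(l)]) * p_l * dl
    return total

grid = np.linspace(-5.0, 20.0, 2000)
map_rule = lambda l: ("salmon" if likelihood(l, "salmon") * priors["salmon"]
                      >= likelihood(l, "bass") * priors["bass"] else "bass")
fixed_rule = lambda l: "salmon" if l < 9.0 else "bass"   # an arbitrary suboptimal threshold
print(overall_error(map_rule, grid), overall_error(fixed_rule, grid))
```

The MAP rule gives the smaller of the two values, illustrating that no other decision rule achieves a lower overall risk under this loss.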

Bayes Risk: Example
Salmon is more tasty and expensive than sea bass, so the losses are asymmetric:
λ_sb = λ(salmon | bass) = 2   (classifying a bass as salmon)
λ_bs = λ(bass | salmon) = 1   (classifying a salmon as bass)
λ_ss = λ_bb = 0   (no mistake, no loss)
Likelihoods: p(l | salmon) = (1/√(2π)) exp(−(l − 5)²/2),  p(l | bass) = (1/√(2π·4)) exp(−(l − 10)²/8)
Priors: P(salmon) = P(bass) = 1/2
Risk: R(α | l) = Σ_c λ(α | c) P(c | l), so
R(salmon | l) = λ_ss P(s | l) + λ_sb P(b | l) = λ_sb P(b | l)
R(bass | l) = λ_bs P(s | l) + λ_bb P(b | l) = λ_bs P(s | l)

Bayes Risk: Example
R(salmon | l) = λ_sb P(b | l),   R(bass | l) = λ_bs P(s | l)
Bayes decision rule (optimal for our loss function): decide salmon if R(salmon | l) < R(bass | l). We need to solve
λ_sb P(b | l) < λ_bs P(s | l)
Or, equivalently, since the priors are equal:
λ_sb p(l | b) < λ_bs p(l | s)

Bayes Risk: Example
We need to solve λ_sb p(l | b) < λ_bs p(l | s). Substituting the likelihoods and losses:
2 · (1/√(2π·4)) exp(−(l − 10)²/8) < 1 · (1/√(2π)) exp(−(l − 5)²/2)
exp(−(l − 10)²/8) < exp(−(l − 5)²/2)
(l − 5)²/2 < (l − 10)²/8
4(l − 5)² < (l − 10)²
3l² − 20l < 0
0 < l < 20/3 ≈ 6.6667
The new decision boundary is 6.67, compared with 6.70 before: classify as salmon below 6.67, as sea bass above.
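The same risk-based rule as a sketch in code, with λ_sb = 2, λ_bs = 1 and equal priors as above; the salmon region it produces matches the 0 < l < 20/3 ≈ 6.67 interval just derived:

```python
from scipy.stats import norm

lam_sb, lam_bs = 2.0, 1.0    # losses from the slide: bass-as-salmon costs 2, salmon-as-bass costs 1

def likelihood(length, cls):
    return norm.pdf(length, 5, 1) if cls == "salmon" else norm.pdf(length, 10, 2)

def risk_classify(length):
    # With equal priors, decide salmon iff lam_sb * p(l | bass) < lam_bs * p(l | salmon),
    # i.e. iff R(salmon | l) < R(bass | l).
    return "salmon" if lam_sb * likelihood(length, "bass") < lam_bs * likelihood(length, "salmon") else "bass"

print([risk_classify(l) for l in (5.0, 6.6, 6.7, 8.0)])
# -> ['salmon', 'salmon', 'bass', 'bass'], consistent with the boundary 20/3 = 6.67
```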

Likelihood Ratio Rule
In the two-category case, use the likelihood ratio rule: decide c1 if
p(x | c1) / p(x | c2) > [(λ12 − λ22) P(c2)] / [(λ21 − λ11) P(c1)]
where λij = λ(αi | cj). The left side is the likelihood ratio; the right side is a fixed number, independent of x.
If the above inequality holds, decide c1; otherwise decide c2.
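A sketch of this likelihood-ratio form for the fish example; note that the threshold is computed once from the (assumed) losses and priors and does not depend on the observed length:

```python
from scipy.stats import norm

# Losses lam[(action, true class)] and priors, matching the fish example above.
lam = {("s", "s"): 0.0, ("s", "b"): 2.0, ("b", "s"): 1.0, ("b", "b"): 0.0}
priors = {"s": 0.5, "b": 0.5}

# The threshold is a fixed number, independent of the observation.
threshold = ((lam[("s", "b")] - lam[("b", "b")]) * priors["b"]
             / ((lam[("b", "s")] - lam[("s", "s")]) * priors["s"]))

def decide_salmon(length):
    # Decide salmon (class 1) iff the likelihood ratio exceeds the fixed threshold.
    ratio = norm.pdf(length, 5, 1) / norm.pdf(length, 10, 2)
    return ratio > threshold

print(threshold, decide_salmon(6.0), decide_salmon(8.0))  # threshold 2.0; True, False
```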

Discriminant Functions
All decision rules have the same structure: at observation x, choose the class ci such that gi(x) > gj(x) for all j ≠ i, where gi(x) is a discriminant function.
- ML decision rule: gi(x) = p(x | ci)
- MAP decision rule: gi(x) = P(ci | x)
- Bayes decision rule: gi(x) = −R(ci | x) (minus the conditional risk, since risk is minimized)

Decision Regions
Discriminant functions split the feature vector space X into decision regions.
[Figure: the feature space partitioned into regions labeled c1, c2, c3; inside the region for ci, gi(x) = max{gj(x)}.]

Important Points
- If we know the probability distributions for the classes, we can design the optimal classifier
- The definition of "optimal" depends on the chosen loss function
- Under the minimum error rate (zero-one) loss function:
  - no prior: the ML classifier is optimal
  - with a prior: the MAP classifier is optimal
- For a more general loss function, the general Bayes classifier is optimal