Statistical classifiers: Bayesian decision theory and density estimation


1 3rd NOSE Short Course, Alpbach, 1st-6th Mar 2004
Statistical classifiers: Bayesian decision theory and density estimation
Ricardo Gutierrez-Osuna
Department of Computer Science
rgutier@cs.tamu.edu

2 Outline
Chapter 1: Review of pattern classification
Chapter 2: Review of probability theory
Chapter 3: Bayesian Decision Theory
Chapter 4: Quadratic classifiers
Chapter 5: Kernel density estimation
Chapter 6: Nearest neighbors
Chapter 7: Perceptron and least-squares classifiers

3 CHAPTER 1: Review of pattern classification
Features and patterns

4 Features and patterns (1)
Feature: a feature is any distinctive aspect, quality or characteristic. Features may be symbolic (e.g., color) or numeric (e.g., height).
Feature vector: the combination of d features is represented as a d-dimensional column vector x = [x_1, x_2, ..., x_d]^T.
Feature space: the d-dimensional space defined by the feature vector.
Scatter plot: representation of an object collection in feature space.
[Figure: a feature vector x = [x_1 ... x_d]^T, and a scatter plot of three classes in a two-dimensional feature space]

5 Features and patterns (2)
Pattern: a pattern is a composite of traits or features characteristic of an individual.
In classification tasks, a pattern is a pair of variables {x, ω}, where
- x is a collection of observations or features (the feature vector)
- ω is the concept behind the observation (the label)

6 Features and patterns (3)
What makes a good feature vector? The quality of a feature vector is related to its ability to discriminate examples from different classes:
- Examples from the same class should have similar feature values
- Examples from different classes should have different feature values
[Figure: examples of "good" and "bad" features]
More feature properties:
[Figure: linear separability, non-linear separability, highly correlated features, multi-modal classes]

7 Classifiers
The task of a classifier is to partition feature space into class-labeled decision regions. Borders between decision regions are called decision boundaries.
The classification of a feature vector x consists of determining which decision region it belongs to, and assigning x to this class.
[Figure: a feature space partitioned into decision regions R_1 ... R_4]
In this lecture we will overview two methodologies for designing classifiers:
- Based on the underlying probability density functions of the data
- Based on geometric pattern-separability criteria

8 CHAPTER 2: Review of probability theory
What is a probability
Probability density functions
Conditional probability
Bayes theorem
Probabilistic reasoning: a case example

9 Basic probability concepts
Probabilities are numbers assigned to events that indicate how likely it is that the event will occur when a random experiment is performed.
A probability law for a random experiment is a rule that assigns probabilities to the events of the experiment.
The sample space S of a random experiment is the set of all possible outcomes.
[Figure: a probability law maps events A_1 ... A_4 in the sample space S to their probabilities]

10 Conditional probability (1)
If A and B are two events, the probability of event A when we already know that event B has occurred is defined by the relation

P[A|B] = P[A ∩ B] / P[B],   for P[B] > 0

This conditional probability P[A|B] is read "the conditional probability of A conditioned on B", or simply "the probability of A given B".

11 Conditional probability (2)
Interpretation: the new evidence "B has occurred" has the following effects:
- The original sample space S (the whole square) becomes B (the rightmost circle)
- The event A becomes A ∩ B
- P[B] simply re-normalizes the probability of events that occur jointly with B
[Figure: Venn diagrams of S, A, B and A ∩ B, before and after conditioning on "B has occurred"]

12 Theorem of total probability
Let B_1, B_2, ..., B_N be a partition of S: a set of mutually exclusive events such that S = B_1 ∪ B_2 ∪ ... ∪ B_N.
Any event A can then be represented as

A = A ∩ S = A ∩ (B_1 ∪ B_2 ∪ ... ∪ B_N) = (A ∩ B_1) ∪ (A ∩ B_2) ∪ ... ∪ (A ∩ B_N)

Since B_1, B_2, ..., B_N are mutually exclusive, by Axiom III

P[A] = P[A ∩ B_1] + P[A ∩ B_2] + ... + P[A ∩ B_N]

and, therefore,

P[A] = P[A|B_1]P[B_1] + ... + P[A|B_N]P[B_N] = Σ_{k=1}^{N} P[A|B_k]P[B_k]

[Figure: a sample space S partitioned into B_1 ... B_N, with event A overlapping several of the B_k]

13 Bayes theorem
Given {B_1, B_2, ..., B_N}, a partition of the sample space S, suppose that event A occurs; what is the probability of event B_j?
Using the definition of conditional probability and the theorem of total probability we obtain

P[B_j|A] = P[A ∩ B_j] / P[A] = P[A|B_j] P[B_j] / Σ_{k=1}^{N} P[A|B_k] P[B_k]

This is known as Bayes theorem or Bayes rule, and is (one of) the most useful relations in probability and statistics.

14 Applying Bayes theorem (1)
Consider a clinical problem where we need to decide if a patient has a particular medical condition on the basis of an imperfect test:
- Someone with the condition may go undetected (false negative)
- Someone free of the condition may yield a positive result (false positive)
Nomenclature
- SPECIFICITY: the true-negative rate P(NEG|¬COND) of a test
- SENSITIVITY: the true-positive rate P(POS|COND) of a test

15 Applying Bayes theorem (2)
PROBLEM
- Assume a population of 10,000 where 1 out of every 100 people has the medical condition
- Assume that we design a test with 98% specificity P(NEG|¬COND) and 90% sensitivity P(POS|COND)
- Assume you take the test, and it yields a POSITIVE result
What is the probability that you have the medical condition?

16 Applying Bayes theorem (3)
SOLUTION A: Fill in the joint frequency table below. The answer is the ratio of individuals with the condition to total individuals, considering only the individuals that tested positive: 90/288 = 0.3125.

                     TEST IS POSITIVE          TEST IS NEGATIVE          ROW TOTAL
HAS CONDITION        True positives            False negatives           100
                     100 x 0.90 = 90           100 x (1-0.90) = 10
FREE OF CONDITION    False positives           True negatives            9,900
                     9,900 x (1-0.98) = 198    9,900 x 0.98 = 9,702
COLUMN TOTAL         288                       9,712                     10,000

17 Applying Bayes theorem (4)
SOLUTION B: Apply Bayes theorem

P[COND|POS] = P[POS|COND] P[COND] / P[POS]
            = P[POS|COND] P[COND] / ( P[POS|COND] P[COND] + P[POS|¬COND] P[¬COND] )
            = 0.90 x 0.01 / ( 0.90 x 0.01 + (1 − 0.98) x 0.99 )
            = 0.3125
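To make Solution B concrete, here is a minimal Python sketch of the same computation (the variable names are ours, not from the slides):

```python
# Posterior P(COND|POS) via Bayes rule for the screening example above.
prevalence = 0.01     # 1 in 100 has the condition
sensitivity = 0.90    # P(POS | COND)
specificity = 0.98    # P(NEG | no COND)

# Theorem of total probability: P(POS)
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes rule
print(sensitivity * prevalence / p_pos)   # 0.3125
```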

18 Bayes theorem and pattern classification
For the purpose of pattern classification, Bayes theorem is normally expressed as

P[ω_j|x] = P[x|ω_j] P[ω_j] / Σ_{k=1}^{N} P[x|ω_k] P[ω_k] = P[x|ω_j] P[ω_j] / P[x]

where ω_j is the j-th class and x is the feature vector.
Bayes theorem is relevant because, as we will see in a minute, a sensible classification rule is to choose the class ω_i with the highest P[ω_i|x]. This represents the intuitive rationale of choosing the class that is more likely given the observed feature vector x.

19 Bayes theorem and pattern classification
Each term in Bayes theorem has a special name, which you should become familiar with:
- P[ω_j]: prior probability (of class ω_j)
- P[ω_j|x]: posterior probability (of class ω_j given the observation x)
- P[x|ω_j]: likelihood (conditional probability of observation x given class ω_j)
- P[x]: a normalization constant (does not affect the decision)

20 CHAPTER 3: Bayesian Decision Theory
The Likelihood Ratio Test
The Probability of Error
The Bayes Risk
Bayes, MAP and ML Criteria
Multi-class problems
Discriminant Functions

21 The Likelihood Ratio Test (1)
Assume we are to classify an object based on the evidence provided by a measurement (or feature vector) x.
Would you agree that a reasonable decision rule would be the following?
"Choose the class that is most probable given the observed feature vector x"
More formally: evaluate the posterior probability of each class, P(ω_i|x), and choose the class with the largest P(ω_i|x).

22 The Likelihood Ratio Test (2)
Let us examine this decision rule for a 2-class problem. In this case the decision rule becomes:

if P(ω_1|x) > P(ω_2|x), choose ω_1; else choose ω_2

Or, in a more compact form:

P(ω_1|x) ≷ P(ω_2|x)   (choose ω_1 if >, ω_2 if <)

Applying Bayes theorem:

P(x|ω_1)P(ω_1)/P(x) ≷ P(x|ω_2)P(ω_2)/P(x)

23 The Likelihood Ratio Test (3)
P(x) does not affect the decision rule, so it can be eliminated*. Rearranging the previous expression:

Λ(x) = P(x|ω_1) / P(x|ω_2) ≷ P(ω_2) / P(ω_1)   (choose ω_1 if >, ω_2 if <)

The term Λ(x) is called the likelihood ratio, and the decision rule is known as the likelihood ratio test.

*P(x) can be disregarded in the decision rule since it is constant regardless of class ω_i. However, P(x) will be needed if we want to estimate the posterior P(ω_i|x) which, unlike P(x|ω_i)P(ω_i), is a true probability value and, therefore, gives us an estimate of the goodness of our decision.

24 Likelihood Ratio Test: an example (1)
Given a classification problem with the following class-conditional densities:

P(x|ω_1) = (1/√(2π)) exp(−(x−4)²/2)
P(x|ω_2) = (1/√(2π)) exp(−(x−10)²/2)

[Figure: the two Gaussian likelihoods, centered at x = 4 and x = 10]
Derive a classification rule based on the Likelihood Ratio Test (assume equal priors).

25 Likelihood Ratio Test: an example (2)
Solution: substituting the given likelihoods and priors into the LRT expression:

Λ(x) = exp(−(x−4)²/2) / exp(−(x−10)²/2) ≷ 1   (choose ω_1 if >, ω_2 if <)

Simplifying, changing signs and taking logs:

(x−4)² − (x−10)² ≶ 0   (choose ω_1 if <)

which yields:

x ≶ 7   (choose ω_1 if x < 7, ω_2 if x > 7)

This LRT result makes intuitive sense, since the likelihoods are identical and differ only in their mean value.
[Figure: decision regions R_1 (say ω_1) for x < 7 and R_2 (say ω_2) for x > 7]
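A small Python sketch of this LRT (it assumes SciPy is available; the means, variance and priors are those of the example):

```python
import numpy as np
from scipy.stats import norm

def lrt_classify(x, mu1=4.0, mu2=10.0, sigma=1.0, p1=0.5, p2=0.5):
    """Return 1 or 2 according to the likelihood ratio test."""
    lam = norm.pdf(x, mu1, sigma) / norm.pdf(x, mu2, sigma)
    return 1 if lam > p2 / p1 else 2

print([lrt_classify(x) for x in (3.0, 6.9, 7.1, 12.0)])   # boundary at x = 7
```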

26 The probability of error
The probability of error is the probability of assigning x to the wrong class. For a two-class problem, P(error|x) is simply

P(error|x) = P(ω_1|x) if we decide ω_2
P(error|x) = P(ω_2|x) if we decide ω_1

It makes sense that the classification rule be designed to minimize the average probability of error P[error] across all possible values of x:

P(error) = ∫_{−∞}^{+∞} P(error, x) dx = ∫_{−∞}^{+∞} P(error|x) P(x) dx

To minimize P(error) we minimize the integrand P(error|x) at each x: choose the class with maximum posterior P(ω_i|x).
This is called the MAXIMUM A POSTERIORI (MAP) RULE.

27 Minimizing probability of error
We prove the optimality of the MAP rule graphically:
- The right plot shows the posterior P(ω_i|x) for each of the two classes
- The bottom plots show the P(error) for the MAP rule and for an alternative decision rule
Which one has lower P(error) (color-filled area)?
[Figure: the two posteriors; below, decision regions "choose RED"/"choose BLUE" for the MAP rule and for the other rule, with the resulting error areas shaded]

28 The Bayes Risk (1)
So far we have assumed that the penalty of misclassifying a class ω_1 example as class ω_2 is the same as the reciprocal. In general, this is not the case:
- For example, misclassifying a cancer sufferer as a healthy patient is a much more serious problem than the other way around
- Misclassifying salmon as sea bass has lower cost (unhappy customers) than the opposite error
This concept can be formalized in terms of a cost function C_ij: C_ij represents the cost of choosing class ω_i when class ω_j is the true class.
We define the Bayes Risk as the expected value of the cost:

R = E[C] = Σ_{i=1}^{2} Σ_{j=1}^{2} C_ij P[choose ω_i and x ∈ ω_j] = Σ_{i=1}^{2} Σ_{j=1}^{2} C_ij P[x ∈ R_i | ω_j] P[ω_j]

29 The Bayes Risk (2)
What is the decision rule that minimizes the Bayes Risk?
It can be shown* that the minimum risk is achieved with the following decision rule:

P(x|ω_1) / P(x|ω_2) ≷ ((C_12 − C_22) P[ω_2]) / ((C_21 − C_11) P[ω_1])   (choose ω_1 if >, ω_2 if <)

Notice any similarities with the LRT?

*For an intuitive proof visit my lecture notes at TAMU.
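As a sketch (our own notation), the minimum-risk rule translates directly into code; with a zero-one cost function it reduces to the MAP rule of the following slides:

```python
def bayes_risk_decide(lik1, lik2, p1, p2, c11, c12, c21, c22):
    """Return 1 or 2: choose w1 iff the likelihood ratio lik1/lik2
    exceeds the cost-weighted prior ratio (the slide's threshold)."""
    threshold = ((c12 - c22) * p2) / ((c21 - c11) * p1)
    return 1 if lik1 / lik2 > threshold else 2
```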

30 The Bayes Risk: an example (1)
Consider a classification problem with two classes defined by the following likelihood functions:

P(x|ω_1) = (1/√(2π)) exp(−(x−2)²/2)
P(x|ω_2) = (1/√(2π·3)) exp(−x²/(2·3))

[Figure: the two likelihoods; ω_1 is the narrower density, centered at x = 2]
What is the decision rule that minimizes the Bayes risk?
Assume P[ω_1] = P[ω_2] = 0.5, C_11 = C_22 = 0, C_12 = 1 and C_21 = 3/2.

31 The Bayes Risk: an example (2)
Substituting the likelihoods, priors and costs into the minimum-risk LRT:

Λ(x) = [(1/√(2π)) exp(−(x−2)²/2)] / [(1/√(2π·3)) exp(−x²/(2·3))] ≷ (C_12 P[ω_2]) / (C_21 P[ω_1])   (choose ω_1 if >)

Taking logs and rearranging yields a quadratic decision boundary, 2x² − 12x + 12 ≶ 0, whose roots are x = 1.27 and x = 4.73.
The resulting decision regions: choose ω_1 for 1.27 < x < 4.73 (the interval around µ_1 = 2), and ω_2 otherwise.
[Figure: the two likelihoods with decision regions R_2, R_1, R_2 delimited by x = 1.27 and x = 4.73]

32 Variations of the LRT
The LRT that minimizes the Bayes Risk is called the Bayes Criterion:

Λ(x) = P(x|ω_1)/P(x|ω_2) ≷ ((C_12 − C_22) P[ω_2]) / ((C_21 − C_11) P[ω_1])   (Bayes criterion)

Many times we will simply be interested in minimizing P[error], which is a special case of the Bayes Criterion if we use a zero-one cost function, C_ij = 0 for i = j and C_ij = 1 for i ≠ j. This version of the LRT is referred to as the Maximum A Posteriori (MAP) Criterion, since it seeks to maximize the posterior P(ω_i|x):

Λ(x) = P(x|ω_1)/P(x|ω_2) ≷ P(ω_2)/P(ω_1)   ⇔   P(ω_1|x) ≷ P(ω_2|x)   (MAP criterion)

Finally, for the case of equal priors P(ω_i) = 1/2 and a zero-one cost function, the LRT is called the Maximum Likelihood (ML) Criterion, since it will maximize the likelihood P(x|ω_i):

Λ(x) = P(x|ω_1)/P(x|ω_2) ≷ 1   (ML criterion)

33 Multi-class problems
The previous decisions were derived for two-class problems, but generalize gracefully to multiple classes:
- To minimize P[error], choose the class ω_i with the highest posterior: ω_i = argmax_{i=1..C} P(ω_i|x)
- To minimize the Bayes risk, choose the class ω_i with the lowest risk: ω_i = argmin_{i=1..C} R(ω_i|x) = argmin_{i=1..C} Σ_{j=1}^{C} C_ij P(ω_j|x)

34 Discriminant functions (1)
Note that all the decision rules have the same structure: at each point x in feature space, choose the class ω_i which maximizes (or minimizes) some measure g_i(x).
This structure can be formalized with a set of discriminant functions g_i(x), i = 1..C, and the following decision rule:
"assign x to class ω_i if g_i(x) > g_j(x) for all j ≠ i"
We can then express the three basic decision rules (Bayes, MAP and ML) in terms of discriminant functions:

Criterion   Discriminant function
Bayes       g_i(x) = −R(α_i|x)
MAP         g_i(x) = P(ω_i|x)
ML          g_i(x) = P(x|ω_i)

35 Discriminant functions (2)
Therefore, we can visualize the decision rule as a network that computes C discriminant functions and selects the category corresponding to the largest discriminant.
[Figure: a network with feature inputs x_1 ... x_d, discriminant functions g_1(x) ... g_C(x), costs, a "select max" stage and the class assignment as output]

36 Recapping
The LRT is a theoretical result that can only be applied if we have complete knowledge of the likelihoods P[x|ω_i]. P[x|ω_i] is generally unknown, but can be estimated from data:
- If the form of the likelihood is known (e.g., Gaussian), the problem is simplified because we only need to estimate the parameters of the model (e.g., mean and covariance). This leads to a classifier known as QUADRATIC, which we cover next.
- If the form of the likelihood is unknown, the problem becomes much harder, and requires a technique known as non-parametric density estimation. This technique is covered in the final chapters of this lecture.

37 CHAPTER 4: Quadratic classifiers
Bayes classifiers for normally distributed classes
The Euclidean-distance classifier
The Mahalanobis-distance classifier
Numerical example

38 The Normal or Gaussian distribution
Remember that the univariate Normal distribution N(µ, σ) is

f_X(x) = (1/(√(2π) σ)) exp(−(1/2)((x−µ)/σ)²)

Similarly, the multivariate Normal distribution N(µ, Σ) is defined as

f_X(x) = (1/((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x−µ)^T Σ^{−1} (x−µ))

[Figure: univariate Gaussian pdfs with different (µ, σ), and a bivariate Gaussian density surface]
Gaussian pdfs are very popular since:
- The parameters (µ, Σ) are sufficient to uniquely characterize the pdf
- If the x_i's are mutually uncorrelated (c_ik = 0), then they are also independent
- The covariance matrix then becomes a diagonal matrix, with the individual variances on the main diagonal

39 Covariance matrix
The covariance matrix indicates the tendency of each pair of features (dimensions in a random vector) to vary together, i.e., to co-vary.
The covariance has several important properties:
- If x_i and x_k tend to increase together, then c_ik > 0
- If x_i tends to decrease when x_k increases, then c_ik < 0
- If x_i and x_k are uncorrelated, then c_ik = 0
- |c_ik| ≤ σ_i σ_k, where σ_i is the standard deviation of x_i
- c_ii = σ_i² = VAR(x_i)
The covariance terms can be expressed as c_ii = σ_i² and c_ik = ρ_ik σ_i σ_k, where ρ_ik is called the correlation coefficient.
[Figure: scatter plots of (x_i, x_k) for C_ik = −σ_iσ_k (ρ_ik = −1), C_ik = −½σ_iσ_k (ρ_ik = −½), C_ik = 0 (ρ_ik = 0), C_ik = +½σ_iσ_k (ρ_ik = +½), C_ik = σ_iσ_k (ρ_ik = +1)]

40 Bayes classifier for Gaussian classes (1)
For Normally distributed classes, the discriminant functions reduce to very simple expressions. The (multivariate) Gaussian density can be defined as

p(x) = (1/((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x−µ)^T Σ^{−1} (x−µ))

Using Bayes rule, the MAP discriminant function can be written as

g_i(x) = P(ω_i|x) = P(x|ω_i)P(ω_i)/P(x)
       = (1/((2π)^{n/2} |Σ_i|^{1/2})) exp(−(1/2)(x−µ_i)^T Σ_i^{−1} (x−µ_i)) · P(ω_i)/P(x)

41 Bayes classifier for Gaussian classes (2)
Eliminating constant terms:

g_i(x) = |Σ_i|^{−1/2} exp(−(1/2)(x−µ_i)^T Σ_i^{−1} (x−µ_i)) · P(ω_i)

Taking logs:

g_i(x) = −(1/2)(x−µ_i)^T Σ_i^{−1} (x−µ_i) − (1/2) log(|Σ_i|) + log(P(ω_i))

This is known as a QUADRATIC discriminant function (because it is a function of the square of x).
In the next few slides we will analyze what happens to this expression under different assumptions for the covariance.
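A minimal NumPy sketch of this quadratic discriminant (the function name is ours):

```python
import numpy as np

def quadratic_discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma_i^{-1} (x-mu) - 1/2 log|Sigma_i| + log P(w_i)."""
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(sigma, d)
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

# MAP decision: evaluate g_i for each class and pick the argmax.
```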

42 Case 1: Σ_i = σ²I (1)
This situation occurs when the features are statistically independent, and have the same variance for all classes. In this case, the quadratic discriminant function becomes

g_i(x) = −(1/2)(x−µ_i)^T (σ²I)^{−1} (x−µ_i) − (1/2) log(|σ²I|) + log(P(ω_i)) = −(1/(2σ²))(x−µ_i)^T(x−µ_i) + log(P(ω_i))

Assuming equal priors and dropping constant terms:

g_i(x) = −(x−µ_i)^T(x−µ_i) = −Σ_{d=1}^{DIM} (x_d − µ_{i,d})²

This is called a Euclidean-distance or nearest-mean classifier. [From Schalkoff, 1992]

43 Case 1: Σ_i = σ²I (2)
This is probably the simplest statistical classifier that you can build: assign an unknown example to the class whose center is the closest using the Euclidean distance.
[Figure: a bank of Euclidean-distance computations from x to the class centers µ_1 ... µ_C, followed by a minimum selector that outputs the class]
How valid is the assumption Σ_i = σ²I in chemical sensor arrays?
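A sketch of the nearest-mean classifier (the class centers below are made-up illustration values):

```python
import numpy as np

def nearest_mean_classify(x, means):
    """means: (C, D) array of class centers; returns the index of the closest one."""
    return int(np.argmin(np.linalg.norm(means - x, axis=1)))

means = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 4.0]])    # hypothetical centers
print(nearest_mean_classify(np.array([4.2, 0.5]), means))  # -> 1
```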

44 Case 1: Σ_i = σ²I, example
[Slide shows a worked example: three classes with identity covariance matrices (Σ_1 = Σ_2 = Σ_3 = I) and three different mean vectors; the nearest-mean decision boundaries are linear and bisect the segments between the class means. The numerical values are garbled in the transcription.]

45 Case 2: Σ_i = Σ (Σ non-diagonal)
All the classes have the same covariance matrix, but the matrix is not diagonal. In this case, the quadratic discriminant becomes

g_i(x) = −(1/2)(x−µ_i)^T Σ^{−1} (x−µ_i) − (1/2) log(|Σ|) + log(P(ω_i))

Assuming equal priors and eliminating constant terms:

g_i(x) = −(x−µ_i)^T Σ^{−1} (x−µ_i)

This is known as a Mahalanobis-distance classifier.
[Figure: a bank of Mahalanobis-distance computations from x to the class centers µ_1 ... µ_C, followed by a minimum selector that outputs the class]

46 The Mahalanobis distance
The quadratic term (x−µ)^T Σ^{−1} (x−µ) is called the Mahalanobis distance, a very important metric in Statistical Pattern Recognition (right up there with Bayes theorem).
The Mahalanobis distance is a vector distance that uses a Σ^{−1} norm; Σ^{−1} can be thought of as a stretching factor on the space.
Note that for an identity covariance matrix (Σ = I), the Mahalanobis distance becomes the familiar Euclidean distance.
[Figure: around µ, a circular contour of constant Euclidean distance ‖x−µ‖² = K and an elliptical contour of constant Mahalanobis distance ‖x−µ‖²_{Σ⁻¹} = K]
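A sketch of the (squared) Mahalanobis distance in NumPy; with Σ = I it reduces to the squared Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x-mu)^T Sigma^{-1} (x-mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(sigma, d))

x, mu = np.array([1.0, 2.0]), np.array([0.0, 0.0])
print(mahalanobis_sq(x, mu, np.eye(2)))   # 5.0, equals ||x - mu||^2
print(mahalanobis_sq(x, mu, np.array([[2.0, 0.5], [0.5, 1.0]])))
```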

47 Case 2: Σ_i = Σ (Σ non-diagonal), example
[Slide shows a worked example: three classes sharing the same non-diagonal covariance matrix, with different means; the Mahalanobis-distance decision boundaries are again linear, but are no longer perpendicular bisectors of the segments between class means. The numerical values are garbled in the transcription.]

48 Case 3: Σ_i ≠ Σ_j, general case, example
[Slide shows a worked example: three classes with different covariance matrices and means; the resulting decision boundaries are quadratic. A zoomed-out view of the boundaries is also shown. The numerical values are garbled in the transcription.]

49 Numerical example (1)
Derive a linear discriminant function for the two-class 3D classification problem defined by:

µ_1 = [0 0 0]^T,  µ_2 = [1 1 1]^T,  Σ_1 = Σ_2 = (1/4)·I,  P(ω_2) = 2·P(ω_1)

Anybody would dare to sketch the likelihood densities and the decision boundary for this problem?

50 Numerical example (2)
Solution: with g_i(x) = −(1/2)(x−µ_i)^T Σ_i^{−1}(x−µ_i) − (1/2) log|Σ_i| + log P(ω_i) and Σ_1 = Σ_2 = (1/4)·I (so the log|Σ_i| terms cancel):

g_1(x) = −2(x² + y² + z²) + log(1/3)
g_2(x) = −2((x−1)² + (y−1)² + (z−1)²) + log(2/3)

51 Numerical example (3)
Solution (continued): choose ω_1 if g_1(x) > g_2(x):

−2(x² + y² + z²) + log(1/3) ≷ −2((x−1)² + (y−1)² + (z−1)²) + log(2/3)

which simplifies to the linear discriminant

x + y + z ≶ (6 − log 2)/4 = 1.32   (choose ω_1 if <, ω_2 if >)

Classify the test example x_u = [0.1 0.7 0.8]^T: since 0.1 + 0.7 + 0.8 = 1.6 > 1.32, we assign x_u to ω_2.
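A quick numerical check of the example (values as reconstructed above, so treat this as a sketch):

```python
import numpy as np

mu1, mu2 = np.zeros(3), np.ones(3)
sigma_inv = 4 * np.eye(3)            # Sigma = (1/4) I  =>  Sigma^{-1} = 4 I
p1, p2 = 1/3, 2/3                    # P(w2) = 2 P(w1)

def g(x, mu, prior):
    d = x - mu
    return -0.5 * d @ sigma_inv @ d + np.log(prior)

x_u = np.array([0.1, 0.7, 0.8])
print(g(x_u, mu1, p1) < g(x_u, mu2, p2))   # True: choose w2
print(x_u.sum(), (6 - np.log(2)) / 4)      # 1.6 > 1.327: choose w2
```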

52 Conclusions
The Euclidean-distance classifier is Bayes-optimal* for Gaussian classes with equal covariance matrices proportional to the identity matrix, and equal priors.
The Mahalanobis-distance classifier is Bayes-optimal for Gaussian classes with equal covariance matrices, and equal priors.
*Bayes-optimal means that the classifier yields the minimum P[error], which is the best ANY classifier can achieve.

53 CHAPTER 5: Kernel Density Estimation
Histograms
Parzen Windows
Smooth Kernels
The Naïve Bayes Classifier

54 Non-parametric density estimation (NPDE)
In the previous two chapters we have assumed that either
- The likelihoods p(x|ω_i) were known (Likelihood Ratio Test), or
- At least the parametric form of the likelihoods was known (parameter estimation)
The methods that will be presented in the next two chapters do not afford such luxuries. Instead, they attempt to estimate the density directly from the data, without making assumptions about the underlying distribution.
Sounds challenging? You bet!

55 The histogram
The simplest form of NPDE is the familiar histogram: divide the sample space into a number of bins and approximate the density at the center of each bin by the fraction of points in the training data that fall into the corresponding bin:

P_H(x) = (1/N) · [number of x^(n) in the same bin as x] / [width of the bin containing x]

[Figure: a histogram estimate of a univariate density p(x)]

56 Shortcomings of the histogram
- The shape of the NPDE depends on the starting position of the bins; for multivariate data, the final shape of the NPDE also depends on the orientation of the bins
- The discontinuities are not due to the underlying density; they are only an artifact of the chosen bin locations. These discontinuities make it very difficult, without experience, to grasp the structure of the data
- A much more serious problem is the curse of dimensionality: the number of bins grows exponentially with the number of dimensions. In high dimensions we would require a very large number of examples, or else most of the bins would be empty
All these drawbacks make the histogram unsuitable for most practical applications, except for rapid visualization of results in one or two dimensions.

57 NPDE, general formulation (1)
Let us return to the basic definition of probability to get a solid idea of what we are trying to accomplish. The probability that a vector x, drawn from a distribution p(x), will fall in a given region R of the sample space is

P = ∫_R p(x') dx'

[Figure: a density p(x) with a shaded region R]
[From Bishop, 1995]

58 NPDE, general formulation (2)
Suppose now that N vectors {x^(1), x^(2), ..., x^(N)} are drawn from the distribution; the probability that k of these N vectors fall in R is given by the binomial distribution

Prob(k) = (N choose k) P^k (1−P)^(N−k)

It can be shown (from the properties of the binomial) that the mean and variance of the ratio k/N are

E[k/N] = P   and   Var[k/N] = E[(k/N − P)²] = P(1−P)/N

Note that the variance gets smaller as N → ∞, so we can expect that a good estimate of P is the mean fraction of points that fall within R:

P ≅ k/N

[From Bishop, 1995]

59 NPDE, general formulation (3)
Assume now that R is so small that p(x) does not vary appreciably within it; then the integral can be approximated by

∫_R p(x') dx' ≅ p(x) V

where V is the volume enclosed by region R.
[Figure: a density p(x) with a small region R]
[From Bishop, 1995]

60 NPDE, general formulation (4)
Merging the two expressions we obtain

P = ∫_R p(x') dx' ≅ p(x) V   and   P ≅ k/N   ⇒   p(x) ≅ k/(NV)

This estimate becomes more accurate as we increase the number of sample points N and shrink the volume V. In practice, the value of N (the total number of examples) is fixed:
- To improve the estimate p(x) we could let V approach zero, but then region R would become so small that it would enclose no examples
- This means that, in practice, we will have to find a compromise value for the volume V: large enough to include enough examples within R, and small enough to support the assumption that p(x) is constant within R
[From Bishop, 1995]

61 NPDE, general formulation (5)
In conclusion, the general expression for NPDE is

p(x) ≅ k/(NV)

where V is the volume surrounding x, N is the total number of examples, and k is the number of examples inside V.
When applying this result to practical density estimation problems, two basic approaches can be adopted:
- Kernel Density Estimation (KDE): choose a fixed value of the volume V and determine k from the data
- k Nearest Neighbor (kNN): choose a fixed value of k and determine the corresponding volume V from the data
It can be shown that both KDE and kNN converge to the true probability density as N → ∞, provided that V shrinks with N, and k grows with N, appropriately.
[From Bishop, 1995]

62 Parzen windows (1)
Suppose that the region R that encloses the k examples is a hypercube of side h; then its volume is V = h^D, where D is the number of dimensions.
To find the number of examples that fall within this region, we define a kernel function K(u):

K(u) = 1 if |u_j| < 1/2 for all j = 1..D; 0 otherwise

This kernel, which corresponds to a unit hypercube centered at the origin, is known as a Parzen window or the naïve estimator.
[Figure: a hypercube of side h around x, and the unit window K(u) on (−1/2, 1/2)]
[From Bishop, 1995]

63 Parzen windows (2)
The total number of points inside the hypercube is then

k = Σ_{n=1}^{N} K((x − x^(n))/h)

Substituting back into the density estimate expression:

p_KDE(x) = (1/(N h^D)) Σ_{n=1}^{N} K((x − x^(n))/h)

Note that the Parzen window estimate resembles the histogram, with the exception that the bin locations are determined by the data points.
[Figure: a window of side h centered at x; points x^(1), x^(2), x^(3) inside the window contribute K = 1, while x^(4) outside contributes K = 0]
[From Bishop, 1995]

64 Numerical exercise (1)
Given the dataset X below, use Parzen windows to estimate the density p(x) at y = 3, 10, 15:

X = {x^(1), x^(2), ..., x^(N)} = {4, 5, 5, 6, 12, 14, 15, 15, 16, 17}

Use a bandwidth of h = 4.
[Figure: the ten data points on the x axis and the three estimation points y = 3, 10, 15]

65 Numerical exercise (2)
Solution: let's first estimate p(y=3):

p_KDE(y=3) = (1/(N h)) Σ_{n=1}^{N} K((y − x^(n))/h)
           = (1/(10·4)) [K(−1/4) + K(−1/2) + K(−1/2) + K(−3/4) + ...]
           = (1/40)(1 + 0 + 0 + 0 + ...) = 0.025

Similarly:

p_KDE(y=10) = (1/40)·0 = 0
p_KDE(y=15) = (1/40)·4 = 0.1
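The exercise can be reproduced with a few lines of Python (a sketch using the strict window |u| < 1/2 defined earlier):

```python
import numpy as np

def parzen_estimate(y, data, h):
    """1D Parzen-window estimate: count points with |(y-x)/h| < 1/2, then k/(N h)."""
    k = np.sum(np.abs((y - data) / h) < 0.5)
    return k / (len(data) * h)

X = np.array([4, 5, 5, 6, 12, 14, 15, 15, 16, 17], dtype=float)
for y in (3, 10, 15):
    print(y, parzen_estimate(y, X, h=4))   # 0.025, 0.0, 0.1
```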

66 Smooth kernels (1)
The Parzen window has several drawbacks:
- It yields density estimates that have discontinuities
- It weights equally all the points x_i, regardless of their distance to the estimation point x
Some of these difficulties can be overcome by replacing the Parzen window with a smooth kernel K(u) such that

∫_{R^D} K(x) dx = 1

[Figure: the Parzen window and a smooth (Gaussian-shaped) kernel, both with unit area A = 1]

67 Smooth kernels (2)
Usually, but not always, K(u) will be a radially symmetric and unimodal probability density function, such as the multivariate Gaussian density function

K(x) = (1/(2π)^{D/2}) exp(−(1/2) x^T x)

where the expression of the density estimate remains the same as with Parzen windows:

p_KDE(x) = (1/(N h^D)) Σ_{n=1}^{N} K((x − x^(n))/h)
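A sketch of the same estimator with a univariate Gaussian kernel (reusing the dataset of the earlier exercise):

```python
import numpy as np

def kde_gaussian(x, data, h):
    """Univariate KDE: (1/(N h)) sum_n K((x - x_n)/h) with a Gaussian kernel K."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

X = np.array([4, 5, 5, 6, 12, 14, 15, 15, 16, 17], dtype=float)
print([round(kde_gaussian(x, X, h=1.0), 4) for x in (3, 10, 15)])
```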

68 Smooth kernels (3)
Just as the Parzen window estimate can be considered a sum of boxes centered at the observations, the smooth kernel estimate is a sum of "bumps" placed at the data points. The kernel function determines the shape of the bumps; the parameter h, also called the smoothing parameter or bandwidth, determines their width.
[Figure: a kernel density estimate p_KDE(x) as the sum of Gaussian kernel functions centered at the data points]

69 Bandwidth selection, univariate case (1)
[Figure: four kernel density estimates p_KDE(x) of the same dataset computed with different bandwidths h, illustrating under- and over-smoothing]

70 Bandwidth selection, univariate case (2)
Subjective choice
- Plot out several curves and choose the estimate that is most in accordance with one's prior (subjective) ideas
- However, this method is not practical in pattern recognition, since we typically have high-dimensional data
Reference to a standard distribution
- Assume a standard density function and find the value of the bandwidth that minimizes the mean integrated square error (MISE):

h_opt = argmin_h { MISE(p_KDE(x)) } = argmin_h { E[ ∫ (p_KDE(x) − p(x))² dx ] }

- If we assume that the true distribution is Gaussian and we use a Gaussian kernel, it can be shown that the optimal bandwidth is

h_opt = 1.06 σ N^{−1/5}

where σ is the sample standard deviation and N is the number of training examples.
[From Silverman, 1986]
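The rule of thumb is one line of code (a sketch; σ here is the sample standard deviation):

```python
import numpy as np

def silverman_bandwidth(x):
    """h_opt = 1.06 * sigma * N^(-1/5), the Gaussian reference rule."""
    return 1.06 * np.std(x, ddof=1) * len(x) ** (-1 / 5)
```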

71 Bandwidth selection, univariate case (3)
Likelihood cross-validation
- The ML estimate of h is degenerate, since it yields h_ML = 0: a density estimate with Dirac delta functions at each training data point
- A practical alternative is to maximize the pseudo-likelihood computed using leave-one-out cross-validation:

h_MLCV = argmax_h { (1/N) Σ_{n=1}^{N} log p_{−n}(x^(n)) }

where

p_{−n}(x^(n)) = (1/((N−1)h)) Σ_{m=1, m≠n}^{N} K((x^(n) − x^(m))/h)

[Figure: leave-one-out estimates p_{−n}(x) evaluated at the held-out points x^(n)]
[From Silverman, 1986]

72 Multivariate density estimation
- The bandwidth needs to be selected individually for each axis; alternatively, one may pre-scale the axes or "whiten" the data, so that the same bandwidth can be used for all dimensions
- The density can be estimated with a multivariate kernel, or by means of so-called product kernels (see TAMU notes)
[Figure: a two-dimensional labeled dataset and the product-kernel estimate P(x_1, x_2|ω_i)]

73 Naïve Bayes classifier (1)
How do we apply KDE to classifier design?
- First, we estimate the likelihood of each class, P(x|ω_i)
- Then we apply Bayes rule to derive the MAP rule: g_i(x) = P(ω_i|x) ∝ P(x|ω_i)P(ω_i)
However, P(x|ω_i) is multivariate: NPDE becomes hard!!
To avoid this problem, one practical simplification is sometimes made: assume that the features are class-conditionally independent:

P(x|ω_i) = Π_{d=1}^{D} P(x(d)|ω_i)

74 Naïve Bayes classifier (2)
Class-conditional independence vs. independence:
- Class-conditional independence: P(x|ω_i) = Π_{d=1}^{D} P(x(d)|ω_i)
- (Unconditional) independence: P(x) = Π_{d=1}^{D} P(x(d))
[Figure: two-dimensional examples where the features are class-conditionally independent but not independent, and vice versa]

75 Naïve Bayes classifier (3)
Merging this expression into the discriminant function yields the decision rule for the Naïve Bayes classifier:

g_{i,NB}(x) = P(ω_i) Π_{d=1}^{D} P(x(d)|ω_i)   (Naïve Bayes classifier)

The main advantage of the Naïve Bayes classifier is that we only need to compute the univariate densities P(x(d)|ω_i), which is a much easier problem than estimating the multivariate density P(x|ω_i).
Despite its simplicity, the Naïve Bayes classifier has been shown to have comparable performance to artificial neural networks and decision-tree learning in some domains.
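A sketch of a Naïve Bayes classifier that estimates each univariate density with a Gaussian-kernel KDE (all names are ours; priors are taken as class frequencies):

```python
import numpy as np

def kde_1d(y, data, h):
    """Univariate Gaussian-kernel density estimate at y."""
    u = (y - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

def naive_bayes_classify(x, classes, h=1.0):
    """classes: dict label -> (N_i, D) training array. Maximizes
    log P(w_i) + sum_d log P(x(d)|w_i), with KDE likelihoods."""
    n_total = sum(len(data) for data in classes.values())
    scores = {}
    for label, data in classes.items():
        prior = len(data) / n_total
        scores[label] = np.log(prior) + sum(
            np.log(kde_1d(x[d], data[:, d], h)) for d in range(data.shape[1]))
    return max(scores, key=scores.get)
```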

76 CHAPTER 6: Nearest Neighbors
Nearest Neighbors density estimation
The k Nearest Neighbors classification rule
kNN as a lazy learner
Characteristics of the kNN classifier
Optimizing the kNN classifier

77 kNN Density Estimation (1)
In the kNN method we grow the volume surrounding the estimation point x until it encloses a total of k data points. The density estimate then becomes

P(x) ≅ k/(NV) = k / (N c_D R_k^D(x))

where
- R_k(x) is the distance between the estimation point x and its k-th closest neighbor
- c_D is the volume of the unit sphere in D dimensions: c_D = π^{D/2} / (D/2)! = π^{D/2} / Γ(D/2 + 1)
Thus c_1 = 2, c_2 = π, c_3 = 4π/3, and so on.
[Figure: in two dimensions, a circle of radius R around x enclosing k points, with volume (area) πR², so that P(x) = k/(N π R²)]
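A one-dimensional sketch (c_1 = 2, so the "volume" is the interval length 2·R_k):

```python
import numpy as np

def knn_density(x, data, k):
    """kNN density estimate in 1D: p(x) ~ k / (N * 2 * R_k(x))."""
    r_k = np.sort(np.abs(data - x))[k - 1]   # distance to the k-th nearest neighbor
    return k / (len(data) * 2 * r_k)
```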

78 kNN Density Estimation (2)
In general, the estimates that can be obtained with the kNN method are not very satisfactory:
- The estimates are prone to local noise
- The method produces estimates with very heavy tails
- Since the function R_k(x) is not differentiable, the density estimate will have discontinuities
These properties are illustrated in the next few slides.

79 kNN Density Estimation, example 1
To illustrate kNN we generated several density estimates for a univariate mixture of two Gaussians, P(x) = ½N(0,1) + ½N(10,4), and several values of N and k.
[Figure: the resulting kNN density estimates]

80 kNN Density Estimation, example 2 (a)
The performance of the kNN density estimation technique in two dimensions is illustrated in these figures:
- The top figure shows the true density, a mixture of two bivariate Gaussians, p(x) = ½N(µ_1, Σ_1) + ½N(µ_2, Σ_2), with means µ_1 = [0 5]^T and µ_2 = [5 0]^T (the covariance values are garbled in the transcription)
- The bottom figure shows the density estimate for k = 10 neighbors and N = 200 examples
In the next slide we show the contours of the two distributions overlapped with the training data used to generate the estimate.

81 kNN Density Estimation, example 2 (b)
[Figure: contours of the true density and of the kNN density estimate, overlapped with the training data]

82 kNN as a Bayesian classifier (1)
The main advantage of the kNN method is that it leads to a very simple approximation of the (optimal) Bayes classifier. Assume that we have a dataset with N examples, N_i from class ω_i, and that we are interested in classifying an unknown sample x_u:
- We draw a hyper-sphere of volume V around x_u; assume this volume contains a total of k examples, k_i from class ω_i
- The unconditional density is estimated by P(x) = k/(NV)
[From Bishop, 1995]

83 kNN as a Bayesian classifier (2)
Similarly, we can approximate the likelihood functions by counting the number of examples of each class inside volume V:

P(x|ω_i) = k_i / (N_i V)

And the priors are approximated by P(ω_i) = N_i / N. Putting everything together, the Bayes classifier becomes

P(ω_i|x) = P(x|ω_i) P(ω_i) / P(x) = [k_i/(N_i V)] · [N_i/N] / [k/(NV)] = k_i / k

[From Bishop, 1995]

84 The kNN classification rule (1)
The k Nearest Neighbor rule (kNN) is a very intuitive method that classifies unlabeled examples based on their similarity to examples in the training set:
For a given unlabeled example x_u ∈ R^D, find the k closest labeled examples in the training data set and assign x_u to the class that appears most frequently within the k-subset.
The kNN rule only requires:
- An integer k
- A set of labeled examples (training data)
- A metric to measure "closeness"

85 The kNN classification rule (2)
Example: in the example below we have three classes, and the goal is to find a class label for the unknown example x_u.
In this case we use the Euclidean distance and a value of k = 5 neighbors. Of the 5 closest neighbors, 4 belong to ω_1 and 1 belongs to ω_3, so x_u is assigned to ω_1, the predominant class.
[Figure: the unknown example x_u surrounded by its 5 nearest neighbors from classes ω_1, ω_2, ω_3]
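A sketch of the kNN rule in Python (Euclidean metric, majority vote; the names are ours):

```python
import numpy as np
from collections import Counter

def knn_classify(x_u, X_train, y_train, k=5):
    """Assign x_u to the most frequent class among its k nearest neighbors.
    X_train: (N, D) array; y_train: length-N array of labels."""
    dists = np.linalg.norm(X_train - x_u, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]
```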

86 kNN in action: example 1
We have generated data for a 2-dimensional 3-class problem, where the class-conditional densities are multi-modal and non-linearly separable, as illustrated in the figure.
We used the kNN rule with k = 5 and the Euclidean distance as a metric.
The resulting decision boundaries and decision regions are shown below.
[Figure: training data and the kNN decision regions]

87 kNN in action: example 2
We have generated data for a 2-dimensional 3-class problem, where the class-conditional densities are unimodal and are distributed in rings around a common mean. These classes are also non-linearly separable, as illustrated in the figure.
We used the kNN rule with k = 5 and the Euclidean distance as a metric.
The resulting decision boundaries and decision regions are shown below.
[Figure: training data and the kNN decision regions]

88 Characteristics of the kNN classifier (1)
Advantages
- Simple implementation
- Nearly optimal in the large-sample limit (N → ∞): P[error]_Bayes ≤ P[error]_1NN ≤ 2·P[error]_Bayes
- Uses local information, which can yield highly adaptive behavior
- Lends itself very easily to parallel implementations
Disadvantages
- Large storage requirements
- Computationally intensive recall
- Highly susceptible to the curse of dimensionality

89 Characteristics of the kNN classifier (2)
1NN versus kNN: the use of large values of k has two main advantages:
- It yields smoother decision regions
- It provides probabilistic information: the ratio of examples for each class gives information about the ambiguity of the decision
However, too large a value of k is detrimental:
- It destroys the locality of the estimation, since farther examples are taken into consideration
- In addition, it increases the computational burden

90 kNN versus 1NN
[Figure: decision boundaries obtained with 1-NN, 5-NN and 10-NN on the same dataset]

91 kNN and the problem of feature weighting
[Figure: an example illustrating the sensitivity of the kNN classifier to noisy or irrelevant features]

92 Feature weighting
The previous example illustrated the Achilles heel of the kNN classifier: its sensitivity to noisy axes. A possible solution would be to normalize each feature to N(0,1).
However, normalization does not resolve the curse of dimensionality: a close look at the Euclidean distance shows that this metric can become very noisy for high-dimensional problems if only a few of the features carry the classification information:

d(x_u, x) = √( Σ_{k=1}^{D} (x_u(k) − x(k))² )

The solution to this problem is to modify the Euclidean metric by a set of weights that represent the information content or "goodness" of each feature.

93 CHAPTER 7: Linear Discriminant Functions
Perceptron learning
Minimum squared error (MSE) solution
Least-mean-squares (LMS) rule

94 Linear Discriminant Functions (1)
The objective of this chapter is to present methods for learning linear discriminant functions of the form

g(x) = w^T x + w_0;   g(x) > 0 ⇒ x ∈ ω_1,  g(x) < 0 ⇒ x ∈ ω_2

where w is the weight vector and w_0 is the threshold weight or bias.
Similar discriminant functions were derived in Chapter 4 as a special case of the quadratic classifier.
In this chapter, the discriminant functions will be derived in a non-parametric fashion; that is, no assumptions will be made about the underlying densities.
[Figure: a linear decision boundary w^T x + w_0 = 0 in a two-dimensional feature space, separating the half-spaces w^T x + w_0 > 0 and w^T x + w_0 < 0]

95 Linear Discriminant Functions (2)
For convenience, we will focus on binary classification. Extension to the multi-category case can be easily achieved by:
- Using ω_i / not-ω_i dichotomies
- Using ω_i / ω_j dichotomies

96 Gradient descent (1)
Gradient descent is a general method for function minimization. From basic calculus, we know that the minimum of a function J(x) is defined by the zeros of the gradient:

∇J(x) = 0 at x* = argmin_x J(x)

- Only in very special cases does this minimization have a closed-form solution
- In some other cases, a closed-form solution may exist, but is numerically ill-posed or impractical (e.g., memory requirements)

97 Gradient descent (2)
Gradient descent finds the minimum in an iterative fashion by moving in the direction of steepest descent:
1. Start with an arbitrary solution x(0)
2. Compute the gradient ∇J(x(k))
3. Move in the direction of steepest descent: x(k+1) = x(k) − η ∇J(x(k))
4. Go to 2 (until convergence)
where η is a learning rate.
[Figure: a cost surface J(w) with an initial guess descending into a local minimum; a separate global minimum is also shown]

98 Perceptron learning (1)
Let's now consider the problem of learning a binary classification problem with a linear discriminant function. As usual, assume we have a dataset X = {x^(1), x^(2), ..., x^(N)} containing examples from the two classes.
For convenience, we will absorb the intercept w_0 by augmenting the feature vector x with an additional constant dimension:

w^T x + w_0 = [w_0 w^T] [1; x] = a^T y

[From Duda, Hart and Stork, 2001]

99 Perceptron learning (2)
Keep in mind that our objective is to find a vector a such that

g(x) = a^T y > 0 for x ∈ ω_1,  a^T y < 0 for x ∈ ω_2

To simplify the derivation, we will "normalize" the training set by replacing all examples from class ω_2 by their negative: y ← −y for y ∈ ω_2.
This allows us to ignore class labels and look for a weight vector such that

a^T y > 0 for all y

[From Duda, Hart and Stork, 2001]

100 Perceptron learning (3)
To find this solution we must first define an objective function J(a). A good choice is what is known as the Perceptron criterion:

J_P(a) = Σ_{y ∈ Y_M} (−a^T y)

where Y_M is the set of examples misclassified by a.
Note that J_P(a) is non-negative, since a^T y < 0 for misclassified samples.

101 Perceptron learning (4)
To find the minimum of J_P(a), we use gradient descent. The gradient is defined by

∇J_P(a) = Σ_{y ∈ Y_M} (−y)

and the gradient descent update rule becomes

a(k+1) = a(k) + η Σ_{y ∈ Y_M} y

This is known as the perceptron batch update rule. The weight vector may also be updated in an on-line fashion, this is, after the presentation of each individual example:

a(k+1) = a(k) + η y^(i)   (perceptron rule)

where y^(i) is an example that has been misclassified by a(k).
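A sketch of the on-line perceptron rule on a "normalized" training set (class-2 examples negated, as on the previous slides):

```python
import numpy as np

def perceptron_train(X, labels, eta=1.0, max_epochs=100):
    """Return the augmented weight vector a such that a^T y > 0 for all y
    (guaranteed to converge only if the classes are linearly separable)."""
    Y = np.hstack([np.ones((len(X), 1)), X])   # absorb the bias: y = [1; x]
    Y[np.asarray(labels) == 2] *= -1           # negate class-2 examples
    a = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for y in Y:
            if a @ y <= 0:                     # misclassified
                a += eta * y                   # perceptron update
                errors += 1
        if errors == 0:
            break
    return a
```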

102 Perceptron learning (5)
If the classes are linearly separable, the perceptron rule is guaranteed to converge to a valid solution. However, if the two classes are not linearly separable, the perceptron rule will not converge: since no weight vector a can correctly classify every sample in a non-separable dataset, the corrections in the perceptron rule will never cease.
One ad-hoc solution to this problem is to enforce convergence by using variable learning rates η(k) that approach zero as k approaches infinity.

103 Minimum Squared Error solution (1)
The classical Minimum Squared Error (MSE) criterion provides an alternative to the perceptron rule:
- The perceptron rule seeks a weight vector a that satisfies the inequality a^T y^(i) > 0; it only considers misclassified samples, since these are the only ones that violate the inequality
- Instead, the MSE criterion looks for a solution to the equality a^T y^(i) = b^(i), where b^(i) are some pre-specified target values (e.g., class labels)
As a result, the MSE solution uses ALL of the samples in the training set.
[From Duda, Hart and Stork, 2001]

104 Minimum Squared Error solution (2)
The system of equations solved by MSE is

[ y_0^(1)  y_1^(1)  ...  y_D^(1) ]   [ a_0 ]   [ b^(1) ]
[ y_0^(2)  y_1^(2)  ...  y_D^(2) ] x [ a_1 ] = [ b^(2) ]
[   ...      ...    ...    ...   ]   [ ... ]   [  ...  ]
[ y_0^(N)  y_1^(N)  ...  y_D^(N) ]   [ a_D ]   [ b^(N) ]

i.e.,  Y a = b

where a is the weight vector, each row in Y is a training example, and each row in b is the corresponding class label (target value).
For consistency, we will continue assuming that examples from class ω_2 have been replaced by their negative vector, although this is not a requirement for the MSE solution.
[From Duda, Hart and Stork, 2001]

105 Minimum Squared Error solution (3)
An exact solution to Ya = b can sometimes be found:
- If the number of (independent) equations (N) is equal to the number of unknowns (D+1), the exact solution is defined by a = Y^{−1} b
- In practice, however, Y will be singular, so its inverse Y^{−1} does not exist: Y will commonly have more rows (examples) than columns (unknowns), which yields an over-determined system for which an exact solution cannot be found
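For the over-determined case, the standard remedy is the least-squares solution, which minimizes ‖Ya − b‖². A sketch using NumPy's least-squares solver (the target values b^(i) = 1 are our choice for illustration):

```python
import numpy as np

def mse_weights(X, labels, b=None):
    """Least-squares solution a of Y a = b (class-2 examples negated)."""
    Y = np.hstack([np.ones((len(X), 1)), X])
    Y[np.asarray(labels) == 2] *= -1
    if b is None:
        b = np.ones(len(Y))                     # illustrative margin targets
    a, *_ = np.linalg.lstsq(Y, b, rcond=None)   # minimizes ||Y a - b||^2
    return a
```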


4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression 4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw

More information

Linear Classification

Linear Classification Linear Classificatin CS 54: Machine Learning Slides adapted frm Lee Cper, Jydeep Ghsh, and Sham Kakade Review: Linear Regressin CS 54 [Spring 07] - H Regressin Given an input vectr x T = (x, x,, xp), we

More information

initially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur

initially lcated away frm the data set never win the cmpetitin, resulting in a nnptimal nal cdebk, [2] [3] [4] and [5]. Khnen's Self Organizing Featur Cdewrd Distributin fr Frequency Sensitive Cmpetitive Learning with One Dimensinal Input Data Aristides S. Galanpuls and Stanley C. Ahalt Department f Electrical Engineering The Ohi State University Abstract

More information

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical mdel fr micrarray data analysis David Rssell Department f Bistatistics M.D. Andersn Cancer Center, Hustn, TX 77030, USA rsselldavid@gmail.cm

More information

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal

More information

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

Smoothing, penalized least squares and splines

Smoothing, penalized least squares and splines Smthing, penalized least squares and splines Duglas Nychka, www.image.ucar.edu/~nychka Lcally weighted averages Penalized least squares smthers Prperties f smthers Splines and Reprducing Kernels The interplatin

More information

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons

Slide04 (supplemental) Haykin Chapter 4 (both 2nd and 3rd ed): Multi-Layer Perceptrons Slide04 supplemental) Haykin Chapter 4 bth 2nd and 3rd ed): Multi-Layer Perceptrns CPSC 636-600 Instructr: Ynsuck Che Heuristic fr Making Backprp Perfrm Better 1. Sequential vs. batch update: fr large

More information

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0

More information

MATHEMATICS SYLLABUS SECONDARY 5th YEAR

MATHEMATICS SYLLABUS SECONDARY 5th YEAR Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE

More information

You need to be able to define the following terms and answer basic questions about them:

You need to be able to define the following terms and answer basic questions about them: CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f

More information

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins

More information

Maximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016

Maximum A Posteriori (MAP) CS 109 Lecture 22 May 16th, 2016 Maximum A Psteriri (MAP) CS 109 Lecture 22 May 16th, 2016 Previusly in CS109 Game f Estimatrs Maximum Likelihd Nn spiler: this didn t happen Side Plt argmax argmax f lg Mther f ptimizatins? Reviving an

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins

More information

Lyapunov Stability Stability of Equilibrium Points

Lyapunov Stability Stability of Equilibrium Points Lyapunv Stability Stability f Equilibrium Pints 1. Stability f Equilibrium Pints - Definitins In this sectin we cnsider n-th rder nnlinear time varying cntinuus time (C) systems f the frm x = f ( t, x),

More information

Trigonometric Ratios Unit 5 Tentative TEST date

Trigonometric Ratios Unit 5 Tentative TEST date 1 U n i t 5 11U Date: Name: Trignmetric Ratis Unit 5 Tentative TEST date Big idea/learning Gals In this unit yu will extend yur knwledge f SOH CAH TOA t wrk with btuse and reflex angles. This extensin

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came. MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the

More information

Support Vector Machines and Flexible Discriminants

Support Vector Machines and Flexible Discriminants 12 Supprt Vectr Machines and Flexible Discriminants This is page 417 Printer: Opaque this 12.1 Intrductin In this chapter we describe generalizatins f linear decisin bundaries fr classificatin. Optimal

More information

Simple Linear Regression (single variable)

Simple Linear Regression (single variable) Simple Linear Regressin (single variable) Intrductin t Machine Learning Marek Petrik January 31, 2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins

More information

1 The limitations of Hartree Fock approximation

1 The limitations of Hartree Fock approximation Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants

More information

Experiment #3. Graphing with Excel

Experiment #3. Graphing with Excel Experiment #3. Graphing with Excel Study the "Graphing with Excel" instructins that have been prvided. Additinal help with learning t use Excel can be fund n several web sites, including http://www.ncsu.edu/labwrite/res/gt/gt-

More information

Homology groups of disks with holes

Homology groups of disks with holes Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.

More information

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics Mdule 3: Gaussian Prcess Parameter Estimatin, Predictin Uncertainty, and Diagnstics Jerme Sacks and William J Welch Natinal Institute f Statistical Sciences and University f British Clumbia Adapted frm

More information

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*

More information

Chapter 2 GAUSS LAW Recommended Problems:

Chapter 2 GAUSS LAW Recommended Problems: Chapter GAUSS LAW Recmmended Prblems: 1,4,5,6,7,9,11,13,15,18,19,1,7,9,31,35,37,39,41,43,45,47,49,51,55,57,61,6,69. LCTRIC FLUX lectric flux is a measure f the number f electric filed lines penetrating

More information

Least Squares Optimal Filtering with Multirate Observations

Least Squares Optimal Filtering with Multirate Observations Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical

More information

Inference in the Multiple-Regression

Inference in the Multiple-Regression Sectin 5 Mdel Inference in the Multiple-Regressin Kinds f hypthesis tests in a multiple regressin There are several distinct kinds f hypthesis tests we can run in a multiple regressin. Suppse that amng

More information

CHM112 Lab Graphing with Excel Grading Rubric

CHM112 Lab Graphing with Excel Grading Rubric Name CHM112 Lab Graphing with Excel Grading Rubric Criteria Pints pssible Pints earned Graphs crrectly pltted and adhere t all guidelines (including descriptive title, prperly frmatted axes, trendline

More information

Interference is when two (or more) sets of waves meet and combine to produce a new pattern.

Interference is when two (or more) sets of waves meet and combine to produce a new pattern. Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme

More information

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.

More information

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA Mental Experiment regarding 1D randm walk Cnsider a cntainer f gas in thermal

More information

MODULE 1. e x + c. [You can t separate a demominator, but you can divide a single denominator into each numerator term] a + b a(a + b)+1 = a + b

MODULE 1. e x + c. [You can t separate a demominator, but you can divide a single denominator into each numerator term] a + b a(a + b)+1 = a + b . REVIEW OF SOME BASIC ALGEBRA MODULE () Slving Equatins Yu shuld be able t slve fr x: a + b = c a d + e x + c and get x = e(ba +) b(c a) d(ba +) c Cmmn mistakes and strategies:. a b + c a b + a c, but

More information

Cells though to send feedback signals from the medulla back to the lamina o L: Lamina Monopolar cells

Cells though to send feedback signals from the medulla back to the lamina o L: Lamina Monopolar cells Classificatin Rules (and Exceptins) Name: Cell type fllwed by either a clumn ID (determined by the visual lcatin f the cell) r a numeric identifier t separate ut different examples f a given cell type

More information

Kinetic Model Completeness

Kinetic Model Completeness 5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

On Huntsberger Type Shrinkage Estimator for the Mean of Normal Distribution ABSTRACT INTRODUCTION

On Huntsberger Type Shrinkage Estimator for the Mean of Normal Distribution ABSTRACT INTRODUCTION Malaysian Jurnal f Mathematical Sciences 4(): 7-4 () On Huntsberger Type Shrinkage Estimatr fr the Mean f Nrmal Distributin Department f Mathematical and Physical Sciences, University f Nizwa, Sultanate

More information

Checking the resolved resonance region in EXFOR database

Checking the resolved resonance region in EXFOR database Checking the reslved resnance regin in EXFOR database Gttfried Bertn Sciété de Calcul Mathématique (SCM) Oscar Cabells OECD/NEA Data Bank JEFF Meetings - Sessin JEFF Experiments Nvember 0-4, 017 Bulgne-Billancurt,

More information

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax .7.4: Direct frequency dmain circuit analysis Revisin: August 9, 00 5 E Main Suite D Pullman, WA 9963 (509) 334 6306 ice and Fax Overview n chapter.7., we determined the steadystate respnse f electrical

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

General Chemistry II, Unit I: Study Guide (part I)

General Chemistry II, Unit I: Study Guide (part I) 1 General Chemistry II, Unit I: Study Guide (part I) CDS Chapter 14: Physical Prperties f Gases Observatin 1: Pressure- Vlume Measurements n Gases The spring f air is measured as pressure, defined as the

More information

Data Mining: Concepts and Techniques. Classification and Prediction. Chapter February 8, 2007 CSE-4412: Data Mining 1

Data Mining: Concepts and Techniques. Classification and Prediction. Chapter February 8, 2007 CSE-4412: Data Mining 1 Data Mining: Cncepts and Techniques Classificatin and Predictin Chapter 6.4-6 February 8, 2007 CSE-4412: Data Mining 1 Chapter 6 Classificatin and Predictin 1. What is classificatin? What is predictin?

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 2: Mdeling change. In Petre Department f IT, Åb Akademi http://users.ab.fi/ipetre/cmpmd/ Cntent f the lecture Basic paradigm f mdeling change Examples Linear dynamical

More information

Surface and Contact Stress

Surface and Contact Stress Surface and Cntact Stress The cncept f the frce is fundamental t mechanics and many imprtant prblems can be cast in terms f frces nly, fr example the prblems cnsidered in Chapter. Hwever, mre sphisticated

More information

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS 1 Influential bservatins are bservatins whse presence in the data can have a distrting effect n the parameter estimates and pssibly the entire analysis,

More information

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM The general linear mdel and Statistical Parametric Mapping I: Intrductin t the GLM Alexa Mrcm and Stefan Kiebel, Rik Hensn, Andrew Hlmes & J-B J Pline Overview Intrductin Essential cncepts Mdelling Design

More information

ECEN 4872/5827 Lecture Notes

ECEN 4872/5827 Lecture Notes ECEN 4872/5827 Lecture Ntes Lecture #5 Objectives fr lecture #5: 1. Analysis f precisin current reference 2. Appraches fr evaluating tlerances 3. Temperature Cefficients evaluatin technique 4. Fundamentals

More information

CS 109 Lecture 23 May 18th, 2016

CS 109 Lecture 23 May 18th, 2016 CS 109 Lecture 23 May 18th, 2016 New Datasets Heart Ancestry Netflix Our Path Parameter Estimatin Machine Learning: Frmally Many different frms f Machine Learning We fcus n the prblem f predictin Want

More information

Elements of Machine Intelligence - I

Elements of Machine Intelligence - I ECE-175A Elements f Machine Intelligence - I Ken Kreutz-Delgad Nun Vascncels ECE Department, UCSD Winter 2011 The curse The curse will cver basic, but imprtant, aspects f machine learning and pattern recgnitin

More information

Preparation work for A2 Mathematics [2018]

Preparation work for A2 Mathematics [2018] Preparatin wrk fr A Mathematics [018] The wrk studied in Y1 will frm the fundatins n which will build upn in Year 13. It will nly be reviewed during Year 13, it will nt be retaught. This is t allw time

More information

Lecture 8: Multiclass Classification (I)

Lecture 8: Multiclass Classification (I) Bayes Rule fr Multiclass Prblems Traditinal Methds fr Multiclass Prblems Linear Regressin Mdels Lecture 8: Multiclass Classificatin (I) Ha Helen Zhang Fall 07 Ha Helen Zhang Lecture 8: Multiclass Classificatin

More information

Module 4: General Formulation of Electric Circuit Theory

Module 4: General Formulation of Electric Circuit Theory Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated

More information

The standards are taught in the following sequence.

The standards are taught in the following sequence. B L U E V A L L E Y D I S T R I C T C U R R I C U L U M MATHEMATICS Third Grade In grade 3, instructinal time shuld fcus n fur critical areas: (1) develping understanding f multiplicatin and divisin and

More information

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce

More information

Agenda. What is Machine Learning? Learning Type of Learning: Supervised, Unsupervised and semi supervised Classification

Agenda. What is Machine Learning? Learning Type of Learning: Supervised, Unsupervised and semi supervised Classification Agenda Artificial Intelligence and its applicatins Lecture 6 Supervised Learning Prfessr Daniel Yeung danyeung@ieee.rg Dr. Patrick Chan patrickchan@ieee.rg Suth China University f Technlgy, China Learning

More information

Activity Guide Loops and Random Numbers

Activity Guide Loops and Random Numbers Unit 3 Lessn 7 Name(s) Perid Date Activity Guide Lps and Randm Numbers CS Cntent Lps are a relatively straightfrward idea in prgramming - yu want a certain chunk f cde t run repeatedly - but it takes a

More information

Particle Size Distributions from SANS Data Using the Maximum Entropy Method. By J. A. POTTON, G. J. DANIELL AND B. D. RAINFORD

Particle Size Distributions from SANS Data Using the Maximum Entropy Method. By J. A. POTTON, G. J. DANIELL AND B. D. RAINFORD 3 J. Appl. Cryst. (1988). 21,3-8 Particle Size Distributins frm SANS Data Using the Maximum Entrpy Methd By J. A. PTTN, G. J. DANIELL AND B. D. RAINFRD Physics Department, The University, Suthamptn S9

More information

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning Admin Reinfrcement Learning Cntent adapted frm Berkeley CS188 MDP Search Trees Each MDP state prjects an expectimax-like search tree Optimal Quantities The value (utility) f a state s: V*(s) = expected

More information

5 th grade Common Core Standards

5 th grade Common Core Standards 5 th grade Cmmn Cre Standards In Grade 5, instructinal time shuld fcus n three critical areas: (1) develping fluency with additin and subtractin f fractins, and develping understanding f the multiplicatin

More information

More Tutorial at

More Tutorial at Answer each questin in the space prvided; use back f page if extra space is needed. Answer questins s the grader can READILY understand yur wrk; nly wrk n the exam sheet will be cnsidered. Write answers,

More information

EDA Engineering Design & Analysis Ltd

EDA Engineering Design & Analysis Ltd EDA Engineering Design & Analysis Ltd THE FINITE ELEMENT METHOD A shrt tutrial giving an verview f the histry, thery and applicatin f the finite element methd. Intrductin Value f FEM Applicatins Elements

More information

Physical Layer: Outline

Physical Layer: Outline 18-: Intrductin t Telecmmunicatin Netwrks Lectures : Physical Layer Peter Steenkiste Spring 01 www.cs.cmu.edu/~prs/nets-ece Physical Layer: Outline Digital Representatin f Infrmatin Characterizatin f Cmmunicatin

More information

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the

More information

Lecture 10, Principal Component Analysis

Lecture 10, Principal Component Analysis Principal Cmpnent Analysis Lecture 10, Principal Cmpnent Analysis Ha Helen Zhang Fall 2017 Ha Helen Zhang Lecture 10, Principal Cmpnent Analysis 1 / 16 Principal Cmpnent Analysis Lecture 10, Principal

More information

Emphases in Common Core Standards for Mathematical Content Kindergarten High School

Emphases in Common Core Standards for Mathematical Content Kindergarten High School Emphases in Cmmn Cre Standards fr Mathematical Cntent Kindergarten High Schl Cntent Emphases by Cluster March 12, 2012 Describes cntent emphases in the standards at the cluster level fr each grade. These

More information

Preparation work for A2 Mathematics [2017]

Preparation work for A2 Mathematics [2017] Preparatin wrk fr A2 Mathematics [2017] The wrk studied in Y12 after the return frm study leave is frm the Cre 3 mdule f the A2 Mathematics curse. This wrk will nly be reviewed during Year 13, it will

More information

Hypothesis Tests for One Population Mean

Hypothesis Tests for One Population Mean Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be

More information

15-381/781 Bayesian Nets & Probabilistic Inference

15-381/781 Bayesian Nets & Probabilistic Inference 15-381/781 Bayesian Nets & Prbabilistic Inference Emma Brunskill (this time) Ariel Prcaccia With thanks t Dan Klein (Berkeley), Percy Liang (Stanfrd) and Past 15-381 Instructrs fr sme slide cntent, and

More information