Statistical Foundations of Pattern Recognition


1 Statistical Foundations of Pattern Recognition. Learning Objectives: Bayes' Theorem, decision-making, confidence factors, discriminants, and the connection to neural nets.

2 Statistical Foundations of Pattern Recognition. An NDE measurement system feeds a feature (pattern) extraction stage, which produces observed patterns x. Problem: how do we decide which class x belongs to, among the possible classes c_1, c_2, c_3, ...?

3 Bayesian (probabilistic) approach. Let
P(c_i) = a priori probability that a pattern belongs to class c_i, regardless of the identity of the pattern
P(x) = a priori probability that a pattern is x, regardless of its class membership
P(x | c_i) = conditional probability that the pattern is x, given that it belongs to class c_i
P(c_i | x) = conditional probability that the pattern's class membership is c_i, given that the pattern is x
P(c_i, x) = the joint probability that the pattern is x and the class membership is c_i

4 Example: consider the case where there is one pattern value that is either observed or not, and two classes (e.g. signal or noise). The ten observations are:

x, c_1 *   ~x, c_1   x, c_1 *   ~x, c_2   x, c_2 #   x, c_1 *   x, c_2 #   ~x, c_1   ~x, c_1   x, c_1 *

(~x means x not observed). Then

P(x) = 6/10, P(c_1) = 7/10, P(c_2) = 3/10
P(x | c_1) = 4/7, P(x | c_2) = 2/3
P(c_1 | x) = 4/6, P(c_2 | x) = 2/6
P(c_1, x) = 4/10 (see the *'s), P(c_2, x) = 2/10 (see the #'s)
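
These relative-frequency estimates are easy to verify by direct counting. The following short Python sketch (an illustration, not part of the original slides; the variable names are made up) tabulates the ten observations above and reproduces the probabilities.

```python
from fractions import Fraction

# Each observation: (x_observed, class_label), taken from the slide's list.
obs = [(True, 1), (False, 1), (True, 1), (False, 2), (True, 2),
       (True, 1), (True, 2), (False, 1), (False, 1), (True, 1)]

n = len(obs)
P = lambda pred: Fraction(sum(pred(o) for o in obs), n)   # relative frequency

P_x = P(lambda o: o[0])                        # P(x) = 6/10
P_c1 = P(lambda o: o[1] == 1)                  # P(c1) = 7/10
P_c1_and_x = P(lambda o: o[0] and o[1] == 1)   # P(c1, x) = 4/10

print(P_x, P_c1, P_c1_and_x)
print("P(x|c1) =", P_c1_and_x / P_c1)          # 4/7
print("P(c1|x) =", P_c1_and_x / P_x)           # 4/6 = 2/3
```

(Fraction prints in lowest terms, so 6/10 appears as 3/5 and 4/10 as 2/5.)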

5 Bayes' Theorem:

P(c_i | x) = P(c_i, x) / P(x) = P(x | c_i) P(c_i) / P(x)

or, equivalently, in an "updating form": the "new" probability of c_i, having seen x, is P(c_i | x) = P(x | c_i) P(c_i) / P(x), where P(c_i) is the "old" probability of c_i.

6 Bayes' Theorem: P(c_i | x) = P(x | c_i) P(c_i) / P(x). Since

P(x) = Σ_j P(x | c_j) P(c_j)

we can calculate P(c_i | x) if we know the probabilities P(c_j), j = 1, 2, ..., and P(x | c_j), j = 1, 2, ...

7 Now, consider our previous example, where P(x) = 6/10, P(c_1) = 7/10, P(c_2) = 3/10, P(x | c_1) = 4/7, P(x | c_2) = 2/3, P(c_1, x) = 4/10, and P(c_2, x) = 2/10. Then

P(c_1 | x) = P(c_1, x) / P(x) = P(x | c_1) P(c_1) / P(x) = (4/7)(7/10) / (6/10) = (4/10) / (6/10) = 4/6

or, in the "updating" form,

P(c_1 | x) = P(x | c_1) P(c_1) / [P(x | c_1) P(c_1) + P(x | c_2) P(c_2)] = (4/7)(7/10) / [(4/7)(7/10) + (2/3)(3/10)] = (4/10) / (6/10) = 4/6

8 As a simple example, consider trying to classify a flaw as a crack or a volumetric flaw based on these two features: x_1: a positive leading edge pulse, PP; x_2: flash points, FP.

9 Assume:
P(crack) = 0.5, P(volumetric) = 0.5
P(PP | crack) = 0.1 (cracks have a leading edge signal that is always negative, so unless the leading edge signal is mistakenly identified, this case is unlikely)
P(PP | volumetric) = 0.5 (low impedance (relative to the host) volumetric flaws have negative leading edge pulses and high impedance volumetric flaws have positive leading edge pulses, so assume both types of volumetric flaws are equally likely)
P(FP | crack) = 0.8 (flash points are a feature strongly characteristic of cracks, so make this probability high)
P(FP | volumetric) = 0.05 (alternatively, make this a very low probability)

10 (1) Now, suppose a piece of data comes in and there is firm evidence that flash points (FP) exist in the measured response. Then what is the probability that the flaw is a crack?

P(crack | FP) = P(FP | crack) P(crack) / [P(FP | crack) P(crack) + P(FP | vol) P(vol)] = (0.8)(0.5) / [(0.8)(0.5) + (0.05)(0.5)] ≈ 0.941

Thus, we also have P(vol | FP) ≈ 0.059.

11 (2) Now, suppose another piece of data comes in with firm evidence of a positive leading edge pulse (PP). What is the new probability that the flaw is a crack?

P(crack | PP) = P(PP | crack) P(crack) / [P(PP | crack) P(crack) + P(PP | vol) P(vol)] = (0.1)(0.941) / [(0.1)(0.941) + (0.5)(0.059)] ≈ 0.762

and, hence, P(vol | PP) ≈ 0.238.

Note how the previous P(crack | FP) was now taken as the new a priori P(crack) in this Bayesian updating.

12 (3) Finally, suppose another data set comes in with firm evidence that the flash points (FP) do not exist. What is the probability now that the flaw is a crack?

P(crack | ~FP) = P(~FP | crack) P(crack) / [P(~FP | crack) P(crack) + P(~FP | vol) P(vol)] = (0.2)(0.762) / [(0.2)(0.762) + (0.95)(0.238)] ≈ 0.403

and now P(vol | ~FP) ≈ 0.597.

Note: P(FP | crack) = 0.8, so P(~FP | crack) = 0.2; P(FP | vol) = 0.05, so P(~FP | vol) = 0.95.
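
The three updates above form a simple chain in which each posterior becomes the next prior. A minimal Python sketch of that chain (an illustration, not from the original slides; the helper name `update` is hypothetical) is:

```python
def update(p_crack, lik_crack, lik_vol):
    """One Bayesian update: posterior P(crack | evidence) from the current
    prior p_crack and the class-conditional likelihoods of the evidence."""
    num = lik_crack * p_crack
    return num / (num + lik_vol * (1.0 - p_crack))

p = 0.5                             # initial prior P(crack)
p = update(p, 0.8, 0.05)            # firm evidence of FP     -> ~0.941
p = update(p, 0.1, 0.5)             # firm evidence of PP     -> ~0.762
p = update(p, 1 - 0.8, 1 - 0.05)    # firm evidence of no FP  -> ~0.403
print(round(p, 3))
```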

13 In all three cases we must make some decision on whether the flaw is a crack or not. One possible choice is to simply look at the probabilities and decide x belongs to class c = c_j if and only if P(c_j | x) > P(c_i | x) for all i = 1, 2, ..., N, i ≠ j. Since in our present example we only have two classes, we only have the two conditional probabilities P(crack | x) and P(vol | x), and since P(vol | x) = 1 - P(crack | x), our decision rule is just P(crack | x) > 1 - P(crack | x), or P(crack | x) > 0.5.

14 Using this simple probability decision rule, in the previous three cases we would find:

(1) P(crack | FP) ≈ 0.941, decision: crack
(2) P(crack | PP) ≈ 0.762, decision: crack
(3) P(crack | ~FP) ≈ 0.403, decision: volumetric

There is no reason, however, that we need to make a decision on the conditional probabilities P(c_i | x) by themselves. We could synthesize decision functions g_i(x) from such conditional probabilities and use the g_i(x) instead. This is the idea behind what is called Bayes' Decision Rule.

15 Bayes' Decision Rule: decide x belongs to class c = c_j if and only if g_j(x) > g_i(x) for all i = 1, 2, ..., N, i ≠ j, where g_i(x) is the decision function for class c_i.

16 Example: Suppose that not all decision errors are equally important. We could weight these decisions by defining the loss, l_ij, that is sustained when we decide the class membership is c_i when it is in reality class c_j. Then, in terms of these losses, we could also define the risk that x belongs to class c_i as

R_i(x) = Σ_j l_ij P(c_j | x)

For our two class problem we would have

R_1(x) = l_11 P(c_1 | x) + l_12 P(c_2 | x)
R_2(x) = l_21 P(c_1 | x) + l_22 P(c_2 | x)

The decision rule in this case would be to decide x belongs to class c_1 if and only if R_1(x) < R_2(x), or, equivalently,

(l_12 - l_22) P(c_2 | x) < (l_21 - l_11) P(c_1 | x)

17 In the special case where there is no loss when we guess correctly, then l_11 = l_22 = 0. If, also, it is equally costly to guess either c_1 or c_2, then l_12 = l_21 = l and the decision rule becomes

l P(c_2 | x) < l P(c_1 | x), or P(c_1 | x) > P(c_2 | x)

which is the simple decision rule based on conditional probabilities we discussed previously.

18 Now, consider our previous example again, let c_1 = crack, c_2 = volumetric flaw, and suppose we choose the following loss factors:

l_11 = -1.0 (a gain. If we guess cracks, which are dangerous, correctly, we should reward this decision)
l_12 = 1.0 (if we guess that the flaw is a crack and it is really volumetric, then there is a cost (loss), since we may do unnecessary repairs or removal from service)
l_21 = 10.0 (if we guess the flaw is volumetric and it is really a crack, there may be a significant loss because of a loss of safety due to misclassification)
l_22 = 0 (if we guess it is volumetric and it is, there might be no loss or gain)

19 In this case we find the decision rule is: decide that a crack is present if

(1.0) P(c_2 | x) < (11.0) P(c_1 | x)

or

P(c_1 | x) / P(c_2 | x) > 1/11 ≈ 0.091

For our example, then, we have:

(1) P(c_1 | x_2) / P(c_2 | x_2) ≈ 0.941/0.059 ≈ 16 > 0.091, decision: c_1 (crack)
(2) P(c_1 | x_1) / P(c_2 | x_1) ≈ 0.762/0.238 ≈ 3.2 > 0.091, decision: c_1 (crack)
(3) P(c_1 | ~x_2) / P(c_2 | ~x_2) ≈ 0.403/0.597 ≈ 0.68 > 0.091, decision: c_1 (crack)
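
As a quick check, the risk comparison with these loss factors can be scripted directly. The sketch below (illustrative only; it assumes the loss values listed on slide 18) evaluates R_1 and R_2 for each of the three posteriors and prints the lower-risk decision.

```python
# Loss matrix l[(i, j)]: loss of deciding class i when the truth is class j.
# Class 1 = crack, class 2 = volumetric (values assumed from slide 18).
l = {(1, 1): -1.0, (1, 2): 1.0, (2, 1): 10.0, (2, 2): 0.0}

def decide(p_crack):
    p = {1: p_crack, 2: 1.0 - p_crack}
    R1 = l[(1, 1)] * p[1] + l[(1, 2)] * p[2]   # risk of deciding "crack"
    R2 = l[(2, 1)] * p[1] + l[(2, 2)] * p[2]   # risk of deciding "volumetric"
    return "crack" if R1 < R2 else "volumetric"

for p_crack in (0.941, 0.762, 0.403):
    print(p_crack, "->", decide(p_crack))      # all three come out "crack"
```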

20 Bayes' Theorem (Odds). We can also write Bayes' Theorem in terms of odds rather than probabilities by noting that for any probability P (conditional, joint, etc.) we have the corresponding odds, O, given by

O = P / (1 - P)   (or P = O / (1 + O))

Example: O(c | x) = P(c | x) / (1 - P(c | x)). Using this definition of odds, Bayes' Theorem becomes

O(c | x) = LR O(c)

where LR = P(x | c) / P(x | ~c) is called the likelihood ratio.

21 Going back to our example with x_1 = PP, x_2 = FP:

P(c_1) = 0.5, so O(c_1) = 1
P(c_2) = 0.5, so O(c_2) = 1
P(x_1 | c_1) = 0.1, P(x_1 | c_2) = 0.5
P(x_2 | c_1) = 0.8, P(x_2 | c_2) = 0.05

Then for our three cases:

22 (1) O(crack | FP) = [P(FP | crack) / P(FP | ~crack)] O(crack) = (0.8/0.05)(1) = 16, and P(crack | FP) = 16/17 ≈ 0.941

(2) O(crack | PP) = [P(PP | crack) / P(PP | ~crack)] O(crack | FP) = (0.1/0.5)(16) = 3.2, and P(crack | PP) = 3.2/4.2 ≈ 0.762

(3) O(crack | ~FP) = [P(~FP | crack) / P(~FP | ~crack)] O(crack | PP) = (0.2/0.95)(3.2) ≈ 0.674, and P(crack | ~FP) = 0.674/1.674 ≈ 0.403

23 As we see from this result, we can update the probabilities according to Bayes' Theorem by

O(c | x) = [P(x | c) / P(x | ~c)] O(c)

if the feature pattern x is observed, and by

O(c | ~x) = [P(~x | c) / P(~x | ~c)] O(c)

if the feature pattern x is not observed. We can combine these two cases as

O(c | x̂) = LR(x̂, c) O(c)

where LR(x̂, c) = P(x̂ | c) / P(x̂ | ~c), and x̂ = x if x is observed, x̂ = ~x if x is not observed.
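
The same three-step example can be run in the odds form. The sketch below (an illustration, not from the slides) chains likelihood-ratio updates and converts the final odds back to a probability.

```python
def lr_update(odds, p_feat_c, p_feat_not_c):
    """Multiply the current odds O(c | ...) by the likelihood ratio
    LR = P(x_hat | c) / P(x_hat | ~c) for the observed evidence x_hat."""
    return odds * (p_feat_c / p_feat_not_c)

odds = 1.0                          # O(crack) from P(crack) = 0.5
odds = lr_update(odds, 0.8, 0.05)   # FP observed      -> 16
odds = lr_update(odds, 0.1, 0.5)    # PP observed      -> 3.2
odds = lr_update(odds, 0.2, 0.95)   # FP not observed  -> ~0.674
print(odds, odds / (1.0 + odds))    # ~0.674, P(crack) ~ 0.403
```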

24 Criticisms of this Probabilistic Approach:
1. It does not include uncertainty in the evidence of the existence (or not) of the feature patterns.
2. It is difficult to assign the a priori probabilities.
To solve the first problem, we will show how to introduce uncertainty with confidence factors. To solve the second problem, we will discuss the alternative use of discriminants.

25 Confidence Factors. Consider Bayes' Theorem in the odds form O(c | x̂) = LR(x̂, c) O(c). In updating the odds, the likelihood ratio LR(x̂, c) = P(x̂ | c) / P(x̂ | ~c) is based on being able to have firm evidence of the existence of the pattern x or not. We can introduce uncertainty into this updating by letting a user (or a program) give a response R in the range [-1, 1], where R = 1 corresponds to complete certainty that x is present, R = 0 corresponds to complete uncertainty as to whether x is or is not present, and R = -1 corresponds to complete certainty that x is not present.

26 Then, in updating the odds, we can replace the likelihood ratio, LR, by a function of LR and R that incorporates this uncertainty:

O(c | x̂) = f(LR, R) O(c)

There are, however, some properties that this function f should satisfy. They are:
1. If R = 1, f = LR(x, c) (if we are certain in the evidence of x, we should reduce to ordinary Bayes)
2. If R = -1, f = LR(~x, c) (if we are certain x does not exist, again reduce to ordinary Bayes)
3. If LR = 0, f = 0 (if the likelihood is zero, then regardless of the uncertainty, R, the updated odds should be zero)

27 A popular choice that appears in the literature is to choose

f(LR, R) = LR(x̂, c) |R| + (1 - |R|)

where, if R ∈ [0, 1], LR(x̂, c) = LR(x, c), and, if R ∈ [-1, 0], LR(x̂, c) = LR(~x, c). If we plot this function, we see the effects of R. [Figure: f(LR, R) versus R, rising linearly from LR(~x, c) at R = -1 through 1.0 at R = 0 to LR(x, c) at R = +1.]

28 Although this is a simple function to use, there is a problem with it, which we can see if we plot f versus LR for different R (note that LR ∈ [0, ∞)). [Figure: f(LR, R) versus LR for several values of R; all the curves pass through f = 1.0 at LR = 1.0, and at LR = 0 they intercept the axis at f = 1 - |R|.] At LR = 0 the function f does not go to zero as we said it should (see property 3 discussed above). To remedy that problem, we need to choose a nonlinear function.
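
A two-line check (illustrative, not from the slides) makes the defect concrete: with the linear form, zero likelihood does not force the update to zero unless |R| = 1.

```python
def f_linear(lr, r):
    # Linear confidence-factor form: f = LR*|R| + (1 - |R|)
    return lr * abs(r) + (1.0 - abs(r))

print(f_linear(0.0, 0.5))   # 0.5, not 0: property 3 is violated
print(f_linear(0.0, 1.0))   # 0.0 only in the fully certain case
```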

29 One choice that satisfies all three properties f should have is

f(LR, R) = LR(x̂, c)^|R|

[Figure: f(LR, R) versus LR for several values of R; for R = 0 the curve is the constant f = 1.0, for |R| = 1 it is f = LR, all curves pass through f = 1.0 at LR = 1.0, and every curve with R ≠ 0 now passes through f = 0 at LR = 0.]

30 This gives a dependency on R that is nonlinear. [Figure: f(LR, R) versus R, curving from LR(~x, c) at R = -1 through 1.0 at R = 0 up to LR(x, c) at R = +1.]

31 With this choice of f, we would have

O(c | x̂) = LR^|R| O(c)

However, if one wants to work in terms of probabilities, not odds, we have

P(c | x̂) = LR^|R| P(c) / [LR^|R| P(c) + (1 - P(c))]

with LR = P(x̂ | c) / P(x̂ | ~c), and x̂ = x if R > 0, x̂ = ~x if R < 0.
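
A small sketch of this modified update (illustrative only; it assumes the power-law form f = LR^|R| chosen on slide 29) shows how the answer interpolates between no update at R = 0 and the full Bayesian update at |R| = 1.

```python
def confident_update(p_c, p_x_c, p_x_not_c, r):
    """Posterior P(c | x_hat) when the evidence for x carries confidence r in [-1, 1].
    For r >= 0 the likelihoods of x are used; for r < 0 those of ~x."""
    if r >= 0:
        lr = p_x_c / p_x_not_c
    else:
        lr = (1.0 - p_x_c) / (1.0 - p_x_not_c)
    f = lr ** abs(r)
    return f * p_c / (f * p_c + (1.0 - p_c))

# P(FP | crack) = 0.8, P(FP | vol) = 0.05, prior P(crack) = 0.5
for r in (1.0, 0.5, 0.0, -1.0):
    print(r, round(confident_update(0.5, 0.8, 0.05, r), 3))
# r = 1.0 reproduces 0.941; r = 0.0 leaves the prior at 0.5; r = -1.0 gives ~0.174
```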

32 Bayes' theorem, even in this modified form to take into account uncertainty in the evidence, still requires us to have a priori probability estimates, and those may be difficult to come by. How do we get around this? Consider our two class problem where we have classes (c_1, c_2) and where x = x is a single feature (pattern). According to Bayes' decision rule we could decide on c_1 (or c_2) if g_1(x) > g_2(x) (or g_2(x) > g_1(x)). For example, suppose both g_1 and g_2 were unimodal, smooth distributions. Then we might have: [Figure: g_1(x) and g_2(x) plotted versus x, crossing at x_threshold; decide c_1 to the left of the crossing and c_2 to the right.] Then we see the decision rule is really just: class c_1 if x < x_threshold, class c_2 if x > x_threshold.

33 Thus, if we had a way of finding x_threshold, which serves as a discriminant, we could make our decisions and not have to even consider the underlying probabilities! However, we have not really eliminated the probabilities entirely, since they ultimately determine the errors made in the decision making process. Note that in the more general multi-modal decision function case, several discriminants may be needed: [Figure: multi-modal g_1(x) and g_2(x) with crossings at x_1, x_2, x_3.]

34 If we take the g_i(x) to be just the probability distributions P(c_i | x), then recall that Bayes' decision rule says that x belongs to class c = c_i if and only if P(c_i | x) > P(c_j | x) for all j = 1, 2, ..., N, j ≠ i, or, equivalently, P(c_i | x) P(x) > P(c_j | x) P(x), which says that P(c_i, x) > P(c_j, x), so that also P(x | c_i) P(c_i) > P(x | c_j) P(c_j). If x is a continuous variable, then we can associate probability distributions (densities) with quantities such as p(c_i, x) and p(x | c_i), and so we expect that the discriminants are dependent on the nature of these distributions. We will now examine that relationship more closely.

35 Probability Distributions and Discriminants. First, consider the 1-D case where x = x and where we assume the distributions are Gaussians, i.e.

p(x | c_i) = [1/(σ_i √(2π))] exp[-(x - μ_i)²/(2σ_i²)]

where μ_i = mean value of x for class c_i and σ_i = standard deviation for class c_i. If we assume σ_i = σ_j = σ and P(c_i) = P(c_j), then Bayes' decision rule says that x belongs to class c_i if and only if

exp[-(x - μ_i)²/(2σ²)] > exp[-(x - μ_j)²/(2σ²)]

36 or, equivalently, x belongs to class c_i if and only if

(x - μ_i)² < (x - μ_j)²   for all j = 1, 2, ..., N, j ≠ i

This is just the basis for the nearest cluster center classification method.
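
A nearest-cluster-center rule is easy to sketch in code. The example below (illustrative; equal priors and equal variances are assumed, as on slide 35, and the class means are made up) assigns a feature vector to the class whose mean is closest.

```python
import numpy as np

def nearest_mean_class(x, means):
    """Return the index of the class mean closest to x (Euclidean distance),
    which is the Bayes rule for equal-prior, equal-variance Gaussian classes."""
    x = np.asarray(x, dtype=float)
    dists = [np.linalg.norm(x - np.asarray(m, dtype=float)) for m in means]
    return int(np.argmin(dists))

# Two hypothetical 1-D class means
print(nearest_mean_class([0.9], [[0.0], [2.0]]))   # 0: closer to the first mean
print(nearest_mean_class([1.6], [[0.0], [2.0]]))   # 1: closer to the second mean
```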

37 Now, consider the more general case of N-dimensional features but keep the assumption of Gaussian distributions. Then

p(x | c_i) = (2π)^(-N/2) |Σ_i|^(-1/2) exp[-(1/2)(x - μ_i)^T Σ_i^(-1) (x - μ_i)]

where μ_i = N-component mean vector for class c_i and Σ_i = N x N covariance matrix for class c_i.

38 Bayes' decision theory says that x belongs to class c_i if and only if

P(c_i) (2π)^(-N/2) |Σ_i|^(-1/2) exp[-(1/2)(x - μ_i)^T Σ_i^(-1) (x - μ_i)] > P(c_j) (2π)^(-N/2) |Σ_j|^(-1/2) exp[-(1/2)(x - μ_j)^T Σ_j^(-1) (x - μ_j)]

Now, suppose we are on the boundary between c_i and c_j and also suppose that Σ_i = Σ_j = σ² I, where I is the unit matrix. Then

P(c_i) exp[-(x - μ_i)^T (x - μ_i)/(2σ²)] = P(c_j) exp[-(x - μ_j)^T (x - μ_j)/(2σ²)]

Taking the ln of this equation, we then have

39 ln[P(c_i)/P(c_j)] - (x - μ_i)^T (x - μ_i)/(2σ²) + (x - μ_j)^T (x - μ_j)/(2σ²) = 0

which can be expanded out to give

ln[P(c_i)/P(c_j)] + x^T (μ_i - μ_j)/σ² + (μ_j^T μ_j - μ_i^T μ_i)/(2σ²) = 0

However, these are just the equations of the hyperplanes

x^T w_ij = b_ij

with

w_ij = (μ_i - μ_j)/σ²
b_ij = (μ_i^T μ_i - μ_j^T μ_j)/(2σ²) - ln[P(c_i)/P(c_j)]
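
The sketch below (illustrative; the Gaussian parameters are made up) builds this hyperplane discriminant from two class means, a shared σ, and the priors, and uses the sign of x·w - b to pick the more probable class.

```python
import numpy as np

def hyperplane(mu_i, mu_j, sigma, p_i, p_j):
    """Weights and offset of the boundary x.w = b between two Gaussian classes
    with equal spherical covariance sigma**2 * I and priors p_i, p_j."""
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    w = (mu_i - mu_j) / sigma**2
    b = (mu_i @ mu_i - mu_j @ mu_j) / (2 * sigma**2) - np.log(p_i / p_j)
    return w, b

mu1, mu2, sigma, p1, p2 = [0.0, 0.0], [2.0, 1.0], 1.0, 0.7, 0.3
w, b = hyperplane(mu1, mu2, sigma, p1, p2)

x = np.array([0.8, 0.9])
score = x @ w - b          # > 0 means class c_i is the more probable class
print(w, b, score > 0)
```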

40 x^T w_ij = b_ij. The w_ij and the b_ij here determine the hyperplanes separating the classes and hence are discriminants. If we can find a way to determine these discriminants directly, we need not deal with the underlying probabilities that define them here. We will now examine ways in which we can find such hyperplanes (or hypersurfaces, in a more general context).

41 Learning Discriminants with Neural Networks. Suppose that we have a two class problem and a linear discriminant is able to distinguish between observed patterns, x, of either class. Such a problem is said to be linearly separable. Geometrically, we have the situation where we can place a discriminant hyperplane x^T w = b between patterns, x, of either class. For example, for x = (x_1, x_2): [Figure: the line x^T w = b in the (x_1, x_2) plane, with normal vector w; patterns observed from class c_1 lie on the side where x^T w - b > 0 and patterns observed from class c_2 on the side where x^T w - b < 0.]

42 Learning to distinguish between these two classes then consists of finding the values of w, b that will separate the observed patterns. Note that we can augment the vector x = (x_1, x_2, ..., x_n) and the weight vector w = (w_1, w_2, ..., w_n) by redefining them as

w = (w_1, w_2, ..., w_n, b)
x = (x_1, x_2, ..., x_n, -1)

Then the equation of the hyperplane in terms of these augmented vectors becomes x^T w = 0.

43 x^T w = 0. This equation can be related to neural network ideas, since we can view the process of making a decision c_1 or c_2 as similar to the firing (or not firing) of a neuron based on the activity level of the inputs: [Figure: inputs x_1, x_2, ..., x_n with weights w_1, w_2, ..., w_n, plus a constant input x_{n+1} = -1 with weight w_{n+1} = b, feeding a summation Σ that produces D; the neuron fires (class c_1) if D ≥ 0 and does not fire (class c_2) if D < 0.]

D = w_1 x_1 + ... + w_n x_n + w_{n+1} x_{n+1}, with x_{n+1} = -1 and w_{n+1} = b.

Now, the question is, how do we determine the unknown "weights", w, b?

44 Two Category Training Procedure. Given an extended weight vector w = (w_1, w_2, ..., w_n, b) and an extended feature vector x = (x_1, x_2, ..., x_n, -1), the following steps define a two class error correction algorithm.

Definition: Let w_k be the weights associated with x_k, the "feature training vectors", for cases where the class of each x_k is known.

(1) Let w_1 = (0, 0, ..., 0) (actually, w_1 can be arbitrary).
(2) Given w_k, the following case rules apply:

case 1: x_k ∈ c_1 (class c_1)
  a. if x_k · w_k ≥ 0, w_{k+1} = w_k
  b. if x_k · w_k < 0, w_{k+1} = w_k + λ x_k

case 2: x_k ∈ c_2 (class c_2)
  a. if x_k · w_k < 0, w_{k+1} = w_k
  b. if x_k · w_k ≥ 0, w_{k+1} = w_k - λ x_k

where λ > 0.
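
This is the classical perceptron-style error-correction procedure. A compact Python sketch is given below (an illustration under the slide's assumptions; the name `train_two_category` is hypothetical, and a fixed λ and a fixed number of passes through the data are used).

```python
import numpy as np

def train_two_category(samples, labels, lam=1.0, passes=10):
    """Error-correction training on augmented vectors (x_1, ..., x_n, -1).
    labels[k] is 1 for class c1 and 2 for class c2."""
    X = [np.append(np.asarray(x, float), -1.0) for x in samples]
    w = np.zeros(len(X[0]))                      # w_1 = (0, ..., 0)
    for _ in range(passes):
        for x, label in zip(X, labels):
            d = x @ w
            if label == 1 and d < 0:             # case 1b: misclassified c1
                w = w + lam * x
            elif label == 2 and d >= 0:          # case 2b: misclassified c2
                w = w - lam * x
    return w            # the last component plays the role of b

# Hypothetical linearly separable data in 2-D
w = train_two_category([[1, 1], [2, 1], [-1, -1], [-2, 0]], [1, 1, 2, 2])
print(w)
```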

45 Two Category Learning Theorem. If c_1 and c_2 are linearly separable and the two class training procedure is used to define the w_k, then there exists an integer t ≥ 1 such that w_t linearly separates c_1 and c_2, and hence w_{t+k} = w_t for all positive k. Ref: Hunt, E.B., Artificial Intelligence, Academic Press, 1975.

Remark: λ can be a constant or can take on particular forms. For example,

λ = α |w_k · x_k| / ||x_k||²   (0 < α < 2)

can be used and the algorithm still converges. This often speeds up the convergence. Ref: Nilsson, N., Learning Machines, McGraw Hill, 1965.

46 Example: Determine if an ultrasonic signal is from a crack or a volumetric flaw based on the following two features:

x_1 = "has flash points": 1 = "yes", -1 = "no"
x_2 = "has negative leading edge pulse": 1 = "yes", -1 = "no"

crack: x_1 = 1, x_2 = 1
volumetric (low impedance): x_1 = -1, x_2 = 1
volumetric (high impedance): x_1 = -1, x_2 = -1

[Figure: the three flaw types plotted in the (x_1, x_2) plane at (1, 1), (-1, 1), and (-1, -1).]

47 [Figure: the decision line D = w_1 x_1 + w_2 x_2 = 0 in the (x_1, x_2) plane, with normal vector w, D > 0 on one side and D < 0 on the other.]

For simplicity, we will take b = 0 and λ = 1, so the learning procedure is:
1. Give a training example (x_1, x_2).
2. If D ≥ 0, ask "is it a crack?" (Y or N). If D < 0, ask "is it volumetric?" (Y or N).
3. If error (N) and D ≥ 0: w_{k+1} = w_k - x_k. If error (N) and D < 0: w_{k+1} = w_k + x_k.

48 Suppose for this case we have the following training set:
1. x_1 = -1, x_2 = 1 (vol)
2. x_1 = 1, x_2 = 1 (crack)
3. x_1 = -1, x_2 = -1 (vol)
...

Training example 1: vol, x_1 = -1, x_2 = 1. D = 0 (since w_1 = w_2 = 0 initially), so we ask "is it a crack?"; the answer is N, so

w_new = w - x = (0, 0) - (-1, 1) = (1, -1)

[Figure: the point (-1, 1) and the new decision line with normal w = (1, -1).]

49 Training example 2: crack, x_1 = 1, x_2 = 1. D = (1)(1) + (-1)(1) = 0, so we ask "is it a crack?"; the answer is Y, so there is no change: w = (1, -1).

Training example 3: vol, x_1 = -1, x_2 = -1. D = (1)(-1) + (-1)(-1) = 0, so we ask "is it a crack?"; the answer is N, so

w_new = w - x = (1, -1) - (-1, -1) = (2, 0)

[Figure: the final decision line with normal w = (2, 0).] There are no further changes.

50 Note that this classifier can also handle situations other than those on which it is trained. This "generalization" ability is a valuable property of neural nets. For example, suppose we let

x_1 = 1 "definitely has flash points", 0.5 "probably has flash points", 0 "don't know", -0.5 "probably does not have flash points", -1 "definitely does not have flash points"

and similarly for x_2. Now suppose we give our trained system an example it hasn't seen before, such as a crack where

x_1 = 0.5 ("probably has flash points")
x_2 = 0.5 ("probably has a negative leading edge pulse")

Then D = (2)(0.5) + (0)(0.5) = 1 ≥ 0, so we ask "is it a crack?": Y (which is correct).
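
The same check is one line of arithmetic in code. The snippet below (illustrative only) applies the trained weights w = (2, 0) from slide 49 to the "probably" example and to a clear non-crack case.

```python
w = (2.0, 0.0)                      # weights learned in the worked example (b = 0)

def is_crack(x1, x2):
    d = w[0] * x1 + w[1] * x2
    return d >= 0                   # D >= 0 is interpreted as "crack"

print(is_crack(0.5, 0.5))    # True:  "probably" crack-like evidence
print(is_crack(-0.5, 0.5))   # False: flash points probably absent -> volumetric
```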

51 References

Sklansky, J., and G. Wassel, Pattern Classifiers and Trainable Machines, Springer Verlag, 1981.
Pao, Y.H., Adaptive Pattern Recognition and Neural Networks, Addison Wesley, 1989.
Gale, W.A., Ed., Artificial Intelligence and Statistics, Addison Wesley, 1986.
Duda, R.O., Hart, P.E., and D.G. Stork, Pattern Classification, 2nd Ed., John Wiley, 2001.
Fukunaga, K., Statistical Pattern Recognition, Academic Press, 1990.
Webb, A., Statistical Pattern Recognition, 2nd Ed., John Wiley, 2002.
Nadler, M., and E.P. Smith, Pattern Recognition Engineering, John Wiley, 1993.
