
CHAPTER 3

Probability, Statistics, and Information Theory

Randomness and uncertainty play an important role in science and engineering. Most spoken language processing problems can be characterized in a probabilistic framework. Probability theory and statistics provide the mathematical language to describe and analyze such systems.

The criteria and methods used to estimate unknown probabilities and probability densities form the basis of estimation theory, which in turn forms the basis of parameter learning in pattern recognition. In this chapter, three widely used estimation methods are discussed: minimum mean squared error estimation (MMSE), maximum likelihood estimation (MLE), and maximum posterior probability estimation (MAP).

Significance testing is also important in statistics. It deals with the confidence of statistical inference, such as knowing whether the estimate of some parameter can be accepted with confidence. In pattern recognition, significance testing is extremely important for determining whether the observed difference between two different classifiers is real. In our coverage of significance testing, we describe various methods that are used for the pattern recognition problems discussed in Chapter 4.

Information theory was originally developed for efficient and reliable communication systems. It has evolved into a mathematical theory concerned with the very essence of the communication process. It provides a framework for the study of fundamental issues, such as the efficiency of information representation and the limitations on reliable transmission of information over a communication channel. Many of these problems are fundamental to spoken language processing.

3.1. PROBABILITY THEORY

Probability theory deals with the averages of mass phenomena occurring sequentially or simultaneously. We often use probabilistic expressions in our day-to-day lives, as when saying "It is very likely that the Dow (Dow Jones Industrial index) will go up next month," or "The chance of scattered showers in Seattle this weekend is high." Each of these expressions is based upon the concept of probability, or the likelihood, with which some specific event will occur. Probability can be used to represent the degree of confidence in the outcome of some actions (observations) that are not definite.

In probability theory, the term sample space, S, refers to the collection (set) of all possible outcomes. An event refers to a subset of the sample space, i.e., a collection of outcomes. The probability of event A, denoted P(A), can be interpreted as the relative frequency with which event A would occur if the process were repeated a large number of times under similar conditions. Based on this interpretation, P(A) can be computed simply by counting the total number N_S of all observations and the number N_A of observations whose outcome belongs to event A. That is,

    P(A) = \frac{N_A}{N_S}

P(A) is bounded between zero and one, i.e.,

    0 \le P(A) \le 1 \quad \text{for all } A

The lower bound zero is attained when the event A is the empty set; the upper bound one is attained when A is the whole sample space S. If there are events A_1, A_2, ..., A_n in S such that A_1, A_2, ..., A_n are disjoint and A_1 \cup A_2 \cup \cdots \cup A_n = S, the events A_1, A_2, ..., A_n are said to form a partition of S. The following equation forms a fundamental axiom of probability theory:

    P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i) = 1

Based on this relative-frequency definition, the joint probability of event A and event B occurring concurrently, denoted P(AB), can be calculated as

    P(AB) = \frac{N_{AB}}{N_S}

3.1.1. Conditional Probability and Bayes' Rule

It is useful to study the way in which the probability of an event A changes after it has been learned that some other event B has occurred. This new probability, denoted P(A|B), is called the conditional probability of event A given that event B has occurred. Since the set of those outcomes in B that also result in the occurrence of A is exactly the set AB, as illustrated in Figure 3.1, it is natural to define the conditional probability as the proportion of the total probability P(B) that is represented by the joint probability P(AB). This leads to the following definition:

    P(A|B) = \frac{P(AB)}{P(B)} = \frac{N_{AB}/N_S}{N_B/N_S}

Figure 3.1 The intersection AB represents where the joint events A and B occur concurrently.

Based on the definition of conditional probability, the following expression can easily be derived:

    P(AB) = P(A|B)\,P(B) = P(B|A)\,P(A)

This is the simple version of the chain rule. The chain rule, which specifies a joint probability in terms of a product of cascaded conditional probabilities, is often used to decompose a complicated joint probabilistic problem into a sequence of stepwise conditional probabilistic problems. The two-event form can be extended to a general chain:

    P(A_1 A_2 \cdots A_n) = P(A_n \mid A_1 \cdots A_{n-1}) \cdots P(A_2 \mid A_1)\, P(A_1)

When two events A and B are independent of each other, in the sense that the occurrence or non-occurrence of either of them has no relation to and no influence on the occurrence of the other, the conditional probability P(B|A) equals the unconditional probability P(B). It follows that the joint probability P(AB) is simply the product of P(A) and P(B) when A and B are independent.

If the events A_1, A_2, ..., A_n form a partition of S and B is any event in S, as illustrated in Figure 3.2, the events A_1B, A_2B, ..., A_nB form a partition of B. Thus, we can write

    B = A_1 B \cup A_2 B \cup \cdots \cup A_n B

Since A_1B, A_2B, ..., A_nB are disjoint,

    P(B) = \sum_{k=1}^{n} P(A_k B)

Figure 3.2 The intersections of B with the partition events A_1, A_2, ..., A_n.

This expresses the marginal probability of event B, where the probability of B is computed from the sum of joint probabilities. According to the chain rule, P(A_kB) = P(A_k)P(B|A_k), so it follows that

    P(B) = \sum_{k=1}^{n} P(A_k)\, P(B \mid A_k)

Combining the definition of conditional probability with this marginal probability, we get the well-known Bayes' rule:

    P(A_i \mid B) = \frac{P(A_i B)}{P(B)} = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{k=1}^{n} P(B \mid A_k)\, P(A_k)}

Bayes' rule is the basis for pattern recognition, as described in Chapter 4.
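As a minimal numerical illustration of the marginal probability and Bayes' rule above, the following sketch uses a hypothetical three-event partition; the prior and conditional probabilities are made-up values chosen only to exercise the formulas.

```python
# Minimal numerical check of the marginal probability and Bayes' rule
# for a partition A_1, ..., A_n of the sample space (hypothetical numbers).

priors = [0.5, 0.3, 0.2]          # P(A_k), must sum to 1
likelihoods = [0.10, 0.60, 0.30]  # P(B | A_k) for the same partition

# Marginal probability of B: P(B) = sum_k P(A_k) P(B | A_k)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' rule: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]

print(f"P(B) = {p_b:.3f}")
print("P(A_i | B) =", [round(p, 3) for p in posteriors])
assert abs(sum(posteriors) - 1.0) < 1e-12  # posteriors form a distribution
```

As expected, the posterior probabilities sum to one because the events A_k form a partition of S.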

3.1.2. Random Variables

Elements in a sample space may be numbered and referred to by the numbers given. A variable X that specifies the numerical quantity in a sample space is called a random variable. Therefore, a random variable X is a function that maps each possible outcome s in the sample space S onto a real number X(s). Since each event is a subset of the sample space, an event can be represented as the set of outcomes {s : X(s) = x}. We use capital letters to denote random variables and lower-case letters to denote fixed values of the random variable. Thus, the probability that X = x is denoted as

    P(X = x) = P\left( s : X(s) = x \right)

A random variable X is a discrete random variable, or X has a discrete distribution, if X can take only a finite number of different values x_1, ..., x_n, or at most an infinite sequence of different values x_1, x_2, .... If the random variable X is discrete, the probability function (p.f.) or probability mass function (p.m.f.) of X is defined to be the function p such that for any real number x,

    p_X(x) = P(X = x)

When there is no confusion, we drop the subscript X and write p(x). The sum of the probability mass over all values of the random variable is equal to unity:

    \sum_{k} p(x_k) = \sum_{k} P(X = x_k) = 1

The marginal probability, chain rule, and Bayes' rule can also be rewritten with respect to random variables:

    p_X(x_i) = P(X = x_i) = \sum_{k=1}^{m} P(X = x_i, Y = y_k) = \sum_{k=1}^{m} P(X = x_i \mid Y = y_k)\, P(Y = y_k)

    P(X_1 = x_1, \ldots, X_n = x_n) = P(X_n = x_n \mid X_1 = x_1, \ldots, X_{n-1} = x_{n-1}) \cdots P(X_2 = x_2 \mid X_1 = x_1)\, P(X_1 = x_1)

    P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{P(Y = y \mid X = x)\, P(X = x)}{\sum_{k} P(Y = y \mid X = x_k)\, P(X = x_k)}

In a similar manner, if the random variables X and Y are statistically independent,

    P(X = x_i, Y = y_j) = P(X = x_i)\, P(Y = y_j) = p_X(x_i)\, p_Y(y_j) \quad \text{for all } i \text{ and } j

A random variable X is a continuous random variable, or X has a continuous distribution, if there exists a nonnegative function f, defined on the real line, such that for any interval A,

    P(X \in A) = \int_{A} f_X(x)\, dx

The function f_X is called the probability density function (abbreviated p.d.f.) of X. We drop the subscript X and write f when there is no ambiguity. As illustrated in Figure 3.3, the area of the shaded region under the density curve between a and b equals the value of P(a \le X \le b).

Figure 3.3 An example of a p.d.f. The area of the shaded region between a and b equals P(a \le X \le b).

Every p.d.f. must satisfy the following two requirements:

    f(x) \ge 0 \ \ \text{for } -\infty < x < \infty \qquad \text{and} \qquad \int_{-\infty}^{\infty} f(x)\, dx = 1

The marginal probability, chain rule, and Bayes' rule can also be rewritten with respect to continuous random variables:

    f_X(x) = \int f_{X,Y}(x, y)\, dy = \int f_{X|Y}(x \mid y)\, f_Y(y)\, dy

    f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_n \mid X_1, \ldots, X_{n-1}}(x_n \mid x_1, \ldots, x_{n-1}) \cdots f_{X_2 \mid X_1}(x_2 \mid x_1)\, f_{X_1}(x_1)

    f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int f_{Y|X}(y \mid x)\, f_X(x)\, dx}

The distribution function, or cumulative distribution function, F of a discrete or continuous random variable X is defined for all real x as

    F(x) = P(X \le x)

For continuous random variables, it follows that

    F(x) = \int_{-\infty}^{x} f(t)\, dt \qquad \text{and} \qquad f(x) = \frac{dF(x)}{dx}

3.1.3. Mean and Variance

Suppose that a discrete random variable X has p.f. p(x); the expectation or mean of X is defined as

    E(X) = \sum_{x} x\, p(x)

Similarly, if a continuous random variable X has p.d.f. f(x), the expectation or mean of X is defined as

    E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx

In physics, the mean is regarded as the center of mass of the probability distribution. The expectation can also be defined for any function of the random variable X. If X is a continuous random variable with p.d.f. f(x), the expectation of any function g(X) is defined as

    E\left[ g(X) \right] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx

The expectation of a random variable is a linear operator; that is, it satisfies both the additivity and homogeneity properties:

    E(a_1 X_1 + \cdots + a_n X_n + b) = a_1 E(X_1) + \cdots + a_n E(X_n) + b

where a_1, ..., a_n, b are constants. This equation is valid regardless of whether or not the random variables X_1, ..., X_n are independent.

Suppose that X is a random variable with mean μ = E(X). The variance of X, denoted Var(X), is defined as

    Var(X) = \sigma^2 = E\left[ (X - \mu)^2 \right]

where σ, the nonnegative square root of the variance, is known as the standard deviation of the random variable X; the variance is therefore often denoted σ². The variance of a distribution provides a measure of the spread or dispersion of the distribution around its mean μ. A small variance indicates that the probability distribution is tightly concentrated around μ, and a large variance typically indicates that the distribution has a wide spread around μ. Figure 3.4 illustrates three Gaussian distributions (described in Section 3.1.7) with the same mean but different variances.

The variance of a random variable X can also be computed as

    Var(X) = E(X^2) - \left[ E(X) \right]^2

In physics, the expectation E(X^k) is called the k-th moment of X for any random variable X and any positive integer k. Therefore, the variance is simply the difference between the second moment and the square of the first moment.

The variance satisfies the following additivity property when the random variables X and Y are independent:

    Var(X + Y) = Var(X) + Var(Y)

However, it does not satisfy the homogeneity property. Instead, for a constant a,

    Var(aX) = a^2\, Var(X)

Since Var(b) = 0 for any constant b, we have an equation analogous to the linearity of expectation when the random variables X_1, ..., X_n are independent:

    Var(a_1 X_1 + \cdots + a_n X_n + b) = a_1^2\, Var(X_1) + \cdots + a_n^2\, Var(X_n)

Conditional expectation can be defined in a similar way. Suppose that X and Y are discrete random variables and let f(y|x) denote the conditional p.f. of Y given X = x; the conditional expectation E(Y|X) is defined to be the function of X whose value when X = x is

    E_{Y|X}(Y \mid X = x) = \sum_{y} y\, f_{Y|X}(y \mid x)

For continuous random variables X and Y with f_{Y|X}(y|x) as the conditional p.d.f. of Y given X = x, the conditional expectation E(Y|X) is defined to be the function of X whose value when X = x is

    E_{Y|X}(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy

Figure 3.4 Three Gaussian distributions with the same mean μ but different variances. The distribution with a larger variance has a wider spread around the mean μ.

Since E(Y|X) is a function of the random variable X, it is itself a random variable whose probability distribution can be derived from the distribution of X. It can be shown that

    E_X\left[ E_{Y|X}(Y \mid X) \right] = E_{X,Y}(Y)

More generally, suppose that X and Y have a continuous joint distribution and that g(X, Y) is an arbitrary function of X and Y. The conditional expectation E[g(X, Y)|X] is defined to be the function of X whose value when X = x is

    E\left[ g(X, Y) \mid X = x \right] = \int_{-\infty}^{\infty} g(x, y)\, f_{Y|X}(y \mid x)\, dy

The law of total expectation above can then be generalized to

    E_X\left\{ E_{Y|X}\left[ g(X, Y) \mid X \right] \right\} = E_{X,Y}\left[ g(X, Y) \right]

Finally, it is worthwhile to introduce the median and the mode. A median of the distribution of X is defined to be a point m such that P(X \le m) \ge 1/2 and P(X \ge m) \ge 1/2. Thus, the median m divides the total probability into two equal parts: the probability to the left of m and the probability to the right of m are each exactly 1/2.

Suppose a random variable X has either a discrete distribution with p.f. p(x) or a continuous distribution with p.d.f. f(x); a point ϖ is called a mode of the distribution if p(x) or f(x) attains its maximum value at ϖ. A distribution can have more than one mode.

3.1.4. The Law of Large Numbers

The concepts of sample mean and sample variance are important in statistics because most statistical experiments involve sampling. Suppose that the random variables X_1, ..., X_n form a random sample of size n from some distribution with mean μ and variance σ². In other words, the random variables X_1, ..., X_n are independent and identically distributed (often abbreviated i.i.d.), and each has mean μ and variance σ². If we denote by \bar{X}_n the arithmetic average of the n observations in the sample, then

    \bar{X}_n = \frac{1}{n}\left( X_1 + \cdots + X_n \right)

\bar{X}_n is a random variable and is referred to as the sample mean. The mean and variance of \bar{X}_n can easily be derived from the definition:

    E(\bar{X}_n) = \mu \qquad \text{and} \qquad Var(\bar{X}_n) = \frac{\sigma^2}{n}

That is, the mean of the sample mean equals the mean of the distribution, while the variance of the sample mean is only 1/n times the variance of the distribution. In other words, the distribution of \bar{X}_n is more concentrated around the mean μ than the original distribution, so the sample mean is closer to μ than the value of a single observation X_i from the given distribution.

The law of large numbers is one of the most important theorems in probability theory. Formally, it states that the sample mean \bar{X}_n converges to the mean μ in probability, that is,

    \lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| < \varepsilon \right) = 1 \qquad \text{for any given } \varepsilon > 0

The law of large numbers basically implies that the sample mean is an excellent estimate of the unknown mean of the distribution when the sample size is large.
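A quick simulation makes the law of large numbers concrete. The following minimal sketch draws i.i.d. samples and watches the sample mean approach the true mean as n grows; the uniform source distribution and the sample sizes are arbitrary illustrative choices.

```python
import random

# Law of large numbers by simulation: the sample mean of n i.i.d. draws
# approaches the true mean as n grows. Uniform(0, 1) is used as an example
# distribution; its true mean is 0.5 and its variance is 1/12.
random.seed(0)
true_mean = 0.5

for n in (10, 100, 10_000, 1_000_000):
    samples = [random.random() for _ in range(n)]
    sample_mean = sum(samples) / n
    print(f"n = {n:>9}: sample mean = {sample_mean:.5f}, "
          f"|error| = {abs(sample_mean - true_mean):.5f}")
```

The absolute error shrinks roughly like σ/√n, consistent with Var(\bar{X}_n) = σ²/n.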

3.1.5. Covariance and Correlation

Let X and Y be random variables with a specific joint distribution, and let E(X) = μ_X, E(Y) = μ_Y, Var(X) = σ_X², and Var(Y) = σ_Y². The covariance of X and Y, denoted Cov(X, Y), is defined as

    Cov(X, Y) = E\left[ (X - \mu_X)(Y - \mu_Y) \right] = Cov(Y, X)

In addition, the correlation coefficient of X and Y, denoted ρ_{XY}, is defined as

    \rho_{XY} = \frac{Cov(X, Y)}{\sigma_X\, \sigma_Y}

It can be shown that ρ_{XY} is bounded within [-1, 1], that is,

    -1 \le \rho_{XY} \le 1

X and Y are said to be positively correlated if ρ_{XY} > 0, negatively correlated if ρ_{XY} < 0, and uncorrelated if ρ_{XY} = 0. It can also be shown that Cov(X, Y) and ρ_{XY} must have the same sign; that is, both are positive, negative, or zero at the same time. When E(XY) = 0, the two random variables are called orthogonal.

There are several theorems pertaining to the basic properties of covariance and correlation. We list here the most important ones:

Theorem 1 For any random variables X and Y,

    Cov(X, Y) = E(XY) - E(X)\,E(Y)

Theorem 2 If X and Y are independent random variables, then Cov(X, Y) = ρ_{XY} = 0.

Theorem 3 Suppose X is a random variable and Y is a linear function of X of the form Y = aX + b for some constants a and b, where a ≠ 0. If a > 0, then ρ_{XY} = 1; if a < 0, then ρ_{XY} = -1. For this reason, ρ_{XY} is sometimes referred to as the amount of linear dependency between the random variables X and Y.

Theorem 4 For any random variables X and Y,

    Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)

Theorem 5 If X_1, ..., X_n are random variables, then

    Var\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} Var(X_i) + 2 \sum_{i=1}^{n} \sum_{j > i} Cov(X_i, X_j)
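The following minimal sketch checks Theorem 1 and the correlation bound numerically on a small synthetic sample; the (x, y) values are hypothetical and serve only to exercise the definitions.

```python
# Sample covariance and correlation, following the definitions above.
# The (x, y) pairs are hypothetical illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)] = E[XY] - E[X]E[Y]  (Theorem 1)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
cov_alt = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
assert abs(cov_xy - cov_alt) < 1e-9

var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
rho = cov_xy / (var_x ** 0.5 * var_y ** 0.5)   # correlation coefficient

print(f"Cov(X, Y) = {cov_xy:.4f}, rho_XY = {rho:.4f}")
assert -1.0 <= rho <= 1.0   # correlation is always within [-1, 1]
```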

3.1.6. Random Vectors and Multivariate Distributions

When a random variable is a vector rather than a scalar, it is called a random vector, and we use a boldface variable such as X = (X_1, ..., X_n) to indicate it. The random variables X_1, ..., X_n are said to have a discrete joint distribution if the random vector X = (X_1, ..., X_n) can take only a finite number or an infinite sequence of different values (x_1, ..., x_n) in R^n. The joint p.f. of X_1, ..., X_n is defined to be the function f_X such that for any point (x_1, ..., x_n) in R^n,

    f_{\mathbf{X}}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n)

Similarly, the random variables X_1, ..., X_n are said to have a continuous joint distribution if there is a nonnegative function f defined on R^n such that for any subset A of R^n,

    P\left[ (X_1, \ldots, X_n) \in A \right] = \int_{A} f_{\mathbf{X}}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n

The joint distribution function can also be defined for the random variables X_1, ..., X_n as

    F_{\mathbf{X}}(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n)

The concepts of mean and variance for a random variable generalize to the mean vector and covariance matrix of a random vector. Suppose X is an n-dimensional random vector with components X_1, ..., X_n; in matrix representation,

    \mathbf{X} = (X_1, X_2, \ldots, X_n)^t

The expectation (mean) vector E(X) of the random vector X is an n-dimensional vector whose components are the expectations of the individual components of X, that is,

    E(\mathbf{X}) = \left( E(X_1), E(X_2), \ldots, E(X_n) \right)^t

The covariance matrix Cov(X) of the random vector X is defined to be the n × n matrix whose element in the i-th row and j-th column is Cov(X_i, X_j), that is,

    Cov(\mathbf{X}) = E\left[ (\mathbf{X} - E(\mathbf{X}))(\mathbf{X} - E(\mathbf{X}))^t \right] =
    \begin{pmatrix}
    Cov(X_1, X_1) & \cdots & Cov(X_1, X_n) \\
    \vdots & \ddots & \vdots \\
    Cov(X_n, X_1) & \cdots & Cov(X_n, X_n)
    \end{pmatrix}

It should be emphasized that the diagonal elements of the covariance matrix Cov(X) are the variances of X_1, ..., X_n. Furthermore, since covariance is symmetric, i.e., Cov(X_i, X_j) = Cov(X_j, X_i), the covariance matrix Cov(X) must be a symmetric matrix.

There is an important theorem regarding the mean vector and covariance matrix of a linear transformation of the random vector X. Suppose X is an n-dimensional random vector with mean vector E(X) and covariance matrix Cov(X). Assume Y is an m-dimensional random vector that is a linear transform of X given by Y = AX + B, where A is an m × n transformation matrix whose elements are constants, and B is an m-dimensional constant vector. Then we have the following two equations:

    E(\mathbf{Y}) = \mathbf{A}\, E(\mathbf{X}) + \mathbf{B}

    Cov(\mathbf{Y}) = \mathbf{A}\, Cov(\mathbf{X})\, \mathbf{A}^t
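The two identities for E(Y) and Cov(Y) under Y = AX + B are easy to confirm by simulation. The following sketch uses a hypothetical A, B, and Gaussian X, and assumes numpy is available; it compares the empirical mean and covariance of the transformed samples with the closed-form expressions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: a 3-dimensional Gaussian random vector X
mu_x = np.array([1.0, -2.0, 0.5])
cov_x = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])

# Hypothetical linear transform Y = A X + B with A (2 x 3), B (2,)
A = np.array([[1.0, 0.5, -1.0],
              [0.0, 2.0,  1.0]])
B = np.array([0.5, -1.0])

X = rng.multivariate_normal(mu_x, cov_x, size=200_000)   # samples of X
Y = X @ A.T + B                                          # transformed samples

print("theoretical E(Y)  :", A @ mu_x + B)
print("empirical   E(Y)  :", Y.mean(axis=0).round(3))
print("theoretical Cov(Y):\n", A @ cov_x @ A.T)
print("empirical   Cov(Y):\n", np.cov(Y, rowvar=False).round(3))
```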

3.1.7. Some Useful Distributions

In the following sections, we introduce several useful distributions that are widely used in applications of probability and statistics, particularly in spoken language systems.

Uniform Distributions

The simplest distribution is the uniform distribution, where the p.f. or p.d.f. is a constant function. For a uniform discrete random variable X that takes possible values only from {x_1, ..., x_n}, the p.f. of X is

    P(X = x_i) = \frac{1}{n}

For a uniform continuous random variable X that takes possible values only from the real interval [a, b], the p.d.f. of X is

    f(x) =
    \begin{cases}
    \dfrac{1}{b - a} & a \le x \le b \\
    0 & \text{otherwise}
    \end{cases}

Figure 3.5 A uniform distribution with the p.d.f. above: the density is constant at 1/(b - a) on [a, b] and zero elsewhere.

Binomial Distributions

The binomial distribution is used to describe binary-decision events. For example, suppose that a single coin toss produces heads with probability p and tails with probability 1 - p. If we toss the same coin n times and let X denote the number of heads observed, the random variable X has the binomial p.f.

    P(X = x) = f(x \mid p, n) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n

It can be shown that the mean and variance of a binomial distribution are

    E(X) = np \qquad \text{and} \qquad Var(X) = np(1 - p)

Figure 3.6 illustrates three binomial distributions with different values of the parameter p.
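As a small added sanity check, the sketch below simulates n-toss experiments and compares the empirical mean and variance of the head count with np and np(1 - p); the values of n and p are arbitrary illustrative choices.

```python
import random

random.seed(1)
n, p = 20, 0.3          # illustrative parameters
trials = 100_000

# Each experiment: toss the coin n times and count heads.
counts = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

emp_mean = sum(counts) / trials
emp_var = sum((c - emp_mean) ** 2 for c in counts) / trials

print(f"empirical mean {emp_mean:.3f}  vs  np        = {n * p:.3f}")
print(f"empirical var  {emp_var:.3f}  vs  np(1 - p)  = {n * p * (1 - p):.3f}")
```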

Geometric Distributions

The geometric distribution is related to the binomial distribution. As in the independent coin-toss example, heads appear with probability p and tails with probability 1 - p. The geometric distribution models the time until a tail appears. Let the random variable X be the time (the number of tosses) until the first tail is shown. The p.f. of X has the form

    P(X = x) = f(x \mid p) = p^{x - 1}(1 - p), \qquad x = 1, 2, \ldots \ \text{and} \ 0 < p < 1

The mean and variance of a geometric distribution are given by

    E(X) = \frac{1}{1 - p} \qquad \text{and} \qquad Var(X) = \frac{p}{(1 - p)^2}

One example of the geometric distribution is the distribution of state durations in a hidden Markov model, as described in Chapter 8. Figure 3.7 illustrates three geometric distributions with different values of the parameter p.

Multinomial Distributions

Suppose that a bag contains balls of k different colors, where the proportion of balls of color i is p_i. Thus, p_i > 0 for i = 1, ..., k and \sum_{i=1}^{k} p_i = 1. Now suppose that n balls are randomly selected from the bag, with enough balls of each color available. Let X_i denote the number of selected balls that are of color i. The random vector X = (X_1, ..., X_k) is said to have a multinomial distribution with parameters n and p = (p_1, ..., p_k). For a vector x = (x_1, ..., x_k), the p.f. of X has the form

    P(\mathbf{X} = \mathbf{x}) = f(\mathbf{x} \mid n, \mathbf{p}) =
    \begin{cases}
    \dfrac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k} & x_i \in \{0, \ldots, n\} \ \text{and} \ x_1 + \cdots + x_k = n \\
    0 & \text{otherwise}
    \end{cases}

Figure 3.8 A multinomial distribution plotted as a function of x_1 and x_2.

It can be shown that the mean, variance, and covariance of the multinomial distribution are

    E(X_i) = n p_i \qquad \text{and} \qquad Var(X_i) = n p_i (1 - p_i), \qquad i = 1, \ldots, k

    Cov(X_i, X_j) = -n\, p_i\, p_j \qquad i \ne j

Figure 3.8 shows a multinomial distribution over x_1 and x_2; since there are only two free parameters in that example, the graph is plotted using x_1 and x_2 as axes. Multinomial distributions are typically used with the χ² test, one of the most widely used goodness-of-fit hypothesis-testing procedures, described in the section on significance testing.

Poisson Distributions

Another popular discrete distribution is the Poisson distribution. The random variable X has a Poisson distribution with mean λ (λ > 0) if the p.f. of X has the following form:

    P(X = x) = f(x \mid \lambda) =
    \begin{cases}
    \dfrac{e^{-\lambda} \lambda^{x}}{x!} & x = 0, 1, 2, \ldots \\
    0 & \text{otherwise}
    \end{cases}

The mean and variance of a Poisson distribution are the same and equal λ:

    E(X) = Var(X) = \lambda

Figure 3.9 Three Poisson distributions with different values of λ.

The Poisson distribution is typically used in queuing theory, where x is the total number of occurrences of some phenomenon during a fixed period of time or within a fixed region of space. Examples include the number of telephone calls received at a switchboard during a fixed period of time. In speech recognition, the Poisson distribution has been used to model the duration of a phoneme.
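The mean-equals-variance property of the Poisson distribution is easy to check numerically. The following sketch uses an arbitrary λ and assumes numpy is available; it draws Poisson samples and compares the sample mean and variance with λ.

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 4.0                                   # illustrative rate parameter
samples = rng.poisson(lam, size=500_000)

# For a Poisson distribution, E(X) = Var(X) = lambda.
print(f"lambda          = {lam}")
print(f"sample mean     = {samples.mean():.4f}")
print(f"sample variance = {samples.var():.4f}")
```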

Gamma Distributions

A continuous random variable X is said to have a gamma distribution with parameters α and β (α > 0 and β > 0) if X has a continuous p.d.f. of the form

    f(x \mid \alpha, \beta) =
    \begin{cases}
    \dfrac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)} & x > 0 \\
    0 & x \le 0
    \end{cases}

where

    \Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x}\, dx

It can be shown that Γ behaves as a factorial function when α is a positive integer:

    \Gamma(n) = (n - 1)! \quad \text{for } n = 2, 3, \ldots \qquad \text{and} \qquad \Gamma(1) = 1

Figure 3.10 Three gamma distributions with the same β but different values of α.

The mean and variance of a gamma distribution are

    E(X) = \frac{\alpha}{\beta} \qquad \text{and} \qquad Var(X) = \frac{\alpha}{\beta^2}

Figure 3.10 illustrates three gamma distributions sharing the same β but with different values of α. There is an interesting theorem associated with gamma distributions: if the random variables X_1, ..., X_k are independent and each X_i has a gamma distribution with parameters α_i and β, then the sum X_1 + ... + X_k also has a gamma distribution, with parameters α_1 + ... + α_k and β.

A special case of the gamma distribution is the exponential distribution. A continuous random variable X is said to have an exponential distribution with parameter β (β > 0) if X has a continuous p.d.f. of the form

    f(x \mid \beta) =
    \begin{cases}
    \beta\, e^{-\beta x} & x > 0 \\
    0 & x \le 0
    \end{cases}

It is clear that the exponential distribution is a gamma distribution with α = 1. The mean and variance of the exponential distribution are

    E(X) = \frac{1}{\beta} \qquad \text{and} \qquad Var(X) = \frac{1}{\beta^2}

Figure 3.11 Three exponential distributions with different values of β.

The exponential distribution is often used in queuing theory for the distribution of the duration of a service or the inter-arrival time of customers. It is also used to approximate the distribution of the life of a mechanical component.

Gaussian Distributions

The Gaussian distribution is by far the most important probability distribution, mainly because many scientists have observed that the random variables studied in various physical experiments (including speech signals) often have distributions that are approximately Gaussian. The Gaussian distribution is also referred to as the normal distribution. A continuous random variable X is said to have a Gaussian distribution with mean μ and variance σ² (σ > 0) if X has a continuous p.d.f. of the form

    f(x \mid \mu, \sigma^2) = N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]

It can be shown that μ and σ² are indeed the mean and the variance of the Gaussian distribution. Some examples of Gaussians can be found in Figure 3.4. The use of Gaussian distributions is justified by the central limit theorem, which states that observable events considered to be a consequence of many unrelated causes, with no single cause predominating over the others, tend to follow the Gaussian distribution [6].

It can be shown from the density above that the Gaussian f(x | μ, σ²) is symmetric with respect to x = μ. Therefore, μ is both the mean and the median of the distribution. Moreover, μ is also the mode of the distribution, i.e., the p.d.f. f(x | μ, σ²) attains its maximum at the mean point x = μ. Several Gaussian p.d.f.'s with the same mean μ but different variances are illustrated in Figure 3.4; the curve has a bell shape. A Gaussian p.d.f. with a small variance has a high peak and is tightly concentrated around the mean μ, whereas a Gaussian p.d.f. with a large variance is relatively flat and spread out more widely over the x-axis.

If the random variable X has a Gaussian distribution with mean μ and variance σ², then any linear function of X also has a Gaussian distribution. That is, if Y = aX + b, where a and b are constants and a ≠ 0, then Y has a Gaussian distribution with mean aμ + b and variance a²σ². Similarly, the sum X_1 + ... + X_k of independent random variables, each with a Gaussian distribution, also has a Gaussian distribution.

Standard Gaussian Distributions

The Gaussian distribution with mean 0 and variance 1, denoted N(0, 1), is called the standard Gaussian distribution or unit Gaussian distribution. Since a linear transformation of a Gaussian distribution is still a Gaussian distribution, the behavior of any Gaussian distribution can be described solely in terms of the standard Gaussian distribution.

If the random variable X has a Gaussian distribution with mean μ and variance σ², that is, X ~ N(μ, σ²), it can be shown that

    Z = \frac{X - \mu}{\sigma} \sim N(0, 1)

Based on this standardization, the following property can be shown:

    P\left( |X - \mu| \le k\sigma \right) = P\left( |Z| \le k \right)

This demonstrates that every Gaussian distribution contains the same total amount of probability within any fixed number of standard deviations of its mean.

The Central Limit Theorem

If the random variables X_1, ..., X_n are i.i.d. according to a common distribution function with mean μ and variance σ², then as the random sample size n approaches infinity, the following random variable has a distribution converging to the standard Gaussian distribution:

    Y_n = \frac{\sqrt{n}\,\left( \bar{X}_n - \mu \right)}{\sigma} \sim N(0, 1)

where \bar{X}_n is the sample mean of the random variables X_1, ..., X_n as defined in Section 3.1.4. Based on this result, the sample mean \bar{X}_n can be approximated by a Gaussian distribution with mean μ and variance σ²/n.

The central limit theorem above applies to i.i.d. random variables X_1, ..., X_n. A. Lyapunov derived another central limit theorem for independent but not necessarily identically distributed random variables X_1, ..., X_n. Suppose X_1, ..., X_n are independent random variables with E\left( |X_i - \mu_i|^3 \right) < \infty for all i; then the following random variable converges to a standard Gaussian distribution as n → ∞:

    Y_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\left( \sum_{i=1}^{n} \sigma_i^2 \right)^{1/2}}

In other words, the sum of the random variables X_1, ..., X_n can be approximated by a Gaussian distribution with mean \sum_{i=1}^{n} \mu_i and variance \sum_{i=1}^{n} \sigma_i^2.

Both central limit theorems essentially state that, regardless of their original individual distributions, the sum of many independent random variables (effects) tends to be distributed like a Gaussian distribution as the number of random variables (effects) becomes large.
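A small simulation illustrates the central limit theorem: standardized sample means of i.i.d. non-Gaussian variables look increasingly Gaussian. In the sketch below, the exponential source distribution and the sample sizes are arbitrary illustrative choices, and numpy is assumed available.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 1.0, 1.0          # mean and std of an Exponential(1) variable

for n in (1, 5, 50, 500):
    # 100,000 independent experiments, each averaging n exponential draws
    means = rng.exponential(scale=1.0, size=(100_000, n)).mean(axis=1)
    z = np.sqrt(n) * (means - mu) / sigma     # standardized sample means
    # If the CLT applies, z should be close to N(0, 1): mean ~ 0, std ~ 1,
    # and about 68.3% of the mass within one standard deviation.
    within_1sd = np.mean(np.abs(z) < 1.0)
    print(f"n = {n:>4}: mean(z) = {z.mean():+.3f}, std(z) = {z.std():.3f}, "
          f"P(|z| < 1) = {within_1sd:.3f}")
```

As n grows, the fraction of standardized means within one standard deviation approaches the Gaussian value of about 0.683.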

Multivariate Mixture Gaussian Distributions

When X = (X_1, ..., X_n) is an n-dimensional continuous random vector, the multivariate Gaussian p.d.f. has the form

    f(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = N(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})
    = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]

where μ is the n-dimensional mean vector, Σ is the covariance matrix, and |Σ| is the determinant of the covariance matrix Σ:

    \boldsymbol{\mu} = E(\mathbf{x}) \qquad \text{and} \qquad \boldsymbol{\Sigma} = E\left[ (\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t \right]

More specifically, the (i, j)-th element σ_{ij} of the covariance matrix Σ is

    \sigma_{ij} = E\left[ (x_i - \mu_i)(x_j - \mu_j) \right]

Figure 3.12 A two-dimensional multivariate Gaussian distribution with independent random variables x_1 and x_2 that have the same variance.

If X_1, ..., X_n are independent random variables, the covariance matrix Σ reduces to a diagonal covariance matrix, where all the off-diagonal entries are zero. The distribution can then be regarded as n independent scalar Gaussian distributions, and the joint p.d.f. is the product of the individual scalar Gaussian p.d.f.'s. Figure 3.12 shows a two-dimensional multivariate Gaussian distribution with independent random variables that have the same variance, and Figure 3.13 shows another two-dimensional multivariate Gaussian distribution with independent random variables x_1 and x_2 that have different variances.

Although Gaussian distributions are unimodal (a unimodal distribution has a single maximum, which for a Gaussian occurs at the mean), more complex distributions with multiple local maxima can be approximated by Gaussian mixtures:

    f(\mathbf{x}) = \sum_{k=1}^{K} c_k\, N_k(\mathbf{x}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)

where the mixture weights c_k associated with the k-th Gaussian component are subject to the constraint

    c_k \ge 0 \qquad \text{and} \qquad \sum_{k=1}^{K} c_k = 1

Gaussian mixtures with enough mixture components can approximate any distribution. Throughout this book, most continuous probability density functions are modeled with Gaussian mixtures.
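The following sketch evaluates a small Gaussian mixture density at a few points, directly following the mixture formula above; the component weights, means, and covariances are hypothetical values chosen for illustration, and numpy is assumed available.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density N(x; mu, cov)."""
    n = len(mu)
    diff = x - mu
    inv = np.linalg.inv(cov)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * diff @ inv @ diff)

def mixture_pdf(x, weights, mus, covs):
    """f(x) = sum_k c_k N(x; mu_k, Sigma_k), with the c_k summing to one."""
    return sum(c * gaussian_pdf(x, mu, cov)
               for c, mu, cov in zip(weights, mus, covs))

# Hypothetical 2-component mixture in two dimensions.
weights = [0.6, 0.4]
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]

for point in (np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([1.5, 1.5])):
    print(point, "->", mixture_pdf(point, weights, mus, covs))
```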

χ² Distributions

A gamma distribution with parameters α and β was defined above. For any given positive integer n, the gamma distribution for which α = n/2 and β = 1/2 is called the χ² distribution with n degrees of freedom. It follows from the gamma p.d.f. that the p.d.f. of the χ² distribution is

    f(x \mid n) =
    \begin{cases}
    \dfrac{x^{n/2 - 1}\, e^{-x/2}}{2^{n/2}\, \Gamma(n/2)} & x > 0 \\
    0 & x \le 0
    \end{cases}

χ² distributions are important in statistics because they are closely related to random samples from a Gaussian distribution. They are widely applied in many important problems of statistical inference and hypothesis testing. Specifically, if the random variables X_1, ..., X_n are independent and identically distributed, and each has a standard Gaussian distribution, then the sum of squares X_1² + ... + X_n² can be shown to have a χ² distribution with n degrees of freedom.

Figure 3.14 Three χ² distributions with different degrees of freedom.

The mean and variance of the χ² distribution with n degrees of freedom are

    E(X) = n \qquad \text{and} \qquad Var(X) = 2n

Following the additivity property of the gamma distribution, the χ² distribution also has an additivity property: if the random variables X_1, ..., X_n are independent and X_i has a χ² distribution with k_i degrees of freedom, then the sum X_1 + ... + X_n has a χ² distribution with k_1 + ... + k_n degrees of freedom.

Log-Normal Distribution

Let x be a Gaussian random variable with mean μ and standard deviation σ; then

    y = e^{x}

follows a log-normal distribution

    f(y \mid \mu, \sigma) = \frac{1}{y\,\sigma\sqrt{2\pi}} \exp\left[ -\frac{(\ln y - \mu)^2}{2\sigma^2} \right]

shown in Figure 3.15, whose mean is given by

    \mu_y = E\{y\} = E\{e^{x}\} = \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] e^{x}\, dx
    = \exp\{\mu + \sigma^2/2\} \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{\left( x - (\mu + \sigma^2) \right)^2}{2\sigma^2} \right] dx
    = \exp\{\mu + \sigma^2/2\}

where we have rearranged the quadratic form of x and made use of the fact that the total probability mass of a Gaussian is 1. Similarly, the second-order moment of y is given by

    E\{y^2\} = \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] e^{2x}\, dx
    = \exp\{2\mu + 2\sigma^2\}

and thus the variance of y is given by

    \sigma_y^2 = E\{y^2\} - \left( E\{y\} \right)^2 = \mu_y^2 \left( e^{\sigma^2} - 1 \right)

Figure 3.15 Log-normal distributions for a fixed mean μ and three different values of σ.

Similarly, if x is a Gaussian random vector with mean vector μ and covariance matrix Σ, then the random vector y = e^{x} (taken element-wise) is log-normal, with mean and covariance matrix [8] given by

    \mu_y[i] = \exp\left\{ \mu[i] + \Sigma[i, i]/2 \right\}

    \Sigma_y[i, j] = \mu_y[i]\, \mu_y[j] \left( \exp\left\{ \Sigma[i, j] \right\} - 1 \right)

using a derivation similar to that for the scalar case above.
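The closed-form moments of the log-normal distribution are easy to verify by simulation: exponentiate Gaussian draws and compare the sample mean and variance with exp(μ + σ²/2) and μ_y²(e^{σ²} - 1). The sketch below uses arbitrary μ and σ and assumes numpy is available.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.2, 0.5                       # illustrative Gaussian parameters

y = np.exp(rng.normal(mu, sigma, size=1_000_000))   # log-normal samples

mean_theory = np.exp(mu + sigma**2 / 2)
var_theory = mean_theory**2 * (np.exp(sigma**2) - 1)

print(f"mean: empirical {y.mean():.4f}  vs  theory {mean_theory:.4f}")
print(f"var : empirical {y.var():.4f}  vs  theory {var_theory:.4f}")
```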

3.2. ESTIMATION THEORY

Estimation theory and significance testing are two of the most important theories and methods of statistical inference. In this section we describe estimation theory; significance testing is covered in the next section. A problem of statistical inference is one in which data generated in accordance with some unknown probability distribution must be analyzed, and some type of inference about the unknown distribution must be made. In such a problem, any characteristic of the distribution generating the experimental data, such as the mean μ or variance σ² of a Gaussian distribution, is called a parameter of the distribution. The set Ω of all possible values of a parameter Φ or of a group of parameters Φ_1, Φ_2, ..., Φ_n is called the parameter space. In this section we focus on how to estimate the parameter Φ from sample data.

Before we describe the various estimation methods, we introduce the concept and nature of the estimation problem. Suppose that a set of random variables X = {X_1, X_2, ..., X_n} is i.i.d. according to a p.d.f. p(x|Φ), where the value of the parameter Φ is unknown, and suppose that the value of Φ must be estimated from the observed values in the sample. An estimator of the parameter Φ, based on the random variables X_1, X_2, ..., X_n, is a real-valued function θ(X_1, X_2, ..., X_n) that specifies the estimated value of Φ for each possible set of values of X_1, X_2, ..., X_n. That is, if the sample values of X_1, X_2, ..., X_n turn out to be x_1, x_2, ..., x_n, then the estimated value of Φ is θ(x_1, x_2, ..., x_n).

We need to distinguish between estimator, estimate, and estimation. An estimator θ(X_1, X_2, ..., X_n) is a function of the random variables, whose probability distribution can be derived from the joint distribution of X_1, X_2, ..., X_n. An estimate, on the other hand, is a specific value θ(x_1, x_2, ..., x_n) of the estimator determined by specific sample values x_1, x_2, ..., x_n. Estimation usually denotes the process of obtaining such an estimator for a set of random variables or an estimate for a set of specific sample values. If we use the notation X = {X_1, X_2, ..., X_n} to represent the vector of random variables and x = {x_1, x_2, ..., x_n} to represent the vector of sample values, an estimator can be denoted θ(X) and an estimate θ(x). Sometimes we abbreviate the estimator θ(X) by the symbol θ.

In the following sections we describe and compare three different estimators (estimation methods): the minimum mean squared error estimator, the maximum likelihood estimator, and the Bayes estimator. The first is often used to estimate the random variable itself, while the latter two are used to estimate the parameters of the distribution of the random variables.

3.2.1. Minimum/Least Mean Squared Error Estimation

Minimum mean squared error (MMSE) estimation and least squared error (LSE) estimation are important methods for estimating a random variable, since the goal, minimizing the squared error, is an intuitive one. In general, two random variables X and Y are jointly distributed according to some p.d.f. f_{X,Y}(x, y). Suppose that we perform a series of experiments and observe the value of X. We want to find a transformation Ŷ = g(X) that predicts the value of the random variable Y. The goodness of such a transformation can be measured by

    E\left[ (Y - \hat{Y})^2 \right] = E\left[ \left( Y - g(X) \right)^2 \right]

This quantity is called the mean squared error (MSE) because it is the mean of the squared error of the predictor g(X). The criterion of minimizing the mean squared error is a good one for picking the predictor g(X). Of course, we usually specify the class of functions G from which g(X) may be selected. In general, there is a parameter vector Φ associated with the function g(X), so the function can be expressed as g(X, Φ).

The process of finding the parameter vector Φ̂_MMSE that minimizes the mean squared error is called minimum mean squared error estimation, and Φ̂_MMSE is called the minimum mean squared error estimator. That is,

    \hat{\Phi}_{MMSE} = \arg\min_{\Phi} E\left[ \left( Y - g(X, \Phi) \right)^2 \right]

Sometimes the joint distribution of the random variables X and Y is not known. Instead, n samples of (x, y) pairs may be observable. In this case, the following criterion can be used instead:

    \hat{\Phi}_{LSE} = \arg\min_{\Phi} \sum_{i=1}^{n} \left[ y_i - g(x_i, \Phi) \right]^2

The argument of this minimization is called the sum of squared errors (SSE), and the process of finding the parameter vector Φ̂_LSE that satisfies the criterion is called least squared error estimation or minimum squared error estimation. LSE is a powerful mechanism for curve fitting, where the function g(x, Φ) describes the observation pairs (x_i, y_i). In general, there are more points x_i than free parameters in the function g(x, Φ), so the fitting is over-determined; therefore, no exact solution exists, and LSE fitting becomes necessary.

It should be emphasized that MMSE and LSE are actually very similar and share similar properties. The SSE is just n times the sample mean of the squared error. Based on the law of large numbers, when the joint p.d.f. f_{X,Y}(x, y) is uniform over the observed samples, or when the number of samples approaches infinity, MMSE and LSE are equivalent.

For the class of functions G, we consider the following three cases:

Constant functions, i.e., G_c = \{ g(x) = c, \ c \in R \}

Linear functions, i.e., G_l = \{ g(x) = ax + b; \ a, b \in R \}

Other nonlinear functions, G_{nl}

3.2.1.1. MMSE/LSE for Constant Functions

When Ŷ = g(x) = c, the mean squared error becomes

    E\left[ (Y - \hat{Y})^2 \right] = E\left[ (Y - c)^2 \right]

To find the MMSE estimate of c, we take the derivative of the expression above with respect to c and equate it to 0. The MMSE estimate c_MMSE is

    c_{MMSE} = E(Y)

and the minimum mean squared error is exactly the variance of Y, Var(Y). For the LSE estimate of c, the criterion becomes

    \min_{c} \sum_{i=1}^{n} \left[ y_i - c \right]^2

Similarly, the LSE estimate c_LSE can be obtained as

    c_{LSE} = \frac{1}{n} \sum_{i=1}^{n} y_i

which is simply the sample mean.

3.2.1.2. MMSE and LSE for Linear Functions

When Ŷ = g(x) = ax + b, the mean squared error becomes

    e(a, b) = E\left[ (Y - \hat{Y})^2 \right] = E\left[ (Y - aX - b)^2 \right]

To find the MMSE estimates of a and b, we first set

    \frac{\partial e}{\partial a} = 0 \qquad \text{and} \qquad \frac{\partial e}{\partial b} = 0

and solve the two linear equations. Thus we obtain

    a = \frac{Cov(X, Y)}{Var(X)} = \rho_{XY} \frac{\sigma_Y}{\sigma_X}

    b = E(Y) - \rho_{XY} \frac{\sigma_Y}{\sigma_X} E(X)

For LSE estimation, we assume for generality that each sample x is a d-dimensional vector. Assuming we have n sample vectors and observations (x_i, y_i) = (x_{i1}, x_{i2}, ..., x_{id}, y_i), i = 1, ..., n, a linear function can be represented in matrix form as

    \hat{\mathbf{Y}} = \mathbf{X}\mathbf{A}, \qquad \text{where} \quad
    \mathbf{X} =
    \begin{pmatrix}
    x_{11} & \cdots & x_{1d} \\
    \vdots & \ddots & \vdots \\
    x_{n1} & \cdots & x_{nd}
    \end{pmatrix}, \quad
    \mathbf{A} = (a_1, \ldots, a_d)^t, \quad
    \mathbf{Y} = (y_1, \ldots, y_n)^t

The sum of squared errors can then be represented as

    e(\mathbf{A}) = \sum_{i=1}^{n} \left( \mathbf{x}_i^t \mathbf{A} - y_i \right)^2 = \left\| \mathbf{X}\mathbf{A} - \mathbf{Y} \right\|^2

A closed-form solution for the LSE estimate of A can be obtained by taking the gradient of e(A),

    \nabla e(\mathbf{A}) = \sum_{i=1}^{n} 2 \left( \mathbf{x}_i^t \mathbf{A} - y_i \right) \mathbf{x}_i = 2\, \mathbf{X}^t \left( \mathbf{X}\mathbf{A} - \mathbf{Y} \right)

and equating it to zero. This yields the equation

    \mathbf{X}^t \mathbf{X} \mathbf{A} = \mathbf{X}^t \mathbf{Y}

Thus the LSE estimate A_LSE has the form

    \mathbf{A}_{LSE} = \left( \mathbf{X}^t \mathbf{X} \right)^{-1} \mathbf{X}^t \mathbf{Y}

The matrix (X^t X)^{-1} X^t is also referred to as the pseudo-inverse of X and is sometimes denoted X†. When X^t X is singular, or some boundary conditions make this solution unattainable, numerical methods can be used to find an approximate solution. Instead of minimizing the sum of squared errors alone, one can minimize the following quantity:

    e(\mathbf{A}) = \left\| \mathbf{X}\mathbf{A} - \mathbf{Y} \right\|^2 + \alpha \left\| \mathbf{A} \right\|^2

Following a similar procedure, one obtains the LSE estimate minimizing this quantity in the form

    \mathbf{A}_{LSE}^{*} = \left( \mathbf{X}^t \mathbf{X} + \alpha \mathbf{I} \right)^{-1} \mathbf{X}^t \mathbf{Y}
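The closed-form LSE solution via the pseudo-inverse is straightforward to exercise in code. The sketch below uses synthetic data (the true coefficients and noise level are arbitrary choices, and numpy is assumed available); it recovers the linear coefficients with the normal-equation form, the regularized variant, and numpy's least-squares solver for comparison.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic over-determined problem: y = X a_true + noise
n, d = 200, 3
X = rng.normal(size=(n, d))
a_true = np.array([1.5, -2.0, 0.7])
y = X @ a_true + 0.1 * rng.normal(size=n)

# LSE via the pseudo-inverse: A = (X^t X)^{-1} X^t Y
A_lse = np.linalg.inv(X.T @ X) @ X.T @ y

# Regularized variant: A = (X^t X + alpha I)^{-1} X^t Y
alpha = 0.1
A_reg = np.linalg.inv(X.T @ X + alpha * np.eye(d)) @ X.T @ y

# Library solver for comparison
A_np, *_ = np.linalg.lstsq(X, y, rcond=None)

print("true coefficients :", a_true)
print("pseudo-inverse LSE:", A_lse.round(3))
print("regularized LSE   :", A_reg.round(3))
print("np.linalg.lstsq   :", A_np.round(3))
```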

The pseudo-inverse solution can be used for polynomial functions too. In the problem of polynomial curve fitting using the least-squares criterion, we aim to find the coefficients A = (a_0, a_1, a_2, ..., a_d)^t that minimize the squared error between the observations y_i and the polynomial prediction

    \hat{y} = a_0 + a_1 x + a_2 x^2 + \cdots + a_d x^d

To obtain the LSE estimate of the coefficients A = (a_0, a_1, ..., a_d)^t, we simply change the formation of the matrix X to

    \mathbf{X} =
    \begin{pmatrix}
    1 & x_1 & x_1^2 & \cdots & x_1^d \\
    \vdots & \vdots & \vdots & & \vdots \\
    1 & x_n & x_n^2 & \cdots & x_n^d
    \end{pmatrix}

Note that in the earlier matrix x_{ij} denotes the j-th dimension of sample x_i, while here x_i^j denotes the j-th power of the scalar value x_i. With this change, the LSE estimate of the polynomial coefficients has the same pseudo-inverse form as before.

3.2.1.3. MMSE/LSE for Nonlinear Functions

In the most general case, consider solving the following minimization problem:

    \min_{g(\cdot) \in G_{nl}} E\left[ \left( Y - g(X) \right)^2 \right]

Since we need to deal with all possible nonlinear functions, taking a derivative does not work here. Instead, we use the properties of conditional expectation to solve this minimization problem. Applying the law of total expectation,

    E_{X,Y}\left[ (Y - g(X))^2 \right] = E_X\left\{ E_{Y|X}\left[ (Y - g(X))^2 \mid X \right] \right\}
    = \int_{-\infty}^{\infty} E_{Y|X}\left[ (Y - g(x))^2 \mid X = x \right] f_X(x)\, dx

Since the integrand is nonnegative, the overall quantity is minimized when the following is minimized for each x:

    \min_{g(x) \in R} E_{Y|X}\left[ \left( Y - g(x) \right)^2 \mid X = x \right]

Since g(x) is a constant inside this conditional expectation, the MMSE estimate can be obtained in the same way as for constant functions in Section 3.2.1.1. Thus, the MMSE estimator takes the form

    \hat{Y} = g_{MMSE}(X) = E_{Y|X}(Y \mid X)

If the value X = x is observed and the value E(Y | X = x) is used to predict Y, the mean squared error is minimized and is given by

    E_{Y|X}\left[ \left( Y - E_{Y|X}(Y \mid X = x) \right)^2 \mid X = x \right] = Var_{Y|X}(Y \mid X = x)

The overall MSE, averaged over all possible values of X, is

    E_X\left\{ E_{Y|X}\left[ \left( Y - E_{Y|X}(Y \mid X) \right)^2 \mid X \right] \right\} = E_X\left[ Var_{Y|X}(Y \mid X) \right]

It is important to distinguish between the overall MSE, E_X[Var_{Y|X}(Y|X)], and the MSE of a particular estimate when X = x, which is Var_{Y|X}(Y|X = x). Before the value of X is observed, the expected MSE of the process of observing X and predicting Y is E_X[Var_{Y|X}(Y|X)]; after a particular value x of X has been observed and the prediction E_{Y|X}(Y|X = x) has been made, the appropriate measure of the MSE of the prediction is Var_{Y|X}(Y|X = x).

In general, the form of the MMSE estimator for nonlinear functions depends on the form of the joint distribution of X and Y, and there is no closed-form solution. To obtain the conditional expectation, we have to perform the integral

    \hat{Y}(x) = \int y\, f_{Y|X}(y \mid X = x)\, dy

This integral is difficult to evaluate. First, different observed values x determine different conditional p.d.f.'s for the integral, and exact information about the conditional p.d.f. is often impossible to obtain. Second, there may be no analytic solution for the integral. These difficulties limit the interest in MMSE estimation of nonlinear functions to mostly theoretical aspects. The same difficulties also exist for LSE estimation with nonlinear functions. In practice, certain classes of well-behaved nonlinear functions are assumed for LSE problems, and numerical methods are used to obtain the LSE estimate from sample data.

3.2.2. Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is the most widely used parametric estimation method, largely because of its efficiency. Suppose that a set of random samples X = {X_1, X_2, ..., X_n} is drawn independently according to a discrete or continuous distribution with p.f. or p.d.f. p(x|Φ), where the parameter vector Φ belongs to some parameter space Ω. Given an observed vector x = (x_1, ..., x_n), the likelihood of the set of sample data vectors with respect to Φ is defined as the joint p.f. or joint p.d.f. p(x|Φ); p(x|Φ) is also referred to as the likelihood function.

MLE assumes that the parameters of the p.d.f. are fixed but unknown, and aims to find the set of parameters that maximizes the likelihood of generating the observed data. For example, if p(x|Φ) is assumed to be a Gaussian distribution N(μ, Σ), the components of Φ are exactly the components of the mean vector μ and the covariance matrix Σ. Since X_1, X_2, ..., X_n are independent random variables, the likelihood can be rewritten as

    p(\mathbf{x} \mid \Phi) = \prod_{k=1}^{n} p(x_k \mid \Phi)

The likelihood p(x|Φ) can be viewed as the probability of generating the sample data set x given the parameter set Φ. The maximum likelihood estimator of Φ is the Φ_MLE that maximizes the likelihood p(x|Φ). That is,

    \Phi_{MLE} = \arg\max_{\Phi} p(\mathbf{x} \mid \Phi)

This estimation method is called the maximum likelihood estimation method, often abbreviated MLE. Since the logarithm is a monotonically increasing function, the parameter set Φ_MLE that maximizes the log-likelihood also maximizes the likelihood. If p(x|Φ) is a differentiable function of Φ, Φ_MLE can be attained by taking the partial derivatives with respect to Φ and setting them to zero. Specifically, let Φ be a k-component parameter vector Φ = (Φ_1, Φ_2, ..., Φ_k)^t and ∇_Φ be the gradient operator

    \nabla_{\Phi} = \left( \frac{\partial}{\partial \Phi_1}, \ldots, \frac{\partial}{\partial \Phi_k} \right)^t

The log-likelihood becomes

    l(\Phi) = \log p(\mathbf{x} \mid \Phi) = \sum_{k=1}^{n} \log p(x_k \mid \Phi)

and its gradient is

    \nabla_{\Phi}\, l(\Phi) = \sum_{k=1}^{n} \nabla_{\Phi} \log p(x_k \mid \Phi)

Thus, the maximum likelihood estimate of Φ can be obtained by solving the following set of k equations:

    \nabla_{\Phi}\, l(\Phi) = \mathbf{0}

Example

Let's take a look at the maximum likelihood estimator of a univariate Gaussian p.d.f.,

    p(x \mid \Phi) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]

where μ and σ² are the mean and the variance, respectively, and the parameter vector Φ denotes (μ, σ²). The log-likelihood is

    \log p(\mathbf{x} \mid \Phi) = \sum_{k=1}^{n} \log p(x_k \mid \Phi)
    = -\frac{n}{2} \log\left( 2\pi\sigma^2 \right) - \frac{1}{2\sigma^2} \sum_{k=1}^{n} (x_k - \mu)^2

and the partial derivatives of this expression are

    \frac{\partial}{\partial \mu} \log p(\mathbf{x} \mid \Phi) = \frac{1}{\sigma^2} \sum_{k=1}^{n} (x_k - \mu)

    \frac{\partial}{\partial \sigma^2} \log p(\mathbf{x} \mid \Phi) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{k=1}^{n} (x_k - \mu)^2

Setting the two partial derivatives to zero and solving the resulting equations gives the maximum likelihood estimates for μ and σ²:

    \mu_{MLE} = \frac{1}{n} \sum_{k=1}^{n} x_k

    \sigma^2_{MLE} = \frac{1}{n} \sum_{k=1}^{n} \left( x_k - \mu_{MLE} \right)^2

That is, the maximum likelihood estimates of the mean and variance are just the sample mean and sample variance.
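The following sketch computes the MLE of μ and σ² for a univariate Gaussian exactly as derived above, i.e., the sample mean and the divide-by-n sample variance; the data generation uses arbitrary "true" parameters and assumes numpy is available.

```python
import numpy as np

rng = np.random.default_rng(11)

true_mu, true_sigma = 2.0, 1.5                      # illustrative true parameters
x = rng.normal(true_mu, true_sigma, size=50_000)    # observed i.i.d. sample

# Maximum likelihood estimates for a univariate Gaussian:
mu_mle = x.mean()                                   # sample mean
var_mle = np.mean((x - mu_mle) ** 2)                # sample variance (divide by n)

print(f"mu_MLE      = {mu_mle:.4f}   (true {true_mu})")
print(f"sigma^2_MLE = {var_mle:.4f}   (true {true_sigma**2})")
```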

Example

For the multivariate Gaussian p.d.f.

    p(\mathbf{x} \mid \Phi) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]

the maximum likelihood estimates of μ and Σ can be obtained by a similar procedure:

    \hat{\boldsymbol{\mu}}_{MLE} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k

    \hat{\boldsymbol{\Sigma}}_{MLE} = \frac{1}{n} \sum_{k=1}^{n} \left( \mathbf{x}_k - \hat{\boldsymbol{\mu}}_{MLE} \right)\left( \mathbf{x}_k - \hat{\boldsymbol{\mu}}_{MLE} \right)^t

Once again, the maximum likelihood estimates of the mean vector and covariance matrix are the sample mean vector and sample covariance matrix.

In some situations, a maximum likelihood estimate of Φ may not exist, or the maximum likelihood estimator may not be uniquely defined, i.e., there may be more than one MLE of Φ for a specific set of sample values. Fortunately, according to Fisher's theorem, for most practical problems with a well-behaved family of distributions, the MLE exists and is uniquely defined [4, 5, 6]. In fact, the maximum likelihood estimator can be proved to be sound under certain conditions. As mentioned before, the estimator θ(X) is a function of the vector of random variables X that represents the sample data; θ(X) is itself a random variable, with a distribution determined by the joint distribution of X. Let Φ be the parameter vector of the true distribution p(x|Φ) from which the samples are drawn. If the following three conditions hold:

1. The sample is a draw from the assumed family of distributions,
2. The family of distributions is well behaved, and
3. The sample is large enough,

then the maximum likelihood estimator Φ_MLE has a Gaussian distribution with mean Φ and a variance of the form 1/(nB) [6], where n is the size of the sample and B is the Fisher information, which is determined solely by Φ and the form of the distribution family.

An estimator is said to be consistent if the estimate converges to the true parameters as the number of training samples goes to infinity:

    \lim_{n \to \infty} \Phi_{MLE} = \Phi

Φ_MLE is therefore a consistent estimator based on the analysis above. In addition, it can be shown that no consistent estimator has a lower variance than Φ_MLE. In other words, no estimator provides a closer estimate of the true parameters than the maximum likelihood estimator.

3.2.3. Bayesian Estimation and MAP Estimation

Bayesian estimation has a different philosophy from maximum likelihood estimation. While MLE assumes that the parameter Φ is fixed but unknown, Bayesian estimation assumes that the parameter Φ is itself a random variable with a prior distribution p(Φ). (For simplicity, we assume here that Φ is a scalar rather than a vector; the extension to a parameter vector follows a similar procedure.) Suppose we observe a sequence of random samples x = {x_1, x_2, ..., x_n}, which are i.i.d. with p.d.f. p(x|Φ). According to Bayes' rule, we have the posterior distribution of Φ:

    p(\Phi \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \Phi)\, p(\Phi)}{p(\mathbf{x})} \propto p(\mathbf{x} \mid \Phi)\, p(\Phi)

We can drop the denominator p(x) here because it is independent of the parameter Φ. This distribution is called the posterior distribution of Φ because it is the distribution of Φ after we observe the values of the random variables X_1, X_2, ..., X_n.

3.2.3.1. Prior and Posterior Distributions

For mathematical tractability, conjugate priors are often used in Bayesian estimation. Suppose a random sample is taken from a known distribution with p.d.f. p(x|Φ). A conjugate prior for the random variable (or vector) is defined as a prior distribution for the parameters of the probability density function of the random variable (or vector), such that the class-conditional p.d.f. p(x|Φ), the posterior distribution p(Φ|x), and the prior distribution p(Φ) belong to the same distribution family. For example, it is well known that the conjugate prior for the mean of a Gaussian p.d.f. is also a Gaussian p.d.f. [4]. Let's now derive such a posterior distribution p(Φ|x) from the widely used Gaussian conjugate prior.

Example

Suppose X_1, X_2, ..., X_n are drawn from a Gaussian distribution for which the mean Φ is a random variable and the variance σ² is known. The likelihood function p(x|Φ) can be written as

    p(\mathbf{x} \mid \Phi) = \frac{1}{(2\pi)^{n/2} \sigma^{n}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \Phi)^2 \right]
    \propto \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \Phi)^2 \right]

To simplify this expression, we use the identity

    \sum_{i=1}^{n} (x_i - \Phi)^2 = n\left( \Phi - \bar{x}_n \right)^2 + \sum_{i=1}^{n} \left( x_i - \bar{x}_n \right)^2

where \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i is the sample mean of x = {x_1, x_2, ..., x_n}. The likelihood can then be rewritten as

    p(\mathbf{x} \mid \Phi) \propto \exp\left[ -\frac{n}{2\sigma^2} \left( \Phi - \bar{x}_n \right)^2 \right] \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( x_i - \bar{x}_n \right)^2 \right]

Now suppose the prior distribution of Φ is also a Gaussian distribution, with mean μ and variance ν², i.e.,

    p(\Phi) = \frac{1}{\sqrt{2\pi}\,\nu} \exp\left[ -\frac{(\Phi - \mu)^2}{2\nu^2} \right] \propto \exp\left[ -\frac{(\Phi - \mu)^2}{2\nu^2} \right]

Combining the likelihood and the prior, and dropping the second exponential factor of the likelihood (which does not depend on Φ), we obtain the posterior p.d.f.

    p(\Phi \mid \mathbf{x}) \propto \exp\left[ -\frac{n}{2\sigma^2} \left( \Phi - \bar{x}_n \right)^2 - \frac{1}{2\nu^2} (\Phi - \mu)^2 \right]

Now define ρ and τ² as

    \rho = \frac{\sigma^2 \mu + n \nu^2 \bar{x}_n}{\sigma^2 + n \nu^2} \qquad \text{and} \qquad
    \tau^2 = \frac{\sigma^2 \nu^2}{\sigma^2 + n \nu^2}

We can then rewrite the posterior as

    p(\Phi \mid \mathbf{x}) \propto \exp\left[ -\frac{(\Phi - \rho)^2}{2\tau^2} - \frac{n\left( \bar{x}_n - \mu \right)^2}{2\left( \sigma^2 + n\nu^2 \right)} \right]

Since the second term does not depend on Φ, it can be absorbed into the constant factor. Finally, we have the posterior p.d.f. in the form

    p(\Phi \mid \mathbf{x}) = \frac{1}{\sqrt{2\pi}\,\tau} \exp\left[ -\frac{(\Phi - \rho)^2}{2\tau^2} \right]

This shows that the posterior p.d.f. p(Φ|x) is a Gaussian distribution with mean ρ and variance τ² as defined above. The Gaussian prior distribution used here is indeed a conjugate prior.
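The closed-form posterior above (Gaussian with mean ρ and variance τ²) can be checked directly against data. The sketch below uses hypothetical prior parameters and a known noise variance, and assumes numpy is available; it computes ρ and τ² for simulated samples of increasing size.

```python
import numpy as np

rng = np.random.default_rng(13)

sigma = 1.0            # known observation standard deviation
mu0, nu = 0.0, 2.0     # hypothetical prior: Phi ~ N(mu0, nu^2)
true_phi = 1.2         # value used only to simulate data

for n in (1, 10, 1000):
    x = rng.normal(true_phi, sigma, size=n)
    x_bar = x.mean()

    # Posterior of the Gaussian mean with a Gaussian prior (conjugate case):
    rho = (sigma**2 * mu0 + n * nu**2 * x_bar) / (sigma**2 + n * nu**2)
    tau2 = (sigma**2 * nu**2) / (sigma**2 + n * nu**2)

    print(f"n = {n:>4}: sample mean = {x_bar:+.3f}, "
          f"posterior mean rho = {rho:+.3f}, posterior var tau^2 = {tau2:.4f}")
```

As n grows, ρ approaches the sample mean and τ² shrinks toward zero, i.e., the data overwhelm the prior.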

3.2.3.2. General Bayesian Estimation

The foremost requirement of a good estimator θ is that it yield an estimate θ(X) of Φ that is close to the real value Φ. In other words, a good estimator is one for which it is highly probable that the error θ(X) - Φ is close to 0. In general, we can define a loss function R(Φ, Φ̂), which measures the loss or cost associated with the fact that the true value of the parameter is Φ while the estimate is Φ̂. (Bayesian estimation and loss functions are based on Bayes' decision theory, which is described in detail in Chapter 4.) When only the prior distribution p(Φ) is available and no sample data have been observed, the expected loss of choosing one particular estimate Φ̂ is

    E\left[ R(\Phi, \hat{\Phi}) \right] = \int R(\Phi, \hat{\Phi})\, p(\Phi)\, d\Phi

The fact that we can derive the posterior distribution from the likelihood function and the prior distribution, as shown in the example above, is very important here, because it allows us to compute the expected posterior loss after the sample vector x is observed. The expected posterior loss associated with the estimate Φ̂ is

    E\left[ R(\Phi, \hat{\Phi}) \mid \mathbf{x} \right] = \int R(\Phi, \hat{\Phi})\, p(\Phi \mid \mathbf{x})\, d\Phi

The Bayesian estimator of Φ is defined as the estimator that attains the minimum Bayes risk, that is, minimizes the expected posterior loss. Formally, the Bayesian estimator is chosen according to

    \theta_{Bayes}(\mathbf{x}) = \arg\min_{\theta} E\left[ R\left( \Phi, \theta(\mathbf{x}) \right) \mid \mathbf{x} \right]

The Bayesian estimator of Φ is the estimator θ_Bayes for which this condition is satisfied for every possible value x of the random vector X. Therefore, the form of the Bayesian estimator θ_Bayes depends only on the loss function and the prior distribution, not on the sample values.

One of the most common loss functions used in statistical estimation is the mean squared error function. The mean squared error loss for Bayesian estimation has the form

    R\left( \Phi, \theta(\mathbf{x}) \right) = \left( \Phi - \theta(\mathbf{x}) \right)^2

To find the Bayesian estimator, we seek to minimize the expected posterior loss

    E\left[ R\left( \Phi, \theta(\mathbf{x}) \right) \mid \mathbf{x} \right] = E\left[ \left( \Phi - \theta(\mathbf{x}) \right)^2 \mid \mathbf{x} \right]
    = E\left( \Phi^2 \mid \mathbf{x} \right) - 2\,\theta(\mathbf{x})\, E\left( \Phi \mid \mathbf{x} \right) + \theta(\mathbf{x})^2

The minimum of this function can be obtained by taking the partial derivative with respect to θ(x). Since the expression is simply a quadratic function of θ(x), it can be shown that the minimum loss is achieved when θ_Bayes is chosen as

    \theta_{Bayes}(\mathbf{x}) = E\left( \Phi \mid \mathbf{x} \right)

That is, the Bayesian estimate of the parameter Φ under the mean squared error loss is equal to the mean of the posterior distribution of Φ. In the following section, we discuss another popular loss function (MAP estimation) that generates the same estimate for certain distribution functions.

3.2.3.3. MAP Estimation

One intuitive interpretation of the posterior distribution is that the prior p.d.f. p(Φ) represents the relative likelihood of Φ before the values of X_1, X_2, ..., X_n have been observed, while the posterior p.d.f. p(Φ|x) represents the relative likelihood after the values of X_1, X_2, ..., X_n have been observed. Therefore, choosing an estimate Φ̂ that maximizes the posterior probability is consistent with our intuition. This estimator is in fact the maximum posterior probability (MAP) estimator, and it is the most popular Bayesian estimator. The loss function associated with the MAP estimator is the so-called uniform loss function:

    R\left( \Phi, \theta(\mathbf{x}) \right) =
    \begin{cases}
    0, & \text{if } |\theta(\mathbf{x}) - \Phi| \le \Delta \\
    1, & \text{if } |\theta(\mathbf{x}) - \Phi| > \Delta
    \end{cases}
    \qquad \text{where } \Delta > 0

Now let's see how this uniform loss function results in MAP estimation. Based on the loss function defined above, the expected posterior loss is

    E\left[ R\left( \Phi, \theta(\mathbf{x}) \right) \mid \mathbf{x} \right] = P\left( |\theta(\mathbf{x}) - \Phi| > \Delta \mid \mathbf{x} \right)
    = 1 - P\left( |\theta(\mathbf{x}) - \Phi| \le \Delta \mid \mathbf{x} \right)
    = 1 - \int_{\theta(\mathbf{x}) - \Delta}^{\theta(\mathbf{x}) + \Delta} p(\Phi \mid \mathbf{x})\, d\Phi


More information

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uversty Mchael Bar Sprg 5 Mdterm am, secto Soluto Thursday, February 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes eam.. No calculators of ay kd are allowed..

More information

Bayesian Inferences for Two Parameter Weibull Distribution Kipkoech W. Cheruiyot 1, Abel Ouko 2, Emily Kirimi 3

Bayesian Inferences for Two Parameter Weibull Distribution Kipkoech W. Cheruiyot 1, Abel Ouko 2, Emily Kirimi 3 IOSR Joural of Mathematcs IOSR-JM e-issn: 78-578, p-issn: 9-765X. Volume, Issue Ver. II Ja - Feb. 05, PP 4- www.osrjourals.org Bayesa Ifereces for Two Parameter Webull Dstrbuto Kpkoech W. Cheruyot, Abel

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statstc ad Radom Samples A parameter s a umber that descrbes the populato. It s a fxed umber, but practce we do ot kow ts value. A statstc s a fucto of the sample data,.e., t s a quatty whose

More information

Lecture Note to Rice Chapter 8

Lecture Note to Rice Chapter 8 ECON 430 HG revsed Nov 06 Lecture Note to Rce Chapter 8 Radom matrces Let Y, =,,, m, =,,, be radom varables (r.v. s). The matrx Y Y Y Y Y Y Y Y Y Y = m m m s called a radom matrx ( wth a ot m-dmesoal dstrbuto,

More information

ESS Line Fitting

ESS Line Fitting ESS 5 014 17. Le Fttg A very commo problem data aalyss s lookg for relatoshpetwee dfferet parameters ad fttg les or surfaces to data. The smplest example s fttg a straght le ad we wll dscuss that here

More information

Unsupervised Learning and Other Neural Networks

Unsupervised Learning and Other Neural Networks CSE 53 Soft Computg NOT PART OF THE FINAL Usupervsed Learg ad Other Neural Networs Itroducto Mture Destes ad Idetfablty ML Estmates Applcato to Normal Mtures Other Neural Networs Itroducto Prevously, all

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY 3 EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER I STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad

More information

STATISTICAL INFERENCE

STATISTICAL INFERENCE (STATISTICS) STATISTICAL INFERENCE COMPLEMENTARY COURSE B.Sc. MATHEMATICS III SEMESTER ( Admsso) UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION CALICUT UNIVERSITY P.O., MALAPPURAM, KERALA, INDIA -

More information

Multiple Linear Regression Analysis

Multiple Linear Regression Analysis LINEA EGESSION ANALYSIS MODULE III Lecture - 4 Multple Lear egresso Aalyss Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Cofdece terval estmato The cofdece tervals multple

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1 STA 08 Appled Lear Models: Regresso Aalyss Sprg 0 Soluto for Homework #. Let Y the dollar cost per year, X the umber of vsts per year. The the mathematcal relato betwee X ad Y s: Y 300 + X. Ths s a fuctoal

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER II STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Mache Learg Problem set Due Frday, September 9, rectato Please address all questos ad commets about ths problem set to 6.867-staff@a.mt.edu. You do ot eed to use MATLAB for ths problem set though

More information

ENGI 3423 Simple Linear Regression Page 12-01

ENGI 3423 Simple Linear Regression Page 12-01 ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable

More information

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best Error Aalyss Preamble Wheever a measuremet s made, the result followg from that measuremet s always subject to ucertaty The ucertaty ca be reduced by makg several measuremets of the same quatty or by mprovg

More information

Objectives of Multiple Regression

Objectives of Multiple Regression Obectves of Multple Regresso Establsh the lear equato that best predcts values of a depedet varable Y usg more tha oe eplaator varable from a large set of potetal predctors {,,... k }. Fd that subset of

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements

Chapter 4 (Part 1): Non-Parametric Classification (Sections ) Pattern Classification 4.3) Announcements Aoucemets No-Parametrc Desty Estmato Techques HW assged Most of ths lecture was o the blacboard. These sldes cover the same materal as preseted DHS Bometrcs CSE 90-a Lecture 7 CSE90a Fall 06 CSE90a Fall

More information

Generative classification models

Generative classification models CS 75 Mache Learg Lecture Geeratve classfcato models Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square Data: D { d, d,.., d} d, Classfcato represets a dscrete class value Goal: lear f : X Y Bar classfcato

More information

Class 13,14 June 17, 19, 2015

Class 13,14 June 17, 19, 2015 Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral

More information

Lecture 1 Review of Fundamental Statistical Concepts

Lecture 1 Review of Fundamental Statistical Concepts Lecture Revew of Fudametal Statstcal Cocepts Measures of Cetral Tedecy ad Dsperso A word about otato for ths class: Idvduals a populato are desgated, where the dex rages from to N, ad N s the total umber

More information

22 Nonparametric Methods.

22 Nonparametric Methods. 22 oparametrc Methods. I parametrc models oe assumes apror that the dstrbutos have a specfc form wth oe or more ukow parameters ad oe tres to fd the best or atleast reasoably effcet procedures that aswer

More information

Introduction to local (nonparametric) density estimation. methods

Introduction to local (nonparametric) density estimation. methods Itroducto to local (oparametrc) desty estmato methods A slecture by Yu Lu for ECE 66 Sprg 014 1. Itroducto Ths slecture troduces two local desty estmato methods whch are Parze desty estmato ad k-earest

More information

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity ECONOMETRIC THEORY MODULE VIII Lecture - 6 Heteroskedastcty Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur . Breusch Paga test Ths test ca be appled whe the replcated data

More information

Mu Sequences/Series Solutions National Convention 2014

Mu Sequences/Series Solutions National Convention 2014 Mu Sequeces/Seres Solutos Natoal Coveto 04 C 6 E A 6C A 6 B B 7 A D 7 D C 7 A B 8 A B 8 A C 8 E 4 B 9 B 4 E 9 B 4 C 9 E C 0 A A 0 D B 0 C C Usg basc propertes of arthmetc sequeces, we fd a ad bm m We eed

More information

PROPERTIES OF GOOD ESTIMATORS

PROPERTIES OF GOOD ESTIMATORS ESTIMATION INTRODUCTION Estmato s the statstcal process of fdg a appromate value for a populato parameter. A populato parameter s a characterstc of the dstrbuto of a populato such as the populato mea,

More information

Descriptive Statistics

Descriptive Statistics Page Techcal Math II Descrptve Statstcs Descrptve Statstcs Descrptve statstcs s the body of methods used to represet ad summarze sets of data. A descrpto of how a set of measuremets (for eample, people

More information

Estimation of Stress- Strength Reliability model using finite mixture of exponential distributions

Estimation of Stress- Strength Reliability model using finite mixture of exponential distributions Iteratoal Joural of Computatoal Egeerg Research Vol, 0 Issue, Estmato of Stress- Stregth Relablty model usg fte mxture of expoetal dstrbutos K.Sadhya, T.S.Umamaheswar Departmet of Mathematcs, Lal Bhadur

More information

CODING & MODULATION Prof. Ing. Anton Čižmár, PhD.

CODING & MODULATION Prof. Ing. Anton Čižmár, PhD. CODING & MODULATION Prof. Ig. Ato Čžmár, PhD. also from Dgtal Commucatos 4th Ed., J. G. Proaks, McGraw-Hll It. Ed. 00 CONTENT. PROBABILITY. STOCHASTIC PROCESSES Probablty ad Stochastc Processes The theory

More information

BASICS ON DISTRIBUTIONS

BASICS ON DISTRIBUTIONS BASICS ON DISTRIBUTIONS Hstograms Cosder a epermet whch dfferet outcomes are possble (e. Dce tossg). The probablty of all the outcomes ca be represeted a hstogram Dstrbutos Probabltes are descrbed wth

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 00 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for the

More information

3. Basic Concepts: Consequences and Properties

3. Basic Concepts: Consequences and Properties : 3. Basc Cocepts: Cosequeces ad Propertes Markku Jutt Overvew More advaced cosequeces ad propertes of the basc cocepts troduced the prevous lecture are derved. Source The materal s maly based o Sectos.6.8

More information

Lecture 8: Linear Regression

Lecture 8: Linear Regression Lecture 8: Lear egresso May 4, GENOME 56, Sprg Goals Develop basc cocepts of lear regresso from a probablstc framework Estmatg parameters ad hypothess testg wth lear models Lear regresso Su I Lee, CSE

More information

CHAPTER 3 POSTERIOR DISTRIBUTIONS

CHAPTER 3 POSTERIOR DISTRIBUTIONS CHAPTER 3 POSTERIOR DISTRIBUTIONS If scece caot measure the degree of probablt volved, so much the worse for scece. The practcal ma wll stck to hs apprecatve methods utl t does, or wll accept the results

More information

Point Estimation: definition of estimators

Point Estimation: definition of estimators Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.

More information

Lecture 9: Tolerant Testing

Lecture 9: Tolerant Testing Lecture 9: Tolerat Testg Dael Kae Scrbe: Sakeerth Rao Aprl 4, 07 Abstract I ths lecture we prove a quas lear lower boud o the umber of samples eeded to do tolerat testg for L dstace. Tolerat Testg We have

More information

Chapter 8. Inferences about More Than Two Population Central Values

Chapter 8. Inferences about More Than Two Population Central Values Chapter 8. Ifereces about More Tha Two Populato Cetral Values Case tudy: Effect of Tmg of the Treatmet of Port-We tas wth Lasers ) To vestgate whether treatmet at a youg age would yeld better results tha

More information

Overview. Basic concepts of Bayesian learning. Most probable model given data Coin tosses Linear regression Logistic regression

Overview. Basic concepts of Bayesian learning. Most probable model given data Coin tosses Linear regression Logistic regression Overvew Basc cocepts of Bayesa learg Most probable model gve data Co tosses Lear regresso Logstc regresso Bayesa predctos Co tosses Lear regresso 30 Recap: regresso problems Iput to learg problem: trag

More information

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x

CS 2750 Machine Learning. Lecture 8. Linear regression. CS 2750 Machine Learning. Linear regression. is a linear combination of input components x CS 75 Mache Learg Lecture 8 Lear regresso Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square CS 75 Mache Learg Lear regresso Fucto f : X Y s a lear combato of put compoets f + + + K d d K k - parameters

More information

MEASURES OF DISPERSION

MEASURES OF DISPERSION MEASURES OF DISPERSION Measure of Cetral Tedecy: Measures of Cetral Tedecy ad Dsperso ) Mathematcal Average: a) Arthmetc mea (A.M.) b) Geometrc mea (G.M.) c) Harmoc mea (H.M.) ) Averages of Posto: a) Meda

More information

A New Family of Transformations for Lifetime Data

A New Family of Transformations for Lifetime Data Proceedgs of the World Cogress o Egeerg 4 Vol I, WCE 4, July - 4, 4, Lodo, U.K. A New Famly of Trasformatos for Lfetme Data Lakhaa Watthaacheewakul Abstract A famly of trasformatos s the oe of several

More information

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015

Homework 1: Solutions Sid Banerjee Problem 1: (Practice with Asymptotic Notation) ORIE 4520: Stochastics at Scale Fall 2015 Fall 05 Homework : Solutos Problem : (Practce wth Asymptotc Notato) A essetal requremet for uderstadg scalg behavor s comfort wth asymptotc (or bg-o ) otato. I ths problem, you wll prove some basc facts

More information

Lecture Notes 2. The ability to manipulate matrices is critical in economics.

Lecture Notes 2. The ability to manipulate matrices is critical in economics. Lecture Notes. Revew of Matrces he ablt to mapulate matrces s crtcal ecoomcs.. Matr a rectagular arra of umbers, parameters, or varables placed rows ad colums. Matrces are assocated wth lear equatos. lemets

More information

Analysis of Variance with Weibull Data

Analysis of Variance with Weibull Data Aalyss of Varace wth Webull Data Lahaa Watthaacheewaul Abstract I statstcal data aalyss by aalyss of varace, the usual basc assumptos are that the model s addtve ad the errors are radomly, depedetly, ad

More information

1. A real number x is represented approximately by , and we are told that the relative error is 0.1 %. What is x? Note: There are two answers.

1. A real number x is represented approximately by , and we are told that the relative error is 0.1 %. What is x? Note: There are two answers. PROBLEMS A real umber s represeted appromately by 63, ad we are told that the relatve error s % What s? Note: There are two aswers Ht : Recall that % relatve error s What s the relatve error volved roudg

More information

Chapter 9 Jordan Block Matrices

Chapter 9 Jordan Block Matrices Chapter 9 Jorda Block atrces I ths chapter we wll solve the followg problem. Gve a lear operator T fd a bass R of F such that the matrx R (T) s as smple as possble. f course smple s a matter of taste.

More information

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter

The number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter LOGISTIC REGRESSION Notato Model Logstc regresso regresses a dchotomous depedet varable o a set of depedet varables. Several methods are mplemeted for selectg the depedet varables. The followg otato s

More information

A Study of the Reproducibility of Measurements with HUR Leg Extension/Curl Research Line

A Study of the Reproducibility of Measurements with HUR Leg Extension/Curl Research Line HUR Techcal Report 000--9 verso.05 / Frak Borg (borgbros@ett.f) A Study of the Reproducblty of Measuremets wth HUR Leg Eteso/Curl Research Le A mportat property of measuremets s that the results should

More information

arxiv: v1 [math.st] 24 Oct 2016

arxiv: v1 [math.st] 24 Oct 2016 arxv:60.07554v [math.st] 24 Oct 206 Some Relatoshps ad Propertes of the Hypergeometrc Dstrbuto Peter H. Pesku, Departmet of Mathematcs ad Statstcs York Uversty, Toroto, Otaro M3J P3, Caada E-mal: pesku@pascal.math.yorku.ca

More information

Lecture Notes Forecasting the process of estimating or predicting unknown situations

Lecture Notes Forecasting the process of estimating or predicting unknown situations Lecture Notes. Ecoomc Forecastg. Forecastg the process of estmatg or predctg ukow stuatos Eample usuall ecoomsts predct future ecoomc varables Forecastg apples to a varet of data () tme seres data predctg

More information

Dr. Shalabh. Indian Institute of Technology Kanpur

Dr. Shalabh. Indian Institute of Technology Kanpur Aalyss of Varace ad Desg of Expermets-I MODULE -I LECTURE - SOME RESULTS ON LINEAR ALGEBRA, MATRIX THEORY AND DISTRIBUTIONS Dr. Shalabh Departmet t of Mathematcs t ad Statstcs t t Ida Isttute of Techology

More information

The Mathematical Appendix

The Mathematical Appendix The Mathematcal Appedx Defto A: If ( Λ, Ω, where ( λ λ λ whch the probablty dstrbutos,,..., Defto A. uppose that ( Λ,,..., s a expermet type, the σ-algebra o λ λ λ are defed s deoted by ( (,,...,, σ Ω.

More information

Simple Linear Regression

Simple Linear Regression Correlato ad Smple Lear Regresso Berl Che Departmet of Computer Scece & Iformato Egeerg Natoal Tawa Normal Uversty Referece:. W. Navd. Statstcs for Egeerg ad Scetsts. Chapter 7 (7.-7.3) & Teachg Materal

More information

Regresso What s a Model? 1. Ofte Descrbe Relatoshp betwee Varables 2. Types - Determstc Models (o radomess) - Probablstc Models (wth radomess) EPI 809/Sprg 2008 9 Determstc Models 1. Hypothesze

More information

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uverst Mchael Bar Sprg 5 Mdterm xam, secto Soluto Thursda, Februar 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes exam.. No calculators of a kd are allowed..

More information

A NEW LOG-NORMAL DISTRIBUTION

A NEW LOG-NORMAL DISTRIBUTION Joural of Statstcs: Advaces Theory ad Applcatos Volume 6, Number, 06, Pages 93-04 Avalable at http://scetfcadvaces.co. DOI: http://dx.do.org/0.864/jsata_700705 A NEW LOG-NORMAL DISTRIBUTION Departmet of

More information

LINEAR REGRESSION ANALYSIS

LINEAR REGRESSION ANALYSIS LINEAR REGRESSION ANALYSIS MODULE V Lecture - Correctg Model Iadequaces Through Trasformato ad Weghtg Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Aalytcal methods for

More information

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use.

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use. INTRODUCTORY NOTE ON LINEAR REGREION We have data of the form (x y ) (x y ) (x y ) These wll most ofte be preseted to us as two colum of a spreadsheet As the topc develops we wll see both upper case ad

More information

Chapter 3 Sampling For Proportions and Percentages

Chapter 3 Sampling For Proportions and Percentages Chapter 3 Samplg For Proportos ad Percetages I may stuatos, the characterstc uder study o whch the observatos are collected are qualtatve ature For example, the resposes of customers may marketg surveys

More information

Module 7: Probability and Statistics

Module 7: Probability and Statistics Lecture 4: Goodess of ft tests. Itroducto Module 7: Probablty ad Statstcs I the prevous two lectures, the cocepts, steps ad applcatos of Hypotheses testg were dscussed. Hypotheses testg may be used to

More information

Introduction to Probability

Introduction to Probability Itroducto to Probablty Nader H Bshouty Departmet of Computer Scece Techo 32000 Israel e-mal: bshouty@cstechoacl 1 Combatorcs 11 Smple Rules I Combatorcs The rule of sum says that the umber of ways to choose

More information

STA302/1001-Fall 2008 Midterm Test October 21, 2008

STA302/1001-Fall 2008 Midterm Test October 21, 2008 STA3/-Fall 8 Mdterm Test October, 8 Last Name: Frst Name: Studet Number: Erolled (Crcle oe) STA3 STA INSTRUCTIONS Tme allowed: hour 45 mutes Ads allowed: A o-programmable calculator A table of values from

More information

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 THE ROYAL STATISTICAL SOCIETY 06 EAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 The Socety s provdg these solutos to assst cadtes preparg for the examatos 07. The solutos are teded as learg ads ad should

More information

Feature Selection: Part 2. 1 Greedy Algorithms (continued from the last lecture)

Feature Selection: Part 2. 1 Greedy Algorithms (continued from the last lecture) CSE 546: Mache Learg Lecture 6 Feature Selecto: Part 2 Istructor: Sham Kakade Greedy Algorthms (cotued from the last lecture) There are varety of greedy algorthms ad umerous amg covetos for these algorthms.

More information

ECON 5360 Class Notes GMM

ECON 5360 Class Notes GMM ECON 560 Class Notes GMM Geeralzed Method of Momets (GMM) I beg by outlg the classcal method of momets techque (Fsher, 95) ad the proceed to geeralzed method of momets (Hase, 98).. radtoal Method of Momets

More information

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model 1. Estmatg Model parameters Assumptos: ox ad y are related accordg to the smple lear regresso model (The lear regresso model s the model that says that x ad y are related a lear fasho, but the observed

More information

Mean is only appropriate for interval or ratio scales, not ordinal or nominal.

Mean is only appropriate for interval or ratio scales, not ordinal or nominal. Mea Same as ordary average Sum all the data values ad dvde by the sample sze. x = ( x + x +... + x Usg summato otato, we wrte ths as x = x = x = = ) x Mea s oly approprate for terval or rato scales, ot

More information

Lecture 02: Bounding tail distributions of a random variable

Lecture 02: Bounding tail distributions of a random variable CSCI-B609: A Theorst s Toolkt, Fall 206 Aug 25 Lecture 02: Boudg tal dstrbutos of a radom varable Lecturer: Yua Zhou Scrbe: Yua Xe & Yua Zhou Let us cosder the ubased co flps aga. I.e. let the outcome

More information