

Giles Pitts
5 years ago

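MATH88T
Maria Cameron

Contents
1. Basic concepts of statistics
1.1. Estimators, estimates and sampling distributions
2. Ordinary least squares estimate
3. Maximum likelihood estimator
4. Bayesian estimation
References

1. Basic concepts of statistics

Statistics is concerned with finding the distribution given a set of samples.

1.1. Estimators, estimates and sampling distributions. Consider a fixed but unknown parameter $q \in \mathbb{R}^d$. A vector in $\mathbb{R}^d$ that represents $q$ is called a point estimate. An interval estimate provides an interval that quantifies the plausible location of the components of $q$. An estimator $g(\eta_1, \ldots, \eta_n)$ is a rule or procedure that specifies how to construct estimates for $q$ based on random samples of the random variable $\eta$. Hence the estimator is a random variable with an associated distribution, which quantifies attributes of the estimation process. An estimate is a realization of the estimator, so it is a function of the realized values $x_1, \ldots, x_n$ (i.e., a sample).

Often we need to estimate the expected value of a random variable. The standard estimator for the mean is the running average

$$\bar\eta(\eta_1, \ldots, \eta_n) = \frac{1}{n}\sum_{i=1}^{n} \eta_i. \qquad (1)$$

Often we also need to estimate the variance. Below are three estimators, each of which is optimal in some sense:

$$S_1^2(\eta_1, \ldots, \eta_n) = \frac{1}{n}\sum_{i=1}^{n}\left(\eta_i - \frac{1}{i}\sum_{k=1}^{i}\eta_k\right)^2, \qquad (2)$$

$$S_2^2(\eta_1, \ldots, \eta_n) = \frac{1}{n}\sum_{i=1}^{n}\left(\eta_i - \frac{1}{n}\sum_{k=1}^{n}\eta_k\right)^2, \qquad (3)$$

$$S_3^2(\eta_1, \ldots, \eta_n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(\eta_i - \frac{1}{n}\sum_{k=1}^{n}\eta_k\right)^2. \qquad (4)$$

The estimator $S_1^2$ is called the running variance. It has the advantage that it is not necessary to store the samples in memory in order to calculate it. This is not so for $S_2^2$ and $S_3^2$. The estimator $S_2^2$ has the maximal likelihood (see below). The estimator $S_3^2$ is unbiased (see below).

An estimator for $q$ is said to be unbiased if $E[g(\eta_1, \ldots, \eta_n)] = q$.

Example 1. Suppose $E[\eta] = m$. Show that the estimator (1) for the mean is unbiased:

$$E[\bar\eta] = \frac{1}{n}\sum_{i=1}^{n} E[\eta_i] = \frac{1}{n}\, n m = m.$$

Example 2. Suppose $\mathrm{Var}(\eta) = \sigma^2$. Consider the running variance as an estimator for the variance:

$$S_1^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\eta_i - \frac{1}{i}\sum_{k=1}^{i}\eta_k\right)^2.$$

This is a biased estimator. Indeed, for independent samples only the term $k = i$ survives in the cross term, and the variance of the sum of $i$ samples is $i\sigma^2$, so

$$E[S_1^2] = \frac{1}{n}\sum_{i=1}^{n} E\left[\left((\eta_i - m) - \frac{1}{i}\sum_{k=1}^{i}(\eta_k - m)\right)^2\right]$$

$$= \frac{1}{n}\sum_{i=1}^{n}\left( E[(\eta_i - m)^2] - \frac{2}{i}E\left[(\eta_i - m)\sum_{k=1}^{i}(\eta_k - m)\right] + \frac{1}{i^2}E\left[\left(\sum_{k=1}^{i}(\eta_k - m)\right)^2\right]\right)$$

$$= \frac{1}{n}\sum_{i=1}^{n}\left(\sigma^2 - \frac{2}{i}\sigma^2 + \frac{1}{i}\sigma^2\right) = \sigma^2\left(1 - \frac{1}{n}\sum_{i=1}^{n}\frac{1}{i}\right).$$

Note that the bias vanishes in the limit, because $\sum_{i=1}^{n} 1/i$ grows only like $\log n$:

$$\lim_{n\to\infty} E[S_1^2] = \lim_{n\to\infty}\sigma^2\left(1 - \frac{1}{n}\sum_{i=1}^{n}\frac{1}{i}\right) = \sigma^2.$$

Exercise 1. Show that

$$S_3^2(\eta_1, \ldots, \eta_n) = \frac{1}{n-1}\sum_{i=1}^{n}(\eta_i - \bar\eta)^2$$

is an unbiased estimator of the variance.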
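The three variance estimators and the bias of the running variance can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names are hypothetical, samples are assumed to be plain floats, and the formula in `expected_running_variance` assumes independent samples with common variance.

```python
from typing import Iterable, List


def running_variance(samples: Iterable[float]) -> float:
    """S_1^2: mean squared deviation of each sample from the running mean.

    Only the running sum and the sample count are kept, so the samples
    themselves never need to be stored.
    """
    total = 0.0   # running sum of the samples seen so far
    sq_dev = 0.0  # accumulated squared deviations from the running mean
    n = 0
    for x in samples:
        n += 1
        total += x
        sq_dev += (x - total / n) ** 2
    return sq_dev / n


def variance_s2(samples: List[float]) -> float:
    """S_2^2: squared deviations from the full-sample mean, divided by n."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / n


def variance_s3(samples: List[float]) -> float:
    """S_3^2: squared deviations from the full-sample mean, divided by n - 1."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


def expected_running_variance(sigma2: float, n: int) -> float:
    """E[S_1^2] = sigma^2 * (1 - H_n / n), where H_n is the n-th harmonic number."""
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    return sigma2 * (1.0 - harmonic / n)


data = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample
s1, s2, s3 = running_variance(data), variance_s2(data), variance_s3(data)
```

For the hypothetical sample above, $S_1^2 \le S_2^2 \le S_3^2$, and `expected_running_variance(1.0, n)` approaches 1 slowly as `n` grows, in line with the logarithmic bias term.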

Definition 1. A statistic is a measurable function of one or more random variables that does not depend on unknown parameters. For example, the estimators in the examples and the exercise above are statistics.

2. Ordinary least squares estimate

Consider the statistical model

$$\eta_i = f(t_i, q_0) + \epsilon_i, \quad i = 1, \ldots, n,$$

where $\eta_i$ are random variables whose realizations $x_i$ are a set of measurements from an experiment, and $f(t_i, q_0)$ is the parameter-dependent model response at the corresponding times. The random variables $\epsilon_i$ account for errors between the model and the measurements. Finally, $q_0$ denotes the true but unknown parameter value, which we cannot measure directly but must instead infer from realizations of the random variables $\eta_i$.

Suppose the random variables $\epsilon_i$ are iid and unbiased, i.e., $E[\epsilon_i] = 0$, and have finite but unknown variance $\sigma^2$. Let $Q$ be the set of admissible values of the parameter $q$. The ordinary least squares estimator and estimate are

$$\hat q_{OLS} = \arg\min_{q \in Q} \sum_{i=1}^{n} [\eta_i - f(t_i, q)]^2, \qquad (5)$$

$$q_{OLS} = \arg\min_{q \in Q} \sum_{i=1}^{n} [x_i - f(t_i, q)]^2. \qquad (6)$$

3. Maximum likelihood estimator

Definition 2. Let $f_\eta(x; q)$ be a parameter-dependent joint pdf associated with the random vector $\eta = [\eta_1, \ldots, \eta_n]$, where $q \in Q$ is the unknown parameter vector, and let $x = [x_1, \ldots, x_n]$ be a realization of $\eta$. The likelihood function $L : Q \to [0, \infty)$ is defined as

$$L(x \mid q) = f_\eta(x; q),$$

where the observed sample $x = [x_1, \ldots, x_n]$ is fixed and $q$ varies over all admissible parameter values.

The likelihood function is proportional to the probability of observing the given sample if the parameter vector is equal to $q$. Thus, to find the maximum likelihood estimate we need to maximize the likelihood function. For iid random variables, the likelihood function becomes

$$L(x \mid q) = \prod_{i=1}^{n} f_\eta(x_i; q).$$

Often it is computationally advantageous to deal with $\log L(x \mid q)$ rather than with $L(x \mid q)$, because products are replaced with sums, and powers are replaced with multiplications. The log-likelihood function is denoted by $l(x \mid q)$ and defined by

$$l(x \mid q) = \log L(x \mid q).$$
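The computational advantage of the log-likelihood is easy to see in floating point. The sketch below is an illustration only: the Gaussian density, the synthetic sample, and the function names are assumptions chosen for the example, not part of the notes. The product of many density values underflows to zero, while the sum of their logarithms stays finite.

```python
import math
from typing import List


def gaussian_pdf(x: float, m: float, sigma2: float) -> float:
    """Density of a Gaussian with mean m and variance sigma2 (illustrative choice)."""
    return math.exp(-(x - m) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def likelihood(xs: List[float], m: float, sigma2: float) -> float:
    """L(x | q): product of the densities of the iid samples."""
    prod = 1.0
    for x in xs:
        prod *= gaussian_pdf(x, m, sigma2)
    return prod


def log_likelihood(xs: List[float], m: float, sigma2: float) -> float:
    """l(x | q): sum of the log-densities of the iid samples."""
    return sum(math.log(gaussian_pdf(x, m, sigma2)) for x in xs)


# Synthetic sample of 2000 values in [-1, 1] (hypothetical data).
xs = [0.1 * (i % 21 - 10) for i in range(2000)]

big_product = likelihood(xs, 0.0, 1.0)   # underflows in double precision
big_sum = log_likelihood(xs, 0.0, 1.0)   # remains a finite negative number
```

For a handful of samples the two computations agree, i.e. `math.log(likelihood(xs[:5], 0.0, 1.0))` matches `log_likelihood(xs[:5], 0.0, 1.0)` up to rounding; the difference only appears once the product leaves the representable range.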

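Note that for iid random variables

$$l(x \mid q) = \sum_{i=1}^{n} \log f_\eta(x_i; q).$$

Due to the monotonicity of the logarithm, maximizing the likelihood function is equivalent to maximizing the log-likelihood function.

Example 3. Suppose we believe that $\eta$ is a Gaussian random variable, and we wish to estimate its mean and variance using a sample of independent trials $x = [x_1, \ldots, x_n]$. Then the parameter vector is $q = [m, \sigma^2]$. The likelihood function is given by

$$L(x \mid m, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i - m)^2}{2\sigma^2}},$$

while the log-likelihood function is

$$l(x \mid m, \sigma^2) = \sum_{i=1}^{n}\left[-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^2 - \frac{(x_i - m)^2}{2\sigma^2}\right] = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - m)^2.$$

To find the maximum likelihood estimate for $m$ we compute $\partial l(x \mid m, \sigma^2)/\partial m$ and set it to zero:

$$\frac{\partial l(x \mid m, \sigma^2)}{\partial m} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - m) = \frac{1}{\sigma^2}\left(\sum_{i=1}^{n} x_i - n m\right) = 0.$$

Hence, the maximum likelihood estimate for the mean is

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

To find the maximum likelihood estimate for $\sigma^2$ we compute $\partial l(x \mid m, \sigma^2)/\partial\sigma^2$ and set it to zero:

$$\frac{\partial l(x \mid m, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - m)^2 = \frac{1}{2\sigma^4}\left(\sum_{i=1}^{n}(x_i - m)^2 - n\sigma^2\right) = 0.$$

Hence, the maximum likelihood estimate for the variance is

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - m)^2, \quad \text{where } m = \bar x.$$

Note that the maximum likelihood estimate for the variance is not unbiased.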
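The closed-form estimates of Example 3 can be sanity-checked numerically. This is a minimal sketch under stated assumptions: the sample values and the function names are hypothetical, and the check simply confirms that perturbing either parameter away from the closed-form estimate lowers the Gaussian log-likelihood.

```python
import math
from typing import List, Tuple


def gaussian_mle(xs: List[float]) -> Tuple[float, float]:
    """Closed-form maximum likelihood estimates (mean, variance) for a Gaussian sample."""
    n = len(xs)
    m_hat = sum(xs) / n
    sigma2_hat = sum((x - m_hat) ** 2 for x in xs) / n  # divides by n, not n - 1
    return m_hat, sigma2_hat


def log_likelihood(xs: List[float], m: float, sigma2: float) -> float:
    """Gaussian log-likelihood l(x | m, sigma^2) for iid samples."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - sum((x - m) ** 2 for x in xs) / (2.0 * sigma2))


xs = [2.1, 1.9, 3.2, 2.8, 2.5]  # hypothetical measurements
m_hat, sigma2_hat = gaussian_mle(xs)
best = log_likelihood(xs, m_hat, sigma2_hat)

# Log-likelihood at nearby parameter values; each should be below `best`.
nearby = [
    log_likelihood(xs, m_hat + dm, sigma2_hat + ds)
    for dm in (-0.05, 0.0, 0.05)
    for ds in (-0.02, 0.0, 0.02)
    if (dm, ds) != (0.0, 0.0)
]
```

The variance estimate returned here equals $(n-1)/n$ times the unbiased estimator, which is the bias noted at the end of Example 3.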

4. Bayesian estimation

Bayes's theorems allow us to make corrections to our a priori estimates based on new information. They might seem trivial because their proofs are very simple; however, they are very important. Their power is demonstrated by the following problem from D. Kahneman's book Thinking, Fast and Slow [3]. In this book, D. Kahneman shows that humans are poor intuitive statisticians.

A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data: 85% of the cabs in the city are Green and 15% are Blue. A witness identified the cab as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time. What is the probability that the cab involved in the accident was Blue rather than Green?

In order to solve this problem correctly we need to use Bayes's theorem. First consider two events $A$ and $B$ with $P(A) \neq 0$ and $P(B) \neq 0$. Then recall that

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{and} \quad P(B \mid A) = \frac{P(B \cap A)}{P(A)}.$$

Hence

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}. \qquad (7)$$

This is the first form of Bayes's theorem.

Suppose that the set of outcomes $\Omega$ is partitioned into a finite or countable number of disjoint subsets $Z_i$:

$$\Omega = \bigcup_i Z_i, \quad Z_i \cap Z_j = \emptyset \text{ for } i \neq j.$$

Suppose $P(Z_i) \neq 0$ for all $i$. Then we can write $P(A)$ as

$$P(A) = \sum_i P(A \cap Z_i) = \sum_i \frac{P(A \cap Z_i)}{P(Z_i)}P(Z_i) = \sum_i P(A \mid Z_i)P(Z_i). \qquad (8)$$

Using Eq. (7) for $Z_j$ and $A$ we get

$$P(Z_j \mid A) = \frac{P(A \mid Z_j)P(Z_j)}{P(A)}.$$

Plugging in the expression for $P(A)$ from Eq. (8) we obtain

$$P(Z_j \mid A) = \frac{P(A \mid Z_j)P(Z_j)}{\sum_i P(A \mid Z_i)P(Z_i)}. \qquad (9)$$

Eq. (9) is the second form of Bayes's theorem.

Now we will solve the problem above using Eq. (9). The set of outcomes $\Omega$ is the set of all taxis in the city. It is partitioned into two subsets $B$ (Blue) and $G$ (Green). We are given: $P(B) = 0.15$, $P(G) = 0.85$. Denote the event that the witness saw a Blue cab by $W_B$. We are also given: $P(W_B \mid B) = 0.8$, $P(W_B \mid G) = 0.2$. What we need to find is $P(B \mid W_B)$, i.e., the probability that the cab was indeed Blue given that the witness identified it as Blue. Substituting $j = 1$, $i = 1, 2$, $A = W_B$, $Z_1 = B$, $Z_2 = G$ into Eq. (9) we calculate

$$P(B \mid W_B) = \frac{P(W_B \mid B)P(B)}{P(W_B \mid B)P(B) + P(W_B \mid G)P(G)} = \frac{0.8 \cdot 0.15}{0.8 \cdot 0.15 + 0.2 \cdot 0.85} \approx 0.41.$$

Thus, the probability that the cab involved in the accident was Blue rather than Green is about 41%.

4.1. Bayesian inference. Now we will illustrate how one can use Bayes's theorem for parameter estimation. Bayesian inference is based on the supposition that probabilities, and, more generally, our state of knowledge regarding an observed phenomenon, can be updated as additional information is obtained. In the context of parametric models, parameters are treated as random variables having associated densities. In the case where one does not have any specific a priori knowledge about the distribution of the parameter value $q$, it is best to use a non-informative prior. A common choice of non-informative prior is the unnormalized uniform density $f_0(q) = \chi_Q(q)$, where $Q$ is the set of all admissible values of $q$.

In the context of parameter estimation, Bayes's theorem can be reformulated as follows.

Theorem 1. Let the random variable $q \in \mathbb{R}^p$ (the parameter to be estimated) have a known prior density $f_0(q)$, which may be non-informative, and let $x$ be a sample of the random variable $\eta$. The posterior density of $q$, given the sample $x$, is

$$f(q \mid x) = \frac{f(x \mid q)f_0(q)}{f(x)} = \frac{f(x \mid q)f_0(q)}{\int_{\mathbb{R}^p} f(x \mid q)f_0(q)\,dq}. \qquad (10)$$

Note that it follows from Eq. (10) that it is unnecessary to normalize $f_0(q)$.

Example 4. Let $\eta$ be a Bernoulli random variable, i.e.,

$$\eta = \begin{cases} 1, & P(\eta = 1) = q, \\ 0, & P(\eta = 0) = 1 - q, \end{cases}$$

where $q$ is the parameter to be estimated. Suppose we have no a priori knowledge about the value of $q$. Then we should use the non-informative prior density for $q$, i.e.,

$$f_0(q) = \begin{cases} 1, & q \in [0, 1], \\ 0, & q \notin [0, 1]. \end{cases}$$

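Suppose we have a sample of $N$ trials of $\eta$, $x = [x_1, \ldots, x_N]$, out of which $N_0$ are zeros and $N_1$ are ones ($N = N_0 + N_1$). Then the posterior density of $q$ is

$$f(q \mid x) = \frac{f(x \mid q)f_0(q)}{\int_0^1 f(x \mid q)f_0(q)\,dq}, \qquad (11)$$

where the function $f(x \mid q)$ is the probability of observing the sample $x$ given the value of $q$. Hence

$$f(x \mid q) = \prod_{i=1}^{N} q^{x_i}(1 - q)^{1 - x_i} = q^{\sum x_i}(1 - q)^{N - \sum x_i} = q^{N_1}(1 - q)^{N_0}.$$

Plugging $f(x \mid q)$ into Eq. (11) we calculate the posterior density (see Fig. 1):

$$f(q \mid x) = \frac{q^{N_1}(1 - q)^{N_0}}{\int_0^1 q^{N_1}(1 - q)^{N_0}\,dq} = \frac{(N + 1)!}{N_0!\,N_1!}\, q^{N_1}(1 - q)^{N_0}.$$

Note that the integral in the denominator is an instance of the Beta function, which can be integrated analytically: if $x, y \in \mathbb{N}$ then

$$B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt = \frac{(x - 1)!\,(y - 1)!}{(x + y - 1)!}.$$

Example 5. Now suppose that the Bernoulli random variable in Example 4 represents tossing a coin. We know that it is a fair coin ($q = 1/2$) with probability 0.8 and a biased coin with $q = 3/5$ with probability 0.2. Suppose the observed sample is $x = [1, 1, 0, 1, 0, 0, 1, 1]$ ($N_0 = 3$, $N_1 = 5$). In this case, the prior distribution of $q$ is

$$f_0(q) = \begin{cases} 0.8, & q = 1/2, \\ 0.2, & q = 3/5, \\ 0, & \text{otherwise.} \end{cases}$$

The posterior distribution is given by

$$f\left(\tfrac{1}{2} \,\middle|\, x\right) = \frac{0.8\left(\tfrac{1}{2}\right)^5\left(\tfrac{1}{2}\right)^3}{0.8\left(\tfrac{1}{2}\right)^5\left(\tfrac{1}{2}\right)^3 + 0.2\left(\tfrac{3}{5}\right)^5\left(\tfrac{2}{5}\right)^3}, \qquad f\left(\tfrac{3}{5} \,\middle|\, x\right) = \frac{0.2\left(\tfrac{3}{5}\right)^5\left(\tfrac{2}{5}\right)^3}{0.8\left(\tfrac{1}{2}\right)^5\left(\tfrac{1}{2}\right)^3 + 0.2\left(\tfrac{3}{5}\right)^5\left(\tfrac{2}{5}\right)^3}.$$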
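Both the cab problem and Example 5 are instances of Bayes's theorem over a finite partition, Eq. (9), so they can be evaluated with the same few lines. The following is a minimal Python sketch; the function names are hypothetical, and the biased coin is taken to have $q = 3/5$ with $N_1 = 5$ ones and $N_0 = 3$ zeros in the sample.

```python
from typing import List


def posterior(priors: List[float], likelihoods: List[float]) -> List[float]:
    """Second form of Bayes's theorem for a finite partition Z_1, ..., Z_k.

    priors[i]      = P(Z_i)
    likelihoods[i] = P(A | Z_i)
    returns          P(Z_i | A) for every i
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(A), by the law of total probability
    return [j / evidence for j in joint]


def bernoulli_likelihood(q: float, n_ones: int, n_zeros: int) -> float:
    """Probability of a specific 0/1 sequence with the given counts."""
    return q ** n_ones * (1.0 - q) ** n_zeros


# Cab problem: hypotheses (Blue, Green), evidence "witness says Blue".
cab = posterior(priors=[0.15, 0.85], likelihoods=[0.8, 0.2])

# Example 5: hypotheses (fair coin, biased coin), evidence = observed tosses.
coin = posterior(
    priors=[0.8, 0.2],
    likelihoods=[bernoulli_likelihood(0.5, 5, 3), bernoulli_likelihood(0.6, 5, 3)],
)
```

Under these assumptions `cab[0]` is about 0.41, and `coin[0]` is about 0.76: eight tosses with five heads shift the belief in a fair coin only slightly, from 0.8 to roughly 0.76.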

[Figure 1. The posterior distributions $f(q \mid [N_0, N_1])$ of the parameter $q$ in Example 4 for different values of $N_0$ and $N_1$: $(N_0, N_1) = (8, 2), (6, 4), (4, 6), (2, 8)$.]

4.2. Maximum a posteriori estimate vs maximum likelihood estimate. Once we obtain the posterior density (distribution) for $q$, we can find the maximum a posteriori estimate (MAP), defined by

$$q_{MAP} = \arg\max_{q \in Q} f(q \mid x) = \arg\max_{q \in Q} f(x \mid q)f_0(q). \qquad (12)$$

The last identity is due to the fact that the normalization constant (the integral in the denominator of Eq. (10)) does not affect the maximizing argument. Previously we have defined the maximum likelihood estimate (MLE) as the one that maximizes the likelihood function, which is exactly $f(x \mid q)$, i.e.,

$$q_{MLE} = \arg\max_{q \in Q} f(x \mid q). \qquad (13)$$

Therefore, the maximum likelihood estimate is the value of $q$ that maximizes the probability of observing the sample $x$ given $q$, while the maximum a posteriori estimate maximizes this probability times the a priori pdf for $q$ (the base rate). In the case of a non-informative prior, the MAP and the MLE estimates coincide.

Example 6. Assume that we know that the average waiting times $q \ge 0$ have the exponential pdf $f_0(q) = e^{-q}$. Note that the expected value of $q$ is 1. The waiting times are distributed according to the pdf

$$f(t \mid q) = \frac{1}{q}\, e^{-t/q}.$$

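Suppose our waiting time turns out to be $\tau$. Let us find the MAP and MLE estimates for $q$. The likelihood function is

$$L(\tau; q) = f(\tau \mid q) = \frac{1}{q}\, e^{-\tau/q}.$$

To find the MLE estimate for $q$ we, for ease of calculation, switch to $\log L(\tau; q)$:

$$l(\tau; q) = -\log q - \frac{\tau}{q}, \qquad \frac{\partial l(\tau; q)}{\partial q} = -\frac{1}{q} + \frac{\tau}{q^2} = 0.$$

Hence $q_{MLE} = \tau$. Using Eq. (10) we get the posterior density for $q$:

$$f(q \mid \tau) = \frac{f(\tau \mid q)f_0(q)}{\int_0^\infty f(\tau \mid q)f_0(q)\,dq} = \frac{\frac{1}{q}\, e^{-\tau/q}e^{-q}}{\int_0^\infty \frac{1}{q}\, e^{-\tau/q}e^{-q}\,dq} = \frac{1}{Z}\,\frac{1}{q}\, e^{-\tau/q}e^{-q}.$$

Again, for ease of calculation, we find the maximizer of $\log f(q \mid \tau)$ instead:

$$\log f(q \mid \tau) = -\log Z - \log q - \frac{\tau}{q} - q, \qquad \frac{\partial \log f(q \mid \tau)}{\partial q} = -\frac{1}{q} + \frac{\tau}{q^2} - 1 = 0.$$

Choosing the positive root we obtain

$$q_{MAP} = \frac{1}{2}\left(\sqrt{4\tau + 1} - 1\right) < \tau = q_{MLE}.$$

References

[1] A. Chorin and O. Hald, Stochastic Tools in Mathematics and Science, 3rd edition, Springer, 2013.
[2] Ralph C. Smith, Uncertainty Quantification: Theory, Implementation, and Applications, SIAM, 2014.
[3] Daniel Kahneman, Thinking, Fast and Slow, Farrar, Straus and Giroux, New York, 2011.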
