Point Estimation: definition of estimators

Point estimator: any function $W(X_1, \ldots, X_n)$ of a data sample. The exercise of point estimation is to use particular functions of the data in order to estimate certain unknown population parameters.

Examples: Assume that $X_1, \ldots, X_n$ are drawn i.i.d. from some distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.

Potential point estimators for $\mu$ include: the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$; the sample median $\mathrm{med}(X_1, \ldots, X_n)$.

Potential point estimators for $\sigma^2$ include: the sample variance $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$.

Any point estimator is a random variable, whose distribution is that induced by the distribution of $X_1, \ldots, X_n$.

Example: $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$. The sample mean $\bar{X}_n \sim N(\mu_n, \sigma_n^2)$, where $\mu_n = \mu$ and $\sigma_n^2 = \sigma^2 / n$.

For a particular realization $x_1, \ldots, x_n$ of the random variables, the corresponding point estimator evaluated at $x_1, \ldots, x_n$, i.e., $W(x_1, \ldots, x_n)$, is called the point estimate.

In these lecture notes, we will consider three types of estimators:
1. Method of moments
2. Maximum likelihood
3. Bayesian estimation

Method of moments: very intuitive idea

Assume: $X_1, \ldots, X_n$ i.i.d. $f(x \mid \theta_1, \ldots, \theta_K)$. Here the unknown parameters are $\theta_1, \ldots, \theta_K$ ($K \le N$).

The idea is to find values of the parameters such that the population moments are as close as possible to their sample analogs. This involves finding values of the parameters to solve the following system of $K$ equations:

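$$
\begin{aligned}
m_1 \equiv \frac{1}{n}\sum_{i=1}^n X_i &= EX = \int x \, f(x \mid \theta_1, \ldots, \theta_K)\,dx \\
m_2 \equiv \frac{1}{n}\sum_{i=1}^n X_i^2 &= EX^2 = \int x^2 f(x \mid \theta_1, \ldots, \theta_K)\,dx \\
&\;\;\vdots \\
m_K \equiv \frac{1}{n}\sum_{i=1}^n X_i^K &= EX^K = \int x^K f(x \mid \theta_1, \ldots, \theta_K)\,dx.
\end{aligned}
$$

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, \sigma^2)$. Parameters are $\theta, \sigma^2$. The moment equations are:

$$
\begin{aligned}
\bar{X}_n &= EX = \theta \\
\frac{1}{n}\sum_{i=1}^n X_i^2 &= EX^2 = VX + (EX)^2 = \sigma^2 + \theta^2.
\end{aligned}
$$

Hence, the MOM estimators are $\theta^{MOM} = \bar{X}_n$ and $\sigma^{2,MOM} = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar{X}_n)^2$.

Example: $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$. Parameter is $\theta$. MOM: $\bar{X}_n = \frac{\theta}{2} \Rightarrow \theta^{MOM} = 2\bar{X}_n$.

Remarks:

Apart from the special cases above, for general density functions $f(\cdot \mid \vec{\theta})$, the MOM estimator is often difficult to calculate, because the population moments involve difficult integrals. (In Pearson's original paper, the density was a mixture of two normal density functions:

$$
f(x \mid \vec{\theta}) = \lambda \cdot \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left(-\frac{(x - \mu_1)^2}{2\sigma_1^2}\right) + (1 - \lambda) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left(-\frac{(x - \mu_2)^2}{2\sigma_2^2}\right)
$$

with unknown parameters $\lambda, \mu_1, \mu_2, \sigma_1, \sigma_2$.)

The model assumption that $X_1, \ldots, X_n$ i.i.d. $f(\cdot \mid \vec{\theta})$ implies a number of moment equations equal to the number of moments, which can be $\gg K$. This leaves room for evaluating the model specification.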
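As a numerical illustration of the two closed-form MOM examples above (normal and uniform), here is a minimal Python sketch. The function names `mom_normal` and `mom_uniform`, the seed, the sample sizes, and the "true" parameter values are all assumptions made for illustration; they do not come from the notes.

```python
import random

def mom_normal(xs):
    """MOM estimates (theta, sigma^2) for an i.i.d. N(theta, sigma^2) sample.

    Matches the first two sample moments to EX = theta and
    EX^2 = sigma^2 + theta^2.
    """
    n = len(xs)
    m1 = sum(xs) / n                  # first sample moment
    m2 = sum(x * x for x in xs) / n   # second (uncentered) sample moment
    return m1, m2 - m1 ** 2

def mom_uniform(xs):
    """MOM estimate of theta for an i.i.d. U[0, theta] sample (EX = theta / 2)."""
    return 2.0 * sum(xs) / len(xs)

# Simulated data with hypothetical true values theta = 1.5, sigma = 2 (normal)
# and theta = 3 (uniform).
rng = random.Random(0)
normal_sample = [rng.gauss(1.5, 2.0) for _ in range(20000)]
uniform_sample = [rng.uniform(0.0, 3.0) for _ in range(20000)]

theta_hat, sigma2_hat = mom_normal(normal_sample)
theta_u_hat = mom_uniform(uniform_sample)
print("normal MOM:", theta_hat, sigma2_hat)
print("uniform MOM:", theta_u_hat)
```

With a large sample the estimates should land close to the values used to simulate the data, which is all this sketch is meant to show.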

For example, in the uniform distribution example above, another moment condition which should be satisfied is that

$$
\frac{1}{n}\sum_{i=1}^n X_i^2 = EX^2 = VX + (EX)^2 = \frac{\theta^2}{12} + \frac{\theta^2}{4} = \frac{\theta^2}{3}. \qquad (1)
$$

At the MOM estimator $\theta^{MOM}$, one can see whether

$$
\frac{1}{n}\sum_{i=1}^n X_i^2 = \frac{(\theta^{MOM})^2}{3}.
$$

(Later, you will learn how this can be tested more formally.) If this does not hold, then that might be cause for you to conclude that the original specification that $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$ is inadequate. Eq. (1) is an example of an overidentifying restriction.

While the MOM estimator focuses on using the sample uncentered moments to construct estimators, there are other sample quantities which could be useful, such as the sample median (or other sample percentiles), as well as the sample minimum or maximum. (Indeed, for the uniform case above, the sample maximum would be a very reasonable estimator for $\theta$.)

Maximum Likelihood Estimation

Let $X_1, \ldots, X_n$ be i.i.d. with density $f(\cdot \mid \theta_1, \ldots, \theta_K)$.

Define: the likelihood function, for a continuous random variable, is the joint density of the sample observations:

$$
L(\vec{\theta} \mid x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i \mid \vec{\theta}).
$$

View $L(\vec{\theta} \mid \vec{x})$ as a function of the parameters $\vec{\theta}$, for the data observations $\vec{x}$.

From the classical point of view, the likelihood function $L(\vec{\theta} \mid \vec{x})$ is a random variable due to the randomness in the data $\vec{x}$. (In the Bayesian point of view, which we talk about later, the likelihood function is also random because the parameters $\vec{\theta}$ are also treated as random variables.)

The maximum likelihood estimator (MLE) is the vector of parameter values $\vec{\theta}^{ML}$ which maximizes the likelihood function:

$$
\vec{\theta}^{ML} = \operatorname{argmax}_{\vec{\theta}} L(\vec{\theta} \mid \vec{x}).
$$
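Returning to the overidentifying restriction for the uniform example: since $EX^2 = \theta^2/3$ under $U[0, \theta]$, the gap between the second sample moment and $(\theta^{MOM})^2/3$ should be near zero when the model is right. The sketch below is an informal check only, not a formal test. The helper name `overid_gap`, the seed, and the choice of an exponential sample as the "wrong" model are assumptions for illustration.

```python
import random

def overid_gap(xs):
    """Second sample moment minus its U[0, theta] prediction at theta_MOM.

    Under U[0, theta], EX^2 = theta^2 / 3, so the gap should be close to
    zero when the uniform specification is adequate.
    """
    n = len(xs)
    theta_mom = 2.0 * sum(xs) / n         # MOM estimate from the first moment
    m2 = sum(x * x for x in xs) / n       # second sample moment
    return m2 - theta_mom ** 2 / 3.0

rng = random.Random(1)
uniform_data = [rng.uniform(0.0, 2.0) for _ in range(50000)]   # model is correct
expo_data = [rng.expovariate(1.0) for _ in range(50000)]       # model is wrong

gap_uniform = overid_gap(uniform_data)
gap_expo = overid_gap(expo_data)
print("gap under U[0, 2] data:      ", gap_uniform)
print("gap under exponential data:  ", gap_expo)
```

A gap far from zero, as one would expect for the exponential sample, is informal evidence against the uniform specification.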

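Usually, in practice, to avoid numerical overflow problems, one maximizes the log of the likelihood function:

$$
\vec{\theta}^{ML} = \operatorname{argmax}_{\vec{\theta}} \log L(\vec{\theta} \mid \vec{x}) = \operatorname{argmax}_{\vec{\theta}} \sum_{i=1}^n \log f(x_i \mid \vec{\theta}).
$$

Analogously, for discrete random variables, the likelihood function is the joint probability mass function:

$$
L(\vec{\theta} \mid \vec{x}) = \prod_{i=1}^n P(X = x_i \mid \vec{\theta}).
$$

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$.

$$
\log L(\theta \mid \vec{x}) = n \log\left(1/\sqrt{2\pi}\right) - \frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2
$$

$$
\max_\theta \log L(\theta \mid \vec{x}) \iff \min_\theta \frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2
$$

FOC: $\frac{\partial \log L}{\partial \theta} = \sum_{i=1}^n (x_i - \theta) = 0 \Rightarrow \theta^{ML} = \frac{1}{n}\sum_{i=1}^n x_i$ (the sample mean).

One should also check the second-order condition: $\frac{\partial^2 \log L}{\partial \theta^2} = -n < 0$, so it is satisfied.

Example: $X_1, \ldots, X_n$ i.i.d. Bernoulli with probability $p$. The unknown parameter is $p$.

$$
L(p \mid \vec{x}) = \prod_{i=1}^n p^{x_i} (1 - p)^{1 - x_i}
$$

$$
\log L(p \mid \vec{x}) = \sum_{i=1}^n \left[ x_i \log p + (1 - x_i) \log(1 - p) \right] = y \log p + (n - y) \log(1 - p),
$$

where $y$ is the number of 1s.

FOC: $\frac{\partial \log L}{\partial p} = \frac{y}{p} - \frac{n - y}{1 - p} = 0 \Rightarrow p^{ML} = \frac{y}{n}$.

For $y = 0$ or $y = n$, $p^{ML}$ is (respectively) 0 and 1: corner solutions.

SOC: $\left.\frac{\partial^2 \log L}{\partial p^2}\right|_{p = p^{ML}} = -\frac{y}{p^2} - \frac{n - y}{(1 - p)^2} < 0$ for $0 < y < n$.

When the parameter is multidimensional: check that the Hessian matrix $\frac{\partial^2 \log L}{\partial \theta \, \partial \theta'}$ is negative definite.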
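The two worked MLE examples above can be cross-checked numerically: maximize the log-likelihood directly and compare with the closed-form answers ($y/n$ for the Bernoulli case, the sample mean for $N(\theta, 1)$). The sketch below uses a simple golden-section search, which is my choice of optimizer and not something the notes prescribe. The tiny data sets and all function names are hypothetical.

```python
import math

def bernoulli_loglik(p, xs):
    """log L(p | x) = y log p + (n - y) log(1 - p), with y the number of 1s."""
    y, n = sum(xs), len(xs)
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def bernoulli_soc(p, xs):
    """Second derivative of the Bernoulli log-likelihood at p."""
    y, n = sum(xs), len(xs)
    return -y / p ** 2 - (n - y) / (1.0 - p) ** 2

def normal_loglik(theta, xs):
    """log L(theta | x) for i.i.d. N(theta, 1) data."""
    n = len(xs)
    return n * math.log(1.0 / math.sqrt(2.0 * math.pi)) \
        - 0.5 * sum((x - theta) ** 2 for x in xs)

def maximize(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) > f(d):
            b = d      # maximizer lies in [a, d]
        else:
            a = c      # maximizer lies in [c, b]
    return (a + b) / 2.0

coin = [1, 0, 1, 1, 0, 1, 0, 1]    # hypothetical Bernoulli data: y = 5, n = 8
obs = [0.3, 1.2, 2.1, 0.8]         # hypothetical N(theta, 1) data

p_closed = sum(coin) / len(coin)
p_numeric = maximize(lambda p: bernoulli_loglik(p, coin), 1e-9, 1.0 - 1e-9)
theta_closed = sum(obs) / len(obs)
theta_numeric = maximize(lambda t: normal_loglik(t, obs), -10.0, 10.0)

print("Bernoulli:", p_closed, p_numeric, "SOC:", bernoulli_soc(p_closed, coin))
print("Normal:   ", theta_closed, theta_numeric)
```

Both log-likelihoods are concave in the parameter, which is what justifies a one-dimensional search that assumes a single peak.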

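You can think of ML as a MOM estimator: for $X_1, \ldots, X_n$ i.i.d. and a $K$-dimensional parameter vector $\theta$, the MLE solves the FOCs:

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_1} &= 0 \\
\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_2} &= 0 \\
&\;\;\vdots \\
\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_K} &= 0.
\end{aligned}
$$

Under the LLN:

$$
\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_k} \xrightarrow{p} E_{\theta_0} \frac{\partial \log f(X \mid \theta)}{\partial \theta_k}, \quad k = 1, \ldots, K,
$$

where the notation $E_{\theta_0}$ denotes the expectation over the distribution of $X$ at the true parameter vector $\theta_0$. Hence, MLE is equivalent to MOM with the moment conditions

$$
E_{\theta_0} \frac{\partial \log f(X \mid \theta)}{\partial \theta_k} = 0, \quad k = 1, \ldots, K.
$$

Bayes estimators

Fundamentally different view of the world. Model the unknown parameters $\theta$ as random variables, and assume that the researcher's beliefs about $\theta$ are summarized in a prior distribution $f(\theta)$. In this sense, the Bayesian approach is subjective, because the researcher's beliefs about $\theta$ are accommodated in the inferential approach.

$X_1, \ldots, X_n$ i.i.d. $f(x \mid \theta)$: the Bayesian views the density of each data observation as a conditional density, which is conditional on a realization of the random variable $\theta$.

Given data $X_1, \ldots, X_n$, we can update our beliefs about the parameter $\theta$ by computing the posterior density (using Bayes' Rule):

$$
f(\theta \mid \vec{x}) = \frac{f(\vec{x} \mid \theta) \, f(\theta)}{f(\vec{x})} = \frac{f(\vec{x} \mid \theta) \, f(\theta)}{\int f(\vec{x} \mid \theta) f(\theta) \, d\theta}.
$$

A Bayesian point estimate of $\theta$ is some feature of this posterior density. Common point estimators are:

Posterior mean: $E[\theta \mid \vec{x}] = \int \theta f(\theta \mid \vec{x}) \, d\theta$.

Posterior median: $F^{-1}_{\theta \mid \vec{x}}(0.5)$, where $F_{\theta \mid \vec{x}}$ is the CDF corresponding to the posterior density: i.e., $F_{\theta \mid \vec{x}}(\tilde{\theta}) = \int_{-\infty}^{\tilde{\theta}} f(\theta \mid \vec{x}) \, d\theta$.

Posterior mode: $\max_\theta f(\theta \mid \vec{x})$. This is the point at which the density is highest.

Note that $f(\vec{x} \mid \theta)$ is just the likelihood function, so that the posterior density $f(\theta \mid \vec{x})$ can be written as:

$$
f(\theta \mid \vec{x}) = \frac{L(\theta \mid \vec{x}) \, f(\theta)}{\int L(\theta \mid \vec{x}) f(\theta) \, d\theta}.
$$

But there is a difference in interpretation: in the Bayesian world, the likelihood function is random due to both $\vec{x}$ and $\theta$, whereas in the classical world, only $\vec{x}$ is random.

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$, with prior density $f(\theta)$. The posterior density is

$$
f(\theta \mid \vec{x}) = \frac{\exp\left(-\frac{1}{2}\sum_i (x_i - \theta)^2\right) f(\theta)}{\int \exp\left(-\frac{1}{2}\sum_i (x_i - \theta)^2\right) f(\theta) \, d\theta}.
$$

The integral in the denominator can be difficult to calculate: computational difficulties can hamper computation of posterior densities.

Special case: if we assume that $f(\theta) = 1$ for $-\infty < \theta < \infty$ (this is what is called an "improper prior"), then $f(\theta \mid \vec{x}) \propto L(\theta \mid \vec{x})$ (because the denominator is just a constant, and not a function of $\theta$). For this case, posterior mode $= \operatorname{argmax}_\theta L(\theta \mid \vec{x}) = \theta^{ML}$.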
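When the denominator integral has no convenient closed form, one crude workaround for a scalar parameter is to evaluate likelihood times prior on a grid and normalize numerically. The sketch below does this for the $N(\theta, 1)$ example and reports the posterior mean, median and mode. The grid bounds, step size, data values, the `N(0, 1)` prior, and all function names are assumptions for illustration; the notes do not prescribe a computational method.

```python
import math

def grid_posterior(xs, prior, grid):
    """Posterior density on an evenly spaced grid for i.i.d. N(theta, 1) data.

    Likelihood x prior is evaluated pointwise, then normalized with a
    Riemann sum standing in for the denominator integral.
    """
    loglik = [-0.5 * sum((x - t) ** 2 for x in xs) for t in grid]
    shift = max(loglik)                       # guards against underflow
    unnorm = [math.exp(ll - shift) * prior(t) for ll, t in zip(loglik, grid)]
    step = grid[1] - grid[0]
    z = sum(unnorm) * step
    return [u / z for u in unnorm]

def summarize(grid, dens):
    """Posterior mean, median and mode computed from grid densities."""
    step = grid[1] - grid[0]
    mean = sum(t * d for t, d in zip(grid, dens)) * step
    mode = grid[max(range(len(grid)), key=dens.__getitem__)]
    cum, median = 0.0, grid[-1]
    for t, d in zip(grid, dens):
        cum += d * step
        if cum >= 0.5:                        # first grid point past the 0.5 quantile
            median = t
            break
    return mean, median, mode

obs = [0.3, 1.2, 2.1, 0.8]                              # hypothetical data
grid = [-5.0 + 0.001 * i for i in range(12001)]         # theta in [-5, 7]

flat = lambda t: 1.0                                    # improper flat prior
std_normal = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

flat_mean, flat_median, flat_mode = summarize(grid, grid_posterior(obs, flat, grid))
n01_mean, n01_median, n01_mode = summarize(grid, grid_posterior(obs, std_normal, grid))
mle = sum(obs) / len(obs)

print("flat prior  :", flat_mean, flat_median, flat_mode, "MLE:", mle)
print("N(0,1) prior:", n01_mean, n01_median, n01_mode)
```

Under the flat prior the posterior mode should coincide with the MLE up to grid resolution, which is the special case described above. A grid only scales to low-dimensional parameters.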

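Example: Bayesian updating for the normal distribution, with normal priors

$X \sim N(\theta, \sigma^2)$, assume $\sigma^2$ is known.

Prior: $\theta \sim N(\mu, \tau^2)$, assume $\tau$ is known.

Then the posterior distribution is $\theta \mid X \sim N(E(\theta \mid X), V(\theta \mid X))$, where

$$
\begin{aligned}
E(\theta \mid X) &= \frac{\tau^2}{\tau^2 + \sigma^2} X + \frac{\sigma^2}{\sigma^2 + \tau^2} \mu \\
V(\theta \mid X) &= \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}.
\end{aligned}
$$

This is an example of a conjugate prior and conjugate distribution, where the posterior distribution comes from the same family as the prior distribution. (The classic reference is Morris DeGroot, Optimal Statistical Decisions.)

The posterior mean is a weighted average of $X$ and the prior mean $\mu$.

In this case, as $\tau \to \infty$ (so that prior information gets worse and worse), $E(\theta \mid X) \to X$ (a.s.). This is just the MLE (for just one data observation).

When you observe an i.i.d. sample $\vec{X}_n \equiv (X_1, \ldots, X_n)$, with sample mean $\bar{X}_n$:

$$
\begin{aligned}
E(\theta \mid \vec{X}_n) &= \frac{n\tau^2}{n\tau^2 + \sigma^2} \bar{X}_n + \frac{\sigma^2}{\sigma^2 + n\tau^2} \mu \\
V(\theta \mid \vec{X}_n) &= \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2}.
\end{aligned}
$$

In this case, as the number of observations $n \to \infty$, the posterior mean $E(\theta \mid \vec{X}_n) \to \bar{X}_n$. So as $n \to \infty$, the posterior mean converges to the MLE: when your sample becomes arbitrarily large, you place no weight on your prior information.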
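The normal-normal update is simple enough to sketch in a few lines. For an i.i.d. sample of size $n$ with known $\sigma^2$, the posterior mean puts weight $n\tau^2 / (n\tau^2 + \sigma^2)$ on the sample mean and the remainder on the prior mean $\mu$, and the posterior variance is $\sigma^2\tau^2 / (\sigma^2 + n\tau^2)$. The function name `normal_posterior` and the numbers used below are hypothetical.

```python
def normal_posterior(xbar, n, sigma2, mu, tau2):
    """Posterior mean and variance of theta for N(theta, sigma2) data
    with known sigma2 and a N(mu, tau2) prior on theta.

    xbar is the sample mean of n i.i.d. observations; n = 1 gives the
    single-observation case.
    """
    weight = n * tau2 / (n * tau2 + sigma2)    # weight on the data
    mean = weight * xbar + (1.0 - weight) * mu
    var = sigma2 * tau2 / (sigma2 + n * tau2)
    return mean, var

# One observation X = 2 with sigma2 = tau2 = 1 and prior mean 0:
print(normal_posterior(2.0, 1, 1.0, 0.0, 1.0))

# The data dominate as the prior becomes diffuse or as n grows:
print(normal_posterior(2.0, 1, 1.0, 0.0, 1e9))      # tau2 very large
print(normal_posterior(2.0, 10**6, 1.0, 0.0, 1.0))  # n very large
```

The last two calls illustrate the two limits discussed above: in both, the posterior mean is pulled essentially all the way to the sample mean.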