Point Estimation: definition of estimators

Point estimator: any function $W(X_1, \ldots, X_n)$ of a data sample. The exercise of point estimation is to use particular functions of the data in order to estimate certain unknown population parameters.

Examples: Assume that $X_1, \ldots, X_n$ are drawn i.i.d. from some distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.

Potential point estimators for $\mu$ include: the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$; the sample median $\mathrm{med}(X_1, \ldots, X_n)$.

Potential point estimators for $\sigma^2$ include: the sample variance $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$.

Any point estimator is a random variable, whose distribution is that induced by the distribution of $X_1, \ldots, X_n$.

Example: $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$. The sample mean $\bar{X}_n \sim N(\mu_n, \sigma_n^2)$, where $\mu_n = \mu$ and $\sigma_n^2 = \sigma^2/n$.

For a particular realization $x_1, \ldots, x_n$ of the random variables, the corresponding point estimator evaluated at $x_1, \ldots, x_n$, i.e., $W(x_1, \ldots, x_n)$, is called the point estimate.

In these lecture notes, we will consider three types of estimators:
1. Method of moments
2. Maximum likelihood
3. Bayesian estimation

Method of moments:

Assume: $X_1, \ldots, X_n$ i.i.d. $f(x|\theta_1, \ldots, \theta_K)$. Here the unknown parameters are $\theta_1, \ldots, \theta_K$ ($K \le N$). The idea is to find values of the parameters such that the population moments are as close as possible to their sample analogs. This involves finding values of the parameters to solve the following K-system of equations:
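$$
\begin{aligned}
m_1 \equiv \frac{1}{n}\sum_{i=1}^n X_i &= EX = \int x\, f(x|\theta_1, \ldots, \theta_K)\,dx \\
m_2 \equiv \frac{1}{n}\sum_{i=1}^n X_i^2 &= EX^2 = \int x^2 f(x|\theta_1, \ldots, \theta_K)\,dx \\
&\;\;\vdots \\
m_K \equiv \frac{1}{n}\sum_{i=1}^n X_i^K &= EX^K = \int x^K f(x|\theta_1, \ldots, \theta_K)\,dx.
\end{aligned}
$$

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, \sigma^2)$. Parameters are $\theta, \sigma^2$. Moment equations are:

$$
\begin{aligned}
\bar{X}_n &= EX = \theta \\
\frac{1}{n}\sum_{i=1}^n X_i^2 &= EX^2 = VX + (EX)^2 = \sigma^2 + \theta^2.
\end{aligned}
$$

Hence, the MOM estimators are $\theta^{MOM} = \bar{X}_n$ and $\sigma^{2,MOM} = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar{X}_n)^2$.

Example: $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$. Parameter is $\theta$. MOM: $\bar{X}_n = \frac{\theta}{2} \implies \theta^{MOM} = 2\bar{X}_n$.

Remarks:

Apart from the special cases above, for general density functions $f(\cdot|\vec{\theta})$, the MOM estimator is often difficult to calculate, because the population moments involve difficult integrals. In Pearson's original paper, the density was a mixture of two normal density functions:

$$
f(x|\vec{\theta}) = \lambda \cdot \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) + (1-\lambda) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right)
$$

with unknown parameters $\lambda, \mu_1, \mu_2, \sigma_1, \sigma_2$.

The model assumption that $X_1, \ldots, X_n$ i.i.d. $f(\cdot|\vec{\theta})$ implies a number of moment equations equal to the number of moments, which can be $\gg K$. This leaves room for evaluating the model specification.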
For example, in the uniform distribution example above, another moment condition which should be satisfied is that

$$
\frac{1}{n}\sum_{i=1}^n X_i^2 = EX^2 = VX + (EX)^2 = \frac{\theta^2}{12} + \frac{\theta^2}{4} = \frac{\theta^2}{3}. \qquad (1)
$$

At the MOM estimator $\theta^{MOM}$, one can see whether

$$
\frac{1}{n}\sum_{i=1}^n X_i^2 = \frac{(\theta^{MOM})^2}{3}.
$$

(Later, you will learn how this can be tested more formally.) If this does not hold, then that might be cause for you to conclude that the original specification that $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$ is inadequate. Eq. (1) is an example of an overidentifying restriction.

While the MOM estimator focuses on using the sample uncentered moments to construct estimators, there are other sample quantities which could be useful, such as the sample median (or other sample percentiles), as well as the sample minimum or maximum. (Indeed, for the uniform case above, the sample maximum would be a very reasonable estimator for $\theta$.) All these estimators are lumped under the rubric of generalized method of moments (GMM).

Maximum Likelihood Estimation

Let $X_1, \ldots, X_n$ be i.i.d. with density $f(\cdot|\theta_1, \ldots, \theta_K)$.

Define: the likelihood function, for a continuous random variable, is the joint density of the sample observations:

$$
L(\vec{\theta}|x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i|\vec{\theta}).
$$

View $L(\vec{\theta}|\vec{x})$ as a function of the parameters $\vec{\theta}$, for the data observations $\vec{x}$.

From the classical point of view, the likelihood function $L(\vec{\theta}|\vec{x})$ is a random variable due to the randomness in the data $\vec{x}$. (In the Bayesian point of view, which we talk about later, the likelihood function is also random because the parameters $\vec{\theta}$ are also treated as random variables.)
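Returning briefly to the uniform example, the overidentifying restriction can be checked informally on simulated data. The following is a minimal sketch, not part of the original notes: the function name `uniform_mom_check`, the simulated sample, and the true value $\theta = 4$ are all assumptions made for illustration. It computes $\theta^{MOM} = 2\bar{X}_n$, compares the sample second moment with the value $(\theta^{MOM})^2/3$ implied by the $U[0, \theta]$ model, and also reports the sample maximum.

```python
import numpy as np

def uniform_mom_check(x):
    """Return (theta_mom, sample 2nd moment, model-implied 2nd moment, sample max).

    Assumes the data are modeled as i.i.d. U[0, theta], so that
    EX = theta / 2 and EX^2 = theta^2 / 3.
    """
    x = np.asarray(x, dtype=float)
    theta_mom = 2.0 * x.mean()              # solves the first moment equation
    m2_sample = np.mean(x ** 2)             # sample analog of EX^2
    m2_implied = theta_mom ** 2 / 3.0       # EX^2 evaluated at theta_mom
    return theta_mom, m2_sample, m2_implied, x.max()

# Hypothetical data: 5000 draws from U[0, 4].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 4.0, size=5000)
theta_mom, m2_sample, m2_implied, theta_max = uniform_mom_check(x)
print(theta_mom, m2_sample, m2_implied, theta_max)
```

When the data really are uniform, the two second-moment values should be close; a large gap would be informal evidence against the specification. This eyeball comparison is not a formal test.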
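The maximum likelihood estimator (MLE) is the vector of parameter values $\vec{\theta}^{ML}$ which maximizes the likelihood function:

$$
\vec{\theta}^{ML} = \operatorname{argmax}_{\vec{\theta}}\, L(\vec{\theta}|\vec{x}).
$$

Usually, in practice, to avoid numerical overflow problems, one maximizes the log of the likelihood function:

$$
\vec{\theta}^{ML} = \operatorname{argmax}_{\vec{\theta}}\, \log L(\vec{\theta}|\vec{x}) = \operatorname{argmax}_{\vec{\theta}} \sum_{i=1}^n \log f(x_i|\vec{\theta}).
$$

Analogously, for discrete random variables, the likelihood function is the joint probability mass function:

$$
L(\vec{\theta}|\vec{x}) = \prod_{i=1}^n P(X = x_i|\vec{\theta}).
$$

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$.

$$
\begin{aligned}
\log L(\theta|\vec{x}) &= n \log\left(1/\sqrt{2\pi}\right) - \frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2 \\
\max_\theta \log L(\theta|\vec{x}) &= \min_\theta \frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2 \\
\text{FOC:}\quad \frac{\partial \log L}{\partial \theta} &= \sum_{i=1}^n (x_i - \theta) = 0 \implies \theta^{ML} = \frac{1}{n}\sum_{i=1}^n x_i \quad \text{(sample mean)}
\end{aligned}
$$

One should also check the second-order condition: $\frac{\partial^2 \log L}{\partial \theta^2} = -n < 0$, so it is satisfied.

Example: $X_1, \ldots, X_n$ i.i.d. Bernoulli with probability $p$. The unknown parameter is $p$.

$$
\begin{aligned}
L(p|\vec{x}) &= \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i} \\
\log L(p|\vec{x}) &= \sum_{i=1}^n \left[x_i \log p + (1 - x_i)\log(1-p)\right] \\
&= y \log p + (n - y)\log(1-p), \quad \text{where } y \text{ is the number of 1s} \\
\text{FOC:}\quad \frac{\partial \log L}{\partial p} &= \frac{y}{p} - \frac{n-y}{1-p} = 0 \implies p^{ML} = \frac{y}{n}
\end{aligned}
$$

For $y = 0$ or $y = n$, $p^{ML}$ is (respectively) 0 and 1: corner solutions.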
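The two closed-form MLEs above can be checked numerically by evaluating the log-likelihood on a grid and picking the maximizer. This is only a sketch under stated assumptions: the data arrays, the grids, and the helper names (`normal_loglik`, `bernoulli_loglik`, `grid_argmax`) are hypothetical, and a grid search is used purely for transparency rather than as a recommended optimizer.

```python
import numpy as np

def normal_loglik(theta, x):
    """Log-likelihood of i.i.d. N(theta, 1) data."""
    x = np.asarray(x, dtype=float)
    return x.size * np.log(1.0 / np.sqrt(2.0 * np.pi)) - 0.5 * np.sum((x - theta) ** 2)

def bernoulli_loglik(p, x):
    """Log-likelihood of i.i.d. Bernoulli(p) data, written in terms of y = number of 1s."""
    x = np.asarray(x)
    y, n = x.sum(), x.size
    return y * np.log(p) + (n - y) * np.log(1.0 - p)

def grid_argmax(loglik, grid, x):
    """Return the grid point with the highest log-likelihood."""
    values = np.array([loglik(g, x) for g in grid])
    return grid[np.argmax(values)]

# Hypothetical samples.
x_norm = np.array([0.3, -1.2, 2.1, 0.8])          # sample mean is 0.5
x_bern = np.array([1, 0, 1, 1, 0, 1, 0, 1])       # y = 5 ones out of n = 8

theta_hat = grid_argmax(normal_loglik, np.linspace(-3.0, 3.0, 6001), x_norm)
p_hat = grid_argmax(bernoulli_loglik, np.linspace(0.001, 0.999, 999), x_bern)
print(theta_hat, x_norm.mean())   # grid maximizer vs. sample mean
print(p_hat, x_bern.mean())       # grid maximizer vs. y / n
```

Up to the grid spacing, the maximizers should agree with the sample mean and with $y/n$. The Bernoulli grid deliberately excludes $p = 0$ and $p = 1$, where the log-likelihood is unbounded below unless $y = 0$ or $y = n$.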
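SOC:

$$
\left.\frac{\partial^2 \log L}{\partial p^2}\right|_{p = p^{ML}} = -\frac{y}{p^2} - \frac{n-y}{(1-p)^2} < 0 \quad \text{for } 0 < y < n.
$$

When the parameter is multidimensional: check that the Hessian matrix $\frac{\partial^2 \log L}{\partial \theta\, \partial \theta'}$ is negative definite.

Example: $X_1, \ldots, X_n \sim U[0, \theta]$. Likelihood function

$$
L(\theta|\vec{X}_n) =
\begin{cases}
\left(\frac{1}{\theta}\right)^n & \text{if } \max(X_1, \ldots, X_n) \le \theta \\
0 & \text{if } \max(X_1, \ldots, X_n) > \theta
\end{cases}
$$

which is maximized at $\theta^{MLE} = \max(X_1, \ldots, X_n)$.

You can think of ML as a generalized MOM estimator: for $X_1, \ldots, X_n$ i.i.d. and a K-dimensional parameter vector $\theta$, the MLE solves the FOCs:

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i|\theta)}{\partial \theta_1} &= 0 \\
\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i|\theta)}{\partial \theta_2} &= 0 \\
&\;\;\vdots \\
\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i|\theta)}{\partial \theta_K} &= 0.
\end{aligned}
$$

Under the LLN: $\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(x_i|\theta)}{\partial \theta_k} \xrightarrow{p} E_{\theta_0} \frac{\partial \log f(X|\theta)}{\partial \theta_k}$, for $k = 1, \ldots, K$, where the notation $E_{\theta_0}$ denotes the expectation over the distribution of $X$ at the true parameter vector $\theta_0$.

Hence, MLE is equivalent to GMM with the moment conditions

$$
E_{\theta_0} \frac{\partial \log f(X|\theta)}{\partial \theta_k} = 0, \quad k = 1, \ldots, K.
$$

Bayes estimators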
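Philosophically different view of the world. Model the unknown parameters $\theta$ as random variables, and assume that the researcher's beliefs about $\theta$ are summarized in a prior distribution $f(\theta)$. In this sense, the Bayesian approach is subjective, because the researcher's beliefs about $\theta$ are accommodated in the inferential approach.

$X_1, \ldots, X_n$ i.i.d. $f(x|\theta)$: the Bayesian views the density of each data observation as a conditional density, which is conditional on a realization of the random variable $\theta$.

Given data $X_1, \ldots, X_n$, we can update our beliefs about the parameter $\theta$ by computing the posterior density (using Bayes' Rule):

$$
f(\theta|\vec{x}) = \frac{f(\vec{x}|\theta)\, f(\theta)}{f(\vec{x})} = \frac{f(\vec{x}|\theta)\, f(\theta)}{\int f(\vec{x}|\theta)\, f(\theta)\, d\theta}.
$$

A Bayesian point estimate of $\theta$ is some feature of this posterior density. Common point estimators are:

Posterior mean: $E[\theta|\vec{x}] = \int \theta\, f(\theta|\vec{x})\, d\theta$.

Posterior median: $F^{-1}_{\theta|\vec{x}}(0.5)$, where $F_{\theta|\vec{x}}$ is the CDF corresponding to the posterior density: i.e., $F_{\theta|\vec{x}}(\tilde{\theta}) = \int_{-\infty}^{\tilde{\theta}} f(\theta|\vec{x})\, d\theta$.

Posterior mode: $\max_\theta f(\theta|\vec{x})$. This is the point at which the density is highest.

Note that $f(\vec{x}|\theta)$ is just the likelihood function, so that the posterior density $f(\theta|\vec{x})$ can be written as:

$$
f(\theta|\vec{x}) = \frac{L(\theta|\vec{x})\, f(\theta)}{\int L(\theta|\vec{x})\, f(\theta)\, d\theta}.
$$

But there is a difference in interpretation: in the Bayesian world, the likelihood function is random due to both $\vec{x}$ and $\theta$, whereas in the classical world, only $\vec{x}$ is random.

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$, with prior density $f(\theta)$. Posterior density

$$
f(\theta|\vec{x}) = \frac{\exp\left(-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2\right) f(\theta)}{\int \exp\left(-\frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2\right) f(\theta)\, d\theta}.
$$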
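The integral in the denominator can be difficult to calculate: computational difficulties can hamper computation of posterior densities. However, note that the denominator is not a function of $\theta$. Thus

$$
f(\theta|\vec{x}) \propto L(\theta|\vec{x})\, f(\theta).
$$

Hence, if we assume that $f(\theta)$ is constant (i.e., uniform) for all possible values of $\theta$, then the posterior mode is $\operatorname{argmax}_\theta f(\theta|\vec{x}) = \operatorname{argmax}_\theta L(\theta|\vec{x}) = \theta^{ML}$.

Example: Bayesian updating for the normal distribution, with normal priors.

$X \sim N(\theta, \sigma^2)$, assume $\sigma^2$ is known. Prior: $\theta \sim N(\mu, \tau^2)$, assume $\tau$ is known.

The posterior distribution is $\theta|X \sim N(E(\theta|X), V(\theta|X))$, where

$$
\begin{aligned}
E(\theta|X) &= \frac{\tau^2}{\tau^2 + \sigma^2} X + \frac{\sigma^2}{\sigma^2 + \tau^2} \mu \\
V(\theta|X) &= \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}.
\end{aligned}
$$

This is an example of a conjugate prior and conjugate distribution, where the posterior distribution comes from the same family as the prior distribution.

The posterior mean $E(\theta|X)$ is a weighted average of $X$ and the prior mean $\mu$.

In this case, as $\tau \to \infty$ (so that prior information gets worse and worse), $E(\theta|X) \to X$ (a.s.). This is just the MLE (for just one data observation).

When you observe an i.i.d. sample $\vec{X}_n \equiv (X_1, \ldots, X_n)$, with sample mean $\bar{X}_n$:

$$
\begin{aligned}
E(\theta|\vec{X}_n) &= \frac{n\tau^2}{n\tau^2 + \sigma^2} \bar{X}_n + \frac{\sigma^2}{\sigma^2 + n\tau^2} \mu \\
V(\theta|\vec{X}_n) &= \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2}.
\end{aligned}
$$

In this case, as the number of observations $n \to \infty$, the posterior mean $E(\theta|\vec{X}_n) \to \bar{X}_n$. So as $n \to \infty$, the posterior mean converges to the MLE: when your sample becomes arbitrarily large, you place no weight on your prior information.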
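The normal-normal updating formulas above are simple enough to evaluate directly. The sketch below is an illustration only: the function name `normal_posterior` and the numerical inputs are assumptions, not part of the notes. It returns the posterior mean and variance for an i.i.d. sample of size $n$ with sample mean $\bar{X}_n$, and then looks at the two limiting cases discussed in the text (large $n$, and a very diffuse prior).

```python
def normal_posterior(xbar, n, sigma2, mu, tau2):
    """Posterior mean and variance of theta.

    Model: X_1..X_n i.i.d. N(theta, sigma2) with sigma2 known,
    prior theta ~ N(mu, tau2). xbar is the sample mean.
    """
    weight = n * tau2 / (n * tau2 + sigma2)          # weight on the sample mean
    post_mean = weight * xbar + (1.0 - weight) * mu  # weighted average of xbar and mu
    post_var = sigma2 * tau2 / (sigma2 + n * tau2)
    return post_mean, post_var

# Hypothetical inputs: sample mean 2, sigma^2 = 1, prior N(0, 1).
one_obs = normal_posterior(xbar=2.0, n=1, sigma2=1.0, mu=0.0, tau2=1.0)
many_obs = normal_posterior(xbar=2.0, n=10_000, sigma2=1.0, mu=0.0, tau2=1.0)
diffuse_prior = normal_posterior(xbar=2.0, n=1, sigma2=1.0, mu=0.0, tau2=1e12)
print(one_obs, many_obs, diffuse_prior)
```

With one observation and equal prior and data variances, the posterior mean sits halfway between the prior mean and the observation. With many observations or a very large $\tau^2$, it moves essentially all the way to the sample mean, which is the MLE.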
Exchangeability and independence:

An interesting feature of the Bayesian approach here is that, for any $n$, the posterior inference does not depend on the order in which the observations $X_1, X_2, \ldots, X_n$ are observed; any permutation of these variables would yield the same posterior mean. This exchangeability of the posterior mean would appear to be a reasonable requirement to make of any inference procedure, in the case when the data are drawn in i.i.d. fashion.

De Finetti's Theorem formalizes the connection between exchangeability and Bayesian inference with i.i.d. variates. Define an exchangeable sequence of random variables to be a sequence $X_1, X_2, \ldots, X_n$ such that the joint distribution function $F(X_1, \ldots, X_n)$ is the same for any permutation of the random variables. (Obviously, if $X_1, \ldots, X_n$ are i.i.d., then they are exchangeable; but the converse is not true.)

De Finetti's Theorem, in its simplest form, says that an infinite sequence of 0-1 random variables $X_1, X_2, \ldots, X_n, \ldots$ which are exchangeable has a joint probability distribution equal to the joint marginal distribution of conditionally i.i.d. Bernoulli random variables: that is, for all $n$,

$$
\underbrace{f(X_1, \ldots, X_n)}_{\text{exchangeable}} = \int_0^1 \underbrace{\prod_{t=1}^n p^{X_t}(1-p)^{1-X_t}}_{\text{i.i.d. Bernoulli}(p)}\, dH(p).
$$

(The result has been extended to continuous random elements.)

Data augmentation

The important philosophical distinction of the Bayesian approach is that data and model parameters are treated on an equal footing. Hence, just as we make posterior inference on model parameters, we can also make posterior inference on unobserved variables in latent variable models, which are models where not all the model variables are observed.

Consider a simple example (the binary probit model):

$$
\begin{aligned}
z &= \beta x + \epsilon, \quad \epsilon \sim N(0, 1) \\
y &= \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0. \end{cases}
\end{aligned}
\qquad (2)
$$

The researcher observes $(x, y)$, but not $(z, \beta)$. He wishes to form the posterior of $z, \beta \mid x, y$.
We do all inference conditional on $x$. Therefore the relevant prior is

$$
(z, \beta \mid x) = (z \mid \beta, x) \cdot (\beta \mid x) = N(\beta x, 1) \cdot \underbrace{N(\bar{\beta}, a^2)}_{f(\beta)} = \phi(z - \beta x) \cdot \frac{1}{a}\, \phi\left(\frac{\beta - \bar{\beta}}{a}\right). \qquad (3)
$$

In the above, we assume the marginal prior on $\beta$ is normal (and doesn't depend on $x$). The conditional prior density of $z \mid \beta, x$ is derived from the model specification (2). $\phi$ denotes the density of the standard normal.

The posterior is:

$$
(z, \beta \mid y, x) \propto L(y \mid z, \beta, x) \cdot (z, \beta \mid x) = \left(\mathbb{1}(z \ge 0) \cdot y + \mathbb{1}(z < 0) \cdot (1 - y)\right) \cdot f_0(z, \beta \mid x), \qquad (4)
$$

where $f_0(z, \beta \mid x)$ is the prior density in (3).

Note how doing this simplifies the likelihood. This is not the likelihood you would use when you do MLE (because you do not observe $z$!) but this is what you can do if you treat the latent $z$ as a parameter. Accordingly, this can be marginalized over $\beta$ to obtain the posterior of $z \mid y, x$.

Using a Bayesian procedure to do posterior inference on latent data variables is sometimes called data augmentation. In a non-Bayesian context, obtaining values for missing data values is usually done by some sort of imputation procedure. Thus, data augmentation can be viewed as a sort of Bayesian imputation procedure. One attractive feature of the Bayesian approach is that it follows easily and naturally from the usual Bayesian logic.
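One common way to work with an augmented posterior like (4) is Gibbs sampling, which alternates between drawing the latent $z$ given $\beta$ and drawing $\beta$ given $z$. The notes do not describe this algorithm, so the sketch below is an assumed illustration rather than the method of the notes. It also assumes an i.i.d. sample of $n$ pairs $(x_i, y_i)$ instead of a single observation. All names (`draw_latent_z`, `gibbs_probit`), the simulated data, and the prior values $\bar{\beta} = 0$, $a = 10$ are hypothetical. Given $\beta$, each $z_i$ is $N(\beta x_i, 1)$ truncated to the half-line implied by $y_i$; given $z$, $\beta$ has a normal posterior by the usual conjugate-normal algebra.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and inverse cdf

def draw_latent_z(mean, y_i, rng):
    """Draw z ~ N(mean, 1) restricted to z >= 0 if y_i == 1, or z < 0 if y_i == 0."""
    p_negative = STD_NORMAL.cdf(-mean)              # P(z < 0) under N(mean, 1)
    lo, hi = (p_negative, 1.0) if y_i == 1 else (0.0, p_negative)
    u = lo + (hi - lo) * rng.random()               # uniform over the allowed CDF range
    u = min(max(u, 1e-12), 1.0 - 1e-12)             # keep inv_cdf away from 0 and 1
    return mean + STD_NORMAL.inv_cdf(u)

def gibbs_probit(x, y, beta_bar=0.0, a=10.0, n_iter=1000, burn_in=200, seed=0):
    """Gibbs sampler for (z, beta | y, x) in the scalar probit model (assumed sketch)."""
    rng = np.random.default_rng(seed)
    beta = beta_bar
    precision = 1.0 / a**2 + np.sum(x**2)           # posterior precision of beta given z
    draws = []
    for it in range(n_iter):
        # Step 1: latent data given beta (truncated normals, one per observation).
        z = np.array([draw_latent_z(beta * xi, yi, rng) for xi, yi in zip(x, y)])
        # Step 2: beta given latent data (normal prior times normal likelihood in z).
        mean = (beta_bar / a**2 + np.sum(x * z)) / precision
        beta = rng.normal(mean, 1.0 / np.sqrt(precision))
        if it >= burn_in:
            draws.append(beta)
    return np.array(draws), z

# Hypothetical simulated data with true beta = 1.
rng = np.random.default_rng(1)
n, beta_true = 200, 1.0
x = rng.uniform(-2.0, 2.0, size=n)
y = (beta_true * x + rng.normal(size=n) >= 0).astype(int)

beta_draws, z_last = gibbs_probit(x, y)
print(beta_draws.mean(), beta_draws.std())
```

The retained draws of $\beta$ approximate its marginal posterior, and the draws of $z$ are the "augmented" data: each $z_i$ always has the sign implied by the observed $y_i$. Convergence diagnostics are omitted here, so the run length and burn-in are arbitrary choices.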