LECTURE NOTES 9

Point Estimation

Under the hypothesis that the sample was generated from some parametric statistical model, a natural way to understand the underlying population is by estimating the parameters of the statistical model. One can of course wonder: what happens if the model is wrong? Or, where do models really come from? In some rare cases, we actually know enough about our data to hypothesise a reasonable model. Most often however, when we specify a model, we do so hoping that it can provide a useful approximation to the data generating mechanism. The George Box quote is worth remembering in this context: "all models are wrong, but some are useful."

1. The Method of Moments

Suppose that θ = (θ_1, ..., θ_k), so that there are k unknown parameters. We can estimate θ by matching k moments. Let

m_1 = (1/n) Σ_{i=1}^n X_i,   m_2 = (1/n) Σ_{i=1}^n X_i^2,   ...,   m_k = (1/n) Σ_{i=1}^n X_i^k.

Let µ_i = ∫ x^i p_θ(x) dx denote the i-th population moment. This depends on θ, so we write it as µ_i(θ). The method of moments prescribes estimating the parameters θ_1, ..., θ_k by solving the system of equations:

m_1 = µ_1(θ_1, ..., θ_k)
⋮
m_k = µ_k(θ_1, ..., θ_k).

Example 1: If X_1, ..., X_n ~ N(θ, σ^2), we would solve

(1/n) Σ_{i=1}^n X_i = θ   and   (1/n) Σ_{i=1}^n X_i^2 = θ^2 + σ^2,
to obtain the estimators:

θ̂ = (1/n) Σ_{i=1}^n X_i = X̄,
σ̂^2 = (1/n) Σ_{i=1}^n X_i^2 − ((1/n) Σ_{i=1}^n X_i)^2 = (1/n) Σ_{i=1}^n (X_i − X̄)^2.

Example 2: Suppose X_1, ..., X_n ~ Bin(k, p), where k and p are both unknown. Now

µ_1 = kp,   µ_2 = kp(1 − p) + k^2 p^2.

Solving

X̄ = kp,   (1/n) Σ_{i=1}^n X_i^2 = kp(1 − p) + k^2 p^2,

we get

p̂ = (X̄ − (1/n) Σ_{i=1}^n (X_i − X̄)^2) / X̄,
k̂ = X̄^2 / (X̄ − (1/n) Σ_{i=1}^n (X_i − X̄)^2).

2. Maximum Likelihood Estimation

The most popular technique to derive estimators is via the principle of maximum likelihood. Suppose that X_1, ..., X_n ~ p_θ, where p_θ denotes either the pmf or the pdf. The likelihood function is defined by

L(θ) ≡ L(θ; X_1, ..., X_n) = Π_{i=1}^n p_θ(X_i).

The log-likelihood function is ℓ(θ) ≡ ℓ(θ; X_1, ..., X_n) = log L(θ).
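The two method-of-moments examples above can be checked numerically. The following is an illustrative sketch (not part of the original notes) using NumPy; the parameter values (θ = 2, σ = 3, and Bin(10, 0.4)) are arbitrary choices for the simulation.

```python
# Sketch: method-of-moments estimates for Examples 1 and 2 above,
# on simulated data with known true parameters.
import numpy as np

rng = np.random.default_rng(0)

# Example 1: N(theta, sigma^2). Match the first two sample moments.
x = rng.normal(loc=2.0, scale=3.0, size=100_000)
theta_hat = x.mean()                      # m1 = theta
sigma2_hat = (x**2).mean() - x.mean()**2  # m2 - m1^2 = sigma^2
print(theta_hat, sigma2_hat)              # close to 2.0 and 9.0

# Example 2: Bin(k, p) with both k and p unknown.
y = rng.binomial(n=10, p=0.4, size=100_000)
ybar = y.mean()
s2 = ((y - ybar) ** 2).mean()             # (1/n) * sum (X_i - Xbar)^2
p_hat = (ybar - s2) / ybar
k_hat = ybar**2 / (ybar - s2)
print(k_hat, p_hat)                       # close to 10 and 0.4
```

Note that k̂ divides by X̄ − (1/n)Σ(X_i − X̄)^2, which can be small or even negative in small samples, so the binomial method-of-moments estimator can be unstable; the large simulated sample here avoids that issue.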
The maximum likelihood estimator, or mle, denoted by θ̂ or θ̂_n, is the value of θ that maximizes L(θ). Note that θ̂ also maximizes ℓ(θ). We write

θ̂ = argmax_θ L(θ) = argmax_θ ℓ(θ).

Keep in mind that θ̂ is a function of the data. Sometimes we will write θ̂ as θ̂(X_1, ..., X_n) to emphasize this point. Later, we shall see that the mle has many optimality properties in certain settings.

Finding the mle might not be easy. Sometimes we need to resort to numerical techniques. The typical way to compute the MLE (suppose that we have k unknown parameters) is to either analytically or numerically solve the system of equations:

∂ℓ(θ)/∂θ_i = 0,   i = 1, ..., k.

Note: We can throw away any constants not depending on θ in the likelihood function when we find the mle. This does not affect the location of the maximizer.

Example 1: Suppose X_1, ..., X_n ~ N(θ, 1); then the likelihood function is given as

L(θ) = Π_{i=1}^n (1/√(2π)) exp(−(X_i − θ)^2 / 2) ∝ e^{−n(θ − X̄)^2 / 2},

and, up to constants, ℓ(θ) = −n(θ − X̄)^2 / 2. We get θ̂ = X̄. Since ℓ''(θ̂) < 0, this is indeed a maximum.

Example 2: Suppose that X_1, ..., X_n ~ Ber(p); then the log-likelihood is given by

ℓ(p) = Σ_{i=1}^n X_i log p + (n − Σ_{i=1}^n X_i) log(1 − p) = nX̄ log p + n(1 − X̄) log(1 − p),

which is maximized at p̂ = X̄.

Invariance of the MLE. The mle is invariant to transformations. This means that the mle of r(θ) is r(θ̂) for any function r. We will not prove this, but it is a very useful fact. We will discuss other properties of the MLE in future lectures.
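When no closed form is available, the mle is found numerically, as mentioned above. As a minimal sketch (not from the original notes), we can maximize the Bernoulli log-likelihood of Example 2 over a grid of candidate values and check that the maximizer agrees with the closed form p̂ = X̄; the sample size and true p are arbitrary simulation choices.

```python
# Sketch: maximize the Bernoulli log-likelihood
#   l(p) = S log p + (n - S) log(1 - p)
# numerically on a grid, and compare with the closed form p_hat = Xbar.
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=5_000)   # Ber(0.3) sample
n, S = x.size, x.sum()

p_grid = np.linspace(0.001, 0.999, 9_999)
loglik = S * np.log(p_grid) + (n - S) * np.log(1 - p_grid)
p_numeric = p_grid[np.argmax(loglik)]

print(p_numeric, x.mean())   # agree up to the grid resolution

# Invariance: the mle of the odds r(p) = p / (1 - p) is r(p_hat).
odds_mle = x.mean() / (1 - x.mean())
```

A grid search is used here only for transparency; in practice one would solve ℓ'(p) = 0 or use a general-purpose optimizer.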
3. Bayes Estimators

The third general method to derive estimators is the Bayes estimator. We treat θ as a random variable and assign it a distribution p(θ) called the prior distribution. This opens up a bunch of philosophical questions that we will deal with later in the course. Now we can use Bayes' theorem to get the distribution of θ given X_1, ..., X_n, which is called the posterior distribution:

p(θ | x_1, ..., x_n) = p(θ, x_1, ..., x_n) / p(x_1, ..., x_n)
                     = p(x_1, ..., x_n | θ) p(θ) / ∫ p(x_1, ..., x_n | θ) p(θ) dθ
                     = L(θ) p(θ) / ∫ L(θ) p(θ) dθ
                     ∝ L(θ) p(θ).

Finally, we can use the mean of p(θ | x_1, ..., x_n) as an estimator:

θ̂ = ∫ θ p(θ | X_1, ..., X_n) dθ.

We call this the Bayes estimator. We could also use the median or mode of the posterior.

Example: Suppose X_1, ..., X_n ~ Ber(θ). We will first need to define the Beta distribution: θ has a Beta distribution with parameters α and β if its density on [0, 1] is

p(θ) = (Γ(α + β) / (Γ(α)Γ(β))) θ^{α−1} (1 − θ)^{β−1} ∝ θ^{α−1} (1 − θ)^{β−1}.

We write θ ~ Beta(α, β). The mean of the Beta distribution is α/(α + β). Let S = Σ_i X_i. The posterior distribution is

p(θ | X_1, ..., X_n) ∝ L(θ) p(θ) ∝ θ^S (1 − θ)^{n−S} θ^{α−1} (1 − θ)^{β−1} = θ^{S+α−1} (1 − θ)^{n−S+β−1}.

Thus, the posterior distribution is Beta(S + α, n − S + β). We write

θ | X_1, ..., X_n ~ Beta(S + α, n − S + β).

The mean is (S + α)/(n + α + β). Thus,

θ̂ = (S + α)/(n + α + β).

A common choice is α = β = 1 (so that the prior for θ is uniform). In that case:

θ̂ = (nX̄ + 1)/(n + 2) = (n/(n + 2)) X̄ + (1/(n + 2)) = wX̄ + (1 − w)(1/2) with w = n/(n + 2),
which can be viewed as a convex combination of the MLE and the prior mean 1/2. Note that, when n is large, θ̂ ≈ X̄, which is the mle.

Example 2: Suppose that X_1, ..., X_n are drawn from N(θ, σ^2). Assume that σ is known. Let's use the prior θ ~ N(µ, τ^2). It can be shown that the posterior is N(a, b^2), where

a = (nτ^2 / (σ^2 + nτ^2)) X̄ + (σ^2 / (σ^2 + nτ^2)) µ,   b^2 = σ^2 τ^2 / (σ^2 + nτ^2).

Exercise: Prove this.

The Bayes estimator is thus

µ̂ = (nτ^2 / (σ^2 + nτ^2)) X̄ + (σ^2 / (σ^2 + nτ^2)) µ = w ((1/n) Σ_{i=1}^n X_i) + (1 − w) µ,

where w = nτ^2 / (σ^2 + nτ^2). When n is large, w ≈ 1 and µ̂ ≈ X̄.
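The Beta-Bernoulli calculation above can be verified numerically. The sketch below (not part of the original notes; the sample size and true θ = 0.7 are arbitrary simulation choices) computes the posterior mean under the uniform prior α = β = 1 and checks the convex-combination form wX̄ + (1 − w)(1/2) with w = n/(n + 2).

```python
# Sketch: Beta-Bernoulli Bayes estimator with a uniform Beta(1, 1) prior,
# checked against the convex combination of the MLE and the prior mean 1/2.
import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.7, size=50)   # Ber(0.7) sample
n, S = x.size, x.sum()
alpha = beta = 1.0

# Posterior is Beta(S + alpha, n - S + beta); its mean is the Bayes estimator.
theta_bayes = (S + alpha) / (n + alpha + beta)

w = n / (n + 2)
assert np.isclose(theta_bayes, w * x.mean() + (1 - w) * 0.5)
print(theta_bayes, x.mean())   # Bayes estimate is pulled slightly toward 1/2
```

With n = 50 the weight w = 50/52 is already close to 1, illustrating that the Bayes estimator approaches the mle as n grows.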