Unbiased Estimation

February 7-2, 2008

We begin with a sample $X = (X_1, \ldots, X_n)$ of random variables chosen according to one of a family of probabilities $P_\theta$, where $\theta$ is an element of the parameter space $\Theta$. For random variables, we shall use the term density function to refer to both continuous and discrete random variables. Thus, to each $\theta \in \Theta$, there exists a density function which we denote $f(x|\theta)$.

Example 1 (Parametric families of densities).

1. Binomial random variables with known number of trials $n$ but unknown success probability parameter $\theta$ have density
$$f(x|\theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}.$$

2. Normal random variables with known variance $\sigma_0^2$ but unknown mean $\mu$ have density
$$f(x|\mu) = \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma_0^2}\right).$$

3. Normal random variables with unknown mean $\mu$ and variance $\sigma^2$ have density
$$f(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

Definition 2. A statistic is a function of the random variable that does not depend on any unknown parameter.

The goal of estimation is to determine which of the $P_\theta$ is the source of the data $X$. In this case the action space $A$ is the same as the parameter space and the estimator is the decision function
$$d : \text{data} \to \Theta.$$

Example 3. If $X = (X_1, \ldots, X_n)$ are independent $Ber(\theta)$ random variables, then the simple choice for estimating $\theta$ is
$$d(x_1, \ldots, x_n) = \frac{1}{n}(x_1 + \cdots + x_n) = \bar{x}.$$
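The estimator of Example 3 can be tried on simulated data. The following is a minimal sketch, not part of the original notes: the helper names `sample_mean` and `bernoulli_sample`, the seed, and the value `theta_true = 0.3` are all assumptions made for illustration.

```python
import random


def sample_mean(xs):
    """Decision function d(x_1, ..., x_n) = (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)


def bernoulli_sample(theta, n, rng):
    """Draw n independent Ber(theta) observations (1 = success, 0 = failure)."""
    return [1 if rng.random() < theta else 0 for _ in range(n)]


rng = random.Random(0)   # fixed seed, chosen arbitrarily for reproducibility
theta_true = 0.3         # hypothetical value; unknown in a real problem
data = bernoulli_sample(theta_true, 10_000, rng)
estimate = sample_mean(data)
print(f"estimate of theta: {estimate:.3f}")
```

The function `sample_mean` is a statistic in the sense of Definition 2: it uses only the data and never refers to `theta_true`.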
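1 Unbiased Estimators

Definition 4. A statistic $d$ is called an unbiased estimator for a function of the parameter $g(\theta)$ provided that for every $\theta \in \Theta$,
$$E_\theta d(X) = g(\theta).$$
Any estimator that is not unbiased is called biased. If the image of $g(\theta)$ is a vector space, then the bias is
$$b_d(\theta) = E_\theta d(X) - g(\theta).$$

Exercise 5. If $X_1, \ldots, X_n$ form a simple random sample with unknown finite mean $\mu$, then $\bar{X}$ is an unbiased estimator of $\mu$. If the $X_i$ have variance $\sigma^2$, then
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$

If we choose the quadratic loss function $L(\theta, a) = (a - \theta)^2$, then the corresponding risk function is
$$\begin{aligned}
R_2(g(\theta), d) &= E_\theta[(d(X) - g(\theta))^2] = E_\theta[(d(X) - E_\theta d(X) + b_d(\theta))^2] \\
&= E_\theta[(d(X) - E_\theta d(X))^2] + 2 b_d(\theta) E_\theta[d(X) - E_\theta d(X)] + b_d(\theta)^2 \\
&= \mathrm{Var}_\theta(d(X)) + b_d(\theta)^2.
\end{aligned}$$
Note that for an unbiased estimator the risk equals the variance, and that any bias adds to the risk.

In the example above, with $d(X) = \bar{X}$,
$$E_\theta \bar{X} = \frac{1}{n}(\theta + \cdots + \theta) = \theta.$$
Thus, $\bar{x}$ is an unbiased estimator for $\theta$. In addition,
$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\left(\theta(1-\theta) + \cdots + \theta(1-\theta)\right) = \frac{1}{n}\theta(1-\theta).$$

Example 6. If, in addition, the simple random sample has unknown finite variance $\sigma^2$, then we can consider the sample variance
$$S^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$
To find the mean of $S^2$, we begin with the identity
$$\begin{aligned}
\sum_{i=1}^n (X_i - \mu)^2 &= \sum_{i=1}^n \left((X_i - \bar{X}) + (\bar{X} - \mu)\right)^2 \\
&= \sum_{i=1}^n (X_i - \bar{X})^2 + 2\sum_{i=1}^n (X_i - \bar{X})(\bar{X} - \mu) + n(\bar{X} - \mu)^2 \\
&= \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,
\end{aligned}$$
where the cross term vanishes because $\sum_{i=1}^n (X_i - \bar{X}) = 0$.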
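Then,
$$E S^2 = E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar{X} - \mu)^2\right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2.$$
Thus,
$$E\left[\frac{n}{n-1} S^2\right] = \sigma^2$$
and
$$\frac{n}{n-1} S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$
is an unbiased estimator for $\sigma^2$.

Definition 7. An unbiased estimator $d$ is a uniformly minimum variance unbiased estimator (UMVUE) if $d(X)$ has finite variance for every value $\theta$ of the parameter and, for every unbiased estimator $\tilde{d}$,
$$\mathrm{Var}_\theta d(X) \le \mathrm{Var}_\theta \tilde{d}(X).$$
The efficiency of an unbiased estimator $\tilde{d}$ is
$$e(\tilde{d}) = \frac{\mathrm{Var}_\theta d(X)}{\mathrm{Var}_\theta \tilde{d}(X)}.$$
Thus, the efficiency is between 0 and 1.

2 Cramér-Rao Bound

First, we will review a bit on correlation. For two random variables $Y$ and $Z$, the correlation is
$$\rho(Y, Z) = \frac{\mathrm{Cov}(Y, Z)}{\sqrt{\mathrm{Var}(Y)\mathrm{Var}(Z)}}. \tag{1}$$
The correlation takes values $-1 \le \rho(Y, Z) \le 1$ and takes the extreme values $\pm 1$ if and only if $Y$ and $Z$ are linearly related, i.e., $Z = aY + b$ for some constants $a$ and $b$. Consequently,
$$\mathrm{Cov}(Y, Z)^2 \le \mathrm{Var}(Y)\mathrm{Var}(Z).$$
If the random variable $Z$ has mean zero, then $\mathrm{Cov}(Y, Z) = E[YZ]$ and
$$E[YZ]^2 \le \mathrm{Var}(Y)\mathrm{Var}(Z) = \mathrm{Var}(Y) E Z^2. \tag{2}$$

We begin with data $X = (X_1, \ldots, X_n)$ drawn from an unknown probability $P_\theta$. The parameter space is $\Theta \subset \mathbb{R}$. Denote the joint density of these random variables by $f(x|\theta)$, where $x = (x_1, \ldots, x_n)$.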
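In the case that the data come from a simple random sample, the joint density is the product of the marginal densities:
$$f(x|\theta) = f(x_1|\theta) \cdots f(x_n|\theta). \tag{3}$$
For continuous random variables, we have
$$1 = \int_{\mathbb{R}^n} f(x|\theta)\, dx. \tag{4}$$
Now, let $d$ be an unbiased estimator of $g(\theta)$; then
$$g(\theta) = E_\theta d(X) = \int_{\mathbb{R}^n} d(x) f(x|\theta)\, dx. \tag{5}$$
If the functions in (4) and (5) are differentiable with respect to the parameter $\theta$ and we can pass the derivative through the integral, then
$$0 = \int_{\mathbb{R}^n} \frac{\partial f(x|\theta)}{\partial\theta}\, dx = \int_{\mathbb{R}^n} \frac{\partial \ln f(x|\theta)}{\partial\theta} f(x|\theta)\, dx = E_\theta\left[\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]. \tag{6}$$
From a similar calculation,
$$g'(\theta) = E_\theta\left[d(X)\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]. \tag{7}$$
Now, return to the review on correlation with $Y = d(X)$ and the score function $Z = \partial \ln f(X|\theta)/\partial\theta$. Then, by equation (6), $EZ = 0$, and from equations (7) and (2), we find that
$$g'(\theta)^2 = E_\theta\left[d(X)\frac{\partial \ln f(X|\theta)}{\partial\theta}\right]^2 \le \mathrm{Var}_\theta(d(X))\, E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right],$$
or
$$\mathrm{Var}_\theta(d(X)) \ge \frac{g'(\theta)^2}{I(\theta)}, \tag{8}$$
where
$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right]$$
is called the Fisher information.

Equation (8) is called the Cramér-Rao lower bound or the information inequality. For an unbiased estimator of $\theta$ itself, where $g(\theta) = \theta$ and $g'(\theta) = 1$, it states that the lower bound for the variance is the reciprocal of the Fisher information. In other words, the higher the information, the lower the possible value of the variance of an unbiased estimator.

If we return to the case of a simple random sample, then
$$\frac{\partial \ln f(x|\theta)}{\partial\theta} = \frac{\partial \ln f(x_1|\theta)}{\partial\theta} + \cdots + \frac{\partial \ln f(x_n|\theta)}{\partial\theta}.$$
Also, the random variables $\{\partial \ln f(X_k|\theta)/\partial\theta;\ 1 \le k \le n\}$ are independent and have the same distribution. Each has mean zero by equation (6), so the variance of their sum is the sum of their variances. Thus, the Fisher information is
$$I(\theta) = n E\left[\left(\frac{\partial \ln f(X_1|\theta)}{\partial\theta}\right)^2\right].$$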
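Equation (6) and the definition of the Fisher information can be checked numerically for a density with finite support. The sketch below uses the $Ber(\theta)$ density of Example 3 and a central-difference approximation to the score; the function names, the step size `h`, and the test value 0.3 are assumptions made for illustration, and the closed form $1/(\theta(1-\theta))$ is used only as a comparison.

```python
import math


def bernoulli_density(x, theta):
    """f(x | theta) = theta^x (1 - theta)^(1 - x) for x in {0, 1}."""
    return theta**x * (1 - theta) ** (1 - x)


def score(x, theta, h=1e-6):
    """Central-difference approximation to d/dtheta ln f(x | theta)."""
    upper = math.log(bernoulli_density(x, theta + h))
    lower = math.log(bernoulli_density(x, theta - h))
    return (upper - lower) / (2 * h)


def expected_score(theta):
    """E_theta of the score, summed over the support {0, 1}; equation (6) says 0."""
    return sum(bernoulli_density(x, theta) * score(x, theta) for x in (0, 1))


def fisher_information(theta, n=1):
    """n * E_theta[score^2] for a simple random sample of size n."""
    one_observation = sum(
        bernoulli_density(x, theta) * score(x, theta) ** 2 for x in (0, 1)
    )
    return n * one_observation


theta = 0.3  # hypothetical parameter value
print(expected_score(theta))                 # close to zero
print(fisher_information(theta))             # compare with 1 / (theta * (1 - theta))
print(1 / fisher_information(theta, n=25))   # Cramér-Rao bound for 25 observations
```

Replacing `bernoulli_density` and the support `(0, 1)` adapts the same check to any other density with finite support.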
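Example 8. For independent Bernoulli random variables with unknown success probability $\theta$,
$$\ln f(x|\theta) = x \ln\theta + (1-x)\ln(1-\theta),$$
$$\frac{\partial}{\partial\theta} \ln f(x|\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x-\theta}{\theta(1-\theta)},$$
$$E\left[\left(\frac{\partial}{\partial\theta} \ln f(X|\theta)\right)^2\right] = \frac{1}{\theta^2(1-\theta)^2} E[(X-\theta)^2] = \frac{1}{\theta(1-\theta)}$$
and the information is the reciprocal of the variance. Thus, by the Cramér-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\theta(1-\theta)/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\theta d(X) = \frac{\theta(1-\theta)}{n}$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.

Example 9. For independent normal random variables with known variance $\sigma_0^2$ and unknown mean $\mu$,
$$\ln f(x|\mu) = -\ln(\sigma_0\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma_0^2}$$
and
$$\frac{\partial}{\partial\mu} \ln f(x|\mu) = \frac{1}{\sigma_0^2}(x-\mu),$$
$$E\left[\left(\frac{\partial}{\partial\mu} \ln f(X|\mu)\right)^2\right] = \frac{1}{\sigma_0^4} E[(X-\mu)^2] = \frac{1}{\sigma_0^2}.$$
Again, the information is the reciprocal of the variance. Thus, by the Cramér-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\sigma_0^2/n$. However, if we take $d(x) = \bar{x}$, then
$$\mathrm{Var}_\mu d(X) = \frac{\sigma_0^2}{n}$$
and $\bar{x}$ is a uniformly minimum variance unbiased estimator.

Recall that for the correlation to be $\pm 1$, the estimator $d(X)$ and the score function $\partial \ln f(X|\theta)/\partial\theta$ must be linearly related with probability 1:
$$\frac{\partial \ln f(X|\theta)}{\partial\theta} = a(\theta) d(X) + b(\theta).$$
After integrating with respect to $\theta$, we obtain
$$f(x|\theta) = c(\theta) h(x) \exp(\pi(\theta) d(x)), \tag{9}$$
where $\pi$ is an antiderivative of $a$, $\ln c$ is an antiderivative of $b$, and $\ln h(x)$ is the constant of integration, which may depend on $x$. We shall call density functions satisfying equation (9) an exponential family with natural parameter $\pi(\theta)$.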
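As an illustration of equation (9), the joint density of $n$ independent Bernoulli observations from Example 8 can be written in exponential-family form. The worked equation below is a sketch; the particular split into $c$, $h$, and $\pi$ is one convenient choice and is not taken from the original notes.

```latex
\begin{align*}
f(x|\theta) &= \prod_{k=1}^n \theta^{x_k}(1-\theta)^{1-x_k}
             = \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}} \\
            &= (1-\theta)^n \exp\left(n \ln\frac{\theta}{1-\theta}\,\bar{x}\right),
\end{align*}
so that
\[
c(\theta) = (1-\theta)^n, \qquad h(x) = 1, \qquad
\pi(\theta) = n \ln\frac{\theta}{1-\theta}, \qquad d(x) = \bar{x}.
\]
```

Differentiating gives $\pi'(\theta) = n/(\theta(1-\theta))$, which matches the coefficient of $\bar{x}$ in the score $n(\bar{x}-\theta)/(\theta(1-\theta))$ for $n$ Bernoulli observations.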
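Example 10 (Poisson random variables).
$$f(x|\lambda) = \frac{\lambda^x}{x!} e^{-\lambda} = e^{-\lambda}\frac{1}{x!}\exp(x \ln\lambda).$$
Thus, Poisson random variables are an exponential family. The score function is
$$\frac{\partial}{\partial\lambda} \ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(x \ln\lambda - \ln x! - \lambda) = \frac{x}{\lambda} - 1.$$
The Fisher information is
$$I(\lambda) = E_\lambda\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \frac{1}{\lambda^2} E_\lambda[(X-\lambda)^2] = \frac{1}{\lambda}.$$

If $X$ is $Pois(\lambda)$, then $E_\lambda X = \mathrm{Var}_\lambda(X) = \lambda$. For a simple random sample having $n$ observations, both $\bar{X}$ and $\sum_{k=1}^n (X_k - \bar{X})^2/(n-1)$ are unbiased estimators of $\lambda$. However,
$$\mathrm{Var}_\lambda(\bar{X}) = \frac{\lambda}{n},$$
which equals the Cramér-Rao bound $1/(nI(\lambda))$, and $d(x) = \bar{x}$ has efficiency 1.

This could have been predicted. The density of $n$ independent observations is
$$f(x|\lambda) = \frac{e^{-n\lambda}\lambda^{x_1+\cdots+x_n}}{x_1! \cdots x_n!} = \frac{e^{-n\lambda}\lambda^{n\bar{x}}}{x_1! \cdots x_n!}$$
and so the score function is
$$\frac{\partial}{\partial\lambda} \ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(-n\lambda + n\bar{x}\ln\lambda) = -n + \frac{n\bar{x}}{\lambda},$$
showing that the estimator and the score function are linearly related.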
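The comparison of the two unbiased estimators of $\lambda$ can be explored by simulation. The sketch below is illustrative only: the Poisson sampler uses Knuth's multiplication method (a choice made here, suitable for small $\lambda$), and the seed, $\lambda = 2$, $n = 20$, and the number of replications are arbitrary assumptions.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """One Pois(lam) draw by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(lam, n, replications, rng):
    """Return the sample means and the (n - 1)-divisor sample variances."""
    means, variances = [], []
    for _ in range(replications):
        xs = [poisson_draw(lam, rng) for _ in range(n)]
        means.append(statistics.fmean(xs))
        variances.append(statistics.variance(xs))  # divides by n - 1
    return means, variances


rng = random.Random(1)   # fixed seed, chosen arbitrarily
lam, n = 2.0, 20         # hypothetical parameter value and sample size
means, variances = simulate(lam, n, 5_000, rng)

cr_bound = lam / n                          # 1 / (n I(lambda)) with I(lambda) = 1 / lambda
var_of_mean = statistics.pvariance(means)   # spread of the sample mean across replications
var_of_s2 = statistics.pvariance(variances) # spread of the sample variance across replications

print(f"Cramér-Rao bound:        {cr_bound:.4f}")
print(f"Var of sample mean:      {var_of_mean:.4f}")
print(f"Var of sample variance:  {var_of_s2:.4f}")
print(f"estimated efficiency of the sample variance: {cr_bound / var_of_s2:.3f}")
```

Both estimators average close to $\lambda$ across replications, but only the sample mean has variance near the bound $\lambda/n$; the sample variance is noticeably more variable, so its efficiency is well below 1.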