Estimation: Part 2. Chapter GREG estimation

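Chapter 9 Estimation: Part 2

9.1 GREG estimation

In Chapter 8, we saw that the regression estimator is an efficient estimator when there is a linear relationship between $y$ and $x$. In this chapter, we generalize the concept of regression estimation to a broader class of models for developing model-assisted estimation.

To motivate the proposed estimator, we first introduce the difference estimator. Suppose that a proxy value of $y_i$, denoted by $y_i^{(0)}$, is available throughout the population. The difference estimator of $Y = \sum_{i=1}^N y_i$ using $y_i^{(0)}$ is defined as

$$\hat{Y}_{\mathrm{diff}} = \sum_{i=1}^N y_i^{(0)} + \sum_{i \in A} \frac{1}{\pi_i}\left(y_i - y_i^{(0)}\right). \tag{9.1}$$

The difference estimator is unbiased regardless of the choice of $y_i^{(0)}$. The variance of the difference estimator is

$$\mathrm{Var}\left(\hat{Y}_{\mathrm{diff}}\right) = \mathrm{Var}\left(\sum_{i \in A} \frac{d_i}{\pi_i}\right),$$

where $d_i = y_i - y_i^{(0)}$. Under SRS, the variance is

$$\frac{N^2}{n}\left(1 - \frac{n}{N}\right)\frac{1}{N-1}\sum_{i=1}^N \left(d_i - \bar{d}_N\right)^2,$$

where $\bar{d}_N = N^{-1}\sum_{i=1}^N d_i$.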
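The unbiasedness claim and the SRS variance formula can be checked by brute force on a small population. The following is a minimal sketch, assuming a toy population of five units with made-up values for $y_i$ and $y_i^{(0)}$; the function names `diff_estimator` and `srs_variance` are illustrative and not part of the text.

```python
from itertools import combinations

import numpy as np


def diff_estimator(y_sample, y0_sample, y0_total, pi_sample):
    """Difference estimator (9.1): proxy total plus HT-weighted residuals."""
    d = np.asarray(y_sample) - np.asarray(y0_sample)
    return y0_total + np.sum(d / np.asarray(pi_sample))


def srs_variance(d, n):
    """Design variance of the difference estimator under SRS without replacement."""
    d = np.asarray(d, dtype=float)
    N = d.size
    return N**2 / n * (1 - n / N) * d.var(ddof=1)


# Hypothetical toy population and an arbitrary proxy y0 (assumed values).
y = np.array([3.0, 7.0, 4.0, 9.0, 6.0])
y0 = np.array([2.5, 6.0, 5.0, 8.0, 6.5])
N, n = y.size, 2
pi = np.full(N, n / N)  # first-order inclusion probabilities under SRS

# Enumerate every SRS sample of size n; all samples are equally likely.
estimates = np.array([
    diff_estimator(y[list(A)], y0[list(A)], y0.sum(), pi[list(A)])
    for A in combinations(range(N), n)
])

design_mean = estimates.mean()       # exact design expectation
design_var = estimates.var()         # exact design variance (ddof=0 over all samples)
formula_var = srs_variance(y - y0, n)
```

Here `design_mean` reproduces the population total and `design_var` agrees with the closed-form expression, whatever proxy values are supplied.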

Thus, the difference estimator is unbiased regardless of $y_i^{(0)}$, but its variance differs for different choices of $y_i^{(0)}$. If $y_i^{(0)}$ is close to the true value of $y_i$, then the variance will be small. In practice, we do not know $y_i^{(0)}$. Instead, we use an auxiliary variable $x_i$ to construct a model for $y_i$ and develop $y_i^{(0)}$ from the model. That is, we assume that the finite population is a random sample from a superpopulation model that has generated the current finite population. One commonly used superpopulation model is

$$E_\zeta(y_i) = x_i'\beta \tag{9.2}$$

$$\mathrm{Cov}_\zeta(y_i, y_j) = \begin{cases} c_i \sigma^2 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases} \tag{9.3}$$

where $c_i = c(x_i)$ is a known function of $x_i$, and $\beta$ and $\sigma^2$ are unknown parameters. Model (9.3) is often called the generalized regression (GREG) model. Under the GREG model in (9.3), the model-based estimator is defined to be the sum of the predicted values in the GREG model. That is, we define

$$\hat{Y}_{\mathrm{GREG}} = \sum_{i=1}^N \hat{y}_i, \tag{9.4}$$

where $\hat{y}_i = x_i'\hat{\beta}$ and

$$\hat{\beta} = \left(\sum_{i \in A} \frac{1}{\pi_i c_i} x_i x_i'\right)^{-1} \sum_{i \in A} \frac{1}{\pi_i c_i} x_i y_i. \tag{9.5}$$

Note that $\hat{\beta}$ converges to the following finite population quantity

$$B = \left(\sum_{i=1}^N \frac{1}{c_i} x_i x_i'\right)^{-1} \sum_{i=1}^N \frac{1}{c_i} x_i y_i, \tag{9.6}$$

which is the best estimator of $\beta$ under a census. The model-based estimator in (9.4) is developed under the GREG model in (9.3). Thus, it is often called the GREG estimator. The following lemma shows that the GREG estimator in (9.4) is asymptotically equivalent to the difference estimator in (9.1).
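Before the lemma, it may help to see how (9.4) and (9.5) could be computed. The following is a minimal sketch, assuming simulated data, an SRS design, and $c_i = 1$; the helper name `greg_total` and all numerical values are illustrative assumptions, not part of the text.

```python
import numpy as np


def greg_total(X_pop, x_s, y_s, pi_s, c_s):
    """Return (Y_greg, beta_hat) following (9.4) and (9.5).

    X_pop: (N, p) auxiliary values for the whole population
    x_s, y_s, pi_s, c_s: sample values of x_i, y_i, pi_i and c_i
    """
    w = 1.0 / (pi_s * c_s)                  # weights 1 / (pi_i c_i)
    M = (x_s * w[:, None]).T @ x_s          # sum over A of x_i x_i' / (pi_i c_i)
    b = (x_s * w[:, None]).T @ y_s          # sum over A of x_i y_i / (pi_i c_i)
    beta_hat = np.linalg.solve(M, b)
    # Sum of predicted values x_i' beta_hat over the whole population.
    return X_pop.sum(axis=0) @ beta_hat, beta_hat


# Simulated population (assumed values) with an intercept and one covariate.
rng = np.random.default_rng(0)
N, n = 200, 40
x = np.column_stack([np.ones(N), rng.uniform(1, 5, N)])
y = x @ np.array([2.0, 3.0]) + rng.normal(0, 1, N)

A = rng.choice(N, size=n, replace=False)    # SRS without replacement
pi = np.full(n, n / N)
c = np.ones(n)                              # c_i = 1

Y_greg, beta_hat = greg_total(x, x[A], y[A], pi, c)
```

Only the population totals of $x_i$ enter the estimator, so in practice `X_pop.sum(axis=0)` could be replaced by known control totals.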

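Lemma 9.1. Under some regularity conditions, if $c_i$ satisfies $c_i = \lambda' x_i$ for some $\lambda$, the GREG estimator in (9.4) satisfies

$$\hat{Y}_{\mathrm{GREG}} \doteq \sum_{i=1}^N x_i'B + \sum_{i \in A} \frac{1}{\pi_i}\left(y_i - x_i'B\right), \tag{9.7}$$

where $B$ is defined in (9.6) and $\doteq$ denotes equality up to an asymptotically negligible term.

Proof. If $c_i = \lambda' x_i$ holds for some $\lambda$, then $\lambda' x_i / c_i = 1$ and we can write

$$\sum_{i \in A} \frac{1}{\pi_i}\hat{y}_i = \lambda' \sum_{i \in A} \frac{1}{\pi_i c_i} x_i x_i' \hat{\beta} = \lambda' \sum_{i \in A} \frac{1}{\pi_i c_i} x_i y_i = \sum_{i \in A} \frac{1}{\pi_i} y_i,$$

where the second equality follows from (9.5). Hence $\sum_{i \in A} \pi_i^{-1}(y_i - x_i'\hat{\beta}) = 0$ and

$$\begin{aligned}
\hat{Y}_{\mathrm{GREG}} &= \sum_{i=1}^N x_i'\hat{\beta} + \sum_{i \in A} \frac{1}{\pi_i}\left(y_i - x_i'\hat{\beta}\right) \\
&= \hat{Y}_{\mathrm{HT}} + \left(X - \hat{X}_{\mathrm{HT}}\right)'\hat{\beta} \\
&= \hat{Y}_{\mathrm{HT}} + \left(X - \hat{X}_{\mathrm{HT}}\right)'B + \left(X - \hat{X}_{\mathrm{HT}}\right)'\left(\hat{\beta} - B\right),
\end{aligned}$$

where $X = \sum_{i=1}^N x_i$, $\hat{X}_{\mathrm{HT}} = \sum_{i \in A} \pi_i^{-1} x_i$, and $\hat{Y}_{\mathrm{HT}} = \sum_{i \in A} \pi_i^{-1} y_i$. The last term is negligible under some regularity conditions.

By (9.7), we can establish that

$$E\left(\hat{Y}_{\mathrm{GREG}}\right) \doteq Y \tag{9.8}$$

and

$$\mathrm{Var}\left(\hat{Y}_{\mathrm{GREG}}\right) \doteq \mathrm{Var}\left\{\sum_{i \in A} \frac{1}{\pi_i}\left(y_i - x_i'B\right)\right\}. \tag{9.9}$$

If $E_i = y_i - x_i'B$ were observed, the variance (9.9) would be estimated by

$$\widehat{\mathrm{Var}}\left(\hat{Y}_{\mathrm{GREG}}\right) = \sum_{i \in A}\sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{E_i}{\pi_i} \frac{E_j}{\pi_j}.$$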
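The double-sum variance estimator can be coded directly once the first- and second-order inclusion probabilities are available. The following is a minimal sketch, assuming SRS, $c_i = 1$, and simulated data; the function name `ht_variance_estimator` is illustrative. Since $E_i$ is not observed, the sample residuals $y_i - x_i'\hat{\beta}$ are passed in instead. The sketch also checks the step in the proof of Lemma 9.1 that the weighted residuals sum to zero, which holds here because the model includes an intercept (so $c_i = 1 = \lambda' x_i$ with $\lambda = (1, 0)'$).

```python
import numpy as np


def ht_variance_estimator(resid, pi, pi_joint):
    """Double sum over A of (pi_ij - pi_i pi_j) / pi_ij * (r_i / pi_i) * (r_j / pi_j)."""
    z = resid / pi
    delta = (pi_joint - np.outer(pi, pi)) / pi_joint
    return float(z @ delta @ z)


# Simulated population (assumed values) and an SRS of size n.
rng = np.random.default_rng(1)
N, n = 500, 50
x = np.column_stack([np.ones(N), rng.uniform(0, 10, N)])
y = x @ np.array([1.0, 2.0]) + rng.normal(0, 2, N)
A = rng.choice(N, size=n, replace=False)

# Inclusion probabilities under SRS; the diagonal of pi_joint holds pi_i.
pi = np.full(n, n / N)
pi_joint = np.full((n, n), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi_joint, n / N)

# beta_hat from (9.5) with c_i = 1, then sample residuals.
w = 1.0 / pi
xs, ys = x[A], y[A]
beta_hat = np.linalg.solve((xs * w[:, None]).T @ xs, (xs * w[:, None]).T @ ys)
e = ys - xs @ beta_hat

ht_resid_total = np.sum(e / pi)                 # zero up to rounding when c_i = lambda' x_i
v_hat = ht_variance_estimator(e, pi, pi_joint)
v_srs = N**2 / n * (1 - n / N) * e.var(ddof=1)  # familiar SRS form of the same estimator
```

Under SRS the general double sum collapses to the usual $N^2 (1 - n/N) s_e^2 / n$ expression, which gives a convenient sanity check for the implementation.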

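If we use $e_i = y_i - x_i'\hat{\beta}$ instead of $E_i$, a consistent variance estimator is computed by

$$\hat{V}_{\mathrm{GREG}} = \sum_{i \in A}\sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{e_i}{\pi_i} \frac{e_j}{\pi_j}. \tag{9.10}$$

Example 9.1. (Raking ratio estimation)

Suppose that we have $I \times J$ groups or cells. Cell counts $N_{ij}$ are not known. Marginal counts $N_{i\cdot} = \sum_{j=1}^J N_{ij}$ and $N_{\cdot j} = \sum_{i=1}^I N_{ij}$ are known. In this case, we may consider the following two-way additive model for $k \in U_{ij}$:

$$E_\zeta(Y_k) = \alpha_i + \beta_j, \qquad V_\zeta(Y_k) = \sigma^2,$$

where $\alpha_i$, $\beta_j$, and $\sigma^2$ are unknown parameters. Define

$$\delta_{ijk} = \begin{cases} 1 & \text{if } k \in U_{ij} \\ 0 & \text{otherwise.} \end{cases}$$

Unfortunately, we do not observe $\delta_{ijk}$ in the population. Let $\delta_{i\cdot k} = \sum_{j=1}^J \delta_{ijk}$ and $\delta_{\cdot jk} = \sum_{i=1}^I \delta_{ijk}$ be the row and column indicators, and let

$$x_k = \left(\delta_{1\cdot k}, \delta_{2\cdot k}, \ldots, \delta_{I\cdot k}, \delta_{\cdot 1k}, \delta_{\cdot 2k}, \ldots, \delta_{\cdot Jk}\right)',$$

for which we know $\sum_{k=1}^N x_k$. The GREG estimator in this case can be written as

$$\hat{Y}_{\mathrm{GREG}} = \sum_{i \in A} \frac{1}{\pi_i} g_i(A)\, y_i,$$

where

$$g_i(A) = 1 + \left(\sum_{k=1}^N x_k - \sum_{k \in A} \frac{x_k}{\pi_k}\right)' \left(\sum_{k \in A} \frac{1}{\pi_k} x_k x_k'\right)^{-1} x_i.$$

Unfortunately, we cannot compute the inverse of $\sum_{k \in A} \pi_k^{-1} x_k x_k'$ because it is an $(I + J) \times (I + J)$ matrix of rank $I + J - 1$, which is not full rank: the row indicators and the column indicators both sum to one for every unit. Thus, there is no unique solution $\hat{B}$ to

$$\sum_{i \in A} \frac{1}{\pi_i} x_i x_i' \hat{B} = \sum_{i \in A} \frac{1}{\pi_i} x_i y_i.$$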
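Both the rank deficiency and the alternating row and column adjustment described in the steps that follow can be illustrated numerically. The following is a minimal sketch, assuming a made-up $2 \times 3$ layout with twelve sampled units, equal inclusion probabilities, and hypothetical known margins; the function name `raking_weights` and the stopping rule are illustrative choices. It also assumes that every row and column category appears in the sample and that the two sets of margins share the same grand total.

```python
import numpy as np


def raking_weights(row, col, pi, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Iteratively rescale g_k so that weighted sample margins match known margins."""
    g = np.ones(len(pi))
    d = 1.0 / pi                                   # design weights 1 / pi_k
    for _ in range(max_iter):
        # Row step: match each row margin exactly.
        for i, N_i in enumerate(row_totals):
            m = row == i
            g[m] *= N_i / np.sum(g[m] * d[m])
        # Column step: match each column margin exactly.
        for j, N_j in enumerate(col_totals):
            m = col == j
            g[m] *= N_j / np.sum(g[m] * d[m])
        # Columns now hold exactly; stop once the rows hold as well.
        row_fit = np.bincount(row, weights=g * d, minlength=len(row_totals))
        if np.max(np.abs(row_fit - row_totals)) < tol:
            break
    return g


# Hypothetical sample: row and column labels of 12 units, two per cell.
row = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
col = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
pi = np.full(12, 0.1)
row_totals = np.array([70.0, 50.0])                # assumed known N_i.
col_totals = np.array([30.0, 50.0, 40.0])          # assumed known N_.j

# The matrix sum_k x_k x_k' / pi_k has rank I + J - 1, not I + J.
X = np.hstack([np.eye(2)[row], np.eye(3)[col]])
M = (X / pi[:, None]).T @ X
rank = np.linalg.matrix_rank(M)

g = raking_weights(row, col, pi, row_totals, col_totals)
row_fit = np.bincount(row, weights=g / pi)
col_fit = np.bincount(col, weights=g / pi)
```

With the adjusted factors in hand, the raking estimator of the total would be `np.sum(g * y_sample / pi)` for a sample vector `y_sample` (a hypothetical name).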

The goal is to find $g_{kA} = g_k(A)$ such that

$$\sum_{k \in A} \frac{1}{\pi_k} g_{kA}\,\delta_{i\cdot k} = \sum_{k=1}^N \delta_{i\cdot k}, \quad i = 1, 2, \ldots, I, \tag{9.11}$$

$$\sum_{k \in A} \frac{1}{\pi_k} g_{kA}\,\delta_{\cdot jk} = \sum_{k=1}^N \delta_{\cdot jk}, \quad j = 1, 2, \ldots, J. \tag{9.12}$$

One way to obtain the solution to (9.11) and (9.12) is to solve the equations iteratively as follows:

1. Start with $g_{kA}^{(0)} = 1$.

2. For $\delta_{i\cdot k} = 1$,
$$g_{kA}^{(t+1)} = g_{kA}^{(t)} \frac{\sum_{l=1}^N \delta_{i\cdot l}}{\sum_{l \in A} g_{lA}^{(t)}\,\delta_{i\cdot l}/\pi_l}.$$
It satisfies (9.11), but does not necessarily satisfy (9.12).

3. For $\delta_{\cdot jk} = 1$,
$$g_{kA}^{(t+2)} = g_{kA}^{(t+1)} \frac{\sum_{l=1}^N \delta_{\cdot jl}}{\sum_{l \in A} g_{lA}^{(t+1)}\,\delta_{\cdot jl}/\pi_l}.$$
It satisfies (9.12), but does not necessarily satisfy (9.11).

4. Set $t \leftarrow t + 2$ and go to Step 2. Continue until convergence.

This computational method is called raking ratio estimation and was first considered by Deming and Stephan (1940) in the Census application. See also Deville et al. (1993).

9.2 Optimal Estimation

So far, we have discussed a class of model-assisted estimators that improve the efficiency of the HT estimator by incorporating the auxiliary information. In this section, we discuss the optimality of the GREG estimator within a class of estimators. We first show the nonexistence of the UMVUE (Uniformly Minimum Variance Unbiased Estimator) in a strictly design-based sense, which was originally discussed by Godambe and Joshi (1965) and then also proved by Basu (1971) with a simpler proof.

Theorem 9.1. Let any noncensus design with $\pi_k > 0$ $(k = 1, 2, \ldots, N)$ be given. Then no uniformly minimum variance estimator exists in the class of all unbiased estimators of $Y = \sum_{i=1}^N y_i$.

Proof. For a given value $y^* = (y_1^*, y_2^*, \ldots, y_N^*)$, consider the following difference estimator

$$\hat{Y}_{\mathrm{diff}} = \sum_{i=1}^N y_i^* + \sum_{i \in A} \frac{1}{\pi_i}\left(y_i - y_i^*\right), \tag{9.13}$$

which is unbiased regardless of $y^* = (y_1^*, y_2^*, \ldots, y_N^*)$ and whose variance is zero when $y = y^*$. Now, in order for an unbiased estimator $\hat{Y}$ to be a UMVUE, it should satisfy

$$\mathrm{Var}\left(\hat{Y}\right) \le \mathrm{Var}\left(\hat{Y}_{\mathrm{diff}}\right), \quad \forall y.$$

Since $\mathrm{Var}(\hat{Y}_{\mathrm{diff}}) = 0$ for $y = y^*$, it follows that $\mathrm{Var}(\hat{Y}) = 0$ for $y = y^*$. Since we can choose $y^*$ arbitrarily, it follows that $\mathrm{Var}(\hat{Y}) = 0$ for all $y$, which holds only for the census.

The above theorem shows that it is impossible to find the best estimator among the class of unbiased estimators in the sense of minimizing the design variance. To overcome this problem, we relax the objective function for comparing the efficiency of the estimators by taking the superpopulation model into consideration. Specifically, we will consider the expected value of the design variance under the superpopulation model. Such a variance is called the anticipated variance and is formally defined as follows.

Definition 9.1. The anticipated variance of $\hat{\theta}$ is defined by

$$AV\left(\hat{\theta}\right) = E_\zeta\left\{\mathrm{Var}_p\left(\hat{\theta}\right)\right\},$$

where the subscript $\zeta$ denotes the distribution with respect to the superpopulation model and the subscript $p$ denotes the distribution generated by the sampling mechanism.
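The zero-variance argument in the proof of Theorem 9.1 can be checked on a design small enough to enumerate. The following is a minimal sketch, assuming a made-up three-unit population with three possible samples of unequal probability; the sample list, the probabilities, and the function name `design_moments` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical noncensus design: three possible samples and their probabilities.
samples = [(0, 1), (0, 2), (1, 2)]
probs = np.array([0.5, 0.3, 0.2])
N = 3

# First-order inclusion probabilities implied by the design.
pi = np.array([sum(p for A, p in zip(samples, probs) if k in A) for k in range(N)])


def design_moments(y, y_star):
    """Exact design mean and variance of the difference estimator (9.13)."""
    est = np.array([
        y_star.sum() + sum((y[k] - y_star[k]) / pi[k] for k in A)
        for A in samples
    ])
    mean = probs @ est
    var = probs @ (est - mean) ** 2
    return mean, var


y_star = np.array([4.0, 1.0, 6.0])      # the guess built into the estimator
y_other = np.array([5.0, 2.0, 3.0])     # a population that differs from the guess

mean_at_star, var_at_star = design_moments(y_star, y_star)
mean_at_other, var_at_other = design_moments(y_other, y_star)
```

The estimator is unbiased for both populations, but its variance vanishes only at $y = y^*$, which is exactly what rules out a uniformly best unbiased estimator.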

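Lemma 9.2. Let $\hat{\theta}$ be design-unbiased for $\theta$. The anticipated variance of $\hat{\theta}$ can be written as

$$AV\left(\hat{\theta}\right) = E_p\left\{\mathrm{Var}_\zeta\left(\hat{\theta}\right)\right\} + \mathrm{Var}_p\left\{E_\zeta\left(\hat{\theta}\right)\right\} - \mathrm{Var}_\zeta\left(\theta\right). \tag{9.14}$$

Proof. Since $\hat{\theta}$ is design-unbiased for $\theta$, we can write

$$AV\left(\hat{\theta}\right) = E_\zeta\left\{\mathrm{Var}_p\left(\hat{\theta}\right)\right\} = E_\zeta E_p\left\{\left(\hat{\theta} - \theta\right)^2\right\}.$$

Thus,

$$AV\left(\hat{\theta}\right) = E_p E_\zeta\left\{\left(\hat{\theta} - \theta\right)^2\right\} = E_p E_\zeta\left[\left\{\hat{\theta} - E_\zeta(\hat{\theta}) + E_\zeta(\hat{\theta}) - E_\zeta(\theta) + E_\zeta(\theta) - \theta\right\}^2\right]$$

and

$$\begin{aligned}
AV\left(\hat{\theta}\right) ={}& E_p E_\zeta\left\{\hat{\theta} - E_\zeta(\hat{\theta})\right\}^2 + E_p\left\{E_\zeta(\hat{\theta}) - E_\zeta(\theta)\right\}^2 + E_\zeta\left\{E_\zeta(\theta) - \theta\right\}^2 \\
&+ 2E_\zeta\left[E_p\left\{\hat{\theta} - E_\zeta(\hat{\theta})\right\}\left\{E_\zeta(\theta) - \theta\right\}\right],
\end{aligned}$$

since the remaining cross product terms are zero. Since

$$E_p\left\{\hat{\theta} - E_\zeta(\hat{\theta})\right\} = -\left\{E_\zeta(\theta) - \theta\right\}$$

and

$$2E_\zeta\left[E_p\left\{\hat{\theta} - E_\zeta(\hat{\theta})\right\}\left\{E_\zeta(\theta) - \theta\right\}\right] = 2E_\zeta\left[\left\{E_\zeta(\theta) - \theta\right\} E_p\left\{\hat{\theta} - E_\zeta(\hat{\theta})\right\}\right] = -2E_\zeta\left\{E_\zeta(\theta) - \theta\right\}^2,$$

we obtain

$$AV\left(\hat{\theta}\right) = E_p E_\zeta\left\{\hat{\theta} - E_\zeta(\hat{\theta})\right\}^2 + E_p\left\{E_\zeta(\hat{\theta}) - E_\zeta(\theta)\right\}^2 - E_\zeta\left\{E_\zeta(\theta) - \theta\right\}^2,$$

which proves (9.14).

The following theorem presents the lower bound of the anticipated variance for a design-unbiased estimator.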
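Before turning to that bound, the decomposition (9.14) can be verified exactly for the HT estimator on an enumerable design. The following is a minimal sketch, assuming a made-up three-unit design and a superpopulation model with independent $y_k$ having hypothetical means `mu` and variances `sigma2`. The left-hand side is computed from the usual quadratic form for the HT design variance, and the right-hand side is computed term by term by enumerating the samples.

```python
import numpy as np

# Hypothetical design: possible samples and their probabilities.
samples = [(0, 1), (0, 2), (1, 2)]
probs = np.array([0.5, 0.3, 0.2])
N = 3

# Assumed superpopulation moments for independent y_k.
mu = np.array([4.0, 1.0, 6.0])
sigma2 = np.array([1.0, 0.5, 2.0])

# Sample membership indicators, one row per sample.
ind = np.array([[1.0 if k in A else 0.0 for k in range(N)] for A in samples])
pi = probs @ ind                                  # pi_k
pi_joint = ind.T @ (probs[:, None] * ind)         # pi_kl, with pi_kk = pi_k

# Left side: E_zeta{Var_p(Y_HT)}. For fixed y, Var_p(Y_HT) = y' Delta y, and
# E_zeta(y' Delta y) = sum_k Delta_kk sigma2_k + mu' Delta mu under independence.
Delta = (pi_joint - np.outer(pi, pi)) / np.outer(pi, pi)
lhs = np.sum(np.diag(Delta) * sigma2) + mu @ Delta @ mu

# Right side of (9.14), each term obtained by enumerating the samples.
ep_var_zeta = probs @ (ind @ (sigma2 / pi**2))    # E_p{Var_zeta(Y_HT)}
e_zeta_ht = ind @ (mu / pi)                       # E_zeta(Y_HT) for each sample
var_p_e_zeta = probs @ (e_zeta_ht - probs @ e_zeta_ht) ** 2
var_zeta_total = sigma2.sum()                     # Var_zeta(Y)
rhs = ep_var_zeta + var_p_e_zeta - var_zeta_total
```

The two sides agree up to rounding, and the difference `ep_var_zeta - var_zeta_total` is the part of the anticipated variance that does not depend on the model means.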

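Theorem 9.2. Let $y_i$ be independently distributed in the superpopulation model. The anticipated variance of any design-unbiased estimator $\hat{Y}$ of $Y$ satisfies

$$E_\zeta\left\{\mathrm{Var}_p\left(\hat{Y}\right)\right\} \ge \sum_{i=1}^N \left(\frac{1}{\pi_i} - 1\right)\mathrm{Var}_\zeta\left(y_i\right). \tag{9.15}$$

Proof. Write $\hat{Y} = \hat{Y}_{\mathrm{HT}} + R$, where $\hat{Y}_{\mathrm{HT}}$ is the HT estimator of $Y$. Since $\hat{Y}$ is design-unbiased, we have $E_p(R) = 0$ and, for fixed $j \in U$,

$$0 = E_p(R) = \sum_{A \in \mathcal{A}} p(A)R(A) = \sum_{A \in \mathcal{A};\, j \in A} p(A)R(A) + \sum_{A \in \mathcal{A};\, j \notin A} p(A)R(A).$$

Now, since

$$V_\zeta\left(\hat{Y}\right) = V_\zeta\left(\hat{Y}_{\mathrm{HT}}\right) + V_\zeta(R) + 2\,\mathrm{Cov}_\zeta\left(\hat{Y}_{\mathrm{HT}}, R\right),$$

we obtain

$$\begin{aligned}
E_p\left\{\mathrm{Cov}_\zeta\left(\hat{Y}_{\mathrm{HT}}, R\right)\right\}
&= E_p\left[E_\zeta\left\{\left(\hat{Y}_{\mathrm{HT}} - E_\zeta(\hat{Y}_{\mathrm{HT}})\right) R\right\}\right] \\
&= E_p\left[E_\zeta\left\{\sum_{j \in U}\left(y_j - E_\zeta(y_j)\right)\frac{I_j}{\pi_j} R\right\}\right] \\
&= E_\zeta\left[\sum_{j \in U}\frac{y_j - E_\zeta(y_j)}{\pi_j} E_p\left\{I_j(A)R(A)\right\}\right] \\
&= E_\zeta\left[\sum_{j \in U}\frac{y_j - E_\zeta(y_j)}{\pi_j}\sum_{A \in \mathcal{A};\, j \in A} R(A)p(A)\right] \\
&= -\sum_{j \in U} E_\zeta\left[\frac{y_j - E_\zeta(y_j)}{\pi_j}\sum_{A \in \mathcal{A};\, j \notin A} R(A)p(A)\right] \\
&= 0,
\end{aligned}$$

where the last equality follows because $\sum_{A \in \mathcal{A};\, j \notin A} R(A)p(A)$ does not depend on $y_j$. Thus, writing $\sigma_i^2 = V_\zeta(y_i)$,

$$E_p\left\{V_\zeta\left(\hat{Y}\right)\right\} = E_p\left\{V_\zeta\left(\hat{Y}_{\mathrm{HT}}\right)\right\} + E_p\left\{V_\zeta(R)\right\} \ge E_p\left\{V_\zeta\left(\hat{Y}_{\mathrm{HT}}\right)\right\}$$

with

$$E_p\left\{V_\zeta\left(\hat{Y}_{\mathrm{HT}}\right)\right\} = E_p\left\{V_\zeta\left(\sum_{i \in U}\frac{y_i I_i}{\pi_i}\right)\right\} = E_p\left\{\sum_{i \in U}\frac{\sigma_i^2 I_i}{\pi_i^2}\right\} = \sum_{i \in U}\frac{\sigma_i^2}{\pi_i},$$

and, by Lemma 9.2, we have

$$\begin{aligned}
AV\left(\hat{Y}\right) &= E_p\left\{V_\zeta\left(\hat{Y}\right)\right\} + V_p\left\{E_\zeta\left(\hat{Y}\right)\right\} - V_\zeta(Y) \\
&\ge E_p\left\{V_\zeta\left(\hat{Y}\right)\right\} - V_\zeta(Y) \\
&\ge \sum_{i \in U}\frac{\sigma_i^2}{\pi_i} - \sum_{i \in U}\sigma_i^2.
\end{aligned}$$

The lower bound in (9.15) is the lower bound of the anticipated variance of any unbiased estimator. It was first discovered by Godambe and Joshi (1965) and is often called the Godambe-Joshi lower bound. For a fixed-size probability sampling design, the Godambe-Joshi lower bound is minimized when

$$\pi_i \propto \left\{\mathrm{Var}_\zeta\left(y_i\right)\right\}^{1/2}. \tag{9.16}$$

To show this, we minimize $\sum_{i=1}^N \mathrm{Var}_\zeta(y_i)/\pi_i$ subject to $\sum_{i=1}^N \pi_i = n$. The solution can be obtained by applying the Cauchy-Schwarz inequality to get

$$\left(\sum_{i=1}^N \frac{\sigma_i^2}{\pi_i}\right)\left(\sum_{i=1}^N \pi_i\right) \ge \left(\sum_{i=1}^N \sigma_i\right)^2,$$

with equality when (9.16) holds.

The following theorem, which was first proved by Isaki and Fuller (1982), shows that the GREG estimator achieves the Godambe-Joshi lower bound asymptotically.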
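Before stating that result, the optimality of (9.16) can be checked numerically. The following is a minimal sketch, assuming six units with made-up model standard deviations and a fixed sample size of three; the function name `gj_bound` is illustrative. It compares the bound under inclusion probabilities proportional to $\sigma_i$ with the bound under equal probabilities and with the closed-form minimum $(\sum_i \sigma_i)^2/n - \sum_i \sigma_i^2$ implied by the Cauchy-Schwarz argument.

```python
import numpy as np


def gj_bound(sigma2, pi):
    """Godambe-Joshi lower bound: sum of (1 / pi_i - 1) * sigma_i^2."""
    return np.sum(sigma2 * (1.0 / pi - 1.0))


# Hypothetical model standard deviations and sample size.
sigma = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 5.0])
n = 3

pi_opt = n * sigma / sigma.sum()            # pi_i proportional to sigma_i; all below 1 here
pi_eq = np.full(sigma.size, n / sigma.size) # equal-probability design of the same size

bound_opt = gj_bound(sigma**2, pi_opt)
bound_eq = gj_bound(sigma**2, pi_eq)
closed_form = sigma.sum() ** 2 / n - np.sum(sigma**2)
```

For these assumed values the proportional allocation attains the closed-form minimum, while the equal-probability design gives a strictly larger bound.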

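Theorem 9.3. Suppose that $\zeta$ is a superpopulation model under which the $y_i$ are independent with $E_\zeta(y_i) = x_i'\beta$ and $V_\zeta(y_i) = c_i\sigma^2$. Then, the anticipated variance of the GREG estimator in (9.4) asymptotically attains the Godambe-Joshi lower bound if $c_i = \lambda' x_i$ for some $\lambda$.

Proof. By Lemma 9.1, when $c_i = \lambda' x_i$ for some $\lambda$, the GREG estimator is asymptotically equivalent to the difference estimator in (9.7). Thus,

$$\begin{aligned}
E_\zeta\left\{\mathrm{Var}_p\left(\hat{Y}_{\mathrm{GREG}}\right)\right\}
&\doteq E_\zeta\left\{\sum_{i=1}^N\sum_{j=1}^N \left(\pi_{ij} - \pi_i\pi_j\right)\frac{y_i - x_i'B}{\pi_i}\frac{y_j - x_j'B}{\pi_j}\right\} \\
&\doteq E_\zeta\left\{\sum_{i=1}^N\sum_{j=1}^N \left(\pi_{ij} - \pi_i\pi_j\right)\frac{y_i - x_i'\beta}{\pi_i}\frac{y_j - x_j'\beta}{\pi_j}\right\} \\
&= \sum_{i=1}^N \left(\frac{1}{\pi_i} - 1\right)c_i\sigma^2,
\end{aligned}$$

where $\pi_{ii} = \pi_i$ and the last equality holds because the cross terms with $i \neq j$ vanish under independence. This is equal to the Godambe-Joshi lower bound under the superpopulation model.

References

Basu, D. (1971). An essay on the logical foundations of survey sampling, part one. In V.P. Godambe and D.A. Sprott, Eds., Foundations of Statistical Inference. Toronto: Holt, Rinehart & Winston, 203-242.

Deming, W.E. and Stephan, F.F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics 11, 427-444.

Deville, J.C., Särndal, C.E., and Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association 88, 1013-1020.

Godambe, V.P. and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling finite populations, 1. Annals of Mathematical Statistics 36, 1707-1722.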

Isaki, C. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association 77, 89-96.