Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Ji-Yeon Kim, Iowa State University
F. Jay Breidt, Colorado State University
Jean D. Opsomer, Iowa State University

May 21, 2003

Abstract

A nonparametric regression estimator for the finite population total in two-stage sampling with complete stage-one auxiliary information is developed. The estimator, based on local polynomial regression, is a linear combination of cluster total estimators, with weights that are calibrated to known control totals. The estimator is asymptotically design-unbiased and design consistent under mild assumptions, and its variance can be consistently estimated. Simulation results indicate that the nonparametric estimator dominates several parametric estimators when the model regression function is incorrectly specified, while being nearly as efficient when the parametric specification is correct. The methodology is illustrated using data from a study of land use and erosion.

Keywords: Auxiliary information, calibration, model-assisted estimation, local polynomial regression, cluster sampling, erosion.

1 Introduction

In many complex surveys, auxiliary information about the population of interest is available. In tracking human disease or monitoring natural resources, for example, geographic information systems may contain location-specific indices for all sites in the study. One approach to using this auxiliary information in estimation is to assume a working model $\xi$ describing the relationship between the study variable of interest and the auxiliary variables. Estimators are then derived on the basis of this model. Estimators are sought that have good efficiency if the model is true, but maintain desirable properties like asymptotic design unbiasedness and design consistency if the model is false.

Often, a linear model is selected as the working model. Generalized regression estimators (e.g., Cassel, Särndal, and Wretman, 1976, 1977; Särndal, 1980; Robinson and Särndal, 1983), including ratio estimators and linear regression estimators (Cochran, 1977), best linear unbiased estimators (Brewer, 1963; Royall, 1970), and poststratification estimators (Holt and Smith, 1979), are all derived from assumed linear models.

In some situations, the linear model is not appropriate, and the resulting estimators do not achieve any efficiency gain over purely design-based estimators. Wu and Sitter (2001) propose a class of estimators for which the working models follow a nonlinear parametric shape. The efficient use of nonlinear models, however, requires a priori knowledge of the specific parametric structure of the population. This is especially problematic if that same model is to be used for many variables of interest, a common occurrence in surveys.

Because of these concerns, some researchers have considered nonparametric models for $\xi$. Dorfman (1992) and Chambers, Dorfman, and Wehrly (1993) developed model-based nonparametric estimators using this approach. Breidt and Opsomer (2000) proposed a new type of model-assisted nonparametric regression estimator for the finite population total, based on local polynomial smoothing. The local polynomial regression estimator has the form of the generalized regression estimator, but is based on a nonparametric superpopulation model applicable to a much larger class of functions.

The theory developed in Breidt and Opsomer (2000) for the local polynomial regression estimator applies only to direct element sampling designs with auxiliary information available for all elements of the population. In many large-scale surveys, however, more complex designs such as multistage or multiphase sampling designs with various types of auxiliary information are commonly used. In this paper, we extend nonparametric regression estimation to two-stage sampling, in which a probability sample of clusters is selected, and then subsamples of elements within each selected cluster are obtained. Such two-stage sampling is frequently used because an adequate frame of elements is not available or would be prohibitively expensive to construct, but a listing of clusters is available. Familiar examples include humans within households, fish within lakes, and trees within plots. In such cases, it is more likely that detailed auxiliary information would be available for the clusters but not the elements. Therefore, we consider local polynomial regression estimation in two-stage element sampling with auxiliary information available for all clusters. Results for single-stage cluster sampling, in which each sampled cluster is completely enumerated, are obtained as a special case. The case of multi-stage element sampling with auxiliary information available for all primary sampling units is an immediate extension of the results in this paper, but will not be explicitly developed here.

In Section 1.1, we describe our two-stage sampling framework and introduce appropriate notation. In Section 1.2, we adapt the local polynomial regression estimator of Breidt and Opsomer (2000) to two-stage sampling and in Section 1.3 we introduce assumptions used in our theoretical derivations. Design properties of the estimator are described in Section 2. Section 2.1 shows that the estimator is a linear combination of estimators of cluster totals with weights that are calibrated to known control totals. Section 2.2 shows asymptotic design unbiasedness and design consistency of the estimator, approximates the estimator's design mean squared error, and provides a consistent estimator of the design mean squared error. Section 3 describes results of a simulation study, in which the local polynomial regression estimator competes well with a number of other parametric and nonparametric estimators, across a broad range of study variables. In Section 4, we apply the estimator to data from a 1995 study of erosion, using National Resources Inventory (NRI) data as frame materials, and conclude with a brief discussion in Section 5. All proofs are gathered in an appendix.

1.1 Notation

Consider a finite population of elements $U = \{1, \ldots, k, \ldots, N\}$ partitioned into $M$ clusters, $U_1, \ldots, U_i, \ldots, U_M$. The population of clusters is represented as $C = \{1, \ldots, i, \ldots, M\}$. The number of elements in the $i$th cluster $U_i$ is denoted $N_i$. We have $U = \bigcup_{i \in C} U_i$ and $N = \sum_{i \in C} N_i$. For all clusters $i \in C$, an auxiliary vector $x_i = (x_{1i}, \ldots, x_{Gi})$ is available. For the sake of simplicity we assume that $G = 1$; that is, the $x_i$ are scalars.

At stage one, a probability sample $s$ of clusters is drawn from $C$ according to a fixed size design $p_I(\cdot)$, where $p_I(s)$ is the probability of drawing the sample $s$ from $C$. Let $m$ be the size of $s$. The cluster inclusion probabilities $\pi_i = \Pr\{i \in s\} = \sum_{s:\, i \in s} p_I(s)$ and $\pi_{ij} = \Pr\{i, j \in s\} = \sum_{s:\, i, j \in s} p_I(s)$ are assumed to be strictly positive.

For every sampled cluster $i \in s$, a probability sample $s_i$ of elements is drawn from $U_i$ according to a fixed size design $p_i(\cdot)$ with inclusion probabilities $\pi_{k|i}$ and $\pi_{kl|i}$. That is, $p_i(s_i)$ is the probability of drawing $s_i$ from $U_i$ given that the $i$th cluster is chosen at stage one. The size of $s_i$ is denoted $n_i$. Assume that $\pi_{k|i} = \Pr\{k \in s_i \mid s \ni i\} = \sum_{s_i:\, k \in s_i} p_i(s_i)$ and $\pi_{kl|i} = \Pr\{k, l \in s_i \mid s \ni i\} = \sum_{s_i:\, k, l \in s_i} p_i(s_i)$ are strictly positive.

As is customary for two-stage sampling, we assume invariance and independence of the second-stage design. Invariance of the second-stage design means that for every $i$, and for every $s \ni i$, $p_i(\cdot \mid s) = p_i(\cdot)$. That is, the same within-cluster design is used whenever the $i$th cluster is selected, regardless of what other clusters are selected. Independence of the second-stage design means that subsampling in a given cluster is independent of subsampling in any other cluster.

The whole sample of elements and its size are $\bigcup_{i \in s} s_i$ and $\sum_{i \in s} n_i$, respectively. The study variable $y_k$ is observed for $k \in \bigcup_{i \in s} s_i$. The parameter to estimate is the population total $t_y = \sum_{k \in U} y_k = \sum_{i \in C} t_i$, where $t_i = \sum_{k \in U_i} y_k$ is the $i$th cluster total.

Let $I_i = 1$ if $i \in s$ and $I_i = 0$ otherwise. Note that $E_p[I_i] = E_I[E_{II}[I_i]] = E_I[I_i] = \pi_i$, where $E_p[\cdot]$ denotes expectation with respect to the sampling design, $E_I[\cdot]$ denotes expectation with respect to stage one, and $E_{II}[\cdot]$ denotes conditional expectation with respect to stage two given $s$. Also, $V_I(\cdot)$ and $V_{II}(\cdot)$ denote variances with respect to stage one and two, respectively. Using this notation, an estimator $\hat t$ of $t$ is said to be design-unbiased if $E_p[\hat t] = t$.

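The simple expansion estimator of $t_y$ in two-stage element sampling is given by

$$\hat t_y = \sum_{i \in s} \frac{\hat t_i}{\pi_i} = \sum_{i \in C} \hat t_i \frac{I_i}{\pi_i}, \tag{1}$$

where

$$\hat t_i = \sum_{k \in s_i} \frac{y_k}{\pi_{k|i}}$$

is the Horvitz-Thompson (1952) estimator of $t_i$ with respect to the second stage of sampling. We will refer to (1) as the Horvitz-Thompson (HT) estimator. Since $\hat t_i$ is design-unbiased for $t_i$, the HT estimator $\hat t_y$ is design-unbiased for $t_y$. The variance of the HT estimator $\hat t_y$ under the sampling design can be written as the sum of two components,

$$\mathrm{Var}_p(\hat t_y) = V_I\left(E_{II}[\hat t_y]\right) + E_I\left[V_{II}(\hat t_y)\right] = \sum_{i,j \in C} (\pi_{ij} - \pi_i \pi_j) \frac{t_i}{\pi_i} \frac{t_j}{\pi_j} + \sum_{i \in C} \frac{V_i}{\pi_i}, \tag{2}$$

where

$$V_i = V_{II}(\hat t_i) = \sum_{k,l \in U_i} (\pi_{kl|i} - \pi_{k|i} \pi_{l|i}) \frac{y_k}{\pi_{k|i}} \frac{y_l}{\pi_{l|i}}$$

is the variance of $\hat t_i$ with respect to stage two. A design-unbiased estimator of $V_i$ is given by

$$\hat V_i = \sum_{k,l \in s_i} \frac{\pi_{kl|i} - \pi_{k|i} \pi_{l|i}}{\pi_{kl|i}} \frac{y_k}{\pi_{k|i}} \frac{y_l}{\pi_{l|i}}.$$

Note that $V_i$ is non-random due to invariance. Note also that the result for single-stage cluster sampling, in which all elements in each selected cluster are observed, is obtained if we set $\hat t_i = t_i$ and $V_i = \hat V_i = 0$ for all $i \in C$. See, for example, Särndal, Swensson, and Wretman (1992, Result 4.3.1).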
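As a small illustration of estimator (1), the following sketch computes the two-stage HT estimate under the assumption of simple random sampling without replacement at both stages, so that $\pi_i = m/M$ and $\pi_{k|i} = n_i/N_i$. The function name `ht_two_stage` and its argument layout are hypothetical choices made for this example, not part of the paper.

```python
import numpy as np

def ht_two_stage(y_samples, N_sizes, M, m):
    """Two-stage Horvitz-Thompson estimate of the population total, as in (1).

    Assumes simple random sampling without replacement at both stages
    (an illustrative design choice), so pi_i = m / M and pi_{k|i} = n_i / N_i.

    y_samples : list of 1-D arrays, observed y_k for each sampled cluster
    N_sizes   : cluster sizes N_i of the sampled clusters, in the same order
    M, m      : number of clusters in the population and in the sample
    """
    pi_i = m / M
    # Second-stage HT estimators of the cluster totals: sum of y_k / (n_i / N_i)
    t_hat = np.array([N_i / len(y) * np.sum(y)
                      for y, N_i in zip(y_samples, N_sizes)])
    # First-stage expansion of the estimated cluster totals
    return np.sum(t_hat / pi_i), t_hat

# Two of M = 4 clusters sampled; the first has N_i = 4 with n_i = 2 observed,
# the second has N_i = 2 with n_i = 1 observed.
total, t_hat = ht_two_stage([np.array([1.0, 2.0]), np.array([3.0])], [4, 2], M=4, m=2)
print(t_hat, total)
```

With complete enumeration at both stages ($m = M$ and $n_i = N_i$), the function returns the exact population total, which is a quick sanity check on the weights.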

1.2 Local Polynomial Regression Estimator

The model-assisted approach to using auxiliary information $\{x_i\}_{i \in C}$ is to assume as a working model that the finite population point scatter $\{(x_i, t_i)\}_{i \in C}$ is a realization from an infinite superpopulation model $\xi$, in which

$$t_i = \mu(x_i) + \varepsilon_i, \tag{3}$$

where the $\varepsilon_i$ are independent random variables with $E_\xi[\varepsilon_i] = 0$ and $\mathrm{Var}_\xi(\varepsilon_i) = \nu(x_i)$. Typically, both $\mu(x)$ and $\nu(x)$ are taken to be parametric functions of $x$, such as the linear specification $\mu(x) = \sum_j \beta_j a_{\mu,j}(x)$ and $\nu(x) = \sum_j \lambda_j a_{\nu,j}(x)$, where the $a_{\mu,j}$ and $a_{\nu,j}$ are known functions and the $\beta_j$ and $\lambda_j$ are unknown parameters. A variety of heteroskedastic polynomial regression models could be specified in this way (e.g., Särndal, Swensson, and Wretman, 1992, Section 8.4).

As mentioned in Section 1, the model-assisted methodology offers efficiency gains if the working model describes the finite population point scatter reasonably well. The problem is that, in an actual survey, there is not a single point scatter, but many, corresponding to different study variables $t_i$. Standard survey practice is to use the working model to construct one set of weights that reflects the design and the auxiliary information in $\{x_i\}$, and apply this one set of weights to all study variables. Thus, it is critical to keep the model specification flexible.

This is the motivation for the nonparametric approach that we employ. Rather than specify a parametric model, we assume only that $\mu(x)$ is a smooth function of $x$. This nonparametric working model has the potential to offer efficiency gains for a greater variety of study variables than the parametric model, while maintaining most of the efficiency of the parametric regression estimator if the parametric model is correct.

We now introduce some further notation used in the nonparametric regression. Let $K$ denote a kernel function and $h_M$ denote its bandwidth. Let $t_C = [t_i]_{i \in C}$ be the vector of $t_i$'s in the population of clusters. Define the $M \times (q+1)$ matrix

$$X_{Ci} = \begin{bmatrix} 1 & x_1 - x_i & \cdots & (x_1 - x_i)^q \\ \vdots & \vdots & & \vdots \\ 1 & x_M - x_i & \cdots & (x_M - x_i)^q \end{bmatrix} = \left[\, 1 \;\; x_j - x_i \;\; \cdots \;\; (x_j - x_i)^q \,\right]_{j \in C}$$

and define the $M \times M$ matrix

$$W_{Ci} = \mathrm{diag}\left\{ \frac{1}{h_M} K\left( \frac{x_j - x_i}{h_M} \right) \right\}_{j \in C}.$$

Let $e_r$ represent the $r$th column of the identity matrix. The local polynomial regression estimator of $\mu(x_i)$, based on the entire finite population of clusters, is then given by

$$\mu_i = e_1' \left( X_{Ci}' W_{Ci} X_{Ci} \right)^{-1} X_{Ci}' W_{Ci}\, t_C = w_{Ci}' t_C, \tag{4}$$

which is well-defined as long as $X_{Ci}' W_{Ci} X_{Ci}$ is invertible. This is the traditional local polynomial kernel estimator described in, e.g., Wand and Jones (1995).

If these $\mu_i$'s were known, then a design-unbiased estimator of $t_y$ would be the two-stage analogue of the generalized difference estimator (Särndal, Swensson, and Wretman, 1992, p. 222),

$$t_y^{*} = \sum_{i \in s} \frac{\hat t_i - \mu_i}{\pi_i} + \sum_{i \in C} \mu_i. \tag{5}$$

The design variance of (5) is

$$\mathrm{Var}_p(t_y^{*}) = \sum_{i,j \in C} (\pi_{ij} - \pi_i \pi_j) \frac{t_i - \mu_i}{\pi_i} \frac{t_j - \mu_j}{\pi_j} + \sum_{i \in C} \frac{V_i}{\pi_i}, \tag{6}$$

which depends on residuals from the nonparametric regression and hence is expected to be smaller than (2). Note that, since a model is assumed for the cluster totals but not for the individual observations, only the variance component at the cluster level in (6) is affected by the model.
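The population-level fit (4) can be sketched numerically as follows. This is a minimal illustration, assuming an Epanechnikov kernel (the kernel used later in the simulations) and scalar $x_i$; the function names `epanechnikov`, `smoother_vector`, and `population_fit` are hypothetical and chosen only for this example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def smoother_vector(x, x0, h, q=1):
    """Return w such that w @ t is the degree-q local polynomial fit at x0.

    Implements e_1' (X' W X)^{-1} X' W with X built from powers of (x_j - x0)
    and W = diag{K((x_j - x0) / h) / h}.
    """
    d = x - x0
    X = np.vander(d, q + 1, increasing=True)   # columns 1, d, ..., d^q
    k = epanechnikov(d / h) / h                # diagonal of W
    XtW = X.T * k                              # X' W without forming diag(W)
    # First row of (X' W X)^{-1} X' W; fails if X' W X is singular
    return np.linalg.solve(XtW @ X, XtW)[0]

def population_fit(x, t, h, q=1):
    """Local polynomial fits mu_i at every x_i, as in (4)."""
    return np.array([smoother_vector(x, xi, h, q) @ t for xi in x])

x = np.linspace(0.0, 1.0, 50)
t = 1.0 + 2.0 * (x - 0.5)              # cluster totals lying exactly on a line
mu = population_fit(x, t, h=0.25, q=1)
print(np.max(np.abs(mu - t)))          # local linear fit reproduces a line
```

Because a local polynomial of degree $q$ reproduces polynomials of degree at most $q$, the residuals $t_i - \mu_i$ in (6) vanish when the cluster totals follow such a polynomial exactly, which is the source of the efficiency gain over (2).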

In the present context, the population estimator $\mu_i$ cannot be calculated because only the $y_k$ in $\bigcup_{i \in s} s_i$ are known. Therefore, we will replace each $\mu_i$ by a sample-based consistent estimator. Let $\hat t_s = [\hat t_i]_{i \in s}$ be the vector of $\hat t_i$'s obtained in the sample of clusters. Define the $m \times (q+1)$ matrix

$$X_{si} = \left[\, 1 \;\; x_j - x_i \;\; \cdots \;\; (x_j - x_i)^q \,\right]_{j \in s}, \tag{7}$$

and define the $m \times m$ matrix

$$W_{si} = \mathrm{diag}\left\{ \frac{1}{\pi_j h_M} K\left( \frac{x_j - x_i}{h_M} \right) \right\}_{j \in s}. \tag{8}$$

A design-based sample estimator of $\mu_i$ is then given by

$$\hat\mu_i^{o} = e_1' \left( X_{si}' W_{si} X_{si} \right)^{-1} X_{si}' W_{si}\, \hat t_s = w_{si}^{o\prime}\, \hat t_s, \tag{9}$$

as long as $X_{si}' W_{si} X_{si}$ is invertible. This estimator differs from traditional local polynomial regression because of the inclusion of the sampling weights and the fact that the cluster totals are estimated, not observed. In a design-based context, these adjustments imply that $\hat\mu_i^{o}$ is an estimator of $\mu_i$, the population fit, but not an estimator of $\mu(x_i)$, the model mean at $x_i$. Substituting $\hat t_i$ and $\hat\mu_i^{o}$ respectively for $t_i$ and $\mu_i$ in (5), we have the local polynomial regression estimator for the population total of $y$,

$$\tilde t_y^{\,o} = \sum_{i \in s} \frac{\hat t_i - \hat\mu_i^{o}}{\pi_i} + \sum_{i \in C} \hat\mu_i^{o}. \tag{10}$$

In theory, the estimator (9) can be undefined for some $i \in C$ even if the population estimator in (4) is well-defined. As in Breidt and Opsomer (2000), we will consider an adjusted sample estimator for the theoretical derivations in Section 2. The adjusted sample estimator for $\mu_i$ is given by

$$\hat\mu_i = e_1' \left( X_{si}' W_{si} X_{si} + \mathrm{diag}\left\{ \frac{\delta}{M^2} \right\}_{j=1}^{q+1} \right)^{-1} X_{si}' W_{si}\, \hat t_s = w_{si}' \hat t_s, \tag{11}$$

for some small $\delta > 0$. The value $\delta M^{-2}$ in (11) is a small order adjustment that guarantees the estimator's existence for any $s \subset C$, as long as the population estimator in (4) is defined for all $i \in C$. This adjustment was also used by Fan (1992) in the study of the theoretical properties of local polynomial regression. We let

$$\tilde t_y = \sum_{i \in s} \frac{\hat t_i - \hat\mu_i}{\pi_i} + \sum_{i \in C} \hat\mu_i \tag{12}$$

denote the local polynomial regression estimator that uses the adjusted sample estimator in (11). The estimator for single-stage cluster sampling is obtained if we set $\hat t_i = t_i$ for all $i \in C$.

Modeling the cluster totals as in (3) is not the only possible approach. Another possibility is to model the cluster means as

$$N_i^{-1} t_i = \alpha(x_i) + \varepsilon_i, \tag{13}$$

where the $\varepsilon_i$ are independent random variables with mean zero and variance $\nu(x_i)$, $\alpha(x)$ is smooth, and $\nu(x)$ is smooth and strictly positive. In this case $\hat\mu_i$ in (12) would be replaced by $N_i \hat\alpha_i$, where the $\hat\alpha_i$ are obtained via nonparametric regression of $N_i^{-1} \hat t_i$ on $x_i$, using the local design matrix (7) and local weighting matrix (8). Model (13) will not be considered further in this paper.

1.3 Assumptions

To prove our theoretical results, we adopt an asymptotic framework in which both the population number of clusters, $M$, and the sample number of clusters, $m$, tend to infinity. The number of elements within each cluster, $N_i$, remains bounded, so that no cluster dominates the population. Subsampling within selected clusters is carried out as described in Section 1.1. We make the following additional assumptions on the study variable, the design, and the smoothing methodology:

A1 Distribution of the errors under $\xi$: the errors $\varepsilon_i$ are independent and have mean zero, variance $\nu(x_i)$, and compact support, uniformly for all $M$.

A2 For each $M$, the $x_i$ are considered fixed with respect to the superpopulation model $\xi$. The $x_i$ are independent and identically distributed $F(x) = \int_{-\infty}^{x} f(t)\,dt$, where $f(\cdot)$ is a density with compact support $[a_x, b_x]$ and $f(x) > 0$ for all $x \in [a_x, b_x]$.

A3 The mean function $\mu$ is continuous on $[a_x, b_x]$.

A4 Kernel $K$: the kernel $K(\cdot)$ has compact support $[-1, 1]$, is symmetric and continuous, and satisfies $\int_{-1}^{1} K(u)\,du = 1$.

A5 First-stage sampling rate $mM^{-1}$, bandwidth $h_M$, and cluster size $N_i$: as $M \to \infty$, $mM^{-1} \to \pi \in (0, 1)$, $h_M \to 0$, $M h_M^2 / (\log\log M) \to \infty$, and $N_i$ is uniformly bounded above for all clusters and for all $M$.

A6 First-stage (cluster) inclusion probabilities $\pi_i$ and $\pi_{ij}$: for all $M$, $\min_{i \in C} \pi_i \ge \lambda > 0$, $\min_{i,j \in C} \pi_{ij} \ge \lambda^{*} > 0$ and

$$\limsup_{M \to \infty}\; m \max_{i,j \in C:\, i \ne j} |\pi_{ij} - \pi_i \pi_j| < \infty.$$

A7 Additional assumptions involving higher-order first-stage inclusion probabilities:

$$\lim_{M \to \infty} m^2 \max_{(i_1, i_2, i_3, i_4) \in D_{4,M}} \left| E_I\left[ (I_{i_1} - \pi_{i_1})(I_{i_2} - \pi_{i_2})(I_{i_3} - \pi_{i_3})(I_{i_4} - \pi_{i_4}) \right] \right| < \infty,$$

where $D_{t,M}$ denotes the set of all distinct $t$-tuples $(i_1, i_2, \ldots, i_t)$ from $C$,

$$\lim_{M \to \infty} \max_{(i_1, i_2, i_3, i_4) \in D_{4,M}} \left| E_I\left[ (I_{i_1} I_{i_2} - \pi_{i_1 i_2})(I_{i_3} I_{i_4} - \pi_{i_3 i_4}) \right] \right| = 0,$$

$$\limsup_{M \to \infty}\; m \max_{(i_1, i_2, i_3) \in D_{3,M}} \left| E_I\left[ (I_{i_1} - \pi_{i_1})^2 (I_{i_2} - \pi_{i_2})(I_{i_3} - \pi_{i_3}) \right] \right| < \infty,$$

and

$$\limsup_{M \to \infty}\; m^2 \max_{(i_1, i_2, i_3) \in D_{3,M}} \left| E_I\left[ (I_{i_1} - \pi_{i_1})(I_{i_2} - \pi_{i_2})(I_{i_3} - \pi_{i_3}) \right] \right| < \infty.$$

A8 The second-stage design is invariant and independent, with $n_i \ge 1$ for every $i \in s$ and for every possible first-stage sample $s$. Further, the second-stage inclusion probabilities are uniformly bounded away from zero for all clusters and all $M$.

A9 The second-stage joint inclusion probabilities are uniformly bounded away from zero for all clusters and all $M$.

Remarks:

1. The assumptions in A1 and A3 are weaker than those in standard kernel regression, because we are not attempting to estimate the superpopulation mean function $\mu(\cdot)$, but only the finite population nonparametric fits, $\{\mu_i\}_{i=1}^{M}$.

2. Assumptions A1-A7 are adapted from Breidt and Opsomer (2000) to the two-stage sampling case. The last expression added in A7 for this case becomes $N^2 \left( \rho_3 - 3\rho_2 \rho_1 + 2\rho_1^3 \right) = O(1)$, with the notation that $\rho_k$ is the $k$th order inclusion probability of $k$ distinct elements under simple random sampling without replacement. Straightforward extension of the results in that paper shows that the design assumptions will hold for simple random sampling of clusters, stratified simple random sampling of clusters, and related designs.

3. The subsampling design can be quite general, but is subject to the mild restrictions imposed by A8 (for consistent total estimation) and A9 (for consistent variance estimation). In particular, A8 together with the bounded cluster sizes and bounded error terms implies that $t_i$ and $V_i$ are uniformly bounded for all clusters and all $M$, which will be used in several of the proofs. If clusters are completely enumerated, then A8 and A9 are satisfied trivially, and the results of Breidt and Opsomer (2000) can be applied directly. See Remark (v) of Section 1.3 of that paper.

4. For the second stage, we assume in A8 that $n_i \ge 1$ for every $i \in s$, and for every possible first-stage sample $s$, so that $\hat t_i$ and $\hat V_i$ are well-defined. Alternatively, we can let $\hat t_i = 0$ and $\hat V_i = 0$ when $n_i = 0$ for $i \in s$ to make $\hat t_i$ and $\hat V_i$ well-defined.

2 Main Results

2.1 Weighting and Calibration

The nonparametric regression estimator can be expressed as a linear combination of the study variables, with weights that do not depend on the study variables. These weights are extremely useful in practice. From (11) and (12), note that

$$\tilde t_y = \sum_{i \in s} \left\{ \frac{1}{\pi_i} + \sum_{j \in C} \left( 1 - \frac{I_j}{\pi_j} \right) w_{sj}' e_i \right\} \hat t_i = \sum_{i \in s} \omega_{is}\, \hat t_i = \sum_{i \in s} \sum_{k \in s_i} \frac{\omega_{is}}{\pi_{k|i}}\, y_k. \tag{14}$$

Thus, $\tilde t_y$ is a linear combination of the $\hat t_i$'s in $s$, with cluster weights $\{\omega_{is}\}$ that are the sampling weights of clusters, suitably modified to reflect auxiliary information $[x_i]_{i \in C}$. Alternatively, $\tilde t_y$ is a linear combination of the $y_k$'s in $\bigcup_{i \in s} s_i$, with element weights $\{\omega_{is} \pi_{k|i}^{-1}\}$ that reflect both the design and the auxiliary information. Because both sets of weights are independent of the study variables, they can be applied to any study variable of interest.

In particular, the weights $\omega_{is}$ could be applied to $x_i^l$. If $\delta = 0$, then $\omega_{is} = \omega_{is}^{o}$ and

$$\sum_{i \in s} \omega_{is}^{o}\, x_i^l = \sum_{i \in C} x_i^l$$

for $l = 0, 1, \ldots, q$. That is, the weights are exactly calibrated to the $q + 1$ known control totals $M, t_x, \ldots, t_{x^q}$. If $\mu(x_i)$ is exactly a $q$th degree polynomial, then the unconditional expectation (with respect to design and model) of $\tilde t_y^{\,o} - t_y$ is exactly zero. If $\delta \ne 0$, then this calibration property holds approximately.

2.2 Asymptotic Results

In general, the local polynomial regression estimator $\tilde t_y$ is not design-unbiased because the $\hat\mu_i$ are nonlinear functions of design-unbiased estimators. However, $\tilde t_y$ is asymptotically design-unbiased and design consistent under mild conditions.

Theorem 1 In two-stage element sampling, and under A1-A8, the local polynomial regression estimator

$$\tilde t_y = \sum_{i \in C} \left\{ (\hat t_i - \hat\mu_i) \frac{I_i}{\pi_i} + \hat\mu_i \right\}$$

is asymptotically design-unbiased (ADU) in the sense that

$$\lim_{M \to \infty} E_p\left[ \frac{\tilde t_y - t_y}{M} \right] = 0 \quad \text{with } \xi\text{-probability one},$$

and is design consistent in the sense that

$$\lim_{M \to \infty} E_p\left[ I_{\{|\tilde t_y - t_y| > M\eta\}} \right] = 0 \quad \text{with } \xi\text{-probability one}$$

for all $\eta > 0$.

Under the same conditions as in Theorem 1, we obtain the asymptotic design mean squared error of the local polynomial regression estimator $\tilde t_y$ in two-stage element sampling. The asymptotic design mean squared error consists of first- and second-stage variance components, and is equivalent to the variance of the generalized difference estimator, given in (6). As noted after equation (6) above, the second-stage variance is unaffected by the regression estimation at the cluster level.

Theorem 2 In two-stage element sampling, and under A1-A8,

$$m E_p\left( \frac{\tilde t_y - t_y}{M} \right)^2 = \frac{m}{M^2} \sum_{i,j \in C} (t_i - \mu_i)(t_j - \mu_j) \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} + \frac{m}{M^2} \sum_{i \in C} \frac{V_i}{\pi_i} + o(1).$$

The next result shows that the asymptotic design mean squared error can be estimated consistently under mild assumptions.

Theorem 3 In two-stage element sampling, and under A1-A9,

$$\lim_{M \to \infty} m E_p \left| \hat V(M^{-1} \tilde t_y) - \mathrm{AMSE}(M^{-1} \tilde t_y) \right| = 0,$$

where

$$\hat V(M^{-1} \tilde t_y) = \frac{1}{M^2} \sum_{i,j \in C} (\hat t_i - \hat\mu_i)(\hat t_j - \hat\mu_j) \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \frac{I_i I_j}{\pi_{ij}} + \frac{1}{M^2} \sum_{i \in C} \hat V_i \frac{I_i}{\pi_i}$$

and

$$\mathrm{AMSE}(M^{-1} \tilde t_y) = \frac{1}{M^2} \sum_{i,j \in C} (t_i - \mu_i)(t_j - \mu_j) \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} + \frac{1}{M^2} \sum_{i \in C} \frac{V_i}{\pi_i}.$$
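The weighting and calibration property of Section 2.1 can be checked numerically. The sketch below computes the cluster weights $\omega_{is}$ of (14) with $\delta = 0$ and verifies that they reproduce the control totals $\sum_{i \in C} x_i^l$ for $l = 0, 1$ when $q = 1$. It assumes an Epanechnikov kernel and equal first-stage inclusion probabilities; the function names and the systematic sample used in the example are hypothetical illustration choices.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def lpr_cluster_weights(x_pop, sample_idx, pi, h, q=1):
    """Cluster weights omega_is of (14), taking delta = 0.

    x_pop      : auxiliary values x_i for all M clusters
    sample_idx : integer indices of the m sampled clusters
    pi         : first-stage inclusion probabilities pi_i for all M clusters
    """
    x_s, pi_s = x_pop[sample_idx], pi[sample_idx]
    in_sample = np.zeros(len(x_pop), dtype=bool)
    in_sample[sample_idx] = True
    omega = 1.0 / pi_s                                 # design weights 1 / pi_i
    for j, xj in enumerate(x_pop):
        d = x_s - xj
        X = np.vander(d, q + 1, increasing=True)       # local design matrix (7)
        k = epanechnikov(d / h) / (pi_s * h)           # diagonal of W_sj in (8)
        XtW = X.T * k
        w_sj = np.linalg.solve(XtW @ X, XtW)[0]        # smoother vector at x_j
        omega += (1.0 - in_sample[j] / pi[j]) * w_sj   # regression adjustment
    return omega

M, m = 200, 40
x_pop = np.linspace(0.0, 1.0, M)
sample_idx = np.arange(0, M, M // m)                   # illustrative sample
pi = np.full(M, m / M)
omega = lpr_cluster_weights(x_pop, sample_idx, pi, h=0.25, q=1)

print(omega.sum(), M)                                  # calibrated to M
print(omega @ x_pop[sample_idx], x_pop.sum())          # calibrated to t_x
```

Given estimated cluster totals `t_hat` for the sampled clusters, the estimate of the population total is then `omega @ t_hat`, and the same `omega` can be reused for every study variable.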

Therefore, $\hat V(M^{-1} \tilde t_y)$ is asymptotically design-unbiased and design consistent for $\mathrm{AMSE}(M^{-1} \tilde t_y)$. Using the weighted residual technique (Särndal, Swensson, and Wretman, 1989), we could construct an alternative variance estimator with the local polynomial regression weights $\omega_{is}$ in (14),

$$\hat V_w(M^{-1} \tilde t_y) = \frac{1}{M^2} \left\{ \sum_{i,j \in s} \omega_{is} (\hat t_i - \hat\mu_i)\, \omega_{js} (\hat t_j - \hat\mu_j) \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} + \sum_{i \in s} \omega_{is}^2\, \hat V_i \right\}.$$

Analogous results for the generalized regression estimator are given in Särndal, Swensson, and Wretman (1992).

3 Simulation Results

We performed some simulation experiments in order to compare the performance of the local polynomial regression estimator in two-stage element sampling with that of several parametric and nonparametric estimators. The estimators considered are the same as those in Breidt and Opsomer (2000) adapted to the two-stage case, and are denoted as follows:

HT: Horvitz-Thompson; equation (1)
REG: linear regression; Särndal, Swensson, and Wretman (1992, p. 309)
REG3: cubic regression
PS: poststratification; Cochran (1977, p. 134)
LPR0: local polynomial with q = 0; equation (12)
LPR1: local polynomial with q = 1; equation (12)
KERN: model-based nonparametric; Dorfman (1992)
CDW: bias-calibrated nonparametric; Chambers, Dorfman, and Wehrly (1993)

The first four estimators are parametric and the last four are nonparametric. Of the parametric estimators, HT is purely design-based and REG and REG3 are model-assisted. For the poststratification estimator, we divide the $x$-range into ten equally-spaced strata. The number of poststrata was chosen to ensure a very small probability of empty poststrata. Among the four nonparametric estimators, LPR0 and LPR1 are model-assisted and KERN and CDW are model-based.

KERN and CDW considered here are extended versions of estimators proposed in Dorfman (1992) and Chambers et al. (1993) to two-stage element sampling with auxiliary information available for all clusters. Since the cluster totals $t_i$ are unknown for sampled clusters $i \in s$, the HT estimators $\hat t_i$ are instead used to construct KERN and CDW. In KERN, the mean function is estimated via nonparametric regression of cluster total estimators $\hat t_s = [\hat t_i]_{i \in s}$ on $\{x_i\}_{i \in s}$, and this estimated mean function is used to predict each non-sampled cluster total $t_i$. In CDW, we take $\mu(x) = x\beta$, $\nu(x) = \sigma^2$ as a working parametric model $\xi$. Each non-sampled cluster total $t_i$ is first predicted by estimating its parametric mean function under $\xi$ with cluster total estimators $\hat t_s = [\hat t_i]_{i \in s}$, and then its bias is predicted using nonparametric regression to define a predictor of the cluster total robust to misspecification of the working model. Note that the robust predictor can equally be viewed as a bias-adjusted version of a nonparametric predictor of $t_i$ under the working model $\xi$. In both KERN and CDW, the Nadaraya-Watson estimator is used.

The Epanechnikov kernel,

$$K(t) = \frac{3}{4} (1 - t^2)\, I_{\{|t| \le 1\}},$$

and two bandwidth values (0.1 and 0.25) are used for all nonparametric estimators. The first bandwidth is equal to the poststratum width and the second is based on an ad hoc rule of 1/4th the data range. Bandwidth selection for local polynomial regression will be explored at a later date.

Following Breidt and Opsomer (2000), we consider several mean functions of the cluster totals:

linear: $\mu_1(x) = 1 + 2(x - 0.5)$,
quadratic: $\mu_2(x) = 1 + 2(x - 0.5)^2$,
bump: $\mu_3(x) = 1 + 2(x - 0.5) + \exp(-200(x - 0.5)^2)$,
jump: $\mu_4(x) = \{1 + 2(x - 0.5) I_{\{x \le 0.65\}}\} I_{\{x > 0.65\}}$,
exponential: $\mu_5(x) = \exp(-8x)$,
cycle1: $\mu_6(x) = 2 + \sin(2\pi x)$,
cycle4: $\mu_7(x) = 2 + \sin(8\pi x)$,

with $x \in [0, 1]$. These represent a range of correct and incorrect model specifications for the various estimators considered. For $\mu_1$, REG and CDW are expected to perform better than others because the model is correctly specified. The mean function $\mu_2$ is quadratic, so that it is smooth but far from linear. The function $\mu_3$ is smooth and nearly linear, $\mu_4$ is not smooth, and $\mu_5$ is an exponential curve. The functions $\mu_6$ and $\mu_7$ are sinusoids with period 1 and 0.25, respectively.

The population $x_i$ of size $M = 1000$ are generated as independent and identically distributed (iid) uniform(0,1) random variables. For each generated value $x_i$ and each study variable $j = 1, \ldots, 7$, $N_i$ element values are generated as

$$y_{jk} = \frac{\mu_j(x_i)}{N_i} + \frac{\varepsilon_{jk}}{N_i^{1/2}}, \qquad \{\varepsilon_{jk}\} \text{ iid } N(0, \sigma^2),$$

where $k \in U_i$. Thus $t_{ji}$ has mean $\mu_j(x_i)$ and variance $\nu_j(x_i) = \sigma^2$. Two values for the standard deviation of the errors are used: $\sigma = 0.1$ and $0.4$.

At stage one, a sample of clusters is first generated by simple random sampling with sample size $m = 100$ and then, at stage two, subsamples of elements within each selected cluster are generated by simple random sampling using sample size $n_i$. We have considered three cases with different second-stage sampling rates: constant cluster size $N_i = 100$ with $n_i = 10$, constant cluster size $N_i = 100$ with $n_i = 100$, and random cluster size $N_i$ distributed as Poisson(3) + 1 with $n_i = \lfloor 0.5 N_i \rfloor + 1$, where $\lfloor a \rfloor$ denotes the integer part of $a$. The results for constant cluster size $N_i = 100$ with $n_i = 100$ are similar to those for the element sampling case with sampling rate 0.1 in Breidt and Opsomer (2000). In the case of constant cluster size $N_i = 100$ with $n_i = 10$, the local linear regression estimator does not gain a large amount of efficiency, since it is relatively difficult to find an identifiable pattern in the plot of the relationship between the auxiliary variable $x_i$ and estimated cluster total $\hat t_i$ due to the low second-stage sampling rate. As the second-stage sampling rate increases, the local linear regression estimator gains more improvement in efficiency over the other estimators. Here, we only report on the experiment with the random cluster sizes. Such clusters of moderate and variable size might be encountered in a household survey, for instance.

For each combination of mean function, standard deviation and bandwidth, 1000 replicate two-stage element samples from the finite population are selected and then the estimators are calculated. Table 1 shows the ratios of design mean squared errors (MSEs) for all estimators mentioned above to that for the local polynomial regression estimator with $q = 1$ (LPR1).

Overall, the performance of the LPR1 estimator is good, particularly at the small value of $\sigma$. As is expected, REG and CDW perform best for the linear study variable. In general, both parametric and nonparametric estimators perform better than the HT estimator for all study variables and have more efficiency at the small value of $\sigma$. Among the parametric estimators, REG3 and PS generally perform better than REG except in the linear study variable. LPR1 is competitive with or better than the parametric estimators in most cases, but PS in the bump study variable and REG3 and PS in the cycle1 and cycle4 study variables are much better than the oversmoothed LPR1 estimator. Compared to other nonparametric estimators, LPR1 is competitive or better in most cases, with the efficiency gain depending on study variable, bandwidth, and their interaction.

To assess further the effect of bandwidth on the nonparametric estimators, we considered three large bandwidths $h = 0.5$, $1.0$, and $1.5$, but do not table the results here. As the bandwidth becomes large, LPR1 becomes equivalent to REG and the performance of LPR0 and KERN becomes similar to that of HT, as expected theoretically. CDW becomes theoretically equivalent to the classical regression estimator with a linear fit through the origin, which is less efficient than REG, as the bandwidth becomes large.

In summary, LPR1 is at least as good as HT for all study variables, bandwidths, and noise levels, and is sometimes much better. LPR1 is more efficient than REG for all study variables but linear, while being nearly as efficient in the linear case. LPR1 is at least as good as the remaining parametric estimators (REG3 and PS), with isolated exceptions in which the parametric specification is very nearly correct and LPR oversmooths. Finally, LPR1 dominates the other nonparametric estimators; it is sometimes much better than each, and is never much worse (MSE ratios $\ge 0.93$).
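The data-generating step of the simulation with random cluster sizes can be sketched as follows. This is only an illustration of the recipe described above, not the authors' simulation code: the names `MEAN_FUNCTIONS`, `generate_population`, and `second_stage_sizes` are hypothetical, the random seed is arbitrary, and the jump function is left out of the sketch.

```python
import numpy as np

# Mean functions of the cluster totals on [0, 1] (jump function omitted here).
MEAN_FUNCTIONS = {
    "linear": lambda x: 1 + 2 * (x - 0.5),
    "quadratic": lambda x: 1 + 2 * (x - 0.5) ** 2,
    "bump": lambda x: 1 + 2 * (x - 0.5) + np.exp(-200 * (x - 0.5) ** 2),
    "exponential": lambda x: np.exp(-8 * x),
    "cycle1": lambda x: 2 + np.sin(2 * np.pi * x),
    "cycle4": lambda x: 2 + np.sin(8 * np.pi * x),
}

def generate_population(mean_fn, sigma, M=1000, seed=0):
    """Generate x_i ~ uniform(0, 1), N_i ~ Poisson(3) + 1, and element values
    y_k = mu(x_i) / N_i + eps_k / sqrt(N_i) with eps_k ~ N(0, sigma^2),
    so each cluster total has mean mu(x_i) and variance sigma^2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=M)
    N = rng.poisson(3, size=M) + 1
    y = [mean_fn(xi) / Ni + rng.normal(0.0, sigma, size=Ni) / np.sqrt(Ni)
         for xi, Ni in zip(x, N)]
    return x, N, y

def second_stage_sizes(N):
    """Second-stage sample sizes n_i = floor(0.5 N_i) + 1."""
    return (0.5 * N).astype(int) + 1

x, N, y = generate_population(MEAN_FUNCTIONS["cycle1"], sigma=0.1)
t = np.array([yi.sum() for yi in y])   # true cluster totals t_i
n = second_stage_sizes(N)
print(len(x), N.min(), N.max(), n.min(), n.max())
```

Scaling the element-level errors by $N_i^{-1/2}$ is what keeps the variance of each cluster total equal to $\sigma^2$ regardless of the cluster size.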

4 Example: Erosion study from the National Resources Inventory

In this section, we apply local polynomial regression estimation to data from the 1995 National Resources Inventory Erosion Update Study (see Breidt and Fuller, 1999). The National Resources Inventory (NRI) is a stratified two-stage area sample of the agricultural lands in the United States conducted by the Natural Resources Conservation Service (NRCS) of the U.S. Department of Agriculture (Breidt, 2001). The 1995 Erosion Update Study was a smaller-scale study using NRI information as frame material.

In the 1995 study, first-stage sampling strata were 14 states in the Midwest and Great Plains regions and primary sampling units (PSUs) were counties within states. A categorical variable was used for within-county stratification in second-stage sampling. Second-stage sampling units (SSUs) were NRI segments of land, 160 acres in size. The auxiliary variable for each county $i$ was $x_i$, the square root of a size measure of land with erosion potential. (We used the square root to reduce the sparseness of points in the regressor space.) The variables of interest were two kinds of erosion measurements, roughly characterized as wind erosion (WEQ) and water erosion (USLE).

At stage one, a sample of 213 counties was selected by stratified sampling from the population of 1357 counties, with probability proportional to $x_i^2$. Subsamples of NRI segments within the selected counties were selected by stratified unequal probability sampling at stage two. In total, 1900 NRI segments were selected.

The Horvitz-Thompson (HT), linear regression (REG), and local linear regression (LPR1) estimates for WEQ and USLE totals and the corresponding variance estimates were calculated from the sample. We calculated REG estimates with three different variances of the errors ($\nu(x) \propto x^2$, $x^4$, and $x^8$), denoted by REG2, REG4, and REG8 respectively. Weighted regressions were used because the data displayed large amounts of heteroskedasticity (see Figure 1), which can have an effect on the parametric fit. The Epanechnikov kernel with three different bandwidths ($h = 1$, $3$, and $5$) was used for the LPR1 estimator. Because of data sparseness, the smallest allowable bandwidth for this example is (to the nearest tenth) $h = 1$.

Table 2 shows HT, REG and LPR1 estimates of WEQ and USLE totals and estimated standard errors. For each estimator, weights were constructed and applied to both study variables. Standard errors were estimated by assuming unequal-probability with-replacement sampling within design strata at stage one, and unequal-probability with-replacement sampling within clusters at stage two. Using the estimated standard errors as a guide, LPR1 with $h = 1$ performs best among all estimates and REG4 is best among REG estimates. The estimated function with $h = 1$ is quite rough, so we use $h = 3$ for further comparisons. Except at relatively large bandwidths (e.g. LPR1 with $h = 5$), LPR1 estimates are better than HT and REG estimates on the basis of estimated standard errors for both WEQ and USLE.

Figure 1 shows the relationship between $x_i$ = square root of size measure of land with erosion potential and estimated county total ($\hat t_i$) in sampled counties at stage one for WEQ and USLE, on both the original and square-root transformed vertical scales. In all plots, the weighted linear regression fit with variance proportional to $x^4$ (REG4) and the local linear regression fit with bandwidth $h = 3$ (LPR1) are included. (The square root transformation in the figure is included to make differences in the fits more discernible.) The LPR1 fit appears quite sensible. It is at least competitive with the REG estimators, if not better, but requires neither mean nor variance function specification.

23 The same weghts used for WEQ and USLE could be appled to any other study varables obtaned n the Eroson Update Study, wth effcency ncreases over HT f the varable s dependent on the eroson potental sze measure, and wth effcency ncreases over REG f the dependence s non-lnear. 5 Concluson We have developed a nonparametrc survey regresson methodology for two-stage fnte populaton samplng, n whch complete auxlary nformaton s avalable for all frst-stage samplng unts. The estmator s a lnear combnaton of cluster total estmators, wth weghts that are calbrated to known control totals. Ths weghted form s operatonally convenent. Further, the estmator has desrable theoretcal propertes ncludng asymptotc desgn-unbasedness and desgn consstency. Smulaton results show that the nonparametrc estmator domnates several parametrc estmators when the model regresson functon s ncorrectly specfed, whle beng nearly as effcent when the parametrc specfcaton s correct. In an applcaton to data from the 1995 Natonal Resources Inventory Eroson Update Study, the nonparametrc methodology compares favorably wth Horvtz-Thompson and classcal survey regresson estmates for wnd and water eroson. References Bredt, F.J. (2002). Natonal Resources Inventory (NRI), US. Encyclopeda of Envronmetrcs, vol.3, pages A.H. El-Shaaraw and W.W. Pegorsch, eds. Wley. 23

Breidt, F.J. and Fuller, W.A. (1999). Design of supplemented panel surveys with application to the National Resources Inventory. Journal of Agricultural, Biological, and Environmental Statistics 4.

Breidt, F.J. and Opsomer, J.D. (2000). Local polynomial regression estimators in survey sampling. Annals of Statistics 28.

Brewer, K.R.W. (1963). Ratio estimation in finite populations: some results deducible from the assumption of an underlying stochastic process. Australian Journal of Statistics 5.

Cassel, C.-M., Särndal, C.-E., and Wretman, J.H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63.

Cassel, C.-M., Särndal, C.-E., and Wretman, J.H. (1977). Foundations of Inference in Survey Sampling. Wiley, New York.

Chambers, R.L., Dorfman, A.H., and Wehrly, T.E. (1993). Bias robust estimation in finite populations using nonparametric calibration. Journal of the American Statistical Association 88.

Cochran, W.G. (1977). Sampling Techniques, 3rd ed. Wiley, New York.

Dorfman, A.H. (1992). Nonparametric regression for estimating totals in finite populations. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Fan, J. (1992). Design-adaptive nonparametric regression. Journal of the American Statistical Association 87.

Fuller, W.A. (1996). Introduction to Statistical Time Series, second edition. Wiley, New York.

Holt, D. and Smith, T.M. (1979). Post stratification. Journal of the Royal Statistical Society, Series A 142.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47.

Robinson, P.M. and Särndal, C.-E. (1983). Asymptotic properties of the generalized regression estimation in probability sampling. Sankhyā: The Indian Journal of Statistics, Series B 45.

Royall, R.M. (1970). On finite population sampling under certain linear regression models. Biometrika 57.

Särndal, C.-E. (1980). On π-inverse weighting versus best linear unbiased weighting in probability sampling. Biometrika 67.

Särndal, C.-E., Swensson, B., and Wretman, J.H. (1989). The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76.

Särndal, C.-E., Swensson, B., and Wretman, J. (1992). Model Assisted Survey Sampling. Springer, New York.

Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman and Hall, London.

Wu, C. and Sitter, R.R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association 96.

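Appendix: Technical Derivations

In this appendix, we first state and prove three lemmas, then prove the theorems of Section 2. The proofs, like those in Breidt and Opsomer (2000), involve straightforward but tedious bounding arguments.

In order to examine the design properties of the local polynomial regression estimator, we use the Taylor linearization technique. Note first that $\mu_i$ and $\hat\mu_i$ can be expressed as functions of population means; that is, for some function $f$, $\mu_i = f(M^{-1}s_i, 0)$ and $\hat\mu_i = f(M^{-1}\hat s_i, \delta)$, where

\[
s_i = \begin{bmatrix} s_{i1} \\ s_{i2} \end{bmatrix}
\quad\text{and}\quad
\hat s_i = \begin{bmatrix} \hat s_{i1} \\ \hat s_{i2} \end{bmatrix},
\]

\[
s_{i1} = [s_{i1g}]_{g=1}^{G_1}
= \left[\sum_{k\in C}\frac{1}{h}K\!\left(\frac{x_k-x_i}{h}\right)(x_k-x_i)^{g-1}\right]_{g=1}^{G_1}
= \left[\sum_{k\in C} z_{igk}\right]_{g=1}^{G_1},
\]

\[
s_{i2} = [s_{i2g}]_{g=1}^{G_2}
= \left[\sum_{k\in C}\frac{1}{h}K\!\left(\frac{x_k-x_i}{h}\right)(x_k-x_i)^{g-1}\,t_k\right]_{g=1}^{G_2}
= \left[\sum_{k\in C} z_{igk}\,t_k\right]_{g=1}^{G_2},
\]

\[
\hat s_{i1} = [\hat s_{i1g}]_{g=1}^{G_1}
= \left[\sum_{k\in C}\frac{1}{h}K\!\left(\frac{x_k-x_i}{h}\right)(x_k-x_i)^{g-1}\,\frac{I_k}{\pi_k}\right]_{g=1}^{G_1}
= \left[\sum_{k\in C} z_{igk}\,\frac{I_k}{\pi_k}\right]_{g=1}^{G_1},
\]

\[
\hat s_{i2} = [\hat s_{i2g}]_{g=1}^{G_2}
= \left[\sum_{k\in C}\frac{1}{h}K\!\left(\frac{x_k-x_i}{h}\right)(x_k-x_i)^{g-1}\,\hat t_k\,\frac{I_k}{\pi_k}\right]_{g=1}^{G_2}
= \left[\sum_{k\in C} z_{igk}\,\hat t_k\,\frac{I_k}{\pi_k}\right]_{g=1}^{G_2}.
\]

For local polynomial regression of degree $q$, $G_1 = 2q + 1$ and $G_2 = q + 1$. Now, we define $\mu_{\delta i}$ by substituting $s_i$ for $\hat s_i$ in $\hat\mu_i$; that is, $\mu_{\delta i} = f(M^{-1}s_i, \delta)$. Using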

27 the mean value theorem, for some δ (0, δ), and we let µ δ = µ + µ δ ( 2 δ) δ=δ δ 2 (15) R = ˆµ µ δ 1 k C ( ) Ik z 1k 1 1 π k k C ( ) I k z 2k ˆt k t k π k (16) where and z 1k = z 2k = G 1 g=1 G 2 g=1 ˆµ ( 1 z gk ŝ 1g ) ŝ =s ˆµ ( 1 z gk. ŝ 2g ) ŝ =s Lemma 1 Under A1 A8, as. m ] E p [R 2 ( ) 1 = O mh 2 Proof of Lemma 1: Note that m 2 h 2 E p ŝ 1g s 1g 4 = O(1) by the proof of Lemma 3 n Bredt and Opsomer (2000). Defne Then s 2g = k C 1 h K ( xk x h ) (x k x ) g 1 t k I k π k. m 2 h 2 + m2 h 2 E p ŝ 2g s 2g = B 1 + B 2 + B 3. 4 = m2 h 2 E p ŝ 2g s 2g E p (ŝ 2g s 2g ) 2 ( s 2g s 2g ) m2 h 2 E p s 2g s 2g 4 27

28 Now B 3 = O(1) by Lemma 3 n Bredt and Opsomer (2000), so to show that B 1 + B 2 + B 3 = O(1), t suffces to show that B 1 = O(1), then use Cauchy-Schwarz on B 2. Usng the ndependence of the second-stage desgn, B 1 = m2 h 2 k l 1 4 h 4 ( ) ( ) K 2 xk x K 2 xl x h h (x k x ) 2(g 1) (x l x ) 2(g 1) V kv l π kl π 2 k π2 l + m2 h 2 ( ) 1 4 h 4 K 4 xk x (x k x ) 4(g 1) E II(ˆt k t k ) 4 k C h πk 3 c 1 2 I {x h x k x +h } + c 2 I {x h x k x +h } h k C 2 h 2, k C whch s bounded by Lemma 2() n Bredt and Opsomer (2000). The assumptons of Theorem of Fuller (1996) wth α = 1, s = 4, a 4 = O(m 2 h (2+τ) ), and expectaton 1 E p[ ] are then met for the sequence {R 2 }. Snce ths functon and ts frst three dervatves wth respect to the elements of 1 s evaluate to zero, the result follows. Lemma 2 Under A1 A8, lm 1 E p (ˆµ µ ) 2 = 0. Proof of Lemma 2: We wrte 1 E p (ˆµ µ ) 2 = 1 E p (ˆµ µ δ) E p (µ δ µ ) E p [(ˆµ µ δ)(µ δ µ )]. (17) By (16), the frst term on the rght sde of (17), 1 E p (ˆµ µ δ) 2 28

29 = 1 3 k,l C k,l C,k C z 1k z 1l π kl π k π l π k π l z 2k z 2l E p [( ˆt k I k π k t k z 1k E p [ R ( Ik π k 1 E p [R 2 k,l C ) ( ˆt l I l π l t l z 1k z 2l E p [( Ik )] ) ( )] I l 1 ˆt l t l π k π l )] + 2 [ ( )] I k 2 z 2k E p R ˆt k t k π,k C k ]. (18) Usng the proof of Lemma 4 n Bredt and Opsomer (2000), the frst term on the rght sde of (18) converges to zero as. Next, 1 3 [( ) ( I k I l z 2k z 2l E p ˆt k t k ˆt l π k )] t l π l ) k,l C 1 ( = 3 z2k 2 t 2 1 π k k + V k π k C k π k c 1 1 I {x h x k x +h } λh 2h k C + c 2 λ 2 max j π j 1,j C: j k C 0 as k l I {x h x k x +h } h z 2k z 2l t k t l π kl π k π l π k π l 2 under A5, A6, A8, ndependence of the second-stage desgn, and usng Lemma 2() n Bredt and Opsomer (2000). Also, 1 E [ ] p R 2 converges to zero by Lemma 1, and then the remanng cross-product terms go to zero by the Cauchy-Schwarz nequalty. By equaton (15), the second term on the rght sde of (17), 1 E p(µ δ µ ) 2, converges to zero as. The last cross-product term of (17) goes to zero by the Cauchy-Schwarz nequalty, and hence the result follows. Lemma 3 Under A1 A8, m lm 2 E p,j C ( (ˆµ µ δ)(ˆµ j µ δj) 29 1 I ) ( 1 I ) j = 0. π j

30 Proof of Lemma 3: By (16), m 2 E p ( (ˆµ µ δ)(ˆµ j µ δj) 1 I ) ( 1 I ) j π,j C π j = m [ ( 4 z 1k z 1jl E p 1 I ) ( 1 I ) ( j 1 I ) ( k 1 I ) ] l π,j,k,l C π j π k π l + 2m [ ( 4 z 1k z 2jl E p 1 I ) ( 1 I ) ( j 1 I ) ( ) k I ] l t l ˆt l π,j,k,l C π j π k π l + m [ ( 4 z 2k z 2jl E p 1 I ) ( 1 I ) ( ) ( ) j I k I ] l t k ˆt k t l ˆt l π,j,k,l C π j π k π l 2m [ ( 3 z 1k E p R j 1 I ) ( 1 I ) ( j 1 I ) ] k 2m 3,j,k C,j,k C + m 2,j C [ ( z 2k E p R j 1 I ) ( 1 I ) ( ) j I ] k t k ˆt k E p [ R R j ( 1 I = b 1 + b 2 + b 3 + b 4 + b 5 + b 6. π j ) ( 1 I j π j π j )] π k π k Here, the frst term, b 1, s dentcal to that of the proof of Lemma 5 n Bredt and Opsomer (2000) and thus converges to zero as. Next, b 3 = m 4,j,k,l C:k l + m 4,j,k C = m 4,j,k,l C:k l + m 4,j,k C + m 4,j,k C + m 4,j,k C [ ( z 2k z 2jl E I 1 I ) ( 1 I j [ ( z 2k z 2jk E I 1 I ) ( 1 I j π j π j ) { ) t k ( 1 I k t 2 k π k ) t l ( 1 I l ( 1 I ) 2 k I k + V k π k πk 2 z 2k z 2jl t k t l π j π k π l E I [(I ) (I j π j ) (I k π k ) (I l π l )] z 2k z 2jk t 2 k π j π 2 k z 2k z 2jk V k π j π 2 k E I [ (I ) (I j π j ) (I k π k ) 2] E I [(I ) (I j π j ) (I k π k )] ) ] π l }] z 2k z 2jk V k π j π k E I [(I ) (I j π j )] (19) Each of the terms on the rght sde of (19) converges to zero as, followng the same boundng arguments as n Lemma 1. We omt the detals. The b 6 term 30

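converges to zero by Lemma 1 and A6, and then the remaining cross-product terms go to zero using the Cauchy-Schwarz inequality.

Proof of Theorem 1: By Markov's inequality, it suffices to show that

\[
\lim_{M\to\infty} E_p\left|\frac{\tilde t_y - t_y}{M}\right| = 0.
\]

We write

\[
\frac{\tilde t_y - t_y}{M}
= \frac{1}{M}\sum_{i\in C}(t_i-\mu_i)\left(\frac{I_i}{\pi_i}-1\right)
+ \frac{1}{M}\sum_{i\in C}(\hat t_i-t_i)\frac{I_i}{\pi_i}
+ \frac{1}{M}\sum_{i\in C}(\hat\mu_i-\mu_i)\left(1-\frac{I_i}{\pi_i}\right).
\]

Then

\[
E_p\left|\frac{\tilde t_y - t_y}{M}\right|
\le E_p\left|\frac{1}{M}\sum_{i\in C}(t_i-\mu_i)\left(\frac{I_i}{\pi_i}-1\right)\right|
+ \left\{E_p\left[\left(\frac{1}{M}\sum_{i\in C}(\hat t_i-t_i)\frac{I_i}{\pi_i}\right)^2\right]\right\}^{1/2}
+ \left\{E_p\left[\frac{1}{M}\sum_{i\in C}(\hat\mu_i-\mu_i)^2\right]
E_p\left[\frac{1}{M}\sum_{i\in C}\left(1-\frac{I_i}{\pi_i}\right)^2\right]\right\}^{1/2}.
\tag{20}
\]

The first term on the right of (20) converges to zero as $M \to \infty$ under A1–A6 and the fact that $\limsup_{M\to\infty} \frac{1}{M}\sum_{i\in C}(t_i-\mu_i)^2 < \infty$, following the argument of Theorem 1 in Robinson and Särndal (1983). Using A6, A8, and the independence assumption of the second-stage design,

\[
E_p\left(\frac{1}{M}\sum_{i\in C}(\hat t_i-t_i)\frac{I_i}{\pi_i}\right)^2
= V_I\left(E_{II}\left[\frac{1}{M}\sum_{i\in C}(\hat t_i-t_i)\frac{I_i}{\pi_i}\right]\right)
+ E_I\left[V_{II}\left(\frac{1}{M}\sum_{i\in C}\hat t_i\frac{I_i}{\pi_i}\right)\right]
= \frac{1}{M^2}\sum_{i\in C}\frac{V_i}{\pi_i}
\le \frac{1}{\lambda M}\cdot\frac{1}{M}\sum_{i\in C}V_i
\to 0 \quad\text{as } M\to\infty.
\]

Thus, the second term on the right of (20) converges to zero as $M \to \infty$. Under A6,

\[
E_p\left[\frac{1}{M}\sum_{i\in C}\left(1-\frac{I_i}{\pi_i}\right)^2\right]
= \frac{1}{M}\sum_{i\in C}\frac{\pi_i(1-\pi_i)}{\pi_i^2}
\le \frac{1}{\lambda}.
\]

Combining this with Lemma 2, the last term on the right of (20) converges to zero as $M \to \infty$, and the theorem follows.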

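Proof of Theorem 2: Let

\[
m^{1/2}\left(\frac{\tilde t_y - t_y}{M}\right) = a_M + b_M + c_M,
\]

where

\[
a_M = \frac{m^{1/2}}{M}\sum_{i\in C}(t_i-\mu_{\delta i})\left(\frac{I_i}{\pi_i}-1\right),\qquad
b_M = \frac{m^{1/2}}{M}\sum_{i\in C}(\hat\mu_i-\mu_{\delta i})\left(1-\frac{I_i}{\pi_i}\right),\qquad
c_M = \frac{m^{1/2}}{M}\sum_{i\in C}(\hat t_i-t_i)\frac{I_i}{\pi_i}.
\]

Then

\[
mE_p\left(\frac{\tilde t_y - t_y}{M}\right)^2
= E_p[a_M^2] + E_p[b_M^2] + E_p[c_M^2] + 2E_p[a_Mb_M] + 2E_p[a_Mc_M] + 2E_p[b_Mc_M].
\]

Using equation (15),

\[
E_p[a_M^2] = \frac{m}{M^2}\sum_{i,j\in C}(t_i-\mu_i)(t_j-\mu_j)\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j} + o(1),
\]

and by Lemma 3,

\[
E_p[b_M^2] = \frac{m}{M^2}E_p\sum_{i,j\in C}(\hat\mu_i-\mu_{\delta i})(\hat\mu_j-\mu_{\delta j})\left(1-\frac{I_i}{\pi_i}\right)\left(1-\frac{I_j}{\pi_j}\right)
\to 0 \quad\text{as } M\to\infty.
\]

Next,

\[
E_p[c_M^2] = mE_p\left(\frac{1}{M}\sum_{i\in C}(\hat t_i-t_i)\frac{I_i}{\pi_i}\right)^2
= \frac{m}{M^2}\sum_{i\in C}\frac{V_i}{\pi_i}
\le \frac{1}{\lambda}\cdot\frac{1}{M}\sum_{i\in C}V_i,
\]

which remains bounded by assumption, and

\[
E_p[a_Mc_M] = E_I\left[a_M E_{II}[c_M]\right] = 0
\]

because $E_{II}[\hat t_i] = t_i$ for all $i \in C$. The remaining cross-product terms converge to zero by the Cauchy-Schwarz inequality, and hence the result is proved.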

33 Proof of Theorem 3: We wrte me p ˆV( 1 t y ) ASE( 1 t y ) me p 1 2 +me p 1 2 +me p 1 2 +me p 1 2,j C,j C,j C,j C +me p 1 2 (t µ )(t j µ j ) j π j π j I I j π j j { } πj π j I I j 2(ˆt µ )(µ j ˆµ j ) + (µ ˆµ )(µ j ˆµ j ) π j j 2(t µ )(ˆt j t j ) j π j π j (ˆt t )(ˆt j t j ) j π j π j ˆV I 1 2 V = A + B + C + D + E. I I j j I I j j 1 2 ( 1 ) 1 V Now A 0 as by the proof of Theorem 3 n Bredt and Opsomer (2000). Next, B 2m E p mE p 2 + m E p 1 +me p 1 2 (ˆt µ )(µ ˆµ ) 1,j C: j I (ˆt µ )(µ j ˆµ j ) j π j π j (µ ˆµ ) 2 1,j C: j I (µ ˆµ )(µ j ˆµ j ) j π j π j ( 2m λ 2 + 2m max,j C: j j π j λ 2 λ ( m + λ 2 + m max,j C: j j π j λ 2 λ 0 as ) { ) E p I I j j I I j j [ V + (t µ ) 2] [ (µ ˆµ ) 2] E p [ (µ ˆµ ) 2] } 1/2 usng A5, A6, A8, A9, and Lemma 2. 33

34 For C, consder m 2 E p 1 2 (t µ )(ˆt j t j ) π 2 j π j I I j π,j C π j j = m 2 E p (t µ )(t k µ k )(ˆt t )(ˆt k t k ) 1 1 π k I I k 4 π,k C π k π k +2m 2 E p (t µ )(ˆt t )(t k µ k )(ˆt l t l ) 1 π kl π k π l I I k I l 4 π k,l C:k l π k π l π kl +m 2 E p (t µ )(ˆt j t j )(t k µ k )(ˆt l t l ) j π j π kl π k π l I I j I k I l 4 π,j C: j k,l C:k l π j π k π l j π kl = C 1 + C 2 + C 3. Here, C 1 = m 2 E I [ (t µ ) 2 V 4 ( ) ] 1 2 π I π 2 m 2 1 { 1 2 λ 4 (t µ ) 4 1 V 2 } 1/2 0 as by A5, A6, A8, and ndependence of the second-stage desgn, and C 3 = m 2 E I,j,k C: j,k j (t µ )(t k µ k )V j 4 (m max,j C: j j π j ) 2 λ 4 λ as V j π j π j π kj π k π j π k π j (t µ ) 2 I I j I k j π kj by A6, A8, A9, and the ndependence assumpton of the second-stage desgn. Then C 2 goes to zero as by the Cauchy-Schwarz nequalty, and t follows that C 0 as. Next, for D m 2 1 E p 2 (ˆt t )(ˆt j t j ) j π j π,j C π j = m 2 E II [(ˆt t ) 4 ( ] 1 π 4 π ) m I I j 1 j 2,k C: k ( 1 ) 1 V V k 1 2 V 1 π k π k k π k

35 +2 m2 4 m 2 2 λ m,j C: j ( ) 2 πj π j 1 V V j m2 π j j 4 E II [(ˆt t ) 4 ] m max,j C: j j π k λ as,k C V V k (m max,j C: j j π k ) 2 λ 4 λ 2 V 2 1 π k π k V 2 by A5, A6, A8, A9, and the ndependence assumpton of the second-stage desgn. Thus, D 0 as. Fnally, we consder E : m 2 E p ( 1 2 = m2 4 m2 1 2 λ ˆV I 1 0 as ) 2 2 V E II [ ˆV 2 ] 1 + m2 4,j C: j V V j j π j m2 4,j C E II [ ˆV 2] + m m max,j C: j j π j λ 2 V V j V 2 m2 2 1 V 2 under A5, A6, A8, and the ndependence assumpton of the second-stage desgn, and so E converges to zero as. The result s proved. 35

Table 1: Ratio of design SE of HT, REG, REG3, PS, LPR0, KERN, and CDW estimators to design SE of LPR1 estimator, based on 1000 replications of two-stage element sampling from a finite population with $M = 1000$ clusters and $N_i$ (random cluster size) elements within each cluster. Sample size of clusters is $m = 100$ and sample size of elements within each cluster is $n_i = 0.5N_i + 1$. Nonparametric estimators are computed with bandwidth $h$ and Epanechnikov kernel. Columns: study variable, $\sigma$, $h$, HT, REG, REG3, PS, LPR0, KERN, CDW. Rows (study variables): linear, quadratic, bump, jump, exponential, and two cycle functions.
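One replication of the two-stage draw described in the Table 1 caption can be sketched as follows, here for the Horvitz-Thompson estimator only. The sketch assumes simple random sampling without replacement at both stages and rounds $n_i = 0.5N_i + 1$ down to an integer; the caption fixes neither choice, and the function name `two_stage_ht` is hypothetical.

```python
import numpy as np

def two_stage_ht(y_by_cluster, m, rng):
    """Horvitz-Thompson estimate of the population total from one two-stage sample.

    Stage one: SRS of m clusters out of M (assumed design).
    Stage two: SRS of n_i elements within each sampled cluster (assumed design).
    """
    M = len(y_by_cluster)
    sampled = rng.choice(M, size=m, replace=False)
    pi_cluster = m / M                               # stage-one inclusion probability
    total = 0.0
    for i in sampled:
        y_i = np.asarray(y_by_cluster[i], dtype=float)
        N_i = y_i.size
        n_i = min(N_i, int(0.5 * N_i) + 1)           # assumed rounding of 0.5 * N_i + 1
        y_sub = rng.choice(y_i, size=n_i, replace=False)
        t_hat_i = N_i * y_sub.mean()                 # estimated cluster total
        total += t_hat_i / pi_cluster
    return total

# Toy population (assumed): 20 clusters of 10 elements, all values equal to 1.
rng = np.random.default_rng(0)
clusters = [np.ones(10) for _ in range(20)]
estimate = two_stage_ht(clusters, m=5, rng=rng)
```

Repeating the draw many times and taking the standard deviation of the estimates gives a Monte Carlo approximation to the design SE; ratios of such values for two estimators are the kind of quantity Table 1 reports.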

Figure 1: Relationship between $x$ = square root of size measure of land with erosion potential and estimated county total ($\hat t_i$) in stage-one sampled counties for wind erosion (WEQ) and water erosion (USLE), on both original (left column) and square root (right column) vertical scales. Dashed curve is weighted linear regression fit (REG4) and solid curve is local linear regression fit (LPR1 with $h = 3$). Panels: WEQ, Transformed WEQ, USLE, Transformed USLE. Horizontal axis: sqrt(size measure). Vertical axes: Tons/Acre/Yr (left) and sqrt(Tons/Acre/Yr) (right).

Estimator                          WEQ       USLE
HT                                 (49.3)    (31.8)
REG2, $\nu(x) \propto x^2$         (50.7)    (26.5)
REG4, $\nu(x) \propto x^4$         (50.1)    (26.5)
REG8, $\nu(x) \propto x^8$         (50.3)    (27.6)
LPR1, $h = 1$                      (47.4)    (24.4)
LPR1, $h = 3$                      (48.8)    (25.2)
LPR1, $h = 5$                      (48.7)    (27.6)

Table 2: Horvitz-Thompson (HT), weighted linear regression (REG2, REG4, REG8), and local linear regression (LPR1 with $h$ = 1, 3, 5) estimates for wind erosion (WEQ) and water erosion (USLE) totals in millions of tons/acre/year. The numbers in parentheses are estimated standard errors.
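A standard error under the with-replacement assumption at stage one can be sketched with the textbook ultimate-cluster formula, $\hat V = \sum_h \frac{m_h}{m_h-1}\sum_{i \in s_h}(z_{hi}-\bar z_h)^2$, where $z_{hi}$ is the weighted contribution of sampled cluster $i$ in stratum $h$. This is a generic illustration of that formula, with a hypothetical function name and made-up inputs; it is not necessarily the exact computation behind the parenthesized values in Table 2.

```python
import numpy as np

def wr_standard_error(z, stratum):
    """With-replacement (ultimate cluster) standard error of sum(z).

    z       : weighted contribution w_i * t_hat_i of each sampled first-stage unit
    stratum : design-stratum label of each sampled first-stage unit
    """
    z = np.asarray(z, dtype=float)
    stratum = np.asarray(stratum)
    variance = 0.0
    for label in np.unique(stratum):
        z_h = z[stratum == label]
        m_h = z_h.size
        if m_h < 2:
            raise ValueError("each stratum needs at least two sampled units")
        # Between-unit variability within the stratum, scaled by m_h / (m_h - 1)
        variance += m_h / (m_h - 1) * np.sum((z_h - z_h.mean()) ** 2)
    return float(np.sqrt(variance))

# Made-up contributions from five sampled units in two strata.
se = wr_standard_error([1.0, 2.0, 3.0, 10.0, 14.0], [0, 0, 0, 1, 1])
```

Because the formula uses only the variability between first-stage units, it needs no joint inclusion probabilities, which is a common reason for adopting the with-replacement approximation.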

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Efficient nonresponse weighting adjustment using estimated response probability

Efficient nonresponse weighting adjustment using estimated response probability Effcent nonresponse weghtng adjustment usng estmated response probablty Jae Kwang Km Department of Appled Statstcs, Yonse Unversty, Seoul, 120-749, KOREA Key Words: Regresson estmator, Propensty score,

More information

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications Durban Watson for Testng the Lack-of-Ft of Polynomal Regresson Models wthout Replcatons Ruba A. Alyaf, Maha A. Omar, Abdullah A. Al-Shha ralyaf@ksu.edu.sa, maomar@ksu.edu.sa, aalshha@ksu.edu.sa Department

More information

A note on regression estimation with unknown population size

A note on regression estimation with unknown population size Statstcs Publcatons Statstcs 6-016 A note on regresson estmaton wth unknown populaton sze Mchael A. Hdroglou Statstcs Canada Jae Kwang Km Iowa State Unversty jkm@astate.edu Chrstan Olver Nambeu Statstcs

More information

Conditional and unconditional models in modelassisted estimation of finite population totals

Conditional and unconditional models in modelassisted estimation of finite population totals Unversty of Wollongong Research Onlne Faculty of Informatcs - Papers Archve) Faculty of Engneerng and Informaton Scences 2011 Condtonal and uncondtonal models n modelasssted estmaton of fnte populaton

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition)

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition) Count Data Models See Book Chapter 11 2 nd Edton (Chapter 10 1 st Edton) Count data consst of non-negatve nteger values Examples: number of drver route changes per week, the number of trp departure changes

More information

Nonparametric model calibration estimation in survey sampling

Nonparametric model calibration estimation in survey sampling Ames February 18, 004 Nonparametrc model calbraton estmaton n survey samplng M. Govanna Ranall Department of Statstcs, Colorado State Unversty (Jont work wth G.E. Montanar, Dpartmento d Scenze Statstche,

More information

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

An (almost) unbiased estimator for the S-Gini index

An (almost) unbiased estimator for the S-Gini index An (almost unbased estmator for the S-Gn ndex Thomas Demuynck February 25, 2009 Abstract Ths note provdes an unbased estmator for the absolute S-Gn and an almost unbased estmator for the relatve S-Gn for

More information

Small Area Estimation for Business Surveys

Small Area Estimation for Business Surveys ASA Secton on Survey Research Methods Small Area Estmaton for Busness Surveys Hukum Chandra Southampton Statstcal Scences Research Insttute, Unversty of Southampton Hghfeld, Southampton-SO17 1BJ, U.K.

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

Computing MLE Bias Empirically

Computing MLE Bias Empirically Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

Multivariate Ratio Estimator of the Population Total under Stratified Random Sampling

Multivariate Ratio Estimator of the Population Total under Stratified Random Sampling Open Journal of Statstcs, 0,, 300-304 ttp://dx.do.org/0.436/ojs.0.3036 Publsed Onlne July 0 (ttp://www.scrp.org/journal/ojs) Multvarate Rato Estmator of te Populaton Total under Stratfed Random Samplng

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek Dscusson of Extensons of the Gauss-arkov Theorem to the Case of Stochastc Regresson Coeffcents Ed Stanek Introducton Pfeffermann (984 dscusses extensons to the Gauss-arkov Theorem n settngs where regresson

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

REPLICATION VARIANCE ESTIMATION UNDER TWO-PHASE SAMPLING IN THE PRESENCE OF NON-RESPONSE

REPLICATION VARIANCE ESTIMATION UNDER TWO-PHASE SAMPLING IN THE PRESENCE OF NON-RESPONSE STATISTICA, anno LXXIV, n. 3, 2014 REPLICATION VARIANCE ESTIMATION UNDER TWO-PHASE SAMPLING IN THE PRESENCE OF NON-RESPONSE Muqaddas Javed 1 Natonal College of Busness Admnstraton and Economcs, Lahore,

More information

A Comparative Study for Estimation Parameters in Panel Data Model

A Comparative Study for Estimation Parameters in Panel Data Model A Comparatve Study for Estmaton Parameters n Panel Data Model Ahmed H. Youssef and Mohamed R. Abonazel hs paper examnes the panel data models when the regresson coeffcents are fxed random and mxed and

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

Population element: 1 2 N. 1.1 Sampling with Replacement: Hansen-Hurwitz Estimator(HH)

Population element: 1 2 N. 1.1 Sampling with Replacement: Hansen-Hurwitz Estimator(HH) Chapter 1 Samplng wth Unequal Probabltes Notaton: Populaton element: 1 2 N varable of nterest Y : y1 y2 y N Let s be a sample of elements drawn by a gven samplng method. In other words, s s a subset of

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM

An elastic wave is a deformation of the body that travels throughout the body in all directions. We can examine the deformation over a period of time by fixing our look

/ n ) are compared. The logic is: if the two

STAT C141, Spring 2005, Lecture 13: Two sample tests. One sample tests: examples of goodness of fit tests, where we are testing whether our data supports predictions. Two sample tests: called as tests of independence

Convergence of random processes

DS-GA 12 Lecture notes 6, Fall 216. 1 Introduction. In these notes we study convergence of discrete random processes. This allows to characterize phenomena such as the law of large

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

E395 - Pattern Recognition. Preface: This document is a solution manual for selected exercises from Introduction to Pattern Recognition

Credit Card Pricing and Impact of Adverse Selection

Bo Huang and Lyn C. Thomas, University of Southampton. Contents: Background; Auction model of credit card solicitation - Errors in probability of being Good - Errors in

Now we relax this assumption and allow that the error variance depends on the independent variables, i.e., heteroskedasticity

ECON 48 / WH Hong. Heteroskedasticity. Consequences of Heteroskedasticity for OLS. Assumption MLR.5: Homoskedasticity, $\mathrm{var}(u \mid x) = \sigma^2$. Now we relax this assumption and allow that the error variance depends on the

Bayesian predictive Configural Frequency Analysis

Psychological Test and Assessment Modeling, Volume 54, 2012 (3), 285-292. Eduardo Gutiérrez-Peña. Abstract: Configural Frequency Analysis is a method for cell-wise

$x_{i1} = 1$ for all $i$ (the constant).

Chapter 5: The Multiple Regression Model. Consider an economic model where the dependent variable is a function of K explanatory variables. The economic model has the form: $y_i = f(x_{i1}, x_{i2}, \ldots, x_{iK})$. Approximate this by

STAT 511 FINAL EXAM NAME Spring 2001

Instructions: This is a closed book exam. No notes or books are allowed. You may use a calculator but you are not allowed to store notes or formulas in the calculator. Please write

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Steele, M., Chaseling, J. and Hurst, C. School of Mathematical and Physical Sciences, James Cook University; Australian School of Environmental Studies, Griffith

Appendix B: Resampling Algorithms

A common problem of all particle filters is the degeneracy of weights, which consists of the unbounded increase of the variance of the importance weights $\omega^{[i]}$ of the particles

Bias-correction under a semi-parametric model for small area estimation

Laura Dumitrescu, Victoria University of Wellington; joint work with J. N. K. Rao, Carleton University. ICORS 2017 Workshop on Robust Inference for

Numerical Heat and Mass Transfer

Master degree in Mechanical Engineering. 06 - Finite-Difference Method (One-dimensional, steady state heat conduction). Fausto Arpino, f.arpino@unicas.it. Introduction: Why we use models and

Non-Mixture Cure Model for Interval Censored Data: Simulation Study ABSTRACT

Malaysian Journal of Mathematical Sciences 8(S): 37-44 (2014). Special Issue: International Conference on Mathematical Sciences and Statistics 2013 (ICMSS2013).

Foundations of Arithmetic

Notation: We shall denote the sum and product of numbers in the usual notation as $a_1 + a_2 + a_3 + \cdots + a_n = \sum_{i=1}^{n} a_i$ and $a_1 a_2 a_3 \cdots a_n = \prod_{i=1}^{n} a_i$. The notation $a \mid b$ means a divides b, i.e. ac = b where c is an

DUE: WEDS FEB 21ST 2018

HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION. 1. Theory. Beam bending is a classical engineering analysis. The traditional solution technique makes simplifying assumptions such as a constant

Notes on Frequency Estimation in Data Streams

In (one of) the data streaming model(s), the data is a sequence of arrivals $a_1, a_2, \ldots, a_m$ of the form $a_j = (i, v)$ where $i$ is the identity of the item and belongs to

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Last Name: First Name: Student Number: Instructions: Time: hours. Aids: a non-programmable

Exponential Type Product Estimator for Finite Population Mean with Information on Auxiliary Attribute

Available at http://pvamu.edu/aam. Appl. Appl. Math., ISSN: 1932-9466, Vol. 10, Issue 1 (June 2015), pp. 106-113. Applications and Applied Mathematics: An International Journal (AAM)

Properties of Least Squares

Week 3. 3.1 Simple Linear Regression Model; 3.2 Properties of Least Squares Estimators. $Y = \beta_1 + \beta_2 X + u$, where Y is weekly family expenditures and X is weekly family income. For a given level of x, the expected level of food expenditures

Sampling Theory MODULE VII LECTURE - 23 VARYING PROBABILITY SAMPLING

DR. SHALABH, DEPARTMENT OF MATHEMATICS AND STATISTICS, INDIAN INSTITUTE OF TECHNOLOGY KANPUR. The simple random sampling scheme provides a random sample where every

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Ji-Yeon Kim, Iowa State University; F. Jay Breidt, Colorado State University; Jean D. Opsomer, Colorado State University

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

IOSR Journal of Mathematics (IOSR-JM), ISSN: 2278-5728. Volume 3, Issue 3 (Sep-Oct. 2012), PP 44-48. www.iosrjournals.org. Jubran

A General Class of Selection Procedures and Modified Murthy Estimator

ISSN 684-8403, Journal of Statistics, Volume 4, 007, pp. 3-9. Abdul Basit and Muhammad Qaisar Shahbaz. Abstract: A new selection procedure for unequal

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10)

I. Definition and Problems. We now relax another classical assumption. This is a problem that arises often with cross sections of individuals,

Joint Statistical Meetings - Biopharmaceutical Section

Iterative Chi-Square Test for Equivalence of Multiple Treatment Groups. Tie-Hua Ng*, U.S. Food and Drug Administration, 1401 Rockville Pike, #200S, HFM-217, Rockville, MD 20852-1448. Key Words: Equivalence Testing; Active

Markov Chain Monte Carlo Lecture 6

where $(x_1, \ldots, x_N) \in \mathcal{X}^N$, N is called the population size, $f(x) \equiv f_i(x)$ for at least one $i \in \{1, 2, \ldots, N\}$, and those different from f(x) are called the trial distributions in terms of importance sampling. Different ways

Econometrics of Panel Data

Jakub Mućk. Meeting # 8. Outline: 1 Heterogeneity in the slope coefficients; 2 Seemingly Unrelated Regression (SUR); 3 Swamy s random

Lecture 4 Hypothesis Testing

We may wish to test prior hypotheses about the coefficients we estimate. We can use the estimates to test whether the data rejects our hypothesis. An example might be that we wish to

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

Department of Statistics, Osmania University, Hyderabad - 500 007, India. nanbyrozu@gmail.com, ramu0@gmail.com

Lecture 3 Stat102, Spring 2007

Chapter 3.1-3.2: Introduction to regression analysis; linear regression as a descriptive technique; the least-squares equations. Chapter 3.3: Sampling distribution of $b_0$, $b_1$. Continued in next lecture

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

MODULE 2. Topics: Linear independence, basis and dimension. We have seen that if in a set of vectors one vector is a linear combination of the remaining vectors in the set then the span of the set is unchanged if that vector

Basically, if you have a dummy dependent variable you will be estimating a probability.

Metropolitan State University, ECON 497: Research and Forecasting. Lecture Notes 13: Dummy Dependent Variable Techniques (Studenmund, Chapter 13). Basically, if you have a dummy

Statistics for Economics & Business

Simple Linear Regression. Learning Objectives: In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based on an independent variable

Structure and Drive Paul A. Jensen Copyright July 20, 2003

A system is made up of several operations with flow passing between them. The structure of the system describes the flow paths from inputs to outputs.

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Definition 1 (Euclidean Space): A Euclidean space is a finite-dimensional vector space over the reals R, with an inner product $\langle \cdot, \cdot \rangle$. Definition 2 (Inner Product): An inner product $\langle \cdot, \cdot \rangle$ on a real vector space X is a symmetric, bilinear,

Lecture 4: Universal Hash Functions/Streaming Cont d

CSE 521: Design and Analysis of Algorithms I, Spring 2016. Lecturer: Shayan Oveis Gharan. April 6th. Scribe: Jacob Schreiber. Disclaimer: These notes have not been subjected

Lossy Compression. Compromise accuracy of reconstruction for increased compression.

The reconstruction is usually visibly indistinguishable from the original image. Typically, one can get up to 0:1 compression with almost

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Fall 00. Dr. Mohammad Zainal. These slides were modified from their original source for educational purpose only.

Lena Boneva and Oliver Linton. January 2017

Appendix to Staff Working Paper No. 640: A discrete choice model for large heterogeneous panels with interactive fixed effects with an application to the determinants of corporate bond issuance. Lena Boneva and Oliver

Explaining the Stein Paradox

Kwong Hu Yung, 1999/06/10. Abstract: This report offers several rationale for the Stein paradox. Sections 1 and 2 define the multivariate normal mean estimation problem and introduce Stein

Factor models with many assets: strong factors, weak factors, and the two-pass procedure

Stanislav Anatolyev (CERGE-EI and NES) and Anna Mikusheva (MIT). December 2017.

Using the estimated penetrances to determine the range of the underlying genetic model in case-control

Georgetown University. From the SelectedWorks of Mark J Meyer. Mark J Meyer, Neal Jeffries, Gang Zheng. Available

Rockefeller College University at Albany

PAD 705 Handout: Maximum Likelihood Estimation. Original by David A. Wise, John F. Kennedy School of Government, Harvard University. Modifications by R. Karl Rethemeyer. Up to this point in

Time-Varying Systems and Computations Lecture 6

Klaus Diepold, 14 January 2014. The Kalman Filter: The Kalman estimation filter attempts to estimate the actual state of an unknown discrete dynamical system, given noisy

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor

Taylor Enterprises, Inc. Abstract: P charts are used for count data

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.

The linear regression equation is $y_i = \beta_1 + \beta_2 x_i + e_i$ for $i = 1, \ldots, n$, where $y_i$ and $x_i$ are observable variables and $e_i$ is a random error. How can an estimation rule be constructed for the