Model Based Direct Estimation of Small Area Distributions
University of Wollongong Research Online

Centre for Statistical & Survey Methodology Working Paper Series, Faculty of Engineering and Information Sciences, 2010

Model Based Direct Estimation of Small Area Distributions

Nicola Salvati, University of Pisa, Italy
Hukum Chandra, University of Wollongong
Ray Chambers, University of Wollongong

Recommended Citation: Salvati, Nicola; Chandra, Hukum; and Chambers, Ray, Model Based Direct Estimation of Small Area Distributions, Centre for Statistical and Survey Methodology, University of Wollongong, Working Paper 20-10, 2010, 28p.

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au
Centre for Statistical and Survey Methodology, The University of Wollongong

Working Paper

Model Based Direct Estimation of Small Area Distributions

Nicola Salvati, Hukum Chandra and Ray Chambers

Copyright 2008 by the Centre for Statistical & Survey Methodology, UOW. Work in progress, no part of this paper may be reproduced without permission from the Centre.

Centre for Statistical & Survey Methodology, University of Wollongong, Wollongong NSW. Phone , Fax . Email: anica@uow.edu.au
Model Based Direct Estimation of Small Area Distributions

Nicola Salvati (1), Hukum Chandra (2) and Ray Chambers (3)

(1) Dipartimento di Statistica e Matematica Applicata all'Economia, University of Pisa, Italy. E-mail: salvati@ec.unipi.it
(2) Centre for Statistical and Survey Methodology, University of Wollongong, Wollongong, Australia. E-mail: hchandra@uow.edu.au
(3) Centre for Statistical and Survey Methodology, University of Wollongong, Wollongong, Australia. E-mail: ray@uow.edu.au

Summary

Much of the small area estimation literature focuses on population totals and means. However, users of survey data are often interested in the finite population distribution of a survey variable, and the measures (e.g. medians, quartiles, percentiles) that characterise the shape of this distribution at small area level. In this paper we propose a model-based direct estimator (MBDE, see Chandra and Chambers, 2009) of the small area distribution function. The MBDE is defined as a weighted sum of sample data from the area of interest, with weights derived from the calibrated spline-based estimate of the finite population distribution function introduced by Harms and Duchesne (2006), under an appropriately specified regression model with random area effects. We also discuss the mean squared error estimation of the MBDE. Monte Carlo simulations based on both simulated and real datasets show that the proposed MBDE and its associated mean squared error estimator perform well when compared with alternative estimators of the area-specific finite population distribution function.

Key words: Indicator function; Model-based direct estimator; Mean squared error estimator; Simulation experiments.
1. Introduction

Let $U = \{1, 2, \dots, N\}$ be the finite population of size $N$ and let $y$ denote a variable of interest that takes values over this population. A common target of inference is then the proportion of values $y_j$ that are bounded by a given constant (e.g. the proportion of households whose monthly per capita expenditure is below the poverty line). More generally, the target of inference is the value of the finite population distribution function for a variable $y$ at a specified value $t$. This is

$$F_N(t) = N^{-1} \sum_{j=1}^{N} I(y_j \le t),$$

i.e. the proportion of the population whose values for $y$ are less than or equal to $t$, where $I(y_j \le t)$ is the indicator function that takes the value 1 if $y_j \le t$ and 0 otherwise. Clearly, once we obtain an estimator of the finite population distribution function, we can evaluate its inverse to obtain the associated estimator of the finite population quantile function. See Chambers and Dunstan (1986), Rao et al. (1990), Harms and Duchesne (2006) and Rueda et al. (2007, 2010).

Small area estimation (SAE) is an important objective of many surveys. Small areas or small domains are subsets of the population with small sample sizes, so standard survey estimation methods for these areas, which only use information from the small area samples, are unreliable. In this context SAE methods that borrow strength via statistical models (Rao, 2003) can be used to produce reliable small area estimates. However, virtually all of these methods focus on estimation of linear parameters, e.g. small area means or totals. In this paper we focus on estimation of the small area distribution of a study variable and measures (e.g. medians, quartiles, percentiles) that characterise the shape of this distribution. This is especially useful if there are extreme values in the small area sample data, or if the small area distribution of the variable of interest is highly skewed (Tzavidis et al., 2010).
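As a minimal sketch of the definition above, the following Python fragment evaluates $F_N(t)$ for a fully observed toy population and inverts it to obtain a quantile. The function names, the data values and the `poverty_line` threshold are illustrative assumptions and do not come from the paper; in practice the population values are not all observed, which is why estimators are needed.

```python
import numpy as np

def finite_population_df(y, t):
    """F_N(t): proportion of population values y_j that are <= t."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(y <= t))

def finite_population_quantile(y, alpha):
    """Smallest population value q with F_N(q) >= alpha (inverse of F_N)."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    # position of the first order statistic whose cumulative share reaches alpha
    k = int(np.ceil(alpha * y_sorted.size)) - 1
    return float(y_sorted[max(k, 0)])

# Hypothetical population of eight expenditure values (illustrative only).
y = np.array([120.0, 80.0, 200.0, 95.0, 150.0, 60.0, 300.0, 110.0])
poverty_line = 100.0

share_below = finite_population_df(y, poverty_line)  # three of eight values are <= 100
median = finite_population_quantile(y, 0.5)          # fourth smallest value
```

The quantile here is defined as the generalised inverse of the step function $F_N$, which is one common convention; other interpolation rules would give slightly different values.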
We propose a model based direct estimator (MBDE) for the small area distribution function, extending the MBDE approach (Chandra and Chambers, 2009) to the estimation of the small area distribution function. This MBDE estimator is a weighted sum of the sample data from the small area of interest, with weights that are derived from a spline-based calibrated estimator of the population distribution function (Harms and Duchesne, 2006) under a regression model with random area effects.

The rest of the article is organized as follows. The following Section describes SAE based on the linear mixed model and the nonparametric regression model based on penalized splines and then uses these models to motivate estimators of the small area distribution function. Section 3 introduces the concept of calibrated sample weights for a finite population distribution function and uses these to define the MBDE estimator for this function. A bias-robust estimator of the mean squared error of the MBDE is also developed, based on the approach of Chambers et al. (2009). The empirical performances of the proposed MBDE as well as alternative estimators of the small area distribution function are evaluated in Section 4, using both model-based and design-based simulations, with the design-based simulations based on two real data sets. Concluding remarks are set out in Section 5.

2. Estimation of the Small Area Distribution Function

We assume that a finite population $U$ containing $N$ units can be partitioned into $A$ non-overlapping domains, referred to from now on as small areas, or simply areas, indexed by $i = 1, \dots, A$, with area $i$ containing $N_i$ units, so $N = \sum_{i=1}^{A} N_i$. Let $y_{ij}$ denote the value of the variable of interest $y$ for unit $j$ ($j = 1, \dots, N_i$) in area $i$ ($i = 1, \dots, A$). The area-specific distribution function of $y$ for area $i$ is

$$F_i(t) = N_i^{-1} \sum_{j=1}^{N_i} I(y_{ij} \le t). \qquad (1)$$
Let $s$ denote a sample of $n$ units drawn from $U$ by some specified sampling design, and assume that values of the variable of interest $y$ are available for each of these $n$ sample units. The non-sample component of $U$, containing $N - n$ units, is denoted by $r$. In what follows, we use a subscript of $i$ to denote quantities specific to area $i$ ($i = 1, \dots, A$). For example, $s_i$ and $r_i$ denote the $n_i$ sample and $N_i - n_i$ non-sample units respectively for area $i$. With this notation, the conventional estimators of the area $i$ distribution function, $F_i(t)$, are the Horvitz-Thompson (HT) estimator

$$\hat F_i^{HT}(t) = N_i^{-1} \sum_{j \in s_i} \pi_{ij}^{-1} I(y_{ij} \le t), \qquad (2)$$

and the Hajek estimator

$$\hat F_i^{Hajek}(t) = \frac{\sum_{j \in s_i} \pi_{ij}^{-1} I(y_{ij} \le t)}{\sum_{j \in s_i} \pi_{ij}^{-1}}. \qquad (3)$$

Here $\pi_{ij}$ denotes the sample inclusion probability of unit $j$ in area $i$. Both (2) and (3) are area-specific design-based direct estimators and do not depend on an assumed model for their validity (Cochran, 1977). Unfortunately, empirical evidence presented in Rueda et al. (2007) shows that these estimators can be substantially biased, while the fact that they only use information from the area sample makes them too unstable for SAE.

Model-based small area estimators based on the linear mixed model are widely used in SAE. However, if the functional form of the regression relationship between the variable of interest and the available auxiliary variables is unknown or has a complicated functional form, then SAE based on the use of a nonparametric regression model can offer significant advantages compared with one based on a linear model. In particular, a nonparametric regression model based on p-splines is attractive because it represents a relatively straightforward extension of a linear regression model (Eilers and Marx, 1996). Opsomer et al. (2008) describe the use of a spline-based nonparametric regression model for SAE. See also Salvati et al. (2010). In the rest of this Section we therefore summarize the model-based
approach to estimation of the small area distribution function under the linear mixed model and under a nonparametric regression model.

2.1 Estimation under the linear mixed model

SAE theory for this case is now well established, see Rao (2003). We briefly describe it below since this allows us to introduce notation that will be used elsewhere in the paper. To start, we note that throughout this paper we will assume that we have access to the population values of $p$ auxiliary scalar variables that are, to a greater or lesser extent, correlated with $y$. Let $x_{ij}$ denote the vector of values of these auxiliary variables that are associated with $y_{ij}$ and let $z_{ij}$ denote a vector of auxiliary contextual variables whose values are known for all units in the population. Let $y_U$, $X_U$ and $Z_U$ denote the population level vector and matrices defined by $y_{ij}$, $x_{ij}$ and $z_{ij}$, respectively. Then the linear mixed model is

$$y_U = X_U \beta + Z_U u + e_U, \qquad (4)$$

where $\beta$ is a $p$-vector of regression coefficients, $u$ is a random vector of area effects and $e_U$ is a population $N$-vector of random individual effects. In general, area effects are vector-valued, so $u^T = (u_1^T, u_2^T, \dots, u_A^T)$ and $Z_U = \mathrm{diag}\{Z_i;\ i = 1, \dots, A\}$, where $i$ indexes the $A$ areas that make up the population and $Z_i$ is of dimension $N_i \times q$. The area effects $\{u_i;\ i = 1, \dots, A\}$ are assumed to be independent and identically distributed realisations of a random vector of dimension $q$ with zero mean and covariance matrix $\Sigma_u$. Similarly, the scalar individual effects making up $e_U$ are assumed to be independent and identically distributed realisations of a random variable with zero mean and variance $\sigma_e^2$, with area and individual effects mutually independent. The covariance matrix of the vector $y_U$ is then

$$\mathrm{Var}(y_U) = V_U = Z_U \Sigma_u Z_U^T + \sigma_e^2 I_N,$$

where $I_k$ denotes the identity matrix of dimension $k$. The parameters $\theta = (\Sigma_u, \sigma_e^2)$ are typically referred to as the variance components of (4).
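To make the structure of (4) concrete, here is a small Python sketch for the simplest special case: a scalar random intercept per area ($q = 1$), so that $Z_U$ is an $N \times A$ indicator matrix. The area sizes, coefficient values and variance components are arbitrary illustrative assumptions, and the stacked covariance of $u$ is taken to be $\sigma_u^2 I_A$ under that random-intercept assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: three areas with N_i = 4, 3 and 5 units.
area_sizes = np.array([4, 3, 5])
A, N = area_sizes.size, int(area_sizes.sum())
area = np.repeat(np.arange(A), area_sizes)   # area label of every unit

# Z_U: N x A matrix of area indicators (random intercept, q = 1).
Z_U = np.zeros((N, A))
Z_U[np.arange(N), area] = 1.0

# Illustrative variance components and regression coefficients.
sigma2_u, sigma2_e = 25.0, 100.0
beta = np.array([10.0, 2.0])

# Generate y_U = X_U beta + Z_U u + e_U.
X_U = np.column_stack([np.ones(N), rng.chisquare(20, size=N)])
u = rng.normal(0.0, np.sqrt(sigma2_u), size=A)
e_U = rng.normal(0.0, np.sqrt(sigma2_e), size=N)
y_U = X_U @ beta + Z_U @ u + e_U

# Covariance matrix V_U = Z_U Sigma_u Z_U^T + sigma_e^2 I_N,
# with Sigma_u = sigma_u^2 I_A at the stacked level (assumption for q = 1).
Sigma_u = sigma2_u * np.eye(A)
V_U = Z_U @ Sigma_u @ Z_U.T + sigma2_e * np.eye(N)

# Two units in the same area share the area effect; the implied
# intra-area correlation is sigma_u^2 / (sigma_u^2 + sigma_e^2).
icc = V_U[0, 1] / V_U[0, 0]
```

Under these assumptions $V_U$ is block diagonal: units in different areas are uncorrelated, while units in the same area have covariance $\sigma_u^2$.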
We also assume throughout this paper that the method of sampling is non-informative given the auxiliary variables, so the model (4) holds for both sampled and non-sampled population units. Consequently, we can partition $y_U$, $X_U$, $Z_U$ and $e_U$ into components defined by the $n$ sampled and $N - n$ non-sampled population units, denoted by subscripts of $s$ and $r$ respectively, and re-express (4) as follows:

$$y_U = \begin{bmatrix} y_s \\ y_r \end{bmatrix} = \begin{bmatrix} X_s \\ X_r \end{bmatrix} \beta + \begin{bmatrix} Z_s \\ Z_r \end{bmatrix} u + \begin{bmatrix} e_s \\ e_r \end{bmatrix},$$

with the variance of $y_U$ similarly partitioned,

$$V_U = \begin{bmatrix} V_{ss} & V_{sr} \\ V_{rs} & V_{rr} \end{bmatrix}.$$

Thus $X_s$ represents the matrix defined by the $n$ sample values of the auxiliary variable vector, while

$$V_{ss} = \mathrm{diag}\{V_{iss};\ i = 1, \dots, A\} = \mathrm{diag}\{Z_{is} \Sigma_u Z_{is}^T + \sigma_e^2 I_{n_i};\ i = 1, \dots, A\}$$

and

$$V_{sr} = \mathrm{diag}\{V_{isr};\ i = 1, \dots, A\} = \mathrm{diag}\{Z_{is} \Sigma_u Z_{ir}^T;\ i = 1, \dots, A\}.$$

Here $Z_{is}$ and $Z_{ir}$ respectively denote the restriction of $Z_i$ to sampled and non-sampled units in area $i$. The distribution function for small area $i$ given by (1) can be expressed as

$$F_i(t) = N_i^{-1} \Big\{ \sum_{j \in s_i} I(y_{ij} \le t) + \sum_{j \in r_i} I(y_{ij} \le t) \Big\},$$

where the first term in braces is known and the second is unknown. The problem of estimating $F_i(t)$ therefore reduces to predicting the values $y_{ij}$ for the non-sample units in area $i$. Given estimated values $\hat\theta = (\hat\Sigma_u, \hat\sigma_e^2)$ of the variance components we can define the estimated covariance matrix $\hat V_U$, and the predicted values of $y_{ij}$ are $\hat y_{ij}^{EBLUP} = x_{ij}^T \hat\beta^{EBLUE} + z_{ij}^T \hat u^{EBLUP}$, where $\hat\beta^{EBLUE} = (X_s^T \hat V_{ss}^{-1} X_s)^{-1} X_s^T \hat V_{ss}^{-1} y_s$ is the
empirical best linear unbiased estimator (EBLUE) of $\beta$ and $\hat u^{EBLUP} = \hat\Sigma_u Z_s^T \hat V_{ss}^{-1} (y_s - X_s \hat\beta^{EBLUE})$ is the empirical best linear unbiased predictor (EBLUP) of $u$. Substituting estimated values for the parameters of (4) then allows us to define an estimator for $F_i(t)$ of the form

$$\hat F_i^{EBP}(t) = N_i^{-1} \Big\{ \sum_{j \in s_i} I(y_{ij} \le t) + \sum_{j \in r_i} I(\hat y_{ij}^{EBLUP} \le t) \Big\}. \qquad (5)$$

We refer to (5) as the empirical best predictor or EBP. An alternative way of predicting $F_i(t)$ is via the Chambers and Dunstan (hereafter CD) estimator. See Chambers and Dunstan (1986) for details. Since the within area residuals are homoskedastic under (4), the CD estimator of $F_i(t)$ can be written

$$\hat F_i^{CD}(t) = N_i^{-1} \Big\{ \sum_{j \in s_i} I(y_{ij} \le t) + n_i^{-1} \sum_{j \in r_i} \sum_{k \in s_i} I\big(\hat y_{ij}^{EBLUP} + (y_{ik} - \hat y_{ik}^{EBLUP}) \le t\big) \Big\}. \qquad (6)$$

Note that the CD estimator is asymptotically unbiased if (4) is correctly specified.

2.2 Estimation under a nonparametric mixed model

The CD estimator (6) will be biased if the functional form of the relationship between the response variable and the auxiliary variables (i.e. the regression function) is not linear or the variance term in the regression model is misspecified (Tzavidis et al., 2010). This susceptibility of parametric model-based methods to misspecification bias provides motivation for the use of alternative non-parametric model-based methods. We now summarize application of the p-spline nonparametric regression model to SAE (Opsomer et al., 2008), and, for simplicity, consider the univariate case. The underlying regression model is then $y_j = m(x_j) + e_j$, where the $e_j$ are independent random variables with zero means. The function $m(x)$ is unknown and assumed to be approximated sufficiently well by

$$m(x; \beta, \gamma) = \beta_0 + \beta_1 x + \dots + \beta_b x^b + \sum_{k=1}^{K} \gamma_k (x - \kappa_k)_+^b, \qquad (7)$$
where $b$ is the degree of the spline, $(c)_+^b = c^b I(c > 0)$, $\kappa_k$, $k = 1, \dots, K$, is a set of fixed constants called knots, $\beta = (\beta_0, \dots, \beta_b)^T$ is the coefficient vector of the parametric part of the model and $\gamma = (\gamma_1, \dots, \gamma_K)^T$ is the vector of spline coefficients. The approximating function $m(x; \beta, \gamma)$ in (7) uses truncated polynomial basis functions for simplicity and, if the number of knots $K$ is sufficiently large, can approximate most smooth functions. Ruppert et al. (2003, Chapter 5) suggest the use of a knot for every four observations, up to a maximum of about 40 knots for a univariate application.

Using a large number of knots in (7) can lead to an unstable fit. In order to overcome this problem, an upper limit is usually imposed on the size of the spline coefficient vector $\gamma$. Estimating $\beta$ and $\gamma$ by minimizing the squared deviations of model (7) from the actual data values subject to this constraint is equivalent to minimizing the penalized loss function

$$\sum_j \big(y_j - m(x_j; \beta, \gamma)\big)^2 + \lambda \gamma^T \gamma. \qquad (8)$$

Here $\lambda$ is a Lagrange multiplier that controls the level of smoothness of the resulting fit. Wand (2003) and Ruppert et al. (2003, Chapter 4) note the equivalence between minimizing (8) and maximizing the likelihood of the response variable under the linear model (7) where the spline coefficients are treated as random effects. In particular, let $y_U = (y_1, y_2, \dots, y_N)^T$,

$$X_U = \begin{bmatrix} 1 & x_1 & \cdots & x_1^b \\ \vdots & \vdots & & \vdots \\ 1 & x_N & \cdots & x_N^b \end{bmatrix} \quad \text{and} \quad \Delta_U = \begin{bmatrix} (x_1 - \kappa_1)_+^b & \cdots & (x_1 - \kappa_K)_+^b \\ \vdots & & \vdots \\ (x_N - \kappa_1)_+^b & \cdots & (x_N - \kappa_K)_+^b \end{bmatrix}.$$

The spline approximation (7) can then be written as the linear mixed model

$$y_U = X_U \beta + \Delta_U \gamma + e_U, \qquad (9)$$

where $\gamma$ and $e_U$ are now assumed to be independent Gaussian random vectors of dimension $K$ and $N$ respectively. In particular, it is assumed that
$\gamma \sim N(0, \sigma_\gamma^2 I_K)$ and $e_U \sim N(0, \sigma_e^2 I_N)$.

Opsomer et al. (2008) adapt p-splines to the SAE context by adding area random effects to (9), which then becomes

$$y_U = X_U \beta + \Delta_U \gamma + Z_U u + e_U, \qquad (10)$$

where, as in Section 2.1, $Z_U = (Z_1, \dots, Z_N)^T$ is a matrix of known covariates of dimension $N \times A$ characterising differences among the areas and $u$ is the $A$-vector of random area effects. In the simplest case, $Z_U$ is given by a matrix whose $i$-th column, for $i = 1, \dots, A$, is an indicator variable that takes the value 1 if a unit is in area $i$ and is zero otherwise. It is assumed that the area effects are distributed independently of the spline effects $\gamma$ and the individual effects $e_U$, with $u \sim N(0, \Sigma_u)$, so that the covariance matrix of the vector $y_U$ is

$$\mathrm{Var}(y_U) = V_U = \sigma_\gamma^2 \Delta_U \Delta_U^T + Z_U \Sigma_u Z_U^T + \sigma_e^2 I_N.$$

The variance components of (10) are then given by $\theta = (\sigma_\gamma^2, \Sigma_u, \sigma_e^2)$. Note that, as in the previous Section, the use of non-informative sampling given the auxiliary variables means that (10) also holds at the sample level. When the variance components are known, well-established theory (McCulloch and Searle, 2001, Chapter 9) leads to the generalised least squares estimator of $\beta$, i.e. $\hat\beta = (X_s^T V_{ss}^{-1} X_s)^{-1} X_s^T V_{ss}^{-1} y_s$, and the best linear unbiased predictors (BLUPs) for $\gamma$ and $u$, i.e. $\hat\gamma = \sigma_\gamma^2 \Delta_s^T V_{ss}^{-1} (y_s - X_s \hat\beta)$ and $\hat u = \Sigma_u Z_s^T V_{ss}^{-1} (y_s - X_s \hat\beta)$.

In practice, the variance components are unknown and must be estimated from sample data using methods such as maximum likelihood or restricted maximum likelihood; see Harville (1977). In what follows we use $(\hat\sigma_\gamma^2, \hat\Sigma_u, \hat\sigma_e^2)$ to denote such estimates, allowing us to define the plug-in estimator $\hat V_{ss} = \hat\sigma_\gamma^2 \Delta_s \Delta_s^T + Z_s \hat\Sigma_u Z_s^T + \hat\sigma_e^2 I_n$, where $I_n$ is the identity matrix of order $n$. This leads to the nonparametric model-based EBLUE for $\beta$, $\hat\beta^{NPEBLUE} = (X_s^T \hat V_{ss}^{-1} X_s)^{-1} X_s^T \hat V_{ss}^{-1} y_s$, and to the
corresponding nonparametric EBLUPs (NPEBLUPs) for the spline and area effects in (10), $\hat\gamma^{NPEBLUP} = \hat\sigma_\gamma^2 \Delta_s^T \hat V_{ss}^{-1} (y_s - X_s \hat\beta^{NPEBLUE})$ and $\hat u^{NPEBLUP} = \hat\Sigma_u Z_s^T \hat V_{ss}^{-1} (y_s - X_s \hat\beta^{NPEBLUE})$. Under (10), the nonparametric empirical best predictor of the distribution function for area $i$ (denoted by NPEBP) is

$$\hat F_i^{NPEBP}(t) = N_i^{-1} \Big\{ \sum_{j \in s_i} I(y_{ij} \le t) + \sum_{j \in r_i} I(\hat y_{ij}^{NPEBLUP} \le t) \Big\}, \qquad (11)$$

where $\hat y_{ij}^{NPEBLUP} = x_{ij}^T \hat\beta^{NPEBLUE} + \delta_{ij}^T \hat\gamma^{NPEBLUP} + z_{ij}^T \hat u^{NPEBLUP}$, and $x_{ij}^T$, $\delta_{ij}^T$ and $z_{ij}^T$ denote respectively the rows of $X_U$, $\Delta_U$ and $Z_U$ that correspond to unit $j$ in area $i$. Similarly, under (10), the nonparametric version of the CD estimator of the distribution function for area $i$ is

$$\hat F_i^{NPCD}(t) = N_i^{-1} \Big\{ \sum_{j \in s_i} I(y_{ij} \le t) + n_i^{-1} \sum_{j \in r_i} \sum_{k \in s_i} I\big(\hat y_{ij}^{NPEBLUP} + (y_{ik} - \hat y_{ik}^{NPEBLUP}) \le t\big) \Big\}. \qquad (12)$$

3. The Model-Based Direct Estimator for the Small Area Distribution Function

A direct estimate for a small area is simple to interpret, since the estimated value of the variable of interest for the area is just a weighted average of the sample data from the same area. This is not true of an indirect estimator like the EBLUP, which is a weighted sum over the entire sample. Unfortunately, when weights are the inverses of sample inclusion probabilities, conventional direct estimators like (2) and (3) can be quite inefficient. The Model-Based Direct Estimator (MBDE) of a small area mean improves upon the efficiency of these conventional direct estimators by using the weights that define the EBLUP for the population total under a model with random area effects. See Chandra and Chambers (2009) and Salvati et al. (2010). MBDEs for the population mean of $y$ using weights based on the linear model (4) as well as those based on the non-parametric model (10) are therefore possible.

However, the finite population distribution function is the population mean of an indicator variable, which does not satisfy either (4) or (10). Consequently, 'standard' EBLUP
weights are not appropriate for defining the MBDE of this function. Instead, we use sample weights that are calibrated to the known finite population distribution of the auxiliary variables in $x$ and are based on a model with random area effects. For simplicity, we restrict our discussion below to a single scalar covariate $x$, noting that the extension to multiple scalar covariates is straightforward.

The calibrated estimator of a finite population distribution function $F_N(t)$ was defined in Harms and Duchesne (2006) as a weighted empirical distribution function

$$\hat F_N^{HD}(t) = N^{-1} \sum_{j \in s} w_j I(y_j \le t) \qquad (13)$$

where the sample weights $w_j$ in (13) are calibrated to the known finite population distribution of $x$. In particular, let $0 < \alpha_1 < \alpha_2 < \dots < \alpha_K < 1$ denote an ordered set of constants. Then the weights used in (13) sum to $N$ and, for $k = 1, \dots, K$, also satisfy

$$\sum_{j \in s} w_j I\{x_j \le Q_x(\alpha_k)\} = N \alpha_k, \qquad (14)$$

where $Q_x(\alpha_k)$ is the known $\alpha_k$-quantile of the finite population distribution of $x$. That is, the weights used in (13) are calibrated to both the population size $N$ and to the population totals of the auxiliary variables defined by the indicators $I\{x_j \le Q_x(\alpha_k)\}$. Standard results from calibration theory (Deville and Särndal, 1992; Chambers, 1996) can be used to show that if these calibrated weights $w_j$ are then chosen to minimise their chi-square distance from the weights used in the Horvitz-Thompson estimator (2), as is commonly done, then (13) is a regression estimator of $F_N(t)$ under the linear model

$$I(y_j \le t) = \beta_{0t} + \sum_{k=1}^{K} \beta_{kt} I\{x_j \le Q_x(\alpha_k)\} + \varepsilon_{jt}, \qquad (15)$$

where the $\varepsilon_{jt}$ are uncorrelated errors with zero expectation and variance $\sigma_{\varepsilon t}^2$ (Chambers, 2005). However, (15) is also easily seen to be a p-spline model with knots at the $\alpha_k$-th
quantiles of the finite population distribution of $x$. That is, $\hat F_N^{HD}(t)$ is actually a p-spline estimator of $F_N(t)$.

Define $g_{jk} = I\{x_j \le Q_x(\alpha_k)\}$ and let $g_{Uk} = (g_{jk};\ j = 1, \dots, N)$ be the corresponding population $N$-vector, so $G_U = [1_N, g_{U1}, \dots, g_{UK}]$ denotes the population level matrix of values of these variables, where $1_N$ denotes an $N$-vector of ones. Also, define $d_{jt} = I(y_j \le t)$ and put $d_{Ut}$ equal to the $N$-vector of population values of the $d_{jt}$. The population level version of model (15) is then

$$d_{Ut} = G_U \beta_t + \varepsilon_{Ut}. \qquad (16)$$

Given the appropriate sample and non-sample components of $d_{Ut}$, $G_U$ and the covariance matrix $V_{Ut} = \sigma_{\varepsilon t}^2 I_N$ of $\varepsilon_{Ut}$, the vector of sample weights $w_{jt}^{DF}$ that define the EBLUP of the population total of the $d_{jt}$ under (16) is then

$$w_{st}^{DF} = (w_{jt}^{DF};\ j \in s) = 1_n + \hat H_{st}^T (G_U^T 1_N - G_s^T 1_n) + (I_n - \hat H_{st}^T G_s^T) \hat V_{sst}^{-1} \hat V_{srt} 1_{N-n}, \qquad (17)$$

where $\hat H_{st} = (G_s^T \hat V_{sst}^{-1} G_s)^{-1} G_s^T \hat V_{sst}^{-1}$. Under (16), $\hat V_{sst} = \hat\sigma_{\varepsilon t}^2 I_n$ and $\hat V_{srt} = 0$, so these weights simplify to

$$w_s^{DF} = (w_j^{DF};\ j \in s) = 1_n + G_s (G_s^T G_s)^{-1} (G_U^T 1_N - G_s^T 1_n) = 1_n + G_s (G_s^T G_s)^{-1} G_r^T 1_{N-n}.$$

The model (16) is easily adapted to small area estimation by including random area effects. That is, we replace (16) by

$$d_{Ut} = G_U \beta_t + Z_U u_t + \varepsilon_{Ut} \qquad (18)$$

where $Z_U$ was defined following (4) and $u_t \sim N(0, \Omega_t)$ is an $A$-vector of random area effects. As usual, we assume that $u_t$ and $\varepsilon_{Ut}$ are independently distributed, so that $\mathrm{Var}(d_{Ut}) = V_{Ut} = Z_U \Omega_t Z_U^T + \sigma_{\varepsilon t}^2 I_N$. The sample weights $w_{jt}^{DF}$ that define the EBLUP of the population total of the $d_{jt}$ under (18) are then still given by (17), but now with
$\hat V_{sst} = Z_s \hat\Omega_t Z_s^T + \hat\sigma_{\varepsilon t}^2 I_n$ and $\hat V_{srt} = Z_s \hat\Omega_t Z_r^T$, where $\hat\Omega_t$ and $\hat\sigma_{\varepsilon t}^2$ are the estimated values of the variance components of (18).

In practice, one first needs to decide on the calibration constraints (14) before (18) can be fitted and (17) calculated. This in turn requires that one has chosen the values $0 < \alpha_1 < \alpha_2 < \dots < \alpha_K < 1$. We adapt the ordered half-sample cross validation procedure described in Chambers (2005) for this purpose. In particular, we fix $K = 1$ and then search for the value $\alpha_t^{opt}$ that maximises the concordance between the sample values of $d_{jt}$ and the sample values of $g_j = I\{x_j \le Q_x(\alpha)\}$. The steps in this procedure are as follows:

1. Order the sample x-values: $x_{(1)}, x_{(2)}, x_{(3)}, \dots, x_{(n-1)}, x_{(n)}$;
2. Create two sets $E = \{x_{(1)}, x_{(3)}, \dots\}$ and $V = \{x_{(2)}, x_{(4)}, \dots\}$;
3. For given $\alpha$ and $t$, fit the model (18) and then compute the weights (17), treating $E$ as the 'sample' and $V$ as the 'nonsample'. Denote the corresponding value of (13) based on these weights by $\hat F_N^{HD(n)}(t, \alpha)$;
4. The optimal value $\alpha_t^{opt}$ then satisfies

$$\Big\{ \hat F_N^{HD(n)}(t, \alpha_t^{opt}) - n^{-1} \sum_{j \in s} I(y_j \le t) \Big\}^2 = \min_{0 < \alpha < 1} \Big\{ \hat F_N^{HD(n)}(t, \alpha) - n^{-1} \sum_{j \in s} I(y_j \le t) \Big\}^2.$$

We note that although this procedure only identifies a single 'most concordant' calibration constraint to use in (14), there is nothing to stop it being extended to identification of multiple calibration constraints. However, some care must then be taken to ensure that the resulting values of $Q_x(\alpha)$ are separated sufficiently in the interval spanned by the sample values of the auxiliary $x$. Failure to do this could result in the sample design matrix defined by (18) not being of full rank.

Finally, given the weights (17), we write down the MBDE for the area $i$ distribution function $F_i(t)$ as
$$\hat F_i^{MBDE}(t) = \frac{\sum_{j \in s_i} w_{jt}^{DF} I(y_j \le t)}{\sum_{j \in s_i} w_{jt}^{DF}}. \qquad (19)$$

We refer to (19) as a direct estimator because it is a weighted average of the sample data from the area of interest. However, this does not mean that it can be calculated from these data alone. The weights (17) are a function of the data from the entire sample. That is, they borrow strength from other areas via the model (18). It should also be pointed out that since the weights (17) depend on $t$, there is no guarantee that (19) defines a monotone function of $t$, i.e. one where $t_1 < t_2$ implies $\hat F_i^{MBDE}(t_1) \le \hat F_i^{MBDE}(t_2)$. This issue will usually not be relevant when one wishes to estimate the distribution of interest at points that are well separated, but can be a problem when the aim is to invert (19) as a function of $t$ in order to estimate quantiles. In such a situation we recommend that (19) be first transformed to be monotone in $t$, e.g. using the approach described in He (1997).

3.1 Mean squared error estimation for the MBDE

A bias-robust estimator of the mean squared error (MSE) of the MBDE is described in Chandra and Chambers (2009), see also Chambers et al. (2009), and we use this approach here to define a corresponding MSE estimator for (19). This is the estimator

$$\hat M\{\hat F_i^{MBDE}(t)\} = \hat V_{it} + \hat B_{it}^2 \qquad (20)$$

where $\hat V_{it}$ is a heteroskedasticity-robust estimator of the conditional prediction variance of $\hat F_i^{MBDE}(t)$ (Royall and Cumberland, 1978), $\hat B_{it}$ is an estimator of the corresponding conditional prediction bias, and the conditioning is with respect to the value of the area $i$ effect. In particular, we use

$$\hat V_{it} = N_i^{-2} \sum_{j \in s_i} \Big\{ \big(N_i w_{jt}^{DF(i)} - 1\big)^2 + (N_i - n_i) n_i^{-1} \Big\} (d_{jt} - \hat\mu_{jt})^2, \qquad (21)$$
where $w_{jt}^{DF(i)} = w_{jt}^{DF} \big/ \sum_{k \in s_i} w_{kt}^{DF}$ and $\hat\mu_{jt}$ is an unbiased linear estimator of the conditional expected value $\mu_{jt} = E(d_{jt} \mid g_j, u_t)$. Chambers et al. (2009) recommend that $\hat\mu_{jt}$ be computed as the unshrunken version of the EBLUP for $\mu_{jt}$, i.e.

$$\hat\mu_{jt} = \hat\beta_{0t} + g_j \hat\beta_{1t} + z_j^T (Z_s^T Z_s)^{-1} Z_s^T (I_n - G_s \hat H_{st}) d_{st}.$$

For the conditional bias of the MBDE, we use a simple plug-in estimator of the form

$$\hat B_{it} = \sum_{j \in s_i} w_{jt}^{DF(i)} \hat\mu_{jt} - N_i^{-1} \sum_{j \in U_i} \hat\mu_{jt}. \qquad (22)$$

Note that the MSE estimator (20) treats the weights (17) as fixed, i.e. it ignores the extra variability associated with estimation of the variance components, and is therefore a heteroskedasticity-robust first order approximation to the actual conditional MSE of the MBDE. Chambers et al. (2009) refer to this as a pseudo-linearization assumption since for large overall sample sizes the contribution to the overall MSE of (19) arising from the variability of variance components will be of smaller order of magnitude than the fixed weights prediction variance estimated by (21). However, the extent of this underestimation will depend on the small area sample sizes and the characteristics of the population of interest, particularly the strength of the small area effects. Finally, we note that the square of (22) is a conservative estimator of the squared bias, since $E(\hat B_{it}^2) = \mathrm{Var}(\hat B_{it}) + E^2(\hat B_{it})$. However, the extent of this overestimation is typically very small.

4. Empirical Evaluations

In this Section we report the results from model-based and design-based simulation studies that illustrate the performance of the different estimators of the small area distribution function defined in the preceding two Sections. These estimators are set out in Table 1. Their
performance in the simulation studies is evaluated by computing for each small area the absolute relative bias (ARB), the relative root mean squared error (RRMSE) and coverage rate (CR) of nominal 95 per cent confidence intervals defined as follows:

$$ARB_i = \Big( R^{-1} \sum_{r=1}^{R} F_{ir} \Big)^{-1} \Big| R^{-1} \sum_{r=1}^{R} (\hat F_{ir} - F_{ir}) \Big| \times 100,$$

$$RRMSE_i = \Big( R^{-1} \sum_{r=1}^{R} F_{ir} \Big)^{-1} \sqrt{ R^{-1} \sum_{r=1}^{R} (\hat F_{ir} - F_{ir})^2 } \times 100,$$

and

$$CR_i = R^{-1} \sum_{r=1}^{R} I\big( |\hat F_{ir} - F_{ir}| \le 2 \hat M_{ir}^{1/2} \big) \times 100.$$

Here $R$ denotes the number of simulations, $F_{ir}$ denotes the true value of the area $i$ distribution function at simulation $r$, $\hat F_{ir}$ denotes an estimate of this value, and $\hat M_{ir}$ denotes an estimate of the MSE of $\hat F_{ir}$. The value of the true MSE for $\hat F_{ir}$ is calculated as $R^{-1} \sum_{r=1}^{R} (\hat F_{ir} - F_{ir})^2$. Note that in the design-based simulations $F_{ir} = F_i$.

4.1 Model-based simulations

In the model-based simulations we set $A = 30$ and use two types of models to generate the population values of $y$. The first is a linear model, $y_{ij} = x_{ij} + u_i + e_{ij}$, where $x_{ij} \sim \chi^2(20)$, $j = 1, \dots, N_i$ and $i = 1, \dots, A$, with random area effects $u_i$ generated as independent realizations from a $N(0, \sigma_u^2)$ distribution and $e_{ij}$ distributed as $N(0, 94.09)$, corresponding to an intra-area correlation of $\sigma_u^2 / (\sigma_u^2 + \sigma_e^2) = 0.2$. Simulations based on this model are referred to as set 1 simulations. The second model is a multiplicative model, $y_{ij} = 5 x_{ij}^{\beta} u_i e_{ij}$, where the values of $x_{ij}$ are independently drawn from the lognormal distribution $\log(x_{ij}) \sim N(6, \sigma_x^2)$, and the individual effects and area effects are independently drawn as $\log(e_{ij}) \sim N(0, \sigma_e^2)$ and $\log(u_i) \sim N(0, \sigma_u^2)$ respectively. We use two sets of parameters for this model, defined by $\beta$ (1 or 2), $\sigma_u$ (0.4 or 0.6), $\sigma_e$ (0.7 or 1.0) and $\sigma_x$ (
or 1.20). These are referred to from now on as set 2a and set 2b. Data values for $y$ generated under set 2a are almost linear in $x$ while those generated under set 2b are quite non-linear in $x$.

The small area population sizes $N_i$ are randomly drawn from a uniform distribution on [450, 550] and kept fixed over the simulations. The small area sample sizes $n_i$ are determined by first selecting a simple random sample of size $n = 600$ from the population and noting the resulting sample sizes in each small area. These area specific sample sizes $n_i$ are then fixed in the simulations by treating the small areas as strata and carrying out stratified random sampling. A total of $R = 1000$ simulations are then carried out for each combination of model and individual error distribution, with each simulation corresponding to first generating the population values and then drawing a sample. The average ARB values and the average RRMSE values of the different small area distribution function estimators are shown in Tables 2 and 3 respectively. These values are in percentage terms, and the averages are over the 30 small areas. All estimators are evaluated at the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles of $y$.

4.2 Design-based simulations

The design-based simulations are based on two real survey data sets. The first survey data set is based on data collected in the Australian Agricultural Grazing Industry Survey (AAGIS) conducted by the Australian Bureau of Agricultural and Resource Economics. In the original sample there were 759 farms from 12 regions (the small areas of interest), which make up the wheat-sheep zone for Australian broadacre agriculture. We used these sample data to generate a synthetic population of size $N = 39{,}562$ farms by re-sampling the original AAGIS sample of $n = 759$ farms with probability proportional to a farm's sample weight. This fixed population was then repeatedly sampled using stratified random sampling with regions corresponding to strata and with stratum sample sizes the same as in the original sample.
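The design-based set-up just described (a fixed pseudo-population built by weighted resampling, then repeated stratified sampling with the original stratum sizes) can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy sample of 30 units in 3 regions, the weights, the pseudo-population size of 1000 and the helper names `make_pseudo_population` and `stratified_sample` are all hypothetical and are not taken from the AAGIS data or from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(2010)

# Hypothetical original sample: 30 units in 3 regions with survey weights.
region = np.repeat(np.arange(3), 10)
weight = rng.uniform(5.0, 50.0, size=region.size)
y = rng.lognormal(mean=6.0, sigma=1.0, size=region.size)

def make_pseudo_population(weight, pop_size, rng):
    """Indices of sample units drawn with replacement, with probability
    proportional to each unit's sample weight."""
    prob = weight / weight.sum()
    return rng.choice(weight.size, size=pop_size, replace=True, p=prob)

def stratified_sample(strata, stratum_sizes, rng):
    """Simple random sampling without replacement inside every stratum;
    returns indices into the population."""
    picks = [rng.choice(np.flatnonzero(strata == h), size=n_h, replace=False)
             for h, n_h in enumerate(stratum_sizes)]
    return np.concatenate(picks)

# Step 1: build the fixed pseudo-population once.
pop_units = make_pseudo_population(weight, pop_size=1000, rng=rng)
pop_region, pop_y = region[pop_units], y[pop_units]

# Step 2: draw one stratified sample with the original stratum sample sizes;
# in a simulation study this step would be repeated R times.
stratum_sizes = np.bincount(region)
sample_idx = stratified_sample(pop_region, stratum_sizes, rng)
```

Because the pseudo-population is held fixed across replications, the true area distribution functions are constants, which matches the remark that $F_{ir} = F_i$ in the design-based simulations.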
The variable of interest is total cash costs (TCC) and the auxiliary variable is land area. Based on the original AAGIS sample data, the fit of the linear mixed model (AIC =
) and the fit of the nonparametric p-spline regression model (AIC = ) were essentially the same, indicating that addition of the nonparametric spline component does not improve the fit of the mixed model. We therefore do not expect to see much difference between the distribution function estimates generated by these two models. The aim is to estimate the values of the regional distribution functions at the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles of the finite population distribution of TCC.

The data for the second design-based simulation come from the Environmental Monitoring and Assessment Program (EMAP) survey carried out by the Space Time Aquatic Resources Modelling and Analysis Program (STARMAP) at Colorado State University, and we replicate the design-based simulation experiment carried out by Salvati et al. (2010). The background to this data set is that EMAP conducted a survey of lakes in the North-Eastern states of the United States of America between 1991 and [...]. The data collected in this survey included 551 measurements of Acid Neutralizing Capacity (ANC), an indicator of the acidification risk of water bodies in water resource surveys, from a sample of 349 of the 21,028 lakes located in this area. Here we define lakes grouped by 6-digit Hydrologic Unit Code (HUC) as our small areas of interest. Since three HUCs have sample sizes of one, these are combined with adjacent HUCs, leading to a total of 23 small areas. Sample sizes in these 23 areas vary from 2 to 45. A (fixed) pseudo-population of $N = 21{,}028$ lakes is defined by sampling $N$ times with replacement and with probability proportional to a lake's sample weight from the original sample of 349 lakes. A total of $R = 1000$ independent stratified random samples of the same size as the original sample are selected from this pseudo-population, with HUCs corresponding to strata and stratum sample sizes fixed to be the same as in the original sample. The survey variable of interest is the ANC value of a lake, with its elevation defining the auxiliary variable.
Using the original EMAP data, the fit of the linear mixed model (AIC = ) is worse than that of the nonparametric regression model (AIC = ). In this case, therefore, there are gains from including the spline component in the mixed model, and so we expect that estimates of the distribution function based on the nonparametric regression model will perform better than those based on the linear mixed model. Again, the aim is to estimate the values of the individual HUC distribution functions at the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles of the finite population distribution of ANC.

Tables 4 and 5 show the average over small areas of the ARB and RRMSE values of the different distribution function estimators based on the R = 1000 independent stratified samples taken from the AAGIS and EMAP populations respectively. Similarly, Table 6 shows the corresponding averages over the areas of the true RMSEs and estimated RMSEs, and the actual coverage rates of nominal 95 per cent confidence intervals for the true area-specific distribution function values based on the MBDE estimator (19) and its associated MSE estimator (20). Figures 1 and 2 show the area-specific values of the true RMSE and estimated RMSE of the MBDE (19) for the design-based simulations of the AAGIS and EMAP data.

4.3 Discussion

Two things stand out in Tables 2 and 3. The first is that the MBDE offers substantial bias gains over the other DF estimators, at all quantiles, when the relationship between the study variable and the covariate is complicated and/or the usual mixed model distributional assumptions are invalid (sets 2a and 2b). If the underlying population structure is linear and the usual mixed model assumptions hold (set 1), the CD and NPCD estimators have slightly smaller absolute biases than the MBDE. The larger biases of the 'plug-in' EBP and NPEBP estimators are not unexpected in set 1 because these estimators ignore unit level variability in y. Second, the NPCD estimator generally records the lowest RRMSE among the alternatives to the MBDE, but when the relationship between y and x is complicated, as under sets 2a and 2b, the RRMSE values recorded by the MBDE are comparable to, and sometimes lower than, those recorded by the NPCD estimator. On the other hand, under the linear specification (set 1), the MBDE is clearly less efficient than its alternatives.

Design-based simulations serve to complement model-based simulations for SAE, providing evidence of comparative performance and robustness in realistic data scenarios. Table 4 shows the results for the design-based simulations using the AAGIS data. Here we see that the MBDE has lower bias and RMSE than the other predictors at all quantiles. As expected, given the linear relationship between y and x, the CD-based estimators of the DF based on the linear mixed model are generally more efficient than those based on the nonparametric spline regression model. However, the reverse is true for the EBP-based estimators, perhaps reflecting the lower (but still substantial) biases of the NPEBP.

Table 5 reports the design-based simulation results for the EMAP data. These again indicate that the MBDE dominates the other estimators in terms of bias. The results for RRMSE are not as clear-cut as in the AAGIS simulations, but still show that the performance of the MBDE is comparable with the performance of the NPCD estimator, which was consistently the best of the alternative estimators in terms of RRMSE.

We now turn to an examination of the performance of the MSE estimator (20) for the MBDE. Figures 1 and 2 show that this estimator accurately tracks the simulation (i.e. repeated sampling) area-specific MSEs of the MBDE at all five target quantiles for y. This good performance is confirmed by the results in Table 6, which shows that the area averages of the true RMSEs and the estimated RMSEs obtained using (20) are very close. Finally, we note that one can combine the MBDE estimator (19) with the MSE estimator (20) to generate normal theory confidence intervals for the area-specific value of the distribution function, i.e. as the small area estimate plus or minus twice its corresponding estimated RMSE. Table 6 shows that the actual coverage rates achieved by these intervals, though generally less than 95 per cent, are still close enough to their target value to be practically useful.
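The performance measures discussed above can be computed for a single area and quantile along the following lines. This is a minimal sketch under assumptions: the paper defines ARB and RRMSE in a section not reproduced here, so the formulas below are the commonly used forms rather than a confirmed transcription, and the function name and toy numbers are hypothetical. The interval follows the rule stated in the text, estimate plus or minus twice the estimated RMSE.

```python
import numpy as np

def performance_summary(estimates, mse_estimates, true_value):
    """Summarise R replicate estimates of one area-specific DF value.

    estimates     : replicate estimates of the distribution function value
    mse_estimates : replicate estimates of the MSE of each estimate
    true_value    : known value in the (pseudo-)population, assumed non-zero
    """
    estimates = np.asarray(estimates, dtype=float)
    rmse_hat = np.sqrt(np.asarray(mse_estimates, dtype=float))
    errors = estimates - true_value

    # Assumed definitions (standard forms, not confirmed against the paper).
    arb = 100.0 * abs(errors.mean() / true_value)    # absolute relative bias, %
    true_rmse = np.sqrt(np.mean(errors ** 2))        # repeated-sampling RMSE
    rrmse = 100.0 * true_rmse / true_value           # relative RMSE, %

    # Interval: estimate +/- 2 * estimated RMSE, as described in the text.
    covered = np.abs(errors) <= 2.0 * rmse_hat
    return {
        "ARB": arb,
        "RRMSE": rrmse,
        "true_RMSE": true_rmse,
        "avg_est_RMSE": rmse_hat.mean(),
        "CR": 100.0 * covered.mean(),                # coverage rate, %
    }

# Toy replicate results for one area (hypothetical numbers).
summary = performance_summary(
    estimates=[0.4, 0.5, 0.6, 0.8],
    mse_estimates=[0.01, 0.01, 0.01, 0.01],
    true_value=0.5,
)
```

Averaging these per-area summaries over areas would give quantities of the kind reported in Tables 4 to 6.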
Finally, we note that an alternative to the CD estimator that is both model-consistent and design-consistent has been proposed by Rao et al. (1990). Although the relevant results are not reported here, we also explored the performance of both parametric and nonparametric versions of this estimator in our simulations. In all cases, this performance was almost identical to that of the parametric and nonparametric versions of the CD predictor.

5. Conclusions

This paper develops an MBDE estimator for the value of the area-specific finite population distribution of a response variable y. This estimator is based on sample weights that are calibrated to the finite population distribution of an auxiliary variable x, and also allow for random area effects. We then compare the performance of this MBDE estimator with two competing estimators based on either a linear mixed model or a nonparametric mixed model for y. Our results indicate that the proposed MBDE can sometimes be much better than these alternatives, particularly in realistic applications where fitted models are approximations at best. On the other hand, if the model assumptions are valid (e.g. set 1 in the model-based simulations), then area-specific distribution function estimators based on the CD representation are preferable. We also provide a method for estimating the MSE of the MBDE and demonstrate empirically that it performs well.

References

Chambers, R. (1996). Robust case-weighting for multipurpose establishment surveys. Journal of Official Statistics, 12,

Chambers, R. (2005). Imputation vs. Estimation of Finite Population Distributions. Southampton Statistical Sciences Research Paper. S3RI Methodology Working Papers, M05/06.
Chambers, R., Chandra, H. and Tzavidis, N. (2009). On Bias-Robust Mean Squared Error Estimation for Linear Predictors for Domains. Working Papers, Centre for Statistical and Survey Methodology, The University of Wollongong, Australia. (Available from: )

Chambers, R. and Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika, 73,

Chandra, H. and Chambers, R. (2009). Multipurpose weighting for small area estimation. Journal of Official Statistics, 25, 3,

Cochran, W.G. (1977). Sampling Techniques, 3rd edition. Wiley & Sons, NY.

Deville, J.C. and Särndal, C.E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87,

Eilers, P. and Marx, B. (1996). Flexible Smoothing using B-splines and Penalized Likelihood (with comments and rejoinder). Statistical Science, 11,

Harms, T. and Duchesne, P. (2006). On calibration estimation for quantiles. Survey Methodology, 32,

Harville, D.A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72,

He, X. (1997). Quantile curves without crossing. American Statistician, 51,

McCulloch, C.E. and Searle, S.R. (2001). Generalized Linear and Mixed Models. Wiley, New York.

Opsomer, J.D., Claeskens, G., Ranalli, M.G., Kauermann, G. and Breidt, F.J. (2008). Nonparametric small area estimation using penalized spline regression. Journal of the Royal Statistical Society, Series B, 70,

Rao, J.N.K., Kovar, J.G. and Mantel, H.J. (1990). On estimating distribution functions and quantiles from survey data using auxiliary information. Biometrika, 77,
Rao, J.N.K. (2003). Small Area Estimation. New York: Wiley.

Rueda, M., Martínez, S., Martínez, H. and Arcos, A. (2007). Estimation of the distribution function with calibration methods. Journal of Statistical Planning and Inference, 137,

Rueda, M., Sánchez-Borrego, I., Arcos, A. and Martínez, S. (2010). Model-calibration estimation of the distribution function using nonparametric regression. Metrika, 71,

Ruppert, D., Wand, M.P. and Carroll, R. (2003). Semiparametric Regression. Cambridge University Press, Cambridge.

Royall, R.M. (1976). The linear least-squares prediction approach to two-stage sampling. Journal of the American Statistical Association, 71,

Royall, R.M. and Cumberland, W.G. (1978). Variance estimation in finite population sampling. Journal of the American Statistical Association, 71,

Salvati, N., Chandra, H., Ranalli, M.G. and Chambers, R. (2010). Small area estimation using a nonparametric model-based direct estimator. Computational Statistics and Data Analysis, 54,

Tzavidis, N., Marchetti, S. and Chambers, R. (2010). Robust prediction of small area means and quantiles. Australian and New Zealand Journal of Statistics, 52,

Wand, M.P. (2003). Smoothing and mixed models. Computational Statistics, 18,
Table 1. Description of the estimators considered in the simulation studies.

Estimator   Description
MBDE        MBDE (19) with sample weights (17) based on model (18)
EBP         EBLUP-based EBP estimator (5) under linear mixed model (4)
CD          EBLUP-based CD estimator (6) under linear mixed model (4)
NPEBP       NPEBLUP-based EBP estimator (11) under spline-based mixed model (10)
NPCD        NPEBLUP-based CD estimator (12) under spline-based mixed model (10)

Table 2. Area averages of absolute relative bias (ARB, %) generated by model-based simulations.

Set   Population quantile   MBDE   EBP   CD   NPEBP   NPCD
Table 3. Area averages of relative root mean squared error (RRMSE, %) generated by model-based simulations.

Set   Population quantile   MBDE   EBP   CD   NPEBP   NPCD

Table 4. Average values over 12 regions of absolute relative bias (ARB, %) and relative root mean squared error (RRMSE, %) for the AAGIS data.

Population quantile   MBDE   EBP   CD   NPEBP   NPCD
ARB (%)
RRMSE (%)
Table 5. Average values over 23 HUCs of absolute relative bias (ARB, %) and relative root mean squared error (RRMSE, %) for the EMAP data.

Population quantile   MBDE   EBP   CD   NPEBP   NPCD
ARB (%)
RRMSE (%)

Table 6. Average values of true RMSE and estimated RMSE and actual coverage rate (CR, %) of nominal 95 per cent confidence intervals generated by the MBDE (19) and associated MSE estimator (20) for the AAGIS and EMAP data. Averages are over regions.

                      AAGIS                            EMAP
Population quantile   True RMSE   Estimated RMSE   CR   True RMSE   Estimated RMSE   CR
Figure 1. Region-specific values of actual repeated sampling RMSE (solid line) and average estimated RMSE (dashed line) of MBDE (19) for the AAGIS data.

Figure 2. HUC-specific values of actual repeated sampling RMSE (solid line) and average estimated RMSE (dashed line) of MBDE (19) for the EMAP data.
Lecture 4 Hypothess Testng We may wsh to test pror hypotheses about the coeffcents we estmate. We can use the estmates to test whether the data rejects our hypothess. An example mght be that we wsh to
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More information28. SIMPLE LINEAR REGRESSION III
8. SIMPLE LINEAR REGRESSION III Ftted Values and Resduals US Domestc Beers: Calores vs. % Alcohol To each observed x, there corresponds a y-value on the ftted lne, y ˆ = βˆ + βˆ x. The are called ftted
More informatione i is a random error
Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown
More informationNow we relax this assumption and allow that the error variance depends on the independent variables, i.e., heteroskedasticity
ECON 48 / WH Hong Heteroskedastcty. Consequences of Heteroskedastcty for OLS Assumpton MLR. 5: Homoskedastcty var ( u x ) = σ Now we relax ths assumpton and allow that the error varance depends on the
More informationChapter 6. Supplemental Text Material
Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.
More informationSmall Area Estimation: Methods, Applications and New Developments. J. N. K. Rao. Carleton University, Ottawa, Canada
Small Area Estmaton: Methods, Applcatons and New Developments J. N. K. Rao Carleton Unversty, Ottawa, Canada Paper presented at the NTTS 2013 Conference, Brussels, March 2013 1 Introducton Censuses and
More informationNUMERICAL DIFFERENTIATION
NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the
More informationEconomics 130. Lecture 4 Simple Linear Regression Continued
Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do
More informationChapter 14 Simple Linear Regression
Chapter 4 Smple Lnear Regresson Chapter 4 - Smple Lnear Regresson Manageral decsons often are based on the relatonshp between two or more varables. Regresson analss can be used to develop an equaton showng
More informationThe Geometry of Logit and Probit
The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.
More informationSmall area prediction of counts under a nonstationary
Unversty of Wollongong Research Onlne Faculty of Engneerng and Informaton Scences - Papers: Part A Faculty of Engneerng and Informaton Scences 207 Small area predcton of counts under a nonstatonary spatal
More informationEcon Statistical Properties of the OLS estimator. Sanjaya DeSilva
Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate
More informationConditional and unconditional models in modelassisted estimation of finite population totals
Unversty of Wollongong Research Onlne Faculty of Informatcs - Papers Archve) Faculty of Engneerng and Informaton Scences 2011 Condtonal and uncondtonal models n modelasssted estmaton of fnte populaton
More informationTopic 23 - Randomized Complete Block Designs (RCBD)
Topc 3 ANOVA (III) 3-1 Topc 3 - Randomzed Complete Block Desgns (RCBD) Defn: A Randomzed Complete Block Desgn s a varant of the completely randomzed desgn (CRD) that we recently learned. In ths desgn,
More informationBayesian predictive Configural Frequency Analysis
Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse
More informationExplaining the Stein Paradox
Explanng the Sten Paradox Kwong Hu Yung 1999/06/10 Abstract Ths report offers several ratonale for the Sten paradox. Sectons 1 and defnes the multvarate normal mean estmaton problem and ntroduces Sten
More information