Small area estimation for semicontinuous data

University of Wollongong Research Online
Faculty of Engineering and Information Sciences - Papers: Part A, Faculty of Engineering and Information Sciences, 2016

Hukum Chandra, Indian Agricultural Statistics Research Institute, hchandra@uow.edu.au
Raymond L. Chambers, University of Wollongong, ray@uow.edu.au

Publication Details: Chandra, H. & Chambers, R. L. (2016). Small area estimation for semicontinuous data. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 58 (2).

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au

Abstract

Survey data often contain measurements for variables that are semicontinuous in nature, i.e. they either take a single fixed value (we assume this is zero) or they have a continuous, often skewed, distribution on the positive real line. Standard methods for small area estimation (SAE) based on the use of linear mixed models can be inefficient for such variables. We discuss SAE techniques for semicontinuous variables under a two part random effects model that allows for the presence of excess zeros as well as the skewed nature of the non-zero values of the response variable. In particular, we first model the excess zeros via a generalized linear mixed model fitted to the probability of a non-zero, i.e. strictly positive, value being observed, and then model the response, given that it is strictly positive, using a linear mixed model fitted on the logarithmic scale. Empirical results suggest that the proposed method leads to efficient small area estimates for semicontinuous data of this type. We also propose a parametric bootstrap method to estimate the MSE of the proposed small area estimator. These bootstrap estimates of the MSE are compared to the true MSE in a simulation study.

Disciplines: Engineering | Science and Technology Studies

This journal article is available at Research Online.

Hukum Chandra 1,* and Ray Chambers 2

1 Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi, India.
2 National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, NSW, 2522, Australia.
* Corresponding author: e-mail: hchandra12@gmail.com

Key words: Mean squared error; Parametric bootstrap; Skewed data; Small area estimation; Zero-inflated.

1. Introduction

Many variables of interest in business, agricultural, environmental, ecological and epidemiological surveys are semicontinuous in nature, i.e. they either take a single fixed value (typically zero) or they have a continuous, often skewed, distribution on the positive real line. This article focuses on a particular type of semicontinuous variable frequently encountered in practice, a mixture of zeros and continuous strictly positive values that are generally skewed. Such a semicontinuous variable is quite different from one that has been left-censored or truncated, because the zeros are valid self-representing data values, not proxies for negative or missing responses. It is therefore natural to view a semicontinuous response of this type as the result of two processes, one determining whether the response is zero and the other determining the actual level if it is non-zero (Olsen and Schafer, 2001).

Measurements of indebtedness, investment, production or amount of stock on hand all represent situations where semicontinuous data are typically collected in household and business surveys. For example, Amount of Loan Outstanding (collected in the 59th Round of the National Sample Survey, or NSS, in India), and Closing Beef Cattle, or BEEFCL (collected in the Australian Agricultural Grazing Industries Survey, or AAGIS) are just two cases of important survey output variables that are, by their definition, semicontinuous. In both, the target variable is either zero or some positive value, with these positive values then having a skewed distribution. Unlike the NSS data, an anonymised version of the AAGIS data is available, and so these data are used in the empirical evaluations presented in Section 5, which focus on regional estimation for BEEFCL. See Figure 1 and Table 4 for the distributions of regional sample sizes and proportions of zero values in the AAGIS sample data; the sample distribution of BEEFCL in these data is shown in Figure 2. It is clear from Figures 1 and 2 that BEEFCL is zero-inflated with highly skewed non-zero values.

Since a linear model is not appropriate for a semicontinuous variable, commonly used methods for small area estimation based on the use of linear mixed models (e.g. the empirical best linear unbiased predictor or EBLUP) can be inefficient for such variables (see Rao, 2003).
Chandra and Chambers (2011a) and Berg and Chandra (2012) investigate small area estimation methods for skewed variables, focusing on the case where a linear mixed model is appropriate after a logarithmic (log) transformation. Chandra and Chambers (2011a) describe two methods of small area estimation for such positively skewed variables. The first, a model-based direct estimator or MBDE, is defined as a weighted sum of the sampled units in the small area, with weights constructed so as to lead to the minimum mean squared error linear predictor of the overall population mean if the parameters of the log scale linear mixed model were known. The second, based on the approach of Karlberg (2000), uses an empirical predictor based on a log scale linear mixed model that is analogous to the synthetic estimator under a linear mixed model. The MBDE is a direct estimator and unbiased in the presence of between area heterogeneity, but can yield unstable estimates if sample sizes are too small. On the other hand, the synthetic type empirical predictor only accounts for between area variability through between area variation in the model covariates, and can therefore lead to biased estimators when there is significant residual between area heterogeneity. Berg and Chandra (2012) also describe an empirical best predictor that has minimum mean squared error in the class of unbiased predictors when a log scale linear mixed model is appropriate. This predictor allows for between area variation and is indirect, i.e. it uses information from all the small areas. However, all these approaches are restricted to a strictly positive variable, and so cannot be directly applied to a semicontinuous variable.

The presence of excess zeros in survey data is a well known problem, and a variety of approaches have been suggested for addressing it. However, much less is known when the focus is on small area estimation using these data, even though excess zeros within a small area are clearly much more influential than they are in the larger overall sample. A two part random effects model (Olsen and Schafer, 2001), also referred to as a mixture model (Fletcher et al., 2005), is widely used for small area estimation with zero-inflated variables; see, for example, Pfeffermann et al. (2008) and Chandra and Sud (2012). In what follows we therefore develop a small area estimation method for semicontinuous variables under a two part random effects model. Here we first model the excess zeros via a generalized linear mixed model fitted to the probability of a non-zero, i.e. strictly positive, value being observed, and then model the response, given that it is strictly positive, using a log scale linear mixed model. These two model components are combined in estimation.
We also propose a parametric bootstrap method that can be used to estimate the mean squared error (MSE) of our proposed two part estimator.

The structure of the paper is as follows. In Section 2 we develop a number of predictors for a small area mean based on a log scale linear mixed model. In Section 3 we then introduce the two part random effects model (or mixture model) and discuss different approaches to small area estimation under this model. Section 4 then focuses on MSE estimation via a parametric bootstrap approach. In Section 5 we present results from both model-based and design-based simulations, which are used to illustrate the performance of the different methods of small area estimation discussed in Section 3, with the design-based simulations based on survey data from the AAGIS. Finally, in Section 6 we summarize our main findings and discuss avenues for future research.

2. Small area estimation under transformation to linearity

We assume that a non-informative sampling method is used to draw a sample of size $n$ from a finite population $U$ of size $N$ which consists of $D$ non-overlapping domains $U_i$ $(i = 1, \ldots, D)$. Following standard practice, we refer to these domains as small areas or just areas. We further assume that there is a known number $N_i$ of population units in small area $i$, with $n_i$ of these sampled. The total number of units in the population is $N = \sum_{i=1}^{D} N_i$, with corresponding total sample size $n = \sum_{i=1}^{D} n_i$. We use $s$ to denote the collection of units in sample, with $s_i$ the subset drawn from small area $i$ (i.e. $|s_i| = n_i$), and use expressions like $j \in i$ and $j \in s$ to refer to the units making up small area $i$ and sample $s$ respectively. Similarly, $r_i$ denotes the set of units in small area $i$ that are not in sample, with $|r_i| = N_i - n_i$ and $U_i = s_i \cup r_i$. Let $y_{ij}$ denote the value of the variable of interest $Y$ for unit $j$ in area $i$ and let $x_{ij}$ denote the vector containing the known values of the auxiliary variables for unit $j$ in area $i$. Throughout we assume that the quantity of interest is the small area mean of $Y$, $m_i = N_i^{-1} \sum_{j=1}^{N_i} y_{ij}$.

We consider a situation where the variable of interest follows a log scale linear mixed model. That is, $y_{ij}$ satisfies

$$l_{ij} = \log(y_{ij}) = z_{ij}^T \beta + u_i + e_{ij}, \tag{1}$$

where $z_{ij}$ is the $m \times 1$ vector of covariates defined by appropriate transformation of the auxiliary variables $x_{ij}$, $\beta$ is an $m \times 1$ vector of fixed effects, $u_i$ is a random effect associated with area $i$ and $e_{ij}$ is an individual level random effect for unit $j$ in small area $i$. Following standard practice, we assume that the area and individual effects are mutually independent, with the area effects independently and identically distributed as $u_i \sim N(0, \sigma_u^2)$ and the individual effects independently and identically distributed as $e_{ij} \sim N(0, \sigma_e^2)$. The sample observations $\{y_{ij};\ i = 1, \ldots, D;\ j \in s_i\}$ are assumed to be available. We further assume that the population values of $z_{ij}$ are available, and that they can be linked to the sample. Consequently, the available data for area $i$ are $\{y_{ij}, z_{ij};\ j \in s_i\} \cup \{z_{ij};\ j \in r_i\}$.

Let $\theta = (\beta, \sigma_u^2, \sigma_e^2)$ be the vector of model parameters, and let $\hat\theta = (\hat\beta, \hat\sigma_u^2, \hat\sigma_e^2)$ be the Maximum Likelihood (ML) or the Restricted Maximum Likelihood (REML) estimator of $\theta$. In particular, $\sigma^2 = (\sigma_u^2, \sigma_e^2)$ is usually referred to as the vector of variance components of the model, with estimator $\hat\sigma^2 = (\hat\sigma_u^2, \hat\sigma_e^2)$. Note that since we have assumed a non-informative sampling method, the sample and population distributions of the data are the same, and are given by (1). Given the sample data, we can estimate the unknown parameters (including the area effect) of model (1) and hence define the log-scale predictions as $\hat l_{ij} = z_{ij}^T \hat\beta + \hat u_i$, where $\hat\beta$ is the estimator of $\beta$, and $\hat u_i = \hat\gamma_i (\bar l_{is} - \bar z_{is}^T \hat\beta)$ is the empirical best linear unbiased predictor (EBLUP) of the random area effect. Here $\hat\gamma_i = \hat\sigma_u^2 (\hat\sigma_u^2 + n_i^{-1} \hat\sigma_e^2)^{-1}$ is the plug-in estimator of the shrinkage effect $\gamma_i = \sigma_u^2 (\sigma_u^2 + n_i^{-1} \sigma_e^2)^{-1}$, and $\bar l_{is} = n_i^{-1} \sum_{j \in s_i} \log(y_{ij})$ and $\bar z_{is} = n_i^{-1} \sum_{j \in s_i} z_{ij}$ are the sample means of $l_{ij}$ and $z_{ij}$ respectively in area $i$.

Using a prediction-based approach similar to that described in Karlberg (2000), Chandra and Chambers (2011a) then propose a synthetic type predictor for the area mean $m_i$ under model (1) of the form

$$\hat m_i^{SYN\text{-}EP} = N_i^{-1} \Big\{ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat y_{ij}^{SYN\text{-}EP} \Big\}, \tag{2}$$

where

$$\hat y_{ij}^{SYN\text{-}EP} = \big(\hat c_{ij}^{SYN\text{-}EP}\big)^{-1} \exp\big\{ z_{ij}^T \hat\beta + 0.5 (\hat\sigma_u^2 + \hat\sigma_e^2) \big\}$$

and

$$\hat c_{ij}^{SYN\text{-}EP} = \exp\big\{ 0.5\, z_{ij}^T \hat V(\hat\beta) z_{ij} + 0.25\, \hat V(\hat\sigma_u^2 + \hat\sigma_e^2) \big\}$$

is a Taylor series linearization-based correction for back transformation bias. Note that (2) is not an Empirical Best Predictor since it does not allow for between unit correlation within a small area when it predicts the value of a non-sample $y_{ij}$ given the corresponding sample values for this variable in area $i$. It is therefore a synthetic predictor of the small area mean.

Chandra and Chambers (2011a) also propose a model-based direct estimator (MBDE) of $m_i$ based on the weighted sum $\sum_{j \in s_i} w_{ij} y_{ij}$, where $w_{ij}$ is an estimator of the weight that leads to the best linear unbiased predictor (BLUP) of the population mean if the parameters of the model (1) are known. To derive this estimator, Chandra and Chambers (2011a) use the approximations

$$E(y_{ij}) \approx \alpha_0 + \alpha_1 \hat y_{ij}^{SYN\text{-}EP}, \tag{3}$$

and

$$\mathrm{Cov}(y_{ij}, y_{ik}) \approx \hat y_{ij}^{SYN\text{-}EP}\, \hat y_{ik}^{SYN\text{-}EP} \big[ \{\exp(\hat\sigma_u^2) - 1\} + \exp(\hat\sigma_u^2) \{\exp(\hat\sigma_e^2) - 1\}\, I(j = k) \big], \tag{4}$$

where $\hat y_{ij}^{SYN\text{-}EP}$ is given in (2). The approximations (3) and (4) follow from the moment generating function of a normal distribution, and the fact that the covariance between two units from different areas is zero. Put $y_U = (y_s^T, y_r^T)^T$, where $y_s$ and $y_r$ are the vectors of sampled and non-sampled units of $Y$ respectively. Similarly, let $\hat y_s^{SYN\text{-}EP}$ and $\hat y_r^{SYN\text{-}EP}$ denote the vectors containing the values $\hat y_{ij}^{SYN\text{-}EP}$ for the sampled and non-sampled units and define $J_U = (J_s^T, J_r^T)^T$ with $J_s = (1_s, \hat y_s^{SYN\text{-}EP})$ and $J_r = (1_r, \hat y_r^{SYN\text{-}EP})$. We can then express (3) and (4) in matrix form as

$$E(y_U) = J_U \alpha, \qquad \mathrm{Var}(y_U) = V_U = \begin{pmatrix} V_{ss} & V_{sr} \\ V_{rs} & V_{rr} \end{pmatrix}, \tag{5}$$

where $\alpha = (\alpha_0, \alpha_1)^T$ and the elements of the variance-covariance matrix $V_U$ are given by (4). For known parameters, the model specified in (3) and (4) is referred to as a 'fitted value' model and corresponds to a linear model for $y_{ij}$. The BLUP of the population mean $m_U = N^{-1} \sum_{i=1}^{D} \sum_{j=1}^{N_i} y_{ij}$ of $Y$ under (5) is then $N^{-1} w_s^T y_s$, where

$$w_s = (w_{ij};\ j \in s) = 1_s + H_s^T (J_U^T 1_U - J_s^T 1_s) + (I_s - H_s^T J_s^T) V_{ss}^{-1} V_{sr} 1_r, \tag{6}$$

where $H_s = (J_s^T V_{ss}^{-1} J_s)^{-1} J_s^T V_{ss}^{-1}$. Note that the weights (6) satisfy $\sum_{i=1}^{D} \sum_{j \in s_i} w_{ij} = N$ and $\sum_{i=1}^{D} \sum_{j \in s_i} w_{ij} \hat y_{ij}^{SYN\text{-}EP} = \sum_{i=1}^{D} \sum_{j=1}^{N_i} \hat y_{ij}^{SYN\text{-}EP}$. The MBDE of the small area mean $m_i$ (Chandra and Chambers, 2011a) is then

$$\hat m_i^{CC} = N_i^{-1} \sum_{j \in s_i} w_{ij} y_{ij}, \tag{7}$$

where the $w_{ij}$ are the weights (6) associated with the sample units in area $i$. We note that since (7) is a direct estimator, it can lead to unstable estimates when area sample sizes are too small. Balanced against this, however, is its inherent robustness to misspecification of the model for the $y_{ij}$.

Finally, Berg and Chandra (2012) use (1) to develop the empirical version of the minimum mean squared error (MMSE) predictor for $m_i$. This is

$$\hat m_i^{EBP} = N_i^{-1} \Big\{ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat y_{ij}^{EBP} \Big\}, \tag{8}$$

where $\hat y_{ij}^{EBP} = \exp\big\{ z_{ij}^T \hat\beta + \hat\gamma_i (\bar l_{is} - \bar z_{is}^T \hat\beta) + 0.5\, \hat\sigma_e^2 (1 + \hat\gamma_i n_i^{-1}) \big\}$. We note that (8) allows for between unit correlations within a small area and is therefore an Empirical Best Predictor (EBP) under the normality assumptions of (1). To see this, observe that for a non-sample unit $j \in r_i$ the conditional distribution of $l_{ij} = \log(y_{ij})$ given the area $i$ sample data $x_{ij}, \{l_{ik}, x_{ik};\ k \in s_i\}$ is normal, with

$$E(l_{ij} \mid x_{ij}, l_{ik}, x_{ik};\ k \in s_i) = E(l_{ij} \mid z_{ij}, \bar l_{is}, \bar z_{is}) = z_{ij}^T \beta + \gamma_i (\bar l_{is} - \bar z_{is}^T \beta),$$

$$\mathrm{Var}(l_{ij} \mid x_{ij}, l_{ik}, x_{ik};\ k \in s_i) = \sigma_e^2 + \sigma_u^2 \sigma_e^2 (n_i \sigma_u^2 + \sigma_e^2)^{-1} = \sigma_e^2 (1 + \gamma_i n_i^{-1}),$$

and so

$$E(y_{ij} \mid x_{ij}, y_{ik}, x_{ik};\ k \in s_i) = \exp\big\{ z_{ij}^T \beta + \gamma_i (\bar l_{is} - \bar z_{is}^T \beta) + 0.5\, \sigma_e^2 (1 + \gamma_i n_i^{-1}) \big\},$$

which immediately leads to the empirical version (8) of the MMSE predictor. Consequently, when (1) holds, i.e. the $y_{ij}$ are lognormally distributed, we expect (8) to dominate (2). Note that

$$E\big(\hat y_{ij}^{EBP}\big) = E\big[ \exp\big\{ z_{ij}^T \hat\beta + \hat\gamma_i (\bar l_{is} - \bar z_{is}^T \hat\beta) + 0.5\, \hat\sigma_e^2 (1 + \hat\gamma_i n_i^{-1}) \big\} \big] \neq \exp\big\{ z_{ij}^T \beta + \gamma_i (\bar l_{is} - \bar z_{is}^T \beta) + 0.5\, \sigma_e^2 (1 + \gamma_i n_i^{-1}) \big\}.$$

That is, the empirical MMSE predictor (8) is biased. Berg and Chandra (2012) use a Taylor series approximation to bias correct this predictor. Following their development, a bias corrected version of (8) is

$$\hat m_i^{EBP\text{-}BC} = N_i^{-1} \Big\{ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat y_{ij}^{EBP\text{-}BC} \Big\}, \tag{9}$$

where $\hat y_{ij}^{EBP\text{-}BC} = \big(\hat c_{ij}^{EBP}\big)^{-1} \hat y_{ij}^{EBP}$, with

$$\hat c_{ij}^{EBP} = \exp\big[ 0.5 \big\{ \hat a_{ij} + \hat c_{1i} \hat V(\hat\sigma_e^2) + \hat c_{2i} \hat V(\hat\sigma_u^2) + 2 \hat c_{3i}\, \widehat{\mathrm{Cov}}(\hat\sigma_e^2, \hat\sigma_u^2) \big\} \big].$$

Put $\hat d_i = \bar l_{is} - \bar z_{is}^T \hat\beta$. Then $\hat a_{ij} = (z_{ij} - \hat\gamma_i \bar z_{is})^T \hat V(\hat\beta) (z_{ij} - \hat\gamma_i \bar z_{is})$, and the coefficients $\hat c_{1i}$, $\hat c_{2i}$ and $\hat c_{3i}$ are functions of $\hat d_i$, $n_i$, $\hat\sigma_u^2$ and $\hat\sigma_e^2$ [their explicit expressions are not recoverable from the source text].

3. Small area estimation under a mixture model

We now consider the case where the response variable $y_{ij}$ is semicontinuous. In particular, we shall assume that $y_{ij}$ is either zero or has a skewed distribution over the strictly positive real line. We describe an approach based on modelling this variable via a two part random effects model (also referred to as a mixture model). That is, we shall assume that $y_{ij}$ is drawn from a two-component mixture, where the first component corresponds to a fixed value (zero) and the second component corresponds to a strictly positive random variable with a skewed distribution. Following Olsen and Schafer (2001), Pfeffermann et al. (2008), Chandra and Chambers (2011b) and Chandra and Sud (2012), we define $I(A)$ as the indicator function for the event $A$ and write

$$y_{ij} = 0 \cdot I(y_{ij} = 0) + \tilde y_{ij}\, I(y_{ij} > 0) = \delta_{ij} \tilde y_{ij},$$

where $\tilde y_{ij}$ is referred to as the log-linear component of $y_{ij}$ and is assumed to follow the log scale linear mixed model (1). The second component $\delta_{ij} = I(y_{ij} > 0)$ is assumed to follow a generalized linear mixed model (GLMM) with logit link function (Breslow and Clayton, 1993), and is referred to as the logistic component of $y_{ij}$. Note that values of $\tilde y_{ij}$ are only observed when $\delta_{ij} = 1$, whereas values of $\delta_{ij}$ are always observed.

Small area estimation under this mixture model is implemented in three steps. First, a logistic linear mixed model is fitted to the sample values of the indicator variable $\delta_{ij}$. Second, a log scale linear mixed model is fitted to the positive sample values of the response variable. Finally, predicted values generated under these two models are combined at the estimation stage.

Chandra and Chambers (2011b) used a similar mixture model for small area estimation of zero-inflated skewed data. However, their approach focuses on the MBDE estimator for this case, and uses sample weights obtained via the 'fitted value' linear model implied by the two part mixture model. They also develop an MSE estimator based on pseudo-linearization (Chambers et al., 2011). However, as noted earlier, the MBDE is a direct estimator and can be unstable when area specific sample sizes are too small.

Fitting the logistic component of a two part random effects model poses computational challenges similar to those found when fitting generalized linear mixed models.
Generally, an approximate Fisher scoring procedure based on higher order Laplace approximations is used to obtain maximum likelihood estimates for the fixed coefficients and variance components, see Olsen and Schafer (2001). Pfeffermann et al. (2008) use a two part random effects model that allows the random area effects in the two components of the model to be correlated. However, their simulation results show that this correlation does not significantly improve small area estimation. Furthermore, use of this correlation makes model fitting computationally intensive and sometimes numerically unstable. Consequently the area random effects in the two components of the two part random effects model are often assumed to be independent, see for example Karlberg (2000) and references therein. We shall proceed similarly and assume that the two area random effects are uncorrelated. That is, following Pfeffermann et al. (2008), Chandra and Chambers (2011b) and Chandra and Sud (2012), we assume that the correlation between the two random components $\delta_{ij}$ and $\tilde y_{ij}$ of the assumed mixture model is negligible, where $\delta_{ij} = I(y_{ij} > 0)$ is the indicator of a non-zero response and $\tilde y_{ij}$ is the strictly positive component. Note that this implies that the mixture model is not appropriate if there is reason to believe that the distributions of these components are dependent, e.g. if the observed zeros in the data are due to censoring of $\tilde y_{ij}$, as in a Tobit model.

We assume that, given $x_{ij}$, the $\delta_{ij}$ are independent Bernoulli random variables with $P(y_{ij} > 0) = P(\delta_{ij} = 1) = p_{ij}$. The model linking the probability $p_{ij}$ with the values of the covariates associated with unit $j$ in area $i$ is a logistic linear mixed model of the form

$$\mathrm{logit}(p_{ij}) = \ln\{ p_{ij} / (1 - p_{ij}) \} = \eta_{ij} = x_{ij}^T \lambda + v_i, \tag{10}$$

so $p_{ij} = \exp(\eta_{ij}) \{1 + \exp(\eta_{ij})\}^{-1} = \exp(x_{ij}^T \lambda + v_i) \{1 + \exp(x_{ij}^T \lambda + v_i)\}^{-1}$. Here $\lambda$ is a vector of unknown fixed effects parameters and $v_i$ is the random effect associated with area $i$, assumed to have a normal distribution with zero mean and constant variance $\sigma_v^2$. We estimate the parameters of (10) using the procedure described in Saei and Chambers (2003) and Manteiga et al. (2007). This is an iterative procedure, implemented in the statistical software package R, that combines the Penalized Quasi-Likelihood (PQL) estimation of $\lambda$ and $v_i$ with REML estimation of the variance component parameters.
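Using a 'hat' to denote these estimated values, i.e. $\hat\lambda$ for the fixed effects and $\hat v_i$ for the area effects of (10), the predicted probabilities of the logistic component of the two part random effects model are:

$$\hat p_{ij} = \exp(x_{ij}^T \hat\lambda + \hat v_i) \{1 + \exp(x_{ij}^T \hat\lambda + \hat v_i)\}^{-1}. \tag{11}$$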
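As an illustration of (11), the predicted probabilities are simply the inverse logit of the estimated linear predictor. The following is a minimal sketch, not the authors' implementation: the function name `predicted_probability`, the argument names `lambda_hat` and `v_hat_area`, and the numerical values are all hypothetical, and the estimates are assumed to have been obtained beforehand by whatever fitting routine is used for (10).

```python
import numpy as np

def predicted_probability(x, lambda_hat, v_hat_area):
    """Inverse logit of the linear predictor x' lambda_hat + v_hat, as in (11).

    x           : (n_units, m) covariate matrix for the units of one area
    lambda_hat  : (m,) estimated fixed effects (hypothetical values below)
    v_hat_area  : scalar predicted random effect for that area
    """
    eta = np.asarray(x, dtype=float) @ np.asarray(lambda_hat, dtype=float) + v_hat_area
    # 1 / (1 + exp(-eta)) is algebraically identical to exp(eta) / (1 + exp(eta))
    return 1.0 / (1.0 + np.exp(-eta))

# Illustrative inputs only: two units, an intercept and one covariate
x_units = np.array([[1.0, 0.2], [1.0, 1.5]])
lambda_hat = np.array([0.5, 1.0])
v_hat_area = -0.3
p_hat = predicted_probability(x_units, lambda_hat, v_hat_area)
print(p_hat)
```

The same area effect is added to every unit in the area, so within an area the predicted probabilities differ only through the covariates.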

In order to estimate the parameters of the second, log-linear, component $\tilde y_{ij}$ of $y_{ij}$, we denote by $s_{i+} = \{j \in s_i;\ y_{ij} > 0\}$ the subset of the area $i$ sample for which the response variable is non-zero, with $n_{i+} = \sum_{j \in s_i} \delta_{ij}$ denoting the number of non-zero sample units. In what follows, we will use a subscript of + to denote a quantity associated with these non-zero sample units. Using the data in $s_+$, we then fit the model (1) to obtain estimates of the fixed effect parameters and the predicted values of the random effects. In particular, the Empirical Best Linear Unbiased Estimator (EBLUE) of $\beta$ is

$$\hat\beta_+ = \Big( \sum_{i=1}^{D} z_{s_{i+}}^T \hat V_{i+}^{-1} z_{s_{i+}} \Big)^{-1} \Big( \sum_{i=1}^{D} z_{s_{i+}}^T \hat V_{i+}^{-1} l_{s_{i+}} \Big).$$

Here $z_{s_{i+}}$ and $l_{s_{i+}}$ are the covariate matrix and the vector of log responses for the units in $s_{i+}$, and $\hat V_{ss+} = \mathrm{diag}\{ \hat V_{i+};\ i = 1, \ldots, D \}$ with $\hat V_{i+} = \hat\sigma_{u+}^2 1_{s_{i+}} 1_{s_{i+}}^T + \hat\sigma_{e+}^2 I_{s_{i+}}$, where $1_{s_{i+}}$ and $I_{s_{i+}}$ are equal to the unit vector of length $n_{i+}$ and the identity matrix of dimension $n_{i+}$ respectively, and $n_{i+}$ denotes the number of area $i$ units in $s_+$. The corresponding Empirical Best Linear Unbiased Predictors (EBLUPs) for the random area effects are given by $\hat u_{i+} = \hat\gamma_{i+} (\bar l_{is+} - \bar z_{is+}^T \hat\beta_+)$ with $\hat\gamma_{i+} = \hat\sigma_{u+}^2 (\hat\sigma_{u+}^2 + n_{i+}^{-1} \hat\sigma_{e+}^2)^{-1}$.

The estimated values of $\tilde y_{ij}$ can then be obtained using the unit level predictors in (2) or (9). The first option leads to a synthetic type predictor while the second, after correction for back transformation bias, leads to an empirical version of the minimum mean squared error predictor, i.e. an EBP, for $\tilde y_{ij}$. The synthetic type predictor is

$$\hat{\tilde y}_{ij}^{SYN\text{-}EP} = \exp\big\{ z_{ij}^T \hat\beta_+ + \tfrac{1}{2} (\hat\sigma_{u+}^2 + \hat\sigma_{e+}^2) - \tfrac{1}{2} z_{ij}^T \hat V(\hat\beta_+) z_{ij} - \tfrac{1}{4} \hat V(\hat\sigma_{u+}^2 + \hat\sigma_{e+}^2) \big\}, \tag{12}$$

while the EBP is

$$\hat{\tilde y}_{ij}^{EBP\text{-}BC} = \big(\hat c_{ij+}^{EBP}\big)^{-1} \exp\big\{ z_{ij}^T \hat\beta_+ + \hat\gamma_{i+} (\bar l_{is+} - \bar z_{is+}^T \hat\beta_+) + 0.5\, \hat\sigma_{e+}^2 (1 + \hat\gamma_{i+} n_{i+}^{-1}) \big\}, \tag{13}$$

with

$$\hat c_{ij+}^{EBP} = \exp\big[ 0.5 \big\{ \hat a_{ij+} + \hat c_{1i+} \hat V(\hat\sigma_{e+}^2) + \hat c_{2i+} \hat V(\hat\sigma_{u+}^2) + 2 \hat c_{3i+}\, \widehat{\mathrm{Cov}}(\hat\sigma_{e+}^2, \hat\sigma_{u+}^2) \big\} \big],$$

where $\hat a_{ij+} = (z_{ij} - \hat\gamma_{i+} \bar z_{is+})^T \hat V(\hat\beta_+) (z_{ij} - \hat\gamma_{i+} \bar z_{is+})$, and $\hat c_{1i+}$, $\hat c_{2i+}$ and $\hat c_{3i+}$ are obtained from $\hat c_{1i}$, $\hat c_{2i}$ and $\hat c_{3i}$ in Section 2 by replacing the parameter estimates $\hat\beta, \hat\sigma_e^2, \hat\sigma_u^2$ by $\hat\beta_+, \hat\sigma_{e+}^2, \hat\sigma_{u+}^2$.

Let $E_1$ denote expectation with respect to unit level (level 1) variability in $y_{ij}$. That is, this expectation conditions on the random area effects in the logistic and log-linear components of the two part model. Then, setting $\mu_{ij} = E_1(\tilde y_{ij})$, we see that under independence of these area effects,

$$E_1(y_{ij}) = E_1(\delta_{ij} \tilde y_{ij}) = E_1(\delta_{ij})\, E_1(\tilde y_{ij}) = p_{ij} \mu_{ij}, \tag{14}$$

where $p_{ij}$ was defined following (10). Substituting predicted values for $p_{ij}$ and $\mu_{ij}$ in (14) leads to a plug-in predicted value for $y_{ij}$,

$$\hat E_1(y_{ij}) = \hat p_{ij} \hat\mu_{ij}, \tag{15}$$

where $\hat p_{ij}$ is given by (11), and $\hat\mu_{ij} = \hat E_1(\tilde y_{ij})$ can be calculated using either (12) or (13). That is, we have two different predicted values:

$$\hat y_{ij}^{MixEP} = \hat p_{ij}\, \hat{\tilde y}_{ij}^{SYN\text{-}EP} \tag{16}$$

and

$$\hat y_{ij}^{MixEBP} = \hat p_{ij}\, \hat{\tilde y}_{ij}^{EBP\text{-}BC}. \tag{17}$$

As usual, let a 'hat' denote an estimated value. Then, for non-sample unit $j$ in area $i$, we see that we can write $\hat y_{ij}^{MixEP} = \hat E(y_{ij} \mid x_{ij})$, while $\hat y_{ij}^{MixEBP} = \hat E(y_{ij} \mid x_{ij}, y_{s_i}, x_{s_i})$. The two predictors (16) and (17) allow us to define three different estimators for the population mean of $Y$ in small area $i$ as follows:

(i) Using (16) we can calculate a synthetic type estimator of the form

$$\hat m_i^{MixEP} = \hat E(m_i \mid y_s, x_s, x_r) = N_i^{-1} \Big\{ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat y_{ij}^{MixEP} \Big\}, \tag{18}$$

which we denote by MixEP in what follows;

(ii) The fitted values $\hat y_{ij}^{MixEP}$ that define the synthetic estimator (18) can also be used to define a 'fitted value' covariate in a linear model for $y_{ij}$. This model is then used to calculate sample weights $w_{ij}^{MixEP}$ via (6), and an MBDE based on these weights,

$$\hat m_i^{MixMBDE} = N_i^{-1} \sum_{j \in s_i} w_{ij}^{MixEP} y_{ij}. \tag{19}$$

We denote this estimator by MixMBDE in what follows;

(iii) Using (17) we can calculate an EBP type estimator of the form

$$\hat m_i^{MixEBP} = \hat E(m_i \mid y_s, x_s, x_r) = N_i^{-1} \Big\{ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat y_{ij}^{MixEBP} \Big\}, \tag{20}$$

which we denote by MixEBP in what follows.

4. Mean squared error estimation

Analytic estimators of the MSE of nonlinear small area estimators are technically complex to derive and typically involve a considerable degree of approximation. As a consequence, a number of numerically intensive, but computationally tractable, methods for MSE estimation have been proposed, e.g. the jackknife method of Jiang, Lahiri and Wan (2002) and the bootstrap methods described in Hall and Maiti (2006) and Manteiga et al. (2007, 2008) and references therein. By construction, the small area predictors (18) and (20) are non-linear with complex structure and so obtaining a closed form expression for their corresponding MSEs is not straightforward. We therefore adopt a bootstrap approach when estimating the MSE of (18) and (20). In particular, we use the parametric bootstrap method defined by the steps in the following algorithm. Note that we use an estimator $\hat m_i$ of the area mean $m_i$ to motivate the algorithm, but it is generally applicable to estimators of any set of finite population parameters defined on the survey population.

Step 1. Fit the log scale linear mixed model (1) to the positive values $y_{ij}$ in the sample data to obtain the estimates $\hat\theta = (\hat\beta, \hat\sigma_u^2, \hat\sigma_e^2)$.

Step 2. Given the estimates $\hat\theta = (\hat\beta, \hat\sigma_u^2, \hat\sigma_e^2)$, generate area-specific random errors from a lognormal distribution, $u_i^* \sim LN(0, \hat\sigma_u^2)$, $i = 1, \ldots, D$, and individual level random errors from an independent lognormal distribution, $e_{ij}^* \sim LN(0, \hat\sigma_e^2)$, $j = 1, \ldots, N_i$; $i = 1, \ldots, D$.

Step 3. Similarly, fit the logistic linear mixed model (10) to the sample values of the binary variable $\delta_{ij}$ and compute $\hat\lambda$ and $\hat v_i$.

Step 4. Given $\hat\lambda$ and $\hat v_i$, calculate probabilities $\hat p_{ij}^* = \exp(x_{ij}^T \hat\lambda + \hat v_i) \{1 + \exp(x_{ij}^T \hat\lambda + \hat v_i)\}^{-1}$, and hence generate independent binary values $\delta_{ij}^*$, $j = 1, \ldots, N_i$; $i = 1, \ldots, D$, satisfying $P(\delta_{ij}^* = 1) = \hat p_{ij}^*$.

Step 5. Calculate bootstrap population data $(y_{ij}^*, x_{ij})$ under the two part model using

$$y_{ij}^* = \hat\beta_0\, x_{ij}^{\hat\beta_1}\, u_i^*\, e_{ij}^*\, \delta_{ij}^*, \quad j = 1, \ldots, N_i;\ i = 1, \ldots, D, \tag{21}$$

and then calculate the corresponding value of the area mean $m_i^* = N_i^{-1} \sum_{j=1}^{N_i} y_{ij}^*$.

Step 6. Let $y_s^* = \{ y_{ij}^*;\ j \in s_i;\ i = 1, \ldots, D \}$ denote the vector of bootstrap sample values for this population. Using these values, calculate the estimate $\hat m_i^*$ of the area population mean.

Step 7. Repeat steps 2-6 independently $B$ times to generate the bootstrap distribution $\{ m_i^{*(b)}, \hat m_i^{*(b)};\ b = 1, \ldots, B \}$ of values for $m_i^*$ and $\hat m_i^*$.

Step 8. Calculate the bootstrap estimate of the MSE of the actual sample-based estimate $\hat m_i$ of $m_i$ as

$$mse^{boot}(\hat m_i) = B^{-1} \sum_{b=1}^{B} \big( \hat m_i^{*(b)} - m_i^{*(b)} \big)^2. \tag{22}$$

5. Empirical evaluations

In this Section we report the results from a limited set of empirical evaluations that illustrate the performance of the different estimators of small area means described in the preceding sections, and their corresponding MSE estimators. These estimators are set out in Table 1. Note that for the commonly used linear mixed model EBLUP, denoted by LinEBLUP and which served as the baseline estimator in our simulations, we used the MSE estimator of Prasad and Rao (1990). For the mixture model based MBDE (19) (MixMBDE) we followed the Chambers et al. (2011) approach and used a pseudo-linearization-based MSE estimator. Finally, for the mixture model based indirect estimators MixEP (18) and MixEBP (20) we used the parametric bootstrap procedure detailed in Section 4.

We used two types of simulations in our empirical evaluations. The first used models to simulate population and sample data. In this case, at each simulation, population data were first generated under the model and a single sample was then taken from this simulated population by stratified simple random sampling without replacement, with the small areas defining the strata. The results from these simulations allow one to compare different estimators in terms of their sensitivity to model assumptions. The second type of simulation was design-based, using population data created by nonparametrically bootstrapping a real survey dataset. Here we evaluated estimators in the context of their performance under repeated sampling from this population under a pre-specified sample design. The results from these simulations allow one to assess the robustness of different estimators to the type of model misspecification seen in practice.

We use two measures of the relative performance of the different small area estimation methods that were considered in our simulations. These are the average percent relative bias

$$AvRB(m) = \mathrm{mean}_i \Big\{ \bar m_i^{-1} K^{-1} \sum_{k=1}^{K} (\hat m_{ik} - m_{ik}) \Big\} \times 100$$

and the average percent relative root mean squared error

$$AvRRMSE(m) = \mathrm{mean}_i \Bigg\{ \sqrt{ K^{-1} \sum_{k=1}^{K} \Big( \frac{\hat m_{ik} - m_{ik}}{m_{ik}} \Big)^2 } \Bigg\} \times 100$$

of the estimates $\hat m_{ik}$ generated by an estimation method. Here $\bar m_i = K^{-1} \sum_{k=1}^{K} m_{ik}$, with the subscript $i$ indexing the small areas and the subscript $k$ indexing the $K$ Monte Carlo simulations, and with $m_{ik}$ denoting the actual area $i$ mean at simulation $k$, with predicted value $\hat m_{ik}$. Note that in the design-based simulations $m_{ik} = m_i$, so $\bar m_i = m_i$.
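Once the replicate values are stored as arrays, the bootstrap MSE estimate (22) and the two performance measures above reduce to a few array operations. The sketch below is illustrative only: the function names and the array layout (replicates in rows, areas in columns) are assumptions made here, and the small input arrays are invented numbers rather than results from the paper.

```python
import numpy as np

def bootstrap_mse(m_hat_boot, m_boot):
    """Equation (22): per-area mean squared bootstrap error.

    Both inputs have shape (B, D): B bootstrap replicates, D areas.
    """
    diff = np.asarray(m_hat_boot, dtype=float) - np.asarray(m_boot, dtype=float)
    return np.mean(diff ** 2, axis=0)

def av_rb(m_hat, m_true):
    """Average percent relative bias; inputs have shape (K, D)."""
    m_hat = np.asarray(m_hat, dtype=float)
    m_true = np.asarray(m_true, dtype=float)
    bias = np.mean(m_hat - m_true, axis=0)   # K^{-1} sum_k (m_hat_ik - m_ik)
    m_bar = np.mean(m_true, axis=0)          # average true area mean over k
    return 100.0 * np.mean(bias / m_bar)

def av_rrmse(m_hat, m_true):
    """Average percent relative root mean squared error; inputs (K, D)."""
    m_hat = np.asarray(m_hat, dtype=float)
    m_true = np.asarray(m_true, dtype=float)
    rel_err = (m_hat - m_true) / m_true
    return 100.0 * np.mean(np.sqrt(np.mean(rel_err ** 2, axis=0)))

# Invented toy values: K = 2 replicates, D = 2 areas
m_true = np.array([[10.0, 20.0], [10.0, 20.0]])
m_hat = np.array([[11.0, 18.0], [9.0, 22.0]])
print(av_rb(m_hat, m_true), av_rrmse(m_hat, m_true))
print(bootstrap_mse(m_hat, m_true))
```

In the toy input the errors cancel within each area, so the relative bias is zero while the relative RMSE is not, which is why the two measures are reported side by side.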

We also investigated the performance of the different MSE estimation methods considered in the simulations. Here we calculated the average relative bias of the MSE estimation method, defined by

$$AvRB(M) = \mathrm{mean}_i \Big\{ M_i^{-1} \Big( K^{-1} \sum_{k=1}^{K} \hat M_{ik} - M_i \Big) \Big\} \times 100.$$

Here $\hat M_{ik}$ denotes the simulation $k$ value of the MSE estimator in area $i$, and $M_i$ denotes the actual (i.e. Monte Carlo) MSE in area $i$. We also consider a secondary performance indicator. This is based on the fact that in many applications of small area estimation, MSE estimators are used to calculate Gaussian type confidence intervals for the small area quantities of interest. Consequently it is interesting to evaluate the coverage properties of such intervals. In particular, we focused on two sigma (i.e. nominal 95 percent) Gaussian intervals, and calculated the average percent coverage

$$AvCR(M) = \mathrm{mean}_i \Big\{ K^{-1} \sum_{k=1}^{K} I\big( |\hat m_{ik} - m_{ik}| \le 2 \hat M_{ik}^{1/2} \big) \Big\} \times 100.$$

Table 1. Definitions of small area predictors used in the simulation studies.

Estimator | Description | Method of MSE estimation
Mixture model based methods
MixEBP | Empirical best predictor (20) defined by the predicted values (17) | Bootstrap MSE (22)
MixEP | Empirical synthetic predictor (18) defined by the predicted values (16) | Bootstrap MSE (22)
MixMBDE | MBDE estimator (19) defined by a 'fitted values' linear model, with the predicted values (16) used as the model covariate | Pseudo-linearization MSE estimator of Chambers et al. (2011)
Raw scale linear mixed model based method
LinEBLUP | Standard linear mixed model EBLUP | Prasad and Rao (1990) MSE estimator

5.1 Model-based simulations

Model-based simulations are a standard way of illustrating the sensitivity of an estimation procedure to variation in assumptions about the structure of the population of interest. The model-based simulations reported in this paper are based on population data generated under model (1). We chose a population size $N = 15{,}000$ with $D = 30$ small areas and a sample size $n = 600$, and then randomly generated small area population sizes $N_i$, $i = 1, \ldots, D$, with $\sum_i N_i = N$, and sample sizes as $n_i = N_i (n / N)$, with $\sum_i n_i = n$. The average small area population and sample sizes were 500 and 20 respectively. These were fixed in all simulations. Population values of $y_{ij}$ ($j = 1, \ldots, N_i$; $i = 1, \ldots, D$) were first generated via the model $\log(y_{ij}) = \log(5) + 0.5 \log(x_{ij}) + u_i + e_{ij}$, with unit level random errors $e_{ij}$ independently generated from the normal distribution $N(0, \sigma_e = 0.5)$, and random area effects $u_i$ independently generated from the normal distribution $N(0, \sigma_u = 0.3)$. The covariate values $\log(x_{ij})$ were generated from the normal distribution $N(\log(2), \sigma_x = 3)$. We generated zero values for $y_{ij}$ using Poisson sampling, i.e. we set $y_{ij}$ to zero if the realized value of an independently generated uniform variate $U_{ij} \sim \mathrm{Uniform}(0, P^{-1})$ was such that $U_{ij} > p_{ij}$, where $p_{ij}$ was computed using (10) with the same fixed effect coefficient values as (1) and with an independent area effect drawn from the normal distribution with zero mean and a standard deviation of 0.1. The value of $P$ was chosen to generate differing numbers of zero values in the population. Thus with $P = 0.9$, approximately 10% of population values of $Y$ are set to zero, while with $P = 0.5$ this increases to 50% and with $P = 0.3$ it becomes 70%. A random sample of (fixed) size $n_i = 20$ was drawn from each area. We also repeated these simulations with a smaller sample of size $n = 300$ and with area sample sizes of $n_i = 10$. All simulations consisted of $K = 1000$ independent replications, with the results from these simulations set out in Table 2.

The percentage average relative bias (AvRB) values in Table 2 indicate that LinEBLUP has a significantly larger bias than all three mixture model based small area estimation methods (MixEBP, MixEP and MixMBDE). This implies that LinEBLUP may not be suitable for semicontinuous data. Restricting ourselves to the mixture model based small area estimation methods, we see that the bias values reported for MixEBP are smaller than those reported for MixMBDE and MixEP. Further, the bias advantage of MixEBP appears larger for smaller sample sizes.
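The population generation scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' simulation code: the function `generate_population` is hypothetical, every area is given the same size (500) instead of randomly generated sizes, the dispersion values 0.5, 0.3 and 3 are read as standard deviations, and the linear predictor of (10) is taken to be $\log(5) + 0.5 \log(x_{ij}) + v_i$. Because of these choices the zero shares produced by the sketch need not match the approximate percentages quoted in the text.

```python
import numpy as np

def generate_population(P, D=30, area_size=500, sigma_e=0.5, sigma_u=0.3,
                        sigma_x=3.0, sigma_v=0.1, seed=0):
    """Generate one semicontinuous population under the two part model.

    Returns area labels, covariate values x and responses y (with zeros).
    Equal area sizes and the reading of the sigmas as standard deviations
    are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    area = np.repeat(np.arange(D), area_size)
    log_x = rng.normal(np.log(2.0), sigma_x, size=area.size)
    u = rng.normal(0.0, sigma_u, size=D)          # area effects, log-linear part
    v = rng.normal(0.0, sigma_v, size=D)          # area effects, logistic part
    e = rng.normal(0.0, sigma_e, size=area.size)  # unit level errors
    fixed = np.log(5.0) + 0.5 * log_x             # shared fixed-effect predictor
    y_positive = np.exp(fixed + u[area] + e)
    p = 1.0 / (1.0 + np.exp(-(fixed + v[area])))  # inverse logit, as in (10)
    U = rng.uniform(0.0, 1.0 / P, size=area.size)
    y = np.where(U > p, 0.0, y_positive)          # zero whenever U_ij > p_ij
    return area, np.exp(log_x), y

for P in (0.9, 0.5, 0.3):
    area, x, y = generate_population(P)
    print(f"P = {P}: share of zeros = {np.mean(y == 0):.3f}")
```

Since $U_{ij}$ is uniform on $(0, P^{-1})$, a unit is non-zero with probability $P \, p_{ij}$, so lowering $P$ raises the share of zeros without changing the distribution of the positive values.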
For moderate sample sizes ($n_i = 20$) the MxMBDE dominates the MxEP in terms of bias, but this is not the case for small sample sizes ($n_i = 10$). Average relative biases increase for all the methods as sample sizes
decrease or as the proportion of zero values in the population (i.e. the level of zero inflation in the data) increases. Turning now to the percentage average relative root mean square error (AvRRMSE) values in Table 2, we see again that smaller area sample sizes or larger proportions of population zeros lead to an increase in the percentage average relative root mean square errors of all the methods. Also, LnEBLUP continues to record very large values of relative root mean square error compared with the mixture model based methods, reinforcing our previous comment that this method of small area estimation appears best avoided when faced with zero inflated skewed data. Among the mixture model based methods, the MxEBP dominates the other methods. Overall, this predictor appears to offer substantial bias and efficiency gains over the other predictors that we considered in our simulations.

Table 2. Percentage average relative bias (AvRB) and percentage average relative RMSE (AvRRMSE) of different estimators in model based simulations.
Columns: P; then $n_i = 10$ and $n_i = 20$ for each of MxEBP, MxEP, MxMBDE and LnEBLUP. Row blocks: AvRB; AvRRMSE.

We now turn to an examination of the performance of the MSE estimators associated with the different predictors. In particular, we present results from a limited model-based simulation study that was carried out to illustrate the empirical performance of the different MSE estimators defined in Table 1. Here we only considered a sample size $n = 300$ with area specific sample sizes of $n_i = 10$. We also only considered two zero inflation scenarios, corresponding to $P = 0.50$ and $P = \ldots$ These simulations were repeated $K = 500$ times. Note that bootstrap estimation of the MSE in each simulation was based on $B = 500$ bootstrap samples. The results for these simulations are set out in Table 3 and correspond to averages over the small areas of the true RMSEs (AvRMSE)
and the estimated RMSEs (AvERMSE), the average percentage relative bias (AvRB), and the average percentage coverage rates of nominal 95 per cent Gaussian confidence intervals (AvCR) based on the various MSE estimators.

Table 3. Average true RMSEs (AvRMSE), average estimated RMSEs (AvERMSE), average percentage relative bias (AvRB), and average percentage coverage rates of nominal 95 per cent Gaussian confidence intervals (AvCR) generated by MSE estimators of the different small area estimators defined in Table 1. Area sample sizes are $n_i = 10$. Averages are over the small areas.
Columns: P; MxEBP; MxEP; MxMBDE; LnEBLUP. Row blocks: AvCR; AvERMSE (AvRMSE); AvRB.
Entries as extracted: (8.67) (14.50) (11.92) (57.20) (7.22) 9.22 (8.90) (12.40) (31.09)

From the results reported in Table 3, we see that all methods of MSE estimation lead to Gaussian confidence intervals with average actual coverage AvCR at or near nominal coverage. Furthermore, the MSE estimators (bootstrap and pseudo-linearization) for the three mixture model based predictors (MxEBP, MxEP and MxMBDE) all report average estimated RMSE values that are close to the true average RMSE values. In three out of the four cases of the bootstrap MSE estimator for MxEBP and MxEP we see that on average the estimated RMSE values are a little less than the true RMSE values, indicating a small downward bias. This is reflected in the average percentage relative bias (AvRB) values recorded for these cases. In contrast, the pseudo-linearization MSE estimator used with MxMBDE has either virtually no bias or a very small upward bias (again reflected in its AvRB values), while the linear model based MSE estimator for LnEBLUP seems somewhat unstable, being conservative when the proportion of zeros in the population is relatively small, but optimistic when this proportion is high. Overall, we can see that the average percentage relative bias (AvRB) values recorded by the MSE
estimators for the three mixture model based predictors are all small, in contrast to the bias values recorded by the linear model based MSE estimator for LnEBLUP, which are much larger.

5.2 Design-based simulation

Our design-based simulations were based on actual survey data collected in the Australian Agricultural Grazing Industry Survey (AAGIS) conducted by the Australian Bureau of Agricultural and Resource Economics. The survey collects detailed financial (e.g. farm business receipts, assets, debt), physical (e.g. farm area and location) and socioeconomic information (e.g. age and education of farm operator) from farm businesses across Australia. The target population for the survey is broadacre farms operating in 3 broad agro-ecological zones: the pastoral zone, the wheat-sheep zone and the high rainfall zone. In this study we use the wheat-sheep zone, which consists of 12 regions (the small areas of interest). In the original sample there were 760 farms from 12 regions in the wheat-sheep zone. The variable of interest for this study is number of beef cattle on hand at the end of the financial year (BEEFCL) and the covariate is land area (LAND). A linear model fit to the sample data was very poor ($R^2 = 0.18$ for the linear regression of BEEFCL on LAND). This fit improved slightly ($R^2 = 0.25$) when dummy variables corresponding to four out of the five broadacre industries: (i) specialist cropping farms, (ii) mixed livestock and cropping farms, (iii) sheep specialists, (iv) beef specialists and (v) mixed sheep and beef farms, were included as covariates of the linear model. It is noteworthy that the target variable BEEFCL is zero inflated, with about 38 per cent of its values equal to zero. In particular, out of a total sample of 760 observations there are 286 zero values. The distribution of region sample sizes and proportion of zeros is given in Table 4 and displayed in Figure 1. We used the 474 farms with BEEFCL > 0 and fitted a model for BEEFCL in terms of corresponding values of LAND for these farms.
However, we did not observe any improvement in the model fit ($R^2 = 0.18$) even after we included the dummy variables corresponding to the four industries referred to above ($R^2 = 0.23$).

Table 4. Region specific sample sizes and population sizes.
Columns: Regions; Population size ($N_i$); Sample size ($n_i$); Sample size for y > 0; Sample size for y = 0; Proportion of zeros. Final row: Total.

A careful examination of the sample data indicates that the marginal distributions of both BEEFCL and LAND are highly skewed and there is clear evidence of non-linearity in their relationship (see the histograms displayed in Figure 2). When a linear model based on the logarithm of LAND and the four industry dummy variables referred to earlier was fitted to the logarithm of BEEFCL, the fit improved ($R^2 = 0.41$). The usual linear model assumptions of normality, homoscedasticity, etc., were also satisfied. As a consequence it was decided that a log scale linear model was appropriate for positive values of BEEFCL, with the covariates for the fixed part of the model defined by the logarithm of LAND and these four industry dummy variables. Given that the residuals from this model also displayed significant between region variability, a region random effect was included in the model, i.e. we fitted model (1). This improved the $R^2$ value to just under 50%, with all model coefficients highly significant. Furthermore, when we fitted the mixed logistic model (10) to the binary indicator for BEEFCL > 0 in these data, using the same covariates as in (1), the dummy variables corresponding to two of these industries and the logarithm of LAND were significant, with some evidence of overdispersion (variance estimate ..., with a standard deviation of ...). Finally, we carried out a crude check of whether the random effects in (1) and (10) might be
correlated by fitting a logistic model to the same binary indicator for BEEFCL > 0 but this time just using the EBLUPs from (1) as the model covariates. The fit of this diagnostic model was significant, with a Generalized $R^2$ of 14%, indicating potential correlation between the random effects in (1) and the random effect in (10). However, in our simulations we ignored this and proceeded on the basis of a working model defined by a zero correlation between these two sources of variability.

Figure 1. Distribution of regional sample sizes (left side) and regional proportions of zero observations (right side).

Figure 2. Histogram of BEEFCL (> 0) on raw scale (left plot) and on log scale (right).

We then used these AAGIS sample data to generate a synthetic population of $N = 39{,}569$ farms by re-sampling the original AAGIS sample of $n = 760$ farms with probability proportional to a farm's sample weight. Once created, this fixed population
was repeatedly sampled using stratified random sampling with regions corresponding to strata and with stratum sample sizes the same as in the original sample. Table 5 shows the average over the 12 regions of the percentage relative bias and percentage relative root mean squared error values of the different small area estimation methods based on $K = 1000$ independent stratified samples taken from this synthetic population.

Table 5. Region specific values of the percentage relative biases (RB) and percentage relative root mean squared errors (RRMSE) for different small area predictors.
Columns: Regions; RB for MxEBP, MxEP, MxMBDE and LnEBLUP; RRMSE for MxEBP, MxEP, MxMBDE and LnEBLUP. Summary rows: Average; Median.

From the results set out in Table 5 we see that the MxEBP predictor has generally smaller average bias and smaller average RRMSE than the other three predictors considered here, while the synthetic type predictor MxEP performs poorly, recording the worst values for RB in 7 out of the 12 regions. This is not unexpected since the log scale linear mixed model underpinning MxEP almost certainly does not hold exactly in the synthetic AAGIS population. Furthermore, since MxEP does not explicitly allow for heterogeneity between regions, it is sensitive to bias induced by region to region variability in the relationship between BEEFCL and LAND. On the other hand, even though LnEBLUP is based on a clearly inappropriate model for BEEFCL, its performance as a predictor is reasonable in most cases, reflecting the fact that it includes
a between area adjustment (albeit on the raw scale rather than on the log scale). We also see that although the mixture model based direct estimator MxMBDE has better RB values than MxEP, its RRMSE tends to be large, reflecting the fact that it is a direct estimator. The large relative bias and relative RMSE of MxMBDE and LnEBLUP in region 8 are noteworthy. In this region the proportion of zero values is small, and the positive BEEFCL values are highly skewed with many outliers. Here LnEBLUP performs badly because its assumed linear model is a poor fit to these skewed data, while MxMBDE fails because as a direct estimator it is sensitive to the presence of outliers. Overall, it is clear from the results in Table 5 that the mixture model based predictor MxEBP performed better in our design based simulations than its competitors, both in terms of relative bias and relative root mean squared error.

We now consider the design-based performance of the parametric bootstrap procedure used to estimate the MSE of MxEBP in these simulations. Here, for each sample from the fixed synthetic population, the bootstrap MSE estimate was based on $B = 100$ bootstrap samples. The average RMSE values generated by these region-specific bootstrap MSE estimates for MxEBP are shown in Figure 3, as is the corresponding average of the true design-based RMSE for this predictor. We see that the value of the true design-based RMSE for region 8 is very high, while the corresponding bootstrap-based RMSE estimate tends to be low. As noted earlier, this region has highly skewed data, with extreme values persisting even after a logarithmic transformation. This generated large values for the true RMSE of MxEBP. This behaviour was not replicated by the parametric bootstrap, as its bootstrap population data were generated under a distributional assumption that did not allow for such outliers. This raises questions about outlier robust MSE estimation that are, however, beyond the scope of this paper.
Generally, we see that in the remaining regions, where the log scale linear model assumptions for BEEFCL are more appropriate, the bootstrap MSE estimator tracks the actual MSE of MxEBP reasonably well and we are led to the same conclusions about this MSE estimator as in the model based simulation study presented in Section 5.1.
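The two performance measures used for the MSE estimators in Section 5, the average percentage relative bias (AvRB) and the average percentage coverage rate of two sigma Gaussian intervals (AvCR), can be sketched in code as follows. This is an illustrative sketch under assumed data layouts, not the authors' code. The function names `av_rb` and `av_cr` and the nested-list inputs (one inner list of simulation values per area) are hypothetical.

```python
import math
from statistics import mean


def av_rb(mse_hat, mse_true):
    """Average percentage relative bias of an MSE estimator.

    mse_hat[i][k]: MSE estimate for area i in simulation k.
    mse_true[i]:   actual (Monte Carlo) MSE for area i.
    """
    return 100.0 * mean(
        mean(m_hat - m_i for m_hat in row) / m_i
        for row, m_i in zip(mse_hat, mse_true)
    )


def av_cr(est, truth, mse_hat):
    """Average percentage coverage of two sigma Gaussian intervals.

    est[i][k], truth[i][k]: estimated and actual value of the area i
    quantity in simulation k; mse_hat[i][k]: the matching MSE estimate.
    """
    return 100.0 * mean(
        mean(
            1.0 if abs(e - t) <= 2.0 * math.sqrt(m) else 0.0
            for e, t, m in zip(est_i, truth_i, mse_i)
        )
        for est_i, truth_i, mse_i in zip(est, truth, mse_hat)
    )
```

Averaging within each area first and then across areas mirrors the order of operations in the definitions, so areas with different true MSE levels contribute equally to the summary.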

Figure 3. Region-specific values of true design-based RMSE (solid line) and average estimated RMSE (dashed line) for the MxEBP obtained in the design-based simulations using the AAGIS data.

6. Conclusions

In this paper we explore small area estimation for semicontinuous variables, where the data are skewed and contain a substantial proportion of zeros. Our approach assumes a mixture or two part random effects model, and we propose an empirical best predictor estimator for small area means for this case. We also propose a parametric bootstrap estimator for its MSE. Empirical results reported in the paper support the conclusion that the proposed mixture model based empirical best predictor (MxEBP) is less biased and can be more efficient than both the corresponding synthetic type predictor (MxEP) and the model based direct type estimator (MxMBDE) based on the 'fitted values' defined by the assumed mixture model. These results also suggest that ignoring the skewed and semicontinuous nature of the data and using a standard mixed linear model-based EBLUP estimator (LnEBLUP) can lead to biased and unstable estimates. We note that, provided the mixture model assumptions are reasonable for the small area data, the proposed parametric bootstrap procedure seems to work well. An application to real agricultural survey data provides some empirical support for these observations. It should be noted that we assume a log scale linear mixed model for non-zero skewed data. Although the log transformation is widely used in practice for such data, it is not
the only appropriate transformation to linearity, and other transformations (e.g. square root) can be explored in this context. We also assume that zero inflation in the data can be adequately modelled via a mixture of two independent components, a Bernoulli variable and a Lognormal variable. As noted earlier, this is not appropriate if in fact the zero values are essentially due to truncation, and indeed in the AAGIS data that we used in our design-based simulations, there is some evidence that the random area effect in the linear mixed model (1) and the random area effect in the logistic mixed model (10) are correlated. Furthermore, other models for zero inflated skewed data, e.g. those based on a generalized linear mixed model with underlying Gamma or Poisson distributions, are also possible. We are currently working on these issues.

Acknowledgment

The authors would like to acknowledge the valuable comments and suggestions of the Editor, Associate Editor and two anonymous referees. These led to a considerable improvement in the paper.

References

Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88.
Berg, E. and Chandra, H. (2012). Small area prediction for a unit level lognormal model. Proceedings of the 2012 Federal Committee on Statistical Methodology Research Conference, Washington, DC, USA, January 10-12.
Chandra, H. and Sud, U. C. (2012). Small area estimation for zero-inflated data. Communications in Statistics - Simulation and Computation, 41 (5).
Chambers, R., Chandra, H. and Tzavidis, N. (2011). On bias-robust mean squared error estimation for linear predictors for domains. Survey Methodology, 37 (2).
Chandra, H. and Chambers, R. (2011a). Small area estimation under transformation to linearity. Survey Methodology, 37 (1).
Chandra, H. and Chambers, R. (2011b). Small area estimation for skewed data in presence of zeros. The Bulletin of Calcutta Statistical Association, 63.
Chandra, H. and Chambers, R. (2009). Multipurpose weighting for small area estimation. Journal of Official Statistics, 25 (3).
Fletcher, D., MacKenzie, D. and Villouta, E. (2005). Modelling skewed data with many zeros: a simple approach combining ordinary and logistic regression. Journal of Environmental and Ecological Statistics, 12 (1).
Hall, P. and Maiti, T. (2006). On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society, Series B, 68.
Jiang, J., Lahiri, P. and Wan, S. (2002). A unified jackknife theory for empirical best prediction with M-estimation. Annals of Statistics, 30 (6).
Karlberg, F. (2000). Population total prediction under a lognormal superpopulation model. Metron, LVIII.
Manteiga, G. W., Lombardìa, M. J., Molina, I., Morales, D. and Santamarìa, L. (2008). Bootstrap mean squared error of a small-area EBLUP. Journal of Statistical Computation and Simulation, 78 (5).
Manteiga, G. W., Lombardìa, M. J., Molina, I., Morales, D. and Santamarìa, L. (2007). Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Computational Statistics and Data Analysis, 51 (5).
Olsen, M. K. and Schafer, J. L. (2001). A two-part random-effects model for semicontinuous longitudinal data. Journal of the American Statistical Association, 96 (454).
Pfeffermann, D., Terryn, B. and Moura, F. A. S. (2008). Small area estimation under a two-part random effects model with application to estimation of literacy in developing countries. Survey Methodology, 34 (2).
Prasad, N. G. N. and Rao, J. N. K. (1990). The estimation of the mean squared error of small area estimators. Journal of the American Statistical Association, 85.
Rao, J. N. K. (2003). Small Area Estimation. Wiley, New York.
Saei, A. and Chambers, R. (2003). Small area estimation under linear and generalized linear mixed models with time and area effects. Methodology Working Paper No. M03/15. University of Southampton, UK. (available from
