Small area estimation for semicontinuous data

University of Wollongong Research Online
Faculty of Engineering and Information Sciences - Papers: Part A, Faculty of Engineering and Information Sciences, 2016
Small area estimation for semicontinuous data
Hukum Chandra, Indian Agricultural Statistics Research Institute, hchandra@uow.edu.au
Raymond L. Chambers, University of Wollongong, ray@uow.edu.au
Publication Details: Chandra, H. & Chambers, R. L. (2016). Small area estimation for semicontinuous data. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 58 (2).
Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au

Disciplines: Engineering | Science and Technology Studies

Small area estimation for semicontinuous data
Hukum Chandra 1,* and Ray Chambers 2
1 Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi, India.
2 National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, NSW, 2522, Australia.
* Corresponding author: e-mail: hchandra12@gmail.com

Abstract
Survey data often contain measurements for variables that are semicontinuous in nature, i.e. they either take a single fixed value (we assume this is zero) or they have a continuous, often skewed, distribution on the positive real line. Standard methods for small area estimation (SAE) based on the use of linear mixed models can be inefficient for such variables. We discuss SAE techniques for semicontinuous variables under a two part random effects model that allows for the presence of excess zeros as well as the skewed nature of the non-zero values of the response variable. In particular, we first model the excess zeros via a generalized linear mixed model fitted to the probability of a non-zero, i.e. strictly positive, value being observed, and then model the response, given that it is strictly positive, using a linear mixed model fitted on the logarithmic scale. Empirical results suggest that the proposed method leads to efficient small area estimates for semicontinuous data of this type. We also propose a parametric bootstrap method to estimate the MSE of the proposed small area estimator. These bootstrap estimates of the MSE are compared to the true MSE in a simulation study.

Key words: Mean squared error; Parametric bootstrap; Skewed data; Small area estimation; Zero-inflated.

1. Introduction
Many variables of interest in business, agricultural, environmental, ecological and epidemiological surveys are semicontinuous in nature, i.e. they either take a single fixed value (typically zero) or they have a continuous, often skewed, distribution on the positive real line. This article focuses on a particular type of semicontinuous variable frequently encountered in practice, a mixture of zeros and continuous strictly positive
values that are generally skewed. Such a semicontinuous variable is quite different from one that has been left-censored or truncated, because the zeros are valid self-representing data values, not proxies for negative or missing responses. It is therefore natural to view a semicontinuous response of this type as the result of two processes, one determining whether the response is zero and the other determining the actual level if it is non-zero (Olsen and Schafer, 2001). Measurements of indebtedness, investment, production or amount of stock on hand all represent situations where semicontinuous data are typically collected in household and business surveys. For example, Amount of Loan Outstanding (collected in the 59th Round of the National Sample Survey, or NSS, in India) and Closing Beef Cattle, or BEEFCL (collected in the Australian Agricultural Grazing Industries Survey, or AAGIS), are just two cases of important survey output variables that are, by their definition, semicontinuous. In both, the target variable is either zero or some positive value, with these positive values then having a skewed distribution. Unlike the NSS data, an anonymised version of the AAGIS data is available, and so these data are used in the empirical evaluations presented in Section 5, which focus on regional estimation for BEEFCL. See Figure 1 and Table 4 for the distributions of regional sample sizes and proportions of zero values in the AAGIS sample data, while the sample distribution of BEEFCL in these data is shown in Figure 2. It is clear from Figures 1 and 2 that BEEFCL is zero-inflated with highly skewed non-zero values. Since a linear model is not appropriate for a semicontinuous variable, commonly used methods for small area estimation based on linear mixed models (e.g. the empirical best linear unbiased predictor or EBLUP) can be inefficient for such variables (see Rao, 2003).
Chandra and Chambers (2011a) and Berg and Chandra (2012) investigate small area estimation methods for skewed variables, focussing on the case where a linear mixed model is appropriate after a logarithmic (log) transformation. Chandra and Chambers (2011a) describe two methods of small area estimation for such positively skewed variables. The first, a model-based direct estimator or MBDE, is defined as a weighted sum of the sampled units in the small area, with weights constructed so as to lead to the minimum mean squared error linear predictor of the overall population mean if the parameters of the log scale linear mixed model were known. The second, based on the approach of Karlberg (2000), uses an empirical
predictor based on a log scale linear mixed model that is analogous to the synthetic estimator under a linear mixed model. The MBDE is a direct estimator and is unbiased in the presence of between area heterogeneity, but can yield unstable estimates if sample sizes are too small. On the other hand, the synthetic type empirical predictor only accounts for between area variability through between area variation in the model covariates, and can therefore lead to biased estimators when there is significant residual between area heterogeneity. Berg and Chandra (2012) also describe an empirical best predictor that has minimum mean squared error in the class of unbiased predictors when a log scale linear mixed model is appropriate. This predictor allows for between area variation and is indirect, i.e. it uses information from all the small areas. However, all these approaches are restricted to a strictly positive variable, and so cannot be directly applied to a semicontinuous variable. The presence of excess zeros in survey data is a well known problem, and a variety of approaches have been suggested for addressing it. However, much less is known when the focus is on small area estimation using these data, even though the presence of excess zeros within a small area is clearly much more influential than it is in the larger overall sample. A two part random effects model (Olsen and Schafer, 2001), also referred to as a mixture model (Fletcher et al., 2005), is widely used for small area estimation with zero-inflated variables; see, for example, Pfeffermann et al. (2008) and Chandra and Sud (2012). In what follows we therefore develop a small area estimation method for semicontinuous variables under a two part random effects model. Here we first model the excess zeros via a generalized linear mixed model fitted to the probability of a non-zero, i.e. strictly positive, value being observed, and then model the response, given that it is strictly positive, using a log scale linear mixed model. These two model components are combined in estimation.
We also propose a parametric bootstrap method that can be used to estimate the mean squared error (MSE) of our proposed two part estimator. The structure of the paper is as follows. In Section 2 we develop a number of predictors for a small area mean based on a log scale linear mixed model. In Section 3 we then introduce the two part random effects model (or mixture model) and discuss different
approaches to small area estimation under this model. Section 4 then focuses on MSE estimation via a parametric bootstrap approach. In Section 5 we present results from both model-based and design-based simulations which are used to illustrate the performances of the different methods of small area estimation discussed in Section 3, with the design-based simulations based on survey data from the AAGIS. Finally, in Section 6 we summarize our main findings and discuss avenues for future research.

2. Small area estimation under transformation to linearity
We assume that a non-informative sampling method is used to draw a sample of size $n$ from a finite population $U$ of size $N$ which consists of $D$ non-overlapping domains $U_i$ ($i = 1, \dots, D$). Following standard practice, we refer to these domains as small areas or just areas. We further assume that there is a known number $N_i$ of population units in small area $i$, with $n_i$ of these sampled. The total number of units in the population is $N = \sum_{i=1}^{D} N_i$, with corresponding total sample size $n = \sum_{i=1}^{D} n_i$. We use $s$ to denote the collection of units in sample, with $s_i$ the subset drawn from small area $i$ (i.e. $|s_i| = n_i$), and use expressions like $j \in i$ and $j \in s$ to refer to the units making up small area $i$ and sample $s$ respectively. Similarly, $r_i$ denotes the set of units in small area $i$ that are not in sample, with $|r_i| = N_i - n_i$ and $U_i = s_i \cup r_i$. Let $y_{ij}$ denote the value of the variable of interest $Y$ for unit $j$ in area $i$, and let $\mathbf{x}_{ij}$ denote the vector of length $m - 1$ containing the known values of the auxiliary variables for unit $j$ in area $i$. Throughout we assume that the quantity of interest is the small area mean of $Y$, $m_i = N_i^{-1}\sum_{j=1}^{N_i} y_{ij}$. We consider a situation where the variable of interest follows a log scale linear mixed model. That is, $y_{ij}$ satisfies
$$l_{ij} = g(y_{ij}) = \log(y_{ij}) = \mathbf{z}_{ij}^T\boldsymbol{\beta} + u_i + e_{ij}, \qquad (1)$$
where $\mathbf{z}_{ij}$ is the $m \times 1$ vector of covariates defined by appropriate transformation of the auxiliary variables, $\boldsymbol{\beta}$ is an $m \times 1$ vector of fixed effects, $u_i$ is a random effect associated with area $i$ and $e_{ij}$ is an individual level random effect for unit $j$
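in small area $i$. Following standard practice, we assume that the area and individual effects are mutually independent, with the area effects independently and identically distributed as $u_i \sim N(0, \sigma_u^2)$ and the individual effects independently and identically distributed as $e_{ij} \sim N(0, \sigma_e^2)$. The sample observations $y_{ij}$; $i = 1, \dots, D$; $j \in s_i$ are assumed to be available. We further assume that the population values of $\mathbf{z}_{ij}$ are available, and that they can be linked to the sample. Consequently, the available data for area $i$ are $\{y_{ij}, \mathbf{z}_{ij};\ j \in s_i\}$ and $\{\mathbf{z}_{ij};\ j \in r_i\}$. Let $\boldsymbol{\theta} = (\boldsymbol{\beta}, \sigma_u^2, \sigma_e^2)$ be the vector of model parameters, and let $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\beta}}, \hat\sigma_u^2, \hat\sigma_e^2)$ be the Maximum Likelihood (ML) or the Restricted Maximum Likelihood (REML) estimator of $\boldsymbol{\theta}$. In particular, $\boldsymbol{\sigma} = (\sigma_u^2, \sigma_e^2)$ is usually referred to as the vector of variance components of the model, with estimator $\hat{\boldsymbol{\sigma}} = (\hat\sigma_u^2, \hat\sigma_e^2)$. Note that since we have assumed a non-informative sampling method, the sample and population distributions of the data are the same, and are given by (1). Given the sample data, we can estimate the unknown parameters (including the area effect) of model (1) and hence define the log-scale predictions as $\hat l_{ij} = \mathbf{z}_{ij}^T\hat{\boldsymbol{\beta}} + \hat u_i$, where $\hat{\boldsymbol{\beta}}$ is the estimator of $\boldsymbol{\beta}$, and $\hat u_i = \hat\gamma_i(\bar l_{si} - \bar{\mathbf{z}}_{si}^T\hat{\boldsymbol{\beta}})$ is the empirical best linear unbiased predictor (EBLUP) of the random area effect. Here $\hat\gamma_i = \hat\sigma_u^2(\hat\sigma_u^2 + n_i^{-1}\hat\sigma_e^2)^{-1}$ is the plug-in estimator of the shrinkage effect $\gamma_i = \sigma_u^2(\sigma_u^2 + n_i^{-1}\sigma_e^2)^{-1}$, and $\bar l_{si} = n_i^{-1}\sum_{j \in s_i}\log(y_{ij})$ and $\bar{\mathbf{z}}_{si} = n_i^{-1}\sum_{j \in s_i}\mathbf{z}_{ij}$ are the sample means of $l_{ij}$ and $\mathbf{z}_{ij}$ respectively in area $i$. Using a prediction-based approach similar to that described in Karlberg (2000), Chandra and Chambers (2011a) then propose a synthetic type predictor for the area mean $m_i$ under model (1) of the form
$$\hat m_i^{SYN-EP} = N_i^{-1}\Big(\sum_{j \in s_i} y_{ij} + \sum_{j \in r_i}\hat y_{ij}^{SYN-EP}\Big), \qquad (2)$$
where
$$\hat y_{ij}^{SYN-EP} = \hat c_{ij}^{SYN-EP}\exp\{\mathbf{z}_{ij}^T\hat{\boldsymbol{\beta}} + 0.5(\hat\sigma_u^2 + \hat\sigma_e^2)\}$$
and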
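$$\hat c_{ij}^{SYN-EP} = \exp\big[-0.5\{\mathbf{z}_{ij}^T\hat V(\hat{\boldsymbol{\beta}})\mathbf{z}_{ij} + 0.25\hat V(\hat\sigma_u^2 + \hat\sigma_e^2)\}\big]$$
is a Taylor series linearization-based correction for back transformation bias. Note that (2) is not an Empirical Best Predictor since it does not allow for between unit correlation within a small area when it predicts the value of a non-sample $y_{ij}$ given the corresponding sample values for this variable in area $i$. It is therefore a synthetic predictor of the small area mean. Chandra and Chambers (2011a) also propose a model-based direct estimator (MBDE) of $m_i$ of the form $\sum_{j \in s_i} w_{ij}y_{ij}$, where $w_{ij}$ is an estimator of the weight that leads to the best linear unbiased predictor (BLUP) of the population mean if the parameters of model (1) are known. To derive this estimator, Chandra and Chambers (2011a) use the approximations
$$E(y_{ij}) \approx \beta_0 + \beta_1\hat y_{ij}^{SYN-EP} \qquad (3)$$
and
$$\mathrm{Cov}(y_{ij}, y_{ik}) \approx \hat y_{ij}^{SYN-EP}\hat y_{ik}^{SYN-EP}\big[\{\exp(\hat\sigma_u^2) - 1\} + \exp(\hat\sigma_u^2)\{\exp(\hat\sigma_e^2) - 1\}I(j = k)\big], \qquad (4)$$
where $\hat y_{ij}^{SYN-EP}$ is given in (2). The approximations (3) and (4) follow from the moment generating function of a normal distribution, and the fact that the covariance between two units from different areas is zero. Put $\mathbf{y}_U = (\mathbf{y}_s^T, \mathbf{y}_r^T)^T$, where $\mathbf{y}_s$ and $\mathbf{y}_r$ are the vectors of sampled and non-sampled units of $Y$ respectively. Similarly, let $\hat{\mathbf{y}}_s^{SYN-EP}$ and $\hat{\mathbf{y}}_r^{SYN-EP}$ denote the vectors containing the values $\hat y_{ij}^{SYN-EP}$ for the sampled and non-sampled units, and define $\mathbf{J}_U = (\mathbf{J}_s^T, \mathbf{J}_r^T)^T$, with $\mathbf{J}_s = (\mathbf{1}_s, \hat{\mathbf{y}}_s^{SYN-EP})$ and $\mathbf{J}_r = (\mathbf{1}_r, \hat{\mathbf{y}}_r^{SYN-EP})$. We can then express (3) and (4) in matrix form as
$$E(\mathbf{y}_U) = \mathbf{J}_U\boldsymbol{\beta}, \qquad V(\mathbf{y}_U) = \mathbf{V}_U = \begin{pmatrix}\mathbf{V}_{ss} & \mathbf{V}_{sr}\\ \mathbf{V}_{rs} & \mathbf{V}_{rr}\end{pmatrix}, \qquad (5)$$
where $\boldsymbol{\beta} = (\beta_0, \beta_1)^T$ and the elements of the variance-covariance matrix $\mathbf{V}_U$ are given by (4). For known parameters, the model specified by (3) and (4) is referred to as a 'fitted value' model and corresponds to a linear model for $y_{ij}$. The BLUP of the population mean $m = N^{-1}\sum_{i=1}^{D}\sum_{j=1}^{N_i} y_{ij}$ of $Y$ under (5) is then $N^{-1}\mathbf{w}_s^T\mathbf{y}_s$, where
$$\mathbf{w}_s = (w_{ij};\ j \in s) = \mathbf{1}_s + \mathbf{H}_s^T(\mathbf{J}_U^T\mathbf{1}_U - \mathbf{J}_s^T\mathbf{1}_s) + (\mathbf{I}_s - \mathbf{H}_s^T\mathbf{J}_s^T)\mathbf{V}_{ss}^{-1}\mathbf{V}_{sr}\mathbf{1}_r, \qquad (6)$$
with $\mathbf{H}_s = (\mathbf{J}_s^T\mathbf{V}_{ss}^{-1}\mathbf{J}_s)^{-1}\mathbf{J}_s^T\mathbf{V}_{ss}^{-1}$. Note that the weights (6) satisfy $\sum_{i=1}^{D}\sum_{j \in s_i} w_{ij} = N$ and $\sum_{i=1}^{D}\sum_{j \in s_i} w_{ij}\hat y_{ij}^{SYN-EP} = \sum_{i=1}^{D}\sum_{j=1}^{N_i}\hat y_{ij}^{SYN-EP}$. The MBDE of the small area mean $m_i$ (Chandra and Chambers, 2011a) is then
$$\hat m_i^{CC} = N_i^{-1}\sum_{j \in s_i} w_{ij}y_{ij}, \qquad (7)$$
where the $w_{ij}$ are the weights (6) associated with the sample units in area $i$. We note that since (7) is a direct estimator, it can lead to unstable estimates when area sample sizes are too small. Balanced against this, however, is its inherent robustness to misspecification of the model for the $y_{ij}$. Finally, Berg and Chandra (2012) use (1) to develop the empirical version of the minimum mean squared error (MMSE) predictor for $m_i$. This is
$$\hat m_i^{EBP} = N_i^{-1}\Big(\sum_{j \in s_i} y_{ij} + \sum_{j \in r_i}\hat y_{ij}^{EBP}\Big), \qquad (8)$$
where $\hat y_{ij}^{EBP} = \exp\{\hat l_{ij} + 0.5\hat\sigma_e^2(1 + n_i^{-1}\hat\gamma_i)\}$ with $\hat l_{ij} = \mathbf{z}_{ij}^T\hat{\boldsymbol{\beta}} + \hat\gamma_i(\bar l_{si} - \bar{\mathbf{z}}_{si}^T\hat{\boldsymbol{\beta}})$. We note that (8) allows for between unit correlations within a small area and is therefore an Empirical Best Predictor (EBP) under the normality assumptions of (1). To see this, observe that for a non-sample unit $j \in r_i$ the conditional distribution of $l_{ij} = \log(y_{ij})$ given the area $i$ sample data $\{\mathbf{x}_{ij}, l_{ik}, \mathbf{x}_{ik};\ k \in s_i\}$ is normal, with
$$E(l_{ij} \mid \mathbf{x}_{ij}, l_{ik}, \mathbf{x}_{ik};\ k \in s_i) = \mathbf{z}_{ij}^T\boldsymbol{\beta} + \gamma_i(\bar l_{si} - \bar{\mathbf{z}}_{si}^T\boldsymbol{\beta}),$$
$$\mathrm{Var}(l_{ij} \mid \mathbf{x}_{ij}, l_{ik}, \mathbf{x}_{ik};\ k \in s_i) = \sigma_u^2 + \sigma_e^2 - \sigma_u^4(\sigma_u^2 + n_i^{-1}\sigma_e^2)^{-1} = \sigma_e^2(1 + n_i^{-1}\gamma_i),$$
and so
$$E(y_{ij} \mid \mathbf{x}_{ij}, y_{ik}, \mathbf{x}_{ik};\ k \in s_i) = \exp\{\mathbf{z}_{ij}^T\boldsymbol{\beta} + \gamma_i(\bar l_{si} - \bar{\mathbf{z}}_{si}^T\boldsymbol{\beta}) + 0.5\sigma_e^2(1 + n_i^{-1}\gamma_i)\},$$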
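which immediately leads to the empirical version (8) of the MMSE predictor. Consequently, when (1) holds, i.e. the $y_{ij}$ are lognormally distributed, we expect (8) to dominate (2). Note that
$$E(\hat y_{ij}^{EBP}) = E\Big[\exp\{\mathbf{z}_{ij}^T\hat{\boldsymbol{\beta}} + \hat\gamma_i(\bar l_{si} - \bar{\mathbf{z}}_{si}^T\hat{\boldsymbol{\beta}}) + 0.5\hat\sigma_e^2(1 + n_i^{-1}\hat\gamma_i)\}\Big] \neq \exp\{\mathbf{z}_{ij}^T\boldsymbol{\beta} + \gamma_i(\bar l_{si} - \bar{\mathbf{z}}_{si}^T\boldsymbol{\beta}) + 0.5\sigma_e^2(1 + n_i^{-1}\gamma_i)\}.$$
That is, the MMSE predictor (8) is biased. Berg and Chandra (2012) use a Taylor series approximation to bias correct this predictor. Following their development, a bias corrected version of (8) is
$$\hat m_i^{EBP-BC} = N_i^{-1}\Big(\sum_{j \in s_i} y_{ij} + \sum_{j \in r_i}\hat y_{ij}^{EBP-BC}\Big), \qquad (9)$$
where $\hat y_{ij}^{EBP-BC} = \hat c_{ij}^{EBP}\hat y_{ij}^{EBP}$, with
$$\hat c_{ij}^{EBP} = \exp\big[-0.5\{\hat a_{ij} + \hat c_1\hat V(\hat\sigma_e^2) + \hat c_2\hat V(\hat\sigma_u^2) + 2\hat c_3\widehat{\mathrm{Cov}}(\hat\sigma_e^2, \hat\sigma_u^2)\}\big].$$
Put $\hat d_i = \bar l_{si} - \bar{\mathbf{z}}_{si}^T\hat{\boldsymbol{\beta}}$. Then $\hat a_{ij} = (\mathbf{z}_{ij} - \hat\gamma_i\bar{\mathbf{z}}_{si})^T\hat V(\hat{\boldsymbol{\beta}})(\mathbf{z}_{ij} - \hat\gamma_i\bar{\mathbf{z}}_{si})$, while $\hat c_1$, $\hat c_2$ and $\hat c_3$ are functions of $\hat d_i$, $n_i$, $\hat\sigma_u^2$ and $\hat\sigma_e^2$ that arise from this Taylor series approximation.

3. Small area estimation under a mixture model
We now consider the case where the response variable $y_{ij}$ is semicontinuous. In particular, we shall assume that $y_{ij}$ is either zero or has a skewed distribution over the strictly positive real line. We describe an approach based on modelling this variable via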
a two part random effects model (also referred to as a mixture model). That is, we shall assume that $y_{ij}$ is drawn from a two-component mixture, where the first component corresponds to a fixed value (zero) and the second component corresponds to a strictly positive random variable with a skewed distribution. Following Olsen and Schafer (2001), Pfeffermann et al. (2008), Chandra and Chambers (2011b) and Chandra and Sud (2012), we define $I(A)$ as the indicator function for the event $A$ and write $y_{ij} = 0 \times I(y_{ij} = 0) + \tilde y_{ij} \times I(y_{ij} > 0)$, where $\tilde y_{ij}$ is referred to as the log-linear component of $y_{ij}$ and is assumed to follow the log scale linear mixed model (1). The second component, $\delta_{ij} = I(y_{ij} > 0)$, is assumed to follow a generalized linear mixed model (GLMM) with logit link function (Breslow and Clayton, 1993), and is referred to as the logistic component of $y_{ij}$. Note that values of $\tilde y_{ij}$ are only observed when $\delta_{ij} = 1$, whereas values of $\delta_{ij}$ are always observed. Small area estimation under this mixture model is implemented in three steps. First, a logistic linear mixed model is fitted to the sample values of the indicator variable $\delta_{ij}$. Second, a log scale linear mixed model is fitted to the positive sample values of the response variable. Finally, predicted values generated under these two models are combined at the estimation stage. Chandra and Chambers (2011b) used a similar mixture model for small area estimation of zero-inflated skewed data. However, their approach focuses on the MBDE for this case, and uses sample weights obtained via the 'fitted value' linear model implied by the two part mixture model. They also develop an MSE estimator based on pseudo-linearization (Chambers et al., 2011). However, as noted earlier, the MBDE is a direct estimator and can be unstable when area specific sample sizes are too small. Fitting the logistic component of a two part random effects model poses computational challenges similar to those found when fitting generalized linear mixed models.
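The first two of the three steps above work on two different views of the same sample: the indicator $\delta_{ij} = I(y_{ij} > 0)$ for every sampled unit, and $\log(y_{ij})$ for the strictly positive units only. The Python sketch below shows this split, assuming the sample is held as hypothetical `(area, y)` pairs; the function name is illustrative, and the two model fits themselves are not shown.

```python
import math
from collections import defaultdict


def split_two_part(records):
    """Split sampled (area, y) pairs into the two parts of the mixture model.

    Returns two dicts keyed by area:
      deltas[i]:  indicator values delta_ij = I(y_ij > 0) for all sampled units
                  (input to the logistic component);
      log_pos[i]: log(y_ij) for the strictly positive sampled units only
                  (input to the log scale linear mixed model).
    """
    deltas = defaultdict(list)
    log_pos = defaultdict(list)
    for area, y in records:
        if y < 0:
            raise ValueError("a semicontinuous response must be non-negative")
        delta = 1 if y > 0 else 0
        deltas[area].append(delta)
        if delta:
            log_pos[area].append(math.log(y))
    return dict(deltas), dict(log_pos)


# Hypothetical sample with one zero in each of two areas.
sample = [(1, 0.0), (1, 12.5), (2, 3.0), (2, 0.0), (2, 40.0)]
deltas, log_pos = split_two_part(sample)
n_plus = {i: sum(d) for i, d in deltas.items()}  # n_{i+}: non-zero units per area
print(deltas)  # {1: [0, 1], 2: [1, 0, 1]}
print(n_plus)  # {1: 1, 2: 2}
```

An area whose sampled values are all zero has no entry in `log_pos`, so it contributes to the logistic fit but not to the log-linear fit.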
Generally, an approximate Fisher scoring procedure based on higher order Laplace approximations is used to obtain maximum likelihood estimates for the fixed coefficients and variance components; see Olsen and Schafer (2001). Pfeffermann et al. (2008) use a two part random effects model that allows the random area effects in the two components of the model to be correlated. However, their simulation results show that this correlation does not significantly improve small area estimation. Furthermore, use of this correlation makes model fitting computationally intensive and sometimes numerically unstable. Consequently the area random effects in the two components of the two part random effects model are often assumed to be independent; see, for example, Karlberg (2000) and references therein. We shall proceed similarly and assume that the two area random effects are uncorrelated. That is, following Pfeffermann et al. (2008), Chandra and Chambers (2011b) and Chandra and Sud (2012), we assume that the correlation between the two random components $\delta_{ij}$ and $\tilde y_{ij}$ of the assumed mixture model is negligible. Note that this implies that the mixture model is not appropriate if there is reason to believe that the distributions of these components are dependent, e.g. if the observed zeros in the data are due to censoring of $y_{ij}$, as in a Tobit model. We assume that, given $\mathbf{x}_{ij}$, the $\delta_{ij}$ are independent Bernoulli random variables with $P(y_{ij} > 0) = P(\delta_{ij} = 1) = p_{ij}$. The model linking the probability $p_{ij}$ with the values of the covariates associated with unit $j$ in area $i$ is a logistic linear mixed model of the form
$$\mathrm{logit}(p_{ij}) = \ln\{p_{ij}/(1 - p_{ij})\} = \eta_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\alpha} + v_i, \qquad (10)$$
so $p_{ij} = \exp(\eta_{ij})\{1 + \exp(\eta_{ij})\}^{-1} = \exp(\mathbf{x}_{ij}^T\boldsymbol{\alpha} + v_i)\{1 + \exp(\mathbf{x}_{ij}^T\boldsymbol{\alpha} + v_i)\}^{-1}$. Here $\boldsymbol{\alpha}$ is a vector of unknown fixed effects parameters and $v_i$ is the random effect associated with area $i$, assumed to have a normal distribution with zero mean and constant variance $\sigma_v^2$. We estimate the parameters of (10) using the procedure described in Saei and Chambers (2003) and Manteiga et al. (2007). This is an iterative procedure, implemented in the statistical software package R, that combines Penalized Quasi-Likelihood (PQL) estimation of $\boldsymbol{\alpha}$ and $v_i$ with REML estimation of the variance component parameters.
Using a 'hat' to denote these estimated values, the predicted probabilities of the logistic component of the two part random effects model are
$$\hat p_{ij} = \exp(\mathbf{x}_{ij}^T\hat{\boldsymbol{\alpha}} + \hat v_i)\{1 + \exp(\mathbf{x}_{ij}^T\hat{\boldsymbol{\alpha}} + \hat v_i)\}^{-1}. \qquad (11)$$
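Once the estimated fixed effects and area effects of the logistic model (10) are available, the predicted probability (11) is simply the inverse logit of the linear predictor. A minimal Python sketch follows; it assumes that the covariate vector already includes a leading 1 for the intercept, and the values passed as `alpha_hat` and `v_hat_i` are hypothetical stand-ins for the output of the PQL/REML fit, which is not reproduced here.

```python
import math


def inv_logit(eta):
    """exp(eta) / (1 + exp(eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def predicted_prob(x_ij, alpha_hat, v_hat_i):
    """p_hat_ij as in (11): inverse logit of x_ij' alpha_hat + v_hat_i."""
    if len(x_ij) != len(alpha_hat):
        raise ValueError("x_ij and alpha_hat must have the same length")
    eta = sum(x * a for x, a in zip(x_ij, alpha_hat)) + v_hat_i
    return inv_logit(eta)


# Hypothetical values: an intercept plus one covariate.
p_hat = predicted_prob([1.0, 0.5], alpha_hat=[0.2, 1.0], v_hat_i=-0.1)
print(round(p_hat, 4))  # 0.6457
```

The two-branch form of `inv_logit` keeps `math.exp` from overflowing for large linear predictors while returning the same value as the expression in (11).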

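In order to estimate the parameters of the second, log-linear, component $\tilde y_{ij}$ of $y_{ij}$, we denote by $s_{i+} = \{j \in s_i: y_{ij} > 0\}$ the subset of the area $i$ sample for which the response variable is non-zero, with $n_{i+} = \sum_{j \in s_i}\delta_{ij}$ denoting the number of non-zero sample units. In what follows, we will use a subscript of $+$ to denote a quantity associated with these non-zero sample units. Using the data in $s_+ = \cup_i s_{i+}$, we then fit model (1) to obtain estimates of the fixed effect parameters and the predicted values of the random effects. In particular, the Empirical Best Linear Unbiased Estimator (EBLUE) of $\boldsymbol{\beta}$ is
$$\hat{\boldsymbol{\beta}}_+ = \Big(\sum_{i=1}^{D}\mathbf{z}_{s_{i+}}^T\hat{\mathbf{V}}_{ss,i+}^{-1}\mathbf{z}_{s_{i+}}\Big)^{-1}\sum_{i=1}^{D}\mathbf{z}_{s_{i+}}^T\hat{\mathbf{V}}_{ss,i+}^{-1}\mathbf{l}_{s_{i+}},$$
where $\mathbf{z}_{s_{i+}}$ is the matrix with rows $\mathbf{z}_{ij}^T$, $j \in s_{i+}$, $\mathbf{l}_{s_{i+}}$ is the vector of values $\log(\tilde y_{ij})$, $j \in s_{i+}$, and $\hat{\mathbf{V}}_{ss,i+} = \hat\sigma_{u+}^2\mathbf{1}_{s_{i+}}\mathbf{1}_{s_{i+}}^T + \hat\sigma_{e+}^2\mathbf{I}_{s_{i+}}$, with $\mathbf{1}_{s_{i+}}$ and $\mathbf{I}_{s_{i+}}$ equal to the unit vector of length $n_{i+}$ and the identity matrix of dimension $n_{i+}$ respectively. The corresponding Empirical Best Linear Unbiased Predictors (EBLUPs) for the random area effects are given by $\hat u_{i+} = \hat\gamma_{i+}(\bar l_{si+} - \bar{\mathbf{z}}_{si+}^T\hat{\boldsymbol{\beta}}_+)$ with $\hat\gamma_{i+} = \hat\sigma_{u+}^2(\hat\sigma_{u+}^2 + n_{i+}^{-1}\hat\sigma_{e+}^2)^{-1}$. The estimated values of $\tilde y_{ij}$ can then be obtained using (2) or (9). The first option leads to a synthetic type predictor while the second, after correction for back transformation bias, leads to an empirical version of the minimum mean squared error predictor, i.e. an EBP, for $\tilde y_{ij}$. The synthetic type predictor is
$$\hat{\tilde y}_{ij}^{SYN-EP} = \exp\{\mathbf{z}_{ij}^T\hat{\boldsymbol{\beta}}_+ + 0.5(\hat\sigma_{u+}^2 + \hat\sigma_{e+}^2) - 0.5\mathbf{z}_{ij}^T\hat V(\hat{\boldsymbol{\beta}}_+)\mathbf{z}_{ij} - 0.125\hat V(\hat\sigma_{u+}^2 + \hat\sigma_{e+}^2)\}, \qquad (12)$$
while the EBP is
$$\hat{\tilde y}_{ij}^{EBP-BC} = \hat c_{ij+}^{EBP}\exp\{\hat l_{ij+} + 0.5\hat\sigma_{e+}^2(1 + n_{i+}^{-1}\hat\gamma_{i+})\}, \qquad (13)$$
with $\hat l_{ij+} = \mathbf{z}_{ij}^T\hat{\boldsymbol{\beta}}_+ + \hat\gamma_{i+}(\bar l_{si+} - \bar{\mathbf{z}}_{si+}^T\hat{\boldsymbol{\beta}}_+)$, where
$$\hat c_{ij+}^{EBP} = \exp\big[-0.5\{\hat a_{ij+} + \hat c_{1+}\hat V(\hat\sigma_{e+}^2) + \hat c_{2+}\hat V(\hat\sigma_{u+}^2) + 2\hat c_{3+}\widehat{\mathrm{Cov}}(\hat\sigma_{e+}^2, \hat\sigma_{u+}^2)\}\big],$$
$$\hat a_{ij+} = (\mathbf{z}_{ij} - \hat\gamma_{i+}\bar{\mathbf{z}}_{si+})^T\hat V(\hat{\boldsymbol{\beta}}_+)(\mathbf{z}_{ij} - \hat\gamma_{i+}\bar{\mathbf{z}}_{si+})$$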
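and $\hat c_{1+}$, $\hat c_{2+}$ and $\hat c_{3+}$ are obtained from $\hat c_1$, $\hat c_2$ and $\hat c_3$ by replacing the parameter estimates $\hat{\boldsymbol{\beta}}, \hat\sigma_e^2, \hat\sigma_u^2$ by $\hat{\boldsymbol{\beta}}_+, \hat\sigma_{e+}^2, \hat\sigma_{u+}^2$. Let $E_1$ denote expectation with respect to unit level (level 1) variability in $y_{ij}$. That is, this expectation conditions on the random area effects in the logistic and log-linear components of the two part model. Then, setting $\mu_{ij} = E_1(\tilde y_{ij})$, we see that under independence of these area effects,
$$E_1(y_{ij}) = E_1(\delta_{ij}\tilde y_{ij}) = E_1(\delta_{ij})E_1(\tilde y_{ij}) = p_{ij}\mu_{ij}, \qquad (14)$$
where $p_{ij}$ was defined following (10). Substituting predicted values for $p_{ij}$ and $\mu_{ij}$ in (14) leads to a plug-in predicted value for $y_{ij}$,
$$\hat E_1(y_{ij}) = \hat p_{ij}\hat\mu_{ij}, \qquad (15)$$
where $\hat p_{ij}$ is given by (11), and $\hat\mu_{ij} = \hat E_1(\tilde y_{ij})$ can be calculated using either (12) or (13). That is, we have two different predicted values:
$$\hat y_{ij}^{MxEP} = \hat p_{ij}\hat{\tilde y}_{ij}^{SYN-EP}, \qquad (16)$$
$$\hat y_{ij}^{MxEBP} = \hat p_{ij}\hat{\tilde y}_{ij}^{EBP-BC}. \qquad (17)$$
As usual, let a 'hat' denote an estimated value. Then, for a non-sample unit $j$ in area $i$, we see that we can write $\hat y_{ij}^{MxEP} = \hat E(y_{ij} \mid \mathbf{x}_{ij})$, while $\hat y_{ij}^{MxEBP} = \hat E(y_{ij} \mid \mathbf{x}_{ij}, \mathbf{y}_s, \mathbf{x}_s)$. The two predictors (16) and (17) allow us to define three different estimators for the population mean of $Y$ in small area $i$, as follows:
(i) Using (16) we can calculate a synthetic type estimator of the form
$$\hat m_i^{MxEP} = \hat E(m_i \mid \mathbf{y}_s, \mathbf{x}_s, \mathbf{x}_r) = N_i^{-1}\Big(\sum_{j \in s_i} y_{ij} + \sum_{j \in r_i}\hat y_{ij}^{MxEP}\Big), \qquad (18)$$
which we denote by MxEP in what follows;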

(ii) The fitted values $\hat y_{ij}^{MxEP}$ that define the synthetic estimator (18) can also be used to define a 'fitted value' covariate in a linear model for $y_{ij}$. This model is then used to calculate sample weights $w_{ij}^{MxEP}$ via (6), and an MBDE based on these weights,
$$\hat m_i^{MxMBDE} = N_i^{-1}\sum_{j \in s_i} w_{ij}^{MxEP}y_{ij}. \qquad (19)$$
We denote this estimator by MxMBDE in what follows;
(iii) Using (17) we can calculate an EBP type estimator of the form
$$\hat m_i^{MxEBP} = \hat E(m_i \mid \mathbf{y}_s, \mathbf{x}_s, \mathbf{x}_r) = N_i^{-1}\Big(\sum_{j \in s_i} y_{ij} + \sum_{j \in r_i}\hat y_{ij}^{MxEBP}\Big), \qquad (20)$$
which we denote by MxEBP in what follows.

4. Mean squared error estimation
Analytic estimators of the MSE of nonlinear small area estimators are technically complex to derive and typically involve a considerable degree of approximation. As a consequence, a number of numerically intensive, but computationally tractable, methods for MSE estimation have been proposed, e.g. the jackknife method of Jiang, Lahiri and Wan (2002) and the bootstrap methods described in Hall and Maiti (2006) and Manteiga et al. (2007, 2008) and references therein. By construction, the small area predictors (18) and (20) are non-linear with complex structure, and so obtaining a closed form expression for their corresponding MSEs is not straightforward. We therefore adopt a bootstrap approach when estimating the MSE of (18) and (20). In particular, we use the parametric bootstrap method defined by the steps in the following algorithm. Note that we use an estimator $\hat m_i$ of the area mean $m_i$ to motivate the algorithm, but it is generally applicable to estimators of any set of finite population parameters defined on the survey population.
Step 1. Fit the log scale linear mixed model (1) to the positive values $y_{ij}$ in the sample data to obtain the estimates $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\beta}}, \hat\sigma_u^2, \hat\sigma_e^2)$.

Step 2. Given the estimates $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\beta}}, \hat\sigma_u^2, \hat\sigma_e^2)$, generate area-specific random errors from a lognormal distribution, $u_i^* \sim LN(0, \hat\sigma_u^2)$, $i = 1, \dots, D$, and individual level random errors from an independent lognormal distribution, $e_{ij}^* \sim LN(0, \hat\sigma_e^2)$, $j = 1, \dots, N_i$; $i = 1, \dots, D$.
Step 3. Similarly, fit the logistic linear mixed model (10) to the sample values of the binary variable $\delta_{ij}$ and compute $\hat{\boldsymbol{\alpha}}$ and $\hat v_i$.
Step 4. Given $\hat{\boldsymbol{\alpha}}$ and $\hat v_i$, calculate probabilities $p_{ij}^* = \exp(\mathbf{x}_{ij}^T\hat{\boldsymbol{\alpha}} + \hat v_i)\{1 + \exp(\mathbf{x}_{ij}^T\hat{\boldsymbol{\alpha}} + \hat v_i)\}^{-1}$, and hence generate independent binary values $\delta_{ij}^*$, $j = 1, \dots, N_i$; $i = 1, \dots, D$, satisfying $P(\delta_{ij}^* = 1) = p_{ij}^*$.
Step 5. Calculate bootstrap population data $\{y_{ij}^*, \mathbf{x}_{ij}\}$ under the two part model using
$$y_{ij}^* = \exp(\mathbf{z}_{ij}^T\hat{\boldsymbol{\beta}})\,u_i^*\,e_{ij}^*\,\delta_{ij}^*, \quad j = 1, \dots, N_i;\ i = 1, \dots, D, \qquad (21)$$
and then calculate the corresponding value of the area mean, $m_i^* = N_i^{-1}\sum_{j=1}^{N_i} y_{ij}^*$.
Step 6. Let $\mathbf{y}_s^* = \{y_{ij}^*;\ j \in s_i;\ i = 1, \dots, D\}$ denote the vector of bootstrap sample values for this population. Using these values, calculate the estimate $\hat m_i^*$ of the area population mean.
Step 7. Repeat steps 2-6 independently $B$ times to generate the bootstrap distribution $\{m_i^{*(b)}, \hat m_i^{*(b)};\ b = 1, \dots, B\}$ of values for $m_i^*$ and $\hat m_i^*$.
Step 8. Calculate the bootstrap estimate of the MSE of the actual sample-based estimate $\hat m_i$ of $m_i$ as
$$\mathrm{mse}_{boot}(\hat m_i) = B^{-1}\sum_{b=1}^{B}\big(\hat m_i^{*(b)} - m_i^{*(b)}\big)^2. \qquad (22)$$

5. Empirical evaluations
In this Section we report the results from a limited set of empirical evaluations that illustrate the performance of the different estimators of small area means described in the preceding sections, and their corresponding MSE estimators. These estimators are set out in Table 1. Note that for the commonly used linear mixed model EBLUP, denoted by LinEBLUP, which served as the baseline estimator in our simulations, we used the
MSE estimator of Prasad and Rao (1990). For the mixture model based MBDE (19) (MxMBDE) we followed the approach of Chambers et al. (2011) and used a pseudo-linearization-based MSE estimator. Finally, for the mixture model based indirect estimators MxEP (18) and MxEBP (20) we used the parametric bootstrap procedure detailed in Section 4. We used two types of simulations in our empirical evaluations. The first used models to simulate population and sample data. In this case, at each simulation, population data were first generated under the model and a single sample was then taken from this simulated population by stratified simple random sampling without replacement, with the small areas defining the strata. The results from these simulations allow one to compare different estimators in terms of their sensitivity to model assumptions. The second type of simulation was design-based, using population data created by nonparametrically bootstrapping a real survey dataset. Here we evaluated estimators in terms of their performance under repeated sampling from this population under a pre-specified sample design. The results from these simulations allow one to assess the robustness of different estimators to the type of model misspecification seen in practice. We use two measures of the relative performance of the different small area estimation methods that were considered in our simulations. These are the average percent relative bias
$$\mathrm{AvRB}(\hat m) = \mathrm{mean}_i\Big\{\bar m_i^{-1}K^{-1}\sum_{k=1}^{K}(\hat m_{ik} - m_{ik})\Big\} \times 100$$
and the average percent relative root mean squared error
$$\mathrm{AvRRMSE}(\hat m) = \mathrm{mean}_i\Big\{\bar m_i^{-1}\Big(K^{-1}\sum_{k=1}^{K}(\hat m_{ik} - m_{ik})^2\Big)^{1/2}\Big\} \times 100$$
of the estimates $\hat m_{ik}$ generated by an estimation method. Here $\bar m_i = K^{-1}\sum_{k=1}^{K} m_{ik}$, with the subscript $i$ indexing the small areas and the subscript $k$ indexing the $K$ Monte Carlo simulations, and with $m_{ik}$ denoting the actual area mean at simulation $k$, with predicted value $\hat m_{ik}$. Note that in the design-based simulations $m_{ik} = m_i$, so $\bar m_i = m_i$.
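The two performance measures above can be computed directly from the Monte Carlo output. The Python sketch below assumes that the estimates $\hat m_{ik}$ and actual means $m_{ik}$ are stored as nested lists indexed `[k][i]` (replication, area), and takes $\bar m_i$ as the denominator in both measures; the function name is hypothetical.

```python
import math


def av_rb_and_rrmse(est, truth):
    """Return (AvRB, AvRRMSE) in percent.

    est[k][i] is the estimate m_hat_ik and truth[k][i] the actual mean m_ik
    for Monte Carlo replication k and small area i.
    """
    K, D = len(est), len(est[0])
    rb, rrmse = [], []
    for i in range(D):
        m_bar = sum(truth[k][i] for k in range(K)) / K  # m_bar_i
        errs = [est[k][i] - truth[k][i] for k in range(K)]
        rb.append(100.0 * (sum(errs) / K) / m_bar)
        rrmse.append(100.0 * math.sqrt(sum(e * e for e in errs) / K) / m_bar)
    return sum(rb) / D, sum(rrmse) / D  # means over the small areas


# Two replications and two areas; the true means are fixed, as in a
# design-based simulation where m_ik = m_i.
est = [[11.0, 19.0], [9.0, 21.0]]
truth = [[10.0, 20.0], [10.0, 20.0]]
print(av_rb_and_rrmse(est, truth))  # (0.0, 7.5)
```

In the toy example the errors cancel within each area, so the relative bias is zero while the relative RMSE is not, which is exactly the distinction the two measures are designed to capture.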

We also investigated the performance of the different MSE estimation methods considered in the simulations. Here we calculated the average relative bias of the MSE estimation method, defined by
$$\mathrm{AvRB}(M) = \mathrm{mean}_i\Big\{M_i^{-1}K^{-1}\sum_{k=1}^{K}(\hat M_{ik} - M_i)\Big\} \times 100.$$
Here $\hat M_{ik}$ denotes the simulation $k$ value of the MSE estimator in area $i$, and $M_i$ denotes the actual (i.e. Monte Carlo) MSE in area $i$. We also consider a secondary performance indicator. This is based on the fact that in many applications of small area estimation, MSE estimators are used to calculate Gaussian type confidence intervals for the small area quantities of interest. Consequently it is interesting to evaluate the coverage properties of such intervals. In particular, we focussed on two sigma (i.e. nominal 95 percent) Gaussian intervals, and calculated the average percent coverage
$$\mathrm{AvCR}(M) = \mathrm{mean}_i\Big\{K^{-1}\sum_{k=1}^{K} I\big(|\hat m_{ik} - m_{ik}| \le 2\hat M_{ik}^{1/2}\big)\Big\} \times 100.$$

Table 1. Definitions of small area predictors used in the simulation studies.
Estimator | Description | Method of MSE estimation
Mixture model based methods:
MxEBP | Empirical best predictor (20) defined by the predicted values (17) | Bootstrap MSE (22)
MxEP | Empirical synthetic predictor (18) defined by the predicted values (16) | Bootstrap MSE (22)
MxMBDE | MBDE estimator (19) defined by a 'fitted values' linear model, with the predicted values (16) used as the model covariate | Pseudo-linearization MSE estimator of Chambers et al. (2011)
Raw scale linear mixed model based method:
LinEBLUP | Standard linear mixed model EBLUP | Prasad and Rao (1990) MSE estimator

5.1 Model-based simulations
Model-based simulations are a standard way of illustrating the sensitivity of an estimation procedure to variation in assumptions about the structure of the population of interest. The model-based simulations reported in this paper are based on population data generated under model (1). We chose a population size $N = 15{,}000$ with $D = 30$ small areas and a sample size $n = 600$, and then randomly generated small area population sizes
$N_i$, $i = 1, \dots, D$, with $\sum_{i=1}^{D} N_i = N$, and sample sizes as $n_i = N_i(n/N)$, with $\sum_{i=1}^{D} n_i = n$. The average small area population and sample sizes were 500 and 20 respectively. These were fixed in all simulations. Population values of $y_{ij}$ ($j = 1, \dots, N_i$; $i = 1, \dots, D$) were first generated via the model $\log(y_{ij}) = \log(5) + 0.5\log(x_{ij}) + u_i + e_{ij}$, with unit level random errors $e_{ij}$ independently generated from the normal distribution $N(0, \sigma_e = 0.5)$, and random area effects $u_i$ independently generated from the normal distribution $N(0, \sigma_u = 0.3)$. The covariate values $\log(x_{ij})$ were generated from the normal distribution $N(\log(2), \sigma_x = 3)$. We generated zero values for $y_{ij}$ using Poisson sampling, i.e. we set $y_{ij}$ to zero if the realized value of an independently generated uniform variate $U_{ij} \sim \mathrm{Uniform}(0, P^{-1})$ was such that $U_{ij} > p_{ij}$, where $p_{ij}$ was computed using (10) with the same fixed effect coefficient values as (1) and with an independent area effect drawn from the normal distribution with zero mean and a standard deviation of 0.1. The value of $P$ was chosen to generate differing numbers of zero values in the population. Thus with $P = 0.9$, approximately 10% of population values of $Y$ are set to zero, while with $P = 0.5$ this increases to 50%, and with $P = 0.3$ it becomes 70%. A random sample of (fixed) size $n_i = 20$ was drawn from each area. We also repeated these simulations with a smaller sample of size $n = 300$ and with area sample sizes of $n_i = 10$. All simulations consisted of $K = 1000$ independent replications, with the results from these simulations set out in Table 2. The percentage average relative bias (AvRB) values in Table 2 indicate that LinEBLUP has a much larger bias than all three mixture model based small area estimation methods (MxEBP, MxEP and MxMBDE). This implies that LinEBLUP may not be suitable for semicontinuous data. Restricting ourselves to the mixture model based small area estimation methods, we see that the bias values reported for MxEBP are smaller than those reported for MxMBDE and MxEP. Further, the bias advantage of MxEBP appears larger for smaller sample sizes.
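The population-generation and sampling steps of the model-based simulations described above can be sketched in Python as follows. Several details are assumptions made for illustration and are not taken from the paper: every area is given the average size $N_i = 500$ instead of a randomly generated size, the values 0.5, 0.3, 3 and 0.1 are read as standard deviations, and a unit is set to zero when $U_{ij} > p_{ij}$. The function names are hypothetical.

```python
import math
import random


def simulate_population(D=30, N_i=500, P=0.9, seed=1):
    """One population of (area, x, y) triples under the simulation model.

    Assumptions: equal area sizes N_i; 0.5, 0.3, 3 and 0.1 are standard
    deviations; y is set to zero when U > p with U ~ Uniform(0, 1/P).
    """
    rng = random.Random(seed)
    pop = []
    for i in range(D):
        u_i = rng.gauss(0.0, 0.3)  # area effect, log-linear part
        v_i = rng.gauss(0.0, 0.1)  # independent area effect, logistic part
        for _ in range(N_i):
            log_x = rng.gauss(math.log(2.0), 3.0)
            eta = math.log(5.0) + 0.5 * log_x  # same fixed effects in both parts
            y_pos = math.exp(eta + u_i + rng.gauss(0.0, 0.5))
            p = 1.0 / (1.0 + math.exp(-(eta + v_i)))
            U = rng.uniform(0.0, 1.0 / P)  # Poisson sampling step
            y = 0.0 if U > p else y_pos
            pop.append((i, math.exp(log_x), y))
    return pop


def stratified_srswor(pop, n_i, seed=2):
    """Stratified SRS without replacement, with the small areas as strata."""
    rng = random.Random(seed)
    strata = {}
    for unit in pop:
        strata.setdefault(unit[0], []).append(unit)
    return {i: rng.sample(units, n_i) for i, units in strata.items()}


pop = simulate_population(P=0.5)
sample = stratified_srswor(pop, n_i=20)
share_zero = sum(y == 0.0 for _, _, y in pop) / len(pop)
```

Under this sketch a unit stays non-zero only when $U_{ij} \le p_{ij}$, which happens with probability $P\,p_{ij}$, so smaller values of `P` produce more zeros; `share_zero` reports the realized proportion for one generated population.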
For moderate sample sizes ($n_i = 20$), MxMBDE dominates MxEP in terms of bias, but this is not the case for small sample sizes ($n_i = 10$). Average relative biases increase for all the methods as sample sizes
decrease or as the proportion of zero values in the population (i.e. the level of zero inflation in the data) increases. Turning now to the percentage average relative root mean squared error (AvRRMSE) values in Table 2, we see again that smaller area sample sizes or larger proportions of population zeros lead to an increase in the AvRRMSE of all the methods. Also, LnEBLUP continues to record very large values of relative root mean squared error compared with the mixture model based methods, reinforcing our previous comment that this method of small area estimation appears best avoided when faced with zero-inflated skewed data. Among the mixture model based methods, MxEBP dominates the other methods. Overall, this predictor appears to offer substantial bias and efficiency gains over the other predictors that we considered in our simulations.

Table 2. Percentage average relative bias (AvRB) and percentage average relative RMSE (AvRRMSE) of different estimators in model-based simulations. Columns: P; MxEBP, MxEP, MxMBDE and LnEBLUP, each for $n_i = 10$ and $n_i = 20$. Rows: AvRB and AvRRMSE.

We now turn to an examination of the performance of the MSE estimators associated with the different predictors. In particular, we present results from a limited model-based simulation study that was carried out to illustrate the empirical performance of the different MSE estimators defined in Table 1. Here we only considered a sample size $n = 300$ with area-specific sample sizes of $n_i = 10$. We also only considered two zero inflation scenarios (P = 0.50 and P = …). These simulations were repeated $K = 500$ times. Note that bootstrap estimation of the MSE in each simulation was based on $B = 500$ bootstrap samples. The results for these simulations are set out in Table 3 and correspond to averages over the small areas of the true RMSEs (AvRMSE)
and the estimated RMSEs (AvERMSE), the average percentage relative bias (AvRB), and the average percentage coverage rates of nominal 95 per cent Gaussian confidence intervals (AvCR) based on the various MSE estimators.

Table 3. Average true RMSEs (AvRMSE), average estimated RMSEs (AvERMSE), average percentage relative bias (AvRB), and average percentage coverage rates of nominal 95 per cent Gaussian confidence intervals (AvCR) generated by MSE estimators of the different small area estimators defined in Table 1. Area sample sizes are $n_i = 10$. Averages are over the small areas. Columns: P; MxEBP, MxEP, MxMBDE and LnEBLUP. Rows: AvCR, AvERMSE (with AvRMSE in parentheses) and AvRB. Legible AvERMSE (AvRMSE) entries: (8.67) (14.50) (11.92) (57.20) (7.22) 9.22 (8.90) (12.40) (31.09).

From the results reported in Table 3, we see that all methods of MSE estimation lead to Gaussian confidence intervals with average actual coverage (AvCR) at or near nominal coverage. Furthermore, the MSE estimators (bootstrap and pseudo-linearization) for the three mixture model based predictors (MxEBP, MxEP and MxMBDE) all report average estimated RMSE values that are close to the true average RMSE values. In three out of the four cases of the bootstrap MSE estimator for MxEBP and MxEP, the average estimated RMSE values are a little less than the true RMSE values, indicating a small downward bias. This is reflected in the AvRB values recorded for these cases. In contrast, the pseudo-linearization MSE estimator used with MxMBDE has either virtually no bias or a very small upward bias (again reflected in its AvRB values), while the linear model based MSE estimator for LnEBLUP seems somewhat unstable, being conservative when the proportion of zeros in the population is relatively small, but optimistic when this proportion is high. Overall, we can see that the average percentage relative bias (AvRB) values recorded by the MSE
estimators for the three mixture model based predictors are all small, in contrast to the bias values recorded by the linear model based MSE estimator for LnEBLUP, which are much larger.

5.2 Design-based simulation

Our design-based simulations were based on actual survey data collected in the Australian Agricultural Grazing Industry Survey (AAGIS) conducted by the Australian Bureau of Agricultural and Resource Economics. The survey collects detailed financial (e.g. farm business receipts, assets, debt), physical (e.g. farm area and location) and socioeconomic information (e.g. age and education of farm operator) from farm businesses across Australia. The target population for the survey is broadacre farms operating in three broad agro-ecological zones: the pastoral zone, the wheat-sheep zone and the high rainfall zone. In this study we use the wheat-sheep zone, which consists of 12 regions (the small areas of interest). In the original sample there were 760 farms from the 12 regions in the wheat-sheep zone. The variable of interest for this study is the number of beef cattle on hand at the end of the financial year (BEEFCL), and the covariate is land area (LAND). A linear model fit to the sample data was very poor ($R^2 = 0.18$ for the linear regression of BEEFCL on LAND). This fit improved slightly ($R^2 = 0.25$) when dummy variables corresponding to four of the five broadacre industries, (i) specialist cropping farms, (ii) mixed livestock and cropping farms, (iii) sheep specialists, (iv) beef specialists and (v) mixed sheep and beef farms, were included as covariates of the linear model. It is noteworthy that the target variable BEEFCL is zero inflated, with about 38 per cent of its values equal to zero: out of a total sample of 760 observations there are 286 zero values. The distribution of region sample sizes and proportions of zeros is given in Table 4 and displayed in Figure 1. We used the 474 farms with BEEFCL > 0 and fitted a model for BEEFCL in terms of the corresponding values of LAND for these farms.
However, the fit did not improve ($R^2 = 0.18$), and including the dummy variables corresponding to two of industries (i) to (iii), together with (iv) and (v), raised it only to $R^2 = 0.23$.
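Because BEEFCL is zero for 286 of the 760 farms and strongly skewed when positive, the two-part formulation used in this paper models the probability of a positive value with the logistic mixed model (10) and the positive values with the log scale linear mixed model (1). The Python sketch below shows, under the assumption that the two parts are independent given the area effects, how they combine into a unit-level expectation $E(y) = \Pr(y > 0) \times E(y \mid y > 0)$, using the lognormal mean $\exp(\eta + \sigma_e^2/2)$. It is an illustration only, not the paper's predictors (16) to (20): the names `beta`, `alpha`, `u_i`, `v_i` and `sigma2_e` are hypothetical, a single covariate $\log(\text{LAND})$ stands in for the full set of covariates, and plugging in predicted area effects ignores their uncertainty.

```python
import numpy as np


def expected_unit_values(log_x, beta, alpha, u_i, v_i, sigma2_e):
    """E(y | x, u_i, v_i) when Pr(y > 0) and the positive part are independent.

    Positive part: log y = beta[0] + beta[1] * log x + u_i + e, e ~ N(0, sigma2_e),
                   so E(y | y > 0) = exp(eta + sigma2_e / 2) (lognormal mean).
    Zero part:     logit Pr(y > 0) = alpha[0] + alpha[1] * log x + v_i.
    """
    log_x = np.asarray(log_x, dtype=float)
    eta = beta[0] + beta[1] * log_x + u_i
    mean_positive = np.exp(eta + 0.5 * sigma2_e)
    p_positive = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * log_x + v_i)))
    return p_positive * mean_positive


def area_mean_prediction(y_sample, log_x_nonsample, **params):
    """Observed sample values plus model expectations for the non-sampled units."""
    y_sample = np.asarray(y_sample, dtype=float)
    y_hat = expected_unit_values(log_x_nonsample, **params)
    return float((y_sample.sum() + y_hat.sum()) / (y_sample.size + y_hat.size))
```

Combining observed sample values with predictions for the non-sampled units is a common structure for model-based area mean predictors; whether and how the paper's MxEBP (20) differs from it is not settled by this sketch.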

Table 4. Region-specific sample sizes and population sizes. Columns: Region; population size ($N_i$); sample size ($n_i$); sample size for $y > 0$; sample size for $y = 0$; proportion of zeros. The final row gives totals.

A careful examination of the sample data indicates that the marginal distributions of both BEEFCL and LAND are highly skewed, and there is clear evidence of non-linearity in their relationship (see the histograms displayed in Figure 2). When a linear model based on the logarithm of LAND and the four industry dummy variables referred to earlier was fitted to the logarithm of BEEFCL, the fit improved ($R^2 = 0.41$). The usual linear model assumptions of normality, homoscedasticity, etc., were also satisfied. As a consequence it was decided that a log scale linear model was appropriate for positive values of BEEFCL, with the covariates for the fixed part of the model defined by the logarithm of LAND and these four industry dummy variables. Given that the residuals from this model also displayed significant between-region variability, a region random effect was included in the model, i.e. we fitted model (1). This improved the $R^2$ value to just under 50%, with all model coefficients highly significant. Furthermore, when we fitted the mixed logistic model (10) to the binary indicator for BEEFCL > 0 in these data, using the same covariates as in (1), the dummy variables corresponding to two of these industries and the logarithm of LAND were significant, with some evidence of overdispersion (estimated variance component …, with a standard deviation of …). Finally, we carried out a crude check of whether the random effects in (1) and (10) might be
correlated by fitting a logistic model to the same binary indicator for BEEFCL > 0, but this time using only the EBLUPs from (1) as the model covariates. The fit of this diagnostic model was significant, with a generalized $R^2$ of 14%, indicating potential correlation between the random effects in (1) and the random effect in (10). However, in our simulations we ignored this and proceeded on the basis of a working model defined by a zero correlation between these two sources of variability.

Figure 1. Distribution of regional sample sizes (left) and regional proportions of zero observations (right).

Figure 2. Histogram of BEEFCL (> 0) on the raw scale (left) and on the log scale (right).

We then used these AAGIS sample data to generate a synthetic population of $N = 39{,}569$ farms by re-sampling the original AAGIS sample of $n = 760$ farms with probability proportional to a farm's sample weight. Once created, this fixed population
was repeatedly sampled using stratified random sampling, with regions corresponding to strata and with stratum sample sizes the same as in the original sample. Table 5 shows the region-specific values, together with their average and median over the 12 regions, of the percentage relative bias and percentage relative root mean squared error of the different small area estimation methods, based on $K = 1000$ independent stratified samples taken from this synthetic population.

Table 5. Region-specific values of the percentage relative biases (RB) and percentage relative root mean squared errors (RRMSE) for different small area predictors. Columns: Region; RB for MxEBP, MxEP, MxMBDE and LnEBLUP; RRMSE for MxEBP, MxEP, MxMBDE and LnEBLUP. The final rows give the average and the median over regions.

From the results set out in Table 5 we see that the MxEBP predictor has generally smaller average bias and smaller average RRMSE than the other three predictors considered here, while the synthetic-type predictor MxEP performs poorly, recording the worst RB values in 7 of the 12 regions. This is not unexpected, since the log scale linear mixed model underpinning MxEP almost certainly does not hold exactly in the synthetic AAGIS population. Furthermore, since MxEP does not explicitly allow for heterogeneity between regions, it is sensitive to bias induced by region-to-region variability in the relationship between BEEFCL and LAND. On the other hand, even though LnEBLUP is based on a clearly inappropriate model for BEEFCL, its performance as a predictor is reasonable in most cases, reflecting the fact that it includes
a between-area adjustment (albeit on the raw scale rather than on the log scale). We also see that although the mixture model based direct estimator MxMBDE has better RB values than MxEP, its RRMSE tends to be large, reflecting the fact that it is a direct estimator. The large relative bias and relative RMSE of MxMBDE and LnEBLUP in region 8 are noteworthy. In this region the proportion of zero values is small, and the positive BEEFCL values are highly skewed with many outliers. Here LnEBLUP performs badly because its assumed linear model is a poor fit to these skewed data, while MxMBDE fails because, as a direct estimator, it is sensitive to the presence of outliers. Overall, it is clear from the results in Table 5 that the mixture model based predictor MxEBP performed better in our design-based simulations than its competitors, both in terms of relative bias and relative root mean squared error.

We now consider the design-based performance of the parametric bootstrap procedure used to estimate the MSE of MxEBP in these simulations. Here, for each sample from the fixed synthetic population, the bootstrap MSE estimate was based on $B = 100$ bootstrap samples. The average RMSE values generated by these region-specific bootstrap MSE estimates for MxEBP are shown in Figure 3, as is the corresponding average of the true design-based RMSE for this predictor. We see that the true design-based RMSE for region 8 is very high, while the corresponding bootstrap-based RMSE estimate tends to be low. As noted earlier, this region has highly skewed data, with extreme values persisting even after a logarithmic transformation. This generated large values for the true RMSE of MxEBP. This behaviour was not replicated by the parametric bootstrap, as its bootstrap population data were generated under a distributional assumption that did not allow for such outliers. This raises questions about outlier-robust MSE estimation that are, however, beyond the scope of this paper.
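The general shape of a parametric bootstrap MSE estimate can be sketched as follows: treat the fitted two-part model as the truth, generate $B$ bootstrap populations from it, redraw a sample with the original area sample sizes, recompute the predictor, and average the squared errors against each bootstrap population's area means. This Python sketch is not the paper's estimator (22): it assumes the fitted parameters are given, keeps the population covariates fixed, and takes the predictor as a user-supplied function; all names (`simulate_two_part`, `bootstrap_mse`, `direct_mean`, the keys of `params`) are hypothetical. It also shows why region 8 is hard for this method, since every bootstrap population is lognormal in its positive part by construction and cannot contain the outliers seen in the data.

```python
import numpy as np


def simulate_two_part(log_x, area, beta, alpha, sigma2_u, sigma2_v, sigma2_e, rng):
    """Draw one bootstrap population from a two-part model (hypothetical parameters).

    Positive part: log y = beta[0] + beta[1] * log x + u_i + e_ij (lognormal)
    Zero part:     logit Pr(y > 0) = alpha[0] + alpha[1] * log x + v_i
    The area effects u_i and v_i are drawn independently of each other.
    """
    D = area.max() + 1
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=D)
    v = rng.normal(0.0, np.sqrt(sigma2_v), size=D)
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=log_x.size)
    y_pos = np.exp(beta[0] + beta[1] * log_x + u[area] + e)
    p_pos = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * log_x + v[area])))
    is_positive = rng.uniform(size=log_x.size) < p_pos
    return np.where(is_positive, y_pos, 0.0)


def area_means(values, area, D):
    """Mean of values within each area 0..D-1."""
    return np.bincount(area, weights=values, minlength=D) / np.bincount(area, minlength=D)


def bootstrap_mse(log_x, area, n_per_area, predictor, params, B=100, seed=None):
    """Parametric bootstrap MSE of an area-mean predictor, one value per area.

    predictor(y_s, s, log_x, area) must return D predicted area means from the
    sampled values y_s at population indices s (refitting any models itself).
    """
    rng = np.random.default_rng(seed)
    D = area.max() + 1
    sq_err = np.zeros(D)
    for _ in range(B):
        y_star = simulate_two_part(log_x, area, rng=rng, **params)
        target = area_means(y_star, area, D)  # bootstrap "true" area means
        s = np.concatenate([rng.choice(np.flatnonzero(area == d), size=n_per_area[d],
                                       replace=False) for d in range(D)])
        sq_err += (predictor(y_star[s], s, log_x, area) - target) ** 2
    return sq_err / B


def direct_mean(y_s, s, log_x, area):
    """Stand-in predictor for illustration: the area sample mean (a direct estimator)."""
    return area_means(y_s, area[s], area.max() + 1)
```

Replacing `direct_mean` with a function that refits the two-part model to each bootstrap sample and returns its predictions would give a bootstrap MSE of the kind discussed here for MxEBP; the stand-in is used only to keep the sketch self-contained.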
Generally, we see that in the remaining regions, where the log scale linear model assumptions for BEEFCL are more appropriate, the bootstrap MSE estimator tracks the actual MSE of MxEBP reasonably well, and we are led to the same conclusions about this MSE estimator as in the model-based simulation study presented in Section 5.1.

Figure 3. Region-specific values of true design-based RMSE (solid line) and average estimated RMSE (dashed line) for MxEBP obtained in the design-based simulations using the AAGIS data.

6. Conclusions

In this paper we explore small area estimation for semicontinuous variables, where the data are skewed and contain a substantial proportion of zeros. Our approach assumes a mixture, or two-part random effects, model, and we propose an empirical best predictor of small area means for this case. We also propose a parametric bootstrap estimator for its MSE. Empirical results reported in the paper support the conclusion that the proposed mixture model based empirical best predictor (MxEBP) is less biased and can be more efficient than both the corresponding synthetic-type predictor (MxEP) and the model-based direct-type estimator (MxMBDE) based on the 'fitted values' defined by the assumed mixture model. These results also suggest that ignoring the skewed and semicontinuous nature of the data and using a standard linear mixed model based EBLUP (LnEBLUP) can lead to biased and unstable estimates. We note that, provided the mixture model assumptions are reasonable for the small area data, the proposed parametric bootstrap procedure seems to work well. An application to real agricultural survey data provides some empirical support for these observations. It should be noted that we assume a log scale linear mixed model for non-zero skewed data. Although the log transformation is widely used in practice for such data, it is not
the only appropriate transformation to linearity, and other transformations (e.g. square root) can be explored in this context. We also assume that zero inflation in the data can be adequately modelled via a mixture of two independent components, a Bernoulli variable and a lognormal variable. As noted earlier, this is not appropriate if in fact the zero values are essentially due to truncation, and indeed in the AAGIS data that we used in our design-based simulations there is some evidence that the random area effect in the linear mixed model (1) and the random area effect in the logistic mixed model (10) are correlated. Furthermore, other models for zero-inflated skewed data, e.g. those based on a generalized linear mixed model with underlying Gamma or Poisson distributions, are also possible. We are currently working on these issues.

Acknowledgment

The authors would like to acknowledge the valuable comments and suggestions of the Editor, Associate Editor and two anonymous referees. These led to a considerable improvement in the paper.

References

Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88.

Berg, E. and Chandra, H. (2012). Small area prediction for a unit level lognormal model. Proceedings of the 2012 Federal Committee on Statistical Methodology Research Conference, Washington, DC, USA, January 10-12.

Chandra, H. and Sud, U. C. (2012). Small area estimation for zero-inflated data. Communications in Statistics - Simulation and Computation, 41 (5).

Chambers, R., Chandra, H. and Tzavidis, N. (2011). On bias-robust mean squared error estimation for linear predictors for domains. Survey Methodology, 37 (2).

Chandra, H. and Chambers, R. (2011a). Small area estimation under transformation to linearity. Survey Methodology, 37 (1).

Chandra, H. and Chambers, R. (2011b). Small area estimation for skewed data in presence of zeros. The Bulletin of Calcutta Statistical Association, 63.

Chandra, H. and Chambers, R. (2009). Multipurpose weighting for small area estimation. Journal of Official Statistics, 25 (3).

Fletcher, D., MacKenzie, D. and Villouta, E. (2005). Modelling skewed data with many zeros: a simple approach combining ordinary and logistic regression. Journal of Environmental and Ecological Statistics, 12 (1).

Hall, P. and Maiti, T. (2006). On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society, Series B, 68.

Jiang, J., Lahiri, P. and Wan, S. (2002). A unified jackknife theory for empirical best prediction with M-estimation. Annals of Statistics, 30 (6).

Karlberg, F. (2000). Population total prediction under a lognormal superpopulation model. Metron, LVIII.

Manteiga, G. W., Lombardìa, M. J., Molina, I., Morales, D. and Santamarìa, L. (2008). Bootstrap mean squared error of a small-area EBLUP. Journal of Statistical Computation and Simulation, 78 (5).

Manteiga, G. W., Lombardìa, M. J., Molina, I., Morales, D. and Santamarìa, L. (2007). Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Computational Statistics and Data Analysis, 51 (5).

Olsen, M. K. and Schafer, J. L. (2001). A two-part random-effects model for semicontinuous longitudinal data. Journal of the American Statistical Association, 96 (454).

Pfeffermann, D., Terryn, B. and Moura, F. A. S. (2008). Small area estimation under a two-part random effects model with application to estimation of literacy in developing countries. Survey Methodology, 34 (2).

Prasad, N. G. N. and Rao, J. N. K. (1990). The estimation of the mean squared error of small area estimators. Journal of the American Statistical Association, 85.

Rao, J. N. K. (2003). Small Area Estimation. Wiley, New York.

Saei, A. and Chambers, R. (2003). Small area estimation under linear and generalized linear mixed models with time and area effects. Methodology Working Paper No. M03/15. University of Southampton, UK. (available from …)


More information

Chapter 3 Describing Data Using Numerical Measures

Chapter 3 Describing Data Using Numerical Measures Chapter 3 Student Lecture Notes 3-1 Chapter 3 Descrbng Data Usng Numercal Measures Fall 2006 Fundamentals of Busness Statstcs 1 Chapter Goals To establsh the usefulness of summary measures of data. The

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

Model Based Direct Estimation of Small Area Distributions

Model Based Direct Estimation of Small Area Distributions Unversty of Wollongong Research Onlne Centre for Statstcal & Survey Methodology Workng Paper Seres Faculty of Engneerng and Informaton Scences 2010 Model Based Drect Estmaton of Small Area Dstrbutons Ncola

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 8. SIMPLE LINEAR REGRESSION III Ftted Values and Resduals US Domestc Beers: Calores vs. % Alcohol To each observed x, there corresponds a y-value on the ftted lne, y ˆ = βˆ + βˆ x. The are called ftted

More information

A note on regression estimation with unknown population size

A note on regression estimation with unknown population size Statstcs Publcatons Statstcs 6-016 A note on regresson estmaton wth unknown populaton sze Mchael A. Hdroglou Statstcs Canada Jae Kwang Km Iowa State Unversty jkm@astate.edu Chrstan Olver Nambeu Statstcs

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 13 The Smple Lnear Regresson Model and Correlaton 1999 Prentce-Hall, Inc. Chap. 13-1 Chapter Topcs Types of Regresson Models Determnng the Smple Lnear

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications Durban Watson for Testng the Lack-of-Ft of Polynomal Regresson Models wthout Replcatons Ruba A. Alyaf, Maha A. Omar, Abdullah A. Al-Shha ralyaf@ksu.edu.sa, maomar@ksu.edu.sa, aalshha@ksu.edu.sa Department

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Andreas C. Drichoutis Agriculural University of Athens. Abstract

Andreas C. Drichoutis Agriculural University of Athens. Abstract Heteroskedastcty, the sngle crossng property and ordered response models Andreas C. Drchouts Agrculural Unversty of Athens Panagots Lazards Agrculural Unversty of Athens Rodolfo M. Nayga, Jr. Texas AMUnversty

More information

Testing for seasonal unit roots in heterogeneous panels

Testing for seasonal unit roots in heterogeneous panels Testng for seasonal unt roots n heterogeneous panels Jesus Otero * Facultad de Economía Unversdad del Rosaro, Colomba Jeremy Smth Department of Economcs Unversty of arwck Monca Gulett Aston Busness School

More information

USE OF DOUBLE SAMPLING SCHEME IN ESTIMATING THE MEAN OF STRATIFIED POPULATION UNDER NON-RESPONSE

USE OF DOUBLE SAMPLING SCHEME IN ESTIMATING THE MEAN OF STRATIFIED POPULATION UNDER NON-RESPONSE STATISTICA, anno LXXV, n. 4, 015 USE OF DOUBLE SAMPLING SCHEME IN ESTIMATING THE MEAN OF STRATIFIED POPULATION UNDER NON-RESPONSE Manoj K. Chaudhary 1 Department of Statstcs, Banaras Hndu Unversty, Varanas,

More information

Chapter 14: Logit and Probit Models for Categorical Response Variables

Chapter 14: Logit and Probit Models for Categorical Response Variables Chapter 4: Logt and Probt Models for Categorcal Response Varables Sect 4. Models for Dchotomous Data We wll dscuss only ths secton of Chap 4, whch s manly about Logstc Regresson, a specal case of the famly

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

LOGIT ANALYSIS. A.K. VASISHT Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi

LOGIT ANALYSIS. A.K. VASISHT Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi LOGIT ANALYSIS A.K. VASISHT Indan Agrcultural Statstcs Research Insttute, Lbrary Avenue, New Delh-0 02 amtvassht@asr.res.n. Introducton In dummy regresson varable models, t s assumed mplctly that the dependent

More information

4.3 Poisson Regression

4.3 Poisson Regression of teratvely reweghted least squares regressons (the IRLS algorthm). We do wthout gvng further detals, but nstead focus on the practcal applcaton. > glm(survval~log(weght)+age, famly="bnomal", data=baby)

More information

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

An R implementation of bootstrap procedures for mixed models

An R implementation of bootstrap procedures for mixed models The R User Conference 2009 July 8-10, Agrocampus-Ouest, Rennes, France An R mplementaton of bootstrap procedures for mxed models José A. Sánchez-Espgares Unverstat Poltècnca de Catalunya Jord Ocaña Unverstat

More information

18. SIMPLE LINEAR REGRESSION III

18. SIMPLE LINEAR REGRESSION III 8. SIMPLE LINEAR REGRESSION III US Domestc Beers: Calores vs. % Alcohol Ftted Values and Resduals To each observed x, there corresponds a y-value on the ftted lne, y ˆ ˆ = α + x. The are called ftted values.

More information

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10)

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10) I. Defnton and Problems Econ7 Appled Econometrcs Topc 9: Heteroskedastcty (Studenmund, Chapter ) We now relax another classcal assumpton. Ths s a problem that arses often wth cross sectons of ndvduals,

More information

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition)

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition) Count Data Models See Book Chapter 11 2 nd Edton (Chapter 10 1 st Edton) Count data consst of non-negatve nteger values Examples: number of drver route changes per week, the number of trp departure changes

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

Multivariate Ratio Estimator of the Population Total under Stratified Random Sampling

Multivariate Ratio Estimator of the Population Total under Stratified Random Sampling Open Journal of Statstcs, 0,, 300-304 ttp://dx.do.org/0.436/ojs.0.3036 Publsed Onlne July 0 (ttp://www.scrp.org/journal/ojs) Multvarate Rato Estmator of te Populaton Total under Stratfed Random Samplng

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

β0 + β1xi. You are interested in estimating the unknown parameters β

β0 + β1xi. You are interested in estimating the unknown parameters β Ordnary Least Squares (OLS): Smple Lnear Regresson (SLR) Analytcs The SLR Setup Sample Statstcs Ordnary Least Squares (OLS): FOCs and SOCs Back to OLS and Sample Statstcs Predctons (and Resduals) wth OLS

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

STAT 405 BIOSTATISTICS (Fall 2016) Handout 15 Introduction to Logistic Regression

STAT 405 BIOSTATISTICS (Fall 2016) Handout 15 Introduction to Logistic Regression STAT 45 BIOSTATISTICS (Fall 26) Handout 5 Introducton to Logstc Regresson Ths handout covers materal found n Secton 3.7 of your text. You may also want to revew regresson technques n Chapter. In ths handout,

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

Small Area Interval Estimation

Small Area Interval Estimation .. Small Area Interval Estmaton Partha Lahr Jont Program n Survey Methodology Unversty of Maryland, College Park (Based on jont work wth Masayo Yoshmor, Former JPSM Vstng PhD Student and Research Fellow

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 13 13-1 Basc Busness Statstcs 11 th Edton Chapter 13 Smple Lnear Regresson Basc Busness Statstcs, 11e 009 Prentce-Hall, Inc. Chap 13-1 Learnng Objectves In ths chapter, you learn: How to use regresson

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

STK4080/9080 Survival and event history analysis

STK4080/9080 Survival and event history analysis SK48/98 Survval and event hstory analyss Lecture 7: Regresson modellng Relatve rsk regresson Regresson models Assume that we have a sample of n ndvduals, and let N (t) count the observed occurrences of

More information

Assignment 5. Simulation for Logistics. Monti, N.E. Yunita, T.

Assignment 5. Simulation for Logistics. Monti, N.E. Yunita, T. Assgnment 5 Smulaton for Logstcs Mont, N.E. Yunta, T. November 26, 2007 1. Smulaton Desgn The frst objectve of ths assgnment s to derve a 90% two-sded Confdence Interval (CI) for the average watng tme

More information

BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data

BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data Lab : TWO-LEVEL NORMAL MODELS wth school chldren popularty data Purpose: Introduce basc two-level models for normally dstrbuted responses usng STATA. In partcular, we dscuss Random ntercept models wthout

More information

RELIABILITY ASSESSMENT

RELIABILITY ASSESSMENT CHAPTER Rsk Analyss n Engneerng and Economcs RELIABILITY ASSESSMENT A. J. Clark School of Engneerng Department of Cvl and Envronmental Engneerng 4a CHAPMAN HALL/CRC Rsk Analyss for Engneerng Department

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data

More information

Semiparametric geographically weighted generalised linear modelling in GWR 4.0

Semiparametric geographically weighted generalised linear modelling in GWR 4.0 Semparametrc geographcally weghted generalsed lnear modellng n GWR 4.0 T. Nakaya 1, A. S. Fotherngham 2, M. Charlton 2, C. Brunsdon 3 1 Department of Geography, Rtsumekan Unversty, 56-1 Tojn-kta-mach,

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 31 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 6. Rdge regresson The OLSE s the best lnear unbased

More information

Credit Card Pricing and Impact of Adverse Selection

Credit Card Pricing and Impact of Adverse Selection Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n

More information

e i is a random error

e i is a random error Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown

More information

Chapter 15 - Multiple Regression

Chapter 15 - Multiple Regression Chapter - Multple Regresson Chapter - Multple Regresson Multple Regresson Model The equaton that descrbes how the dependent varable y s related to the ndependent varables x, x,... x p and an error term

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unt 10: Smple Lnear Regresson and Correlaton Statstcs 571: Statstcal Methods Ramón V. León 6/28/2004 Unt 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regresson analyss s a method for studyng the

More information

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes 25/6 Canddates Only January Examnatons 26 Student Number: Desk Number:...... DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR Department Module Code Module Ttle Exam Duraton

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information