SOME NEW MODELS FOR SMALL AREA ESTIMATION. Hao Ren
SOME NEW MODELS FOR SMALL AREA ESTIMATION

By

Hao Ren

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Statistics

2011
ABSTRACT

SOME NEW MODELS FOR SMALL AREA ESTIMATION

By Hao Ren

This dissertation includes some new models for small area estimation. There are four parts in total. The first part studied the selection of fixed effects covariates in linear mixed models. A modified bootstrap selection procedure for the linear model from the literature was extended to linear mixed effects models. Both theoretical work and simulations showed the effectiveness of this procedure for linear mixed effects models.

In the second part, a new approach that shrinks both the means and variances of small areas was introduced. This method modeled the small area means and variances in a unified framework. The smoothed variance estimators use information from the direct point estimators and their sampling variances, and this information consequently carries over to the smoothed small area estimators. Conditional mean squared error of prediction was also studied in this part to evaluate the performance of the predictors.

The third part studied confidence intervals for the small area estimators introduced in the second part. The literature of small area estimation is dominated by point estimation and the associated standard errors. The standard normal or Student-t confidence intervals do not produce accurate intervals. The confidence intervals produced in this part are derived from a decision theory perspective.

The fourth part estimated the small area means with clustering of the small areas. In realistic applications, it may not be appropriate to borrow strength from all other small areas universally if cluster effects exist between clusters of small areas. A model based on clustering was studied in this part, which includes an additional cluster effect in the basic area level model. Since the partition of clusters was not known, a stochastic search procedure from the literature was first adapted to find the clustering partition.
ACKNOWLEDGMENT

I would like to express my sincere appreciation to my major dissertation advisor, Professor Tapabrata Maiti, for his invaluable guidance, support, and encouragement throughout my dissertation study. This dissertation could not have been completed without his guidance and support. I would also like to thank Professors Sarat Dass, Lifeng Wang, and William Schmidt for serving on my guidance committee. My special thanks go to Professor Sarat Dass for his kind help and great advice on my dissertation. I also want to thank Professors Lifeng Wang and William Schmidt for their valuable help and support. I also want to thank Adele Brandstrom and Jessalyn Smith for their kind help with style editing. I would like to thank my wife Qi Dao for her support, encouragement and love. Finally, I would like to thank my beloved parents and sister for their constant support and love. Thank you!
TABLE OF CONTENTS

List of Tables
List of Figures

1 Introduction
  1.1 Small Area Estimation
  1.2 Classical Approach for Sample Survey
  1.3 Model Based Estimation
      Area Level Model
      Unit Level Model
  1.4 Mixed Model
  1.5 Study Topics

2 Bootstrap Model Selection for Linear Mixed Effects Models: Application to Small Area Estimation
  2.1 Introduction
  2.2 Linear Model
  2.3 Linear Mixed Model
      Fay-Herriot Model
      Nested-Error Regression Model
  The Theoretical Framework
  Simulation Study
      Algorithms and Settings for Different Models
      Results
  Prediction Error of Small Area Predictors

3 Shrinking both Mean and Variances
  Introduction
  Model and Estimation Method
      Model with Assumption
      Estimation of the Small Area Parameters
      Estimation of the Structural Parameters
  Prediction Error Calculation
      Mean Squared Error of Prediction
      Bias Correction for ν(B̂; X, S²)
      Approximation of c(B; X, S²)
  Simulation Study
  3.5 Appendix: Matrix Calculation Results
      Computation of B̂ − B
      Second Order Correction of ν

4 Confidence Interval Estimation of Small Area Parameters Shrinking both Mean and Variances
  Introduction
  Proposed Model
  Confidence Interval Definition
  Choice of the Tuning Parameter
      Theoretical Justification of Tuning Parameter
  Alternative Model for Confidence Interval
  Simulation Study
  A Real Data Analysis

5 Clustering Based Small Area Estimation: An Application to MEAP Data
  Introduction
  Data Set
  Proposed Model
      Prior Information
      Model-Based Objective Functions
  Estimation of Mean Scores
      MSE of Estimator
  Stochastic Search Based Random Walk
      Split-Merge Moves
  Data Analysis
  Discussion

Bibliography
LIST OF TABLES

2.1 Selection probabilities for the linear model
2.2 The coverage probabilities and lengths of confidence intervals for the linear model
2.3 Selection probabilities for the Fay-Herriot model, loss function without random effect
2.4 The coverage probabilities and lengths of confidence intervals for the Fay-Herriot model, loss function without random effect
2.5 Selection probabilities for the Fay-Herriot model, loss function with random effect
2.6 The coverage probabilities and lengths of confidence intervals for the Fay-Herriot model, loss function with random effect
2.7 Selection probabilities for the Nested-Error regression model with group level
2.8 The coverage probabilities and lengths of confidence intervals for the Nested-Error regression model with group level
2.9 Selection probabilities for the Nested-Error regression model with element level
2.10 The coverage probabilities and lengths of confidence intervals for the Nested-Error regression model with element level
2.11 Comparison of loss functions without and with random effect for the Nested-Error model, data size n = …
2.12 The coverage probabilities and lengths of confidence intervals for the Nested-Error regression model with different loss functions, data size n = …
3.1 Results of the simulation study; here we present the estimate (Est.) and empirical standard deviation (SD) for β and τ². We set β = …
3.2 Results of the MSE estimator, n = 36, m = …
3.3 Results of the MSE estimator, n = 180, m = …
4.1 Estimation of model parameters. The left panel is for β and the right panel is for τ²
4.2 Simulation results for prediction when τ² = …
4.3 Simulation results for prediction when τ² = …
4.4 Simulation results for prediction when τ² = …
4.5 Crop data from You and Chapman (2006)
4.6 Estimation results for corn
5.1 Number of public school 4th graders in some school districts in Michigan
LIST OF FIGURES

4.1 Corn hectares estimation. The vertical line for each county displays the confidence interval of θ̂_i, with θ̂_i marked by the circle, for (I) the proposed method, (II) Wang and Fuller (2003), (III) Hwang et al. (2009) and (IV) Qiu and Hwang (2007)
4.2 Boxplot of estimates of corn hectares for each county. (I) to (IV) are the 4 methods corresponding to Figure 4.1
5.1 Standard deviation of public school students' assessment scores in available school districts in Michigan
5.2 Poverty level of available school districts in Michigan
5.3 Math scores of 4th graders in public schools in available school districts in Michigan
5.4 Trace plot of different m values
5.5 Map of clusters
5.6 Mean scores vs. poverty levels for school districts in the 7 clusters and all the districts
5.7 Coefficient of variation for the model-based estimates and observed mean scores
5.8 MSE of estimators for models with clustering (5.2) or no clustering (5.1)
Chapter 1

Introduction

1.1 Small Area Estimation

The term "small area" is commonly used to denote a small geographical area, such as a district. It can also denote a small demographic group, such as a group with a certain socio-economic status or a sex/race/ethnicity group. Usually, a small area is defined when the domain-specific sample is not large enough to support direct estimates with an adequate level of statistical precision.

The history of small area statistics is very long and can be traced back at least to eleventh-century England and seventeenth-century Canada. However, those statistics were based on either census or administrative records targeting a complete enumeration (Brackstone, 1987), and sampling was usually not involved in those studies.

In recent years, sample surveys have become more and more popular because they are cost-effective and can help address the difficulty that arises when the population is dynamic and the individuals making up the population are constantly moving. Some basic sample survey methods are simple random sampling and systematic or stratified sampling. From the collected sample, direct measures, such as the mean and standard error, can be calculated as estimates for the population or for each domain. However, if a small area exists, that is, if the sample for some domain is not large enough, estimation based only on direct survey estimators and only on data from the sampled area may produce unacceptably large standard errors. Thus, it is important to study small area statistics in order to develop more reliable measures.

Small area estimation has gained more attention in recent years because of greatly increasing demand from both the public and private sectors. Public sectors in countries like the U.S. and Canada have a growing demand for small area statistics for the purpose of formulating policies and programs, in cases like government funding allocation and local regional planning. In order to identify areas that are in need of funding, such as certain school districts or some subpopulation, reliable estimates for the small areas are required. The increasing demand for small area statistics also comes from private sectors, like small businesses. Often, business decisions rely on the local socio-economic environment and other regional conditions. Therefore, estimates with adequate precision are needed for small areas. Since direct estimates from sample surveys sometimes cannot meet such requirements for small areas, research on small area statistics has become more and more important. What is more, when central and eastern European countries and the former Soviet Union countries moved away from a centralized decision making system, they also demanded survey results not only for large areas but also for small areas. Again, this created an increasing demand for small area statistics.
With high-powered computers, the processing of large and complex data has become feasible, and this helps with the development of small area statistics. Powerful statistical methods with a sound theoretical foundation have been developed for the analysis of small area data. Such methods borrow strength from related or similar small areas through implicit or explicit models which provide a link between related small areas. These types of "borrowing strength" models will be the main focus of this dissertation. Review papers on small area estimation include Ghosh and Rao (1994), Rao (1999), Marker (1999), Rao (2001) and Pfeffermann (2002), among others.

1.2 Classical Approach for Sample Survey

Direct Estimators

The variables of interest in a sample survey are usually the total measures or the mean of the area or domain. Direct estimators are commonly used to estimate such variables in a domain, using the sample data only from that domain. The typical direct estimators are design-based. A more extensive reference on direct estimation in sampling theory is Lohr (1999). Direct estimators will sometimes suffice, for example in domains with sufficiently large sample sizes and particularly after addressing survey design issues. But it is well known that direct estimators for small areas are usually unreliable, with unacceptably large standard errors due to the unduly small sample sizes of small areas.

Model-based methods for direct estimators have also been developed. They provide valid conditional inferences about the particular sample drawn, regardless of the sampling design. However, model-based methods depend heavily on the correct specification of the model. The methods can perform poorly under misspecification even if the sample size is large.
Demographic Methods

The most powerful demographic method is the census. Censuses are usually conducted at 10-year or 5-year intervals to provide population counts for specific geographical areas or subpopulations defined by age, sex, marital status and other demographic variables. But the information from a census becomes outdated due to changes in the size and composition of the resident population over time. Therefore, various demographic methods, other than the census, have been developed to provide population estimates in the noncensal years.

The changes of demographic variables are strongly related to changes in the local population. Administrative registers contain current data on the local population for various demographic variables. Such variables are called symptomatic indicators, for example the numbers of births and deaths and the net emigration during the period since the last census. Traditional demographic methods employ indirect estimators based on implicit linking models, which relate the population estimates and the symptomatic variables. These methods may be categorized as either symptomatic accounting techniques or regression symptomatic procedures. Symptomatic accounting techniques provide indirect estimators under some implicit linking models with the symptomatic variables. Regression symptomatic procedures use multiple linear regression to estimate local area populations, with the symptomatic variables used as independent variables. For a detailed description of these methods, see Rao (2003). Typically, demographic indirect estimators use only administrative and census data, and sampling is not involved in these methods.

Indirect Estimators

As introduced previously, the unacceptably large standard errors of direct estimators are due to the unduly small sample sizes of the small areas. Therefore, it is necessary to find indirect estimators that increase the effective sample size and thus decrease the standard error. Traditional indirect estimation methods are based on implicit models that provide a link to related small areas through supplementary data. Such estimators include synthetic estimators, composite estimators, and James-Stein (or shrinkage) estimators.

If a large area covers several small areas and the small areas are assumed to have the same characteristics as the large area, a reliable direct estimator for the large area can be used to derive an indirect estimator for a small area. Such an estimator is called a synthetic estimator (Gonzalez, 1973). Global measures (averaged over small areas) are often used with synthetic estimates. If an estimator is a weighted average of a synthetic estimator and a direct estimator, it is called a composite estimator. In fact, any estimator that has this composite form, whether design-based or model-based, can be called a composite estimator. For a composite estimator, a suitable weight needs to be chosen. The common weight approach uses a common weight in the composite estimators for all small areas, and the total MSE is then minimized with respect to that common weight. The James-Stein estimator (James and Stein, 1961) is similar to the common weight estimator and has attracted a lot of attention in the mainstream statistics literature. It achieves large gains in efficiency in the traditional design-based framework without assuming a model on the small area parameters. A detailed introduction to these estimators can be found in Rao (2003).

1.3 Model Based Estimation

The traditional indirect estimators were briefly introduced in the previous section. Reduction in MSE is the main reason for using indirect estimators. Indirect estimators are largely
based on sample survey data in conjunction with auxiliary population data. However, the traditional indirect estimators only provide an implicit link between the small areas. The model based estimators, which provide an explicit link between the small areas, are introduced in this section.

Model based estimators are indirect estimators based on small area models. Small area models take the random area-specific effects into account and include additional auxiliary variables in the model to explain the effects, which makes specific allowance for between-area variation. Such models define the way the related data are incorporated into the estimation process. The use of explicit models makes model diagnostics possible, which can be used to find suitable models that fit the data well. For example, a selection of auxiliary variables is conducted later in this dissertation. Area-specific measures of precision can also be associated with each small area estimate. For complex data structures, more complex model structures can be adopted, such as mixed models and generalized linear models, and the existing methodologies for these models can be utilized directly to achieve accurate small area estimation.

In general, small area models can be classified into two broad types: (i) area level models, which model the small area direct estimators with area-specific covariates (when unit level data are not available, such area level models are necessary); and (ii) unit level models, which model the unit direct estimators with unit-specific covariates.

Area Level Model

Let Ȳ_i be the small area means and g(·) be a specified function that links the model based estimator θ_i and Ȳ_i, so that θ_i = g(Ȳ_i). The use of g(·) makes the model more robust, but in our study we choose g(·) as the identity function, i.e. Ȳ_i = θ_i. The area level model relates θ_i and the area-specific auxiliary data z_i = (z_{i1}, ..., z_{ip})^T in the following way:

θ_i = z_i^T β + b_i v_i,  i = 1, ..., m  (1.1)

where β = (β_1, ..., β_p)^T is the p × 1 vector of regression parameters, the b_i's are known positive constants, and the v_i's are area-specific random effects that are iid with E(v_i) = 0 and Var(v_i) = σ_v^2. In this dissertation, a normal distribution is always adopted for v_i; however, it is possible to choose other distributions, which makes the model robust. Here m is the total number of areas.

For making inferences about the θ_i's under model (1.1), we assume that direct estimators θ̂_i are available and

θ̂_i = θ_i + e_i,  i = 1, ..., m  (1.2)

where the e_i's are sampling errors with E(e_i) = 0 and Var(e_i) = σ_{ei}^2. That is, the estimators θ̂_i are design-unbiased. We always assume that the sampling variances σ_{ei}^2 are known.

The area level models have various extensions. A famous example is the Fay-Herriot model, developed by Fay and Herriot in 1979 to estimate per-capita income for small areas (population less than 1,000). The Fay-Herriot model is adapted in this dissertation and the details will be introduced later.

Unit Level Model

When unit-specific auxiliary data for each population element in each small area are available, the unit level model is adopted. Let y_{ij} be the variable of interest and X_{ij} =
(x_{ij1}, ..., x_{ijp})^T be the available element-specific auxiliary data. Then a one-fold nested error linear regression model is given by

y_{ij} = X_{ij}^T β + ν_i + e_{ij},  j = 1, ..., n_i;  i = 1, ..., m.  (1.3)

Here the ν_i are the area-specific effects and are assumed to be iid random variables with E(ν_i) = 0 and Var(ν_i) = σ_ν^2. We take e_{ij} = k_{ij} ẽ_{ij}, where the ẽ_{ij}'s are iid random variables, independent of the ν_i's, with E(ẽ_{ij}) = 0 and Var(ẽ_{ij}) = σ_e^2; the k_{ij}'s are known constants, and n_i is the number of elements in the i-th area. In addition, normality of the ν_i's and ẽ_{ij}'s is often assumed. The parameters of inferential interest are the small area totals Y_i = Σ_{j=1}^{n_i} y_{ij} or the means Ȳ_i = Y_i / n_i.

The unit level model does not include sample selection bias; that is, the sample values are assumed to obey the model. Simple random sampling satisfies this condition, but for more complex sampling designs it may not be appropriate. For example, in stratified multistage sampling, the design features are not incorporated in the model. However, there are various extensions to account for such features.

1.4 Mixed Model

The small area models introduced in the previous section may be regarded as special cases of mixed models. Research on mixed models has gained much attention in recent years, and these models appear under many names in a wide variety of disciplines in the physical, biological and social sciences. The mixed model is particularly useful in settings where repeated measurements are made on the same statistical units (longitudinal data), or where measurements are made on clusters of related statistical units (clustered or panel data). Therefore, it is also called a model for repeated measurements, or a hierarchical model. The most important difference of the mixed model from classical statistics, where independent and identically distributed observations are a typical assumption, is that the observations are not necessarily from the same population. Mixed model data may have a more complex, multilevel, hierarchical structure. The general assumption for mixed model data is that observations from one cluster can be correlated but are independent between different clusters, and that the subpopulations of the clusters can be different. Therefore, a random effect at each cluster level is introduced into the model. The forms of a mixed model can be linear, generalized linear (such as logistic and Poisson), and nonlinear. In this dissertation, we focus on linear mixed models only.

A general linear mixed model can be written as

y = Xβ + Zv + e  (1.4)

where y is the vector of sample observations, X and Z are known matrices, and v and e are distributed independently with means 0 and covariance matrices G and R, respectively, depending on some variance components parameters. One approach to this model can be obtained by using the general theory of Henderson (1975) for a mixed linear model. If the variance components parameters are known, the best linear unbiased estimator of θ = l^T β + m^T v is given by

θ̂ = l^T β̂ + m^T G Z^T V^{-1} (y − X β̂)  (1.5)
where V = R + Z G Z^T is the variance-covariance matrix of y and β̂ = (X^T V^{-1} X)^{-1} X^T V^{-1} y is the generalized least squares estimator of β. There are many other approaches available, such as empirical best linear unbiased prediction, the empirical Bayes method, and the hierarchical Bayes method. Some of these methods will be involved in our studies later. The details of these methods can be found in the literature, such as Ghosh and Rao (1994) and Rao (2003).

1.5 Study Topics

As I introduced in the previous sections, small area estimation and the statistical techniques therein have become a topic of growing importance in recent years, and this is the reason why I chose small area estimation as the topic of my dissertation. To be more specific, the work in this dissertation includes the following aspects.

Model Selection for Linear Mixed Effects Models

The need for reliable small area estimates is felt by many agencies, both public and private, for making useful policy decisions. For example, small area statistics are being used to monitor socio-economic and health conditions for different strata defined by age, sex, and racial group over small geographical areas. It is now widely recognized that direct survey estimates for small areas are usually unreliable, being accompanied by large standard errors and coefficients of variation. This makes it necessary to use models, either explicit or implicit, to connect the small areas and obtain estimates of improved precision by borrowing strength across areas. In the previous sections, some basic models for small area estimation were presented. Before we start working on more complex approaches, it is important to note the influence of the choice of model, particularly the choice of auxiliary variables. The success of any model-based method depends on the availability of good auxiliary data. Therefore, more attention should be given to the selection of auxiliary variables that are good predictors of the study variables. The model selection study here arises from potential choices of the fixed effects covariates in linear mixed effects models.

A bootstrap model selection method is adapted in this dissertation. The procedure based on the bootstrap has some important features: first, the bootstrap method can provide more accurate inference (Adkins and Hill, 1990; Hall, 1989) at the same time as the model selection is carried out. In other words, the bootstrap based inference on regression parameters takes the model selection procedure into account. Second and most importantly, the bootstrap selection procedure does not depend on a specific probability distribution. However, a straightforward application of the bootstrap does not yield a consistent model selection procedure (Shao, 1996). A simple modification was applied to the straightforward bootstrap procedure: generate m (instead of n, with m < n, where n is the size of all available data) iid bootstrap observations from the empirical distribution which puts equal mass on each pair of available data. After this modification, the procedure is consistent if and only if m/n → 0 and m → ∞.
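The modified procedure described above can be sketched in a few lines of code. The synthetic data, the candidate subsets, and the particular choices of m and B below are assumptions made for this illustration only, not the dissertation's simulation settings; what the sketch follows from the text is the structure of the procedure: m-out-of-n resampling, refitting each candidate submodel on the resample, and averaging the loss evaluated on the full sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only the first two covariates matter
# (an assumption made for this illustration).
n, p = 200, 4
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.0, 0.0]) + rng.normal(size=n)

candidates = [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]  # nested candidate subsets
m = int(n ** 0.5)  # resample size chosen so that m/n -> 0 while m -> infinity
B = 200            # Monte Carlo bootstrap replicates

losses = {}
for alpha in candidates:
    Xa = X[:, list(alpha)]
    total = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, size=m)       # m iid draws from F-hat
        b, *_ = np.linalg.lstsq(Xa[idx], y[idx], rcond=None)  # bootstrap LSE
        total += np.mean((y - Xa @ b) ** 2)    # loss on the full sample
    losses[alpha] = total / B

best = min(losses, key=losses.get)  # subset with the smallest bootstrap loss
```

Because each refit uses only m ≪ n observations, overfitted submodels are penalized through their inflated variance, which is what restores the consistency of the selection.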
However, only the linear model is studied in the literature. In this dissertation, the modified bootstrap selection procedure is extended to linear mixed effects models, including the Fay-Herriot model and the nested-error regression model, that are commonly used in small area estimation.

Small Area Predictors by Shrinking both Mean and Variances

The basic area level model was introduced in Section 1.3. The survey based direct small area estimates and their variance estimates are the main ingredients for building area level small area models. Typical modeling strategies assume that the sampling variances are known while a suitable linear regression model is assumed for the means. For detailed developments, one can see Ghosh and Rao (1994), Pfeffermann (2002) and Rao (2003). The typical area-level models are subject to two main criticisms: (i) in practice, the sampling variances are estimated quantities that are subject to substantial errors, since they are often based on sample sizes equivalent to those from which the direct estimates themselves are calculated, and (ii) the assumption of known and fixed sampling variances does not bring the uncertainty of this estimation into the overall small area estimation strategy.

Previous attempts have been made to model only the sampling variances, by Maples et al. (2009), Gershunskaya and Lahiri (2005), Huff et al. (2002), Cho et al. (2002), Valliant (1987), and Otto and Bell (1995). Wang and Fuller (2003) and Rivest and Vandal (2003) extended the asymptotic mean squared error (MSE) estimation of Prasad and Rao (1990) to the case where the sampling variances are modeled with a few additional parameters. You and Chapman (2006) also considered sampling variance modeling; however, they adopted fully Bayesian estimation techniques.

These issues have recently been raised by practitioners. For example, Herrador et al. (2008) investigated estimates of design variances of model based and model-assisted small area estimators. The latest developments are nicely summarized in a recent article by William Bell of the United States Census Bureau (Bell, 2008). He carefully examined the effect of the above two issues in the context of MSE estimation for model based small area estimators. He also provided numerical evidence of the effect of assuming known sampling variances on the estimation of MSE in the context of the Fay-Herriot model.

The developments made so far on this issue can be loosely viewed as (i) smoothing the direct sampling error variances to obtain stable variance estimates with low bias, and (ii) (partially) accounting for sampling variance uncertainty by extending the Fay-Herriot model. Compared to the volume of research that has been devoted to modeling the means and their inferential issues, much less or no attention has been given to accounting for the sampling variances effectively while modeling the mean. Thus, there is a lack of systematic development in small area estimation which includes shrinking both means and variances. In other words, we would like to exploit the technique of borrowing strength from other small areas to improve the variance estimates, just as we do to improve the small area mean estimates.

The methodology introduced here develops a dual shrinkage estimation for both the small area means and variances in a unified framework. In this process, the smoothed variance estimators use information from the direct point estimators and their sampling variances, and this information consequently carries over to the smoothed small area estimators. The modeling perspective is closely related to Wang and Fuller (2003), Rivest and Vandal (2003), You and Chapman (2006) and Hwang et al. (2009). An EM-based estimation approach is developed. Numerical evidence is provided to show the effectiveness of the dual shrinkage estimation.

One statistic reported is the conditional mean squared error of prediction (CMSEP), which is akin to Booth and Hobert (1998). Booth and Hobert argued strongly for CMSEP as opposed to the unconditional mean squared error of prediction (UMSEP). Recently, this technique has again been emphasized by Lohr and Rao (2009) in the context of nonlinear mixed effect models. These authors favor CMSEP particularly for non-normal models, where the posterior variance of the small area parameters depends on the area specific responses. Although they were interested only in generalized linear mixed models, where the posterior variance depends on area specific responses, this property of the posterior variance is perhaps true for situations with posterior non-linearity.

Another inference adopted is confidence intervals for the small area means. The small area estimation literature is dominated by point estimation and the associated standard errors. It is well known that the standard practice of (point estimate ± q × s.e.), where q is a Z (standard normal) or a t cutoff point, does not produce accurate intervals; see Hall and Maiti (2006) and Chatterjee et al. (2008) for more details. These previous works are based on the bootstrap procedure and have limited use due to the repeated estimation of model parameters. The confidence intervals produced in this dissertation are instead derived from a decision theory perspective.

Clustering Based Small Area Estimation

For the small area models introduced above, both the area level model and the unit level model assume iid random area effects. That is, when we make inference for each small area, we borrow strength from all other small areas universally. But the actual geographical (spatial) and socio-economic statuses of small areas in a large region may be quite different from each other. For example, the demographic composition of a small area might be close
to that of the adjacent small areas, but might not be close to that of all the adjacent areas; similar patterns exist for household incomes, poverty levels and many other types of information. When we borrow strength from other small areas under such conditions, the inference may even be misleading. Therefore, it is more appropriate to divide all small areas into groups (clusters) if diversities exist between groups of small areas.

In this dissertation, a data set from the Michigan Educational Assessment Program (MEAP) is analyzed. The numbers of students in the school districts are diverse, and the standard deviations of the students' math scores also differ considerably across school districts. A detailed description of the data set is given in later chapters. This information shows that small area models are appropriate for analyzing the data set, and that comparisons of mean scores based on the model-based estimates are more meaningful than direct comparisons of the mean scores. In addition, the poverty levels of the school districts suggest that a small area model with clustering of the school districts is more appropriate.

There are many clustering methods available. However, it often happens that we have no partition information for clustering, nor even the number of clusters. A stochastic search procedure from Booth et al. (2008) is adopted to solve this problem. The partition defining the clustering is included in the model as a parameter. A cluster-specific random effect is also included in the model, which allows the cluster means to depart from the assumed base model. The objective function is constructed based on the posterior distribution of the candidate partition. The partition that maximizes the objective function is chosen as the optimal clustering partition.

In Booth et al. (2008), the variances of the objects were assumed unknown but iid within a cluster. The main difference between our model and the model in Booth et al. (2008) is that different variances are assigned to each school district and are assumed known, since they are reported in the MEAP data set. The posterior distribution based on different variances for each school district does not have a closed form and can only be calculated by numerical methods.
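The mechanics of searching over partitions can be sketched as follows. The dissertation's objective function is a posterior over partitions evaluated numerically; the penalized within-cluster sum of squares below is only a placeholder standing in for it, and the greedy accept-if-better rule, the penalty constant, and the toy area values are all assumptions of this illustration. Only the overall shape of the procedure — proposing split and merge moves and keeping the partition with the best objective — mirrors the description above.

```python
import random

def objective(partition, values, penalty=2.0):
    """Placeholder score: penalized negative within-cluster sum of squares.

    The dissertation's objective is based on the posterior distribution of
    the partition; this simple criterion only stands in for it so the
    search mechanics can be shown.
    """
    score = -penalty * len(partition)
    for cluster in partition:
        xs = [values[i] for i in cluster]
        mean = sum(xs) / len(xs)
        score -= sum((x - mean) ** 2 for x in xs)
    return score

def propose(partition, rng):
    """Random split-merge move over partitions of the index set."""
    parts = [list(c) for c in partition]
    if len(parts) > 1 and rng.random() < 0.5:
        i, j = rng.sample(range(len(parts)), 2)      # merge two clusters
        merged = parts[i] + parts[j]
        parts = [c for k, c in enumerate(parts) if k not in (i, j)] + [merged]
    else:
        k = rng.randrange(len(parts))                # split one cluster
        if len(parts[k]) > 1:
            cluster = parts.pop(k)
            rng.shuffle(cluster)
            cut = rng.randrange(1, len(cluster))
            parts += [cluster[:cut], cluster[cut:]]
    return parts

def search(values, iters=2000, seed=1):
    rng = random.Random(seed)
    current = [[i] for i in range(len(values))]      # start from singletons
    best_score = objective(current, values)
    for _ in range(iters):
        cand = propose(current, rng)
        s = objective(cand, values)
        if s > best_score:                           # keep only improvements
            current, best_score = cand, s
    return current

# Two well-separated groups of small areas (illustrative values).
areas = [0.1, 0.2, 0.0, 5.1, 5.0, 4.9]
found = search(areas)
```

A Metropolis-type acceptance rule, rather than the greedy rule used here, would turn this sketch into a proper stochastic search of a posterior over partitions.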
Chapter 2

Bootstrap Model Selection for Linear Mixed Effects Models: Application to Small Area Estimation

2.1 Introduction

Let {(x_i, y_i), i = 1, ..., n} be the available data set. Models need to be fitted if the objective is to discover the relationship between x and y. Often, not all the components of x are related or important to y. An optimal model contains only the necessary components of x. The procedure of selecting the variables is a model selection problem in which each model corresponds to a particular subset of the components of x.

Among the many existing variable/model selection procedures, such as AIC, BIC, Mallows' Cp, R², etc., the procedure based on the bootstrap has some important features: first, the bootstrap method can provide more accurate inference (Adkins and Hill, 1990; Hall, 1989) at the same time as the model selection is carried out. In other words, the bootstrap based inference on regression parameters takes the model selection procedure into account. Second and most importantly, the bootstrap selection procedure does not depend on a specific probability distribution.

A very important theoretical property of a bootstrap selection procedure is its consistency. Shao (1996) discovered that a straightforward application of the bootstrap does not yield a consistent model selection procedure. A simple modification was applied to the straightforward bootstrap procedure: generate m (instead of n, with m < n) independent and identically distributed (iid) bootstrap observations from the empirical distribution F̂, which puts mass n^{-1} on each pair (x_i, y_i), i = 1, ..., n. After the modification, the procedure is consistent if and only if m/n → 0 and m → ∞. However, only the linear model is studied in the literature. In this chapter, the modified bootstrap selection procedure is extended to different aspects of the mixed model, including the Fay-Herriot model and the nested-error regression model, that are commonly used in small area estimation. The model selection study here arises from potential choices of the fixed effects covariates.

In the following sections, we first start with a replication of Shao's algorithm for the linear model. Then the modified algorithms of the model selection procedures for linear mixed models are provided. After that, simulation studies are carried out for the preceding algorithms. The results are listed in the final section.

2.2 Linear Model

The model is given by

y_i = x_i^T β + e_i,  i = 1, ..., n  (2.1)
where x_i is the i-th value of a p × 1 vector of explanatory variables, y_i is the response at x_i, and β is a p × 1 vector of unknown parameters. In our study, p is fixed and does not increase with n. We assume that e_i, i = 1, ..., n, are iid Normal(0, σ²_e), so that

    µ_i = E(y_i | x_i) = x_i'β,   var(y_i | x_i) = σ²_e,   i = 1, ..., n.

The unknown parameters are estimated by the least squares estimator (LSE)

    β̂ = (X'X)^{-1} X'Y.    (2.2)

For the variable/model selection procedure, let α be a subset of {1, ..., p} of size p_α, and let x_{iα} (or β_α) be the subvector of x_i (or β) containing the components of x_i (or β) indexed by the integers in α. A candidate model, say model α, is then

    y_i = x_{iα}'β_α + e_i,  i = 1, ..., n,
    µ_{iα} = E(y_i | x_i) = x_{iα}'β_α,   var(y_i | x_i) = σ²_e,

and its parameter is estimated as in equation (2.2),

    β̂_α = (X_α'X_α)^{-1} X_α'Y.

An average conditional expected loss is then defined to measure the efficiency of model α:

    Γ_n(α) = E[ n^{-1} Σ_{i=1}^n (y_i − x_{iα}'β̂_α)² | Y, X ].    (2.3)
The model α ∈ A which minimizes Γ_n(α) is chosen as the estimate of the optimal model, where A is a collection of subsets of {1, ..., p}. The largest possible A is the one containing all nonempty subsets of {1, ..., p}; in practice we consider a smaller collection. The optimal model, called α_0, is the correct model with the smallest size and the smallest Γ_n(α). As Γ_n(α) involves the unknown parameter β, α_0 must be estimated by some α̂. The model selection procedure is consistent if

    lim_{n→∞} P(α̂ = α_0) = 1.

With the modified bootstrapping-pairs method, generate m (< n) iid pairs (x_i*, y_i*) from the empirical distribution F̂ putting mass n^{-1} on each (x_i, y_i), i = 1, ..., n. A bootstrap estimator of E[Γ_m(α)], for m < n, is then

    Γ̂_{n,m}(α) = E*[ ‖Y − X_α β*_{α,m}‖² / n ],    (2.4)

where ‖a‖² = a'a for any vector a, and β*_{α,m} is the bootstrap analogue of β̂_α based on the m iid bootstrap observations,

    β*_{α,m} = ( Σ_{i=1}^m x_{iα}* x_{iα}*' )^{-1} Σ_{i=1}^m x_{iα}* y_i*,

and E* denotes the bootstrap expectation, approximated by Monte Carlo with size B:

    Γ̂^{(B)}_{n,m}(α) = (1/B) Σ_{b=1}^B ‖Y − X_α β*_{α,m,b}‖² / n.
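The procedure just described (resample m < n pairs from F̂, refit each candidate model, and minimize the bootstrap loss) can be sketched in a few lines. Python is used here although the dissertation's simulations were written in R, and the data below are simulated purely for illustration, not the data set used in the simulation study:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, B = 40, 15, 200                  # m/n small, as the consistency result requires
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([2.0, 0.0, 4.0])       # component 1 is inactive: the true model is {0, 2}
y = X @ beta + rng.normal(size=n)

candidates = [(0,), (0, 1), (0, 2), (0, 1, 2)]

def gamma_hat(alpha):
    """Monte Carlo version of the bootstrap loss (2.4) for candidate model alpha."""
    Xa = X[:, alpha]
    losses = []
    for _ in range(B):
        idx = rng.integers(0, n, size=m)                     # m iid draws from F_hat
        b = np.linalg.lstsq(Xa[idx], y[idx], rcond=None)[0]  # bootstrap analogue of the LSE
        losses.append(np.mean((y - Xa @ b) ** 2))            # loss evaluated on the full data
    return float(np.mean(losses))

chosen = min(candidates, key=gamma_hat)
```

Underfitted models are heavily penalized through bias, so any candidate omitting an active column is never competitive; the role of m/n → 0 is to inflate the loss of overfitted models enough that the smallest correct model wins.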
The objective of the selection procedure is to select a model α̂_{n,m} ∈ A that minimizes Γ̂_{n,m}(α). From the literature (Shao, 1996), this bootstrap selection procedure is consistent, that is,

    lim_{n→∞} P(α̂_{n,m} = α_0) = 1,

provided that m satisfies m/n → 0 and m → ∞.

2.3 Linear Mixed Model

In this section, the modified bootstrap selection procedure is applied to the linear mixed model (LMM). The general linear mixed model is

    y = Xβ + Zν + e,

where y is the vector of sample observations, X and Z are known matrices, β is the vector of unknown parameters, and ν and e are distributed independently with mean 0 and covariance matrices G and R. Two special cases of the LMM, the Fay-Herriot model and the nested-error regression model, will be used to illustrate the modified bootstrap selection procedure.

2.3.1 Fay-Herriot Model

Suppose the characteristic of interest is the sample mean ȳ_i of each group, ȳ = (ȳ_1, ..., ȳ_n). Fay and Herriot (1979) developed this model to estimate per-capita income for small areas (population
less than 1,000). The model can be stated as

    ȳ_i = µ_i + e_i,   µ_i = x_i'β + ν_i,   i = 1, ..., n,    (2.5)

where x_i = (x_{i1}, ..., x_{ip})' is available for each area, β is the vector of unknown parameters, and e = (e_1, ..., e_n)' and ν = (ν_1, ..., ν_n)' are distributed independently as Normal(0, D) and Normal(0, AI) respectively, where D = diag(D_1, ..., D_n) and each D_i is known. Therefore µ_i is Normal(x_i'β, A), and given µ_i, ȳ_i is Normal(µ_i, D_i). For the Fay-Herriot model, the best linear unbiased estimator of µ_i is

    µ̂_i = x_i'β̂ + (A/(A + D_i)) (ȳ_i − x_i'β̂),  i = 1, ..., n,
    β̂ = (X'V^{-1}X)^{-1} X'V^{-1} ȳ,    (2.6)

where V = diag(A + D_1, ..., A + D_n) and X = col_{1≤i≤n}(x_i').

The model selection procedure for the Fay-Herriot model is similar to that for the linear model. Considering bootstrapping pairs, let (x_i*, ȳ_i*), i = 1, ..., m, be an iid sample from the empirical distribution putting mass n^{-1} on each (x_i, ȳ_i), i = 1, ..., n, and let D_i*, i = 1, ..., m, be the corresponding known variance components. The bootstrap analogues of β̂ and µ̂_i under model α with bootstrap sample size m are

    β*_{α,m} = ( Σ_{i=1}^m x_{iα}* x_{iα}*' / (Â_m + D_i*) )^{-1} Σ_{i=1}^m x_{iα}* ȳ_i* / (Â_m + D_i*),
    µ*_{iα} = x_{iα}'β*_{α,m} + (Â_m/(Â_m + D_i)) (ȳ_i − x_{iα}'β*_{α,m}),
where Â_m is the analogue of Â computed from the bootstrap sample of size m, and Â is based on an unbiased quadratic estimator of A,

    Ã = (n − p)^{-1} [ Σ_{i=1}^n µ̂_i² − Σ_{i=1}^n D_i (1 − x_i'(X'X)^{-1}x_i) ],
    Â = max(Ã, 1/n),    (2.7)

where µ̂_i = ȳ_i − x_i'β̂ and β̂ = (X'X)^{-1}X'ȳ. Define

    Γ̂_{n,m}(α) = E*[ n^{-1} Σ_{i=1}^n (ȳ_i − x_{iα}'β*_{α,m})² ],    (2.8)

or define the loss function with the random effect estimated from the bootstrap sample data,

    Γ̂_{n,m}(α) = E*[ n^{-1} { Σ_{i∈I_B} ( ȳ_i − x_{iα}'β*_{α,m} − (Â_m/(Â_m + D_i)) (ȳ_i − x_{iα}'β*_{α,m}) )² + Σ_{i∉I_B} (ȳ_i − x_{iα}'β*_{α,m})² } ],    (2.9)

where I_B is the set of indices of the areas chosen in the bootstrap sample. The model selected by this bootstrap procedure is the α̂_{n,m} ∈ A that minimizes Γ̂_{n,m}(α).
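A small numerical sketch of the estimators (2.5) and (2.6) may help fix ideas. Python is used here (the dissertation's code was R); the settings A = 1 and D_i ∈ {0.3, ..., 0.7} mirror the simulation design later in the chapter, and for simplicity A is treated as known rather than estimated by (2.7):

```python
import numpy as np

rng = np.random.default_rng(1)
n, A = 40, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([2.0, 4.0])
D = np.repeat([0.7, 0.6, 0.5, 0.4, 0.3], 8)             # known sampling variances D_i
mu = X @ beta + rng.normal(scale=np.sqrt(A), size=n)    # area means from model (2.5)
ybar = mu + rng.normal(scale=np.sqrt(D))                # direct survey estimates

# GLS estimator of beta with V = diag(A + D_i), as in (2.6)
w = 1.0 / (A + D)
bgls = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * ybar))

# BLUP of mu_i: shrink the direct estimate toward the regression fit
gamma = A / (A + D)
muhat = X @ bgls + gamma * (ybar - X @ bgls)
```

Each µ̂_i is the convex combination γ_i ȳ_i + (1 − γ_i) x_i'β̂ with γ_i = A/(A + D_i), so it always lies between the direct estimate and the synthetic regression estimate, and areas with noisier direct estimates (larger D_i) borrow more strength from the regression.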
2.3.2 Nested-Error Regression Model

This model was proposed by Battese et al. (1988) to estimate mean acreage under a crop for counties (small areas) in Iowa. Their model is given by

    y_ij = x_ij'β + ν_i + e_ij,  i = 1, ..., t,  j = 1, ..., n_i,  Σ_{i=1}^t n_i = n,    (2.10)
    ν_i ~ N(0, σ²_ν),   e_ij ~ N(0, σ²_e),

where y_ij is the characteristic of interest for the j-th sampled unit in the i-th small area, x_ij = (x_{ij1}, ..., x_{ijp})', β is the vector of unknown parameters, and n_i is the number of sampled units in the i-th small area. The random errors ν_i and e_ij are independent of each other. The variance-covariance matrix of y is V = diag(V_1, ..., V_t) with V_i = σ²_e I_{n_i} + σ²_ν 1_{n_i} 1_{n_i}'. The generalized least squares estimator of β has the same form as (2.6),

    β̂ = (X'V^{-1}X)^{-1} X'V^{-1} Y.    (2.11)

The corresponding estimator of ν_i is

    ν̂_i = γ_i (ȳ_i − x̄_i'β̂),

where γ_i = σ²_ν (σ²_ν + σ²_e n_i^{-1})^{-1}, and ȳ_i and x̄_i are the sample means of the y_ij and x_ij in the i-th group. If the variance components of the LMM are unknown, they are estimated by
Henderson's method 3 (for the nested-error model):

    σ̂²_e = (n − t − k + λ)^{-1} Σ_{i,j} ê²_ij,
    σ̃²_ν = n*^{-1} [ Σ_{i,j} û²_ij − (n − k) σ̂²_e ],    (2.12)

where n* = n − tr[ (X'X)^{-1} Σ_{j=1}^t n_j² x̄_j x̄_j' ], and λ = 0 if the model has no intercept term and λ = 1 otherwise. The {ê_ij} are the residuals from the ordinary least squares regression of y_ij − ȳ_i on {x_ij1 − x̄_i1, ..., x_ijk − x̄_ik}, and the {û_ij} are the residuals from the ordinary least squares regression of y_ij on {x_ij1, ..., x_ijk}. To avoid a negative estimate of σ²_ν, we define σ̂²_ν = max(σ̃²_ν, 1/n).

The model selection procedure for the nested-error regression model differs slightly from the previous models. First, the bootstrap is done at the group level rather than on the elements within each group. Let (x_i*, y_i*), i = 1, ..., m, be iid sample groups from the empirical distribution putting mass t^{-1} on each group (x_i, y_i), i = 1, ..., t, where x_i = (x_{i1}', ..., x_{i n_i}')' and y_i = (y_{i1}, ..., y_{i n_i})'. The bootstrap analogue of β̂ under model α with bootstrap sample size m is

    β*_{α,m} = (X̂_α'V̂^{-1}X̂_α)^{-1} X̂_α'V̂^{-1} Ŷ,    (2.13)

where X̂_α and Ŷ are the bootstrap sample data and V̂ uses the corresponding estimated variance components. Define

    Γ̂_{n,m}(α) = E*[ n^{-1} Σ_i Σ_j (y_ij − x_{ij,α}'β*_{α,m})² ],    (2.14)

or define the loss function by accounting for the random effect estimated from the bootstrap
sample data:

    Γ̂_{n,m}(α) = E*[ n^{-1} { Σ_{i∈I_B} Σ_{j=1}^{n_i} ( y_ij − x_{ij,α}'β*_{α,m} − γ̃_i (ȳ_i − x̄_{iα}'β*_{α,m}) )² + Σ_{i∉I_B} Σ_{j=1}^{n_i} (y_ij − x_{ij,α}'β*_{α,m})² } ],    (2.15)

where γ̃_i = σ̃²_ν (σ̃²_ν + σ̃²_e n_i^{-1})^{-1}, σ̃²_ν and σ̃²_e are the variance components estimated under model α from the bootstrap sample of size m, and I_B is the set of indices of the groups chosen in the bootstrap sample.

The alternative algorithm is to bootstrap at the element level. Since the data are correlated within a group, a transformation is applied first to remove the correlation:

    ŷ_ij = y_ij − α_i ȳ_i,
    x̂_ij = x_ij − α_i x̄_i,
    ν̂_i = ν_i − α_i ν_i,
    ê_ij = e_ij − α_i ē_i,    (2.16)

where α_i = 1 − [ σ²_e / (σ²_e + n_i σ²_ν) ]^{1/2}. After the transformation, the new model is a linear model:

    ŷ_ij = x̂_ij'β + u_ij,   where u_ij ~ Normal(0, σ²_e).    (2.17)

Then β is estimated by β̃ = (X̂'X̂)^{-1} X̂'Ŷ, where X̂ = col_{1≤i≤t} col_{1≤j≤n_i}(x̂_ij') and Ŷ = col_{1≤i≤t} col_{1≤j≤n_i}(ŷ_ij). After the transformation of the data, the bootstrap procedure for the linear model can be applied to the nested-error regression model.

2.4 The Theoretical Framework

Recently, Das et al. (2004) presented a general linear mixed model

    Y = X_0 β_0 + Z v + e_n,    (2.18)

where Y ∈ R^n is the vector of observed responses, X_0 (n × p_0) and Z (n × q) are known matrices, and v and e_n are independent random variables with dispersion matrices D(ψ) and R_n(ψ) respectively. Here β_0 ∈ R^{p_0} and ψ ∈ R^k are fixed parameters. The mixed ANOVA model and longitudinal models, including the Fay-Herriot model and the nested-error regression model, are special cases of this framework.

The model selection issue here arises from potential choices of the fixed effects covariates, which are recorded in the columns of the matrix X_0. In particular, the true value of p_0 is not known. We have a collection of vectors X_i ∈ R^n, i = 1, ..., p, and any candidate design matrix has a sub-collection of these vectors as columns. Let S = {1, ..., p} denote the set of indices of the columns of covariates. A typical subset of S is given by α = {j_1, ..., j_{p_α}} ⊆ S and has p_α ∈ {0, 1, ..., p} elements, and X_α = [X_{j_1} : ... : X_{j_{p_α}}] is the n × p_α matrix whose columns are those vectors whose indices match those of α. In a slight abuse of notation, we write that the model
α is given by

    Y = X_α β_α + Z v + e_n.    (2.19)

The number and choice of candidate models may be restricted by imposing conditions on the subset α = {j_1, ..., j_{p_α}} ⊆ S. For example, nested models may be considered as a restrictive choice of candidate models. The true data generating process, given by (2.18), is one of the candidate models, and the goal of model selection is to identify the true model consistently.

We start with a random vector w = (w_1, ..., w_n), a realization from an R^n-valued random variable, sometimes referred to as bootstrap weights. We impose the restriction w_i ≥ 0, which also reinforces the notion that these are weights. For convenience, we define the diagonal matrix W whose i-th diagonal entry is w_i. When w has the Multinomial(m; 1/n, ..., 1/n) distribution for some m ≤ n, this method is identical to the m-out-of-n bootstrap; in particular, when m = n, we get the classical bootstrap of Efron (1979). Other weights may be used as well, for example to obtain the Bayesian bootstrap.

We use these bootstrap weights in a most interesting way. We define the following inner product between two vectors x and y in R^n:

    <x, y>_W = x'Wy = Σ_{i=1}^n w_i x_i y_i.

Thus this inner product is a weighted Euclidean inner product, where the weights are the bootstrap weights. Notice that we recover the usual Euclidean inner product when all the
weights are equal to 1. In general, however, <·,·>_W is a semi-inner product, as some of the weights can be zero.

Suppose P_{W,α} is the projection matrix onto the column space of X_α with respect to this randomly weighted semi-inner product <·,·>_W. Thus, for any vector v ∈ R^n, its projection onto the column space of X_α is given by v_{W,α} = P_{W,α} v. Let I_n denote the identity matrix in R^n. For each model α, we obtain the following number:

    T_α = E_B ‖(I_n − P_{W,α}) Y‖²,

where E_B stands for the bootstrap expectation, that is, the expectation conditional on the observed data. In terms of our framework, E_B stands for integration with respect to the random matrix W conditional on all other random terms.

We now develop some conditions for the bootstrap weights {w_1, ..., w_n}. First, we assume that all the weights are semi-exchangeable up to order 4; that is, all marginal distributions involving subsets of size 4 or below from {w_1, ..., w_n} have an exchangeability property. In particular, we assume that the univariate marginals of the w_i for i = 1, ..., n are all identical, and that the two-dimensional marginals of any pair (w_i, w_j), i ≠ j, all have the same distribution. We assume several lower-order moments of w_i exist. In particular, we reserve the notation E w_1 = µ_w and V w_1 = σ²_w, and define the centered and scaled weights W_i = σ_w^{-1}(w_i − µ_w) for i = 1, ..., n. Our technical assumptions involve conditions on various cross-moments, which we state later. In terms of the centered and scaled bootstrap weights W = (W_1, ..., W_n), we now write the random matrix of bootstrap weights as W = µ_w I_n + σ_w W, where W = diag(W_1, ..., W_n),
the diagonal matrix with the centered and scaled weights. For use in our technical proofs, we define the matrix U = σ_w µ_w^{-1} W. In the m-out-of-n bootstrap (moon-bootstrap) case, we have σ²_w = n^{-1}m(1 − n^{-1}), µ_w = n^{-1}m and c_{w11} = −n^{-2}m.

We also define the notation n^{-1} X_α'X_α = D_{n,α}. The inverse of D_{n,α} exists if and only if X_α has full column rank, and we assume that this is the case. Our technical methodology applies to cases where D_{n,α} does not have an inverse; however, that scenario results in a problem of identifiability of models, and we assume that X_α has full column rank to avoid any issue of unique identification of models. We also define

    A_α = n^{-1} D_α^{-1/2} X_α' U X_α D_α^{-1/2},
    P_α = X_α (X_α'X_α)^{-1} X_α' = n^{-1} X_α D_α^{-1} X_α',
    B_α = U X_α D_α^{-1/2} (I_n + A_α)^{-1} D_α^{-1/2} X_α' U,
    P̃_α = P_α [ I_n − U + n^{-1} B_α ] P_α,
    E_α = P_α − P̃_α [I_n + U].

In terms of these, we have

    P_{W,α} = X_α (X_α' W X_α)^{-1} X_α' W
            = X_α ( X_α' [µ_w I_n + σ_w W] X_α )^{-1} X_α' [µ_w I_n + σ_w W]
            = µ_w X_α ( µ_w X_α' [I_n + U] X_α )^{-1} X_α' [I_n + U]
            = X_α ( X_α' [I_n + U] X_α )^{-1} X_α' [I_n + U]
            = n^{-1} X_α ( D_α + n^{-1} X_α' U X_α )^{-1} X_α' [I_n + U]
            = n^{-1} X_α D_α^{-1/2} [ I_n + A_α ]^{-1} D_α^{-1/2} X_α' [I_n + U]
            = n^{-1} X_α D_α^{-1/2} ( I_n − A_α + A_α [I_n + A_α]^{-1} A_α ) D_α^{-1/2} X_α' [I_n + U]
            = n^{-1} X_α [ D_α^{-1} − D_α^{-1/2} A_α D_α^{-1/2} + D_α^{-1/2} A_α (I_n + A_α)^{-1} A_α D_α^{-1/2} ] X_α' [I_n + U]
            = [ P_α − P_α U P_α + n^{-1} P_α U X_α D_α^{-1/2} (I_n + A_α)^{-1} D_α^{-1/2} X_α' U P_α ] [I_n + U]
            = P_α [ I_n − U + n^{-1} B_α ] P_α [I_n + U]
            = P̃_α [I_n + U].

Using this, we have

    I_n − P_{W,α} = I_n − P_α + P_α − P_{W,α} = I_n − P_α + P_α − P̃_α [I_n + U] = I_n − P_α + E_α.
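The algebra above can be checked numerically: with multinomial weights, the W-weighted projection is idempotent, fixes the column space of X_α, and the weighted fit reproduces ordinary least squares on the resampled data set. A Python sketch with illustrative dimensions (the redraw guard simply enforces the full-column-rank assumption made above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 12, 8, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Multinomial(m; 1/n, ..., 1/n) weights give the m-out-of-n bootstrap
w = rng.multinomial(m, np.ones(n) / n)
while np.count_nonzero(w) < p:          # redraw so X'WX stays invertible
    w = rng.multinomial(m, np.ones(n) / n)
W = np.diag(w.astype(float))

# Projection onto col(X) in the W-semi-inner-product
PW = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)
assert np.allclose(PW @ PW, PW)         # idempotent
assert np.allclose(PW @ X, X)           # fixes the column space of X

# Weighted fit equals ordinary least squares on the resampled data set
b_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
idx = np.repeat(np.arange(n), w)
b_rs = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
assert np.allclose(b_w, b_rs)
```

The last identity is what makes the weighted-projection formulation equivalent to bootstrapping pairs: repeating row i of (X, y) exactly w_i times is the same as weighting it by w_i in the normal equations.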
We now analyze the last term in greater detail. We have

    E_α = P_α − P̃_α [I_n + U]
        = P_α − P_α [I_n − U + n^{-1} B_α] P_α [I_n + U]
        = P_α − P_α [I_n − U + n^{-1} B_α] P_α − P̃_α U
        = P_α U P_α − n^{-1} P_α B_α P_α − P̃_α U
        = P_α U P_α − n^{-1} P_α B_α P_α − P_α [I_n − U + n^{-1} B_α] P_α U
        = P_α U P_α − P_α U + P_α U P_α U − n^{-1} P_α B_α P_α − n^{-1} P_α B_α P_α U
        = −P_α U (I_n − P_α) + P_α U P_α U − n^{-1} P_α B_α P_α (I_n + U).

Since E_α is P_α times another matrix, we have (I_n − P_α) E_α = 0. Using this, we have

    T_α = E_B ‖(I_n − P_{W,α}) Y‖²
        = E_B [ Y' (I_n − P_α + E_α)' (I_n − P_α + E_α) Y ]
        = Y' (I_n − P_α) Y + E_B [ Y' E_α' E_α Y ].

Thus we have a very neat decomposition of T_α, with the first term capturing the squared residuals of model α and the second term containing all the bootstrap-related quantities. Our next task is to compute E_B(E_α'E_α). Note that all the bootstrap weights are in the matrix U. We have

    E_α' E_α = (I_n − P_α) U P_α U (I_n − P_α) + E_{R,α},
where E_{R,α} is a symmetric matrix which is entirely negligible. The algebra for exhibiting the different components of E_{R,α} is long and tedious, and it is also the source of much of our technical assumptions, so we skip the details here. Consequently, we now need to evaluate

    V_α = E_B [ U P_α U ] = (( E_B[U_ii U_jj] P_{α,ij} )),  with entries
          σ²_w µ_w^{-2} P_{α,ii}          if i = j,
          c_{w11} σ²_w µ_w^{-2} P_{α,ij}  if i ≠ j.

For use later on, we define the diagonal matrix G_α = (1 − c_{w11}) σ²_w µ_w^{-2} diag(P_{α,ii}). In terms of the subsequent analysis, the difference between V_α and G_α is negligible. Thus the properties of T_α are governed by

    Z_α = Y' (I_n − P_α) [I_n + G_α] (I_n − P_α) Y.

The difference between T_α and Z_α is negligible, although showing this involves a lot of algebra. There is a Z_α value corresponding to each model α. Note also that, at least in large samples, Z_α is positive, which is consistent with the fact that T_α is always positive. We compare the various T_α's and establish that, with probability tending to one, the true model has the smallest value. This again involves a lot of algebra, but the main scheme for proving
this uses the moment-generating function of T_α. The results follow from calculations based on derivatives of determinants of matrices of the form A + tB.

2.5 Simulation Study

To examine the finite-sample performance of the selection procedures, a simulation study was carried out based on bootstrapping pairs with different bootstrap sample sizes m and various model settings. First, the linear model results from Shao's paper were replicated, using the same data set Shao took from the solid waste data example of Gunst and Mason (1980) and the same value of β, (2, 9, 6, 4, 8). The Fay-Herriot model was then tested with the same data set. After that, two different selection procedures for the nested-error model were carried out on the same data set.

2.5.1 Algorithms and Settings for Different Models

For the linear model, the size of the data set is n = 40 and the e_i, i = 1, ..., n, are iid standard normal errors. The algorithm is:

1. Generate data Y according to the linear model (2.1).
2. Draw one bootstrap sample of size m = 15 from the original data.
3. For each candidate model: (1) estimate the unknown parameters by (2.2); (2) calculate the loss function in (2.3).
4. Repeat steps 2 and 3 B = 100 times. Choose the candidate model minimizing the mean of the loss functions as the optimal model. Obtain the 95% confidence interval for the fixed effects parameters from the 2.5% and 97.5% quantiles of the B bootstrap samples.
5. Repeat steps 2, 3, and 4 with bootstrap samples of sizes m = 20, 25, 30, 40.
6. Record the optimal model and the corresponding confidence intervals. Repeat steps 1-6 1,000 times.
7. Repeat the whole process with different true model settings.

For the Fay-Herriot model, the same x is used as for the linear model and n = 40; for the variance components, we chose A = 1 and D = diag(0.7 I_8, 0.6 I_8, 0.5 I_8, 0.4 I_8, 0.3 I_8), where I_8 is the 8 × 8 identity matrix. The algorithm is:

1. Generate data Y based on model (2.5).
2. Draw one bootstrap sample of size m = 15 from the original data.
3. For each candidate model: (1) estimate the variance component A using (2.7); (2) estimate the fixed effects parameters with Â using (2.6); (3) calculate the loss function using (2.8) and (2.9).
4. Repeat steps 2 and 3 B = 100 times. Choose the candidate model minimizing the mean of the loss functions as the optimal model. Obtain the 95% confidence interval for the fixed effects parameters from the 2.5% and 97.5% quantiles of the B bootstrap samples.
5. Repeat steps 2, 3, and 4 with bootstrap samples of sizes m = 20, 25, 30, 40.
6. Record the optimal model and the corresponding confidence intervals. Repeat steps 1-6 1,000 times.
7. Repeat the whole process with different true model settings.

For the nested-error regression model, we use the same data set with size n = 40, divided into t = 10 groups with n_i = (5, 6, 4, 5, 3, 2, 5, 3, 3, 4); the variance components are chosen as σ²_ν = 1 and σ²_e = 1. The algorithm for the group-level bootstrap is:

1. Generate data Y based on model (2.10).
2. Draw one bootstrap sample of m = 4 groups from the original 10 groups of the data.
3. For each candidate model: (1) estimate the variance components by Henderson's method 3 in (2.12); (2) estimate the fixed effects parameters based on σ̂²_ν and σ̂²_e using (2.11) or (2.13); (3) calculate the loss function using (2.14) and (2.15).
4. Repeat steps 2 and 3 B = 100 times. Choose the candidate model minimizing the mean of the loss functions as the optimal model. Obtain the 95% confidence interval for the fixed effects parameters from the 2.5% and 97.5% quantiles of the B bootstrap samples.
5. Repeat steps 2, 3, and 4 with bootstrap samples of sizes m = 5, 6, 8, 10.
6. Record the optimal model and the corresponding confidence intervals. Repeat steps 1-6 1,000 times.
7. Repeat the whole process with different true model settings.

The algorithm for the element-level bootstrap of the nested-error regression model is:

1. Generate data Y based on model (2.10).
2. Estimate the variance components from the whole data set by Henderson's method 3 in (2.12).
3. Transform the original data into linearized data that can be fitted by a linear model, using the transformation in (2.16).
4. Draw one bootstrap sample of size m = 15 from the transformed data.
5. For each candidate model, estimate the fixed effects parameters as in a linear model and calculate the corresponding loss function.
6. Repeat steps 4 and 5 B = 100 times. Choose the candidate model minimizing the mean of the loss functions as the optimal model. Obtain the 95% confidence interval for the fixed effects parameters from the 2.5% and 97.5% quantiles of the B bootstrap samples.
7. Repeat steps 4, 5, and 6 with bootstrap samples of sizes m = 20, 25, 30, 40.
8. Record the optimal model and the corresponding confidence intervals. Repeat steps 1-8 1,000 times.
9. Repeat the whole process with different true model settings.

The bootstrap estimators Γ̂_{n,m}(α) were computed with size B = 100. The empirical probabilities of selecting each model and the coverage probabilities of the true parameter values were based on 1,000 simulations, and the reported lengths of the confidence intervals are means over the 1,000 replications. For comparison, the results of selection using AIC, BIC, and adjusted R² are also reported. The whole procedure was programmed in R.

2.5.2 Results

The results for the linear model are reported first. Table 2.1 gives the selection probabilities of the optimal model; Table 2.2 gives the coverage probabilities for each parameter and the lengths of the corresponding confidence intervals. The results show the same characteristics of this bootstrap selection procedure as Shao's results: the consistency of the
modified bootstrap selection procedure is clearly supported, and the modified bootstrap selection procedure can be substantially better than the other selection methods in most cases.

The results for the Fay-Herriot model are reported in Tables 2.3 to 2.6. The results in Tables 2.3 and 2.4 support applying this bootstrap selection procedure to the Fay-Herriot model. In Tables 2.5 and 2.6, the loss functions with and without the random effects are compared. The selection probabilities for the loss function that adds the random effects are higher than those for the loss function without the random effects in some cases with larger bootstrap sample sizes; the coverage probabilities show no significant difference.

Tables 2.7 and 2.8 report the empirical selection probabilities for the nested-error regression model with the group-level bootstrap, and Tables 2.9 and 2.10 report the results for the element-level bootstrap. The selection probabilities at the element level are higher in most cases, as the element-level bootstrap after the transformation is applied closely resembles the linear model procedure. Note, however, that the data size is only 40 in these cases and in particular the number of groups is only 10; this may be part of the reason the group-level bootstrap does not perform as well as the element-level one. The two definitions of the loss function are then compared in Tables 2.11 and 2.12 with a larger data size, n = 160, to test the effect of increasing the data size. As with the Fay-Herriot model, the probabilities for the loss function with the random effect are higher when the bootstrap sample size is larger.

In addition, except when the full model is the optimal model, the modified bootstrap model selection procedure with smaller bootstrap size m works better than the unmodified procedure that bootstraps with size 40. From the
trend of the probability changes, the optimal choice of m clearly depends on the parameter β. When the number of parameters in the optimal model is small, the modified bootstrap selection procedure with a smaller bootstrap size performs better than the procedure with a larger bootstrap size; as the number of parameters in the optimal model increases, the optimal choice of the bootstrap size m increases. This trend was examined in more detail with additional bootstrap sizes m = 11, 12, 13, 14, 15, 20, 25, 30, 40; the results are not listed here.

The empirical coverage probabilities are the probabilities that the bootstrap confidence interval covers the true parameter values. From the results, the 95% confidence interval from the modified bootstrap selection procedure covers the true parameter values with probability greater than 0.9 in most cases. The coverage probabilities with smaller bootstrap sample sizes are higher, but the confidence intervals with larger bootstrap sample sizes are shorter.

In conclusion, the modified bootstrap model selection procedure can be applied beyond the linear model, to linear mixed models. Bootstrapping pairs with size m, where m/n → 0 and m → ∞, works well in these models.
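As a closing check on the element-level algorithm, the decorrelation claimed for transformation (2.16) can be verified directly: writing the within-group map y_ij ↦ y_ij − α_i ȳ_i as the matrix T_i = I_{n_i} − (α_i/n_i) 1 1', the transformed covariance T_i V_i T_i' must equal σ²_e I_{n_i}. A short Python sketch (the dissertation's code was R; the variance components and group sizes mirror the simulation design):

```python
import numpy as np

sig_e2, sig_v2 = 1.0, 1.0                        # variance components as in Section 2.5
for ni in (2, 3, 4, 5, 6):                       # group sizes from the simulation design
    alpha = 1.0 - np.sqrt(sig_e2 / (sig_e2 + ni * sig_v2))
    J = np.ones((ni, ni))
    V = sig_e2 * np.eye(ni) + sig_v2 * J         # within-group covariance of y_i
    T = np.eye(ni) - (alpha / ni) * J            # maps y_ij to y_ij - alpha_i * ybar_i
    assert np.allclose(T @ V @ T.T, sig_e2 * np.eye(ni))
```

The assertion passing for every n_i confirms that the transformed errors u_ij are uncorrelated with common variance σ²_e, which is exactly what licenses treating (2.17) as an ordinary linear model.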
49 Table 2.1: Selecton probabltes for lnear model Bootstrap True β T Model m=15 m=20 m=25 m=30 m=40 AIC BIC R 2 (2,0,0,4,0) 1,4* ,2, ,3, ,4, ,2,3, ,2,4, ,3,4, ,2,3,4, (2,0,0,4,8) 1,4,5* ,2,4, ,3,4, ,2,3,4, (2,9,0,4,8) 1,4, ,2,4,5* ,3,4, ,2,3,4, (2,9,6,4,8) 1,2,3, ,2,4, ,3,4, ,2,3,4,5*
50 Table 2.2: The coverage probabltes and lengths of confdence ntervals for lnear model True β T Para m=15 m=20 m=25 m=30 m=40 Non Bootstrap (2,0,0,4,0) σ 2 e 0.974(1.320) 0.935(1.135) 0.922(0.989) 0.883(0.905) 0.810(0.759) 0.920(0.902) β (1.427) 0.983(1.201) 0.968(1.060) 0.957(0.958) 0.905(0.819) 0.948(0.855) β (2.106) 0.974(1.751) 0.945(1.579) 0.924(1.481) 0.841(1.397) 0.948(0.966) (2,0,0,4,8) σe (1.372) 0.950(1.167) 0.918(1.012) 0.884(0.912) 0.840(0.779) 0.933(0.918) β (1.500) 0.990(1.252) 0.970(1.089) 0.952(0.983) 0.927(0.838) 0.955(0.864) β (4.348) 0.985(3.431) 0.980(2.909) 0.944(2.556) 0.912(2.108) 0.944(1.700) β (4.799) 0.994(3.769) 0.978(3.164) 0.962(2.768) 0.915(2.269) 0.964(2.119) (2,9,0,4,8) σe (1.403) 0.940(1.178) 0.914(1.023) 0.885(0.922) 0.831(0.786) 0.927(0.927) β (1.576) 0.989(1.300) 0.971(1.116) 0.959(1.005) 0.932(0.852) 0.937(0.863) β (14.015) 0.990(10.255) 0.974(8.075) 0.958(6.867) 0.914(5.353) 0.951(4.108) β (6.292) 0.993(4.799) 0.986(3.890) 0.975(3.384) 0.937(2.632) 0.936(2.153) β (5.416) 0.993(4.072) 0.973(3.351) 0.971(2.945) 0.919(2.367) 0.946(2.151) (2,9,6,4,8) σe (1.555) 0.920(1.187) 0.907(1.050) 0.873(0.916) 0.814(0.791) 0.927(0.939) β (1.738) 0.988(1.365) 0.967(1.175) 0.959(1.029) 0.935(0.865) 0.940(0.873) β (17.204) 0.986(13.43) 0.973(10.983) 0.962(9.297) 0.931(7.400) 0.944(7.022) β (10.175) 0.988(7.186) 0.976(5.656) 0.971(4.735) 0.940(3.661) 0.957(3.277) β (6.962) 0.997(5.174) 0.983(4.196) 0.980(3.531) 0.939(2.750) 0.944(2.286) β (5.958) 0.995(4.365) 0.982(3.588) 0.970(3.061) 0.931(2.448) 0.945(2.203) 41
51 Table 2.3: Selecton probabltes for Fay-Herrot model, loss functon wthout random effect Bootstrap True β T Model m=15 m=20 m=25 m=30 m=40 AIC BIC R 2 (2,0,0,4,0) 1,4* ,2, ,3, ,4, ,2,3, ,2,4, ,3,4, ,2,3,4, (2,0,0,4,8) 1,4,5* ,2,4, ,3,4, ,2,3,4, (2,9,0,4,8) 1,4, ,2,4,5* ,3,4, ,2,3,4, (2,9,6,4,8) 1,2,3, ,2,4, ,3,4, ,2,3,4,5*
52 Table 2.4: The coverage probabltes and lengths of confdence ntervals for the Fay-Herrot model, loss functon wthout random effect True β T Para m=15 m=20 m=25 m=30 m=40 Non Bootstrap (2,0,0,4,0) σ 2 e 0.967(1.927) 0.934(1.666) 0.919(1.476) 0.876(1.333) 0.820(1.145) 0.641(0.656) β (1.748) 0.988(1.473) 0.970(1.293) 0.960(1.160) 0.927(0.994) 0.944(1.042) β (2.570) 0.973(2.089) 0.956(1.878) 0.926(1.838) 0.853(1.719) 0.939(1.199) (2,0,0,4,8) σe (1.960) 0.940(1.696) 0.917(1.490) 0.876(1.349) 0.816(1.149) 0.625(0.658) β (1.837) 0.990(1.526) 0.976(1.332) 0.963(1.192) 0.926(1.014) 0.939(1.048) β (5.288) 0.991(4.232) 0.973(3.622) 0.959(3.197) 0.920(2.634) 0.946(2.140) β (5.861) 0.991(4.607) 0.979(3.890) 0.970(3.411) 0.934(2.788) 0.946(2.582) (2,9,0,4,8) σe (1.990) 0.930(1.711) 0.909(1.506) 0.871(1.356) 0.817(1.162) 0.656(0.667) β (1.936) 0.991(1.583) 0.980(1.364) 0.963(1.220) 0.927(1.028) 0.947(1.057) β (16.619) 0.984(12.204) 0.967(9.712) 0.954(8.265) 0.895(6.491) 0.950(5.209) β (7.626) 0.998(5.829) 0.983(4.755) 0.979(4.095) 0.948(3.208) 0.954(2.738) β (6.525) 0.991(4.982) 0.981(4.136) 0.967(3.574) 0.932(2.882) 0.950(2.639) (2,9,6,4,8) σe (2.219) 0.927(1.757) 0.891(1.519) 0.858(1.367) 0.788(1.164) 0.615(0.652) β (2.084) 0.992(1.657) 0.985(1.416) 0.969(1.251) 0.936(1.047) 0.938(1.054) β (17.268) 0.919(15.283) 0.957(12.815) 0.963(11.158) 0.926(8.877) 0.938(8.329) β (11.206) 0.961(8.563) 0.968(6.800) 0.959(5.696) 0.924(4.379) 0.927(3.847) β (7.675) 0.968(6.094) 0.974(5.038) 0.978(4.275) 0.947(3.332) 0.936(2.819) β (7.121) 0.995(5.327) 0.982(4.348) 0.974(3.719) 0.938(2.949) 0.942(2.650) 43
53 Table 2.5: Selecton probabltes for Fay-Herrot model, loss functon wth random effect True β T Model m=15 m=20 m=25 m=30 m=40 (2,0,0,4,0) 1,4* ,2, ,3, ,4, ,2,3, ,2,4, ,3,4, ,2,3,4, (2,0,0,4,8) 1,4,5* ,2,4, ,3,4, ,2,3,4, (2,9,0,4,8) 1,4, ,2,4,5* ,3,4, ,2,3,4, (2,9,6,4,8) 1,2,3, ,2,4, ,3,4, ,2,3,4,5*
54 Table 2.6: The coverage probabltes and lengths of confdence ntervals for the Fay-Herrot model, loss functon wth random effect True β T Para m=15 m=20 m=25 m=30 m=40 (2,0,0,4,0) σ 2 e 0.966(1.891) 0.940(1.637) 0.917(1.458) 0.888(1.331) 0.834(1.134) β (1.738) 0.983(1.463) 0.969(1.284) 0.960(1.162) 0.910(0.991) β (2.517) 0.978(1.979) 0.952(1.716) 0.919(1.570) 0.862(1.331) (2,0,0,4,8) σe (1.926) 0.943(1.665) 0.914(1.476) 0.884(1.337) 0.817(1.141) β (1.835) 0.990(1.518) 0.978(1.326) 0.963(1.199) 0.934(1.015) β (5.279) 0.983(4.182) 0.965(3.511) 0.948(3.105) 0.900(2.515) β (5.775) 0.985(4.586) 0.972(3.858) 0.957(3.385) 0.924(2.777) (2,9,0,4,8) σe (2.021) 0.944(1.725) 0.910(1.529) 0.890(1.378) 0.829(1.171) β (1.938) 0.983(1.567) 0.972(1.369) 0.948(1.215) 0.910(1.032) β (16.438) 0.983(12.087) 0.965(9.504) 0.941(8.000) 0.884(6.120) β (7.678) 0.989(5.802) 0.976(4.762) 0.950(4.070) 0.913(3.189) β (6.587) 0.995(5.021) 0.985(4.143) 0.962(3.586) 0.921(2.886) (2,9,6,4,8) σe (2.267) 0.939(1.802) 0.901(1.546) 0.862(1.384) 0.790(1.172) β (2.099) 0.986(1.676) 0.974(1.422) 0.968(1.262) 0.928(1.054) β (16.369) 0.863(14.225) 0.918(12.238) 0.911(10.559) 0.892(8.551) β (11.030) 0.927(8.344) 0.942(6.633) 0.924(5.612) 0.903(4.316) β (7.533) 0.943(5.982) 0.968(4.967) 0.950(4.235) 0.916(3.312) β (7.162) 0.983(5.330) 0.979(4.347) 0.960(3.731) 0.934(2.976) 45
55 Table 2.7: Selecton probabltes for Nested-Error regresson model wth group level Bootstrap True β T Model m=4 m=5 m=6 m=8 m=10 AIC BIC R 2 (2,0,0,4,0) 1,4* ,2, ,3, ,4, ,2,3, ,2,4, ,3,4, ,2,3,4, (2,0,0,4,8) 1,4,5* ,2,4, ,3,4, ,2,3,4, (2,9,0,4,8) 1,4, ,2,4,5* ,3,4, ,2,3,4, (2,9,6,4,8) 1,2,3, ,2,4, ,3,4, ,2,3,4,5*
56 Table 2.8: The coverage probabltes and lengths of confdence ntervals for for Nested-Error regresson model wth group level True β T Para m=4 m=5 m=6 m=8 m=10 Non Bootstrap (2,0,0,4,0) σ 2 v 0.914(2.673) 0.883(2.326) 0.855(2.105) 0.804(1.810) 0.768(1.597) 0.943(5.833) σ 2 e 0.964(1.408) 0.935(1.228) 0.917(1.116) 0.871(0.942) 0.826(0.841) 0.919(1.035) β (2.338) 0.971(2.044) 0.952(1.842) 0.928(1.570) 0.890(1.398) 0.918(1.515) β (3.145) 0.958(2.473) 0.930(2.112) 0.885(1.795) 0.824(1.661) 0.938(1.070) (2,0,0,4,8) σv (2.757) 0.881(2.345) 0.851(2.100) 0.794(1.790) 0.760(1.570) 0.934(5.781) σe (1.476) 0.917(1.279) 0.886(1.144) 0.838(0.957) 0.797(0.844) 0.908(1.044) β (2.535) 0.968(2.172) 0.951(1.918) 0.913(1.618) 0.879(1.426) 0.916(1.522) β (5.079) 0.973(4.255) 0.961(3.742) 0.919(3.037) 0.891(2.593) 0.934(1.814) β (6.799) 0.985(5.003) 0.974(4.080) 0.939(3.197) 0.921(2.722) 0.949(2.421) (2,9,0,4,8) σv (2.896) 0.903(2.406) 0.863(2.118) 0.814(1.794) 0.767(1.566) 0.929(5.598) σe (1.743) 0.924(1.402) 0.896(1.210) 0.843(1.002) 0.794(0.875) 0.921(1.081) β (2.822) 0.979(2.277) 0.953(2.012) 0.923(1.664) 0.895(1.454) 0.920(1.511) β (13.957) 0.885(11.292) 0.912(9.585) 0.897(7.139) 0.852(5.980) 0.943(4.417) β (6.866) 0.971(5.705) 0.962(4.929) 0.945(3.850) 0.911(3.242) 0.943(2.336) β (7.922) 0.984(5.554) 0.976(4.452) 0.937(3.304) 0.894(2.830) 0.928(2.474) (2,9,6,4,8) σv (3.220) 0.917(2.499) 0.862(2.123) 0.785(1.728) 0.736(1.514) 0.936(5.835) σe (2.110) 0.907(1.580) 0.879(1.283) 0.824(1.012) 0.793(0.874) 0.914(1.086) β (2.964) 0.972(2.378) 0.963(2.051) 0.924(1.678) 0.889(1.457) 0.928(1.527) β (11.579) 0.680(12.512) 0.847(12.343) 0.926(10.322) 0.915(8.705) 0.932(8.088) β (9.159) 0.835(7.749) 0.889(6.808) 0.928(5.367) 0.919(4.514) 0.929(3.951) β (6.134) 0.889(5.724) 0.918(5.057) 0.945(4.041) 0.922(3.367) 0.940(2.530) β (9.033) 0.987(6.423) 0.972(4.956) 0.944(3.642) 0.915(3.025) 0.937(2.660) 47
57 Table 2.9: Selecton probabltes for Nested-Error regresson model wth element level Bootstrap True β T Model m=15 m=20 m=25 m=30 m=40 AIC BIC R 2 (2,0,0,4,0) 1,4* ,2, ,3, ,4, ,2,3, ,2,4, ,3,4, ,2,3,4, (2,0,0,4,8) 1,4,5* ,2,4, ,3,4, ,2,3,4, (2,9,0,4,8) 1,4, ,2,4,5* ,3,4, ,2,3,4, (2,9,6,4,8) 1,2,3, ,2,4, ,3,4, ,2,3,4,5*
58 Table 2.10: The coverage probabltes and lengths of confdence ntervals for Nested-Error regresson model wth element level True β T Para m=15 m=20 m=25 m=30 m=40 Non Bootstrap (2,0,0,4,0) β (2.390) 0.974(2.036) 0.953(1.815) 0.933(1.652) 0.898(1.417) 0.931(1.519) β (2.175) 0.966(1.852) 0.927(1.709) 0.891(1.635) 0.823(1.528) 0.941(1.065) (2,0,0,4,8) β (2.479) 0.977(2.090) 0.954(1.854) 0.932(1.681) 0.891(1.440) 0.928(1.519) β (4.363) 0.989(3.467) 0.961(2.943) 0.931(2.637) 0.876(2.203) 0.943(1.807) β (5.526) 0.983(4.264) 0.966(3.577) 0.947(3.129) 0.894(2.565) 0.931(2.412) (2,9,0,4,8) β (2.628) 0.983(2.164) 0.957(1.897) 0.935(1.709) 0.898(1.453) 0.918(1.527) β (13.728) 0.976(10.271) 0.959(8.354) 0.943(7.216) 0.885(5.945) 0.941(4.401) β (6.272) 0.993(4.752) 0.976(3.968) 0.960(3.428) 0.914(2.750) 0.939(2.328) β (6.294) 0.986(4.687) 0.974(3.837) 0.955(3.329) 0.904(2.663) 0.939(2.467) (2,9,6,4,8) β (2.814) 0.985(2.260) 0.960(1.946) 0.947(1.744) 0.905(1.475) 0.922(1.516) β (17.305) 0.955(14.590) 0.966(12.200) 0.951(10.542) 0.919(8.559) 0.918(8.039) β (10.208) 0.969(7.838) 0.970(6.359) 0.958(5.478) 0.922(4.368) 0.925(3.925) β (6.498) 0.990(5.052) 0.976(4.200) 0.965(3.614) 0.930(2.888) 0.929(2.516) β (7.140) 0.988(5.209) 0.976(4.223) 0.960(3.618) 0.916(2.868) 0.928(2.643) 49
Table 2.11: Comparison of loss functions without (w/o) and with (w) random effects for the Nested-Error model, data size n = 160. Columns: m = 15, 20, 25, 30, 40, each with w/o and w. Rows list the candidate models (true model starred), as in Table 2.9.
Table 2.12: Coverage probabilities (and lengths) of confidence intervals for the Nested-Error regression model with different loss functions, data size n = 160. Columns: m = 15, 20, 25, each with w/o and w.

True β^T = (2,0,0,4,0):
σ_v^2: 0.982(1.635) 0.982(1.638) | 0.951(1.406) 0.951(1.409) | 0.942(1.251) 0.946(1.255)
σ_e^2: 0.996(0.775) 0.996(0.773) | 0.980(0.659) 0.975(0.658) | 0.957(0.597) 0.957(0.595)
β_1: (1.200) 0.996(1.200) | 0.987(1.030) 0.987(1.031) | 0.969(0.922) 0.971(0.922)
β_4: (1.155) 0.982(1.215) | 0.957(1.038) 0.955(1.051) | 0.930(0.948) 0.928(0.922)

True β^T = (2,0,0,4,8):
σ_v^2: (1.658) 0.977(1.659) | 0.954(1.424) 0.952(1.426) | 0.939(1.262) 0.941(1.264)
σ_e^2: (0.776) 0.984(0.775) | 0.977(0.668) 0.977(0.667) | 0.959(0.595) 0.959(0.594)
β_1: (1.207) 0.993(1.208) | 0.986(1.047) 0.986(1.046) | 0.974(0.932) 0.974(0.933)
β_4: (1.894) 0.993(1.903) | 0.974(1.539) 0.973(1.536) | 0.952(1.331) 0.951(1.321)
β_5: (2.174) 0.996(2.175) | 0.990(1.797) 0.989(1.794) | 0.978(1.575) 0.978(1.568)

True β^T = (2,9,0,4,8):
σ_v^2: (1.644) 0.979(1.646) | 0.953(1.407) 0.953(1.410) | 0.936(1.245) 0.936(1.247)
σ_e^2: (0.787) 0.981(0.785) | 0.966(0.673) 0.966(0.672) | 0.949(0.596) 0.949(0.595)
β_1: (1.216) 0.994(1.217) | 0.988(1.042) 0.987(1.042) | 0.970(0.933) 0.969(0.934)
β_2: (4.316) 0.990(4.286) | 0.981(3.725) 0.981(3.634) | 0.958(3.339) 0.958(3.231)
β_4: (2.325) 0.997(2.323) | 0.986(1.891) 0.988(1.886) | 0.973(1.601) 0.971(1.595)
β_5: (2.205) 0.997(2.203) | 0.991(1.853) 0.991(1.845) | 0.976(1.602) 0.977(1.597)

True β^T = (2,9,6,4,8):
σ_v^2: (1.685) 0.972(1.685) | 0.940(1.438) 0.940(1.438) | 0.915(1.273) 0.915(1.273)
σ_e^2: (0.800) 0.978(0.800) | 0.958(0.687) 0.958(0.687) | 0.946(0.604) 0.946(0.604)
β_1: (1.229) 0.994(1.229) | 0.981(1.050) 0.981(1.050) | 0.968(0.932) 0.968(0.932)
β_2: (7.032) 0.999(7.032) | 0.996(5.862) 0.996(5.862) | 0.978(5.130) 0.978(5.130)
β_3: (3.531) 0.997(3.531) | 0.987(2.926) 0.987(2.926) | 0.965(2.542) 0.965(2.542)
β_4: (2.470) 0.997(2.470) | 0.990(1.989) 0.990(1.989) | 0.988(1.705) 0.988(1.705)
β_5: (2.376) 0.991(2.376) | 0.984(1.950) 0.984(1.950) | 0.977(1.718) 0.977(1.718)
Continued from Table 2.12. Columns: m = 30 and m = 40, each with w/o and w.

True β^T = (2,0,0,4,0):
σ_v^2: (1.133) 0.935(1.138) | 0.876(0.982) 0.883(0.986)
σ_e^2: (0.542) 0.944(0.540) | 0.924(0.466) 0.924(0.465)
β_1: (0.833) 0.962(0.831) | 0.924(0.726) 0.919(0.726)
β_4: (0.876) 0.892(0.845) | 0.840(0.790) 0.834(0.728)

True β^T = (2,0,0,4,8):
σ_v^2: (1.142) 0.913(1.144) | 0.857(0.982) 0.866(0.986)
σ_e^2: (0.540) 0.934(0.539) | 0.891(0.464) 0.895(0.464)
β_1: (0.848) 0.956(0.847) | 0.928(0.728) 0.929(0.727)
β_4: (1.178) 0.934(1.153) | 0.893(1.002) 0.888(0.971)
β_5: (1.420) 0.956(1.412) | 0.929(1.207) 0.922(1.194)

True β^T = (2,9,0,4,8):
σ_v^2: (1.133) 0.920(1.135) | 0.866(0.978) 0.869(0.980)
σ_e^2: (0.539) 0.927(0.538) | 0.882(0.469) 0.882(0.468)
β_1: (0.847) 0.960(0.847) | 0.916(0.730) 0.916(0.730)
β_2: (3.076) 0.941(2.918) | 0.898(2.693) 0.890(2.513)
β_4: (1.415) 0.958(1.405) | 0.922(1.176) 0.923(1.167)
β_5: (1.450) 0.961(1.441) | 0.927(1.229) 0.927(1.219)

True β^T = (2,9,6,4,8):
σ_v^2: (1.152) 0.914(1.152) | 0.860(0.994) 0.860(0.994)
σ_e^2: (0.547) 0.934(0.547) | 0.902(0.471) 0.902(0.471)
β_1: (0.852) 0.947(0.852) | 0.918(0.732) 0.918(0.732)
β_2: (4.684) 0.965(4.684) | 0.924(3.961) 0.924(3.961)
β_3: (2.314) 0.950(2.314) | 0.926(1.947) 0.926(1.947)
β_4: (1.508) 0.955(1.508) | 0.914(1.254) 0.914(1.254)
β_5: (1.536) 0.947(1.536) | 0.896(1.305) 0.896(1.305)
Chapter 3

Prediction Error of Small Area Predictors Shrinking both Means and Variances

3.1 Introduction

The goal of this chapter is to introduce a methodology which develops a dual shrinkage estimation for both the small area means and variances in a unified framework. In this process, the smoothed variance estimators use information from the direct point estimators and their sampling variances, and consequently so do the smoothed small area estimators. The modeling perspective is closely related to Wang and Fuller (2003), Rivest et al. (2003), You and Chapman (2006) and Hwang and Zhao (2009).

The conditional mean squared error of prediction (CMSEP) is used to evaluate the prediction error, which is more akin to Booth and Hobert (1998). Booth and Hobert argued
strongly for CMSEP as opposed to the unconditional mean squared error of prediction (UMSEP). Recently, this technique has again been emphasized by Lohr and Rao (2009) in the context of nonlinear mixed effects models. These authors favor CMSEP particularly for non-normal models, where the posterior variance of the small area parameters depends on the area-specific responses. Although they were interested only in generalized linear mixed models, where the posterior variance depends on the area-specific responses, this property of the posterior variance presumably holds in any situation with posterior non-linearity.

A brief outline of the rest of the chapter is as follows. Section 3.2 contains the proposed model and the estimation method for the small area parameters and the structural parameters. In Section 3.3, we provide an estimation of the prediction error in terms of the conditional mean squared error of prediction. A simulation study is presented in Section 3.4, and the detailed forms of some matrix calculations are given in Appendix 3.5 of this chapter.

3.2 Model and Estimation Method

3.2.1 Model with Assumptions

As noted by Bell (2008), the previous researchers Wang and Fuller (2003) and Rivest and Vandal (2003) essentially used the direct survey estimates to estimate the parameters related to the sampling variance modeling and did not use the direct survey variance estimates even though they were available. We propose a hierarchical model that uses both the direct survey estimates and the sampling variance estimates to estimate all the parameters that determine the stochastic system. In this process we exploit mean-variance joint modeling via a hierarchical model, so that the final estimator is based on shrinking both the mean and the variance.
We would like to mention that Hwang et al. (2009) considered shrinking means and variances in the context of microarray data analysis and prescribed an important solution where they plugged a shrinkage estimator of the variance into the mean estimator. Thereby the inference regarding the mean does not take the variance estimation into account completely. Furthermore, their model does not include any covariate information.

Let (X_i, S_i^2) be the pair of direct survey estimate and its sampling variance for the i-th area, i = 1, ..., n. Let Z_i = (Z_i1, ..., Z_ip)^T be the set of covariates available at the estimation stage and β = (β_1, ..., β_p)^T be the associated regression coefficients. We propose the following hierarchical model:

X_i | θ_i, σ_i^2 ~ Normal(θ_i, σ_i^2),  θ_i ~ Normal(Z_i^T β, τ^2);  i = 1, ..., n,  (3.1)

(n_i - 1)S_i^2/σ_i^2 ~ χ^2_{n_i-1},  σ_i^2 ~ Gamma{α, γ};  i = 1, ..., n,  (3.2)

where B = (α, γ, β^T, τ^2)^T, referred to as the structural parameters, are unknown and n_i represents the sample size for a simple random sample (SRS) from the i-th area. For a complex survey design the degrees of freedom of the chi-square distribution need to be determined carefully (e.g., Maples and Huang 2009); note that the chi-square distribution for the sample variance is valid only for a random sample. Note that the σ_i^2 are the sampling variances of the X_i's and are usually estimated by the S_i^2's. You and Chapman (2006) did not consider the second level of sampling variance modeling. Thus their model can be treated as the Bayesian version
of the models considered in Rivest and Vandal (2003) and Wang and Fuller (2003). The second level of (3.2) might be further extended as Gamma{γ, exp(Z_i^T β_2)/γ} to accommodate covariate information in the variance modeling.

The inference can be made from the conditional distribution of θ_i (the parameter of interest) given the data (X_i, S_i^2, Z_i), i = 1, ..., n. However, this distribution does not have a closed form expression. We adopt rejection sampling to overcome this situation.

3.2.2 Estimation of the Small Area Parameters

If the parameters B are known, the joint distribution of {X_i, S_i^2, θ_i, σ_i^2} is

π(X_i, S_i^2, θ_i, σ_i^2 | B)
 = (2πσ_i^2)^{-1/2} exp{-(X_i - θ_i)^2/(2σ_i^2)}
 × [Γ{(n_i-1)/2} 2^{(n_i-1)/2}]^{-1} {(n_i-1)S_i^2/σ_i^2}^{(n_i-1)/2 - 1} exp{-(n_i-1)S_i^2/(2σ_i^2)} (n_i-1)/σ_i^2
 × (2πτ^2)^{-1/2} exp{-(θ_i - Z_i^T β)^2/(2τ^2)}
 × {Γ(α)γ^α}^{-1} (1/σ_i^2)^{α+1} exp{-1/(γσ_i^2)}
 ∝ exp{ -(X_i - θ_i)^2/(2σ_i^2) - (n_i-1)S_i^2/(2σ_i^2) - (θ_i - Z_i^T β)^2/(2τ^2) - 1/(γσ_i^2) } (1/σ_i^2)^{n_i/2+α+1} (τ^2)^{-1/2} {Γ(α)γ^α}^{-1},  (3.3)

where the omitted factor depends only on the data.
Therefore the conditional distributions of σ_i^2 and θ_i given the data (X_i, S_i^2), i = 1, ..., n, and B are

π(σ_i^2 | X_i, S_i^2, B) ∝ exp[ -(X_i - Z_i^T β)^2/{2(σ_i^2 + τ^2)} - {(n_i-1)S_i^2/2 + γ^{-1}}(1/σ_i^2) ] (σ_i^2)^{-{(n_i-1)/2+α+1}} (σ_i^2 + τ^2)^{-1/2},  (3.4)

π(θ_i | X_i, S_i^2, B) ∝ exp{ -(θ_i - Z_i^T β)^2/(2τ^2) } ψ_i^{-(n_i/2+α)},  (3.5)

where

ψ_i = 0.5(X_i - θ_i)^2 + 0.5(n_i - 1)S_i^2 + 1/γ.  (3.6)

Note that the above conditionals are obtained by integrating out θ_i and σ_i^2, respectively, from the joint distribution (3.3). From now on we will borrow the notation of Booth and Hobert (1998) for all the related stochastic distributions. In this context, a meaningful point estimator for θ_i is its conditional mean,

θ~_i(B; X_i, S_i^2) = E_B(θ_i | X_i, S_i^2),  (3.7)

where E_B represents the expectation with respect to the conditional distribution of θ_i with known B. A sensible prediction error can be measured by the posterior variance

ν_i(B; X_i, S_i^2) = var_B(θ_i | X_i, S_i^2).  (3.8)
Neither (3.7) nor (3.8) has a closed form expression. Therefore, we compute them numerically using the following approach.

1. Generate R random numbers from π(θ_i | X_i, S_i^2, B) with the rejection sampling method:
1a. Generate a candidate random sample θ_i^(c) from Normal(Z_i^T β, τ^2);
1b. Generate a uniform random number U;
1c. If

U < [ 1 + {(X_i - θ_i^(c))/G_x}^2/(n_i + 2α + 1) ]^{-(n_i/2+α)},

where G_x^2 = {(n_i - 1)S_i^2 + 2/γ}/(n_i + 2α + 1), then accept the candidate random sample as θ_i^(r); otherwise go back to 1a;
1d. Repeat 1a to 1c until R samples are accepted.

2. Approximate θ~_i and ν_i by

θ~_i = (1/R) Σ_{r=1}^R θ_i^(r)

and

ν_i = (1/R) Σ_{r=1}^R (θ_i^(r) - θ~_i)^2.

In practice B is unknown. We estimate it by maximizing the marginal likelihood; the details are given in the next section. Let B^ denote the corresponding estimator. Substituting B^ for B in formulas (3.7) and (3.8) produces the estimates of θ_i and ν_i:

θ^_i = θ~_i(B^; X_i, S_i^2) = E_{B^}(θ_i | X_i, S_i^2),  (3.9)

ν^_i = ν_i(B^; X_i, S_i^2) = var_{B^}(θ_i | X_i, S_i^2).  (3.10)
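Steps 1a-1d and the Monte Carlo approximations in step 2 translate almost directly into code. The following is a minimal sketch for a single area under the model above; function and variable names are illustrative, not from the dissertation.

```python
import numpy as np

def sample_theta(x, s2, n_i, alpha, gamma, zb, tau2, R, rng):
    """Rejection sampler (steps 1a-1d) for pi(theta_i | X_i, S_i^2, B).

    Proposal: Normal(Z_i'beta, tau^2); the acceptance ratio is the
    t-type kernel of step 1c, which is exactly psi^{-(n_i/2+alpha)}
    rescaled to its maximum, so accepted draws follow the target.
    """
    gx2 = ((n_i - 1.0) * s2 + 2.0 / gamma) / (n_i + 2.0 * alpha + 1.0)  # G_x^2
    draws = np.empty(R)
    k = 0
    while k < R:
        cand = rng.normal(zb, np.sqrt(tau2))   # 1a: candidate draw
        u = rng.uniform()                      # 1b: uniform random number
        ratio = (1.0 + (x - cand) ** 2
                 / ((n_i + 2.0 * alpha + 1.0) * gx2)) ** (-(n_i / 2.0 + alpha))
        if u < ratio:                          # 1c: accept or go back to 1a
            draws[k] = cand
            k += 1
    return draws

def theta_tilde_and_nu(draws):
    """Step 2: Monte Carlo approximations to (3.7) and (3.8)."""
    return draws.mean(), draws.var()
```

Because the proposal shares the normal factor of the target, acceptance rates stay high whenever X_i is not far from Z_i'beta relative to G_x.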
The estimator (3.9) is popularly known as the empirical Bayes estimator, and (3.10) is the estimated Bayes risk. It is well known that ν^_i only accounts for the variability in the prediction procedure, but not the variability due to the parameter estimation (B^). We account for this additional variability by adapting the technique of Booth and Hobert (1998) to this setup. The details are discussed in Section 3.3.

3.2.3 Estimation of the Structural Parameters

In practice, B is unknown. We obtain the maximum likelihood estimate of B by maximizing the marginal likelihood L_M = Π_{i=1}^n L_Mi of {(X_i, S_i^2, Z_i)_{i=1}^n; B}, where

L_Mi ∝ [ Γ(n_i/2 + α) / {√(τ^2) Γ(α) γ^α} ] ∫ exp{ -(θ_i - Z_i^T β)^2/(2τ^2) } ψ_i^{-(n_i/2+α)} dθ_i.  (3.11)

Therefore, the log-likelihood is

log L_M = constant + Σ_{i=1}^n [ log Γ(n_i/2 + α) - log Γ(α) - (1/2) log τ^2 - α log γ + log ∫ exp{ -(θ_i - Z_i^T β)^2/(2τ^2) } ψ_i^{-(n_i/2+α)} dθ_i ].

The maximum likelihood estimating equation is ∂ log L_M/∂B = 0, where L_M = Π_{i=1}^n L_Mi is the marginal likelihood.
The detailed expression of the partial derivative corresponding to β is

∂ log L_M/∂β = Σ_{i=1}^n [ ∫ {Z_i(θ_i - Z_i^T β)/τ^2} exp{ -(θ_i - Z_i^T β)^2/(2τ^2) } ψ_i^{-(n_i/2+α)} dθ_i / ∫ exp{ -(θ_i - Z_i^T β)^2/(2τ^2) } ψ_i^{-(n_i/2+α)} dθ_i ]
 = Σ_{i=1}^n E{ Z_i(θ_i - Z_i^T β)/τ^2 },  (3.12)

where the expectation corresponds to the conditional distribution of θ_i, π(θ_i | X_i, S_i^2, B). The estimate of β is obtained by solving ∂ log L_M/∂β = 0, and we obtain

β^ = ( Σ_{i=1}^n Z_i Z_i^T )^{-1} { Σ_{i=1}^n Z_i E(θ_i) }.  (3.13)

The expression of the partial derivative corresponding to τ^2 is

∂ log L_M/∂τ^2 = -n/(2τ^2) + Σ_{i=1}^n E{ (θ_i - Z_i^T β)^2/(2(τ^2)^2) }.  (3.14)

Then the estimate of τ^2 is obtained by solving ∂ log L_M/∂τ^2 = 0, and we obtain

τ^2^ = Σ_{i=1}^n E(θ_i - Z_i^T β)^2 / n.  (3.15)
Similarly, we estimate α and γ by solving S_α = 0 and S_γ = 0, where

S_α = ∂ log L_M/∂α = Σ_{i=1}^n (∂/∂α) log Γ(n_i/2 + α) - n (∂/∂α) log Γ(α) - n log γ - Σ_{i=1}^n E{log ψ_i},  (3.16)

S_γ = ∂ log L_M/∂γ = -nα/γ + (1/γ^2) Σ_{i=1}^n (n_i/2 + α) E(1/ψ_i).  (3.17)

To solve equations (3.13), (3.15), (3.16) and (3.17), we use the EM algorithm. At the E-step of the t-th iteration, we calculate the expectations with respect to the conditional density of θ_i, π(θ_i | X_i, S_i^2, B^(t-1)), i = 1, ..., n, with B^(t-1) denoting the estimate of B at the (t-1)-th iteration. The rejection sampling method is used to generate samples from the conditional distributions of the θ_i's. At the M-step of the t-th iteration, we maximize L_M by solving equations (3.13), (3.15), (3.16) and (3.17) conditional on the values of the expectations obtained in the E-step. To solve these equations, we also need

S_αα = Σ_{i=1}^n [ (∂^2/∂α^2) log Γ(n_i/2 + α) - (∂^2/∂α^2) log Γ(α) + Var{log ψ_i} ],
S_αγ = Σ_{i=1}^n [ -1/γ + (1/γ^2) E(1/ψ_i) - (n_i/2 + α)(1/γ^2) Cov{1/ψ_i, log ψ_i} ],
S_γα = S_αγ,
S_γγ = Σ_{i=1}^n [ α/γ^2 - (n_i + 2α)(1/γ^3) E(1/ψ_i) + (n_i/2 + α)(1/γ^4) E(1/ψ_i^2) + (n_i/2 + α)^2 (1/γ^4) Var(1/ψ_i) ].  (3.18)
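The expectations E{log ψ_i} and E(1/ψ_i) appearing in (3.16)-(3.18) are evaluated by Monte Carlo over the posterior draws of θ_i. A minimal sketch of the score evaluation follows; names are illustrative, and the digamma function is approximated from math.lgamma to keep the sketch dependency-free.

```python
import math
import numpy as np

def digamma(x, h=1e-5):
    # derivative of log Gamma, by central difference (adequate for a sketch)
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def scores_alpha_gamma(theta_draws, X, S2, n, alpha, gamma):
    """Monte Carlo versions of S_alpha (3.16) and S_gamma (3.17).

    theta_draws[i] holds draws from pi(theta_i | X_i, S_i^2, B).
    """
    s_a = s_g = 0.0
    for i, th in enumerate(theta_draws):
        # psi_i from (3.6), evaluated at each posterior draw
        psi = 0.5 * (X[i] - th) ** 2 + 0.5 * (n[i] - 1) * S2[i] + 1.0 / gamma
        s_a += (digamma(n[i] / 2.0 + alpha) - digamma(alpha)
                - math.log(gamma) - np.mean(np.log(psi)))
        s_g += -alpha / gamma + (n[i] / 2.0 + alpha) * np.mean(1.0 / psi) / gamma ** 2
    return s_a, s_g
```

The same draws also give the variance and covariance terms in (3.18), so one E-step sample serves the whole M-step update.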
Then α and γ can be estimated by the iterative Newton-Raphson method:

(α, γ)^{(l)T} = (α, γ)^{(l-1)T} - [ S_αα  S_αγ ; S_γα  S_γγ ]^{-1} (S_α, S_γ)^T,

with the right-hand side evaluated at (α, γ)^{(l-1)}.

3.3 Prediction Error Calculation

3.3.1 Mean Squared Error of Prediction

Following the definition of the conditional mean squared error of prediction (CMSEP) of Booth and Hobert (1998), the prediction error variance is

CMSEP_i(B; X_i, S_i^2) = E_B[ {θ^_i - θ_i}^2 | X_i, S_i^2 ],

where θ^_i and θ~_i(B; X_i, S_i^2) are as defined in (3.9) and (3.7). Since θ~_i(B; X_i, S_i^2) - θ_i and θ^_i - θ~_i(B; X_i, S_i^2) are conditionally independent given X_i and S_i^2,

CMSEP_i(B; X_i, S_i^2) = var_B(θ_i | X_i, S_i^2) + E_B[ {θ^_i - θ~_i(B; X_i, S_i^2)}^2 | X_i, S_i^2 ]
 = ν_i(B; X_i, S_i^2) + c_i(B; X_i, S_i^2),  (3.19)

where c_i(B; X_i, S_i^2) is the correction term due to the estimation of the unknown parameters B. The correction contribution is of order O_p(n^{-1}). Note that the above measure is still not usable because it involves the unknown structural parameters B. It is natural to plug in the
estimate B^ of B and get a usable measure of the mean squared error of prediction:

CMSEP^_i = MSE_i(B^; X_i, S_i^2) = ν_i(B^; X_i, S_i^2) + c_i(B^; X_i, S_i^2) = ν^_i + c^_i.

As will be clear in the next section (as well as from the small area estimation literature), this estimator of (3.19) has considerable bias. Typically the order of the bias is O_p(n^{-1}), due to the estimation of ν_i by ν^_i.

3.3.2 Bias Correction for ν_i(B^; X_i, S_i^2)

For the bias correction of the estimated conditional variance ν_i(B^; X_i, S_i^2) we expand it about B:

ν^_i = ν_i(B^; X_i, S_i^2) = ν_i(B; X_i, S_i^2) + (B^ - B)^T ∂ν_i(B; X_i, S_i^2)/∂B + (1/2)(B^ - B)^T {∂^2 ν_i(B; X_i, S_i^2)/∂B∂B^T} (B^ - B) + O_p(n^{-2}).  (3.20)

Then the approximate bias involved in ν_i(B^; X_i, S_i^2) is

E(B^ - B)^T ∂ν_i(B; X_i, S_i^2)/∂B + (1/2) tr{ ∂^2 ν_i(B; X_i, S_i^2)/∂B∂B^T I^{-1}(B) },

where I(B) is the Fisher information matrix obtained from the marginal likelihood L_M,

I_sr = -E[ ∂^2 log L_M / (∂B_s ∂B_r) ].
The second order derivatives of the log-likelihood are given in (3.18) and in the appendix, and can be used to compute the information matrix. Handling the bias analytically is difficult. Booth and Hobert (1998) adopted a bootstrap bias correction instead, owing to the fact that no closed form expressions are available for these bias terms. The bootstrap method, however, requires repeated estimation of the model parameters based on re-sampled data, which often poses practical and computational difficulties in this hierarchical model. As we will see in the next subsection, handling the second term in (3.19) is difficult for the same reason of not having any closed form expression. Thus, if we have to do the bootstrap for the bias correction of ν_i, the estimation of c_i can be done in the same run, and it is not necessary to obtain any analytical approximations. Resampling techniques have been used in Jiang et al. (2002) and Hall and Maiti (2006). We will be applying the rejection sampling method instead, where there is no need to estimate the model parameters repeatedly.

Let B^ be the maximum likelihood estimator of B as proposed in the previous section. Following Cox and Snell (1968, Equation 20), we approximate E(B^ - B) up to O(n^{-1}). Define I^{-1} = (I^{rs}) as the inverse of I = (I_rs), where I_rs = E(-V^{(i)}_rs) and V^{(i)}_rs = ∂^2 log L_Mi/(∂B_r ∂B_s). The bias in the s-th element of B^ is

E(B^_s - B_s) ≈ (1/2) Σ_r Σ_t Σ_u I^{rs} I^{tu} (K_rtu + 2J_t,ru),  (3.21)

where K_rst = E(W^{(i)}_rst), J_r,st = E{U^{(i)}_r V^{(i)}_st}, W^{(i)}_rst = ∂^3 log L_Mi/(∂B_r ∂B_s ∂B_t), and U^{(i)}_r = ∂ log L_Mi/∂B_r, where L_Mi is the marginal likelihood of (X_i, S_i^2) defined in the previous section. The
detailed formulas are given in the appendix. With

(∂ν_i/∂B)^T = { ∂ν_i/∂α, ∂ν_i/∂γ, ∂ν_i/∂β, ∂ν_i/∂τ^2 }  (3.22)

and ν_i(B; X_i, S_i^2) = var_B(θ_i | X_i, S_i^2), we have

∂ν_i/∂α = -Cov{θ_i^2, log ψ_i} + 2E(θ_i) Cov{θ_i, log ψ_i},
∂ν_i/∂γ = (n_i/2 + α)(1/γ^2){ Cov(θ_i^2, 1/ψ_i) - 2E(θ_i) Cov(θ_i, 1/ψ_i) },
∂ν_i/∂β = (1/τ^2){ Cov(θ_i^2, θ_i) - 2E(θ_i) Cov(θ_i, θ_i) },
∂ν_i/∂τ^2 = (1/{2(τ^2)^2}){ Cov{θ_i^2, (θ_i - β)^2} - 2E(θ_i) Cov{θ_i, (θ_i - β)^2} },

where the expectations, variances and covariances are calculated with respect to the conditional distribution of θ_i at the estimated parameter values. The approximate expression of ∂^2 ν_i(B; X_i, S_i^2)/∂B∂B^T is given in the appendix. The expectations, variances and covariances are computed based on the Monte Carlo approximation; for example, E^(θ_i) = Σ_{r=1}^R θ_i^(r)/R, where (θ_i^(1), ..., θ_i^(R)) are random numbers generated from the conditional distribution of θ_i. Therefore, the bias-corrected estimator is

ν^_i ≈ ν_i(B^; X_i, S_i^2) - (B^ - B)^T ∂ν_i/∂B - (1/2) tr{ ∂^2 ν_i(B; X_i, S_i^2)/∂B∂B^T I^{-1}(B) },

with E(B^ - B) approximated by (3.21). This expression is second order correct, meaning that the bias is of order o_p(n^{-1}).
3.3.3 Approximation of c_i(B; X_i, S_i^2)

The definition of c_i(B; X_i, S_i^2) was given in the previous section as c_i(B; X_i, S_i^2) = E_B[ {θ^_i - θ~_i(B; X_i, S_i^2)}^2 | X_i, S_i^2 ], where θ^_i - θ~_i(B; X_i, S_i^2) = θ~_i(B^; X_i, S_i^2) - θ~_i(B; X_i, S_i^2). Using a Taylor series expansion and ignoring terms of order O_p(||B^ - B||^2), we write

θ~_i(B^; X_i, S_i^2) - θ~_i(B; X_i, S_i^2) = A_i^T(B; X_i, S_i^2)(B^ - B),  (3.23)

where

A_i^T(B; X_i, S_i^2) = ∂θ~_i(B; X_i, S_i^2)/∂B = ( ∂θ~_i/∂α, ∂θ~_i/∂γ, ∂θ~_i/∂β, ∂θ~_i/∂τ^2 ).

Since θ~_i(B; X_i, S_i^2) = E_B(θ_i | X_i, S_i^2), the components of A_i are

∂θ~_i/∂α = E(θ_i) E{log ψ_i} - E{θ_i log ψ_i} = -Cov{θ_i, log ψ_i},
∂θ~_i/∂γ = (n_i/2 + α)(1/γ^2){ E(θ_i/ψ_i) - E(θ_i) E(1/ψ_i) } = (n_i/2 + α)(1/γ^2) Cov(θ_i, 1/ψ_i),
∂θ~_i/∂β = (1/τ^2)[ E(θ_i^2) - {E(θ_i)}^2 ] = (1/τ^2) var(θ_i),
∂θ~_i/∂τ^2 = (1/{2(τ^2)^2})[ E{θ_i(θ_i - β)^2} - E(θ_i) E(θ_i - β)^2 ] = (1/{2(τ^2)^2}) Cov{θ_i, (θ_i - β)^2}.

As a consequence of (3.23),

c_i(B; X_i, S_i^2) ≈ A_i^T(B; X_i, S_i^2) I^{-1}(B) A_i(B; X_i, S_i^2),  (3.24)
and the approximation is correct up to O_p(n^{-1}). Substituting B^ into formula (3.24) yields an estimate of the correction term,

c_i(B^; X_i, S_i^2) ≈ A_i^T(B^; X_i, S_i^2) I^{-1}(B^) A_i(B^; X_i, S_i^2).

As the estimated information matrix satisfies {I(B^)}^{-1} = O_p(n^{-1}), the error in this approximation is o_p(n^{-1}). Similar to the case of ν_i, the items in A_i(B^; X_i, S_i^2) do not have closed forms; they can be approximated by the samples generated in the estimation procedure. Summing up all the derivations and approximations, we obtain the following result.

Theorem. The estimated conditional mean squared error of prediction for θ^_i is

CMSEP^_i = ν_i(B^; X_i, S_i^2) - (B^ - B)^T ∂ν_i/∂B - (1/2) tr{ ∂^2 ν_i(B; X_i, S_i^2)/∂B∂B^T I^{-1}(B) } + A_i^T(B^; X_i, S_i^2) I(B^)^{-1} A_i(B^; X_i, S_i^2).  (3.25)

The formula is second order correct in the sense that it has a bias of O_p(n^{-2}).

3.4 Simulation Study

Simulation design. To check the finite sample performance of the proposed estimators, a simulation setup closely related to Wang and Fuller (2003) was considered. To simplify the simulation, we do not choose any covariate Z_i; only (X_i, S_i^2) are generated. First, generate observations for each small area using the model

X_ij = β + u_i + e_ij,  j = 1, ..., n_i,  i = 1, ..., n,
where u_i ~ Normal(0, τ^2) and e_ij ~ Normal(0, n_i σ_i^2). Then the random effects model for the small area mean is

X_i = β + u_i + e_i,  i = 1, ..., n,  (3.26)

where X_i = X̄_i. = n_i^{-1} Σ_{j=1}^{n_i} X_ij and e_i = ē_i. = n_i^{-1} Σ_{j=1}^{n_i} e_ij. Therefore, X_i ~ Normal(θ_i, σ_i^2), where θ_i = β + u_i, θ_i ~ Normal(β, τ^2), and e_i ~ Normal(0, σ_i^2). We estimate the mean of each small area, θ_i, i = 1, ..., n. We estimate σ_i^2 with the unbiased estimator

S_i^2 = {n_i(n_i - 1)}^{-1} Σ_{j=1}^{n_i} (X_ij - X̄_i.)^2.  (3.27)

It is to be noted that (n_i - 1)S_i^2/σ_i^2 ~ χ^2_{n_i-1}. Like Wang and Fuller (2003), we set all (n_i - 1) equal to m, which eased our programming efforts. However, the sampling variances were still unequal: one-third of the σ_i^2 were set equal to 1, one-third equal to 4, and one-third equal to 16. In the simulation, we set β = 10 and took three different values of τ^2: 0.5, 1 and 4. For each value of τ^2, we generated 1,000 samples for each of the combinations (m, n) = (9, 36) and (18, 180). In Table 3.1 we present the mean, the empirical standard deviation and the estimated standard deviation of the estimates of β and τ^2. The results are consistent with the large sample theory of maximum likelihood estimation.

Method of analysis. We also compute the estimates based on the Wang and Fuller approach. Table 3.2 and Table 3.3 provide the numerical results comparing the two methods: (I) the proposed method and (II) Wang and Fuller (2003). All results are averaged over areas within the group having the same sampling variance. We calculated the bias and prediction mean squared error of θ^_i based on 1,000 replications. We used empirical measures
Table 3.1: Results of the simulation study; we present the estimate (Est.) and the empirical standard deviation (SD) for β and τ^2. We set β = 10. Rows: τ^2 = 0.5, 1, 4 under (n, m) = (36, 9) and (180, 18); columns: Proposed Method and Wang and Fuller (2003), each with Est. and SD, reported separately for β and for τ^2.

of relative bias and coefficient of variation to quantify the performances of the different MSE estimators. The relative bias of the MSE estimator was defined by

RB_i = [ E{MSE^_i} - MSE_i ] / MSE_i  (3.28)

for i = 1, ..., 3, where E{MSE^_i} was estimated empirically as the average of the values of MSE^_i over the replications, and MSE_i was defined as the average value of (θ^_i - θ_i)^2. The coefficient of variation of the MSE estimator was taken to be

CV_i = [ E{MSE^_i - MSE_i}^2 ]^{1/2} / MSE_i  (3.29)

for i = 1, ..., 3.
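The empirical measures (3.28) and (3.29) are simple functions of the replicated estimates; a sketch of how they might be computed (array names are illustrative):

```python
import numpy as np

def rb_and_cv(mse_hat, theta_hat, theta_true):
    """Relative bias (3.28) and coefficient of variation (3.29), per column.

    All inputs are (replications x groups) arrays; MSE_i is the empirical
    average of (theta_hat - theta_true)^2 over the replications.
    """
    mse = np.mean((theta_hat - theta_true) ** 2, axis=0)
    rb = (np.mean(mse_hat, axis=0) - mse) / mse
    cv = np.sqrt(np.mean((mse_hat - mse) ** 2, axis=0)) / mse
    return rb, cv
```

With the columns taken as the three sampling-variance groups, this reproduces the quantities reported in Tables 3.2 and 3.3.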
Table 3.2: Results of the MSE estimators, n = 36, m = 9. Columns: τ^2 = 0.5, 1, 4, each for method I (proposed) and method II (Wang and Fuller, 2003); rows: sampling variance group σ_i^2, with Bias, MSE, RB and CV.

Table 3.3: Results of the MSE estimators, n = 180, m = 18. Same layout as Table 3.2.
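For reference, one replicate of the data-generating design described above (equal n_i - 1 = m, σ_i^2 in thirds of 1, 4 and 16, and the unbiased variance estimator (3.27)) can be sketched as follows; names are illustrative:

```python
import numpy as np

def simulate(m, n, beta=10.0, tau2=1.0, rng=None):
    """One replicate of the Wang-Fuller-style design: n areas with
    n_i = m + 1 observations each."""
    rng = np.random.default_rng() if rng is None else rng
    ni = m + 1
    sig2 = np.repeat([1.0, 4.0, 16.0], n // 3)
    theta = beta + rng.normal(0.0, np.sqrt(tau2), n)       # theta_i = beta + u_i
    # e_ij ~ N(0, n_i * sigma_i^2), so the area mean has variance sigma_i^2
    y = theta[:, None] + rng.normal(0.0, np.sqrt(ni * sig2)[:, None], (n, ni))
    X = y.mean(axis=1)
    S2 = y.var(axis=1, ddof=1) / ni    # unbiased for sigma_i^2, cf. (3.27)
    return X, S2, theta, sig2
```

Repeating this 1,000 times and applying each estimator to (X, S2) reproduces the comparison summarized in the tables.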
Simulation results. The proposed method has considerably lower prediction mean squared errors. The gain is largest when the ratio of sampling variance to model variance is largest; the reduction is almost half when the number of small areas is 36. The relative bias is always smaller than that of Wang and Fuller (2003), except in the case τ^2 = 4, σ_i^2 = 1 (high model variance, low sampling variance). The proposed method may show slight underestimation (about 5%) in the case n = 36. On the other hand, the Wang-Fuller method can show very large overestimation. In terms of the coefficient of variation, the proposed method outperforms Wang and Fuller (2003).

3.5 Appendix: Matrix Calculation Results

3.5.1 Computation of E(B^ - B)

In this section of the appendix, the detailed expressions for equation (3.21) are given. The log-likelihood was already given in Section 3.2.3. The first order derivative terms, U_r^{(i)}, are given in formulas (3.16), (3.17), (3.12) and (3.14) of Section 3.2.3. For the second order derivatives, V^{(i)}_αα, V^{(i)}_γγ and V^{(i)}_αγ are given in equation (3.18) as the summands of S_αα, S_γγ and S_αγ. The other terms are

V^{(i)}_ββ = -1/τ^2 + (1/(τ^2)^2) Var(θ_i - β),
V^{(i)}_τ^2τ^2 = 1/{2(τ^2)^2} - (1/(τ^2)^3) E(θ_i - β)^2 + (1/{4(τ^2)^4}) Var{(θ_i - β)^2}.
The other cross terms of the second order derivatives are

V^{(i)}_αβ = -(1/τ^2) Cov{log ψ_i, θ_i - β},
V^{(i)}_ατ^2 = -(1/{2(τ^2)^2}) Cov{log ψ_i, (θ_i - β)^2},
V^{(i)}_γβ = (n_i/2 + α)(1/(γ^2 τ^2)) Cov(1/ψ_i, θ_i - β),
V^{(i)}_γτ^2 = (n_i/2 + α)(1/{2γ^2(τ^2)^2}) Cov{1/ψ_i, (θ_i - β)^2},
V^{(i)}_βτ^2 = -(1/(τ^2)^2) E(θ_i - β) + (1/{2(τ^2)^3}) Cov{θ_i - β, (θ_i - β)^2}.

The third derivative terms are:

W^{(i)}_ααα = (∂^3/∂α^3) log Γ(n_i/2 + α) - (∂^3/∂α^3) log Γ(α) - Cov{log ψ_i, log^2 ψ_i} + 2E(log ψ_i) Var{log ψ_i},

W^{(i)}_γγγ = -2α/γ^3 + (n_i/2 + α){ (6/γ^4) E(1/ψ_i) - (6/γ^5) E(1/ψ_i^2) + (2/γ^6) E(1/ψ_i^3) } - (n_i/2 + α)^2{ (6/γ^5) Var(1/ψ_i) - (3/γ^6) Cov(1/ψ_i, 1/ψ_i^2) } + (n_i/2 + α)^3 (1/γ^6){ Cov(1/ψ_i, 1/ψ_i^2) - 2E(1/ψ_i) Var(1/ψ_i) },

W^{(i)}_βββ = (1/(τ^2)^3)[ Cov{θ_i - β, (θ_i - β)^2} - 2E(θ_i - β) Var(θ_i - β) ],

W^{(i)}_τ^2τ^2τ^2 = -1/(τ^2)^3 + (3/(τ^2)^4) E(θ_i - β)^2 - (3/{2(τ^2)^5}) Var{(θ_i - β)^2} + (1/{8(τ^2)^6})[ Cov{(θ_i - β)^4, (θ_i - β)^2} - 2E(θ_i - β)^2 Var{(θ_i - β)^2} ].
The other cross terms of the third order derivatives are

W^{(i)}_ααγ = (n_i/2 + α)(1/γ^2)[ Cov{log^2 ψ_i, 1/ψ_i} - 2E(log ψ_i) Cov{log ψ_i, 1/ψ_i} ] - (2/γ^2) Cov{log ψ_i, 1/ψ_i},

W^{(i)}_ααβ = (1/τ^2)[ Cov{log^2 ψ_i, θ_i - β} - 2E(log ψ_i) Cov{log ψ_i, θ_i - β} ],

W^{(i)}_αατ^2 = (1/{2(τ^2)^2})[ Cov{log^2 ψ_i, (θ_i - β)^2} - 2E(log ψ_i) Cov{log ψ_i, (θ_i - β)^2} ],

W^{(i)}_αγγ = 1/γ^2 - (2/γ^3) E(1/ψ_i) + (1/γ^4) E(1/ψ_i^2) + (n_i/2 + α)(1/γ^4)[ 2Var(1/ψ_i) - Cov{log ψ_i, 1/ψ_i^2} ] + (n_i/2 + α)(2/γ^3) Cov{log ψ_i, 1/ψ_i} - (n_i/2 + α)^2 (1/γ^4)[ Cov{(log ψ_i)/ψ_i, 1/ψ_i} - E(log ψ_i) Var(1/ψ_i) - Cov{log ψ_i, 1/ψ_i} E(1/ψ_i) ],

W^{(i)}_αγβ = (1/(γ^2 τ^2)) Cov(1/ψ_i, θ_i - β) - (n_i/2 + α)(1/(γ^2 τ^2))[ Cov{(log ψ_i)/ψ_i, θ_i - β} - Cov{log ψ_i, θ_i - β} E(1/ψ_i) - E(log ψ_i) Cov(1/ψ_i, θ_i - β) ],

W^{(i)}_αγτ^2 = (1/{2γ^2(τ^2)^2}) Cov{1/ψ_i, (θ_i - β)^2} - ((n_i/2 + α)/{2γ^2(τ^2)^2})[ Cov{(log ψ_i)/ψ_i, (θ_i - β)^2} - Cov{log ψ_i, (θ_i - β)^2} E(1/ψ_i) - E(log ψ_i) Cov{1/ψ_i, (θ_i - β)^2} ],

W^{(i)}_αββ = -(1/(τ^2)^2)[ Cov{log ψ_i, (θ_i - β)^2} - 2E(θ_i - β) Cov{log ψ_i, θ_i - β} ],

W^{(i)}_αβτ^2 = (1/(τ^2)^2) Cov{log ψ_i, θ_i - β} - (1/{2(τ^2)^3})[ Cov{(log ψ_i)(θ_i - β), (θ_i - β)^2} - Cov{log ψ_i, (θ_i - β)^2} E(θ_i - β) - E(log ψ_i) Cov{θ_i - β, (θ_i - β)^2} ],

W^{(i)}_ατ^2τ^2 = (1/(τ^2)^3) Cov{log ψ_i, (θ_i - β)^2} - (1/{4(τ^2)^4})[ Cov{(log ψ_i)(θ_i - β)^2, (θ_i - β)^2} - Cov{log ψ_i, (θ_i - β)^2} E(θ_i - β)^2 - E(log ψ_i) Var{(θ_i - β)^2} ],

W^{(i)}_γγβ = -(n_i/2 + α)(2/(γ^3 τ^2)) Cov(θ_i - β, 1/ψ_i) + (n_i/2 + α)(1/(γ^4 τ^2)) Cov(θ_i - β, 1/ψ_i^2) + (n_i/2 + α)^2 (1/(γ^4 τ^2))[ Cov(θ_i - β, 1/ψ_i^2) - 2E(1/ψ_i) Cov(θ_i - β, 1/ψ_i) ],

W^{(i)}_γγτ^2 = -((n_i/2 + α)/(γ^3(τ^2)^2)) Cov{(θ_i - β)^2, 1/ψ_i} + ((n_i/2 + α)/{2γ^4(τ^2)^2}) Cov{(θ_i - β)^2, 1/ψ_i^2} + ((n_i/2 + α)^2/{2γ^4(τ^2)^2})[ Cov{(θ_i - β)^2, 1/ψ_i^2} - 2E(1/ψ_i) Cov{(θ_i - β)^2, 1/ψ_i} ],

W^{(i)}_γββ = (n_i/2 + α)(1/(γ^2(τ^2)^2))[ Cov{(θ_i - β)/ψ_i, θ_i - β} - Cov(1/ψ_i, θ_i - β) E(θ_i - β) - E(1/ψ_i) Var(θ_i - β) ],

W^{(i)}_γβτ^2 = -((n_i/2 + α)/(γ^2(τ^2)^2)) Cov(1/ψ_i, θ_i - β) + ((n_i/2 + α)/{2γ^2(τ^2)^3})[ Cov{(θ_i - β)/ψ_i, (θ_i - β)^2} - Cov{1/ψ_i, (θ_i - β)^2} E(θ_i - β) - E(1/ψ_i) Cov{θ_i - β, (θ_i - β)^2} ],

W^{(i)}_γτ^2τ^2 = -((n_i/2 + α)/(γ^2(τ^2)^3)) Cov{1/ψ_i, (θ_i - β)^2} + ((n_i/2 + α)/{4γ^2(τ^2)^4})[ Cov{(θ_i - β)^2/ψ_i, (θ_i - β)^2} - Cov{1/ψ_i, (θ_i - β)^2} E(θ_i - β)^2 - E(1/ψ_i) Var{(θ_i - β)^2} ],

W^{(i)}_ββτ^2 = 1/(τ^2)^2 - (2/(τ^2)^3) Var(θ_i - β) + (1/{2(τ^2)^4})[ Var{(θ_i - β)^2} - 2E(θ_i - β) Cov{(θ_i - β)^2, θ_i - β} ],

W^{(i)}_βτ^2τ^2 = (2/(τ^2)^3) E(θ_i - β) - (2/(τ^2)^4) Cov{(θ_i - β)^2, θ_i - β} + (1/{4(τ^2)^5})[ Cov{(θ_i - β)^3, (θ_i - β)^2} - Cov{θ_i - β, (θ_i - β)^2} E(θ_i - β)^2 - E(θ_i - β) Var{(θ_i - β)^2} ].

All the other terms are equal to the above values by symmetry.
3.5.2 Second Order Correction of ν_i

Equation (3.20) involves the second order derivatives of ν_i. The detailed formulas are given below; throughout, the expectations, variances and covariances are taken with respect to the conditional distribution π(θ_i | X_i, S_i^2, B).

∂^2 ν_i/∂α^2 = -(∂/∂α) Cov(θ_i^2, log ψ_i) + 2{(∂/∂α) E(θ_i)} Cov(θ_i, log ψ_i) + 2E(θ_i)(∂/∂α) Cov(θ_i, log ψ_i), with
(∂/∂α) Cov(θ_i^2, log ψ_i) = -Cov(θ_i^2 log ψ_i, log ψ_i) + Cov(θ_i^2, log ψ_i) E(log ψ_i) + E(θ_i^2) Var(log ψ_i),
(∂/∂α) E(θ_i) = -Cov(θ_i, log ψ_i),
(∂/∂α) Cov(θ_i, log ψ_i) = -Cov(θ_i log ψ_i, log ψ_i) + Cov(θ_i, log ψ_i) E(log ψ_i) + E(θ_i) Var(log ψ_i).

∂^2 ν_i/∂α∂γ = -(∂/∂γ) Cov(θ_i^2, log ψ_i) + 2{(∂/∂γ) E(θ_i)} Cov(θ_i, log ψ_i) + 2E(θ_i)(∂/∂γ) Cov(θ_i, log ψ_i), with
(∂/∂γ) Cov(θ_i^2, log ψ_i) = (n_i/2 + α)(1/γ^2){ Cov(θ_i^2 log ψ_i, 1/ψ_i) - Cov(θ_i^2, 1/ψ_i) E(log ψ_i) - E(θ_i^2) Cov(log ψ_i, 1/ψ_i) } - (1/γ^2) Cov(θ_i^2, 1/ψ_i),
(∂/∂γ) E(θ_i) = (n_i/2 + α)(1/γ^2) Cov(θ_i, 1/ψ_i),
(∂/∂γ) Cov(θ_i, log ψ_i) = (n_i/2 + α)(1/γ^2){ Cov(θ_i log ψ_i, 1/ψ_i) - Cov(θ_i, 1/ψ_i) E(log ψ_i) - E(θ_i) Cov(log ψ_i, 1/ψ_i) } - (1/γ^2) Cov(θ_i, 1/ψ_i).

∂^2 ν_i/∂α∂β = -(∂/∂β) Cov(θ_i^2, log ψ_i) + 2{(∂/∂β) E(θ_i)} Cov(θ_i, log ψ_i) + 2E(θ_i)(∂/∂β) Cov(θ_i, log ψ_i), with
(∂/∂β) Cov(θ_i^2, log ψ_i) = (1/τ^2){ Cov(θ_i^2 log ψ_i, θ_i - β) - Cov(θ_i^2, θ_i - β) E(log ψ_i) - E(θ_i^2) Cov(log ψ_i, θ_i - β) },
(∂/∂β) E(θ_i) = (1/τ^2) Var(θ_i),
(∂/∂β) Cov(θ_i, log ψ_i) = (1/τ^2){ Cov(θ_i log ψ_i, θ_i - β) - Cov(θ_i, θ_i - β) E(log ψ_i) - E(θ_i) Cov(log ψ_i, θ_i - β) }.

∂^2 ν_i/∂α∂τ^2 = -(∂/∂τ^2) Cov(θ_i^2, log ψ_i) + 2{(∂/∂τ^2) E(θ_i)} Cov(θ_i, log ψ_i) + 2E(θ_i)(∂/∂τ^2) Cov(θ_i, log ψ_i), with
(∂/∂τ^2) Cov(θ_i^2, log ψ_i) = (1/{2(τ^2)^2})[ Cov{θ_i^2 log ψ_i, (θ_i - β)^2} - Cov{θ_i^2, (θ_i - β)^2} E(log ψ_i) - E(θ_i^2) Cov{log ψ_i, (θ_i - β)^2} ],
(∂/∂τ^2) E(θ_i) = (1/{2(τ^2)^2}) Cov{θ_i, (θ_i - β)^2},
(∂/∂τ^2) Cov(θ_i, log ψ_i) = (1/{2(τ^2)^2})[ Cov{θ_i log ψ_i, (θ_i - β)^2} - Cov{θ_i, (θ_i - β)^2} E(log ψ_i) - E(θ_i) Cov{log ψ_i, (θ_i - β)^2} ].

∂^2 ν_i/∂γ^2 = -(n_i/2 + α)(2/γ^3){ Cov(θ_i^2, 1/ψ_i) - 2E(θ_i) Cov(θ_i, 1/ψ_i) } + (n_i/2 + α)(1/γ^2){ (∂/∂γ) Cov(θ_i^2, 1/ψ_i) - 2(∂/∂γ) E(θ_i) · Cov(θ_i, 1/ψ_i) - 2E(θ_i)(∂/∂γ) Cov(θ_i, 1/ψ_i) }, with
(∂/∂γ) Cov(θ_i^2, 1/ψ_i) = (n_i/2 + α)(1/γ^2){ Cov(θ_i^2/ψ_i, 1/ψ_i) - Cov(θ_i^2, 1/ψ_i) E(1/ψ_i) - E(θ_i^2) Var(1/ψ_i) } + (1/γ^2) Cov(θ_i^2, 1/ψ_i^2),
(∂/∂γ) E(θ_i) = (n_i/2 + α)(1/γ^2) Cov(θ_i, 1/ψ_i),
(∂/∂γ) Cov(θ_i, 1/ψ_i) = (n_i/2 + α)(1/γ^2){ Cov(θ_i/ψ_i, 1/ψ_i) - Cov(θ_i, 1/ψ_i) E(1/ψ_i) - E(θ_i) Var(1/ψ_i) } + (1/γ^2) Cov(θ_i, 1/ψ_i^2).

∂^2 ν_i/∂γ∂β = (n_i/2 + α)(1/γ^2){ (∂/∂β) Cov(θ_i^2, 1/ψ_i) - 2(∂/∂β) E(θ_i) · Cov(θ_i, 1/ψ_i) - 2E(θ_i)(∂/∂β) Cov(θ_i, 1/ψ_i) }, with
(∂/∂β) Cov(θ_i^2, 1/ψ_i) = (1/τ^2){ Cov(θ_i^2/ψ_i, θ_i - β) - Cov(θ_i^2, θ_i - β) E(1/ψ_i) - E(θ_i^2) Cov(1/ψ_i, θ_i - β) },
(∂/∂β) E(θ_i) = (1/τ^2) Var(θ_i),
(∂/∂β) Cov(θ_i, 1/ψ_i) = (1/τ^2){ Cov(θ_i/ψ_i, θ_i - β) - Cov(θ_i, θ_i - β) E(1/ψ_i) - E(θ_i) Cov(1/ψ_i, θ_i - β) }.

∂^2 ν_i/∂γ∂τ^2 = (n_i/2 + α)(1/γ^2){ (∂/∂τ^2) Cov(θ_i^2, 1/ψ_i) - 2(∂/∂τ^2) E(θ_i) · Cov(θ_i, 1/ψ_i) - 2E(θ_i)(∂/∂τ^2) Cov(θ_i, 1/ψ_i) }, with
(∂/∂τ^2) Cov(θ_i^2, 1/ψ_i) = (1/{2(τ^2)^2})[ Cov{θ_i^2/ψ_i, (θ_i - β)^2} - Cov{θ_i^2, (θ_i - β)^2} E(1/ψ_i) - E(θ_i^2) Cov{1/ψ_i, (θ_i - β)^2} ],
(∂/∂τ^2) E(θ_i) = (1/{2(τ^2)^2}) Cov{θ_i, (θ_i - β)^2},
(∂/∂τ^2) Cov(θ_i, 1/ψ_i) = (1/{2(τ^2)^2})[ Cov{θ_i/ψ_i, (θ_i - β)^2} - Cov{θ_i, (θ_i - β)^2} E(1/ψ_i) - E(θ_i) Cov{1/ψ_i, (θ_i - β)^2} ].

∂^2 ν_i/∂β^2 = (1/τ^2){ (∂/∂β) Cov(θ_i^2, θ_i - β) - 2(∂/∂β) E(θ_i) · Cov(θ_i, θ_i - β) - 2E(θ_i)(∂/∂β) Cov(θ_i, θ_i - β) }, with
(∂/∂β) Cov(θ_i^2, θ_i - β) = (1/τ^2)[ Cov{θ_i^2(θ_i - β), θ_i - β} - Cov(θ_i^2, θ_i - β) E(θ_i - β) - E(θ_i^2) Var(θ_i - β) ],
(∂/∂β) E(θ_i) = (1/τ^2) Cov(θ_i, θ_i - β),
(∂/∂β) Cov(θ_i, θ_i - β) = (1/τ^2)[ Cov{θ_i(θ_i - β), θ_i - β} - Cov(θ_i, θ_i - β) E(θ_i - β) - E(θ_i) Var(θ_i - β) ].

∂^2 ν_i/∂β∂τ^2 = -(1/(τ^2)^2){ Cov(θ_i^2, θ_i - β) - 2E(θ_i) Cov(θ_i, θ_i - β) } + (1/τ^2){ (∂/∂τ^2) Cov(θ_i^2, θ_i - β) - 2(∂/∂τ^2) E(θ_i) · Cov(θ_i, θ_i - β) - 2E(θ_i)(∂/∂τ^2) Cov(θ_i, θ_i - β) }, with
(∂/∂τ^2) Cov(θ_i^2, θ_i - β) = (1/{2(τ^2)^2})[ Cov{θ_i^2(θ_i - β), (θ_i - β)^2} - Cov{θ_i^2, (θ_i - β)^2} E(θ_i - β) - E(θ_i^2) Cov{θ_i - β, (θ_i - β)^2} ],
(∂/∂τ^2) E(θ_i) = (1/{2(τ^2)^2}) Cov{θ_i, (θ_i - β)^2},
(∂/∂τ^2) Cov(θ_i, θ_i - β) = (1/{2(τ^2)^2})[ Cov{θ_i(θ_i - β), (θ_i - β)^2} - Cov{θ_i, (θ_i - β)^2} E(θ_i - β) - E(θ_i) Cov{θ_i - β, (θ_i - β)^2} ].

∂^2 ν_i/∂(τ^2)^2 = -(1/(τ^2)^3)[ Cov{θ_i^2, (θ_i - β)^2} - 2E(θ_i) Cov{θ_i, (θ_i - β)^2} ] + (1/{2(τ^2)^2})[ (∂/∂τ^2) Cov{θ_i^2, (θ_i - β)^2} - 2(∂/∂τ^2) E(θ_i) · Cov{θ_i, (θ_i - β)^2} - 2E(θ_i)(∂/∂τ^2) Cov{θ_i, (θ_i - β)^2} ], with
(∂/∂τ^2) Cov{θ_i^2, (θ_i - β)^2} = (1/{2(τ^2)^2})[ Cov{θ_i^2(θ_i - β)^2, (θ_i - β)^2} - Cov{θ_i^2, (θ_i - β)^2} E(θ_i - β)^2 - E(θ_i^2) Var{(θ_i - β)^2} ],
(∂/∂τ^2) E(θ_i) = (1/{2(τ^2)^2}) Cov{θ_i, (θ_i - β)^2},
(∂/∂τ^2) Cov{θ_i, (θ_i - β)^2} = (1/{2(τ^2)^2})[ Cov{θ_i(θ_i - β)^2, (θ_i - β)^2} - Cov{θ_i, (θ_i - β)^2} E(θ_i - β)^2 - E(θ_i) Var{(θ_i - β)^2} ].

The other terms are equal to the symmetric terms above.
Chapter 4

Confidence Interval Estimation of Small Area Parameters Shrinking both Means and Variances

4.1 Introduction

The new approach to small area estimation based on joint modeling of means and variances was given in the previous chapter, where the conditional mean squared error of prediction was also estimated to evaluate the prediction error. In this chapter, we obtain confidence intervals for the small area means. The small area estimation literature is dominated by point estimation and the associated standard errors. It is well known that the standard practice of (pt. est. ± q s.e.), where q is a Z (standard normal) or t cut-off point, does not produce accurate intervals; see Hall and Maiti (2006) and Chatterjee et al. (2008) for more details. The previous works are based on the bootstrap procedure and have limited use due to the repeated estimation of model parameters.
The confidence intervals produced in this chapter are derived from a decision theory perspective. The rest of the chapter is organized as follows. In Section 4.2, the proposed model is restated. Section 4.3 gives the definition of the confidence intervals. Theoretical justification and an alternative model are given in Sections 4.4 and 4.5. Section 4.6 contains a simulation study, and a real data analysis is included in Section 4.7.

4.2 Proposed Model

The hierarchical model adopted here was already introduced in Section 3.2. We want to point out again that this model uses both the direct survey estimates and the sampling variance estimates to estimate all the parameters that determine the stochastic system. In the simulation study, this mean-variance joint modeling performed better than Hwang et al. (2009), which does not take the variance estimation into account completely.

Let X_i be the direct survey estimate and S_i^2 its sampling variance for the i-th area, i = 1, ..., n. Let Z_i = (Z_i1, ..., Z_ip)^T be the set of covariates available at the estimation stage and β = (β_1, ..., β_p)^T be the associated regression coefficients. We propose the following hierarchical model:

X_i | θ_i, σ_i^2 ~ Normal(θ_i, σ_i^2),  θ_i ~ Normal(Z_i^T β, τ^2);  i = 1, ..., n,  (4.1)

(n_i - 1)S_i^2/σ_i^2 ~ χ^2_{n_i-1},  σ_i^2 ~ Gamma{α, γ};  i = 1, ..., n,  (4.2)
where B = (α, γ, β^T, τ^2)^T, referred to as the structural parameters, are unknown and n_i represents the sample size for a simple random sample (SRS) from the i-th area. Note that the σ_i^2 are the sampling variances of the X_i's and are usually estimated by the S_i^2's. Note also that the chi-square distribution for the sample variance is valid only for a random sample; for a complex survey design the degrees of freedom of the chi-square distribution need to be determined carefully (e.g., Maples and Huang 2009). The second level of (4.2) might be further extended as Gamma{γ, exp(Z_i^T β_2)/γ} to accommodate covariate information in the variance modeling.

The inference can be made from the conditional distribution of θ_i (the parameter of interest) given the data (X_i, S_i^2, Z_i), i = 1, ..., n. Under our model setup the conditional distribution of θ_i does not have a closed form, and to handle this non-standard distribution we use Monte Carlo methods.

4.3 Confidence Interval Definition

Following Joshi (1969), Casella and Hwang (1991), and Hwang et al. (2009), consider the loss function

(k/σ_i) L(C_i) - I_{C_i}(θ_i),

where k is a tuning parameter, independent of the model parameters, L(C_i) is the length of a confidence interval C_i, and I_{C_i}(θ_i) is 1 or 0 according as θ_i ∈ C_i or not. Then the decision Bayes confidence interval for θ_i is obtained by minimizing

E{ [(k/σ_i) L(C_i) - I_{C_i}(θ_i)] | X_i, S_i^2 }
and is given by

C_i(B) = {θ_i : k E(σ_i^{−1} | X_i, S_i^2, B) < π(θ_i | X_i, S_i^2, B)}. (4.3)

When k and B are known we follow the steps below to calculate the above CI. Note that the conditional distributions of σ_i^2 and θ_i given the data and B, in equations (3.4) and (3.5), are

π(σ_i^2 | X_i, S_i^2, B) ∝ exp[−0.5(X_i − Z_i^T β)^2/(σ_i^2 + τ^2) − {0.5(n_i − 1)S_i^2 + 1/b}(1/σ_i^2)] / [(σ_i^2)^{(n_i − 1)/2 + a + 1} (σ_i^2 + τ^2)^{1/2}],

π(θ_i | X_i, S_i^2, B) ∝ exp{−0.5(θ_i − Z_i^T β)^2/τ^2} ψ_i^{−(n_i/2 + a)},

where ψ_i = 0.5(X_i − θ_i)^2 + 0.5(n_i − 1)S_i^2 + 1/γ. We calculate E(σ_i^{−1} | X_i, S_i^2, B) using the Monte Carlo method. Suppose that σ_{ik}^2, k = 1, ..., N, are N observations drawn from π(σ_i^2 | X_i, S_i^2, B). Then E(σ_i^{−1} | X_i, S_i^2, B) is approximated by (1/N) Σ_{k=1}^N 1/σ_{ik}. For drawing random numbers from π(σ_i^2 | X_i, S_i^2, B) we use the rejection sampling method (Robert and Casella, 2004). Next we determine the interval endpoints θ_i by solving

k E(σ_i^{−1} | X_i, S_i^2, B) − π(θ_i | X_i, S_i^2, B) = 0.

Note that solving the above equation requires the normalizing constant of the conditional density π(θ_i | X_i, S_i^2, B), which is calculated by a numerical method. In practice, B is unknown. We propose to estimate it by maximizing the marginal likelihood L_M = Π_{i=1}^n L_{Mi} of {(X_i, S_i^2, Z_i)_{i=1}^n; B}. The detailed procedure is described
in Section 3.3.

Choice of the Tuning Parameter

We discuss in this section the choice of the tuning parameter k in (4.3). As a first step, when B is known, consider

k_i = k_i(B) = u_{i0} φ( t_{α/2} √((n_i + 2a + 2)/(n_i − 1)) ), (4.4)

where φ is the standard normal density, t_{α/2} is the (1 − α/2)-th percentile of the t distribution with (n_i − 1) degrees of freedom, and

u_{i0} = √(1 + σ_i^2/τ^2). (4.5)

This choice of k involves components of B, which will be assumed to be known for the moment. In the case when B is unknown, as in most practical situations, B will be replaced by its corresponding estimate B̂ derived using the estimation method described in Section 3.3. The definition of u_{i0} in (4.5) further involves σ_i^2, which is not a component of B. Thus, u_{i0} is replaced by û_{i0}, obtained by replacing σ_i^2 with its maximum a posteriori estimate

σ̂_i^2 = σ̂_i^2(B) = arg max_{σ_i^2} π(σ_i^2 | X_i, S_i^2, B), (4.6)

the value of σ_i^2 that maximizes the posterior density of σ_i^2 given X_i, S_i^2 and B. We demonstrate that the coverage probability of C_i(B) with this choice of k is close to 1 − α. Theoretical justifications are provided in Section 4.4. In Hwang et al. (2009) the choice of k is ad hoc: they equate the Bayes interval to the t-interval and solve for k.
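The rejection-sampling computation of E(σ_i^{−1} | X_i, S_i^2, B) described in Section 4.3 can be sketched as follows. The parameter values, the bounded uniform proposal, and the grid-based envelope constant are illustrative assumptions for this sketch, not the dissertation's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values for one area (assumptions, not estimates from the text).
x_i, s2_i, zb, tau2 = 12.0, 4.0, 10.0, 1.0   # X_i, S_i^2, Z_i^T beta, tau^2
n_i, a, b = 9, 2.0, 0.5                      # sample size and hyperparameters

def target(sig2):
    """Unnormalized pi(sigma_i^2 | X_i, S_i^2, B) from the display in Section 4.3."""
    num = np.exp(-0.5 * (x_i - zb) ** 2 / (sig2 + tau2)
                 - (0.5 * (n_i - 1) * s2_i + 1.0 / b) / sig2)
    return num / (sig2 ** ((n_i - 1) / 2 + a + 1) * np.sqrt(sig2 + tau2))

# Rejection sampling: uniform proposal on (lo, hi); the envelope constant M
# is taken from the maximum of the target on a fine grid.
lo, hi = 1e-3, 50.0
grid = np.linspace(lo, hi, 20001)[1:]
M = 1.05 * target(grid).max()

draws = np.empty(0)
while draws.size < 5000:
    cand = rng.uniform(lo, hi, 20000)
    accept = rng.uniform(0.0, M, 20000) < target(cand)
    draws = np.concatenate([draws, cand[accept]])
draws = draws[:5000]

# Monte Carlo estimate of E(sigma_i^{-1} | X_i, S_i^2, B).
e_inv_sigma = float(np.mean(1.0 / np.sqrt(draws)))
print(e_inv_sigma)
```

An interval endpoint is then found by solving k · e_inv_sigma = π(θ_i | X_i, S_i^2, B) numerically, with the normalizing constant of the θ_i density obtained by quadrature.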
Note that without any hierarchical model, S_i and X_i are independent, as S_i^2 and X_i are ancillary and complete sufficient statistics for θ_i, respectively. However, under models (3.1) and (3.2) the conditional distributions of σ_i^2 and θ_i involve both X_i and S_i^2, as is seen from (3.4) and (3.5). We would like to mention that in Hwang et al. (2009) the shrinkage estimator for σ_i^2 was based only on the information in S_i^2, not on both X_i and S_i^2. They then plugged the Bayes estimator of σ_i into the Bayes estimators of the small area parameters. Thus if σ̂²_{B,i} is the Bayes estimator of σ_i^2, then Hwang et al.'s small area estimator can be written as E(θ_i | X_i, σ_i^2) evaluated at σ_i^2 = σ̂²_{B,i}.

Remark 1. As mentioned previously, the d.f. associated with the χ² distribution for sampling variance modelling need not be simply n_i − 1, n_i being the sample size for the i-th area. There is no sound theoretical result for determining the d.f. in the case of a complex survey design. Wang and Fuller (2003) approximated the χ² by the normal based on the Wilson-Hilferty approximation. If one knows the exact sampling design, then the simulation based guideline of Maples et al. (2009) could be useful. In the case of county level estimation from the American Community Survey, they suggested the estimated d.f. = 0.36 n_i.

4.4 Theoretical Justification of Tuning Parameter

We present some theoretical justification for the choice of k according to equations (4.4), (4.5) and (4.6). Assume B is fixed and known for the moment. The conditional distribution
of θ_i can be approximated as

π(θ_i | X_i, S_i^2, B) = ∫_0^∞ π(θ_i | X_i, S_i^2, B, σ_i^2) π(σ_i^2 | X_i, S_i^2, B) dσ_i^2 ≈ π(θ_i | X_i, S_i^2, B, σ̂_i^2), (4.7)

where σ̂_i^2 is as defined in (4.6). In a similar way, approximate E(σ_i^{−1} | X_i, S_i^2, B) by

E(σ_i^{−1} | X_i, S_i^2, B) ≈ σ̂_i^{−1}. (4.8)

Based on (4.7) and (4.8), we have C_i(B) ≈ C̃_i(B), where C̃_i(B) is the confidence interval for θ_i given by

C̃_i(B) = {θ_i : π(θ_i | X_i, S_i^2, B, σ̂_i^2) ≥ k σ̂_i^{−1}}, (4.9)

with σ_i^2 replaced by σ̂_i^2. From (3.1), it follows that the conditional density π(θ_i | X_i, S_i^2, B, σ_i^2) is normal with mean μ_i and variance v_i, where μ_i and v_i are given by

μ_i = w_i X_i + (1 − w_i) Z_i^T β and v_i = (1/σ_i^2 + 1/τ^2)^{−1} = σ_i^2 (1 + σ_i^2/τ^2)^{−1}, (4.10)

and w_i = (1/σ_i^2)/(1/σ_i^2 + 1/τ^2). Now, choosing k = û_{i0} φ( t_{α/2} √((n_i + 2a + 2)/(n_i − 1)) ) as discussed, the confidence interval C̃_i(B) becomes

C̃_i(B) = {θ_i : û_{i0} |θ_i − μ̂_i| / σ̂_i ≤ t_{α/2} √((n_i + 2a + 2)/(n_i − 1))}, (4.11)
where μ̂_i is the expression for μ_i in (4.10) with σ_i^2 replaced by σ̂_i^2.

Now consider the behavior of σ̂_i^2 ≡ σ̂_i^2(B) as τ^2 ranges between 0 and ∞. When τ^2 → ∞, σ̂_i^2 converges to

σ̂_i^2(∞) ≡ σ̂_i^2(a, b, β, ∞) = ((n_i − 1)S_i^2 + 2/b) / (n_i + 2a + 1).

Similarly, when τ^2 → 0, σ̂_i^2 converges to

σ̂_i^2(0) ≡ σ̂_i^2(a, b, β, 0) = ((X_i − Z_i^T β)^2 + (n_i − 1)S_i^2 + 2/b) / (n_i + 2a + 2).

For all intermediate values of τ^2, we have min{σ̂_i^2(0), σ̂_i^2(∞)} ≤ σ̂_i^2 ≤ max{σ̂_i^2(0), σ̂_i^2(∞)}. Therefore, it is sufficient to consider the following two cases: (i) σ̂_i^2 ≥ σ̂_i^2(∞), where it follows that (n_i + 2a + 2)σ̂_i^2 = (n_i + 2a + 1)σ̂_i^2 + σ̂_i^2 ≥ (n_i − 1)S_i^2 + 2/b + σ̂_i^2 ≥ (n_i − 1)S_i^2; and (ii) σ̂_i^2 ≥ σ̂_i^2(0), where it follows that (n_i + 2a + 2)σ̂_i^2 ≥ (X_i − Z_i^T β)^2 + (n_i − 1)S_i^2 + 2/b ≥ (n_i − 1)S_i^2. So, in both cases (i) and (ii),

(n_i + 2a + 2) σ̂_i^2 ≥ (n_i − 1) S_i^2. (4.12)

Since θ_i − μ_i ~ N(0, σ_i^2 τ^2/(σ_i^2 + τ^2)) and (n_i − 1)S_i^2/σ_i^2 ~ χ²_{n_i − 1}, the confidence interval

D_i = {θ_i : u_{i0} |θ_i − μ_i| / S_i ≤ t_{α/2}} (4.13)

has coverage probability 1 − α. Thus, if u_{i0} and μ_i are replaced by û_{i0} and μ̂_i, it is expected that the resulting confidence interval, D̃_i say, will have coverage probability of approximately 1 − α. From (4.12), we have
P(θ_i ∈ C̃_i(B)) ≥ P(θ_i ∈ D̃_i) ≈ 1 − α, (4.14)

establishing an approximate lower bound of 1 − α for the confidence level of C̃_i(B). In (4.14), B was assumed fixed and known. In the case when B is unknown, we replace B by its marginal maximum likelihood estimate B̂. Since (4.14) holds regardless of the true value of B, substituting B̂ for B in (4.14) will involve an error of order O(1/√N), where N = Σ_{i=1}^n n_i. Compared to each single n_i, this pooling of the n_i's is expected to reduce the error significantly, so that C̃_i(B̂) is sufficiently close to C̃_i(B) to satisfy the lower bound of 1 − α in (4.14).

4.5 Alternative Model for Confidence Interval

It is possible to reduce the width of the confidence interval C̃_i(B) based on an alternative hierarchical model for small area estimation. The constant term n_i + 2a + 2 in (4.12) becomes n_i + 2a in the alternative model. The model is

X_i | θ_i, σ_i^2 ~ N(θ_i, σ_i^2), (4.15)

θ_i | σ_i^2 ~ N(Z_i^T β, λ σ_i^2), (4.16)

(n_i − 1)(S_i^2/σ_i^2) ~ χ²_{n_i − 1}, (4.17)

σ_i^2 ~ Inverse-Gamma(a, b), (4.18)

for i = 1, 2, ..., n. Note that in the above alternative formulation, it is assumed that the variability of θ_i is proportional to σ_i^2, while in the previous model the variance of θ_i was τ^2,
independent of σ_i^2; in other words, the model variance was constant. The set of all unknown parameters in the current hierarchical model is B = (a, b, β, λ). By re-parametrizing the variance in (4.16), some simplifications can be obtained in the derivation of the posteriors of θ_i and σ_i given X_i, S_i^2 and B. We have

π(σ_i^2 | X_i, S_i^2, B) = IG( n_i/2 + a, [ (n_i − 1)S_i^2/2 + (X_i − Z_i^T β)^2/{2(1 + λ)} + 1/b ]^{−1} ).

Given B and σ_i^2, the conditional distribution of θ_i is normal with mean μ_i and variance v_i as in (4.10) with τ^2 replaced by λσ_i^2:

π(θ_i | X_i, σ_i^2, B) = N(μ_i, λσ_i^2/(1 + λ)).

By integrating out σ_i^2, it follows that the conditional distribution of θ_i given X_i, S_i^2 and B is

π(θ_i | X_i, S_i^2, B) = ∫_0^∞ π(θ_i | X_i, σ_i^2, B) π(σ_i^2 | X_i, S_i^2, B) dσ_i^2 ∝ [ {(1 + λ)/(2λ)} (θ_i − μ_i)^2 + δ_i^2/2 ]^{−(n_i + 2a + 1)/2}, (4.19)

where δ_i^2 = (n_i − 1)S_i^2 + (X_i − Z_i^T β)^2/(1 + λ) + 2/b. We can rewrite (4.19) as

π(θ_i | X_i, S_i^2, B) = [ Γ((n_i + 1)/2 + a) / { Γ(n_i/2 + a) √(π(n_i + 2a) δ̃_i^2 λ/(1 + λ)) } ] { 1 + (1 + λ)(θ_i − μ_i)^2 / ((n_i + 2a) λ δ̃_i^2) }^{−(n_i + 2a + 1)/2},

which can be seen to be a scaled t-distribution with n_i + 2a degrees of freedom and scale parameter δ̃_i √(λ/(1 + λ)), where δ̃_i^2 = δ_i^2/(n_i + 2a). Also
E(σ_i^{−1} | X_i, S_i^2, B) = Γ((n_i + 1)/2 + a)(δ_i^2/2)^{−{(n_i + 1)/2 + a}} / [Γ(n_i/2 + a)(δ_i^2/2)^{−(n_i/2 + a)}] = [Γ((n_i + 1)/2 + a)/Γ(n_i/2 + a)] √(2/δ_i^2).

In this context, choosing k = k(B) as

k = √((1 + λ)/λ) (1/√(2π)) { 1 + t²_{α/2}/(n_i − 1) }^{−(n_i + 2a + 1)/2},

the confidence interval in (4.3) simplifies to

C_i(B) = { θ_i : |θ_i − μ_i| ≤ √(λ/(1 + λ)) √((n_i + 2a) δ̃_i^2/(n_i − 1)) t_{α/2} }. (4.20)

Using similar arguments as before and noting that (n_i + 2a) δ̃_i^2 ≥ (n_i − 1) S_i^2, we have P(θ_i ∈ C_i(B)) ≥ P(θ_i ∈ D_i) = 1 − α, where D_i is the confidence interval in (4.13). Again, here B was assumed fixed and known. In the case when B is unknown, we replace B by its marginal maximum likelihood estimate B̂. It is expected that the pooling technique will result in an error small enough that P(θ_i ∈ C_i(B̂)) ≈ P(θ_i ∈ C_i(B)), and thus keep the confidence level of C_i(B̂) close to or above 1 − α.
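The closed-form interval (4.20) is easy to compute once B is given. The following sketch uses made-up numbers for a single area; the hyperparameter values and the hard-coded t quantile for 3 degrees of freedom are illustrative assumptions.

```python
import math

# Illustrative values (assumptions): one area under the alternative model.
x_i, s2_i, zb = 120.0, 50.0, 115.0   # X_i, S_i^2, Z_i^T beta
n_i, a, b, lam = 4, 3.0, 0.1, 0.5    # components of B = (a, b, beta, lambda)
t_crit = 3.1824                      # t_{alpha/2}, alpha = 0.05, n_i - 1 = 3 d.f.

# delta_i^2 and the squared scale delta~_i^2 of the scaled-t posterior of theta_i
delta2 = (n_i - 1) * s2_i + (x_i - zb) ** 2 / (1 + lam) + 2.0 / b
delta2_tilde = delta2 / (n_i + 2 * a)

# Shrinkage mean mu_i: with tau^2 = lam * sigma_i^2 the weight on X_i is
# w_i = (1/sigma^2) / (1/sigma^2 + 1/(lam * sigma^2)) = lam / (1 + lam).
w_i = lam / (1 + lam)
mu_i = w_i * x_i + (1 - w_i) * zb

# Interval (4.20): mu_i +/- sqrt(lam/(1+lam)) sqrt((n_i+2a) delta~^2/(n_i-1)) t
half = math.sqrt(lam / (1 + lam)) * math.sqrt(
    (n_i + 2 * a) * delta2_tilde / (n_i - 1)) * t_crit
lower, upper = mu_i - half, mu_i + half
print(lower, upper)
```

Since (n_i + 2a) δ̃_i^2 = δ_i^2 ≥ (n_i − 1) S_i^2, this interval always contains the oracle interval D_i, which is what drives the coverage bound above.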
4.6 Simulation Study

Simulation design

We considered a simulation setting similar to Wang and Fuller (2003), which is the same as the setup in Section 3.4. Each sample in the simulation study was generated through the same steps as in Section 3.4. To simplify the simulation, we again do not choose any covariates Z_i; only (X_i, S_i^2) are generated. The observations for each small area are first generated as

X_ij = β + u_i + e_ij, j = 1, ..., n_i, i = 1, ..., n,

where u_i ~ Normal(0, τ^2) and e_ij ~ Normal(0, n_i σ_i^2). Then the model for (X_i, S_i^2) is the same as equations (3.26) and (3.27):

X_i = β + u_i + e_i, i = 1, ..., n,

S_i^2 = {1/(n_i(n_i − 1))} Σ_{j=1}^{n_i} (X_ij − X̄_i)^2,

where X_i = X̄_i = n_i^{−1} Σ_{j=1}^{n_i} X_ij and e_i = ē_i = n_i^{−1} Σ_{j=1}^{n_i} e_ij. Therefore, X_i ~ Normal(θ_i, σ_i^2), where θ_i = β + u_i and θ_i ~ Normal(β, τ^2), and e_i ~ Normal(0, σ_i^2). It is to be noted that (n_i − 1)S_i^2/σ_i^2 ~ χ²_{n_i − 1}. The quantities to be predicted are the means of the small areas, θ_i, i = 1, ..., n.

Like Wang and Fuller (2003), we set all n_i equal to m, which eased our programming efforts. However, the sampling variances were still unequal: one-third of the σ_i^2 were set equal to 1, one-third equal to 4, and one-third equal to 16. In the simulation, we set β = 10 and took three different values of τ^2: 0.25, 1, 4. For each τ^2, we generated 200 samples for
each of the combinations (m, n) = (9, 36), (18, 180). In this simulation study we compare the proposed method with the methods of Wang and Fuller (2003), Hwang et al. (2009) and Qiu and Hwang (2007), which are referred to as I, II, III, and IV, respectively. Note that the estimator proposed in Qiu and Hwang (2007) was adjusted for the Fay-Herriot model (1979). The methods are judged based on bias, mean squared error (MSE), asymptotic coverage probability (ACP) of the confidence intervals, and the average length of the confidence intervals (ALCI).

Simulation results

In Table 4.1 we present the mean and empirical standard deviation of the estimates of β and τ^2. The numerical results indicate good performance of the EM algorithm based maximum likelihood estimates of the model parameters.

Table 4.1: Estimation of model parameters. The left panel is for β (= 10) and the right panel is for τ^2 (= 0.25, 1, 4); columns give the mean and standard deviation (SD) over 200 replicates for (n, m) = (36, 9) and (180, 18). (Numerical entries are not reproduced here.)

Tables 4.2, 4.3 and 4.4 below provide the numerical results averaged over areas within each group (having the same sampling variance). We calculated the relative bias, mean squared error, coverage rate and average confidence interval length of the small area estimators based on 200 replications.

In most cases, the biases of the four methods are comparable. In the case of high sampling variance, method IV outperformed the other methods. High sampling variance gives more
Table 4.2: Simulation results for prediction when τ^2 = 0.25. Rows give bias, MSE, ALCI and ACP for methods I-IV under (n, m) = (36, 9) and (180, 18), by σ_i^2 group. (Numerical entries are not reproduced here.)

Table 4.3: Simulation results for prediction when τ^2 = 1. Same layout as Table 4.2.
Table 4.4: Simulation results for prediction when τ^2 = 4. Same layout as Table 4.2.

weight to the population mean by construction, which makes the estimator closer to the mean at the second level. On the other hand, methods I-III use shrinkage estimators of the sampling variances, which would be less than the maximum of all sampling variances. This would tend to produce a little more bias. However, due to the shrinkage in the sampling variance, method I would be expected to gain in the variance of the estimators, meaning that the mean squared error (MSE) would tend to be smaller. Among methods I-III, method I performed better than methods II and III; methods II and III are much closer to each other. The maximum gain of method I compared to method II is 99%.

In terms of the mean squared error (MSE), method I performed consistently better than the other three methods in most cases, except the case when the ratio of sampling variance to model variance is lowest: (σ_i^2 = 1)/(τ^2 = 4) = 0.25. In this case, the variance between small areas (model variance) is much higher than the variance within each small area (sampling variance). When using our method to estimate the mean of each small area, the information
borrowed from other areas will misdirect the estimation. The maximum and minimum gains compared to methods II and III are 30%, −9% and 77%, −11%, respectively.

ACP: We calculated 95% confidence intervals. Methods I and III do not have any under-coverage. This is expected from their optimal interval construction. Method I meets the nominal coverage rate more frequently than any other method. Method II has some under-coverage; its coverage could go as low as 82%.

ALCI: Method I produced considerably shorter confidence intervals in general. Method IV produced lengths comparable to the other methods, but the length was considerably higher in the case of high sampling variance. The reason is that method IV truncates M_i = τ^2/(τ^2 + σ_i^2) with a positive number M_1 = 1 − Q_α/(N − 2), where Q_α is the α-quantile of a chi-squared distribution with N degrees of freedom. When the ratio of sampling variance to model variance, σ_i^2/τ^2, is high, M_1 is much greater than M_i. For example, in the case of (σ_i^2, τ^2) = (16, 0.25), the average length in method IV is much larger, whereas it is only 2.78 in method I and 4.56 in method II.

4.7 A Real Data Analysis

We illustrate our methodology with a widely studied example. The data set is from the U.S. Department of Agriculture and was first analyzed by Battese, Harter and Fuller (1988). The data set concerns corn and soybeans in 12 Iowa counties. The sample sizes for these areas are small, ranging from 1 to 5. We shall consider corn only to save space. For the proposed model, the sample size of each area requires n_i > 1. Therefore the same modified data from You and Chapman (2006) is used, which only includes the areas with sample size 2 or greater. The mean reported crop hectares for corn (x_i) comprise the direct survey estimates.
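For reference, under simple random sampling the direct survey estimate x_i and its sampling variance S_i^2 are just the sample mean and its estimated variance s_i^2/n_i. A minimal sketch with made-up segment-level hectare values (not the actual survey data):

```python
import statistics

# Made-up reported corn hectares for one county's sampled segments (n_i = 3).
y = [165.8, 96.3, 119.5]
n_i = len(y)

x_i = statistics.mean(y)            # direct survey estimate X_i
s2_within = statistics.variance(y)  # unit-level sample variance (denominator n_i - 1)
S2_i = s2_within / n_i              # estimated sampling variance of the mean
print(x_i, S2_i)
```

With sample sizes this small, S2_i itself is highly variable, which is precisely what the joint mean-variance smoothing of the proposed method targets.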
Table 4.5: Crop data from You and Chapman (2006). For each of the eight counties (Franklin, Pocahontas, Winnebago, Wright, Webster, Hancock, Kossuth, Hardin), the table lists the sample size n_i, the direct estimate x_i of corn hectares, the covariates z_1 and z_2, and the sampling variance S_i^2. (Numerical entries are not reproduced here.)

Table 4.6: Estimation results for corn. For each county, the table lists the point estimate θ̂_i and the confidence interval (with its length in parentheses) under method I (proposed), method II (Wang and Fuller, 2003), method III (Hwang et al., 2009) and method IV (Qiu and Hwang, 2007). (Numerical entries are not reproduced here.)
Figure 4.1: Corn hectares estimation. The vertical line for each county displays the confidence interval of θ̂_i, with θ̂_i marked by the circle, for (I) the proposed method, (II) Wang and Fuller (2003), (III) Hwang et al. (2009) and (IV) Qiu and Hwang (2007).

The sample variances are calculated from the original data assuming simple random sampling. The sample s.d. varies widely across counties (the coefficient of variation ranges up to 0.423). The mean numbers of pixels from LANDSAT satellite data (z_i) are the covariates in the estimation procedure: z_1 is the mean number of pixels of corn and z_2 is the mean number of pixels of soybeans. These covariates are used to fit the model (4.1) and (4.2). The details of the modified data can be found in You and Chapman (2006) and are tabulated in Table 4.5.

The small area estimates and their confidence intervals are summarized in Fig 4.1. The
Figure 4.2: Boxplots of estimates of corn hectares for each county. (I) to (IV) are the 4 methods corresponding to Figure 4.1.

numerical figures are also provided in Table 4.6. The point estimates of all 4 methods are comparable. The summary measures (mean, median, and range) of the parameter estimates for the methods (I, II, III, IV) are respectively (121.9, 124.1, 122.2, 122.6), (125.2, 120.4, 115.0, 114.5) and (23.1, 53.0, 58.4, 56.6). The distributions are summarized in Fig 4.2. The central locations are similar under all the methods, but there is a significant difference in their variability. If the assumption of constant model variance is true and the sample size varies only from 3 to 5, then under simple random sampling one would expect relatively low variation among the small area estimates. Method I is superior in this sense.

Further, smoothing the sampling variances has strong implications for measuring uncertainty and hence for interval estimation. The proposed method has the shortest confidence interval on average compared to all other methods. Methods II and III provide intervals with negative lower limits. This seems unrealistic because the direct average of area under corn is
highly positive for all the counties. Moreover, the crude confidence intervals (x_i ± t S_i), the widest, do not contain zero for any of the areas. Note that method II does not have any theoretical support for its confidence intervals. Methods II and III produce wider confidence intervals when the sampling variance is high. For example, the sample size for both Franklin county and Pocahontas county is three, but their sampling standard deviations are quite different. Although the confidence intervals under method I are comparable, they are wide apart for methods II and III. This is because, although these methods consider the uncertainty in the sampling variance estimates, the smoothing did not use the information from the direct survey estimates, so the underlying sampling variance estimates remain highly variable (due to the small sample sizes). In effect, the variance of the variance estimator (of the point estimates) is bigger compared to that in method I. This is further confirmed by the fact that the intuitive standard deviations of the smoothed small area estimates (one fourth of the interval length) are smaller and less variable under method I compared to the others.

In addition to the confidence intervals, we also calculated the BIC for the proposed method and for method III (Hwang et al. (2009)), as these two methods have the same number of parameters and their model structures are very similar to each other. The difference is in the level-2 model for the variance part. The BIC of the proposed method is lower. We could not compare with Wang and Fuller (2003), since an explicit likelihood is not used there. The BIC criterion supports our data analysis.
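Because the two models being compared have the same number of parameters, the BIC comparison above reduces to comparing maximized log-likelihoods. A minimal sketch with made-up log-likelihood values (the actual fitted values are not reproduced in the text):

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n; lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Illustrative values only: 8 areas, same parameter count for both models,
# so the model with the higher maximized log-likelihood wins.
n_obs, n_params = 8, 4
bic_I = bic(-30.0, n_params, n_obs)    # proposed method (hypothetical loglik)
bic_III = bic(-33.5, n_params, n_obs)  # Hwang et al. (hypothetical loglik)
print(bic_I < bic_III)  # -> True
```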
Chapter 5

Clustering Based Small Area Estimation: An Application to MEAP Data

5.1 Introduction

As mentioned in the previous chapters, small area estimation problems arise in many fields of application. In this chapter, we examine small area estimation in educational assessment. In particular, we analyze data from the Michigan Educational Assessment Program (MEAP). For educational accountability purposes, the results of high-stakes tests are used as an indicator of school district performance. Often the average test scores for a grade are reported for each school district. However, the numbers of students in the school districts differ greatly from each other. Table 5.1 summarizes the number of school districts with a
Table 5.1: Number of public school 4th graders in some school districts in Michigan state. Columns give the range of numbers of students and the number of school districts in each range. (Numerical entries are not reproduced here.)

Figure 5.1: Standard deviation of public school students' assessment scores in available school districts in Michigan state.

given size for Grade 4 in Michigan. We can see that a large school district in a large city might have thousands of students in a grade, while a small school district in a rural area might have fewer than 100 students in a grade. Figure 5.1 gives the standard deviation of the assessment scores of students in each school district. From the map, it can be seen that there is a large amount of variation in the standard deviations across school districts. Therefore, a direct comparison of average scores between large and small school districts may not be appropriate. A comparison of model based smoothed results is more appropriate.
The general approach for this type of analysis is to use the Fay-Herriot model. Let Y_i be the average score reported, X_i be the available covariate information for the regression stage, and β be the regression parameter, i = 1, ..., n, with n the total number of districts. Then the model is given by (1.1) or (2.5),

Y_i = X_i^T β + v_i + e_i, (5.1)

where v_i is the random effect for each school district, which follows an independent and identical distribution, usually a normal distribution; e_i is the sampling error; and the variances of the e_i are known.

When we use the model based approach in equation (5.1), we borrow strength from all other school districts universally, since all school districts share the same regression parameter β and the random effects v_i are i.i.d. However, the actual geographical and socioeconomic characteristics of school districts in a large region are quite diverse. Poverty levels for the school districts in Figure 5.1 are given in Figure 5.2. From the map, it can be seen that the school districts in upper Michigan have higher poverty levels. Although the school districts in lower Michigan have lower poverty levels in general, some school districts there have higher poverty levels than others. Therefore, it is more appropriate to divide the school districts into several groups (clusters). Within each group, the school districts are similar in some meaningful way, namely they are from the same cluster. Then the estimation for a school district only borrows strength from similar school districts and avoids misleading information from other school districts.

Since we do not have any restriction on the spatial characteristics of school districts in our study, we only focus on clustering based small area estimation. However, the
Figure 5.2: Poverty level of available school districts in Michigan state.

partition of clusters was not known. Cluster analysis had to be conducted before we could estimate the small areas. There are many methods for cluster analysis (clustering). In one widely used class of methods, two clusters are chosen to be merged based on the optimization of some criterion. Popular criteria include the sum of within-group sums of squares (Ward, 1963) and the shortest distance between groups. Iterative relocation (partitioning) is another common class of methods. In each iteration, data points are moved from one cluster to another depending on whether improvement is achieved with respect to some criterion. K-means clustering (MacQueen, 1967) is a method of iterative relocation with the sum of squares criterion. K-means clustering is used to generate an initial partition in this dissertation.

Clustering algorithms can also be based on probability models. The literature shows that
some of the heuristic methods, such as K-means clustering, can also be treated as approximate estimation methods for certain probability models. In the context of model-based clustering, finite mixture models are often proposed. Each component probability distribution in a finite mixture model corresponds to a cluster. It has been shown that finite mixture models can be used to solve the practical questions that arise when applying clustering methods. A review of model-based clustering can be found in Fraley and Raftery (2002).

However, the finite mixture model approach to clustering does not explicitly include the partition as a parameter, and it involves independent and identically distributed structures. In addition, there is usually no restriction on the mean structure in this class of models. Information from covariates in the mean profile is often necessary in many applications. Therefore, the idea of modeling the mean via regression while keeping the ability to detect clusters has gained more attention recently. Booth et al. (2008) proposed a new clustering methodology based on a multilevel linear mixed model. A cluster-specific random effect is included in the model, which allows the departure of the cluster means from the assumed base model. An objective function is constructed based on the posterior distribution of the underlying partition. The partition that maximizes the objective function is chosen as the optimal clustering partition. A stochastic search algorithm is also proposed to find such a posterior probability.

A model similar to Booth, Casella and Hobert (2008) is used to describe the relationship between the assessment performance and other covariate information in this study. The actual underlying partition of districts is used as a parameter in the model. An objective function is defined as the posterior distribution of the partition parameter given the data. The posterior distribution is known up to a normalizing constant. The partition with
Figure 5.3: Math scores of 4th graders in public schools in available school districts in Michigan state.

the highest posterior probability will be the final result. The stochastic search procedure is constructed as a mixture of two Metropolis-Hastings algorithms: one makes small scale changes to individual objects and the other performs large scale moves involving entire clusters. The details are given in the following sections. The data set is introduced in Section 5.2, and the proposed model is given in Section 5.3. The stochastic search algorithm is described in Section 5.4. In Section 5.5, the results of the data analysis are presented.

5.2 Data Set

The data used here come from the Michigan Educational Assessment Program (MEAP). MEAP is a standardized test that was started by the State Board of Education. The test is taken by all public school students in the state of Michigan from elementary school to middle/junior
high school. The results at the district level are publicly accessible and can be downloaded from the MEAP website. Our study considered the 4th grade fall math scores. There are about 700 districts in the MEAP data set. We wanted to analyze the cluster partition of districts with respect to the poverty level in each district, as reported by the Small Area Income and Poverty Estimates (SAIPE) program. SAIPE is run by the U.S. Census Bureau and reports model-based estimation results from sample surveys in noncensal years. A more detailed description of SAIPE can be found at its website. There are about 550 school districts in SAIPE's data set. Since MEAP and SAIPE use different codes to represent the same school districts, we only had 476 districts left after merging these two data sets by matching the names of school districts; that is, some school districts with data were not included in our merged data set due to non-uniform data coding. No statistical methods were used in this study to manipulate the data sets. In the future, if other information can be accessed so that all the school districts' data can be matched, there would be no missing data and a more complete analysis could be conducted.

There is a lot of information reported in the original data set, but we only kept the average math score, its standard deviation, the number of students, and the poverty level in our data set. The map in Figure 5.3 shows the distribution of math scores of public school 4th graders in the available districts in Michigan state. Comparing the maps in Figures 5.1, 5.2 and 5.3, we can see there are some patterns relating the math score and the poverty level, but not a simple linear relationship.
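The merge described above — matching on district name because the two sources use different district codes — amounts to an inner join that silently drops unmatched districts. A minimal sketch with toy stand-ins for the two files (the field names are illustrative assumptions):

```python
# Toy stand-ins for the MEAP and SAIPE files; field names are illustrative.
meap = {
    "Lansing": {"mean_math": 410.2, "sd_math": 31.0, "n_students": 850},
    "Ann Arbor": {"mean_math": 435.8, "sd_math": 27.5, "n_students": 620},
    "Marquette": {"mean_math": 420.1, "sd_math": 29.3, "n_students": 140},
}
saipe = {
    "Lansing": {"poverty_rate": 0.28},
    "Ann Arbor": {"poverty_rate": 0.09},
    "Traverse City": {"poverty_rate": 0.15},
}

# Keep only districts whose names appear in both sources (inner join),
# mirroring how non-uniform coding shrank ~700 and ~550 districts to 476.
merged = {
    name: {**meap[name], **saipe[name]}
    for name in meap.keys() & saipe.keys()
}
print(sorted(merged))  # districts retained after the name match
```

Here Marquette and Traverse City are dropped because each appears in only one source, which is exactly the loss pattern the text describes.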
5.3 Proposed Model

We use w to represent a partition of the districts, with c = c(w) clusters in the partition, denoted by C_1, ..., C_c. Let (Y_i, S_i^2) be the observed score and its variance for the i-th district, i = 1, ..., n, where n is the total number of available districts. Let X_i = (X_{i1}, ..., X_{ip})^T be the set of available covariates and β_k = (β_{k1}, ..., β_{kp})^T be the associated regression coefficients if district i belongs to cluster k. For a fixed partition w, we consider the following hierarchical model:

Y_i = X_i^T β_k + u_k + v_i + ε_i, i ∈ C_k, (5.2)

u_k ~ N(0, λσ_k^2), λ > 0,
v_i ~ N(0, σ_k^2),
ε_i ~ N(0, S_i^2),

where X_i^T β_k is the linear part of the mean specification for each cluster, u_k is the random effect at the cluster level, v_i are the random effects of the districts within each cluster, and S_i^2, i = 1, ..., n, are treated as known and reported by MEAP. The data-driven tuning parameter λ can be determined by analysis of variance.

Prior Information

The priors of the model are set as

β_k ~ uniform on R^p, (5.3)

σ_k^2 ~ IG(a, b), (5.4)
where R^p is the p-dimensional real space. We choose an improper prior for β_k; no specific prior information is given for β_k. IG(a, b) is the inverse gamma distribution with parameters a and b. The prior for w is chosen as

π_n(w) = {Γ(m) m^{c(w)} / Γ(n + m)} Π_{k=1}^{c(w)} Γ(n_k), (5.5)

where n_k = #(C_k) and m > 0 is a parameter. This prior distribution was used in Crowley (1997) and Booth et al. (2008). Under this prior, more weight is put on partitions with smaller numbers of clusters as m decreases. If w follows the density function in (5.5), then

Pr{c(w) = k} = {Γ(m) m^k / Γ(n + m)} Σ_{w: c(w) = k} Π_{j=1}^k Γ(n_j), (5.6)

and

E{c(w)} = Σ_{i=0}^{n−1} m/(m + i). (5.7)

If m → 0, E{c(w)} → 1; if m → ∞, E{c(w)} → n.

Model-Based Objective Functions

Based on the model and prior information described previously, we can determine the model-based objective function, which mimics the procedure in Booth et al. (2008). For a fixed w, let θ_k denote the parameters of cluster k and θ = (θ_k)_{k=1}^{c(w)}. Then the joint density
function of Y = (Y_1, ..., Y_n) is given by

f(Y | θ, w) = Π_{k=1}^{c(w)} ∫ [ Π_{i ∈ C_k} ∫ f(Y_i | u_k, v_i, θ_k) f(v_i | θ_k) dv_i ] f(u_k | θ_k) du_k. (5.8)

If we use Y_k, an n_k × 1 vector, to represent the Y_i's in cluster k, then Y_k follows a normal distribution with mean X_k β_k and variance V_k, where X_k = {X_i^T, i ∈ C_k} and V_k is the n_k × n_k matrix with diagonal entries (1 + λ)σ_k^2 + S_i^2, i ∈ C_k, and all off-diagonal entries equal to λσ_k^2. (5.9)

Then the density function of Y_k is

f(Y_k | θ, w) = (2π)^{−n_k/2} |V_k|^{−1/2} exp{ −(1/2)(Y_k − X_k β_k)^T V_k^{−1} (Y_k − X_k β_k) }, (5.10)

and

f(Y | θ, w) = Π_{k=1}^{c(w)} f(Y_k | θ, w). (5.11)

Integrating the product of the joint density function of Y and the priors of β_k and σ_k^2, and multiplying by the prior π_n(w), yields the objective function π(w | y):

π(w | y) ∝ π_n(w) Π_{k=1}^{c(w)} ∫∫ f(Y_k | θ, w) π(σ_k^2) π(β_k) dσ_k^2 dβ_k, (5.12)
where π(·) represents the prior distributions of σ_k^2 and β_k. Since we include an individual district variance S_i^2 for each district, the objective function does not have a closed form for the final integral, and we compute the integral numerically.

Estimation of Mean Scores

For a fixed partition w, the model (5.2) for cluster k, k = 1, ..., c(w), can be rewritten in the form of a general linear mixed model as in (1.4),

Y_k = X_k β_k + Z_k v_k + e_k,

where Y_k = (Y_i, i ∈ C_k)^T, X_k = (X_i^T, i ∈ C_k)^T, e_k = (ε_i, i ∈ C_k)^T, v_k = (u_k, v_1, ..., v_{n_k})^T is an (n_k + 1) × 1 vector, and Z_k = [1_{n_k}, I_{n_k}] is the n_k × (n_k + 1) design matrix whose first column is all ones. (5.13)

Then the variance matrices of v_k and e_k are G_k = diag(λσ_k^2, σ_k^2, ..., σ_k^2) and R_k = diag(S_1^2, ..., S_{n_k}^2). Therefore, the variance matrix is V_k = R_k + Z_k G_k Z_k^T. Using the general theory of Henderson (1975) for a mixed linear model and equation (1.5), the best linear unbiased estimator (BLUE) of Y_i = X_i^T β_k + u_k + v_i is given by

Ŷ_i = X_i^T β̂_k + m_i^T G_k Z_k^T V̂_k^{−1} (Y_k − X_k β̂_k), (5.14)
where β̂_k is the generalized least squares estimator,

β̂_k = (X_k^T V̂_k^{−1} X_k)^{−1} (X_k^T V̂_k^{−1} Y_k), (5.15)

with V̂_k the variance matrix evaluated at the estimator σ̂_k^2 of σ_k^2. Here m_i = (1, 0, ..., 1, ..., 0)^T is an (n_k + 1) × 1 vector with 1 at the first and (i + 1)-th positions for the i-th district and 0 elsewhere, and σ̂_k^2 is obtained by maximizing the log of the marginal likelihood of σ_k^2,

log f(σ_k^2 | Y_k, X_k, θ) = log ∫ f(Y_k | θ, w) dβ_k = const − (1/2) log|V_k| − (1/2) log|X_k^T V_k^{−1} X_k| − (1/2) Y_k^T V_k^{−1} Y_k + (1/2) Y_k^T V_k^{−1} X_k (X_k^T V_k^{−1} X_k)^{−1} X_k^T V_k^{−1} Y_k, (5.16)

where the constant collects the (2π) factors. The maximization is carried out numerically.

MSE of the Estimator

An MSE estimator, from (6.2.37) in Rao (2003), for the BLUE given in the previous section is

MSE(Ŷ_i) = g_1(σ̂_k^2) − b_{σ̂_k^2}^T ∇g_1(σ̂_k^2) + g_2(σ̂_k^2) + 2g_3(σ̂_k^2),

g_1(σ̂_k^2) = m_i^T (G_k − G_k Z_k^T V̂_k^{−1} Z_k G_k) m_i,

g_2(σ̂_k^2) = d_i^T (X_k^T V̂_k^{−1} X_k)^{−1} d_i,

g_3(σ̂_k^2) = tr{ (∂b_i^T/∂σ_k^2) V̂_k (∂b_i^T/∂σ_k^2)^T V̄(σ̂_k^2) }, where b_i^T = m_i^T G_k Z_k^T V_k^{−1}, (5.17)
where d_i^T = x_i^T − m_i^T G_k Z_k^T V̂_k^{−1} X_k, and m_i^T, G_k, Z_k^T, V̂_k and X_k are given in the previous section. The term b_{σ̂_k^2}^T ∇g_1(σ̂_k^2) is the bias correction of g_1(σ̂_k^2) and is calculated by bootstrap.

5.4 Stochastic Search

The estimator of the mean score and the corresponding MSE can be calculated if the cluster partition is given, but the partition of the school districts is not known. The optimal cluster partition can be found by maximizing the objective function through the following stochastic search procedure. Similar to Booth et al. (2008), the stochastic search procedure includes two kinds of moves: a small scale change in the form of a biased random walk on one district, and a large scale change in the form of a split-merge move on clusters. The two scales of moves are combined with different probabilities. Let w be the current partition and c the number of clusters. At each iteration, if c = 1, a split move is proposed automatically. If c > 1, the biased random walk is proposed with probability p_b, and the split-merge move is proposed with probability 1 − p_b.

Biased Random Walk

As with the nearest neighbor random walk, one district is moved at a time, and the move w → w' has positive probability if and only if w and w' share the same partition except for one district being in different clusters.

In Booth et al. (2008), the algorithm was applied to genetic data. There, a cluster may contain a single gene because the observed data are replicated measurements of genes. However, we only have one piece of observed data for each school district. If a cluster contains one
121 school dstrct only, the model (5.2) can not be estmated or determned for that cluster. Therefore, each cluster must contan at least two school dstrcts. The detaled movng rules are descrbed as follows. Frst, one dstrct s chosen unformly and randomly from the dstrcts nsde the clusters wth n k >= 2. Then t wll move to one of the other c 1 clusters wth probablty 1/(c 1). The acceptance probablty of w w s mn{1, π(w)/π(w )} Splt-Merge Moves Splttng or mergng of clusters are large scale moves. In the based random walk, only one dstrct s moved at a tme. A large scale change would requre a lot of steps of the based random walk and s very unlkely to occur. Therefore, t s necessary to add the large scale change to the algorthm. At each teraton, a merge move or splt move wll be proposed randomly. A merge move s proposed wth probablty pm (0, 1). Two clusters are chosen unformly at random from the current partton and form a new cluster together. A splt move s proposed wth probablty 1 pm. Snce each cluster ncludes by at least two school dstrcts, one cluster wth at least four school dstrcts n the current partton s chosen and then splt nto two clusters wth each cluster contanng at least two school dstrcts. Suppose the current partton s w, and w s the partton after mergng two clusters n w. Then P (w w) = P (w w ) = pm c(w )(c(w ) 1) 2 1 pm (2 n 1 n c(w) 1) k=1 I[#{C k (w)} 4] (5.18) 112
where n is the number of districts of the cluster in w that is formed by merging two clusters of w′. The acceptance probability of the move w′ → w is min{1, R}, where

    R = [π(w) P(w → w′)] / [π(w′) P(w′ → w)]    (5.19)

and the acceptance probability of the move w → w′ is min{1, 1/R}.

5.5 Data Analysis

In the analysis of our data set, we chose settings similar to those of Booth et al. (2008) for the proposal probabilities of the different moves: p_b = 0.9 and p_m = 0.5. For the prior, we chose a = 3 and b = 40, which keeps the range of the cluster-level variation in model (5.1) smaller than the variance between school districts.

First, we applied the K-means clustering algorithm in R to the data without any covariates. We chose the initial number of clusters as 10 and used the numbers 1, ..., 10 to denote the clusters. Then we started the stochastic search procedure from this initial partition. In each iteration, a biased random walk or split-merge move is proposed and accepted according to the acceptance probability. The numbers 1, ..., c(w) are used to denote the clusters in the new partition. The total number of iterations was 10^5.

Figure 5.4 gives the trace plot of the number of clusters vs. iterations. Four lines are drawn with respect to different m values: 0.5, 0.1, , . From Figure 5.4, we can see that the final number of clusters decreases as the value of m decreases. This follows the property of the prior of the partition parameter and resembles the results of Booth et al. (2008). We chose the m value so that the expected number of clusters would be
close to 1.

Figure 5.4: Trace plot of different m values.

We ran the whole procedure again with double the number of iterations and used the first 10^5 iterations as a burn-in period. The cluster index of each district was recorded. The average number of clusters in the last 10^5 iterations was 9.4. When the stochastic search procedure converged, all the clustering results had close posterior probabilities. One clustering result with 7 clusters is shown in Figure 5.5, and scatter plots of mean scores vs. poverty levels of the school districts in the 7 clusters are given in Figure 5.6. The reason for choosing this partition is that all the cluster sizes are greater than 10 in this clustering result, which is closer to the real world since there are a total of 476 school
districts.

Figure 5.5: Map of clusters.

From the map of clusters in Figure 5.5, we can see that all the available school districts are divided into 7 clusters, and each cluster is formed from several geographically separated small groups. However, most districts in each small group are neighbors of one another. This agrees with our intuitive picture of school districts: adjacent school districts with similar poverty levels may perform similarly on the state assessment, but the school districts of one county or city may not all be placed in the same performance-level category. The scatter plots in Figure 5.6 indicate that the relationship between the mean scores of the school districts and their poverty levels is close to linear. However, the R^2 is only 0.3 if we fit all the data with a single simple linear regression. The R^2 values are (0.26, 0.37, 0.26, 0.89, 0.58, 0.42, 0.74) if we fit the data within each cluster with simple linear regression. But if we fit the data with model (5.2), the R^2 values are greater than 0.9 for all the clusters. This indicates the neces-
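The computational building blocks described in this chapter can be sketched in code. The following Python sketch is illustrative only, not the dissertation's implementation: `gls_beta` computes the generalized least squares estimator of (5.15), `biased_walk_propose` generates one biased-random-walk proposal that keeps every cluster at size at least two, and `merge_prob`, `split_prob`, and `mh_accept_prob` mirror the proposal probabilities of (5.18) and the acceptance ratio of (5.19). All function and variable names are assumptions of this sketch.

```python
import numpy as np

def gls_beta(X, Y, V):
    # Generalized least squares estimator of (5.15):
    # beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} Y.
    Vinv = np.linalg.inv(V)
    XtVinv = X.T @ Vinv
    return np.linalg.solve(XtVinv @ X, XtVinv @ Y)

def biased_walk_propose(rng, labels, c):
    # One biased-random-walk proposal: move a single district to another
    # cluster. Only districts in clusters of size >= 3 may move, so every
    # cluster retains at least two districts after the move.
    sizes = np.bincount(labels, minlength=c)
    movable = np.flatnonzero(sizes[labels] >= 3)
    d = rng.choice(movable)
    targets = [k for k in range(c) if k != labels[d]]
    proposal = labels.copy()
    proposal[d] = rng.choice(targets)
    return proposal

def merge_prob(c, pm):
    # Probability of proposing one specific merge of two of c clusters.
    return pm * 2.0 / (c * (c - 1))

def split_prob(n, n_splittable, pm):
    # Probability of proposing one specific split of an n-district cluster
    # into two clusters of size >= 2: there are 2**(n-1) - n - 1 such
    # splits, and the cluster is drawn uniformly from the n_splittable
    # clusters that have at least four districts.
    return (1.0 - pm) / ((2 ** (n - 1) - n - 1) * n_splittable)

def mh_accept_prob(pi_new, pi_old, q_reverse, q_forward):
    # Metropolis-Hastings acceptance probability min{1, R} with
    # R = pi(new) q(new -> old) / (pi(old) q(old -> new)).
    return min(1.0, (pi_new * q_reverse) / (pi_old * q_forward))
```

As a sanity check, with V equal to the identity matrix `gls_beta` reduces to ordinary least squares, and a four-district cluster admits 2^3 − 4 − 1 = 3 splits into two pairs, so `split_prob(4, 1, pm)` equals (1 − pm)/3.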