Poisson M-quantile regression for small area estimation

Size: px

Start display at page:

Download "Poisson M-quantile regression for small area estimation"

Eleanore Hunt
6 years ago
Views:

University of Wollongong Research Online Centre for Statistical & Survey Methoology Working Paper Series Faculty of Engineering an Information Sciences 2013 Poisson M-quantile regression for small

1 University of Wollongong Research Online Centre for Statistical & Survey Methoology Working Paper Series Faculty of Engineering an Information Sciences 2013 Poisson M-quantile regression for small area estimation Nikos Tzaviis University of Southampton Maria Giovanna Rannalli University of Perugia Nicola Salvati University of Pisa Emanuela Dreassi University of Florence Ray Chambers University of Wollongong, Recommene Citation Tzaviis, Nikos; Giovanna Rannalli, Maria; Salvati, Nicola; Dreassi, Emanuela; an Chambers, Ray, Poisson M-quantile regression for small area estimation, Centre for Statistical an Survey Methoology, University of Wollongong, Working Paper 14-13, 2013, Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library:

National Institute for Applie Statistics Research Australia The University of Wollongong Working Paper 14-13 Poisson M-quantile Regression for Small Area Estimation Nikos Tzaviis, M.

2 National Institute for Applie Statistics Research Australia The University of Wollongong Working Paper Poisson M-quantile Regression for Small Area Estimation Nikos Tzaviis, M. Giovanna Ranalli, Nicola Salvati, Emanuela Dreassi an Ray Chambers Copyright 2013 by the National Institute for Applie Statistics Research Australia, UOW. Work in progress, no part of this paper may be reprouce without permission from the Institute. National Institute for Applie Statistics Research Australia, University of Wollongong, Wollongong NSW Phone , Fax anica@uow.eu.au

3 Poisson M-quantile Regression for Small Area Estimation Nikos Tzaviis M. Giovanna Ranalli Nicola Salvati Emanuela Dreassi Ray Chambers Abstract A new approach to moel-base small area estimation for count outcomes is propose an use for estimating the average number of visits to physicians for Health Districts in Central Italy. The propose small area preictor is base on efining a Poisson M-quantile moel by extening the ieas in Cantoni & Ronchetti (2001) an Chambers & Tzaviis (2006). This preictor can be viewe as a semi-parametric outlier robust alternative to the more commonly use plug-in Empirical Best Preictor that is base on a Poisson generalise linear mixe moel with Gaussian ranom effects. Results from the real ata application an from a simulation experiment confirm that the propose small area preictor has goo robustness properties an can be more efficient than alternative small area preictors. Keywors: Count ata; generalize linear moels; health survey; non-normal outcomes; nonparametric bootstrap; robust inference. 1 Introuction The Health Conitions an Appeal to Meicare Survey (HCAMS) is a national, multistage, personal interview sample survey conucte perioically in Italy by Southampton Statistical Sciences Research Institute, University of Southampton, Southampton SO17 1BJ, UK, n.tzavis@soton.ac.uk Dipartimento i Economia, Finanza e Statistica, Universtà egli Stui i Perugia, Via Pascoli - I Perugia, Italy giovanna@stat.unipg.it Dipartimento i Economia e Management, Università i Pisa, Via Riolfi, 10 - I Pisa, Italy, salvati@ec.unipi.it Dipartimento i Statistica, Informatica, Applicazioni G. Parenti, Università egli Stui i Firenze, Viale Morgagni, 59 - I Firenze, Italy, reassi@isia.unifi.it Centre for Statistical an Survey Methoology, University of Wollongong, New South Wales 2522, Australia, ray@uow.eu.au 1

4 the National Institute of Statistics. The eition is currently running; previous eitions have been conucte in the perios an It provies information about the health conition an health care use of the noninstitutionalize population of Italy. The questionnaire comprises items on basic health conition (like perceive health status, MBI an ietary habits) that are also surveye annually by the Multipurpose Everyay Life Survey. In aition, it covers special health topics on chronic an acute iseases, visits to physicians an general practitioners. HCAMS is a multistage survey in which municipalities are primary sampling units (PSUs), while househols are seconary sampling units (SSUs). The eition has about 1,449 PSUs (out of 8,102) an 52,332 househols with approximately 120,000 iniviuals. HCAMS is esigne to provie reliable (irect an esign base) estimates at the Aministrative Region (NUTS2) level, but there is also a nee for estimates for smaller subpopulations or geographical areas. This is true in general for National Surveys, but particularly relevant for surveys with health relate information, since in Italy health is mainly manage locally at the level of NUTS2. In particular, policies are enorse by Aministrative Regions by allocating resources an funs to Health Districts (HDs) that are in charge for local implementation. HDs are efine by groups of contiguous municipalities an are not planne omains in the HCAMS. A goo number of HDs have very low sample size an represent, therefore, small areas of interest. The increasing eman from the aministrators an policy planners for reliable estimates of various parameters at small area level has le to the evelopment of a number of efficient moel-base small area estimation (SAE) methos (see Rao, 2003, for a review of such methos). For example, the empirical best linear unbiase preictor (EBLUP) base on a linear mixe moel (LMM) is often recommene when the target of inference is the small area average of a continuous response variable (Battese et al., 1988). However, using a LMM to characterise ifferences between small areas requires strong istributional assumptions. Robust SAE inference uner the LMM has recently attracte some interest (Sinha & Rao, 2009; Chambers et al., 2013). An alternative approach to small area estimation that automatically allows for robust inference is to use M-quantile moels (Breckling & Chambers, 1988) to characterise these ifferences (Chambers & Tzaviis, 2006). Most of the variables in the HCAMS are binary or take the form of a count an are therefore not suite to stanar SAE methos base on LMMs. Working within a frequentist paraigm, one can follow Jiang & Lahiri (2001) who propose an empirical best preictor (EBP) for a binary response, or Jiang (2003) who extens these results to generalize linear mixe moels (GLMMs). Nevertheless, use of EBP can be computationally challenging (Molina & Rao, 2010). Despite their attractive properties as far as moelling non-normal outcomes is concerne, 2

5 the use of GLMMs requires numerical approximations. In particular, the likelihoo function efine by a GLMM can involve high-imensional integrals which cannot be evaluate analytically (see Mc Culloch, 1994, 1997; Song et al., 2005). In such cases numerical approximations can be use, as for example in the R function glmer in the package lme4. Alternatively, estimation of the moel parameters can be obtaine by using an iterative proceure that combines Maximum Penalize Quasi-Likelihoo (MPQL) an REML estimation (Saei & Chambers, 2003). For all these reasons, alternative approaches to moelling iscrete outcomes shoul be consiere. Furthermore, estimates of GLMM parameters can be very sensitive to outliers or epartures from unerlying istributional assumptions. Large eviations from the expecte response as well as outlying points in the space of the explanatory variables are known to have a large influence on classical maximum likelihoo inference base on generalize linear moels (GLMs). Following a Bayesian paraigm, Maiti (2001) escribes a Hierarchical Bayes approach to fitting a GLMM base on an outlier-robust normal mixture prior for the ranom effects an uses this moel for SAE. Sinha (2004) proposes robust estimation of the fixe effects an the variance components of a GLMM, using a Metropolis algorithm to approximate the posterior istribution of the ranom effects. Let us introuce briefly the notation for small area estimation an GLMMs. Let U enote a finite population of size N which can be partitione into D omains or small areas, with U enoting population on small area, = 1,..., D. The small area population sizes N, for = 1,..., D are assume known. Let y j be the value of the outcome of interest, for the purposes of this paper a iscrete or a categorical variable, for unit j in area, an let x j enote a p 1 vector of unit level covariates (incluing an intercept). It is assume that the values of x j are known for all units in the population, as are the values z of a q 1 vector of area level covariates. We will see that the first requirement can be relaxe to some extent when there are no continuous variables among the x s. In the presence of categorical covariates, an equivalent alternative representation of the assume ata structure is in the form of a cross-tabulation. The aim is to use the sample values of y j an the population values of x j an z to estimate a proportion or a count of a characteristic in the small area = 1,..., D. For iscrete outcomes moel-base small area estimation conventionally employs a GLMM for µ j = E[y j u ] of the form g(µ j ) = η j = x T jβ + z T u, (1) where g is a link function. When y j is a count outcome the logarithmic link function is commonly use an the iniviual y j values in area are assume to be 3

6 inepenent Poisson ranom variables with µ j = E[y j u ] = exp{η j } (2) an Var[y j u ] = µ j. The q-imensional vector u is generally assume to be inepenently istribute between areas accoring to a normal istribution with mean 0 an covariance matrix Σ u. Σ u epens on parameters δ = (δ 1,..., δ K ), which are referre to as the variance components an β in (1) is the vector of fixe j U y j effects. If the target of inference is the small area mean, ȳ = N 1 an the Poisson-GLMM (1) is assume, the approximation to the minimum mean [ j s y j + j r µ j ]. Since µ j epens on β an u, a further stage of approximation is require, where unknown parameters are replace by suitable estimates. This leas to the plug-in version of the EBP (hereafter EBPP) for the area proportion ȳ uner (2), square error preictor of ȳ is N 1 ˆȳ EBPP = N 1 { y j + j s j r ˆµ j }, (3) where ˆµ j = exp{ˆη j }, ˆη j = x T j ˆβ + z T û, ˆβ is the vector of the estimate fixe effects an û enotes the vector of the preicte area-specific ranom effects (see Rao, 2003; Saei & Chambers, 2003; Jiang & Lahiri, 2006; González-Manteiga et al., 2007). In (3) s an r enote the set of sample (of size n ) an nonsample (of size N n ) units in small area, respectively. In this paper, we focus on estimates of the mean number of visits to physicians within the past four weeks among people age 65 or more for 60 HDs comprise in the three Aministrative Regions of Liguria, Toscana an Umbria. These are neighboring Regions place in the center part of Italy an share a common concern on the quality of services for elerly. Aging of the population is a general concern in Italy given that it shows the largest proportion of people age 65 or more in Europe (20.3% in 2011, latest available figure). Liguria, Toscana an Umbria are three of the four olest regions in Italy with proportions of 26.7%, 23.3% an 23.1%, respectively. Malec et al. (1997) also consier the estimation of quantities relate to the number of visits to physicians from the American National Health Interview Survey. They focus on the proportion of population with at least one visit in the past twelve months an use a Hierarchical Bayesian approach. A logistic moel to relate the iniviual s probability of a octor visit to his/her characteristics is use an then small area parameters are moele with respect to area specific covariates. In this case, we are intereste in the number of visits to physicians an, therefore, we nee to properly moel this count variable. In aition, the istribution of the variable of interest shows some unuly large values that require a robust proceure to properly account for them in the estimation process. It is in 4

7 fact very important that final estimates are not overly influence by them to provie a reliable comparison among small areas. For all these reasons, in this paper we present a new approach to SAE for count outcomes base on M-quantile moeling. These moels o not epen on strong istributional assumptions nor on a preefine hierarchical structure, an outlier robust inference is automatically performe when these moels are fitte. Following Chambers et al. (2012b) an Chambers et al. (2012a) we exten the existing M-quantile approach for continuous ata to the case where the response is a count. As with M-quantile moeling of a continuous response (Chambers & Tzaviis, 2006) ranom effects are avoie an between area variation in the response is characterise by variation in area-specific values of quantile-like coefficients. In Section 2 we motivate the use of M-quantile moels for the estimation of mean number of visits to the physicians with some explorative analysis on the ata set. In Section 3, after reviewing M-quantile small area estimation for a continuous response, we show how the approach for robust inference for GLMs propose by Cantoni & Ronchetti (2001) can be extene for fitting an M-quantile GLM. Approaches for efining the M-quantile coefficients, which play the role of pseuo-ranom effects in this framework, are iscusse in Section 4, alongsie the efinition of small area preictors an corresponing Mean Square Error (MSE) estimators. In Section 5 we report the results from the application of the propose methoology for eriving estimates of the number of visits in primary health care outlets for HDs in Italy. Results from moel-base an esign-base simulation stuies aime at empirically assessing the performance of the propose small area preictors are presente in Section 6. Section 7 conclues the paper with some remarks an possible future researches. 2 The estimation of mean number of visits to a octor from HCAMS: empirical challenges The survey esign of HCAMS uses a complex sampling scheme an, in particular, it is as follows. Within a given Province (LAU1), municipalities are classifie as Self-Representing Areas (SRAs) - consisting of the larger municipalities - an Non Self-Representing Areas (NSRAs) - consisting of the smaller ones. In SRAs each municipality is a single stratum an househols are selecte by means of systematic sampling. In NSRAs the sample is base on a stratifie two stage sample esign. Municipalities are PSUs, while househols are SSUs. PSUs are ivie into strata of approximately the same imension in terms of population. One PSU is rawn from each stratum with probability proportional to the PSU population size. The SSUs are selecte from population registers hel by municipalities by 5

8 Figure 1: Sample size in each Health Districts of Liguria, Toscana an Umbria in means of systematic sampling in each PSU. All members of each sample househol, in both SRAs an NSRAs, are interviewe. The ata that we consier in this paper are from the eition of HCAMS. We are intereste in proucing estimates of the average number of visits to physicians within the past four weeks for elerly (age 65 or more) in the 60 HDs of Toscana, Liguria an Umbria. Recall that HDs are groups of neighboring municipalities (PSUs), while elerly is a omain that cuts across PSUs. The total sample size for the three Regions is n = 4,021. Figure 1 provies a map of the three regions of interest; HDs are color coe accoring to the sample size. Note that 5 HDs in Toscana an 1 in Umbria are out of sample areas, i.e. they have no sample units. The application of small area unit level methoologies requires iniviual level covariate information for all units in the population for each area. One possible source of these ata is the Population Census run in Italy in Among variables available from the census, we concentrate on those that are also available every year from population registers hel at a municipality level. This is the set of auxiliary variables that are available also for the other eitions of the survey an can eventually be employe to obtain estimates an have comparisons over time. From aministrative registers, we have the istribution for each municipality by age, gener an marital status. We collapse age in 5-year classes (65-69, 70-74, 75-79, 80-84, 85 or more) an also consier an overall Aministrative Region effect. Table 1 reports the results of a proceure of analysis of eviance from fitting 6

9 Table 1: Analysis of eviance table from fitting Poisson-Normal mixe moels to the whole ataset (in italics the moels with nonsignificant improvements in the fit). Covariates Resi. f f Deviance p-value Null 4019 age e-11 age, gener age, gener, marital status age, gener, region age, gener, region, age gener age, gener, region, region gener age, gener, region, region age a Poisson-Normal mixe effects moel on the sample ata, in which a ranom intercept is fitte for each HD. We can note that age class, gener an region are significant. On the other han, marital status an interactions between pairs of the three significant variables are not significant. A likelihoo ratio test for the significance of the variance of the istribution of the small area ranom effects has been conucte as well. Given that we are testing whether the parameter of interest is zero, i.e. a value on the bounary of its parameter space, it can be shown that minus twice the ratio of the two log-likelihoos has approximate ensity function equal to a 50:50 mixture between a χ 2 0 an a χ 2 1 istribution. Using this approximation instea of the usual χ 2 1 essentially leas to p-values being halve. In our case the value of the test statistic is , with a p-value 3.22e 09, that provies evience of a significant area effect. An important property of the Poisson regression moel is that it allows to analyze iniviual or groupe ata with equivalent results ue to the fact that the sum of inepenent Poisson ranom variables is also Poisson. This is useful when we have groups of iniviuals with ientical covariate values as it occurs in this case stuy. For this reason, in the sequel we will fit moels to groupe ata in which groups are efine by cross classifying gener by age class by HDs. The response variable is the total number of visits to physicians for all iniviuals in each group. The overall sample size from each group is consiere as an offset. Figure 2 reports two plots of Pearson resiuals from the Poisson-Normal mixe effects moel with covariates given by age class, gener an Region. The histogram clearly shows that the istribution of the resiuals is positively skewe an has some quite large values. This is confirme by the secon plot, representing the istribution of the resiuals by HD: some HDs contain many positive resiuals, whereas some HDs have negative resiuals. Finally, Figure 3 plots the raw resiu- 7

10 Frequency Pearson Resiuals Pearson Resiuals Health District Figure 2: Moel fit iagnostics for a Poisson-Normal mixe moel fit to the ata: histogram of Pearson resiuals (left) an box-plots of Pearson resiuals by Health District (right). als against the preicte values for the number of visits to physicians. The x-axis values range between 0 an 20 in orer to show clearly over 98% of the observations. In the x-range from 8 to 20, there is no obvious pattern. However, between 0 an 8 we can see more variability, that can be an effect of the skewness of the preicte values. The plot inicates some uner-preiction when the preicte number of visits is very small. The results of these preliminary moel iagnostics suggest that the use of M-quantile small area estimators is a reasonable choice. 3 M-quantile regression In this Section we present an extension of linear M-quantile regression to count ata following Chambers et al. (2012b) an Chambers et al. (2012a). We start by proviing a fairly etaile presentation of M-quantile regression for continuous outcomes before focusing on the case of count outcomes. In this Section we rop subscript for ease of notation. 3.1 M-quantile regression for a continuous response The classic regression moel summarises the behaviour of the mean of a ranom variable y at each point in a set of covariates x. This provies a rather incomplete picture, in much the same way as the mean gives an incomplete picture of a istribution. Quantile regression summarises the behaviour of ifferent parts (e.g. 8

11 Resiuals Preicte Number of Visits Figure 3: Moel fit iagnostics for a Poisson-Normal mixe moel fit to the ata: raw resiuals vs preicte values. 9

12 quantiles) of the conitional istribution of y at each point in the set of the x s. In the linear case, quantile regression leas to a family of hyper-planes inexe by a real number q (0, 1). For a given value of q, the corresponing moel shows how the q-th quantile of the conitional istribution of y varies with x. For example, if q = 0.5 the quantile regression hyperplane shows how the meian of the conitional istribution changes with x. Similarly, for q = 0.1 the quantile regression hyperplane separates the lower 10% of the conitional istribution from the remaining 90%. Suppose (x T j, y j ), j = 1,..., n enotes the values observe for a ranom sample consisting of n inepenent observations from a population, where x T j are row p-vectors of a known esign matrix X an y j is a scalar response variable corresponing to a realisation of a continuous ranom variable with unknown continuous cumulative istribution function F. A linear regression moel for the q-th conitional quantile of y j given x j is Q y (q x j ) = x j T β q. (4) An estimate of the q-th regression parameter β q is obtaine by minimizing n y j x T j β q {(1 q)i(y j x T j β q 0) + qi(y j x T j β q > 0)}. j=1 Solutions to this problem are usually obtaine by linear programming methos (Koenker & D Orey, 1987) an algorithms for fitting quantile regression are now available in stanar statistical software, for example the library quantreg in R (R Development Core Team, 2010), the comman qreg in Stata, an the proceure quantreg in SAS. Quantile regression can be viewe as a generalization of meian regression. In the same way, expectile regression (Newey & Powell, 1987) is a quantile-like generalization of mean (i.e. stanar) regression. M-quantile regression (Breckling & Chambers, 1988) integrates these concepts within a framework efine by a quantile-like generalization of regression base on influence functions (Mregression). The M-quantile of orer q for the conitional ensity of y given the set of covariates x, f(y x), is efine as the solution MQ y (q x; ψ) of the estimating equation ψ q {y MQ y (q x; ψ)}f(y x)y = 0, where ψ q enotes an asymmetric influence function, which is the erivative of an asymmetric loss function ρ q. A linear M-quantile regression moel y j given x j is one where we assume that MQ y (q x j ; ψ) = x j T β q. (5) That is, we allow a ifferent set of p regression parameters for each value of q 10

13 (0, 1). Estimates of β q are obtaine by minimizing n ρ q (y j x T j β q ). (6) j=1 Different regression moels can be efine as special cases of (6). In particular, by varying the specifications of the asymmetric loss function ρ q we obtain the expectile, M-quantile an quantile regression moels as special cases. When ρ q is the square loss function we obtain the linear expectile regression moel if q 0.5 (Newey & Powell, 1987) an the stanar linear regression moel if q = 0.5. When ρ q is the loss function escribe by (Koenker & Bassett, 1978) we obtain the linear quantile regression. Setting the first erivative of (6) equal to zero leas to the following estimating equations n ψ q (r jq )x j = 0, (7) j=1 where r jq = y j x T j β q, ψ q (r jq ) = 2ψ(s 1 r jq ){qi(r jq > 0) + (1 q)i(r jq 0)} an s > 0 is a suitable estimate of scale. For example, in the case of robust regression, s = meian r jq /0.6745, an we use the Huber Proposal 2 influence function, ψ(u) = ui( c u c)+c sgn(u)i( u > c). Provie that the tuning constant c is strictly greater than zero, estimates of β q are obtaine using iterative weighte least squares (IWLS). 3.2 M-quantile regression for count ata: A Quasi-likelihoo approach The use of M-quantile regression with iscrete outcomes is challenging as in this case there is no agree efinition of an M-quantile regression function (Chambers et al., 2012b,a). A popular approach for moelling the mean of a iscrete outcome as a function of preictors is via the use of GLMs by assuming that the response variable follows a istribution that is a member of the exponential family of istributions using an appropriate link function. For count ata an appropriate istribution is the Poisson an the link function is the logarithm. In the same way that we impose in the linear specification (4) the continuous case, we impose an appropriate continuous (in q) specification on MQ y (q X; ψ) for count ata (Chambers et al., 2012b,a). The most obvious specification for count ata is the log-linear specification. That is, we replace (5) by MQ y (q x j ; ψ) = t j exp(x j β q ), (8) 11

14 where t j is an offset term. For estimating β q, following Chambers et al. (2012b,a), we consier extensions of the robust version of the estimating equations for GLMs by Cantoni & Ronchetti (2001) to the M-quantile case. In particular, Cantoni & Ronchetti (2001) propose a robust version of the estimating equations for GLMs an consier two popular GLMs namely, the binomial an the Poisson moels. Estimating equations are efine by Ψ(β) := n 1 n { 1 } ψ(r j )w(x j ) σ(µ j ) µ j a(β) = 0, (9) j=1 where r j = σ(µ j ) 1 (y j µ j ) are Pearson resiuals, E[Y j ] = µ j, µ i is its erivative with respect to β, Var[Y j ] = σ 2 (µ j ), an a(β) = n 1 n j=1 E[ψ(r j)]w(x j )µ j/σ(µ j ) ensures the Fisher consistency of the estimator. The boune ψ function is introuce to control eviation in y-space, whereas weights w( ) are use to ownweight the leverage points. When w(x j ) = 1, j = 1,..., n Cantoni & Ronchetti (2001) call the estimator the Huber quasi-likelihoo estimator. Notice that when ψ is the ientity function we obtain the classic quasi-likelihoo estimator for GLMs. For M-quantile regression the estimating equations (9) can be re-written as Ψ(β q ) := 1 n n { 1 } ψ q (r jq )w(x j ) σ(mq y (q x j ; ψ)) MQ y(q x j ; ψ) a(β q ) = 0, j=1 (10) where r jq = σ(mq y (q x j ; ψ)) 1 (y j MQ y (q x j ; ψ)), σ(mq y (q x j ; ψ)) = = MQ y (q x j ; ψ) 1/2, MQ y(q x j ; ψ) = MQ y (q x j ; ψ)x j an a(β q ) is a correction term to obtain unbiase estimators, which is efine following the arguments in Cantoni & Ronchetti (2001), with a(β q ) = n 1 n j=1 { 2w q (r jq )w(x j ) cp (Y j i 2 + 1) cp (Y j i 1 )+ MQ y (q x j ; ψ)) } σ(mq y (q x j ; ψ)) [P (Y j = i 1 ) P (Y j = i 2 )] MQ y (q x j ; ψ) 1/2 x j, i 1 = MQ y (q x j ; ψ) cσ(mq y (q x j ; ψ)), i 2 = MQ y (q x j ; ψ) + cσ(mq y (q x j ; ψ)) an w q (r jq ) = [qi(r jq > 0) + (1 q)i(r jq 0)]. When w(x j ) = 1, j = 1,..., n a Huber quasi-likelihoo estimator is obtaine. An alternative simple choice for w(x j ) suggeste by robust estimation in linear moels is w(x j ) = 1 h j where h j is the jth iagonal element of the hat matrix H = X(X T X) 1 X T. 12

15 The solution to the estimating equations (10) can be obtaine numerically by a Fisher scoring proceure. Note that (9) can be obtaine as special case of (10) for specific choices of q. In particular, when q = 0.5 we obtain (9). Moreover, linear M-quantile regression is a special case of (10) if the the linear link function MQ y (q x j ; ψ) = x T j β q is use an c tens to infinity. R routines for fitting M-quantile regression for count ata are available from the authors. 3.3 Alternative estimation approaches Quantile regression for count ata from a Bayesian perspective has been recently consiere by Lee & Neocleous (2010) whereas from a frequentist perspective by Machao & Santos Silva (2005). In both papers, the authors point out that the problem with estimating conitional quantiles of counts is cause by the combination of a non ifferentiable sample objective function with a iscrete epenent variable. To overcome this problem, Machao & Santos Silva (2005) use a specific form of jittering for creating artificial smoothness in the outcome. In particular, smoothness is achieve by aing to the count outcome noise generate from a Uniform(0, 1). The quantiles of the resulting continuous outcome are then irectly moelle because of the one-to-one relationship between the conitional quantiles of the count outcome an those of the artificially generate continuous outcome. An alternative approach to moeling the conitional istribution of a count outcome given the covariates was propose by Efron (1992) who propose using asymmetric maximum likelihoo estimation. As Machao & Santos Silva (2005) point out, asymmetric maximum likelihoo estimation can be seen as the result of smoothing the objective function use to efine the quantile regression estimator. Efron s approach results in estimates of conitional location for count ata that is similar to conitional expectiles propose by Newey & Powell (1987). The approach we propose in this paper for estimating M-quantile regression also uses an objective function that has a egree of smoothness. In particular, the smoothness can be increase by setting the tuning constant in the Huber influence function equal to a large value in which case estimates of the moel parameters from our approach shoul be close to those obtaine by Efron s asymmetric maximum likelihoo estimation. Inee, comparing estimates of the moel parameters from Efron s (1992) metho (using the vgam function with family equal to amlpoisson in R) with our metho, when setting Huber s tuning constant equal to a large value, confirm this assumption. Nevertheless, more nees to be one for comparing the ifferent estimation approaches especially when these are use for preiction purposes as is the case with small area estimation. 13

16 4 Estimation of small area counts by M-quantile regression moels 4.1 Point estimation Linear mixe effects moels an GLMMs inclue ranom area effects to account for between-area variation. Estimation of the moel parameters is then implemente by means of parametric assumptions such as that ranom effects are normally istribute. Efficient preiction of ranom effects is crucial ue to their central role in small area estimation. Although for linear moels close form solutions exist, for GLMs this is not the case. For GLMMs an from a frequentist perspective preicte ranom effects are obtaine by using approximations to the likelihoo, for example via first or secon orer penalise quasi-likelihoo, or numerical methos such as Gaussian quarature. Hence, for GLMMs outlier robust preiction of ranom effects becomes more challenging an the use of semiparametric methos may offer a simpler solution to outlier robust estimation. A key concept in the application of M-quantile methos to ata with group structure is the ientification of a unique M-quantile coefficient associate with each observe atum. These coefficients are then average, in some suitable way, over observations making up the group to efine a group level M-quantile coefficient, which can be use to characterise the istribution of y x within the group in very much the same way as a ranom group effect. In the continuous y case, the M-quantile coefficient for observation j is simply efine as the unique solution q j to the equation y j = MQ y (q j x j ; ψ). However, for count ata the equation y j = MQ y (q j x j ; ψ) oes not have a solution when y j = 0. To overcome this problem we use the efinition by Chambers et al. (2012a): MQ y (q j x j ; ψ) = { k(xj ) y j = 0 y j y j = 1, 2,... A possibility is k(x j ) = MQ y (q min x j ; ψ) where q min enotes the smallest q- value in the gri of q-values use to etermine the q j values of the observe units. However, this implies that q j = q min whenever y j = 0, irrespective of the value of x j, which oes not appear to be appropriate. One way to tackle this is by following the same line of argument that Chambers et al. (2012b) use in motivating the efinition of q j for the Bernoulli case. This implies that an observation with value y 1 = 0 correspons to a smaller q-value than another with value y 2 = 0 when MQ y1 (0.5 x 1 ; ψ) > MQ y2 (0.5 x 2 ; ψ). A way to efine this is by setting k(x j ) = min{1 ɛ, [ MQ y (0.5 x j ; ψ)] 1 }, where ɛ > 0 is a small positive constant. 14

17 Then the M-quantile coefficient for unit j is q j, where { 1 } min 1 ɛ, MQ y (q j x j ; ψ) = exp(x T ˆβ y j = 0 j 0.5 ) y j y j = 1, 2,... (11) For a etaile iscussion see Chambers et al. (2012a,b). Provie there are sample observations in area, an area specific M-quantile coefficient, ˆθ can be efine as the average value of the sample M-quantile coefficients in area, otherwise we set ˆθ = 0.5. Following Chambers & Tzaviis (2006), the M-quantile preictor of the average count ȳ in small area is then ˆȳ MQ = N 1 where MQ y (ˆθ x j ; ψ) = exp{x T j ˆβˆθ }. 4.2 Mean square error estimation The Mean Square Error of the preictor ˆȳ MQ { } y j + MQy (ˆθ x j ; ψ), (12) j s j r is efine as MSE(ˆȳ MQ ) = E[(ˆȳ MQ ȳ ) 2 ]. (13) Following Chambers et al. (2012b) we propose a nonparametric bootstrap-base estimator of the MSE of the ˆȳ MQ by constructing an artificial finite population that resembles the real population. To evelop the bootstrap proceure we write the linear preictor of the Poisson M-quantile regression moel in a form that mimics the mixe effects moel form, y j = x T jβ x T j(β θ β 0.5 ). (14) Averaging the last term on the right-han sie of (14) for each small area results in a term u MQ which can be interprete as a pseuo-ranom effect for area in that it quantifies an average ifference of the area-specific M-quantile fit from the meian fit. The bootstrap we propose is nonparametric in nature in the sense that the area effects are generate by using an empirical rather than a parametric istribution. The steps of the bootstrap proceure are summarize below: (Step 1) Using sample s, fit moel (8) an obtain preictors ˆȳ MQ. For each small area compute the pseuo-ranom effect û MQ by computing the E(x T j (β θ β 0.5 )) for each area. It is convenient to re-scale the elements û MQ so that they have sample mean exactly equal to zero. 15

18 (Step 2) Construct the vector û MQ = {û MQ 1,..., û MQ D }T, whose elements are obtaine by extracting a simple ranom sample with replacement of size D from the set {û MQ 1,..., û MQ D }T. (Step 3) Generate a bootstrap population U of size N = D =1 N, by generating values from a poisson istribution with µ j = exp{x T ˆβ j û MQ }, j = 1,..., N an calculate the bootstrap population parameters ȳ, = 1,..., D. The choice of a Poisson istribution may appear contraictory to the non-parametric nature of the propose bootstrap. However, the Poisson assumption is implicit in our M-quantile Poisson moel ue to the mean-variance relationship in the estimating equations. The non-parametric aspect of the propose bootstrap is only relate to the way the pseuo-ranom effects are generate. (Step 4) Extract a sample s of size n from the bootstrap population U using the assume sampling esign an compute small area estimates with the bootstrap sample ˆȳ MQ, = 1,..., D. (Step 5) Repeat steps 2-4 B times. (Step 6) Denoting by ˆȳ MQ (b) the M-quantile preictor in the b-th bootstrap replication an by ȳ (b) the corresponing population value in the b-th bootstrap population, a bootstrap estimator of MSE is MSE(ˆȳ MQ ) = B 1 B b=1 (ˆȳ MQ (b) ) 2. ȳ (b) (15) The propose bootstrap is not the only approach to MSE estimation. An alternative approach woul have been to use the ranom effects block bootstrap (Chambers & Chanra, 2012), which is free both of the istribution an the epenence assumptions of the usual parametric bootstrap. Chambers et al. (2012b) aapte the block bootstrap for estimating the MSE of the M-quantile small area preictor in the case of a Bernoulli outcome. A similar approach can be use in the case of a count outcome. A comparison between the alternative approaches to MSE estimation will be iscusse in future work. 5 Results an iscussion In this section we present the results of the application of the small area estimation proceure introuce in the previous section to ata from the HCAMS. In particular, we compare small area estimates of the average number of visits to physicians 16

19 Table 2: Analysis of quasi-eviance table for the Poisson M-quantile moel at q = Covariates Resi. f f Deviance p-value Null 506 age e-10 age, gener age, gener, region among elerly (age 65 or more) an their corresponing mean square error estimates base on the following estimators: (i) the irect estimator compute as a ratio estimator using calibration weights provie with the micro-ata; (ii) the EBPP in equation (3) base on a Poisson-Normal moel with ranom area intercepts using age, gener an region as covariates accoring the results of Table 1; (iii) the M-quantile preictor in (12) base on the Poisson M-quantile moel introuce in Section 3.2. In the Poisson M-quantile moel the ψ function is set to be the Huber Proposal 2 with the tuning constant c = 1.6 (Cantoni & Ronchetti, 2001). In aition, moel selection is carrie out via a robust stepwise proceure base on the Huber quasi-eviance at q = 0.5 (Cantoni & Ronchetti, 2001). The analysis of eviance reporte in Table 2 shows that the auxiliary variables age, gener an region, ae sequentially, are highly significant on the basis of their eviance value. Interactions between pairs of these variables are again nonsignificant. Table 3 reports the estimate β q coefficients at q = 0.50, together with stanar errors estimate using the proposal in Cantoni & Ronchetti (2001) an p-values. Estimates confirm what expecte: the number of visits increases as the people grow ol an women go to the physicians more often than men. This latter figure can be explaine by the greater longevity of women that is reflecte in a larger number of years in conitions of isability or in presence of chronicities. Efficient estimates of area effects are necessary for small area estimation via GLMMs. Similarly, estimation of M-quantile coefficients is necessary for small area estimation using the Poisson M-quantile moel propose in this paper. Figure 4 shows how the stanarize M-quantile coefficients estimate via expression (11) are relate to the stanarize area effects estimate using the glmer function in R. Figure 4 shows that the relationship between the estimate area effects an the estimate M-quantile coefficients is strong. The correlation between the estimate area effects an the estimate M-quantile coefficients is This result suggests that M-quantile coefficients are comparable to estimate area effects obtaine using stanar GLMM fitting proceures as far as capturing intra-area (omain) variability is concerne. 17

20 Table 3: Estimate Poisson M-quantile β q coefficients an their stanar errors at q = The baseline for variable age is 65-69, for variable gener is female an for variable region is Liguria. Covariates Estimate St. Error p-value Intercept e-09 age age e-06 age age > e-05 gener region Toscana region Umbria Estimate Area Effects Estimate M-quantile Coefficients Figure 4: Estimate M-quantile coefficients vs. estimate area effects (stanarize values). 18

21 Direct Estimate Moel Estimate Figure 5: Total number of visits to physicians in Health Districts of Liguria, Toscana an Umbria in 2000: Moel-base M-quantile estimates versus corresponing irect estimates. For comparing the performance of the ifferent estimators we must use a set of iagnostics. Such iagnostics are suggeste in Brown et al. (2001). Moel-base estimates shoul be (i) coherent with unbiase irect estimates an (ii) more precise than irect estimates. To valiate the reliability of the moel-base smallarea estimates, we use the gooness of fit (GoF) iagnostic an the values of the coefficient of variation (CV). Overall, the correlation between the moel-base estimates an the irect estimates are positive an high, which inicates that the moel-base estimates are coherent with the irect estimates (Direct/M-quantile is 0.87 an Direct/EBPP is 0.95). This result for M-quantile estimates is confirme by Figure 5 where the irect estimates are plotte vs M-quantile estimates: we note that M-quantile estimates appear to be consistent with irect estimates of the total number of visits to physicians. The GoF iagnostic is base on the null hypothesis that the irect an moelbase estimates are statistically equivalent. The alternative is that the irect an moel-base estimates are statistically ifferent. The GoF iagnostic is compute 19

22 Estimate CV (%) Health District Figure 6: The plot shows istribution of Health Districts values of estimate CV for irect (soli line) an moel-base estimates, with estimate CVs for the M- quantile preictor shown as a ashe blue line an estimate CVs for the EBPP shown as a ashe re line. HDs are orere accoring to increasing sample size. Out of sample areas are the last six areas. 20

23 Ratio/n i < Direct EBPP Table 4: Mean values across areas of the ratios of the estimate MSE of the irect an EBPP estimators to the estimate MSE of the Poisson M-quantile estimators groupe by area sample sizes. using the following Wal statistic for every moel base estimator W = { (ˆȳ irect ˆȳ moel ) 2 }. [ var(ˆȳ irect ) + mse(ˆȳ moel )] The value from the test statistic W is compare against the value from a χ 2 istribution with D = 54 egrees of freeom. In our case, this value is at 5% level of significance. We use the nonparametric bootstrap algorithm for the M- quantile preictor an the bootstrap proceure propose by González-Manteiga et al. (2007) for EBPP for estimating the MSE for each small area. Variance estimates for the irect estimator have been compute taking into account the complex two stage esign employe for HCAMS. In particular, variance estimates for the estimate of the average number of visits to physicians has been compute separately for each of the three regions to provie a first estimate of the esign effect (values between 5.6 an 7.1). Then, following Kish (1987, Section 2.6), esign effects have been recompute to account for the fact that the elerly constitute a subomain that cuts across PSUs (final eff values for each small area between 0.9 an 1.5). The values of the GoF are 27.3 for M-quantile preictor an 15.9 for EBPP. These results inicate that all moel-base estimates are not statistically ifferent from the irect estimates. Figure 6 shows the istribution across HDs of the estimate CVs (expresse in percentage terms) for irect (soli black line) an moel-base estimates (blue enotes M-quantile estimates an re enotes EBPP estimates). The estimate gains of the moel-base estimates over the irect estimates are large, particularly for HDs with a small number of sample units. Generally, the M-quantile estimates have a smaller estimate CV than corresponing EBPP estimates. Moreover, to evaluate the precision of M-quantile preictor, in Table 5 we report the meian values across areas of the ratios of the estimate MSE of the EBPP an irect estimators to the estimate MSE of the WMQ estimator for three groups of areas forme accoring to area-specific sample sizes. The M-quantile estimator is more efficient than the irect an the EBPP when the sample sizes are small. For large sample sizes the improvements in efficiency of M-quantile preictor are smaller; with n i > 100 M-quantile an EBPP preictors seem to be equivalent: the value of the the ratio is

24 In Figure 7 we compare the maps obtaine for the average number of visits to physicians in HDs of Liguria, Toscana an Umbria in 2000 as estimate by irect, Poisson M-quantile an EBPP base estimators. We use the same cut-points to epict the three maps. In line with expectations, most HDs have close levels of average number of visits to physicians, but there are areas eviating from the bulk of the istribution in both irections. As anticipate by the aforementione correlation coefficients, the estimates are all comparable in magnitue over the three maps. However, maps base on the two moel base estimators look very much alike an show a smoother pattern as oppose to that base on esign base estimates. In aition, moel base maps allow to have estimates also for empty small areas for which the esign base estimator cannot be compute. However, when comparing moel base maps, we can note that some HDs ten to have a larger estimate in the EBPP base map than that shown in the Poisson M-quantile map. This is ue to the fact that those are the small areas with particularly large values of y for which M-quantile moels provie more robustness in the final estimates. 6 Simulation stuy The purposes of this simulation experiment are: (i) to compare the performance of the M-quantile preictor with that obtaine by EBPP an the irect estimator; (ii) to evaluate the performance of the bootstrap mean square estimator (15) propose in Subsection 4.2. The simulate ata are generate using the iniviual x T j values an the estimates of ˆβ = ( 0.44, 0.05, 0.28, 0.27, 0.29, 0.13, 0.16, 0.14) an of the stanar error ˆϕ = obtaine fitting the GLMM on the real ata of Section 5. In each run of the simulation, y values are generate for the groups given by crossclassifying gener by age class for each of D = 54 small areas for which we have sample values. In total, we have ten groups for each small area. The value of the y variable for each cell y j ( = 1,..., D, j = 1,..., 10) is calculate as Poisson(µ j ) with µ j = N j exp{η j } an η j = x T ˆβ j + u, where u are inepenently rawn from a normal istribution with mean 0 an stanar error ˆϕ. Here, T = 1, 000 populations are generate an the true values of the average number of visits for each of the 54 sample HDs of Liguria, Toscana an Umbria, ȳ = N 1 10 j=1 y j, = 1,..., D, of the synthetic populations are available. For each population, sample values y j for each cell are generate from a Poisson(µ j ) with µ j = n j exp{x T ˆβ j + u }, where u is the value of ranom effect rawn previously to create the population, an accoring to two scenarios: (0) - No outliers. 22

25 Direct estimates Poisson M-quantile base estimates EBPP base estimates 23 Figure 7: Maps of the irect, Poisson M-quantile an EBPP base estimates of average number of visits to physicians in Health Districts of Liguria, Toscana an Umbria in 2000.

26 (M) - Measurement-type error: 2%, 5%, 10% of ranomly chosen response values has been change from y j to y j = y j For each sample the M-quantile, EBPP an the irect estimator are use to estimate the average small area count ȳ, = 1,..., D. The performances of ifferent small area estimators are evaluate with respect to two basic criteria: the bias an the root mean square error (RMSE). Simulate values of the bias an of the mean square error for a small area estimator are obtaine as T 1 T t=1 (ˆȳ t ȳ t ) an T 1 T t=1 (ˆȳ t ȳ t ) 2, respectively. Here ȳ t enotes the actual area value at simulation t, with preicte value ˆȳ t. The meian an maximum value of the absolute Bias an the meian value of RMSE over small areas are set out in Table 5. The results confirm our expectations regaring the behaviour of the estimators: uner the (0) scenario the EBPP performs slightly better than the M- quantile in terms of RMSE, whereas there is no noticeable ifference among the three estimators in terms of bias. The M-quantile preictor is the best in terms of bias an RMSE uner the (M) scenarios an it is clearly superior respect to the other estimators as contamination increases. Table 5: Moel-base simulation results: performances of preictors of small area count. Scenarios (0) an (M), Contamination: 2%, 5%, 10%, D = 54. Preictor/Scenario (0) (M) 2% (M) 5% (M) 10% Meian values of Absolute Bias EBPP M-quantile Direct Maximum value of Absolute Bias EBPP M-quantile Direct Meian values of RMSE EBPP M-quantile Direct Regaring the secon purpose of the simulation stuy, i.e. the evaluation of the performance of the bootstrap MSE estimator (15) propose in Section 4.2, we use the ata generate for scenarios (0) an (M)-10% an a subset of small areas: D = 14, the HDs of the Region Liguria. The results of the MSE estimator, base on 500 bootstrap iterations, for each scenario are shown in Table 6 where we report the meian values of their area specific biases an their root mean square errors, 24

27 expresse in relative terms (%). The MSE estimator shows small bias an a goo stability uner both scenarios. In particular, uner scenario (M)-10% ten to be biase up an its Relative RMSE increases not much respect the no-contaminate scenario. Examination of Table 6 shows that MSE estimation metho generate nominal 95 per cent confience intervals with a small uner coverage. A bootstrap bias correction term coul be inclue in expression (15). This term will be a part of a future research. Table 6: Moel-base simulation results: performance of bootstrap MSE estimator (15). Scenarios (0) an (M)-10%, D = 14, Meian values. Inicator/Scenario (0) (M) 10% Relative bias Relative RMSE Coverage rate (95% nominal) 90% 86% 7 Final remarks To carry on SAE for count ata, Poisson M-quantile moels are introuce an investigate. Such moels offer a natural way of moeling between-area variability in the ata, without imposing prior assumptions on the source of this variability or a pre-specifie hierarchical structure. In particular, with M-quantile moels there is no nee to explicitly specify the ranom components of the moel. Rather, inter-area ifferences are capture via area-specific M-quantile coefficients. As a consequence, the M-quantile approach reuces the nee for parametric assumptions. In aition, estimation an outlier robust inference is straightforwar uner these moels. The propose approach looks suitable for estimating a wie range of parameters an the moel-base simulation suggests that it is a reasonable alternative to mixe effects moels. In fact, M-quantile regression-base methos outperform GLMM-base methos when ata inclue outliers. An, in case there are no outliers, there is no noticeable loss in efficiency. The occurrence of this type of ata in real situations, as well as the usefulness of the suggeste approach, have been escribe by a real application. The moel allows us to estimate the average number of recourse to physicians for each HDs, stratifie for age an gener, for three Italian regions with oler population. Even though the suggeste approach provies encouraging results, further investigation is still necessary. A first topic of future research is the choice of the 25

M-Quantile Regression for Binary Data with Application to Small Area Estimation

University of Wollongong Research Online Centre for Statistical & Survey Methoology Working Paper Series Faculty of Engineering an Information Sciences 2012 M-Quantile Regression for Binary Data with Application