M-Quantile Regression for Binary Data with Application to Small Area Estimation

Size: px

Start display at page:

Download "M-Quantile Regression for Binary Data with Application to Small Area Estimation"

Jesse Walton
5 years ago
Views:

University of Wollongong Research Online Centre for Statistical & Survey Methoology Working Paper Series Faculty of Engineering an Information Sciences 2012 M-Quantile Regression for Binary Data with

au Nicola Salvati University of Pisa Nikos Tzaviis University of Southampton Recommene Citation Chambers, Ray; Salvati, Nicola; an Tzaviis, Nikos, M-Quantile Regression for Binary Data with

1 University of Wollongong Research Online Centre for Statistical & Survey Methoology Working Paper Series Faculty of Engineering an Information Sciences 2012 M-Quantile Regression for Binary Data with Application to Small Area Estimation Ray Chambers University of Wollongong, Nicola Salvati University of Pisa Nikos Tzaviis University of Southampton Recommene Citation Chambers, Ray; Salvati, Nicola; an Tzaviis, Nikos, M-Quantile Regression for Binary Data with Application to Small Area Estimation, Centre for Statistical an Survey Methoology, University of Wollongong, Working Paper 12-12, 2012, Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library:

Centre for Statistical an Survey Methoology The University of Wollongong Working Paper 12-12 M-Quantile Regression for Binary Data with Application to Small Area Estimation Ray Chambers, Nicola

2 Centre for Statistical an Survey Methoology The University of Wollongong Working Paper M-Quantile Regression for Binary Data with Application to Small Area Estimation Ray Chambers, Nicola Salvati an Nikos Tzaviis Copyright 2008 by the Centre for Statistical & Survey Methoology, UOW. Work in progress, no part of this paper may be reprouce without permission from the Centre. Centre for Statistical & Survey Methoology, University of Wollongong, Wollongong NSW Phone , Fax anica@uow.eu.au

3 Biometrika (2012), xx, x, pp C 2007 Biometrika Trust Printe in Great Britain M-Quantile Regression for Binary Data with Application to Small Area Estimation BY RAY CHAMBERS Centre for Statistical an Survey Methoology, University of Wollongong, New South Wales 2522, Australia ray@uow.eu.au NICOLA SALVATI Dipartimento i Statistica e Matematica Applicata all Economia, Università i Pisa, Via Riolfi, 10 - I56124 Pisa, Italy, salvati@ec.unipi.it NIKOS TZAVIDIS Southampton Statistical Sciences Research Institute, University of Southampton, Southampton SO17 1BJ, UK, n.tzaviis@soton.ac.uk SUMMARY M-quantile regression moels are a robust an flexible alternative to ranom effects moels, particularly in small area estimation. However quantiles, an more generally M-quantiles, are only uniquely efine for continuous variables. In this paper we exten the M-quantile regression approach to binary ata, an more generally to count ata. This approach is then applie to estimation of a small area proportion, where a popular alternative approach is to use a plugin version of the Empirical Best (EB) preictor base on a generalise linear mixe moel for the unerlying binary variable. Results from both moel-base an esign-base simulations comparing the binary M-quantile an the plug-in EB preictors emonstrate the usefulness of the M-quantile approach in this case. The paper conclues with two illustrative applications. The first aresses estimation of the number of unemploye people age 16 an above resient in the Unitary Authorities an Local Authority Districts of Great Britain. The secon consiers estimation of the number of poor househols in each of the Local Labour Systems of the Tuscany region of Italy. Some key wors: Influence function; M-estimation; Bootstrap methos; Simulation experiments. 1. INTRODUCTION The increasing eman for reliable small area statistics has le to the evelopment of a number of efficient moel-base small area estimation (SAE) methos (Rao, 2003; Jiang & Lahiri, 2006). For example, the empirical best linear unbiase preictor (EBLUP) base on a linear mixe moel (LMM) is often recommene when the target of inference is the small area average of a continuous response variable (Battese et al., 1988; Prasa & Rao, 1990). However, using a mixe moel to characterise ifferences between small areas requires strong istributional assumptions, a formal specification of the ranom part of the moel an oes not easily allow for outlier robust inference. An alternative approach is to use M-quantile moels (Breckling & Chambers, 1988) to characterise these ifferences (Chambers & Tzaviis, 2006). Unlike traitional mixe moels,

4 2 M-quantile moels o not epen on strong istributional assumptions an automatically provie outlier robust inference. Unfortunately, many survey variables are categorical in nature an are therefore not suite to stanar SAE methos base on LMMs. One option in such cases is to aopt a Hierarchical Bayes approach (Malec et al., 1997; Nanram et al., 1999) or to use Empirical Bayes (MacGibbon & Tomberlin, 1989; Farrell et al., 1997). Alternatively, if a frequentist approach is preferre, one can follow Jiang & Lahiri (2001) who propose an empirical best preictor (EBP) for a binary response, or Jiang (2003) who extens these results to generalise linear mixe moels (GLMMs). However, application of these moels in SAE is not straightforwar, as computing the maximum likelihoo estimates of the fixe an ranom effects parameters of a GLMM can require evaluation of high imensional integrals. Furthermore, the EBP is moel epenent an estimates of GLMM parameters can be very sensitive to outliers or epartures from unerlying istributional assumptions. Large eviations from the expecte response as well as outlying points in the space of the explanatory variables are known to have a large influence on classical maximum likelihoo inference base on generalise linear moels (GLMs). This has lea to the evelopment of robust alternative methos for fitting these moels (Pregibon, 1982; Preisser & Qaqish, 1999; Cantoni & Ronchetti, 2001). But evelopment of corresponing robust methos for GLMM-base SAE is limite. Maiti (2001) escribes a Hierarchical Bayes approach to fitting a GLMM base on an outlier-robust normal mixture prior for the ranom effects, while Sinha (2004) proposes robust estimation of the fixe effects an the variance components of a GLMM, using a Metropolis algorithm to approximate the posterior istribution of the ranom effects. Similarly, Noh & Lee (2007) propose methos that allow robust inferences for GLMs, with extension to GLMMs. However, only Maiti (2001) applies these methos to SAE. To the best of our knowlege, there is no existing theory for robust SAE base on GLMMs in the frequentist framework. In this paper we therefore propose a new approach to SAE for iscrete ata base on M-quantile moelling. This allows straightforwar extension of the existing M-quantile approach for continuous ata to the case where the response is binary or, more generally, a count. As with M-quantile moelling of a continuous response (Chambers & Tzaviis, 2006) ranom effects are avoie an between area variation in the response is characterise by variation in area-specific values of quantile-like coefficients. Outlier-robust inference is also automatic in case of both misclassification error an measurement error. In the next Section we summarise estimation of a small area proportion base on a GLMM. In Section 3 we then exten the M-quantile regression approach to binary ata an, more generally, to count ata. In Section 4 we buil on the extension to binary ata to efine the corresponing M-quantile coefficients of the sample units. As with the continuous case, these coefficients capture small area effects in the ata, leaing to an M-quantile estimator of a small area proportion. In the same Section, we also propose three estimators of the mean square error (MSE) of this estimator: an analytical estimator base on first orer approximations to the variances of solutions to estimating equations, an two bootstrap methos that exten ieas set out in Tzaviis et al. (2010) an Chambers & Chanra (2012). In Section 5, we present results from moel-base an esign-base simulation stuies aime at assessing the performance of the ifferent small area preictors consiere in this paper. In Section 6 we use the M-quantile approach escribe in this paper to estimate: (1) the number of unemploye people age 16 an above resient in each of 406 Unitary Authorities an Local Authority Districts (UALAD s) of the UK; an (2) the number of poor househols in each of the 57 Local Labour Systems (LLS s) of the Tuscany region of Italy. Finally, in Section 7 we conclue the paper with a iscussion of our results an outstaning research issues.

5 2. SMALL AREA ESTIMATION BASED ON GENERALISED LINEAR MIXED MODELS Let U enote a finite population of size N which can be partitione into D omains or small areas, with U enoting small area. The small area population sizes N ; = 1,..., D are assume known. Let y j be the value of the variable of interest (typically a iscrete or a categorical variable) for unit j in area, an let x j enote a p 1 vector of unit level covariates (incluing an intercept). It is assume that the values of x j are known for all units in the population, as are the values z of a q 1 vector of area level covariates. The aim is to use the sample values of y j an the population values of x j an z to infer the values θ ; = 1,..., D of a small area characteristic of interest. To save notation, in what follows we use E s to enote the expectation conitional on this information. It is well known that the minimum mean square error preictor of θ is then E s [θ ]. In many cases θ = N 1 j U f(y j ) where f is a known function. The minimum mean square error preictor of θ is then N 1 { j s f(y j ) + j r E s [f(y j )]}, where s enotes the n sample units in small area an r enotes the N n remaining (i.e. non-sample) units in this area. In general, the conitional expectation E s [f(y j )] can be ifficult to evaluate, an so is replace by a suitable approximation. One such approximation is E[f(y j ) u ] where the u ; = 1,..., D are q-imensional inepenent ranom effects characterising the between-area ifferences in the istribution of y j given x j (see Rao, 2003; Jiang & Lahiri, 2006; González-Manteiga et al., 2007). This can be formalise by assuming a generalise linear mixe moel (GLMM) for µ j = E[y j u ] of the form g(µ j ) = η j = x T j β + zt u, (1) where g is a known invertible link function. When y j is binary-value a popular choice for g is the logistic link function an the iniviual y j values in area are taken to be inepenent Bernoulli outcomes with µ j = E[y j u ] = P (y j = 1 u ) = exp{η j }(1 + exp{η j }) 1 (2) an V ar[y j u ] = µ j (1 µ j ). The q-imensional vector u is generally assume to be inepenently istribute between areas an to follow a normal istribution with mean 0 an covariance matrix Σ u. The matrix Σ u is allowe to epen on parameters δ = (δ 1,..., δ K ), which are then referre to as the variance components of the GLMM, while the vector β in (1) is referre to as the fixe effects parameter of this moel. We focus on the situation where the target of inference is the small area proportion, ȳ = N 1 j U y j an the Bernoulli-Logistic GLMM (2) is assume. In this case the approximation to the minimum mean square error preictor of ȳ is N 1 [ j s y j + j r µ j ]. Since µ j epens on β an u, a further stage of approximation is require, where unknown parameters are replace by suitable estimates. This leas to the plug-in version of the Empirical Best Preictor (EBP) for the area proportion ȳ uner (2), ˆȳ EBP = N 1 { y j + j s j r ˆµ j 3 }, (3) where ˆµ j = exp{ˆη j }(1 + exp{ˆη j }) 1, ˆη j = x T j ˆβ + z T û, ˆβ is the vector of the estimate fixe effects an û enotes the vector of the estimate area-specific ranom effects. In the simplest case, q = 1 an z = 1, in which case the u are scalar small area effects. We refer to (3) in this case as a ranom intercepts EBP. For more etails on this preictor, incluing estimation of its MSE, see Saei & Chambers (2003), Jiang & Lahiri (2006) an González-Manteiga et al. (2007).

6 4 Unfortunately, espite their attractive properties as far as moelling non-normal response variables are concerne, application of GLMMs in small area estimation is not always straightforwar. In particular, the likelihoo function efine by a GLMM can involve high-imensional integrals which cannot be evaluate analytically (see Mc Culloch, 1994, 1997; Song et al., 2005). In such cases numerical approximations can be use, as for example in the R function glmer in the package lme4. Alternatively, estimation of the moel parameters in (2) can be carrie out using an iterative proceure that combines Maximum Penalize Quasi-Likelihoo (MPQL) estimation of β an u with REML estimation of δ. See Saei & Chambers (2003). In the empirical results reporte in Section 5, we use glmer for parameter estimation. 3. M-QUANTILE REGRESSION MODELS FOR BINARY DATA In this Section we evelop an extension of linear M-quantile regression to binary ata. Since M-quantile moels o not epen on how areas are specifie, we also rop the subscript in this Section. We start by summarising M-quantile regression for a continuous response M-quantile regression for a continuous response M-quantile regression (Breckling & Chambers, 1988) is a quantile-like generalisation of regression base on influence functions (M-regression). The M-quantile of orer q of a continuous ranom variable Y with istribution function F (Y ) is the value Q q that satisfies ( Y Qq ) ψ q F (Y ) = 0, (4) σ q where ψ q (t) = 2ψ(t){qI(t > 0) + (1 q)i(t 0)} an ψ is a user-efine influence function. Here σ q is a suitable measure of the scale of the ranom variable Y Q q. Note that when ψ(t) = t we obtain the expectile of orer q, which represents a quantile-like generalisation of the mean (Newey & Powell, 1987), an when ψ(t) = sgn(t) we obtain the stanar quantile of orer q (Koenker & Bassett, 1978). Breckling & Chambers (1988) efine a linear M-quantile regression moel as one where the ψ-base M-quantile Q q (X; ψ) of orer q of the conitional istribution of y given the vector of p auxiliary variables X satisfies Q q (X; ψ) = Xβ q. (5) Let (y j, x j ; j = 1,..., n) enote the available ata. For specifie q an continuous ψ, an estimate ˆβ q of β q is obtaine by solving the estimating equation n 1 n ψ q (r jq )x j = 0, (6) i=1 where r jq = y j Q q (x j ; ψ), ψ q (r jq ) = 2ψ(ˆσ q 1 r jq ){qi(r jq > 0) + (1 q)i(r jq 0)} an ˆσ q is a suitable robust estimator of scale, i.e. ˆσ q = meian r jq / In this paper we will always use the Huber Proposal 2 influence function ψ(t) = ti( c < t < c) + c sgn(t)i( t c). Provie the tuning constant c is boune away from zero, we can solve (6) using stanar iteratively re-weighte least squares (IRLS) M-quantile regression for a binary response: an estimating equation approach There is no obvious efinition of a quantile regression function when Y is binary since the orer q quantile of a binary variable is not unique. However, provie the unerlying influence

7 function ψ is continuous an monotone non-ecreasing, the M-quantiles of a binary variable o exist an are unique. This is easily seen by consiering the solution to (4) when Y is binary, with P (Y = 1) = p. In this case (4) becomes ( ) ( ) 1 Qq Qq pqψ = (1 p) (1 q) ψ. σ q It is easy to see that when ψ(t) = t an q = 0.5, the solution to this estimating equation is Q 0.5 = p, as shoul be the case. Furthermore, when both p an q lie strictly between 0 an 1, the preceing assumptions about ψ ensure that Q q also lies strictly between 0 an 1 an is monotone non-ecreasing in q for fixe p. It is also monotone non-ecreasing in p for fixe q uner the assumption of a fixe scale parameter. A proof of this is available from the authors on request. In the same way that we impose a linear specification (5) on Q q (X; ψ) in the continuous case, we can impose an appropriate continuous (in q) specification on Q q (X; ψ) in the binary case. In particular, it seems sensible to replace (5) by the linear logistic specification Q q (x j ; ψ) = exp(xt j β q) 1 + exp(x T j β q). (7) In orer to estimate the parameter β q we consier the extension to the M-quantile case of the Cantoni & Ronchetti (2001) approach to robust estimation of the parameters of a GLM. In particular these authors propose a robustifie version of the maximum likelihoo estimating equations for a GLM of the form: n { Ψ(β) := n 1 1 } ψ(r j )w(x j ) σ(µ j ) µ j a(β) = 0, (8) j=1 where r j = y j µ j σ(µ j ) are the Pearson resiuals, E[Y j] = µ j, V ar[y j ] = σ 2 (µ j ), µ i is the erivative of µ j with respect to β an a(β) = 1 n n j=1 E[ψ(r 1 j)]w(x j ) σ(µ j ) µ j ensures the Fisher consistency of the solution to (8). The boune influence function ψ is use to control outliers in y, whereas the weights w are use to ownweight the leverage points. When w(x j ) = 1 j Cantoni & Ronchetti (2001) refer to the solution to (8) as the Huber quasi-likelihoo estimator. When ψ is the ientity function, (8) reuces to the usual maximum likelihoo estimating equations for a GLM. Uner the M-quantile framework the estimating equations (8) can be re-written as n { Ψ(β q ) := n 1 1 } ψ q (r jq )w(x j ) σ(q q (x j ; ψ)) Q q(x j ; ψ) a(β q ) = 0, (9) j=1 where r jq = y j Q q(x j ;ψ) σ(q q(x j ;ψ)), σ(q q(x j ; ψ)) = [Q q (x j ; ψ)(1 Q q (x j ; ψ))] 1/2, Q q(x j ; ψ) = σ 2 (Q q (x j ; ψ))x j an a(β q ) is a bias correction term: { a(β q ) = ψ q ( 1 Qq (x j ; ψ) σ(q q (x j ; ψ)) σ q ) Q q (x j ; ψ) ψ q ( Qq (x j ; ψ) σ(q q (x j ; ψ)) ) } (1 Q q (x j ; ψ)). Setting w(x j ) = 1 j leas to a Huber quasi-likelihoo M-quantile estimator. An alternative choice is w(x j ) = 1 h j where h j is the jth iagonal element of the hat matrix H = X(X T X) 1 X T. This leas to a Mallows type M-quantile estimator. In either case, the estimating equations (9) can be solve numerically using a Fisher scoring proceure to obtain an estimate ˆβ q of β q. 5

8 6 Note that when q = 0.5, (9) reuces to (8). Moreover, (6) is a special case of (9) if the the linear link function Q q (x j ; ψ) = x T j β q is use an the tuning constant c in the Huber influence function tens to infinity (i.e. ψ is the ientity function). This estimating equation approach applies quite generally. For example, it can be use when Y is a Poisson ranom variable. In this case the most appealing specification for the M-quantiles of the conitional istribution of Y is log-linear. That is, Q q (x j ; ψ) = k exp(x j β q ), (10) where k is an offset term. The parameter β q can then be estimate by solving (9) with σ(q q (x j ; ψ)) = Q q (x j ; ψ) 1/2, Q q(x j ; ψ) = Q q (x j ; ψ)x j an { a(β q ) = 2w q (r jq ) cp (Y j i 2 + 1) cp (Y j i 1 ) + Q q(x j ; ψ) } σ(q q (x j ; ψ)) [P (Y j = i 1 ) P (Y j = i 2 )], with i 1 = Q q (x j ; ψ) cσ(q q (x j ; ψ)), i 2 = Q q (x j ; ψ) + cσ(q q (x j ; ψ)) an w q (r jq ) = [qi(r jq > 0) + (1 q)i(r jq 0)]. Assuming that ψ is a continuous monotone non-ecreasing function, we can write own a first orer sanwich-type approximation to the variance of (9) of the form Here V ar( ˆβ q ) = n 1{ E V ar{ψ(β q )} = n 1 n j=1 where [ { E ψq 2 yj Q q (x j ; ψ) }] = σ(q q (x j ; ψ)) [ Ψ(βq ) ]} 1 [{ V ar{ψ(βq )} β q E [ Ψ(βq ) β q ]} 1] T. (11) { [ { x j σ 2 (Q q (x j ; ψ))e ψq 2 yj Q q (x j ; ψ) }] } x T j σ(q q (x j ; ψ)) { ψ 2 q n a 2 j(β q ), j=1 ( 1 Qq (x j ; ψ) ) ( Q q (x j ; ψ) + ψ 2 Qq (x j ; ψ) q σ(q q (x j ; ψ)) σ(q q (x j ; ψ)) a 2 j (β q) is the square of the bias correction term for unit j, an the expectation E B(β q ) = n 1 n j=1 {{ σ(q q (x j ; ψ)) ψ q ( 1 Qq (x j ; ψ) σ(q q (x j ; ψ)) ) } (1 Q q (x j ; ψ)), [ ] Ψ(βq) β q is ) ( Qq (x j ; ψ) )} } + ψ q σ 2 (Q q (x j ; ψ))x j x T j. σ(q q (x j ; ψ)) An estimator of (11) is then efine by plugging in estimates of unknown quantities into these expressions. Denoting these plug-in estimates by a hat leas to a variance estimator for ˆβ q of the form V ar( ˆβ q ) = n 1 ˆB 1 ( ˆβ q ) V ar{ψ( ˆβ q )}[ ˆB 1 ( ˆβ q )] T. (12) R routines for estimation an inference using M-quantile regression moels with binary an Poisson ata are available from the authors M-quantile regression for a binary response: an econometric approach The estimating equation approach escribe in the previous subsection oes not strictly apply to stanar quantile regression for binary ata, an quantile regression for binary ata has been evelope in the econometric literature using a latent variable concept. However, as we now show, the two approaches are very closely relate, since the econometric approach can be shown to be equivalent to the solution of an estimating equation analogous

9 to (6). Since we confine ourselves to stanar quantiles in this subsection, we rop the influence function ψ from our notation an, following Koras (2006), we assume that the observe values y j represent the outcome of a continuously istribute latent variable. That is, the observe value y j is generate by an unobserve (latent) real value yj in the sense that y j = I(yj > 0). Let Q q(x j ) enote the conitional quantile function of this latent variable. Since y j = I(yj > 0) is a monotonic transformation of y j, the qth conitional quantile of y j shoul be the same transformation of the qth conitional quantile of yj. That is Q q (x j ) = I(Q q(x j ) > 0). Given that Q q(x) = Xβ q, it follows that Q q (x j ) = I(x T j β q > 0) an a maximum score estimator for β q, efine by ˆβ q = max b=1 n 1 n {y j (1 q)}i(x T j b > 0) (13) j=1 was suggeste by Manski (1975, 1985). Put I j (b) = I{y j < I(x T j b > 0)}. Since I{y j < I j (b)} = (1 y j )I j (b), we can, after some simplification, show that (13) reuces to ˆβ q = min b=1 n 1 n j=1 [ ] qi{y j I j (b)} + (1 q)i{y j < I j (b)} y j I j (b). (14) This is equivalent to fitting the quantile regression moel Q q (x j ) = I(x T j β q) to the observe y j, subject to the restriction β q = 1, or, in what amounts to the same thing, solving (6) with ψ(t) = sgn(t), subject to this restriction. Note that the restriction is necessary in orer to ensure that β q is ientifiable (since the scale of yj is unknown) an so (13) has a solution. A smoothe version of (13) has been propose by Horowitz (1992) as having better finite sample properties: ˆβ q = max b=1 n 1 n j=1 {y j (1 q)}f (σ 1 n x T j b), (15) where F is an appropriately chosen smooth cumulative istribution function efine on the entire real line an σ n 0 as n. The same simplifying steps as those leaing to (14) allow us to write (15) as ˆβ q = min b=1 n 1 n [ qi{y j F (σn 1 j=1 ] x T j b)} + (1 q)i{y j < F (σ 1 x T j b)} n y j F (σ 1 n since 0 < F (t) < 1 I{y j < F (σn 1 x T j b)} = 1 y j. That is, this smoothe loss function for regression quantiles for binary ata leas to essentially the same estimator as the logistic ( ) 1 formulation (7). In fact, if we put F (t) = exp{σn 1 x T j b} 1 + exp{σn 1 x T j b} then as σn 0, F (σ 1 n x T j b) exp{xt j b} (1 + exp{x T j b} ) 1, an we en up with the quantile analogue of the solution to (6), with Q q (x j ; ψ) efine by (7) an subject to the restriction β q = 1. 7 x T j b),

10 8 4. ESTIMATION OF SMALL AREA PROPORTIONS USING M-QUANTILE REGRESSION MODELS Much of the ata measure in business, labour force an living conitions surveys is binary in character, an there is a growing nee for reliable small area estimates base on these ata. From now on therefore we focus on using M-quantile regression moels for binary ata to estimate the small area averages (i.e. the small area proportions) efine by a binary outcome variable Point estimation The mixe moels employe in small area estimation use ranom area effects to account for between-area variation. These moels epen on istributional assumptions for the ranom part of the moel an o not easily allow for outlier robust inference. An alternative approach to moelling the variability associate with the conitional istribution of Y given X is via M- quantile regression moels. These moels o not epen on strong istributional assumptions or on a preefine small area geography, an outlier robust inference is automatic. A key concept when using an M-quantile regression moel for small area estimation is that of the M-quantile coefficient q j for a unit j s. When the variable Y is continuous this is the value q j such that y j = Q qj (x j ; ψ). Note that M-quantile coefficients are etermine at population level. If between area variation is an important part of overall population variability then units within an area will have similar M-quantile coefficients. Provie there are sample observations in area, estimate values of their M-quantile coefficients are efine by substituting ˆQ qj (x j ; ψ) in the preceing efinition. An area specific M-quantile coefficient, ˆθ can then be efine as the average value of the estimate M-quantile coefficients in area, otherwise we set ˆθ = 0.5. Following Chambers & Tzaviis (2006), the M-quantile preictor of the average ȳ in small area is ˆȳ MQ { = N 1 } y j + (x ˆQˆθ j ; ψ). (16) j s j r When Y is binary, an we moel its regression M-quantile of orer q via (7), the natural extension ( 1 of this approach is to put ˆQˆθ (x j ; ψ) = exp{x T j ˆβˆθ } 1 + exp{x T j }) ˆβˆθ in (16). However, this begs the question of how one efines ˆθ, since the estimating equation y j = ˆQ qj (x j ; ψ) for the estimate M-quantile coefficient of a continuous y j no longer has a solution when y j is binary. We therefore iscuss extension of the M-quantile coefficient concept to binary Y before we consier inference base on (16) M-quantile coefficients for binary ata A first step in efining M-quantile coefficients for binary ata is to note that any reasonable efinition of this concept has to associate a larger M-quantile coefficient with a value y j = 1 compare with a value y j = 0 at the same value of x j. The next thing to note is that the solution m j to the equation ˆQ mj (x j ; ψ) = 0.5 can be interprete as a measure of the propensity for y j = 1 to be observe relative to the propensity for y j = 0 to be observe at x j. A value m j < 0.5 inicates that y j = 1 is more likely than y j = 0 an vice versa. This leas to our first efinition of an estimate M-quantile coefficient when Y is binary. DEFINITION A: Given binary ata with fitte M-quantile regression function ˆQ q (x j ; ψ), the estimate M-quantile coefficient for observation j is q j = (m j + y j )/2, where ˆQ mj (x j ; ψ) = 0.5. Note that provie ˆQ q (x j ; ψ) is monotone in q at x j, the above efinition of an estimate M- quantile coefficient shoul be unique. In orer to unerstan the motivation for this efinition,

11 suppose that y j = 0 at x j an that there are many more Y = 0 than Y = 1 near x j. Then (a) y j = 0 is not unusual, an (b) we anticipate that the monotone increasing function f(q) = ˆQ q (x j ; ψ) will only excee half for values of q close to one. That is, m j will be close to one an so q j will be slightly less than half. On the other han, suppose y j = 1 but there are still many more Y = 0 than Y = 1 near x j. Then (a) y j = 1 is unusual, an (b) we still anticipate that the monotone increasing function f(q) = ˆQ q (x j ; ψ) will only excee half for values of q close to one. Here q j will be close to one. Conversely, suppose that there are many more observations with Y = 1 than with Y = 0 near x j, so m j is close to zero. Then if y j = 0 (an unusual value) we expect q j will also be close to zero, while if y j = 1 (not unusual) we expect q j will be slightly greater than a half. The estimate M-quantile coefficients allow us to inex the sample ata. A somewhat ifferent inexing base on quantile regression moelling of Y is escribe in Koras (2006). This takes a latent variable approach an the resulting inex is essentially efine by a quantilebase estimate of P (y j = 1 x j ). Uner linearity of the conitional quantiles of this latent variable, we have alreay seen that Q q (x j ) = I(x T j β q > 0) an so P (y j = 1 x j ) = 1 h j, where x T j β h j = 0. Consequently, given an estimate ˆβ q for each value 0 < q < 1 we can inex the sample observations by p j = 1 h j where x T j ˆβ hj = 0. Note that this inex oes not epen on y j, an so cannot reflect iniviual effects, which woul seem to limit its usefulness in characterising how groups iffer after covariate effects have been taken into account. However, we can use the approach leaing to Definition A to exten this inex by allowing it to reflect iniviual effects.this leas to our secon efinition of an estimate M-quantile coefficient for the binary case. DEFINITION B: Given binary ata with fitte M-quantile regression function ˆQ q (x j ; ψ), the estimate M-quantile coefficient for observation j is q j = (h j + y j )/2, where x T j ˆβ hj = 0. Note that if x T j ˆβ q = 0 ˆQ q (x j ; ψ) = 0.5 then Definition B an Definition A are ientical. This conition will hol, for example, whenever ψ is the ientity function an Q q (x j ; ψ) = Q q (x j ) = F (x T j β q) where F (t) is a istribution function that satisfies F (0) = 0.5. Unfortunately, both Definition A an Definition B have a serious eficiency. This follows from the fact that in applications where h j varies aroun some constant, say h, q j will be concentrate near (1 + h)/2 an h/2. Furthermore, it is impossible to observe q j = 0.5 in general. An extreme case is where there is no relationship between y j an x j, an y j = 1 is just as likely as y j = 0. In this case h j = 0.5, an there are just two possible values of q j, 0.75 (y j = 1) an 0.25 (y j = 0). The basic reason for this behaviour is that both Definition A an Definition B compute q j on the same scale as y j. This makes sense when the istribution of y j is measure on a linear scale. However, in the binary case the istribution of y j is linear in the logistic scale, an so it makes sense to efine q j in the same way. That is, we replace q j an h j in Definition B by ˆQ qj (x j ; ψ) an ˆQ 0.5 (x j ; ψ) respectively, leaing to our thir, an final, efinition of q j : DEFINITION C: Given binary ata with fitte M-quantile regression function ˆQ q (x j ; ψ), the estimate M-quantile coefficient for observation j is q j, where ˆQ qj (x j ; ψ) = ( ˆQ 0.5 (x j ; ψ) + y j )/2. Note that uner a logistic specification for ˆQ q (x j ; ψ), using Definition C is equivalent to efining q j as the solution to y j = xt j β q j, where ( 0.5{ yj ˆQ0.5 (x j ; ψ) + y j } ) = log 1 0.5{ ˆQ. 0.5 (x j ; ψ) + y j } 9

12 10 True Area Effects True Area Effects Estimate Parametric Area Effects Estimate M-quantile Coefficients (Definition C) Fig. 1. Estimate area effects vs. true area effects (left plot) an estimate M-quantile coefficients vs. true area effects (right plot) in a Monte-Carlo simulation with D = 200 an n = 25. The value yj above can be thought of as a pseuo-value that behaves like the unobservable latent variable whose istribution etermines that of y j. In the rest of this paper, an particularly in the simulation experiments reporte in Section 5, we use Definition C when calculating estimate M-quantile coefficients. Efficient estimates of area effects are necessary for small area estimation via GLMMs. Similarly, estimation of M-quantile coefficients is necessary for small area estimation using the binary M-quantile moel propose in this paper. A natural question is then the strength of the relationship between the actual area effects an the estimate M-quantile coefficients. Some empirical evience for such a relationship is isplaye in Figure 1. These scatterplots show how area effects estimate using the glmer function in R an M-quantile coefficients estimate via Definition C are relate to true area effects. The simulate ata unerpinning these plots were generate using D = 200 areas, each with a sample size of n = 25. At each simulation, values of x j were inepenently rawn as Normal(0, 1) an corresponing values of y j were then generate as Bernoulli(p j ) with p j = exp{η j }(1 + exp{η j }) 1 an η j = x j + u. The small area effects u were inepenently rawn as Normal(0, 1). Figure 1 shows how the estimate area effects an the estimate M-quantile coefficients are relate to the true area effects in one Monte-Carlo simulation. Over 1, 000 simulations, the average correlation between the true area effects an estimate area effects was 0.89, an the corresponing average correlation between the true area effects an the estimate M-quantile coefficients was These results suggest that M-quantile coefficients are comparable to estimate area effects obtaine using stanar GLMM fitting proceures as far as capturing intra-area (omain) variability is concerne. Note also that these simulations buil on ata generate via a GLMM. In real applications, where GLMM assumptions may be violate, we expect that an M-quantile approach shoul offer a robust alternative for small area estimation.

13 4 3. Mean square error estimation To start, we propose a MSE estimator for (16) base on the linearisation approach set out in Chambers et al. (2009). This assumes that the working moel for inference conitions on the realise values of the area effects, an so the MSE of interest is conitional an equal to a conitional preiction variance plus a square conitional preiction bias. In orer to conserve space, we omit some technical etails in the following evelopment, but these are available from the authors upon request. We also assume that the estimate area-level M-quantile coefficient values θ have negligible variability an so can be treate as fixe. A first orer approximation to the conitional preiction variance of (16) is then V ar(ˆȳ MQ ȳ θ ) = N 2 { [ ] V ar ˆQθ (x j ; ψ) j r {[ N 2 Q θ (x j ; ψ)x T j j r + } V ar(y j ), j r + } V ar(y j ) j r ] V ar( ˆβ [ ] T θ ) Q θ (x j ; ψ)x T j j r 11 which can be estimate by {[ ] [ V ar(ˆȳ MQ ) = N 2 (x ˆQˆθ j ; ψ)x T j V ar( ˆβˆθ ) j r + } V ar(yj ). j r j r ˆQˆθ (x j ; ψ)x T j ] T Here V ar( ˆβˆθ ) is the sanwich-type estimator (12) an V ar(y j ) can be calculate either by (i) using the sample ata from area, V ar(y j ) = ˆȳ (1 ˆȳ ) or by (ii) pooling ata from the entire sample, in which case V ar(y j ) = ˆȳ(1 ˆȳ). Note that the poole estimator shoul lea to more stable preiction variance estimates when area sample sizes are very small. The conitional preiction bias can be approximate using the results of Copas (1988): E(ˆȳ MQ ȳ θ ) 1 2N { β θ { } 1 { [{ Ψ(β θ ) tr β θ } Q θ (x j ; ψ), j r with corresponing plug-in estimator Bias(ˆȳ MQ ) = 1 { } 1{ Ψ(β θ ) 2N β βθ = ˆβˆθ θ { }. β θ j r Q θ (x j ; ψ) βθ = ˆβˆθ The estimator of the conitional MSE of ˆȳ MQ mse A (ˆȳ MQ is then [{ tr β θ β T θ Ψ(β θ ) } V ar( ˆβ ]} θ ) β θ β T θ Ψ(β θ ) βθ = ˆβˆθ } V ar( ˆβˆθ ) ) = V ar(ˆȳ MQ ) + { Bias(ˆȳ MQ )} 2. (17) We also propose two ifferent bootstrap-base methos for estimating the MSE of (16) - a nonparametric bootstrap an a ranom effect block bootstrap. In orer to save space, the compu- ]}

14 12 tational etails of each bootstrap proceure are set out in the Appenix. Here we summarise the main characteristics of each metho. The nonparametric bootstrap-base estimator (hereafter NPB) of the MSE of ˆȳ MQ is efine by using the GLMM representation (2) to generate bootstrap populations of binary values that mimic the real population, sampling from this bootstrap population an calculating (16). This proceure is an extension to binary ata of the metho propose by Tzaviis et al. (2010). The bootstrap proceure is nonparametric in the sense that it replicates area effects in the population by expressing the linear component of the logistic M-quantile regression moel (7) as γ jq = x T j β x T (β θ β 0.5 ), (18) where x T is the vector of area averages of the moel covariates. The last term on the right-han sie of (18) is then interprete as a pseuo-ranom effect for area. The ranom effect block bootstrap, see Chambers & Chanra (2012), is a robust alternative to parametric bootstrap methos for clustere ata. It is free of both the istribution an the epenence assumptions of the usual parametric bootstrap for such ata an is consistent when the mixe moel assumption is vali. In particular, it preserves area effects by bootstrap resampling within areas. We aapt this proceure (hereafter REBB) for estimating the istribution of the M-quantile preictor (16) by resampling the marginal logistic scale resiuals r MQ j = x T j ( ˆβˆθ ˆβ 0.5 ) within each area to generate bootstrap values of P (y j = 1 x j ) for the population units making up the area. Bootstrap binary population values are then obtaine using Bernoulli simulation. 5. SIMULATION STUDIES We present results from two types of simulation stuies that are use to examine the performance of the small area estimators iscusse in the preceing Sections. In Subsection 5 1 below we report results from moel-base simulations where population ata are first generate using a GLMM an a sample is then rawn from this simulate population using a pre-specifie esign. The properties of estimators of small area proportions are assesse by comparing the estimates that they generate base on the sample ata to the corresponing population proportions. In Subsection 5 2 we report results from a esign-base simulation. Here a realistic finite population is simulate by resampling from ata collecte in a sample survey, an then repeate samples are rawn from it using the same (or similar) sample esign as the original survey. In this case the survey ata were source from the European Union Statistics on Income an Living Conitions (EU-SILC) 2005 survey, an the istribution of estimates of small area proportions generate uner repeate sampling is use to compare the properties of ifferent estimators of these proportions. Two ifferent M-quantile versions of (16) were investigate in the simulations, both base on a linear logistic M-quantile moel efine by a Huber influence function with tuning constant c. In the first, referre to as M-quantile below, c = 1.345, while the secon, referre to as Expectile below, c = 100. These estimators were compare with the EBP (3) uner a GLMM with logistic link function an with the irect estimator (the sample proportion). Both MSE estimation an confience interval coverage performance were evaluate using the analytic an two bootstrap methos escribe in Subsection 4 3 for the M-quantile preictor. Note that the logistic M-quantile linear regression fit unerpinning the M-quantile an Expectile preictors was obtaine using an extene version of a M-quantile linear regression moel function for SAE written in R. The parameters of the GLMM use in the EBP were estimate using the function glmer in R.

15 5 1. Moel-base simulations In each simulation we generate N = 5, 000 population values of X an Y in D = 50 small areas with N = 100, = 1,..., D. Iniviual x j values were rawn inepenently at each simulation as Uniform(a, b ), for a = 1 an b = /4, = 1,..., D, j = 1,..., N. Values of y j were then generate as Bernoulli(p j ) with p j = exp{η j }(1 + exp{η j }) 1 an η j = x j β + u. The small area effects u were inepenently rawn from a normal istribution with mean 0 an variance ϕ = 0.25, an β = 1 (González-Manteiga et al., 2007). Population values generate uner this scenario are enote by (0) below. In aition, we generate ata corresponing to a combine misclassification error an measurement error scenario, enote (M) below. In this scenario, a ranom 1% sample of the x j values were replace by 20 (introucing measurement error) an the corresponing y j values were set to 0 (introucing misclassification error). For each of these scenarios T = 1, 000 Monte-Carlo populations were generate. For each generate population an for each area we then took simple ranom samples without replacement of sizes n = 10 an n = 20 so that the overall sample sizes were n = 500 an n = 1, 000. For each sample the M-quantile an Expectile preictors, the EBP an the irect estimator were use to estimate the small area proportions ȳ, = 1,..., D. The performances of ifferent small area estimators for area were evaluate with respect to two criteria: their average error T 1 T t=1 (ˆȳ t ȳ t ) an the square root of their average square error T 1 T t=1 (ˆȳ t ȳ t ) 2. These are enote Bias an RMSE respectively below. Here ȳ t enotes the actual area value at simulation t, with preicte value ˆȳ t. The meian values of Bias an RMSE over the D small areas are set out in Table 1, where we see that claims in the literature (Chambers & Tzaviis, 2006) about the superior outlier robustness of the M-quantile preictor compare with the EBP an the Expectile preictor certainly hol true in these simulations. In particular, uner the (0) scenario the EBP performs better than the M- quantile an Expectile preictors in terms of Bias, whereas the M-quantile preictor is the best uner the (M) scenario. In terms of RMSE, there is nothing to choose between EBP, M-quantile an Expectile uner the (0) scenario, while uner the (M) scenario the M-quantile preictor is clearly superior. Table 1. Moel-base simulation results: Preictors of small area proportions. n = 10 n = 20 Preictor/Scenario (0) (M) (0) (M) Meian values of Bias EBP M-quantile Expectile Direct Meian values of RMSE EBP M-quantile Expectile Direct In orer to evaluate the performances of the MSE estimators propose in Subsection 4 3 we use the ata generate for the scenario with D = 50 an n = 10 an also carrie out a further moel-base simulation stuy with the same sample sizes within the small areas but

16 14 with D = 100 an N = 100. Again, T = 1, 000 Monte-Carlo populations were generate, with iniviual x j values rawn inepenently as Uniform(a, b ), with a = 1 an b = /8, = 1,..., D, j = 1,..., N. For each generate population a simple ranom sample without replacement of size n = 10 was rawn from each area, the M-quantile preictor calculate as well as its linearisation MSE estimator (17) an the two bootstrap MSE estimators NPB an REBB, both base on 100 bootstrap iterations. The behaviour of these MSE estimators for each scenario is isplaye in Table 2 where we show the meians of their area specific Bias an RMSE values, expresse in relative terms (%). We also show the meian empirical coverage rates for nominal 95 per cent confience intervals base on these methos. In the case of (17) these intervals were efine by the small area estimate plus or minus twice the value of the square root of (17). For NPB an REBB these intervals were base on the 2.5 an the 97.5 percentiles of the relevant bootstrap istribution. Examination of Table 2 shows that all three MSE estimation methos ten to be biase low, but all generate nominal 95 per cent confience intervals with acceptable coverage. Overall, the REBB bootstrap estimator seems preferable because it shows smaller or similar bias an more stability than both the linearisation-base estimator (17) an the NPB bootstrap estimator. Table 2. Moel-base simulation results: MSE estimators. Meian values Meian values Meian values of Relative Bias (%) Relative RMSE (%) of Coverage Rate (%) D = 50 Estimator/Scenario (0) (M) (0) (M) (0) (M) mse A (ˆȳ MQ ) mse NP B (ˆȳ MQ ) mse REBB (ˆȳ MQ ) D = 100 mse A (ˆȳ MQ ) mse NP B (ˆȳ MQ ) mse REBB (ˆȳ MQ ) Design-base simulation The population unerpinning the esign-base simulation is base on ata collecte from a sample of 1, 560 househols sprea across 64 municipalities (out of 287) in Tuscany They were collecte by ISTAT as part of the European Union Statistics on Income an Living Conitions (EU-SILC) 2005 survey. These survey ata provie information on a variety of issues relate to living conitions of the people in Tuscany, incluing etails on income an non-income imensions of poverty in the region, an form the basis of poverty assessment in this region. Here we efine the sample municipalities as our small areas of interest. Since nine municipalities ha sample sizes less than three, these were combine with ajacent municipalities, leaing to a total of 55 small areas. We use these sample househols to generate a population of N = 74, 951 househols by sampling with replacement from the original sample of 1, 560 househols with probabilities proportional to their sample weights. Given this (fixe) population, we then inepenently rew 1, 000 stratifie ranom samples from it, with the 55 (reefine) municipalities serving as strata, an with the sample size for each municipality fixe to be the same as in the original sample. Note that these sample sizes varie from 4 to 116. The binary variable of interest Y was efine

17 15 Pearson Resiuals Sample Quantiles Municipality Theoretical Quantiles Fig. 2. Moel fit iagnostics for a logistic mixe moel fit to the EU-SILC ata: This shows the istribution of Pearson resiuals by municipality (left) an a normal probability plot of the estimate municipality effects (right). as taking the value 1 if the equivalise income of a househol was below the meian income for the simulate population an 0 otherwise. The aim was to compare the repeate sampling performance of ifferent preictors of the proportion of househols below meian equivalise income at municipality level, using househol ownership status (a strong inicator of poverty), the age of the hea of the househol, the employment status of the hea of the househol, the gener of the hea of the househol, the years of eucation of the hea of the househol an the househol size as covariates. Figure 2 shows selecte iagnostics for a logistic mixe moel fit to Y base on the original EU-SILC ata an using these covariates. The istribution of Pearson resiuals inicates the presence of potential influential observations in the ata, with a number of large resiuals ( r j > 2). Further evience for the presence of these influential observations is obtaine when we fit the moel using a robust metho (Cantoni & Ronchetti, 2001) an note that although most observations receive a weight of 1 in this fit, there are 19 (about 1% of the overall sample) that receive a weight of less than 0.7. The normal probability plot shown in Figure 2 also inicates that the Gaussian assumption for the istribution of the ranom effects in this GLMM is oubtful. Using a moel that relaxes this assumption, such as an M-quantile moel with a boune influence function, therefore seems reasonable for these ata. Table 3 shows the five point summaries of the municipality level istributions of Bias an RMSE generate uner repeate sampling for the same four estimators that were the focus of the moel-base simulations reporte in the previous Subsection. These show that the M-quantile an Expectile preictors have similar performance an both generally perform better than the EBP an the Direct estimator in terms of RMSE. In particular, if one focuses on meian performance, then the M-quantile preictor seems the best of the four. Table 4 shows the meian values of Bias an RMSE, expresse in relative terms, of the three MSE estimators for the M-quantile preictor. In this case the NPB metho performs best, with the REBB an the linearisation-base estimator (17) isplaying a somewhat larger positive Bias

18 16 Table 3. Design-base simulation results: Distributions of Bias an RMSE for preictors of the proportion of househols with below meian income. Preictor Min Q1 Meian Mean Q3 Max Bias EBP M-quantile Expectile Direct RMSE EBP M-quantile Expectile Direct an slightly more instability. Again, we see that meian coverage performance of all three MSE estimation methos appears acceptable. Table 4. Design-base simulation results: Performances of MSE estimators. Estimator Meian Meian Meian Relative Bias (%) Relative RMSE (%) Coverage Rate (%) mse A (ˆȳ MQ ) mse NP B (ˆȳ MQ ) mse REBB (ˆȳ MQ ) APPLICATIONS 6 1. Estimates of ILO unemployment for UALADs of Great Britain In this Subsection we apply the M-quantile moelling approach to estimating the number of unemploye people age 16 an over in each of 406 Unitary Authorities an Local Authority Districts (UALADs) of Great Britain. We use the ILO unemployment efinition an ata from a sample of about 169, 000 people age 16 an over who participate in the UK Labour Force Survey (LFS) carrie out by the Office for National Statistics (ONS) in 2000, with M-quantile moel covariates specifie by prior stuies of small area estimation of UK labour force characteristics (Molina et al., 2007). In particular, we use the following covariates: sex-age category of an iniviual (6 categories corresponing to Male/Female an age groups 16 25, an > 40), government office region of the UALAD (12 categories), ONS socio-economic classification of the UALAD (7 categories) an total of registere unemploye in the sex-age group for the UALAD. In orer to compare M-quantile estimates with EBP estimates, the same covariates were also use to fit a logistic mixe moel to the LFS ata, an corresponing EBP-type estimates were compute. All estimates of proportions were scale up to estimates of counts by multiplying by population counts for each sex-age group in a UALAD an then summing over these groups within a UALAD. Figure 3 shows the normal probability plot of estimate UALAD ranom effects obtaine by fitting a logistic mixe moel to the sample ata. A Shapiro-Wilk normality test rejects the null hypothesis of normality for these estimate ranom effects (p-value = ). Apply-

19 ing a robust fitting metho (Cantoni & Ronchetti, 2001) to this ata set, we note that although most observations receive a weight of 1, about 3.5% receive weights less than 0.7. Using a less parametric moel, e.g. an M-quantile moel with a boune influence function, therefore seems reasonable for these ata. 17 Sample Quantiles Theoretical Quantiles Fig. 3. Normal probability plot of estimate UALAD ranom effects for proportion of unemploye population age 16 an over, base on a logistic mixe moel fit to UK LFS ata. In orer to assess the resulting M-quantile estimates we note that moel-base small area estimates shoul be both consistent with corresponing unbiase irect estimates an more precise than them. Figure 4 plots the M-quantile estimates of total numbers of unemploye against corresponing irect estimates for each UALAD. We can see that the M-quantile estimates appear to be generally consistent with the irect estimates, although the relationship between these two sets of estimates iverges in the larger UALADs, where the M-quantile estimates ten to be larger. This is consistent with the right skewe istribution of estimate area effects shown in Figure 3. Overall, however, the correlation of 0.78 between the M-quantile estimates an the irect estimates is reasonably high. In orer to assess the gain in precision from using moel-base estimates instea of the irect estimates, we can look at the istribution of the ratios of the estimate coefficients of variation (CVs) of the irect an the moel-base estimates. A value greater than 1 for this ratio inicates that the estimate CV of the moel-base estimate is less than that of the irect estimate. Figure 5 shows the relationship between these ratios an the number of unemploye people in the LFS sample in each UALAD. Two sets of ratios are plotte - those corresponing to EBP estimates (re) an those corresponing to M-quantile estimates (blue). Note that the CV for the M-quantile estimate is calculate as [mse REBB (ˆȳ MQ )] 1/2 /ˆȳ MQ, while that for the EBP esti-

Poisson M-quantile regression for small area estimation

University of Wollongong Research Online Centre for Statistical & Survey Methoology Working Paper Series Faculty of Engineering an Information Sciences 2013 Poisson M-quantile regression for small area