Adaptively Transformed Mixed Model Prediction of General Finite Population Parameters


By Shonosuke Sugasawa and Tatsuya Kubokawa

April 2018

CENTER FOR RESEARCH AND EDUCATION FOR POLICY EVALUATION DISCUSSION PAPER NO. 19

CENTER FOR RESEARCH AND EDUCATION FOR POLICY EVALUATION (CREPE)
THE UNIVERSITY OF TOKYO

SHONOSUKE SUGASAWA
Risk Analysis Research Center, The Institute of Statistical Mathematics

TATSUYA KUBOKAWA
Faculty of Economics, University of Tokyo

Abstract. For estimating area-specific parameters (quantities) in a finite population, a mixed model prediction approach is attractive. However, this approach depends strongly on the normality assumption of the response values, although we often encounter non-normal cases in practice. In such cases, transforming observations to make them close to normality is a useful tool, but the problem of selecting a suitable transformation still remains open. To overcome this difficulty, we propose a new empirical best predicting method that uses a parametric family of transformations to estimate a suitable transformation from the data. We suggest a simple estimating method for transformation parameters based on the profile likelihood function, which achieves consistency under some conditions on the transformation functions. For measuring the variability of point prediction, we construct an empirical Bayes confidence interval of the population parameter of interest. Through simulation studies, we investigate the numerical performance of the proposed methods. Finally, we apply the proposed method to synthetic income data in Spanish provinces, in which the resulting estimates indicate that the commonly used log-transformation is not appropriate.

Key words: Confidence interval; Empirical Bayes; Finite population; Mean squared error; Random effect; Small area estimation.

This version: May 11,

1 Introduction

Mixed model prediction based on random effect models has been widely used in small area estimation (Rao and Molina, 2015). The random effect models used in small area estimation are mainly divided into two models: the Fay-Herriot model (Fay and Herriot, 1979) and the nested error regression model (Battese et al., 1988). In particular, the nested error regression model has been used for estimating population parameters in a finite population. Here we consider a finite population consisting of $m$ areas, where the $i$th area has $N_i$ units for $i = 1, \ldots, m$. Let $Y_{ij}$ be a characteristic of the $j$th unit in the $i$th area. The main purpose is to estimate the area-specific parameter defined as

$$\mu_i = \frac{1}{N_i} \sum_{j=1}^{N_i} T(Y_{ij}), \tag{1}$$

where $T(\cdot)$ is a known (user-specified) function. The simplest choice is $T(x) = x$, in which case $\mu_i$ corresponds to the finite population mean, and much of the literature has focused on this case; see Chambers et al. (2014), Jiang and Lahiri (2006), Lahiri and Mukherjee (2007) and Schmit et al. (2016). On the other hand, as noted by Molina and Rao (2010), other forms of $T(\cdot)$ are used in practice. For example, in poverty mapping, we often use the FGT poverty measure $T(x) = \{(z - x)/z\}^{\alpha} I(x < z)$ (Foster et al., 1984), noting that $\mu_i$ corresponds to the poverty rate when $\alpha = 0$.

If all the units $Y_{ij}$ in the $i$th area were observed, we could calculate the true value of $\mu_i$. However, only a part of the units are available in practice. Let $n_i\ (< N_i)$ be the number of sampled units and $y_s = \{y_{ij},\ j = 1, \ldots, n_i,\ i = 1, \ldots, m\}$ be the sampled data. It is known that the direct estimator of $\mu_i$ using only the observed units has high variability, especially when $n_i$ is much smaller than $N_i$. In real applications, some covariates associated with $Y_{ij}$ are available not only for sampled but also for non-sampled units; these are denoted by $x_{ij}$ with $j = 1, \ldots, N_i$ and $i = 1, \ldots, m$. Hence, one aims to estimate $\mu_i$ based on the sampled data and the information on covariates. To this end, a typical strategy is to assume that all the units follow the nested error

regression model:

$$Y_{ij} = x_{ij}^t \beta + v_i + \varepsilon_{ij}, \quad j = 1, \ldots, N_i, \quad i = 1, \ldots, m, \tag{2}$$

where $x_{ij}$ and $\beta$ are $p$-dimensional vectors of covariates and regression coefficients, $v_i$ is the area-specific effect which follows $N(0, \tau^2)$, and $\varepsilon_{ij}$ is a sampling error distributed as $N(0, \sigma^2)$. Then the conditional distribution of the non-sampled data $Y_{ij}$ given all the sampled data $y_s$ is given by

$$Y_{ij} \mid y_s \sim N\left( x_{ij}^t \beta + \frac{n_i \tau^2}{\sigma^2 + n_i \tau^2} (\bar{y}_i - \bar{x}_i^t \beta),\ \frac{\sigma^2 \tau^2}{\sigma^2 + n_i \tau^2} + \sigma^2 \right), \quad j = n_i + 1, \ldots, N_i, \tag{3}$$

where $\bar{y}_i$ and $\bar{x}_i$ are the sample means of $y_{ij}$ and $x_{ij}$ in the $i$th area. This follows from the normality of $Y_{ij}$ under the model (2): the first variance term is the conditional variance of $v_i$ and the second is the variance of the unobserved $\varepsilon_{ij}$. Then the best predictor of $\mu_i$ under squared error loss is the conditional expectation $E[\mu_i \mid y_s]$, which has the form

$$\tilde{\mu}_i \equiv E[\mu_i \mid y_s] = \frac{1}{N_i} \left\{ \sum_{j=1}^{n_i} T(y_{ij}) + \sum_{j=n_i+1}^{N_i} E[T(Y_{ij}) \mid y_s] \right\}. \tag{4}$$

Here, the expectation $E[T(Y_{ij}) \mid y_s]$ can be computed via Monte Carlo integration by generating a large number of random samples from the conditional distribution (3). Moreover, the best predictor $\tilde{\mu}_i$ depends on the unknown model parameters $\beta$, $\tau^2$ and $\sigma^2$ in the model (2), so these parameters should be replaced with their estimated counterparts. To this end, one can estimate the model parameters in (2) from the sampled data $y_s$ by, for example, the maximum likelihood or restricted maximum likelihood methods.

The key assumption in deriving the best predictor (4) is the normality of $Y_{ij}$ in (2), which yields the simple expression of the conditional distribution (3). However, we often encounter cases where the normality assumption is not plausible for $Y_{ij}$. In fact, in poverty mapping, $Y_{ij}$ is a welfare variable like income, so the distribution of $Y_{ij}$ could be skewed. For this case, Molina and Rao (2010) proposed assuming the nested error model (2) for the transformed variables $H(Y_{ij})$ instead of $Y_{ij}$. If $Y_{ij}$ is right skewed, one may use $H(x) = \log x$. However, we still suffer from the misspecification of the transformation and the predictor of $\mu_i$

under a misspecified transformation would be biased, so selecting a suitable transformation from the data is desirable to validate the prediction method.

To overcome this difficulty, we propose the adaptively transformed empirical best predicting method, in which we use a parametric family of transformations for data transformation instead of a specified transformation. We derive a form of the best predictor of $\mu_i$ and provide a simple estimating method for transformation parameters based on the profile likelihood function, which produces a consistent estimator under some regularity conditions. We also construct an empirical Bayes confidence interval of $\mu_i$ for measuring the variability of the point prediction. The proposed intervals are shown to have $O(m^{-1})$ coverage error, and we also suggest parametric bootstrap calibration for confidence intervals with further accuracy.

As related methods, Li and Lahiri (2007) suggested using the Box-Cox transformation (Box and Cox, 1964) for robust estimation of finite population totals, while their method was developed under models without random effects. Hence, our method would be more efficient. Concerning empirical Bayes confidence intervals, Nandram (1999) derived empirical Bayes confidence intervals of the finite population means, which corresponds to the case $T(x) = x$ in (1).

This paper is organized as follows. In Section 2, we describe the proposed prediction method as well as the estimation of the model parameters. In Section 3, we construct an empirical Bayes confidence interval of $\mu_i$. In Section 4, we present the results from simulation studies and a data application. In Section 5, we give conclusions and some discussion. The technical proofs are given in the Appendix.

2 Adaptively Transformed Empirical Best Prediction

2.1 Transformed best predictor

Let $H_\lambda(\cdot)$ be a family of transformations with parameter $\lambda$. The transformation parameter $\lambda$ might be multidimensional, but we treat $\lambda$ as a scalar parameter for notational simplicity. The assumptions and specific choices of $H_\lambda(\cdot)$ will be discussed in the subsequent section. We assume that the transformed variable $H_\lambda(Y_{ij})$ follows the nested

error regression model:

$$H_\lambda(Y_{ij}) = x_{ij}^t \beta + v_i + \varepsilon_{ij}, \quad j = 1, \ldots, N_i, \quad i = 1, \ldots, m, \tag{5}$$

where $x_{ij}$ and $\beta$ are $p$-dimensional vectors of covariates and regression coefficients, and $v_i$ and $\varepsilon_{ij}$ are an area-specific effect and a sampling error, respectively. Here we assume that $v_i$ and $\varepsilon_{ij}$ are mutually independent and distributed as $v_i \sim N(0, \tau^2)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$ with two unknown variance parameters $\tau^2$ and $\sigma^2$. Owing to the area effect $v_i$, the units in the same area are mutually correlated while the units in different areas are independent. Specifically, from (5), it holds that

$$\mathrm{Cor}(H_\lambda(Y_{ij}), H_\lambda(Y_{ik})) = (\tau^2 + \sigma^2)^{-1} \tau^2, \quad j \neq k,$$

so the degree of correlation within an area is determined by the ratio $\tau^2/\sigma^2$. From the normality assumptions on $v_i$ and $\varepsilon_{ij}$, it follows that $H_\lambda(Y_{ij}) \sim N(x_{ij}^t \beta, \tau^2 + \sigma^2)$. Thus, the transformation parameter $\lambda$ can be chosen to make the transformed data $H_\lambda(y_{ij})$ close to normality. We define $\phi = (\beta^t, \tau^2, \sigma^2, \lambda)^t$ as the vector of unknown model parameters in (5). The estimation procedure will be given in the subsequent section.

Let $y_s = \{y_{ij},\ j = 1, \ldots, n_i,\ i = 1, \ldots, m\}$ be the sampled data. From the model (5), we have

$$H_\lambda(Y_{ij}) \mid y_s \sim N(\theta_{ij}, s_i^2 + \sigma^2), \quad j = n_i + 1, \ldots, N_i,$$

where

$$\theta_{ij} = x_{ij}^t \beta + \frac{\tau^2}{\sigma^2 + n_i \tau^2} \sum_{k=1}^{n_i} \left( H_\lambda(y_{ik}) - x_{ik}^t \beta \right), \qquad s_i^2 = \frac{\sigma^2 \tau^2}{\sigma^2 + n_i \tau^2}. \tag{6}$$

Hence, the best predictor of $\mu_i$ given in (1) can be obtained as

$$\tilde{\mu}_i(y_s; \phi) \equiv E[\mu_i \mid y_s] = \frac{1}{N_i} \left\{ \sum_{j=1}^{n_i} T(y_{ij}) + \sum_{j=n_i+1}^{N_i} E[T \circ H_\lambda^{-1}(u_{ij})] \right\}, \tag{7}$$

where the expectation is taken with respect to $u_{ij} \sim N(\theta_{ij}, s_i^2 + \sigma^2)$, and $T \circ H_\lambda^{-1}(\cdot)$ is the composite function of $T(\cdot)$ and $H_\lambda^{-1}(\cdot)$, the inverse function of $H_\lambda(\cdot)$. Although the expectation $E[T \circ H_\lambda^{-1}(u_{ij})]$ does not have a closed form in general, it can be easily computed via Monte Carlo integration. We call the best predictor (7) adaptively

transformed best predictor (ATBP).

2.2 Estimation of structural parameters

We here consider estimating the unknown model parameters $\phi$ in (5) based on the marginal likelihood function. The log-marginal likelihood function of $\phi$ is given by

$$L(\phi) = -\frac{1}{2} \sum_{i=1}^{m} \log |\Sigma_i| - \frac{1}{2} \sum_{i=1}^{m} n_i \log 2\pi - \frac{1}{2} \sum_{i=1}^{m} \{H_\lambda(y_i) - X_i \beta\}^t \Sigma_i^{-1} \{H_\lambda(y_i) - X_i \beta\} + \sum_{i=1}^{m} \sum_{j=1}^{n_i} \log H'_\lambda(y_{ij}), \tag{8}$$

where $(\Sigma_i)_{kl} = \tau^2 + \sigma^2 I(k = l)$, $H_\lambda(y_i) = (H_\lambda(y_{i1}), \ldots, H_\lambda(y_{in_i}))^t$, $X_i = (x_{i1}, \ldots, x_{in_i})^t$, and $H'_\lambda(\cdot)$ denotes the derivative of $H_\lambda(\cdot)$ with respect to its argument. The maximum likelihood estimator of $\phi$ is defined as the maximizer of $L(\phi)$. For maximizing $L(\phi)$, we first note that the profile likelihood function of $\lambda$ can be expressed as

$$\mathrm{PL}(\lambda) = \mathrm{ML}(\lambda) + \sum_{i=1}^{m} \sum_{j=1}^{n_i} \log H'_\lambda(y_{ij}), \tag{9}$$

where $\mathrm{ML}(\lambda)$ is the maximized log-likelihood of the nested error regression model with response values $H_\lambda(y_{ij})$ and covariate vectors $x_{ij}$, which can be computed efficiently with well-developed numerical methods (e.g. Molina and Marhuenda, 2015). Since the profile likelihood $\mathrm{PL}(\lambda)$ is easy to evaluate pointwise, we can obtain its maximizer by using, for example, the golden section method (Brent et al., 1973). Once we obtain the estimator $\hat{\lambda}$, we get the estimators of the other parameters by applying the nested error regression model to the data set $\{H_{\hat{\lambda}}(y_{ij}), x_{ij}\}$.

For estimating the two variance parameters $\tau^2$ and $\sigma^2$, the restricted maximum likelihood (RML) method (Jiang, 1996) might be more attractive than the maximum likelihood method. To implement RML estimation, the first three terms in (8) need to be changed to the restricted likelihood, but the transformation parameter $\lambda$ can be easily estimated in the same manner as the maximum likelihood

method based on the profile likelihood function. However, in this paper, we consider only the maximum likelihood estimator for simplicity.

2.3 Class of transformations

We here consider the concrete choice of the family of transformations $H_\lambda(\cdot)$. To begin with, we give some conditions to be satisfied by the transformations.

Assumption 1. (Class of transformations)
1. $H_\lambda$ is a differentiable and monotone function, and the range of $H_\lambda$ is $\mathbb{R}$ for all $\lambda$.
2. For fixed $x$, $H_\lambda(x)$ as a function of $\lambda$ is differentiable.
3. The functions $\partial H_\lambda(w)/\partial \lambda$, $\partial^2 H_\lambda(w)/\partial \lambda^2$ and $\partial^2 \log H'_\lambda(w)/\partial \lambda^2$ with $w = H_\lambda^{-1}(x)$ are bounded from above by $C_1\{\exp(C_2 x) + \exp(-C_2 x)\}$ with some constants $C_1, C_2 > 0$.

The first condition is crucial in this context. If the range of $H_\lambda$ is not $\mathbb{R}$ but some subset $A \subset \mathbb{R}$, the inverse function $H_\lambda^{-1}$ cannot be defined on $\mathbb{R} \setminus A$, which causes problems in computing the best predictor (7).

When the observations are positive valued, the Box-Cox (BC) transformation (Box and Cox, 1964), $H_\lambda(x) = \lambda^{-1}(x^\lambda - 1)$ for $\lambda \neq 0$ and $H_0(x) = \log(x)$, is widely used. However, the range of the BC transformation is truncated and is not the whole real line, so the BC transformation cannot be used in this context. An alternative transformation, called the dual power (DP) transformation, has been suggested by Yang (2006):

$$H^{DP}_\lambda(x) = \frac{x^\lambda - x^{-\lambda}}{2\lambda}, \quad x > 0, \quad \lambda > 0, \tag{10}$$

where $\lim_{\lambda \to 0} H^{DP}_\lambda(x) = \log x$. It can be seen as the mean of two BC transformations, and it is easy to confirm that the range of the DP transformation (DPT) is $\mathbb{R}$, so that DPT can be used as a parametric family including the log-transformation in this context. The expression of the inverse function is required in computing the transformed best predictor (7), and the

Jacobian is also needed for computing the profile likelihood function (9). These are given by

$$H^{DP(-1)}_\lambda(x) = \left( \lambda x + \sqrt{1 + \lambda^2 x^2} \right)^{1/\lambda} \quad \text{and} \quad \frac{dH^{DP}_\lambda(x)}{dx} = \frac{1}{2}\left( x^{\lambda - 1} + x^{-\lambda - 1} \right).$$

In the context of small area estimation, the DP transformation was used in Sugasawa and Kubokawa (2017) in the Fay-Herriot model. The original DP transformation (10) can be used when the response variables are positive. When response variables are real valued, one may use the shifted-DP transformation of the form $H_{\lambda,c}(x) = \{(x + c)^\lambda - (x + c)^{-\lambda}\}/2\lambda$, where $c \in (-\min(y_{ij}) + \varepsilon, \infty)$ with a specified small $\varepsilon > 0$, so that $x + c$ is positive for all observations.

Another attractive transformation is the sinh-arcsinh (SS) transformation suggested in Jones and Pewsey (2009) in the context of distribution theory, which has the form

$$H^{SS}_{a,b}(x) = \sinh(b \sinh^{-1}(x) - a), \quad x \in (-\infty, \infty), \quad a \in (-\infty, \infty), \quad b \in (0, \infty), \tag{11}$$

where $\sinh(x) = (e^x - e^{-x})/2$ is the hyperbolic sine function, $\sinh^{-1}(x) = \log(x + \sqrt{x^2 + 1})$, and the two transformation parameters $a$ and $b$ control skewness and tail heaviness, respectively. The inverse transformation and the Jacobian are obtained as

$$H^{SS(-1)}_{a,b}(x) = \sinh\left( b^{-1} \{\sinh^{-1}(x) + a\} \right) \quad \text{and} \quad \frac{dH^{SS}_{a,b}(x)}{dx} = b \sqrt{\frac{1 + H^{SS}_{a,b}(x)^2}{1 + x^2}}.$$

These transformations will be used and compared in the application presented in Section 4.3.

2.4 Large sample properties

We here consider the large sample properties of the estimator of structural parameters. To this end, we assume the following conditions:

Assumption 2. (Assumptions under large $m$)
1. The true parameter vector $\phi_0$ is an interior point of the parameter space $\Phi$.

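2. $0 < \min_{i=1,\ldots,m} N_i \leq \max_{i=1,\ldots,m} N_i < \infty$.
3. The elements of $X_i$ are uniformly bounded and $X_i^t X_i$ is positive definite.
4. $m^{-1} \sum_{i=1}^{m} X_i^t \Sigma_i^{-1} X_i$ converges to a positive definite matrix as $m \to \infty$.

Since the asymptotic variance-covariance matrix of the MLE can be derived from the Fisher information matrix, we first provide the Fisher information matrix in the following theorem, the proof of which is given in the Appendix.

Theorem 1. Define the Fisher information $I_{\phi_k \phi_j} = E[-\partial^2 L(\phi)/\partial \phi_k \partial \phi_j]$. Then it follows that

$$
\begin{aligned}
I_{\beta\beta} &= \sum_{i=1}^{m} X_i^t \Sigma_i^{-1} X_i, \qquad I_{\beta\tau^2} = I_{\beta\sigma^2} = 0, \\
I_{\tau^2\tau^2} &= \frac{1}{2} \sum_{i=1}^{m} (1_{n_i}^t \Sigma_i^{-1} 1_{n_i})^2, \qquad I_{\tau^2\sigma^2} = \frac{1}{2} \sum_{i=1}^{m} 1_{n_i}^t \Sigma_i^{-2} 1_{n_i}, \qquad I_{\sigma^2\sigma^2} = \frac{1}{2} \sum_{i=1}^{m} \mathrm{tr}(\Sigma_i^{-2}), \\
I_{\lambda\beta} &= -\sum_{i=1}^{m} X_i^t \Sigma_i^{-1} E\left[ H^{(1)}_\lambda(y_i) \right], \\
I_{\lambda\tau^2} &= -\sum_{i=1}^{m} E\left[ z_i^t \Sigma_i^{-1} 1_{n_i} 1_{n_i}^t \Sigma_i^{-1} H^{(1)}_\lambda(y_i) \right], \qquad I_{\lambda\sigma^2} = -\sum_{i=1}^{m} E\left[ z_i^t \Sigma_i^{-2} H^{(1)}_\lambda(y_i) \right], \\
I_{\lambda\lambda} &= \sum_{i=1}^{m} E\left[ H^{(1)}_\lambda(y_i)^t \Sigma_i^{-1} H^{(1)}_\lambda(y_i) \right] + \sum_{i=1}^{m} E\left[ z_i^t \Sigma_i^{-1} H^{(2)}_\lambda(y_i) \right] - \sum_{i=1}^{m} \sum_{j=1}^{n_i} E\left[ \frac{\partial^2}{\partial \lambda^2} \log H'_\lambda(y_{ij}) \right],
\end{aligned}
$$

where $H^{(k)}_\lambda(y_i) = \partial^k H_\lambda(y_i)/\partial \lambda^k$ for $k = 1, 2$, $z_i = H_\lambda(y_i) - X_i \beta$, and $E[\cdot]$ denotes the expectation with respect to $y_{ij}$'s following the model (5). The signs follow from differentiating (8), in which $z_i$ enters through $-\frac{1}{2} z_i^t \Sigma_i^{-1} z_i$. Then, under Assumptions 1 and 2, the maximum likelihood estimator $\hat{\phi}$ is asymptotically distributed as $\hat{\phi} \sim N(\phi, I_\phi^{-1})$.

From Theorem 1, it is observed that the information matrix of $(\beta^t, \tau^2, \sigma^2)$ does not depend on the transformation parameter $\lambda$, and its expression is the same as that of the traditional nested error regression model. While the two variance parameters $\tau^2$ and $\sigma^2$ are orthogonal to $\beta$ in the sense that $I_{\beta\tau^2} = I_{\beta\sigma^2} = 0$, the transformation parameter $\lambda$ is not orthogonal to the others. The expectations appearing in the Fisher information matrix are not analytically tractable, but they can be easily estimated by replacing each expectation with its sample counterpart. In the case that $\lambda$ is multidimensional, the extension of Theorem 1 is straightforward. The expressions of $H^{(k)}_\lambda(y_i)$ and $\partial^2 \log H'_\lambda(y_{ij})/\partial \lambda^2$ could be analytically complicated and require tedious algebraic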

calculations. In such a case, numerical derivatives can be useful since we need to compute only the point values of the derivatives.

3 Empirical Bayes Confidence Intervals

3.1 Asymptotically valid confidence intervals

Measuring the variability of the transformed empirical best predictor $\hat{\mu}_i$ is an important issue in practice. Traditionally, the mean squared error (MSE) of $\hat{\mu}_i$ has been used, and several methods ranging from analytical methods (Prasad and Rao, 1990) to numerical methods (Hall and Maiti, 2006) have been considered. On the other hand, an empirical Bayes confidence interval of $\mu_i$ is preferable since it provides more distributional information than the MSE, though construction of the confidence interval is generally difficult. Here, we derive an asymptotically valid empirical Bayes confidence interval of $\mu_i$.

The key to the confidence interval is the conditional distribution of $\mu_i$ given $y_s$. Noting that $\mathrm{Cov}(H_\lambda(Y_{ij}), H_\lambda(Y_{ik}) \mid y_s) = \mathrm{Var}(v_i \mid y_s) = s_i^2$ for $j \neq k$, it follows that

$$(H_\lambda(Y_{i,n_i+1}), \ldots, H_\lambda(Y_{iN_i}))^t \mid y_s \sim N\left( (\theta_{i,n_i+1}, \ldots, \theta_{iN_i})^t,\ s_i^2 1_{N_i - n_i} 1_{N_i - n_i}^t + \sigma^2 I_{N_i - n_i} \right),$$

namely, each component has the expression

$$H_\lambda(Y_{ij}) \mid y_i = \theta_{ij} + s_i z_i + \sigma w_{ij}, \quad j = n_i + 1, \ldots, N_i,$$

where $z_i$ and $w_{ij}$ are mutually independent standard normal random variables, and $\theta_{ij}$ and $s_i$ are defined in (6). Then the posterior distribution of $\mu_i$ can be expressed as

$$\mu_i \mid y_i \overset{d}{=} \frac{1}{N_i} \left\{ \sum_{j=1}^{n_i} T(y_{ij}) + \sum_{j=n_i+1}^{N_i} T \circ H_\lambda^{-1}(\theta_{ij} + s_i z_i + \sigma w_{ij}) \right\}, \tag{12}$$

which is a complex function of the standard normal random variables $z_i$ and $w_{ij}$. However, random samples from the conditional distribution (12) can be easily simulated.

We define $Q_a(y_i, \phi)$ as the lower $100a\%$ quantile point of the posterior distribution of $\mu_i$ with the true $\phi$, which satisfies $P(\mu_i \leq Q_a(y_i, \phi) \mid y_i) = a$. Hence, the Bayes confidence interval of $\mu_i$ with nominal level $1 - \alpha$ is obtained as $I_\alpha = (Q_{\alpha/2}(y_i, \phi), Q_{1-\alpha/2}(y_i, \phi))$, which satisfies $P(\mu_i \in I_\alpha) = 1 - \alpha$. However, the interval $I_\alpha$ depends on the unknown parameter $\phi$, so the feasible version of $I_\alpha$ is obtained by replacing $\phi$ with its estimator $\hat{\phi}$, namely

$$I^N_\alpha = (Q_{\alpha/2}(y_s, \hat{\phi}), Q_{1-\alpha/2}(y_s, \hat{\phi})), \tag{13}$$

which we call the naive empirical Bayes confidence interval of $\mu_i$. The two quantiles appearing in (13) can be computed by generating a large number of random samples from the conditional distribution (12). Owing to the asymptotic properties of $\hat{\phi}$, the coverage probability of the naive interval (13) converges to the nominal level as the number of areas $m$ tends to infinity, as shown in the following theorem proved in the Appendix.

Theorem 2. Under Assumptions 1 and 2, it holds that $P(\mu_i \in I^N_\alpha) = 1 - \alpha + O(m^{-1})$.

3.2 Bootstrap calibrated intervals

As shown in Theorem 2, the coverage error of the naive interval (13) is of order $m^{-1}$, which is not necessarily negligible when $m$ is not sufficiently large. Since $m$ is usually moderate in practice, calibrated intervals with higher accuracy would be valuable. Following Chatterjee et al. (2008) and Hall and Maiti (2006), we construct a second-order corrected empirical Bayes confidence interval $I^C_\alpha$ satisfying $P(\mu_i \in I^C_\alpha) = 1 - \alpha + o(m^{-1})$.

To begin with, we define the bootstrap estimator of the coverage probability of the naive interval. Let $Y^*_{ij}$ be the parametric bootstrap samples generated from the estimated model (5) with $\phi = \hat{\phi}$, and let $y^*_s = \{Y^*_{ij},\ j = 1, \ldots, n_i,\ i = 1, \ldots, m\}$. Moreover, let $\mu^*_i$ and $\hat{\phi}^*$ be the bootstrap versions of $\mu_i$ and $\hat{\phi}$ based on the $Y^*_{ij}$'s. Since the coverage probability is $P(Q_{a/2}(y_s, \hat{\phi}) \leq \mu_i \leq Q_{1-a/2}(y_s, \hat{\phi}))$, its parametric bootstrap estimator can be defined as

$$\mathrm{CP}(a) = E^*\left[ I\left\{ Q_{a/2}(y^*_s, \hat{\phi}^*) \leq \mu^*_i \leq Q_{1-a/2}(y^*_s, \hat{\phi}^*) \right\} \right],$$

where the expectation is taken with respect to the bootstrap samples $Y^*_{ij}$'s. Based on this coverage probability, we define the calibrated nominal level $a^*$ as the solution of the equation $\mathrm{CP}(a^*) = 1 - \alpha$, which can be solved by the bisection method (Brent, 1973). Then the calibrated interval is given by

$$I^C_\alpha = (Q_{a^*/2}(y_s, \hat{\phi}), Q_{1-a^*/2}(y_s, \hat{\phi})), \tag{14}$$

which has second-order accuracy, as shown in the following theorem proved in the Appendix.

Theorem 3. Under Assumptions 1 and 2, it holds that $P(\mu_i \in I^C_\alpha) = 1 - \alpha + o(m^{-1})$.

4 Numerical Studies

4.1 Evaluation of prediction errors

We first evaluate the prediction errors of the proposed predictors together with some existing methods. To this end, we considered the following data generating processes:

(A) $(2\lambda)^{-1}(Y_{ij}^{\lambda} - Y_{ij}^{-\lambda}) = \beta_0 + \beta_1 X_{ij} + v_i + \varepsilon_{ij}$, $v_i \sim N(0, \tau^2)$, $\varepsilon_{ij} \sim N(0, \sigma^2)$
(B) $(2\lambda)^{-1}(Y_{ij}^{\lambda} - Y_{ij}^{-\lambda}) = \beta_0 + \beta_1 X_{ij} + v_i + \varepsilon_{ij}$, $v_i \sim t_5(0, \tau^2)$, $\varepsilon_{ij} \sim t_5(0, \sigma^2)$
(C) $Y_{ij} = \exp(\beta_0 + \beta_1 X_{ij})\, v_i\, \varepsilon_{ij}$, $v_i \sim \Gamma(1/\tau^2, 1/\tau^2)$, $\varepsilon_{ij} \sim \Gamma(1/\sigma^2, 1/\sigma^2)$
(D) $Y_{ij} = 0.2 \exp(U_{ij}) + 0.8 U_{ij}^2$, $U_{ij} = \beta_0 + \beta_1 X_{ij} + v_i + \varepsilon_{ij}$, $v_i \sim N(0, \tau^2)$, $\varepsilon_{ij} \sim N(0, \sigma^2)$,

where $i = 1, \ldots, m$, $j = 1, \ldots, N_i$, $\beta_0 = 1$, $\beta_1 = 3$, $\tau = 0.3$, $\sigma = 0.7$, and the $X_{ij}$ were initially generated from $U(1, 2)$ and fixed throughout the simulation experiments. In models (A) and (B), we considered three values for $\lambda$: $\lambda = 0$, $0.2$ and $0.4$. In this study, we set $N_i = 200$ and $m = 25$, and we focus on estimating the proportion of units with values below $z$, namely

$$\mu_i = \frac{1}{N_i} \sum_{j=1}^{N_i} I(Y_{ij} < z), \quad i = 1, \ldots, m, \tag{15}$$

where $z$ is defined as 0.6 times the median of the $Y_{ij}$'s.
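As a rough illustration of scenario (A) and the target quantity (15), the following Python sketch draws one finite population by applying the inverse DP transformation to the linear predictor. The function names (`dp_transform`, `dp_inverse`, `simulate_scenario_a`, `poverty_proportion`) are hypothetical, the covariates are redrawn on each call rather than held fixed across replications, and the median in `z` is assumed to be taken over all units in the population.

```python
import numpy as np

def dp_transform(x, lam):
    """Dual power transformation (10); lam = 0 is the log limit."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x**lam - x**(-lam)) / (2.0 * lam)

def dp_inverse(u, lam):
    """Inverse of the dual power transformation; lam = 0 is exp."""
    u = np.asarray(u, dtype=float)
    if lam == 0:
        return np.exp(u)
    return (lam * u + np.sqrt(1.0 + (lam * u) ** 2)) ** (1.0 / lam)

def simulate_scenario_a(lam, m=25, N=200, beta0=1.0, beta1=3.0,
                        tau=0.3, sigma=0.7, seed=0):
    """Draw one population (m areas, N units each) from scenario (A)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(1.0, 2.0, size=(m, N))      # covariates X_ij ~ U(1, 2)
    v = rng.normal(0.0, tau, size=(m, 1))       # area effects v_i
    eps = rng.normal(0.0, sigma, size=(m, N))   # unit-level errors
    Y = dp_inverse(beta0 + beta1 * X + v + eps, lam)
    return X, Y

def poverty_proportion(Y):
    """Area-level proportions (15), with z = 0.6 * overall median (assumed)."""
    z = 0.6 * np.median(Y)
    return (Y < z).mean(axis=1), z

X, Y = simulate_scenario_a(lam=0.2)
mu, z = poverty_proportion(Y)
```

Generating on the transformed scale and mapping back with the inverse keeps every $Y_{ij}$ positive, which is what the DP transformation requires.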

Concerning the area sample sizes, we divided the $m = 25$ areas into five groups with an equal number of areas, and we set the same $n_i$ within each group. The group pattern of $n_i$ we considered was (20, 40, 60, 80, 100). Among the generated $Y_{i1}, \ldots, Y_{iN_i}$, we used the first $n_i$ observations $y_{i1}\,(= Y_{i1}), \ldots, y_{in_i}\,(= Y_{in_i})$ as the sampled data. Then, based on the sampled data $y_{ij}$'s and covariates $X_{ij}$'s, we computed the predicted value of $\mu_i$ with four methods: the proposed adaptively transformed empirical best prediction (ATP) method with the DP transformation (10), the transformed empirical best prediction (TP) method proposed by Molina and Rao (2010) with the log-transformation, the empirical best prediction (EBP) method that directly applies the nested error regression model to the non-transformed observations $y_{ij}$, and the direct estimator (DE) given by

$$\hat{\mu}^D_i = \frac{1}{n_i} \sum_{j=1}^{n_i} I(y_{ij} < z), \quad i = 1, \ldots, m.$$

It should be noted that the TP method is correctly specified in scenario (A) with $\lambda = 0$, while the ATP method is overfitting in this case. In the other cases in scenario (A), the ATP method uses the same model as the data generating model. Scenario (B) is similar to (A), but the error terms follow $t$-distributions. In scenarios (C) and (D), the data generating model does not coincide with the model assumed by any of the methods. To compare the performances of the four methods, we computed the square root of mean squared error (RMSE) defined as

$$\mathrm{RMSE}_i = \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \left( \hat{\mu}_i^{(r)} - \mu_i^{(r)} \right)^2 },$$

where $R = 2000$ in this study, and $\hat{\mu}_i^{(r)}$ and $\mu_i^{(r)}$ are the estimated and true values of $\mu_i$, respectively, in the $r$th iteration. The obtained RMSE values are averaged within each group and the results are reported in Table 1.

From Table 1, we can observe that the proposed method provides better estimates than the three existing methods in almost all cases. As mentioned above, the ATP method is overfitting in scenario (A) with $\lambda = 0$ while the TP method is correctly specified.

However, the results show that the performances of ATP and TP are almost the same, which might indicate that the MSE inflation due to overfitting is not serious. A similar observation can be made in scenario (B) with $\lambda = 0$. On the other hand, in the other cases, the proposed ATP method improves on the estimation accuracy of the TP method as well as the EBP and DE methods by adaptively estimating the transformation parameter from the data.

4.2 Finite sample evaluation of empirical Bayes confidence intervals

We next evaluate the finite sample performance of the empirical Bayes confidence intervals given in Section 3. To this end, we considered the following data generating process for the population variables $Y_{ij}$:

$$(2\lambda)^{-1}(Y_{ij}^{\lambda} - Y_{ij}^{-\lambda}) = \beta_0 + \beta_1 X_{ij} + v_i + \varepsilon_{ij}, \quad v_i \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2),$$

where $j = 1, \ldots, N_i$ and $i = 1, \ldots, m$ with $N_i = 200$. We set the true parameter values $\lambda = 0.3$, $\beta_0 = 1$, $\beta_1 = 3$, $\tau = 0.3$, $\sigma = 0.7$, and the $X_{ij}$ were initially generated from the uniform distribution on $(1, 2)$ and fixed throughout the simulation runs. We focused on the same population parameter given in (15). Among the generated $Y_{i1}, \ldots, Y_{iN_i}$, the first $n_i = 50$ observations $Y_{i1}, \ldots, Y_{in_i}$ were used as the sampled data $y_{i1}, \ldots, y_{in_i}$. Then, based on the $y_{ij}$'s and $X_{ij}$'s, we computed two types of confidence intervals for $\mu_i$, the naive confidence interval (13) and the bootstrap calibrated confidence interval (14), which are denoted by NCI and BCI, respectively. To evaluate the performances of the two confidence intervals, based on $R = 2000$ simulation runs, we computed the empirical coverage probability (CP) and the average length of the confidence interval (AL), which are defined as

$$\mathrm{CP}_i = \frac{1}{R} \sum_{r=1}^{R} I\left( \mu_i^{(r)} \in \mathrm{CI}_i^{(r)} \right) \quad \text{and} \quad \mathrm{AL}_i = \frac{1}{R} \sum_{r=1}^{R} \left| \mathrm{CI}_i^{(r)} \right|,$$

where $\mu_i^{(r)}$ is the true value and $\mathrm{CI}_i^{(r)}$ is NCI or BCI in the $r$th iteration. In Figure 1, we show the obtained CP and AL in each area for the two cases $m = 20$ and $m = 30$.

Table 1: The group-wise averaged values of the simulated square root of mean squared errors (RMSE) for four methods: the proposed adaptively transformed prediction (ATP) method, Molina and Rao's transformed prediction (TP) method, the empirical best prediction (EBP) method without any data transformation, and the direct estimator (DE), for eight scenarios. All the values in the table are multiplied by 100.

Columns: Scenario, Method, Area sample size $n_i$.
Rows: scenarios (A) $\lambda = 0$, (A) $\lambda = 0.2$, (A) $\lambda = 0.4$, (B) $\lambda = 0$, (B) $\lambda = 0.2$, (B) $\lambda = 0.4$, (C) and (D), each with methods ATP, TP, EBP and DE.

Concerning CP, the naive method tends to produce shorter confidence intervals, so that the coverage probability is smaller than the nominal level for all areas, which is more serious in the case $m = 20$ than $m = 30$. This comes from the accuracy of NCI presented in Theorem 2, which states that the coverage error of NCI is $O(m^{-1})$. On the

other hand, the bootstrap method overcomes this drawback of the naive method and provides reasonable CP around the nominal level under both $m = 20$ and $m = 30$. The results clearly support the theoretical property given in Theorem 3, which states that BCI is second-order accurate. Since underestimation of the estimation risk may cause serious problems in practice, we should use the bootstrap method when the number of areas is not large.

Figure 1: Simulated coverage probability (CP) and average length (AL) of two confidence intervals, the naive confidence interval (NCI) and the bootstrap calibrated confidence interval (BCI), plotted against area for $m = 20$ (upper) and $m = 30$ (lower).
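The two summaries used in this section, together with the calibration step of Section 3.2, can be sketched in Python as follows. This is only a sketch under stated assumptions: `coverage_and_length` and `calibrate_level` are hypothetical names, the interval endpoints are assumed to be stored as arrays of shape (R, m), and `cp_hat` stands for any user-supplied bootstrap estimate of $\mathrm{CP}(a)$ that is non-increasing in $a$.

```python
import numpy as np

def coverage_and_length(mu_true, lower, upper):
    """Empirical CP_i and AL_i from arrays of shape (R, m)."""
    mu_true = np.asarray(mu_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (lower <= mu_true) & (mu_true <= upper)
    cp = covered.mean(axis=0)            # share of runs covering the truth
    al = (upper - lower).mean(axis=0)    # average interval length
    return cp, al

def calibrate_level(cp_hat, alpha=0.05, tol=1e-8, max_iter=200):
    """Bisection for a* solving cp_hat(a*) = 1 - alpha on (0, 1).

    Assumes cp_hat is non-increasing in a: a larger a gives a
    narrower interval and hence lower coverage.
    """
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if cp_hat(mid) > 1.0 - alpha:
            lo = mid    # still over-covering: a can be increased
        else:
            hi = mid    # under-covering: a must be decreased
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

In practice `cp_hat` is a Monte Carlo average over bootstrap replications and is therefore a step function of $a$, so the bisection only locates $a^*$ up to the resolution of the bootstrap sample.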

4.3 Example: poverty mapping in Spain

We applied the proposed method to the estimation of poverty indicators in Spanish provinces, using the synthetic income data available in the sae package (Molina and Marhuenda, 2015) in the R language, in which the equalized annual net income is given. A similar data set was used in Molina and Rao (2010) and Molina et al. (2014). As auxiliary variables, we considered the indicators of the five quinquennial groupings of the variable age, the indicator of having Spanish nationality, the indicators of the three levels of the variable education level, and the indicators of the three categories of the variable employment, with categories unemployed, employed and inactive. For each auxiliary variable, one of the categories was taken as the base reference, omitting the corresponding indicator and then including an intercept in the model. The poverty measures we focused on were the FGT poverty measures (Foster et al., 1984):

$$T(x) = \left( \frac{z - x}{z} \right)^{\alpha} I(x < z),$$

where $z$ is a fixed poverty line, and $T$ corresponds to poverty incidence or head count ratio ($\alpha = 0$), poverty gap ($\alpha = 1$) and poverty severity ($\alpha = 2$). In this example, we focused on the poverty ratio ($\alpha = 0$), and we set $z$ as 0.6 times the median of incomes.

Let $E_{ij}$ be the income of the $j$th individual in the $i$th area. Such data are available for $m = 52$ areas, with sample sizes ranging from 20 to ... Since a small portion of the $E_{ij}$ take negative values, we assume the nested error regression model with the shifted-DP transformation:

$$\text{SDP:} \quad (2\lambda)^{-1} \left\{ (E_{ij} + c)^{\lambda} - (E_{ij} + c)^{-\lambda} \right\} = x_{ij}^t \beta + v_i + \varepsilon_{ij}, \tag{16}$$

noting that the model has two transformation parameters $\lambda$ and $c$. We also considered two submodels of (16). In both models, we set $c = c^* \equiv -\min(E_{ij}) + 1$ to ensure that $E_{ij} + c$ is positive for all $(i, j)$. The first submodel is defined by putting $c = c^*$ in (16), which is referred to as SDP-s. The second submodel is the shifted-log transformation

19 model: SL: log(e j + c ) = x t jβ + v + ε j, (17) whch has no longer parameters and was used n Molna and Rao (2010). Fnally, we also appled the model wth snh-arcsnh transformaton presented n Secton 2.3: SS: snh(b snh 1 (E j ) a) = x t jβ + v + ε j, (18) whch has two transformaton parameter a and b. By maxmzng the profle lkelhood functon of transformaton parameters, we obtaned as follows: (SDP) λ = ( ), ĉ = 4319 (170.69) (SDP-s) λ = ( ) (SS) â = ( ), b = ( ), where the values n the parentheses are the correspondng standard errors calculated from the Fsher nformaton matrx gven n Theorem 1. From the above result, t can be observed that the approxmate 95% confdence ntervals of the transformaton parameter λ n SDP and SDP-s are bounded from 0, whch means that the logtransformed model would be napproprate. Moreover, we computed AIC and BIC based on the maxmum margnal lkelhood, and the results are gven n Table 2 n whch the values scaled by the number of sampled unts (N = 17199) are reported. The results show that the SDP fts the best among the four models n terms of both AIC and BIC whle the SL model fts the worst. Hence, the use of parametrc transformaton can mprove AIC and BIC n ths applcaton. To see the fttng of the models n terms of normalty assumpton of the error terms, we computed the standardzed resduals defned as r j = Ĥ(y j ) x t j β τ 2 + σ 2, j = 1,..., n, = 1,..., m, 18

where $\hat{H}$ is the estimated transformation function, noting that the $r_{ij}$'s asymptotically follow the standard normal distribution if the assumed model is correctly specified. In Figure 2, we show QQ-plots of the $r_{ij}$'s for the four models. The normality assumptions in the three models with parametric transformations, SDP, SDP-s and SS, seem plausible from Figure 2. However, the QQ-plot for SL shows that the distribution of the standardized residuals is skewed and the normality assumption would not be appropriate.

Finally, we calculated the estimated values of the poverty rates $\mu_i$ from the direct estimator (DE) and the four model-based methods. For computing the empirical best predictor of $\mu_i$, we used 500 random samples for Monte Carlo integration. The obtained values are given in Table 3. In parentheses, we provide the bootstrap empirical Bayes confidence intervals of $\mu_i$ with 200 bootstrap iterations. It can be seen that the direct estimator produces quite different estimates of $\mu_i$ from the model-based methods when the area sample size $n_i$ is not large, as in Avila and Tarragona. On the other hand, in provinces with large sample sizes, the differences between DE and the model-based estimates are relatively small. We can also observe that the SL method tends to produce smaller estimates than the other model-based methods. However, from the AIC and BIC values and the QQ-plot in Figure 2, the validity of the SL method is highly doubtful in this case, so its predicted values given in Table 3 would not be reliable. As shown in Table 3, the use of different transformation functions leads to significantly different predicted values of $\mu_i$. Hence, it would be valuable to select an adequate transformation function by estimating transformation parameters based on the sampled data.

Table 2: AIC and BIC of four models. The values are scaled by the number of sampled units ($N = 17199$).
Columns: SDP, SDP-s, SS, SL. Rows: AIC, BIC.
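The Monte Carlo step behind the point predictions in this example can be sketched as follows for a single area, using the plain DP transformation and treating the model parameters as known. This is a minimal sketch, not the authors' implementation: `fgt` and `atbp_area` are hypothetical names, the default of 500 draws mirrors the number quoted above, and in the application the fitted values $\hat{\beta}$, $\hat{\tau}^2$, $\hat{\sigma}^2$, $\hat{\lambda}$ and the shifted transformation would be plugged in.

```python
import numpy as np

def dp_transform(x, lam):
    """Dual power transformation (10); lam = 0 is the log limit."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - x**(-lam)) / (2.0 * lam)

def dp_inverse(u, lam):
    """Inverse dual power transformation; lam = 0 is exp."""
    u = np.asarray(u, dtype=float)
    if lam == 0:
        return np.exp(u)
    return (lam * u + np.sqrt(1.0 + (lam * u) ** 2)) ** (1.0 / lam)

def fgt(x, z, alpha=0):
    """FGT measure {(z - x)/z}^alpha * I(x < z)."""
    x = np.asarray(x, dtype=float)
    gap = np.clip((z - x) / z, 0.0, None)   # zero at or above the line
    return np.where(x < z, gap**alpha, 0.0)

def atbp_area(y_s, X_s, X_r, beta, tau2, sigma2, lam, z,
              alpha=0, n_draws=500, seed=0):
    """Monte Carlo version of predictor (7) for one area.

    y_s, X_s: sampled responses (n,) and covariates (n, p);
    X_r: covariates of the non-sampled units (N - n, p).
    """
    rng = np.random.default_rng(seed)
    n = len(y_s)
    N = n + len(X_r)
    resid_sum = np.sum(dp_transform(y_s, lam) - X_s @ beta)
    theta = X_r @ beta + tau2 / (sigma2 + n * tau2) * resid_sum   # (6)
    s2 = sigma2 * tau2 / (sigma2 + n * tau2)                      # (6)
    # Only the marginal law of each u_ij matters for the expectation,
    # so independent draws per unit are sufficient here.
    u = theta + np.sqrt(s2 + sigma2) * rng.standard_normal((n_draws, len(theta)))
    expected_T = fgt(dp_inverse(u, lam), z, alpha).mean(axis=0)
    return (fgt(y_s, z, alpha).sum() + expected_T.sum()) / N
```

For the confidence intervals of Section 3 the draws would instead have to share the common component $s_i z_i$ within the area, as in (12), since the quantiles depend on the joint distribution of the non-sampled units.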

Figure 2: QQ-plots of standardized residuals in four models (panels: SDP-s, SDP, SS and SL; axes: theoretical quantiles versus sample quantiles).

5 Conclusions and Discussion

We have introduced the use of a parametric family of transformations for estimating (predicting) general area-specific parameters based on mixed effects models. We have provided the best predictor of the parameter as well as the maximum likelihood method for estimating model parameters. Moreover, for measuring the variability of the predictor, we constructed an empirical Bayes confidence interval of the area parameter. The simulation and empirical studies have revealed that the use of parametric

22 Table 3: Estmated poverty rates from the drect estmator (DE) and four model based methods n fve provnces. The bootstrap emprcal Bayes confdence ntervals are gven n the parenthess. area n DE SDP SDP-s SS SL Avla (1.63, 4.05) (1.87, 4.31) (2.00, 4.50) (1.57, 3.62) Tarragona (6.02, 10.42) (6.38, 10.82) (7.28,11.27) (5.83, 9.35) Santander (5.75, 8.07) (6.03, 8.03) (6.38, 8.49) (5.21, 6.99) Sevlla (4.15, 5.75) (4.45,6.07) (4.77, 6.35) (3.73, 4.90) Ovedo (4.14, 5.44) (4.45, 5.51) (4.75, 6.00) (3.77, 4.74) transformatons would mprove the predcton accuracy of the exstng method usng specfed transformatons. Although we consdered an emprcal Bayes approach n ths paper, the herarchcal Bayes approach as consdered n Molna et al. (2014), by assgnng some pror dstrbutons for model parameters, would be useful. Moreover, one may use more flexble method for the regresson part lke the penalzed splne as used n Opsomer et al. (2008). The detaled nvestgaton of these ssues are left to a valuable future study. Acknowledgement Shonosuke Sugasawa and Tatsuya Kubokawa are supported by Grant-n-Ad for Scentfc Research (16H07406, 15H01943 and ) from Japan Socety for the Promoton of Scence. Appendx A1. Proof of Theorem 1. From the lkelhood functon (8), ts frst order derva- 21

23 tves are gven by L m β = X t Σ 1 z, =1 L σ 2 = 1 2 =1 m =1 tr (Σ 1 ) 1 2 L τ 2 = 1 2 m =1 L m λ = z t Σ 1 H (1) λ (y ) + m =1 z t Σ 2 z, m n =1 j=1 1 t n Σ 1 1 n 1 2 λ log H λ (y j), m =1 z t Σ 1 1 n 1 t n Σ 1 z where z = H λ (y ) X β. Snce E[z ] = 0, t follows that E[ 2 L/ β τ 2 ] = E[ 2 L/ β σ 2 ] = 0. The other elements of the Fsher nformaton can be obtaned by a straghtforward calculaton. Moreover, under Assumptons 1 and 2, the each element of the Fsher nformaton matrx s fnte, so that the asymptotc normalty of ϕ follows. A2. Proof of Theorem 2. Let ϕ 0 s the true values of parameters. It suffces to show that P (µ Q a (y, ϕ)) = a + O(m 1 ) for a (0, 1). We frst note that It holds that P (µ Q a (y, ϕ)) = E[P (µ Q a (y, ϕ) y s )] = E[F (Q a (y, ϕ); y, ϕ 0 )], where F ( ; y, ϕ 0 ) s a dstrbuton functon of µ gven y. Let G(y, ϕ, ϕ 0 ) = F (Q a (y, ϕ); y, ϕ 0 ), notng that 0 G(y, ϕ, ϕ 0 ) 1 and G(y, ϕ 0, ϕ 0 ) = a. The Taylor expanson of G(y, ϕ, ϕ 0 ) shows that G(y, ϕ, ϕ 0 ) = G(y, ϕ 0, ϕ 0 ) + j G ϕj (y, ϕ, ϕ 0 ) ϕ=ϕ0 ( ϕ j ϕ j ) + 1 G ϕj ϕ 2 k (y, ϕ, ϕ 0 ) ϕ=ϕ0 ( ϕ j ϕ j )( ϕ k ϕ k ) j,k + 1 G ϕj ϕ 6 k ϕ l (y, ϕ, ϕ 0 ) ϕ=ϕ ( ϕ j ϕ j )( ϕ k ϕ k )( ϕ l ϕ l ), j,k,l where ϕ s on the lne connectng ϕ and ϕ 0. Then, t follows that P (µ Q a (y, ϕ)) = E[G(y, ϕ, ϕ 0 )] = a + R R R 3, 22

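where

$$
\begin{aligned}
R_1 &= E\Big[G_{\phi}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0} (\hat\phi - \phi_0)\Big], \\
R_2 &= \sum_{j,k} E\Big[G_{\phi_j\phi_k}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0} (\hat\phi_j - \phi_{0j})(\hat\phi_k - \phi_{0k})\Big], \\
R_3 &= \sum_{j,k,l} E\Big[G_{\phi_j\phi_k\phi_l}(y_i, \phi, \phi_0)\big|_{\phi=\phi^*} (\hat\phi_j - \phi_{0j})(\hat\phi_k - \phi_{0k})(\hat\phi_l - \phi_{0l})\Big],
\end{aligned}
$$

and $\phi_{0j}$ denotes the $j$-th element of $\phi_0$. Using the Cauchy-Schwarz inequality, we have

$$
\Big| E\Big[G_{\phi_j\phi_k}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0} (\hat\phi_j - \phi_{0j})(\hat\phi_k - \phi_{0k})\Big] \Big|
\le \Big\{E[(\hat\phi_j - \phi_{0j})^4]\Big\}^{1/4} \Big\{E[(\hat\phi_k - \phi_{0k})^4]\Big\}^{1/4} \sqrt{E\Big[G_{\phi_j\phi_k}(y_i, \phi, \phi_0)^2\big|_{\phi=\phi_0}\Big]}.
$$

From the asymptotic normality of $\hat\phi$ given in Theorem 1, it holds that $E[|\hat\phi_k - \phi_{0k}|^r] = O(m^{-r/2})$. Moreover, since the range of $G(y_i, \hat\phi, \phi_0)$ is $(0, 1)$, the partial derivatives of $G(y_i, \phi, \phi_0)$ are bounded. Then, we obtain $R_2 = O(m^{-1})$. Using a similar evaluation, we can show that $R_3 = O(m^{-1})$. Regarding $R_1$, it is noted that

$$
E\Big[G_{\phi}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0} (\hat\phi - \phi_0)\Big] = E\Big[G_{\phi}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0}\, E[\hat\phi - \phi_0 \mid y_i]\Big].
$$

From Lohr and Rao (2009), it holds that

$$
E[\hat\phi - \phi_0 \mid y_i] = m^{-1}\Big\{ b_\phi - I_\phi^{-1} \frac{\partial L_i(y_i, \phi_0)}{\partial \phi} \Big\} + o_p(m^{-1}),
$$

where $\sum_{i=1}^m L_i(y_i, \phi_0) \equiv L(\phi_0)$ and $b_\phi = \lim_{m\to\infty} m E[\hat\phi - \phi_0]$ is the asymptotic bias of $\hat\phi$. Hence, we have

$$
E\Big[G_{\phi}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0}\, E[\hat\phi - \phi_0 \mid y_i]\Big] = \frac{1}{m}\bigg\{ E\Big[G_{\phi}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0}\Big] b_\phi - E\Big[G_{\phi}(y_i, \phi, \phi_0)\big|_{\phi=\phi_0}\, I_\phi^{-1} \frac{\partial}{\partial \phi} L_i(y_i; \phi_0)\Big] \bigg\} + o(m^{-1}),
$$

which is $O(m^{-1})$. Therefore, the proof is completed.

A3. Proof of Theorem 3. From the proof of Theorem 2, we have

$$
F_a(\phi_0) \equiv P(\mu_i \le Q_a(y_i, \hat\phi)) = a + \frac{c_a(\phi_0)}{m} + o(m^{-1}),
$$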

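where $c_a(\phi)$ is a smooth function of $\phi$. Let $a^*$ and $\hat{a}^*$ satisfy $F_{a^*}(\phi_0) = a$ and $F_{\hat{a}^*}(\hat\phi) = a$, respectively. Then, from the above expansion, we have $\hat{a}^* - a^* = o(m^{-1})$, thereby

$$
P(\mu_i \le Q_{\hat{a}^*}(y_i, \hat\phi)) = P(\mu_i \le Q_{a^*}(y_i, \hat\phi)) + o(m^{-1}) = a + o(m^{-1}),
$$

which completes the proof.

A4. Checking assumptions of transformations. We here check condition 3 of Assumption 1 for the dual power (DP) transformation (10) and the sinh-arcsinh (SS) transformation (11).

(DP transformation) We first note that $H_\lambda^{-1}(x) = O(x^{1/\lambda})$ as $x \to \infty$. By putting $x = -t$ for $t > 0$, we have

$$
H_\lambda^{-1}(x) = \Big(\sqrt{1 + \lambda^2 t^2} - \lambda t\Big)^{1/\lambda} = \frac{1}{\big(\sqrt{1 + \lambda^2 t^2} + \lambda t\big)^{1/\lambda}} = O(t^{-1/\lambda})
$$

as $t \to \infty$. A straightforward calculation shows that

$$
\frac{\partial H_\lambda(x)}{\partial \lambda} = \frac{x^{\lambda}\log x + x^{-\lambda}\log x}{2\lambda} - \frac{x^{\lambda} - x^{-\lambda}}{2\lambda^2},
$$

thereby, it follows that

$$
\frac{\partial H_\lambda}{\partial \lambda}\big(H_\lambda^{-1}(x)\big) = O(|x| \log|x|) + O(|x|^{-1}\log|x|) + O(|x|) + O(|x|^{-1}) = O(|x|\log|x|)
$$

as $|x| \to \infty$. Moreover, since

$$
\frac{\partial^2 H_\lambda(x)}{\partial \lambda^2} = \frac{x^{\lambda}(\log x)^2 - x^{-\lambda}(\log x)^2}{2\lambda} - \frac{x^{\lambda}\log x + x^{-\lambda}\log x}{\lambda^2} + \frac{x^{\lambda} - x^{-\lambda}}{\lambda^3},
$$

a similar evaluation leads to $\partial^2 H_\lambda(w)/\partial\lambda^2 = O(|x|(\log|x|)^2)$ as $|x| \to \infty$, where $w = H_\lambda^{-1}(x)$. Regarding $\partial^2 \log H_\lambda'(x)/\partial\lambda^2$, it holds that

$$
\frac{\partial^2 \log H_\lambda'(w)}{\partial \lambda^2} = \frac{4(\log w)^2}{w^2\,(w^{\lambda-1} + w^{-\lambda-1})^2} = O\big((\log|x|)^2 |x|^{-2}\big)
$$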

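as $|x| \to \infty$, so that the DP transformation satisfies the assumption. When the location parameter is used, namely, $H_{\lambda,c}(x) = \{(x + c)^{\lambda} - (x + c)^{-\lambda}\}/2\lambda$, it is noted that $\partial^k H_{\lambda,c}(x)/\partial c^k = \partial^k H_{\lambda,c}(x)/\partial x^k$, so that a similar evaluation shows that the shifted-DP transformation also satisfies the assumption.

(SS transformation) It follows that

$$
\frac{\partial H_{a,b}(x)}{\partial a} = -\cosh\big(b \sinh^{-1}(x) - a\big), \qquad
\frac{\partial H_{a,b}(x)}{\partial b} = \cosh\big(b \sinh^{-1}(x) - a\big)\sinh^{-1}(x).
$$

Note that $\sinh^{-1}(x) = O(\log|x|)$ as $|x| \to \infty$, so that $H_{a,b}^{-1}(x) = O(\exp(b^{-1}\log|x|)) = O(|x|^{1/b})$. Then, we have

$$
\frac{\partial H_{a,b}}{\partial a}\big(H_{a,b}^{-1}(x)\big) = O\big(\exp(b \log|x|^{1/b})\big) = O(|x|), \qquad
\frac{\partial H_{a,b}}{\partial b}\big(H_{a,b}^{-1}(x)\big) = O\big(\exp(b\log|x|^{1/b}) \log|x|^{1/b}\big) = O(|x|\log|x|),
$$

as $|x| \to \infty$. Moreover, it holds that

$$
\frac{\partial^2 H_{a,b}(x)}{\partial a^2} = \sinh\big(b\sinh^{-1}(x) - a\big), \qquad
\frac{\partial^2 H_{a,b}(x)}{\partial a\,\partial b} = -\sinh\big(b\sinh^{-1}(x) - a\big)\sinh^{-1}(x), \qquad
\frac{\partial^2 H_{a,b}(x)}{\partial b^2} = \sinh\big(b\sinh^{-1}(x) - a\big)\{\sinh^{-1}(x)\}^2,
$$

thereby a similar evaluation shows that $\partial^2 H_{a,b}(x)/\partial a^2 = O(|x|)$, $\partial^2 H_{a,b}(x)/\partial b^2 = O(|x|(\log|x|)^2)$ and $\partial^2 H_{a,b}(x)/\partial a\,\partial b = O(|x|\log|x|)$ as $|x| \to \infty$. On the other hand, a straightforward calculation shows that

$$
\frac{\partial}{\partial a}\log H_{a,b}'(x) = \frac{H_{a,b}(x)}{1 + H_{a,b}(x)^2}\,\frac{\partial H_{a,b}(x)}{\partial a}, \qquad
\frac{\partial}{\partial b}\log H_{a,b}'(x) = \frac{1}{b} + \frac{H_{a,b}(x)}{1 + H_{a,b}(x)^2}\,\frac{\partial H_{a,b}(x)}{\partial b},
$$

which are bounded by functions of $\partial H_{a,b}(x)/\partial a$ and $\partial H_{a,b}(x)/\partial b$, respectively. A straightforward calculation shows that the second partial derivatives of $\log H_{a,b}'(x)$ are bounded by polynomial functions of the second partial derivatives of $H_{a,b}(x)$ and of $H_{a,b}(x)$ itself, thereby the assumption is satisfied.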
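The closed forms used in A4 can be checked numerically. The sketch below is an illustration only, assuming the DP transformation is $H_\lambda(x) = (x^\lambda - x^{-\lambda})/2\lambda$ and the SS transformation is $H_{a,b}(x) = \sinh(b\sinh^{-1}(x) - a)$, as the derivatives above imply. The function names and the parameter values ($\lambda = 0.5$, $a = 0.3$, $b = 1.2$) are hypothetical choices made for the example.

```python
import math


def dp(x, lam):
    """Dual power transformation (x^lam - x^-lam) / (2 lam), for x > 0, lam > 0."""
    return (x ** lam - x ** (-lam)) / (2.0 * lam)


def dp_inverse(t, lam):
    """Inverse of dp: the w > 0 solving (w^lam - w^-lam) / (2 lam) = t."""
    return (lam * t + math.sqrt(1.0 + (lam * t) ** 2)) ** (1.0 / lam)


def dp_dlam(x, lam):
    """Analytic partial derivative of dp with respect to lam."""
    log_x = math.log(x)
    return ((x ** lam + x ** (-lam)) * log_x / (2.0 * lam)
            - (x ** lam - x ** (-lam)) / (2.0 * lam ** 2))


def ss(x, a, b):
    """Sinh-arcsinh transformation sinh(b * asinh(x) - a)."""
    return math.sinh(b * math.asinh(x) - a)


def ss_inverse(t, a, b):
    """Inverse of ss in its first argument."""
    return math.sinh((math.asinh(t) + a) / b)


def ss_da(x, a, b):
    """Analytic partial derivative of ss with respect to a."""
    return -math.cosh(b * math.asinh(x) - a)


def ss_db(x, a, b):
    """Analytic partial derivative of ss with respect to b."""
    return math.cosh(b * math.asinh(x) - a) * math.asinh(x)


def central_diff(f, h=1e-6):
    """Central finite difference of f at 0, where f maps a perturbation to a value."""
    return (f(h) - f(-h)) / (2.0 * h)


lam, a, b = 0.5, 0.3, 1.2  # illustrative parameter values

# Round trips: applying H to H^{-1}(t) should return t.
dp_roundtrip_ok = all(
    math.isclose(dp(dp_inverse(t, lam), lam), t, rel_tol=1e-8, abs_tol=1e-10)
    for t in (-50.0, -1.0, 0.0, 2.0, 50.0)
)
ss_roundtrip_ok = all(
    math.isclose(ss(ss_inverse(t, a, b), a, b), t, rel_tol=1e-8, abs_tol=1e-10)
    for t in (-20.0, -1.0, 0.0, 3.0, 20.0)
)

# Analytic derivatives against finite differences.
dp_fd = central_diff(lambda h: dp(3.0, lam + h))
ss_fd_a = central_diff(lambda h: ss(2.0, a + h, b))
ss_fd_b = central_diff(lambda h: ss(2.0, a, b + h))

# Growth check: dH/dlam at H^{-1}(t), divided by t * log(t), stays bounded.
ratios = [dp_dlam(dp_inverse(t, lam), lam) / (t * math.log(t))
          for t in (1e2, 1e4, 1e8, 1e12)]

print(dp_roundtrip_ok, ss_roundtrip_ok)
print(dp_fd, dp_dlam(3.0, lam))
print(ratios)
```

For the SS case the composition has a closed form, $\partial H_{a,b}/\partial a$ evaluated at $H_{a,b}^{-1}(x)$ equals $-\sqrt{1 + x^2}$, which makes the $O(|x|)$ rate explicit; the finite-difference comparison is only a sanity check on the signs and not a substitute for the analytic argument.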

References

[1] Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83.
[2] Brent, R. (1973). Algorithms for Minimization without Derivatives. Englewood Cliffs, N.J.: Prentice-Hall.
[3] Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations (with discussion). Journal of the Royal Statistical Society: Series B, 26.
[4] Chambers, R., Chandra, H., Salvati, N. and Tzavidis, N. (2014). Outlier robust small area estimation. Journal of the Royal Statistical Society: Series B, 76.
[5] Chatterjee, S., Lahiri, P. and Li, H. (2008). Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models. The Annals of Statistics, 36.
[6] Fay, R. and Herriot, R. (1979). Estimators of income for small area places: an application of James-Stein procedures to census. Journal of the American Statistical Association, 74.
[7] Foster, J., Greer, J. and Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica, 52.
[8] Hall, P. and Maiti, T. (2006). On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B, 68.
[9] Jiang, J. (1996). REML estimation: asymptotic behavior and related topics. The Annals of Statistics, 24.
[10] Jiang, J. (2010). Large Sample Techniques for Statistics. Springer.
[11] Jiang, J. and Lahiri, P. (2006). Estimation of finite population domain means: a model-assisted empirical best prediction approach. Journal of the American Statistical Association, 101.
[12] Jones, M. C. and Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96.
[13] Lahiri, P. and Mukherjee, K. (2007). On the design consistency property of hierarchical Bayes estimators in finite population sampling. The Annals of Statistics, 35.
[14] Li, Y. and Lahiri, P. (2007). Robust model-based and model-assisted predictors of the finite population total. Journal of the American Statistical Association, 102.
[15] Lohr, S. L. and Rao, J. N. K. (2009). Jackknife estimation of mean squared error of small area predictors in nonlinear mixed models. Biometrika, 96.
[16] Molina, I. and Marhuenda, Y. (2015). sae: an R package for small area estimation. The R Journal, 7.
[17] Molina, I. and Rao, J. N. K. (2010). Small area estimation of poverty indicators. Canadian Journal of Statistics, 38.
[18] Molina, I., Nandram, B. and Rao, J. N. K. (2014). Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach. The Annals of Applied Statistics, 8.
[19] Nandram, B. (1999). An empirical Bayes prediction interval for the finite population mean of a small area. Statistica Sinica, 9.
[20] Opsomer, J. D., Claeskens, G., Ranalli, M. G., Kauermann, G. and Breidt, F. J. (2008). Non-parametric small area estimation using penalized spline regression. Journal of the Royal Statistical Society: Series B, 70.
[21] Rao, J.N.K. and Molina, I. (2015). Small Area Estimation, 2nd Edition. Wiley.
[22] Schmid, T., Tzavidis, N., Münnich, R. and Chambers, R. (2016). Outlier robust small-area estimation under spatial correlation. Scandinavian Journal of Statistics, 43.
[23] Sugasawa, S. and Kubokawa, T. (2017). Transforming response values in small area prediction. Computational Statistics & Data Analysis, in press.
[24] Yang, Z. L. (2006). A modified family of power transformations. Economics Letters, 92.
[25] Yoshimori, M. and Lahiri, P. (2014). A second-order efficient empirical Bayes confidence interval. The Annals of Statistics, 42.


arXiv:1705.04136v3 [stat.ME] 11 Jun 2018 Adaptively Transformed Mixed Model Prediction of General Finite Population Parameters SHONOSUKE SUGASAWA Center for Spatial Information Science, The University of Tokyo TATSUYA