Estimating function analysis for a class of Tweedie regression models


Author

Wagner Hugo Bonat
Departamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal do Paraná - UFPR, Curitiba, Paraná, Brasil, wbonat@ufpr.br
and
Department of Mathematics and Computer Science - IMADA, University of Southern Denmark - SDU, Odense, Denmark, wbonat@sdu.dk

Abstract

We propose a new way to make inference on Tweedie regression models based on the estimating function approach. We adopt the quasi-score function for the regression parameters and the Pearson estimating function for the dispersion parameters. We perform a simulation study to compare our approach with the maximum likelihood method. The results show that both methods perform similarly, but the estimating function approach is better at estimating small values of the power parameter. Some advantages of using estimating functions are: i) they avoid evaluating the density function; ii) they allow us to estimate negative values of the power parameter and values between 0 and 1; iii) they give a robust specification based only on second-moment assumptions. We provide an R implementation of our new approach.

Keywords

Tweedie regression, power variance function, estimating function, maximum likelihood.

Supplement materials

http://

1 Introduction

Statistical modeling is one of the most significant fields of applied statistics, with applications in many areas of scientific study, such as sociology, economics, agronomy, medicine and others. There is an infinity of different statistical models, but the class of Generalized Linear Models (GLM) (Nelder and Wedderburn, 1972) has been the most widely used in the last three decades. The success of this approach is due to its ability to deal with different types of response variables, such as binary, count and continuous, inside a general framework with a powerful scheme of inference based on the likelihood paradigm.

Some of the most important particular cases of the GLM class are: the linear regression model based on the Gaussian distribution for real response variables, Gamma and inverse Gaussian regression models for positive real response variables, logistic regression based on the Binomial distribution for binary data and Poisson regression for count data. All these models are linked because they belong to the class of exponential dispersion models (Jørgensen, 1997), and they share an amazing characteristic: they are described by their first two moments (mean and variance). Furthermore, the variance function describes the relationship between the mean and the variance of the response variable.

Let $Y$ denote the response variable and assume that the probability function or probability density function of $Y$ belongs to the class of exponential dispersion models. Assume also that $E(Y) = \mu$ and $Var(Y) = \phi V(\mu) = \phi \mu^p$. Then $Y \sim Tw_p(\mu, \phi)$, where $Tw_p(\mu, \phi)$ denotes a Tweedie (Tweedie, 1984; Jørgensen, 1997) random variable with mean $\mu$ and variance $\phi \mu^p$, and $\phi > 0$ and $p \in (-\infty, 0] \cup [1, \infty)$ are parameters that describe the variance structure of $Y$. The Tweedie distribution has many interesting theoretical properties; for a detailed description, see Jørgensen (1997).

For practical situations in statistical modeling the Tweedie distribution is interesting because it delivers many important particular cases, and the parameter $p$ identifies these cases. For example, for $p = 0$ we have the Gaussian distribution, for $p = 1$ and $\phi = 1$ we have the Poisson distribution, and $p = 2$ and $p = 3$ correspond to the Gamma and inverse Gaussian distributions. Another important case is $1 < p < 2$, which corresponds to the compound Poisson distribution. Its particular cases alone make the Tweedie distribution important for statistical modeling, but there is in fact an infinity of models, since the parameter $p$ may be estimated from a data set, making this simple relationship between mean and variance a rich class of statistical models.

For practical applications the estimation of the parameters that describe the variance structure ($\phi$ and $p$) is important and deserves the same attention devoted to the regression parameters. The orthodox approach is based on the likelihood paradigm, which is an efficient estimation method. A particularity of the Tweedie distribution is that, outside the special cases, its probability density function cannot be written in closed form, and a numerical method is required to evaluate it. Dunn and Smyth (2001) proposed some methods to evaluate the density function of the Tweedie distribution, but these methods are computationally demanding and show different levels of accuracy in different regions of the parameter space. This makes likelihood-based inference difficult and sometimes slow.

The main objective of this paper is to propose a new way to estimate the parameters ($\phi$ and $p$) based on Pearson estimating functions (Jørgensen and Knudsen, 2004). This method is computationally very fast, because it employs merely the first two moments (mean and variance) and in this way avoids evaluating the probability density function. Furthermore, we present an efficient and stable algorithm to obtain the point estimates. The inference is based on asymptotic results, and we show expressions for the sensitivity and the Godambe information matrix. The variance of the Pearson estimating function is approximated based on empirical third and fourth moments. We run a simulation study to show the properties of our approach and compare it with the maximum likelihood estimator in finite samples.

In the next section we give some background on the Tweedie distribution. In Section 3 we present the Tweedie regression models and two approaches to make inference on the model parameters: maximum likelihood and estimating functions. Section 4 shows the main results of our simulation study and Section 5 reports some final remarks.

2 Background

The Tweedie distribution belongs to the class of exponential dispersion models (EDM) (Jørgensen, 1997). Thus, for a random variable $Y$ which follows an EDM, the density function can be written as

$$ p_Y(y; \mu, \phi) = a(y, \phi) \exp\{(y\theta - \kappa(\theta))/\phi\}, \tag{1} $$

where $\mu = E(Y) = \kappa'(\theta)$ is the mean, $\phi > 0$ is the dispersion parameter, $\theta$ is the canonical parameter, and $\kappa(\theta)$ is the cumulant function. The function $a(y, \phi)$ cannot be written in closed form apart from the particular cases cited. The variance is given by $Var(Y) = \phi V(\mu)$, where $V(\mu) = \kappa''(\theta)$ is called the variance function. Tweedie densities are characterized by power variance functions of the form $V(\mu) = \mu^p$, where $p \in (-\infty, 0] \cup [1, \infty)$ is the index determining the distribution. Although Tweedie densities are not known in closed form, their cumulant generating function (cgf) is simple. The cgf is given by

$$ K(t) = \{\kappa(\theta + t\phi) - \kappa(\theta)\}/\phi, $$

where $\kappa(\theta)$ is the cumulant function, and

$$ \theta = \begin{cases} \dfrac{\mu^{1-p}}{1-p} & p \neq 1 \\ \log \mu & p = 1 \end{cases} \qquad \kappa(\theta) = \begin{cases} \dfrac{\mu^{2-p}}{2-p} & p \neq 2 \\ \log \mu & p = 2. \end{cases} $$

The remaining factor in the density, $a(y, \phi)$, needs to be evaluated numerically. Jørgensen (1997) presents two series expressions for evaluating the density: one for $1 < p < 2$ and one for $p > 2$. In the first case it can be shown that

$$ P(Y = 0) = \exp\left\{ -\frac{\mu^{2-p}}{\phi(2-p)} \right\} $$

and, for $y > 0$, that

$$ a(y, \phi) = \frac{1}{y} W(y, \phi, p) $$

with $W(y, \phi, p) = \sum_{j=1}^{\infty} W_j$ and

$$ W_j = \frac{y^{-j\alpha} (p-1)^{\alpha j}}{\phi^{j(1-\alpha)} (2-p)^j \, j! \, \Gamma(-j\alpha)}, $$

where $\alpha = (2-p)/(1-p)$.
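To make the series for $1 < p < 2$ concrete, the following is a minimal sketch in Python (chosen only for illustration; it is not the `dtweedie.series` function of the R package tweedie). The function names and the fixed number of terms `n_terms` are assumptions of this sketch; the terms $W_j$ are summed on the log scale to avoid overflow.

```python
import math


def tweedie_prob_zero(mu, phi, p):
    """P(Y = 0) for 1 < p < 2."""
    return math.exp(-mu ** (2.0 - p) / (phi * (2.0 - p)))


def log_series_w(y, phi, p, n_terms=200):
    """log W(y, phi, p) = log(sum_j W_j), accumulated on the log scale.

    n_terms is a hypothetical fixed truncation point, not an adaptive rule.
    """
    alpha = (2.0 - p) / (1.0 - p)  # negative when 1 < p < 2
    log_terms = []
    for j in range(1, n_terms + 1):
        log_terms.append(
            -j * alpha * math.log(y)
            + alpha * j * math.log(p - 1.0)
            - j * (1.0 - alpha) * math.log(phi)
            - j * math.log(2.0 - p)
            - math.lgamma(j + 1.0)      # log j!
            - math.lgamma(-j * alpha)   # log Gamma(-j alpha)
        )
    biggest = max(log_terms)
    return biggest + math.log(sum(math.exp(t - biggest) for t in log_terms))


def tweedie_density(y, mu, phi, p, n_terms=200):
    """Density at y > 0 for 1 < p < 2: a(y, phi) * exp{(y theta - kappa) / phi}."""
    theta = mu ** (1.0 - p) / (1.0 - p)
    kappa = mu ** (2.0 - p) / (2.0 - p)
    log_a = log_series_w(y, phi, p, n_terms) - math.log(y)
    return math.exp(log_a + (y * theta - kappa) / phi)
```

A simple sanity check is that the continuous part of the density and the point mass at zero together integrate to one, and that the mean is recovered as $\mu$.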

A similar series expansion exists for $p > 2$ and is given by

$$ a(y, \phi) = \frac{1}{\pi y} V(y, \phi, p) $$

with $V = \sum_{k=1}^{\infty} V_k$ and

$$ V_k = \frac{\Gamma(1 + \alpha k) \, \phi^{k(\alpha - 1)} (p-1)^{\alpha k}}{\Gamma(1 + k) (p-2)^k \, y^{\alpha k}} (-1)^k \sin(-k\pi\alpha). $$

Dunn and Smyth (2001) present a detailed study of these series and an algorithm to evaluate the Tweedie density function based on these series expansions. The algorithm is implemented in the package tweedie (Dunn, 2013) for the statistical software R (R Core Team, 2014) through the function dtweedie.series. Dunn and Smyth (2005) and Dunn and Smyth (2008) studied two more methods to evaluate the density function of the Tweedie distributions, one based on inverting the cumulant generating function by Fourier inversion and the other on the saddlepoint approximation; for more details see Dunn (2013). In this paper we use only the series approach described above.

3 Tweedie regression models

The Tweedie regression models were presented by Jørgensen and Paes De Souza (1994), Dunn and Smyth (2005), Hasan and Dunn (2011) and others. Consider independent responses $Y_1, Y_2, \ldots, Y_n$ such that $Y_i \sim Tw_p(\mu_i, \phi)$, where the mean $\mu_i$ is linked to the linear predictor through a known link function $g$,

$$ g(\mu_i) = x_i^T \beta, $$

where $x_i$ is a vector of covariates and $\beta$ is a vector of unknown regression parameters. Let $q$ be the dimension of $\beta$. Equivalently, we can define the model in matrix notation. Let $Y$ be the vector of response variables; then the Tweedie regression model can be defined by

$$ Y \sim Tw_p(\mu, \phi I), \tag{2} $$

where $I$ is an $n \times n$ identity matrix. In this case it is easy to see that $E(Y) = \mu = g^{-1}(X\beta)$ and $Var(Y) = C = \mathrm{diag}(\phi \mu^p)$. In this paper we take the link function $g$ to be the logarithm. Note that the model is equivalently defined by its joint distribution in (2) or by its first two moments (mean and variance). Denote the vector of parameters by $\theta = (\beta, \lambda = (\phi, p))$.
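The two moments that define the model can be computed directly. The sketch below, in Python and with a hypothetical function name, evaluates $\mu = \exp(X\beta)$ under the log link and the diagonal of $C = \mathrm{diag}(\phi\mu^p)$; the design matrix and parameter values in the usage lines are illustrative only.

```python
import math


def tweedie_mean_variance(X, beta, phi, p):
    """Return (mu, diag(C)) with mu = exp(X beta) and C = diag(phi * mu^p)."""
    mu = [math.exp(sum(x_j * b_j for x_j, b_j in zip(row, beta))) for row in X]
    variance = [phi * m ** p for m in mu]
    return mu, variance


# Illustrative design: intercept plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
mu, variance = tweedie_mean_variance(X, beta=[0.5, 1.0], phi=1.0, p=1.0)
# With p = 1 and phi = 1 the variance equals the mean (the Poisson case).
```

Setting `p=0` gives a constant variance `phi` (the Gaussian case), and `p=2` gives a variance proportional to the squared mean (the Gamma case).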
The parameter vector can be divided into two sets: the first contains the regression parameters and the second the parameters that describe the variance structure. In this paper we are interested in making inference about the second set. The orthodox method is based on the likelihood function and is described in the next section.

3.1 Maximum likelihood

Let $p_Y(y; \beta, \lambda)$ denote the Tweedie probability or probability density function as given in equation (1) and evaluated as described in Section 2. Then the log-likelihood function for a sample of size $n$ is given by

$$ l(\beta, \lambda) = \sum_{i=1}^{n} \log p_Y(y_i; \beta, \lambda). \tag{3} $$

Maximizing equation (3) with respect to $\beta$ and $\lambda$ gives the maximum likelihood estimators, denoted by $\hat{\beta}_M$ and $\hat{\lambda}_M$. The maximization can be done in different ways: Dunn and Smyth (2005) proposed a method based on the BFGS algorithm, and Jørgensen and Paes De Souza (1994) proposed a different scheme based on the profile likelihood. In this paper we propose to use the Nelder-Mead method (Nelder and Mead, 1965) as implemented in the function optim of the R statistical software (R Core Team, 2014). In our simulation studies the Nelder-Mead method showed stable and efficient results.

To make inference about $\hat{\theta} = (\hat{\beta}, \hat{\lambda})^T$ we use the well-known asymptotic distribution of the maximum likelihood estimator,

$$ \hat{\theta} \sim N(\theta, I_o(\hat{\theta})^{-1}), $$

where $I_o(\theta)$ denotes the observed information of $\theta$. Note that in the Tweedie regression models we cannot compute the Fisher information, because the second derivatives of the log-likelihood function are not available in closed form. We therefore use the observed information matrix, computed numerically by the Richardson method (Soetaert and Herman, 2009) at the point $\hat{\theta}$. In short, our algorithm obtains the maximum likelihood estimates using the Nelder-Mead algorithm and computes the asymptotic variance of $\hat{\theta}$ as the inverse of the negative Hessian matrix, computed numerically by the Richardson method. Note that this approach is computationally expensive, because the probability or probability density function of the Tweedie distribution must be evaluated many times during the maximization. In the next section we present a new way to make inference about $\beta$ and $\lambda$ based on estimating functions.

3.2 Estimating functions

In this section we describe the estimating function approach to estimate $\theta = (\beta, \lambda)$. We adopt the quasi-score function for the regression parameters and the Pearson estimating function for the dispersion parameters. Jørgensen and Knudsen (2004) describe the estimating function approach, as well as its properties.

The quasi-score function is defined by

$$ \psi_\beta(\beta, \lambda) = D^T C^{-1} (Y - \mu), \tag{4} $$

where $D^T = \nabla_\beta \mu$. The $q \times q$ matrix

$$ S_\beta = E(\nabla_\beta \psi_\beta) = -D^T C^{-1} D \tag{5} $$

is called the sensitivity matrix of $\psi_\beta$ and the $q \times q$ matrix

$$ V_\beta = Var(\psi_\beta) = D^T C^{-1} D \tag{6} $$

is called the variability matrix of $\psi_\beta$. In a similar way the Pearson estimating function is defined by

$$ \psi_{\lambda_i}(\beta, \lambda) = r^T W_{\lambda_i} r - \mathrm{tr}(W_{\lambda_i} C), \tag{7} $$

where $W_{\lambda_i} = C^{-1} \frac{\partial C}{\partial \lambda_i} C^{-1}$ and $r = (Y - \mu)$. Since $E(r^T W_{\lambda_i} r) = \mathrm{tr}(W_{\lambda_i} C)$, the function $\psi_{\lambda_i}$ has expectation zero. Note that all we need to evaluate these equations are the derivatives of $C$ with respect to $\lambda_1 = \phi$ and $\lambda_2 = p$. It is easy to show that

$$ \frac{\partial C}{\partial \phi} = \mathrm{diag}(\mu^p) \quad \text{and} \quad \frac{\partial C}{\partial p} = \mathrm{diag}(\phi \log(\mu) \mu^p). \tag{8} $$

The entries $(i, j)$ of the $2 \times 2$ sensitivity matrix of $\psi_\lambda$ are given by

$$ S_{\lambda_{ij}} = E\left( \frac{\partial}{\partial \lambda_i} \psi_{\lambda_j} \right) = -\mathrm{tr}\left( C^{-1} \frac{\partial C}{\partial \lambda_i} C^{-1} \frac{\partial C}{\partial \lambda_j} \right). \tag{9} $$

Using results on the characteristic function of linear and quadratic forms of non-normal variables (Knight, 1985), we can show that the entries of the variability matrix of $\psi_\lambda$ are given by

$$ V_{\lambda_{ij}} = Cov(\psi_{\lambda_i}; \psi_{\lambda_j}) = 2\,\mathrm{tr}(W_{\lambda_i} C W_{\lambda_j} C) + \sum_{l=1}^{n} k_l^{(4)} (W_{\lambda_i})_{ll} (W_{\lambda_j})_{ll}, \tag{10} $$

where $k^{(4)}$ denotes the fourth cumulant of $Y$. To take into account the covariance between the vectors $\beta$ and $\lambda$, we need to compute the cross sensitivity and variability matrices. The entries of the cross sensitivity matrix between $\beta$ and $\lambda$ are given by

$$ S_{\beta_i \lambda_j} = E\left( \frac{\partial}{\partial \lambda_j} \psi_{\beta_i} \right) = 0. \tag{11} $$

In a similar way the entries of the cross sensitivity matrix between $\lambda$ and $\beta$ are given by

$$ S_{\lambda_i \beta_j} = E\left( \frac{\partial}{\partial \beta_j} \psi_{\lambda_i} \right) = -\mathrm{tr}\left( C^{-1} \frac{\partial C}{\partial \lambda_i} C^{-1} \frac{\partial C}{\partial \beta_j} \right). \tag{12} $$

Finally, we can show that the entries of the cross variability matrix between $\beta$ and $\lambda$ are given by

$$ V_{\lambda_i \beta_j} = E\left( \sum_{l=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} A^{(j)} W_{\lambda_i}^{(ij)} r_l r_j r_k \right), \tag{13} $$

where $A = D^T C^{-1}$ and $A^{(j)}$ denotes the $j$th column of $A$. In a similar way $W_{\lambda_i}^{(ij)}$ denotes the entry $(i, j)$ of the matrix $W_{\lambda_i}$. Furthermore, the joint sensitivity matrix of $\psi_\beta$ and $\psi_\lambda$ is given by

$$ S_\theta = \begin{pmatrix} S_\beta & S_{\beta\lambda} \\ S_{\lambda\beta} & S_\lambda \end{pmatrix}, \tag{14} $$

whose entries are defined in equations (5), (9), (11) and (12). Likewise, the joint variability matrix of $\psi_\beta$ and $\psi_\lambda$ is given by

$$ V_\theta = \begin{pmatrix} V_\beta & V_{\beta\lambda} \\ V_{\lambda\beta} & V_\lambda \end{pmatrix}, \tag{15} $$

whose entries are defined in equations (6), (10) and (13). Denote by $\hat{\theta}_e = (\hat{\beta}_e, \hat{\lambda}_e)$ the estimate of $\theta$; then the asymptotic distribution of $\hat{\theta}_e$ is

$$ \hat{\theta}_e \sim N(\theta, J_\theta^{-1}), \tag{16} $$

where $J_\theta^{-1}$ is the inverse of the Godambe information matrix,

$$ J_\theta^{-1} = S_\theta^{-1} V_\theta S_\theta^{-T}, \tag{17} $$

where $S_\theta^{-T} = (S_\theta^{-1})^T$.
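Because $C$ is diagonal, equations (4), (5), (7) and (9) reduce to simple sums over observations. The following Python sketch (not the author's R function; all names are hypothetical) evaluates them for a log link with an intercept and one covariate, and performs one alternating update of the kind used by the chaser algorithm of Jørgensen and Knudsen (2004): first $\beta$ is moved by $-S_\beta^{-1}\psi_\beta$, then $\lambda = (\phi, p)$ by $-S_\lambda^{-1}\psi_\lambda$ at the new $\beta$. The restriction to $q = 2$ is an assumption made only to keep the linear algebra explicit.

```python
import math


def solve2(a, b):
    """Solve the 2 x 2 linear system a z = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def quasi_score(x, y, beta, phi, p):
    """psi_beta = D^T C^{-1} (y - mu) and S_beta = -D^T C^{-1} D (log link)."""
    psi = [0.0, 0.0]
    sens = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        mu = math.exp(beta[0] + beta[1] * xi)
        d = [mu, mu * xi]        # derivatives of mu with respect to beta
        c = phi * mu ** p        # diagonal entry of C
        for a in range(2):
            psi[a] += d[a] * (yi - mu) / c
            for b in range(2):
                sens[a][b] -= d[a] * d[b] / c
    return psi, sens


def pearson(x, y, beta, phi, p):
    """psi_lambda and S_lambda for lambda = (phi, p) with diagonal C."""
    psi = [0.0, 0.0]
    sens = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        mu = math.exp(beta[0] + beta[1] * xi)
        c = phi * mu ** p
        g = [1.0 / phi, math.log(mu)]   # C^{-1} dC/dphi and C^{-1} dC/dp
        r2 = (yi - mu) ** 2
        for a in range(2):
            # r^T W r - tr(W C), with W = C^{-1} (dC/dlambda) C^{-1}
            psi[a] += g[a] * (r2 / c - 1.0)
            for b in range(2):
                sens[a][b] -= g[a] * g[b]
    return psi, sens


def chaser_step(x, y, beta, phi, p):
    """One alternating update: beta first, then (phi, p) at the new beta."""
    psi_b, s_b = quasi_score(x, y, beta, phi, p)
    step_b = solve2(s_b, psi_b)
    beta = [beta[0] - step_b[0], beta[1] - step_b[1]]
    psi_l, s_l = pearson(x, y, beta, phi, p)
    step_l = solve2(s_l, psi_l)
    return beta, phi - step_l[0], p - step_l[1]
```

In practice `chaser_step` would be repeated until the parameters stop changing; this sketch has no step-size control or convergence check, and it does not guard against $\phi$ becoming non-positive.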

Jørgensen and Knudsen (2004) proposed the chaser algorithm to solve the system of equations $\psi_\beta = 0$ and $\psi_\lambda = 0$:

$$ \beta^{(i+1)} = \beta^{(i)} - S_\beta^{-1} \psi_\beta(\beta^{(i)}, \lambda^{(i)}) $$
$$ \lambda^{(i+1)} = \lambda^{(i)} - S_\lambda^{-1} \psi_\lambda(\beta^{(i+1)}, \lambda^{(i)}) $$

The chaser algorithm uses the insensitivity property, which allows us to use two separate equations to update $\beta$ and $\lambda$; for details see Jørgensen and Knudsen (2004). The described procedure was implemented in R, and a generic function called glm.tw() is made available on the supplement material web page.

To compute the variance of the dispersion parameters we need information about the third central moment and the fourth cumulant. In the case of the Tweedie regression models we can compute these quantities based on the equations presented in Section 2. An alternative approach is to compute the empirical versions, which avoids assuming a multivariate Tweedie distribution for the data. The empirical fourth cumulant may be computed from the data by the following equation:

$$ \hat{k}_l^{(4)} = (y_l - \hat{y}_l)^4 - 3(\hat{\phi} \hat{\mu}_l^{\hat{p}})^2. $$

The empirical third central moment may be computed based on equation (13), ignoring the expectation. The main drawback of using empirical cumulants instead of the theoretical cumulants is that the variance tends to be overestimated, so the confidence intervals based on this approach tend to be a little wider than they should be.

4 Simulation study

4.1 Design of the study

We carried out a simulation study to evaluate the properties of the estimator based on the estimating function approach and to compare them with those of the maximum likelihood estimator in finite samples. Our focus here is on the parameters that describe the variance structure of Tweedie regression models ($\phi$, $p$). We use five different sample sizes ($n$ = 50, 100, 250, 500 and 1000), and compare two measures of estimator quality (bias and coverage rate). In this manner, we have one quality measure based on point estimates and another based on confidence intervals.
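The two quality measures can be computed from the replicated estimates as in the Python sketch below. The function names are hypothetical, the bias uses the sign convention $b = \theta - \hat{\theta}$ adopted in Section 4.2, and the Wald-type interval with a 95% nominal level is an assumption of this sketch, since the nominal level is not stated in the text.

```python
from statistics import NormalDist, mean


def bias(true_value, estimates):
    """Monte Carlo bias with the sign convention b = theta - theta_hat."""
    return true_value - mean(estimates)


def coverage_rate(true_value, estimates, std_errors, level=0.95):
    """Share of Wald intervals estimate +/- z * se that contain the true value.

    level=0.95 is an assumed nominal level, not one taken from the paper.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return hits / len(estimates)
```

Each function takes the true parameter value used to generate the data and one estimate (and standard error) per simulated data set.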
To decide which values of $\phi$ and $p$ to consider in the simulation study, we first plot the likelihood contours for $\phi = 0.5$ and $p = 1.1$ (near the Poisson distribution), $p = 2$ (the Gamma distribution) and $p = 3$ (the inverse Gaussian distribution). Figure 1 shows these plots. They show that the likelihood contours are close to quadratic for $p = 2$ and $p = 3$, indicating that for these values of $p$ the asymptotic distribution is well behaved and near the multivariate Gaussian distribution. However, for $p = 1.1$ the likelihood is non-quadratic, which shows that small values of $p$ are a more challenging setup for inference. Thus, we choose the values $p$ = 1.1, 1.3, 1.5, 1.7 and 1.9 for the simulation study. The parameter $\phi$ measures the variability, so bigger values of $\phi$ are a more challenging setup for inference. We choose the values $\phi$ = 0.5, 1, 1.5, 2 and 2.5. Combining five sample sizes, five values of $\phi$ and five values of $p$ we have 125 different scenarios for our simulation study. All simulations were done using the R software and the package tweedie (Dunn, 2013).
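The design is a full factorial grid, which can be enumerated as in the following Python sketch (the simulations themselves were run in R; the variable and function names here are hypothetical). The helper `covariate` builds an equally spaced sequence from 0 to 2 whose length depends on the sample size, matching the covariate described in Section 4.2.

```python
from itertools import product

SAMPLE_SIZES = [50, 100, 250, 500, 1000]
PHI_VALUES = [0.5, 1.0, 1.5, 2.0, 2.5]
P_VALUES = [1.1, 1.3, 1.5, 1.7, 1.9]

# One dictionary per scenario: 5 x 5 x 5 = 125 combinations.
scenarios = [
    {"n": n, "phi": phi, "p": p}
    for n, phi, p in product(SAMPLE_SIZES, PHI_VALUES, P_VALUES)
]


def covariate(n):
    """Equally spaced covariate values from 0 to 2 with n points."""
    return [2.0 * i / (n - 1) for i in range(n)]
```

Each scenario would then be replicated, fitted by both methods, and summarized by bias and coverage rate.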

Figure 1: Likelihood contours for different values of $p$ and sample sizes. (Panels for $n$ = 50, 100, 250, 500 and 1000 at each value of $p$.)

4.2 Results

We perform simulations to compare the performance of estimators based on Pearson estimating functions with that of maximum likelihood estimators. We used two measures of estimator quality: the bias $b = (\theta - \hat{\theta})$ and the coverage rate. Our simulations consist of 1000 realizations from the Tweedie regression model (Section 3). We used a regression structure with $\beta_0 = 0.5$ and $\beta_1 = 1$; our model has one covariate, generated as a sequence from 0 to 2 with length depending on the sample size. We used five sample sizes, $n$ = 50, 100, 250, 500 and 1000, and different combinations of the parameters $\phi$ and $p$; see Section 4.1. We chose to present the results graphically. Figure 2 presents the expected bias of $\hat{\phi}$ for different sample sizes, values of $\phi$ and $p$, and estimation methods: PEF (Pearson Estimating Function) and MLE (Maximum Likelihood Estimator).

Figure 2: Expected bias of $\hat{\phi}$ for different methods, sample sizes and parameter combinations. (Panels: PEF and MLE for each value of $\phi$; curves for $p$ = 1.1, 1.3, 1.5, 1.7 and 1.9; horizontal axis: sample size.)

Figure 2 shows that in general the PEF estimator overestimates $\phi$ and the MLE underestimates $\phi$ for small sample sizes, but the bias decreases as the sample size increases, as required. The bias of the PEF estimator increases when the value of $\phi$ increases. The bias of the MLE is similar for all values of $\phi$. In general the values of $p$ do not affect the bias of $\hat{\phi}$. The bias of the MLE is smaller than that of the PEF estimator, but for sample sizes around 100 the bias of the PEF estimator is small enough to be useful in practical situations.

In a similar way, Figure 3 presents the expected bias of $\hat{p}$ for different sample sizes, values of $\phi$ and $p$, and estimation methods.

Figure 3: Expected bias of $\hat{p}$ for different methods, sample sizes and parameter combinations. (Panels: PEF and MLE for $p$ = 1.1, 1.3, 1.5, 1.7 and 1.9; curves for $\phi$ = 0.5, 1, 1.5, 2 and 2.5; horizontal axis: sample size.)

The results presented in Figure 3 show that the bias of the PEF estimator is small for all sample sizes and parameter combinations. In fact, the parameter $p$ is well estimated by the PEF estimator in every configuration. The MLE is less accurate when estimating small values of $p$ from a small sample; in this case the MLE overestimates $p$, but again the bias decreases quickly as the sample size increases, and for sample sizes around 100 the bias is small enough for practical applications. In general the results indicate that both methods give good point estimates of the parameters $\phi$ and $p$.

Having evaluated the quality of the point estimates, we now evaluate the quality of the confidence intervals. For this task, we computed the coverage rate of the confidence intervals of $\hat{\phi}$ and $\hat{p}$ for different sample sizes, combinations of $\phi$ and $p$, and estimation methods. The coverage rate for the confidence interval of $\hat{\phi}$ is shown in Figure 4.

Figure 4: Coverage rate of $\hat{\phi}$ for different methods, sample sizes and parameter combinations. (Panels: PEF and MLE for each value of $\phi$; curves for $p$ = 1.1, 1.3, 1.5, 1.7 and 1.9; horizontal axis: sample size.)

The results presented in Figure 4 show that for both methods the coverage rate is lower than the nominal level for small sample sizes. The confidence intervals based on the PEF approach achieve the nominal level for sample sizes around 250 in all configurations, although for large values of $\phi$ the coverage rate is slightly higher than the nominal level. The confidence interval based on the MLE approach is not realistic for small sample sizes and small values of $\phi$ and $p$. For example, for $\phi = 0.5$ and $p = 1.1$ the confidence interval based on the MLE approach does not achieve the nominal level even with a sample size equal to [...]. For bigger values of $p$ the situation is better, and for sample sizes around 250 the confidence intervals show coverage rates near the nominal level. In general the results demonstrate that the confidence intervals based on the PEF approach do not depend on the combination of $\phi$ and $p$ values: in all configurations the confidence intervals are well estimated for sample sizes around 250. On the other hand, the MLE approach has difficulty estimating confidence intervals for small values of $\phi$ and $p$. A similar analysis is presented in Figure 5 for the parameter $p$.

Figure 5 shows that the coverage rates of confidence intervals based on the PEF approach are near the nominal level for all configurations considered, but again for large values of $\phi$ the coverage rate is slightly higher than the nominal level. These results indicate that in general this approach yields confidence intervals wider than they should be. This was expected, because we are using empirical third and fourth moments to compute the variance of $\hat{\phi}$ and $\hat{p}$. We argue that for practical data analysis these confidence intervals are accurate enough. The confidence intervals based on the MLE approach are not realistic for small values of $\phi$ and $p$; for example, for $p = 1.1$ the confidence interval based on the MLE approach does not achieve the nominal level even with a sample size equal to [...]. When the values of $\phi$ and $p$ increase, the results improve and are near the nominal level for all sample sizes.

Figure 5: Coverage rate of $\hat{p}$ for different methods, sample sizes and parameter combinations. (Panels: PEF and MLE for $p$ = 1.1, 1.3, 1.5, 1.7 and 1.9; curves for $\phi$ = 0.5, 1, 1.5, 2 and 2.5; horizontal axis: sample size.)

In general, both methods are able to compute confidence intervals with a coverage rate near the nominal level. Of course, the results improve when the sample size increases, which is expected because our inferential methods are based on asymptotic results.

5 Conclusion

In this paper, we presented a new approach to make inference on the parameters of Tweedie regression models. Our approach is based on the quasi-score function for the regression parameters and the Pearson estimating function for the dispersion parameters. It is a well-known result that the quasi-score function yields the same estimator as the maximum likelihood approach for the regression parameters. Thus, we focused on the dispersion parameters, that is, the parameters that describe the variance structure of the Tweedie regression models.

We performed a simulation study to evaluate the quality of our estimator and to compare it with the maximum likelihood approach. The results show that both methods are similar, but the results based on the Pearson estimating function are robust in the sense that the PEF approach shows good results for all parameter combinations considered in the simulation study. On the other hand, the MLE approach had difficulty estimating small values of $\phi$ and $p$.

Furthermore, the estimating function approach has several advantages. First, we do not need to evaluate the density function, which is a hard computational task. Second, we do not need to worry about negative values of $p$ or values near 1, since our approach deals with these situations naturally. Moreover, we can estimate values of $p$ between 0 and 1, because our approach is based on second-moment assumptions only; we therefore do not need to suppose that the response variable follows the Tweedie distribution, which makes our approach robust to misspecification.

A suggestion for future work with the estimating function approach and Tweedie regression models is to extend the Tweedie models to non-independent data, for example in longitudinal data analysis or repeated measures experiments. Tweedie models may be good models for rainfall data; in this case it is important to be able to analyze data with spatial and space-time structures, so extending Tweedie models to deal with dependent data is a promising direction, and the use of estimating functions makes it possible to do so in an elegant way.

References

Dunn, P. K. (2013). tweedie: Tweedie exponential family models. R package.

Dunn, P. K. and Smyth, G. K. (2001). Tweedie family densities: methods of evaluation, Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark.

Dunn, P. and Smyth, G. (2005). Series evaluation of Tweedie exponential dispersion model densities, Statistics and Computing 15(4).

Dunn, P. and Smyth, G. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion, Statistics and Computing 18(1).

Hasan, M. M. and Dunn, P. K. (2011). Two Tweedie distributions that are near-optimal for modelling monthly rainfall in Australia, International Journal of Climatology 31(9).

Jørgensen, B. (1997). The Theory of Dispersion Models, Chapman & Hall.

Jørgensen, B. and Knudsen, S. J. (2004). Parameter orthogonality and bias adjustment for estimating functions, Scandinavian Journal of Statistics 31(1).

Jørgensen, B. and Paes De Souza, M. C. (1994). Fitting Tweedie's compound Poisson model to insurance claims data, Scandinavian Actuarial Journal 1994(1).

Knight, J. L. (1985). The joint characteristic function of linear and quadratic forms of non-normal variables, Sankhyā: The Indian Journal of Statistics, Series A 47(2).

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization, The Computer Journal 7(4).

Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models, Journal of the Royal Statistical Society. Series A 135(3).

R Core Team (2014). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.

Soetaert, K. and Herman, P. M. (2009). A Practical Guide to Ecological Modelling. Using R as a Simulation Platform, Springer.

Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families, in J. K. Ghosh and J. Roy (eds), Statistics: Applications and New Directions, Proceedings of the Indian Statistical Institute Golden Jubilee International Conference, Calcutta: Indian Statistical Institute.
