
On diagnostics in double generalized linear models

Gilberto A. Paula

Instituto de Matemática e Estatística - USP, Brazil. Correspondence to: Departamento de Estatística, IME-USP, Rua do Matão 1010, Cidade Universitária, São Paulo-SP, Brazil. E-mail addresses: giapaula@ime.usp.br, gilbertop056@gmail.com

Computational Statistics and Data Analysis 68 (2013)

Article history: Received 19 December 2012; received in revised form 6 June 2013; accepted 6 June 2013; available online 14 June 2013.

Keywords: Deviance component residual; Double gamma model; Leverage measure; Local influence; Pearson residual; Residual analysis

Abstract. The aim of this paper is to propose some diagnostic methods in double generalized linear models (DGLMs) for large samples. A review of DGLMs is given, including the iterative process for the estimation of the mean and precision coefficients as well as some asymptotic results. Then, a variety of diagnostic tools, such as leverage measures and curvatures of local influence under some usual perturbation schemes, the standardized deviance component, and Pearson residuals, are proposed. The diagnostic plots are constructed for the mean and precision models, and an illustrative example, in which the texture of four different forms of light snacks is compared across time with the texture of a traditional one, is analyzed under appropriate double gamma models. Some of the diagnostic procedures proposed in the paper are applied to analyze the fitted selected model. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

The class of double generalized linear models (DGLMs) was proposed by Smyth (1989), and Verbyla (1993) derived some case deletion diagnostics for linear heteroscedastic models under maximum likelihood (ML) and restricted maximum likelihood (REML) estimation. The REML method has been considered more reliable than ML for small samples (Smyth and Verbyla, 1999), and various papers have been published under this methodology. For example, Smyth and Verbyla (1999) investigated the sensitivity of the restricted maximum likelihood estimates (REMLEs) for some DGLMs, whereas Smyth and Jørgensen (2002) applied the framework of DGLMs to insurance claims. However, under the ML approach, little has been done on diagnostic methods. In this paper, some usual diagnostic quantities, such as leverage measures, local influence curvatures, and Pearson and deviance component residuals, are derived for DGLMs under ML. A large sample data set, in which the texture of five snack types is compared across time, is fitted under appropriate double gamma models, and a diagnostic analysis is performed with the quantities proposed in the paper to analyze the selected fitted model.

The paper is organized as follows. In Section 2, a review of DGLMs is presented, whereas in Section 3 we derive some useful diagnostic quantities, such as generalized leverages, curvatures of local influence under some usual perturbation schemes, and standardized forms for the Pearson and deviance component residuals. All the calculations are performed for the mean and precision models. The application is given in Section 4, and Section 5 deals with some conclusions. Approximate standardized forms for the Pearson residuals are derived in the Appendix.
2. Review of DGLMs

Let $Y_1, \dots, Y_n$ be independent random variables with the density function of $Y_i$ expressed in the exponential family form

$f(y_i; \theta_i, \phi_i) = \exp[\phi_i\{y_i\theta_i - b(\theta_i)\} + c(y_i; \phi_i)]$,   (1)

where $c(y_i; \phi_i) = d(\phi_i) + \phi_i a(y_i) + u(y_i)$ (normal, inverse Gaussian, and gamma distributions), $b(\cdot)$, $d(\cdot)$, $a(\cdot)$, and $u(\cdot)$ are twice differentiable functions, $\theta_i$ is the canonical parameter, and $\phi_i$ ($\phi_i^{-1}$) is the precision (dispersion) parameter. Alternatively, taking $T_i = Y_i\theta_i - b(\theta_i) + a(Y_i)$, one may express the density function of $T_i$ (given $\theta_i$) in the exponential family form (1), namely $f(t_i; \phi_i) = \exp\{\phi_i t_i + d(\phi_i) + u(y_i)\}$. From standard regularity conditions it follows that $\mu_i = \mathrm{E}(Y_i) = b'(\theta_i)$ and $\mathrm{Var}(Y_i) = \phi_i^{-1}V(\mu_i)$, where $V(\mu_i) = V_i = b''(\theta_i)$ is the variance function, $\mathrm{E}(T_i) = -d'(\phi_i)$ and $\mathrm{Var}(T_i) = -d''(\phi_i)$.
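As a quick numerical illustration (a sketch added here, not part of the original paper), the gamma entries of Table 1 below and the identities $\mathrm{E}(T_i) = -d'(\phi_i)$ and $\mathrm{Var}(T_i) = -d''(\phi_i)$ can be checked by simulation in R:

```r
## Monte Carlo check of the T_i construction for the gamma case (assumed
## parameterization: shape = phi, rate = phi/mu, so E(Y) = mu, Var(Y) = mu^2/phi).
## Gamma entries of Table 1: t = log(y/mu) - y/mu, d(phi) = phi*log(phi) - lgamma(phi),
## d'(phi) = 1 + log(phi) - digamma(phi), d''(phi) = 1/phi - trigamma(phi).
set.seed(123)
mu  <- 2.5
phi <- 4
y   <- rgamma(1e6, shape = phi, rate = phi / mu)
t_i <- log(y / mu) - y / mu
c(mean_t = mean(t_i), minus_d1 = -(1 + log(phi) - digamma(phi)))  # E(T) vs -d'(phi)
c(var_t  = var(t_i),  minus_d2 = -(1 / phi - trigamma(phi)))      # Var(T) vs -d''(phi)
```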

Table 1 presents some of the quantities above derived for the normal, inverse Gaussian, and gamma distributions.

Table 1
Useful quantities derived for some exponential family distributions.
  $t_i$: normal, $y_i\mu_i - \tfrac{1}{2}(\mu_i^2 + y_i^2)$; inverse Gaussian, $-\{y_i/(2\mu_i^2) - \mu_i^{-1} + (2y_i)^{-1}\}$; gamma, $\log(y_i/\mu_i) - y_i/\mu_i$.
  $d(\phi)$: normal, $\tfrac{1}{2}\log\phi$; inverse Gaussian, $\tfrac{1}{2}\log\phi$; gamma, $\phi\log\phi - \log\Gamma(\phi)$.
  $d'(\phi)$: normal, $(2\phi)^{-1}$; inverse Gaussian, $(2\phi)^{-1}$; gamma, $(1 + \log\phi) - \psi(\phi)$.
  $d''(\phi)$: normal, $-(2\phi^2)^{-1}$; inverse Gaussian, $-(2\phi^2)^{-1}$; gamma, $\phi^{-1} - \psi'(\phi)$.
Here $\Gamma(\cdot)$, $\psi(\cdot)$, and $\psi'(\cdot)$ denote the gamma, digamma, and trigamma functions.

The DGLMs are defined by assuming the systematic components

$g(\mu_i) = \eta_i = x_i^{\top}\beta$ and $h(\phi_i) = \lambda_i = z_i^{\top}\gamma$,   (2)

where $\beta = (\beta_1, \dots, \beta_p)^{\top}$ and $\gamma = (\gamma_1, \dots, \gamma_q)^{\top}$ are the model parameters to be estimated, $x_i = (x_{i1}, \dots, x_{ip})^{\top}$ and $z_i = (z_{i1}, \dots, z_{iq})^{\top}$ contain values of explanatory variables, and $g(\cdot)$ and $h(\cdot)$ are the link functions. Models (1) and (2), called the mean model and the precision model, respectively, belong to the class of generalized additive models for location, scale, and shape proposed by Rigby and Stasinopoulos (2005).

2.1. Parameter estimation

The score functions for $\beta$ and $\gamma$ may be expressed, respectively, as $U_\beta = X^{\top}\Phi W^{1/2}V^{-1/2}(y - \mu)$ and $U_\gamma = Z^{\top}H_\gamma^{-1}(t - \mu_T)$, where $X$ is an $n \times p$ matrix of rows $x_i^{\top}$ ($i = 1, \dots, n$), $W = \mathrm{diag}\{\omega_1, \dots, \omega_n\}$ with weights $\omega_i = (d\mu_i/d\eta_i)^2/V_i$, $V = \mathrm{diag}\{V_1, \dots, V_n\}$, $\Phi = \mathrm{diag}\{\phi_1, \dots, \phi_n\}$, $y = (y_1, \dots, y_n)^{\top}$, $\mu = (\mu_1, \dots, \mu_n)^{\top}$, $Z$ is an $n \times q$ matrix of rows $z_i^{\top}$ ($i = 1, \dots, n$), $H_\gamma = \mathrm{diag}\{h'(\phi_1), \dots, h'(\phi_n)\}$, $t = (t_1, \dots, t_n)^{\top}$, and $\mu_T = (\mathrm{E}(T_1), \dots, \mathrm{E}(T_n))^{\top} = (-d'(\phi_1), \dots, -d'(\phi_n))^{\top}$. The Fisher information matrices for $\beta$ and $\gamma$ are, respectively, given by $K_{\beta\beta} = X^{\top}\Phi W X$ and $K_{\gamma\gamma} = Z^{\top}PZ$, where $P = \mathrm{diag}\{p_1, \dots, p_n\}$ with $p_i = -d''(\phi_i)\{h'(\phi_i)\}^{-2}$, $i = 1, \dots, n$. The joint iterative process for obtaining the maximum likelihood estimates $\hat\beta$ and $\hat\gamma$ takes the form

$\beta^{(m+1)} = (X^{\top}\Phi^{(m)}W^{(m)}X)^{-1}X^{\top}\Phi^{(m)}W^{(m)}y^{*(m)}$   (3)

and

$\gamma^{(m+1)} = (Z^{\top}P^{(m)}Z)^{-1}Z^{\top}P^{(m)}z^{*(m)}$,   (4)

for $m = 0, 1, 2, \dots$, where $y^{*} = X\beta + W^{-1/2}V^{-1/2}(y - \mu)$ and $z^{*} = Z\gamma + V_\gamma^{-1}H_\gamma(t - \mu_T)$ are the modified dependent variables and $V_\gamma = \mathrm{diag}\{-d''(\phi_1), \dots, -d''(\phi_n)\}$. Note that $P = V_\gamma H_\gamma^{-2}$. This joint iterative process is solved by alternating Eqs. (3)-(4) until convergence. Starting values may be the maximum likelihood estimates (MLEs) from the generalized linear model (GLM) with constant dispersion. The iterative process for obtaining the REMLEs takes the same form as (3)-(4) with the quantities $P$ and $z^{*}$ being modified appropriately (see, for instance, Smyth and Verbyla, 1999).

Fahrmeir and Tutz (2001) presented some regularity conditions for attaining the asymptotic normality of the parameter estimates in GLMs. Assuming that such regularity conditions extend to DGLMs, one has for large $n$ that $\hat\beta \sim \mathrm{N}_p(\beta, K_{\beta\beta}^{-1})$ and $\hat\gamma \sim \mathrm{N}_q(\gamma, K_{\gamma\gamma}^{-1})$. Due to the orthogonality between $\beta$ and $\gamma$, one has asymptotic independence between $\hat\beta$ and $\hat\gamma$.
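To make the alternating scheme (3)-(4) concrete, the following R sketch (not the author's code) fits a simulated double gamma model with log links for both the mean and the precision, so that $\omega_i = 1$, $V_i = \mu_i^2$ and $p_i = \{\psi'(\phi_i) - 1/\phi_i\}\phi_i^2$, starting from a constant-dispersion gamma GLM:

```r
## Sketch of the joint iterative process (3)-(4) for a double gamma model with
## log links for the mean and the precision, on simulated data.
set.seed(1)
n <- 500
X <- cbind(1, runif(n))                  # mean-model design
Z <- cbind(1, rbinom(n, 1, 0.5))         # precision-model design
beta.true <- c(1, 0.8); gamma.true <- c(2, -1)
mu  <- exp(drop(X %*% beta.true)); phi <- exp(drop(Z %*% gamma.true))
y   <- rgamma(n, shape = phi, rate = phi / mu)

fit0  <- glm(y ~ X - 1, family = Gamma(link = "log"))    # constant-dispersion GLM
beta  <- coef(fit0)
gamma <- c(log(1 / summary(fit0)$dispersion), 0)         # starting values
for (m in 1:50) {
  mu  <- exp(drop(X %*% beta))
  phi <- exp(drop(Z %*% gamma))
  ## mean step (3): weighted LS, weights phi_i * omega_i with omega_i = 1
  ystar <- drop(X %*% beta) + (y - mu) / mu               # modified dependent variable
  beta  <- drop(solve(t(X) %*% (phi * X), t(X) %*% (phi * ystar)))
  ## precision step (4): weighted LS with weights p_i
  t_i   <- log(y / mu) - y / mu
  muT   <- digamma(phi) - log(phi) - 1                    # -d'(phi)
  vT    <- trigamma(phi) - 1 / phi                        # -d''(phi)
  p     <- vT * phi^2
  zstar <- drop(Z %*% gamma) + (t_i - muT) / (vT * phi)   # modified dependent variable
  gamma <- drop(solve(t(Z) %*% (p * Z), t(Z) %*% (p * zstar)))
}
round(cbind(estimate = c(beta, gamma), true = c(beta.true, gamma.true)), 3)
```

The fixed number of iterations is used only for simplicity; in practice the loop would stop when successive estimates of $\beta$ and $\gamma$ change negligibly.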
DGLMs may be fitted by using, for instance, the packages dglm and gamlss available in the R software.

3. Diagnostic methods

3.1. Leverage

The main idea behind the concept of leverage is that of evaluating the influence of each response on its own predicted value. In DGLMs, the influence of $y$ on $\hat y$ and of $t$ on $\hat t$ may be well represented by the principal diagonal elements of the $n \times n$ matrices $(\partial\hat y/\partial y^{\top})$ and $(\partial\hat t/\partial t^{\top})$, respectively. Using results from Wei et al. (1998), we find the generalized leverage

matrices

$GL_y = (\partial\hat y/\partial y^{\top}) = \{N X(\ddot L_{\beta\beta})^{-1}X^{\top}\Phi V^{-1}N\}\big|_{\hat\theta}$ ($\gamma$ fixed)

and

$GL_t = (\partial\hat t/\partial t^{\top}) = \{H_\gamma^{-1}V_T Z(\ddot L_{\gamma\gamma})^{-1}Z^{\top}H_\gamma^{-1}\}\big|_{\hat\theta}$ ($\beta$ fixed),

where $\theta = (\beta^{\top}, \gamma^{\top})^{\top}$, $N = \mathrm{diag}\{d\mu_1/d\eta_1, \dots, d\mu_n/d\eta_n\}$, $\ddot L_{\beta\beta}$ and $\ddot L_{\gamma\gamma}$ are the observed Fisher information matrices, and $V_T = \mathrm{diag}\{-d''(\phi_1), \dots, -d''(\phi_n)\}$. For large $n$, we obtain

$GL_y = NX(X^{\top}\Phi W X)^{-1}X^{\top}\Phi V^{-1}N$   (5)

and

$GL_t = H_\gamma^{-1}V_T Z(Z^{\top}PZ)^{-1}Z^{\top}H_\gamma^{-1}$.   (6)

Thus, the principal diagonal elements $GL_{y,ii} = \hat\phi_i\hat\omega_i x_i^{\top}(X^{\top}\hat\Phi\hat W X)^{-1}x_i$ of (5) and $GL_{t,ii} = \hat p_i z_i^{\top}(Z^{\top}\hat P Z)^{-1}z_i$ of (6), for $i = 1, \dots, n$, may be interpreted as leverage measures on the predicted responses of the models (1) and (2), respectively.

3.2. Local influence

Suppose that the log-likelihood function is expressed as $L(\theta) = \sum_{i=1}^{n} L_i(\theta)$, where $L_i(\theta)$ denotes the contribution of the $i$th observation. If a perturbation scheme is applied in the model or data, the perturbed log-likelihood function takes the form $L(\theta|\delta)$, where $\delta = (\delta_1, \dots, \delta_n)^{\top}$ is the perturbation vector and $\delta_0$ denotes the no-perturbation vector, which satisfies $L(\theta|\delta_0) = L(\theta)$. A measure of discrepancy between the perturbed and non-perturbed models is the likelihood displacement, $LD(\delta) = 2\{L(\hat\theta) - L(\hat\theta_\delta)\}$, where $\hat\theta_\delta$ denotes the maximum likelihood estimate under the perturbed model. The local influence approach (Cook, 1986) takes into account the influence of small perturbations in the model or data on the measure $LD(\delta)$. The main idea is to study the normal curvatures for $\beta$ and $\gamma$ in the unitary direction $l$ evaluated at $\hat\theta$ and $\delta_0$. Such curvatures are expressed for large $n$ as

$C_l(\beta) = 2|l^{\top}\Delta_1^{\top}K_{\hat\beta\hat\beta}^{-1}\Delta_1 l|$ and $C_l(\gamma) = 2|l^{\top}\Delta_2^{\top}K_{\hat\gamma\hat\gamma}^{-1}\Delta_2 l|$,

respectively, where $\Delta_1$ and $\Delta_2$ are $p \times n$ and $q \times n$ matrices with elements $\Delta_{1ji} = \partial^2 L(\theta|\delta)/\partial\beta_j\partial\delta_i$ and $\Delta_{2ki} = \partial^2 L(\theta|\delta)/\partial\gamma_k\partial\delta_i$, for $i = 1, \dots, n$, $j = 1, \dots, p$ and $k = 1, \dots, q$. In order to have a curvature invariant under uniform change of scale, Poon and Poon (1999) proposed the conformal normal curvature, defined, for large $n$, as

$B_l(\beta) = \dfrac{|l^{\top}\Delta_1^{\top}K_{\hat\beta\hat\beta}^{-1}\Delta_1 l|}{\sqrt{\mathrm{tr}(\Delta_1^{\top}K_{\hat\beta\hat\beta}^{-1}\Delta_1)^2}}$ and $B_l(\gamma) = \dfrac{|l^{\top}\Delta_2^{\top}K_{\hat\gamma\hat\gamma}^{-1}\Delta_2 l|}{\sqrt{\mathrm{tr}(\Delta_2^{\top}K_{\hat\gamma\hat\gamma}^{-1}\Delta_2)^2}}$.

This curvature has the property that $0 \le B_l \le 1$ for any unitary direction $l$. A suggestion is evaluating the normal curvature in the direction $l = e_i$, where $e_i$ is an $n \times 1$ vector with 1 in the $i$th position and zeros in the remaining positions, and observing the index plot of $B_{e_i}$. We suggest using $B_{e_i} > \bar B + 4SE(B)$ to discriminate whether an observation is influential or not, where $\bar B$ is the mean of $B = \{B_{e_i}, i = 1, \dots, n\}$ and $SE(B)$ denotes the standard error of $B$.

Case-weight perturbation

Under this perturbation scheme, we assume that $L(\theta|\delta) = \sum_{i=1}^{n}\delta_i L_i(\theta)$, $0 \le \delta_i \le 1$, and $\delta_0 = (1, \dots, 1)^{\top}$. After some algebraic manipulation, we find that $\Delta_{1ji} = \hat r_{P_i}(\hat\phi_i\hat\omega_i)^{1/2}x_{ij}$ for $j = 1, \dots, p$ and $\Delta_{2ji} = \hat r_{T_i}\hat p_i^{1/2}z_{ij}$ for $j = 1, \dots, q$, with

$r_{P_i} = \dfrac{\phi_i^{1/2}(y_i - \mu_i)}{\sqrt{V_i}}$ and $r_{T_i} = \dfrac{t_i + d'(\phi_i)}{\sqrt{-d''(\phi_i)}}$

being the Pearson residuals for $Y_i$ and $T_i$, respectively, for $i = 1, \dots, n$. Hence, for large $n$, we obtain

$C_l(\beta) = 2|l^{\top}D_{r_P}\hat H D_{r_P}l|$   (7)

and

$C_l(\gamma) = 2|l^{\top}D_{r_T}\hat R D_{r_T}l|$,   (8)

where $D_{r_P} = \mathrm{diag}\{\hat r_{P_1}, \dots, \hat r_{P_n}\}$, $D_{r_T} = \mathrm{diag}\{\hat r_{T_1}, \dots, \hat r_{T_n}\}$, $H = (\Phi W)^{1/2}X(X^{\top}\Phi W X)^{-1}X^{\top}(\Phi W)^{1/2}$, and $R = P^{1/2}Z(Z^{\top}PZ)^{-1}Z^{\top}P^{1/2}$. To assess the sensitivity of the parameter estimates $\hat\beta$ and $\hat\gamma$ under the case-weight perturbation scheme, we can consider the largest-curvature directions $l = l_{\max}$ in (7) and (8), which correspond to the eigenvectors associated with the largest eigenvalues of the matrices $D_{r_P}\hat H D_{r_P}$ and $D_{r_T}\hat R D_{r_T}$, respectively.
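Continuing the simulated gamma/log sketch of Section 2 (an illustration under those assumptions, not the author's code), the case-weight quantities, the directions $l_{\max}$ and the index statistic $B_{e_i}$ with the cutoff $\bar B + 4SE(B)$ can be computed as follows:

```r
## Case-weight local influence for the simulated gamma/log fit above
## (omega_i = 1, V_i = mu_i^2). First re-evaluate working quantities at the
## converged estimates beta, gamma.
mu  <- exp(drop(X %*% beta));  phi <- exp(drop(Z %*% gamma))
t_i <- log(y / mu) - y / mu
muT <- digamma(phi) - log(phi) - 1;  vT <- trigamma(phi) - 1 / phi;  p <- vT * phi^2

rP <- sqrt(phi) * (y - mu) / mu            # Pearson residual for Y_i
rT <- (t_i - muT) / sqrt(vT)               # Pearson residual for T_i (t_i + d'(phi) = t_i - muT)
Ds <- sqrt(phi); Dp <- sqrt(p)
H  <- (Ds * X) %*% solve(t(X) %*% (phi * X)) %*% t(Ds * X)   # (Phi W)^(1/2) X (X'Phi W X)^-1 X'(Phi W)^(1/2)
R  <- (Dp * Z) %*% solve(t(Z) %*% (p * Z))  %*% t(Dp * Z)

F1 <- diag(rP) %*% H %*% diag(rP)          # D_rP H D_rP
F2 <- diag(rT) %*% R %*% diag(rT)          # D_rT R D_rT
lmax.beta  <- eigen(F1, symmetric = TRUE)$vectors[, 1]
lmax.gamma <- eigen(F2, symmetric = TRUE)$vectors[, 1]

## conformal normal curvature in the basic directions e_i
B.beta  <- abs(diag(F1)) / sqrt(sum(F1^2))
B.gamma <- abs(diag(F2)) / sqrt(sum(F2^2))
cut.beta  <- mean(B.beta)  + 4 * sd(B.beta)    # sample sd used here for SE(B)
cut.gamma <- mean(B.gamma) + 4 * sd(B.gamma)
which(B.beta > cut.beta); which(B.gamma > cut.gamma)
plot(B.beta, type = "h", ylab = "B_ei (mean model)"); abline(h = cut.beta, lty = 2)
```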
The total local influence may also be assessed by evaluating the curvatures (7) and (8), respectively, in the direction of the $i$th observation, obtaining $C_i(\beta) = 2\hat h_{ii}\hat r_{P_i}^2$ and $C_i(\gamma) = 2\hat r_{ii}\hat r_{T_i}^2$ for $i = 1, \dots, n$, where $\hat h_{ii}$ and $\hat r_{ii}$ are the principal diagonal elements of the matrices $\hat H$ and $\hat R$, respectively. Note that $\hat h_{ii} = GL_{y,ii}$ and $\hat r_{ii} = GL_{t,ii}$.
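In the running sketch these quantities, and the coincidence between $\hat h_{ii}$, $\hat r_{ii}$ and the generalized leverages (5)-(6), can be verified numerically:

```r
## Total local influence under case weights, and the identities h_ii = GL_y,ii,
## r_ii = GL_t,ii for the gamma/log example (N = diag(mu), V = diag(mu^2)).
h_ii <- diag(H)
r_ii <- diag(R)
Ci.beta  <- 2 * h_ii * rP^2
Ci.gamma <- 2 * r_ii * rT^2
GLy.ii <- phi * rowSums(X %*% solve(t(X) %*% (phi * X)) * X)   # diagonal of (5)
GLt.ii <- p   * rowSums(Z %*% solve(t(Z) %*% (p   * Z)) * Z)   # diagonal of (6)
all.equal(h_ii, GLy.ii); all.equal(r_ii, GLt.ii)
```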

Response perturbation

Suppose that the $i$th observed response of the mean model (1) is perturbed as $y_{i\delta} = y_i + s_{y_i}\delta_i$, where $s_{y_i}$ is a consistent estimate of the standard deviation of $Y_i$, $\delta_i \in \mathbb{R}$ and $\delta_0 = (0, \dots, 0)^{\top}$. For $\phi_i$ fixed, we have that $\Delta_{1ji} = \hat\phi_i x_{ij}s_{y_i}\sqrt{\hat\omega_i/\hat V_i}$, for $j = 1, \dots, p$ and $i = 1, \dots, n$. Then, for large $n$, the normal curvature for $\beta$ in the unitary direction $l$ takes the form

$C_l(\beta) = 2|l^{\top}D_{dy}\hat H D_{dy}l|$,   (9)

where $D_{dy} = \mathrm{diag}\{s_{y_1}/\mathrm{sd}_{y_1}, \dots, s_{y_n}/\mathrm{sd}_{y_n}\}$ and $\mathrm{sd}_{y_i}$ denotes the standard deviation of $Y_i$. Since $s_{y_i}/\mathrm{sd}_{y_i} \to 1$ in probability, the total local influence, evaluating the curvature (9) in the direction of the $i$th observation, yields $C_i(\beta) \approx 2\hat h_{ii}$. On the other hand, suppose for $\theta_i$ fixed that the $i$th observed response of the precision model (2) is perturbed as $t_{i\delta} = t_i + s_{t_i}\delta_i$, where $s_{t_i}$ is a consistent estimate of the standard deviation of $T_i$, $\delta_i \in \mathbb{R}$ and $\delta_0 = (0, \dots, 0)^{\top}$. We find $\Delta_{2ji} = s_{t_i}\{h'(\hat\phi_i)\}^{-1}z_{ij}$, for $j = 1, \dots, q$ and $i = 1, \dots, n$. Then, for large $n$, the normal curvature for $\gamma$ in the unitary direction $l$ takes the form

$C_l(\gamma) = 2|l^{\top}D_{dt}\hat R D_{dt}l|$,   (10)

where $D_{dt} = \mathrm{diag}\{s_{t_1}/\mathrm{sd}_{t_1}, \dots, s_{t_n}/\mathrm{sd}_{t_n}\}$ and $\mathrm{sd}_{t_i}$ denotes the standard deviation of $T_i$. Since $s_{t_i}/\mathrm{sd}_{t_i} \to 1$ in probability, the total local influence, evaluating the curvature (10) in the direction of the $i$th observation, yields $C_i(\gamma) \approx 2\hat r_{ii}$.

Explanatory variable perturbation

Consider now the values of the $t$th explanatory variable of $\eta_i$, assumed continuous, perturbed as $x_{it\delta} = x_{it} + s_{x_t}\delta_i$, where $s_{x_t}$ is an estimate of the standard deviation of $X_t$, $\delta_i \in \mathbb{R}$ and $\delta_0 = (0, \dots, 0)^{\top}$. Matrix $\Delta_1$ has elements $\Delta_{1ji} = s_{x_t}\hat\beta_t\hat\phi_i x_{ij}\{\hat f_i(y_i - \hat\mu_i) - \hat\omega_i\}$ for $j = 1, \dots, p$ and $j \ne t$, and $\Delta_{1ti} = s_{x_t}\hat\beta_t\hat\phi_i x_{it}\{\hat f_i(y_i - \hat\mu_i) - \hat\omega_i\} + s_{x_t}\hat\phi_i\sqrt{\hat\omega_i/\hat V_i}(y_i - \hat\mu_i)$. The normal curvature for $\beta$ in the unitary direction $l$ yields $C_l(\beta) = 2|l^{\top}\hat\Delta_1^{\top}(X^{\top}\hat\Phi\hat W X)^{-1}\hat\Delta_1 l|$, where

$\Delta_1 = s_{x_t}\hat\beta_t X^{\top}\hat\Phi\{\hat F D_r - \hat W\} + s_{x_t}A_1\hat\Phi\hat W^{1/2}\hat V^{-1/2}D_r$,

with $F = \mathrm{diag}\{f_1, \dots, f_n\}$, $f_i = d^2\theta_i/d\eta_i^2$, $D_r = \mathrm{diag}\{y_1 - \mu_1, \dots, y_n - \mu_n\}$, and $A_1$ a $p \times n$ matrix of zeros with ones in the $t$th row. Similarly, consider the values of the $t$th explanatory variable of $\lambda_i$, assumed continuous, perturbed as $z_{it\delta} = z_{it} + s_{z_t}\delta_i$, where $s_{z_t}$ is an estimate of the standard deviation of $Z_t$, $\delta_i \in \mathbb{R}$, and $\delta_0 = (0, \dots, 0)^{\top}$. Matrix $\Delta_2$ has elements $\Delta_{2ji} = s_{z_t}\hat\gamma_t z_{ij}[\hat g_i\{t_i + d'(\hat\phi_i)\} - \hat p_i]$ for $j = 1, \dots, q$ and $j \ne t$, and $\Delta_{2ti} = s_{z_t}\hat\gamma_t z_{it}[\hat g_i\{t_i + d'(\hat\phi_i)\} - \hat p_i] + s_{z_t}\{h'(\hat\phi_i)\}^{-1}\{t_i + d'(\hat\phi_i)\}$. The normal curvature for $\gamma$ in the unitary direction $l$ yields $C_l(\gamma) = 2|l^{\top}\hat\Delta_2^{\top}(Z^{\top}\hat P Z)^{-1}\hat\Delta_2 l|$, where

$\Delta_2 = s_{z_t}\hat\gamma_t Z^{\top}\{\hat G D_{rt} - \hat P\} + s_{z_t}A_2\hat H_\gamma^{-1}D_{rt}$,

with $G = \mathrm{diag}\{g_1, \dots, g_n\}$, $g_i = d^2\phi_i/d\lambda_i^2$, $D_{rt} = \mathrm{diag}\{t_1 + d'(\phi_1), \dots, t_n + d'(\phi_n)\}$, and $A_2$ a $q \times n$ matrix of zeros with ones in the $t$th row.

3.3. Residual analysis

The aim of residual analysis is to assess departures from the assumptions made for the model, particularly the error assumptions, and to detect outlying observations. Natural residuals in DGLMs are the Pearson and deviance component residuals. Approximate standardized forms for the Pearson residuals $\hat r_{P_i}$ and $\hat r_{T_i}$ (see the Appendix) are given by

$t_{P_{1i}} = \dfrac{\hat\phi_i^{1/2}(y_i - \hat\mu_i)}{\sqrt{(1 - \hat h_{ii})\hat V_i}}$ and $t_{P_{2i}} = \dfrac{\hat t_i + d'(\hat\phi_i)}{\sqrt{-d''(\hat\phi_i)(1 - \hat r_{ii})}}$,

respectively.
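In the running gamma/log sketch (again an illustration under those assumptions, not the author's code), the standardized Pearson residuals follow directly from the quantities already computed:

```r
## Standardized Pearson residuals for the simulated gamma/log fit:
## t_P1i = sqrt(phi_i)(y_i - mu_i)/sqrt{(1 - h_ii) V_i} with V_i = mu_i^2, and
## t_P2i = {t_i + d'(phi_i)}/sqrt{-d''(phi_i)(1 - r_ii)}.
tP1 <- sqrt(phi) * (y - mu) / sqrt((1 - h_ii) * mu^2)
tP2 <- (t_i - muT) / sqrt(vT * (1 - r_ii))
par(mfrow = c(1, 2))
plot(tP1, ylab = "t_P1 (mean model)");      abline(h = c(-2, 2), lty = 2)
plot(tP2, ylab = "t_P2 (precision model)"); abline(h = c(-2, 2), lty = 2)
```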
Deviance component residuals may be derived from the deviances of models (1) and (2). For model (1), one has $D_1(y; \hat\mu) = \sum_{i=1}^{n} d_1^2(y_i; \hat\mu_i)$ ($\phi_i$ fixed for all $i$), where

$d_1^2(y_i; \hat\mu_i) = 2\phi_i[y_i(\tilde\theta_i - \hat\theta_i) + \{b(\hat\theta_i) - b(\tilde\theta_i)\}]$,

with $\tilde\theta_i = \theta_i(y_i)$ being the maximum likelihood estimate of $\theta_i$ under the saturated model, so that $\tilde\theta_i$ satisfies $b'(\tilde\theta_i) = y_i$. For model (2), the deviance takes the form $D_2(t; \hat\phi) = \sum_{i=1}^{n} d_2^2(t_i; \hat\phi_i)$ ($\theta_i$ fixed for all $i$), where $\phi = (\phi_1, \dots, \phi_n)^{\top}$,

$d_2^2(t_i; \hat\phi_i) = 2[t_i(\tilde\phi_i - \hat\phi_i) + \{d(\tilde\phi_i) - d(\hat\phi_i)\}]$,

with $\tilde\phi_i = \phi_i(t_i)$ being the maximum likelihood estimate of $\phi_i$ under the saturated model, so that $\tilde\phi_i$ satisfies $d'(\tilde\phi_i) = -t_i$. Standardized forms, which may be supported by the calculations of Cox and Snell (1968), are given by

$t_{D_{1i}} = \pm\dfrac{\sqrt{d_1^2(y_i; \hat\mu_i)}}{\sqrt{1 - \hat h_{ii}}}$ and $t_{D_{2i}} = \pm\dfrac{\sqrt{d_2^2(\hat t_i; \hat\phi_i)}}{\sqrt{1 - \hat r_{ii}}}$,

where the signs are the same as those of $(y_i - \hat\mu_i)$ and $\{\hat t_i + d'(\hat\phi_i)\}$, respectively.
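For the gamma case of the running sketch, the unit deviance of the mean model reduces to $d_1^2(y_i;\hat\mu_i) = 2\hat\phi_i\{(y_i-\hat\mu_i)/\hat\mu_i - \log(y_i/\hat\mu_i)\}$, while $\tilde\phi_i$ has no closed form and can be obtained numerically from $d'(\tilde\phi_i) = -\hat t_i$; a hedged illustration:

```r
## Standardized deviance component residuals t_D1 and t_D2 (gamma/log sketch).
d1sq <- 2 * phi * ((y - mu) / mu - log(y / mu))             # mean-model unit deviance
tD1  <- sign(y - mu) * sqrt(d1sq) / sqrt(1 - h_ii)

dfun   <- function(f) f * log(f) - lgamma(f)                # d(phi) for the gamma
phitil <- sapply(t_i, function(t)                           # solve d'(phi) = -t numerically
  uniroot(function(f) 1 + log(f) - digamma(f) + t,
          lower = 1e-8, upper = 1e8)$root)
d2sq <- pmax(2 * (t_i * (phitil - phi) + dfun(phitil) - dfun(phi)), 0)
tD2  <- sign(t_i - muT) * sqrt(d2sq) / sqrt(1 - r_ii)
summary(cbind(tD1, tD2))
```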

Even though the empirical distributions of the residuals $t_{D_{1i}}$ and $t_{D_{2i}}$ are not well known, we may suggest performing a normal probability plot with a generated envelope, as suggested by Atkinson (1981) (see also Williams, 1987), to detect departures from the error assumptions as well as outlying observations in the fitted mean and precision models.

4. Application

As an illustration, we will consider a data set from an experiment developed in the School of Public Health, University of São Paulo, in which four different forms of light snacks (named B, C, D, and E) were compared across 20 weeks with a traditional snack (named A). For the light snacks, the hydrogenated vegetable fat (hvf) was replaced by canola oil under different proportions: B (0% hvf, 22% canola oil), C (17% hvf, 5% canola oil), D (11% hvf, 11% canola oil) and E (5% hvf, 17% canola oil), whereas A (22% hvf, 0% canola oil). The experiment was conducted so that in each even week a random sample of 15 units of each snack type was analyzed in a laboratory and various variables were measured. Then, a total of 75 units was analyzed in each even week, making 750 units in total during the experiment (Paula et al., 2004). In this analysis we will only consider the variable texture, which will be compared across time among the five snack types.

Fig. 1 presents the boxplots for the texture adjusted for asymmetric data (see Hubert and Vandervieren, 2008) for the five snack types and across weeks. We notice from both graphs distributions skewed to the right, with a few extreme observations. The R package robustbase was used to construct the adjusted boxplots using the function adjbox.

Fig. 1. Robust boxplots of the texture for each snack type for all weeks (left) and across weeks for all snacks (right).

In Fig. 2, one has the mean and variation coefficient profiles for each snack type and across weeks. The means and variation coefficients seem to be different among the snacks, changing across weeks, with an indication of quadratic tendencies for the means.

Fig. 2. Profile of the means for each snack type across weeks (left) and profile of the variation coefficients for each snack type across weeks (right).

Then, based on the graphs above, we used the following double gamma models to fit the snack data: (1) $Y_{ijk} \overset{\mathrm{ind}}{\sim} \mathrm{G}(\mu_{ij}, \phi_{ij})$, (2) $g(\mu_{ij}) = \eta_{ij}$ for $\eta_{ij} = \beta_0 + \beta_i + \beta_6\,\mathrm{weeks}_j$ or $\eta_{ij} = \beta_0 + \beta_i + \beta_6\,\mathrm{weeks}_j + \beta_7\,\mathrm{weeks}_j^2$, and (3) $h(\phi_{ij}) = \lambda_{ij}$ for $\lambda_{ij} = \gamma_0 + \gamma_i$ or $\lambda_{ij} = \gamma_0 + \gamma_i + \gamma_6\,\mathrm{weeks}_j$ or $\lambda_{ij} = \gamma_0 + \gamma_i + \gamma_6\,\mathrm{weeks}_j + \gamma_7\,\mathrm{weeks}_j^2$, with $g(\cdot)$ identity or logarithmic and $h(\cdot)$ logarithmic, where $Y_{ijk}$ denotes the texture corresponding to the $k$th unit of the $i$th snack type in the $j$th week, for $i = 1(\mathrm{A}), 2(\mathrm{B}), 3(\mathrm{C}), 4(\mathrm{D}), 5(\mathrm{E})$, $j = 2, 4, \dots, 20$ and $k = 1, \dots, 15$. The reciprocal link was not considered for $g(\cdot)$ due to the difficulty in parameter interpretation, and for other $h(\cdot)$ links convergence was not attained.
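A sketch (not the author's code) of how double gamma models of this form could be specified with the dglm and gamlss packages mentioned in Section 2, assuming a hypothetical data frame snacks with columns texture, group (factor with levels A-E) and weeks; note that dglm models the dispersion $\phi^{-1}$ and gamlss models the scale $\sigma = \phi^{-1/2}$, so the signs and scale of the fitted precision coefficients differ from the $\log\phi_{ij}$ parameterization used here:

```r
## Hypothetical specification of a double gamma model for the snacks data
## (the data frame `snacks` with columns texture, group, weeks is assumed, not supplied).
library(robustbase)   # adjbox()
library(dglm)
library(gamlss)

adjbox(texture ~ group, data = snacks)        # adjusted boxplots as in Fig. 1

## dglm: log-link mean model; dispersion (1/phi) modelled via dformula (log link by default)
fit.dglm <- dglm(texture ~ group + weeks + I(weeks^2), dformula = ~ group,
                 family = Gamma(link = "log"), data = snacks)
summary(fit.dglm)

## gamlss: mu model and sigma (= 1/sqrt(phi)) model under the gamma family
fit.gamlss <- gamlss(texture ~ group + weeks + I(weeks^2), sigma.formula = ~ group,
                     family = GA, data = snacks)
summary(fit.gamlss)
```

Model comparisons such as those summarized in Table 2 can then be based on the maximized likelihoods of fits of this kind.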

Because the aim of the study is to compare the snack types controlling for the effect of time on the texture, we do not consider the possibility of interactions between the snack types and the number of weeks. Table 2 summarizes the AIC (Akaike, 1973) and BIC (Schwarz, 1978) values for each fit, where $\mathrm{AIC} = -2L(\hat\theta) + 2(p+q)$ and $\mathrm{BIC} = -2L(\hat\theta) + (p+q)\log(n)$.

Table 2. Summary of the double gamma models fitted to the snacks data set: AIC and BIC values for each combination of mean link $g(\cdot)$ (identity or log), mean predictor (G + W or G + W + W2), and precision predictor (G, G + W, or G + W + W2) under the log precision link. G: group, W: weeks and W2: weeks squared.

Thus, we selected, with the smallest AIC and BIC values, the following double gamma model: (1) $Y_{ijk} \overset{\mathrm{ind}}{\sim} \mathrm{G}(\mu_{ij}, \phi_{ij})$ with (2) $\log(\mu_{ij}) = \beta_0 + \beta_i + \beta_6\,\mathrm{weeks}_j + \beta_7\,\mathrm{weeks}_j^2$ and (3) $\log(\phi_{ij}) = \gamma_0 + \gamma_i$, for $i = 1, \dots, 5$ and $j = 2, 4, \dots, 20$. The parameter estimates are given in Table 3. We also fitted the final model under the restricted maximum likelihood method, and the parameter estimates were very similar to the ones given in Table 3.

Table 3. Parameter estimates for the final double gamma model: estimates and estimate/standard-error ratios (E/S.E.) for the mean and precision components (intercept, groups A-E, week and week squared effects), together with the deviance and degrees of freedom of each component.

The normal probability plots for the deviance component residuals $t_{D_{1i}}$ and $t_{D_{2i}}$ with simulated envelopes (Fig. 3) do not present any unusual features.

Fig. 3. Normal probability plots with generated envelopes for the deviance component residual for the mean model (left) and for the precision model (right).

The graph of $B_{e_i}$ against the number of weeks to assess the sensitivity of the mean coefficients is given in Fig. 4(a), and we notice eight observations for which $B_{e_i}$ is above the cutoff line ($\bar B + 4SE(B)$). Such observations appear in the last weeks. In Fig. 4(b), one has the graph of $B_{e_i}$ against the number of weeks to assess the sensitivity of the precision coefficients. Here, six observations appear with $B_{e_i}$ above the cutoff line. It is interesting to notice that no outstanding observation corresponds to snack type A. Elimination of these observations does not change the inference for either the mean or the precision coefficients.

Fig. 4. Graphs of $B_{e_i}$ against weeks under the case-weight perturbation scheme for assessing the local influence on $\hat\beta$ (left) and on $\hat\gamma$ (right), respectively.

Finally, the predicted mean and precision values for the texture across weeks are described in Figs. 5(a) and (b), respectively, for each snack type. We notice in Fig. 5(a) that snack type A has the largest mean values across weeks, followed by snack type C. Looking at Fig. 5(b), the largest predicted dispersions for the texture appear for snack types A and C, respectively. The largest mean values for snack type A are expected, since it is the standard one, as is the quadratic tendency for all snacks, because an aim of the study is to determine the ideal storing time, which seems to be about 12 weeks.

Fig. 5. Predicted mean value of texture (left) and predicted precision of texture (right) for each snack type across weeks.

5. Conclusion

In this paper, DGLMs are revisited. Useful diagnostic methods are derived, and approximations are given for large $n$, in which the bias of the estimates tends to be small under ML. Bias expressions up to order $n^{-1}$ may be found, for instance, in Botter and Cordeiro (1998). The selected double gamma model presented an adequate fit to the snack data, as confirmed by the diagnostic analysis; however, other skew models could be applied, such as the inverse Gaussian, log-normal, Weibull, and Birnbaum-Saunders, among others.

In particular, robust parameter estimates may be obtained by applying Birnbaum-Saunders-t models (Paula et al., 2012). Extension of the diagnostic procedures discussed in this paper to small samples is more complex and depends on the definition of appropriate influence measures for REMLEs in order to derive the curvatures of local influence. Concerning leverage measures, the approach proposed in Wei et al. (1998) may be adapted for REML estimation, and the residuals proposed in Section 3.3 may be applied under REML estimation; however, their empirical distributional properties should be considered in the analysis. Nevertheless, extension of the results to dispersion models (Jørgensen, 1997; Cordeiro et al., 1994) may be performed for large samples by adapting the expressions derived in this paper. The author developed codes in R to produce the plots given in Section 4, which may be made available upon request.

Acknowledgments

The author is grateful to the editors and two anonymous referees. This work was supported by CNPq and FAPESP, Brazil.

Appendix

At the convergence of (3), the maximum likelihood estimate $\hat\beta$ may be interpreted as the least-squares solution of the linear regression of $(\hat\Phi\hat W)^{1/2}\hat y^{*}$ on the columns of the matrix $(\hat\Phi\hat W)^{1/2}X$. Thus, there follows the relationship $(\mathrm{I} - \hat H)(\hat\Phi\hat W)^{1/2}\hat y^{*} = (\hat\Phi\hat W)^{1/2}(\hat y^{*} - \hat\eta)$ and, in particular, $\hat\Phi^{1/2}\hat V^{-1/2}(y - \hat\mu) = (\mathrm{I} - \hat H)(\hat\Phi\hat W)^{1/2}\hat y^{*}$. The approximation $\mathrm{Var}(\hat y^{*}) \approx \hat\Phi^{-1}\hat W^{-1}$ supports the standardized form for $t_{P_{1i}}$ (see, for instance, McCullagh and Nelder, 1989, p. 397). Similarly, at the convergence of (4), the maximum likelihood estimate $\hat\gamma$ may be interpreted as the least-squares solution of the linear regression of $\hat P^{1/2}\hat z^{*}$ on the columns of the matrix $\hat P^{1/2}Z$. Thus, one has the relationship $\hat V_\gamma^{-1/2}(\hat t - \hat\mu_T) = (\mathrm{I} - \hat R)\hat P^{1/2}\hat z^{*}$. The approximation $\mathrm{Var}(\hat z^{*}) \approx \hat P^{-1}$ supports the standardized form for $t_{P_{2i}}$.

References

Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csàki, F. (Eds.), International Symposium on Information Theory. Akadémiai Kiadó, Budapest, Hungary.
Atkinson, A.C., 1981. Two graphical displays for outlying and influential observations in regression. Biometrika 68.
Botter, D.A., Cordeiro, G.M., 1998. Improved estimators for generalized linear models with dispersion covariates. Journal of Statistical Computation and Simulation 62.
Cook, R.D., 1986. Assessment of local influence (with discussion). Journal of the Royal Statistical Society, Series B 48.
Cordeiro, G.M., Paula, G.A., Botter, D.A., 1994. Improved likelihood ratio tests for dispersion models. International Statistical Review 62.
Cox, D.R., Snell, E.J., 1968. A general definition of residuals (with discussion). Journal of the Royal Statistical Society, Series B 30.
Fahrmeir, L., Tutz, G., 2001. Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York.
Hubert, M., Vandervieren, E., 2008. An adjusted boxplot for skewed distributions. Computational Statistics and Data Analysis 52.
Jørgensen, B., 1997. The Theory of Dispersion Models. Chapman and Hall, London.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second ed. Chapman and Hall, London.
Paula, G.A., de Moura, A.S., Yamaguchi, A.M., 2004. Sensorial stability of snacks with canola oil and hydrogenated vegetable fat. Technical Report, Center of Applied Statistics, University of São Paulo (in Portuguese).
Paula, G.A., Leiva, V., Barros, M., Liu, S., 2012. Robust statistical modeling using the Birnbaum-Saunders-t distribution applied to insurance. Applied Stochastic Models in Business and Industry 28.
Poon, W., Poon, Y.S., 1999. Conformal normal curvature and assessment of local influence. Journal of the Royal Statistical Society, Series B 61.
Rigby, R.A., Stasinopoulos, D.M., 2005. Generalized additive models for location, scale and shape. Applied Statistics 54.
Schwarz, G., 1978. Estimating the dimension of a model. Annals of Statistics 6.
Smyth, G.K., 1989. Generalized linear models with varying dispersion. Journal of the Royal Statistical Society, Series B 51.
Smyth, G.K., Jørgensen, B., 2002. Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin 32.
Smyth, G.K., Verbyla, A.P., 1999. Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10.
Verbyla, A.P., 1993. Modelling variance heterogeneity: residual maximum likelihood and diagnostics. Journal of the Royal Statistical Society, Series B 55.
Wei, B.C., Hu, Y.Q., Fung, W.K., 1998. Generalized leverage and its applications. Scandinavian Journal of Statistics 25.
Williams, D.A., 1987. Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics 36.


More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Fractal functional regression for classification of gene expression data by wavelets

Fractal functional regression for classification of gene expression data by wavelets Fractal functional regression for classification of gene expression data by wavelets Margarita María Rincón 1 and María Dolores Ruiz-Medina 2 1 University of Granada Campus Fuente Nueva 18071 Granada,

More information

A test for improved forecasting performance at higher lead times

A test for improved forecasting performance at higher lead times A test for improved forecasting performance at higher lead times John Haywood and Granville Tunnicliffe Wilson September 3 Abstract Tiao and Xu (1993) proposed a test of whether a time series model, estimated

More information

Submitted to the Brazilian Journal of Probability and Statistics. Bootstrap-based testing inference in beta regressions

Submitted to the Brazilian Journal of Probability and Statistics. Bootstrap-based testing inference in beta regressions Submitted to the Brazilian Journal of Probability and Statistics Bootstrap-based testing inference in beta regressions Fábio P. Lima and Francisco Cribari-Neto Universidade Federal de Pernambuco Abstract.

More information

STAT 510 Final Exam Spring 2015

STAT 510 Final Exam Spring 2015 STAT 510 Final Exam Spring 2015 Instructions: The is a closed-notes, closed-book exam No calculator or electronic device of any kind may be used Use nothing but a pen or pencil Please write your name and

More information

20. REML Estimation of Variance Components. Copyright c 2018 (Iowa State University) 20. Statistics / 36

20. REML Estimation of Variance Components. Copyright c 2018 (Iowa State University) 20. Statistics / 36 20. REML Estimation of Variance Components Copyright c 2018 (Iowa State University) 20. Statistics 510 1 / 36 Consider the General Linear Model y = Xβ + ɛ, where ɛ N(0, Σ) and Σ is an n n positive definite

More information

Improved maximum likelihood estimators in a heteroskedastic errors-in-variables model

Improved maximum likelihood estimators in a heteroskedastic errors-in-variables model Statistical Papers manuscript No. (will be inserted by the editor) Improved maximum likelihood estimators in a heteroskedastic errors-in-variables model Alexandre G. Patriota Artur J. Lemonte Heleno Bolfarine

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Chapter 9. Model Assessment

Chapter 9. Model Assessment Chapter 9 Model Assessment In statistical modeling, once one has formulated a model and produced estimates and inferential quantities, the question remains of whether the model is adequate for its intended

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information