Local Influence and Residual Analysis in Heteroscedastic Symmetrical Linear Models

Francisco José A. Cysneiros

Departamento de Estatística - CCEN, Universidade Federal de Pernambuco, Recife - PE 5079-50 - Brazil, e-mail: cysneiros@de.ufpe.br

Abstract: This work extends some diagnostic procedures to heteroscedastic symmetrical linear models. This class of models includes all symmetric continuous distributions, such as the normal, Student-t, generalized Student-t, exponential power and logistic, among others. We present an iterative process for parameter estimation and derive the appropriate matrices for assessing local influence under some perturbation schemes. A standardized residual is derived and an illustrative example is given. S-Plus code implementing the author's method is available at www.de.ufpe.br/~cysneiros/elliptical/heteroscedastic.html.

Keywords: Symmetrical distributions; Local influence; Residuals; Heteroscedastic models; Robust models.

1 Heteroscedastic symmetrical linear models

The problem of modelling variances has been discussed by various authors, particularly in econometrics. Under normal errors, for instance, Cook and Weisberg (1983) present some graphical methods to detect heteroscedasticity. Smyth (1989) describes a method which allows modelling the dispersion parameter in some generalized linear models.

Moving away from normal errors, let $\epsilon_i$, $i = 1, \ldots, n$, be independent random variables with density function of the form

$$f_{\epsilon_i}(\epsilon) = \frac{1}{\sqrt{\phi_i}}\, g(\epsilon^2/\phi_i), \quad \epsilon \in \mathbb{R}, \tag{1}$$

where $\phi_i > 0$ is the scale parameter and $g : \mathbb{R} \to [0, \infty)$ is such that $\int_0^\infty g(u)\,du < \infty$. We shall denote $\epsilon_i \sim S(0, \phi_i)$. The function $g(\cdot)$ is called the density generator (see, for example, Fang, Kotz and Ng, 1990).

We consider the linear regression model

$$y_i = \mu_i + \sqrt{\phi_i}\,\epsilon_i, \tag{2}$$

where $y = (y_1, \ldots, y_n)^T$ are the observed response values, $\mu_i = x_i^T\beta$, $x_i = (x_{i1}, \ldots, x_{ip})^T$ contains the values of $p$ explanatory variables, $\beta = (\beta_1, \ldots, \beta_p)^T$ and $\epsilon_i \sim S(0, 1)$.
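As a rough illustration of density (1), the sketch below codes two density generators and checks numerically that the resulting densities integrate to one. The generator formulas (standard normal and Student-t, normalized so that (1) is a proper density), the helper names and the value of `phi` are assumptions made for this example; they are not taken from the author's S-Plus code.

```python
import math

def g_normal(u):
    """Assumed generator of the normal: g(u) = exp(-u/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u) / math.sqrt(2.0 * math.pi)

def g_student(u, nu=4.0):
    """Assumed generator of the Student-t with nu degrees of freedom."""
    const = math.gamma((nu + 1.0) / 2.0) / (
        math.sqrt(nu * math.pi) * math.gamma(nu / 2.0)
    )
    return const * (1.0 + u / nu) ** (-(nu + 1.0) / 2.0)

def density(eps, phi, g):
    """Density (1): f(eps) = g(eps^2 / phi) / sqrt(phi)."""
    return g(eps * eps / phi) / math.sqrt(phi)

def integrate(f, lower, upper, steps=40000):
    """Composite trapezoidal rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for k in range(1, steps):
        total += f(lower + k * h)
    return total * h

phi = 2.5  # hypothetical scale parameter, chosen only for illustration
mass_normal = integrate(lambda e: density(e, phi, g_normal), -200.0, 200.0)
mass_student = integrate(lambda e: density(e, phi, g_student), -200.0, 200.0)
print(mass_normal, mass_student)  # both should be close to 1
```

Changing `phi` rescales the density without changing its shape, which is the role the scale parameter plays in model (2).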
We have, when they exist, that $E(Y_i) = \mu_i$ and $\mathrm{Var}(Y_i) = \xi\phi_i$, where $\xi > 0$ is a constant given by $\xi = -2\varphi'(0)$, while $\varphi'(0) = d\varphi(u)/du\,|_{u=0}$ with $\varphi(\cdot)$ being a function such that $\varsigma(t) = e^{it\mu}\varphi(t^2\phi)$, $t \in \mathbb{R}$, where $\varsigma(t) = E(e^{itY})$ is the characteristic function. We call the model defined by (1)-(2) the heteroscedastic symmetrical linear model.

We assume that the dispersion parameter $\phi_i$ is parameterized as $\phi_i = h(\tau_i)$, where $h(\cdot)$ is a known one-to-one continuously differentiable function and $\tau_i = z_i^T\gamma$, where $z_i = (z_{i1}, \ldots, z_{iq})^T$ contains the values of $q$ explanatory variables and $\gamma = (\gamma_1, \ldots, \gamma_q)^T$. The function $h(\cdot)$ is usually called the dispersion link function and must be positive-valued. One possible choice is $h(\tau) = \exp(\tau)$. The dispersion covariates $z_i$ are not necessarily the same as the location covariates $x_i$.

It can be shown that $\beta$ and $\gamma$ are globally orthogonal parameters and that the Fisher information matrix $K$ for $\theta = (\beta^T, \gamma^T)^T$ is block-diagonal, namely $K = \mathrm{diag}\{K_\beta, K_\gamma\}$. The Fisher information matrices for $\beta$ and $\gamma$ are given by

$$K_\beta = X^T W_1 X \quad \text{and} \quad K_\gamma = Z^T W_2 Z,$$

where

$$W_1 = \mathrm{diag}\left\{\frac{4 d_g}{\phi_i}\right\} \quad \text{and} \quad W_2 = \mathrm{diag}\left\{\frac{(4 f_g - 1)\,(h_i')^2}{4\phi_i^2}\right\}, \quad i = 1, \ldots, n.$$

Here $X$ is an $n \times p$ matrix with rows $x_i^T$, $Z$ is an $n \times q$ matrix with rows $z_i^T$, $v_i = -2W_g(u_i)$, $u_i = (y_i - \mu_i)^2/\phi_i$, $W_g(u) = g'(u)/g(u)$, $g'(u) = \partial g(u)/\partial u$ and $h_i' = \partial h(\tau_i)/\partial \tau_i$.

An iterative process to obtain the maximum likelihood estimates of $\beta$ and $\gamma$ may be developed by using, for example, the Fisher scoring method, which leads to the following system of equations:

$$X^T W_1^{(k)} X \beta^{(k+1)} = X^T W_1^{(k)} z_\beta^{(k)} \quad \text{and} \quad Z^T W_2^{(k)} Z \gamma^{(k+1)} = Z^T W_2^{(k)} z_\gamma^{(k)},$$

where $z_\beta$ and $z_\gamma$ are $n \times 1$ vectors whose components take the forms

$$z_{\beta i} = \mu_i + \frac{v_i}{4 d_g}(y_i - \mu_i) \quad \text{and} \quad z_{\gamma i} = \tau_i + \frac{2\phi_i}{(4 f_g - 1)\,h_i'}(v_i u_i - 1),$$

with $d_g = E\{W_g^2(U^2)U^2\}$ and $f_g = E\{W_g^2(U^2)U^4\}$ for $U \sim S(0, 1)$. For example, for the Student-t distribution with $\nu$ degrees of freedom one has $d_g = (\nu + 1)/\{4(\nu + 3)\}$ and $f_g = 3(\nu + 1)/\{4(\nu + 3)\}$.
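The scoring equations above can be sketched in a few lines of NumPy. The example below is only an illustration under stated assumptions: Student-t errors with known $\nu$, the link $h(\tau) = \exp(\tau)$ (so $h_i' = \phi_i$), an intercept in the first column of `Z`, and simulated data. The function names `scoring_step` and `fit`, the starting values and the stopping rule are hypothetical choices, not the author's S-Plus implementation.

```python
import numpy as np

def scoring_step(beta, gamma, y, X, Z, nu):
    """One joint Fisher scoring update (Student-t errors, h(tau) = exp(tau))."""
    d_g = (nu + 1.0) / (4.0 * (nu + 3.0))
    f_g = 3.0 * (nu + 1.0) / (4.0 * (nu + 3.0))
    mu, tau = X @ beta, Z @ gamma
    phi = np.exp(tau)                       # exponential link, so h'(tau) = phi
    u = (y - mu) ** 2 / phi
    v = (nu + 1.0) / (nu + u)               # v_i = -2 W_g(u_i) for the Student-t
    # Location: weights 4 d_g / phi_i and working response z_beta.
    w1 = 4.0 * d_g / phi
    z_beta = mu + v * (y - mu) / (4.0 * d_g)
    beta_new = np.linalg.solve(X.T @ (w1[:, None] * X), X.T @ (w1 * z_beta))
    # Dispersion: with h = exp the weights (4 f_g - 1) h'^2 / (4 phi^2) are constant.
    w2 = np.full(len(y), (4.0 * f_g - 1.0) / 4.0)
    z_gamma = tau + 2.0 * (v * u - 1.0) / (4.0 * f_g - 1.0)
    gamma_new = np.linalg.solve(Z.T @ (w2[:, None] * Z), Z.T @ (w2 * z_gamma))
    return beta_new, gamma_new

def fit(y, X, Z, nu=4.0, max_iter=500, tol=1e-8):
    """Iterate scoring steps from least-squares starting values (an assumption)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    gamma = np.zeros(Z.shape[1])
    gamma[0] = np.log(np.mean((y - X @ beta) ** 2))   # assumes Z[:, 0] is an intercept
    for _ in range(max_iter):
        beta_new, gamma_new = scoring_step(beta, gamma, y, X, Z, nu)
        change = max(np.max(np.abs(beta_new - beta)),
                     np.max(np.abs(gamma_new - gamma)))
        beta, gamma = beta_new, gamma_new
        if change < tol:
            return beta, gamma, True
    return beta, gamma, False

# Simulated data (hypothetical parameter values, for illustration only).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
Z = X.copy()
beta_true, gamma_true = np.array([1.0, 2.0]), np.array([-1.0, 1.0])
y = X @ beta_true + np.sqrt(np.exp(Z @ gamma_true)) * rng.standard_t(4, size=n)

beta_hat, gamma_hat, converged = fit(y, X, Z, nu=4.0)
print(beta_hat, gamma_hat, converged)
```

Because $\beta$ and $\gamma$ are orthogonal, the two linear systems are solved separately within each iteration. As $\nu$ grows, $v_i \to 1$ and $4d_g \to 1$, so the location step reduces to weighted least squares with weights $1/\phi_i$.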
2 Local influence

Local influence is concerned with the behaviour of some influence measure around the vector of no perturbation $\omega_0$. For example, if the likelihood displacement $LD(\omega) = 2\{L(\hat\theta) - L(\hat\theta_\omega)\}$ is used, where $\hat\theta_\omega$ denotes the maximum likelihood estimate under the perturbed model, the suggestion of Cook (1986) is to investigate the normal curvature of the lifted line $LD(\omega_0 + a l)$, where $a \in \mathbb{R}$, around $a = 0$ for an arbitrary direction $l$, $\|l\| = 1$. He shows that the normal curvature may be expressed in the general form

$$C_l(\theta) = 2\,|l^T \Delta^T \ddot{L}_{\theta\theta}^{-1} \Delta\, l|,$$

where $\ddot{L}_{\theta\theta}$ is the Hessian of the log-likelihood $L(\theta)$ and $\Delta$ is a $(p + q) \times s$ matrix with elements $\Delta_{ij} = \partial^2 L(\theta|\omega)/\partial\theta_i\,\partial\omega_j$, $i = 1, \ldots, p + q$ and $j = 1, \ldots, s$, with all the quantities evaluated at $\hat\theta$ and $\omega_0$.
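The curvature formula translates directly into matrix code. The sketch below assumes that the perturbation matrix `delta` and the Hessian `hessian` have already been evaluated at the estimates; the function names and the toy matrices are hypothetical and serve only to show the arithmetic. The direction of largest curvature is taken as the eigenvector of $\Delta^T \ddot{L}_{\theta\theta}^{-1} \Delta$ with the largest absolute eigenvalue, as in Cook (1986).

```python
import numpy as np

def curvature_matrix(delta, hessian):
    """B = Delta^T Lddot^{-1} Delta, so that C_l = 2 |l^T B l|."""
    return delta.T @ np.linalg.solve(hessian, delta)

def normal_curvature(delta, hessian, direction):
    """Normal curvature C_l in the given direction (rescaled to unit length)."""
    l = np.asarray(direction, dtype=float)
    l = l / np.linalg.norm(l)
    return 2.0 * abs(l @ curvature_matrix(delta, hessian) @ l)

def max_curvature(delta, hessian):
    """Largest curvature and the direction attaining it."""
    B = curvature_matrix(delta, hessian)
    B = 0.5 * (B + B.T)                     # remove rounding asymmetry
    eigval, eigvec = np.linalg.eigh(B)
    k = int(np.argmax(np.abs(eigval)))
    return 2.0 * abs(eigval[k]), eigvec[:, k]

# Toy example: two parameters, three perturbations (values are made up).
hessian = -np.eye(2)
delta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])

# Curvature in the direction of each perturbation (unit vectors e_1, e_2, e_3).
C_i = np.array([normal_curvature(delta, hessian, e) for e in np.eye(3)])
C_max, l_max = max_curvature(delta, hessian)
print(C_i, C_max, l_max)
```

Evaluating the curvature at the unit vectors, as done for `C_i`, gives one value per perturbation, which can then be displayed in an index plot.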

Lesaffre and Verbeke (1998) suggest evaluating the normal curvature in the direction of the $i$th observation, which consists in evaluating $C_l(\theta)$ at the $n \times 1$ vector $l_i$ formed by zeros with a one at the $i$th position. Paula et al. (2003) discuss some diagnostic procedures in homoscedastic symmetrical nonlinear regression models.

Suppose the log-likelihood function for $\theta$ is expressed as

$$L(\theta|\omega) = \sum_{i=1}^n \omega_i \log\{g(u_i)/\sqrt{\phi_i}\},$$

where $0 \le \omega_i \le 1$ are case weights. Under this perturbation scheme the matrix $\Delta$ takes the form

$$\Delta = [D(g)D(e)X,\; D(m)Z]^T,$$

where $D(g) = \mathrm{diag}\{g_1, \ldots, g_n\}$, $g_i = v_i/\phi_i$, $D(m) = \mathrm{diag}\{m_1, \ldots, m_n\}$, $m_i = \frac{h_i'}{2\phi_i}(v_i u_i - 1)$, $D(e) = \mathrm{diag}\{e_1, \ldots, e_n\}$ and $e_i = y_i - \mu_i$.

3 Local influence on predictions

Let $q$ be a $p \times 1$ vector of explanatory variable values, for which we do not necessarily have an observed response. Then, the prediction at $q$ is $\hat\mu(q) = \sum_{j=1}^p q_j \hat\beta_j$. Analogously, the point prediction at $q$ based on the perturbed model becomes $\hat\mu(q, \omega) = \sum_{j=1}^p q_j \hat\beta_{j\omega}$, where $\hat\beta_\omega = (\hat\beta_{1\omega}, \ldots, \hat\beta_{p\omega})^T$ denotes the maximum likelihood estimate from the perturbed model. Thomas and Cook (1990) have investigated the effect of small perturbations on predictions at some particular point $q$ in continuous generalized linear models. The objective function $f(q, \omega) = \{\hat\mu(q) - \hat\mu(q, \omega)\}^2$ was chosen due to its simplicity and invariance with respect to scale change. The normal curvature in the unit direction $l$ takes, in this case, the form $C_l = |l^T \ddot{f}\, l|$, where

$$\ddot{f} = \partial^2 f/\partial\omega\,\partial\omega^T = 2\Delta^T(\ddot{L}_{\beta\beta}^{-1}\, q q^T\, \ddot{L}_{\beta\beta}^{-1})\Delta$$

is evaluated at $\omega_0$ and $\hat\beta$. One has that $l_{\max}(q) \propto \Delta^T \ddot{L}_{\beta\beta}^{-1} q$.

Consider an additive perturbation on the $i$th response, namely $y_{i\omega} = y_i + \omega_i s_i$, where $s_i$ may be an estimate of the standard deviation of $y_i$ and $\omega_i \in \mathbb{R}$. Then, the matrix $\Delta$ equals $X^T D(a) D(s)$, where $D(s) = \mathrm{diag}\{s_1, \ldots, s_n\}$, $D(a) = \mathrm{diag}\{a_1, \ldots, a_n\}$ and

$$a_i = \frac{1}{\phi_i}\{v_i - 4W_g'(u_i)\,u_i\}.$$
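For the response perturbation just described, the quantities $a_i$ and the own-prediction influence vector $l_{\max}(x_i) \propto D(s)D(a)X\{X^T D(a) X\}^{-1} x_i$ can be sketched as follows. The Student-t expressions $v_i = (\nu+1)/(\nu+u_i)$ and $W_g'(u) = (\nu+1)/\{2(\nu+u)^2\}$ follow from the Student-t generator and are assumptions of this example, as are the function names; each vector $l_{\max}(x_i)$ is scaled to unit length before its $i$th component is extracted.

```python
import numpy as np

def student_t_a(u, phi, nu):
    """a_i = {v_i - 4 W_g'(u_i) u_i} / phi_i for Student-t errors."""
    v = (nu + 1.0) / (nu + u)
    dW = (nu + 1.0) / (2.0 * (nu + u) ** 2)     # derivative of W_g at u
    return (v - 4.0 * dW * u) / phi

def lmax_own_fitted(X, a, s):
    """|i-th component| of the unit vector l_max(x_i), for i = 1, ..., n."""
    XtAX = X.T @ (a[:, None] * X)
    # Column i of M is D(s) D(a) X {X^T D(a) X}^{-1} x_i (unnormalized).
    M = (s * a)[:, None] * (X @ np.linalg.solve(XtAX, X.T))
    M = M / np.linalg.norm(M, axis=0, keepdims=True)
    return np.abs(np.diag(M))

# Tiny illustration with made-up values.
u = np.array([0.0, 4.0, 1.0])
phi = np.array([2.0, 2.0, 2.0])
a = student_t_a(u, phi, nu=4.0)
print(a)  # a_i vanishes at u_i = nu and becomes negative beyond it
```

The resulting vector from `lmax_own_fitted` is the one suggested for plotting against the observation index.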
The vector $l_{\max}(q)$ is constructed here by taking $q = x_i$, which corresponds to the $n \times 1$ vector

$$l_{\max}(x_i) \propto D(s)D(a)X\{X^T D(a) X\}^{-1} x_i.$$

A large value for $l_{\max_i}(x_i)$ indicates that the $i$th observation should have substantial local influence on $\hat{y}_i$. The suggestion is then to plot the $n \times 1$ vector $(l_{\max_1}(x_1), \ldots, l_{\max_n}(x_n))^T$ in order to identify those observations with high influence on their own fitted values.

4 Residuals

Because we have a symmetrical class of errors, it is reasonable to use the residual $r_i = y_i - \hat{y}_i$ for residual analysis. A standardized version of $r_i$ may be obtained by using the expansions up to order $n^{-1}$ by Cox and Snell (1968). After some algebraic manipulations, we find that $E(r) = 0$ and

$$\mathrm{Var}(r) = \xi\Phi\{I_n - (4 d_g \xi)^{-1} H\},$$

where $H = \Phi^{-1/2} X (X^T \Phi^{-1} X)^{-1} X^T \Phi^{-1/2}$, $\Phi = \mathrm{diag}\{\phi_1, \ldots, \phi_n\}$ and $I_n$ is the identity matrix of order $n$. Therefore, a standardized form for $r_i$ is given by

$$t_{r_i} = \frac{y_i - \hat{y}_i}{\sqrt{\hat\phi_i\,\xi\{1 - (4 d_g \xi)^{-1}\hat{h}_{ii}\}}},$$

where $\hat{h}_{ii}$ is the $i$th diagonal element of $H$ evaluated at the estimates. Simulation studies omitted here indicate that $t_{r_i}$ has mean approximately zero, variance exceeding one, negligible skewness and some kurtosis.

5 Application

To illustrate an application we consider the data set described in Montgomery et al. (2001, Table 3.2). The interest is in predicting the amount of time required by the route driver to service vending machines in an outlet. The service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. They fitted a homoscedastic linear regression model with intercept, where the response variable was the delivery time, $y$ (min), and the covariates were the number of cases of product stocked ($x_1$) and the distance walked by the route driver ($x_2$), in a sample of 25 observations. In their diagnostic analysis, points 9 and 22 appear with large effects on the parameter estimates (see Montgomery et al., 2001).

We propose to fit heteroscedastic linear models under error distributions with heavier tails than the normal, namely

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \sqrt{\phi_i}\,\epsilon_i, \quad i = 1, \ldots, 25, \tag{3}$$

with $\phi_i = \exp\{\alpha + \gamma x_{i2}\}$ and $\epsilon_i \sim S(0, 1)$ mutually independent errors. We tried various error distributions, but only two models seem to fit the data as well as or better than the normal model: the Student-t with 4 degrees of freedom and the logistic-II models. The generated envelopes for the three postulated models do not present any unusual features (see Figure 1). Figure 1 also presents the plots of $C_i$ under normal, Student-t and logistic-II errors. Under the Student-t model the influential observations have smaller values of $C_i$ than under the normal and logistic-II models.
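The standardized residual $t_{r_i}$ of Section 4 can be computed along the following lines once estimates of $\beta$ and $\phi_i$ are available. This is a minimal sketch: the function names are hypothetical, and the Student-t constants use $d_g = (\nu+1)/\{4(\nu+3)\}$ together with $\xi = \nu/(\nu-2)$, the variance of a standard Student-t with $\nu > 2$ degrees of freedom.

```python
import numpy as np

def student_t_constants(nu):
    """Return (d_g, xi) for Student-t errors with nu > 2 degrees of freedom."""
    d_g = (nu + 1.0) / (4.0 * (nu + 3.0))
    xi = nu / (nu - 2.0)
    return d_g, xi

def standardized_residuals(y, X, beta_hat, phi_hat, d_g, xi):
    """t_ri = (y_i - yhat_i) / sqrt(phi_i * xi * {1 - h_ii / (4 d_g xi)})."""
    r = y - X @ beta_hat
    Xs = X / np.sqrt(phi_hat)[:, None]               # Phi^{-1/2} X
    H = Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)        # projection matrix H
    h = np.diag(H)
    return r / np.sqrt(phi_hat * xi * (1.0 - h / (4.0 * d_g * xi)))

# Toy check with made-up numbers: intercept-only model, unit dispersions.
y = np.array([1.0, 2.0, 3.0, 6.0])
X = np.ones((4, 1))
beta_hat = np.array([3.0])
phi_hat = np.ones(4)

t_normal = standardized_residuals(y, X, beta_hat, phi_hat, d_g=0.25, xi=1.0)
d_g4, xi4 = student_t_constants(4.0)
t_student = standardized_residuals(y, X, beta_hat, phi_hat, d_g4, xi4)
print(t_normal, t_student)
```

With $d_g = 1/4$ and $\xi = 1$ (normal errors) the expression reduces to the usual weighted least squares form $r_i/\sqrt{\hat\phi_i(1 - \hat{h}_{ii})}$.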
FIGURE 1. Envelopes and plots of $C_i$ under the normal (left), Student-t with 4 d.f. (middle) and logistic-II (right) models fitted to the delivery data.

Acknowledgments: The author received financial support from CNPq, Brazil.

References

Cook, R.D. (1986). Assessment of local influence (with discussion). Journal of the Royal Statistical Society, Series B, 48, 133-169.

Cook, R.D. and Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika, 70, 1-10.

Cox, D.R. and Snell, E.J. (1968). A general definition of residuals. Journal of the Royal Statistical Society, Series B, 30, 248-275.

Fang, K.T., Kotz, S. and Ng, K.W. (1990). Symmetric Multivariate and Related Distributions. London: Chapman & Hall.

Lesaffre, E. and Verbeke, G. (1998). Local influence in linear mixed models. Biometrics, 54, 570-582.

Montgomery, D.C., Peck, E.A. and Vining, G.G. (2001). Introduction to Linear Regression Analysis, 3rd ed. New York: Wiley.

Paula, G.A., Cysneiros, F.J.A. and Galea, M. (2003). Local influence and leverage in elliptical nonlinear regression models. In: Proceedings of the 18th International Workshop on Statistical Modelling; Verbeke, G., Molenberghs, G., Aerts, A. and Fieuws, S. (Eds). Leuven: Katholieke Universiteit Leuven, pp. 361-366.

Smyth, G.K. (1989). Generalized linear models with varying dispersion. Journal of the Royal Statistical Society, Series B, 51, 47-60.

Thomas, W. and Cook, R.D. (1990). Assessing influence on predictions from generalized linear models. Technometrics, 32, 59-65.