Introduction to Generalized Linear Models


INTRODUCTION TO STATISTICAL MODELLING, TRINITY 2002

I. Motivation

In this lecture we extend the ideas of linear regression to the more general idea of a generalized linear model (GLM). The essence of linear models is that the response variable is continuous and normally distributed; here we relax these assumptions and consider cases where the response variable is non-normal and, in particular, has a discrete distribution. Although these models are more general than linear models, nearly all of the techniques for testing hypotheses about the regression coefficients and for checking the assumptions of the model apply directly to a GLM. In addition, a linear model is simply a special type of generalized linear model, so all of the discussion below applies equally to linear models.

In an earlier lecture we saw that a typical statistical model can be expressed as an equation that equates the mean(s) of the response variable(s) to some function of a linear combination of the explanatory variables:

$$E[Y \mid X = x] = \eta(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p) = \eta[\mathrm{LC}(X; \boldsymbol{\beta})] \qquad (1)$$

In equation (1), the form of the function $\eta(\cdot)$ is known, as are $Y$ and $X$ (the latter for a particular choice of explanatory variables). However, the parameters of the model, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)$, are not known and must be estimated. The simple linear model is a special case of the above in which $\eta(\cdot)$ is the identity function.

Generalized linear models extend the ideas underlying the simple linear model

$$\mu = E[Y \mid X] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p, \qquad Y \sim N(\mu, \sigma^2), \qquad (2)$$

where the $Y$ are independent, to the following more general situations:

1. Response variables can follow distributions other than the Normal distribution. They can be discrete, or logical (one of two categories).
2. The relationship between the response and the explanatory variables does not have to take the simple linear form above.

GLMs are based on the exponential family of distributions, which shares many of the desirable statistical properties of the Normal distribution. Consider a single random variable $Y$ whose probability distribution depends only on a single parameter $\theta$. The distribution of $Y$ belongs to the exponential family if it can be written in the form

$$f(y; \theta) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right] \qquad (3)$$

where $a$, $b$ and $c$ are known functions. Furthermore, $\phi$ is a scale parameter that is treated as a nuisance parameter if it is unknown. If $\phi$ is known, this is an exponential-family model with canonical parameter $\theta$. Well-known distributions that belong to the exponential family include the Normal, exponential, Poisson and binomial distributions. For example, consider a discrete random variable $Y$ that follows a Poisson distribution with parameter $\lambda$. The probability function for $Y$ is

$$f(y; \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!},$$

where $y$ takes the values $0, 1, 2, \ldots$. This can be rewritten as

$$f(y; \lambda) = \exp(y \log\lambda - \lambda - \log y!) = \exp(y\theta - e^{\theta} - \log y!) \qquad (4)$$

This is exactly the form in (3) with $\theta = \log\lambda$, $\phi = 1$, $a(\phi) = 1$, $b(\theta) = \exp(\theta)$ and $c(y, \phi) = -\log y!$. Similarly, all the other distributions in the exponential family can be rewritten in the form (3).

In the case of GLMs we require an extension of the numerical methods used to estimate the parameters $\boldsymbol{\beta}$ of the linear model in (2) to a more general situation where some non-linear function, $g$, relates $E(Y_i) = \mu_i$ to the linear component $\mathbf{x}_i^T\boldsymbol{\beta}$, that is

$$g(\mu_i) = \mathbf{x}_i^T\boldsymbol{\beta},$$

where $g$ is called the link function. Estimation of the parameters is typically based on maximum likelihood. Although explicit mathematical expressions can be found for the estimators in some special cases, numerical optimisation methods are usually needed. These methods are included in most modern statistical software packages. It is not the aim of this course to go into any detail on estimation, since we will focus on the application of these models rather than their estimation. GLMs are extensively used in the analysis of binary data (e.g. logistic regression) and count data (e.g. Poisson and log-linear models). We will consider some of these applications in the following lecture. In this lecture we will introduce the basic GLM setup and assumptions.
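The notes deliberately leave estimation to software, but a small sketch may make "numerical optimisation" more concrete. The code below is a minimal, illustrative implementation of iteratively reweighted least squares (IRLS), one standard way of computing maximum likelihood estimates for a GLM, for a Poisson model with the log link, $\log\mu_i = \beta_0 + \beta_1 x_i$. The function name `fit_poisson_irls`, its stopping rule and the toy data are assumptions made for illustration only; real analyses should rely on a tested statistical package.

```python
import math


def fit_poisson_irls(x, y, max_iter=25, tol=1e-10):
    """Fit the Poisson GLM log(mu_i) = b0 + b1 * x_i by IRLS.

    Each iteration solves a 2x2 weighted least-squares problem for the
    adjusted dependent variable z with weights w (log link: w = mu).
    Returns the estimates (b0, b1).
    """
    # Start from the intercept-only fit, mu_i = mean(y) (assumes mean(y) > 0).
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        # Accumulate X^T W X = [[s00, s01], [s01, s11]] and X^T W z = [t0, t1].
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu  # adjusted dependent variable
            w = mu  # (d mu / d eta)^2 / V(mu) = mu^2 / mu for the log link
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        # Solve the 2x2 normal equations directly.
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    return b0, b1


# Hypothetical toy data: counts in two groups coded x = 0 and x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2, 3, 1, 2, 5, 7, 6, 6]
b0, b1 = fit_poisson_irls(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

With a binary covariate this is one of the special cases where explicit expressions exist: $\hat\beta_0$ is the log of the mean in group 0 and $\hat\beta_1$ is the log of the ratio of the two group means, here $\log 2 \approx 0.693$ and $\log 3 \approx 1.099$, which gives a simple check on the iteration.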

II. Basic GLM setup and assumptions

We define the generalized linear model in terms of a set of independent random variables $Y_1, Y_2, \ldots, Y_N$, each with a distribution from the exponential family. The generalized linear model has three components:

1. The random component: the response variables $Y_1, Y_2, \ldots, Y_N$ are assumed to share the same distribution from the exponential family introduced in the previous section, with $E(Y_i) = \mu_i$ and a variance that in general depends on $\mu_i$ (through the variance function described below). This part describes how the response variable is distributed.
2. The systematic component: covariates (explanatory variables) $x_1, x_2, \ldots, x_p$ produce a linear predictor $\eta_i$ given by $\eta_i = \sum_{j=1}^{p} x_{ij}\beta_j$.
3. The link function, $g$, between the random and systematic components. It describes how the covariates are related to the random component, i.e. $\eta_i = g(\mu_i)$, where $\mu_i = E(Y_i)$ and $g$ can be any monotone differentiable function.

An important aspect of generalized linear models to keep in mind is that they assume independent (or at least uncorrelated) observations. A second important assumption is that there is a single error term in the model. This corresponds to the assumption for the linear model that the only error in the model has to do with the response variable: we assume that the $X$ variables are measured without error. In the case of generalized linear models we no longer assume constant variance of the residuals, although we still need to know how the variance depends on the mean. The variance function $V(\mu_i)$ relates the variance of $Y_i$ to its mean $\mu_i$, with $\operatorname{var}(Y_i) = a(\phi)V(\mu_i)$ in the notation of (3); the form of this function is determined by the distribution that is assumed.

III. Goodness of fit and comparing models

Overview

A very important aspect of generalized linear models, and indeed of all statistical models (see earlier lectures), is to evaluate the relevance of the model for our data and how well it fits the data. In statistical terms this is referred to as goodness of fit. We are also interested in comparing different models and selecting a model that is reasonably simple but still provides a good fit.
This involves finding a balance between improving the fit on the one hand and not unnecessarily increasing the complexity of the model on the other. In a statistical modelling framework we perform hypothesis tests to compare how well two related models fit the data. In the generalized linear model framework, the two models we compare should have the same probability distribution and link function, but they can differ with regard to the number of parameters in the systematic component of the model. The simpler model, corresponding to the null hypothesis $H_0$, is therefore a special case of the more general model. If the simpler model fits the data as well as the more general model, it is retained on the grounds of parsimony and $H_0$ cannot be rejected. If the more general model provides a significantly better fit than the simpler model, then $H_0$ is rejected in favour of the alternative hypothesis $H_1$, which corresponds to the more general model.

In order to make these comparisons we need goodness-of-fit statistics that describe how well the models fit. These statistics can be based on any of a number of criteria, such as the maximum value of the log-likelihood function, the minimum value of the sum-of-squares criterion, or a composite statistic based on the residuals. If $f(y; \theta)$ is the density function for a random variable $Y$ given the parameter $\theta$, then the log-likelihood based on a set of independent observations $y_1, y_2, \ldots, y_n$ of $Y$ is defined as

$$l(\boldsymbol{\mu}; \mathbf{y}) = \sum_{i=1}^{n} \log f(y_i; \theta_i),$$

where $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_n)$. Note the subtle shift in emphasis from the density function: $f(y; \theta)$ is considered as a function of $y$ for fixed $\theta$, whereas the log-likelihood is primarily considered as a function of $\theta$ for the particular data observed, $(y_1, y_2, \ldots, y_n)$.

In order to test the hypotheses above, the sampling distributions of the goodness-of-fit statistics are required. In the following subsection we consider one such goodness-of-fit criterion, the deviance, in more detail. Finally, a word about the choice of scale for the analysis. This is an important aspect of the model selection process, although scaling problems are considerably reduced in the generalized linear model setup; the normality and constant-variance assumptions of the linear regression model, for instance, are no longer requirements. The choice of scale depends largely on the purpose for which the scale will be used.
It is also important to keep in mind that no single scale will simultaneously produce all the desired properties.

The Deviance

One way of assessing the adequacy of a model is to compare it with a more general model with the maximum number of parameters that can be estimated. This is referred to as the saturated model; in the saturated model there is essentially one parameter per observation. The deviance assesses the goodness of fit of the model under investigation by looking at the difference between the log-likelihood of the saturated model and that of the model under investigation, i.e. $l(\mathbf{b}_{\text{sat}}; \mathbf{y}) - l(\mathbf{b}; \mathbf{y})$. Here $\mathbf{b}_{\text{sat}}$ denotes the maximum likelihood estimator of the parameter vector of the saturated model, $\boldsymbol{\beta}_{\text{sat}}$, and $\mathbf{b}$ is the maximum likelihood estimator of the parameters of the model under investigation, $\boldsymbol{\beta}$. (The maximum likelihood estimator is the estimator that maximises the likelihood function.) The deviance is defined as

$$D = 2\{l(\mathbf{b}_{\text{sat}}; \mathbf{y}) - l(\mathbf{b}; \mathbf{y})\}.$$

A small deviance implies a good fit. The sampling distribution of the deviance is approximately $\chi^2(m - p, \nu)$, where $m$ is the number of parameters in the saturated model, $p$ is the number of parameters in the model under investigation and $\nu$ is the non-centrality parameter. The deviance has an exact $\chi^2$ distribution if the response variables $Y_i$ are normally distributed. In this case, however, $D$ depends on $\operatorname{var}(Y_i) = \sigma^2$, which in practice is usually unknown, and this prevents the direct use of the deviance as a goodness-of-fit statistic. For other distributions of the $Y_i$, the deviance may only be approximately chi-square, and this approximation can be very poor for limited amounts of data. For the binomial and Poisson distributions, for example, $D$ can be calculated and used directly as a goodness-of-fit statistic. If the scale parameter $\phi$ is unknown or known to have a value other than one, we use a scaled version of the deviance, $D/\phi$, called the scaled deviance.

The deviance forms the basis for most hypothesis testing for generalized linear models. Suppose we are interested in comparing the fit of two models. These models need to have the same probability distribution and the same link function. The models also need to be hierarchical, which means that the systematic component of the simpler model $M_0$ is a special case of the linear component of the more general model $M_1$. Consider the null hypothesis

$$H_0: \boldsymbol{\beta} = \boldsymbol{\beta}_0 = (\beta_1, \ldots, \beta_q)^T,$$

corresponding to model $M_0$, and a more general hypothesis

$$H_1: \boldsymbol{\beta} = \boldsymbol{\beta}_1 = (\beta_1, \ldots, \beta_p)^T,$$

corresponding to model $M_1$, with $q < p < N$. We can test $H_0$ against $H_1$ using the difference of the deviance statistics

$$\Delta D = D_0 - D_1 = 2\{l(\mathbf{b}_{\text{sat}}; \mathbf{y}) - l(\mathbf{b}_0; \mathbf{y})\} - 2\{l(\mathbf{b}_{\text{sat}}; \mathbf{y}) - l(\mathbf{b}_1; \mathbf{y})\} = 2\{l(\mathbf{b}_1; \mathbf{y}) - l(\mathbf{b}_0; \mathbf{y})\}.$$
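As a concrete, hedged illustration of the deviance and of $\Delta D$, the sketch below uses Poisson counts, for which $\phi = 1$ so that $D$ can be used directly. The log-likelihood is written in the exponential-family form (4), $l = \sum_i \{y_i\theta_i - e^{\theta_i} - \log y_i!\}$ with $\theta_i = \log\mu_i$, and the saturated model sets $\hat\mu_i = y_i$. Model $M_0$ has one common mean ($q = 1$) and $M_1$ has one mean per group ($p = 2$); for these two models the maximum likelihood fits are simply the overall and group sample means, so no iterative fitting is needed. The data and function names are hypothetical, and the p-value relies on the approximate $\chi^2(p - q)$ distribution of $\Delta D$ under $H_0$.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood in the exponential-family form (4):
    sum_i [y_i * theta_i - exp(theta_i) - log(y_i!)] with theta_i = log(mu_i)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if mi == 0.0:
            # Saturated fit with y_i = 0: y*theta - exp(theta) -> 0 as mu -> 0.
            total -= math.lgamma(yi + 1)
        else:
            theta = math.log(mi)
            total += yi * theta - math.exp(theta) - math.lgamma(yi + 1)
    return total


def deviance(y, mu):
    """D = 2 {l(b_sat; y) - l(b; y)}, where the saturated model has mu_i = y_i."""
    saturated = [float(yi) for yi in y]
    return 2.0 * (poisson_loglik(y, saturated) - poisson_loglik(y, mu))


# Hypothetical counts for two groups (coded 0 and 1).
y = [2, 3, 1, 2, 0, 5, 7, 6, 6, 4]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# M0 (q = 1): one common mean; its MLE is the overall sample mean.
mu0 = [sum(y) / len(y)] * len(y)
# M1 (p = 2): one mean per group; the MLEs are the group sample means.
group_mean = {g: sum(v for v, gi in zip(y, group) if gi == g) / group.count(g)
              for g in set(group)}
mu1 = [group_mean[g] for g in group]

d0, d1 = deviance(y, mu0), deviance(y, mu1)
delta_d = d0 - d1  # equals 2 {l(b_1; y) - l(b_0; y)}
# Approximate p-value from chi-square with p - q = 1 df: P(X > x) = erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(delta_d / 2.0))
print(f"D0 = {d0:.3f}, D1 = {d1:.3f}, delta D = {delta_d:.3f}, p = {p_value:.4f}")
```

For one degree of freedom the upper-tail probability of $\chi^2_1$ is $\operatorname{erfc}(\sqrt{x/2})$, which keeps the sketch free of external libraries; for $p - q > 1$ a chi-square survival function from a statistics library would be used instead.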

If both models describe the data well, then $D_0$ approximately follows a $\chi^2(N - q)$ distribution and $D_1$ approximately follows a $\chi^2(N - p)$ distribution. It then follows that $\Delta D$ has an approximate $\chi^2(p - q)$ distribution under certain independence assumptions. If the value of $\Delta D$ is consistent with the $\chi^2(p - q)$ distribution, we would generally choose the model $M_0$ corresponding to $H_0$ because it is simpler. If the value of $\Delta D$ is in the critical region of the $\chi^2(p - q)$ distribution, we would reject $H_0$ in favour of $H_1$ on the grounds that model $M_1$ provides a significantly better description of the data. Note, however, that this model too may not fit the data particularly well.

If the deviance can be calculated from the data, then $\Delta D$ provides a good method for hypothesis testing. The sampling distribution of $\Delta D$ is usually better approximated by the chi-squared distribution than is the sampling distribution of a single deviance.

Model checking

As was the case for the simple linear model, we should perform model checking after we have fitted a particular generalized linear model. As before, we should look at whether the model is reasonable and investigate whether the various assumptions we make when we fit and draw inference using the model are satisfied. If the checks and investigations do reveal a problem, there are a number of different solutions available to us; these were discussed in the Lecture Notes on model checking for linear models. In this section we discuss graphical techniques for detecting systematic departures from our generalized linear model, which may, for example, result from an incorrectly specified link function or variance function, or from a misspecification of the explanatory variables in the systematic component of the model. As was the case for the linear model, many of these graphical techniques entail using the residuals from the fitted model. For generalized linear models we require an extended definition of residuals, applicable to distributions other than the Normal distribution.
Many of these residuals are discussed in detail in McCullagh and Nelder (1989). They can be used to assess the various aspects of the adequacy of a fitted generalized linear model mentioned above. The residuals should be unrelated to the explanatory variables, and they can also be used to identify any unusual values that may require further investigation. Various plots of the residuals can be used to assess these properties. A systematic pattern in the residual plots can, for example, indicate an unsuitable link function, the wrong scale for one or more of the predictors, or the omission of a quadratic term in a predictor.

Widely used extended definitions of residuals for GLMs include the Pearson and deviance residuals. The Pearson residuals are simply rescaled versions of the raw (response) residuals and are defined as

$$r_P = \frac{y - \hat\mu}{\sqrt{V(\hat\mu)}}.$$

Here $V(\mu)$ is the variance function. The name comes from the fact that, for the Poisson distribution, the Pearson residual is the signed square root of each observation's contribution to the Pearson $X^2$ goodness-of-fit statistic, so that $\sum r_P^2 = X^2$. If the deviance is used as a measure of discrepancy of a generalized linear model, then each unit contributes a quantity $d_i$ to the deviance, so that $D = \sum d_i$. The deviance residual is defined as

$$r_D = \operatorname{sign}(y - \hat\mu)\sqrt{d_i}.$$

Most often, standardized versions of these residuals are used in model checking. The standardized Pearson and deviance residuals are given by

$$r_P' = \frac{y - \hat\mu}{\sqrt{\hat\phi\, V(\hat\mu)\,(1 - h)}} \qquad \text{and} \qquad r_D' = \frac{r_D}{\sqrt{\hat\phi\,(1 - h)}},$$

respectively, where $h$ is the equivalent of the leverage that we defined for the linear model. In general the deviance residual, either unstandardized or standardized, is preferred to the Pearson residual. Below we discuss a few basic plots of the (standardized) residuals that can be used to check the validity of our model.

1. Informal checks using the residuals

It is almost standard procedure to consider scatterplots of the residuals against some function of the fitted values. Scatterplots of the standardized deviance residuals against the estimated linear predictor $\hat\eta$, or against the fitted values transformed to the constant-information scale of the error distribution, are recommended for this purpose. For a few commonly used error distributions the following transformed values are recommended: $\hat\mu$ for Normal errors, $2\sqrt{\hat\mu}$ for Poisson errors, $2\sin^{-1}\sqrt{\hat\mu}$ for binomial errors and $2\log\hat\mu$ for gamma errors. The plot should be centred on zero with a constant range. Typical deviations from this pattern include curvature in the residuals as the mean changes and a systematic change in the range of the residuals with the fitted value. Curvature may arise from several causes, including the wrong choice of link function, the wrong choice of scale for one or more of the covariates, or the omission of a quadratic term in a covariate.

A second useful scatterplot is that of the residuals against an explanatory variable in the linear predictor. This plot should show the same null pattern as the plot above, and a systematic trend usually arises for the same reasons. In addition, the trend may be the result of a faulty scale in another explanatory variable that is closely correlated with the one under investigation.

The third scatterplot that we consider is the added-variable plot, which is equivalent to the partial regression plot that we considered for the linear model. It helps us to check whether an omitted explanatory variable should be included in the linear predictor. The added-variable plot for a particular candidate explanatory variable is formed by
(a) finding the unstandardized residuals for the existing generalized linear model with response variable $Y$ and any explanatory variables already included,
(b) finding the unstandardized residuals for another linear model in which the candidate explanatory variable is treated as the response, using the same linear predictor as for $Y$, and
(c) plotting the first set of residuals against the second set.
A trend in the points of this plot indicates that you might consider including the candidate variable in the model as an explanatory variable, and the shape of the trend can suggest which form of the variable to include.

2. Checking the variance function

A plot of the absolute residuals against the fitted values gives an informal check on the adequacy of the assumed variance function. An ill-chosen variance function will result in a trend in the mean.
A positive trend indicates that the current variance function is increasing too slowly with the mean, and a negative trend that it is increasing too quickly.

3. Checking the link function

An informal check involves examining the plot of the adjusted dependent variable $z$ against $\hat\eta$, the estimated linear predictor. This plot should be approximately a straight line. For link functions in the power family, an upward curvature in the plot points to a link with a higher power than that used, and a downward curvature points to a lower power. For binary data this plot is uninformative; McCullagh and Nelder (1989) discuss more formal methods for this situation. Note that checks on the link function are affected by failure to establish the correct scales for one or more of the explanatory variables in the linear predictor. This can be checked using the partial residual plots described below.

4. Checking the scales of explanatory variables

The partial residual plot is a useful tool for checking whether the correct scales have been used for the explanatory variables. In its generalized form the partial residual is defined by

$$u = z - \hat\eta + \hat\gamma x,$$

where $z$ is the adjusted dependent variable, $\hat\eta$ the fitted linear predictor and $\hat\gamma$ the parameter estimate for the explanatory variable $x$. The plot of $u$ against $x$ provides an informal check of whether the scale of $x$ is satisfactory: a correctly specified scale should result in an approximately linear plot, and the form of the plot may suggest a suitable alternative if the scale is not appropriate. Note that distortions in this plot may also occur when the scales of other explanatory variables are wrong, which may require us to look at partial residual plots for several explanatory variables.

5. Checks for outlying or influential points

We also need to check for individual points that may differ from the general pattern set by the remainder of the points. In the context of the linear model we defined the leverage $h_i$ of individual points as a way to judge their influence on the fit. We can consider such measures for GLMs as well, but we must note that a point at the extreme of an explanatory variable's range will not necessarily have a high leverage if its weight is small. Cook's distance was introduced as a measure of influence for the linear model in a previous lecture; adapted versions of Cook's distance can be used for generalized linear models. The Studentized residuals

$$e_i^* = r_i^* = \frac{Y_i - \hat Y_{(i)}}{\widehat{\mathrm{se}}(Y_i - \hat Y_{(i)})}$$

that were introduced in the context of the linear model can also be used to assess the consistency of individual points, and one-step approximations of these residuals exist that are appropriate for GLMs. To interpret the $n$ values that we get for the leverage, Cook's distance and the Studentized residuals, we need some measure of how large the extreme values would be in a sample of a given size even if no unusual points were present. Normal plots can be used for this purpose. There are two forms: the half-Normal plot and the Normal plot. We will not go into the theoretical detail underlying these plots.
We will only mention that the half-Normal plot is appropriate for non-negative quantities like the leverage and Cook's distance, while for the Studentized residuals there are two options: either a half-Normal plot of $|r^*|$ or a full Normal plot of $r^*$ itself. For either plot, the ordered values of the statistic are plotted against the expected order statistics of a (half-)Normal sample. Extreme points will appear at the extremes of the plot and may deviate from the trend indicated by the remainder of the points. Note that this trend is not necessarily linear in the case of the leverage or Cook's distances.
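The sketch below, again for a Poisson log-linear model with hypothetical data and function names, computes several of the quantities discussed above: Pearson residuals (with $V(\mu) = \mu$), deviance residuals $r_D = \operatorname{sign}(y - \hat\mu)\sqrt{d_i}$, leverages $h_i$, standardized deviance residuals $r_D/\sqrt{\hat\phi(1 - h_i)}$ with $\phi = 1$, and coordinates for a half-Normal plot. The leverages use the usual GLM hat matrix $W^{1/2}X(X^TWX)^{-1}X^TW^{1/2}$ with weights $w_i = \hat\mu_i$ for the log link; this definition is an assumption here, not taken from the notes. The plotting positions use the simple approximation $\Phi^{-1}\{(n + i)/(2n + 1)\}$ to the expected half-Normal order statistics, which is likewise an assumed choice among several.

```python
import math
from statistics import NormalDist


def unit_deviance(yi, mi):
    """Contribution d_i of one observation to the Poisson deviance D = sum d_i."""
    term = yi * math.log(yi / mi) if yi > 0 else 0.0
    return 2.0 * (term - (yi - mi))


def leverages(x, mu):
    """Diagonal of W^(1/2) X (X^T W X)^(-1) X^T W^(1/2) for a design with an
    intercept and one covariate, with Poisson/log-link weights w_i = mu_i."""
    s00 = sum(mu)
    s01 = sum(m * xi for m, xi in zip(mu, x))
    s11 = sum(m * xi * xi for m, xi in zip(mu, x))
    det = s00 * s11 - s01 * s01
    # h_i = w_i * (1, x_i) (X^T W X)^(-1) (1, x_i)^T, using the 2x2 inverse.
    return [m * (s11 - 2.0 * s01 * xi + s00 * xi * xi) / det
            for m, xi in zip(mu, x)]


# Hypothetical counts and the fitted means of a two-group Poisson model
# (each fitted mean is that group's sample mean: 8/5 and 28/5).
y = [2, 3, 1, 2, 0, 5, 7, 6, 6, 4]
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mu = [1.6] * 5 + [5.6] * 5
phi = 1.0  # the Poisson scale parameter is known to be 1

h = leverages(x, mu)
pearson = [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]  # V(mu) = mu
dev_res = [math.copysign(math.sqrt(max(unit_deviance(yi, mi), 0.0)), yi - mi)
           for yi, mi in zip(y, mu)]
std_dev_res = [r / math.sqrt(phi * (1.0 - hi)) for r, hi in zip(dev_res, h)]

# Half-Normal plot coordinates: ordered |r'_D| against approximate expected
# half-Normal order statistics Phi^{-1}((n + i) / (2n + 1)), i = 1..n.
n = len(y)
ordered = sorted(abs(r) for r in std_dev_res)
positions = [NormalDist().inv_cdf((n + i) / (2 * n + 1)) for i in range(1, n + 1)]
print("Pearson X^2 =", round(sum(r * r for r in pearson), 3))
for q, r in zip(positions, ordered):
    print(f"{q:6.3f}  {r:6.3f}")
```

With one fitted mean per group, each leverage equals one over the group size (0.2 here), and the leverages sum to the number of parameters, 2; in a design with a continuous covariate the weights $\hat\mu_i$ would make the leverages vary, which is the point made above about extreme points with small weights.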

6. Checks for correlations in the errors

An index plot of the residuals can be used to assess correlation in the residuals. If the residuals are independent, this plot should fluctuate randomly without systematic pattern. If the residuals are correlated, special modelling methods are needed.

K. Javaras and W. Vos (2002)

References

Davison, A. C. and Snell, E. J. (1991). Residuals and diagnostics. Chapter 4 of Hinkley et al. (1991).
Dobson, A. (1990). An Introduction to Generalized Linear Models (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC.
Hinkley, D. V., Reid, N. and Snell, E. J. (eds) (1991). Statistical Theory and Modelling: In Honour of Sir David Cox. London: Chapman & Hall.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). London: Chapman and Hall.