STAT 4385 Topic 06: Model Diagnostics


1 STAT 4385 Topic 06: Model Diagnostics. Xiaogang Su, Ph.D. Department of Mathematical Science, University of Texas at El Paso. Spring.

2 Outline
- Several Types of Residuals: Raw, Standardized, Studentized Residuals; Jackknife Residuals
- Assumption Checking: Normality; Independence; Homoscedasticity; Linearity
- Outlier Detection: Outlier in X-Space; Outlier in Y-Space; Influential Points
- Multicollinearity

3 Model Diagnostics in General. Once a best model is selected, the next step is model diagnostics. Model diagnostics involves three specific tasks: checking model assumptions; detecting outliers; and evaluating computational problems. Most methods are based on residuals of different types.

4 1987 Baseball Salary Data. We consider the 1987 baseball salary data, originally from the 1988 ASA (American Statistical Association) exposition competition. The data contain salary information for 263 major league hitters and 22 predictors, as listed in the following table. The response variable is the logarithm of salary. One research question of interest is: Are baseball salaries based on performance?

5 Variable Description: 1987 Baseball Salary Data

X   Name    Description             X   Name    Description
1   bat86   times at bat in 86      12  rbcr    runs-batted-in during career
2   hit86   hits in 86              13  wlkcr   walks in career
3   hr86    home runs in 86         14  leag86  league in 86
4   run86   runs in 86              15  div86   division in 86
5   rb86    runs batted in in 86    16  team86  team in 86
6   wlk86   walks in 86             17  pos86   position in 86
7   yrs     years in major league   18  puto86  put outs in 86
8   batcr   times at bat - career   19  asst86  assists in 86
9   hitcr   hits in career          20  err86   errors in 86
10  hrcr    home runs in career     21  leag87  league in 87
11  runcr   runs during career      22  team87  team in 87

6 Hoaglin and Velleman's (HV, 1995) Best Model. This baseball data set has been widely studied, and various modeling analyses using different statistical methods have been tried. Hoaglin and Velleman (HV, 1995) provided a nice overview of these analyses, and they found that the following model yields a good fit and leads to sensible interpretations:

log(salary) = β₀ + β₁ runcr/yrs + β₂ run86 + β₃ min[(yrs - 2)₊, 5] + β₄ (yrs - 7)₊ + ε,

where ε ~ N(0, σ²), and the segmentation on years is based on a player's eligibility for arbitration or free agency.

7 Residuals: Raw, Standardized, Studentized. There are several types of residuals, listed below in ascending order of preference:
- The raw residual r_i = y_i - ŷ_i mimics the error term ε_i = y_i - μ_i.
- Motivated by the fact that ε_i/σ ~ N(0, 1), the standardized residual is defined as z_i = r_i/σ̂.
- Noting that var(r_i) = σ²(1 - h_ii), the studentized residual is defined as t_i = r_i/√(σ̂²(1 - h_ii)). If the model is true, then t_i ~ N(0, 1) approximately.

In the above definitions, h_ii is the i-th diagonal element of the hat matrix (or projection matrix) H = X(XᵀX)⁻¹Xᵀ. Recall that ŷ = Hy. h_ii is also called the leverage of the i-th observation.
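
As a numerical sketch (not part of the original slides; the simulated design and all variable names are illustrative), the three residual types can be computed directly from the hat matrix with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.8, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
h = np.diag(H)                                 # leverages h_ii
yhat = H @ y                                   # fitted values
r = y - yhat                                   # raw residuals
sigma2_hat = r @ r / (n - p - 1)               # MSE estimate of sigma^2
z = r / np.sqrt(sigma2_hat)                    # standardized residuals
t = r / np.sqrt(sigma2_hat * (1 - h))          # studentized residuals

print(h.sum())   # trace(H) = p + 1, here 4
```

Since each 1 - h_ii lies strictly below 1, every studentized residual is at least as large in magnitude as the corresponding standardized residual.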

8 Deleted Residuals. In order to achieve independence between y_i and its predicted value, the prediction of y_i is calculated from the data with the i-th observation omitted; the same idea is used in obtaining PRESS. The deleted residual is defined as

e_(-i) = y_i - ŷ_(-i) = r_i/(1 - h_ii),

where ŷ_(-i) = x_iᵀ β̂_(-i) denotes the predicted value for y_i from the least squares fit on the data that leave the i-th observation out, and β̂_(-i) denotes the resulting LSE of β.

9 Jackknife Residuals. The studentized deleted residual (also called the jackknife residual) is given by

r_(-i) = r_i / √(σ̂²_(-i)(1 - h_ii)) = t_i √((n - p - 2)/(n - p - 1 - t_i²)),   (1)

where σ̂²_(-i), the estimate of σ² based on the sample without the i-th observation, can be computed via

(n - p - 2) σ̂²_(-i) = (n - p - 1) σ̂² - r_i²/(1 - h_ii).
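
The two expressions in (1) can be checked numerically. The sketch below (simulated data, illustrative names) computes the jackknife residuals once from t_i and once via the delete-one variance formula, and confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
r = y - H @ y
s2 = r @ r / (n - p - 1)
t = r / np.sqrt(s2 * (1 - h))                      # studentized residuals

# Route 1: jackknife residuals directly from t_i, as in (1)
r_jack = t * np.sqrt((n - p - 2) / (n - p - 1 - t**2))

# Route 2: via the delete-one variance estimate sigma^2_(-i)
s2_del = ((n - p - 1) * s2 - r**2 / (1 - h)) / (n - p - 2)
r_jack2 = r / np.sqrt(s2_del * (1 - h))

print(np.allclose(r_jack, r_jack2))   # True: the two routes agree
```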

10 1987 Baseball Data: Residuals from the HV Model. [Table: for selected observations, the original data (ID, y, x₁, x₂, x₃, x₄) together with the fitted value ŷ_i and the residuals e_i, z_i, t_i, and r_(-i); numeric entries not reproduced here.]

11 Facts on Jackknife Residuals.
- The jackknife residual r_(-i) is the most preferable residual for model diagnosis.
- Since σ̂²_(-i) in (1) is independent of β̂_(-i), and hence of the deleted residual, it can be verified that r_(-i) ~ t(n - p - 2) exactly if the model assumptions are correct.
- Moreover, the jackknife residuals can be easily computed using the second formula in (1).

12 Plot of Jackknife Residuals vs. Fitted Values. It is common practice to plot r_(-i) versus the predicted values ŷ_i. Since r ⊥ ŷ, r_(-i) and ŷ_i are independent of each other. If the model assumptions are valid, the jackknife residuals are expected to scatter randomly around the horizontal line y = 0. On the other hand, any systematic nonrandom pattern in the jackknife residuals may indicate a violation of the assumptions in one way or another.

13 Hypothetical Plots of r_(-i) vs. ŷ_i. [Figure: two hypothetical panels, (a) and (b), of jackknife residuals plotted against fitted values.]

14 Checking Normality. Note that r_(-i) ~ t(n - p - 2), which is approximated by N(0, 1) when n ≫ p. The histogram or the quantile-quantile (Q-Q) plot of the r_(-i)'s can be used to examine the normality assumption informally. Various goodness-of-fit tests, such as Pearson's χ² test, the Shapiro-Wilk (1965) test, or the Kolmogorov-Smirnov test, can be used to formally test for normality.
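
The Q-Q plot idea can be quantified without any plotting: sort the residuals and compare them with N(0, 1) quantiles at the usual plotting positions; a near-straight plot gives a correlation close to 1. A stdlib-only sketch (simulated residuals standing in for the jackknife residuals; all names illustrative):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(9)
resid = np.sort(rng.normal(size=200))       # stand-in for sorted jackknife residuals

# Theoretical N(0,1) quantiles at plotting positions (i + 0.5)/n
nd = NormalDist()
theo = np.array([nd.inv_cdf((i + 0.5) / 200) for i in range(200)])

# Correlation of sample vs. theoretical quantiles: the "straightness" of the Q-Q plot
qq_corr = np.corrcoef(theo, resid)[0, 1]
print(qq_corr > 0.98)   # True for normal data of this size
```

This correlation is the basis of formal probability-plot correlation tests of normality; heavy tails or skewness pull it visibly below 1.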

15 Normality Check Based on r_(-i). [Figure: (a) histogram of the jackknife residuals; (b) normal Q-Q plot of the jackknife residuals, sample quantiles against theoretical quantiles.]

16 Checking Independence. Examining the assumption of independence among the errors (or the response observations) is not an easy task, and only a few limited tests are available. However, the plausibility of independence can usually be inspected from the experimental design or the way the data are collected. One common violation of independence occurs when observations are taken as a sequence in time and hence exhibit serial correlation. Graphically, the plot of r_(-i) versus the sequence order i (or the lag plot of residuals) can be used to examine the dependence of errors. Furthermore, the runs test (Wald and Wolfowitz, 1940) provides a rough check for randomness, while the Durbin-Watson (1950; 1951) statistic and the autocorrelation function (ACF) can be used to detect autocorrelation.
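
The Durbin-Watson statistic is simple to compute by hand: it is the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below (simulated error sequences, not slide data) contrasts independent errors with AR(1) errors:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared successive differences / SSE.
    Values near 2 suggest no first-order autocorrelation;
    values well below 2 suggest positive serial correlation."""
    resid = np.asarray(resid)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(2)
iid = rng.normal(size=500)              # independent errors
ar1 = np.empty(500)                     # positively autocorrelated errors
ar1[0] = rng.normal()
for i in range(1, 500):
    ar1[i] = 0.8 * ar1[i - 1] + rng.normal()

print(round(durbin_watson(iid), 2))     # close to 2
print(round(durbin_watson(ar1), 2))     # well below 2
```

For an AR(1) process with coefficient ρ, DW is approximately 2(1 - ρ), which explains the low value in the second case.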

17 Checking Homoscedasticity. The assumption of homoscedasticity, or equal variances, can be inspected from the residual plot. For example, panel (b) of the hypothetical plot of r_(-i) vs. ŷ_i illustrates one scenario typically encountered with financial price data, where the error variance increases with the predicted value. It is interesting to note that under unequal error variances the LSE remains unbiased but is no longer BLUE. Formal tests for constant error variance include White's (1980) test, Cook and Weisberg's (1983) score test, and several others, all of which check whether the variability in e_i or e_i² can be accounted for by regressing it on the predictors X (or on the estimated mean response ŷ). Another natural approach is to incorporate an error variance function explicitly in the model, and then check whether it reduces to a constant variance.

18 Jackknife Residual Plot: 1987 Baseball Salary Data. [Figure: jackknife residuals plotted against fitted values for the HV model.]

19 Checking Linearity. Inadequacy of linearity (i.e., linearity in the regression parameters) can be a serious problem. While the residual plot provides useful diagnostic information for this problem, it generally does not supply any clue as to the true functional form. Toward this end, partial residual plots have been recommended. The i-th partial residual for X_j is defined as

r_i^(j) = y_i - (β̂₀ + β̂₁ x_i1 + ⋯ + β̂_(j-1) x_i(j-1) + β̂_(j+1) x_i(j+1) + ⋯ + β̂_p x_ip)
        = (y_i - x_iᵀ β̂) + β̂_j x_ij
        = r_i + β̂_j x_ij,   for j = 1, ..., p.

20 Partial Residual Plots. The plot of r_i^(j) versus x_ij provides a pictorial exploration of the appropriate functional form for an individual predictor X_j after accounting for the other predictors. The figure on the next page gives three hypothetical examples that lead to different diagnostic interpretations regarding the functional form of X_j: (a) X_j may not be needed in the current model; (b) X_j should be included in linear form; (c) a curvilinear form of X_j is needed.
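
A useful property of the partial residual plot, easy to verify numerically, is that the least-squares line through the plot has slope exactly β̂_j, so any remaining curvature is seen around that line. A sketch on simulated data (names illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + 2 * x1 - 0.5 * x2**2 + rng.normal(scale=0.3, size=n)  # nonlinear in x2

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ beta_hat

# Partial residuals for x2: raw residual plus x2's fitted linear contribution
pr2 = r + beta_hat[2] * x2

# The least-squares slope of pr2 on x2 recovers beta_hat[2] exactly,
# because r is orthogonal to every column of X
slope = np.polyfit(x2, pr2, 1)[0]
print(np.isclose(slope, beta_hat[2]))   # True
```

Plotting pr2 against x2 for this data would show the quadratic pattern of case (c) around that line.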

21 Hypothetical Partial Residual Plots. [Figure: three panels, (a), (b), and (c), of partial residuals r^(j) plotted against x_j.]

22 Example: Partial Residual Plots with 1987 Baseball Data. [Figure: raw residuals and partial residuals plotted against hits during career (hitcr).]

23 Partial Regression Plots. Another similar tool, the partial leverage regression plot (i.e., the added-variable plot), plots the residuals from the linear model regressing Y on the predictors other than X_j against the residuals from the linear model regressing X_j on those same predictors. The partial regression plot is interpreted in the same manner as the partial residual plot.
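
By the Frisch-Waugh-Lovell theorem, the through-the-origin slope of the added-variable plot equals the multiple-regression coefficient β̂_j exactly, which the following sketch verifies on simulated data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 2 + x1 + 3 * x2 + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Added-variable plot for x2:
# residuals of y on (1, x1) vs. residuals of x2 on (1, x1)
Z = np.column_stack([np.ones(n), x1])
ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
rx = x2 - Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]

slope = (rx @ ry) / (rx @ rx)         # through-the-origin slope of the plot
print(np.isclose(slope, beta[2]))     # True: exactly the multiple-regression coefficient
```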

24 Outlier Detection. From the perspective of sensitivity analysis, variable selection is concerned with the influence of each column of X on model estimation, while outlier detection is concerned with the influence of each row of the data. In the regression setting, an observation (a row of the data) can be an outlier in three main ways: an outlier in the X-space; an outlier in the Y-space; or an influential point that affects the estimate β̂ and the model predictions.

25 Outlier in X-Space. An observation is said to have high leverage if it is an outlier in terms of its predictor value x_i. This can be assessed by the leverage h_ii, which is closely related to the Mahalanobis distance from x_i to the center x̄ = Σᵢ x_i/n. Points with h_ii > 2(p + 1)/n are often flagged as outliers in the x-space, recalling that the average leverage is h̄ = (p + 1)/n.

26 Properties of the Leverage h_ii.
- Relation to the Mahalanobis distance from x̄: let S_X = Σᵢ (x_i - x̄)(x_i - x̄)ᵀ/(n - 1) denote the sample variance-covariance matrix of the x_i's, and let the squared Mahalanobis distance be d_i² = (x_i - x̄)ᵀ S_X⁻¹ (x_i - x̄). It can be shown that h_ii = 1/n + d_i²/(n - 1). Thus an observation with high leverage h_ii is one that is distant from the center of the points in the X-space.
- The value of h_ii ranges from 1/n to 1, with average (p + 1)/n, since

tr(H) = Σᵢ h_ii = tr{X(XᵀX)⁻¹Xᵀ} = tr{(XᵀX)⁻¹XᵀX} = tr(I_(p+1)) = p + 1.
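
Both identities above can be confirmed numerically. The sketch below (simulated predictors, illustrative names) computes the leverages from the hat matrix and checks them against the Mahalanobis-distance formula and the trace identity:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 2
Z = rng.normal(size=(n, p))                       # predictors without the intercept
X = np.column_stack([np.ones(n), Z])

h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))    # leverages h_ii

zbar = Z.mean(axis=0)
S = np.cov(Z, rowvar=False)                       # sample covariance, divisor n - 1
d2 = np.einsum('ij,jk,ik->i', Z - zbar, np.linalg.inv(S), Z - zbar)

print(np.allclose(h, 1 / n + d2 / (n - 1)))       # True: h_ii = 1/n + d_i^2/(n-1)
print(np.isclose(h.sum(), p + 1))                 # True: trace(H) = p + 1
```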

27 Outlier in Y-Space. A response observation y_i is identified as an outlier if it is sufficiently different from its predicted value. The jackknife residual r_(-i) is recommended for this assessment. Since r_(-i) ~ t(n - p - 2), the 2.5th and 97.5th percentiles of t(n - p - 2) may be used as benchmarks, albeit at the risk of multiplicity.

28 Influential Points. An observation is said to be an influential point if its removal or inclusion causes a dramatic change in the model estimates or predictions. The delete-one jackknife technique is the natural approach to this issue. Many measures have been developed, depending on the specific aspect of the fit to be examined.

29 DFBETA. DFBETA examines the influence of each observation on each β̂_j:

DFBETA_ij = (β̂_j - β̂_j(-i)) / √(σ̂²_(-i) (XᵀX)⁻¹_jj),

where β̂_j(-i) denotes the LSE of β_j without the i-th observation and (XᵀX)⁻¹_jj is the j-th diagonal element of the matrix (XᵀX)⁻¹.

30 DFFITS. DFFITS examines the influence of each observation on its own fitted value:

DFFITS_i = (ŷ_i - ŷ_(-i)) / √(σ̂²_(-i) h_ii) = r_(-i) √(h_ii/(1 - h_ii)).
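
The closed form above avoids refitting the model n times. As a check (simulated data, illustrative names), the sketch below compares it with the brute-force delete-one computation:

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

XtX = X.T @ X
beta_full = np.linalg.solve(XtX, X.T @ y)
h = np.diag(X @ np.linalg.solve(XtX, X.T))
r = y - X @ beta_full
s2 = r @ r / (n - p - 1)
t = r / np.sqrt(s2 * (1 - h))
r_jack = t * np.sqrt((n - p - 2) / (n - p - 1 - t**2))   # jackknife residuals

# Closed form: DFFITS_i = r_(-i) * sqrt(h_ii / (1 - h_ii))
dffits = r_jack * np.sqrt(h / (1 - h))

# Brute force: drop observation i, refit, compare fitted values for y_i
dffits_bf = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    resid = y[keep] - X[keep] @ b
    s2_i = resid @ resid / (n - p - 2)       # sigma^2 estimate without obs i
    dffits_bf[i] = (X[i] @ beta_full - X[i] @ b) / np.sqrt(s2_i * h[i])

print(np.allclose(dffits, dffits_bf))   # True
```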

31 Cook's Distance. The ultimate measure for detecting influential points is Cook's distance (Cook, 1977):

D_i = (β̂ - β̂_(-i))ᵀ XᵀX (β̂ - β̂_(-i)) / ((p + 1) σ̂²)
    = ‖ŷ - ŷ_(-i)‖² / ((p + 1) σ̂²)
    = (t_i²/(p + 1)) · h_ii/(1 - h_ii),

where t_i is the studentized residual. Muller and Mok (1997) studied the distribution of D_i and provided some critical values. However, the multiplicity issue remains a concern when using these critical values for outlier detection in practice. For the sake of simplicity, one may use the benchmark of 1.0 to help identify potential outliers (see Weisberg, 2005).
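
The equivalence of the quadratic-form definition and the leverage-based closed form can be verified by refitting without each observation (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 25, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)

XtX = X.T @ X
beta = np.linalg.solve(XtX, X.T @ y)
h = np.diag(X @ np.linalg.solve(XtX, X.T))
r = y - X @ beta
s2 = r @ r / (n - p - 1)
t = r / np.sqrt(s2 * (1 - h))                  # studentized residuals

# Closed form: D_i = t_i^2 / (p + 1) * h_ii / (1 - h_ii)
D = t**2 / (p + 1) * h / (1 - h)

# Brute force: quadratic form in (beta - beta_(-i)) after refitting without obs i
D_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    diff = beta - beta_i
    D_brute[i] = diff @ XtX @ diff / ((p + 1) * s2)

print(np.allclose(D, D_brute))   # True
```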

32 HV Model with 1987 Baseball Data: Potential Outliers. [Table: for flagged observations, the DFBETAs (dfb.1, dfb.x1, dfb.x2, dfb.x3, dfb.x4), DFFITS, Cook's distance (cook.d), and leverage (hat); numeric entries not reproduced here.]

33 Example: Outliers with 1987 Baseball Data. The figure on the next page provides a bubble plot of the three diagnostic measures r_(-i), h_ii, and D_i, where the size of each bubble corresponds to Cook's distance D_i. Twenty-four potential outliers are found: eleven outliers in the x-space, detected via the benchmark 2(p + 1)/n = 0.038; eleven outliers in the y-space, identified by the benchmarks t(0.025, n - p - 2) = -1.969 and t(0.975, n - p - 2) = 1.969; and two outliers (observations 92 and 252) in both the x-space and the y-space. In addition, Cook's distance indicates that observation 252 has a large influence on the regression parameter estimates, whether judged by the benchmark of 1 or by Muller and Mok's critical value.

34 Example: Bubble Plot for Outlier Detection. [Figure: bubble plot of r_(-i) against h_ii with bubble size proportional to D_i; reference lines at h = 2(p + 1)/n and at the 2.5th and 97.5th percentiles of t(n - p - 2).]

35 Multicollinearity. One common numerical issue in linear regression is multicollinearity (or collinearity), which occurs when two or more predictors in the linear model are highly correlated with each other. When multicollinearity occurs, the standard errors (SE) of some β̂_j's can be unreasonably large, leading to difficulty in model interpretation. Multicollinearity may not be a big concern, however, when the analytic goal is prediction.

36 Multicollinearity: Large SE. To see why multicollinearity leads to large SEs, a closer look reveals that

SE(β̂_j) = (s_y/s_j) √( (1 - R²_{Y|X}) / ((1 - R²_{X_j|X_(-j)}) (n - p - 1)) ),   (2)

where s_y and s_j are the sample standard deviations of y and x_j, respectively; R²_{Y|X} denotes the R² obtained by regressing Y on X; and R²_{X_j|X_(-j)} denotes the R² from regressing the j-th predictor X_j on the remaining predictors X_(-j). If X_j can be expressed as a linear combination of the other predictors, then R²_{X_j|X_(-j)} = 1 and SE(β̂_j) in (2) is infinite.
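
Formula (2) agrees exactly with the usual standard error √(σ̂² (XᵀX)⁻¹_jj), which the following sketch verifies on simulated data with two deliberately correlated predictors (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 80, 2
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)    # correlated with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + x1 + x2 + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
r = y - X @ beta
s2 = r @ r / (n - p - 1)
se_direct = np.sqrt(s2 * XtX_inv[1, 1])     # SE(beta_1) from (X'X)^{-1}

def r_squared(Z, v):
    """R^2 from regressing v on Z (Z includes an intercept column)."""
    fit = Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
    return 1 - np.sum((v - fit) ** 2) / np.sum((v - v.mean()) ** 2)

R2_y = r_squared(X, y)                                      # R^2 of Y on all of X
R2_1 = r_squared(np.column_stack([np.ones(n), x2]), x1)     # R^2 of x1 on x2
se_formula = (np.std(y, ddof=1) / np.std(x1, ddof=1)) * np.sqrt(
    (1 - R2_y) / ((1 - R2_1) * (n - p - 1)))

print(np.isclose(se_direct, se_formula))    # True
```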

37 Assessing Multicollinearity: Method I. The first method for detecting multicollinearity considers the spectral decomposition of XᵀX. Let λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_p denote the eigenvalues of XᵀX. If XᵀX is not positive definite, some of its eigenvalues are zero. If the condition number, defined as λ₁/λ_p, is very large, then multicollinearity may be present. An informal rule of thumb is that a condition number of 15 or more signals a concern, and one greater than 30 a very serious concern. In their book Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Belsley, Kuh, and Welsch (1980) suggest 10 and 100 as the points at which collinearity begins to affect, and seriously affects, the estimates.
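
A sketch of the eigenvalue-ratio diagnostic (simulated data, illustrative names; here the columns are standardized before forming XᵀX, a common practical choice, though the slide's definition applies to any X):

```python
import numpy as np

def condition_number(X):
    """lambda_1 / lambda_p of Z'Z, where Z holds the standardized columns of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    lam = np.linalg.eigvalsh(Z.T @ Z)    # eigenvalues in ascending order
    return lam[-1] / lam[0]

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # nearly a copy of x1
X_bad = np.column_stack([x1, x2])
X_good = rng.normal(size=(n, 2))

print(condition_number(X_good))   # modest: well below the concern thresholds
print(condition_number(X_bad))    # huge: far beyond the serious-concern threshold
```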

38 Assessing Multicollinearity: Method II. Another measure for detecting collinearity is the variance inflation factor (VIF),

VIF_j = 1/(1 - R²_{X_j|X_(-j)}),   for j = 1, ..., p.

In practice, a maximum VIF in excess of 10 is often taken as an indication of multicollinearity.
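
The VIF definition translates directly into code: regress each predictor on the rest and invert 1 - R². A sketch on simulated data with a built-in near-linear dependency (all names illustrative):

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - R^2_j), where R^2_j comes from regressing
    column j of X on the remaining columns (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fit = others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        r2 = 1 - np.sum((X[:, j] - fit) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)    # nearly collinear with x1 and x2
print(vif(np.column_stack([x1, x2, x3])))   # all three VIFs far exceed 10
```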

39 Why "VIF"? The name of VIF comes from the following observation. Suppose we are working with normalized or standardized data, in which case XᵀX becomes the correlation matrix R_X among the predictors. From cov(β̂) = σ² R_X⁻¹, it can be found that var(β̂_j) = σ² VIF_j. If the columns of X are independent, then R_X = I and hence var(β̂_j) = σ². Therefore, VIF_j shows how much var(β̂_j) is inflated by the multicollinearity between X_j and the remaining predictors, compared with the independent case.

40 Discussion. Thanks! Questions?


More information

Regression in R. Seth Margolis GradQuant May 31,

Regression in R. Seth Margolis GradQuant May 31, Regression in R Seth Margolis GradQuant May 31, 2018 1 GPA What is Regression Good For? Assessing relationships between variables This probably covers most of what you do 4 3.8 3.6 3.4 Person Intelligence

More information

STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM

STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM STAT212_E3 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICS & STATISTICS Term 171 Page 1 of 9 STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 2017 @ 6:00 PM Name: ID #:

More information

Nonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp

Nonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp Nonlinear Regression Summary... 1 Analysis Summary... 4 Plot of Fitted Model... 6 Response Surface Plots... 7 Analysis Options... 10 Reports... 11 Correlation Matrix... 12 Observed versus Predicted...

More information

Kutlwano K.K.M. Ramaboa. Thesis presented for the Degree of DOCTOR OF PHILOSOPHY. in the Department of Statistical Sciences Faculty of Science

Kutlwano K.K.M. Ramaboa. Thesis presented for the Degree of DOCTOR OF PHILOSOPHY. in the Department of Statistical Sciences Faculty of Science Contributions to Linear Regression Diagnostics using the Singular Value Decomposition: Measures to Identify Outlying Observations, Influential Observations and Collinearity in Multivariate Data Kutlwano

More information

Section 2 NABE ASTEF 65

Section 2 NABE ASTEF 65 Section 2 NABE ASTEF 65 Econometric (Structural) Models 66 67 The Multiple Regression Model 68 69 Assumptions 70 Components of Model Endogenous variables -- Dependent variables, values of which are determined

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Leiden University Leiden, 30 April 2018 Outline 1 Error assumptions Introduction Variance Normality 2 Residual vs error Outliers Influential observations Introduction Errors and

More information

Lecture One: A Quick Review/Overview on Regular Linear Regression Models

Lecture One: A Quick Review/Overview on Regular Linear Regression Models Lecture One: A Quick Review/Overview on Regular Linear Regression Models Outline The topics to be covered include: Model Specification Estimation(LS estimators and MLEs) Hypothesis Testing and Model Diagnostics

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

REGRESSION OUTLIERS AND INFLUENTIAL OBSERVATIONS USING FATHOM

REGRESSION OUTLIERS AND INFLUENTIAL OBSERVATIONS USING FATHOM REGRESSION OUTLIERS AND INFLUENTIAL OBSERVATIONS USING FATHOM Lindsey Bell lbell2@coastal.edu Keshav Jagannathan kjaganna@coastal.edu Department of Mathematics and Statistics Coastal Carolina University

More information

Diagnostics of Linear Regression

Diagnostics of Linear Regression Diagnostics of Linear Regression Junhui Qian October 7, 14 The Objectives After estimating a model, we should always perform diagnostics on the model. In particular, we should check whether the assumptions

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Diagnostics and Remedial Measures: An Overview

Diagnostics and Remedial Measures: An Overview Diagnostics and Remedial Measures: An Overview Residuals Model diagnostics Graphical techniques Hypothesis testing Remedial measures Transformation Later: more about all this for multiple regression W.

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Multicollinearity and A Ridge Parameter Estimation Approach

Multicollinearity and A Ridge Parameter Estimation Approach Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

Remedial Measures for Multiple Linear Regression Models

Remedial Measures for Multiple Linear Regression Models Remedial Measures for Multiple Linear Regression Models Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Remedial Measures for Multiple Linear Regression Models 1 / 25 Outline

More information

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Circle a single answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

Multiple Linear Regression

Multiple Linear Regression Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Regression Analysis By Example

Regression Analysis By Example Regression Analysis By Example Third Edition SAMPRIT CHATTERJEE New York University ALI S. HADI Cornell University BERTRAM PRICE Price Associates, Inc. A Wiley-Interscience Publication JOHN WILEY & SONS,

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Topic 19: Remedies Outline Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Regression Diagnostics Summary Check normality of the residuals

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Phd Program in Transportation. Transport Demand Modeling. MODULE 2 Multiple Linear Regression

Phd Program in Transportation. Transport Demand Modeling. MODULE 2 Multiple Linear Regression Phd Program in Transportation Transport Demand Modeling Filipe Moura MODULE 2 Multiple Linear Regression Phd in Transportation / Transport Demand Modelling 1 Outline 1. Learning objectives 2. What is MR

More information

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

Diagnostic Procedures

Diagnostic Procedures Diagnostic Procedures Joseph W. McKean Western Michigan University Simon J. Sheather Texas A&M University Abstract Diagnostic procedures are used to check the quality of a fit of a model, to verify the

More information

Polynomial Regression

Polynomial Regression Polynomial Regression Summary... 1 Analysis Summary... 3 Plot of Fitted Model... 4 Analysis Options... 6 Conditional Sums of Squares... 7 Lack-of-Fit Test... 7 Observed versus Predicted... 8 Residual Plots...

More information

Package svydiags. June 4, 2015

Package svydiags. June 4, 2015 Type Package Package svydiags June 4, 2015 Title Linear Regression Model Diagnostics for Survey Data Version 0.1 Date 2015-01-21 Author Richard Valliant Maintainer Richard Valliant Description

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

The Steps to Follow in a Multiple Regression Analysis

The Steps to Follow in a Multiple Regression Analysis ABSTRACT The Steps to Follow in a Multiple Regression Analysis Theresa Hoang Diem Ngo, Warner Bros. Home Video, Burbank, CA A multiple regression analysis is the most powerful tool that is widely used,

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Assumptions of the error term, assumptions of the independent variables

Assumptions of the error term, assumptions of the independent variables Petra Petrovics, Renáta Géczi-Papp Assumptions of the error term, assumptions of the independent variables 6 th seminar Multiple linear regression model Linear relationship between x 1, x 2,, x p and y

More information

Detection of single influential points in OLS regression model building

Detection of single influential points in OLS regression model building Analytica Chimica Acta 439 (2001) 169 191 Tutorial Detection of single influential points in OLS regression model building Milan Meloun a,,jiří Militký b a Department of Analytical Chemistry, Faculty of

More information

Chapter 12: Multiple Regression

Chapter 12: Multiple Regression Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 51 Outline 1 Matrix Expression 2 Linear and quadratic forms 3 Properties of quadratic form 4 Properties of estimates 5 Distributional properties 3 / 51 Matrix

More information

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation

More information

Module 6: Model Diagnostics

Module 6: Model Diagnostics St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 6: Model Diagnostics 6.1 Introduction............................... 1 6.2 Linear model diagnostics........................

More information

Available online at (Elixir International Journal) Statistics. Elixir Statistics 49 (2012)

Available online at   (Elixir International Journal) Statistics. Elixir Statistics 49 (2012) 10108 Available online at www.elixirpublishers.com (Elixir International Journal) Statistics Elixir Statistics 49 (2012) 10108-10112 The detention and correction of multicollinearity effects in a multiple

More information