Chapter 10 Building the Regression Model II: Diagnostics


Chapter 10 Building the Regression Model II: Diagnostics
許湘伶
Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li)
hsuhl (NUK) LR Chap 10 1 / 41

10.1 Model Adequacy for a Predictor Variable: Added-Variable Plots

This chapter takes up a number of refined diagnostics for checking the adequacy of a regression model:
- detecting an improper functional form for a predictor variable
- outliers
- influential observations
- multicollinearity

Added-Variable Plots

Recall Chapters 3 and 6: residual plots against a predictor variable check whether a curvature effect for that variable is required in the model, and residual plots against omitted predictor variables help determine whether it would be helpful to add one or more of these variables to the model.

Figure: (Chap. 3) Residual Plots for Possible Omission of Important Predictor Variable (Productivity Example).

Added-Variable Plots (cont.)

Limitation of those residual plots: they may not properly show the nature of the marginal effect of a predictor variable, given the other predictor variables in the model.

Added-variable plots (also called partial regression plots or adjusted variable plots) provide graphic information about the marginal importance of a predictor variable X_k, given the other predictor variables already in the model.

In an added-variable plot, both Y and the X_k under consideration are regressed against the other predictor variables, and residuals are obtained for each.

Added-Variable Plots (cont.)

The plot of these residuals against one another shows:
1. the marginal importance of the variable in reducing the residual variability
2. the nature of the marginal regression relation for the X_k under consideration, for possible inclusion in the regression model

Added-Variable Plots (cont.)

Illustration: the regression effect for X_1, given that X_2 is already in the model (cf. Chap. 7, coefficient of partial determination):

Regress Y on X_2: Ŷ_i(X_2) = b_0 + b_2 X_i2, with residuals e_i(Y|X_2) = Y_i − Ŷ_i(X_2)
Regress X_1 on X_2: X̂_i1(X_2) = b_0* + b_2* X_i2, with residuals e_i(X_1|X_2) = X_i1 − X̂_i1(X_2)

When e_i(Y|X_2) is regressed on e_i(X_1|X_2), the resulting R² equals R²_{Y1|2}.

Added-variable plot for predictor X_1: plot e(Y|X_2) versus e(X_1|X_2).
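The two-step construction above can be checked numerically. Below is a NumPy sketch (not from the slides; the data are synthetic stand-ins, since Table 10.1's values are not reproduced here) showing that the slope of the added-variable plot reproduces b_1 from the full regression:

```python
import numpy as np

# Synthetic two-predictor data; any such data set behaves the same way.
rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)

def fit(M, y):
    """Least squares coefficients of y on the columns of M."""
    return np.linalg.lstsq(M, y, rcond=None)[0]

Z = np.column_stack([np.ones(n), X2])   # model already containing X2
e_Y  = Y  - Z @ fit(Z, Y)               # e(Y | X2)
e_X1 = X1 - Z @ fit(Z, X1)              # e(X1 | X2)

# Slope of the added-variable plot (both residual sets have mean zero,
# so the least squares line passes through the origin):
slope = (e_X1 @ e_Y) / (e_X1 @ e_X1)

# It coincides with b1 from the full regression of Y on (1, X1, X2):
b_full = fit(np.column_stack([np.ones(n), X1, X2]), Y)
print(np.isclose(slope, b_full[1]))     # True
```

This agreement holds for any data set; it is the Frisch-Waugh result underlying added-variable plots.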

Added-Variable Plots (cont.)

Figure: Prototype Added-Variable Plots
(a) a horizontal band: X_1 contains no additional information; it is not helpful to add X_1
(b) a linear band with a nonzero slope: X_1 may be a helpful addition to the regression model already containing X_2
(c) a curvilinear band: X_1 may be helpful, and the band suggests the possible nature of the curvature effect

Added-Variable Plots (cont.)

Added-variable plots are useful for:
- showing the possible nature of the marginal relationship for a predictor variable, given the other variables already in the regression model
- showing the strength of that marginal relationship
- uncovering outlying data points that may have a strong influence in estimating the relationship of the predictor variable X_k to the response variable

Added-Variable Plots (cont.)

Figure: (a) SSE(X_2); (b) SSE(X_1, X_2)

The difference SSE(X_2) − SSE(X_1, X_2) = SSR(X_1|X_2) provides information about the marginal strength of the linear relation of X_1 to the response variable, given that X_2 is in the model.
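The extra sum of squares above equals the regression sum of squares of the added-variable plot itself, which can be confirmed with a short sketch (synthetic data, not the text's example):

```python
import numpy as np

# Synthetic stand-in data; any two-predictor data set behaves the same way.
rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)
ones = np.ones(n)

def fit(M, y):
    return np.linalg.lstsq(M, y, rcond=None)[0]

def sse(M, y):
    """Error sum of squares from regressing y on the columns of M."""
    r = y - M @ fit(M, y)
    return r @ r

sse_red  = sse(np.column_stack([ones, X2]), Y)        # SSE(X2)
sse_full = sse(np.column_stack([ones, X1, X2]), Y)    # SSE(X1, X2)
ssr_marginal = sse_red - sse_full                     # SSR(X1 | X2)

# Same quantity via the added-variable plot residuals:
Z = np.column_stack([ones, X2])
e_Y  = Y  - Z @ fit(Z, Y)
e_X1 = X1 - Z @ fit(Z, X1)
ssr_av = (e_X1 @ e_Y) ** 2 / (e_X1 @ e_X1)            # through-origin SSR
print(np.isclose(ssr_marginal, ssr_av))               # True
```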

Added-Variable Plots (cont.)

Table 10.1 (Life Insurance Example): for each manager i, average annual income X_i1 (thousand dollars), risk aversion score X_i2, and amount of life insurance carried Y_i (thousand dollars). The table also reports r_12 and the fitted first-order model Ŷ = b_0 + b_1 X_1 + b_2 X_2 (numeric values not reproduced in this transcription).

Residual plot: a linear relation for X_1 is not appropriate in the model already containing X_2.

Added-Variable Plots (cont.)

ex <- read.table("ch10ta01.txt", header=FALSE)
colnames(ex) <- c("X1", "X2", "Y")
attach(ex)
fit <- lm(Y ~ X1 + X2)
par(mfrow=c(1,3))
plot(X1, resid(fit), pch=16)
abline(0, 0, lty=2, col="gray")
## Method 1: regress the two sets of residuals directly
plot(resid(lm(Y ~ X2)) ~ resid(lm(X1 ~ X2)), col="blue", pch=16,
     xlab="e(X_1|X_2)", ylab="e(Y|X_2)")
abline(lm(resid(lm(Y ~ X2)) ~ resid(lm(X1 ~ X2))), col="red")
## Method 2: using avPlot() from the car package
library(car)
avPlot(model=lm(Y ~ X1 + X2), variable="X1")

Added-Variable Plots (cont.)

Fitted regressions of Y on X_2 and of X_1 on X_2 give Ŷ(X_2) and X̂_1(X_2) (coefficient values omitted in this transcription).

The added-variable plot passes through (0,0), and its least squares slope equals b_1 from the full model. The plot suggests that the relation between Y and X_1, given X_2, is strongly positive, with a slight concave-upward shape (cf. R²_{Y1|2}).

Conclusion: incorporate a curvilinear effect for X_1.

Added-Variable Plots (cont.)

Comments:
- An added-variable plot only suggests the nature of the functional relation in which a predictor variable should be added to the regression model; it does not provide an analytic expression of the relation.
- Added-variable plots need to be used with caution for identifying the nature of the marginal effect of a predictor variable. They may not show the proper form of the marginal effect if:
  - the functional relations for some or all of the predictor variables already in the regression model are misspecified
  - the relations of the predictor variables to the response variable are complex
  - there is high multicollinearity among the predictor variables

Added-Variable Plots (cont.)

Any fitted multiple regression function can be obtained from a sequence of fitted partial regressions. Example: given e(Y|X_2) and e(X_1|X_2), regressing the former on the latter (through the origin) yields

ê(Y|X_2) = b_1 e(X_1|X_2), i.e., Ŷ − Ŷ(X_2) = b_1 [X_1 − X̂_1(X_2)]

Substituting the fitted functions Ŷ(X_2) and X̂_1(X_2) and collecting terms recovers the full fitted model Ŷ = b_0 + b_1 X_1 + b_2 X_2.

10.2 Identifying Outlying Y Observations: Studentized Deleted Residuals

Outlying or extreme cases: the observations for these cases are well separated from the remainder of the data. They may have large residuals and can have dramatic effects on the fitted regression function.

Identifying Outlying Y Observations (cont.)

A case may be outlying with respect to its Y value, its X values, or both. Not all outlying cases have a strong influence on the fitted regression function. A basic step: determine whether the regression model under consideration is heavily influenced by one or a few cases in the data set.

Identifying Outlying Y Observations (cont.)

Two refined measures for identifying cases with outlying Y observations build on the following.

Residuals and semistudentized residuals: e_i = Y_i − Ŷ_i; e_i* = e_i / sqrt(MSE)

Hat matrix: H = X(X'X)^{-1}X'
- Ŷ = HY; e = (I − H)Y
- σ²{e} = σ²(I − H), so σ²{e_i} = σ²(1 − h_ii), where h_ii is the ith diagonal element of H
- σ{e_i, e_j} = −h_ij σ², i ≠ j
- s²{e_i} = MSE(1 − h_ii); s{e_i, e_j} = −h_ij MSE
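These hat-matrix facts are easy to verify numerically. A NumPy sketch on synthetic data (illustrative only, not the text's example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)
X = np.column_stack([np.ones(n), X1, X2])
p = X.shape[1]                          # number of regression parameters

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
Yhat = H @ Y                            # fitted values: Yhat = H Y
e = Y - Yhat                            # residuals: e = (I - H) Y
MSE = e @ e / (n - p)
h = np.diag(H)                          # leverages h_ii
s2_e = MSE * (1.0 - h)                  # s^2{e_i} = MSE (1 - h_ii)

print(np.allclose(H @ H, H))            # H is idempotent: True
print(np.isclose(np.trace(H), p))       # trace(H) = p: True
```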

Identifying Outlying Y Observations (cont.)

h_ii = X_i'(X'X)^{-1}X_i, where X_i' = [1, X_i,1, ..., X_i,p−1]

Small data set (n = 4): with the fitted model Ŷ and MSE = 574.9,

s²{e_1} = 574.9(1 − h_11)

Identifying Outlying Y Observations (cont.)

Deleted residuals: the difference between Y_i and the fitted value Ŷ_i(i) obtained when case i is omitted from the fit (the PRESS prediction error):

d_i = Y_i − Ŷ_i(i) = e_i / (1 − h_ii)

The larger h_ii is, the larger the deleted residual compared with the ordinary residual. The estimated variance of d_i is

s²{d_i} = MSE_(i) (1 + X_i'(X_(i)'X_(i))^{-1}X_i) = MSE_(i) / (1 − h_ii)

and d_i / s{d_i} ~ t((n − 1) − p).
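The identity d_i = e_i/(1 − h_ii) means deleted residuals never require n separate refits. A sketch on synthetic data comparing the formula with an actual leave-one-out refit (illustrative, not the text's example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)
X = np.column_stack([np.ones(n), X1, X2])

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = Y - H @ Y
h = np.diag(H)

i = 3                                    # any case index
mask = np.arange(n) != i
b_del = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
d_direct  = Y[i] - X[i] @ b_del          # Y_i - Yhat_i(i), explicit refit
d_formula = e[i] / (1.0 - h[i])          # e_i / (1 - h_ii), no refit
print(np.isclose(d_direct, d_formula))   # True
```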

Identifying Outlying Y Observations (cont.)

Studentized deleted residuals:

t_i = d_i / s{d_i} = e_i / sqrt(MSE_(i) (1 − h_ii))

Using the identity (n − p) MSE = (n − p − 1) MSE_(i) + e_i² / (1 − h_ii), t_i can be computed without refitting the model for each case:

t_i = e_i [ (n − p − 1) / (SSE (1 − h_ii) − e_i²) ]^{1/2}
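The closed form for t_i can likewise be checked against the definition d_i/s{d_i} computed from an explicit refit (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)
X = np.column_stack([np.ones(n), X1, X2])
p = X.shape[1]

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = Y - H @ Y
h = np.diag(H)
SSE = e @ e

# Closed form: all n studentized deleted residuals, no refitting
t = e * np.sqrt((n - p - 1) / (SSE * (1.0 - h) - e**2))

# Definition for one case, via an explicit leave-one-out refit
i = 5
mask = np.arange(n) != i
b_del = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
r_del = Y[mask] - X[mask] @ b_del
mse_del = (r_del @ r_del) / (n - 1 - p)          # MSE_(i)
d_i = Y[i] - X[i] @ b_del
t_direct = d_i / np.sqrt(mse_del / (1.0 - h[i])) # d_i / s{d_i}
print(np.isclose(t[i], t_direct))                # True
```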

Identifying Outlying Y Observations (cont.)

Test for outliers: flag cases whose studentized deleted residuals are large in absolute value. If the regression model is appropriate, so that no case is outlying, each t_i follows the t(n − p − 1) distribution. Compare each |t_i| with the Bonferroni critical value t(1 − α/(2n); n − p − 1).

Identifying Outlying Y Observations (cont.)

Body fat example: |t_13| < t(1 − α/(2n); n − p − 1) with α = .10, so case 13 is not declared an outlier. The Bonferroni procedure provides a conservative test for the presence of an outlier.

Identifying Outlying X Observations: Hat Matrix Leverage Values

Use H to identify outlying X observations:
- 0 ≤ h_ii ≤ 1 and Σ_{i=1}^n h_ii = p
- h_ii is called the leverage of case i; it measures the distance between X_i and the centroid of all the X values
- a large h_ii indicates that X_i is distant from the center of all the X observations

Identifying Outlying X Observations (cont.)

If X_i is outlying, it has a large leverage h_ii.
- Ŷ_i is a linear combination of the Y values (Ŷ = HY), with h_ii the weight of Y_i: the larger h_ii is, the more important Y_i is in determining Ŷ_i.
- The larger h_ii is, the smaller σ²{e_i} is; if h_ii = 1, then σ²{e_i} = 0.

Rule of thumb: flag h_ii > 2h̄ = 2p/n (applicable when 2p/n ≤ 1). Leverage above 0.5 is very high; leverage between 0.2 and 0.5 is moderate.
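A sketch of the 2p/n rule on synthetic data (illustrative; the asserted properties hold for any full-rank design, and leverage depends only on the X values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
X = np.column_stack([np.ones(n), X1, X2])
p = X.shape[1]

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages h_ii
flagged = np.flatnonzero(h > 2 * p / n)         # cases with outlying X's

print(np.all((h >= 0) & (h <= 1)))              # 0 <= h_ii <= 1: True
print(np.isclose(h.sum(), p))                   # leverages sum to p: True
print("flagged cases:", flagged)
```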

Identifying Outlying X Observations (cont.)

Body fat example: 2p/n = 0.30; h_33 = 0.372 and h_15,15 both exceed this cutoff, so cases 3 and 15 are outlying with respect to their X values.

Identifying Outlying X Observations (cont.)

ex <- read.table("ch07ta01.txt", header=FALSE)
colnames(ex) <- c("X1", "X2", "X3", "Y")
attach(ex)
fit <- lm(Y ~ X1 + X2 + X3)
n <- length(Y)
p <- length(coef(fit))  # number of regression parameters
plot(X2 ~ X1, pch=16)
text(X1 + 0.5, X2, labels=as.character(1:length(X1)), col="red")
hii <- hatvalues(fit)
index <- hii > 2*p/n
points(X1[index], X2[index], cex=2.0, col="blue")

Identifying Influential Cases: DFFITS, Cook's Distance, and DFBETAS Measures

DFFITS measure

Influence on a single fitted value: DFFITS measures the influence that case i has on Ŷ_i:

(DFFITS)_i = (Ŷ_i − Ŷ_i(i)) / sqrt(MSE_(i) h_ii) = t_i [h_ii / (1 − h_ii)]^{1/2}

Rule of thumb:
|DFFITS| > 1 for small to medium data sets
|DFFITS| > 2 sqrt(p/n) for large data sets

If X_i is an outlier and has a high h_ii, (DFFITS)_i will tend to be large in absolute value.
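The two expressions for (DFFITS)_i above can be verified to agree on synthetic data (illustrative sketch, not the text's example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)
X = np.column_stack([np.ones(n), X1, X2])
p = X.shape[1]

b = np.linalg.lstsq(X, Y, rcond=None)[0]
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
e = Y - X @ b
SSE = e @ e

i = 2
mask = np.arange(n) != i
b_del = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
r_del = Y[mask] - X[mask] @ b_del
mse_del = (r_del @ r_del) / (n - 1 - p)          # MSE_(i)

# Definition: change in the i-th fitted value, standardized
dffits_def = (X[i] @ b - X[i] @ b_del) / np.sqrt(mse_del * h[i])

# Closed form via the studentized deleted residual
t_i = e[i] * np.sqrt((n - p - 1) / (SSE * (1.0 - h[i]) - e[i]**2))
dffits_closed = t_i * np.sqrt(h[i] / (1.0 - h[i]))
print(np.isclose(dffits_def, dffits_closed))     # True
```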

Cook's distance

Influence on all fitted values: Cook's distance considers the influence of the ith case on all n fitted values:

D_i = Σ_{j=1}^n (Ŷ_j − Ŷ_j(i))² / (p MSE) = (Ŷ − Ŷ_(i))'(Ŷ − Ŷ_(i)) / (p MSE) = [e_i² / (p MSE)] [h_ii / (1 − h_ii)²]

D_i depends on (1) the size of e_i and (2) the leverage h_ii; a large e_i or a large h_ii increases D_i.

Rule: relate D_i to the F(p, n − p) distribution:
little influence if P(F(p, n − p) ≤ D_i) is below about 0.1 or 0.2
major influence if P(F(p, n − p) ≤ D_i) is 0.5 or more
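The algebraic shortcut for D_i (no refitting) can be checked against the definition that sweeps over all n fitted values (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)
X = np.column_stack([np.ones(n), X1, X2])
p = X.shape[1]

b = np.linalg.lstsq(X, Y, rcond=None)[0]
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
e = Y - X @ b
MSE = e @ e / (n - p)

i = 7
mask = np.arange(n) != i
b_del = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
diff = X @ b - X @ b_del                          # Yhat - Yhat_(i), all n values
D_def = (diff @ diff) / (p * MSE)                 # definition
D_closed = (e[i]**2 / (p * MSE)) * (h[i] / (1.0 - h[i])**2)
print(np.isclose(D_def, D_closed))                # True
```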

Cook's distance (cont.)

(figure)

DFBETAS

Influence on the regression coefficients: DFBETAS measures the difference between b_k and b_k(i), the estimate obtained when case i is omitted:

(DFBETAS)_k(i) = (b_k − b_k(i)) / sqrt(MSE_(i) c_kk), k = 0, 1, ..., p − 1

where c_kk is the kth diagonal element of (X'X)^{-1}.

Rule of thumb:
|DFBETAS| > 1 for small to medium data sets
|DFBETAS| > 2/sqrt(n) for large data sets
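DFBETAS needs the diagonal of (X'X)^{-1}; the leave-one-out coefficients themselves obey the standard deletion identity b − b_(i) = (X'X)^{-1} X_i e_i / (1 − h_ii), which the sketch below verifies (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X1 = rng.normal(50.0, 10.0, n)
X2 = rng.normal(30.0, 5.0, n)
Y = 2.0 + 0.5 * X1 + 1.2 * X2 + rng.normal(0.0, 2.0, n)
X = np.column_stack([np.ones(n), X1, X2])
p = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
h = np.diag(X @ XtX_inv @ X.T)
e = Y - X @ b

i = 4
mask = np.arange(n) != i
b_del = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
r_del = Y[mask] - X[mask] @ b_del
mse_del = (r_del @ r_del) / (n - 1 - p)           # MSE_(i)

c = np.diag(XtX_inv)                              # c_kk
dfbetas_i = (b - b_del) / np.sqrt(mse_del * c)    # (DFBETAS)_k(i), k = 0..p-1

# Deletion identity: b - b_(i) = (X'X)^{-1} X_i e_i / (1 - h_ii)
print(np.allclose(b - b_del, XtX_inv @ X[i] * e[i] / (1.0 - h[i])))  # True
```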

DFBETAS (cont.)

# Body fat example (Table 10.4)
> influence.measures(fit)

For each case, influence.measures() reports dfb.1_, dfb.X1, dfb.X2, dffit, cov.r, cook.d, and hat, and marks potentially influential cases with an asterisk. (The numeric output did not survive transcription.)

DFBETAS (cont.)

Some final comments:
- Analysis of outlying and influential cases is a necessary component of good regression analysis.
- It is neither automatic nor foolproof; it requires good judgment by the analyst.
- The methods described can be ineffective, for example when several outlying cases influence the fit jointly.
- Extensions of the single-case diagnostic procedures exist, but their computational requirements are heavier.

Multicollinearity Diagnostics: Variance Inflation Factor (VIF)

Some problems caused by multicollinearity:
- Adding or deleting a predictor variable changes the regression coefficients.
- The extra sum of squares for a predictor depends on which other predictors are already in the model.
- When the X_k are highly correlated with each other, s{b_k} is inflated, and the estimated regression coefficients individually may not be statistically significant.

Variance Inflation Factor (VIF) (cont.)

Informal diagnostics for multicollinearity:
1. Large changes in b_k when X_k is added or deleted, or when an observation is altered or deleted.
2. Nonsignificant results in individual tests on the regression coefficients for important predictor variables.
3. Estimated regression coefficients with an algebraic sign that is the opposite of that expected from theoretical considerations or prior experience.
4. Large correlations among the predictor variables (large entries in r_XX).
5. Wide confidence intervals for the β_k.

Variance Inflation Factor (VIF) (cont.)

Important limitations of the informal diagnostics:
- they do not provide quantitative measurements
- they may not identify the nature of the multicollinearity
- sometimes the observed behavior may occur without multicollinearity being present

Variance Inflation Factor (VIF) (cont.)

The variance inflation factor (VIF) is a formal, widely accepted method for detecting multicollinearity. It measures how much the variances of the b_k are inflated compared with when the predictor variables are not linearly related.

Illustration: the variance-covariance matrix of b is σ²{b} = σ²(X'X)^{-1}.

Variance Inflation Factor (VIF) (cont.)

Using the standardized regression model, the variance-covariance matrix of b* is

σ²{b*} = (σ*)² r_XX^{-1}

where (σ*)² is the error term variance for the transformed model. (VIF)_k is the kth diagonal element of r_XX^{-1}:

σ²{b_k*} = (σ*)² (VIF)_k = (σ*)² / (1 − R_k²)

so the VIF for b_k* is (VIF)_k = (1 − R_k²)^{-1}, k = 1, 2, ..., p − 1, where R_k² is the coefficient of determination when X_k is regressed on the p − 2 other X variables.

R_k² = 0 ⇒ (VIF)_k = 1: X_k is not linearly related to the other X's.
R_k² ≠ 0 ⇒ (VIF)_k > 1: indicates inflated variance for b_k* as a result of the intercorrelations among the X variables.
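The definitional route (regress each X_k on the others) and the matrix route (diagonal of r_XX^{-1}) give the same VIFs. An illustrative NumPy sketch with synthetic correlated predictors (not the body fat data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + 0.6 * rng.normal(size=n)    # deliberately correlated with X1
X3 = rng.normal(size=n)
P = np.column_stack([X1, X2, X3])           # predictors only, no intercept

def vif(P):
    """(VIF)_k = 1 / (1 - R_k^2), R_k^2 from regressing X_k on the other X's."""
    n, q = P.shape
    out = np.empty(q)
    for k in range(q):
        others = np.column_stack([np.ones(n), np.delete(P, k, axis=1)])
        xk = P[:, k]
        resid = xk - others @ np.linalg.lstsq(others, xk, rcond=None)[0]
        r2 = 1.0 - (resid @ resid) / np.sum((xk - xk.mean()) ** 2)
        out[k] = 1.0 / (1.0 - r2)
    return out

v = vif(P)
# Equivalent: diagonal of the inverse predictor correlation matrix r_XX^{-1}
v_corr = np.diag(np.linalg.inv(np.corrcoef(P, rowvar=False)))
print(np.allclose(v, v_corr))               # True
```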

Variance Inflation Factor (VIF) (cont.)

If X_k has a perfect linear association with the other X's, then R_k² = 1, and (VIF)_k and σ²{b_k*} are unbounded.

Rule of thumb: use the largest VIF value among all the X's as an indicator of the severity of multicollinearity:

max{VIF_1, ..., VIF_{p−1}} > 10 suggests that multicollinearity may be seriously influencing the least squares estimates.

Variance Inflation Factor (VIF) (cont.)

A mean VIF considerably larger than 1 indicates serious multicollinearity problems. Since

Σ_{k=1}^{p−1} E{(b_k* − β_k*)²} = (σ*)² Σ_{k=1}^{p−1} (VIF)_k

a large VIF implies larger expected squared differences between b_k* and β_k*. If the X's are not linearly related, each R_k² = 0 and (VIF)_k = 1, so

Σ_{k=1}^{p−1} E{(b_k* − β_k*)²} = (σ*)² (p − 1)

The ratio of the two sums is the mean VIF:

mean VIF = (σ*)² Σ_{k=1}^{p−1} (VIF)_k / [(σ*)² (p − 1)] = Σ_{k=1}^{p−1} (VIF)_k / (p − 1)

Variance Inflation Factor (VIF) (cont.)

Figure: VIF values, Body Fat Example with three X's.

VIF_3 = 105, even though the pairwise coefficients r²_13 and r²_23 are not large; R_3² = 0.990, so X_3 is strongly related to X_1 and X_2 jointly.

Variance Inflation Factor (VIF) (cont.)

Comments:
- Some programs report the tolerance 1/(VIF)_k = 1 − R_k² instead; very small tolerances (e.g., below 0.01) signal severe multicollinearity.
- Limitation: VIF cannot distinguish between several simultaneous multicollinearities.
- Other diagnostic methods exist, but they are more complex than VIF.


Prediction Intervals in the Presence of Outliers Prediction Intervals in the Presence of Outliers David J. Olive Southern Illinois University July 21, 2003 Abstract This paper presents a simple procedure for computing prediction intervals when the data

More information

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,

More information

Regression coefficients may even have a different sign from the expected.

Regression coefficients may even have a different sign from the expected. Multicolinearity Diagnostics : Some of the diagnostics e have just discussed are sensitive to multicolinearity. For example, e kno that ith multicolinearity, additions and deletions of data cause shifts

More information

Available online at (Elixir International Journal) Statistics. Elixir Statistics 49 (2012)

Available online at   (Elixir International Journal) Statistics. Elixir Statistics 49 (2012) 10108 Available online at www.elixirpublishers.com (Elixir International Journal) Statistics Elixir Statistics 49 (2012) 10108-10112 The detention and correction of multicollinearity effects in a multiple

More information

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response. Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Chapter 2 Multiple Regression (Part 4)

Chapter 2 Multiple Regression (Part 4) Chapter 2 Multiple Regression (Part 4) 1 The effect of multi-collinearity Now, we know to find the estimator (X X) 1 must exist! Therefore, n must be great or at least equal to p + 1 (WHY?) However, even

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models

The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models Int. J. Contemp. Math. Sciences, Vol. 2, 2007, no. 7, 297-307 The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models Jung-Tsung Chiang Department of Business Administration Ling

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Chapter 6 Multiple Regression

Chapter 6 Multiple Regression STAT 525 FALL 2018 Chapter 6 Multiple Regression Professor Min Zhang The Data and Model Still have single response variable Y Now have multiple explanatory variables Examples: Blood Pressure vs Age, Weight,

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

The Effect of a Single Point on Correlation and Slope

The Effect of a Single Point on Correlation and Slope Rochester Institute of Technology RIT Scholar Works Articles 1990 The Effect of a Single Point on Correlation and Slope David L. Farnsworth Rochester Institute of Technology This work is licensed under

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

ECON 450 Development Economics

ECON 450 Development Economics ECON 450 Development Economics Statistics Background University of Illinois at Urbana-Champaign Summer 2017 Outline 1 Introduction 2 3 4 5 Introduction Regression analysis is one of the most important

More information

Detecting and Assessing Data Outliers and Leverage Points

Detecting and Assessing Data Outliers and Leverage Points Chapter 9 Detecting and Assessing Data Outliers and Leverage Points Section 9.1 Background Background Because OLS estimators arise due to the minimization of the sum of squared errors, large residuals

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Regression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr

Regression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr Regression Model Specification in R/Splus and Model Diagnostics By Daniel B. Carr Note 1: See 10 for a summary of diagnostics 2: Books have been written on model diagnostics. These discuss diagnostics

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y

More information

Lecture 4: Regression Analysis

Lecture 4: Regression Analysis Lecture 4: Regression Analysis 1 Regression Regression is a multivariate analysis, i.e., we are interested in relationship between several variables. For corporate audience, it is sufficient to show correlation.

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Department of Mathematics The University of Toledo. Master of Science Degree Comprehensive Examination Applied Statistics.

Department of Mathematics The University of Toledo. Master of Science Degree Comprehensive Examination Applied Statistics. Department of Mathematics The University of Toledo Master of Science Degree Comprehensive Examination Applied Statistics April 8, 205 nstructions Do all problems. Show all of your computations. Prove all

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

3. Diagnostics and Remedial Measures

3. Diagnostics and Remedial Measures 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed where ɛ i iid N(0, σ 2 ), Y i = β 0 + β 1 X i + ɛ i i = 1, 2,..., n, β 0, β 1 and σ 2 are unknown parameters, X i s

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

The Algorithm for Multiple Outliers Detection Against Masking and Swamping Effects

The Algorithm for Multiple Outliers Detection Against Masking and Swamping Effects Int. J. Contemp. Math. Sciences, Vol. 3, 2008, no. 17, 839-859 The Algorithm for Multiple Outliers Detection Against Masking and Swamping Effects Jung-Tsung Chiang Department of Business Administration

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

MLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Checking Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1 Multiple Regression Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 12, Slide 1 Review: Matrix Regression Estimation We can solve this equation (if the inverse of X

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

STAT Checking Model Assumptions

STAT Checking Model Assumptions STAT 704 --- Checking Model Assumptions Recall we assumed the following in our model: (1) The regression relationship between the response and the predictor(s) specified in the model is appropriate (2)

More information

Chapter 9 Other Topics on Factorial and Fractional Factorial Designs

Chapter 9 Other Topics on Factorial and Fractional Factorial Designs Chapter 9 Other Topics on Factorial and Fractional Factorial Designs 許湘伶 Design and Analysis of Experiments (Douglas C. Montgomery) hsuhl (NUK) DAE Chap. 9 1 / 26 The 3 k Factorial Design 3 k factorial

More information

Single and multiple linear regression analysis

Single and multiple linear regression analysis Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model Checking/Diagnostics Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

STA 4210 Practise set 2b

STA 4210 Practise set 2b STA 410 Practise set b For all significance tests, use = 0.05 significance level. S.1. A linear regression model is fit, relating fish catch (Y, in tons) to the number of vessels (X 1 ) and fishing pressure

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Chapter 12: Multiple Regression

Chapter 12: Multiple Regression Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x

More information

CHAPTER6 LINEAR REGRESSION

CHAPTER6 LINEAR REGRESSION CHAPTER6 LINEAR REGRESSION YI-TING HWANG DEPARTMENT OF STATISTICS NATIONAL TAIPEI UNIVERSITY EXAMPLE 1 Suppose that a real-estate developer is interested in determining the relationship between family

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Chapter 2 Multiple Regression I (Part 1)

Chapter 2 Multiple Regression I (Part 1) Chapter 2 Multiple Regression I (Part 1) 1 Regression several predictor variables The response Y depends on several predictor variables X 1,, X p response {}}{ Y predictor variables {}}{ X 1, X 2,, X p

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Linear Algebra Review

Linear Algebra Review Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information