Chapter 10 Building the Regression Model II: Diagnostics
許湘伶 (hsuhl, NUK)
Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li), LR Chap 10
10.1 Model Adequacy for a Predictor Variable - Added-Variable Plots

This chapter presents a number of refined diagnostics for checking the adequacy of a regression model:
- detecting an improper functional form for a predictor variable
- identifying outliers
- identifying influential observations
- detecting multicollinearity
Recall from Chapters 3 and 6: residual plots against a predictor variable check whether a curvature effect for that variable is required in the model, and residual plots against omitted predictor variables determine whether it would be helpful to add one or more of those variables to the model.

Figure: (Chap. 3) Residual Plots for Possible Omission of Important Predictor Variable - Productivity Example.
Limitation of ordinary residual plots: they do not properly show the nature of the marginal effect of a predictor variable, given the other predictor variables in the model.

Added-variable plots (also called partial regression plots or adjusted variable plots) provide graphic information about the marginal importance of a predictor variable X_k, given the other predictor variables already in the model.

In an added-variable plot, both Y and the X_k under consideration are regressed against the other predictor variables, and the residuals from each regression are obtained.
The plot of these two sets of residuals against each other shows:
1. the marginal importance of this variable in reducing the residual variability
2. the nature of the marginal regression relation for the X_k under consideration for possible inclusion in the regression model
Illustration: the regression effect for X_1, given that X_2 is already in the model (cf. Chap. 7, coefficients of partial determination):

Regress Y on X_2:    Ŷ_i(X_2) = b_0 + b_2 X_i2,       e_i(Y | X_2) = Y_i - Ŷ_i(X_2)
Regress X_1 on X_2:  X̂_i1(X_2) = b_0* + b_2* X_i2,    e_i(X_1 | X_2) = X_i1 - X̂_i1(X_2)

(When e_i(Y | X_2) is regressed on e_i(X_1 | X_2), the resulting R^2 equals the coefficient of partial determination R^2_{Y1|2}.)

Added-variable plot for predictor variable X_1: plot e(Y | X_2) against e(X_1 | X_2).
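The construction above can be checked numerically. The course examples use R; the following is a standalone Python sketch with made-up data (X1, X2, Y are hypothetical) and small hand-rolled matrix helpers, so only the standard library is needed. It verifies the key fact behind added-variable plots: regressing e(Y | X_2) on e(X_1 | X_2) through the origin reproduces exactly the coefficient b_1 from the full fit of Y on X_1 and X_2 (the Frisch-Waugh result).

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in Bcols] for row in A]

def inverse(A):
    # Gauss-Jordan elimination with partial pivoting; fine for tiny matrices
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        d = M[i][i]
        M[i] = [x / d for x in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def ols(X, y):
    # b = (X'X)^{-1} X'y, with X given as a list of rows
    Xt = transpose(X)
    b = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [row[0] for row in b]

# hypothetical data
X1 = [4, 7, 3, 9, 6, 5, 8, 2]
X2 = [2, 5, 7, 4, 6, 3, 8, 5]
Y  = [12, 20, 14, 24, 19, 15, 26, 9]

# full fit of Y on (1, X1, X2)
b0, b1, b2 = ols([[1.0, a, c] for a, c in zip(X1, X2)], Y)

# the two partial regressions on X2 alone
Z = [[1.0, c] for c in X2]
aY = ols(Z, Y)                      # Y regressed on X2
aX = ols(Z, X1)                     # X1 regressed on X2
e_y  = [y - (aY[0] + aY[1] * c) for y, c in zip(Y, X2)]
e_x1 = [a - (aX[0] + aX[1] * c) for a, c in zip(X1, X2)]

# slope of the added-variable plot (regression through the origin)
slope = sum(u * v for u, v in zip(e_y, e_x1)) / sum(v * v for v in e_x1)
print(slope, b1)   # the two values agree
```

This is also why the added-variable plot passes through (0, 0): both residual sets have mean zero.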
Figure: Prototype Added-Variable Plots
(a) a horizontal band: X_1 contains no additional information; it is not helpful to add X_1
(b) a linear band with a nonzero slope: X_1 may be a helpful addition to the regression model already containing X_2
(c) a curvilinear band: X_1 may be helpful, and the band suggests the possible nature of the curvature effect
Uses of added-variable plots:
- provide information about the possible nature of the marginal relationship for a predictor variable, given the other variables already in the regression model
- show the strength of that relationship
- useful for uncovering outlying data points that may have a strong influence in estimating the relationship of the predictor variable X_k to the response variable
Figure: panels (a) SSE(X_2) and (b) SSE(X_1, X_2).

The difference SSE(X_2) - SSE(X_1, X_2) = SSR(X_1 | X_2) provides information about the marginal strength of the linear relation of X_1 to the response variable, given that X_2 is in the model.
Table 10.1: for each manager i, Average Annual Income X_i1 (thousand dollars), Risk Aversion Score X_i2, and Amount of Life Insurance Carried Y_i (thousand dollars).

r_12 = ...; fitted first-order model: Ŷ = ... + ... X_1 + ... X_2

Residual plot: a linear relation for X_1 is not appropriate in the model already containing X_2.
R code for the added-variable plot (Table 10.1 data):

ex <- read.table("ch10ta01.txt", header = FALSE)
colnames(ex) <- c("X1", "X2", "Y")
attach(ex)
fit <- lm(Y ~ X1 + X2)
par(mfrow = c(1, 3))
plot(X1, resid(fit), pch = 16)
abline(0, 0, lty = 2, col = "gray")
## Method 1: build the plot from the two partial regressions
plot(resid(lm(Y ~ X2)) ~ resid(lm(X1 ~ X2)), col = "blue", pch = 16,
     xlab = "e(X1|X2)", ylab = "e(Y|X2)")
abline(lm(resid(lm(Y ~ X2)) ~ resid(lm(X1 ~ X2))), col = "red")
## Method 2: avPlot() from the car package
library(car)
avPlot(model = lm(Y ~ X1 + X2), variable = "X1")
Ŷ(X_2) = ... + ... X_2;  X̂_1(X_2) = ... + ... X_2

The added-variable plot passes through (0, 0) with slope b_1 = ...; it suggests that the relation between Y and X_1, adjusted for X_2, is strongly positive, with a slight concave-upward (curvilinear) shape. R^2_{Y1|2} = ...

Conclusion: incorporate a curvilinear effect for X_1.
Comments:
- An added-variable plot only suggests the nature of the functional relation in which a predictor variable should be added to the regression model; it does not provide an analytic expression of the relation.
- Added-variable plots need to be used with caution for identifying the nature of the marginal effect of a predictor variable. They may not show the proper form of the marginal effect when:
  - the functional relations for some or all of the predictor variables already in the regression model are misspecified
  - the relations of the predictor variables to the response variable are complex
  - there is high multicollinearity among the predictor variables
Any fitted multiple regression function can be obtained from a sequence of fitted partial regressions. Example: given e(Y | X_2) and e(X_1 | X_2), regressing e(Y | X_2) on e(X_1 | X_2) through the origin yields slope b_1, so that

Ŷ - Ŷ(X_2) = b_1 [X_1 - X̂_1(X_2)]

which, together with Ŷ(X_2) = ... + ... X_2 and X̂_1(X_2) = ... + ... X_2, recovers the fitted function Ŷ = ... + ... X_1 + ... X_2.
10.2 Identifying Outlying Y Observations - Studentized Deleted Residuals

Outlying or extreme cases: the observations for these cases are well separated from the remainder of the data. They tend to have large residuals and can have dramatic effects on the fitted model.
A case may be outlying with respect to its Y value, its X values, or both. Not all outlying cases have a strong influence on the fitted regression function.

A basic step in any analysis: determine whether the regression model under consideration is heavily influenced by one or a few cases in the data set.
Two refined measures exist for identifying cases with outlying Y observations. First, recall the residuals and semistudentized residuals:

e_i = Y_i - Ŷ_i,    e_i* = e_i / sqrt(MSE)

Hat matrix: H = X (X'X)^{-1} X'

Ŷ = H Y,    e = (I - H) Y
σ^2{e} = σ^2 (I - H)
σ^2{e_i} = σ^2 (1 - h_ii), where h_ii is the ith diagonal element of H
σ{e_i, e_j} = -h_ij σ^2 for i ≠ j
s^2{e_i} = MSE (1 - h_ii),    s{e_i, e_j} = -h_ij MSE
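These matrix facts can be checked directly. The sketch below is standalone Python with hypothetical data (the course's own code is in R); it verifies that the diagonal of H sums to p, that every leverage h_ii lies in [0, 1], and that e = (I - H)Y reproduces the ordinary residuals Y - Xb.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in Bcols] for row in A]

def inverse(A):
    # Gauss-Jordan elimination with partial pivoting; fine for tiny matrices
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        d = M[i][i]
        M[i] = [x / d for x in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def ols(X, y):
    # b = (X'X)^{-1} X'y, with X given as a list of rows
    Xt = transpose(X)
    b = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [row[0] for row in b]

# hypothetical data
X1 = [4, 7, 3, 9, 6, 5, 8, 2]
X2 = [2, 5, 7, 4, 6, 3, 8, 5]
Y  = [12, 20, 14, 24, 19, 15, 26, 9]
X  = [[1.0, a, c] for a, c in zip(X1, X2)]
n, p = len(X), len(X[0])

Xt = transpose(X)
H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)   # hat matrix
h = [H[i][i] for i in range(n)]                     # leverages h_ii

trace_ok  = abs(sum(h) - p) < 1e-9                  # sum of h_ii equals p
bounds_ok = all(-1e-12 <= v <= 1 + 1e-12 for v in h)

b = ols(X, Y)
e_fit = [y - sum(bk * xk for bk, xk in zip(b, row)) for row, y in zip(X, Y)]
e_hat = [Y[i] - sum(H[i][j] * Y[j] for j in range(n)) for i in range(n)]  # (I-H)Y
resid_ok = max(abs(u - v) for u, v in zip(e_fit, e_hat)) < 1e-9
print(trace_ok, bounds_ok, resid_ok)
```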
h_ii = X_i' (X'X)^{-1} X_i, where X_i' = [1, X_i1, ..., X_i,p-1]

Small data set illustration (n = 4): Ŷ = ...; MSE = 574.9, so for case 1:

s^2{e_1} = 574.9 (1 - h_11) = ...
Deleted residuals: the difference between Y_i and the fitted value Ŷ_i(i) obtained when the fit omits case i (the PRESS prediction error):

d_i = Y_i - Ŷ_i(i) = e_i / (1 - h_ii)

The larger h_ii is, the larger the deleted residual is compared to the ordinary residual.

Estimated variance of d_i:

s^2{d_i} = MSE_(i) (1 + X_i'(X_(i)'X_(i))^{-1} X_i) = MSE_(i) / (1 - h_ii)

and d_i / s{d_i} ~ t((n - 1) - p).
Studentized deleted residuals:

t_i = d_i / s{d_i} = e_i / sqrt(MSE_(i) (1 - h_ii))

Using the identity (n - p) MSE = (n - p - 1) MSE_(i) + e_i^2 / (1 - h_ii), each t_i can be computed from the full-model fit, without refitting:

t_i = e_i [ (n - p - 1) / (SSE (1 - h_ii) - e_i^2) ]^{1/2}

Again, the larger h_ii is, the larger the deleted residual is compared to the ordinary residual.
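Both shortcut formulas (d_i from e_i and h_ii alone, and t_i from e_i, h_ii, and SSE alone) can be confirmed against brute-force case deletion. Standalone Python sketch with hypothetical data:

```python
from math import sqrt

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in Bcols] for row in A]

def inverse(A):
    # Gauss-Jordan elimination with partial pivoting; fine for tiny matrices
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        d = M[i][i]
        M[i] = [x / d for x in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def ols(X, y):
    Xt = transpose(X)
    b = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [row[0] for row in b]

# hypothetical data
X1 = [4, 7, 3, 9, 6, 5, 8, 2]
X2 = [2, 5, 7, 4, 6, 3, 8, 5]
Y  = [12, 20, 14, 24, 19, 15, 26, 9]
X  = [[1.0, a, c] for a, c in zip(X1, X2)]
n, p = len(X), len(X[0])

b = ols(X, Y)
e = [y - sum(bk * xk for bk, xk in zip(b, row)) for row, y in zip(X, Y)]
SSE = sum(v * v for v in e)
XtX_inv = inverse(matmul(transpose(X), X))
h = [sum(X[i][r] * XtX_inv[r][c] * X[i][c] for r in range(p) for c in range(p))
     for i in range(n)]

ok = True
for i in range(n):
    # brute force: refit with case i deleted
    Xd = [row for j, row in enumerate(X) if j != i]
    Yd = [y for j, y in enumerate(Y) if j != i]
    bd = ols(Xd, Yd)
    d_brute = Y[i] - sum(bk * xk for bk, xk in zip(bd, X[i]))
    d_short = e[i] / (1 - h[i])            # deleted residual without refitting
    ok = ok and abs(d_brute - d_short) < 1e-8

    SSE_d = sum((y - sum(bk * xk for bk, xk in zip(bd, row))) ** 2
                for row, y in zip(Xd, Yd))
    MSE_d = SSE_d / (n - 1 - p)
    t_def   = d_brute / sqrt(MSE_d / (1 - h[i]))        # definition d_i / s{d_i}
    t_short = e[i] * sqrt((n - p - 1) / (SSE * (1 - h[i]) - e[i] ** 2))
    ok = ok and abs(t_def - t_short) < 1e-8
print(ok)
```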
Test for outliers: examine the cases whose studentized deleted residuals are large in absolute value. If the regression model is appropriate, so that no case is outlying, each t_i follows the t(n - p - 1) distribution. Since n residuals are screened, compare |t_i| with the Bonferroni critical value t(1 - α/(2n); n - p - 1).
Body fat example: |t_13| = ... < t(1 - α/(2n); n - p - 1) with α = .10, so case 13 is not identified as an outlier. The Bonferroni procedure provides a conservative test for the presence of an outlier.
Identifying Outlying X Observations - Hat Matrix Leverage Values

The hat matrix H can also be used for identifying outlying X observations:
- 0 ≤ h_ii ≤ 1
- Σ_{i=1}^{n} h_ii = p
- h_ii is called the leverage of case i; it measures the distance between X_i and the centroid of all the X observations
- a large h_ii means X_i is distant from the center of all X's
If X_i is outlying, case i has a large leverage h_ii.

Ŷ_i is a linear combination of the Y observations (Ŷ = HY), and h_ii is the weight of Y_i: the larger h_ii is, the more important Y_i is in determining Ŷ_i.

The larger h_ii is, the smaller σ^2{e_i} is; if h_ii = 1, then σ^2{e_i} = 0.

Rules of thumb:
1. h_ii > 2 h̄ = 2 (Σ h_ii) / n = 2p/n flags an outlying X observation (applicable when 2p/n ≤ 1)
2. very high leverage: h_ii > 0.5; moderate leverage: 0.2 ≤ h_ii ≤ 0.5
Body fat example: 2p/n = 0.30; h_33 = 0.372 and h_15,15 = ... exceed this cutoff.
R code for flagging high-leverage cases (body fat data):

ex <- read.table("ch07ta01.txt", header = FALSE)
colnames(ex) <- c("X1", "X2", "X3", "Y")
attach(ex)
fit <- lm(Y ~ X1 + X2)          # two-predictor body fat model, so p = 3
n <- length(Y)
p <- length(coef(fit))
plot(X2 ~ X1, pch = 16)
text(X1 + 0.5, X2, labels = as.character(1:n), col = "red")
hii <- hatvalues(fit)
index <- hii > 2 * p / n
points(X1[index], X2[index], cex = 2.0, col = "blue")
Identifying Influential Cases - DFFITS, Cook's Distance, and DFBETAS Measures

DFFITS measure

Influence on a single fitted value: (DFFITS)_i measures the influence that case i has on Ŷ_i:

(DFFITS)_i = (Ŷ_i - Ŷ_i(i)) / sqrt(MSE_(i) h_ii) = t_i [ h_ii / (1 - h_ii) ]^{1/2}

Rule of thumb for an influential case:
- |DFFITS| > 1 for small to medium data sets
- |DFFITS| > 2 sqrt(p/n) for large data sets

If X_i is an outlier and has a high h_ii, (DFFITS)_i will tend to be large in absolute value.
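The two expressions for (DFFITS)_i (the definition via refitting, and the formula using only t_i and h_ii) can be checked against each other. Standalone Python sketch, hypothetical data:

```python
from math import sqrt

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in Bcols] for row in A]

def inverse(A):
    # Gauss-Jordan elimination with partial pivoting; fine for tiny matrices
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        d = M[i][i]
        M[i] = [x / d for x in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def ols(X, y):
    Xt = transpose(X)
    b = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [row[0] for row in b]

X1 = [4, 7, 3, 9, 6, 5, 8, 2]
X2 = [2, 5, 7, 4, 6, 3, 8, 5]
Y  = [12, 20, 14, 24, 19, 15, 26, 9]
X  = [[1.0, a, c] for a, c in zip(X1, X2)]
n, p = len(X), len(X[0])

b = ols(X, Y)
e = [y - sum(bk * xk for bk, xk in zip(b, row)) for row, y in zip(X, Y)]
SSE = sum(v * v for v in e)
XtX_inv = inverse(matmul(transpose(X), X))
h = [sum(X[i][r] * XtX_inv[r][c] * X[i][c] for r in range(p) for c in range(p))
     for i in range(n)]

ok = True
for i in range(n):
    Xd = [row for j, row in enumerate(X) if j != i]
    Yd = [y for j, y in enumerate(Y) if j != i]
    bd = ols(Xd, Yd)
    SSE_d = sum((y - sum(bk * xk for bk, xk in zip(bd, row))) ** 2
                for row, y in zip(Xd, Yd))
    MSE_d = SSE_d / (n - 1 - p)
    yhat_full = sum(bk * xk for bk, xk in zip(b, X[i]))
    yhat_del  = sum(bk * xk for bk, xk in zip(bd, X[i]))
    dffits_def = (yhat_full - yhat_del) / sqrt(MSE_d * h[i])   # definition
    t_i = e[i] * sqrt((n - p - 1) / (SSE * (1 - h[i]) - e[i] ** 2))
    dffits_t = t_i * sqrt(h[i] / (1 - h[i]))                   # t_i-based formula
    ok = ok and abs(dffits_def - dffits_t) < 1e-8
print(ok)
```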
Cook's distance

Influence on all fitted values: Cook's distance considers the influence of the ith case on all n fitted values:

D_i = Σ_{j=1}^{n} (Ŷ_j - Ŷ_j(i))^2 / (p MSE) = (Ŷ - Ŷ_(i))'(Ŷ - Ŷ_(i)) / (p MSE)
    = [e_i^2 / (p MSE)] [h_ii / (1 - h_ii)^2]

D_i depends on two factors:
1. the size of the residual e_i
2. the leverage h_ii
The larger e_i or h_ii is, the larger D_i is.

Rule: compare D_i with the F(p, n - p) distribution:
- little apparent influence if the percentile P(F(p, n - p) ≤ D_i) is below about 0.1 or 0.2
- major influence if the percentile is about 0.5 or more
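The closed form for D_i can be checked against the definition that sums over all n fitted values. Standalone Python sketch, hypothetical data:

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in Bcols] for row in A]

def inverse(A):
    # Gauss-Jordan elimination with partial pivoting; fine for tiny matrices
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        d = M[i][i]
        M[i] = [x / d for x in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def ols(X, y):
    Xt = transpose(X)
    b = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [row[0] for row in b]

X1 = [4, 7, 3, 9, 6, 5, 8, 2]
X2 = [2, 5, 7, 4, 6, 3, 8, 5]
Y  = [12, 20, 14, 24, 19, 15, 26, 9]
X  = [[1.0, a, c] for a, c in zip(X1, X2)]
n, p = len(X), len(X[0])

b = ols(X, Y)
e = [y - sum(bk * xk for bk, xk in zip(b, row)) for row, y in zip(X, Y)]
MSE = sum(v * v for v in e) / (n - p)
XtX_inv = inverse(matmul(transpose(X), X))
h = [sum(X[i][r] * XtX_inv[r][c] * X[i][c] for r in range(p) for c in range(p))
     for i in range(n)]
fitted_full = [sum(bk * xk for bk, xk in zip(b, row)) for row in X]

ok = True
for i in range(n):
    Xd = [row for j, row in enumerate(X) if j != i]
    Yd = [y for j, y in enumerate(Y) if j != i]
    bd = ols(Xd, Yd)
    fitted_del = [sum(bk * xk for bk, xk in zip(bd, row)) for row in X]
    # definition: sum of squared shifts of all n fitted values
    D_def = sum((u - v) ** 2 for u, v in zip(fitted_full, fitted_del)) / (p * MSE)
    # closed form using only e_i and h_ii
    D_formula = (e[i] ** 2 / (p * MSE)) * (h[i] / (1 - h[i]) ** 2)
    ok = ok and abs(D_def - D_formula) < 1e-8
print(ok)
```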
DFBETAS

Influence on the regression coefficients: (DFBETAS)_k(i) measures the difference between b_k and b_k(i):

(DFBETAS)_k(i) = (b_k - b_k(i)) / sqrt(MSE_(i) c_kk),   k = 0, 1, ..., p - 1

where c_kk is the kth diagonal element of (X'X)^{-1}.

Rule of thumb for an influential case:
- |DFBETAS| > 1 for small to medium data sets
- |DFBETAS| > 2 / sqrt(n) for large data sets
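DFBETAS can be computed by refitting without each case, and the coefficient shift can also be checked against the known closed form b - b_(i) = (X'X)^{-1} X_i e_i / (1 - h_ii). Standalone Python sketch, hypothetical data:

```python
from math import sqrt

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in Bcols] for row in A]

def inverse(A):
    # Gauss-Jordan elimination with partial pivoting; fine for tiny matrices
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        d = M[i][i]
        M[i] = [x / d for x in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def ols(X, y):
    Xt = transpose(X)
    b = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [row[0] for row in b]

X1 = [4, 7, 3, 9, 6, 5, 8, 2]
X2 = [2, 5, 7, 4, 6, 3, 8, 5]
Y  = [12, 20, 14, 24, 19, 15, 26, 9]
X  = [[1.0, a, c] for a, c in zip(X1, X2)]
n, p = len(X), len(X[0])

b = ols(X, Y)
e = [y - sum(bk * xk for bk, xk in zip(b, row)) for row, y in zip(X, Y)]
XtX_inv = inverse(matmul(transpose(X), X))
h = [sum(X[i][r] * XtX_inv[r][c] * X[i][c] for r in range(p) for c in range(p))
     for i in range(n)]

ok = True
for i in range(n):
    Xd = [row for j, row in enumerate(X) if j != i]
    Yd = [y for j, y in enumerate(Y) if j != i]
    bd = ols(Xd, Yd)
    SSE_d = sum((y - sum(bk * xk for bk, xk in zip(bd, row))) ** 2
                for row, y in zip(Xd, Yd))
    MSE_d = SSE_d / (n - 1 - p)
    # DFBETAS for each coefficient, using c_kk from (X'X)^{-1}
    dfbetas = [(b[k] - bd[k]) / sqrt(MSE_d * XtX_inv[k][k]) for k in range(p)]
    # closed form for the coefficient shift caused by deleting case i
    delta = [sum(XtX_inv[k][r] * X[i][r] for r in range(p)) * e[i] / (1 - h[i])
             for k in range(p)]
    ok = ok and all(abs((b[k] - bd[k]) - delta[k]) < 1e-8 for k in range(p))
print(ok)
```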
# Body fat example (Table 10.4)
> influence.measures(fit)
Influence measures of lm(formula = Y ~ X1 + X2):
  columns: dfb.1_, dfb.X1, dfb.X2, dffit, cov.r, cook.d, hat, inf
  (R marks the potentially influential cases with "*" in the inf column)
Some final comments:
- Analysis of outlying and influential cases is a necessary component of good regression analysis.
- It is neither automatic nor foolproof; it requires good judgment by the analyst.
- The single-case methods described can be ineffective when several outlying or influential cases are present jointly.
- Extensions of the single-case diagnostic procedures to multiple-case deletion exist, but they carry heavy computational requirements.
Multicollinearity Diagnostics - VIF

Variance Inflation Factor (VIF)

Some problems caused by multicollinearity:
- adding or deleting a predictor variable changes the estimated regression coefficients
- the extra sum of squares for a variable depends on which other X_k's are already included in the model
- when the X_k are highly intercorrelated, s{b_k} is inflated, so the estimated regression coefficients individually may not be statistically significant
Informal diagnostics for multicollinearity:
1. Large changes in the b_k when a predictor variable is added or deleted, or when an observation is altered or deleted.
2. Nonsignificant results in individual tests on the regression coefficients for important predictor variables.
3. Estimated regression coefficients with an algebraic sign that is the opposite of that expected from theoretical considerations or prior experience.
4. Large coefficients of simple correlation between pairs of predictor variables in the correlation matrix r_XX.
5. Wide confidence intervals for the β_k of important predictor variables.
Important limitations of the informal diagnostics:
- they do not provide quantitative measurements
- they may not identify the nature of the multicollinearity
- sometimes the observed behavior may occur without multicollinearity being present
Variance Inflation Factor (VIF):
- a formal, widely accepted method for detecting multicollinearity
- measures how much the variances of the b_k are inflated as compared to when the predictor variables are not linearly related

Illustration: the variance-covariance matrix of b is σ^2{b} = σ^2 (X'X)^{-1}.
Using the standardized regression model, the variance-covariance matrix of the standardized coefficients b* is

σ^2{b*} = (σ*)^2 r_XX^{-1}

where (σ*)^2 is the error term variance for the transformed model. (VIF)_k is the kth diagonal element of r_XX^{-1}:

σ^2{b_k*} = (σ*)^2 (VIF)_k = (σ*)^2 / (1 - R_k^2)

VIF for b_k*:

(VIF)_k = (1 - R_k^2)^{-1},   k = 1, 2, ..., p - 1

where R_k^2 is the coefficient of multiple determination when X_k is regressed on the p - 2 other X variables in the model.

- R_k^2 = 0 gives (VIF)_k = 1: X_k is not linearly related to the other X's
- R_k^2 > 0 gives (VIF)_k > 1: an inflated variance for b_k* as a result of the intercorrelations among the X variables
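The two characterizations of VIF (the kth diagonal element of r_XX^{-1}, and 1/(1 - R_k^2)) can be checked against each other. Standalone Python sketch with hypothetical predictors; X3 is deliberately chosen close to X1 + X2, so the three variables are strongly but not perfectly intercorrelated:

```python
from math import sqrt

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in Bcols] for row in A]

def inverse(A):
    # Gauss-Jordan elimination with partial pivoting; fine for tiny matrices
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        d = M[i][i]
        M[i] = [x / d for x in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def ols(X, y):
    Xt = transpose(X)
    b = matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))
    return [row[0] for row in b]

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = sqrt(sum((x - mu) ** 2 for x in u))
    sv = sqrt(sum((x - mv) ** 2 for x in v))
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (su * sv)

# hypothetical, nearly collinear predictors (X3 is close to X1 + X2)
X1 = [4, 7, 3, 9, 6, 5, 8, 2]
X2 = [2, 5, 7, 4, 6, 3, 8, 5]
X3 = [7, 11, 9, 14, 13, 8, 15, 6]
cols = [X1, X2, X3]

r = [[corr(u, v) for v in cols] for u in cols]   # correlation matrix r_XX
rinv = inverse(r)
vif_diag = [rinv[k][k] for k in range(3)]        # VIFs as diagonal of r_XX^{-1}

def r_squared(k):
    # R_k^2 from regressing X_k on the other predictors (with intercept)
    y = cols[k]
    others = [cols[j] for j in range(3) if j != k]
    Xk = [[1.0] + [o[i] for o in others] for i in range(len(y))]
    bk = ols(Xk, y)
    fitted = [sum(u * v for u, v in zip(bk, row)) for row in Xk]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / len(y)
    ssto = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / ssto

vif_r2 = [1.0 / (1.0 - r_squared(k)) for k in range(3)]
ok = all(abs(u - v) < 1e-6 for u, v in zip(vif_diag, vif_r2))
print(vif_diag, ok)
```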
If X_k has a perfect linear association with the other X variables in the model, then R_k^2 = 1 and both (VIF)_k and σ^2{b_k*} are unbounded.

Rule: use the largest VIF value among all X variables as an indicator of the severity of multicollinearity:

max{(VIF)_1, ..., (VIF)_{p-1}} > 10
Mean VIF values considerably larger than 1 indicate serious multicollinearity problems:

E{ Σ_{k=1}^{p-1} (b_k* - β_k*)^2 } = (σ*)^2 Σ_{k=1}^{p-1} (VIF)_k

so a large Σ(VIF)_k implies larger expected differences between the b_k* and the β_k*. If the X variables are not linearly related, then R_k^2 = 0 and (VIF)_k = 1 for all k, giving

E{ Σ_{k=1}^{p-1} (b_k* - β_k*)^2 } = (σ*)^2 (p - 1)

The ratio of these two expected values is the mean VIF:

(VIF)bar = Σ_{k=1}^{p-1} (VIF)_k / (p - 1)
Figure: VIF - Body Fat Example with three X variables.

(VIF)_3 = 105, even though r_13^2 = ... and r_23^2 = ... are not large; R_3^2 = 0.990, so X_3 is strongly related to X_1 and X_2 jointly.
Comments:
- Some programs report the tolerance 1/(VIF)_k = 1 - R_k^2 instead; small tolerances (e.g., below 0.01) indicate serious multicollinearity.
- Limitation: VIF cannot distinguish between several simultaneous multicollinearities.
- Other diagnostic methods exist but are more complex than VIF.
More informationSimple Linear Regression
Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring
More informationGeneralized Linear Models
Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n
More informationThe Effect of a Single Point on Correlation and Slope
Rochester Institute of Technology RIT Scholar Works Articles 1990 The Effect of a Single Point on Correlation and Slope David L. Farnsworth Rochester Institute of Technology This work is licensed under
More informationSimple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More informationECON 450 Development Economics
ECON 450 Development Economics Statistics Background University of Illinois at Urbana-Champaign Summer 2017 Outline 1 Introduction 2 3 4 5 Introduction Regression analysis is one of the most important
More informationDetecting and Assessing Data Outliers and Leverage Points
Chapter 9 Detecting and Assessing Data Outliers and Leverage Points Section 9.1 Background Background Because OLS estimators arise due to the minimization of the sum of squared errors, large residuals
More informationRegression Analysis for Data Containing Outliers and High Leverage Points
Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain
More informationRegression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr
Regression Model Specification in R/Splus and Model Diagnostics By Daniel B. Carr Note 1: See 10 for a summary of diagnostics 2: Books have been written on model diagnostics. These discuss diagnostics
More informationLecture 10 Multiple Linear Regression
Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable
More informationLecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is
Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y
More informationLecture 4: Regression Analysis
Lecture 4: Regression Analysis 1 Regression Regression is a multivariate analysis, i.e., we are interested in relationship between several variables. For corporate audience, it is sufficient to show correlation.
More information6. Multiple Linear Regression
6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X
More informationDepartment of Mathematics The University of Toledo. Master of Science Degree Comprehensive Examination Applied Statistics.
Department of Mathematics The University of Toledo Master of Science Degree Comprehensive Examination Applied Statistics April 8, 205 nstructions Do all problems. Show all of your computations. Prove all
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More information3. Diagnostics and Remedial Measures
3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed where ɛ i iid N(0, σ 2 ), Y i = β 0 + β 1 X i + ɛ i i = 1, 2,..., n, β 0, β 1 and σ 2 are unknown parameters, X i s
More informationBiostatistics 380 Multiple Regression 1. Multiple Regression
Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)
More informationThe Algorithm for Multiple Outliers Detection Against Masking and Swamping Effects
Int. J. Contemp. Math. Sciences, Vol. 3, 2008, no. 17, 839-859 The Algorithm for Multiple Outliers Detection Against Masking and Swamping Effects Jung-Tsung Chiang Department of Business Administration
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More informationMLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project
MLR Model Checking Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en
More informationMultiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1
Multiple Regression Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 12, Slide 1 Review: Matrix Regression Estimation We can solve this equation (if the inverse of X
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationAny of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.
STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed
More informationSTAT Checking Model Assumptions
STAT 704 --- Checking Model Assumptions Recall we assumed the following in our model: (1) The regression relationship between the response and the predictor(s) specified in the model is appropriate (2)
More informationChapter 9 Other Topics on Factorial and Fractional Factorial Designs
Chapter 9 Other Topics on Factorial and Fractional Factorial Designs 許湘伶 Design and Analysis of Experiments (Douglas C. Montgomery) hsuhl (NUK) DAE Chap. 9 1 / 26 The 3 k Factorial Design 3 k factorial
More informationSingle and multiple linear regression analysis
Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics
More informationRegression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics
Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns
More informationRegression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics
Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics The session is a continuation of a version of Section 11.3 of MMD&S. It concerns
More informationLINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises
LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on
More informationUnit 11: Multiple Linear Regression
Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable
More informationSTA 4210 Practise set 2b
STA 410 Practise set b For all significance tests, use = 0.05 significance level. S.1. A linear regression model is fit, relating fish catch (Y, in tons) to the number of vessels (X 1 ) and fishing pressure
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationChapter 12: Multiple Regression
Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x
More informationCHAPTER6 LINEAR REGRESSION
CHAPTER6 LINEAR REGRESSION YI-TING HWANG DEPARTMENT OF STATISTICS NATIONAL TAIPEI UNIVERSITY EXAMPLE 1 Suppose that a real-estate developer is interested in determining the relationship between family
More informationStatistical Modelling in Stata 5: Linear Models
Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does
More informationChapter 2 Multiple Regression I (Part 1)
Chapter 2 Multiple Regression I (Part 1) 1 Regression several predictor variables The response Y depends on several predictor variables X 1,, X p response {}}{ Y predictor variables {}}{ X 1, X 2,, X p
More informationBusiness Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'
Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where
More informationLinear Algebra Review
Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More information