Chap 10: Diagnostics, p384


Multicollinearity (10.5, p406)

Definition: Multicollinearity exists when two or more independent variables used in a regression are moderately or highly correlated.

- When multicollinearity exists, regression results can be confusing and misleading. For example, in a multiple regression model none of the partial slopes may test significant even though the global F-test is significant, and the signs of the regression coefficients might not make sense.

- Variance inflation factors. Let R_i^2 be the R-square from regressing X_i on the remaining independent variables. Then

    Tolerance:  T_i = 1 - R_i^2

    VIF_i = 1 / T_i = 1 / (1 - R_i^2)

  Equivalently, VIF_i = [r_XX^-1]_ii, the i-th diagonal element of the inverse of the correlation matrix of the independent variables.

/*Multicollinearity*/
options ls=75;
data cigar;
  infile 'cigar.txt' firstobs=2;
  input Row co tar nicotine weight;
proc reg;
  model co = tar nicotine weight / tol vif;
  model tar = nicotine weight;
  model nicotine = tar weight;
  model weight = tar nicotine;
proc corr;
  var tar nicotine weight;
run;

Row    co    tar   nicotine   weight
  1   13.6   14.1      0.86   0.9853
  2   16.6   16.0      1.06   1.0938
  3   23.5   29.8      2.03   1.1650
  4   10.2    8.0      0.67   0.9280
  5    5.4    4.1      0.40   0.9462
  6   15.0   15.0      1.04   0.8885
  7    9.0    8.8      0.76   1.0267
  8   12.3   12.4      0.95   0.9225
  9   16.3   16.6      1.12   0.9372
 10   15.4   14.9      1.02   0.8858
 11   13.0   13.7      1.01   0.9643
 12   14.4   15.1      0.90   0.9316
 13   10.0    7.8      0.57   0.9705
 14   10.2   11.4      0.78   1.1240
 15    9.5    9.0      0.74   0.8517
 16    1.5    1.0      0.13   0.7851
 17   18.5   17.0      1.26   0.9186
 18   12.6   12.8      1.08   1.0395
 19   17.5   15.8      0.96   0.9573
 20    4.9    4.5      0.42   0.9106
 21   15.9   14.5      1.01   1.0070
 22    8.5    7.3      0.61   0.9806
 23   10.6    8.6      0.69   0.9693
 24   13.9   15.2      1.02   0.9496
 25   14.9   12.0      0.82   1.1184

Example (Cigarette)

The REG Procedure
Model: MODEL1
Dependent Variable: co

Analysis of Variance
                         Sum of        Mean
Source           DF     Squares      Square   F Value   Pr > F
Model             3   495.25781   165.08594     78.98   <.0001
Error            21    43.89259     2.09012
Corrected Total  24   539.15040

Parameter Estimates
                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|   Tolerance
Intercept    1     3.20219    3.46175      0.93     0.3655           .
tar          1     0.96257    0.24224      3.97     0.0007     0.04623
nicotine     1    -2.63166    3.90056     -0.67     0.5072     0.04566
weight       1    -0.13048    3.88534     -0.03     0.9735     0.74970

Parameter Estimates
                  Variance
Variable    DF   Inflation
Intercept    1           0
tar          1    21.63071
nicotine     1    21.89992
weight       1     1.33386

Model: MODEL2
Dependent Variable: tar

Analysis of Variance
                        Sum of        Mean
Source           DF    Squares      Square   F Value   Pr > F
Model             2  734.81601   367.40801    226.94   <.0001
Error            22   35.61759     1.61898
Corrected Total  24  770.43360

Root MSE          1.27239    R-Square   0.9538
Dependent Mean   12.21600    Adj R-Sq   0.9496

The REG Procedure
Model: MODEL3
Dependent Variable: nicotine

Analysis of Variance
                        Sum of      Mean
Source           DF    Squares    Square   F Value   Pr > F
Model             2    2.87120   1.43560    229.90   <.0001
Error            22    0.13738   0.00624
Corrected Total  24    3.00858

Root MSE          0.07902    R-Square   0.9543
Dependent Mean    0.87640    Adj R-Sq   0.9502

Model: MODEL4
Dependent Variable: weight

Analysis of Variance
                        Sum of      Mean
Source           DF    Squares    Square   F Value   Pr > F
Model             2    0.04622   0.02311      3.67   0.0421
Error            22    0.13846   0.00629
Corrected Total  24    0.18468

Root MSE          0.07933    R-Square   0.2503
Dependent Mean    0.97028    Adj R-Sq   0.1821
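The auxiliary-regression R-squares above tie directly back to the VIFs reported for MODEL1. As a quick check, here is a Python sketch (not part of the notes' SAS session; the R-square values are copied from the MODEL2-MODEL4 output above):

```python
# VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing predictor i
# on the other predictors (MODEL2-MODEL4 in the SAS output above).
vif_tar      = 1 / (1 - 0.9538)   # R-Square from MODEL2: tar on nicotine, weight
vif_nicotine = 1 / (1 - 0.9543)   # R-Square from MODEL3: nicotine on tar, weight
vif_weight   = 1 / (1 - 0.2503)   # R-Square from MODEL4: weight on tar, nicotine

print(vif_tar, vif_nicotine, vif_weight)
# approximately 21.6, 21.9, 1.33 -- matching PROC REG's Variance Inflation column
```

The small discrepancies from the printed VIFs (21.63071, 21.89992, 1.33386) come only from the R-squares being rounded to four decimals in the output.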

Pearson Correlation Coefficients, N = 25
Prob > |r| under H0: Rho=0

               tar   nicotine    weight
tar        1.00000    0.97661   0.49077
                       <.0001    0.0127
nicotine   0.97661    1.00000   0.50018
            <.0001               0.0109
weight     0.49077    0.50018   1.00000
            0.0127     0.0109

        |  1.00000   0.97661   0.49077 |
r_XX =  |  0.97661   1.00000   0.50018 |
        |  0.49077   0.50018   1.00000 |

           |  21.6307  -21.0918  -0.0659 |
r_XX^-1 =  | -21.0918   21.8999  -0.6028 |
           |  -0.0659   -0.6028   1.3339 |

Note that the diagonal elements of r_XX^-1 (21.6307, 21.8999, 1.3339) are the variance inflation factors reported by PROC REG.
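The same VIFs can be recovered by inverting the predictor correlation matrix directly. The sketch below (Python, not part of the notes; it uses a small Gauss-Jordan routine rather than SAS) inverts r_XX and reads off the diagonal:

```python
# Invert the predictor correlation matrix r_XX by Gauss-Jordan elimination
# and read the VIFs off its diagonal: VIF_i = [r_XX^-1]_ii.

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    # Augment each row with the corresponding row of the identity matrix.
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

# r_XX from the cigarette data (tar, nicotine, weight), as printed above.
r_xx = [[1.00000, 0.97661, 0.49077],
        [0.97661, 1.00000, 0.50018],
        [0.49077, 0.50018, 1.00000]]

r_inv = invert(r_xx)
vif = [r_inv[i][i] for i in range(3)]
print(vif)  # approximately [21.63, 21.90, 1.33], matching the VIF column
```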

Outliers (10.2, p390)

Deleted residuals:

    d_i = y_i - yhat_(i)

where
    y_i      = the i-th observed response
    yhat_(i) = the predicted value of the i-th response when the data for the
               i-th observation is deleted from the analysis.

Studentized deleted residuals (p396):

    s^2{d_i} = MSE_(i) / (1 - h_i)

and

    t_i = d_i / s{d_i} = e_i [ (n - p - 1) / ( SSE(1 - h_i) - e_i^2 ) ]^(1/2)

where e_i is the ordinary residual, h_i the leverage of the i-th case, and SSE = (n - p) MSE.
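As a numerical check of the last formula, this short Python sketch (not from the notes' SAS code) reproduces one RStudent value from the fast-food sales output later in this section, using the residual, leverage, and SSE reported there for observation 5:

```python
# Studentized deleted residual from the ordinary residual e_i, leverage h_i,
# SSE, n, and p, using t_i = e_i * sqrt((n - p - 1) / (SSE*(1 - h_i) - e_i^2)).
import math

def studentized_deleted(e, h, sse, n, p):
    """Studentized deleted residual for one case."""
    return e * math.sqrt((n - p - 1) / (sse * (1 - h) - e * e))

# Observation 5 of the fast-food sales example (values from the PROC REG output).
t5 = studentized_deleted(e=-10.7084, h=0.30814, sse=4194.22671, n=24, p=5)
print(t5)  # approximately -0.8606, matching the RStudent column
```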

The studentized deleted residual t_i has a distribution that is approximated by a t-distribution with (n - 1) - p d.f. The appropriate Bonferroni critical value therefore is

    t(1 - alpha/(2n); n - 1 - p)        (p396)

Here (n - 1) - p = (24 - 1) - (4 + 1) = 18 and

    t(1 - 0.025/24; 18) = 3.59213

Example

Row  city  traffic  sales  city1  city2  city3
  1    1     59.3    6.3     1      0      0
  2    1     60.3    6.6     1      0      0
  3    1     82.1    7.6     1      0      0
  4    1     32.3    3.0     1      0      0
  5    1     98.0    9.5     1      0      0
  6    1     54.1    5.9     1      0      0
  7    1     54.4    6.1     1      0      0
  8    1     51.3    5.0     1      0      0
  9    1     36.7    3.6     1      0      0
 10    2     23.6    2.8     0      1      0
 11    2     57.6    6.7     0      1      0
 12    2     44.6    5.2     0      1      0
 13    3     75.8   82.0     0      0      1
 14    3     48.3    5.0     0      0      1
 15    3     41.4    3.9     0      0      1
 16    3     52.5    5.4     0      0      1
 17    3     41.0    4.1     0      0      1
 18    3     29.6    3.1     0      0      1
 19    3     49.5    5.4     0      0      1
 20    4     73.1    8.4     0      0      0
 21    4     81.3    9.5     0      0      0
 22    4     72.4    8.7     0      0      0
 23    4     88.4   10.6     0      0      0
 24    4     23.2    3.3     0      0      0

/*outliers*/
options ls=75;
data influ;
  infile 'influ.txt' firstobs=2;
  input Row city traffic sales city1 city2 city3;
proc reg;
  model sales = city1 city2 city3 traffic / influence;
  output out=a cookd=cook h=h rstudent=tres;
proc print data=a;
  var TRes h cook;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: sales

Analysis of Variance
                         Sum of         Mean
Source           DF     Squares       Square   F Value   Pr > F
Model             4  1469.76287    367.44072      1.66   0.1996
Error            19  4194.22671    220.74877
Corrected Total  23  5663.98958

Root MSE        14.85762    R-Square   0.2595
Dependent Mean   9.07083    Adj R-Sq   0.1036
Coeff Var      163.79550

Output Statistics
                               Hat Diag      Cov
Obs    Residual    RStudent          H     Ratio     DFFITS
  1      0.1348    0.009366     0.1112    1.4742     0.0033
  2      0.0719    0.004998     0.1114    1.4747     0.0018
  3     -6.8387     -0.4984     0.1809    1.4939    -0.2342
  4      6.6324      0.4891     0.2003    1.5339     0.2447
  5    -10.7084     -0.8606     0.3081    1.5482    -0.5743
  6      1.6217      0.1129     0.1138    1.4735     0.0405
  7      1.7129      0.1192     0.1135    1.4724     0.0427
  8      1.7378      0.1213     0.1181    1.4799     0.0444
  9      5.6357      0.4079     0.1731    1.5134     0.1866
 10      4.5527      0.3791     0.3763    2.0190     0.2945
 11     -3.8850     -0.3202     0.3647    2.0048    -0.2426
 12     -0.6677     -0.0536     0.3342    1.9667    -0.0380
 13     56.4638    179.3101     0.2394    0.0000   100.6096
 14    -10.5571     -0.7589     0.1429    1.3061    -0.3098
 15     -9.1533     -0.6578     0.1489    1.3673    -0.2752
 16    -11.6812     -0.8439     0.1451    1.2625    -0.3477
 17     -8.8082     -0.6327     0.1497    1.3806    -0.2654
 18     -5.6714     -0.4141     0.1875    1.5381    -0.1990
 19    -10.5926     -0.7616     0.1430    1.3049    -0.3111
 20     -1.6668     -0.1224     0.2038    1.6389    -0.0619
 21     -3.5423     -0.2639     0.2237    1.6557    -0.1417
 22     -1.1128     -0.0817     0.2028    1.6408    -0.0412
 23     -5.0187     -0.3824     0.2548    1.6888    -0.2236
 24     11.3406      1.0336     0.4527    1.7946     0.9400
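Applying the Bonferroni outlier test to this output means flagging any case whose |RStudent| exceeds the critical value 3.59213 computed above. A minimal Python sketch (not part of the notes' SAS session; only a few representative RStudent values are copied from the table):

```python
# Bonferroni outlier test: flag cases with |t_i| > t(1 - alpha/(2n); n-1-p).
# A few RStudent values copied from the PROC REG output above.
rstudent = {1: 0.009366, 5: -0.8606, 13: 179.3101, 24: 1.0336}
critical = 3.59213   # t(1 - 0.025/24; 18) from the notes

outliers = [obs for obs, t in rstudent.items() if abs(t) > critical]
print(outliers)  # [13]: only observation 13 is declared an outlier
```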

Obs      TRes         h      cook
  1     0.009   0.11115   0.00000
  2     0.005   0.11143   0.00000
  3    -0.498   0.18091   0.01143
  4     0.489   0.20027   0.01248
  5    -0.861   0.30814   0.06688
  6     0.113   0.11384   0.00035
  7     0.119   0.11350   0.00038
  8     0.121   0.11815   0.00042
  9     0.408   0.17305   0.00728
 10     0.379   0.37626   0.01816
 11    -0.320   0.36468   0.01236
 12    -0.054   0.33424   0.00030
 13   179.310   0.23944   1.19566
 14    -0.759   0.14286   0.01963
 15    -0.658   0.14894   0.01561
 16    -0.844   0.14511   0.02455
 17    -0.633   0.14966   0.01455
 18    -0.414   0.18752   0.00828
 19    -0.762   0.14304   0.01980
 20    -0.122   0.20375   0.00081
 21    -0.264   0.22369   0.00422
 22    -0.082   0.20285   0.00036
 23    -0.382   0.25483   0.01047
 24     1.034   0.45268   0.17608

Leverage values (10.3, p398)

- h_i = leverage of the i-th observation. The leverage values are the diagonal elements of the hat matrix

    H = X(X'X)^-1 X'

- Observations with h_i > 2p/n are considered by this rule to indicate outlying cases with regard to their X values.
- Note that p/n is the average leverage, since the h_i sum to p.
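The facts above are easy to verify in the simple-regression case, where the hat diagonal has the closed form h_i = 1/n + (x_i - xbar)^2 / Sxx. A Python sketch with made-up illustrative data (not from the notes):

```python
# Leverages for simple linear regression via the closed form
#   h_i = 1/n + (x_i - xbar)^2 / Sxx.
# Checks that the h_i sum to p (= 2: intercept plus slope) and that only
# points far from xbar exceed the 2p/n rule of thumb.

def leverages(x):
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]   # hypothetical data; last point is far out in X
h = leverages(x)
p, n = 2, len(x)

print(sum(h))                                           # 2.0: leverages sum to p
flagged = [i for i, hi in enumerate(h) if hi > 2 * p / n]
print(flagged)                                          # [5]: only the outlying x
```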

Cook's distance (p402)

- A measure of the overall influence of an observation on the estimated beta coefficients. Cook's distance:

    D_i = (y_i - yhat_i)^2 / (p MSE) * h_i / (1 - h_i)^2

- Note that D_i depends on both the residual (y_i - yhat_i) and the leverage h_i.
- A large value of D_i indicates that the i-th observation has a strong influence on the estimated beta coefficients.
- Values of D_i can be compared to percentiles of the F(p, n - p) distribution. Usually an observation that falls above the 50th percentile of the F distribution is considered to be an influential observation.

In the fast-food sales example, n = 24 and p = 5, so the numerator d.f. = 5, the denominator d.f. = 24 - 5 = 19, and F_0.50(5, 19) = 0.9020.
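The formula can be checked against the most influential case in the output, observation 13. A Python sketch (not part of the notes' SAS code; residual, leverage, and MSE are copied from the PROC REG output above):

```python
# Cook's distance from the ordinary residual, leverage, p, and MSE:
#   D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2.

def cooks_distance(e, h, p, mse):
    """Cook's distance for one case."""
    return (e * e) / (p * mse) * h / (1 - h) ** 2

# Observation 13 of the fast-food sales example.
d13 = cooks_distance(e=56.4638, h=0.23944, p=5, mse=220.74877)
print(d13)  # approximately 1.1957, matching the 'cook' column (1.19566)

# D_13 exceeds the 50th percentile F(5, 19) = 0.9020 quoted above, so
# observation 13 is flagged as influential.
```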