Multicollinearity Exercise


Use the attached SAS output to answer the questions. [OPTIONAL: Copy the SAS program below into the SAS editor window and run it.] You do not need to submit any output, so there is no need to print anything.

1. Identify which variables are key participants in the most serious near-linear dependency in the data. Hint: Look at the Variance Decomposition Proportions associated with the smallest eigenvalue of X'X.
2. Which variable has the wrong sign for its coefficient in this regression? Explain why its sign is wrong.
3. What is the smallest value of the ridge constant (k) that fixes the sign of the coefficient you named in #2?
4. What is the smallest value of the ridge constant (k) that reduces all VIFs so that they are below the guideline of 10?
5. What is the smallest value of k that seems (in your opinion) to stabilize the coefficients?
6. If one principal component is removed, give the estimated coefficients for X1, X2, X3, X4. Does this fix the one with the wrong sign?
*******************************************************************
************       LAW SCHOOL ADMISSION DATA      ******************
****************   PARTLY FROM PAGE 599 OF SMITH   ***************
*******************************************************************;
**** DATA FOR 20 STUDENTS ******
     Y  IS THE LAW SCHOOL GPA
     X1 IS THE UNDERGRADUATE SCHOOL GPA
     X2 IS THE LMAT PERCENTILE
     X3 IS A RATING OF THE UNDERGRADUATE SCHOOL QUALITY
     X4 IS THE GRE SCORE;
DATA LAW;
  INPUT Y X1 X2 X3 X4 NO $;
CARDS;
3.42 3.28 .96  6 1330 1
3.60 3.18 .97  7 1370 2
3.28 2.89 .93  5 1140 3
3.75 3.72 .99  8 1520 4
3.36 3.18 .95  6 1270 5
3.96 3.50 .98  8 1450 6
3.31 3.04 .94  5 1200 7
3.33 3.87 .95  5 1340 8
3.60 3.54 .96  7 1350 9
4.00 3.27 .99 10 1480 a
3.28 3.30 .95  5 1280 b
3.44 3.29 .91  7 1080 c
3.25 3.17 .93  5 1170 d
3.75 3.62 .97  8 1410 e
3.30 3.34 .96  5 1330 f
3.20 3.08 .90  4 1010 g
3.50 3.37 .96  6 1340 h
3.28 3.16 .94  5 1220 i
3.17 3.20 .95  4 1270 j
3.31 3.10 .94  5 1210 k
;
TITLE 'LAW SCHOOL ADMISSIONS DATA';
PROC CORR;
  VAR Y X1 X2 X3 X4;
PROC REG;
  MODEL Y=X1 X2 X3 X4 / COLLIN VIF;
PROC REG RIDGE=0 TO .01 BY .001 OUTEST=B;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
PROC PLOT;
  PLOT (X1 X2 X3 X4) * _RIDGE_ / VREF=0 VPOS=25 HPOS=45;
PROC REG DATA=LAW RIDGE=0 TO .01 BY .001 OUTEST=C OUTVIF;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
PROC REG DATA=LAW PCOMIT=1 2 3 OUTEST=C;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
run;

The CORR Procedure

5 Variables: Y X1 X2 X3 X4

Simple Statistics

Variable    N       Mean     Std Dev         Sum     Minimum     Maximum
Y          20    3.45450     0.24522    69.09000     3.17000     4.00000
X1         20    3.30500     0.24150    66.10000     2.89000     3.87000
X2         20    0.95150     0.02346    19.03000     0.90000     0.99000
X3         20    6.05000     1.57196   121.00000     4.00000    10.00000
X4         20       1289   131.15981       25770        1010        1520

Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

              Y         X1         X2         X3         X4
Y       1.00000    0.47331    0.76094    0.95925    0.76574
                    0.0350     <.0001     <.0001     <.0001

X1      0.47331    1.00000    0.52911    0.42078    0.65377
         0.0350                0.0164     0.0647     0.0018

X2      0.76094    0.52911    1.00000    0.69724    0.98781
         <.0001     0.0164                0.0006     <.0001

X3      0.95925    0.42078    0.69724    1.00000    0.69983
         <.0001     0.0647     0.0006                0.0006

X4      0.76574    0.65377    0.98781    0.69983    1.00000
         <.0001     0.0018     <.0001     0.0006
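The X2-X4 correlation is the entry in this table that signals trouble, and it can be recomputed by hand as a check. The sketch below is only an illustration: it assumes the 20 X2 and X4 values are as listed in the DATA step, and the helper name `pearson` is made up here. The last line applies 1/(1 - r^2), which would be the VIF if X2 and X4 were the only two predictors, so it is a lower-bound intuition rather than the VIF that PROC REG reports for the four-predictor model.

```python
import math

# X2 and X4 for the 20 students, in DATA-step order (assumed transcription).
x2 = [0.96, 0.97, 0.93, 0.99, 0.95, 0.98, 0.94, 0.95, 0.96, 0.99,
      0.95, 0.91, 0.93, 0.97, 0.96, 0.90, 0.96, 0.94, 0.95, 0.94]
x4 = [1330, 1370, 1140, 1520, 1270, 1450, 1200, 1340, 1350, 1480,
      1280, 1080, 1170, 1410, 1330, 1010, 1340, 1220, 1270, 1210]


def pearson(u, v):
    """Pearson correlation from centered sums of squares and cross-products."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


r = pearson(x2, x4)
# VIF for a model containing only these two predictors: 1 / (1 - r^2).
vif_pair = 1.0 / (1.0 - r * r)
print(f"r(X2, X4) = {r:.5f}   two-predictor VIF = {vif_pair:.1f}")
```

Even this two-variable figure is well above the usual guideline, and the full-model VIFs in the REG output are far larger because X1 also takes part in the dependency.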

The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read    20
Number of Observations Used    20

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4           1.07143        0.26786      56.54    <.0001
Error              15           0.07106        0.00474
Corrected Total    19           1.14249

Root MSE          0.06883    R-Square    0.9378
Dependent Mean    3.45450    Adj R-Sq    0.9212
Coeff Var         1.99243

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept     1              -2.37864          24.38266      -0.10      0.9236                     0
X1            1               0.12572           0.64288       0.20      0.8476              96.67364
X2            1               6.05826          32.14872       0.19      0.8531            2280.94358
X3            1               0.12977           0.01417       9.16      <.0001                1.9905
X4            1           -0.00087848           0.00646      -0.14      0.8937            2880.98384

Collinearity Diagnostics

                                         ---------------- Proportion of Variation ----------------
Number    Eigenvalue     Condition Index    Intercept     X1            X2             X3         X4
1         4.95347                1.00000    1.6469E-8     0.0000022     1.027655E-8    0.00120    1.375895E-7
2         0.04096                10.9976    0.000005      0.00006334    4.377599E-7    0.56236    6.249609E-8
3         0.00348               37.70759    0.00003532    0.00216       0.0000063      0.31407    0.00041449
4         0.00209                 48.668    3.79043E-7    0.01504       0.00000729     0.11173    0.00052977
5         1.475688E-7            5793.72    0.99996       0.98274       0.99999        0.01064    0.99906
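The Condition Index column follows from the Eigenvalue column alone: each index is the square root of the largest eigenvalue divided by the eigenvalue in that row. A minimal sketch, assuming the five eigenvalues are as printed in the Collinearity Diagnostics table (the function name `condition_indices` is illustrative, not a SAS or library routine):

```python
import math

# Eigenvalues of the scaled X'X matrix, copied from the table (assumed values).
eigenvalues = [4.95347, 0.04096, 0.00348, 0.00209, 1.475688e-7]


def condition_indices(eigs):
    """Condition index for each eigenvalue: sqrt(largest eigenvalue / eigenvalue)."""
    largest = max(eigs)
    return [math.sqrt(largest / e) for e in eigs]


indices = condition_indices(eigenvalues)
for number, (eig, index) in enumerate(zip(eigenvalues, indices), start=1):
    print(f"{number}   {eig:<14g}{index:12.3f}")
```

Because the printed eigenvalues are rounded, the recomputed indices agree with the table only to about three significant digits. The point of the check is the order of magnitude of the last row, which is the one the hint in question 1 refers to.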


Obs   _MODEL_   _TYPE_   _DEPVAR_   _RIDGE_    _RMSE_     Intercept    X1         X2         X3         X4             Y
  1   MODEL1    PARMS    Y            .        0.068829   -2.37864     0.12572    6.05826    0.12977    -.000878480    -1
  2   MODEL1    RIDGE    Y          0.000      0.068829   -2.37864     0.12572    6.05826    0.12977    -.000878480    -1
  3   MODEL1    RIDGE    Y          0.001      0.068871    0.89844     0.03989    1.73602    0.12932    -.000007758    -1
  4   MODEL1    RIDGE    Y          0.002      0.068880    1.17880     0.03249    1.36524    0.12907    0.000068635    -1
  5   MODEL1    RIDGE    Y          0.003      0.068886    1.28047     0.02977    1.23008    0.12883    0.000097650    -1
  6   MODEL1    RIDGE    Y          0.004      0.068892    1.33140     0.02838    1.16182    0.12859    0.000113203    -1
  7   MODEL1    RIDGE    Y          0.005      0.068899    1.36094     0.02755    1.12178    0.12836    0.000123073    -1
  8   MODEL1    RIDGE    Y          0.006      0.068907    1.37947     0.02700    1.09627    0.12813    0.000130011    -1
  9   MODEL1    RIDGE    Y          0.007      0.068915    1.39160     0.02663    1.07920    0.12790    0.000135238    -1
 10   MODEL1    RIDGE    Y          0.008      0.068925    1.39968     0.02637    1.06748    0.12767    0.000139379    -1
 11   MODEL1    RIDGE    Y          0.009      0.068935    1.40504     0.02617    1.05935    0.12744    0.000142787    -1
 12   MODEL1    RIDGE    Y          0.010      0.068947    1.40850     0.02603    1.05373    0.12721    0.000145676    -1
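To see where numbers like these come from, the sketch below recomputes a ridge trace in plain Python. It rests on two assumptions that should be checked before relying on it: the data are the 20 rows of the DATA step, and the ridge constant k is added to the diagonal of the predictor correlation matrix, with the standardized coefficients then converted back to the original units. That is a common textbook formulation and is not taken from the handout. The names `solve` and `ridge` are illustrative only.

```python
import math

# Rows of the LAW data set: (Y, X1, X2, X3, X4), assumed as in the DATA step.
DATA = [
    (3.42, 3.28, 0.96, 6, 1330), (3.60, 3.18, 0.97, 7, 1370),
    (3.28, 2.89, 0.93, 5, 1140), (3.75, 3.72, 0.99, 8, 1520),
    (3.36, 3.18, 0.95, 6, 1270), (3.96, 3.50, 0.98, 8, 1450),
    (3.31, 3.04, 0.94, 5, 1200), (3.33, 3.87, 0.95, 5, 1340),
    (3.60, 3.54, 0.96, 7, 1350), (4.00, 3.27, 0.99, 10, 1480),
    (3.28, 3.30, 0.95, 5, 1280), (3.44, 3.29, 0.91, 7, 1080),
    (3.25, 3.17, 0.93, 5, 1170), (3.75, 3.62, 0.97, 8, 1410),
    (3.30, 3.34, 0.96, 5, 1330), (3.20, 3.08, 0.90, 4, 1010),
    (3.50, 3.37, 0.96, 6, 1340), (3.28, 3.16, 0.94, 5, 1220),
    (3.17, 3.20, 0.95, 4, 1270), (3.31, 3.10, 0.94, 5, 1210),
]


def mean(values):
    return sum(values) / len(values)


def centered_ss(u, v):
    """Centered sum of cross-products of u and v."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge(k):
    """Return (intercept, [b1, b2, b3, b4]) for ridge constant k (k = 0 is OLS)."""
    y = [row[0] for row in DATA]
    xs = [[row[j] for row in DATA] for j in range(1, 5)]
    syy = centered_ss(y, y)
    sxx = [centered_ss(x, x) for x in xs]
    # Correlation matrix of the predictors, with k added to the diagonal.
    r_xx = [[centered_ss(xs[i], xs[j]) / math.sqrt(sxx[i] * sxx[j])
             for j in range(4)] for i in range(4)]
    for i in range(4):
        r_xx[i][i] += k
    r_xy = [centered_ss(xs[i], y) / math.sqrt(sxx[i] * syy) for i in range(4)]
    standardized = solve(r_xx, r_xy)
    # Convert standardized coefficients back to the original units.
    slopes = [b * math.sqrt(syy / sxx[i]) for i, b in enumerate(standardized)]
    intercept = mean(y) - sum(b * mean(x) for b, x in zip(slopes, xs))
    return intercept, slopes


for k in (0.0, 0.001, 0.002, 0.005, 0.010):
    b0, b = ridge(k)
    print(f"k={k:.3f}  b0={b0:9.5f}  " + "  ".join(f"{v:12.9f}" for v in b))
```

At k = 0 the function is ordinary least squares, so its first line of output can be compared directly with the PARMS row. Whether the rows for k > 0 match the printed ridge rows depends on the scaling assumption stated above.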

Plot of X1*_RIDGE_. Legend: A = 1 obs, B = 2 obs, etc. [Line-printer plot: X1 coefficient (vertical axis) against the ridge regression control value, 0.000 to 0.010 (horizontal axis), with a reference line at 0.] NOTE: 1 obs had missing values.

Plot of X2*_RIDGE_. Legend: A = 1 obs, B = 2 obs, etc. [Line-printer plot: X2 coefficient (vertical axis) against the ridge regression control value, 0.000 to 0.010 (horizontal axis), with a reference line at 0.] NOTE: 1 obs had missing values.

Plot of X3*_RIDGE_. Legend: A = 1 obs, B = 2 obs, etc. [Line-printer plot: X3 coefficient (vertical axis) against the ridge regression control value, 0.000 to 0.010 (horizontal axis), with a reference line at 0.] NOTE: 1 obs had missing values.

Plot of X4*_RIDGE_. Legend: A = 1 obs, B = 2 obs, etc. [Line-printer plot: X4 coefficient (vertical axis, from -0.001 to 0.00025) against the ridge regression control value, 0.000 to 0.010 (horizontal axis), with a reference line at 0.] NOTE: 1 obs had missing values.

LAW SCHOOL ADMISSIONS DATA

Obs   _MODEL_   _TYPE_     _DEPVAR_   _RIDGE_   _PCOMIT_   _RMSE_     Intercept    X1         X2        X3         X4
  1   MODEL1    PARMS      Y            .          .       0.068829   -2.37864     0.1257       6.06    0.12977      -0.00
  2   MODEL1    RIDGEVIF   Y          0.000        .        .           .         96.6736    2280.94    1.9905     2880.98
  3   MODEL1    RIDGE      Y          0.000        .       0.068829   -2.37864     0.1257       6.06    0.12977      -0.00
  4   MODEL1    RIDGEVIF   Y          0.001        .        .           .          3.9053      59.05    1.95543      74.08
  5   MODEL1    RIDGE      Y          0.001        .       0.068871    0.89844     0.0399       1.74    0.12932      -0.00
  6   MODEL1    RIDGEVIF   Y          0.002        .        .           .          2.1861      17.99    1.94548      22.21
  7   MODEL1    RIDGE      Y          0.002        .       0.068880    1.17880     0.0325       1.37    0.12907       0.00
  8   MODEL1    RIDGEVIF   Y          0.003        .        .           .          1.8012       8.89    1.93597      10.72
  9   MODEL1    RIDGE      Y          0.003        .       0.068886    1.28047     0.0298       1.23    0.12883       0.00
 10   MODEL1    RIDGEVIF   Y          0.004        .        .           .          1.6537       5.48    1.92660       6.41
 11   MODEL1    RIDGE      Y          0.004        .       0.068892    1.33140     0.0284       1.16    0.12859       0.00
 12   MODEL1    RIDGEVIF   Y          0.005        .        .           .          1.5803       3.84    1.91732       4.34
 13   MODEL1    RIDGE      Y          0.005        .       0.068899    1.36094     0.0275       1.12    0.12836       0.00
 14   MODEL1    RIDGEVIF   Y          0.006        .        .           .          1.5373       2.93    1.90811       3.19
 15   MODEL1    RIDGE      Y          0.006        .       0.068907    1.37947     0.0270       1.10    0.12813       0.00
 16   MODEL1    RIDGEVIF   Y          0.007        .        .           .          1.5090       2.37    1.89898       2.48
 17   MODEL1    RIDGE      Y          0.007        .       0.068915    1.39160     0.0266       1.08    0.12790       0.00
 18   MODEL1    RIDGEVIF   Y          0.008        .        .           .          1.4887       2.00    1.88992       2.02
 19   MODEL1    RIDGE      Y          0.008        .       0.068925    1.39968     0.0264       1.07    0.12767       0.00
 20   MODEL1    RIDGEVIF   Y          0.009        .        .           .          1.4731       1.74    1.88093       1.70
 21   MODEL1    RIDGE      Y          0.009        .       0.068935    1.40504     0.0262       1.06    0.12744       0.00
 22   MODEL1    RIDGEVIF   Y          0.010        .        .           .          1.4606       1.55    1.87201       1.47
 23   MODEL1    RIDGE      Y          0.010        .       0.068947    1.40850     0.0260       1.05    0.12721       0.00

Obs   _MODEL_   _TYPE_   _DEPVAR_   _RIDGE_   _PCOMIT_   _RMSE_    Intercept    X1         X2        X3        X4
  1   MODEL1    PARMS    Y            .          .       0.06883   -2.3786      0.1257    6.0583    0.1298    -.0008785
  2   MODEL1    IPC      Y            .          1       0.06670    1.5276      0.0235    0.9076    0.1295    0.0001569
  3   MODEL1    IPC      Y            .          2       0.10775   -0.6633     -0.1375    3.6579    0.0658    0.0005384
  4   MODEL1    IPC      Y            .          3       0.13095   -0.7603      0.2087    2.7821    0.0357    0.0005138