Chapter 8 Quantitative and Qualitative Predictors


STAT 525 FALL 2017 Chapter 8 Quantitative and Qualitative Predictors Professor Dabao Zhang

Polynomial Regression

- Multiple regression using X_i^2, X_i^3, etc., as additional predictors
- Generates quadratic, cubic, ... polynomial relationships
- Can often lead to a multicollinearity problem unless the original variables are centered
- Centering: x_ij = X_ij - X̄_j
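As a quick numerical sketch (values are hypothetical, not from the text): for an equally spaced predictor, centering makes it exactly uncorrelated with its own square, which is the collinearity being removed.

```python
import numpy as np

# Hypothetical equally spaced design values (three levels, replicated)
x = np.array([10.0, 20.0, 30.0] * 3)
r_raw = np.corrcoef(x, x**2)[0, 1]            # X vs X^2: nearly collinear

xc = x - x.mean()                             # centered predictor
r_centered = np.corrcoef(xc, xc**2)[0, 1]     # centered X vs its square

print(round(r_raw, 3), round(abs(r_centered), 3))
```

For a design symmetric about its mean, the sum Σ x_i · x_i^2 = Σ x_i^3 is zero, so the correlation vanishes exactly rather than just shrinking.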

Example: Power Cell (p. 300)

- Response variable is the life (in cycles) of a power cell
- Explanatory variables are charge rate (3 levels) and temperature (3 levels)
- This is a designed experiment: Σ X_i1 X_i2 = 0, so r(X_1, X_2) = 0
- Standardizing the explanatory variables gives newly coded values x_ij = 0, ±1
- Notice Σ x_i1 x_i1^2 = 0, so r(x_1, x_1^2) = 0
- Notice Σ x_i2 x_i2^2 = 0, so r(x_2, x_2^2) = 0

options nocenter;
data a1;
  infile 'U:\.www\datasets525\Ch07ta09.txt';
  input cycles chrate temp;
proc print data=a1; run;

data a1; set a1;
  chrate2=chrate*chrate;
  temp2=temp*temp;
  ct=chrate*temp;
proc reg data=a1;
  model cycles=chrate temp chrate2 temp2 ct;
run;

Analysis of Variance

                         Sum of        Mean
Source          DF      Squares      Square    F Value    Pr > F
Model            5        55366       11073      10.57    0.0109
Error            5   5240.43860  1048.08772
Corrected Total 10        60606

Root MSE         32.37418    R-Square    0.9135
Dependent Mean  172.00000    Adj R-Sq    0.8271
Coeff Var        18.82220

Parameter Estimates

                      Parameter     Standard
Variable    DF         Estimate        Error    t Value    Pr > |t|
Intercept    1        337.72149    149.96163       2.25      0.0741
chrate       1       -539.51754    268.86033      -2.01      0.1011
temp         1          8.91711      9.18249       0.97      0.3761
chrate2      1        171.21711    127.12550       1.35      0.2359
temp2        1         -0.10605      0.20340      -0.52      0.6244
ct           1          2.87500      4.04677       0.71      0.5092

MULTICOLLINEARITY? The overall model is significant, but no individual variable is.

proc corr data=a1;
  var chrate temp chrate2 temp2 ct;
run; quit;

Pearson Correlation Coefficients, N = 11
Prob > |r| under H0: Rho=0

            chrate      temp   chrate2     temp2        ct
chrate     1.00000   0.00000   0.99103   0.00000   0.60532
                      1.0000    <.0001    1.0000    0.0485
temp       0.00000   1.00000   0.00000   0.98609   0.75665
            1.0000              1.0000    <.0001    0.0070
chrate2    0.99103   0.00000   1.00000   0.00592   0.59989
            <.0001    1.0000              0.9862    0.0511
temp2      0.00000   0.98609   0.00592   1.00000   0.74613
            1.0000    <.0001    0.9862              0.0084
ct         0.60532   0.75665   0.59989   0.74613   1.00000
            0.0485    0.0070    0.0511    0.0084

As anticipated, there is high correlation between X_1 and X_1^2 as well as between X_2 and X_2^2.

SAS Code: Standardizing

data a2; set a1;
  schrate=chrate;
  stemp=temp;
  keep cycles schrate stemp;
proc standard data=a2 out=a3 mean=0 std=1;
  var schrate stemp;
proc print data=a3; run;

data a3; set a3;
  schrate2=schrate*schrate;
  stemp2=stemp*stemp;
  sct=schrate*stemp;
proc reg data=a3;
  model cycles=schrate stemp schrate2 stemp2 sct;
run; quit;

Note that this standardizes the values so the mean is zero and the variance is 1. The key aspect is the centering, which removes the collinearity.

Analysis of Variance

                         Sum of        Mean
Source          DF      Squares      Square    F Value    Pr > F
Model            5        55366       11073      10.57    0.0109
Error            5   5240.43860  1048.08772
Corrected Total 10        60606

Root MSE         32.37418    R-Square    0.9135
Dependent Mean  172.00000    Adj R-Sq    0.8271
Coeff Var        18.82220

Parameter Estimates

                      Parameter     Standard
Variable    DF         Estimate        Error    t Value    Pr > |t|
Intercept    1        162.84211     16.60761       9.81      0.0002
schrate      1        -43.24831     10.23762      -4.22      0.0083
stemp        1         58.48205     10.23762       5.71      0.0023
schrate2     1         16.43684     12.20405       1.35      0.2359
stemp2       1         -6.36316     12.20405      -0.52      0.6244
sct          1          6.90000      9.71225       0.71      0.5092

Notice that this gives the same overall F test, but now the two main effects are significant. A linear model appears reasonable; one could confirm with a general linear test (test schrate2, stemp2, sct;). Notice also that the p-values for the squared and cross-product terms are the same as in the uncoded analysis, but the coefficients are different.

Interaction Models

- With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable
- Model this relationship as the product of predictors
- Special cases:
  - One binary (0/1) predictor and one continuous predictor
  - Two continuous predictors

Interaction Models: Special Case #1

- X_1 = 0 or 1, identifying two groups
- X_2 is a continuous variable

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i1 X_i2 + ε_i

When X_1 = 0 (Group 1):
  Y_i = β_0 + β_2 X_i2 + ε_i

When X_1 = 1 (Group 2):
  Y_i = (β_0 + β_1) + (β_2 + β_3) X_i2 + ε_i
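The two group lines can be verified numerically. This sketch uses made-up coefficients (not the textbook fit) and checks that plugging X_1 = 0 and X_1 = 1 into the interaction model reproduces the intercepts and slopes above.

```python
import numpy as np

# Assumed (hypothetical) coefficients for Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
b0, b1, b2, b3 = 30.0, 8.0, -0.10, -0.02

def mean_response(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x2 = np.linspace(0.0, 300.0, 7)
group1 = mean_response(0.0, x2)   # X1 = 0: intercept b0,      slope b2
group2 = mean_response(1.0, x2)   # X1 = 1: intercept b0 + b1, slope b2 + b3

same1 = np.allclose(group1, b0 + b2 * x2)
same2 = np.allclose(group2, (b0 + b1) + (b2 + b3) * x2)
print(same1, same2)
```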

- Results in two regression lines
- β_2 is the slope for Group 1; β_2 + β_3 is the slope for Group 2
- Similar relationship for the intercepts

Three hypotheses of interest:
- H_0: β_1 = β_3 = 0 : regression lines are the same
- H_0: β_1 = 0 : intercepts are the same
- H_0: β_3 = 0 : slopes are the same

Example: Insurance Innovation (p. 316)

- Y is the number of months for an insurance company to adopt an innovation
- X_1 is the size of the firm
- X_2 is the type of firm: X_2 = 0 for a mutual fund firm, X_2 = 1 for a stock firm
- Do stock firms adopt innovation faster? Is this true regardless of size?

data a1;
  infile 'U:\.www\datasets525\Ch8ta02.txt';
  input months size stock;

/* Scatterplot */
symbol1 v=m i=sm70 c=black l=1;
symbol2 v=s i=sm70 c=black l=3;
proc sort data=a1;
  by stock size;
proc gplot data=a1;
  plot months*size=stock/frame;
run;

Investigate the model: months = size stock size*stock

Test whether different types of firms adopt innovation at different paces regardless of size.

data a1; set a1;
  sizestoc=size*stock;
proc reg data=a1;
  model months=size stock sizestoc;
  test stock, sizestoc;
run;

Analysis of Variance

                         Sum of         Mean
Source          DF      Squares       Square    F Value    Pr > F
Model            3   1504.41904    501.47301      45.49    <.0001
Error           16    176.38096     11.02381
Corrected Total 19   1680.80000

Root MSE         3.32021    R-Square    0.8951
Dependent Mean  19.40000    Adj R-Sq    0.8754
Coeff Var       17.11450

Parameter Estimates

                      Parameter     Standard
Variable    DF         Estimate        Error    t Value    Pr > |t|
Intercept    1         33.83837      2.44065      13.86      <.0001
size         1         -0.10153      0.01305      -7.78      <.0001
stock        1          8.13125      3.65405       2.23      0.0408
sizestoc     1      -0.00041714      0.01833      -0.02      0.9821

Test 1 Results for Dependent Variable months

                         Mean
Source         DF      Square    F Value    Pr > F
Numerator       2   158.12584      14.34    0.0003
Denominator    16    11.02381

Investigate the model: months = size stock

proc reg data=a1;
  model months=size stock;
run;

Analysis of Variance

                         Sum of         Mean
Source          DF      Squares       Square    F Value    Pr > F
Model            2   1504.41333    752.20667      72.50    <.0001
Error           17    176.38667     10.37569
Corrected Total 19   1680.80000

Root MSE         3.22113    R-Square    0.8951
Dependent Mean  19.40000    Adj R-Sq    0.8827
Coeff Var       16.60377

Parameter Estimates

                      Parameter     Standard
Variable    DF         Estimate        Error    t Value    Pr > |t|
Intercept    1         33.87407      1.81386      18.68      <.0001
size         1         -0.10174      0.00889     -11.44      <.0001
stock        1          8.05547      1.45911       5.52      <.0001

No apparent interaction here; the slope of the line does not depend on the type of firm.

/* Constant Slope */
symbol1 v=m i=rl c=black;
symbol2 v=s i=rl c=black;
proc gplot data=a1;
  plot months*size=stock/frame;
run; quit;

Interaction Models: Special Case #2

- X_1 and X_2 are continuous variables

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i1 X_i2 + ε_i

This can be written

Y_i = β_0 + (β_1 + β_3 X_i2) X_i1 + β_2 X_i2 + ε_i
Y_i = β_0 + β_1 X_i1 + (β_2 + β_3 X_i1) X_i2 + ε_i

- The coefficient of one explanatory variable depends on the value of the other explanatory variable
- Cannot discuss each predictor individually
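A small sketch (with assumed coefficients, not from the text) makes the conditional-slope reading concrete: the change in E{Y} per unit of X_1 is β_1 + β_3 X_2, so it shifts with the level of X_2.

```python
# Assumed coefficients for Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
b0, b1, b2, b3 = 1.0, 2.0, 3.0, 0.5

def mean_response(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_in_x1(x2):
    # change in E{Y} per unit change in X1, holding X2 fixed
    return mean_response(1.0, x2) - mean_response(0.0, x2)

# At X2 = 0 the slope is b1; at X2 = 10 it is b1 + 10*b3
print(slope_in_x1(0.0), slope_in_x1(10.0))   # 2.0 7.0
```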

Qualitative Predictors: Coding a Variable with Two Classes

The Insurance Innovation example:
- X_1 = size of firm
- Each firm in the study is either a stock firm or a mutual fund firm, i.e., a qualitative predictor with two classes:

  X_2 = 1 if stock firm, 0 otherwise
  X_3 = 1 if mutual fund, 0 otherwise

Cannot include both indicators in the model. The model below contains perfectly collinear columns (SAS will drop the last column):

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i3 + ε_i

The additive model below is appropriate:

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i

Interpretation:
  E{Y_i} = β_0 + β_1 X_i1            if mutual fund
  E{Y_i} = (β_0 + β_2) + β_1 X_i1    if stock firm
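The perfect collinearity can be seen directly in the design matrix. In this sketch (firm types are made up), the intercept column equals the sum of the two indicator columns, so adding the second indicator does not increase the rank.

```python
import numpy as np

# Hypothetical firm types: "stock" or "mutual"
firms = ["mutual", "stock", "stock", "mutual", "stock", "mutual"]
x2 = np.array([1.0 if f == "stock" else 0.0 for f in firms])
x3 = np.array([1.0 if f == "mutual" else 0.0 for f in firms])
ones = np.ones(len(firms))

X_both = np.column_stack([ones, x2, x3])  # intercept + both indicators
X_one  = np.column_stack([ones, x2])      # intercept + one indicator

# ones == x2 + x3, so X_both has only 2 linearly independent columns
print(np.linalg.matrix_rank(X_both), np.linalg.matrix_rank(X_one))
```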

Alternative Coding of a Two-Class Variable

An alternative coding:

  X_2 = 1 if stock firm, -1 if mutual fund

The additive model is still appropriate:

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i

Interpretation:
  E{Y_i} = (β_0 + β_2) + β_1 X_i1    if stock firm
  E{Y_i} = (β_0 - β_2) + β_1 X_i1    if mutual fund

β_0 is an average intercept of the two regression lines. We will use this coding for Analysis of Variance.

Coding a Variable with Three Classes

First option: extend the indicator

  X_2 = 0 if mutual fund, 1 if stock firm, 2 if foreign firm

The model below is still appropriate:

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i

Interpretation: this enforces an equal change in E{Y} for each step of the indicator

  E{Y_i} = β_0 + β_1 X_i1             if mutual fund
  E{Y_i} = (β_0 + β_2) + β_1 X_i1     if stock firm
  E{Y_i} = (β_0 + 2β_2) + β_1 X_i1    if foreign firm

This may be too restrictive.

Alternative Coding of a Three-Class Variable

Second option:

  X_2 = 1 if stock firm, 0 otherwise
  X_3 = 1 if foreign firm, 0 otherwise

The model below contains two indicators:

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + β_3 X_i3 + ε_i

Interpretation:
  E{Y_i} = β_0 + β_1 X_i1            if mutual fund
  E{Y_i} = (β_0 + β_2) + β_1 X_i1    if stock firm
  E{Y_i} = (β_0 + β_3) + β_1 X_i1    if foreign firm

This also gives more flexibility in the presence of interactions X_1 X_2 and X_1 X_3.
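The two codings can be contrasted numerically (all shift values below are made up for illustration): the single 0/1/2 code forces the class intercepts to be equally spaced, while two 0/1 indicators let each class have its own shift.

```python
import numpy as np

codes = np.array([0, 1, 2])   # mutual fund, stock firm, foreign firm

# First option: one ordinal code -- intercept shift is forced to be b2 * code
b0, b2_ord = 20.0, 4.0
ordinal = b0 + b2_ord * codes            # equally spaced: 20, 24, 28

# Second option: two 0/1 indicators -- each class gets its own shift
x2 = (codes == 1).astype(float)          # stock firm
x3 = (codes == 2).astype(float)          # foreign firm
b2, b3 = 8.0, -3.0
dummy = b0 + b2 * x2 + b3 * x3           # unconstrained: 20, 28, 17

print(ordinal.tolist(), dummy.tolist())
```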

Constrained Regression

At times we may want to put constraints on the regression coefficients, e.g.,

  β_1 = 5    or    β_1 = β_2

- Can do this in SAS by redefining explanatory variables (Page 268: redefine so the model is Ỹ vs X_2)
- Can also use the RESTRICT statement in PROC REG:

PROC REG DATA=test;
  MODEL Y=X1 X2 X3;
  RESTRICT X1=5;    /* restrict beta_1 = 5 */
RUN; QUIT;

To restrict β_1 = β_2 instead: RESTRICT X1=X2;
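The redefine-the-response approach can be sketched with ordinary least squares; the data and coefficients here are simulated, not a textbook example. To impose β_1 = 5, move the known term to the left-hand side and regress Ỹ = Y - 5·X_1 on the remaining predictor.

```python
import numpy as np

# Simulate data from y = 2 + 5*x1 + 3*x2 + noise (all values hypothetical)
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 5.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

# Restrict beta_1 = 5: subtract the constrained term from the response
y_tilde = y - 5.0 * x1
X = np.column_stack([np.ones(n), x2])
b0_hat, b2_hat = np.linalg.lstsq(X, y_tilde, rcond=None)[0]

print(round(b0_hat, 1), round(b2_hat, 1))   # estimates land near 2.0 and 3.0
```

The fitted intercept and slope recover the unconstrained coefficients because the constraint matches the data-generating value; with a wrong constraint the remaining estimates absorb the discrepancy.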

Chapter Review

- Polynomial Regression
- Interaction Models
- Qualitative Predictors
- Constrained Regression