Chapter 6 Multiple Regression

1 STAT 525 FALL 2018 Chapter 6 Multiple Regression Professor Min Zhang

2 The Data and Model
- Still have a single response variable $Y$
- Now have multiple explanatory variables
- Examples:
  - Blood Pressure vs Age, Weight, Diet, Fitness Level
  - Traffic Count vs Time, Location, Population, Month
- Goal: There is a total amount of variation in $Y$ (SSTO). We want to explain as much of this variation as possible using a linear model and our explanatory variables:
  $$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$
- Have $p-1$ predictors, hence $p$ coefficients

3 First Order Model with Two Predictors
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i, \quad i = 1,\ldots,n$$
- $\beta_0$ is the intercept; $\beta_1$ and $\beta_2$ are the regression coefficients
- Meaning of the regression coefficients:
  - $\beta_1$ describes the change in mean response per unit increase in $X_1$ when $X_2$ is held constant
  - $\beta_2$ describes the change in mean response per unit increase in $X_2$ when $X_1$ is held constant
- Variables $X_1$ and $X_2$ are additive: the value of $X_1$ does not affect the change due to $X_2$. There is no interaction.
- The response surface is a plane.

4 Additive Response Surface
[Figure: fitted additive response surface (a plane), $\hat{Y}_i$ as a function of $X_{i1}$ and $X_{i2}$]

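5 Interaction Model
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$$
Meaning of parameters:
- Change in mean response per unit increase in $X_1$ when $X_2 = x_2$:
  $$\Delta E[Y] = \{\beta_0 + \beta_1 (X_1 + 1) + \beta_2 x_2 + \beta_3 (X_1 + 1) x_2\} - \{\beta_0 + \beta_1 X_1 + \beta_2 x_2 + \beta_3 X_1 x_2\} = \beta_1 + \beta_3 x_2$$
- Change in mean response per unit increase in $X_2$ when $X_1 = x_1$:
  $$\Delta E[Y] = \beta_2 + \beta_3 x_1$$
- The rate of change due to one variable is affected by the other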
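The parameter interpretation above can be checked numerically. The following is a minimal sketch in Python (used here in place of the slides' SAS); the coefficient values and the function names `mean_response` and `unit_change_in_x1` are hypothetical and chosen only for illustration.

```python
def mean_response(x1, x2, b0, b1, b2, b3):
    """E[Y] under the interaction model b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


# Hypothetical coefficients (not taken from the slides).
B0, B1, B2, B3 = 10.0, 2.0, 5.0, 0.5


def unit_change_in_x1(x1, x2):
    """Change in E[Y] when x1 increases by one unit with x2 held fixed."""
    return (mean_response(x1 + 1, x2, B0, B1, B2, B3)
            - mean_response(x1, x2, B0, B1, B2, B3))


# The change depends on x2 but not on the starting value of x1.
change_at_x2_0 = unit_change_in_x1(3.0, 0.0)   # equals B1
change_at_x2_4 = unit_change_in_x1(3.0, 4.0)   # equals B1 + B3 * 4
```

With an additive model (`B3 = 0`) the two changes would coincide, which is the "no interaction" case of the first-order model.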

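6 Interaction Response Surface
[Figure: fitted interaction response surface, $\hat{Y}_i$ as a function of $X_{i1}$ and $X_{i2}$; the legible part of the fitted equation reads "... $X_{i1}$ + 1.2 $X_{i2}$ ... .75 $X_{i1} X_{i2}$"]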

7 Qualitative Predictors
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$$
where $Y$ is a senior student's GPA and $X_1$ is the SAT score. Let $X_2 = 1$ if the case is from Purdue and $X_2 = 0$ if it is from IU.
Meaning of parameters:
- Case from Purdue ($X_2 = 1$):
  $$E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 (1) + \beta_3 X_1 (1) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1$$
- Case from IU ($X_2 = 0$):
  $$E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 (0) + \beta_3 X_1 (0) = \beta_0 + \beta_1 X_1$$
- Have two regression lines:
  - $\beta_2$ quantifies the difference between the intercepts
  - $\beta_3$ quantifies the difference between the slopes
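As a small illustration of the two regression lines, the sketch below (Python rather than SAS) returns the intercept and slope implied by the model for each value of the indicator. The coefficients and the helper name `group_line` are hypothetical, not estimates from any data.

```python
def group_line(b0, b1, b2, b3, x2):
    """Intercept and slope of E[Y] as a function of X1 for indicator x2 (0 or 1)."""
    return (b0 + b2 * x2, b1 + b3 * x2)


# Hypothetical coefficients for illustration only.
b0, b1, b2, b3 = 0.50, 0.004, 0.25, -0.001

iu_intercept, iu_slope = group_line(b0, b1, b2, b3, x2=0)           # X2 = 0
purdue_intercept, purdue_slope = group_line(b0, b1, b2, b3, x2=1)   # X2 = 1

# The gaps between the two lines recover b2 and b3.
intercept_gap = purdue_intercept - iu_intercept
slope_gap = purdue_slope - iu_slope
```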

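8 Polynomial Regression and Transformations
- Polynomial regression:
  $$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$
  where $X_{i1} = X_i$ and $X_{i2} = X_i^2$. This is a linear model because it is a linear function of the parameters $\beta$.
- Transformations:
  $$Y_i = \frac{1}{\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i}, \quad \text{equivalently} \quad \frac{1}{Y_i} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$
  $$\log(Y_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$
  The last one is a linear model on the $\log(Y_i)$ scale.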

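9 General Linear Regression In Matrix Terms
- As an equation:
  $$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$
- As an array:
  $$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
- In matrix notation:
  $$Y = X\beta + \varepsilon$$
- Distributional assumptions:
  $$\varepsilon \sim N(0, \sigma^2 I) \quad \Rightarrow \quad Y \sim N(X\beta, \sigma^2 I)$$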

10 Estimation of Regression Coefficients
- Least squares estimates: find $b$ to minimize $(Y - Xb)'(Y - Xb)$
  $$b = (X'X)^{-1} X'Y$$
- Fitted values define a (hyper)plane:
  $$\hat{Y} = X (X'X)^{-1} X'Y = HY$$
  $HY$ forms a response surface
- Residuals:
  $$e = Y - \hat{Y} = (I - H) Y$$
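The least squares formula can be sketched without any statistics package. The block below is an illustrative Python version (not the slides' SAS) that solves the normal equations $X'Xb = X'Y$ by Gauss-Jordan elimination. The helper names and the toy data are hypothetical; the data are noise-free so the estimates can be checked by eye.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    # zip(*B) iterates over the columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * pv for v, pv in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def least_squares(X, y):
    """Return b solving the normal equations (X'X) b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy design matrix: a column of ones, then X1 and X2 (hypothetical values).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1 + 2 * row[1] + 3 * row[2] for row in X]          # exactly on a plane

b = least_squares(X, y)                                  # approx [1, 2, 3]
fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

Solving the normal equations directly is fine for a small, well-conditioned example; statistical software typically uses more numerically stable decompositions.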

11 The Distribution of Residuals
$$e = Y - \hat{Y} = (I - H) Y$$
- $I - H$ is symmetric and idempotent
- Expected value: $E(e) = 0$
- Covariance matrix:
  $$\sigma^2(e) = \sigma^2 (I - H)(I - H)' = \sigma^2 (I - H)$$
  $$\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii}), \quad \text{where } h_{ii} = X_i' (X'X)^{-1} X_i$$
- Residuals are usually correlated, i.e., $\mathrm{cov}(e_i, e_j) = -\sigma^2 h_{ij}$ for $i \neq j$
- Will use this information to look for outliers
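A quick numerical check of these properties, sketched in Python rather than SAS. To stay self-contained it uses the simplest case, one predictor plus an intercept ($p = 2$), where the hat matrix has the closed form $h_{ij} = 1/n + (x_i - \bar{x})(x_j - \bar{x})/S_{xx}$. The predictor values and the function name `hat_matrix_simple` are hypothetical.

```python
def hat_matrix_simple(x):
    """Hat matrix H for the model Y = b0 + b1*x + error (so p = 2)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]


x = [1.0, 2.0, 3.0, 4.0, 10.0]        # hypothetical predictor values
n = len(x)
H = hat_matrix_simple(x)

# Idempotency: H times H should reproduce H up to rounding error.
HH = [[sum(H[i][k] * H[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
max_gap = max(abs(HH[i][j] - H[i][j]) for i in range(n) for j in range(n))

leverages = [H[i][i] for i in range(n)]              # h_ii
resid_var_factor = [1 - h for h in leverages]        # Var(e_i) / sigma^2
trace = sum(leverages)                               # equals p = 2
```

The point far from the others (`x = 10`) has the largest $h_{ii}$, so its residual has the smallest variance. This is why raw residuals are rescaled before being used to flag outliers.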

12 Estimation of $\sigma^2$
- Similar approach as before
- Now $p$ model parameters
  $$s^2 = \frac{e'e}{n - p} = \frac{(Y - Xb)'(Y - Xb)}{n - p} = \frac{SSE}{n - p} = MSE$$

13 ANOVA TABLE

    Source of Variation   df      SS     MS                   F Value
    Regression (Model)    p - 1   SSR    MSR = SSR/(p - 1)    MSR/MSE
    Error                 n - p   SSE    MSE = SSE/(n - p)
    Total                 n - 1   SSTO

F Test: tests whether the predictors collectively help explain the variation in $Y$
- $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$
- $H_a$: at least one $\beta_k \neq 0$, $1 \le k \le p - 1$
  $$F = \frac{SSR/(p - 1)}{SSE/(n - p)} \sim F(p - 1, n - p) \quad \text{under } H_0$$
- Reject $H_0$ if $F > F(1 - \alpha; p - 1, n - p)$
- No conclusions possible regarding individual predictors
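The table entries follow mechanically from the observed and fitted values. Below is a minimal Python sketch (in place of SAS); the function name `anova_f` and the five data points are hypothetical. It assumes `fitted` comes from a least squares fit that includes an intercept, so that SSTO = SSR + SSE holds.

```python
def anova_f(y, fitted, p):
    """ANOVA quantities for a least squares fit with p coefficients (intercept included)."""
    n = len(y)
    ybar = sum(y) / n
    ssto = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = ssto - sse                     # valid for least squares with an intercept
    msr = ssr / (p - 1)
    mse = sse / (n - p)
    return {"SSR": ssr, "SSE": sse, "SSTO": ssto,
            "MSR": msr, "MSE": mse, "F": msr / mse, "df": (p - 1, n - p)}


# Hypothetical data: y regressed on x = 1, ..., 5 gives fitted = 0.6 + 0.8 * x.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
fitted = [1.4, 2.2, 3.0, 3.8, 4.6]

table = anova_f(y, fitted, p=2)
```

The resulting `table["F"]` would then be compared with the $F(1 - \alpha; p - 1, n - p)$ quantile from a table or a statistics package, which the standard library does not provide.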

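14 Testing Individual Predictor: t-test
- Have already shown that $b \sim N(\beta, \sigma^2 (X'X)^{-1})$
- This implies $b_k \sim N(\beta_k, \sigma^2\{b_k\})$
- Perform t-test of $H_0: \beta_k = 0$ vs $H_a: \beta_k \neq 0$
  $$\frac{b_k - \beta_k}{s\{b_k\}} \sim t_{n-p}, \quad \text{so} \quad t^* = \frac{b_k}{s\{b_k\}} \sim t_{n-p} \text{ under } H_0$$
- Reject $H_0$ if $|t^*| > t(1 - \alpha/2; n - p)$
- Confidence interval for $\beta_k$:
  $$b_k \pm t(1 - \alpha/2; n - p)\, s\{b_k\}$$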

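15 General Linear Test
- $H_0: \beta_k = 0$ vs $H_a: \beta_k \neq 0$
- Full model:
  $$Y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij} + \varepsilon_i$$
- Reduced model:
  $$Y_i = \beta_0 + \sum_{j=1}^{k-1} \beta_j X_{ij} + \sum_{j=k+1}^{p-1} \beta_j X_{ij} + \varepsilon_i$$
- Test statistic:
  $$F = \frac{(SSE(R) - SSE(F))/1}{SSE(F)/(n - p)}$$
- Reject $H_0$ if $F > F(1 - \alpha; 1, n - p)$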

16 Equivalence of t-test and General Linear Test
- Can show that $F = (t^*)^2$, so both tests result in the same conclusion
- Both tests investigate the significance of a predictor given that the other variables are already in the model, i.e., the significance of the variable that is fitted last
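The identity $F = (t^*)^2$ can be verified numerically. The Python sketch below (standing in for SAS) uses the simplest case, a single predictor with an intercept ($p = 2$), where the reduced model is the intercept-only model and $s^2\{b_1\} = MSE / S_{xx}$. The data values are hypothetical.

```python
import math

# Hypothetical data for a one-predictor model.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
n, p = len(x), 2

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Full model: least squares slope and intercept.
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse_full / (n - p)

# t statistic for H0: beta_1 = 0.
t_star = b1 / math.sqrt(mse / sxx)

# General linear test: the reduced model drops x, so its fit is ybar.
sse_reduced = sum((yi - ybar) ** 2 for yi in y)
f_stat = ((sse_reduced - sse_full) / 1) / (sse_full / (n - p))
```

With more predictors the same identity holds for the coefficient of whichever variable is dropped from the full model, but $s\{b_k\}$ then comes from the diagonal of $MSE \cdot (X'X)^{-1}$.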

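17 Coefficient of Multiple Determination
- The coefficient of determination $R^2$ describes the proportionate reduction in total variation associated with the full set of $X$ variables
  $$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}, \quad 0 \le R^2 \le 1$$
- $R^2$ never decreases, and usually increases, as $p$ increases
- Adjusted $R_a^2$ attempts to account for $p$
  $$R_a^2 = 1 - \frac{SSE/(n - p)}{SSTO/(n - 1)}, \quad R_a^2 \le 1$$
  ($R_a^2$ can be slightly negative when the predictors explain very little of the variation)
- The adjustment is often insufficient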
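Both quantities are simple functions of the sums of squares. A minimal Python sketch (in place of SAS) follows; the function names and the input values are hypothetical.

```python
def r_squared(sse, ssto):
    """Proportionate reduction in total variation: 1 - SSE/SSTO."""
    return 1 - sse / ssto


def adjusted_r_squared(sse, ssto, n, p):
    """Adjusted R^2: compares mean squares instead of sums of squares."""
    return 1 - (sse / (n - p)) / (ssto / (n - 1))


# Hypothetical sums of squares with n = 5 cases and p = 2 coefficients.
r2 = r_squared(3.6, 10.0)
r2_adj = adjusted_r_squared(3.6, 10.0, n=5, p=2)

# A model that explains almost nothing yields a negative adjusted value.
r2_adj_weak = adjusted_r_squared(9.9, 10.0, n=5, p=2)
```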

18 Example: Purdue Computer Science Students
- Computer Science majors at Purdue have a large drop-out rate
- Goal: find predictors of success (defined as high GPA)
- Predictors must be available at time of entry into the program
- The variables are:
  - GPA: grade point average after three semesters (the response)
  - HSM: high-school math grades
  - HSS: high-school science grades
  - HSE: high-school English grades
  - SATM: SAT Math
  - SATV: SAT Verbal
- Data available on $n = 224$ students

19 Now investigate the model: GPA = HSM HSS HSE

    options nocenter linesize=72;
    goptions colors=(none);

    * Read the data set;
    data a1;
      infile 'U:\.www\datasets525\csdata.dat';
      input id gpa hsm hss hse satm satv;

    * Regress GPA on the three high-school grade variables;
    proc reg data=a1;
      model gpa=hsm hss hse;
    run; quit;

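20 PROC REG output (numeric values not shown)
- Analysis of Variance: columns Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows Model (Pr > F is <.0001), Error, Corrected Total
- Summary statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var
- Parameter Estimates: columns Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|; rows Intercept, hsm (Pr > |t| is <.0001), hss, hse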

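21 Estimation of Mean Response $E(Y_h)$
- We are interested in the mean response at predictor values $X_h$
- Can show
  $$\hat{Y}_h \sim N\left(X_h'\beta,\ \sigma^2 X_h'(X'X)^{-1} X_h\right)$$
- Individual CI for a single $X_h$:
  $$\hat{Y}_h \pm t(1 - \alpha/2; n - p)\, s\{\hat{Y}_h\}$$
- Bonferroni CI for $g$ vectors $X_h$:
  $$\hat{Y}_h \pm t(1 - \alpha/(2g); n - p)\, s\{\hat{Y}_h\}$$
- Working-Hotelling confidence band for the whole regression surface:
  $$\hat{Y}_h \pm \sqrt{p\,F(1 - \alpha; p, n - p)}\; s\{\hat{Y}_h\}$$
- Be careful to stay within the range of the $X$'s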

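22 Predict New Observation
$$Y_{h(new)} = E(Y_h) + \varepsilon$$
$$s^2\{pred\} = s^2\{\hat{Y}_h\} + MSE$$
$$\hat{Y}_h + \varepsilon \sim N\left(X_h'\beta,\ \sigma^2 \left(1 + X_h'(X'X)^{-1} X_h\right)\right)$$
- Individual prediction interval for $Y_{h(new)}$:
  $$\hat{Y}_h \pm t(1 - \alpha/2; n - p)\, s\{pred\}$$
- Bonferroni prediction intervals for $g$ vectors $X_h$:
  $$\hat{Y}_h \pm t(1 - \alpha/(2g); n - p)\, s\{pred\}$$
- Simultaneous Scheffé prediction limits for $g$ vectors $X_h$:
  $$\hat{Y}_h \pm \sqrt{g\,F(1 - \alpha; g, n - p)}\; s\{pred\}$$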
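A short Python sketch (in place of SAS) of how the prediction standard error and the interval limits combine. All numeric inputs are hypothetical, and the t and F quantiles are assumed to be supplied by the caller (from a table or a statistics package), because the standard library has no quantile functions for these distributions.

```python
import math


def s_pred(s2_yhat, mse):
    """Standard error of prediction: sqrt(s^2{Yhat_h} + MSE)."""
    return math.sqrt(s2_yhat + mse)


def scheffe_multiplier(g, f_quantile):
    """Scheffe multiplier sqrt(g * F(1 - alpha; g, n - p)); F quantile supplied by caller."""
    return math.sqrt(g * f_quantile)


def interval(center, multiplier, se):
    """Limits of the form center +/- multiplier * se."""
    return (center - multiplier * se, center + multiplier * se)


# Hypothetical inputs: fitted value, its estimated variance, and MSE.
yhat_h, s2_yhat, mse = 2.8, 0.04, 0.49
t_quantile = 2.0                     # placeholder for t(1 - alpha/2; n - p)

se_pred = s_pred(s2_yhat, mse)
lower, upper = interval(yhat_h, t_quantile, se_pred)

# The prediction interval is always wider than the CI for the mean response.
ci_lower, ci_upper = interval(yhat_h, t_quantile, math.sqrt(s2_yhat))
```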

23 Diagnostics
- Diagnostics play a key role in both the development and assessment of multiple regression models
- Most previous diagnostics carry over to multiple regression
- Given more than one predictor, must also consider the relationships between predictors
- Specialized diagnostics are discussed later, starting in Chapter 9

24 Scatterplot Matrix
- A scatterplot matrix summarizes the bivariate relationships between $Y$ and $X_j$ as well as between $X_j$ and $X_k$ ($j, k = 1, 2, \ldots, p - 1$):
  - Nature of bivariate relationships
  - Strength of bivariate relationships
  - Detection of outliers
  - Range spanned by the $X$'s
- Can be generated within SAS:
  - Click on Solutions -> Analysis -> Interactive data analysis
  - Inside the SAS/Insight window, click on library WORK
  - Click on the desired data set (NOTE: data must already be read into SAS)
  - Click Open
  - (shift) Click on the desired variables
  - Go to menu Analyze
  - Choose option Scatterplot(XY)

25 Correlation Matrix
- Complementary summary
- Displays all pairwise correlations
- Must be wary of:
  - Nonlinear relationships
  - Outliers
  - Influential observations

26 Example: Purdue Computer Science Student
Descriptive statistics: use PROC MEANS or PROC UNIVARIATE

    proc means data=cs maxdec=2;
      var gpa hsm hss hse satm satv;
    run;

- Output (The MEANS Procedure): a table with columns Variable, N, Mean, Std Dev, Minimum, Maximum and one row each for gpa, hsm, hss, hse, satm, satv
- maxdec=2 sets the number of decimal places in the output to 2

27 Histograms with fitted normal curves

    proc univariate data=cs noprint;
      var gpa hsm hss hse satm satv;
      histogram gpa hsm hss hse satm satv / normal;
    run;

[Figure: histograms. Top left: GPA; top right: HSM; bottom left: HSS; bottom right: HSE]

28 [Figure: histograms. Upper: SATM; lower: SATV]

29 Scatterplot Matrix in SAS: from the menu bar, select Solutions -> Analysis -> Interactive data analysis

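30 Use PROC CORR to report correlation values; use the NOPROB option to suppress the p-values.

    * [1] Correlations among the high-school grades;
    proc corr data=a1;
      var hsm hss hse;

    * [2] Correlation between the SAT scores, without p-values;
    proc corr data=a1 noprob;
      var satm satv;

    * [3] Correlation of GPA with each explanatory variable;
    proc corr data=a1;
      var hsm hss hse satm satv;
      with gpa;

- Output [1]: correlation matrix of hsm, hss, hse with p-values
- Output [2]: correlation matrix of satm, satv
- Output [3]: correlations of gpa with hsm, hss, hse, satm, satv, with p-values
- Several of the p-values in [1] and [3] are reported as <.0001; the correlation values themselves are not shown here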

31 Residual Plots
- Used for similar assessment of assumptions:
  - Model is correct
  - Normality
  - Constant variance
  - Independence
- Plot $e$ vs $\hat{Y}$ (overall)
- Plot $e$ vs $X_j$ (with respect to $X_j$)
- Plot $e$ vs a missing variable (e.g., $X_j X_k$)

32 Tests
- Univariate graphical summaries of $e$ are still preferred
- The NORMAL option in PROC UNIVARIATE tests normality
- Modified Levene's and Breusch-Pagan tests for constant variance
- Lack of fit test: needs repeated observations where all $X$ are fixed at the same levels

33 Lack of Fit Test
- Compare the (reduced) linear model
  $$H_0: E(Y_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1}$$
  with the (full) model where $Y$ has $c$ means (i.e., $c$ combinations of the $X_i$)
  $$H_a: E(Y_i) \neq \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1}$$
- Test statistic:
  $$F = \frac{\{SSE(R) - SSE(F)\}/\{(n - p) - (n - c)\}}{SSE(F)/(n - c)} \sim F(c - p, n - c) \quad \text{under } H_0$$
- Reject $H_0$ if $F > F(1 - \alpha; c - p, n - c)$
- If $H_0$ is rejected, conclude that a more complex relationship between $Y$ and $X_1, \ldots, X_{p-1}$ is needed
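The statistic is easy to compute once both error sums of squares are known. Below is a minimal Python sketch (in place of SAS). The function name `lack_of_fit_f` is hypothetical. In the example call, `n = 224` and `p = 4` follow the chapter's Computer Science example, while `c = 100` and both SSE values are assumptions made only to show the arithmetic.

```python
def lack_of_fit_f(sse_reduced, sse_full, n, p, c):
    """Lack-of-fit F statistic and its (numerator, denominator) degrees of freedom."""
    df_num = (n - p) - (n - c)       # simplifies to c - p
    df_den = n - c
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_stat, (df_num, df_den)


# Hypothetical sums of squares; c is an assumed number of distinct X combinations.
f_stat, df = lack_of_fit_f(sse_reduced=100.0, sse_full=62.0, n=224, p=4, c=100)
```

The value `f_stat` would then be compared with the $F(1 - \alpha; c - p, n - c)$ quantile obtained from a table or a statistics package.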

34 Calculate SSE(F) in SAS

    * Analysis of Variance - Full Model;
    proc glm;
      class x1 x2;
      model y=x1*x2;
    run;

- CLASS and MODEL together specify that every combination of levels of $X_1$ and $X_2$ has its own mean
- Plug the SSE of this model into the lack of fit test

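35 Example: Purdue Computer Science Student
Investigate the model: GPA = HSM HSS HSE
PROC REG reports SSE(R) = ... with df(SSE) = 220.

    proc glm data=a1;
      class hsm hss hse;
      model gpa=hsm*hss*hse;
    run; quit;

- Output: Class Level Information (Class, Levels, Values for hsm, hss, hse), followed by an ANOVA table with columns Source, DF, Sum of Squares, Mean Square, F Value, Pr > F and rows Model (Pr > F is <.0001), Error, Corrected Total
- Lack of fit statistic, with $n - p = 220$ and $n - c = 124$:
  $$F = \frac{\{SSE(R) - SSE(F)\}/\{(n - p) - (n - c)\}}{SSE(F)/(n - c)} = \frac{\{\ldots\}/\{\ldots\}}{\ldots/124} = \ldots > \ldots$$
  The comparison is against the critical value $F^{-1}_{96,124}(0.95)$.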

36 Chapter Review
- Data and Notation
- Model in Matrix Terms
- Parameter Estimation
- ANOVA F-test
- Estimation of Mean Responses
- Prediction of New Observations
- Diagnostics
