Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com
The Multiple Regression Model

Examines the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

Multiple regression model with k independent variables:

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

where β0 is the Y-intercept, β1, ..., βk are the population slopes, and ε is the random error.
Equation

The coefficients of the multiple regression model are estimated using sample data. Multiple regression equation with k independent variables:

ŷi = b0 + b1x1i + b2x2i + ... + bkxki

where ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, ..., bk are the estimated slope coefficients. We will always use a computer to obtain the regression slope coefficients and other regression summary measures.
Sales Example

Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50            3.3
  2       460        7.50            3.3
  3       350        8.00            3.0
  4       430        8.00            4.5
  5       350        6.80            3.0
  6       380        7.50            4.0
  7       430        4.50            3.0
  8       470        6.40            3.7
  9       450        7.00            3.5
 10       490        5.00            4.0
 11       340        7.20            3.5
 12       300        7.90            3.2
 13       440        5.90            4.0
 14       450        5.00            3.5
 15       300        7.00            2.7

Multiple regression equation: Sales_t = b0 + b1(Price)_t + b2(Advertising)_t + e_t
Output

Regression Statistics
  Multiple R           0.72213
  R Square             0.52148
  Adjusted R Square    0.44172
  Standard Error      47.46341
  Observations        15

Estimated equation: Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

ANOVA          df        SS          MS           F       Significance F
Regression      2    29460.027   14730.013    6.53861       0.01201
Residual       12    27033.306    2252.776
Total          14    56493.333

               Coefficients   Standard Error    t Stat    P-value   Lower 95%    Upper 95%
Intercept        306.52619      114.25389      2.68285    0.01993    57.58835    555.46404
Price            -24.97509       10.83213     -2.30565    0.03979   -48.57626     -1.37392
Advertising       74.13096       25.96732      2.85478    0.01449    17.55303    130.70888
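As the slides note, these coefficients come from statistical software. As a minimal sketch (pure Python, data copied from the pie-sales table; in practice you would use a statistics package), the same estimates can be reproduced by solving the OLS normal equations (X'X)b = X'y directly:

```python
# Pie-sales data copied from the slide's table.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
         340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
         7.20, 7.90, 5.90, 5.00, 7.00]
adver = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
         3.5, 3.2, 4.0, 3.5, 2.7]

def ols(y, *xs):
    """Return OLS coefficients [b0, b1, ...] by solving the normal equations."""
    rows = [[1.0] + list(x) for x in zip(*xs)]          # design matrix X
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the k x k system
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(xtx[r][c]))
        xtx[c], xtx[p] = xtx[p], xtx[c]
        xty[c], xty[p] = xty[p], xty[c]
        for r in range(k):
            if r != c:
                f = xtx[r][c] / xtx[c][c]
                xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[c])]
                xty[r] -= f * xty[c]
    return [xty[i] / xtx[i][i] for i in range(k)]

b0, b1, b2 = ols(sales, price, adver)
print(round(b0, 3), round(b1, 3), round(b2, 3))  # ≈ 306.526 -24.975 74.131
```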
R²

R² never decreases when a new X variable is added to the model, even if the new variable is not an important predictor. Hence, models with different numbers of explanatory variables cannot be compared by R². What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom? Adjusted R² penalizes excessive use of unimportant independent variables, and is always smaller than R² (except when R² = 1).
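The penalty can be checked numerically. A minimal sketch, plugging the n = 15 observations, k = 2 regressors, and R² = 0.52148 from the pie-sales output into the standard adjusted-R² formula:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.52148, 15, 2), 4))  # ≈ 0.4417, matching the output
```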
F-Test for Overall Significance of the Model

Shows whether there is a linear relationship between all of the X variables considered together and Y. Use the F test statistic.

Hypotheses:
H0: β1 = β2 = ... = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
F-Test for Overall Significance (continued)

From the ANOVA table of the pie-sales output:

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386, with 2 and 12 degrees of freedom

P-value for the F-test: Significance F = 0.01201
The ANOVA Table in Regression

Source of Variation   Sum of Squares   Degrees of Freedom        Mean Square               F Ratio
Regression                 SSR          k                        MSR = SSR / k             F = MSR / MSE
Error                      SSE          n - (k + 1) = n - k - 1  MSE = SSE / (n - (k + 1))
Total                      SST          n - 1                    MST = SST / (n - 1)

R² = SSR / SST = 1 - SSE / SST

F can also be written in terms of R²:

F = [R² / k] / [(1 - R²) / (n - (k + 1))]

Adjusted R² = 1 - [SSE / (n - (k + 1))] / [SST / (n - 1)] = 1 - MSE / MST
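A quick sketch verifying, on the pie-sales ANOVA numbers, that the two F formulas (F = MSR/MSE and the R² version) agree:

```python
# Sums of squares, n, and k taken from the pie-sales ANOVA table.
SSR, SSE, n, k = 29460.027, 27033.306, 15, 2
MSR = SSR / k                      # 14730.013
MSE = SSE / (n - k - 1)            # 2252.776
R2 = SSR / (SSR + SSE)             # 0.52148

F_anova = MSR / MSE
F_r2 = (R2 / k) / ((1 - R2) / (n - k - 1))
print(round(F_anova, 4), round(F_r2, 4))  # both ≈ 6.5386
```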
Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:
(1) H0: β1 = 0   H1: β1 ≠ 0
(2) H0: β2 = 0   H1: β2 ≠ 0
...
(k) H0: βk = 0   H1: βk ≠ 0

Test statistic for test i:  t = (bi - 0) / s(bi),  with n - (k + 1) degrees of freedom
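Applying this to the Price coefficient from the pie-sales output (12 = n - k - 1 degrees of freedom), a one-line check reproduces the reported t statistic:

```python
def t_stat(b, se):
    """Individual-slope test statistic t = (b_i - 0) / s(b_i)."""
    return b / se

# Price: coefficient -24.97509, standard error 10.83213 (from the output)
print(round(t_stat(-24.97509, 10.83213), 5))  # ≈ -2.30565, as in the output
```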
The Concept of Partial Regression Coefficients

In multiple regression, the interpretation of slope coefficients requires special attention:

ŷi = b0 + b1x1i + b2x2i

Here, b1 shows the relationship between X1 and Y holding X2 constant (i.e., controlling for the effect of X2).
Purifying X1 from X2 (i.e., removing the effect of X2 on X1):

Run a regression of X1 on X2:  X1i = α0 + α1X2i + vi
vi = X1i - (α0 + α1X2i) is X1 purified from X2.

Then run a regression of Yi on vi:  Yi = γ0 + γ1vi
γ1 equals b1 in the original multiple regression equation.
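This "purifying" idea (the Frisch-Waugh result) can be demonstrated on the pie-sales data: regress Price (X1) on Advertising (X2), keep the residuals v, then regress Sales on v alone — the resulting slope matches b1 = -24.975 from the full multiple regression. A minimal sketch:

```python
# Pie-sales data from the earlier slide.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
         340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
         7.20, 7.90, 5.90, 5.00, 7.00]
adver = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
         3.5, 3.2, 4.0, 3.5, 2.7]

def simple_slope(y, x):
    """Slope of the simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)

# Step 1: purify Price (X1) from Advertising (X2)
n = len(price)
a1 = simple_slope(price, adver)
a0 = sum(price) / n - a1 * sum(adver) / n
v = [p - (a0 + a1 * a) for p, a in zip(price, adver)]

# Step 2: regress Sales on the purified regressor
print(round(simple_slope(sales, v), 3))  # ≈ -24.975 = b1 of the full model
```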
b1 shows the relationship between X1 purified from X2 and Y. Whenever a new explanatory variable is added to the regression equation or removed from it, all b coefficients change (unless the covariance of the added or removed variable with all other variables is zero).
The Principle of Parsimony: any insignificant explanatory variable should be removed from the regression equation.
The Principle of Generosity: any significant variable must be included in the regression equation.
Choosing the best model: choose the model with the highest adjusted R² or F, or the lowest AIC (Akaike Information Criterion) or SC (Schwarz Criterion). Apply the stepwise regression procedure.
For example: a researcher may be interested in the relationship between Education, Family Income, and Number of Children in a family.

Independent variables: Education, Family Income
Dependent variable: Number of Children
For example:
Research hypothesis: as education of respondents increases, the number of children in families will decline (negative relationship).
Research hypothesis: as family income of respondents increases, the number of children in families will decline (negative relationship).

Independent variables: Education, Family Income
Dependent variable: Number of Children
For example:
Null hypothesis: there is no relationship between education of respondents and the number of children in families.
Null hypothesis: there is no relationship between family income and the number of children in families.

Independent variables: Education, Family Income
Dependent variable: Number of Children
Bivariate regression is based on fitting a line as close as possible to the plotted coordinates of your data on a two-dimensional graph. Trivariate regression is based on fitting a plane as close as possible to the plotted coordinates of your data on a three-dimensional graph.

Case:                   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Children (Y):           2  5  1  9  6  3  0  3  7  7  2  5  1  9  6  3  0  3  7 14  2  5  1  9  6
Education (X1):        12 16 20 12  9 18 16 14  9 12 12 10 20 11  9 18 16 14  9  8 12 10 20 11  9
Income, 1 = $10K (X2):  3  4  9  5  4 12 10  1  4  3 10  4  9  4  4 12 10  6  4  1 10  3  9  2  4
Plotted coordinates for Education, Income, and Number of Children. [3-D scatter plot of the cases in the table above, on axes Y, X1, and X2]
What multiple regression does is fit a plane to these coordinates. [3-D plot of cases 1-10 with the fitted plane, on axes Y, X1, and X2]
Mathematically, that plane is:

Y = a + b1X1 + b2X2

a = y-intercept, where the X's equal zero
b = coefficient or slope for each variable

For our problem, SPSS says the equation is:

Y = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income
Let's take a moment to reflect. Why do I write the equation Y = a + b1X1 + b2X2, whereas KBM often write Yi = a + b1X1i + b2X2i + ei? One is the equation for a prediction; the other is the value of a data point for a person.
Model Summary
Model 1:  R = .757   R Square = .573   Adjusted R Square = .534   Std. Error of the Estimate = 2.33785
a. Predictors: (Constant), Income, Education

57% of the variation in number of children is explained by education and income!
Y = 11.8 - .36X1 - .40X2

ANOVA
Model 1        Sum of Squares    df   Mean Square      F      Sig.
Regression        161.518         2     80.759      14.776    .000
Residual          120.242        22      5.466
Total             281.760        24
a. Predictors: (Constant), Income, Education
b. Dependent Variable: Children

Coefficients
Model 1       Unstandardized B   Std. Error   Standardized Beta      t      Sig.
(Constant)        11.770           1.734                           6.787    .000
Education          -.364            .173            -.412         -2.105    .047
Income             -.403            .194            -.408         -2.084    .049
a. Dependent Variable: Children
From the same output, R² can be computed directly from the ANOVA sums of squares:

r² = (Σ(Y - Ȳ)² - Σ(Y - Ŷ)²) / Σ(Y - Ȳ)² = SSR / SST = 161.518 / 281.760 = .573
So what does our equation tell us?

Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

Try plugging in some values for your variables.
So what does our equation tell us?

Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
        0                     0                  11.8
       10                     0                   8.2
       10                    10                   4.2
       20                    10                   0.6
       20                    11                   0.2
So what does our equation tell us?

Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
        1                     0                  11.44
        1                     1                  11.04
        1                     5                   9.44
        1                    10                   7.44
        1                    15                   5.44
So what does our equation tell us?

Ŷ = 11.8 - .36X1 - .40X2
Expected # of Children = 11.8 - .36*Educ - .40*Income

If Education equals:   If Income equals:   Then, children equals:
        0                     1                  11.40
        1                     1                  11.04
        5                     1                   9.60
       10                     1                   7.80
       15                     1                   6.00
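The tables above can be reproduced with a small helper. A minimal sketch of the fitted prediction equation:

```python
def predict(educ, income):
    """Expected # of children = 11.8 - .36*Educ - .40*Income (fitted equation)."""
    return 11.8 - 0.36 * educ - 0.40 * income

# A few rows from the tables above:
print(round(predict(0, 0), 2))    # 11.8
print(round(predict(10, 10), 2))  # 4.2
print(round(predict(20, 10), 2))  # 0.6
```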
If graphed, holding one variable constant produces a two-dimensional graph for the other variable. [Two line graphs: Education (X1) from 0 to 15 with slope b = -.36, Y falling from 11.40 to 6.00; Income (X2) from 0 to 15 with slope b = -.40, Y falling from 11.44 to 5.44]
Dummy Explanatory Variables

Di: qualitative binomial (0, 1) variables.

Yi = β0 + β1Xi + β2Di + ui
For Di = 0:  Yi = β0 + β1Xi + ui
For Di = 1:  Yi = β0 + β1Xi + β2 + ui  =  (β0 + β2) + β1Xi + ui

To measure the effect of Di on the relation between X and Y:

Yi = β0 + β1Xi + β2Xi*Di + ui
For Di = 0:  Yi = β0 + β1Xi + ui
For Di = 1:  Yi = β0 + β1Xi + β2Xi + ui  =  β0 + (β1 + β2)Xi + ui
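A toy sketch contrasting the two specifications (the coefficients b0 = 2, b1 = 0.5, b2 = 3 are made up for illustration): the intercept dummy shifts the line up by b2, while the slope dummy changes the slope to b1 + b2.

```python
def yhat_intercept(x, d, b0=2.0, b1=0.5, b2=3.0):
    """Y = b0 + b1*X + b2*D: D=1 shifts the line up by b2; slope unchanged."""
    return b0 + b1 * x + b2 * d

def yhat_slope(x, d, b0=2.0, b1=0.5, b2=3.0):
    """Y = b0 + b1*X + b2*(X*D): D=1 changes the slope to b1 + b2."""
    return b0 + b1 * x + b2 * x * d

print(yhat_intercept(10, 1) - yhat_intercept(10, 0))  # 3.0 (the shift b2)
print(yhat_slope(1, 1) - yhat_slope(0, 1))            # 3.5 (slope b1 + b2)
```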
Warning: dummy variables can be used only as regressors. Should the dependent variable be binomial, you need to use Logit or Probit regression models, which employ the ML estimator. This is because the binomial feature violates the normal distribution assumption, which renders t-statistics invalid. (You can learn these techniques in Econometrics II.)

Time-period dummies can be used for: 1) measuring the stability of a relationship over time; 2) treating outliers.

Seasonal dummies can be used to treat seasonal variation in seasonally unadjusted data. Simply create n - 1 dummies for n seasonal sections and use them as regressors. You may include the seasonal dummies in the regression to control for seasonal variation.
The way you use nominal variables in regression is by converting them to a series of dummy variables.

Nominal variable: Race (1 = White, 2 = Black, 3 = Other)
Recode into separate dummy variables:
1. White: 0 = Not White; 1 = White
2. Black: 0 = Not Black; 1 = Black
3. Other: 0 = Not Other; 1 = Other
The way you use nominal variables in regression is by converting them to a series of dummy variables.

Nominal variable: Religion (1 = Catholic, 2 = Protestant, 3 = Jewish, 4 = Muslim, 5 = Other Religions)
Recode into separate dummy variables:
1. Catholic: 0 = Not Catholic; 1 = Catholic
2. Protestant: 0 = Not Prot.; 1 = Protestant
3. Jewish: 0 = Not Jewish; 1 = Jewish
4. Muslim: 0 = Not Muslim; 1 = Muslim
5. Other Religions: 0 = Not Other; 1 = Other Relig.
When you need to use a nominal variable in regression (like race), just convert it to a series of dummy variables. When you enter the variables into your model, you MUST LEAVE OUT ONE OF THE DUMMIES.

Leave out one: White
Enter the rest into the regression: Black, Other
The reason you MUST LEAVE OUT ONE OF THE DUMMIES is that regression is mathematically impossible without an excluded group. If all were in, holding one of them constant would prohibit variation in all the rest.

Leave out one: Catholic
Enter the rest into the regression: Protestant, Jewish, Muslim, Other Religion
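A minimal sketch of this recode for the religion example, creating n - 1 = 4 dummies with Catholic left out as the reference category (the excluded group is coded all zeros):

```python
# Dummies entered into the regression; Catholic is the excluded reference group.
levels = ["Protestant", "Jewish", "Muslim", "Other"]

def to_dummies(religion):
    """Return the 0/1 dummy codes for one case, in the order of `levels`."""
    return [1 if religion == lev else 0 for lev in levels]

print(to_dummies("Jewish"))    # [0, 1, 0, 0]
print(to_dummies("Catholic"))  # [0, 0, 0, 0] -- the excluded group
```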
The regression equations for dummies will look the same. For Race, with 3 dummies, predicting self-esteem:

Y = a + b1X1 + b2X2

a = the y-intercept, which in this case is the predicted value of self-esteem for the excluded group, White
b1 = the slope for variable X1, Black
b2 = the slope for variable X2, Other
If our equation were (for Race, with 3 dummies, predicting self-esteem):

Y = 28 + 5X1 - 2X2

28 = the y-intercept, the predicted self-esteem for the excluded group, White
5 = the slope for variable X1, Black
-2 = the slope for variable X2, Other

Plugging in values for the dummies tells you each group's self-esteem average:
White = 28;  Black = 33;  Other = 26

When cases' values are X1 = 0 and X2 = 0, they are White; when X1 = 1 and X2 = 0, they are Black; when X1 = 0 and X2 = 1, they are Other.
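Plugging the dummy codes into this equation recovers each group's mean, which a short sketch makes explicit:

```python
def esteem(black, other):
    """Self-esteem equation from the slide: Y = 28 + 5*Black - 2*Other."""
    return 28 + 5 * black - 2 * other

# (black, other) = (0,0) White, (1,0) Black, (0,1) Other
print(esteem(0, 0), esteem(1, 0), esteem(0, 1))  # 28 33 26
```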
Dummy variables can be entered into multiple regression along with other dichotomous and continuous variables. For example, you could regress self-esteem on sex, race, and education:

Y = a + b1X1 + b2X2 + b3X3 + b4X4

How would you interpret this?

Y = 30 - 4X1 + 5X2 - 2X3 + 0.3X4
X1 = Female;  X2 = Black;  X3 = Other;  X4 = Education
How would you interpret this? Y = 30 - 4X1 + 5X2 - 2X3 + 0.3X4 (X1 = Female; X2 = Black; X3 = Other; X4 = Education)

1. Women's self-esteem is 4 points lower than men's.
2. Blacks' self-esteem is 5 points higher than Whites'.
3. Others' self-esteem is 2 points lower than Whites' and consequently 7 points lower than Blacks'.
4. Each year of education improves self-esteem by 0.3 units.
How would you interpret this? Y = 30 - 4X1 + 5X2 - 2X3 + 0.3X4 (X1 = Female; X2 = Black; X3 = Other; X4 = Education)

Plugging in some select values, we'd get self-esteem for select groups:
White males with 10 years of education = 33
Black males with 10 years of education = 38
Other females with 10 years of education = 27
Other females with 16 years of education = 28.8
The same regression rules apply. The slopes represent the linear relationship of each independent variable to the dependent variable while holding all other variables constant. Make sure you get into the habit of saying a slope is the effect of an independent variable while holding everything else constant.
Seasonal adjustment using dummy variables

Example: suppose a researcher is using seasonally unadjusted data at the quarterly frequency for the variable Yt. For 4 quarters, create 3 dummies:
D1 = 1 if t is Q1, 0 otherwise
D2 = 1 if t is Q2, 0 otherwise
D3 = 1 if t is Q3, 0 otherwise

The residuals of the regression

Yt = β0 + β1D1,t + β2D2,t + β3D3,t + εt

are the seasonally adjusted Yt.
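Because the constant plus three quarterly dummies fit each quarter's own mean exactly, the residuals of that regression are simply each observation minus its quarter mean. A toy sketch (the two years of quarterly data are made up):

```python
y = [10, 20, 15, 5, 12, 22, 17, 7]          # two years of quarterly data (toy)
quarter = [t % 4 for t in range(len(y))]    # 0=Q1, 1=Q2, 2=Q3, 3=Q4

# Fitted value from Y ~ const + D1 + D2 + D3 is the quarter mean, so the
# residual (the seasonally adjusted series) is Y minus its quarter mean.
qmean = {q: sum(v for v, qq in zip(y, quarter) if qq == q) /
            sum(1 for qq in quarter if qq == q) for q in set(quarter)}
y_sa = [v - qmean[q] for v, q in zip(y, quarter)]
print(y_sa)  # [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
```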
Log Transformations

Yi = β0 + β1Xi + ui

The β1 in the above regression indicates the expected change in Yi resulting from a 1-unit increase in Xi, not the relationship in % terms. If you need to compute the expected % change in Yi resulting from a 1% increase in Xi (the elasticity), you need to run the following regression:

Ln(Yi) = β0 + β1Ln(Xi) + ui
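A sketch illustrating the elasticity interpretation: data are generated (noise-free, values made up) from Y = 2·X^0.8, so the log-log regression slope should recover the elasticity 0.8.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * xi ** 0.8 for xi in x]        # exact power relationship, no noise

# Simple OLS slope of ln(y) on ln(x)
lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
n = len(x)
mx, my = sum(lx) / n, sum(ly) / n
beta1 = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / \
        sum((a - mx) ** 2 for a in lx)
print(round(beta1, 6))  # 0.8 -- the elasticity of Y with respect to X
```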
Assumptions of the OLS Estimator

1) E(ei) = 0 (unbiasedness)
2) Var(ei) is constant (homoscedasticity)
3) Cov(ei, ej) = 0 for i ≠ j (independent error terms)
4) Cov(ei, Xi) = 0 (error terms unrelated to the X's)

In short: ei ~ iid(0, σ²)

Gauss-Markov Theorem: if these conditions hold, OLS is the best linear unbiased estimator (BLUE).
Additional assumption: the ei's are normally distributed.
Time Series Regressions

Lagged variable:       Yt = β0 + β1Xt + β2Xt-1 + ut
Autoregressive model:  Xt = β1Xt-1 + β2Xt-2 + ut
Time trend:            Yt = β0 + β1Xt + β2Tt + ut  (Tt = t, a linear time trend)
Spurious Regressions

As a general and very strict rule: all variables in a time-series regression must be stationary. Never run a regression with nonstationary variables! (The DW statistic will warn you.)

A nonstationary variable can often be made stationary by taking its first difference: if X is nonstationary, ΔX = Xt - Xt-1 may be stationary.
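A minimal sketch of first differencing (simulated data): a random walk Xt = Xt-1 + shock_t is nonstationary, but differencing it recovers exactly the stationary i.i.d. shocks.

```python
import random

random.seed(0)
shocks = [random.gauss(0, 1) for _ in range(100)]  # stationary i.i.d. series

walk = []                     # nonstationary random walk: X_t = X_{t-1} + shock_t
total = 0.0
for s in shocks:
    total += s
    walk.append(total)

# First difference: delta X_t = X_t - X_{t-1}
dx = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
print(all(abs(a - b) < 1e-12 for a, b in zip(dx, shocks[1:])))  # True
```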
Exercise: how to create a regression?

1. Descriptive statistics: mean, median, etc.
2. Correlation: not over 0.5 among the xi (explanatory variables)
3. Stationarity: ADF test
4. Run the regression
5. Test for heteroscedasticity and normality
6. Check VIF in case of multicollinearity