LINEAR REGRESSION
Last updated: Oct 18, 2012
Acknowledgements
Some of these slides have been sourced or modified from slides created by A. Field for Discovering Statistics Using R.
Simple Linear Regression
Objectives
- Understand linear regression with one predictor.
- Understand how we assess the fit of a regression model: total sum of squares, model sum of squares, residual sum of squares, F, $R^2$.
- Know how to do regression using R.
- Interpret a regression model.
What Is Regression?
- A way of predicting the value of one variable from another.
- It is a hypothetical model of the relationship between two variables.
- We will focus on a linear relationship, in which the outcome variable is predicted by a straight line.
Describing a Straight Line
$Y_i = b_0 + b_1 X_i + \varepsilon_i$
- $b_1$: coefficient for the predictor; the gradient (slope) of the regression line; reflects the direction/strength of the relationship.
- $b_0$: the intercept (value of $Y$ when $X = 0$); the point at which the regression line crosses the Y-axis (ordinate).
Intercepts and Gradients
(Two panels: lines with the same intercept but different slopes; lines with the same slope but different intercepts.)
The Method of Least Squares
(Scatterplot of some data with a line representing the general trend; the vertical dotted lines represent the differences, or residuals, between the line and the actual data.)
Why Least Squares?
It can be shown that if the noise is zero-mean, independent and identically distributed (IID) Gaussian, then the maximum-likelihood linear model is the one that minimizes the sum of squared residuals.
How Good Is the Model?
- The regression line is only a model based on the data.
- This model might not reflect reality.
- We need some way of testing how well the model fits the observed data. How?
Sources of Variability
- $SS_T$: total variability (variability between scores and the mean).
- $SS_R$: residual/error variability (variability between the regression model and the actual data).
- $SS_M$: model variability (difference in variability between the model and the mean).
$SS_T = SS_M + SS_R$
Sources of Variability: Sums of Squares
Let $t_i$ = observed values of the outcome variable and $y_i$ = model predictions for the outcome variable. Then:
- The total variation is $SS_T = \sum_{i=1}^{n} (t_i - \bar{t})^2$
- The residual (error) variation is $SS_R = \sum_{i=1}^{n} (t_i - y_i)^2$
- The variation explained by the model is $SS_M = \sum_{i=1}^{n} (y_i - \bar{t})^2$
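A minimal R sketch of these quantities, assuming the album-sales data frame (album1, with columns sales and adverts) used later in this lecture is available:

model <- lm(sales ~ adverts, data = album1)
t <- album1$sales             # observed outcomes t_i
y <- fitted(model)            # model predictions y_i
SS_T <- sum((t - mean(t))^2)  # total variation
SS_R <- sum((t - y)^2)        # residual (error) variation
SS_M <- sum((y - mean(t))^2)  # variation explained by the model
SS_T - (SS_M + SS_R)          # ~0: the decomposition SS_T = SS_M + SS_R
SS_M / SS_T                   # this ratio reappears on the next slides as R^2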
Testing the Model: ANOVA
If the model results in better prediction than using the mean, then we expect $SS_M$ to be much greater than $SS_R$.
Coefficient of Determination: $R^2$
$R^2$ is the proportion of variance accounted for by the regression model; it equals the Pearson correlation coefficient squared.
$$R^2 = \frac{SS_M}{SS_T}$$
Mean Squares: Testing Significance
Let $t_i$ = observed values of the outcome variable and $y_i$ = model predictions. Then:
- The unexplained (residual) mean squares is $MS_R = \frac{SS_R}{df_R} = \frac{1}{n-2}\sum_{i=1}^{n}(t_i - y_i)^2$
- The model mean squares ($df_M = 1$ for one predictor) is $MS_M = \frac{SS_M}{df_M} = \sum_{i=1}^{n}(y_i - \bar{t})^2$
- $F = \frac{MS_M}{MS_R}$
More on the F-Statistic
The F-statistic can be used to compare any two nested models. Let's label the models 1 and 2, where model 2 is an elaboration of model 1. Then we can test whether model 2 significantly increases the explained variance using the statistic
$$F = \frac{(SS_1 - SS_2)/(df_2 - df_1)}{SS_2/(n - df_2)}$$
where the $SS$ refer to the sums of squared residuals from the respective models, the $df$ to their numbers of parameters, and $n$ is the total number of data points.
More on the F-Statistic
In the case of simple linear regression:
- Model 1 is the mean (horizontal line), with one degree of freedom.
- Model 2 is the sloped line, with two degrees of freedom.
Thus
$$F = \frac{(SS_1 - SS_2)/(df_2 - df_1)}{SS_2/(n - df_2)} = \frac{(SS_T - SS_R)/(2 - 1)}{SS_R/(n-2)} = \frac{SS_T - SS_R}{SS_R/(n-2)} = \frac{SS_M}{SS_R/(n-2)} = \frac{MS_M}{MS_R}.$$
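Continuing the earlier sketch, the F-ratio and its p-value can be checked by hand (t, SS_M, and SS_R as computed above):

n <- length(t)
F_stat <- SS_M / (SS_R / (n - 2))                     # MS_M / MS_R
F_stat                                                # matches the F in summary(model)
pf(F_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)  # p-value of the F-test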
Regression: An Example
A record company boss was interested in predicting record sales from advertising.
Data: 200 different album releases
- Outcome variable: sales (CDs and downloads) in the week after release.
- Predictor variable: the amount (in units of £1000) spent promoting the record before release.
Doing Simple Regression Using R Commander
Regression in R
We run a regression analysis using the lm() function; lm stands for linear model. This function takes the general form:
newmodel <- lm(outcome ~ predictor(s), data = dataframe, na.action = an.action)
Regression in R
albumsales.1 <- lm(album1$sales ~ album1$adverts)
Or we can tell R which dataframe to use (via data = nameofdataframe) and then specify the variables without the dataframename$ prefix:
albumsales.1 <- lm(sales ~ adverts, data = album1)
Output of a Simple Regression
We have created an object called albumsales.1 that contains the results of our analysis. We can show the object by executing: summary(albumsales.1)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.341e+02  7.537e+00  17.799   <2e-16 ***
adverts     9.612e-02  9.632e-03   9.979   <2e-16 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 65.99 on 198 degrees of freedom
Multiple R-squared: 0.3346, Adjusted R-squared: 0.3313
F-statistic: 99.59 on 1 and 198 DF, p-value: < 2.2e-16
Adjusted R-Squared
- Note that if n = 2, $R^2 = 1$.
- In general, when two variables in fact have zero correlation in the population, the expected $R^2$ for a sample of size n is 1/(n−1).
- In other words, the standard sample $R^2$ is a biased estimator of the population $R^2$.
- The adjusted $R^2$ provides an alternative, approximately unbiased estimator of the population $R^2$.
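The adjustment reported by summary() follows the standard formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, where k is the number of predictors. A quick check against the output above:

r2 <- summary(albumsales.1)$r.squared
n <- 200; k <- 1                      # sample size and predictor count from the example
1 - (1 - r2) * (n - 1) / (n - k - 1)  # ~0.3313, the Adjusted R-squared above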
Using the Model
$$\text{Record Sales}_i = b_0 + b_1 \text{Advertising Budget}_i = 134.14 + (0.09612 \times \text{Advertising Budget}_i)$$
For an advertising budget of 100:
$$\text{Record Sales} = 134.14 + (0.09612 \times 100) = 143.75$$
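The same prediction via predict(), assuming albumsales.1 was fitted with the data = album1 form shown earlier (so the model knows its variables by name):

predict(albumsales.1, newdata = data.frame(adverts = 100))  # ~143.75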
Objectives
- Understand linear regression with one predictor.
- Understand how we assess the fit of a regression model: total sum of squares, model sum of squares, residual sum of squares, F, $R^2$.
- Know how to do regression using R.
- Interpret a regression model.
Multiple Regression
Objectives
- Understand when to use multiple regression.
- Understand the multiple regression equation and what the betas represent.
- Understand different methods of regression: hierarchical, stepwise, forced entry.
- Understand how to do a multiple regression using R.
- Understand how to interpret multiple regression.
- Understand the assumptions of multiple regression and how to test them.
Multiple Regression
- Multiple regression extends linear regression to allow for two or more independent variables.
- There is still only one dependent (criterion) variable.
- We can think of the independent variables as predictors of the dependent variable.
- The main complication in multiple regression arises when the predictors are not statistically independent.
Multiple Regression: An Example
A record company boss was interested in predicting record sales from advertising.
Data: 200 different album releases
- Outcome variable: sales (CDs and downloads) in the week after release.
- Predictor variables: the amount (in £s) spent promoting the record before release (see last lecture); number of plays on the radio (new variable).
The Model with One Predictor
(Figure: the regression model with a single predictor.)
Multiple Regression as an Equation
With multiple regression the relationship is described using a straightforward generalization of the equation for a straight line:
$$y_i = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon_i$$
Degrees of Freedom
$df = n - k - 1$, where n = sample size and k = number of predictors.
$b_0$
$b_0$ is the intercept: the value of the Y variable when all Xs = 0. This is the point at which the regression plane crosses the Y-axis.
Coefficients
- $b_1$ is the regression coefficient for variable 1.
- $b_2$ is the regression coefficient for variable 2.
- $b_n$ is the regression coefficient for the nth variable.
The Model with Two Predictors
(Figure: regression plane with coefficients $b_0$, $b_{\text{Adverts}}$, and $b_{\text{Airplay}}$.)
Coefficient of Multiple Determination
- The proportion of variance explained by all of the independent variables together is called the coefficient of multiple determination ($R^2$).
- R is called the multiple correlation coefficient: R measures the correlation between the predictions and the actual values of the dependent variable.
- The correlation $r_{iY}$ of predictor i with the criterion (dependent variable) Y is called the validity of predictor i.
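Both quantities can be read off a fitted model in R; a sketch using the two-predictor album model introduced later in this lecture (album2 assumed available):

fit <- lm(sales ~ adverts + airplay, data = album2)
R2 <- summary(fit)$r.squared          # coefficient of multiple determination
R  <- cor(fitted(fit), album2$sales)  # multiple correlation coefficient
R^2 - R2                              # ~0: R^2 is the squared multiple correlation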
Methods of Multiple Regression
- Hierarchical: the experimenter decides the order in which variables are entered into the model.
- Forced entry: all predictors are entered simultaneously.
- Stepwise: predictors are selected using their semi-partial correlation with the outcome.
Hierarchical Regression
- Known predictors (based on past research) are entered into the regression model first.
- New predictors are then entered in a separate step/block.
- The experimenter makes the decisions.
Hierarchical Regression
Good points: It is the best method for theory testing. You can see the unique predictive influence of a new variable on the outcome, because known predictors are held constant in the model.
Bad point: Relies on the experimenter knowing what they're doing!
Forced Entry
- All variables are entered into the model simultaneously.
- The results obtained depend on the variables entered into the model. It is important, therefore, to have good theoretical reasons for including a particular variable.
Stepwise Regression
Step 1: Select as the first predictor the variable that yields the largest $R^2$.
Stepwise Regression
Step 2: Having selected the first predictor, a second is chosen from the remaining predictors. The semi-partial correlation is used as the criterion for selection.
Semi-Partial Correlation
- Partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on them both.
- Semi-partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on only one of them.
Semipartial Correlations
- The semipartial correlations measure the correlation between each predictor and the criterion when all other predictors are held fixed.
- In this way, the effects of correlations between predictors are eliminated.
- In general, the semipartial correlations are smaller than the pairwise correlations.
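A minimal base-R sketch of the idea, with made-up simulated variables: the semipartial correlation of x1 with y (controlling x2's effect on x1 only) is the correlation of y with the part of x1 not explained by x2.

set.seed(1)
x2 <- rnorm(100)
x1 <- 0.5 * x2 + rnorm(100)   # two correlated predictors
y  <- x1 + x2 + rnorm(100)    # outcome depends on both
x1_only <- resid(lm(x1 ~ x2)) # the part of x1 not explained by x2
cor(y, x1_only)               # semipartial correlation of x1 with y
cor(y, x1)                    # ordinary pairwise correlation (typically larger)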
Problems with Stepwise Methods
- They rely on a mathematical criterion: variable selection may depend upon only slight differences in the semi-partial correlation.
- These slight numerical differences can lead to major theoretical differences.
- Stepwise methods should be used only for exploration.
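For completeness, base R's step() automates a related selection procedure, though it ranks models by AIC rather than semi-partial correlation; a sketch using the album example's names:

full <- lm(sales ~ adverts + airplay, data = album2)
step(full, direction = "backward")  # removes a predictor only if doing so lowers AIC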
Multicollinearity
- Multicollinearity occurs when two predictors are strongly correlated.
- The result is that a family of solutions exists that trades off the regression weights between correlated predictors.
- This makes estimation of the regression coefficients $b_i$ unreliable.
- Also note that in this case, the coefficient of determination $R^2$ for the model will be much less than the sum of the $R^2$ values for each predictor alone.
Uncorrelated Predictors
(Venn diagram: the variance explained by assignments and the variance explained by the midterm do not overlap within the total variance.)
$$R^2 = \text{total proportion of variance explained} = r_{1Y}^2 + r_{2Y}^2$$
Correlated Predictors
(Venn diagram: the variance explained by assignments and the variance explained by the midterm overlap within the total variance.)
$$R^2 = \text{total proportion of variance explained} < r_{1Y}^2 + r_{2Y}^2$$
Example
Predicting record sales (Y) from advertising ($X_1$) and airplay ($X_2$):
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \varepsilon$$
albumsales.2 <- lm(sales ~ adverts + airplay, data = album2)
Coefficients
- $b_1 = 0.087$: as advertising increases by 1 unit, record sales increase by 0.087 units.
- $b_2 = 3589$: for each additional play per week on Radio 1, sales increase by 3589 units.
Constructing a Model
$$\text{Sales} = b_0 + b_1 X_1 + b_2 X_2 = 41124 + 0.087 \times \text{Adverts} + 3589 \times \text{Plays}$$
For £1 million of advertising and 15 plays:
$$\text{Sales} = 41124 + (0.087 \times 1{,}000{,}000) + (3589 \times 15) = 41124 + 87000 + 53835 = 181959$$
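The same arithmetic can be reproduced from the fitted coefficients, a sketch assuming the albumsales.2 fit from above (note that, as on this slide, adverts is taken in £s; whether 1,000,000 is a sensible input depends on the units the model was actually fitted in):

b <- coef(albumsales.2)                # b_0, b_adverts, b_airplay
unname(b[1] + b[2] * 1e6 + b[3] * 15)  # the worked example's prediction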
Standardized Coefficients
The coefficients b do not directly inform us of the importance of each predictor, since that also depends upon the dispersion of the predictors. To better assess importance, it is useful to transform the regression equation to standardized form:
$$z_y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \dots + \beta_n z_n + \varepsilon_i$$
where $z_y$ is the z-score for the outcome variable and $z_i$ is the z-score for the $i$th predictor $X_i$.
Standardized Coefficients
lm.beta(albumsales.2)
- $\beta_1 = 0.523$: as advertising increases by 1 standard deviation, record sales increase by 0.523 of a standard deviation.
- $\beta_2 = 0.546$: when the number of plays on radio increases by 1 s.d., sales increase by 0.546 standard deviations.
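The lm.beta() shown here comes from an add-on package (QuantPsyc in Field's text); an equivalent base-R sketch simply refits the model on z-scored variables:

fit_z <- lm(scale(sales) ~ scale(adverts) + scale(airplay), data = album2)
coef(fit_z)  # the slopes are the standardized betas; the intercept is ~0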
Comparing Models
- In standard linear regression we use an F-statistic to determine whether the linear model is significantly better than the mean in predicting the outcome variable.
- In hierarchical regression, we can use the same method to determine whether the addition of a new predictor leads to a significant improvement in predicting the outcome variable.
- In R, this can be achieved using the anova() function.
Comparing Models
anova(model.1, model.2, ..., model.n)
Note that models must be hierarchical (nested): model.(i+1) includes all predictors of model.i, plus 1 or more additional predictors.
Example: anova(albumsales.1, albumsales.2)

Model 1: sales ~ adverts
Model 2: sales ~ adverts + airplay
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    198 862264
2    197 480428  1    381836 156.57 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Generalization
- When we run regression, we hope to be able to generalize the sample model to the entire population.
- To do this, several assumptions must be met.
- Violating these assumptions stops us generalizing conclusions to our target population.
Assumptions
- Quantitative variables
- Linear dependence of outcome variable on predictors
- Homoscedasticity
- Independent, normally distributed errors
- Limited multicollinearity
Standardized Residuals
If the errors are normally distributed:
- ~95% of standardized residuals should lie between ±2.
- ~99% of standardized residuals should lie between ±2.5.
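A quick check in R using base R's rstandard() on a fitted model (model name illustrative):

z <- rstandard(albumsales.2)  # standardized residuals
mean(abs(z) < 2)              # should be roughly 0.95
mean(abs(z) < 2.5)            # should be roughly 0.99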
Normality of Errors: Histograms
(Two example histograms of residuals: good, approximately normal; bad, non-normal.)
Testing Independence
The Durbin-Watson test looks for statistical correlations between residuals of neighbouring cases. The statistic should be close to 2 if cases are independent.
Example:
> dwt(model3)
 lag Autocorrelation D-W Statistic p-value
   1       0.0026951      1.949819   0.716
 Alternative hypothesis: rho != 0
Testing for Multicollinearity
Use the vif() function. Values less than 10 are OK.
Example:
> vif(model3)
 adverts  airplay  attract
1.014593 1.042504 1.038455
Objectives
- Understand when to use multiple regression.
- Understand the multiple regression equation and what the betas represent.
- Understand different methods of regression: hierarchical, stepwise, forced entry.
- Understand how to do a multiple regression using R.
- Understand how to interpret multiple regression.
- Understand the assumptions of multiple regression and how to test them.