Review of Multiple Regression

Let's begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate analysis of variance (MANOVA)] have a long tradition in the social and behavioral sciences for examining a variety of different data structures. For our first example, let's consider a case where the goal is to examine whether student achievement can be explained by a set of student background variables.

One way to think about statistical modeling is as an attempt to account for variation in a dependent variable such as student achievement, variance that is believed to be associated with one or more explanatory variables such as gender and other demographic categories (e.g., socioeconomic status, race/ethnicity) or personal attributes measured as continuous variables (e.g., motivation, previous learning). This is analogous to thinking about how much variance in student achievement (R²) is accounted for by a given set of explanatory variables.

Let's assume we start with a small sample of 40 students who are measured on a reading test. We might first try to determine whether there is a difference in their scores related to gender. When we split the sample reading scores by gender (20 males and 20 females), we can see there is only a small difference in the average scores of males and females.

Descriptive Statistics for Example

                Male              Female
Variable     Mean     SD       Mean     SD
Reading      66.53    2.57     67.00    3.20

As we have learned previously, we could use a t-test to investigate whether the difference in male and female reading means would be considered statistically significant in the population from which this small random sample was drawn. If we subtract the two means, we can see that the difference is 0.47 (with females scoring 0.47 points higher than males).

Another option would be a one-way ANOVA to investigate whether this difference is statistically significant (even though there are only two groups). If we used a simple one-way ANOVA, we would be testing the similarity of group means for males and females by partitioning the total sum of squares for reading into a portion describing variability in reading due to groups (i.e., gender) and a portion describing variability due to individuals. In this case, the F ratio provides an indication of the ratio of between-groups variability (i.e., the between-groups mean square) to within-groups variability (i.e., the within-groups mean square). We can see that the F ratio is not large enough to be statistically significant (2.256/8.437 = 0.267).
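If you want to see this equivalence outside SPSS, here is a minimal Python sketch. The raw data set is not reproduced in this handout, so the code simulates a comparable sample using the group sizes, means, and SDs from the table above; everything else about it is hypothetical. The point is simply that with two groups the t-test and the one-way ANOVA are the same test, with F equal to t squared.

```python
import numpy as np
from scipy import stats

# Hypothetical reading scores for 20 males and 20 females, drawn to resemble
# the descriptive statistics above (the actual data set is not included here).
rng = np.random.default_rng(1)
males = rng.normal(66.53, 2.57, 20)
females = rng.normal(67.00, 3.20, 20)

# Two-sample t-test (equal variances assumed, as in the classic ANOVA setup)
t, p_t = stats.ttest_ind(males, females)

# One-way ANOVA on the same two groups
f, p_f = stats.f_oneway(males, females)

# With exactly two groups the tests are equivalent: identical p-values, F = t^2
print(f"t = {t:.3f}, p = {p_t:.3f}")
print(f"F = {f:.3f}, p = {p_f:.3f}  (t squared = {t**2:.3f})")
```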

ANOVA: read

                    Sum of Squares    df    Mean Square      F      Sig.
Between Groups             2.256       1        2.256      .267    .608
Within Groups            320.618      38        8.437
Total                    322.874      39

Multiple Regression

We can also use multiple regression to investigate whether there is a statistically significant effect due to gender on students' reading scores. Multiple regression is a good choice when we are interested in the effects of a set of predictors (e.g., gender, socioeconomic status, motivation, previous skills) on a single outcome like a reading score. It will isolate the effect of each predictor separately while controlling for the effects of the others. In this way, it can provide a summary of the relative effects of several variables on the Y outcome simultaneously. We summarize the results below.

As you can see, the first output of interest is an ANOVA table, which summarizes the sum of squares information you should be familiar with from our previous work with ANOVA. If you look closely, you will see that the terminology is slightly different between the two approaches. In multiple regression, the between-groups sum of squares is referred to as the regression sum of squares, and the within-groups sum of squares is referred to as the residual sum of squares. Closer inspection of the ANOVA table indicates that very little of the sum of squares is due to the predictor (gender) and, therefore, most of the variation in the reading outcome in this first model is due to individuals (the residual sum of squares). As in the one-way ANOVA table, the F ratio is the same (F = 0.267), as is the significance level (p = 0.608). While the terminology is slightly different, the results are the same (as we would expect).

ANOVA

Model              Sum of Squares    df    Mean Square      F      Sig.
1   Regression            2.256       1        2.256      .267    .608
    Residual            320.618      38        8.437
    Total               322.874      39

Predictors: (Constant), female

We also obtain an estimate of the variance accounted for in the reading outcome due to gender. We can estimate the variance accounted for in reading by gender as the regression sum of squares over the total sum of squares (2.256/322.874 = 0.007), which suggests gender accounts for only about 0.7% (less than 1%) of the variance in students' reading scores. The adjusted R-square coefficient (which takes into consideration the sample size, the number of variables in the model, and the strength of the relationships) is actually slightly negative; since a true proportion of variance cannot be negative, this is best read as essentially zero.
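We can verify both figures directly from the sums of squares in the ANOVA table using the standard formulas; the short sketch below just redoes that arithmetic.

```python
# Recompute R-squared and adjusted R-squared from the ANOVA table above.
ss_regression = 2.256   # regression (between-groups) sum of squares
ss_total = 322.874      # total sum of squares
n, k = 40, 1            # sample size and number of predictors

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R-squared = {r_squared:.3f}")               # 0.007
print(f"Adjusted R-squared = {adj_r_squared:.3f}")  # -0.019, matching the output
```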

Model Summary

Model      R      R Square    Adjusted R Square    Std. Error of the Estimate
1        .084       .007            -.019                   2.90470

Predictors: (Constant), female

We also obtain a table describing the effect of the predictor (female) on students' reading scores. If the predictor is dichotomous (as in this case), the effect is summarized as a difference in means between the group coded 0 (males) and the group coded 1 (females).

Coefficients

                 Unstandardized          Standardized
Model               B      Std. Error        Beta           t       Sig.
1   (Constant)   66.525       .650                       102.423    .000
    female         .475       .919           .084            .517   .608

First, we can see the intercept is 66.525, which is the reading score for males (who are coded 0). In a multiple regression analysis, the intercept (or constant) is the expected Y estimate (the reading score) when all the variables in the model are 0. In this case, this refers to males, since they are coded 0. Second, we can see that the unstandardized coefficient (B) is 0.475, which is interpreted as the change in Y for a one-unit change in the independent variable (female). In this case, this suggests that as gender changes from male (0) to female (1), the estimated reading score increases from 66.525 by 0.475 (to 67.00), which is the reading test score for females. We can write the prediction equation as follows:

$Y = \beta_0 + \beta_1(\text{female}) + e$,

where $\beta_0$ is the intercept (where the line crosses the Y-axis), $\beta_1$ is the slope (which in this case is the estimated difference in means), and $e$ is an error term suggesting we cannot make perfect predictions. When we substitute in the estimates from the table, we obtain the following:

$\hat{Y} = 66.525 + 0.475(\text{female})$

Sometimes people use $\hat{Y}$ (Y-hat) to indicate a predicted score. In this case, we can see that if we substitute 0 into the equation for female (since males are coded 0) and multiply 0 by 0.475, we obtain the intercept [66.525 + 0.475(0) = 66.525], which is the mean reading score for males. Similarly, if we substitute 1 into the equation for female (since females are coded 1) and add the result to the intercept, we obtain the mean for females of 67.00 [66.525 + 0.475(1) = 67.00].
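A minimal sketch of that substitution in Python, just to show the arithmetic (the function name is ours, purely for illustration):

```python
def predict_reading(female: int) -> float:
    """Fitted equation from the coefficients table: Y-hat = 66.525 + 0.475*female."""
    return 66.525 + 0.475 * female

print(predict_reading(0))  # 66.525, the male mean
print(predict_reading(1))  # 67.000, the female mean
```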

Third, we can see in the table a standardized (Beta) coefficient, which provides an estimate of the relative size of the coefficient in standard deviation units. This is the same as the correlation if there is only one predictor in the model. In this case we would describe the standardized Beta as small (0.084). We can obtain the standardized Beta in the table by multiplying the unstandardized coefficient (0.475) by the ratio of the standard deviation of female (0.506) to the standard deviation of Y (2.877).

Descriptive Statistics

                       N    Minimum    Maximum      Mean     Std. Deviation
Read                   40    59.10      73.40     66.7625       2.87729
Female                 40      0          1         .50           .506
Valid N (listwise)     40

When we take the ratio of the standard deviations (0.176) and multiply it by 0.475, we obtain 0.084, which matches the regression coefficient table.

Testing the Null Hypothesis for Gender

A final important piece of information in the table is the standard error for the predictor. In this case, the standard error for female is relatively large (0.919) compared with the estimated coefficient of 0.475. This can be used to develop a test of the null hypothesis that reading scores for males and females are the same in the population. The ratio of the unstandardized estimate to its standard error (B/SE) provides a t-test of statistical significance for each predictor. In this case the ratio is 0.475/0.919, or 0.517, which is the t ratio in the table. We can see that this t ratio is not large enough to be statistically significant (p = .608). In this case, therefore, we fail to reject the null hypothesis that male and female reading means are the same; there is no evidence that gender affects reading outcomes.

Examining a Continuous Predictor

Now let's see what happens if we investigate a continuous predictor. In this case we will use a measure of student socioeconomic status (SES). We can start with a scatter plot to observe how strong the relationship appears. We can see in the scatter plot below some tendency for students from higher SES backgrounds to have higher reading scores in this small sample. You can imagine putting a regression line between the pairs of scores for the 40 individuals in the sample. You might imagine a line starting in the lower left of the graph (somewhere around Y = 60) and having a positive slope passing through the highest concentration of points.
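For readers working outside SPSS, here is a sketch of how such a scatter plot with its least-squares line could be drawn in Python. The file name (reading.csv) and column names (ses, read) are hypothetical placeholders, since the handout's data file is not included.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumes the handout's data sit in a file with columns "ses" and "read"
# (hypothetical names; substitute your own file and variable names).
df = pd.read_csv("reading.csv")  # placeholder file name

plt.scatter(df["ses"], df["read"], alpha=0.7)

# Overlay the least-squares line, like the fit line SPSS adds to the chart
b1, b0 = np.polyfit(df["ses"], df["read"], deg=1)  # slope, then intercept
xs = np.linspace(df["ses"].min(), df["ses"].max(), 100)
plt.plot(xs, b0 + b1 * xs)

plt.xlabel("SES")
plt.ylabel("Reading score")
plt.show()
```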

We can add such a regression line, which summarizes the variance in reading scores accounted for by SES. We can also obtain the equation for the best-fitting regression line, which we will discuss subsequently. The equation in the figure matches the estimates for the intercept (constant) and the SES-reading slope in the regression table below.

We can see from the ANOVA table that the relationship between SES and reading is much stronger than for gender (since the regression sum of squares is 133.824 out of a total of 322.874).

The estimated R-square is therefore 133.824/322.874 = 0.414. If we take the square root, we can see the correlation (r) between SES and reading is about 0.64.

ANOVA

Model              Sum of Squares    df    Mean Square       F       Sig.
1   Regression          133.824       1      133.824      26.899    .000
    Residual            189.050      38        4.975
    Total               322.874      39

Predictors: (Constant), SES

We can confirm this in the Model Summary table, with the R-square being 0.414 (and the adjusted R-square slightly less at 0.399).

Model Summary

Model      R      R Square    Adjusted R Square    Std. Error of the Estimate
1        .644       .414            .399                    2.23047

Predictors: (Constant), SES

The table with the regression coefficients provides the relationship between SES and reading. Here we can see that the intercept is much lower than in the previous model (30.826). This would be interpreted as the reading score for an individual who had an SES level of 0. In this case, the actual SES scores in the data set (which we might think of as a scale from 0 to 10) range from about 6 on the scatter plot to a maximum of 7.50 (with a mean of about 6.7). So we have a situation where the intercept (i.e., the predicted reading score when an individual has an SES score of 0) does not actually describe the reading level of anyone in the sample.

Coefficients

                 Unstandardized          Standardized
Model               B      Std. Error        Beta           t       Sig.
1   (Constant)   30.826      6.938                         4.443    .000
    SES           5.347      1.031           .644          5.186    .000

There are ways to recode the data so the intercept describes an individual with the lowest SES score in the sample, or even the average SES score in the sample. But we can also just leave it as it is. We can write out the mathematical equation for predicting student reading scores from their levels of SES:

$Y = \beta_0 + \beta_1(\text{SES}) + e$

If we substitute in the information from the regression coefficient table, we obtain the following:

$\hat{Y} = 30.826 + 5.347(\text{SES})$
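The same model can be fit outside SPSS. A minimal sketch with statsmodels, again under the hypothetical file and column names used earlier, should reproduce the intercept (30.826) and slope (5.347) up to rounding.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("reading.csv")  # placeholder file name

# Ordinary least squares regression of reading on SES
model = smf.ols("read ~ ses", data=df).fit()
print(model.params)    # Intercept (~30.8) and ses slope (~5.3)
print(model.summary()) # full output: ANOVA F, R-squared, coefficient table
```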

We can see in the table that the unstandardized coefficient is 5.347. We can use this coefficient to estimate the predicted reading score for individuals at selected levels of SES. For example, if a person had an SES score of 6.0, we would estimate her or his predicted reading score as follows:

$\hat{Y} = 30.826 + 5.347(6.0)$

If we multiply 5.347 by 6 (32.082) and add it to 30.826, we obtain a predicted score of 62.908. If you look at the graph, you will see that this corresponds to the level of Y when SES is equal to 6.

Combining Both Predictors in a Model

Finally, let's see what happens when we include both predictors in the model. The proposed equation is the following:

$Y = \beta_0 + \beta_1(\text{female}) + \beta_2(\text{SES}) + e$

We obtain the following information. We can see in the ANOVA table that the whole model is below the commonly used p = 0.05 level of statistical significance (i.e., p < .001). This evaluates the overall set of predictors in explaining reading achievement.

ANOVA

Model              Sum of Squares    df    Mean Square       F       Sig.
1   Regression          151.742       2       75.871      16.404    .000
    Residual            171.132      37        4.625
    Total               322.874      39

Predictors: (Constant), SES, female

Similarly, the R-square (and adjusted R-square) increase when both predictors are in the model. This is helpful in evaluating the fit of the model to the data (higher values indicating better fit). The two variables account for almost 50% of the variance in reading scores.

Model Summary

Model      R      R Square    Adjusted R Square    Std. Error of the Estimate
1        .686       .470            .441                    2.15062

Predictors: (Constant), SES, female
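A sketch of the combined model in statsmodels, under the same hypothetical file and column names; the predicted values at SES = 1 mirror the hand calculation below (and, like it, extrapolate well below the observed SES range).

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("reading.csv")  # placeholder file name

# Both predictors entered simultaneously
model = smf.ols("read ~ female + ses", data=df).fit()
print(model.params)  # Intercept ~27.1, female ~1.37, ses ~5.80

# Predicted reading scores at SES = 1 for males (0) and females (1)
new = pd.DataFrame({"female": [0, 1], "ses": [1.0, 1.0]})
print(model.predict(new))  # ~32.88 and ~34.26
```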

Finally, we can evaluate the effects of each predictor on the reading outcome. What would you conclude from the table below? First, regarding gender, should we reject or fail to reject the null hypothesis at the suggested p value of 0.05? Second, should we reject or fail to reject the null hypothesis that SES is related to student achievement in reading? For SES, a one-unit increase (say from 0 to 1) would produce a 5.802-point increase in reading score (holding gender constant). Using the last equation, for males, the predicted score at SES = 1 would be 27.080 + 5.802 = 32.882. For females, we would add 1.374 (32.882 + 1.374 = 34.256).

Coefficients

                 Unstandardized          Standardized
Model               B      Std. Error        Beta           t       Sig.
1   (Constant)   27.080      6.955                         3.894    .000
    female        1.374       .698           .242          1.968    .057
    SES           5.802      1.021           .699          5.685    .000

Examining Residuals

Besides examining the R-square coefficient as a measure of the model's fit to the data, another way of considering model fit is to examine the residuals (or errors) resulting from the model. For example, we can plot either the predicted values from the model against the residuals, or the standardized predicted values against the standardized residuals. Residuals are defined as the observed values minus the values predicted by the model (observed − predicted = residual). So if the observed score of an individual is 42 and the predicted value is 42, the residual would be 0, and the individual would lie right on the regression line. The resulting plot should show no relationship between predicted and residual values. In the figure below you can see the residuals are spread evenly across the standardized predicted values, indicating no relationship.
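A sketch of that residual plot in Python, under the same hypothetical names; a patternless cloud around zero is what we want to see.

```python
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("reading.csv")  # placeholder file name
model = smf.ols("read ~ female + ses", data=df).fit()

# Residual = observed - predicted, plotted against the predicted values
plt.scatter(model.fittedvalues, model.resid, alpha=0.7)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted reading score")
plt.ylabel("Residual")
plt.show()
```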

Another way to examine the normality of the residuals is to plot the observed distribution of residuals against the expected distribution of residuals (referred to as a P-P plot). This provides a summary of how well the model predicts for various values of the dependent variable. If the two distributions were identical, the points would all lie on the line. In the normal probability plot below, the observed residuals initially fall below the line (suggesting a smaller number of large negative residuals than expected). Near the middle, the observed points are above the line, suggesting the observed cumulative proportion exceeds the expected proportion. Then at the top, the points fall below the line again. Note that the closer the circles are to the line overall, the more plausible it is that the data were sampled from a normally distributed population.

Finally, another common examination is the distribution of the standardized residuals. A standardized residual greater than 3 in absolute value is often considered an outlier. The standardized residuals should have a mean of 0 and a standard deviation close to 1.0 (which they do here).

Descriptive Statistics

                         N    Minimum     Maximum       Mean      Std. Deviation
Standardized Residual    40   -1.88907    3.09323     .0000000      .97402153
Valid N (listwise)       40

If we print a histogram of the residuals, we can see there is one residual larger than +3 in this small data set. We can say the residuals are relatively normally distributed (the skewness is 0.45 and the kurtosis is about 1.36, not tabled). Overall, we can conclude the model fits the data reasonably well.
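The same outlier check can be sketched in Python under the same hypothetical names. One detail worth knowing: SPSS's standardized residuals divide each residual by the square root of the residual mean square, which is why the standard deviation in the table above comes out slightly below 1.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("reading.csv")  # placeholder file name
model = smf.ols("read ~ female + ses", data=df).fit()

# SPSS-style standardized residuals: residual / sqrt(residual mean square)
z = model.resid / np.sqrt(model.mse_resid)
print(z.mean(), z.std())   # mean ~0, SD slightly below 1

# Flag any case with a standardized residual beyond +/-3 as a potential outlier
print(df[np.abs(z) > 3])
```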

Investigating an Interaction

Finally, we might test whether there is an interaction between SES and gender. The test of an interaction may be considered a test of parallel lines; that is, whether the effect of SES on reading is the same for males and females. We can start by viewing a scatter plot of reading by SES with separate lines indicated for males and females. Here we can see that the slopes of the lines for males and females are different, with the SES-reading slope being steeper for females (dotted line) than for males. The null hypothesis is that the SES-reading slope does not depend on gender.

We can create the interaction term (using Compute and multiplying female by SES) and save it in the data set. When we multiply female (coded 1) by SES, we obtain values of the interaction for females; since males are coded 0, the interaction value for males will be 0. So we can interpret the interaction as the possible advantage or disadvantage of SES on reading for females. When we estimate the model, we obtain the following results.

ANOVA

Model              Sum of Squares    df    Mean Square       F       Sig.
1   Regression          154.614       3       51.538      11.027    .000
    Residual            168.260      36        4.674
    Total               322.874      39

Predictors: (Constant), femaleses, SES, female

We can see in the ANOVA table above that the model is significant (F = 11.027, p < .001). More information, however, is provided by the table of unstandardized and standardized coefficients below. This table reveals that neither female (p = .497) nor the interaction of female*SES (p = .438) is significant in accounting for students' reading scores. When interactions are added to a model, they often change the coefficients of the main effects. We can also notice the coefficients look a little odd compared with the last table. They are not incorrect, but the interaction complicates the calculations a little.

We would interpret the intercept in this case as the predicted score of a male (coded 0) who has an SES score of 0 (31.435). A female (coded 1) with an SES score of 0 would be expected to have a score of 31.435 − 9.687, or 21.748 (which reflects the fact that in the graph the male and female regression lines cross). The effect of SES for males is simple to estimate from the equation. At an SES score of 1, the estimated reading score for males would be the following:

$\hat{Y} = 31.435 - 9.687(0) + 5.161(1) + 1.650(0) = 31.435 + 5.161 = 36.596$

For females, the estimated score would be as follows:

$\hat{Y} = 31.435 - 9.687(1) + 5.161(1) + 1.650(1) = 31.435 - 2.876 = 28.559$

The female estimate reflects the slight positive contribution of the interaction term (femaleses) in reducing the gap in reading scores for females. It should be noted in passing that at high levels of SES, there would be an advantage associated with the interaction term for females. We can observe in the graph that the advantage in reading scores for females becomes positive and increases at about SES = 6.0 or above.
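A sketch of the interaction model in statsmodels, under the same hypothetical names. The formula interface builds the product term itself, which is equivalent to computing female*SES by hand and entering it as a third predictor.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("reading.csv")  # placeholder file name

# female:ses is the product term, equivalent to the computed femaleses variable
model = smf.ols("read ~ female + ses + female:ses", data=df).fit()
print(model.params)  # Intercept ~31.4, female ~-9.7, ses ~5.2, female:ses ~1.65

# Predicted scores at SES = 1, mirroring the hand calculations above
new = pd.DataFrame({"female": [0, 1], "ses": [1.0, 1.0]})
print(model.predict(new))  # ~36.6 for males, ~28.6 for females
```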

In this case, however, even though the previous graph makes the lines look non-parallel, we must fail to reject the null hypothesis. In other words, we assume that the effect of SES on reading achievement does not depend on gender.

Coefficients

                 Unstandardized          Standardized
Model               B      Std. Error        Beta           t       Sig.
1   (Constant)   31.435      8.930                         3.520    .001
    female       -9.687     14.130         -1.705          -.686    .497
    SES           5.161      1.312           .621          3.935    .000
    femaleses     1.650      2.105          1.933           .784    .438

When the test of the interaction is not statistically significant, we can revert to the model without the interaction as the final model, unless there is some theoretical reason to keep the model with the interaction effect as the final model.

Centering SES on the Individual with the Lowest SES

We may be able to improve the look of the coefficients in the interaction model by centering SES on a more meaningful value in the data set. For example, we might choose to re-center SES on the individual with the lowest SES in the sample. We will select 6.0. We can compute the new variable by giving it a new name (SESlow) and then computing the new value as SES − 6.0. This subtracts 6.0 from everyone's score, so the individual defining the new intercept will have an SESlow value of 0 (corresponding to an original SES of 6.0).

Notice how this changes the interpretation of the new interaction model. The effect of the new variable SESlow is still the same (5.161), and the interaction for female*SESlow is also the same (1.650). The intercept, however, is now 62.404 (interpreted as the value for a male with SESlow of 0, which corresponds to the old value of 6.0 on the SES scale). The effect for female at this point on the original graph is now 0.213, which is not significant (p = 0.897). Substantively the model does not change, but if you look at the graph of the male and female regression lines, the model is now centered right near the point where the two regression lines cross (at SES of about 6.0). The model is not substantively different, but it may be easier for readers to understand if we use a more meaningful place to center SES.

Coefficients

                   Unstandardized          Standardized
Model                 B      Std. Error        Beta           t       Sig.
1   (Constant)     62.404      1.154                       54.099    .000
    female           .213      1.639           .038           .130   .897
    SESlow          5.161      1.312           .621          3.935   .000
    femaleseslow    1.650      2.105           .222           .784   .438
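A sketch of the centering step and refit, under the same hypothetical names; it should show the SESlow slope and the interaction coefficient unchanged, while the intercept and female coefficients shift.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("reading.csv")  # placeholder file name

# Center SES near the sample minimum so the intercept describes a real student
df["seslow"] = df["ses"] - 6.0

model = smf.ols("read ~ female + seslow + female:seslow", data=df).fit()
print(model.params)  # slope and interaction unchanged; intercept now ~62.4
```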

How to Recode a Variable in SPSS

In our example, the lowest value of SES is 6.0 (actually 5.98). We can recode SES so that the intercept in the regression model refers to a person with an SES score of 6.0 (approximately the lowest score in the data set). We recode individuals' SES scores such that 0 represents the person with the lowest SES score in the data set; in this case, we will use 6.0. This is accomplished by subtracting 6.0 from everyone's current SES score. So, for example, the first individual, with an SES score of 6.76, will have a score of 0.76 after subtracting 6.0.

To do so, open TRANSFORM and then Compute Variable. At the top left you will see the Target Variable box. Type the name of the new variable (SESlow) there. This will save the new variable in the data set. Next, select SES and click it into the Numeric Expression box, and then subtract 6 from everyone's SES score. The Numeric Expression box should look like this:

SES - 6

Then click OK. You can check that the first person now has a score of 0.76 (since we have subtracted 6 from the original score of 6.76). If that is the case, then you have performed the transformation accurately.

Now you can run your regression using female and SESlow as the two predictors. You will obtain the following results. The intercept now refers to a male student with an SES score of 6.0 (which has been recoded to 0). You can see that the other coefficients are unaffected by the recode.

Coefficients

                 Unstandardized          Standardized
Model               B      Std. Error        Beta           t       Sig.
1   (Constant)   61.892       .946                        65.409    .000
    female        1.374       .698           .242          1.968    .057
    SESlow        5.802      1.021           .699          5.685    .000

Creating an Interaction Term in SPSS

To create the interaction term, we open TRANSFORM and Compute Variable. We give the target variable the name femaleseslow. We place female in the Numeric Expression box, click in an asterisk, and then place SESlow in the Numeric Expression box. The expression should look like this:

female*SESlow

We then click OK. You should see the interaction in the data set. Now you can run the new regression model; it should match the interaction model in the handout.
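For completeness, each of these two Compute Variable steps has a one-line equivalent in pandas, again assuming the hypothetical file and column names used throughout.

```python
import pandas as pd

df = pd.read_csv("reading.csv")  # placeholder file name

# Equivalent of TRANSFORM > Compute Variable with expression SES - 6
df["seslow"] = df["ses"] - 6
print(df["seslow"].iloc[0])  # a student with SES 6.76 should now show 0.76

# Equivalent of the interaction compute: femaleseslow = female * SESlow
df["femaleseslow"] = df["female"] * df["seslow"]
```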