Interaction effects between continuous variables (Optional)

Similar documents
Sociology Exam 2 Answer Key March 30, 2012

Sociology 63993, Exam 2 Answer Key [DRAFT] March 27, 2015 Richard Williams, University of Notre Dame,

Group Comparisons: Differences in Composition Versus Differences in Models and Effects

Specification Error: Omitted and Extraneous Variables

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised January 20, 2018

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section #

Interpreting coefficients for transformed variables

S o c i o l o g y E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7,

Problem Set 1 ANSWERS

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

Nonlinear relationships Richard Williams, University of Notre Dame, Last revised February 20, 2015

Lecture 24: Partial correlation, multiple regression, and correlation

Lab 10 - Binary Variables

Comparing Logit & Probit Coefficients Between Nested Models (Old Version)

Multicollinearity Richard Williams, University of Notre Dame, Last revised January 13, 2015

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs

1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5.

Sociology Exam 1 Answer Key Revised February 26, 2007

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval]

General Linear Model (Chapter 4)

Problem Set 10: Panel Data

Lecture 7: OLS with qualitative information

Lab 07 Introduction to Econometrics

Lecture 12: Interactions and Splines

Rockefeller College University at Albany

Question 1a 1b 1c 1d 1e 2a 2b 2c 2d 2e 2f 3a 3b 3c 3d 3e 3f M ult: choice Points

Lab 6 - Simple Regression

Section Least Squares Regression

Econometrics Homework 1

Practice exam questions

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

Correlation and regression. Correlation and regression analysis. Measures of association. Why bother? Positive linear relationship

ECO220Y Simple Regression: Testing the Slope

Lecture 4: Multivariate Regression, Part 2

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS

1 A Review of Correlation and Regression

Week 3: Simple Linear Regression

ECON3150/4150 Spring 2016

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Applied Statistics and Econometrics

Nonrecursive Models Highlights Richard Williams, University of Notre Dame, Last revised April 6, 2015

ECON3150/4150 Spring 2015

1 Independent Practice: Hypothesis tests for one parameter:

Lecture 3: Multivariate Regression

sociology 362 regression

Econometrics II Censoring & Truncation. May 5, 2011

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

Statistical Modelling in Stata 5: Linear Models

ECON 497 Final Exam Page 1 of 12

Empirical Application of Simple Regression (Chapter 2)

Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics

sociology 362 regression

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests

Multivariate Regression: Part I

Handout 11: Measurement Error

At this point, if you ve done everything correctly, you should have data that looks something like:

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

ECON3150/4150 Spring 2016

Analysis of repeated measurements (KLMED8008)

Regression #8: Loose Ends

Lecture#12. Instrumental variables regression Causal parameters III

2.1. Consider the following production function, known in the literature as the transcendental production function (TPF).

Econometrics Midterm Examination Answers

Binary Dependent Variables

Question 1 carries a weight of 25%; Question 2 carries 20%; Question 3 carries 20%; Question 4 carries 35%.

Handout 12. Endogeneity & Simultaneous Equation Models

ECON Introductory Econometrics. Lecture 4: Linear Regression with One Regressor

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Unemployment Rate Example

Mediation Analysis: OLS vs. SUR vs. 3SLS Note by Hubert Gatignon July 7, 2013, updated November 15, 2013

5.2. a. Unobserved factors that tend to make an individual healthier also tend

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

Heteroskedasticity Example

Lecture 4: Multivariate Regression, Part 2

Practice 2SLS with Artificial Data Part 1

Simple, Marginal, and Interaction Effects in General Linear Models: Part 1

ECON Introductory Econometrics. Lecture 13: Internal and external validity

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

Section I. Define or explain the following terms (3 points each) 1. centered vs. uncentered 2 R - 2. Frisch theorem -

(a) Briefly discuss the advantage of using panel data in this situation rather than pure crosssections

Graduate Econometrics Lecture 4: Heteroskedasticity

Statistical Inference with Regression Analysis

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1

Course Econometrics I

A Re-Introduction to General Linear Models (GLM)

Fixed and Random Effects Models: Vartanian, SW 683

Heteroskedasticity Richard Williams, University of Notre Dame, Last revised January 30, 2015

Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA

Lecture 4 Scatterplots, Association, and Correlation

Lecture 4 Scatterplots, Association, and Correlation

2. We care about proportion for categorical variable, but average for numerical one.

Transcription:

Interaction effects between continuous variables (Optional) Richard Williams, University of Notre Dame, https://www.nd.edu/~rwilliam/ Last revised February 0, 05 This is a very brief overview of this somewhat difficult topic. The course readings provide much more detail, and you should go over these carefully if you feel these kinds of interaction terms will be useful in your work. Interactions between two continuous variables. We have focused on interactions between categorical and continuous variables. However, there can also be interactions between two continuous variables. For example, suppose that Intentions and Actual Behavior are both measured as continuous variables. Suppose further it is believed that the effect of intentions on behavior (i.e. the correspondence between what one wants to do and what one actually does) is greater at higher levels of income; that is, the higher one s income is, the more consistently one behaves. This model would be written as Behavior = α + β Intentions + β Income + β ( Intentions * Income) = α + ( β + β * Income) Intentions + βincome = α + β Intentions + ( β + β * Intentions) * Income A positive value for the effect of the interaction term would imply that the higher the income, the greater (more positive) the effect of intentions on behavior was. Similarly, the higher the intentions, the greater (more positive) the effect of income on behavior. EXAMPLES. The greater the resources available, the stronger the effect of intentions on behavior. Those who are most able to get what they want are most likely to get it. The more religious someone is, the stronger the effect of moral values on behavior. The more religious someone is, the more compelled they will be to act on their moral values. Interpreting Interactions between two continuous variables. As Jaccard, Turrisi and Wan (Interaction effects in multiple regression) and Aiken and West (Multiple regression: Testing and interpreting interactions) note, there are a number of difficulties in interpreting such interactions. There are also various problems that can arise. Both books note with regret that such interaction terms are not used more widely in the social sciences. Those who feel that such interaction terms may be theoretically justified in their analyses should consult these works for additional details. A few highlights from their discussions may be helpful: In general, models with interaction effects should also include the main effects of the variables that were used to compute the interaction terms, even if these main effects are not significant. Otherwise, main effects and interaction effects can get confounded. Further, it can be shown that, if main effects are not included, arbitrary changes in the zero point of the original variables can result in important changes in the apparent effects of the interaction terms. Interaction effects between continuous variables (Optional) Page

In models with multiplicative terms, the regression coefficients for X and X reflect conditional relationships. B is the effect of X on Y when X = 0. Similarly, B is the effect of X on Y when X = 0. For example, when X = 0, we get Y + β X + β 0 + β ( X + β ( X * X *0) ) So, we can say that, for a person who has a score of 0 on X, a unit increase in X will produce, on average, a B increase in Y. However, suppose someone has a score of on X. The effect of X is then Y + β X + β * + β ( X + β + β X = α + β + ( β + β ) X + β ( X * X ) *) So, when X =, a one unit increase in X will produce a (B + B) unit increase in Y. In short, if we want to ask, What is the effect of X on Y, the answer is It depends on what X equals. Of course, given the way X and X are scaled, 0 may or may not be a particularly meaningful value, e.g. no adult is 0 years old, nobody has an IQ or a height or a weight of 0. Hence, the main (i.e. non-interaction) effects in a model with interaction terms may have little meaning and may even be misleading. Effects can therefore often be made more interpretable by centering variables first. When we center a variable, we subtract the mean from each case, and then compute the interaction terms. When variables are centered, B is the effect of X on Y for the person who is average on X. Alternatively, rather than centering around the mean, we might center around some other meaningful value. For example, we might recenter education so that a score of 0 corresponds to years of high school. Then, the coefficient for the other X would correspond to the effect that variable has for high school graduates. We can also simply do the above by hand, e.g. take high, medium and low values for X and see what the effect of X is in each case. For example, suppose it conveniently is the case that the intercept is 0 and the betas are all, i.e. y = X + X + XX Interaction effects between continuous variables (Optional) Page

Suppose further that 0, 5, and 0 are low, medium and high values of X. We then get X = 0 X = 5 X = 0 Effect of X on Y 6 Hence, the effect of X on Y is times greater for high values of X than it is for low values of X. More on Centering Continuous Variables. Centering can be useful when both variables are continuous. Again, what a model ultimately says and predicts does not depend on whether or not you center, but centering may make the results a little more meaningful and easy to interpret. For example, suppose a model includes the IVs education, income, and income*education, where income and education have both been centered about their means. Hence, the overall model is E( Y ) = α + βeduc + βinc + βeduc* Inc = α + ( β + βinc)* Educ + βinc = α + β Educ + ( β + β Educ)* Inc As the algebraically equivalent statements show, the effect of education on Y depends on the level of income, and the effect of income on Y depends on the level of education. Note that, for a person of average income (i.e. has a score of 0 on the centered Income variable) the model simplifies to ( Y ) = α + β Educ E Hence, when the variables are centered, the main effect of education is the effect of education on a person who has average income. Similarly, for a person with an average level of education (centered education = 0), the model simplifies to ( Y ) = α + β Inc E That is, when variables are centered, the main effect of income is the effect of income on a person who has an average level of education. Finally, for a person who has average education and average income, E (Y ) i.e. when variables are centered, the intercept is the predicted Y score for an average person. = α Interaction effects between continuous variables (Optional) Page

If you didn t center, the main effect of education would be the effect of education on a person who had 0 income the main effect of income would correspond to the effect of income on a person with 0 years of education the intercept would be the predicted Y score for a person who has 0 years of education and 0 income. Such numbers may or may not be terribly interesting. It depends on whether or not 0 happens to be an interesting value for your variables. Again, what the model ultimately says and what it predicts are the same whether you center or not. But, looking at the effect of income on a person with average education may be more substantively interesting that focusing on the effect of income on a person who has no education. Whether or not you center won t change the estimated effect of the interaction (in this case B) but it will change the estimated main effects. This is because the main effects mean different things depending on whether or not you have centered. Graphing interactions between continuous variables. Graphing can be tricky for interactions involving two or more continuous variables but can still be useful. One approach is to plug in substantively interesting values for one of the IVs and then plot the other IV against the DV. For example, you could plot INCOME versus Y for high, medium and low levels of education (i.e. just like we had separate lines for men and women, we could have separate lines for each of the selected values of education. ) If you did this for every possible value of education you d either get a very messy graph or a -dimensional one, but if you choose a few interesting values you can give a feel for how the effect of income differs by education. For example, in the following hypothetical graph, we plot the relationship between Income and Y for 8,, and 6 years of education. The graph shows that, as education increases, the effect of income on Y gets greater and greater: 0000 8 years years 6 years 0000 0000 0 0 50 00 50 inc Interaction effects between continuous variables (Optional) Page 4

In Stata, we could do something like. webuse nhanesf, clear. sum health age weight Variable Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- health 05.486.0696 5 age 07 47.567 7.678 0 74 weight 07 7.90088 5.555 0.84 75.88. reg health age weight c.age#c.weight Source SS df MS Number of obs = 05 -------------+------------------------------ F(, 0) = 546.46 Model 059.0906 686.64 Prob > F = 0.0000 Residual 975.9 0.560889 R-squared = 0.70 -------------+------------------------------ Adj R-squared = 0.67 Total 505.04 04.454908 Root MSE =.07 -------------------------------------------------------------------------------- health Coef. Std. Err. t P> t [95% Conf. Interval] ---------------+---------------------------------------------------------------- age -.0966.000909-6.6 0.000 -.05708 -.0604 weight.008507.00 0.87 0.8 -.00089.006004 c.age#c.weight -.0000865.00004 -.0 0.044 -.000708 -.4e-06 _cons 4.578.568 9.64 0.000 4.468 4.896 --------------------------------------------------------------------------------. quietly margins, at(weight=(0(0)80) age = (0, 50, 80)). marginsplot, noci scheme(sj) ytitle("predicted health") Adjusted Predictions Predicted health.5.5 4 0 40 50 60 70 80 90 00 0 0 0 40 50 60 70 80 weight (kg) age=0 age=80 age=50 This shows us that the effect of weight on self-reported health is almost 0 for 0 year olds. But, as people get older and older, the effect of weight on self-reported health becomes more and more negative. Interaction effects between continuous variables (Optional) Page 5