Interaction effects for continuous predictors in regression modeling

Size: px
Start display at page:

Download "Interaction effects for continuous predictors in regression modeling"

Transcription

1 Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage of wide applicability and ease of interpretation. The model has the form y x x i i p pi i where y is the response variable, { x,, x p } are predictor variables, and is an error term. An implication of this model is that the partial relationship between y and any predictor x (given the other predictors are held fixed) is the same across all values of the predictors; specifically that holding all else fixed, a one unit change in x is associated with an expected unit change in y, for any value of x and any values of the other predictors. When considering the constant relationship between y for any value of another predictor, this is often referred to as the lack of an interaction effect of x on y given the value of a third variable. From a mathematical point of view, this is represented by the fact that the partial derivative y x is a constant. It is not uncommon for researchers and data analysts to consider the possibility that the effect of a predictor on the response could be different depending on the value of a third variable; that is, the presence of an interaction effect. The classic situation of this occurring is if the third variable is defining subgroups in the data, with the implication being that the slope of x differs depending on group membership. It is well-known that such a model can be fit by including in a regression model a set of indicator variables to define the groups, and all of the pairwise products of the indicator variables and the variable x (this can also be accomplished using effect codings; see Mayhew and Simonoff, 5, for a full discussion of the use of effect codings to define subgroups in a data set). Consider the simplest situation of the presence of two subgroups A and B in the data and a single predictor x. Say an indicator variable I defines group membership, with I corresponding to membership in group A and I corresponding to membership group B. Fitting the regression model based on I, x, and their product Ix yi xi Ii 3Iixi i is equivalent to fitting the two separate lines y x for members of group A ( I ) and 5, Jeffrey S. Simonoff i i i * * yi xi 3xi i xi i for members of group B ( I ). As can be seen, by including the product of I in the regression model, different slopes for the two groups are implied, representing the interaction effect of group membership and the numerical variable x. This generalizes for more than two

2 subgroups to an analysis of covariance model (see Chatteree and Simonoff, 3, for extensive discussion of fitting such models). This fact has had the unfortunate effect of resulting in researchers attempting to represent interactions between two numerical variables in the same way, by including their product as a predictor in a fitted regression, yi x i xi 3x ix i i. (.) This is problematic because using the t-test for whether the slope of the product variable equals as an interaction test potentially results in errors of both types, Type I (mistakenly identifying a pattern that does not correspond to an interaction effect as an interaction) and Type II (mistakenly deciding that no interaction effect is present when it actually is), no matter how large the sample is or how strong the underlying relationships are. We will treat each of these issues in turn in the next two sections, illustrating them with simulated data. The data are a deliberately simplified version of the problem where the patterns are obvious, in order to illustrate the issues clearly; in a real data situation with multiple additional predictors the patterns could easily be less obvious to the eye, but ust as serious. We will then discuss how to graphically uncover an interaction effect between two numerical variables, and how the use of additive models (a generalization of the linear model) can be an appropriate way to avoid mistakenly identifying a supposed interaction effect. We will then suggest a simple alternative approach for identifying interactions between numerical variables. Problems with the product test for interactions Mistakenly identifying nonlinearity as an interaction (Type I error) The key idea is to recognize that (.) is not an interaction equation, but rather a nonlinear one. If nonlinearity is mistakenly identified as an interaction, a Type I error occurs. This can easily happen if the variables x are correlated with each other. Consider the following situation. Say the true underlying relationship is a quadratic one on variable x alone; that is, y x x i i i i. The model on only x clearly cannot account for this quadratic relationship. If the product model (.) is fit instead, and if x are highly correlated, * * * * y x x x x x x i i i i i i 3 i i i, because up to constant terms or terms in x or x alone x x ixi. Thus, if a product term is included in the regression its t-statistic will be statistically significant, implying an interaction between x, when in fact what is present is a nonlinear relationship in x alone. Consider the following simulated example. The following regression output is based on fitting a regression with two predictors, x : 5, Jeffrey S. Simonoff

3 The regression equation is y = x -.8 x Predictor Coef SE Coef T P Constant x x S =.47 R-Sq = 9.% R-Sq(ad) = 7.% Analysis of Variance Source DF SS MS F P Regression Residual Error Total The overall regression is statistically significant, but neither predictor is; the reason for this is that the two predictors are highly correlated (the correlation between them is.994). The product test for an interaction now adds the product variable to the regression: The regression equation is y = x -.59 x +. xx Predictor Coef SE Coef T P Constant x x xx S =.7673 R-Sq = 76.5% R-Sq(ad) = 75.8% Analysis of Variance Source DF SS MS F P Regression Residual Error Total The t-test is extremely highly statistically significant, apparently indicating an extremely strong interaction between the two predictors, but that is not in fact the case. The scatter plot below demonstrates what is actually going on: there is a quadratic relationship between y, and the high correlation between x has resulted in the product of the two variables taking the place of the x term. Thus, a nonlinear relationship in a single predictor has been misidentified as an interaction effect involving two predictors. 5, Jeffrey S. Simonoff

4 y Scatterplot of y vs x x 3 Mistakenly missing the presence an interaction (Type II error) The product term in equation (.) can be viewed as an interaction effect on the response, as it does correspond to a differential effect of x on y given the value of x ; specifically, y 3x. x The problem with the test is that this is a very specific form of an effect, and many interaction effects do not correspond to a relationship even close to this form. As a result, there are many situations where an actual interaction will be missed by the test of whether the slope of the product term equals. Consider the following simulated example. The following regression output is based on fitting a regression with two predictors, x (note that y are not the same as in the previous example): The regression equation is y = x -.39 x Predictor Coef SE Coef T P Constant x x , Jeffrey S. Simonoff

5 S =.39 R-Sq = 5.% R-Sq(ad) = 3.3% Analysis of Variance Source DF SS MS F P Regression Residual Error Total The overall regression is marginally statistically significant, as is the slope coefficient for x. The product test for an interaction now adds the product variable to the regression: The regression equation is y = x -.66 x -.9 xx Predictor Coef SE Coef T P Constant x x xx S =.78 R-Sq = 5.5% R-Sq(ad) =.5% Analysis of Variance Source DF SS MS F P Regression Residual Error Total As is apparent, the product variable is not close to being statistically significant here, apparently implying that there is no interaction effect, but that is not in fact the case. There is in fact a very strong interaction effect: if x 35 or x 7 the slope, and otherwise the slope. This can be seen in the following scatter plot, where the regions are labeled Low, Mid, and High: 5, Jeffrey S. Simonoff

6 y 3 Scatterplot of y vs x Region High Low Mid x 3 Since this interaction does not look like a product term, the test has no power to identify it, even though doing so correctly would result in a strong fit (an R more than 75% and a highly statistically significant interaction effect corresponding to different slopes for the three regions of x ). Identifying interaction effects Given the deficiencies in using the product of two numerical predictors to test for the presence of an interaction effect, a natural question to ask is whether there are better methods. The answer is yes, as we discuss here. We first describe a graphical technique (termed a trellis display) that can help expose the presence of an interaction effect, and we then discuss how the linear regression model can be generalized to an additive model that is flexible enough to distinguish between nonlinear relationships and actual interaction effects. Both of these techniques are available as part of the free software package R. We then note how fitting an analysis of covariance model can easily test for the presence of an interaction effect in a way that is much more effective in general than is multiplying numerical variables. 5, Jeffrey S. Simonoff

7 y Trellis displays A trellis display is a version of a conditioning plot; it highlights patterns in the data conditioning on the value of a specific variable. Since this is precisely what an interaction effect in regression represents (the relationship between the response and a predictor changing based on the value of another variable), such a display is ideal for exploring graphically the possibility of an interaction effect. The display below gives a display for the second data set given above prepared using the lattice package of the R software package (Sarkar, 8). Recall that in that data set the slope between y changes depending on the value of x. The plot is constructed by defining subregions based on the conditioning variable x ; a simple default (used here) is to divide the data into regions with roughly equal numbers of observations. Each panel of the display is a scatter plot of y versus x for the observations in that x subregion. The subregions go from smallest values of x in the lower left to largest values in the upper right, and are identified by the shading at the top of each plot in the display. - - x x x - - x x x x - - 5, Jeffrey S. Simonoff

8 It is apparent in the display that for smaller values of x there is a direct relationship between y, for moderate values there is an inverse relationship, and for large values there is again a direct relationship. Thus, the plot easily summarizes the interaction effect in the data. As is true for any scatter plot in a multiple regression the display is in general only suggestive, since it cannot account for the effects of predictors other than x on the relationship between y given x, but it is certainly worth constructing if the possibility of an interaction effect is contemplated. Additive models Additive models (Hastie and Tibshirani, 99) are a generalization of linear models in which linear terms are replaced with arbitrary, usually smooth, functions of predictors. The simplest version of the model takes the form y f ( x ) f ( x ), i i p pi i where the functions f ( ) can be generalizations beyond the linear terms in a linear model. These functions are typically assumed to be smooth, and are estimated using kernel-based local polynomials, smoothing splines, and so on (see Simonoff, 996, for a discussion of smoothing methods). These models provide a compromise between linear models (with their ease of interpretation but strong assumption of linearity of effects) and arbitrary nonlinear models (with their greater flexibility but difficulties in specification and estimation) by hypothesizing that effects can be nonlinear, but do not interact with each other. They can be fit using either the gam or mgcv packages in the R software package. So, for example, for the first data set given above an additive model fit can automatically highlight the nonlinear relationship between y, and given that the unimportance of x : 5, Jeffrey S. Simonoff

9 x x In this display each of the plots gives the effect of the variable given the presence of the other. The superimposed lines correspond to estimates of the underlying functions f and f, and show that once x is included in the model x does not add anything, even though a simple scatter plot of y on x would show a quadratic pattern because of the high correlation between x (the smoothness of the fitted curves must be chosen by the data analyst; Simonoff and Tsai, 999, discuss this statistical problem, but from a practical point of view it is often satisfactory to choose the curves by eye). Analysis of covariance This does not directly address the problem of identifying interactions if they exist, beyond identifying when a nonlinear relationship has been misidentified as an interaction. Thus, the plot of the additive terms for the second data set above (where there is an interaction effect) shows that an additive model is not an adequate representation of the relationships, as the additive 5, Jeffrey S. Simonoff

10 model tries to use a parabolic curve to estimate a much more complex relationship between y and x : - - x x While it is possible to generalize the additive model to allow for terms that are explicitly smooth interactions of predictors, a more straightforward approach is to build on the trellis display, and explore a regression model that allows for different slopes for a predictor depending on the value of another variable. This is not correct unless the groupings happen to correspond exactly to true subgroups in the data (recall, for example, that the true relationship in the second data set is based on three subgroups in the data, not the six automatically chosen in the trellis display), but is flexible enough to usually identify the existence of a potential interaction that could then be explored further. That is, fit an analysis of covariance model that includes an interaction effect and construct a partial F-test for whether this provides significantly better fit than does a constant 5, Jeffrey S. Simonoff

11 shift model. This corresponds to fitting separate lines to each of the subplots in the trellis display if there are no other predictors in the model, but generalizes the display to account for the potential effects of other variables if there are any. If that is done for the second data set above, the interaction effect is clearly supported, with a partial F-test equal to 46.5 on (5,88) degrees of freedom, yielding a p-value vanishingly close to, strongly implying improved performance for lines with different slopes over a set of parallel lines. Closer examination of the trellis display would then show that there seem to be three separate regimes defining the interaction, which could be explored further. References Chatteree, S. and Simonoff, J.S. (3), Handbook of Regression Analysis, Wiley: Hoboken, NJ. Hastie, T.J. and Tibshirani, R.J. (99), Generalized Additive Models, Chapman and Hall: London. Mayhew, M.J. and Simonoff, J.S. (5), Nonwhite, No More: Effect Coding as an Alternative to Dummy Coding with Implications for Researchers in Higher Education, Journal of College Student Development, 56, Sarkar, D. (8), Lattice: Multivariate Data Visualization with R, Springer: New York. Simonoff, J.S. (996), Smoothing Methods in Statistics, Springer: New York. Simonoff, J.S. and Tsai, C.-L. (999), Semiparametric and Additive Model Selection Using an Improved Akaike Information Criterion, Journal of Computational and Graphical Statistics, 8, -4. 5, Jeffrey S. Simonoff

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1 Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression

More information

Is economic freedom related to economic growth?

Is economic freedom related to economic growth? Is economic freedom related to economic growth? It is an article of faith among supporters of capitalism: economic freedom leads to economic growth. The publication Economic Freedom of the World: 2003

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

10. Alternative case influence statistics

10. Alternative case influence statistics 10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the

More information

MULTICOLLINEARITY AND VARIANCE INFLATION FACTORS. F. Chiaromonte 1

MULTICOLLINEARITY AND VARIANCE INFLATION FACTORS. F. Chiaromonte 1 MULTICOLLINEARITY AND VARIANCE INFLATION FACTORS F. Chiaromonte 1 Pool of available predictors/terms from them in the data set. Related to model selection, are the questions: What is the relative importance

More information

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Using Mult Lin Regression Derived variables Many alternative models Which model to choose? Model Criticism Modelling Objective Model Details Data and Residuals Assumptions 1

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

POL 681 Lecture Notes: Statistical Interactions

POL 681 Lecture Notes: Statistical Interactions POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments. Analysis of Covariance In some experiments, the experimental units (subjects) are nonhomogeneous or there is variation in the experimental conditions that are not due to the treatments. For example, a

More information

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models Chapter 14 Multiple Regression Models 1 Multiple Regression Models A general additive multiple regression model, which relates a dependent variable y to k predictor variables,,, is given by the model equation

More information

Multiple Regression Examples

Multiple Regression Examples Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

23. Inference for regression

23. Inference for regression 23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 23) Inference for regression The regression model Confidence

More information

1 Chapter 2, Problem Set 1

1 Chapter 2, Problem Set 1 1 Chapter 2, Problem Set 1 1. The first model is the smoothest because it imposes a straight line. Only two degrees of freedom are lost. The second model exhibits the most jagged fit because each distinct

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Analysis of Bivariate Data

Analysis of Bivariate Data Analysis of Bivariate Data Data Two Quantitative variables GPA and GAES Interest rates and indices Tax and fund allocation Population size and prison population Bivariate data (x,y) Case corr&reg 2 Independent

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7-1 Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

More information

STAT 212 Business Statistics II 1

STAT 212 Business Statistics II 1 STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb

More information

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

More information

Fractional Polynomial Regression

Fractional Polynomial Regression Chapter 382 Fractional Polynomial Regression Introduction This program fits fractional polynomial models in situations in which there is one dependent (Y) variable and one independent (X) variable. It

More information

36-707: Regression Analysis Homework Solutions. Homework 3

36-707: Regression Analysis Homework Solutions. Homework 3 36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5) 10 Simple Linear Regression (Chs 12.1, 12.2, 12.4, 12.5) Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 2 Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 3 Simple Linear Regression

More information

10 Model Checking and Regression Diagnostics

10 Model Checking and Regression Diagnostics 10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as i = β 0 + β 1 i + ɛ i where the ɛ i s are independent normal random variables with mean 0 and variance

More information

Multilevel Analysis, with Extensions

Multilevel Analysis, with Extensions May 26, 2010 We start by reviewing the research on multilevel analysis that has been done in psychometrics and educational statistics, roughly since 1985. The canonical reference (at least I hope so) is

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Regression Analysis IV... More MLR and Model Building

Regression Analysis IV... More MLR and Model Building Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Ph.D. Preliminary Examination Statistics June 2, 2014

Ph.D. Preliminary Examination Statistics June 2, 2014 Ph.D. Preliminary Examination Statistics June, 04 NOTES:. The exam is worth 00 points.. Partial credit may be given for partial answers if possible.. There are 5 pages in this exam paper. I have neither

More information

Modelling Survival Data using Generalized Additive Models with Flexible Link

Modelling Survival Data using Generalized Additive Models with Flexible Link Modelling Survival Data using Generalized Additive Models with Flexible Link Ana L. Papoila 1 and Cristina S. Rocha 2 1 Faculdade de Ciências Médicas, Dep. de Bioestatística e Informática, Universidade

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In

More information

Regression tree-based diagnostics for linear multilevel models

Regression tree-based diagnostics for linear multilevel models Regression tree-based diagnostics for linear multilevel models Jeffrey S. Simonoff New York University May 11, 2011 Longitudinal and clustered data Panel or longitudinal data, in which we observe many

More information

STATISTICAL COMPUTING USING R/S. John Fox McMaster University

STATISTICAL COMPUTING USING R/S. John Fox McMaster University STATISTICAL COMPUTING USING R/S John Fox McMaster University The S statistical programming language and computing environment has become the defacto standard among statisticians and has made substantial

More information

27. SIMPLE LINEAR REGRESSION II

27. SIMPLE LINEAR REGRESSION II 27. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Regression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr

Regression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr Regression Model Specification in R/Splus and Model Diagnostics By Daniel B. Carr Note 1: See 10 for a summary of diagnostics 2: Books have been written on model diagnostics. These discuss diagnostics

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Soil Phosphorus Discussion

Soil Phosphorus Discussion Solution: Soil Phosphorus Discussion Summary This analysis is ambiguous: there are two reasonable approaches which yield different results. Both lead to the conclusion that there is not an independent

More information

Multiple Linear Regression

Multiple Linear Regression Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation Scatterplots and Correlation Name Hr A scatterplot shows the relationship between two quantitative variables measured on the same individuals. variable (y) measures an outcome of a study variable (x) may

More information

SMAM 314 Computer Assignment 5 due Nov 8,2012 Data Set 1. For each of the following data sets use Minitab to 1. Make a scatterplot.

SMAM 314 Computer Assignment 5 due Nov 8,2012 Data Set 1. For each of the following data sets use Minitab to 1. Make a scatterplot. SMAM 314 Computer Assignment 5 due Nov 8,2012 Data Set 1. For each of the following data sets use Minitab to 1. Make a scatterplot. 2. Fit the linear regression line. Regression Analysis: y versus x y

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation y = a + bx y = dependent variable a = intercept b = slope x = independent variable Section 12.1 Inference for Linear

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

ANCOVA. ANCOVA allows the inclusion of a 3rd source of variation into the F-formula (called the covariate) and changes the F-formula

ANCOVA. ANCOVA allows the inclusion of a 3rd source of variation into the F-formula (called the covariate) and changes the F-formula ANCOVA Workings of ANOVA & ANCOVA ANCOVA, Semi-Partial correlations, statistical control Using model plotting to think about ANCOVA & Statistical control You know how ANOVA works the total variation among

More information

Multiple Regression Methods

Multiple Regression Methods Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

4. Nonlinear regression functions

4. Nonlinear regression functions 4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Simple, Marginal, and Interaction Effects in General Linear Models

Simple, Marginal, and Interaction Effects in General Linear Models Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology. Jeffrey R. Edwards University of North Carolina

Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology. Jeffrey R. Edwards University of North Carolina Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology Jeffrey R. Edwards University of North Carolina 1 Outline I. Types of Difference Scores II. Questions Difference

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

Estimating complex causal effects from incomplete observational data

Estimating complex causal effects from incomplete observational data Estimating complex causal effects from incomplete observational data arxiv:1403.1124v2 [stat.me] 2 Jul 2014 Abstract Juha Karvanen Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä,

More information

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials. One-Way ANOVA Summary The One-Way ANOVA procedure is designed to construct a statistical model describing the impact of a single categorical factor X on a dependent variable Y. Tests are run to determine

More information

Six Sigma Black Belt Study Guides

Six Sigma Black Belt Study Guides Six Sigma Black Belt Study Guides 1 www.pmtutor.org Powered by POeT Solvers Limited. Analyze Correlation and Regression Analysis 2 www.pmtutor.org Powered by POeT Solvers Limited. Variables and relationships

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Chapter 12: Multiple Regression

Chapter 12: Multiple Regression Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x

More information

INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS

INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS GEORGE W. COBB Mount Holyoke College INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS Springer CONTENTS To the Instructor Sample Exam Questions To the Student Acknowledgments xv xxi xxvii xxix 1. INTRODUCTION

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information