Multiple Regression: Chapter 13. July 24, 2015

1 Multiple Regression: Chapter 13 July 24, 2015

2 Multiple Regression (MR)
Response Variable: $Y$, only one response variable (quantitative)
Several Predictor Variables: $X_1, X_2, X_3, \ldots, X_p$ ($p$ = number of predictors)
Note: the predictors can be quantitative variables, categorical variables, quadratic terms, or interaction terms.

3 Concentrate on:
Reading computer output
Interpreting coefficients for each predictor
Determining which order to test in when picking the simplest model that does a good job of predicting $Y$

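4 The Basics of MR
Model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$ (predictors: $X_1, X_2, \ldots, X_p$; number of predictors: $p$)
Assumptions: $\varepsilon$ iid $N(0, \sigma)$
Parameters: coefficients $\beta_1, \beta_2, \ldots, \beta_p$; constant $\alpha$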

5 Reading the computer output:
1. Fitted Equation: $\hat{y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$
2. ANOVA Test:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ (nothing good in model)
$H_a$: at least one $\beta_i \neq 0$ (something good)
Test Statistic: $F = \dfrac{MSR}{MSE}$
ANOVA table for regression:
Source      df         SS     MS   F        P-value
Regression  p          SSReg  MSR  MSR/MSE  from F table with df_num = p, df_denom = n - p - 1
Error       n - p - 1  SSE    MSE
Total       n - 1      SST
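The ANOVA quantities on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `anova_table` and the sums of squares are hypothetical, not taken from the slides. It uses the table's definitions MSR = SSReg / p and MSE = SSE / (n - p - 1), together with the standard identities SST = SSReg + SSE and $R^2$ = SSReg / SST.

```python
def anova_table(ss_reg, ss_err, n, p):
    """Return the regression ANOVA quantities for n observations and p predictors."""
    df_reg = p              # regression degrees of freedom
    df_err = n - p - 1      # error degrees of freedom
    msr = ss_reg / df_reg   # mean square for regression
    mse = ss_err / df_err   # mean square error
    sst = ss_reg + ss_err   # total sum of squares
    return {
        "df": (df_reg, df_err, n - 1),
        "SST": sst,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,         # test statistic for H0: all betas are zero
        "R2": ss_reg / sst,     # proportion of variation explained
    }


# Hypothetical sums of squares, chosen only to make the arithmetic easy to follow.
table = anova_table(ss_reg=80.0, ss_err=20.0, n=16, p=1)
print(table)
```

A large F relative to the F distribution with (p, n - p - 1) degrees of freedom leads to rejecting $H_0$; the p-value itself would come from an F table or statistical software.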

6 3. t test for Individual Predictors:
$H_0: \beta_i = 0$ vs $H_a: \beta_i \neq 0$
Test Statistic: $t = \dfrac{b_i - 0}{\text{standard error of } b_i}$
The p-value is computed from the t-table with df = $n - p - 1$ (error df).
Interpretation: if the p-value is small, reject $H_0$ and conclude that predictor $X_i$ is a GOOD predictor of $Y$ ($X_i$ provides significant information about $Y$) AFTER all other predictors in the model are accounted for.

7 Important Issues in Multiple Regression
Don't just add predictors to the model: think!
For $p = n$ we have an oversaturated model with $R^2 = 100\%$ (useful only for this particular dataset, not for predicting in larger populations).
Adjusted $R^2$ only increases if the new predictor added to the model is good, whereas $R^2$ goes up or stays the same even if the new predictors are bad.
Remember to look at the p-values for each predictor.

8 Multicollinearity: when several predictors are correlated with each other, the ANOVA p-value may be small even if all the individual t-test p-values are large. Correlated predictors give overlapping or redundant information. (Don't throw out all the predictors at once; take them out of the model slowly.)
Sample size should be at least 5 to 20 times bigger than the number of predictors.

9 Example: The following is the dataset on Blood Alcohol Content (BAC) and the Number of Beers consumed (NOB), with two more variables, Weight and Sex. We fit different regression models and compare the output.
Data columns: BAC, NOB, Weight, Sex (f/m), M_1 (dummy for Sex, coded 1 or 0). [Data values not preserved in the transcription.]

10 Regression Analysis: BAC versus NOB
The regression equation is BAC = NOB
Predictor  Coef  SE Coef  T  P
Constant
NOB
S =
R-Sq = 80.0%
R-Sq(adj) = 78.6%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]

11 Regression Analysis: BAC versus NOB, M_1
The regression equation is BAC = NOB M_1
Predictor  Coef  SE Coef  T  P
Constant
NOB
M_1
S =
R-Sq = 85.3%
R-Sq(adj) = 83.1%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]

12 Regression with Dummy Variables:
Dummy Variable: categorical variable coded as 0 or 1.
Example: Let $X_2 = \text{Gender} = \begin{cases} 0 & \text{if female} \\ 1 & \text{if male} \end{cases}$ (the baseline group has zero for the dummy variable)
Model (no interaction): $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Note: This model gives two lines, one for females and one for males, with the same slope but different intercepts.
F ($X_2 = 0$): $Y = \alpha + \beta_1 X_1 + \varepsilon$
M ($X_2 = 1$): $Y = (\alpha + \beta_2) + \beta_1 X_1 + \varepsilon$

13 Interpret Coefficients:
$\alpha$: y-intercept for baseline group (F)
$\beta_1$: slope for both groups
$\beta_2$: change in intercept for males compared to females

14 Regression Analysis: BAC versus NOB, Weight
The regression equation is BAC = NOB Weight
Predictor  Coef  SE Coef  T  P
Constant
NOB
Weight
S =
R-Sq = 95.2%
R-Sq(adj) = 94.4%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]

15 Regression Analysis: BAC versus NOB, Weight, M_1
The regression equation is BAC = NOB Weight M_1
Predictor  Coef  SE Coef  T  P
Constant
NOB
Weight
M_1
S =
R-Sq = 95.3%
R-Sq(adj) = 94.1%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]

16 Question: What if gender is coded the other way?
Regression Analysis: BAC versus NOB, Weight, F_1
The regression equation is BAC = NOB Weight F_1
Predictor  Coef  SE Coef  T  P
Constant
NOB
Weight
F_1
S =
R-Sq = 95.3%
R-Sq(adj) = 94.1%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]

17 Interaction model (with dummy):
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$
Note: This model gives two lines, one for females and one for males, with different slopes and different intercepts.
F ($X_2 = 0$): $Y = \alpha + \beta_1 X_1 + \varepsilon$
M ($X_2 = 1$): $Y = (\alpha + \beta_2) + (\beta_1 + \beta_3) X_1 + \varepsilon$
Interpret Coefficients:
$\alpha$: y-intercept for baseline group (F)
$\beta_1$: slope for baseline group (F)
$\beta_2$: change in intercept for males compared to females
$\beta_3$: change in slope for males compared to females
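As a minimal sketch of how the interaction model splits into one line per group, the Python snippet below computes each group's intercept and slope from the four coefficients. The function name `group_line` and the coefficient values are hypothetical and serve only to illustrate the algebra.

```python
def group_line(alpha, beta1, beta2, beta3, male):
    """Return (intercept, slope) of the line for one group.

    male is the dummy X2: 0 for females (baseline group), 1 for males.
    """
    intercept = alpha + beta2 * male   # beta2 shifts the intercept for males
    slope = beta1 + beta3 * male       # beta3 shifts the slope for males
    return intercept, slope


# Hypothetical coefficients, not estimates from the BAC data.
coef = dict(alpha=0.5, beta1=2.0, beta2=1.0, beta3=-0.5)

female_line = group_line(**coef, male=0)
male_line = group_line(**coef, male=1)
print("F:", female_line)
print("M:", male_line)
```

Setting `beta3` to zero recovers the no-interaction model, in which both groups share the slope `beta1` and differ only in intercept.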

18 Regression Analysis: BAC versus NOB, Weight, M_1, Weight*M_1
The regression equation is BAC = NOB Weight M_1 Weight*M_1
Predictor  Coef  SE Coef  T  P
Constant
NOB
Weight
M_1
Weight*M_1
S =
R-Sq = 95.5%
R-Sq(adj) = 93.9%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]

19 What if we had 3 groups?
Suppose we want to predict BAC from NOB and Race: white, black, hispanic.
We need 2 dummy variables for 3 categories. Let $X_2 = 1$ if black, 0 otherwise; $X_3 = 1$ if hispanic, 0 otherwise.
Note: Race = white is the baseline, with zero for both dummy variables.

20 No Interaction model with 2 Dummies:
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
which gives the following 3 equations:
$X_2 = 0, X_3 = 0$ (W): $Y = \alpha + \beta_1 X_1 + \varepsilon$
$X_2 = 1, X_3 = 0$ (B): $Y = (\alpha + \beta_2) + \beta_1 X_1 + \varepsilon$
$X_2 = 0, X_3 = 1$ (H): $Y = (\alpha + \beta_3) + \beta_1 X_1 + \varepsilon$
Interpret Coefficients:
$\alpha$: intercept for baseline group (W)
$\beta_1$: slope for all 3 groups
$\beta_2$: change in intercept for blacks compared to whites
$\beta_3$: change in intercept for hispanics compared to whites

21 Interaction Model: add interactions between the quantitative variable ($X_1$) and the dummy variables ($X_2$, $X_3$)
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3 + \varepsilon$
which gives the following 3 equations:
$X_2 = 0, X_3 = 0$ (W): $Y = \alpha + \beta_1 X_1 + \varepsilon$
$X_2 = 1, X_3 = 0$ (B): $Y = (\alpha + \beta_2) + (\beta_1 + \beta_4) X_1 + \varepsilon$
$X_2 = 0, X_3 = 1$ (H): $Y = (\alpha + \beta_3) + (\beta_1 + \beta_5) X_1 + \varepsilon$
Interpret Coefficients:
$\alpha$: intercept for baseline group (W)
$\beta_1$: slope for W
$\beta_2$: change in intercept for B compared to W
$\beta_3$: change in intercept for H compared to W
$\beta_4$: change in slope for B compared to W
$\beta_5$: change in slope for H compared to W
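The three-group case can be sketched the same way: code the category into two 0/1 dummies with white as the baseline, then add the matching intercept and slope shifts. The helper names `dummies` and `line_for` and all coefficient values below are hypothetical, chosen only to illustrate the bookkeeping.

```python
LEVELS = ["white", "black", "hispanic"]


def dummies(race, baseline="white"):
    """Return the 0/1 dummy variables (X2, X3) for every level except the baseline."""
    if race not in LEVELS:
        raise ValueError(f"unknown level: {race}")
    return [int(race == level) for level in LEVELS if level != baseline]


def line_for(race, alpha, beta1, intercept_shifts, slope_shifts):
    """Return (intercept, slope) for one group in the interaction model.

    intercept_shifts = (beta2, beta3), slope_shifts = (beta4, beta5).
    """
    d = dummies(race)
    intercept = alpha + sum(b * x for b, x in zip(intercept_shifts, d))
    slope = beta1 + sum(b * x for b, x in zip(slope_shifts, d))
    return intercept, slope


# Hypothetical coefficients for illustration only.
params = dict(alpha=1.0, beta1=2.0,
              intercept_shifts=(0.5, -0.25), slope_shifts=(0.25, 0.5))

lines = {race: line_for(race, **params) for race in LEVELS}
print(lines)
```

With k categories the same pattern needs k - 1 dummies; the baseline group is the one whose dummies are all zero.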

22 In regression, if we have only one categorical predictor, REGRESSION is equivalent to ONE-WAY ANOVA.
Revisit the ONE-WAY ANOVA Example: Compare average weight loss for three diets.
Data: weight loss under 3 diets (columns: low FAT, low CAL, low CARB). [Data values not preserved in the transcription.]

23 ANOVA results (output):
One-way ANOVA: lowfat, lowcal, lowcarb
Source  DF  SS  MS  F  P
Factor
Error
Total
S =
R-Sq = 66.79%
R-Sq(adj) = 59.41%
Individual 95% CIs For Mean Based on Pooled StDev (interval plot; columns Level, N, Mean, StDev; levels lowfat, lowcal, lowcarb)
Pooled StDev = 2.598
[Other numeric values not preserved in the transcription.]

24 Now, let's set up the problem as regression with dummy variables.
$Y$ = weight loss (response)
Let $X_1 = 1$ if lowcal, 0 otherwise; $X_2 = 1$ if lowcarb, 0 otherwise.
Model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Interpret Coefficients:
$\alpha$: intercept for baseline group (lowfat)
$\beta_1$: change in intercept for lowcal compared to lowfat
$\beta_2$: change in intercept for lowcarb compared to lowfat

25 REGRESSION results (output):
Regression Analysis: Y versus x1, x2
The regression equation is Y = x1 x2
Predictor  Coef  SE Coef  T  P
Constant
x1
x2
S =
R-Sq = 66.8%
R-Sq(adj) = 59.4%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]
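The link between the dummy-variable regression and one-way ANOVA can be checked numerically. The sketch below is a hedged illustration: the diet data are hypothetical (the slide's values are not available), and `solve` and `least_squares` are small helper functions written for this example, which solve the normal equations $(X^T X) b = X^T y$ directly. With only dummy predictors, the fitted constant should equal the baseline group mean, and each dummy coefficient should equal that group's mean minus the baseline mean.

```python
from statistics import mean


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def least_squares(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical weight-loss data; lowfat is the baseline group.
groups = {
    "lowfat": [2.0, 4.0, 3.0],
    "lowcal": [5.0, 7.0, 6.0],
    "lowcarb": [9.0, 8.0, 10.0],
}

X, y = [], []
for diet, losses in groups.items():
    for loss in losses:
        # Columns: constant, X1 (lowcal dummy), X2 (lowcarb dummy)
        X.append([1.0, float(diet == "lowcal"), float(diet == "lowcarb")])
        y.append(loss)

a, b1, b2 = least_squares(X, y)
print(a, b1, b2)
print(mean(groups["lowfat"]),
      mean(groups["lowcal"]) - mean(groups["lowfat"]),
      mean(groups["lowcarb"]) - mean(groups["lowfat"]))
```

The two printed lines agree up to floating-point error, which is why the regression and ANOVA outputs describe the same fit.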

26 More about RESIDUALS:
A plot of RESIDUALS vs FITTED values will exaggerate any pattern present in the data other than the linear trend.
How do we judge non-constant variance in the response from the residual vs fitted plot? (example in class)
Recall: residuals = $y - \hat{y}$ (i.e. the linear trend is removed). Any pattern (or trend) still present in the residual vs fitted value plot suggests that the linear regression was not enough, and that we need to add quadratic (or other polynomial) terms to the equation. (examples in class)
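A small numerical sketch of this idea, using hypothetical data that follow an exact quadratic: fit a straight line by least squares and look at the residuals $y - \hat{y}$. The function name `fit_line` is illustrative; it uses the usual simple-regression formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical curved data: y = x^2 with no noise.
x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for fi, ri in zip(fitted, residuals):
    print(f"fitted = {fi:6.2f}   residual = {ri:6.2f}")
```

The residuals are positive at both ends and negative in the middle, a U-shaped pattern against the fitted values that signals a missing quadratic term.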

27 QUADRATIC REGRESSION
Model: $Y = \alpha + \beta_1 X + \beta_2 X^2 + \varepsilon$; note that $p = 2$ predictors ($X$, $X^2$)
Assumptions: $\varepsilon$ iid $N(0, \sigma)$
Fitted Equation (output): $\hat{y} = a + b_1 X + b_2 X^2$
Interpret Coefficient: Only interpret the coefficient for the quadratic term. Is $\beta_2$ significantly different from zero?
If yes: keep the quadratic term and look at the sign of $b_2$ (it determines whether the curvature opens up or down).
If no: throw $X^2$ out and do SLR.

30 Example: Suppose we are interested in predicting the GPA of students in college (CGPA) using 16 different predictor variables. Data were collected from a random sample of 59 college students.
What is the response variable in this problem?
What are the values of $n$ and $p$?
What are the $H_0$ and $H_a$ that you can test using the ANOVA table?
What is your decision, based on the following ANOVA table?
What is your conclusion?

31 Regression Analysis: CGPA versus Height, Gender,...
The regression equation is CGPA = Height Gender Haircut Job Studytime Smokecig Dated HSGPA HomeDist BrowseInternet WatchTV Exercise ReadNewsP Vegan PoliticalDegree PoliticalAff
Predictor  Coef  SE Coef  T  P
Rows: Constant, Height, Gender, Haircut, Job, Studytime, Smokecig, Dated, HSGPA, HomeDist, BrowseInternet, WatchTV, Exercise, ReadNewsP, Vegan, PoliticalDegree, PoliticalAff
[Numeric values not preserved in the transcription.]

32 S =
R-Sq = 43.2%
R-Sq(adj) = 21.5%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]

33 Best Subsets Regression: CGPA versus Height, Gender,...
Response is CGPA
Columns: Vars, R-Sq, R-Sq(adj), Mallows Cp, S, followed by one column per predictor, in which an X marks each predictor included in that candidate model. [Table values not preserved in the transcription.]

35 Regression Analysis: CGPA versus HSGPA, Exercise
The regression equation is CGPA = HSGPA Exercise
Predictor  Coef  SE Coef  T  P
Constant
HSGPA
Exercise
S =
R-Sq = 31.6%
R-Sq(adj) = 29.2%
Analysis of Variance
Source  DF  SS  MS  F  P
Regression
Residual Error
Total
[Numeric values other than R-Sq not preserved in the transcription.]
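The two CGPA models illustrate the earlier point about adjusted $R^2$. As a rough check, the sketch below applies the standard formula $R^2_{adj} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$ to the reported R-Sq values, with $n = 59$ students. The function name `adjusted_r2` is illustrative, and because the reported R-Sq values are rounded, the results match the printed R-Sq(adj) values only to within rounding.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R-Sq values as reported in the two outputs (rounded to one decimal place).
full_model = adjusted_r2(0.432, n=59, p=16)    # all 16 predictors
small_model = adjusted_r2(0.316, n=59, p=2)    # HSGPA and Exercise only

print(f"16 predictors: {full_model:.3f}")
print(f" 2 predictors: {small_model:.3f}")
```

Dropping 14 predictors lowers $R^2$ from 43.2% to 31.6%, yet adjusted $R^2$ rises from about 21.5% to about 29.2%, which suggests the discarded predictors were adding little beyond noise.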

37 LOGISTIC REGRESSION
$Y$ = Categorical Response (Yes/No) or Binary Response (1 or 0)
Example: Predict the probability that a person pays bills on time based on past credit history, income, employment, age, etc.
Example: Predict the probability that a person gets lung cancer based on smoking, family history, asthma, age, gender, race, eating habits, exercise habits, etc.

38 Logistic Regression Model (with 1 predictor variable):
$p = \dfrac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}$
Example: Whether a person has a travel credit card. $X$ = annual income (in thousand euros), $y$ = 1 if yes, 0 if no.
(Partial dataset with columns income and y; values not preserved in the transcription.)

39 Link Function: Logit
Response Information
Variable  Value  Count
y         1      31 (Event)
          0      69
          Total  100
Logistic Regression Table
Predictor  Coef  SE Coef  Z  P
Constant
income
[Coefficient values not preserved in the transcription.]

40 Interpretations:
Annual income is a good predictor of the probability of having a travel credit card.
The probability of having a travel credit card increases with higher annual income (because of the positive sign of the coefficient).

41 Prediction Equation: $\hat{p} = \dfrac{\exp(a + bX)}{1 + \exp(a + bX)}$, i.e. $a = -3.52$, $b =$ [value not preserved in the transcription]
Predict the probability that a person with annual income 12K (euros) has a travel credit card (answer: $\hat{p} = 0.09$).
Predict the probability that a person with annual income 65K (euros) has a travel credit card (answer: $\hat{p} = 0.97$).
The probability of having a travel credit card is 50% when $X = -\dfrac{a}{b}$ (why?)
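The prediction equation can be tried out with a short Python sketch. Two assumptions are made here and should be treated as such: the constant is read as $a = -3.52$ (a negative value is needed for the stated answer $\hat{p} = 0.09$ at $X = 12$), and the slope `B = 0.105` is a stand-in value, since the slide's $b$ is not preserved; it was picked only because it roughly reproduces the two stated answers. The function name `logistic_prob` is illustrative.

```python
import math


def logistic_prob(a, b, x):
    """Predicted probability p-hat = exp(a + b x) / (1 + exp(a + b x))."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))


A = -3.52   # constant from the slide, with the minus sign assumed
B = 0.105   # ASSUMED slope; the slide's value of b is not available

p12 = logistic_prob(A, B, 12)   # annual income of 12 thousand euros
p65 = logistic_prob(A, B, 65)   # annual income of 65 thousand euros
x_half = -A / B                 # income at which a + b x = 0

print(f"p-hat at X = 12: {p12:.3f}")
print(f"p-hat at X = 65: {p65:.3f}")
print(f"p-hat at X = -a/b = {x_half:.2f}: {logistic_prob(A, B, x_half):.3f}")
```

This also answers the "why?": at $X = -a/b$ the exponent $a + bX$ is zero, so $\hat{p} = e^0 / (1 + e^0) = 1/2$.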

42 Multiple Logistic Regression:
Example: Predict marijuana use (Y/N) based on alcohol use (Y/N) and cigarette smoking (Y/N) for HS seniors.
Data: 2276 HS seniors in a non-urban area outside Dayton, Ohio.
Table columns: Marijuana, Cigarette, Alcohol, Frequency. [Cell values not preserved in the transcription.]

43 Binary Logistic Regression: Marijuana versus Alcohol, Cigarette
Link Function: Logit
Response Information
Variable   Value    Count
Marijuana  (Event)
           Total    2276
Frequency: Frequency
Logistic Regression Table
Predictor  Coef  SE Coef  Z  P
Constant
Alcohol
Cigarette
[Counts by response value and coefficient values not preserved in the transcription.]

44 Predict the probability of using marijuana if:
Alcohol use = Yes and Cigarette smoking = Yes: $\hat{p} = \dfrac{\exp(\ )}{1 + \exp(\ )} =$
Alcohol use = No and Cigarette smoking = Yes: $\hat{p} = \dfrac{\exp(\ )}{1 + \exp(\ )} = 0.079$

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

Multiple Regression Examples

Multiple Regression Examples Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +

More information

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models Chapter 14 Multiple Regression Models 1 Multiple Regression Models A general additive multiple regression model, which relates a dependent variable y to k predictor variables,,, is given by the model equation

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Chapter 26 Multiple Regression, Logistic Regression, and Indicator Variables

Chapter 26 Multiple Regression, Logistic Regression, and Indicator Variables Chapter 26 Multiple Regression, Logistic Regression, and Indicator Variables 26.1 S 4 /IEE Application Examples: Multiple Regression An S 4 /IEE project was created to improve the 30,000-footlevel metric

More information

STAT 212 Business Statistics II 1

STAT 212 Business Statistics II 1 STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb

More information

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7-1 Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

More information

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?) 12. Comparing Groups: Analysis of Variance (ANOVA) Methods Response y Explanatory x var s Method Categorical Categorical Contingency tables (Ch. 8) (chi-squared, etc.) Quantitative Quantitative Regression

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

Regression Analysis IV... More MLR and Model Building

Regression Analysis IV... More MLR and Model Building Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression

More information

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Inference ME104: Linear Regression Analysis Kenneth Benoit August 15, 2012 August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Stata output resvisited. reg votes1st spend_total incumb minister

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

Start with review, some new definitions, and pictures on the white board. Assumptions in the Normal Linear Regression Model

Start with review, some new definitions, and pictures on the white board. Assumptions in the Normal Linear Regression Model Start with review, some new definitions, and pictures on the white board. Assumptions in the Normal Linear Regression Model A1: There is a linear relationship between X and Y. A2: The error terms (and

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

Chapter 14 Multiple Regression Analysis

Chapter 14 Multiple Regression Analysis Chapter 14 Multiple Regression Analysis 1. a. Multiple regression equation b. the Y-intercept c. $374,748 found by Y ˆ = 64,1 +.394(796,) + 9.6(694) 11,6(6.) (LO 1) 2. a. Multiple regression equation b.

More information

Multiple Regression Methods

Multiple Regression Methods Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

Lecture 6: Linear Regression

Lecture 6: Linear Regression Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

STAT Chapter 10: Analysis of Variance

STAT Chapter 10: Analysis of Variance STAT 515 -- Chapter 10: Analysis of Variance Designed Experiment A study in which the researcher controls the levels of one or more variables to determine their effect on the variable of interest (called

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Concordia University (5+5)Q 1.

Concordia University (5+5)Q 1. (5+5)Q 1. Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/1 40 Examination Date Time Pages Mid Term Test May 26, 2004 Two Hours 3 Instructor Course Examiner

More information

STA102 Class Notes Chapter Logistic Regression

STA102 Class Notes Chapter Logistic Regression STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Section 5: Dummy Variables and Interactions

Section 5: Dummy Variables and Interactions Section 5: Dummy Variables and Interactions Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Example: Detecting

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

SMAM 314 Exam 42 Name

SMAM 314 Exam 42 Name SMAM 314 Exam 42 Name Mark the following statements True (T) or False (F) (10 points) 1. F A. The line that best fits points whose X and Y values are negatively correlated should have a positive slope.

More information

Unit 7: Multiple linear regression 1. Introduction to multiple linear regression

Unit 7: Multiple linear regression 1. Introduction to multiple linear regression Announcements Unit 7: Multiple linear regression 1. Introduction to multiple linear regression Sta 101 - Fall 2017 Duke University, Department of Statistical Science Work on your project! Due date- Sunday

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

23. Inference for regression

23. Inference for regression 23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 23) Inference for regression The regression model Confidence

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Answer Key: Problem Set 6

Answer Key: Problem Set 6 : Problem Set 6 1. Consider a linear model to explain monthly beer consumption: beer = + inc + price + educ + female + u 0 1 3 4 E ( u inc, price, educ, female ) = 0 ( u inc price educ female) σ inc var,,,

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information. STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory

More information

MBA Statistics COURSE #4

MBA Statistics COURSE #4 MBA Statistics 51-651-00 COURSE #4 Simple and multiple linear regression What should be the sales of ice cream? Example: Before beginning building a movie theater, one must estimate the daily number of

More information

Section 4: Multiple Linear Regression

Section 4: Multiple Linear Regression Section 4: Multiple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 The Multiple Regression

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

22S39: Class Notes / November 14, 2000 back to start 1

22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics Interpretation of fitted regression model 22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics 22S39: Class Notes / November 14, 2000 back to start 2 Model diagnostics

More information

Data Analysis 1 LINEAR REGRESSION. Chapter 03

Data Analysis 1 LINEAR REGRESSION. Chapter 03 Data Analysis 1 LINEAR REGRESSION Chapter 03 Data Analysis 2 Outline The Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression Other Considerations in Regression Model Qualitative

More information

Generalized logit models for nominal multinomial responses. Local odds ratios

Generalized logit models for nominal multinomial responses. Local odds ratios Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

This document contains 3 sets of practice problems.

This document contains 3 sets of practice problems. P RACTICE PROBLEMS This document contains 3 sets of practice problems. Correlation: 3 problems Regression: 4 problems ANOVA: 8 problems You should print a copy of these practice problems and bring them

More information

Regression Models for Quantitative and Qualitative Predictors: An Overview

Regression Models for Quantitative and Qualitative Predictors: An Overview Regression Models for Quantitative and Qualitative Predictors: An Overview Polynomial regression models Interaction regression models Qualitative predictors Indicator variables Modeling interactions between

More information

Chapter 4: Regression Models

Textbook: pp. 129-164. Learning Objectives: After completing this chapter, students will be able to: Identify variables,

CS 5014: Research Methods in Computer Science

Clifford A. Shaffer, Department of Computer Science, Virginia Tech, Blacksburg, Virginia. Fall 2010. Copyright © 2010 by Clifford A. Shaffer. Correlation and

Lecture 6 Multiple Linear Regression, cont.

BIOST 515, January 22, 2004. Testing general linear hypotheses: Suppose we are interested in testing linear combinations of the regression

Multiple Regression. Peerapat Wongchaiwat, Ph.D.

Peerapat Wongchaiwat, Ph.D. (wongchaiwat@hotmail.com). The Multiple Regression Model: Examine the linear relationship between 1 dependent (Y) and 2 or more independent variables (X_i). Multiple Regression Model

CIVL 7012/8012. Simple Linear Regression. Lecture 3

OLS assumptions - 1. Model of population: y = β_0 + β_1 x + ε. Sample estimation (best-fit line): y = b_0 + b_1 x. We want E[b_1] = β_1 (1), meaning we want

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1

Frank Wood, fwood@stat.columbia.edu. Review: Matrix Regression Estimation. We can solve this equation (if the inverse of X

Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns

Lecture Week Multiple Linear Regression. Predict y from (possibly) many predictors x, including extra derived variables. Model Criticism: study the importance of columns. Draw on Scientific framework. Experiment;

Advanced Regression Summer Statistics Institute. Day 2: MLR and Dummy Variables

The Multiple Regression Model: Many problems involve more than one independent variable or factor which affects the dependent

Exam Applied Statistical Regression. Good Luck!

Dr. M. Dettling, Summer 2011. Approved: any written material, calculator (without communication facility). Tables: attached. Note: all tests have to be done at the 5% level.

Correlation & Regression Chapter 5

Correlation: Do you have a relationship? Between two Quantitative Variables (measured on Same Person). (1) If you have a relationship (p

The simple linear regression model discussed in Chapter 13 was written as

Chapter 14: Multiple Regression. 14.1 Multiple Regression Analysis. 14.2 Assumptions of the Multiple

Section 3: Simple Linear Regression

Carlos M. Carvalho, The University of Texas at Austin, McCombs School of Business. http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ Regression: General Introduction

Lecture Notes 12 Advanced Topics Econ 20150, Principles of Statistics Kevin R Foster, CCNY Spring 2012

Endogenous Independent Variables are Invalid: Need to have X causing Y, not vice-versa or both! NEVER regress

Confidence Intervals, Testing and ANOVA Summary

1 One Sample Tests. 1.1 One Sample z test: Mean (σ known). Let X_1, ..., X_n be a r.s. from N(µ, σ) or n > 30. Let H_0: µ = µ_0. The test statistic is z = x̄ − µ_0

Lecture 6: Linear Regression (continued)

Reading: Sections 3.1-3.3. STATS 202: Data mining and analysis, October 6, 2017. Multiple linear regression: Y = β_0 + β_1 X_1 + ... + β_p X_p + ε, with ε ~ N(0, σ) i.i.d.

27. SIMPLE LINEAR REGRESSION II

The Model: In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

Ch 13 & 14 - Regression Analysis

Simple Regression Model. I. Multiple Choice: 1. A simple regression is a regression model that contains: a. only one independent variable; b. only one dependent variable; c. more

In Class Review Exercises Vartanian: SW 540

1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

11.1 Motivation. A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

STA 101 Final Review

Statistics 101, Thomas Leininger, June 24, 2013. Announcements: All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2-3pm today (Old Chem

Models with qualitative explanatory variables p216

Example: gen = 1 for female

Row  gpa   hsm  gen
1    3.32  10   0
2    2.26  6    0
3    2.35  8    0
4    2.08  9    0
5    3.38  8    0
6    3.29  10   0
7    3.21  8    0
8    2.00  3    0
9    3.18  9    0
10   2.34

Classification & Regression. Multicollinearity Intro to Nominal Data

Let's Start With A Question: y = β_0 + β_1 x_1 + β_2 x_2, where y = Anxiety Level, x_1 = heart rate, x_2 = recorded pulse. Since we can all agree heart rate and pulse are related,

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Goals of this section. Specific goals: Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Yang Feng (Columbia University), http://www.stat.columbia.edu/~yangfeng. Outline: 1. Multiple Linear Regression (Estimation, Inference); 2. Special Topics for Multiple

ECON3150/4150 Spring 2015

Lecture 3&4 - The linear regression model. Siv-Elisabeth Skjelbred, University of Oslo, January 29, 2015. Chapter 4 in S&W; Section 17.1 in S&W (extended OLS assumptions)

REVIEW 8/2/2017 Chen Fang (陈芳), Department of English, East China Normal University

Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution; if the result is too extreme to belong to the null distribution (p

Applied Regression Analysis. Section 2: Multiple Linear Regression

The Multiple Regression Model: Many problems involve more than one independent variable or factor which affects the dependent or response

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

Concordia University, Department of Mathematics and Statistics. Course: Statistics 360/2, Section 01. Examination: Final. Date: December 2002. Time: 3 hours. Pages: 6. Instructors: Y.P.

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

Instructions: You may use your books, notes, and SPSS/SAS. NO

assumes a linear relationship between mean of Y and the X's with additive normal errors; the errors are assumed to be a sample from N(0, σ^2)

Multiple Linear Regression is used to relate a continuous response (or dependent) variable Y to several explanatory (or independent, or predictor) variables X_1, X_2, ..., X_k; assumes a linear relationship

Sociology 593 Exam 2 March 28, 2002

I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why. A variable is called CATHOLIC. This probably means that

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X ≤ z).

For example, P(X ≤ 1.04) = .8508. For z < 0, subtract the value from 1,

Problem Set 10: Panel Data

1. Read in the data set, e11panel1.dta, from the course website. This contains data on a sample of 1252 men and women who were asked about their hourly wage in two years, 2005

10. Alternative case influence statistics

a. Alternative to D_i: dffits_i (and others). b. Alternative to studres_i: externally-studentized residual. c. Suggestion: use whatever is convenient with the

Multiple Regression and Model Building Lecture 20 1 May 2006 R. Ryznar

11.220 Lecture 20. Building Models: Making Sure the Assumptions Hold. 1. There is a linear relationship between the explanatory (independent) variable(s)

ANOVA: Analysis of Variation

The basic ANOVA situation. Two variables: 1 Categorical, 1 Quantitative. Main Question: Do the (means of) the quantitative variables depend on which group (given by categorical

Ch 11- One Way Analysis of Variance

Multiple Choice Questions. Use the following to solve questions 1 &. Suppose n = 8 and there are 4 groups, how many between groups (samples) degrees of freedom are there?

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

Practice Final Exam. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do

ACOVA and Interactions

Chapter 15. Analysis of covariance (ACOVA) incorporates one or more regression variables into an analysis of variance. As such, we can think of it as analogous to the two-way ANOVA

Answer Key: Problem Set 5

Let nopc be a dummy variable equal to one if the student does not own a PC, and zero otherwise. i. If nopc is used instead of PC in the model of: colgpa = β_0 + δ_0 PC + β_1 hsgpa + β_2 ACT +

Inference for Regression Inference about the Regression Model and Using the Regression Line

PBS Chapter 10.1 and 10.2. © 2009 W.H. Freeman and Company. Objectives (PBS Chapter 10.1 and 10.2): Inference about