Discussion Sheet 29.7.9

Qualitative Variables

We have devoted most of our attention in multiple regression to quantitative, or numerical, variables. MR models become more useful, and more complex, when we also consider qualitative variables, that is, variables that represent a category, such as male/female. These variables can be coded numerically with 0 and 1 and entered into an MR model just like any other variable. This allows for more powerful models.

1. Consider the following data set:

x    Category    y
1    0           2
1    1           5
2    0          -1
2    1           8
3    0          -4
3    1          11
4    0          -7
4    1          14

Here is the scatterplot of the standardized residuals versus y for the model that uses x to predict y:

[Figure: scatterplot of Regression Standardized Residual versus Y (Dependent Variable: Y)]

Do the residuals look random to you?

2. Here is the ANOVA table for this model:

Model         Sum of Squares   df   Mean Square      F      Sig.
Regression              .000    1          .000   .000     1.000
Residual             378.000    6        63.000
Total                378.000    7

Does this model suggest that x is a significant help in predicting y? Is the model significant at the α = .05 level?

3. Here is the coefficient information for this model:

              Unstandardized           Standardized
Model         B        Std. Error      Beta            t       Sig.
(Constant)    3.500    6.874                           .509    .629
X             .000     2.510           .000            .000    1.000

If you were to perform hypothesis tests of $H_0\colon \beta_i = 0$, $i = 0, 1$, which of the coefficients would end up significantly different from 0, indicating that the variables are useful to the model?
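The SPSS output for the model that uses x alone can be reproduced with a short pure-Python least-squares sketch. The data values below are our reading of the data set (the transcription dropped several digits); they reproduce the sums of squares and standard errors in the SPSS tables. `simple_regression` is a hypothetical helper written for this illustration, not an SPSS routine.

```python
import math

# (x, Category, y) observations as read from the data set
# (assumption: digits lost in the transcription restored so the SPSS tables are reproduced)
data = [(1, 0, 2), (1, 1, 5), (2, 0, -1), (2, 1, 8),
        (3, 0, -4), (3, 1, 11), (4, 0, -7), (4, 1, 14)]


def simple_regression(xs, ys):
    """Least-squares fit of y = b0 + b1*x with ANOVA sums of squares and standard errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total SS
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
    mse = sse / (n - 2)                                          # residual mean square, df = n - 2
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    se_b1 = math.sqrt(mse / sxx)
    return {"b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
            "b1": b1, "se_b1": se_b1,
            "SSR": sst - sse, "SSE": sse, "SST": sst, "MSE": mse}


xs = [x for x, _, _ in data]
ys = [y for _, _, y in data]
fit = simple_regression(xs, ys)
for name, value in fit.items():
    print(f"{name:>6}: {value:9.3f}")
```

For these values the cross-product sum sxy is exactly 0, so the fitted slope is 0 and the residual sum of squares equals the total sum of squares.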

4. Here is the ANOVA table for an MR model using x and Category to predict y:

Model         Sum of Squares   df   Mean Square      F      Sig.
Regression           288.000    2       144.000   8.000     .028
Residual              90.000    5        18.000
Total                378.000    7

Would you say this model is significant at the α = .05 level? Calculate its coefficient of determination. Does this seem large enough to indicate the model is helpful?

5. Here is a plot of the residuals versus y for the model above:

[Figure: scatterplot of Regression Standardized Residual versus Y (Dependent Variable: Y)]

Do the residuals seem random to you? What, if any, systematic pattern do they suggest?

6. Here is the information on the coefficients for the above model:

              Unstandardized           Standardized
Model         B        Std. Error      Beta            t       Sig.
(Constant)    -2.500   3.969                           -.630   .556
X             .000     1.342           .000            .000    1.000
CATEGORY      12.000   3.000           .873            4.000   .010

If you were to perform hypothesis tests of $H_0\colon \beta_i = 0$, $i = 0, 1, 2$, which of the coefficients would end up significantly different from 0, indicating that the variables are useful to the model?
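The quantities in an ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below (hypothetical helper names, values taken from the table for x and Category) computes the mean squares, F, the coefficient of determination $R^2 = SS_{reg}/SS_{tot}$, and the p-value. It relies on the fact that for 2 numerator degrees of freedom the F upper-tail probability has the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$; for other numerator df a statistics library would be needed.

```python
def anova_summary(ss_reg, df_reg, ss_res, df_res):
    """Mean squares, F statistic and R^2 from the rows of a regression ANOVA table."""
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    f_stat = ms_reg / ms_res
    r_squared = ss_reg / (ss_reg + ss_res)   # SS_reg / SS_total
    return ms_reg, ms_res, f_stat, r_squared


def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F(2, df2) distribution; this closed form is valid only when df1 = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


ms_reg, ms_res, f_stat, r2 = anova_summary(288.0, 2, 90.0, 5)
p_value = f_upper_tail_df1_2(f_stat, 5)
print(f"MS_reg={ms_reg:.3f} MS_res={ms_res:.3f} F={f_stat:.3f} R^2={r2:.3f} p={p_value:.3f}")
```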

Qualitative Variables

There are times when we have categorical information that we would like to include in a regression model. These may be categories that imply no order, such as Male/Female, or categories that have an implied order, such as Freshmen/Sophomores/Juniors/Seniors. This kind of information can be incorporated into an MR model by using qualitative, or dummy, variables. Each such variable represents one category and equals 1 for an observation in that category and 0 otherwise. For example,

$$x = \begin{cases} 1 & \text{if female} \\ 0 & \text{if male} \end{cases}$$

is a way of representing male/female information in a regression model. Since the Freshmen/Sophomores/etc. information represents an ordered category, we could instead use a single variable with values 1, 2, 3, and 4 to capture this order. This is only done if the categories have some natural, theoretically significant order. In general, for k categories with no implied order we use k − 1 variables, as follows:

$$x_i = \begin{cases} 1 & \text{if the observation is in category } i \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, \ldots, k-1$$

We would thus have a string of k − 1 0s and 1s to indicate which category an observation fell into. What about an observation that fell into category k? It would not need its own variable, since it can be represented by all 0s on the k − 1 qualitative variables, indicating that it was not in any of the other categories.

7. Category in the data above seems to be such a qualitative variable. Comparing the model with it and without it, what does it seem to do for the model?
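For unordered categories, the k − 1 indicator scheme can be sketched in a few lines of Python. `dummy_code` is a hypothetical helper and the labels "A", "B", "C" are made up for illustration; the last listed category plays the role of category k and is coded as all zeros.

```python
def dummy_code(values, categories):
    """Encode each value as k - 1 indicator (0/1) variables.

    `categories` lists all k levels; the last one is the baseline and is
    represented by all zeros, so it gets no variable of its own.
    """
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"values not in categories: {sorted(unknown)}")
    coded_levels = categories[:-1]   # one indicator x_i per level i = 1, ..., k - 1
    return [[1 if v == level else 0 for level in coded_levels] for v in values]


levels = ["A", "B", "C"]             # hypothetical unordered labels, k = 3
print(dummy_code(["A", "C", "B", "C"], levels))   # [[1, 0], [0, 0], [0, 1], [0, 0]]
```

With two categories, such as female/male listed in that order, the helper produces the single 0/1 variable from the example above.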

Qualitative Variables in Regression Models

A single qualitative variable

$$x_1 = \begin{cases} 1 & \text{if in the category} \\ 0 & \text{otherwise} \end{cases}$$

in a regression model with no other variables would make the difference between the model $y = \beta_0 + \varepsilon$ and the model $y = \beta_0 + \beta_1 x_1 + \varepsilon$. If we found the prediction equations for the two models, they would be $\hat y = \hat\beta_0$ and $\hat y = \hat\beta_0 + \hat\beta_1 x_1$. Since $x_1$ is either 0 or 1, when $x_1 = 0$ the second model gives $\hat y = \hat\beta_0 + \hat\beta_1(0) = \hat\beta_0$, and when $x_1 = 1$ it gives $\hat y = \hat\beta_0 + \hat\beta_1(1) = \hat\beta_0 + \hat\beta_1$. Since $\hat\beta_0$ and $\hat\beta_1$ are constants, the difference between $x_1 = 0$ and $x_1 = 1$ is that there are two horizontal lines: when $x_1 = 0$ we have the line $\hat y = \hat\beta_0$, and when $x_1 = 1$ we have the line $\hat y = \hat\beta_0 + \hat\beta_1$, a different constant. The MR model then becomes that of two parallel horizontal lines.

Suppose we had the same qualitative variable and another variable, $x_2$, that was quantitative. Then the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ has prediction equation $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$. When $x_1 = 0$ this represents $\hat y = \hat\beta_0 + \hat\beta_1(0) + \hat\beta_2 x_2 = \hat\beta_0 + \hat\beta_2 x_2$, and when $x_1 = 1$ it represents $\hat y = \hat\beta_0 + \hat\beta_1(1) + \hat\beta_2 x_2 = (\hat\beta_0 + \hat\beta_1) + \hat\beta_2 x_2$. That is, depending on whether $x_1 = 0$ or $x_1 = 1$, the model represents two parallel lines (non-horizontal, assuming $\hat\beta_2 \neq 0$) with different intercepts (assuming $\hat\beta_1 \neq 0$).

Now suppose we keep the same qualitative variable $x_1$ and quantitative variable $x_2$ and add their product. Then the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$ has prediction equation $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_1 x_2$. When $x_1 = 0$ this represents $\hat y = \hat\beta_0 + \hat\beta_1(0) + \hat\beta_2 x_2 + \hat\beta_3(0) x_2 = \hat\beta_0 + \hat\beta_2 x_2$, and when $x_1 = 1$ it represents $\hat y = \hat\beta_0 + \hat\beta_1(1) + \hat\beta_2 x_2 + \hat\beta_3(1) x_2 = (\hat\beta_0 + \hat\beta_1) + (\hat\beta_2 + \hat\beta_3) x_2$. That is, depending on whether $x_1 = 0$ or $x_1 = 1$, the model represents two non-parallel lines (assuming $\hat\beta_3 \neq 0$) with different intercepts (assuming $\hat\beta_1 \neq 0$).

Higher Order and Interaction Terms

This process can be continued for quadratic terms and beyond.
Terms like $x_1 x_2$ are called interaction terms. When one of the variables is qualitative, it acts as a toggle switch that turns differences in the intercept, the slope, or both (or even more, with higher-order terms) on and off.

8. What do you think would happen with the data above if we used the model $y = \beta_0 + \beta_1\,\text{Category} + \beta_2 x + \beta_3\, x \cdot \text{Category} + \varepsilon$?

9. Here is the ANOVA table from such a model:

Model         Sum of Squares   df   Mean Square    F    Sig.
Regression           378.000    3       126.000    .    .
Residual                .000    4          .000
Total                378.000    7

Does it seem that including the interaction term was helpful in improving the model? What is the new coefficient of determination? Is the model significant at the α = .05 level? (Careful!)
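The toggle-switch behaviour of a 0/1 variable can be made concrete with a small Python sketch. Given estimates for $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_1 x_2$, the hypothetical helper `lines_for_indicator` returns the intercept and slope in $x_2$ for each value of $x_1$. The coefficient values used below are made up for illustration only.

```python
def lines_for_indicator(b0, b1, b2, b3=0.0):
    """Split y-hat = b0 + b1*x1 + b2*x2 + b3*x1*x2 into one line per value of the 0/1 variable x1.

    Returns {x1: (intercept, slope in x2)}; b3 = 0 gives the no-interaction model.
    """
    return {x1: (b0 + b1 * x1, b2 + b3 * x1) for x1 in (0, 1)}


# Hypothetical coefficient values, chosen only to show the two cases
print(lines_for_indicator(b0=1.0, b1=2.0, b2=0.5))          # b3 = 0: parallel lines
print(lines_for_indicator(b0=1.0, b1=2.0, b2=0.5, b3=1.5))  # b3 != 0: different slopes
```

The first call gives {0: (1.0, 0.5), 1: (3.0, 0.5)}: same slope, intercepts differing by $\hat\beta_1$. The second gives {0: (1.0, 0.5), 1: (3.0, 2.0)}: the interaction coefficient changes the slope for $x_1 = 1$.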

10. Here is the coefficient information for this new model:

              Unstandardized           Standardized
Model         B        Std. Error      Beta            t       Sig.
(Constant)    5.000    .000                            .       .
X             -3.000   .000            -.488           .       .
CATEGORY      -3.000   .000            -.218           .       .
x * category  6.000    .000            1.291           .       .

Which of the variables is significant in this model (careful again)?

11. There is no plot of residuals, since all residuals were 0; that is, the model fits the data perfectly. Judging from the model equation, what would be the graph of the model that was just fitted to these data?

Suggested Homework: .,.4
Solutions to be Posted: .,.4
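The interaction model can also be fitted outside SPSS. The sketch below solves the normal equations $(X^\top X)\,b = X^\top y$ exactly with Python's `fractions` module, using the columns intercept, Category, x and x · Category. `ols` is a hypothetical helper, and the data values are our reading of the data set (several digits were lost in the transcription). It prints the coefficients, the residual and total sums of squares, and the line implied for each value of Category.

```python
from fractions import Fraction

# (x, Category, y) observations as read from the data set (assumption: digits restored)
data = [(1, 0, 2), (1, 1, 5), (2, 0, -1), (2, 1, 8),
        (3, 0, -4), (3, 1, 11), (4, 0, -7), (4, 1, 14)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved exactly with Fractions by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(Fraction(row[i]) * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(Fraction(row[i]) * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [coeffs + [rhs] for coeffs, rhs in zip(xtx, xty)]   # augmented [X'X | X'y]
    for col in range(p):
        # Swap a row with a nonzero entry into the pivot position, then scale it to 1.
        pivot_row = next(r for r in range(col, p) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Design matrix columns: intercept, Category, x, x * Category
X = [[1, c, x, x * c] for x, c, _ in data]
y = [yi for _, _, yi in data]
b = ols(X, y)

fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # residual SS
y_bar = Fraction(sum(y), len(y))
sst = sum((yi - y_bar) ** 2 for yi in y)                   # total SS

print("b0, b1 (Category), b2 (x), b3 (x*Category):", [str(v) for v in b])
print("residual SS:", sse, " total SS:", sst)
for c in (0, 1):
    print(f"Category = {c}: y-hat = {b[0] + b[1] * c} + ({b[2] + b[3] * c}) x")
```

Exact fractions keep the zero residual sum of squares exactly zero rather than a floating-point remainder; for routine work a library solver such as `numpy.linalg.lstsq` would be the usual choice.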