Chapter 8 Multivariate Regression Analysis

8.3 Multiple Regression with K Independent Variables
8.4 Significance Tests of Parameters

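Population Regression Model

The principles of bivariate regression can be generalized to a situation of several independent variables (predictors) of the dependent variable. For K independent variables, the population regression and prediction models are:

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i$$

$$\hat{Y}_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki}$$

The sample prediction equation is:

$$\hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_K X_{Ki}$$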

Predict the number of children ever born ($Y$) to the 2008 GSS respondents (N = 1,906) as a linear function of education ($X_1$), occupational prestige ($X_2$), number of siblings ($X_3$), and age ($X_4$):

$$\hat{Y} = ?.?8 - 0.080X_1 - 0.001X_2 + 0.067X_3 + 0.035X_4$$

People with more education and higher-prestige jobs have fewer children, but older people and those raised in families with many siblings have more children. Use the equation to predict the expected number of children for a person with $X_1 = 12$, $X_2 = 40$, $X_3 = 8$, $X_4 = 55$:

$$\hat{Y} = ?.?8 - 0.080(12) - 0.001(40) + 0.067(8) + 0.035(55) = \_\_\_$$

For $X_1 = 16$, $X_2 = 70$, $X_3 = ?$, $X_4 = ?5$:

$$\hat{Y} = ?.?8 - 0.080(16) - 0.001(70) + 0.067(?) + 0.035(?5) = \_\_\_$$
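
To make the arithmetic concrete, here is a minimal Python sketch that is not part of the original course materials. The helper `predict` is a hypothetical name; it simply evaluates $a + b_1X_1 + \dots + b_KX_K$. The slopes are the Eq. 4 values (-0.080, -0.001, 0.067, 0.035), and the intercept is passed in as an argument, so the usage line sets it to 0 to show only the contribution of the slope terms.

```python
from typing import Sequence

# Eq. 4 slopes in predictor order: education (X1), occupational prestige (X2),
# siblings (X3), age (X4).
EQ4_SLOPES = (-0.080, -0.001, 0.067, 0.035)


def predict(intercept: float, slopes: Sequence[float], xs: Sequence[float]) -> float:
    """Return Y-hat = a + b1*X1 + ... + bK*XK for one respondent."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one X value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, xs))


if __name__ == "__main__":
    # Slope terms only (intercept set to 0): 12 years of school, prestige 40,
    # 8 siblings, aged 55.
    print(round(predict(0.0, EQ4_SLOPES, (12, 40, 8, 55)), 3))
```

For this first respondent the slope terms add up to 1.461; adding the Eq. 4 intercept gives the predicted number of children.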

OLS Estimation of Coefficients

As with bivariate regression, the computer uses Ordinary Least Squares (OLS) methods to estimate the intercept ($a$), the slopes ($b_{YX}$), and the multiple coefficient of determination ($R^2$) from sample data. OLS estimators minimize the sum of squared errors for the linear prediction:

$$\min \sum_{i=1}^{N} e_i^2 = \min \sum_{i=1}^{N} \left(Y_i - \hat{Y}_i\right)^2$$

See SSDA#4 Boxes 8.? and 8.3 for details of the best linear unbiased estimator (BLUE) characteristics and the derivations of the OLS estimators for the intercept $a$ and the slopes $b_{YX}$.
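
The OLS criterion can be sketched in pure Python by solving the normal equations $(X'X)\mathbf{b} = X'Y$. This is an illustrative sketch that assumes a small, well-conditioned data set; the name `ols_fit` is hypothetical, and in practice you would let SPSS or another statistics package do the fitting.

```python
from typing import List, Sequence, Tuple


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Fit Y-hat = a + b1*X1 + ... + bK*XK by ordinary least squares.

    X holds one row of K predictor values per case (no constant column).
    Returns ([a, b1, ..., bK], R^2).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # constant column for a
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]; their solution minimizes sum(e_i^2).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are perfectly collinear")
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    y_hat = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return coef, 1.0 - ss_error / ss_total


if __name__ == "__main__":
    # Toy data lying exactly on Y = 1 + 2*X1 + 3*X2.
    print(ols_fit([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)], [1, 3, 4, 6, 8]))
```

On the toy data the sketch recovers the coefficients 1, 2, and 3 with $R^2 = 1$; perfectly collinear predictors raise an error instead of returning estimates.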

Nested Equations

A set of nested regression equations successively adds more predictors to an equation to observe changes in their slopes with the dependent variable. Predicting children ever born ($Y$) by adding education ($X_1$), occupational prestige ($X_2$), siblings ($X_3$), and age ($X_4$) (standard errors in parentheses):

                   (1)              (2)              (3)              (4)
Intercept          3.606 (0.165)    3.473 (0.173)    2.865 (0.199)    ?.?8 (0.?)
Education X1      -0.124 (0.012)   -0.133 (0.014)   -0.109 (0.015)   -0.080 (0.014)
Prestige X2                         0.006 (0.003)    0.006 (0.003)   -0.001 (0.003)
Siblings X3                                          0.073 (0.01?)    0.067 (0.01?)
Age X4                                                                0.035 (0.002)
R²                 0.05?            0.05?            0.066            0.193

F-test for $\rho^2$

The hypothesis pair for the multiple coefficient of determination remains the same as in the bivariate case:

$H_0: \rho^2_{Y\cdot X} = 0$
$H_1: \rho^2_{Y\cdot X} > 0$

But the F-test must also adjust the sample estimate of $R^2$ for the df associated with the K predictors:

$$F_{K,\,N-K-1} = \frac{MS_{REGRESSION}}{MS_{ERROR}} = \frac{R^2_{Y\cdot X}/K}{\left(1 - R^2_{Y\cdot X}\right)/(N-K-1)}$$

As you enter more predictors into the equation in an effort to pump up your $R^2$, you must pay the higher cost of an additional df per predictor to get that result.
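
A short Python sketch of this F formula; `f_for_r_squared` is a hypothetical helper name. The usage line plugs in Eq. 3's $R^2 = 0.066$ with K = 3 and assumes N = 1,906 from the example slide.

```python
def f_for_r_squared(r2: float, k: int, n: int) -> float:
    """F with (K, N-K-1) df for H0: rho^2 = 0, computed from the sample R^2."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    df_error = n - k - 1
    if k < 1 or df_error < 1:
        raise ValueError("need K >= 1 and N - K - 1 >= 1")
    return (r2 / k) / ((1.0 - r2) / df_error)


if __name__ == "__main__":
    # Eq. 3; N = 1,906 is assumed from the example slide.
    print(round(f_for_r_squared(0.066, 3, 1906), 1))
```

The result, roughly 45, is far above 5.42, the .001 critical value for (3, ∞) df.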

Test the null hypothesis $H_0: \rho^2 = 0$ for Equation 3:

Source        SS         df    MS    F
Regression    354.7
Error         5,011.1
Total         5,365.8

α       df_R, df_E    c.v.
.05     3, ∞          2.60
.01     3, ∞          3.78
.001    3, ∞          5.42

Decision about $H_0$: ______
Prob. Type I error: ______
Conclusion: ______
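
The blank cells of the table can be filled mechanically. This Python sketch uses the hypothetical names `AnovaTable` and `anova_from_ss`; it takes the sums of squares and K, and assumes N = 1,906 from the example slide. With a different N only the error df, the error MS, and F change.

```python
from dataclasses import dataclass


@dataclass
class AnovaTable:
    df_regression: int
    df_error: int
    ms_regression: float
    ms_error: float
    f: float
    r_squared: float


def anova_from_ss(ss_regression: float, ss_error: float, k: int, n: int) -> AnovaTable:
    """Complete a regression ANOVA table from its sums of squares."""
    df_regression = k
    df_error = n - k - 1
    if df_regression < 1 or df_error < 1:
        raise ValueError("need K >= 1 and N - K - 1 >= 1")
    ms_regression = ss_regression / df_regression  # MS = SS / df
    ms_error = ss_error / df_error
    return AnovaTable(
        df_regression=df_regression,
        df_error=df_error,
        ms_regression=ms_regression,
        ms_error=ms_error,
        f=ms_regression / ms_error,
        r_squared=ss_regression / (ss_regression + ss_error),  # SS_REG / SS_TOTAL
    )


if __name__ == "__main__":
    # Eq. 3 sums of squares; N = 1,906 is an assumption.
    print(anova_from_ss(354.7, 5011.1, k=3, n=1906))
```

The computed $R^2$ of about 0.066 agrees with the value reported for Equation 3, which is a useful check on the table.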

Difference in $\rho^2$ for Nested Equations

We can also test whether adding predictors to a second, nested regression equation increases $\rho^2$:

$H_0: \rho^2_2 - \rho^2_1 = 0$
$H_1: \rho^2_2 - \rho^2_1 > 0$

where subscripts 1 and 2 refer to the equations with fewer and more predictors, respectively. The F-statistic tests whether adding predictors increases the population rho-square, relative to the difference in the two nested equations' degrees of freedom:

$$F_{(K_2-K_1),\,(N-K_2-1)} = \frac{\left(R^2_2 - R^2_1\right)/(K_2 - K_1)}{\left(1 - R^2_2\right)/(N - K_2 - 1)}$$
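
A Python sketch of the nested-equation F; `f_change` is a hypothetical name. The usage line compares Eq. 3 ($R^2 = 0.066$, K = 3) with Eq. 4 ($R^2 = 0.193$, K = 4) and assumes N = 1,906 from the example slide.

```python
def f_change(r2_reduced: float, r2_full: float, k_reduced: int, k_full: int, n: int) -> float:
    """F with (K2-K1, N-K2-1) df for H0: rho2^2 - rho1^2 = 0 in nested equations."""
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    if df_num < 1 or df_den < 1:
        raise ValueError("the fuller equation must add predictors and leave error df")
    return ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)


if __name__ == "__main__":
    # Eq. 3 versus Eq. 4; N = 1,906 is an assumption.
    print(round(f_change(0.066, 0.193, 3, 4, 1906), 1))
```

The statistic comes out near 300, far above 10.83, the .001 critical value for (1, ∞) df.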

Is $\rho^2$ for Eq. 2 larger than $\rho^2$ for Eq. 1?

$$F_{(2-1),\,(N-2-1)} = \frac{\left(R^2_2 - R^2_1\right)/(K_2 - K_1)}{\left(1 - R^2_2\right)/(N - K_2 - 1)} = \_\_\_$$

α       df_R, df_E    c.v.
.05     1, ∞          3.84
.01     1, ∞          6.63
.001    1, ∞          10.83

Decision: ______
Prob. Type I error: ______
Interpretation: Adding occupation to the regression equation with education did not significantly increase the explained variance in number of children ever born. We cannot reject the hypothesis that the two population coefficients of determination are equal; each equation explains about 5% of the variance of $Y$.

Now test the difference in $\rho^2$ for Eq. 4 versus Eq. 3:

$$F_{(4-3),\,(N-4-1)} = \frac{\left(R^2_4 - R^2_3\right)/(K_4 - K_3)}{\left(1 - R^2_4\right)/(N - K_4 - 1)} = \_\_\_$$

α       df_R, df_E    c.v.
.05     1, ∞          3.84
.01     1, ∞          6.63
.001    1, ∞          10.83

Decision: ______
Prob. Type I error: ______
Interpretation: Adding age to the regression equation with three other predictors greatly increases the explained variance in number of children ever born. The coefficient of determination for equation #4 (0.193) is almost three times larger than for equation #3 (0.066).

Adjusting $R^2$ for K predictors

The meaning of the multiple regression coefficient of determination is identical to the bivariate case:

$$R^2_{Y\cdot X} = \frac{SS_{TOTAL} - SS_{ERROR}}{SS_{TOTAL}} = \frac{SS_{REGRESSION}}{SS_{TOTAL}}$$

However, when you report the sample estimate of a multiple regression $R^2$, you must adjust its value by one degree of freedom for each of the K predictors:

$$R^2_{adj} = R^2_{Y\cdot X} - \frac{K\left(1 - R^2_{Y\cdot X}\right)}{N - K - 1}$$

When the sample N is large relative to the number of predictors K, the adjustment is small.
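
A one-line Python version of the adjustment, with the hypothetical name `adjusted_r_squared`. It is algebraically the same as the common form $1 - (1 - R^2)(N - 1)/(N - K - 1)$. The usage lines apply it to Eq. 3 and Eq. 4, assuming N = 1,906.

```python
def adjusted_r_squared(r2: float, k: int, n: int) -> float:
    """R^2_adj = R^2 - K(1 - R^2)/(N - K - 1)."""
    df_error = n - k - 1
    if df_error < 1:
        raise ValueError("need N - K - 1 >= 1")
    return r2 - k * (1.0 - r2) / df_error


if __name__ == "__main__":
    for label, r2, k in [("Eq. 3", 0.066, 3), ("Eq. 4", 0.193, 4)]:
        print(label, round(adjusted_r_squared(r2, k, 1906), 4))
```

The adjusted values are about 0.065 and 0.191, barely below the unadjusted 0.066 and 0.193, as expected with an N this large.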

Adjust the sample $R^2$ for each of the four nested equations (N = 1,906):

Eq.    R²       K    Adj. R²
1      0.05?    1    ______
2      0.05?    2    ______
3      0.066    3    ______
4      0.193    4    ______

Return to the four nested regression equations in the Nested Equations table, with the number of ever-born children as the dependent variable. Now we'll examine their regression slopes.

Interpreting Nested $b_{YX}$

The multiple regression slopes are partial or net effects. When other independent variables are statistically held constant, the size of $b_{YX}$ often decreases. These changes occur if predictor variables are correlated with each other as well as with the dependent variable. Two correlated predictors divide their joint impact on the dependent variable between both $b_{YX}$ coefficients. For example, age and education are negatively correlated (r = -.?7): older people have less schooling. Comparing equation #1 with equation #4, where age is also controlled, the net effect of education on number of children decreased from $b_1 = -.124$ to $b_1 = -.080$. So, controlling for respondent's age, an additional year of education decreases the number of children ever born by a much smaller amount.
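
Why the slopes shift can be seen in the two-predictor case, where the standardized partial slopes depend only on three correlations: $\beta^*_1 = (r_{Y1} - r_{Y2}r_{12})/(1 - r_{12}^2)$, and symmetrically for $\beta^*_2$. This standard formula is not on the original slide, `standardized_partial_slopes` is a hypothetical name, and the correlations in the usage lines are made-up values chosen to mimic education and age, not GSS estimates.

```python
from typing import Tuple


def standardized_partial_slopes(r_y1: float, r_y2: float, r_12: float) -> Tuple[float, float]:
    """Return (beta1*, beta2*) for a regression of Y on X1 and X2, from correlations."""
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("X1 and X2 are perfectly correlated")
    beta1 = (r_y1 - r_y2 * r_12) / denom  # X1 effect net of X2
    beta2 = (r_y2 - r_y1 * r_12) / denom  # X2 effect net of X1
    return beta1, beta2


if __name__ == "__main__":
    # Hypothetical: Y = children, X1 = education, X2 = age.
    print(standardized_partial_slopes(-0.22, 0.30, -0.17))
```

With uncorrelated predictors ($r_{12} = 0$) each $\beta^*$ equals its bivariate correlation; once $r_{12} \neq 0$, each slope is adjusted for the other predictor, which here shrinks the education effect toward zero.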

t-test for Hypotheses about $\beta_j$

A t-test for hypotheses about the K predictors uses familiar procedures. A hypothesis pair about the population regression coefficient for the jth predictor could be two-tailed:

$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$

Or, a hypothesis pair could indicate the researcher's expected direction (sign) of the regression slope, for example:

$H_0: \beta_j \le 0$
$H_1: \beta_j > 0$

Testing a hypothesis about $\beta_j$ uses a t-test with N - K - 1 degrees of freedom (i.e., a Z-test for a large sample):

$$t_{N-K-1} = \frac{b_j - \beta_j}{s_{b_j}}$$

where $b_j$ is the sample regression coefficient and the denominator is the standard error of the sampling distribution of $b_j$ (see formula in SSDA#4, p. ?66).
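
The t-test reduces to a ratio and a comparison with a critical value. In this Python sketch (hypothetical names `t_statistic` and `reject_h0`), the critical values are the large-sample ones used on these slides; the usage lines take the Eq. 4 education and prestige coefficients with standard errors of 0.014 and 0.003.

```python
# Large-sample critical values from the slides: alpha -> (one-tailed, two-tailed).
CRITICAL_Z = {0.05: (1.65, 1.96), 0.01: (2.33, 2.58), 0.001: (3.10, 3.30)}


def t_statistic(b: float, se: float, beta_null: float = 0.0) -> float:
    """t = (b_j - beta_j) / s_bj, with beta_j taken from H0 (usually 0)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (b - beta_null) / se


def reject_h0(t: float, alpha: float, direction: int = 0) -> bool:
    """direction 0: two-tailed; +1: H1 says beta_j > 0; -1: H1 says beta_j < 0."""
    one_tail, two_tail = CRITICAL_Z[alpha]
    if direction == 0:
        return abs(t) > two_tail
    if direction == 1:
        return t > one_tail
    if direction == -1:
        return t < -one_tail
    raise ValueError("direction must be -1, 0, or 1")


if __name__ == "__main__":
    t_educ = t_statistic(-0.080, 0.014)   # Eq. 4 education, two-tailed test
    t_prest = t_statistic(-0.001, 0.003)  # Eq. 4 prestige, one-tailed (negative)
    print(round(t_educ, 2), reject_h0(t_educ, 0.001))
    print(round(t_prest, 2), reject_h0(t_prest, 0.05, direction=-1))
```

Education's t of about -5.7 exceeds 3.30 in absolute value, so $H_0$ is rejected even at .001; prestige's t of about -0.33 does not reach the one-tailed .05 cutoff.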

Here are two hypotheses, about education ($\beta_1$) and occupational prestige ($\beta_2$), to be tested using Eq. 4:

Test a two-tail hypothesis about $\beta_1$: $t_{N-4-1} =$ ______

α       1-tail    2-tail
.05     1.65      1.96
.01     2.33      2.58
.001    3.10      3.30

Decision: ______   Prob. Type I error: ______

Test a one-tail hypothesis about $\beta_2$: $t_{N-4-1} =$ ______
Decision: ______   Prob. Type I error: ______

Test one-tailed hypotheses about the expected positive effects of siblings ($\beta_3$) and age ($\beta_4$) on number of children ever born:

$\beta_3$: $t_{N-4-1} =$ ______   Decision: ______   Prob. Type I error: ______
$\beta_4$: $t_{N-4-1} =$ ______   Decision: ______   Prob. Type I error: ______

Interpretation: These sample regression statistics are very unlikely to come from a population whose regression parameters are zero ($\beta_j = 0$).

Standardizing regression slopes ($\beta^*$)

Comparing the effects of predictors on a dependent variable is difficult, due to differences in units of measurement. The beta coefficient ($\beta^*$) indicates the effect of an X predictor on the dependent variable in standard deviation units:

$$\beta^*_{YX} = b_{YX}\left(\frac{s_X}{s_Y}\right)$$

1. Multiply the $b_{YX}$ for each X by that predictor's standard deviation.
2. Divide by the standard deviation of the dependent variable, $Y$.

The result is a standardized regression equation, written with Z-score predictors but no intercept term:

$$\hat{Z}_Y = \beta^*_1 Z_1 + \beta^*_2 Z_2 + \dots + \beta^*_K Z_K$$
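
The two steps translate directly into code. This is a minimal Python sketch with the hypothetical name `beta_weight`, using the Eq. 4 slopes and standard deviations for education, occupational prestige, and age, with $s_Y = 1.70$ for children ever born.

```python
def beta_weight(b: float, sd_x: float, sd_y: float) -> float:
    """Standardized slope: beta* = b * (s_X / s_Y)."""
    if sd_y <= 0:
        raise ValueError("s_Y must be positive")
    return b * sd_x / sd_y


SD_CHILDREN = 1.70
# Eq. 4: predictor -> (unstandardized slope b, standard deviation s_X)
EQ4 = {"education": (-0.080, 3.08), "prestige": (-0.001, 13.89), "age": (0.035, 17.35)}

if __name__ == "__main__":
    for name, (b, sd_x) in EQ4.items():
        print(name, round(beta_weight(b, sd_x, SD_CHILDREN), 2))
```

Rounded to two decimals this prints -0.14, -0.01, and 0.36, matching the standardized equation for Eq. 4.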

Standardize the regression coefficients in Eq. 4:

$$\hat{Y} = ?.?8 - 0.080X_1 - 0.001X_2 + 0.067X_3 + 0.035X_4$$

Use these standard deviations to change all the $b_{YX}$ to $\beta^*$:

Variable          s.d.
Y   Children      1.70
X1  Education     3.08
X2  Occupation   13.89
X3  Siblings      3.?9
X4  Age          17.35

$\beta^*_1 = -0.080\,(3.08/1.70) = -0.14$
$\beta^*_2 = -0.001\,(13.89/1.70) = -0.01$
$\beta^*_3 = 0.067\,(3.?9/1.70) = 0.13$
$\beta^*_4 = 0.035\,(17.35/1.70) = 0.36$

Write the standardized equation:

$$\hat{Z}_Y = -0.14Z_1 - 0.01Z_2 + 0.13Z_3 + 0.36Z_4$$

Interpreting $\beta^*$

Standardizing regression slopes transforms predictors' effects on the dependent variable from their original measurement units into standard-deviation units. Hence, you must interpret and compare the $\beta^*$ effects in standardized terms:

Education $\beta^* = -0.14$: a 1-standard deviation difference in education levels reduces the number of children born by about one-seventh of a standard deviation.
Occupational $\beta^* = -0.01$: a 1-standard deviation difference in prestige reduces the number of children born by one-hundredth of a standard deviation.
Siblings $\beta^* = +0.13$: a 1-standard deviation difference in siblings increases the number of children born by about one-eighth of a standard deviation.
Age $\beta^* = +0.36$: a 1-standard deviation difference in age increases the number of children born by more than one-third of a standard deviation.

Thus, age has the largest effect on number of children ever born; occupation has the smallest impact (and it is not significant).

Let's interpret a standardized regression, where annual church attendance ($Y$) is regressed on $X_1$ = religious intensity (a 4-point scale), $X_2$ = age, and $X_3$ = education:

$$\hat{Y} = ?0.? + ?.3X_1 + 0.?X_2 + 0.09X_3 \qquad R^2_{adj} = 0.269$$

Standard errors: (3.05), (0.50), (0.03), (0.?7).

The standardized regression equation:

$$\hat{Z}_Y = 0.50Z_1 + 0.08Z_2 + 0.0?Z_3$$

Interpretations:
Only two predictors significantly increase church attendance.
The linear relations explain 26.9% of attendance variance.
Religious intensity has the strongest effect (1/2 std. deviation).
Age's effect on attendance is much smaller (1/12 std. dev.).

Dummy Variables in Regression

Many important social variables are not continuous but measured as discrete categories and thus cannot be used as independent variables without recoding. Examples of such variables include gender, race, religion, marital status, region, smoking, drug use, union membership, social class, and college graduation.

A dummy variable is coded 1 to indicate the presence of an attribute and 0 its absence.
1. Create and name one dummy variable for each of the K categories of the original discrete variable.
2. For each dummy variable, code a respondent 1 if s/he has that attribute, 0 if lacking that attribute.
3. Every respondent will have a 1 for only one dummy, and 0 for the K - 1 other dummy variables.

GSS codes for SEX are arbitrary: 1 = Men and 2 = Women. Recode SEX as two new dummies:

SEX          MALE   FEMALE
1 = Men        1      0
2 = Women      0      1

MARITAL has five categories, from 1 = Married to 5 = Never:

MARITAL          MARRD  WIDOWD  DIVORCD  SEPARD  NEVERD
1 = Married        1      0       0        0       0
2 = Widowed        0      1       0        0       0
3 = Divorced       0      0       1        0       0
4 = Separated      0      0       0        1       0
5 = Never          0      0       0        0       1

SPSS RECODE to create K dummy variables (1-0) from MARITAL

The ORIGINAL 2008 GSS FREQUENCIES (marital, MARITAL STATUS):

                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 MARRIED            972       48.0         48.2              48.2
          2 WIDOWED            164        8.1          8.1              56.3
          3 DIVORCED           281       13.9         13.9              70.2
          4 SEPARATED           70        3.5          3.5              73.7
          5 NEVER MARRIED      531       26.2         26.3             100.0
          Total              2,018       99.8        100.0
Missing   9 NA                   5         .2
Total                        2,023      100.0

Every case is coded 1 on one dummy variable and 0 on the other four dummies. The MARITAL category frequencies above appear in the row for 1 on the five marital status dummy variables below:

RECODE STATEMENTS:
COMPUTE marryd=0.
COMPUTE widowd=0.
COMPUTE divord=0.
COMPUTE separd=0.
COMPUTE neverd=0.
IF (marital EQ 1) marryd=1.
IF (marital EQ 2) widowd=1.
IF (marital EQ 3) divord=1.
IF (marital EQ 4) separd=1.
IF (marital EQ 5) neverd=1.

RECODE     MARRD    WIDOWD    DIVORD    SEPARD    NEVERD
1            972       164       281        70       531
0          1,046     1,854     1,737     1,948     1,487
TOTAL      2,018     2,018     2,018     2,018     2,018
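
The same recode can be sketched outside SPSS. This Python version is illustrative only: the dictionary and function names are hypothetical, and it assumes, as the TOTAL row of 2,018 suggests, that respondents with a missing code (9 = NA) are left out rather than coded 0 on every dummy.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical names mirroring the SPSS dummies; GSS MARITAL codes 1-5.
MARITAL_DUMMIES = {1: "marryd", 2: "widowd", 3: "divord", 4: "separd", 5: "neverd"}


def marital_dummies(code: int) -> Optional[Dict[str, int]]:
    """Return 1-0 dummies for one respondent, or None for a missing code such as 9 = NA."""
    if code not in MARITAL_DUMMIES:
        return None
    return {name: int(code == value) for value, name in MARITAL_DUMMIES.items()}


def dummy_totals(codes: Iterable[int]) -> Counter:
    """Count the 1s on each dummy across all non-missing respondents."""
    totals: Counter = Counter()
    for code in codes:
        row = marital_dummies(code)
        if row is not None:
            totals.update(row)  # adds the 0/1 values dummy by dummy
    return totals


if __name__ == "__main__":
    codes = [1] * 972 + [2] * 164 + [3] * 281 + [4] * 70 + [5] * 531 + [9] * 5
    print(dummy_totals(codes))
```

Feeding it the frequencies above, plus the 5 NA cases, reproduces the counts of 1s (972, 164, 281, 70, and 531) out of 2,018 valid cases.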

Linear Dependency among Dummies

Given K dummy variables, if you know a respondent's codes for K - 1 dummies, then you also know that person's code for the Kth dummy! This linear dependency is similar to the degrees of freedom problem in ANOVA. Because the K dummies always sum to 1, together they exactly duplicate the intercept term, so OLS cannot estimate separate effects for all of them. Thus, to use a set of K dummy variables as predictors in a multiple regression equation, you must omit one of them. Only K - 1 dummies can be used in an equation. The omitted dummy category serves as the reference category (or baseline), against which to interpret the K - 1 dummy variable effects (b) on the dependent variable.
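
The dependency is easy to check in code. In this Python sketch (hypothetical names `design_matrix` and `all_k_dummies_sum_to_constant`), the design matrix keeps an intercept column plus K - 1 dummies, and the second function confirms that the full set of K dummies always sums to 1, which is exactly the intercept column.

```python
from typing import List, Sequence


def design_matrix(codes: Sequence[int], categories: Sequence[int], reference: int) -> List[List[int]]:
    """Rows of [1, D_1, ..., D_(K-1)]: an intercept plus K-1 dummies, omitting `reference`."""
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    kept = [c for c in categories if c != reference]
    return [[1] + [int(code == c) for c in kept] for code in codes]


def all_k_dummies_sum_to_constant(codes: Sequence[int], categories: Sequence[int]) -> bool:
    """True if the full set of K dummies adds to 1 in every row, i.e. it
    reproduces the intercept column and would make X'X singular."""
    return all(sum(int(code == c) for c in categories) == 1 for code in codes)


if __name__ == "__main__":
    marital = [1, 2, 3, 4, 5, 1, 5]  # hypothetical MARITAL codes
    for row in design_matrix(marital, [1, 2, 3, 4, 5], reference=2):  # widowed as baseline
        print(row)
    print(all_k_dummies_sum_to_constant(marital, [1, 2, 3, 4, 5]))
```

A widowed respondent gets 0 on every retained dummy, so the intercept alone is the prediction for the reference category.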

Use four of the five marital status dummy variables to predict annual sex frequency in the 2008 GSS. WIDOWD is the omitted dummy, serving as the reference category.

$$\hat{Y} = ?8.8 + ?5.4D_{MARR} + ?3.8D_{DIV} + ?.?D_{SEP} + 53.0D_{NEVER} \qquad R^2_{adj} = 0.054$$

Standard errors: (5.5), (6.0), (6.9), (?0.3), (6.3).

Widows are coded 0 on all four dummies, so their prediction is:
Widowed: $\hat{Y} = ?8.8 + ?5.4(0) + ?3.8(0) + ?.?(0) + 53.0(0) =$ ______ per year
Married: $\hat{Y} = ?8.8 + ?5.4(1) + ?3.8(0) + ?.?(0) + 53.0(0) =$ ______ per year
Divorced: $\hat{Y} = ?8.8 + ?5.4(0) + ?3.8(1) + ?.?(0) + 53.0(0) =$ ______ per year
Separated: $\hat{Y} = ?8.8 + ?5.4(0) + ?3.8(0) + ?.?(1) + 53.0(0) =$ ______ per year
Never: $\hat{Y} = ?8.8 + ?5.4(0) + ?3.8(0) + ?.?(0) + 53.0(1) =$ ______ per year

Which persons are the least sexually active? Which the most?

ANCOVA

An Analysis of Covariance (ANCOVA) equation has both dummy variable and continuous predictors of a dependent variable. Marital status is highly correlated with age (widows are older, never-marrieds are younger), and annual sex activity falls off steadily as people get older. Look what happens to the marital effects when age is controlled, by adding AGE to the marital status predictors of sex frequency:

$$\hat{Y} = ?7.? + ?5.5D_{MARR} + ?0.?D_{DIV} + ?3.4D_{SEP} - ?0.4D_{NEVER} - ?.7X_{AGE} \qquad R^2_{adj} = 0.?7$$

Standard errors: (9.?), (6.?), (6.9), (?0.?), (7.?), (0.?).

Each year of age reduces sex by ?.7 times per year. Among people of the same age, marrieds have more sex than others, but never-marrieds now have less sex than widows! What would you predict for never-marrieds aged ?, marrieds aged 40, and widows aged 70?
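
An ANCOVA prediction adds the dummy effect for the respondent's group (zero for the reference group) to the intercept plus the age slope times age. The Python sketch below shows only the mechanics, with made-up coefficients that are NOT the GSS estimates above; `ancova_predict` is likewise an illustrative name.

```python
from typing import Mapping


def ancova_predict(intercept: float, group_effects: Mapping[str, float], age_slope: float,
                   group: str, age: float, reference: str) -> float:
    """Y-hat = a + b_group (0 for the reference group) + b_age * age."""
    effect = 0.0 if group == reference else group_effects[group]
    return intercept + effect + age_slope * age


if __name__ == "__main__":
    # HYPOTHETICAL coefficients for illustration only.
    a, b_age = 120.0, -1.5
    effects = {"married": 20.0, "divorced": 5.0, "separated": 5.0, "never": -10.0}
    for group, age in [("never", 22), ("married", 40), ("widowed", 70)]:
        print(group, age, ancova_predict(a, effects, b_age, group, age, reference="widowed"))
```

Because age enters as its own slope, two groups compared at the same age differ only by their dummy effects, which is why the marital effects change once age is controlled.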

Add a FEMALE dummy to the regression of church attendance on $X_1$ = religious intensity, $X_2$ = age, and $X_3$ = education:

$$\hat{Y} = ?0.9? + ?.96X_1 + 0.?0X_2 + 0.09X_3 + ?.0?D_{FEM} \qquad R^2_{adj} = 0.270$$

Standard errors: (3.06), (0.50), (0.03), (0.?7), (?.05).

The standardized regression equation:

$$\hat{Z}_Y = 0.49Z_1 + 0.08Z_2 + 0.0?Z_3 + 0.04D_{FEM}$$

Interpretations:
Women attend church ?.0? more times per year than men.
Other predictors' effects are essentially unchanged when gender is added.
Age's effect is twice as large as gender's effect.
Religious intensity remains the strongest predictor of attendance.