Chapter 8
Multivariate Regression Analysis
8.3 Multiple Regression with K Independent Variables
8.4 Significance Tests of Parameters
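Population Regression Model
The principles of bivariate regression can be generalized to a situation of several independent variables (predictors) of the dependent variable Y.
For K independent variables, the population regression and prediction models are:
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i$
$\hat{Y}_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki}$
The sample prediction equation is:
$\hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_K X_{Ki}$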
Predict number of children ever born (Y) to the 2008 GSS respondents (N = 1,906) as a linear function of education ($X_1$), occupational prestige ($X_2$), number of siblings ($X_3$), and age ($X_4$):
$\hat{Y} = .8 - .080X_1 - .001X_2 + .067X_3 + .035X_4$
People with more education and higher-prestige jobs have fewer children, but older people and those raised in families with many siblings have more children.
Use the equation to predict the expected number of kids by a person with $X_1$ = ; $X_2$ = 40; $X_3$ = 8; $X_4$ = 55:
$\hat{Y} = .8 - .080(\ ) - .001(40) + .067(8) + .035(55)$
For $X_1$ = 6; $X_2$ = 70; $X_3$ = ; $X_4$ = 5:
$\hat{Y} = .8 - .080(6) - .001(70) + .067(\ ) + .035(5)$
OLS Estimation of Coefficients
As with bivariate regression, the computer uses Ordinary Least Squares methods to estimate the intercept (a), slopes ($b_{YX}$), and multiple coefficient of determination ($R^2$) from sample data.
OLS estimators minimize the sum of squared errors for the linear prediction:
$\min \sum e_i^2$
See SSDA#4 Boxes 8. and 8.3 for details of best linear unbiased estimator (BLUE) characteristics and the derivations of OLS estimators for the intercept a and slope b.
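The minimization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ols_fit` and `sum_squared_errors` and the toy data are hypothetical (not from the GSS), and the normal equations are solved by plain Gauss-Jordan elimination rather than by whatever routine a statistics package uses.

```python
def ols_fit(X, y):
    """Return [a, b_1, ..., b_K] that minimize the sum of squared errors."""
    # A leading 1 in every row lets the first coefficient act as the intercept a.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented p x (p + 1) matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot_row = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


def sum_squared_errors(coefs, X, y):
    """Sum of squared prediction errors for a given intercept and slopes."""
    a, slopes = coefs[0], coefs[1:]
    return sum((yi - (a + sum(b * x for b, x in zip(slopes, r)))) ** 2
               for r, yi in zip(X, y))


# Hypothetical toy data generated exactly as Y = 2 + 3*X1 - 1*X2.
X_toy = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y_toy = [3, 7, 6, 11, 9]
coefs = ols_fit(X_toy, y_toy)
print(coefs, sum_squared_errors(coefs, X_toy, y_toy))
```

Because the toy Y values lie exactly on a plane, the fitted coefficients recover the generating values and the sum of squared errors is essentially zero; with real survey data the errors would not vanish.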
Nested Equations
A set of nested regression equations successively adds more predictors to an equation to observe changes in their slopes with the dependent variable.
Predicting children ever born (Y) by adding education ($X_1$); occupational prestige ($X_2$); siblings ($X_3$); age ($X_4$). (Standard errors in parentheses.)
(1) $\hat{Y} = 3.606 - 0.4X_1$    $R^2 = 0.05$
    (0.65) (.0)
(2) $\hat{Y} = 3.473 - 0.33X_1 - 0.006X_2$    $R^2 = 0.05$
    (0.73) (.04) (.003)
(3) $\hat{Y} = .865 - 0.09X_1 - 0.006X_2 + 0.073X_3$    $R^2 = 0.066$
    (0.99) (.05) (.003) (.0)
(4) $\hat{Y} = .8 - 0.080X_1 - 0.001X_2 + 0.067X_3 + 0.035X_4$    $R^2 = 0.193$
    (0.) (.04) (.003) (.0) (.00)
F-test for $\rho^2$
The hypothesis pair for the multiple coefficient of determination remains the same as in the bivariate case:
$H_0: \rho^2 = 0$
$H_1: \rho^2 > 0$
But the F-test must also adjust the sample estimate of $R^2$ for the df associated with the K predictors:
$F_{K,\,N-K-1} = \dfrac{MS_{REGRESSION}}{MS_{ERROR}} = \dfrac{R^2 / K}{(1 - R^2)/(N - K - 1)}$
As you enter more predictors into the equation in an effort to pump up your $R^2$, you must pay the higher cost of an additional df per predictor to get that result.
Test the null hypothesis $H_0: \rho^2 = 0$ for Equation 3:
Source        SS         df    MS    F
Regression    354.7
Error         5,011.1
Total         5,365.8

df_R, df_E    α       c.v.
3, ∞          .05     2.60
3, ∞          .01     3.78
3, ∞          .001    5.42

Decision about $H_0$:
Prob. Type I error:
Conclusion:
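One way to fill in the blank cells of the ANOVA table is sketched below. It assumes N = 1,906 (the sample size stated for these equations) and K = 3 predictors for Equation 3; the function name `anova_f` is hypothetical, and the error sum of squares is obtained by subtracting the regression SS from the total SS.

```python
def anova_f(ss_regression, ss_error, n, k):
    """F ratio with k and n - k - 1 degrees of freedom."""
    ms_regression = ss_regression / k        # mean square for regression
    ms_error = ss_error / (n - k - 1)        # mean square for error
    return ms_regression / ms_error


SS_TOTAL = 5365.8
SS_REGRESSION = 354.7
SS_ERROR = SS_TOTAL - SS_REGRESSION          # 5,011.1

N_ASSUMED = 1906                             # assumption: N as stated for the nested equations
K_EQ3 = 3

f_eq3 = anova_f(SS_REGRESSION, SS_ERROR, N_ASSUMED, K_EQ3)
r2_eq3 = SS_REGRESSION / SS_TOTAL            # should agree with the reported R-square of 0.066

CRITICAL_F_3_INF = {0.05: 2.60, 0.01: 3.78, 0.001: 5.42}
reject_at_001 = f_eq3 > CRITICAL_F_3_INF[0.001]
print(round(f_eq3, 2), round(r2_eq3, 3), reject_at_001)
```

Under these assumptions the F ratio is far above the .001 critical value, so the decision does not hinge on the exact sample size.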
Difference in $\rho^2$ for Nested Equations
We can also test whether adding predictors to a second, nested regression equation increases $\rho^2$:
$H_0: \rho_2^2 - \rho_1^2 = 0$
$H_1: \rho_2^2 - \rho_1^2 > 0$
where subscripts 1 and 2 refer to the equations with fewer and more predictors, respectively.
The F-statistic tests whether adding predictors increases the population rho-square, relative to the difference in the two nested equations' degrees of freedom:
$F_{(K_2-K_1),\,(N-K_2-1)} = \dfrac{(R_2^2 - R_1^2)/(K_2 - K_1)}{(1 - R_2^2)/(N - K_2 - 1)}$
Is the $\rho^2$ for Eq. 2 larger than the $\rho^2$ for Eq. 1?
$F_{(2-1),\,(648-2-1)} = \dfrac{(R_2^2 - R_1^2)/(K_2 - K_1)}{(1 - R_2^2)/(N - K_2 - 1)} =$

df_R, df_E    α       c.v.
1, ∞          .05     3.84
1, ∞          .01     6.63
1, ∞          .001    10.83

Decision:
Prob. Type I error:
Interpretation: Adding occupation to the regression equation with education did not significantly increase the explained variance in number of children ever born. We cannot reject the hypothesis that the two population coefficients of determination are equal; each equation explains about 5% of the variance of Y.
Now test the difference in $\rho^2$ for Eq. 4 versus Eq. 3:
$F_{(4-3),\,(648-4-1)} = \dfrac{(R_4^2 - R_3^2)/(K_4 - K_3)}{(1 - R_4^2)/(N - K_4 - 1)} =$

df_R, df_E    α       c.v.
1, ∞          .05     3.84
1, ∞          .01     6.63
1, ∞          .001    10.83

Decision:
Prob. Type I error:
Interpretation: Adding age to the regression equation with three other predictors greatly increases the explained variance in number of children ever born. The coefficient of determination for equation #4 seems to be almost three times larger than for equation #3.
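The nested-equation F-test can be sketched as a small function. The sketch assumes N = 1,906, the sample size stated earlier for these equations (the df shown in the formula above suggests a different N, so treat the sample size as an input), and uses R-square values of 0.066 for Eq. 3 and 0.193 for Eq. 4. The name `nested_f` is hypothetical.

```python
def nested_f(r2_fewer, r2_more, k_fewer, k_more, n):
    """F ratio for the increase in R-square between two nested equations.

    Degrees of freedom are (k_more - k_fewer) and (n - k_more - 1).
    """
    numerator = (r2_more - r2_fewer) / (k_more - k_fewer)
    denominator = (1.0 - r2_more) / (n - k_more - 1)
    return numerator / denominator


N_ASSUMED = 1906   # assumption: sample size stated for the nested equations

# Eq. 4 (four predictors) versus Eq. 3 (three predictors).
f_4_vs_3 = nested_f(r2_fewer=0.066, r2_more=0.193, k_fewer=3, k_more=4, n=N_ASSUMED)

CRITICAL_F_1_INF = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83}
reject_at_001 = f_4_vs_3 > CRITICAL_F_1_INF[0.001]
print(round(f_4_vs_3, 1), reject_at_001)
```

With one added predictor the numerator df is 1, so this F is the square of the t ratio for the added slope, which is why the two tests lead to the same decision about age.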
Adjusting $R^2$ for K predictors
The meaning of the multiple regression coefficient of determination is identical to the bivariate case:
$R^2_{Y \cdot X} = \dfrac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} = \dfrac{SS_{TOTAL} - SS_{ERROR}}{SS_{TOTAL}} = \dfrac{SS_{REGRESSION}}{SS_{TOTAL}}$
However, when you report the sample estimate of a multiple regression $R^2$, you must adjust its value by 1 degree of freedom for each of the K predictors:
$R^2_{adj} = R^2 - \dfrac{(K)(1 - R^2)}{N - K - 1}$
For large sample N and low $R^2$, not much will change.
Adjust the sample $R^2$ for each of the four nested equations (N = 1,906):
Eq.    R^2      K    Adj. R^2
1:     0.05     1
2:     0.05     2
3:     0.066    3
4:     0.193    4
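A minimal sketch of the adjustment, assuming N = 1,906 and the R-square values 0.066 (Eq. 3) and 0.193 (Eq. 4). The function names are hypothetical. The second function is the more commonly printed form of adjusted R-square, included only to show that it is algebraically the same quantity.

```python
def adjusted_r2(r2, k, n):
    """Adjusted R-square in the form R2 - K(1 - R2)/(N - K - 1)."""
    return r2 - k * (1.0 - r2) / (n - k - 1)


def adjusted_r2_alt(r2, k, n):
    """Equivalent form: 1 - (1 - R2)(N - 1)/(N - K - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_ASSUMED = 1906   # assumption: sample size stated for the nested equations

adj_eq3 = adjusted_r2(0.066, 3, N_ASSUMED)
adj_eq4 = adjusted_r2(0.193, 4, N_ASSUMED)
print(round(adj_eq3, 3), round(adj_eq4, 3))
```

Both adjustments shave off less than 0.002, which illustrates the point that a large N and a low R-square leave the estimate nearly unchanged.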
Here are those four nested regression equations again with the number of ever-born children as the dependent variable. Now we'll examine their regression slopes.
Predict children ever born (Y) by adding education ($X_1$); occupational prestige ($X_2$); siblings ($X_3$); age ($X_4$). (Standard errors in parentheses.)
(1) $\hat{Y} = 3.606 - 0.4X_1$    $R^2 = 0.05$
    (0.65) (.0)
(2) $\hat{Y} = 3.473 - 0.33X_1 - 0.006X_2$    $R^2 = 0.05$
    (0.73) (.04) (.003)
(3) $\hat{Y} = .865 - 0.09X_1 - 0.006X_2 + 0.073X_3$    $R^2 = 0.066$
    (0.99) (.05) (.003) (.0)
(4) $\hat{Y} = .8 - 0.080X_1 - 0.001X_2 + 0.067X_3 + 0.035X_4$    $R^2 = 0.193$
    (0.) (.04) (.003) (.0) (.00)
Interpreting Nested $b_{YX}$
The multiple regression slopes are partial or net effects. When other independent variables are statistically held constant, the size of $b_{YX}$ often decreases. These changes occur if predictor variables are correlated with each other as well as with the dependent variable.
Two correlated predictors divide their joint impact on the dependent variable between both $b_{YX}$ coefficients.
For example, age and education are negatively correlated (r = -.7): older people have less schooling. When age was entered into equation #4, the net effect of education on number of children decreased from $b_1$ = -.4 to $b_1$ = -.080.
So, controlling for respondent's age, an additional year of education decreases the number of children ever born by a much smaller amount.
t-test for Hypotheses about $\beta$
The t-test for hypotheses about K predictors uses familiar procedures.
A hypothesis pair about the population regression coefficient for the jth predictor could have a two-tailed hypothesis:
$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$
Or, a hypothesis pair could indicate the researcher's expected direction (sign) of the regression slope, for example for an expected positive slope:
$H_0: \beta_j \le 0$
$H_1: \beta_j > 0$
Testing a hypothesis about $\beta_j$ uses a t-test with N-K-1 degrees of freedom (i.e., a Z-test for a large sample):
$t_{N-K-1} = \dfrac{b_j - \beta_j}{s_{b_j}}$
where $b_j$ is the sample regression coefficient and the denominator is the standard error of the sampling distribution of $\beta_j$ (see formula in SSDA#4, p. 66).
Here are two hypotheses, about education ($\beta_1$) and occupational prestige ($\beta_2$), to be tested using Eq. 4:
Test a two-tail hypothesis about $\beta_1$:
$t_{648-4-1} =$

α       1-tail    2-tail
.05     1.65      1.96
.01     2.33      2.58
.001    3.10      3.30

Decision:
Prob. Type I error:
Test a one-tail hypothesis about $\beta_2$:
$t_{648-4-1} =$
Decision:
Prob. Type I error:
Test one-tailed hypotheses about expected positive effects of siblings ($\beta_3$) and age ($\beta_4$) on number of children ever born:
$t_{648-4-1} =$
Decision:
Prob. Type I error:
$t_{648-4-1} =$
Decision:
Prob. Type I error:
Interpretation: These sample regression statistics are very unlikely to come from a population whose regression parameters are zero ($\beta_j = 0$).
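The mechanics of these t-tests can be sketched as follows. The slope and standard error passed to `t_ratio` are hypothetical illustration values, not the estimates from Eq. 4, and the function names are likewise hypothetical. The critical values are the standard large-sample (Z) values for one- and two-tailed tests.

```python
# Large-sample critical values: alpha -> (one-tail, two-tail).
CRITICAL_Z = {
    0.05: (1.65, 1.96),
    0.01: (2.33, 2.58),
    0.001: (3.10, 3.30),
}


def t_ratio(b, se, beta_null=0.0):
    """t = (b_j - beta_j) / s_bj, with N - K - 1 degrees of freedom."""
    return (b - beta_null) / se


def reject_null(t, alpha=0.05, tails=2):
    """Compare t to the large-sample critical value."""
    one_tail, two_tail = CRITICAL_Z[alpha]
    if tails == 2:
        return abs(t) >= two_tail
    # One-tailed test for an expected positive slope (H1: beta_j > 0).
    return t >= one_tail


# Hypothetical slope and standard error, for illustration only.
t_demo = t_ratio(b=0.035, se=0.002)
print(round(t_demo, 1), reject_null(t_demo, alpha=0.001, tails=1))
```

For an expected negative slope, the one-tailed comparison would flip to `t <= -one_tail`; the sketch keeps only the positive case to match the hypotheses about siblings and age.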
Standardizing regression slopes ($\beta^*$)
Comparing effects of predictors on a dependent variable is difficult, due to differences in units of measurement.
The beta coefficient ($\beta^*$) indicates the effect of an X predictor on the dependent variable in standard deviation units:
$\beta^*_{YX} = b_{YX} \left( \dfrac{s_X}{s_Y} \right)$
1. Multiply the $b_{YX}$ for each X by that predictor's standard deviation.
2. Divide by the standard deviation of the dependent variable, Y.
The result is a standardized regression equation, written with Z-score predictors, but no intercept term:
$\hat{Z}_Y = \beta^*_1 Z_1 + \beta^*_2 Z_2 + \dots + \beta^*_K Z_K$
Standardize the regression coefficients in Eq. 4:
$\hat{Y} = .8 - 0.080X_1 - 0.001X_2 + 0.07X_3 + 0.035X_4$
Use these standard deviations to change all the $b_{YX}$ to $\beta^*$:
Variable         s.d.
Y    Children    1.70
X_1  Educ.       3.08
X_2  Occup.      13.89
X_3  Sibs        3.9
X_4  Age         17.35

($X_1$): $\beta^*_1 = -0.080 \times (3.08 / 1.70)$
($X_2$): $\beta^*_2 = -0.001 \times (13.89 / 1.70)$
($X_3$): $\beta^*_3 = 0.067 \times (3.9 / 1.70)$
($X_4$): $\beta^*_4 = 0.035 \times (17.35 / 1.70)$
Write the standardized equation:
$\hat{Z}_Y = -0.14Z_1 - 0.01Z_2 + 0.13Z_3 + 0.36Z_4$
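The two-step recipe (multiply by the predictor's standard deviation, divide by the standard deviation of Y) can be checked with a short sketch. It uses only the education and age terms, with slopes -0.080 and 0.035 and standard deviations 3.08, 17.35, and 1.70 for education, age, and children; treat those inputs as assumptions about the intended values. The name `beta_weight` is hypothetical.

```python
def beta_weight(b, s_x, s_y):
    """Standardized slope: b * (s_X / s_Y)."""
    return b * s_x / s_y


S_CHILDREN = 1.70   # assumed standard deviation of Y

beta_educ = beta_weight(b=-0.080, s_x=3.08, s_y=S_CHILDREN)
beta_age = beta_weight(b=0.035, s_x=17.35, s_y=S_CHILDREN)

# Rounded to two decimals, as in a standardized equation.
print(round(beta_educ, 2), round(beta_age, 2))
```

Although the unstandardized education slope is larger in absolute size than the age slope, age ends up with the larger beta weight because its standard deviation is several times greater.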
Interpreting $\beta^*$
Standardizing regression slopes transforms predictors' effects on the dependent variable from their original measurement units into standard-deviation units. Hence, you must interpret and compare the $\beta^*$ effects in standardized terms:
Education $\beta^*$ = -0.14: a 1-standard deviation difference in education levels reduces the number of children born by one-seventh st. dev.
Occupational $\beta^*$ = -0.01: a 1-standard deviation difference in prestige reduces N of children born by one-hundredth st. dev.
Siblings $\beta^*$ = +0.13: a 1-standard deviation difference in siblings increases the number of children born by one-eighth st. dev.
Age $\beta^*$ = +0.36: a 1-standard deviation difference in age increases the number of children born by more than one-third st. dev.
Thus, age has the largest effect on number of children ever born; occupation has the smallest impact (and it is not significant).
Let's interpret a standardized regression, where annual church attendance is regressed on $X_1$ = religious intensity (a 4-point scale), $X_2$ = age, and $X_3$ = education:
$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3$
Estimates (standard errors in parentheses): a = 0. (3.05); $b_1$ = .3 (0.50); $b_2$ = 0. (0.03); $b_3$ = 0.09 (0.7); $R^2_{adj}$ = 0.269
The standardized regression equation has coefficients: $\beta^*_1$ = 0.50; $\beta^*_2$ = 0.08; $\beta^*_3$ = 0.0
Interpretations:
Only two predictors significantly increase church attendance
The linear relations explain 26.9% of attendance variance
Religious intensity has strongest effect (1/2 std. deviation)
Age effect on attendance is much smaller (1/12 std. dev.)
Dummy Variables in Regression
Many important social variables are not continuous but measured as discrete categories and thus cannot be used as independent variables without recoding.
Examples of such variables include gender, race, religion, marital status, region, smoking, drug use, union membership, social class, college graduation.
Dummy variable: coded 1 to indicate the presence of an attribute and 0 its absence.
1. Create & name one dummy variable for each of the K categories of the original discrete variable.
2. For each dummy variable, code a respondent 1 if s/he has that attribute, 0 if lacking that attribute.
3. Every respondent will have a 1 for only one dummy, and 0 for the K-1 other dummy variables.
GSS codes for SEX are arbitrary: 1 = Men & 2 = Women
Recode SEX as two new dummies:
SEX          MALE    FEMALE
1 = Men        1       0
2 = Women      0       1

MARITAL has five categories, from 1 = Married to 5 = Never:
MARITAL          MARRD   WIDOWD   DIVORCD   SEPARD   NEVERD
1 = Married        1       0         0        0        0
2 = Widowed        0       1         0        0        0
3 = Divorced       0       0         1        0        0
4 = Separated      0       0         0        1        0
5 = Never          0       0         0        0        1
SPSS RECODE to create K dummy variables (1-0) from MARITAL
The ORIGINAL 2008 GSS FREQUENCIES:
marital  MARITAL STATUS
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 MARRIED              972       48.0        48.2              48.2
          2 WIDOWED              164        8.1         8.1              56.3
          3 DIVORCED             281       13.9        13.9              70.2
          4 SEPARATED             70        3.5         3.5              73.7
          5 NEVER MARRIED        531       26.2        26.3             100.0
          Total                2,018       99.8       100.0
Missing   9 NA                     5         .2
Total                          2,023      100.0

RECODE STATEMENTS:
    COMPUTE marryd=0.
    COMPUTE widowd=0.
    COMPUTE divord=0.
    COMPUTE separd=0.
    COMPUTE neverd=0.
    IF (marital EQ 1) marryd=1.
    IF (marital EQ 2) widowd=1.
    IF (marital EQ 3) divord=1.
    IF (marital EQ 4) separd=1.
    IF (marital EQ 5) neverd=1.

Every case is coded 1 on one dummy variable and 0 on the other four dummies. The MARITAL category frequencies above appear in the "1" row for the five marital status dummy variables below:
RECODE    MARRD    WIDOWD   DIVORD   SEPARD   NEVERD
1           972      164      281       70      531
0         1,046    1,854    1,737    1,948    1,487
TOTAL     2,018    2,018    2,018    2,018    2,018
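For readers working outside SPSS, the same recode logic can be sketched in Python. This is an illustration, not a translation of any particular package: the function `make_dummies` and the dictionary `MARITAL_DUMMIES` are hypothetical names, and the choice to return `None` for codes outside 1 to 5 (such as 9 = NA) is an assumption made here so that missing cases drop out of the dummy frequencies.

```python
# Category code -> dummy variable name, following the five MARITAL categories.
MARITAL_DUMMIES = {
    1: "marryd",
    2: "widowd",
    3: "divord",
    4: "separd",
    5: "neverd",
}


def make_dummies(marital_code):
    """Return a dict of five 1-0 dummies, or None for a missing code."""
    if marital_code not in MARITAL_DUMMIES:
        return None   # assumption: treat any other code (e.g. 9 = NA) as missing
    return {name: int(marital_code == code)
            for code, name in MARITAL_DUMMIES.items()}


divorced = make_dummies(3)
missing = make_dummies(9)
print(divorced, missing)
```

Each valid respondent gets a 1 on exactly one dummy, so summing a dummy over all cases reproduces the frequency of its category.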
Linear Dependency among Dummies
Given K dummy variables, if you know a respondent's codes for K - 1 dummies, then you also know that person's code for the Kth dummy! This linear dependency is similar to the degrees of freedom problem in ANOVA.
Thus, to use a set of K dummy variables as predictors in a multiple regression equation, you must omit one of them. Only K-1 dummies can be used in an equation.
The omitted dummy category serves as the reference category (or baseline), against which to interpret the K-1 dummy variable effects (b) on the dependent variable.
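The dependency is easy to verify directly: for every respondent the K dummies sum to 1, so the last dummy always equals 1 minus the sum of the others. The sketch below uses one hypothetical row per marital category; the function name `implied_last_dummy` is also hypothetical.

```python
# One illustrative row per category: (married, widowed, divorced, separated, never).
dummy_rows = [
    (1, 0, 0, 0, 0),
    (0, 1, 0, 0, 0),
    (0, 0, 1, 0, 0),
    (0, 0, 0, 1, 0),
    (0, 0, 0, 0, 1),
]


def implied_last_dummy(first_k_minus_1):
    """The Kth dummy is fully determined by the other K - 1 dummies."""
    return 1 - sum(first_k_minus_1)


checks = [implied_last_dummy(row[:-1]) == row[-1] for row in dummy_rows]
print(all(checks))
```

Because the full set of K dummies adds up to the constant column used for the intercept, including all K leaves the normal equations without a unique solution, which is the practical reason one category must be omitted.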
Use four of the five marital status dummy variables to predict annual sex frequency in 2008 GSS. WIDOWD is the omitted dummy, serving as the reference category.
$\hat{Y} = a + b_1 D_{MARR} + b_2 D_{DIV} + b_3 D_{SEP} + b_4 D_{NEVER}$
Estimates (standard errors in parentheses): a = 8.8 (5.5); $D_{MARR}$: 5.4 (6.0); $D_{DIV}$: 3.8 (6.9); $D_{SEP}$: . (0.3); $D_{NEVER}$: 53.0 (6.3); $R^2_{adj}$ = 0.054
Widows are coded 0 on all four dummies, so their prediction is the intercept alone. Each other group adds its own dummy coefficient to the intercept:
Widowed:    $\hat{Y} = a$ =        per year
Married:    $\hat{Y} = a + b_1$ =        per year
Divorced:   $\hat{Y} = a + b_2$ =        per year
Separated:  $\hat{Y} = a + b_3$ =        per year
Never:      $\hat{Y} = a + b_4$ =        per year
Which persons are the least sexually active? Which the most?
ANCOVA
An Analysis of Covariance (ANCOVA) equation has both dummy variable and continuous predictors of a dependent variable.
Marital status is highly correlated with age (widows are older, never marrieds are younger), and annual sex activity falls off steadily as people get older.
Look what happens to the marital effects when age is controlled, by adding AGE to the marital status predictors of sex frequency:
$\hat{Y} = a + b_1 D_{MARR} + b_2 D_{DIV} + b_3 D_{SEP} + b_4 D_{NEVER} + b_5 X_{AGE}$
Estimates (standard errors in parentheses): a = 7. (9.); $D_{MARR}$: 5.5 (6.); $D_{DIV}$: 0. (6.9); $D_{SEP}$: 3.4 (0.); $D_{NEVER}$: 0.4 (7.); $X_{AGE}$: .7 (0.); $R^2_{adj}$ = 0.7
Each year of age reduces sex by .7 times per year.
Among people of the same age, marrieds have more sex than others, but never marrieds now have less sex than widows!
What would you predict for: Never marrieds aged ? Marrieds aged 40? Widows aged 70?
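Add a FEMALE dummy to the regression of church attendance on $X_1$ = religious intensity, $X_2$ = age, and $X_3$ = education:
$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4 D_{FEM}$
Estimates (standard errors in parentheses): a = 0.9 (3.06); $b_1$ = .96 (0.50); $b_2$ = 0.0 (0.03); $b_3$ = 0.09 (0.7); $D_{FEM}$: .0 (.05); $R^2_{adj}$ = 0.270
The standardized regression equation has coefficients: $\beta^*_1$ = 0.49; $\beta^*_2$ = 0.08; $\beta^*_3$ = 0.0; $\beta^*_{FEM}$ = 0.04
Interpretations:
Women attend church .0 times more per year than men
Other predictors' effects are unchanged when gender is added
Age effect is twice as large as gender effect
Religious intensity remains the strongest predictor of attendance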