In Class Review Exercises Vartanian: SW 540

1. Given the following output from an OLS model looking at income, what are the slope and intercept for those who are black and those who are not black?

            b    SE
intercept   5    8
Black       9    3
Age         [values missing in source]

2. We want to use residuals to determine the relationship between mental health problems and income, controlling for age. What is the partial r coefficient using residuals?

3. How do we determine significance in an ANOVA model? What factors do we compare?

4. Use the two types of analyses we've learned to examine the following rankings.

Math   Reading
[rankings missing in source]

5. Given the following nominal scale variables, what is the direction of the relationship between the variables, what is the chi-square value, and do we have a statistically significant relationship between the variables?

                Treatment   Control   Total
Depressed       [cell counts missing in source]
Not Depressed
Total

6. Given the following results, what is your prediction for income for the mean individual? The DV is family income. The IVs are years of education of the head (I/R), whether parents expect the child to get a college degree (dummy; excluded are those parents with lower expectations), and whether family members hit each other (dummy; excluded are those who do not hit each other).

C:\WP60_1\LECT1.PHD\Final\Review Exercises in Class Final.doc

[SPSS regression output for question 6; most numeric values were lost in extraction. Model Summary: R = .415; R Square, Adjusted R Square, and Std. Error of the Estimate missing. ANOVA: Sum of Squares survive only as truncated mantissas (Regression 1.21E, Residual 5.81E, Total 7.02E); df, Mean Square, F, and Sig. missing. Coefficients: unstandardized B and Std. Error, standardized Beta, t, and Sig. for (Constant), education of head, PCGs educ expectat of child: coll degr, and family members hit each other; values missing. Descriptive Statistics: N, Minimum, Maximum, Mean, and Std. Deviation for total family income and the three predictors; values missing. Dependent variable: total family income.]

#7. A. Indicate the meaning of the standardized coefficient estimates in problem #6. B. If education of the head increased by [number missing in source] grades, what is your prediction for the change in income? Give this in standard deviation units and in change in actual income.

#8. You are examining the effect of a number of childhood predictor variables on the log of income as an adult. You use a number of race variables (with white excluded), gender (with female excluded), whether either of the person's parents was born outside of the U.S. (dummy), 8th grade grades, standardized test score (I/R), and socioeconomic status in 8th grade (made into an interval level variable).

A. Why do we use log dependent variables?
B. Which of the variables has the greatest effect on the log of income?
C. Interpret the coefficient estimates for each of the variables.

[SPSS Coefficients table; numeric values lost in extraction. Dependent variable: LOGINC. Rows: (Constant), ASIAN, BLACK, AMIND, HISPANIC, MALE, either parent born outside US, GRADES IN 8TH GRADE, SOCIO-ECONOMIC STATUS COMPOSITE, STANDSCO. Columns: unstandardized B and Std. Error, standardized Beta, t, Sig.]

D. You run a second model with regular income as the DV. Which of these two models is the better model? The results are given below.

[SPSS Coefficients table for question 8D; numeric values lost in extraction. Dependent variable: INCOME99. Rows: (Constant), ASIAN, BLACK, AMIND, HISPANIC, MALE, either parent born outside US, GRADES IN 8TH GRADE, SOCIO-ECONOMIC STATUS COMPOSITE, STANDSCO. Columns: unstandardized B and Std. Error, standardized Beta, t, Sig.]

#9. A. Is either of the following models statistically significant?
B. What is the meaning of a model being statistically significant?
C. What is the adjusted or corrected R^2 value in each of the models?
D. Is there much difference between the R^2 value and the corrected R^2 value? Why?
E. In Model 1, what is the predicted level of income (not log income) for someone who has 0 siblings, is female, has a 3.5 GPA (grades), and has an SES of 0?

Model 1: Using log income as the DV. IVs are number of siblings, gender, SES (I/R), and 8th grade grades.

[SAS REG output for MODEL1; numeric values lost in extraction. Dependent variable: loginc. Analysis of variance rows: Model, Error, Corrected Total. Parameter estimates (DF, Estimate, Error, t Value, Pr > |t|) for Intercept, sibs, female, BYSES (SOCIO-ECONOMIC STATUS COMPOSITE), and BYGRADS (GRADES IN 8TH GRADE); the p-values that survive are <.0001 for Intercept, BYSES, and BYGRADS.]

Model 2: Using income as the DV. Dependent variable: INCOME99.

[SAS REG output for Model 2 (dependent variable INCOME99); numeric values lost in extraction. Analysis of variance rows: Model (Pr > F <.0001), Error, Corrected Total. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var. Parameter estimates (DF, Estimate, Error, t Value, Pr > |t|) for Intercept, sibs, female, BYSES (SOCIO-ECONOMIC STATUS COMPOSITE), and BYGRADS (GRADES IN 8TH GRADE); the p-values that survive are <.0001 for Intercept, BYSES, and BYGRADS.]

10. You want to determine if area of residence helps in predicting grade point average. You are looking at 3 different areas: urban, suburban, and rural. You get the results below, including a post hoc Scheffe test.

[SPSS one-way ANOVA output for GRADES IN 8TH GRADE; numeric values lost in extraction. ANOVA table rows: Between Groups, Within Groups, Total (Sum of Squares, df, Mean Square, F, Sig.). Scheffe multiple comparisons by area of residence (Urban, Suburban, Rural): Mean Difference (I-J), Std. Error, Sig., and 95% confidence interval lower and upper bounds. Four comparisons carry an asterisk, meaning the mean difference is significant at the .05 level.]

A. Is there a significant difference among groups?
B. Which groups are different?
C. How different are the individual groups?

11. You have the following 3 groups and want to determine if there is a statistically significant difference among the groups. Is there?

Married   Never Married   Divorced/Separated
[group values and sums missing in source]

12. Given the following data, how would we use residuals to determine partial r's and partial b's?

data a;
  input kids nbpov income;
  cards;
[data lines missing in source]
;

/* Each PROC REG saves its residuals (r=) and predicted values (p=) to a data set. */
/* The last two output data sets are named g and h so they do not overwrite e and f. */
proc reg data=a; model income=kids;  output out=b r=resid1 p=pred1; run;
proc reg data=a; model nbpov=kids;   output out=c r=resid2 p=pred2; run;
proc reg data=a; model kids=nbpov;   output out=e r=resid3 p=pred3; run;
proc reg data=a; model income=nbpov; output out=f r=resid4 p=pred4; run;
proc reg data=a; model kids=income;  output out=g r=resid5 p=pred5; run;
proc reg data=a; model nbpov=income; output out=h r=resid6 p=pred6; run;

[SAS output for question 12 (The SAS System, 06:54 Wednesday, December 10; year missing in source); numeric values lost in extraction. Six runs of the REG procedure, each reporting an analysis of variance table (Model, Error, Corrected Total), fit statistics (Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var), and parameter estimates (DF, Estimate, Error, t Value, Pr > |t|):

Run 1: dependent variable income, predictor kids (intercept Pr > |t| <.0001)
Run 2: dependent variable nbpov, predictor kids (intercept Pr > |t| <.0001)
Run 3: dependent variable kids, predictor nbpov
Run 4: dependent variable income, predictor nbpov (intercept Pr > |t| <.0001)
Run 5: dependent variable kids, predictor income
Run 6: dependent variable nbpov, predictor income (intercept Pr > |t| <.0001)]

13. You have standardized all of your continuous variables by dividing by their respective standard deviations. Interpret the following output.

DV: Income
IVs: Kids, NB poverty rate, White

         b    SE
Kids     [values missing in source]
NBPOV
White

#1 Answer.
Yp = 5 + 9 Black + 7 Age
Black: Yp = 5 + 9 + 7 Age, so Yp = 14 + 7 Age
White: Yp = 5 + 7 Age
The slope on age is 7 for both groups; the intercept is 14 for those who are black and 5 for those who are not.

Answer for #2.
MHP = X1, Income = Y, Age = X2
Y = a + b X2 + e1
X1 = a + b X2 + e2
Correlate e1 and e2. That correlation is the partial r between income and mental health problems, controlling for age.

Answer for #3.
We compare the within-group variance to the between-group variance. If we find a relatively large between and a relatively small within, group membership helps us in predicting the outcome.

#4 Answer.
[SPSS Correlations table for MATH and READING; numeric values lost in extraction. It reports Kendall's tau_b and Spearman's rho, each with a correlation coefficient, Sig. (2-tailed), and N.]
Neither Kendall's tau_b nor Spearman's rho shows a significant relationship.
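The residual method in the answer for #2 can be sketched in a few lines of Python. This is only an illustration: the age, mental health problem (MHP), and income values below are hypothetical and do not come from the exercise, and the helper names `ols_residuals` and `pearson_r` are made up for this sketch.

```python
# Partial r between income (Y) and mental health problems (X1),
# controlling for age (X2), computed by correlating residuals.
# All data values are hypothetical, chosen only for illustration.

def mean(values):
    return sum(values) / len(values)

def ols_residuals(y, x):
    """Residuals e from the bivariate regression y = a + b*x + e."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    num = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    den = (sum((ui - mu) ** 2 for ui in u)
           * sum((vi - mv) ** 2 for vi in v)) ** 0.5
    return num / den

age = [25, 32, 41, 47, 53, 60]      # X2 (hypothetical)
mhp = [4, 6, 3, 5, 2, 3]            # X1 (hypothetical)
income = [28, 30, 45, 41, 58, 55]   # Y, in thousands (hypothetical)

e1 = ols_residuals(income, age)     # income with age partialled out
e2 = ols_residuals(mhp, age)        # MHP with age partialled out
partial_r = pearson_r(e1, e2)
print(round(partial_r, 3))
```

Correlating `e1` with `e2` gives the same number as the textbook partial correlation formula built from the three bivariate correlations, which is a convenient way to check the residual approach.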

#5 Answer.

                Treatment   Control    Total
Depressed       30 (28)     50 (52)    80
Not Depressed   40 (42)     80 (78)    120
Total           70          130        200

Expected counts are in parentheses. There is a positive relationship between being in the control group and being not depressed.

The chi-square value is:

Obs   Expect   Difference   Difference squared   Chi-square contribution
30    28        2           4                    4/28 = .14
50    52       -2           4                    4/52 = .08
40    42       -2           4                    4/42 = .10
80    78        2           4                    4/78 = .05

Chi-square value = .14 + .08 + .10 + .05 = .37 at 1 DF. The critical value at the .05 level is 3.84. Because .37 is smaller than the critical value, you fail to reject H0.

#6 Answer.
To come up with the predicted value for the mean individual, you would use the Yp equation with the b values from the coefficient estimates, and use the mean values of the variables as the X values.
Yp = [intercept missing] + [b missing]*[mean missing] + [b missing]*.67 + (-9701)*.36 = [result missing in source]

Answer for #7.
A. For a one standard deviation increase in education level of the head, we predict that income will increase by .339 standard deviation units.
B. If education level increased by roughly 2 standard deviation units, we would predict that income would increase by 2*.339 = .678 standard deviation units. Because the standard deviation for income is 54528, a 2 standard deviation increase in education level would lead to a .678*54528 = 36,970 increase in income.

Answers for #8.
A. We use log dependent variables to decrease the effects of outliers and to help interpret the effects of independent variables on scaled dependent variables.
B. It appears that 8th grade grades have the largest effect on log income relative to the other I/R variables. Standardized test scores appear to have no effect on income as an adult.
C. You would need to transform these coefficient estimates using the exponential function in order to get the percentage change for a one unit increase in the IV, or the percentage difference between the included and the excluded group.
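Returning to answer #5, the chi-square arithmetic can be checked with a short script. The observed counts are the ones in the answer's table; the critical value 3.841 is the standard table value for 1 DF at the .05 level, entered by hand here rather than computed.

```python
# Chi-square test of independence for the 2x2 table in answer #5.
observed = [
    [30, 50],   # Depressed:     Treatment, Control
    [40, 80],   # Not Depressed: Treatment, Control
]

row_totals = [sum(row) for row in observed]        # [80, 120]
col_totals = [sum(col) for col in zip(*observed)]  # [70, 130]
n = sum(row_totals)                                # 200

# Expected count for each cell = row total * column total / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Sum of (observed - expected)^2 / expected over all four cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_05 = 3.841  # table value for 1 DF at the .05 level (typed in, not computed)

print("expected counts:", expected)
print("chi-square:", round(chi_square, 2), "df:", df)
print("reject H0 at .05:", chi_square > critical_05)
```

The script carries full precision rather than rounding each cell's contribution to two decimals, so its total can differ from a hand calculation in the third decimal place.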

D. You would need to know the R^2 values of the two models to know which is the better-fitting model.

Answers for #9.

First equation:
R^2 = 96.19/[total sum of squares missing] = .0175
Adjusted R^2 = R^2 - [k/(n - k - 1)] * (1 - R^2), with k = 4; the numeric substitution and the result are missing in the source.
There are at least 2 ways to determine the F value. One way is by using the R^2 value:
F(4, 8802) = [(.0175)/4] / [(1 - .0175)/8802] = 39.2 (approximately)
Or use the mean square model/mean square error = (96.19/4) / ([error sum of squares missing]/8802) = [value missing in source]
The critical value for a .05 F test at 4 and 8802 DFs is about 2.37. For a .01 test, the critical value is about 3.32. Our F value exceeds these critical values and therefore we will reject the null hypothesis.

Second equation:
R^2 = [model sum of squares missing]/[total sum of squares missing] = .0171 (roughly). (We could have added lots of zeros onto the end of these values, but as long as we know that the denominator is one decimal place larger than the numerator, we can determine their relative values.)
Adjusted R^2 = R^2 - [k/(n - k - 1)] * (1 - R^2), with k = 4; the numeric substitution and the result are missing in the source.
There are at least 2 ways to determine the F value. One way is by using the R^2 value:
F(4, 8802) = [(.0171)/4] / [(1 - .0171)/8802] = 38.3 (approximately)
Or use the mean square model/mean square error = ([model sum of squares missing]/4) / ([error sum of squares missing]/8802) = [value missing in source]. (Again, we know that the denominator has an extra decimal place, or is one decimal place farther to the right, relative to the numerator.)
The critical value for a .05 F test at 4 and 8802 DFs is about 2.37. For a .01 test, the critical value is about 3.32. Our F value exceeds these critical values and therefore we will reject the null hypothesis.

The meaning of a model being statistically significant: in all likelihood, the set of independent variables helps us in explaining the variance of the dependent variable.

The R^2 and corrected R^2 are very similar because we are not using many independent variables in the model and because our sample size is very large.

E. Yp = [intercept missing] + ([b missing])*0 + ([b missing])*1 + (.07569)*0 + (.09279)*3.5 = [value missing in source]
Take the exponential of this: e^Yp = $[value missing in source]
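The R^2 route to the F value in the answers for #9 can be sketched as follows. The inputs (R^2 of .0175 and .0171, 4 predictors, 8802 error DFs) are the ones that appear in the answer; the function names are made up for this sketch, and the critical values 2.37 and 3.32 are approximate table values for 4 numerator DFs and a very large denominator DF, typed in by hand.

```python
# F statistic and adjusted R^2 computed from R^2, the number of
# predictors k, and the error degrees of freedom (n - k - 1).

def f_from_r2(r2, k, df_error):
    """F = (R^2 / k) / ((1 - R^2) / df_error)."""
    return (r2 / k) / ((1 - r2) / df_error)

def adjusted_r2(r2, k, df_error):
    """Adjusted R^2 = R^2 - (k / df_error) * (1 - R^2)."""
    return r2 - (k / df_error) * (1 - r2)

K = 4            # number of predictors in each model
DF_ERROR = 8802  # error degrees of freedom in each model

# Approximate table values for F(4, very large df); assumptions, not computed
CRITICAL_05 = 2.37
CRITICAL_01 = 3.32

f_model1 = f_from_r2(0.0175, K, DF_ERROR)   # log income model
f_model2 = f_from_r2(0.0171, K, DF_ERROR)   # income model

for label, r2, f_value in [("Model 1", 0.0175, f_model1),
                           ("Model 2", 0.0171, f_model2)]:
    print(label,
          "F =", round(f_value, 2),
          "adjusted R^2 =", round(adjusted_r2(r2, K, DF_ERROR), 4),
          "significant at .01:", f_value > CRITICAL_01)
```

Because only 4 predictors are used against 8802 error DFs, the adjustment term (k / df_error) * (1 - R^2) is tiny, which is why R^2 and adjusted R^2 are nearly identical in both models.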

Answers for #10.
A. The F test indicates that there is an overall difference among the groups, statistically significant at the .001 level.
B. There is no difference between the urban and suburban groups. There is a difference between the rural and urban groups, and between the suburban and rural groups.
C. The GPA of the urban group is .0648 points higher than that of the rural group. The GPA of the suburban group is .0600 points higher than that of the rural group.

Answers for #11.
Determine the within sums of squares:
Married group: mean = 6. (6-6)^2 + (5-6)^2 + (6-6)^2 + (7-6)^2 = 2
NM group: mean = 8. (7-8)^2 + (8-8)^2 + (9-8)^2 + (8-8)^2 = 2
DS group: mean = 9. (9-9)^2 + (10-9)^2 + (8-9)^2 + (9-9)^2 = 2
The within sum of squares = 2 + 2 + 2 = 6. Divide this value by n-k, or 12-3 = 9. The mean within sum of squares = 6/9 = .667.

To determine the between sum of squares, we first need to determine the overall mean value: (sum of all 12 values)/12 = 92/12 = 7.67.
4*(6-7.67)^2 + 4*(8-7.67)^2 + 4*(9-7.67)^2 = 4*(-1.67)^2 + 4*(.33)^2 + 4*(1.33)^2 = 18.64
To get the mean between SS, divide by k-1, or 2: 18.64/2 = 9.32
F(2, 9) = 9.32/.667 = 13.97
The critical F values are 4.26 at the .05 level and 8.02 at the .01 level. We will reject the null hypothesis at both levels.

Sort of an answer for #12.
We would use the intercept values and b coefficients from the various bivariate regression models to determine residuals for the particular partial correlations and partial b coefficient estimates that are of interest to us. For example, if we wanted to partial out the effect of kids, we would use the first two regression models (income on kids and nbpov on kids), come up with residuals for each of those two models, and then either run a bivariate correlation between the two residuals or run a regression with the two residuals. In that regression, the residual from the model whose dependent variable is the original dependent variable (income) serves as the dependent variable, and the residual from the model whose dependent variable is the independent variable of interest (nbpov) serves as the independent variable.

Answer for #13.
For a 1 SD increase in kids, income is predicted to increase by 1.2 SD units. For a 1 SD unit increase in NB poverty, income is predicted to decrease by .5 SD units. Whites have incomes that are 1.6 SD units higher than non-whites.
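The hand calculation in the answers for #11 can be reproduced with a short one-way ANOVA script. The group values are the ones that appear inside the squared deviations in that answer; the critical values 4.26 and 8.02 are the ones quoted there. This is a minimal sketch, not a replacement for a statistics package.

```python
# One-way ANOVA by hand for the three marital-status groups in answer #11.
groups = {
    "Married": [6, 5, 6, 7],
    "Never Married": [7, 8, 9, 8],
    "Divorced/Separated": [9, 10, 8, 9],
}

def mean(values):
    return sum(values) / len(values)

all_values = [v for vals in groups.values() for v in vals]
n = len(all_values)   # 12 observations
k = len(groups)       # 3 groups
grand_mean = mean(all_values)

# Within SS: squared deviations of each value from its own group mean
ss_within = sum(
    (v - mean(vals)) ** 2 for vals in groups.values() for v in vals
)

# Between SS: group size times squared deviation of group mean from grand mean
ss_between = sum(
    len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values()
)

ms_within = ss_within / (n - k)     # divide by n - k = 9
ms_between = ss_between / (k - 1)   # divide by k - 1 = 2
f_value = ms_between / ms_within

print("SS within:", ss_within, "SS between:", round(ss_between, 2))
print("F(%d, %d) = %.2f" % (k - 1, n - k, f_value))
print("reject H0 at .05:", f_value > 4.26, "at .01:", f_value > 8.02)
```

Carried at full precision, the between sum of squares and the F value come out slightly higher than the hand-rounded 18.64 and 13.97, because the hand calculation rounds the grand mean to 7.67 before squaring. The conclusion is the same either way.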
