FREC 608 Guided Exercise 9

Problem 1. Model of Average Annual Precipitation

An article in Geography (July 1980) used regression to predict average annual rainfall levels in California. Data on the following variables were collected for 30 meteorological weather stations scattered throughout California. For the group work we will focus on a bivariate regression of Annual Precip on Latitude. You will have the option of examining all the variables for this problem for the last assignment.

Annual Precip   DEPENDENT VARIABLE: Annual precipitation in inches
Altitude        The altitude of the station in feet
Latitude        The latitude of the station in degrees
Distance        Distance from the coast in miles
Facing          I made this into a dummy variable. Stations on the westward-facing slopes of the California mountains were coded as 1, whereas stations on the leeward side were coded as 0.

a. The following are the descriptive statistics on each of the variables. Briefly describe Annual Precipitation using the mean, median, standard deviation and so forth.

                     Annual Precip     Altitude   Latitude   Distance   Facing
Mean                         19.81      1375.30      37.03      78.70     0.43
Standard Error                3.03       382.81       0.49      12.65     0.09
Median                       15.35       290.00      36.70      74.50     0.00
Mode                         18.20      4152.00      33.80       1.00     0.00
Standard Deviation           16.62      2096.75       2.67      69.30     0.50
Sample Variance             276.26   4396344.63       7.11    4802.63     0.25
Kurtosis                      3.05         0.78      -1.09      -1.19    -2.06
Skewness                      1.70         1.46       0.23       0.42     0.28
Range                        73.21      6930.00       9.20     197.00     1.00
Minimum                       1.66      -178.00      32.70       1.00     0.00
Maximum                      74.87      6752.00      41.90     198.00     1.00
Sum                         594.22        41259     1110.8       2361       13
Count                           30           30         30         30       30

The mean annual precipitation is 19.81 inches. The mean is larger than the median (15.35), indicating right skew. There is an extreme range in the data, from 1.66 inches to 74.87 inches (a range of 73.21 inches). The spread of the data is large relative to the mean: the CV is about 84% (16.62 / 19.81).

b. The following is the covariance matrix on the variables. Using the formula below, generate the correlation matrix for this data.
Remember, the variances (and thus the standard deviations) for each variable are on the diagonal of the covariance matrix.

Covariance matrix
                 Annual Precip      Altitude   Latitude   Distance   Facing
Annual Precip          267.055
Altitude             10174.224   4249799.810
Latitude                24.709      1248.472      6.873
Distance              -233.940     80627.290     28.825   4642.543
Facing                   4.843        51.470     -0.015    -16.537    0.246

$$ r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\sigma_X^2 \, \sigma_Y^2}} $$

Correlation matrix
                 Annual Precip   Altitude   Latitude   Distance   Facing
Annual Precip            1.000
Altitude                 0.302      1.000
Latitude                 0.577      0.231      1.000
Distance                -0.210      0.574      0.161      1.000
Facing                   0.598      0.050     -0.011     -0.490    1.000

For Annual Precip and Latitude: 24.709 / (267.055^0.5 * 6.873^0.5) = 0.577
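The conversion used in part b can be sketched in Python. This is a minimal illustration rather than part of the original exercise: it applies r = Cov / (sd_X * sd_Y) to the Annual Precip, Latitude, and Distance entries of the covariance matrix, and `cov_to_corr` is a hypothetical helper name.

```python
import math

def cov_to_corr(cov):
    """Convert a symmetric covariance matrix (list of lists) to a correlation matrix."""
    k = len(cov)
    # Standard deviations are the square roots of the diagonal (the variances).
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]

# Covariances for Annual Precip, Latitude, Distance (entries of the matrix in part b).
names = ["Annual Precip", "Latitude", "Distance"]
cov = [
    [267.055, 24.709, -233.940],
    [24.709, 6.873, 28.825],
    [-233.940, 28.825, 4642.543],
]

corr = cov_to_corr(cov)
for name, row in zip(names, corr):
    print(f"{name:14s}", "  ".join(f"{r:7.3f}" for r in row))
```

The off-diagonal results round to 0.577, -0.210, and 0.161, and every diagonal entry is 1 because each variable's covariance with itself is its own variance.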
c. Briefly describe the correlation between Annual Precip and Latitude. Does this correlation make sense? Remember, this data is from California weather stations.

r = .577. As Latitude increases, annual precipitation also increases. The correlation is moderate in strength. This makes sense: as you move farther north in CA (higher latitude) there tends to be more rainfall.

[Figure: CA Annual Precipitation by Latitude. Vertical axis: inches of rain (0 to 80); horizontal axis: Latitude (32 to 42).]

d. Facing is a dummy variable. Stations on the westward-facing slopes of the California mountains were coded as 1, whereas stations on the leeward side were coded as 0. Interpret the correlation between Annual Precip and Facing.

r = .598. Since FACING is a dummy variable, the correlation interpretation is a little different. Instead of saying "as FACING increases, ANNUAL PRECIP increases," we say that westward-facing stations tend to have more rainfall. Since the correlation is moderate in strength, there are moderate differences in average rainfall between west-side and lee-side stations.

e. Now we will shift to the bivariate regression of Annual Precip on Latitude. The following are formulas for regression based on covariance:

$$ \hat\beta_1 = \frac{SS_{XY}}{SS_X}, \qquad \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} $$

Using the covariances, the variance for Latitude, and the means, calculate estimates for the regression coefficients.

For b1 it is simply the covariance divided by the variance for Latitude (the common divisor in SS_XY and SS_X cancels).

b1 = 24.709 / 6.873 = 3.595
b0 = 19.81 - 3.595 * 37.03 = -113.313
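The calculation in part e can be checked with a short Python sketch. It is only an illustration under stated assumptions: the inputs are the rounded values from the descriptive statistics and covariance matrix, and the mean of Annual Precip is taken as Sum / Count (594.22 / 30).

```python
# Inputs from the descriptive statistics and the covariance matrix.
cov_xy = 24.709        # Cov(Latitude, Annual Precip)
var_x = 6.873          # Var(Latitude), from the diagonal of the covariance matrix
mean_x = 37.03         # mean Latitude
mean_y = 594.22 / 30   # mean Annual Precip, computed as Sum / Count

# Slope: covariance over variance. Both use the same divisor, so it cancels
# and the ratio equals SS_XY / SS_X.
b1 = cov_xy / var_x

# Intercept: the fitted line passes through the point of means.
b0 = mean_y - b1 * mean_x

print(f"b1 = {b1:.3f}")
print(f"b0 = {b0:.2f}")
```

Because the inputs are rounded, the intercept agrees with the software output only to about one decimal place; the slope matches to three decimals.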
Confirm your results from the regression output from Excel.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.577
R Square              0.333
Adjusted R Square     0.309
Standard Error       13.819
Observations             30

              df         SS         MS        F    Sig F
Regression     1   2664.887   2664.887   13.956    0.001
Residual      28   5346.766    190.956
Total         29   8011.654

               Coeff   Std Error   t Stat   P-value
Intercept   -113.303      35.721   -3.172     0.004
Latitude       3.595       0.962    3.736     0.001

f. Verify that $R^2$ in a bivariate regression is simply the correlation (r) squared. Interpret $R^2$ for this model.

$r^2 = .577^2 = .333$ and $R^2 = .333$.
One third (33.3%) of the variability in Annual Precipitation is explained by knowing the Latitude of the station.

g. What does the model predict for annual precipitation when the latitude is:

33 degrees   est. Annual Precip = -113.303 + 3.595(33) = 5.332
36 degrees   est. Annual Precip = -113.303 + 3.595(36) = 16.117
40 degrees   est. Annual Precip = -113.303 + 3.595(40) = 30.497
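Parts f and g can be reproduced with a small Python sketch. It assumes the fitted coefficients from the regression output (intercept -113.303, slope 3.595) and the sums of squares from the same table; `predict_precip` is a hypothetical helper name.

```python
def predict_precip(latitude, intercept=-113.303, slope=3.595):
    """Predicted annual precipitation (inches) from the fitted line."""
    return intercept + slope * latitude

# Part g: predictions at three latitudes.
for lat in (33, 36, 40):
    print(f"Latitude {lat}: {predict_precip(lat):.3f} inches")

# Part f: R-squared two ways.
ss_reg, ss_res = 2664.887, 5346.766                 # SS Regression, SS Residual
r_squared_from_r = 0.577 ** 2                       # squared correlation
r_squared_from_ss = ss_reg / (ss_reg + ss_res)      # SS Regression / SS Total
print(f"r^2 = {r_squared_from_r:.3f}, SSR/SST = {r_squared_from_ss:.3f}")
```

Both versions of R-squared round to .333. The line should only be used for latitudes within the range covered by the 30 stations.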
PROBLEM 2. This problem focuses on whether female mid-level managers have lower salaries than males. The data set contains the following variables for 220 mid-level managers of firms (we will only focus on these four variables):

SALARY      Dependent variable. Base annual salary in $1,000s
SEX         1 = Female; 0 = Male
POSITION    An index of the position of the employee in the firm, based on the number of employees supervised, size of budget and so forth. A higher number means a higher level in the company
YEARS EXP   The number of years of experience

a. The following is the correlation matrix for this data. Briefly describe the correlations between each of the independent variables and the dependent variable SALARY.

              Salary      Sex   Position   YearsExper
Salary         1.000
Sex           -0.138    1.000
Position        0.89   -0.323      1.000
YearsExper      0.32   -0.446      0.570        1.000

Correlation between Sex and Salary is -.138: women earn slightly less than men.
Correlation between Position and Salary is .89: a strong correlation; those in higher positions earn more.
Correlation between YearsExper and Salary is .32: there is a weak positive relationship between experience and salary.

b. For reference, I am including the ANOVA results for this data. Then we will do the very same thing in regression. Note the means, variances, and conclusion from the ANOVA results. Based on the ANOVA result, could we conclude there is a difference in salary between men and women at alpha = .05? Briefly summarize the results.

Anova: Single Factor

SUMMARY
Groups     Count     Sum   Average   Variance
Females       75   10535   140.467    156.144
Males        145   20896   144.110    153.613

Source of Variation          SS    df        MS       F   P-value   F crit
Between Groups          656.276     1   656.276   4.249     0.040    3.884
Within Groups         33674.901   218   154.472
Total                 34331.177   219

The ANOVA provides a test of the difference in salary between men and women in the sample. The test confirms that there is a significant difference in salary between men and women: women earn less. The F-test is significant at p = .04, which is below alpha = .05 (F = 4.249 exceeds F crit = 3.884).
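The F-test in part b can be rebuilt from the group summaries alone. The sketch below is a hedged illustration, not the original spreadsheet calculation: it takes the summary figures to be 75 women (mean 140.467, variance 156.144) and 145 men (mean 144.110, variance 153.613), readings that are assumptions checked only against the table's own totals.

```python
# Group summaries as (count, mean, variance). Assumed readings of the ANOVA summary table.
groups = {
    "Females": (75, 140.467, 156.144),
    "Males": (145, 144.110, 153.613),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups SS: weighted squared distance of each group mean from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-groups SS: each sample variance times its degrees of freedom.
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

ms_within = ss_within / df_within                  # the pooled variance
f_stat = (ss_between / df_between) / ms_within

print(f"df within = {df_within}, MS within = {ms_within:.3f}, F = {f_stat:.3f}")
```

Since the means and variances are rounded to three decimals, the computed F agrees with the reported 4.249 only approximately. MS within is the pooled variance that reappears as the residual mean square in the regression of SALARY on SEX.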
c. The following are the regression statistics for the bivariate regression of SALARY on SEX. SEX is a dummy variable where 1 = Females and 0 = Males.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.138
R Square              0.019
Adjusted R Square     0.015
Standard Error       12.429
Observations            220

              df          SS        MS       F   Sig F
Regression     1     656.276   656.276   4.249   0.040
Residual     218   33674.901   154.472
Total        219   34331.177

               Coef   Std Error    t Stat   P-value   Lower 95%   Upper 95%
Intercept   144.110       1.032   139.622     0.000     142.076     146.145
Sex          -3.644       1.768    -2.061     0.040      -7.128      -0.160

d. Confirm for yourself the following:

The R-square for both ANOVA and Regression are the same (SS Between / SS Total = 656.276 / 34331.177 = .019).
The ANOVA tables are identical: sums of squares, df, mean squares and the F-test are the same.
The pooled variances are the same (think about this one!).

e. Solve the equation to get the estimated salary for males and females. To do this you need to use the estimated coefficients and realize that SEX can only take on two values: 0 and 1. Confirm that:

The equation estimates the mean salary for males and females.
   When Sex = 1 (Females): est. Salary = 144.110 - 3.644(1) = 140.466, the average salary for women (140.467 in the ANOVA summary; the difference is rounding)
   When Sex = 0 (Males): est. Salary = 144.110 - 3.644(0) = 144.110, the average salary for men

The slope coefficient represents the difference in salary between males and females.
   The difference in mean salary is: 144.110 - 140.466 = 3.644

The intercept represents the reference group (in this case the group represented as zero for SEX).
   Since SEX = 1 represents females, the reference group is males. The intercept is the mean for males.
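The claims in part e hold for any regression on a single 0/1 dummy, which a small Python sketch can confirm. The salaries below are hypothetical values made up for illustration (not the course data), and `ols_simple` is a hypothetical helper that uses the same SS_XY / SS_X formulas as Problem 1.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for y regressed on one predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = ss_xy / ss_x
    return mean_y - slope * mean_x, slope

# Hypothetical data: sex coded 1 = female, 0 = male; salary in $1,000s.
sex = [1, 1, 1, 0, 0, 0, 0]
salary = [138.0, 141.0, 142.0, 143.0, 145.0, 144.0, 148.0]

intercept, slope = ols_simple(sex, salary)
mean_f = sum(s for d, s in zip(sex, salary) if d == 1) / sex.count(1)
mean_m = sum(s for d, s in zip(sex, salary) if d == 0) / sex.count(0)

print(f"intercept = {intercept:.3f}, mean for males = {mean_m:.3f}")
print(f"slope = {slope:.3f}, female mean minus male mean = {mean_f - mean_m:.3f}")
```

The intercept reproduces the mean of the reference group (Sex = 0) and the slope reproduces the difference between the two group means, which is the same pattern as 144.110 and -3.644 in the output above.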