FREC 608 Guided Exercise 9


PROBLEM 1. Model of Average Annual Precipitation

An article in Geography (July 1980) used regression to predict average annual rainfall levels in California. Data on the following variables were collected for 30 meteorological weather stations scattered throughout California. For the group work we will focus on a bivariate regression of Annual Precip on Latitude. You will have the option of examining all of the variables for this problem in the last assignment.

Annual Precip   DEPENDENT VARIABLE: Annual precipitation in inches
Altitude        The altitude of the station in feet
Latitude        The latitude of the station in degrees
Distance        Distance from the coast in miles
Facing          I made this into a dummy variable. Stations on the westward-facing slopes of the California mountains were coded as 1, whereas stations on the leeward side were coded as 0.

a. The following are the descriptive statistics on each of the variables. Briefly describe Annual Precipitation using the mean, median, std deviation and so forth.

                    Annual Precip   Altitude     Latitude   Distance   Facing
Mean                19.81           1375.30      37.03      78.70      0.43
Standard Error      3.03            382.81       0.49       12.65      0.09
Median              15.35           290.00       36.70      74.50      0.00
Mode                8.20            452.00       33.80      1.00       0.00
Standard Deviation  16.62           2096.75      2.67       69.30      0.50
Sample Variance     276.26          4396344.63   7.11       4802.63    0.25
Kurtosis            3.05            0.78         -1.09      -1.19      -2.06
Skewness            1.70            1.46         0.23       0.42       0.28
Range               73.21           6930.00      9.20       197.00     1.00
Minimum             1.66            -178.00      32.70      1.00       0.00
Maximum             74.87           6752.00      41.90      198.00     1.00
Sum                 594.22          41259        1110.8     2361       13
Count               30              30           30         30         30

The mean Annual Precipitation is 19.81 inches. The mean is larger than the median (15.35), indicating right skew. There is an extreme range in the data, from 1.66 inches to 74.87 inches. The spread of the data is large relative to the mean: the CV is 83.8%.

b. The following is the covariance matrix for the variables. Using the formula below, generate the correlation matrix for this data.
Remember, the variances (and thus the standard deviations) for each variable are on the diagonal of the covariance matrix. (Excel's covariance tool reports population covariances, dividing by n rather than n - 1, which is why the diagonal entries are slightly smaller than the sample variances in part a.)

               Annual Precip   Altitude      Latitude   Distance    Facing
Annual Precip  267.055
Altitude       10174.224       4249799.80
Latitude       24.709          1248.472      6.873
Distance       -233.940        80627.290     28.825     4642.543
Facing         4.843           51.470        -0.015     -16.537     0.246

    r_12 = Cov_12 / (s_1 * s_2)

               Annual Precip   Altitude   Latitude   Distance   Facing
Annual Precip  1.000
Altitude       0.302           1.000
Latitude       0.577           0.231      1.000
Distance       -0.210          0.574      0.161      1.000
Facing         0.598           0.050      -0.011     -0.490     1.000

For Annual Precip and Latitude: 24.709 / (267.055^.5 * 6.873^.5) = .577
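The formula above can be sketched in a few lines of Python. The helper function is hypothetical (not part of the assignment); the values plugged in come from the covariance matrix above.

```python
import math

def correlation(cov_xy, var_x, var_y):
    """r = Cov(X, Y) / (sd_X * sd_Y), using the variances off the diagonal."""
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

# Annual Precip vs. Latitude, as worked out in the text:
r_precip_lat = correlation(24.709, 267.055, 6.873)
print(round(r_precip_lat, 3))  # 0.577
```

Because the ratio is the same whether population or sample versions are used (the n's cancel), it does not matter that Excel's covariance tool divides by n.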

c. Briefly describe the correlation between Annual Precip and Latitude. Does this correlation make sense? Remember, this data is from California weather stations.

r = .577. As Latitude increases, annual precipitation also increases. The correlation is moderate in strength. This makes sense, since as you move further north in CA (higher latitude) there tends to be more rainfall.

[Figure: scatterplot "CA Annual Precipitation by Latitude" -- Latitude (32 to 42 degrees) on the horizontal axis, inches of rain (0 to 80) on the vertical axis.]

d. Facing is a dummy variable. Stations on the westward-facing slopes of the California mountains were coded as 1, whereas stations on the leeward side were coded as 0. Interpret the correlation between Annual Precip and Facing.

r = .598. Since FACING is a dummy variable, the interpretation of the correlation is a little different. Instead of saying "as FACING increases, ANNUAL PRECIP increases," we say that westward-facing stations tend to have more rainfall. Since the correlation is moderate in strength, there are moderate differences in average rainfall between west-side and lee-side stations.

e. Now we will shift to the bivariate regression of Annual Precip on Latitude. The following are formulas for the regression coefficients based on sums of squares:

    b1 = SS_XY / SS_X        b0 = Ybar - b1 * Xbar

Using the covariances, the variance for Latitude, and the means, calculate estimates for the regression coefficients. For b1 it is simply the covariance divided by the variance of Latitude:

    b1 = 24.709 / 6.873 = 3.595
    b0 = 19.81 - 3.595(37.03) = -113.32
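The covariance-based coefficient formulas in part (e) can be checked with a short sketch; the inputs are the covariance, Latitude variance, and means reported in the tables above.

```python
# Bivariate OLS coefficients from covariance pieces (part e).
cov_xy = 24.709          # Cov(Latitude, Annual Precip)
var_x = 6.873            # Var(Latitude), from the covariance matrix diagonal
xbar, ybar = 37.03, 19.81  # means of Latitude and Annual Precip

b1 = cov_xy / var_x      # slope
b0 = ybar - b1 * xbar    # intercept

print(round(b1, 3), round(b0, 2))  # 3.595 -113.32
```

The slope agrees with the Excel regression output; the intercept differs from Excel's -113.303 only because the hand calculation uses rounded means.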

Confirm your results from the regression output from Excel.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.577
R Square            0.333
Adjusted R Square   0.309
Standard Error      13.819
Observations        30

            df   SS         MS         F        Sig F
Regression  1    2664.887   2664.887   13.956   0.001
Residual    28   5346.766   190.956
Total       29   8011.653

           Coeff      Std Error   t Stat   P-value
Intercept  -113.303   35.72       -3.172   0.004
Latitude   3.595      0.962       3.736    0.001

f. Verify that R^2 in a bivariate regression is simply the correlation (r) squared. Interpret R^2 for this model.

r^2 = .577^2 = .333 = R^2. One third (33.3%) of the variability in Annual Precipitation is explained by knowing the Latitude of the station.

g. What does the model predict for annual precipitation when the latitude is:

    33 degrees: est. Annual Precip = -113.303 + 3.595(33) = 5.332
    36 degrees: est. Annual Precip = -113.303 + 3.595(36) = 16.117
    40 degrees: est. Annual Precip = -113.303 + 3.595(40) = 30.497
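Parts (f) and (g) can be verified numerically. This sketch assumes the fitted equation from the Excel output above; the prediction helper is just for illustration.

```python
# Part f: in a bivariate regression, R-squared equals r squared.
r = 0.577
print(round(r ** 2, 3))  # 0.333

# Part g: predictions from the fitted equation.
def predict(latitude_deg):
    """Fitted line from the Excel output: Precip = -113.303 + 3.595 * Latitude."""
    return -113.303 + 3.595 * latitude_deg

for lat in (33, 36, 40):
    print(lat, round(predict(lat), 3))
```

Note the model extrapolates badly outside the observed latitudes (roughly 32.7 to 41.9 degrees); at low latitudes the line quickly predicts negative rainfall.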

PROBLEM 2. This problem focuses on whether female mid-level managers have lower salaries than males. The data set contains the following variables for 220 mid-level managers of firms (we will only focus on these four variables):

SALARY      Dependent variable. Base annual salary in $1,000s
SEX         1 = Female; 0 = Male
POSITION    An index of the position of the employee in the firm, based on the number of employees supervised, size of budget, and so forth. A higher number means a higher level in the company
YEARS EXP   The number of years of experience

a. The following is the correlation matrix for this data. Briefly describe the correlations between each of the independent variables and the dependent variable SALARY.

            Salary   Sex      Position   YearsExper
Salary      1.000
Sex         -0.138   1.000
Position    0.89     -0.323   1.000
YearsExper  0.32     -0.446   0.570      1.000

The correlation between Sex and Salary is -.138: women earn slightly less salary than men.
The correlation between Position and Salary is .89: a strong correlation; those in higher positions earn more salary.
The correlation between YearsExper and Salary is .32: there is a weak positive relationship between experience and salary.

b. For reference, I am including the ANOVA results for this data. Then we will do the very same thing in regression. Note the means, variances, and conclusion from the results. Based on the result, could we conclude there is a difference in salary between men and women at alpha = .05? Briefly summarize the results.

Anova: Single Factor

SUMMARY
Groups    Count   Sum     Average   Variance
Females   75      10535   140.467   156.44
Males     145     20896   144.110   153.63

Source of Variation   SS          df    MS        F       P-value   F crit
Between Groups        656.276     1     656.276   4.249   0.040     3.884
Within Groups         33674.901   218   154.472
Total                 34331.177   219

The ANOVA provides a test of the difference in salary between men and women in the sample. The test confirms that there is a significant difference in salary between men and women: women earn less. The F-test is significant at p = .04.
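As a sketch (not part of the handout), the one-way ANOVA F statistic for two groups can be rebuilt from just the group counts, means, and variances shown in the summary; with rounded inputs it lands close to the 4.249 in the table.

```python
# Rebuild the one-way ANOVA F statistic from group summaries (two groups).
n_f, mean_f, var_f = 75, 140.467, 156.44    # Females
n_m, mean_m, var_m = 145, 144.110, 153.63   # Males

grand_mean = (n_f * mean_f + n_m * mean_m) / (n_f + n_m)
ss_between = n_f * (mean_f - grand_mean) ** 2 + n_m * (mean_m - grand_mean) ** 2
ss_within = (n_f - 1) * var_f + (n_m - 1) * var_m   # pooled sums of squares
df_between, df_within = 1, n_f + n_m - 2

F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))  # 4.24, close to the table's 4.249 (inputs are rounded)
```

With two groups this F is exactly the square of the two-sample pooled t statistic, which is why the regression t-test in part (c) reaches the same p-value.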

c. The following are the regression statistics for the bivariate regression of SALARY on SEX. SEX is a dummy variable where 1 = Female and 0 = Male.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.138
R Square            0.019
Adjusted R Square   0.015
Standard Error      12.429
Observations        220

            df    SS          MS        F       Sig F
Regression  1     656.276     656.276   4.249   0.040
Residual    218   33674.901   154.472
Total       219   34331.177

           Coef      Std Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept  144.110   1.032       139.622   0.000     142.076     146.145
Sex        -3.644    1.768       -2.061    0.040     -7.128      -0.160

d. Confirm for yourself the following:

The R-square for both ANOVA and regression are the same.
The ANOVA tables are identical: the sums of squares, df, mean squares, and F-test are the same.
The pooled variances are the same (think about this one!).

e. Solve the equation to get the estimated salary for males and females. To do this you need to use the estimated coefficients and realize that SEX can only take on two values: 0 and 1. Confirm that:

The equation estimates the mean salary for males and females:
    When Sex = 1 (Females): est. Salary = 144.110 - 3.644(1) = 140.466, the average salary for women
    When Sex = 0 (Males): est. Salary = 144.110 - 3.644(0) = 144.110, the average salary for men

The slope coefficient represents the difference in salary between males and females:
    The difference in mean salary is 144.110 - 140.466 = 3.644

The intercept represents the reference group (in this case the group coded as zero for SEX):
    Since SEX = 1 represents females, the reference group is males. The intercept is the mean salary for males.
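The claim in part (e), that the intercept equals the reference-group mean and the slope equals the difference in means, can be demonstrated on a tiny made-up data set (hypothetical numbers, not the salary file):

```python
# Toy check: OLS on a 0/1 dummy recovers the group means.
males = [44.0, 46.0, 42.0]    # SEX = 0 (reference group)
females = [40.0, 41.0, 39.0]  # SEX = 1

x = [0] * len(males) + [1] * len(females)
y = males + females
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n

# Standard bivariate OLS formulas: b1 = SS_XY / SS_X, b0 = Ybar - b1 * Xbar
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

print(b0, b1)  # 44.0 -4.0
print(sum(males) / len(males))  # 44.0 -> intercept = mean of the 0 group
print(sum(females) / len(females) - sum(males) / len(males))  # -4.0 -> slope
```

Swapping the coding (1 = Male) would flip the sign of the slope and make females the reference group; the fitted group means would be unchanged.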