Parametric Test. Multiple Linear Regression Spatial Application I: State Homicide Rates Equations taken from Zar, 1984.

Multiple linear regression fits an equation of the form

$$\hat{y} = a + b_1x_1 + b_2x_2 + \cdots + b_nx_n$$

where n is the number of independent variables.

Example: In an earlier bivariate regression example we attempted to predict the state homicide rate (Homicide) using only percent poverty. That model explained approximately 30% of the variation in homicide rates. We now want to improve on it by incorporating additional explanatory (independent) variables: percent minority population and per capita alcohol consumption. Our hypothesis is that these three independent variables will explain a significant portion of the variation in state homicide rates.

However, we are concerned that the independent variables may be correlated with each other. Multiple regression works best when the independent variables are not strongly correlated; when they are related to one another, the condition is termed multicollinearity. Remember that in regression we partition out the explanatory power of one variable while holding the others constant. Think of multicollinearity as overlap in explanatory power among the independent variables. This overlap makes it difficult to determine which of the independent variables is explaining the dependent variable. In other words, we cannot hold one independent variable constant while another varies, because they are associated with each other. Luckily, SPSS has several tools for gauging the severity of any collinearity among the independent variables. Since the process is very similar to bivariate linear regression, we will let SPSS do the calculations.

For our example:
y = Homicide
x1 = percent of the population in poverty (PCTPov)
x2 = percent minority population (PCTMinority)
x3 = per capita alcohol consumption

State x1 x2 x3 y
AL 6. 28.9.555 4.4
AZ 3.9 24.54 2.2 0.
AR 5.8 20.06.455 3.2
CA 4.2 40.59.837 4.5
CO 9.3 7.27 2.204 6.2
CT 7.9 8.43.853 7.3
DE 9.2 25.38 2.649 5
D.C. 20.2 69.36 3.47 78.
FL 2.5 22.02 2.297.2
GA 3 34.93.774 2.4
ID .8 9.07.942 3.4
IL 0.7 26.54.907 3.2
IN 9.5 2.55.602 7.2
IA 9. 6.03.692 2.3
KS 9.9 4.53 8.6
KY 5.8 9.96.444 6.7
LA 9.6 36.09.924 22.7
ME 0.9 3.02 2.033.9
MD 8.5 35.98.775 4.6
MA 9.3 5.5 2.066 4.4
MI 0.5 9.9.739.9
MN 7.9 0.52 2.03 3.4
MS 9.9 38.64.728 9.2
MO .7 5.6.88 2.6
MT 4.6 9.38 2.85 4.7
NE 9.7 0.37.828 4.
NV 0.5 24.78 3.232.9
NH 6.5 3.99 3.454 2.3
NJ 8.5 27.5.855 5.9
NM 8.4 33.22.976 0.8
NY 4.6 32.07.598 4.5
NC 2.3 27.92.687 2.6
ND .9 7.54 2.097 3.4
OH 0.6 5.08.656 5.9
OK 4.7 23.94.584 0.
OR .6 3.56.996 5
PA 4.62.83 7.9
RI .9 5.03 2.054 4.9
SC 4. 32.8.967.5
SD 3.2.3.966 4.4
TN 3.5 9.8.638.6
TX 5.4 29.03.83 3
UT 9.4 0.82.033 3.6
VT 9.4 3.28 2.0 2.6
VA 9.6 27.7.77 8.8
WA 0.6 8.3.848 5.7
WV 7.9 5.02.437 7.9
WI 8.7 2.34 5
WY .4 8.04 2.357 3.2
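SPSS does these calculations for us, but a short sketch can make the general equation concrete. The Python below is a minimal sketch that estimates a and b1 ... bn by ordinary least squares. It solves the normal equations (X'X)b = X'y with Gaussian elimination. The helper names `fit_ols` and `solve_linear_system` and the toy data are hypothetical illustrations, not SPSS's internal algorithm. Perfectly collinear predictors make X'X singular, which is the extreme case of multicollinearity; the sketch raises an error in that case.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a . x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [a, b1, ..., bn] that minimise the sum of squared residuals."""
    # A leading column of ones lets the intercept a be estimated with the slopes.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical toy data generated exactly from y = 2 + 3*x1 - 1*x2.
    X = [[1, 4], [2, 1], [3, 5], [4, 2], [5, 3]]
    y = [2 + 3 * x1 - x2 for x1, x2 in X]
    print([round(c, 6) for c in fit_ols(X, y)])
```

Because the toy y values lie exactly on a plane, the fitted coefficients recover a = 2, b1 = 3 and b2 = -1. With real data such as the state table, the residuals would not be zero.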

SPSS Output for the State Homicide Rates Example: First Try

Descriptive Statistics
                     Mean      Std. Deviation   N
Homicide             9.94      11.0069          49
PCTPov               12.188    3.3408           49
PCTMinority          20.26     12.47224         49
Alcohol consumption  1.94733   .443452          49

Model Summary (b)
R        R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
.853 a   .728       .710                5.9252                       2.327
a. Predictors: (Constant), alcohol consumption, PCTMinority, PCTPov
b. Dependent Variable: Homicide

The multiple regression model has substantially more explanatory power than the earlier bivariate model (0.710 vs. 0.32). Another way to assess the power of the model is to compare the standard deviation of the dependent variable (Homicide) with the standard error of the estimate. Without prior knowledge of poverty, minority population, and alcohol consumption, the best guess at a state's homicide rate is the mean, 9.94, with a standard deviation of 11.00. The standard error of the estimate is only 5.9252, about half the standard deviation of homicides. This indicates that the predicted values from our model have a much lower level of error (deviation from the observed values).

Correlations
Pearson Correlation   Homicide   PCTPov   PCTMinority   Alcohol
Homicide              1.000      .566     .812          .265
PCTPov                .566       1.000    .531          -.139
PCTMinority           .812       .531     1.000         .128
Alcohol               .265       -.139    .128          1.000
Sig. (1-tailed)
Homicide              .          .000     .000          .033
PCTPov                .000       .        .000          .170
PCTMinority           .000       .000     .             .191
Alcohol               .033       .170     .191          .
N = 49 for every pair.

To have SPSS print a correlation matrix for the regression variables, go to Analyze > Regression > Linear > Statistics and check the Descriptives box. Checking Part and partial correlations adds the zero-order, partial, and part correlations to the coefficients table. There is some correlation among the independent variables. The most potentially problematic pair is percent poverty and percent minority (r = 0.531, p < 0.001). The other correlations are not significant: percent minority and alcohol consumption (r = 0.128, p = 0.191), and percent poverty and alcohol consumption (r = -0.139, p = 0.170).
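The significance levels SPSS reports for these correlations come from testing whether each population correlation is zero. The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below is a minimal illustration with hypothetical helper names. The one-tailed 5% critical value of about 1.678 for 47 df is an approximation we supply, not something stated in the output. With n = 49, it shows why r = 0.531 is significant while r = 0.128 and r = -0.139 are not.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    n = 49  # 48 contiguous states plus D.C.
    pairs = {
        "poverty-minority": 0.531,
        "minority-alcohol": 0.128,
        "poverty-alcohol": -0.139,
    }
    # Approximate one-tailed critical t for alpha = 0.05 with 47 df.
    t_crit = 1.678
    for name, r in pairs.items():
        t = r_to_t(r, n)
        print(f"{name}: r = {r:+.3f}, t = {t:+.2f}, significant = {abs(t) > t_crit}")
```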

ANOVA (b)
Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   4235.465         3    1411.822      40.213   .000 a
Residual     1579.875         45   35.108
Total        5815.340         48
a. Predictors: (Constant), alcohol consumption, PCTMinority, PCTPov
b. Dependent Variable: Homicide

The F test shows that at least one of the slopes is significantly different from zero, so the model as a whole has meaning.

Coefficients (a)
Model        B         Std. Error   Beta   t        Sig.   Zero-order   Partial   Part   Tolerance   VIF
(Constant)   -22.282   5.500               -4.051   .000
PCTPov       .828      .312         .251   2.656    .011   .566         .368      .206   .675        1.482
PCTMinority  .574      .083         .651   6.891    .000   .812         .717      .535   .677        1.478
Alcohol      5.390     2.006        .217   2.686    .010   .265         .372      .209   .924        1.082
a. Dependent Variable: Homicide
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Zero-order, Partial, and Part are correlations; Tolerance and VIF are collinearity statistics.)

The table above gives the collinearity diagnostics. You can request them through Analyze > Regression > Linear > Statistics by checking the Collinearity diagnostics box. Tolerance is the proportion of the variance in a given independent variable that cannot be explained by the other independent variables. In this case, 67.5% of the variance in percent poverty, 67.7% of the variance in percent minority, and 92.4% of the variance in alcohol consumption cannot be explained by the others. Most of the variance in each independent variable is therefore unique to it. The variance inflation factor (VIF) is the reciprocal of tolerance. Under a conservative rule of thumb, VIFs higher than 2 are considered problematic, and our VIFs are all less than 1.5. These independent variables are not highly correlated with each other, so multicollinearity should not be a problem.

The table also gives the regression model:

$$\widehat{\text{Homicide}} = -22.282 + 0.828(\text{PCTPov}) + 0.574(\text{PCTMinority}) + 5.390(\text{Alcohol})$$

All of the variables are positively related to homicide. Holding the others constant, an increase in any one of them is associated with an increase in the homicide rate. The slopes for percent poverty and percent minority (0.828 and 0.574) are similar in size, and they can be compared directly because both variables are measured in percent. Alcohol consumption is measured in different units (gallons per year), so its slope is not directly comparable to the others. The standardized coefficients (Beta) put all three on a common scale: 0.251 for poverty, 0.651 for minority, and 0.217 for alcohol.
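Tolerance and VIF can be recovered from the correlations among the independent variables alone. The VIF of predictor j is the j-th diagonal element of the inverse of the predictor correlation matrix, and tolerance is 1/VIF. The sketch below assumes the rounded intercorrelations reported for this model: 0.531 for poverty-minority, -0.139 for poverty-alcohol, and 0.128 for minority-alcohol. Because those inputs are rounded, the results match the SPSS table only to about the third decimal. The function names are hypothetical.

```python
from typing import List


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    # Augment with the identity; after elimination the right half is the inverse.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlations(corr: List[List[float]]) -> List[float]:
    """VIF_j is the j-th diagonal element of the inverse predictor correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


if __name__ == "__main__":
    names = ["poverty", "minority", "alcohol"]
    # Predictor intercorrelations (rounded) from the first-try correlation matrix.
    corr = [
        [1.000, 0.531, -0.139],
        [0.531, 1.000, 0.128],
        [-0.139, 0.128, 1.000],
    ]
    for name, vif in zip(names, vif_from_correlations(corr)):
        print(f"{name}: tolerance = {1 / vif:.3f}, VIF = {vif:.3f}")
```

Computing VIF this way makes the link explicit. A VIF near 1 means that predictor is nearly uncorrelated with the others, as with alcohol consumption here.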

Figure: Scatterplot of the regression standardized residual (x-axis) against the regression standardized predicted value (y-axis); dependent variable: Homicide. One point is labeled 8 (Washington, D.C., the eighth case in the data listing).

The residual plot reveals two important considerations. The first is the extreme outlier (Washington, D.C.). The second is a pattern in which the predicted values decrease as the residual values increase.

Figure: Unstandardized predicted value plotted against the observed homicide rate (R Sq Linear = 0.728), with case 8 (Washington, D.C.) labeled.

The Washington, D.C. observation has a substantial influence on the regression line. Given its undue influence, it may be best to remove it from the analysis and rerun the model.
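The residual plot flags D.C. visually. The sketch below adds a numeric check. It assumes that standardized residuals are the residuals divided by the standard error of the estimate. It also uses |z| > 3 as the cutoff, which is a common rule of thumb rather than anything the handout specifies. The data and helper names are hypothetical: 19 cases fit closely and one case lies far above its prediction. One extreme case also inflates the standard error of the estimate, so in small samples an outlier can partly mask itself.

```python
import math
from typing import List, Sequence


def standardized_residuals(observed: Sequence[float], predicted: Sequence[float],
                           k: int) -> List[float]:
    """Residuals divided by the standard error of the estimate (k = number of predictors)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    df = len(residuals) - k - 1
    see = math.sqrt(sum(e * e for e in residuals) / df)
    return [e / see for e in residuals]


def flag_outliers(z: Sequence[float], cutoff: float = 3.0) -> List[int]:
    """Indices whose standardized residual exceeds the cutoff in absolute value."""
    return [i for i, v in enumerate(z) if abs(v) > cutoff]


if __name__ == "__main__":
    # Hypothetical data: 19 cases fit closely; the last case is far above its prediction.
    predicted = [float(i) for i in range(20)]
    observed = [p + (0.5 if i % 2 else -0.5) for i, p in enumerate(predicted[:19])]
    observed.append(predicted[19] + 10.0)
    z = standardized_residuals(observed, predicted, k=1)
    print(flag_outliers(z))
```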

SPSS Output for the State Homicide Rates Example: Second Try

Model Summary (b)
R        R Square   Adjusted R Square   Std. Error of the Estimate
.885 a   .784       .769                2.2931
a. Predictors: (Constant), alcohol consumption, PCTMinority, PCTPov
b. Dependent Variable: Homicide

The explanatory power of the model has increased from 0.710 to 0.769, and the standard error of the estimate has decreased from 5.9252 to 2.2931. The smaller standard error of the estimate means that, on average, the predicted values are much closer to the observed values. It appears that the model has improved.

ANOVA (b)
Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   837.819          3    279.273       53.110   .000 a
Residual     231.369          44   5.258
Total        1069.188         47
a. Predictors: (Constant), alcohol consumption, PCTMinority, PCTPov
b. Dependent Variable: Homicide

The F test tells us that the variation explained by the model is unlikely to be due to chance.

Coefficients (a)
Model        B        Std. Error   Beta    t       Sig.
(Constant)   -2.446   2.463                -.993   .326
PCTPov       .470     .123         .312    3.832   .000
PCTMinority  .321     .036         .695    8.948   .000
Alcohol      -.458    .858         -.040   -.534   .596
a. Dependent Variable: Homicide

Note that the y-intercept (constant) is not significant (p = 0.326), and neither is alcohol consumption (p = 0.596). Alcohol consumption does not help predict homicides once poverty and minority population are in the model, so it should be dropped from the analysis.

Figure: Scatterplot of the regression standardized residual (x-axis) against the regression standardized predicted value (y-axis); dependent variable: Homicide.

The predicted vs. residual plot now appears randomly scattered. This signals that we are on the right path to improving our model.
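To decide which variable to drop, check each slope's t statistic, t = B / SE, against a critical value. The sketch below uses the second-try estimates (B and standard error): poverty 0.470 and 0.123, minority 0.321 and 0.036, alcohol -0.458 and 0.858. The two-tailed 5% critical value of about 2.015 for 44 df is an approximation we supply. The t values computed from the rounded B and SE differ slightly from the SPSS output (for example, 3.821 rather than 3.832 for poverty). The constant is left out because the procedure drops predictors, not the intercept. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Approximate two-tailed 5% critical value of t for 44 df (48 cases - 3 predictors - 1).
T_CRIT_DF44 = 2.015


def t_statistics(coefs: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """t = B / SE for each predictor; coefs maps name -> (B, standard error)."""
    return {name: b / se for name, (b, se) in coefs.items()}


def candidates_to_drop(coefs: Dict[str, Tuple[float, float]],
                       t_crit: float = T_CRIT_DF44) -> List[str]:
    """Predictors whose slope is not significantly different from zero."""
    return [name for name, t in t_statistics(coefs).items() if abs(t) < t_crit]


if __name__ == "__main__":
    # Second-try slopes and standard errors; the constant is deliberately excluded.
    second_try = {
        "poverty": (0.470, 0.123),
        "minority": (0.321, 0.036),
        "alcohol": (-0.458, 0.858),
    }
    for name, t in t_statistics(second_try).items():
        print(f"{name}: t = {t:.3f}")
    print("drop:", candidates_to_drop(second_try))
```

Dropping one non-significant predictor at a time and refitting, as the handout does, is safer than removing several at once. The remaining slopes can change after each refit.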

SPSS Output for the State Homicide Rates Example: Third Try

Model Summary (b)
R        R Square   Adjusted R Square   Std. Error of the Estimate
.884 a   .782       .773                2.2748
a. Predictors: (Constant), PCTMinority, PCTPov
b. Dependent Variable: Homicide

The explanatory power of the model has again increased, from 0.769 to 0.773, and the standard error of the estimate has decreased from 2.2931 to 2.2748. It appears that the model has continued to improve.

ANOVA (b)
Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   836.322          2    418.161       80.807   .000 a
Residual     232.866          45   5.175
Total        1069.188         47
a. Predictors: (Constant), PCTMinority, PCTPov
b. Dependent Variable: Homicide

Again, the F test tells us that the variation explained by the model is unlikely to be due to chance.

Coefficients (a)
Model        B        Std. Error   Beta   t        Sig.
(Constant)   -3.556   1.306               -2.724   .009
PCTPov       .490     .116         .325   4.213    .000
PCTMinority  .321     .036         .695   9.016    .000
a. Dependent Variable: Homicide

Now the y-intercept and both independent variables are significant, resulting in the final model:

$$\widehat{\text{Homicide}} = -3.556 + 0.490(\text{PCTPov}) + 0.321(\text{PCTMinority})$$

Figure: Unstandardized predicted value plotted against the observed homicide rate for the 48 remaining states, with each point labeled by state name (R Sq Linear = 0.782).

The observed vs. predicted plot shows that the model fits the data well. There does not appear to be any spatial bias in the predicted values (e.g., no single region appears to be consistently over- or under-predicted). Given the variables in our data set, this is the best model for explaining homicides. It took several attempts to develop the final model; models are very rarely right the first time. Use the tools available to produce the best possible model.
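Comparing the second and third tries shows why adjusted R Square is the better yardstick here. Dropping alcohol consumption lowers R Square slightly (0.784 to 0.782) but raises adjusted R Square, because one fewer parameter is estimated from the same 48 cases. The sketch below recomputes the Model Summary and ANOVA statistics from the sums of squares in the two ANOVA tables. The second try has regression 837.819 and residual 231.369 with 3 predictors. The third try has regression 836.322 and residual 232.866 with 2 predictors. The `summarize` helper and `FitSummary` class are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class FitSummary:
    r2: float
    adj_r2: float
    see: float  # standard error of the estimate
    f: float


def summarize(ss_reg: float, ss_res: float, n: int, k: int) -> FitSummary:
    """Model-summary and ANOVA statistics from regression and residual sums of squares."""
    ss_tot = ss_reg + ss_res
    df_res = n - k - 1
    r2 = ss_reg / ss_tot
    # Adjusted R^2 penalises each extra predictor through the residual degrees of freedom.
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_res
    see = math.sqrt(ss_res / df_res)
    f = (ss_reg / k) / (ss_res / df_res)
    return FitSummary(r2, adj_r2, see, f)


if __name__ == "__main__":
    second = summarize(837.819, 231.369, n=48, k=3)  # poverty, minority, alcohol
    third = summarize(836.322, 232.866, n=48, k=2)   # poverty, minority
    for label, s in (("second try", second), ("third try", third)):
        print(f"{label}: R2={s.r2:.3f} adjR2={s.adj_r2:.3f} SEE={s.see:.4f} F={s.f:.3f}")
```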