
Multiple Regression and Model Building (cont'd) + GIS
11.220 Lecture 21
3 May 2006
R. Ryznar

Quadratic model: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
k = number of X variables (here k = 2)

Model Summary (Model 1)

  R      R Square   Adjusted R Square   Std. Error of the Estimate
  .991   .982       .977                46.801

  Predictors: (Constant), SizeSquared, HomeSize
  Dependent Variable: EnergyUse

ANOVA (Model 1)

  Source             Sum of Squares   df   Mean Square   F         Sig.
  Regression (SSR)   831069.5         2    415534.773    189.710   .0001
  Residual (SSE)     15332.554        7    2190.365
  Total (SST)        846402.1         9

  Predictors: (Constant), SizeSquared, HomeSize
  Dependent Variable: EnergyUse

Coefficients (Model 1)

                Unstandardized Coefficients    Standardized
                B                Std. Error    Beta      t        Sig.
  (Constant)    -1216.1438870    242.80636850            -5.009   .00155
  HomeSize      2.39893018       .24583560     4.049     9.758    .00003
  SizeSquared   -.00045004       .00005908     -3.161    -7.618   .00012

  Dependent Variable: EnergyUse

Formulas

  $R^2 = SSR/SST = 1 - SSE/SST$
  $R^2_{adj} = 1 - \frac{SSE/[n-(k+1)]}{SST/(n-1)}$
  $s^2 = SSE/[n-(k+1)]$ (sometimes called MSE)
  $F = \frac{R^2/k}{(1-R^2)/[n-(k+1)]}$
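As a quick check of the formulas on this slide, the sketch below recomputes R Square, Adjusted R Square, the standard error of the estimate, and F from the sums of squares in the ANOVA table. The variable names are illustrative only, and n = 10 is inferred from the total df of 9.

```python
import math

# Values copied from the ANOVA table for the quadratic model.
SSR = 831069.5    # regression sum of squares
SSE = 15332.554   # residual sum of squares
SST = 846402.1    # total sum of squares
n = 10            # observations (total df = n - 1 = 9)
k = 2             # X variables: HomeSize and SizeSquared

resid_df = n - (k + 1)

r2 = 1 - SSE / SST
adj_r2 = 1 - (SSE / resid_df) / (SST / (n - 1))
mse = SSE / resid_df                     # s^2
std_error = math.sqrt(mse)               # "Std. Error of the Estimate"
f_stat = (r2 / k) / ((1 - r2) / resid_df)

print(f"R^2          = {r2:.3f}")
print(f"Adjusted R^2 = {adj_r2:.3f}")
print(f"s^2 (MSE)    = {mse:.3f}")
print(f"s            = {std_error:.3f}")
print(f"F            = {f_stat:.2f}")
```

The results should agree with the SPSS output up to rounding of the printed sums of squares.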

Linear model: $y = \beta_0 + \beta_1 x + \varepsilon$

Model Summary (Model 1)

  R      R Square   Adjusted R Square   Std. Error of the Estimate
  .912   .832       .811                133.438

  Predictors: (Constant), HomeSize
  Dependent Variable: EnergyUse

ANOVA (Model 1)

  Source       Sum of Squares   df   Mean Square   F        Sig.
  Regression   703957.2         1    703957.183    39.536   .000
  Residual     142444.9         8    17805.615
  Total        846402.1         9

  Predictors: (Constant), HomeSize
  Dependent Variable: EnergyUse

Coefficients (Model 1)

               Unstandardized Coefficients   Standardized
               B          Std. Error         Beta     t       Sig.
  (Constant)   578.928    166.968                     3.467   .008
  HomeSize     .540       .086               .912     6.288   .000

  Dependent Variable: EnergyUse
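One way to compare this straight-line model with the quadratic model is a partial (nested-model) F test on the drop in SSE. The slides do not show this test, so the sketch below is an illustration under that assumption. It uses the residual sums of squares from the two ANOVA tables (142444.9 for the linear model; 15332.554 on 7 df for the quadratic model), and the variable names are hypothetical.

```python
# Residual sums of squares copied from the two ANOVA tables.
sse_linear = 142444.9        # y = b0 + b1*x
sse_quadratic = 15332.554    # y = b0 + b1*x + b2*x^2
df_resid_quadratic = 7       # residual df of the larger model
extra_terms = 1              # SizeSquared is the only added term

# Partial F: reduction in SSE per added term, relative to the
# larger model's mean squared error.
partial_f = ((sse_linear - sse_quadratic) / extra_terms) / (
    sse_quadratic / df_resid_quadratic
)

# With one added term, partial F equals the square of that term's
# t statistic in the larger model (t for SizeSquared = -7.618).
t_size_squared = -7.618

print(f"partial F = {partial_f:.2f}")
print(f"t^2       = {t_size_squared ** 2:.2f}")
```

The two printed numbers should match up to rounding, which is why the t test on SizeSquared and the nested-model comparison lead to the same conclusion here.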

Correlation with Y (survival time), r

  x1   .346
  x2   .593
  x3   .665
  x4   .726

Correlations among the X variables

        x1     x2     x3      x4
  x1    1      .090   -.150   .502
  x2           1      -.024   .369
  x3                  1       .416
  x4                          1

SSE and R^2 for each subset of X variables

  X variables              SSE      R^2
  X1 (Blood Clotting)      3.4961   .120
  X2 (Prognostic Ind.)     2.5763   .352
  X3 (Enzyme Func.)        2.2153   .442
  X4 (Liver Func.)         1.8776   .527
  X1, X2                   2.2325   .438
  X1, X3                   1.4072   .646
  X1, X4                   1.8758   .528
  X2, X3                   0.7430   .813
  X2, X4                   1.3922   .650
  X3, X4                   1.2453   .687
  X1, X2, X3               0.1099   .972
  X1, X2, X4               1.3905   .650
  X1, X3, X4               1.1156   .719
  X2, X3, X4               0.4652   .883
  X1, X2, X3, X4           0.1098   .972
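The table can be read as an all-subsets comparison. The sketch below, using only the R^2 values printed above, picks the best subset of each size and checks that each single-variable R^2 is the square of that variable's correlation with Y. The dictionary layout and names are illustrative only.

```python
# Correlation of each X with Y (survival time), from the slide.
r_with_y = {"X1": 0.346, "X2": 0.593, "X3": 0.665, "X4": 0.726}

# R^2 for every subset of predictors, from the slide.
r2_by_subset = {
    ("X1",): 0.120, ("X2",): 0.352, ("X3",): 0.442, ("X4",): 0.527,
    ("X1", "X2"): 0.438, ("X1", "X3"): 0.646, ("X1", "X4"): 0.528,
    ("X2", "X3"): 0.813, ("X2", "X4"): 0.650, ("X3", "X4"): 0.687,
    ("X1", "X2", "X3"): 0.972, ("X1", "X2", "X4"): 0.650,
    ("X1", "X3", "X4"): 0.719, ("X2", "X3", "X4"): 0.883,
    ("X1", "X2", "X3", "X4"): 0.972,
}

# Keep the highest-R^2 subset for each number of predictors.
best_by_size = {}
for subset, r2 in r2_by_subset.items():
    size = len(subset)
    if size not in best_by_size or r2 > best_by_size[size][1]:
        best_by_size[size] = (subset, r2)

for size in sorted(best_by_size):
    subset, r2 = best_by_size[size]
    print(size, ", ".join(subset), r2)

# With one predictor, R^2 is the squared correlation with Y.
for name, r in r_with_y.items():
    print(name, round(r ** 2, 3), r2_by_subset[(name,)])
```

Note that X4 has the highest correlation with Y on its own, yet the best two- and three-variable subsets do not include it, and adding X4 to X1, X2, X3 leaves R^2 at .972.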

Standardized coefficients used to establish a common metric for comparison

  $income = \alpha + \beta_1(\text{years of education}) + \beta_2(\text{I.Q.}) + \varepsilon$
  $income = \alpha + 2(\text{years of education}) + 1(\text{I.Q.}) + \varepsilon$

Can you say that years of education is more important than I.Q.? Of course you cannot, because the two variables are not measured with the same metric. One way to solve this problem of comparing beta coefficients is to use standardized coefficients. Standardized coefficients are calculated by estimating the regression equation with the z-scores of the dependent (Y) and independent (X) variables.

Interpreting the standardized coefficients

An increase of one standard deviation in x1 changes y by the standardized coefficient associated with x1, measured in standard deviations of y (holding the other predictors constant).

Coefficients (Model 1)

                Unstandardized Coefficients    Standardized
                B                Std. Error    Beta      t        Sig.
  (Constant)    -1216.1438870    242.80636850            -5.009   .00155
  HomeSize      2.39893018       .24583560     4.049     9.758    .00003
  SizeSquared   -.00045004       .00005908     -3.161    -7.618   .00012

  Dependent Variable: EnergyUse

Descriptive Statistics

                       N    Mean      Std. Deviation
  EnergyUse            10   1594.70   306.667
  HomeSize             10   1880.00   517.623
  SizeSquared          10   3775540   2153984.105
  Valid N (listwise)   10

Every increase of 1 s.d. in X1 (HomeSize) increases Y by 4.049 s.d., i.e., 4.049 * 306.667 = 1241.69. Using the unstandardized coefficient instead: 2.39893018 * 517.623 = 1241.74. The two results differ only because of rounding; they should be equal.
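The link between the two kinds of coefficient is Beta = B * (s.d. of X) / (s.d. of Y). The sketch below applies that relation to the values printed in the Coefficients and Descriptive Statistics tables. The variable names are illustrative only.

```python
# Standard deviations from the Descriptive Statistics table.
sd_energy_use = 306.667        # Y
sd_home_size = 517.623         # X1
sd_size_squared = 2153984.105  # X2

# Unstandardized coefficients (B) from the Coefficients table.
b_home_size = 2.39893018
b_size_squared = -0.00045004

# Standardized coefficient: B rescaled by sd(X) / sd(Y).
beta_home_size = b_home_size * sd_home_size / sd_energy_use
beta_size_squared = b_size_squared * sd_size_squared / sd_energy_use

print(f"Beta HomeSize    = {beta_home_size:.3f}")
print(f"Beta SizeSquared = {beta_size_squared:.3f}")

# Change in Y (original units) for a 1 s.d. increase in HomeSize.
print(f"Change in Y      = {b_home_size * sd_home_size:.2f}")
```

The computed Betas should reproduce the SPSS values of 4.049 and -3.161 up to rounding.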

Dummy variables

  Income = 5.41 + 1.9*ASIAMER + 2.5*CAUCAS + 0.7*HISPAN + 2.2*OTHER + .95*(12 yrs of educ)
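To see how the dummy variables work, the sketch below evaluates the equation for each group at 12 years of education. It assumes each dummy equals 1 for members of that group and 0 otherwise, and that the category with no dummy is the reference group; the slide does not name that group or the units of income, so the label "reference" is hypothetical.

```python
# Coefficients copied from the slide's equation.
INTERCEPT = 5.41
EDUC_SLOPE = 0.95
GROUP_EFFECT = {
    "reference": 0.0,  # omitted category (not named on the slide)
    "ASIAMER": 1.9,
    "CAUCAS": 2.5,
    "HISPAN": 0.7,
    "OTHER": 2.2,
}

def predicted_income(group, years_of_educ):
    """Predicted income: at most one group dummy is switched on."""
    return INTERCEPT + GROUP_EFFECT[group] + EDUC_SLOPE * years_of_educ

for group in GROUP_EFFECT:
    print(f"{group:10s} {predicted_income(group, 12):.2f}")
```

Each dummy coefficient is the difference in predicted income between that group and the reference group at the same level of education; the education slope is the same for every group.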

Multicollinearity

Data for 67 Florida Counties

  fem    = Percentage of households headed by a female
  inc    = Median income
  hs     = Percentage of residents over 25 years old with at least a high school diploma
  urb    = Percentage of residents living in an urban environment
  cr     = Number of crimes per capita
  unemrt = Unemployment rate

[Figure: pairwise plot matrix of the variables fem, inc, un, hs, urb, cr, unemrt]

Correlations (Pearson Correlation; N = 67 for every pair)

            fem       inc       un        hs        urb       cr        unemrt
  fem       1         -.561**   -.055     -.511**   -.435**   -.143     -.055
  inc       -.561**   1         -.119     .793**    .730**    .432**    -.119
  un        -.055     -.119     1         -.250*    -.053     -.001     1.000**
  hs        -.511**   .793**    -.250*    1         .791**    .468**    -.250*
  urb       -.435**   .730**    -.053     .791**    1         .678**    -.053
  cr        -.143     .432**    -.001     .468**    .678**    1         -.001
  unemrt    -.055     -.119     1.000**   -.250*    -.053     -.001     1

Sig. (2-tailed)

            fem    inc    un     hs     urb    cr     unemrt
  fem              .000   .661   .000   .000   .248   .661
  inc       .000          .337   .000   .000   .000   .337
  un        .661   .337          .041   .670   .996   .000
  hs        .000   .000   .041          .000   .000   .041
  urb       .000   .000   .670   .000          .000   .670
  cr        .248   .000   .996   .000   .000          .996
  unemrt    .661   .337   .000   .041   .670   .996

  **. Correlation is significant at the 0.01 level (2-tailed).
  *. Correlation is significant at the 0.05 level (2-tailed).
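A correlation matrix like this is often scanned for pairs of variables that overlap heavily. The sketch below lists the pairs whose absolute correlation is at least 0.7; that cutoff is an assumption chosen for illustration, not a rule from the slides, and the names are hypothetical.

```python
# Upper triangle of the correlation matrix, copied from the slide.
correlations = {
    ("fem", "inc"): -0.561, ("fem", "un"): -0.055, ("fem", "hs"): -0.511,
    ("fem", "urb"): -0.435, ("fem", "cr"): -0.143, ("fem", "unemrt"): -0.055,
    ("inc", "un"): -0.119, ("inc", "hs"): 0.793, ("inc", "urb"): 0.730,
    ("inc", "cr"): 0.432, ("inc", "unemrt"): -0.119,
    ("un", "hs"): -0.250, ("un", "urb"): -0.053, ("un", "cr"): -0.001,
    ("un", "unemrt"): 1.000,
    ("hs", "urb"): 0.791, ("hs", "cr"): 0.468, ("hs", "unemrt"): -0.250,
    ("urb", "cr"): 0.678, ("urb", "unemrt"): -0.053,
    ("cr", "unemrt"): -0.001,
}

THRESHOLD = 0.7  # illustrative cutoff (an assumption)

# Pairs with strong linear overlap, in either direction.
flagged = {pair: r for pair, r in correlations.items() if abs(r) >= THRESHOLD}

for (a, b), r in flagged.items():
    print(f"{a:6s} {b:6s} {r:+.3f}")
```

The pair un and unemrt has a correlation of 1.000, so the two columns carry the same information; inc, hs, and urb are also strongly related to one another.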

Detecting Multicollinearity with the Variance Inflation Factor (VIF)

Tolerance: the proportion of each variable's variance that is not related to the other predictors.

Coefficients (Model 1)

               Unstandardized           Standardized                    Collinearity Statistics
               B           Std. Error   Beta     t       Sig.           Tolerance   VIF
  (Constant)   .024        .042                  .579    .565
  fem          .002        .001         .172     1.516   .135           .646        1.547
  inc          1.450E-08   .000         .002     .015    .988           .313        3.191
  hs           .000        .001         -.090    -.482   .632           .237        4.217
  unemrt       .000        .001         .030     .304    .762           .842        1.188
  urb          .001        .000         .824     5.172   .000           .328        3.049

  Dependent Variable: cr

VIF = 1/Tolerance. If Tolerance = 1, then VIF = 1. As VIF becomes larger, greater overlap exists among the predictors.
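The relation VIF = 1/Tolerance can be checked directly against the table. The sketch below also reports 1 - Tolerance, which is the R^2 from regressing each predictor on the other predictors. The dictionary names are illustrative only.

```python
# Tolerance and VIF columns copied from the Coefficients table.
tolerance = {"fem": 0.646, "inc": 0.313, "hs": 0.237,
             "unemrt": 0.842, "urb": 0.328}
reported_vif = {"fem": 1.547, "inc": 3.191, "hs": 4.217,
                "unemrt": 1.188, "urb": 3.049}

computed_vif = {name: 1 / tol for name, tol in tolerance.items()}

# Share of each predictor's variance explained by the other predictors.
r2_with_others = {name: 1 - tol for name, tol in tolerance.items()}

for name in tolerance:
    print(f"{name:7s} VIF = {computed_vif[name]:.3f} "
          f"(reported {reported_vif[name]:.3f}), "
          f"R^2 with other predictors = {r2_with_others[name]:.3f}")
```

Small differences from the reported VIF values come from the tolerances being printed to three decimals. In this model hs has the most overlap with the other predictors and unemrt the least.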

Z scores for crime per capita

Z scores for % living in urbanized area

A positive and significant z-score indicates spatial clustering of high values. A negative and significant z-score indicates spatial clustering of low values.
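The reading rule above can be sketched as a small classifier. The slides do not state the significance level or the statistic behind the mapped z-scores, so this example assumes a two-sided test at the 0.05 level under a standard normal distribution; the function names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def classify(z, alpha=0.05):
    """Label a z-score following the rule on the slide."""
    if two_sided_p(z) >= alpha:
        return "not significant"
    if z > 0:
        return "clustering of high values"
    return "clustering of low values"

for z in (2.5, -2.5, 1.0):
    print(z, round(two_sided_p(z), 4), classify(z))
```

Under these assumptions, a z-score needs an absolute value above roughly 1.96 before the sign is interpreted.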

Final Paper data in GIS

  ma_eqv.dbf         MA Kind of Community (KOC) data for all cities/towns in MA
  ma_eqv_intro.txt   A brief explanation of the MA Department of Revenue's Kind-of-Community classification of MA cities and towns

GIS Spatial Data Set (formatted as ArcGIS shapefiles and located in the gis sub-directory):

  ma_towns00     Town boundaries for MA cities and towns
  majmhda1       Major roads for MA (see class for road type distinctions)
  maj_pop1       Major MA lakes and ponds (for better cartography)
  p525_ma        Boundaries for MA PUMA regions
  majmhdcl.avl   Pre-configured classification and symbols for MA major roads