Multiple Regression and Model Building (cont'd) + GIS 11.220 Lecture 21 3 May 2006 R. Ryznar

Quadratic model: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$, where $x$ is HomeSize and $k$ is the number of X variables (here $k = 2$).

Model Summary (Dependent Variable: EnergyUse; Predictors: (Constant), SizeSquared, HomeSize)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.991   .982       .977                46.801

$R^2 = SSR/SST = 1 - SSE/SST$

Adjusted $R^2 = 1 - \dfrac{SSE/[n-(k+1)]}{SST/(n-1)}$

ANOVA (Dependent Variable: EnergyUse; Predictors: (Constant), SizeSquared, HomeSize)

             Sum of Squares   df   Mean Square   F         Sig.
Regression   831069.5         2    415534.773    189.710   .0001
Residual     15332.554        7    2190.365
Total        846402.1         9

$s^2 = SSE/[n-(k+1)]$, sometimes called MSE.

$F = \dfrac{R^2/k}{(1-R^2)/[n-(k+1)]}$

Coefficients (Dependent Variable: EnergyUse)

              Unstandardized B   Std. Error     Standardized Beta   t        Sig.
(Constant)    -1216.1438870      242.80636850                       -5.009   .00155
HomeSize      2.39893018         .24583560      4.049               9.758    .00003
SizeSquared   -.00045004         .00005908      -3.161              -7.618   .00012
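The formulas on this slide can be checked against the SPSS output. The sketch below is only an illustration: it copies SSE and SST from the ANOVA table, takes n = 10 from the total degrees of freedom plus one, and uses variable names of my own choosing.

```python
# Values copied from the SPSS ANOVA table for the quadratic model.
sse = 15332.554   # residual sum of squares (SSE)
sst = 846402.1    # total sum of squares (SST)
n = 10            # observations (total df of 9, plus 1)
k = 2             # number of X variables: HomeSize and SizeSquared

ssr = sst - sse                                  # regression sum of squares
r2 = 1 - sse / sst                               # R^2 = 1 - SSE/SST
mse = sse / (n - (k + 1))                        # s^2 = SSE / [n - (k + 1)]
adj_r2 = 1 - mse / (sst / (n - 1))               # adjusted R^2
f_stat = (r2 / k) / ((1 - r2) / (n - (k + 1)))   # overall F statistic
std_error = mse ** 0.5                           # Std. Error of the Estimate

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print(f"MSE = {mse:.3f}, F = {f_stat:.3f}, s = {std_error:.3f}")
```

The results should agree with the Model Summary and ANOVA tables up to rounding.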

Linear model: $y = \beta_0 + \beta_1 x + \varepsilon$

Model Summary (Dependent Variable: EnergyUse; Predictors: (Constant), HomeSize)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.912   .832       .811                133.438

ANOVA (Dependent Variable: EnergyUse; Predictors: (Constant), HomeSize)

             Sum of Squares   df   Mean Square   F        Sig.
Regression   703957.2         1    703957.183    39.536   .000
Residual     142444.9         8    17805.615
Total        846402.1         9

Coefficients (Dependent Variable: EnergyUse)

             Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)   578.928            166.968                          3.467   .008
HomeSize     .540               .086         .912                6.288   .000
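To compare the two fitted equations, it can help to evaluate both at the same home size. The sketch below plugs in the coefficients from the two SPSS Coefficients tables. The function names are my own, and the evaluation point (HomeSize = 1880, the sample mean reported in the Descriptive Statistics output) is just a convenient choice.

```python
# Coefficients copied from the two SPSS "Coefficients" tables.
LINEAR = {"const": 578.928, "size": 0.540}
QUADRATIC = {"const": -1216.1438870, "size": 2.39893018, "size_sq": -0.00045004}

def predict_linear(home_size):
    """Fitted EnergyUse from y = b0 + b1*x."""
    return LINEAR["const"] + LINEAR["size"] * home_size

def predict_quadratic(home_size):
    """Fitted EnergyUse from y = b0 + b1*x + b2*x^2."""
    return (QUADRATIC["const"]
            + QUADRATIC["size"] * home_size
            + QUADRATIC["size_sq"] * home_size ** 2)

x = 1880.0  # mean HomeSize in the sample
print(f"linear:    {predict_linear(x):.2f}")
print(f"quadratic: {predict_quadratic(x):.2f}")
```

Because the SizeSquared coefficient is negative, the quadratic curve flattens as home size grows, which the straight line cannot capture.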

Correlation of each X variable with Y (survival time):

Variable   r
x1         .346
x2         .593
x3         .665
x4         .726

Correlations among the X variables:

      x1     x2      x3      x4
x1    1      .090    -.150   .502
x2           1       -.024   .369
x3                   1       .416
x4                           1

SSE and R^2 for each subset of X variables:

X variables              SSE      R^2
X1 (Blood Clotting)      3.4961   .120
X2 (Prognostic Ind.)     2.5763   .352
X3 (Enzyme Func.)        2.2153   .442
X4 (Liver Func.)         1.8776   .527
X1, X2                   2.2325   .438
X1, X3                   1.4072   .646
X1, X4                   1.8758   .528
X2, X3                   0.7430   .813
X2, X4                   1.3922   .650
X3, X4                   1.2453   .687
X1, X2, X3               0.1099   .972
X1, X2, X4               1.3905   .650
X1, X3, X4               1.1156   .719
X2, X3, X4               0.4652   .883
X1, X2, X3, X4           0.1098   .972
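One way to read the subset table is to pick, for each number of predictors, the subset with the smallest SSE. The sketch below does that with the SSE values from the slide. The dictionary layout and variable names are illustrative assumptions, not part of the lecture.

```python
# SSE for every subset of predictors, copied from the slide.
sse_by_subset = {
    ("X1",): 3.4961, ("X2",): 2.5763, ("X3",): 2.2153, ("X4",): 1.8776,
    ("X1", "X2"): 2.2325, ("X1", "X3"): 1.4072, ("X1", "X4"): 1.8758,
    ("X2", "X3"): 0.7430, ("X2", "X4"): 1.3922, ("X3", "X4"): 1.2453,
    ("X1", "X2", "X3"): 0.1099, ("X1", "X2", "X4"): 1.3905,
    ("X1", "X3", "X4"): 1.1156, ("X2", "X3", "X4"): 0.4652,
    ("X1", "X2", "X3", "X4"): 0.1098,
}

# For each subset size, keep the subset with the lowest SSE.
best_by_size = {}
for subset, sse in sse_by_subset.items():
    size = len(subset)
    current = best_by_size.get(size)
    if current is None or sse < sse_by_subset[current]:
        best_by_size[size] = subset

for size in sorted(best_by_size):
    subset = best_by_size[size]
    print(size, ", ".join(subset), sse_by_subset[subset])
```

Adding X4 to the best three-variable subset lowers SSE only from 0.1099 to 0.1098, so the fourth predictor contributes almost nothing once X1, X2, and X3 are in the model.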

Standardized coefficients used to establish a common metric for comparison

$\text{income} = \alpha + \beta_1(\text{years of education}) + \beta_2(\text{I.Q.}) + \varepsilon$

$\text{income} = \alpha + 2(\text{years of education}) + 1(\text{I.Q.}) + \varepsilon$

Can you say that years of education is more important than I.Q.? No, because the two predictors are not measured in the same metric. One way to make the coefficients comparable is to use standardized coefficients. Standardized coefficients are the coefficients obtained when the regression is estimated on the z-scores of the dependent (Y) and independent (X) variables.

Interpreting the standardized coefficients

An increase of one standard deviation in x1 changes y by the standardized coefficient associated with x1, measured in standard deviations of y.

Coefficients (Dependent Variable: EnergyUse)

              Unstandardized B   Std. Error     Standardized Beta   t        Sig.
(Constant)    -1216.1438870      242.80636850                       -5.009   .00155
HomeSize      2.39893018         .24583560      4.049               9.758    .00003
SizeSquared   -.00045004         .00005908      -3.161              -7.618   .00012

Descriptive Statistics

                     N    Mean      Std. Deviation
EnergyUse            10   1594.70   306.667
HomeSize             10   1880.00   517.623
SizeSquared          10   3775540   2153984.105
Valid N (listwise)   10

Every increase of 1 s.d. in X1 (HomeSize) increases Y by 4.049 s.d., i.e., 4.049 * 306.667 = 1241.69. Using the unstandardized coefficient instead gives 2.39893018 * 517.623 = 1241.74. The two results differ only by rounding error; they should be equal.
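The link between the two kinds of coefficient is beta = B * s_x / s_y, where s_x and s_y are the sample standard deviations of the predictor and of Y. The sketch below applies that identity to the numbers on this slide. The function and variable names are my own.

```python
# Values copied from the slide's Coefficients and Descriptive Statistics tables.
sd_y = 306.667  # Std. Deviation of EnergyUse

predictors = {
    # name: (unstandardized B, Std. Deviation of the predictor)
    "HomeSize": (2.39893018, 517.623),
    "SizeSquared": (-0.00045004, 2153984.105),
}

def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope to a standardized one: beta = B * s_x / s_y."""
    return b * sd_x / sd_y

betas = {name: standardized_beta(b, sd_x, sd_y)
         for name, (b, sd_x) in predictors.items()}

for name, beta in betas.items():
    print(f"{name}: beta = {beta:.3f}")
```

Both values should match the Standardized Coefficients column of the SPSS output up to rounding.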

Dummy variables

Income = 5.41 + 1.9*ASIAMER + 2.5*CAUCAS + 0.7*HISPAN + 2.2*OTHER + 0.95*(12 yrs of educ)
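A dummy-variable equation like this one shifts the intercept for each group. The sketch below evaluates it under two assumptions that the slide does not spell out: that 0.95 is the coefficient on years of education (evaluated here at 12 years), and that the group with all four dummies equal to 0 is the omitted reference category, which the slide does not name. The function name and the use of `None` for the reference group are illustrative.

```python
# Coefficients copied from the slide's equation.
INTERCEPT = 5.41
EDUC_SLOPE = 0.95  # assumed to be the coefficient on years of education
GROUP_EFFECTS = {"ASIAMER": 1.9, "CAUCAS": 2.5, "HISPAN": 0.7, "OTHER": 2.2}

def predicted_income(group, years_educ):
    """Predicted income; group is a dummy name, or None for the reference category."""
    shift = GROUP_EFFECTS[group] if group is not None else 0.0
    return INTERCEPT + shift + EDUC_SLOPE * years_educ

for group in [None, *GROUP_EFFECTS]:
    label = group if group is not None else "reference (all dummies 0)"
    print(f"{label}: {predicted_income(group, 12):.2f}")
```

Each dummy coefficient is the predicted difference from the reference category at the same level of education, so only one dummy is set to 1 for any given person.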

Multicollinearity

Data for 67 Florida Counties

fem = Percentage of households headed by a female
inc = Median income
hs = Percentage of residents over 25 years old with at least a high school diploma
urb = Percentage of residents living in an urban environment
cr = Number of crimes per capita
unemrt = Unemployment rate

[Figure: scatterplot matrix of fem, inc, un, hs, urb, cr, and unemrt]

Correlations (N = 67 for every pair)

                               fem       inc       un        hs        urb       cr        unemrt
fem      Pearson Correlation   1         -.561**   -.055     -.511**   -.435**   -.143     -.055
         Sig. (2-tailed)                 .000      .661      .000      .000      .248      .661
inc      Pearson Correlation   -.561**   1         -.119     .793**    .730**    .432**    -.119
         Sig. (2-tailed)       .000                .337      .000      .000      .000      .337
un       Pearson Correlation   -.055     -.119     1         -.250*    -.053     -.001     1.000**
         Sig. (2-tailed)       .661      .337                .041      .670      .996      .000
hs       Pearson Correlation   -.511**   .793**    -.250*    1         .791**    .468**    -.250*
         Sig. (2-tailed)       .000      .000      .041                .000      .000      .041
urb      Pearson Correlation   -.435**   .730**    -.053     .791**    1         .678**    -.053
         Sig. (2-tailed)       .000      .000      .670      .000                .000      .670
cr       Pearson Correlation   -.143     .432**    -.001     .468**    .678**    1         -.001
         Sig. (2-tailed)       .248      .000      .996      .000      .000                .996
unemrt   Pearson Correlation   -.055     -.119     1.000**   -.250*    -.053     -.001     1
         Sig. (2-tailed)       .661      .337      .000      .041      .670      .996

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
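A first screen for multicollinearity is to scan the correlation matrix for pairs of predictors that are strongly correlated with each other. The sketch below does this for the predictor pairs in the table, leaving out cr because it is the dependent variable in the regression that follows. The 0.7 cutoff is an arbitrary screening threshold that I am assuming for illustration; the slides do not give one.

```python
# Pairwise Pearson correlations among the predictors, copied from the table.
correlations = {
    ("fem", "inc"): -0.561, ("fem", "un"): -0.055, ("fem", "hs"): -0.511,
    ("fem", "urb"): -0.435, ("fem", "unemrt"): -0.055,
    ("inc", "un"): -0.119, ("inc", "hs"): 0.793, ("inc", "urb"): 0.730,
    ("inc", "unemrt"): -0.119,
    ("un", "hs"): -0.250, ("un", "urb"): -0.053, ("un", "unemrt"): 1.000,
    ("hs", "urb"): 0.791, ("hs", "unemrt"): -0.250,
    ("urb", "unemrt"): -0.053,
}

THRESHOLD = 0.7  # assumed cutoff for "strongly correlated"

flagged = {pair: r for pair, r in correlations.items() if abs(r) >= THRESHOLD}

for (a, b), r in sorted(flagged.items(), key=lambda item: -abs(item[1])):
    print(f"{a} ~ {b}: r = {r:.3f}")
```

The pair un and unemrt has r = 1.000, which means the two columns carry the same information and cannot both be entered in one regression.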

Detecting Multicollinearity with the Variance Inflation Factor (VIF)

Tolerance is the proportion of each predictor's variance that is not explained by the other predictors.

Coefficients (Dependent Variable: cr)

             Unstandardized B   Std. Error   Standardized Beta   t       Sig.   Tolerance   VIF
(Constant)   .024               .042                             .579    .565
fem          .002               .001         .172                1.516   .135   .646        1.547
inc          1.450E-08          .000         .002                .015    .988   .313        3.191
hs           .000               .001         -.090               -.482   .632   .237        4.217
unemrt       .000               .001         .030                .304    .762   .842        1.188
urb          .001               .000         .824                5.172   .000   .328        3.049

VIF = 1/Tolerance. If Tolerance = 1, then VIF = 1. As VIF becomes larger, greater overlap exists among the predictors.
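The relationship VIF = 1/Tolerance can be verified directly from the Collinearity Statistics columns. The sketch below recomputes VIF from the printed tolerances and also recovers the R^2 from regressing each predictor on the others, using the standard identity Tolerance = 1 - R^2. Variable names are my own, and small differences from the SPSS VIF column come from the tolerances being rounded to three decimals.

```python
# Tolerance and VIF as printed in the SPSS Coefficients table.
tolerance = {"fem": 0.646, "inc": 0.313, "hs": 0.237, "unemrt": 0.842, "urb": 0.328}
reported_vif = {"fem": 1.547, "inc": 3.191, "hs": 4.217, "unemrt": 1.188, "urb": 3.049}

# VIF = 1 / Tolerance
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Tolerance = 1 - R^2, where R^2 comes from regressing this predictor on the others.
r2_on_others = {name: 1.0 - tol for name, tol in tolerance.items()}

for name in tolerance:
    print(f"{name}: VIF = {vif[name]:.3f} (SPSS {reported_vif[name]}), "
          f"R^2 on other predictors = {r2_on_others[name]:.3f}")

most_collinear = max(vif, key=vif.get)
print("largest VIF:", most_collinear)
```

Here hs has the largest VIF: about 76% of its variance is shared with the other predictors, consistent with its high correlations with inc and urb.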

Z scores for crime per capita

Z scores for % living in urbanized area

A positive, statistically significant z-score indicates spatial clustering of high values. A negative, statistically significant z-score indicates spatial clustering of low values.
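The reading rule for the z-score maps can be written as a small classifier. This is a sketch under an assumption: it treats a z-score as significant at the two-tailed 0.05 level, using the standard normal critical value of 1.96. The slides do not state which significance level or which clustering statistic the maps use, and the function name and labels are illustrative.

```python
def classify_z(z, critical=1.96):
    """Label a z-score using a two-tailed test (1.96 corresponds to the 0.05 level)."""
    if z >= critical:
        return "clustering of high values"
    if z <= -critical:
        return "clustering of low values"
    return "not significant"

# Hypothetical z-scores, used only to show the three possible outcomes.
for z in (2.8, -2.3, 0.4):
    print(f"z = {z:+.1f}: {classify_z(z)}")
```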

Final Paper data in GIS

ma_eqv.dbf         MA Kind of Community (KOC) data for all cities/towns in MA
ma_eqv_intro.txt   A brief explanation of the MA Department of Revenue's Kind-of-Community classification of MA cities and towns

GIS Spatial Data Set (formatted as ArcGIS shapefiles and located in the gis sub-directory):

ma_towns00         Town boundaries for MA cities and towns
majmhda1           Major roads for MA (see class for road type distinctions)
maj_pop1           Major MA lakes and ponds (for better cartography)
p525_ma            Boundaries for MA PUMA regions
majmhdcl.avl       Pre-configured classification and symbols for MA major roads