PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa


There are two parts to this lab. The first is intended to demonstrate how to request and interpret the spatial diagnostics of a standard OLS regression model using GeoDa. The diagnostics provide information about the type of spatial process underlying your data and inform your selection of an appropriate spatial regression model (i.e., spatial error or spatial lag in GeoDa). The second part is intended to introduce how to specify and interpret two spatial regression models: the spatial error model and the spatial lag model. The two approaches have different assumptions and theoretical implications about the form of the spatial process being analyzed. The spatial error model identifies spatial autocorrelation in the error structure of the specified regression model. In contrast, the spatial lag model identifies spatial autocorrelation in the covariance structure of the dependent variable.

Objectives.
- Conduct an OLS regression analysis in GeoDa using multiple weights matrices
- Examine the spatial and non-spatial diagnostics
- Save and explore the residuals from the OLS model
- Specify and examine the diagnostics of a spatial lag and spatial error regression analysis
- Compare the results of the models and interpret the substantive implications

Part 1: Spatial Diagnostics

OLS. Open GeoDa and load south00.shp using FIPS as the key field. What are important correlates of child poverty that should be included in the regression model? In GeoDa, you can run a series of standard OLS regressions; note that the assumptions of linearity and normality apply. Decisions about variable transformations and outliers should be made before running an OLS regression. The results of the regression, of course, also can assist this analytical process.
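For readers who want to see what the OLS step computes outside GeoDa, the following is a minimal sketch in Python with NumPy. It is not GeoDa's implementation. The data are synthetic, and the names `ols_fit`, `x`, and `sqrt_ppov` are hypothetical stand-ins for one correlate and a square-root transformed poverty rate.

```python
import numpy as np

def ols_fit(y, X):
    """Fit y = b0 + X b by ordinary least squares (constant term included)."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend the constant column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    predicted = X1 @ beta
    residuals = y - predicted
    return beta, predicted, residuals

# Synthetic illustration only; these are not values from the lab data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                      # one hypothetical correlate
ppov = (1.0 + 0.3 * x + rng.normal(0, 0.2, size=50)) ** 2
sqrt_ppov = np.sqrt(ppov)                            # square-root transform applied before fitting

beta, predicted, residuals = ols_fit(sqrt_ppov, x)
print("intercept and slope:", beta)
print("mean residual:", residuals.mean())
```

Because a constant is included, the residuals average to zero; that says nothing about whether they are spatially patterned, which is what the spatial diagnostics in this lab examine.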
Regression
- Run an OLS regression analysis of child poverty and some reasonable correlates (Regress>)
- Change the output title; this helps keep your records organized when you run multiple models (e.g., OLS1)
  - Change the output title with each run, or it will overwrite the original file; it does not append to a single file
  - The output file is saved to the directory where the data are located
  - The extension is *.OLS and can be read in Wordpad or MS Word
- Specify the output format
  - The Predicted Value and Residual option is not useful with large data sets since it prints the values for each observation and, thus, creates a huge output (text) file
    - This information can be added to the data table at another point
  - The Coefficient Variance Matrix option provides the variance of the estimates (on the diagonal) and all covariances
    - Used to carry out customized tests of constraints on the model coefficients in statistical packages other than GeoDa (e.g., STATA)
  - The Moran's I z-value option reports an estimate of the spatial autocorrelation in the residuals of the model you are specifying
    - Select this option; the Moran's I value is reported automatically, but tests for statistical significance are reported only when you select this option
- Specify the regression model

Voss & Curtis

  - Dependent Variable: child poverty, SQRTPPOV (square root transformed) or PPOV, if you prefer
  - Independent Variables: What shall we explore?
  - Choose weights matrix (necessary to get spatial diagnostics): Which should we use?
  - Choose Classic model
  - Note: In GeoDa the include constant term option is checked by default; uncheck if you have reason to exclude a constant from your model (e.g., fixed effects model)
- Run the model by clicking on the Run button
- Choose Save if you want to add predicted values and residuals to the data table; this is an option only after running the model
  - If you select the OK button before you select the Save button, you will need to rerun the model to get the estimates
  - Name the variables (predicted values and/or residuals) something meaningful (e.g., OLS1_RES)
  - You will need to create a new shapefile to permanently append the new variables to your table (it is like a working file in SAS) (activate the table object>file>save to Shape File As)

Output File
- An output window automatically appears when selecting OK
- The file also can be viewed in Wordpad or MS Word; Notepad is not recommended (it can open the file, but the format is messy)
- File content:
  - Summary statistics of the model and measures of fit
  - Parameter estimates
  - Model diagnostics
- The F statistic reported in the top section is a test of the null hypothesis that all regression coefficients are jointly 0
  - Not that useful, unless your model is way off base
- 3 important statistics reported at the top for model comparisons:
  - Log likelihood: higher, better (less negative)
  - Akaike Information Criterion (AIC): lower, better (-2L + 2K)
  - Schwarz Criterion (SC): lower, better (-2L + K x ln(N))
  - where L is the log likelihood, K is the number of parameters, and ln(N) is the natural log of the number of observations

Standard Diagnostics
- Multicollinearity condition number: not a test statistic, per se, but a diagnostic to suggest problems with the stability of the regression results due to multicollinearity
  - > 30 is problematic, in general
  - Note: high values are common when interaction terms are used since the independent variables are powers and cross products of each other
  - Additional note: I have found this diagnostic to be unreliable in GeoDa, especially with small data sets; examine multicollinearity in other statistical packages (e.g., SAS)
- Normality: Jarque-Bera test
  - Chi-square distribution with 2 df
  - Tests the assumption of normality in the errors

- Heteroskedasticity is tested on three null hypotheses
  - Breusch-Pagan: assumes heteroskedasticity is a function of the squares of the explanatory variables
  - Koenker-Bassett: same as BP, except residuals are studentized (made robust to non-normality)
  - White: does not assume a specific functional form of heteroskedasticity
    - An NA is sometimes reported for this test when interactions are included in the model because all square powers and cross products are considered in this test for heteroskedasticity
- Moran's I (Error)
  - This is the global value, as reported in the scatter plot, less any explanatory value of the predictors, and is derived from the errors of the regression model
  - Usually observe some reduction (compared to original MI on the outcome)
    - What was our original statistic? How do the values compare?
  - Tests for statistical significance are not reported (i.e., NA is reported) if you did not select the Moran's I z-value option when you specified the output
- Lagrange Multiplier
  - In general, the LM is used in mathematical optimization problems and is a method for finding the local extreme values of a function of several variables subject to one or more constraints
  - Here, the LM gives some indication of which type of spatial regression model is most appropriate
  - Compare as you add predictors; do not run with the first model output
    - We are trying to eliminate spatial autocorrelation from our model and can inappropriately estimate it if we haven't exhausted the alternatives to a spatial dependence regression model
  - Error, lag, or SARMA (both lag and error)?
    - Only consider the robust LM statistics when the standard LM values are statistically significant
    - A larger LM suggests the more likely model
    - SARMA is always significant, it seems, and is not that useful in practice
      - It tends to be significant when either lag or error is indicated, not just when a higher order model is
      - The value can be compared with the standard LM values; if similar, then it is not picking up a higher order model
  - Which model is indicated? Have we exhausted other explanations? What about a trend surface or other techniques to address spatial heterogeneity?

Residuals. The predicted and residual values are appended at the end of the table if you choose this option under the Save button when specifying the regression model (open data table).

Maps
- Predicted value maps (Map>Std Dev>predicted value variable saved to table)
  - In essence, smoothed maps since the random variability due to factors other than those in the model has been smoothed out
- Residual maps (Map>Std Dev>residual value variable saved to table)

  - Give a sense of spatial autocorrelation patterns since they suggest any under- or over-prediction in sub-regions
- Quantile maps of predicted values and residual values (Map>Quantile>variable)
  - Predicted value quantile map shows where predicted poverty is higher (darker) and lower (lighter)
  - Residual value map is more intuitive, for me, and shows over-prediction (lighter) and under-prediction (darker)
  - Where is the model over-predicting? Under-predicting? Is there evidence of spatial clustering? What about the possibility of spatial regimes?

Moran Scatter Plot & LISA Map
- Run a Moran scatter plot on the residuals (Space>Univariate Moran)
  - Use the same weights matrix that you used in the regression model
  - It is purely descriptive
    - Through this approach, we are not able to obtain reliable estimates for significance tests or LISA map construction because the permutation function ignores the fact that OLS residuals are already correlated by construction
    - Still, it gives you some sense and it is usually not far off base
- Construct a LISA map (Space>Univariate LISA)
  - Use the same weights matrix that you used in the regression model
  - Again, purely descriptive, but somewhat useful in identifying geographic areas where the model does not explain the spatial distribution of the dependent variable

Things to Consider.
- What do you think is indicated by the tests for spatial dependence based on the OLS residuals in terms of what model might be a good fit for your data (error or lag)?
- How, if at all, do the diagnostics for spatial dependence differ when you use different spatial weights matrices?
- How does the patterning of positive and negative residuals in the choropleth maps of your OLS residuals relate to your model diagnostics?
- What clustering is evidenced in the residuals using LISA maps? Do you think there might be any processes or omitted variables that could help explain the clustering in the residuals?

Part 2: Spatial Regression Models

Spatial Regression. The specifications of the spatial regression model should be based on the results from the standard OLS model to make meaningful comparisons.
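One way to make such comparisons concrete is to recompute the information criteria from the log likelihood that each model reports. The sketch below assumes the usual definitions, AIC = -2L + 2K and SC = -2L + K ln(N). The function name `information_criteria` and all numbers are hypothetical and are not GeoDa output.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Return (AIC, SC) for a model with k estimated parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    sc = -2.0 * log_likelihood + k * math.log(n)
    return aic, sc

# Hypothetical log likelihoods for an OLS model and a spatial lag model
# fitted to the same n observations (illustrative values only).
ols_aic, ols_sc = information_criteria(log_likelihood=-500.0, k=4, n=1000)
lag_aic, lag_sc = information_criteria(log_likelihood=-450.0, k=5, n=1000)

print(f"OLS: AIC={ols_aic:.1f}  SC={ols_sc:.1f}")
print(f"Lag: AIC={lag_aic:.1f}  SC={lag_sc:.1f}")
print("Lower AIC:", "lag" if lag_aic < ols_aic else "OLS")
```

In this made-up case the lag model uses one more parameter but gains enough log likelihood that both criteria fall, which is the pattern to look for when comparing models fitted to the same data.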
Regression
- Run a model that you think ought to reasonably explain child poverty in the South (Regress>)
- Specify the output format
- Specify the regression model
  - Dependent Variable: child poverty, SQRTPPOV (square root transformed) or PPOV, if you prefer
  - Independent Variables: What shall we explore? (should be consistent with OLS to be compared)
  - Choose weights matrix: Which should we use? (should be consistent with OLS to be compared)
  - Run both the Spatial Error and Spatial Lag options for comparison

- Save the residuals, predicted values, and prediction errors (choose the Save button and give the variables a meaningful name)
  - Remember that you will need to create a new shapefile to permanently append the new variables to your table (it is like a working file in SAS) (activate the table object>file>save to Shape File As)

Output File
- An output window automatically appears when selecting OK
- The file content is similar to that reported for the classic OLS regression
  - Summary statistics of the model and measures of fit
  - Parameter estimates
  - Model diagnostics
- A pseudo R-squared is reported and can be compared to the OLS models, yet the log likelihood, AIC and SC are better for model comparisons
  - To review
    - Log likelihood: bigger, better (less negative)
    - AIC and SC: lower, better
- Review the autoregressive coefficient (ρ, spatial lag, or λ, spatial error)
  - Is it significant? What is the direction? Is it what you expected?
- Review the explanatory variables
  - Check the signs, significance, and magnitude
- Check model heteroskedasticity
  - Only the Breusch-Pagan test is reported (a test on random coefficients that assumes a functional form based on the squares of the explanatory variables)
  - Also, can plot the model residuals (Explore>Scatter Plot)
    - Y: residual values
    - X: predicted values
- Check the likelihood ratio test for the specified spatial form (lag or error, depending on the model)
  - This test compares the spatial model to the non-spatial alternative
  - What is missing in the GeoDa diagnostics is a direct comparison with the alternative spatial model (lag vs. error); can get this through SpaceStat, R, and, hopefully, in future versions of GeoDa
  - For now, we compare the two models on a number of different points (LL, AIC, etc.)

Predicted Values, Prediction Errors and Residuals
- Predicted Values: the estimated value of child poverty, $\hat{y} = (I - \hat{\rho} W)^{-1} X \hat{\beta}$
- Prediction Errors: the difference between the observed and predicted values of child poverty, obtained by considering the exogenous variables alone, $y - \hat{y} = (I - \hat{\rho} W)^{-1} \hat{u}$
- Residuals: estimates for the model error term, $\hat{u} = (I - \hat{\rho} W) y - X \hat{\beta}$
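The three quantities defined above can be sketched numerically for a spatial lag model. This is a toy illustration, not GeoDa's code: the weights matrix `W` describes four hypothetical areas along a line, and `rho_hat` and `beta_hat` are made-up values standing in for estimates that GeoDa would report.

```python
import numpy as np

def lag_model_quantities(y, X, W, rho_hat, beta_hat):
    """Predicted values, prediction errors, and residuals of a spatial lag model."""
    A = np.eye(len(y)) - rho_hat * W                 # (I - rho W)
    predicted = np.linalg.solve(A, X @ beta_hat)     # (I - rho W)^{-1} X beta
    prediction_error = y - predicted                 # observed minus predicted
    residuals = A @ y - X @ beta_hat                 # (I - rho W) y - X beta
    return predicted, prediction_error, residuals

# Row-standardized contiguity weights for four areas along a line (hypothetical).
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])  # constant plus one covariate
y = np.array([2.0, 3.5, 4.0, 6.0])
rho_hat = 0.5                        # made-up autoregressive coefficient
beta_hat = np.array([0.5, 0.6])      # made-up regression coefficients

predicted, prediction_error, residuals = lag_model_quantities(y, X, W, rho_hat, beta_hat)
print("predicted:        ", predicted)
print("prediction errors:", prediction_error)
print("residuals:        ", residuals)
```

The prediction errors equal the residuals passed back through the inverse of (I - ρW), which is why they stay spatially correlated while the residuals should not be.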

- Construct a univariate Moran scatter plot for the residuals and prediction errors (Space>Univariate Moran)
  - Residuals: Moran's I should be close to 0 since spatial autocorrelation has been purged from the model or, alternatively phrased, captured in the ρ or λ parameter
  - Prediction Errors: Moran's I is about the same as the original OLS MI statistic
    - This is okay since, by definition, they are spatially correlated; the prediction errors are an estimate for the spatially transformed errors
- Compare the scatter plots of the lag and error model residuals
  - What does this comparison indicate?

Things to Consider.
- Which model, given all of the information we've explored, is a better fit for our data?
- What does this model selection mean, conceptually, in terms of our outcome variable?
- What, if any, substantive information is gained through spatial regression techniques?
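The Moran's I statistic used throughout this lab can be sketched in a few lines. The example below is a descriptive calculation only, using the standard formula I = (n / S0) (z'Wz) / (z'z), where z holds deviations from the mean and S0 is the sum of all weights. It does no permutation inference. The function name `morans_i`, the weights matrix, and the input values are hypothetical.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I of `values` under spatial weights matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()                     # deviations from the mean
    n = len(z)
    s0 = W.sum()                         # sum of all weights
    return (n / s0) * (z @ W @ z) / (z @ z)

# Row-standardized contiguity weights for four areas along a line (hypothetical).
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])

smooth = morans_i([1.0, 2.0, 3.0, 4.0], W)         # neighbors have similar values
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)  # neighbors have opposite values
print("smooth gradient:", smooth)
print("alternating:    ", alternating)
```

Applied to saved residuals with the same weights matrix used in the regression, a value near 0 suggests little remaining spatial autocorrelation, while a clearly positive value suggests clustering that the model has not captured.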