
GEOGRAPHICALLY WEIGHTED REGRESSION
Multiple Hypothesis Testing: Setting critical values for local pseudo-t statistics

stewart.fotheringham@nuim.ie
http://ncg.nuim.ie
ncg.nuim.ie/gwr/

Outline

Significance tests in regression
Parameter significance in GWR
Multiple testing procedures
Critical value adjustment in GWR
Excel and R functions
Example: Georgia data

Inference in regression

It is common in regression to test whether each parameter is zero: a variable whose parameter is zero contributes nothing to explaining the variation in the response. The t statistic commonly reported on regression output tests the null hypothesis that β_k = 0 for each parameter β_k. If the reported value of t for a parameter is greater than +1.96 or less than -1.96 (the normal approximation; a Wald test), we reject the null hypothesis for that parameter and describe the parameter estimate as significant.

Critical values

Note that in this context, significant does not mean important, merely not zero. Note as well that the test statistic follows a t distribution; however, if the sample is large enough, you can use the percentage points of the normal distribution as critical values. At the 5% significance level, the two-sided critical values for various sample sizes are:

t(α=0.05, df=30)   = 2.04
t(α=0.05, df=100)  = 1.98
t(α=0.05, df=500)  = 1.97
t(α=0.05, df=1000) = 1.96
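These critical values can be checked directly in R; a minimal sketch using qt(), called with 1 - α/2 for a two-sided test:

    # Two-sided 5% critical values of the t distribution for a range of
    # degrees of freedom, compared with the normal approximation (1.96)
    alpha <- 0.05
    for (df in c(30, 100, 500, 1000)) {
      cat("df =", df, " critical value =", round(qt(1 - alpha / 2, df), 3), "\n")
    }
    qnorm(1 - alpha / 2)   # normal approximation: 1.959964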

What about GWR?

If we map the local parameter estimates we see some apparent variation. Sometimes most of the parameter estimates will be positive but a small handful are negative. This suggests that the variable's influence is positive virtually everywhere but negative in some places (and vice versa).

Local pseudo-t statistics

We can divide the local parameter estimates by their associated local standard errors to obtain local pseudo-t statistics. There will be one t statistic for each of the m+1 parameters (the intercept term has local t statistics as well) at each of the n regression points.

Critical value

This gives us n * (m+1) simultaneous tests, so can we use the percentage points of the normal distribution as the critical values? NO! If we carried out 100 independent statistical tests at the 5% significance level, we would expect about 5 of them to be significant by chance. If n = 1000 and there are 4 independent variables (so m+1 = 5 local parameters), we will be carrying out 5000 tests, and about 250 of them would be significant by chance alone (see the short calculation below). Although each individual test has a 5% false positive rate, the chance of at least one false positive across the whole family of tests is therefore far higher than 5%.

Adjusting the critical value

This means that we have to increase the critical value for a given α level; that is, we change the test-wise error rate to maintain the family-wise error rate at α. So you can't just use 1.96 or 2.58.

Adjustments

Our problem is that we have a large number of dependent tests (generally they are positively dependent). There are a number of different ways of adjusting the critical value.
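A quick check of those numbers in R, as a sketch under the simplifying assumption that the 5000 tests are independent (in GWR they are not, which is the point of the adjustments that follow):

    # Expected number of false positives and family-wise error rate (FWER)
    # when every null hypothesis is true and each test uses alpha = 0.05
    alpha  <- 0.05
    ntests <- 1000 * 5         # n regression points x (m + 1) parameters
    alpha * ntests             # expected false positives: 250
    1 - (1 - alpha)^ntests     # FWER under independence: effectively 1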

Bonferroni adjustment

A commonly encountered adjustment in multiple testing is the Bonferroni adjustment. The test-wise significance level is divided by the number of tests: β = α / (number of tests). With 5000 tests and α = 0.05, the adjusted level β is 0.00001, so instead of 1.96 you use a critical value of about 4.44.

Bonferroni

The effect of this is to make the test more stringent. The Bonferroni adjustment remains valid when the tests are dependent, but it is widely regarded as being far too conservative; that is, it rejects too few null hypotheses, so genuinely non-zero parameters can be missed.

Yoav Benjamini

Benjamini has long been interested in the multiple testing problem. He is associated with the concept of the False Discovery Rate. His work with Daniel Yekutieli deals with adjustment when the tests are dependent.

Adjustment procedures

The Benjamini-Yekutieli adjustment, and several similar procedures, are complex to operationalise (both it and the Bonferroni adjustment are illustrated in the R sketch below). We have found that there is a Bonferroni-like adjustment which gives results very similar to the B-Y test (Byrne et al., 2009).

Fotheringham adjustment

"Multiple Dependent Hypothesis Tests in Geographically Weighted Regression", presented at GeoComputation 2009 in Sydney. Stewart observed that the adjustment should be proportional to the number of degrees of freedom in the GWR model, giving a Bonferroni-like adjusted test level β computed from α, the effective number of parameters p_e, the number of regression points n, and the number of parameters p.
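Before turning to the procedure itself, the two standard adjustments just mentioned can be explored with base R. This is a sketch: the residual degrees of freedom of 995 are assumed purely for illustration, and the vector pvals is a placeholder for the n*(m+1) p-values of the local pseudo-t statistics.

    # Bonferroni critical value for 5000 two-sided tests at alpha = 0.05
    alpha  <- 0.05
    ntests <- 5000
    qt(1 - (alpha / ntests) / 2, df = 995)    # about 4.44, versus 1.96 unadjusted

    # Adjusting p-values rather than critical values
    pvals <- runif(ntests)                         # placeholder p-values
    p_bon <- p.adjust(pvals, method = "bonferroni")
    p_by  <- p.adjust(pvals, method = "BY")        # Benjamini-Yekutieli
    c(sum(p_bon < alpha), sum(p_by < alpha))       # counts declared significant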

Procedure

In this procedure, β is the adjusted test level, α is the unadjusted test level (frequently 0.05), p_e is the effective number of parameters in the model, n is the number of regression points, and p is the number of parameters to be estimated at each regression point; β is computed from these quantities as set out in Byrne et al. (2009). You can then compute the critical value with the TINV function in Excel: TINV(β, n - p_e). The resulting value will lie between 1.96 and the over-conservative value from the Bonferroni adjustment.

General procedure

Compute the adjusted critical value, t(β, n - p_e), using the TINV function in Excel (or qt() in R, or whatever). Map the parameter estimates for a variable of interest. Select those locations where the absolute value of the associated local t statistic exceeds the adjusted critical value. Create a layer from this selection - these are the locations where the parameter estimate is significant - and map this layer.

Excel and R

In Excel, TINV(0.05, 1000) returns 1.962339. In R, qt(1 - (0.05/2), 1000) returns 1.962339.

Georgia

As an example, we'll fit global and local models to the Georgia data. PctBach is the dependent variable; PctEld, PctFB, PctPov and PctBlack are the 4 independent variables in this model.

Global model

Parameter   Estimate (B)        Std Err            T                 p(B=0)
---------   ----------------    ---------------    ---------------   ------
Intercept    12.671125639718    1.636823331703      7.741291046143   0.0000
PctEld       -0.105305563762    0.137837644439     -0.763982594013   0.4461
PctFB         2.545211464154    0.298392136976      8.529753684998   0.0000
PctPov       -0.282906604601    0.078104575505     -3.622151374817   0.0004
PctBlack      0.076847513749    0.028028073689      2.741805076599   0.0068

In the global model, the t statistics suggest that all the variables except PctEld contribute to the variation in the model: those variables are significant. The critical value of t with α = 0.05 and 159 - 5 degrees of freedom is 1.98, so the normal approximation of 1.96 is reasonable.
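For reference, the global model is an ordinary least squares fit and can be reproduced in R along these lines (a sketch assuming the Georgia attribute table has been read into a data frame named georgia with the column names used above):

    # Global (OLS) model for the Georgia data; summary() reports the
    # estimates, standard errors, t statistics and p-values tabulated above
    global <- lm(PctBach ~ PctEld + PctFB + PctPov + PctBlack, data = georgia)
    summary(global)

    # Critical value used for the global t tests: alpha = 0.05, df = 159 - 5
    qt(1 - 0.05 / 2, df = 159 - 5)   # about 1.98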

GWR

There are n = 159 regression points (the centroids of the counties). There are p = 5 parameters estimated at each regression point.

Local model

The effective number of parameters in the local model is 12.359, so the effective degrees of freedom are 159 - 12.359. The test-wise level that keeps the family-wise error rate, α, at 0.05 is 0.00375, which yields a critical value of 2.95: TINV(0.00375, 159 - 12.359) => 2.95.

Selection / Select by Attributes

Label       Minimum      Lwr Quartile   Median       Upr Quartile   Maximum
--------    ---------    ------------   ---------    ------------   ---------
Intrcept    10.056587    11.026283      12.064082    13.257461      15.077528
PctEld      -0.455205    -0.286661      -0.203335    -0.116267       0.053857
PctFB        0.963658     1.594383       3.020416     3.629860       3.771199
PctPov      -0.309038    -0.264154      -0.162981    -0.078514       0.027344
PctBlack    -0.029594     0.008555       0.034712     0.089672       0.134230

PctPov seems to change sign: mostly it is negative, but there are some counties where the relationship appears, counterintuitively, to be positive. This is variable 4, so we plot PARM_4 and select those areas where abs(tval_4) is greater than 2.95. To highlight the significant counties, select those where the absolute value of the t statistic is greater than 2.946.

PctPov parameter estimates

The selected counties are highlighted: this variable is only significant in the southernmost part of the state. The values around the sign change are too small to be significant.
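As a minimal sketch of the selection step outside ArcGIS, assuming the GWR output table has been read into a data frame named gwr_out (the name is an assumption) whose columns PARM_4 and tval_4 hold the local PctPov estimates and their pseudo-t values, as in the output quoted above:

    # Flag counties where the local PctPov estimate is significant at the
    # adjusted critical value of about 2.95
    crit <- qt(1 - 0.00375 / 2, df = 159 - 12.359)    # about 2.95
    gwr_out$sig_PctPov <- abs(gwr_out$tval_4) > crit

    # How many counties are significant, and what sign do their estimates take?
    table(significant = gwr_out$sig_PctPov, sign = sign(gwr_out$PARM_4))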