GIST 4302/5302: Spatial Analysis and Modeling

Similar documents
Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Overview of Statistical Analysis of Spatial Data

Unit 2. Describing Data: Numerical

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

GIST 4302/5302: Spatial Analysis and Modeling

BNG 495 Capstone Design. Descriptive Statistics

Concepts and Applications of Kriging. Eric Krause

Concepts and Applications of Kriging

GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Concepts and Applications of Kriging. Eric Krause Konstantin Krivoruchko

GIST 4302/5302: Spatial Analysis and Modeling

P8130: Biostatistical Methods I

Statistical Inference

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

STATISTICS 1 REVISION NOTES

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture 5 Geostatistics

Lecture 2: Probability Distributions

Concepts and Applications of Kriging

Sets and Set notation. Algebra 2 Unit 8 Notes

Lecture 3. Measures of Relative Standing and. Exploratory Data Analysis (EDA)

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)

After completing this chapter, you should be able to:

2011 Pearson Education, Inc

Descriptive Data Summarization

Introduction to Statistics for Traffic Crash Reconstruction

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Descriptive Univariate Statistics and Bivariate Correlation

Math Section SR MW 1-2:30pm. Bekki George: University of Houston. Sections

Learning Objectives for Stat 225

AP Statistics Cumulative AP Exam Study Guide

Statistics for Managers using Microsoft Excel 6 th Edition

Glossary for the Triola Statistics Series

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Why Is It There? Attribute Data Describe with statistics Analyze with hypothesis testing Spatial Data Describe with maps Analyze with spatial analysis

Contents. Acknowledgments. xix

Lecture 8. Spatial Estimation

Describing Distributions With Numbers Chapter 12

Describing Distributions

MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability

REVIEW: Midterm Exam. Spring 2012

Units. Exploratory Data Analysis. Variables. Student Data

Math Review Sheet, Fall 2008

Summarizing Measured Data

CIVL 7012/8012. Collection and Analysis of Information

Math 221, REVIEW, Instructor: Susan Sun Nunamaker

Bioeng 3070/5070. App Math/Stats for Bioengineer Lecture 3

Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

ECLT 5810 Data Preprocessing. Prof. Wai Lam

Statistics I Chapter 2: Univariate data analysis

1. Exploratory Data Analysis

STAT 200 Chapter 1 Looking at Data - Distributions

AP Final Review II Exploring Data (20% 30%)

Elementary Statistics

Introduction. Semivariogram Cloud

Statistics I Chapter 2: Univariate data analysis

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

in the company. Hence, we need to collect a sample which is representative of the entire population. In order for the sample to faithfully represent t

Review of Statistics

Introduction to Basic Statistics Version 2

Describing Distributions With Numbers

Spatial Analysis 1. Introduction

Chapter 3. Data Description

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 3.1-1

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Statistics Handbook. All statistical tables were computed by the author.

CS 5014: Research Methods in Computer Science. Statistics: The Basic Idea. Statistics Questions (1) Statistics Questions (2) Clifford A.

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Section 3. Measures of Variation

Ch. 1: Data and Distributions

Spatial Regression. 1. Introduction and Review. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Sociology 6Z03 Review II

PRIME GENERATING LUCAS SEQUENCES

STAT 4385 Topic 01: Introduction & Review

Spatial Interpolation & Geostatistics

Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D.

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

MATH 10 INTRODUCTORY STATISTICS

Describing distributions with numbers

Nature of Spatial Data. Outline. Spatial Is Special

Describing distributions with numbers

GIST 4302/5302: Spatial Analysis and Modeling Point Pattern Analysis

YEAR 12 - Mathematics Pure (C1) Term 1 plan

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Determining the Spread of a Distribution

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

University of Jordan Fall 2009/2010 Department of Mathematics

Describing Distributions with Numbers

Determining the Spread of a Distribution

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE

ENGRG Introduction to GIS

MgtOp 215 Chapter 3 Dr. Ahn

Review of Multiple Regression

Transcription:

GIST 4302/5302: Spatial Analysis and Modeling Basics of Statistics Guofeng Cao www.myweb.ttu.edu/gucao Department of Geosciences Texas Tech University guofeng.cao@ttu.edu Spring 2015

Outline of This Week ˆ Review basics of statistics and probability ˆ Learning pitfalls of spatial data Guofeng Cao, Texas Tech GIST4302, Basic Statistics 2/37

Basic Concepts in Statistics Population vs. Samples ˆ Population: total set of elements/measurements that could be (hypothetically) observed in a study, e.g., all U.S. college students ˆ Sample: subset of elements/measurements from population, e.g., surveyed college students in Texas Tech Population Parameters vs. Sample Statistics ˆ Parameters: summary measures that describe a population variable, e.g., average age of college students in Texas Tech. ˆ Statistics: summary measures that describe a sample variable, e.g., average age of surveyed Tech students Guofeng Cao, Texas Tech GIST4302, Basic Statistics 3/37

Statistical Procedures I Statistical Sampling ˆ procedure of getting a representative sample of a population, e.g., a random visit of all U.S. colleges ˆ random sample: sample in which every individual in population has same chance of being included ˆ preferential sampling: sample in which certain individuals in population has higher chance of being included ˆ Law of large numbers and central limit theorem Sample average should be close to the expected value given a large number of trials Sample mean approaches the normal distribution (under regular conditions) Guofeng Cao, Texas Tech GIST4302, Basic Statistics 4/37

Statistical Procedures II Statistics ˆ Descriptive statistics procedure of determining sample statistics, e.g., determination of the average student age of all randomly visited colleges ˆ Statistical inference procedure of making statements regarding population parameters from sample statistics, e.g., average student age of all randomly visited colleges = average age of college students in the U.S.? ˆ Statistical estimate best (educated) guess about the value of a population parameter ˆ Hypothesis testing procedure of determining whether sample data support a hypothesis that species the value (or range of values) of a certain population parameter Guofeng Cao, Texas Tech GIST4302, Basic Statistics 5/37

Histogram ˆ An Example: Consider a list of 10 hypothetical sample values: 2 2 9 8 7 9 5 6 8 3 Count of sample values within each class count 0.0 0.5 1.0 1.5 2.0 1 2 3 4 5 6 7 8 9 10 class ˆ Relative frequency table: p k = # of data in k-th class/(total # of data) k 1 2 3 4 5 6 7 8 9 p k 0.2 0.1 0.0 0.1 0.1 0.1 0.2 0.2 0.0 ˆ Please note: Histogram shape depends on number and width of classes; rule of thumb for number of classes: 5 log 10 (# of data) and use non-overlapping equal intervals Guofeng Cao, Texas Tech GIST4302, Basic Statistics 6/37

0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.0 0.2 0.4 0.6 0.8 Histogram Shape Characteristics ˆ Peaked or not Peaked Non Peaked 0 50 100 150 200 0 20 40 60 80 100 120 2 0 2 4 0.0 0.2 0.4 0.6 0.8 1.0 ˆ Numbers of peaks 0 50 100 150 200 0 50 100 150 200 3 2 1 0 1 2 3 4 2 0 2 4 6 8 10 ˆ Symmetric or not 0 50 100 150 0 50 100 150 0 50 100 150 200 250 300 350 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 7/37

Cumulative Histogram ˆ Ranked sampled data and their relative frequency k 1 2 3 4 5 6 7 8 9 p k 0.2 0.1 0.0 0.1 0.1 0.1 0.2 0.2 0.0 ˆ Cumulative relative frequency Empirical CDF probability 0.0 0.2 0.4 0.6 0.8 1.0 2 3 4 5 6 7 8 9 n:10 m:0 x Proportion of sample values less than, or equal to, any given cuto value Probability that any random sample is no greater than and given cuto value Guofeng Cao, Texas Tech GIST4302, Basic Statistics 8/37

Quantiles Denition: ˆ datum value x p corresponding to specic cumulative relative frequency value p Quantiles probability 0.0 0.2 0.4 0.6 0.8 1.0 ˆ Commonly used quantiles: 3 2 1 0 1 2 3 n:1000 m:0 x min: x 0.0, lower quantiles: x 0.25, median: x 0.50, upper quantile: x 0.75, max: x 1.00 Percentiles: x 0.01, x 0.02,..., x 0.99 Deciles: x 0.10, x 0.20,..., x 0.90 ˆ Quantiles are not sensitive to extreme values (outliers) Guofeng Cao, Texas Tech GIST4302, Basic Statistics 9/37

Measure of Central Tendency ˆ mid-range: arithmetic average of highest and lowest values: x max +x min 2 ˆ mode: most frequently occuring values in data sets ˆ median: datum value that divides data set into halves; also dened as 50-th percentiles: x 0.5 ˆ mean: arithmatic average of values in data set sample mean: m = x = 1 n n x=1 x i population mean: µ = 1 N N x=1 x i sample mean is an esimation of population mean ˆ Note: Most appropriate measure of central tendency depends on distribution shapes Guofeng Cao, Texas Tech GIST4302, Basic Statistics 10/37

Measure of Dispersion I ˆ range: dierence between highest and lowest values: x max x min ˆ interquantile range (IQR): dierence between upper and lower quantiles: x 0.75 x 0.25 ˆ mean absolute derivation from mean: averange absolute dierence 1 between each datum value and the mean: n n i=1 x i x ˆ median absolute derivation from median: median absolute dierence between each datum value and the median: x i x 0.5 0.5 ˆ variance: average squared dierence between any datum values and the mean: sample variance: s 2 = 1 n 1 n i=1 (x i m) 2 population variance: σ 2 = 1 N N i=1 (x i µ) 2 sample variances is an estimate of the population variance Guofeng Cao, Texas Tech GIST4302, Basic Statistics 11/37

Measure of Dispersion II ˆ variance: alternative denition: dierence between average squred data and the mean squared sample variance: s 2 = 1 n 1 n i=1 x 2 i n n 1 m2 population variance: σ 2 = 1 N N i=1 x 2 i µ 2 Note: variance is expressed in squared data units ˆ standard deviation: square root of variance s or σ unit of standard deviation is same as the data ˆ coecient of variation: ratio of standard deviation and the mean s sample coecient: m σ population coecient: µ coecient of variation is unitless ˆ choose alternative measures of dispersion: any summary statistic involving squared values is senstive to outliers any summary statistic based on quantiles is robost to outliers coecient of variation: very use for comparing spread of dierent data sets Guofeng Cao, Texas Tech GIST4302, Basic Statistics 12/37

Normalizing Data Normalizing data to zero mean and unit variance allows more meaningful comparison of dierent data sets ˆ Normalizing procedure: 1. compute mean m and standard deviation s of data seta 2. subtract the mean from each datum: x i m 3. divide by the standard deviation: z i = x i m s ˆ normlized data are unit free; shape of distibution does not change (e.g., modes remain the same) 0 50 100 150 Original histogram 4 0 2 4 6 8 0 50 150 250 Normalized histogram 2 1 0 1 2 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 13/37

Quantile-Quantile (Q-Q) Plots Graph for comparing the shapes of distribution ˆ Normalizing procedure: 1. rank both data sets from smallest to largest values 2. compute quantiles of each data set 3. cross-plot each quantile pair 3 2 1 0 1 2 3 3 2 1 0 1 2 3 Normal Q Q Plot Theoretical Quantiles Sample Quantiles ˆ Interpretation: straight plot aligned with 45 line implies two similar distribution Guofeng Cao, Texas Tech GIST4302, Basic Statistics 14/37

Boxplot Graph for describing the the degree of dispersion and skewness and identify outliers ˆ Non-parametric ˆ 25%, 50%, and 75% percentiles ˆ end of the hinge (whisker) could mean dierently; most ofen represent the lowest datum within 1.5 IQR of the lower quantile, and the highest datum still within 1.5 IQR of the upper quantile 5 10 15 20 25 30 35 0.5.OJ 1.OJ 2.OJ 0.5.VC 1.VC 2.VC ˆ Points outside of range are usually taken as outliers Guofeng Cao, Texas Tech GIST4302, Basic Statistics 15/37

Statistical Experiment and Events ˆ Statistical experiment process in which one outcome from a set of possible outcomes occurs (also known as random trial), e.g., sampling n data from a population is a collection of n statistical experiments ˆ Elementary outcome ˆ Event the outcome E of a statistical experiment, e.g., age of a single student in GIST 4302, or rain on a particular day a collection of k elementary outcomes A = {E 1, E 2,..., E k } of interests, e.g., all male GIST 4302 students ˆ Random variables Don't have single, xed values; it can take on a set of possible dierent values, each with an associated probability Guofeng Cao, Texas Tech GIST4302, Basic Statistics 16/37

Probability II ˆ Relative frequency denition: if a statistical experiment is repeated N times, and event A occurs in n of these trials, then the probability for A to occur is: P(A) = Prob{A} = n, as N tends to innity N e.g., the probability for a wet-day event in Lubbock can be seen as the proportion of wet-days in a very long precipitation record ˆ Axioms of probability: probabilities are necessarily non-negative: P(A) 0 the sample space S will certainly occur: P(S) = 1 for two mutually execluive events A and B: P(A B) = P(A) + P(B) ˆ Elementary probability theorem the impossible event has zero probability of occurrence: P( ) = 0 the probability of the complement event Ā of an event A is: P(Ā) = 1 P(A) the probability of an event A to occur cannot be greater than one: P(A) 1 the probability of either two events A and B to occur is: P(A b) = P(A) + P(B) P(A B) Guofeng Cao, Texas Tech GIST4302, Basic Statistics 17/37

Probability III ˆ Example: wet or dry day in the 10 days of period, wet day event A = 1, dry day event A = 0 day i 1 2 3 4 5 6 7 8 9 10 event a i 1 1 0 0 1 1 0 0 0 0 P{A = 1} = 4 10 = 0.4 ˆ Example: precipitation x in the 10 days of period, an binary event a i = 1 if associated x i 4 otherwise a i = 0 day i 1 2 3 4 5 6 7 8 9 10 precip i 6 0 3 0 0 2 1 5 6 5 event a i 0 1 1 1 1 1 1 0 0 0 P{a = 1} = 6 10 = 0.6 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 18/37

Commonly Used Probability Distributions ˆ Gaussian (or normal) distribution N(10,3) y 0.00 0.04 0.08 0.12 1 4 7 10 13 16 19 x ˆ The shapes are controlled by mean (µ) and variance (σ 2 ) ˆ Three sigma rule (68 95 99.7 rule) Guofeng Cao, Texas Tech GIST4302, Basic Statistics 19/37

Conditional Probability ˆ Example: Consider n = 10 days of precipitation (X ) and daily max temperature (Y ) records for a certain place: day i 1 2 3 4 5 6 7 8 9 10 precip x i 6 0 3 0 0 2 1 5 6 5 temp y i 15 18 20 19 22 24 21 21 20 18 ˆ Convert the previous records to binary events by a i = 1 if x i > 0, 0 if not and b i = 1 if y i > 20, 0 if not day i 1 2 3 4 5 6 7 8 9 10 precip x i 1 0 1 0 0 1 1 1 1 1 temp y i 0 0 0 0 1 1 1 1 0 0 ˆ Joint probability: P(A, B) = 1 n 10 i=1 a i b i = 3 10 = 0.3 ˆ Conditional probability: P(A B) = P(A,B) P(B) = 0.3 0.4 = 0.75 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 20/37

Independence Independence events ˆ two events A and B are independent i: P(A B) = P(A) ˆ knowledge of event B will not aect the probability of event A to occur ˆ in previous example, P(A B) = 0.75 = 0.7 = P(A) Alternatively ˆ two events A and B are independent i: P(A, B) = P(A)P(B) ˆ joint probability P(A, B) equals the product of individual occurence probability P(A)P(B) ˆ in previous example, P(A, B) = 0.3 = 0.7 0.4 = P(A)P(B) Guofeng Cao, Texas Tech GIST4302, Basic Statistics 21/37

Covariance and Correlation Coecient Suppose X and Y are two random variables for a random experiment ˆ the covariance of X and Y measures how much these two random variables are related cov(x, Y ) = E [(X E (X )(Y E (Y )))] ˆ The correlation coecient of X and Y a normalized version of covariance cov(x,y ) cor(x, Y ) = σ X σ Y ˆ cov(x, Y ) = 0 means X and Y are 'unrelated' Guofeng Cao, Texas Tech GIST4302, Basic Statistics 22/37

p-value ˆ Assuming the null hypothesis is true, the p-value is the probability a test statistics at least as extreme as the one that was actually observed y 0.00 0.04 0.08 0.12 1 4 7 10 13 16 19 x Guofeng Cao, Texas Tech GIST4302, Basic Statistics 23/37

Spatial Versus Non-Spatial Statistics Classical statistics ˆ samples assumed realizations of independent and identically distributed random variables (iid) ˆ most hypothesis testing procedures call for samples from iid random variables ˆ problems with inference and hypothesis testing in a spatial setting Spatial statistics ˆ multivariate statistics in a spatial/temporal context: each observation is viewed as a realization from a dierent random variable, but such random variables are auto-correlated in space and/or time ˆ each sample is not an independent piece of information, because precisely it is redundant with other samples (due to the corresponding random variables being auto-correlated) ˆ auto- and cross-correlation (in space and/or time) is explicitly accounted for to establish condence intervals for hypothesis testing Guofeng Cao, Texas Tech GIST4302, Basic Statistics 24/37

Some Issues Specic to Spatial Data Analysis Spatial dependency ˆ values that are closer in space tend to be more similar than values that are further apart (Tobler's rst law of Geography) ˆ redundancy in sample data = classical statistical hypothesis testing procedures not applicable ˆ positive, zero, and negative spatial correlation or dependency The modied areal unit problem (MAUP) ˆ spatial aggregations display dierent spatial characteristics and relationships than original (non-averaged) values ˆ scale and zoning (aggregation) eects Ecological fallacy ˆ problem close related to the MAUP ˆ relationships established at a specic level of aggregation (e.g., census tracts) do not hold at more detailed levels (e.g., individuals) Guofeng Cao, Texas Tech GIST4302, Basic Statistics 25/37

Spatial Dependency (I) ˆ often termed as spatial similarity, spatial correlation and spatial pattern, spatial pattern, spatial texture... ˆ Examples of synthetic maps with same histogram: 80 3 2 80 2 60 40 1 0 1 60 40 1 0 20 2 20 1 0 0 20 40 60 80 3 0 0 20 40 60 80 2 4 3 80 60 40 2 0 80 60 40 2 1 0 1 20 2 20 2 0 0 20 40 60 80 0 0 20 40 60 80 3 1 0.5 0 3 2 1 0 1 2 3 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 26/37

Spatial Dependency (II) Spatial statistics ˆ inference of spatial dependency is the core of spatial statistics spatial interpolation, e.g., kriging family of methods spatial point pattern analysis spatial areal units (regular or irregular) ˆ often extended into a spatio-temporal domain to investigate the dynamic phenomena and processes, e.g., land use and land cover changes Guofeng Cao, Texas Tech GIST4302, Basic Statistics 27/37

The Modied Areal Unit Problem The same basic data yield dierent results when aggregated in dierent ways ˆ First studied by Gehlke and Biehl (1934) ˆ Applies where data are aggregated to areal units which could take many forms, e.g., postcode sectors, congressional district, local government units and grid squares. ˆ Aects many types of spatial analysis, including clustering, correlation and regression analysis. ˆ Example: Gerrymandering of congressional districts (Bush vs. Gore, Lincoln vs. Douglas) ˆ Two aspects of this problem: scale eect and zoning (aggregation) eect Guofeng Cao, Texas Tech GIST4302, Basic Statistics 28/37

The Modied Areal Unit Problem: Scale Eect (1) Scale eect Analytical results depending on the size of units used (generally, bigger units lead to stronger correlation) Example Table : spatial variable #1 versus spatial variable #2 87 95 72 37 44 24 40 55 55 38 88 34 41 30 26 35 38 24 14 56 37 34 08 18 49 44 51 67 17 37 55 25 33 32 59 54 72 75 85 29 58 30 50 60 49 46 84 23 21 46 22 42 45 14 19 36 48 23 8 29 38 47 52 52 22 48 58 40 46 38 35 55 Table : ρ(v 1, v 2) = 0.83 90 80 70 60 50 v2 40 30 20 10 0 0 10 20 30 40 50 60 70 80 90 100 v1 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 29/37

The Modied Areal Unit Problem: Scale Eect (2) Scale eect Analytical results depending on the size of units used (generally, bigger units lead to stronger correlation) Example Table : spatial aggregation strategy # 1 91.0 47.5 35.5 35.0 46.5 40.0 54.5 46.5 30.5 35.5 59.0 32.5 34.0 61.0 31.0 13.0 27.0 56.5 73.5 55.0 33.5 27.5 42.5 49.0 57.0 47.5 32.0 35.5 52.0 42.0 44.0 53.5 29.5 18.5 35.0 45.0 Table : ρ(v 1, v 2) = 0.90 80 70 60 50 40 30 20 10 10 20 30 40 50 60 70 80 90 100 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 30/37

The Modied Areal Unit Problem: Zoning Eect Zoning eect Analytical results depending on how the study area is divided up, even at the same scale Example Table : spatial aggregation strategy #2 63.5 75 63.5 37.5 66 29.0 27.5 43 31.5 34.5 23 21 52.0 34.5 42 49.5 38.0 45.5 61.0 67.5 67.0 37.5 71.0 26.5 20.0 41.0 35.0 32.5 26.5 21.5 48.0 43.5 49.0 45.0 28.5 51.5 Table : ρ(v 1, v 2) = 0.94 80 70 60 50 40 30 20 20 30 40 50 60 70 80 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 31/37

The Modied Areal Unit Problem: Zoning Eect Zoning eect: another example Figure : Image Courtesy of OpenShaw Guofeng Cao, Texas Tech GIST4302, Basic Statistics 32/37

Ecological Fallacy (I) ˆ relationships established at a specic level of aggregation do not hold at more detailed levels Example Table : spatial aggregation strategy # 1 91.0 47.5 35.5 35.0 46.5 40.0 54.5 46.5 30.5 35.5 59.0 32.5 34.0 61.0 31.0 13.0 27.0 56.5 73.5 55.0 33.5 27.5 42.5 49.0 57.0 47.5 32.0 35.5 52.0 42.0 44.0 53.5 29.5 18.5 35.0 45.0 Table : ρ(v 1, v 2) = 0.90 80 70 60 50 40 30 20 10 10 20 30 40 50 60 70 80 90 100 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 33/37

Ecological Fallacy (II) ˆ relationships established at a specic level of aggregation do not hold at more detailed levels Example Table : spatial variable #1 versus spatial variable #2 95 87 37 72 24 44 55 40 38 55 34 88 30 41 35 26 24 38 56 14 34 37 18 08 44 49 67 51 37 17 25 55 32 33 54 59 72 75 85 29 58 30 50 60 49 46 84 23 21 46 22 42 45 14 19 36 48 23 8 29 38 47 52 52 22 48 58 40 46 38 35 55 Table : ρ(v 1, v 2) = 0.21 90 80 70 60 50 40 30 20 10 0 0 10 20 30 40 50 60 70 80 90 100 Guofeng Cao, Texas Tech GIST4302, Basic Statistics 34/37

Stages in Spatial Statistical Analysis Exploratory analysis ˆ explore spatial data using cartographic (or other visual) representations ˆ statistical analysis for detecting possible sub-populations, outliers, trends, relationships with neighboring values or other spatial variables Modeling or conrmatory analysis ˆ establish parametric or non-parametric model(s) characterizing attribute spatial distribution ˆ estimate model parameters from data; evaluate their statistical signicance; predict attribute values at other locations and/or future time instants Notes ˆ boundaries between above stages not always clear-cut, and it is often an iterative process Guofeng Cao, Texas Tech GIST4302, Basic Statistics 35/37

Software for Statistical Analysis of Spatial Data GIS-based packages ˆ ESRI's Spatial Analyst, Geostatistical Analyst, Spatial Statistics ˆ opt for close or loose coupling with specialized external packages when specic functionalities are missing from a GIS Statistical packages ˆ R packages, Matlab (new class will be available this Fall!) ˆ GeoDa/PySAL ˆ versatile in modeling, programable Guofeng Cao, Texas Tech GIST4302, Basic Statistics 36/37

Acknowledgement ˆ Some slides of the the materials are based on Dr. Phaedon Kyrikidis's classes in University of California, Santa Barbara Guofeng Cao, Texas Tech GIST4302, Basic Statistics 37/37