David Tenenbaum GEOG 090 UNC-CH Spring 2005


Statistical Thinking, Data Types, and Geographical Primitives

- The scientific method in geography, two kinds of approaches, and the sorts of statistics used to support those approaches
- Some characteristics of data and considerations when collecting measurements, making observations and using data
- The geographical primitives that are often used to generate measurements in geography from the attributes of geographical features

Applying the Scientific Method

Both physical scientists and social scientists (in our context, a.k.a. physical and human geographers) often make use of the scientific method in their attempts to learn about the world.

[Figure: diagram of the scientific method linking Description, Concepts, Hypothesis, Theory, Laws, and Model through the steps organize, formalize, validate, and surprise]

Two Sorts of Approaches

The scientific method gives us a means by which to approach the problems we wish to solve. The core of this method is the forming and testing of hypotheses; very loosely, a hypothesis is a potential answer to a question. Geographers use quantitative methods in the context of the scientific method in at least two distinct fashions:

- Exploratory methods of analysis focus on generating and suggesting hypotheses.
- Confirmatory methods are applied in order to test the utility and validity of hypotheses.

Two Sorts of Statistics for Two Approaches

Statistics can be divided into two major types, each most useful in the context of one of the two approaches in geography:
- Descriptive statistics tend to be useful in the context of exploratory approaches, because their function is primarily to summarize a dataset in a way that emphasizes some of its characteristics.
- Inferential statistics are applied in the context of confirmatory approaches, because their function is to test the veracity and validity of an idea or inference.

Descriptive Statistics

Descriptive statistics provide an organization and summary of a dataset: a small number of summary measures replaces the entirety of a dataset. For example, suppose a TV station's weather bureau records the temperature on an hourly basis each day, giving 24 values. Rather than reporting all 24 values, they usually tell you the high and low temperatures for the day (and possibly the range and an average value as well).
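A minimal Python sketch of this kind of daily summary follows. It assumes the day's data arrive as a list of 24 hourly readings. The readings are made-up values for illustration, not real station data, and `summarize_day` is a hypothetical helper name.

```python
def summarize_day(hourly_temps):
    """Replace a day's 24 hourly readings with a few descriptive statistics."""
    if len(hourly_temps) != 24:
        raise ValueError("expected 24 hourly readings")
    high = max(hourly_temps)
    low = min(hourly_temps)
    return {
        "high": high,
        "low": low,
        "range": high - low,                            # spread between the extremes
        "mean": sum(hourly_temps) / len(hourly_temps),  # arithmetic average
    }

# Hypothetical hourly temperatures (degrees C), midnight to 11 pm
hourly = [12, 11, 11, 10, 10, 10, 11, 13, 15, 17, 19, 21,
          22, 23, 24, 24, 23, 21, 19, 17, 16, 15, 14, 13]
summary = summarize_day(hourly)
print(summary)
```

Four numbers now stand in for 24. The summary no longer records, for instance, the hour at which the high occurred.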

In the act of summarizing a dataset using descriptive statistics, there is necessarily some loss of information. The decisions made about how to summarize the salient aspects of a dataset create the potential to give a misleading impression of the data. The statistician can distort the information contained in the dataset through the selection of particular descriptive statistics, or by aggregating the data in such a way that the statistics are misleading.

The temptation for a scientist to select descriptive methods that emphasize their own notions about their datasets can be very strong indeed. It would be nice to approach every problem with an entirely open mind. In reality, however, scientists almost always have some preconceived notions about what they expect to find in their data. As a result, there is a tendency to select the statistical measures that most strongly convey the pattern expected a priori.

The Nature of Statistics

Statistical methods are designed to derive conclusions from empirical data obtained through observation. Mathematics operates using deductive reasoning, whereas statistics relies on inductive reasoning. Statistical approaches are used to extrapolate conclusions that apply to more than just the limited set of available observations. An example is when someone infers some truth about the parent population from a small poll.
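To make the poll example concrete, here is a sketch of one standard inferential step. It estimates a population proportion from a sample and attaches an approximate 95% confidence interval using the normal approximation. The poll counts are hypothetical, and `proportion_ci` is an illustrative name.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Sample proportion and approximate confidence interval (normal approximation).

    z = 1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat, (p_hat - z * se, p_hat + z * se)

# Hypothetical poll: 240 of 400 respondents answer "yes"
p_hat, (lower, upper) = proportion_ci(240, 400)
print(f"sample: {p_hat:.2f}, population estimate: {lower:.3f} to {upper:.3f}")
```

The interval is a statement about the parent population that reaches beyond the 400 observations actually collected. This is the inductive step described above.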

Some Characteristics of Data

Not all data are the same. Depending on the characteristics of a particular dataset, there are limits to what can and cannot be done with it. Some key characteristics that must be considered are:
A. Scale of Measurement
B. Continuous vs. Discrete
C. Grouped vs. Individual

A. Scales of Measurement

Data (the plural of datum) are generated by recording measurements. When the measure is qualitative, measurement involves categorizing an item (i.e. assigning the item to one of a set of types). Alternatively, it uses a number to give something a quantitative measurement.

The data used in statistical analyses can be divided into four types:
1. The Nominal Scale
2. The Ordinal Scale
3. The Interval Scale
4. The Ratio Scale

As we progress through these scales, the types of data they describe have increasing information content.

The Nominal Scale

Nominal scale data are data that can simply be broken down into categories, i.e. having to do with names or types.
- Dichotomous or binary nominal data have just two types, e.g. yes/no, female/male, is/is not, hot/cold, etc.
- Multichotomous data have more than two types, e.g. vegetation types, soil types, counties, eye color, etc.

Nominal data do not form a scale, because the categories cannot be ranked or ordered (there is no greater than or less than).

The Ordinal Scale

Ordinal scale data can be categorized and can also be placed in an order. The categories can be assigned a relative importance and ranked, so that numerical category values have some meaning. For example, in star-system restaurant rankings, 5 stars > 4 stars, 4 stars > 3 stars, and 5 stars > 3 stars. BUT ordinal data are still not scalar, because differences between categories do not have a quantitative meaning. A 5-star restaurant is not necessarily superior to a 4-star restaurant by the same amount that a 4-star restaurant is superior to a 3-star one.

The Interval Scale

When using the interval scale, a numerical category name carries a meaningful quantity. We can therefore assess not only that A > B, but also how much greater A is than B (A - B). To put it another way, the units of the scale can be used here, e.g. temperature scales, elevation, etc. So we can meaningfully look at the difference between two interval scale observations. BUT we still cannot meaningfully multiply or divide them, because the value of zero is arbitrary.

The Ratio Scale

The ratio scale is similar to the interval scale, but it adds a meaningful zero value. This allows us to compare values using multiplication and division, e.g. precipitation, weights, heights, etc. Precipitation is a ratio scale measurement, so we can say that 2 inches of rain is twice as much rain as 1 inch. By contrast, 2 degrees Celsius is not twice as warm as 1 degree Celsius, because 0 degrees Celsius does not denote a total absence of warmth (degrees Celsius is an interval scale).
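The rules in the four scale descriptions above can be encoded as a small lookup table, shown here as a sketch. The `Scale` enum and the `MIN_SCALE` table are an illustrative encoding, not a standard library. The Celsius example is checked by converting to kelvins, a ratio scale whose zero is absolute zero.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Scales of measurement, ordered by increasing information content."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest scale at which each kind of comparison is meaningful.
MIN_SCALE = {
    "equality": Scale.NOMINAL,     # same category or not
    "order": Scale.ORDINAL,        # greater than / less than
    "difference": Scale.INTERVAL,  # A - B
    "ratio": Scale.RATIO,          # A / B, "twice as much"
}

def is_meaningful(scale, operation):
    """True if `operation` is meaningful for data measured on `scale`."""
    return scale >= MIN_SCALE[operation]

def celsius_to_kelvin(c):
    return c + 273.15

naive_ratio = 2.0 / 1.0  # treats degrees Celsius as if it were a ratio scale
true_ratio = celsius_to_kelvin(2.0) / celsius_to_kelvin(1.0)  # about 1.0036
print(is_meaningful(Scale.INTERVAL, "ratio"), naive_ratio, round(true_ratio, 4))
```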

B. Continuous vs. Discrete Data

Continuous data can take any value (i.e. real numbers); for example, 1, 1.43, and 1¾ are all acceptable values. A geographic example would be a distance measured between two points, which comes from the interval or ratio scale.

Discrete data consist only of separate values, and the numbers in between those values are not defined (i.e. whole or integer numbers), e.g. 1, 2, 3. The number of people who have malaria would be a discrete value.

C. Grouped vs. Individual Data

The distinction between individual and grouped data is somewhat self-explanatory, but the issue concerns the effects of grouping data. For example, in census data we might find a mean value of family income for some level of census geography (like a census tract or county). A family income value is collected for each household (individual data). For the purpose of analysis, however, those values are transformed into a set of classes (e.g. <$10K, $10-20K, >$20K).

In grouped data, the raw individual data are categorized into several classes and then analyzed. A measure of central tendency (like our mean value for a census tract) can then be calculated from the central value of each class and the frequency of each class interval. Grouping the data in this way has the potential to introduce a significant distortion. Grouping always reduces the amount of information contained in the data.
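The distortion can be seen with a small, made-up example. The sketch below compares the mean of the individual values with the mean reconstructed from class midpoints and frequencies. The incomes and the closed classes ($0-10K, $10-20K, $20-30K) are hypothetical. Closed classes are used because an open-ended class such as >$20K has no midpoint.

```python
def grouped_mean(values, class_edges):
    """Estimate the mean from grouped data: class midpoints weighted by class frequencies.

    class_edges like [0, 10, 20, 30] define classes [0, 10), [10, 20), [20, 30);
    values outside the edges are ignored.
    """
    weighted_sum = 0.0
    total_freq = 0
    for lower, upper in zip(class_edges[:-1], class_edges[1:]):
        freq = sum(1 for v in values if lower <= v < upper)
        weighted_sum += freq * (lower + upper) / 2  # midpoint stands in for every member
        total_freq += freq
    return weighted_sum / total_freq

# Hypothetical household incomes in $K, bunched toward the top of each class
incomes = [9, 9, 8, 19, 18, 19, 29, 28]
individual_mean = sum(incomes) / len(incomes)
estimate = grouped_mean(incomes, [0, 10, 20, 30])
print(individual_mean, estimate)
```

Every household is moved to its class midpoint, so in this case the grouped estimate (13.75) understates the individual mean (17.375). With differently distributed data, the error could go the other way.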

Basic Issues in Data Collection

- The reliability of measurements is a key consideration in data collection: is bias being introduced?
- The validity of your data also needs to be considered: is your instrument measuring what it claims to be measuring?
- What is the precision of your instrument? How exact are its measurements?
- What is your instrument's level of accuracy? Is it calibrated properly, or is it introducing bias?

Precision and Accuracy

These related concepts are often confused:
- Precision refers to the exactness associated with a measurement (i.e. repeated measurements are closely clustered).
- Accuracy refers to the extent of systematic bias in the measurement process (i.e. measurements are centered on the true value).

[Figure: four target diagrams of repeated measurements (x): precise & accurate; precise & inaccurate; imprecise & accurate; imprecise & inaccurate]
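Here is one way to put numbers on the two ideas, offered as a sketch. Accuracy is measured as the bias of the mean reading relative to a known true value. Precision is measured as the standard deviation of repeated readings. The readings and the true value of 20.0 are hypothetical.

```python
from statistics import mean, stdev

def bias_and_spread(readings, true_value):
    """Return (bias, spread): bias measures accuracy, spread measures precision."""
    bias = mean(readings) - true_value  # systematic offset from the truth
    spread = stdev(readings)            # sample standard deviation of repeats
    return bias, spread

TRUE_VALUE = 20.0
# Hypothetical instrument A: tightly clustered but offset (precise & inaccurate)
bias_a, spread_a = bias_and_spread([22.1, 22.0, 21.9, 22.0], TRUE_VALUE)
# Hypothetical instrument B: scattered but centred (imprecise & accurate)
bias_b, spread_b = bias_and_spread([18.0, 22.0, 19.0, 21.0], TRUE_VALUE)
print(f"A: bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"B: bias={bias_b:.2f}, spread={spread_b:.2f}")
```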

Geographic Primitives

In geography applications, the observations we use are derived from characteristics of a geographic feature that has been mapped:
- Point features have only a location, with no length or area. For example, on campus the Old Well and the flag pole are well represented as point features.
- Line features have several locations strung out in sequence along the line, and are too narrow for their width to be represented, e.g. roads, rivers, etc.
- Area features consist of one or more lines that form a loop, e.g. a shoreline enclosing a lake.

- A point is specified by a pair of (x,y) coordinates, representing a feature that is too small to have length and area.
- A line is formed by joining two points, representing a feature too narrow to have area.
- A polygon (area) is formed by joining multiple points that enclose an area.

[Figure: a point, a line, and a polygon (area), each built from (x,y) coordinate pairs]
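As a sketch of how these primitives might be held in a program, the classes below store nothing but (x, y) coordinate pairs, as the definitions above suggest. The class names and the `dimension` attribute are illustrative assumptions, not the data model of any particular GIS.

```python
from dataclasses import dataclass
from typing import Tuple

Coord = Tuple[float, float]  # an (x, y) pair

@dataclass(frozen=True)
class Point:
    xy: Coord
    dimension = 0  # class attribute (not a field): no length or area

@dataclass(frozen=True)
class Line:
    vertices: Tuple[Coord, ...]
    dimension = 1  # length but no area

    def __post_init__(self):
        if len(self.vertices) < 2:
            raise ValueError("a line needs at least two points")

@dataclass(frozen=True)
class Polygon:
    ring: Tuple[Coord, ...]  # vertices in order; the closing edge to the first is implied
    dimension = 2  # encloses an area

    def __post_init__(self):
        if len(self.ring) < 3:
            raise ValueError("a polygon needs at least three points")

well = Point((0.0, 0.0))
road = Line(((0.0, 0.0), (5.0, 2.0), (9.0, 2.5)))
lake = Polygon(((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)))
```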

Geographic Primitives - Points

Points are sometimes used to denote a location. When used in the sense of a Euclidean point, they are 0-dimensional (having no width, length, area, etc.). However, points are more often used in the sense of a centroid. Here a point approximates the center of something that does have an extent but can be adequately approximated as 0-D, e.g. the North Pole, the Old Well, or the geographic center of the United States.
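When a point stands in for an area, its location is often the polygon's centroid. Below is a minimal sketch of the standard area-weighted centroid of a simple (non-self-intersecting) polygon. `polygon_centroid` is a hypothetical helper. The L-shaped example shows that the centroid is not simply the average of the vertices.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon.

    vertices: (x, y) pairs in order (either direction), first vertex not repeated.
    """
    twice_area = 0.0  # signed area times 2 (shoelace sum)
    sx = sy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sx += (x0 + x1) * cross
        sy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    # C = (1 / (6A)) * sum(...), and 6A = 3 * twice_area
    return sx / (3 * twice_area), sy / (3 * twice_area)

l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
centroid = polygon_centroid(l_shape)
vertex_mean = (sum(x for x, _ in l_shape) / 6, sum(y for _, y in l_shape) / 6)
print(centroid, vertex_mean)
```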

[Figure: Pond Branch Catchment (control), color infrared digital orthophotography]

Soil Moisture Sampling Method

25 samples were taken using a random walk within a circle 5 meters in diameter. The samples were measured with a ThetaProbe soil moisture sensor. The sensor measures the impedance of its sensing rod array, which is a function of the soil's moisture content.

[Figure: sample locations (+) within the 5 meter diameter circle]

Geographic Primitives - Lines

Lines are primarily used to show the length of a feature or the linkages between features. The key information we extract from linear features is distances of various sorts along them, although measures of their sinuosity are also sometimes of interest. Examples include the length of a river, the distance between two cities, or the degree to which a river meanders.
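Here is a sketch of both measures for a line stored as a sequence of (x, y) vertices in planar coordinates. Sinuosity is taken here, as is common, to be the length along the line divided by the straight-line distance between its endpoints. The river coordinates are invented for illustration.

```python
import math

def polyline_length(points):
    """Sum of straight segment lengths between consecutive vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def sinuosity(points):
    """Length along the line divided by the endpoint-to-endpoint distance (1 = straight)."""
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("endpoints coincide; sinuosity is undefined")
    return polyline_length(points) / straight

# Hypothetical river centreline in metres (math.dist requires Python 3.8+)
river = [(0, 0), (3, 4), (6, 0)]
print(polyline_length(river), sinuosity(river))
```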

[Figure: Transects & Segments]

Geographic Primitives - Areas

Areas often provide the basis for measuring an attribute over a given area, including density values. For example, the levels of census geography (states, counties, census tracts, etc.) can give us a value like the number of motor vehicle accidents per 100,000 people in each state. In NC, this value is in the range of 22 to 25, but what would it be for local counties like Alamance, Chatham, and Orange? To assume that it would be the same at different scales would be to fall victim to the ecological fallacy (related to the Modifiable Areal Unit Problem).
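The sketch below computes a rate per 100,000 people and shows how an aggregate rate can hide variation among its parts. The three counties and their counts are hypothetical, not the NC figures quoted above.

```python
def rate_per_100k(count, population):
    """Events per 100,000 people."""
    return count * 100_000 / population

# Hypothetical counties: name -> (accidents, population)
counties = {
    "County A": (30, 100_000),
    "County B": (5, 50_000),
    "County C": (45, 150_000),
}

county_rates = {name: rate_per_100k(c, p) for name, (c, p) in counties.items()}
# Aggregate rate: pool counts and populations before dividing
total_count = sum(c for c, _ in counties.values())
total_pop = sum(p for _, p in counties.values())
region_rate = rate_per_100k(total_count, total_pop)
print(county_rates, round(region_rate, 1))
```

The region-wide rate (about 26.7) matches none of the counties exactly. Assuming it applies to County B (10.0) would be the kind of cross-scale inference the slide warns against.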

[Figure: MODIS land use/land cover (LULC) in climate divisions: Maryland CD6 and North Carolina CD3]

Geographic Primitives - Surfaces

Surfaces are different because they cannot be thought of as a single feature, and they also represent the third dimension. You can obtain information about altitude or volume from a surface, as well as other quantities that derive from the shape of the surface. For example, from a topographic map any number of derived quantities can be obtained, such as slope and aspect, which in turn can provide drainage direction information.
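To illustrate how a quantity like slope is derived from a surface, here is a sketch for a 3×3 window of a gridded elevation model. It uses simple central differences for slope and picks the steepest-descent (D8-style) neighbour for drainage direction. GIS packages often use other estimators (for example, Horn's method), so treat this as an illustration rather than what any particular software does. The elevations are invented.

```python
import math

def slope_degrees(window, cell_size):
    """Slope at the centre cell of a 3x3 elevation window via central differences.

    window[row][col]: row 0 is the northern row, col 0 the western column.
    """
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)  # east minus west
    dz_dy = (window[0][1] - window[2][1]) / (2 * cell_size)  # north minus south
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def steepest_descent(window, cell_size):
    """(row_offset, col_offset) of the neighbour with the steepest drop, or None for a pit."""
    centre = window[1][1]
    best, best_gradient = None, 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            distance = cell_size * math.hypot(dr, dc)  # diagonal neighbours are farther
            gradient = (centre - window[1 + dr][1 + dc]) / distance
            if gradient > best_gradient:
                best, best_gradient = (dr, dc), gradient
    return best

# Hypothetical plane dropping 1 m per 1 m cell toward the east
dem = [[12, 11, 10],
       [12, 11, 10],
       [12, 11, 10]]
print(slope_degrees(dem, 1.0), steepest_descent(dem, 1.0))
```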

[Figure: Pond Branch Catchment (control), topographic index example]

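Topographic Moisture Index

$\mathrm{TMI} = \ln(a / \tan\beta)$, where $a$ is the upslope contributing area per unit contour length and $\beta$ is the local slope angle.

Hornberger, G.M., Raffensperger, J.P., Wiberg, P.L. and K.N. Eshleman. 1998. Elements of Physical Hydrology, Johns Hopkins Press, U.S.A., p. 210 & p. 216.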
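Here is a sketch of the index for a single cell. It assumes `a` is the upslope contributing area per unit contour length (in metres) and the slope angle β is given in radians. The input values are hypothetical.

```python
import math

def topographic_index(a, beta):
    """TMI = ln(a / tan(beta)); larger values indicate cells more prone to wetness."""
    tan_beta = math.tan(beta)
    if a <= 0 or tan_beta <= 0:
        raise ValueError("need a > 0 and a slope angle strictly between 0 and 90 degrees")
    return math.log(a / tan_beta)

# Same contributing area, gentle vs. steep slope (hypothetical values)
gentle = topographic_index(100.0, math.atan(0.1))  # tan(beta) = 0.1 -> ln(1000)
steep = topographic_index(100.0, math.atan(0.5))   # tan(beta) = 0.5 -> ln(200)
print(round(gentle, 2), round(steep, 2))
```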

Geographic Primitives - Distance

Distances can be calculated between points, along lines, or in a variety of ways with areas. Euclidean distance is calculated in a Cartesian frame of reference. For points $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$:

$C = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

Over what distances on Earth is this valid? Why? Can we use this with latitude and longitude?
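The formula translates directly into code. This sketch assumes both points are given in the same planar, projected coordinate system (e.g. metres). That assumption is exactly what the questions above ask you to examine.

```python
import math

def euclidean_distance(p1, p2):
    """Straight-line distance between (x1, y1) and (x2, y2) in a Cartesian plane."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

print(euclidean_distance((0, 0), (3, 4)))
```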

An alternative formulation for distance that is useful in urban environments with orthogonal road networks is Manhattan distance. It is still calculated in a Cartesian frame of reference, but movement is limited to city streets:

$d_m = |x_1 - x_2| + |y_1 - y_2|$

As a reminder, the vertical bars denote absolute value.
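This short sketch compares the same two points under both metrics, assuming planar coordinates. Manhattan distance can never be shorter than the straight-line distance.

```python
import math

def manhattan_distance(p1, p2):
    """Distance travelled along a grid of orthogonal streets: |x1 - x2| + |y1 - y2|."""
    (x1, y1), (x2, y2) = p1, p2
    return abs(x1 - x2) + abs(y1 - y2)

a, b = (0, 0), (3, 4)
print(manhattan_distance(a, b), math.dist(a, b))  # math.dist needs Python 3.8+
```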

Geographic Primitives - Area

You will recall formulae for calculating the area of regular figures from geometry:
- rectangle: $a = l \times w$ (length times width)
- circle: $a = \pi r^2$ (r is the radius)

Vector GIS calculates the area of a polygon by summing rectangular and triangular areas.
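One way to realize "summing rectangular and triangular areas" is the trapezoid method. Each polygon edge, together with the x-axis, bounds a trapezoid that is a rectangle plus a triangle. The signed trapezoid areas add up to the polygon's area. This is a sketch of that technique, not a description of how any specific GIS package computes area.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered (x, y) vertices (first not repeated).

    Each edge contributes a trapezoid between the edge and the x-axis:
    width (x1 - x0) times mean height (y0 + y1) / 2, i.e. a rectangle plus a triangle.
    The contributions are signed, so the parts outside the polygon cancel.
    """
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # closing edge back to the first vertex
        total += (x1 - x0) * (y0 + y1) / 2
    return abs(total)  # sign depends on vertex order, so take the magnitude

print(polygon_area([(0, 0), (3, 0), (3, 2), (0, 2)]))  # 3 x 2 rectangle -> 6.0
print(polygon_area([(0, 0), (4, 0), (0, 3)]))          # right triangle -> 6.0
```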

Geographic Primitives - Shape

You are likely less familiar with indices of shape. One example is an index of compactness, which describes the extent to which a shape is compact vs. elongated by measuring its deviation from a circle:

$S = d / l$

Here $d$ is the diameter of a circle with the same area as the shape, and $l$ is the length of the longest diagonal that spans the shape. $S = 1$ for a circle and approaches 0 as the shape becomes more elongated.

[Figure: example shapes with S = 1, S ≈ 0.5, and S = 0]
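Here is a sketch of the index. It assumes, as is usual for this index, that d is the diameter of a circle with the same area as the shape, so d = 2√(A/π). The caller supplies the area and the longest spanning length.

```python
import math

def compactness(area, longest_length):
    """S = d / l, where d is the diameter of a circle with the same area as the shape."""
    if longest_length <= 0:
        raise ValueError("longest_length must be positive")
    d = 2 * math.sqrt(area / math.pi)  # equal-area circle diameter
    return d / longest_length

circle = compactness(math.pi * 1.0 ** 2, 2.0)         # unit circle: l is its diameter
square = compactness(1.0, math.sqrt(2))               # unit square: l is its diagonal
strip = compactness(10.0, math.sqrt(10 ** 2 + 1 ** 2))  # 10 x 1 rectangle
print(round(circle, 3), round(square, 3), round(strip, 3))
```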

Geographic Primitives - Density

Density is the concentration of a given attribute over an area, and it can be formulated in any number of ways. Examples are points per area, as in my spot height densities, or length per area, as in my transect densities. If you can make a count of something per area, you can create a density measure for that quantity.

Sources of Digital Elevation Data

Catchment              Area (ha)  Data Source      Number of Points  Points per m²
Pond Branch (control)  37.55      Photogrammetric  6569              0.017
Pond Branch (control)  37.55      LIDAR            273228            0.727
Glyndon (urbanizing)   81.05      Photogrammetric  39687             0.049
Glyndon (urbanizing)   81.05      LIDAR            437759            0.540
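The points-per-area column can be recomputed from the counts and catchment areas. This serves both as a check and as an example of a density measure. The sketch assumes the areas are in hectares (1 ha = 10,000 m²), which matches the column headings.

```python
HA_TO_M2 = 10_000  # square metres per hectare

def points_per_m2(n_points, area_ha):
    """Point density: number of points divided by area in square metres."""
    return n_points / (area_ha * HA_TO_M2)

# (catchment, area in ha, data source, number of points) from the table above
sources = [
    ("Pond Branch (control)", 37.55, "Photogrammetric", 6569),
    ("Pond Branch (control)", 37.55, "LIDAR", 273228),
    ("Glyndon (urbanizing)", 81.05, "Photogrammetric", 39687),
    ("Glyndon (urbanizing)", 81.05, "LIDAR", 437759),
]
for catchment, area_ha, source, n in sources:
    print(f"{catchment:<22} {source:<16} {points_per_m2(n, area_ha):.3f}")
```

The recomputed densities agree with the table to within 0.001. The Pond Branch LIDAR value rounds to 0.728 rather than 0.727.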

Upper Baismans Run Sampling

3 samples, 100 meters/ha, 100 meter long transects

[Figure: map of Upper Baismans Run Samples 1, 2, and 3, with a scale bar in kilometers and a compass rose]