GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY


CHAPTER 2: GEOGRAPHIC DATA
Primary data: acquired directly from the original source
  In situ, or in the field
  Costly in time and money!
  Ex. campus bike racks

GEOGRAPHIC DATA
Secondary (or archival) data: collected by an organization or government agency
  Processed and organized: accessible and formatted
  Less time and expense
  Often comprehensive (census)
  Ex. US Census data, Maryland land use, USGS National Hydrography Dataset
  Potential issues/errors:
    Improperly collected or summarized
    Out of date as soon as it's measured!

Ponds in Wicomico County: three primary sources
  Differences... hmm

PRIMARY DATA COLLECTION
Often sampling is necessary.
Direct observation
  E.g., traffic counts
Field measurement
Mail questionnaires
Personal/telephone interviews
Survey design: very important!
  Pilot tests
  Question interpretation/wording
  Erroneous responses?
  Logistics
  Absences
  Refusals

GEOGRAPHIC STUDIES - TYPES
Explicitly spatial: locations or placement of the observations or units of data are themselves directly analyzed
  Spatial statistics: examine patterns for randomness
  Ex. diseased trees in a national forest, farms in a watershed
  Clustered or randomly distributed?
Implicitly spatial: observations or units of data represent locations or places, but the locations themselves are not directly analyzed
  Ex. relationship between house values and age
  Neighborhood, not individual locational pattern

[Map legend]
  Blue: wet counties
  Red: dry counties
  Yellow: restrictions (partially dry, municipalities within dry)

GEOGRAPHIC STUDIES
Individual-level data sets: each data value represents an individual element of the phenomenon under study
  Ex. tree circumference, SU student interview about parking
  Ex. random sample of Nigerian women seeking information about fertility
Spatially aggregated data sets: each value represents a summary or spatial aggregation of individual units of information for a particular place or area
  Ex. Maryland median income by county, alcohol laws by county
  Ex. birth rate estimates for all administrative divisions in Nigeria to estimate nationwide fertility
Ecological fallacy: invalid transfer of conclusions from spatially aggregated analysis to smaller areas or to the individual level
  Transferring results or applying conclusions "down"
  Ex. Nigerian district with low birth rates: some women in the district may not use birth control for family planning
Aggregating individual-level data to a larger spatial unit is generally NOT problematic!
  Often used as the method to determine higher-level spatial estimates
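The direction of transfer described on this slide can be sketched in a few lines of Python. The county names and income values below are hypothetical, invented only for illustration; the point is that aggregating "up" to a county median is routine, while reading the county summary back "down" onto a single household is not justified.

```python
from collections import defaultdict
from statistics import median

# Hypothetical individual-level records: (county, household income in $1000s).
records = [
    ("County A", 30), ("County A", 50), ("County A", 200),
    ("County B", 40), ("County B", 45),
]

def aggregate_median(rows):
    """Aggregate individual values "up" to one median per spatial unit."""
    groups = defaultdict(list)
    for unit, value in rows:
        groups[unit].append(value)
    return {unit: median(vals) for unit, vals in groups.items()}

county_median = aggregate_median(records)
print(county_median)

# Going back "down" is where the ecological fallacy appears: County A has the
# higher median, yet its poorest household earns less than any household in B.
lowest_in_a = min(v for unit, v in records if unit == "County A")
lowest_in_b = min(v for unit, v in records if unit == "County B")
print(lowest_in_a < lowest_in_b)
```

With these made-up numbers the county with the higher median also contains the lowest individual income, so the aggregate says little about any one household.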

DATASET VARIABLE CHARACTERIZATION
Discrete variable: some restriction is placed on the values a variable can assume
  Results from counting or tabulating the number of items (whole integers)
  Ex. number of households, number of active volcanoes
Continuous variable: infinitely large number of possible values along some interval of the real number line
  Results from measurement; values expressed as decimals
  Ex. precipitation, area in forest, commute distance/time
Importance: probability distributions change (coming soon!)

DATASET VARIABLE CHARACTERIZATION
Quantitative data: observations or responses expressed numerically
  Units of data are assigned numeric values
Qualitative data: each observation is assigned to one or more categories
  Ex. type of land cover (agriculture, forest, residential, commercial, etc.), primary cash crop
  Frequency counts: number of observations assigned to non-numerical categories
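A frequency count for qualitative data can be sketched as follows. The category values are the six sample rows of the Cat_1 column from the small water bodies attribute table in these slides; the variable names are illustrative.

```python
from collections import Counter

# Cat_1 values from the six sample rows of the attribute table.
categories = [
    "Agriculture", "Extractive", "Extractive",
    "Stormwater", "Extractive", "Extractive",
]

# Tally how many observations fall into each non-numerical category.
counts = Counter(categories)

for category, n in counts.most_common():
    print(f"{category:12s} {n}")
```

The categories themselves carry no numeric meaning; only the tallies are numbers.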

[Figure: bar chart; y-axis: Frequency; x-axis: Category (Aesthetic, Agriculture, Borrow Pit, Impoundment, Stormwater, Other)]

[Figure: histogram "Small Water Bodies"; y-axis: Count; x-axis: Area (m2)]

OBJECTID  Cat_1        Area_ha    Area_acre  DataSource
1         Agriculture  0.0502111  0.1240741  USGS
2         Extractive   0.105233   0.2600361  USGS
4         Extractive   0.9638432  2.3817065  USGS
5         Stormwater   0.2240009  0.5535177  USGS
7         Extractive   0.0817403  0.2019845  USGS
8         Extractive   0.0264723  0.0654144  USGS

LEVELS OF MEASUREMENT
Measurement levels of data inform the selection of the appropriate statistical technique

NOMINAL
Each variable is given a name and assigned to at least two qualitative classes or categories
Only relationship between categories is "different"
Simplest scale, non-numerical
Condition: categories must be exhaustive and mutually exclusive
  Exhaustive: every value or unit of data can be assigned to a category
  Mutually exclusive: a value cannot be assigned to more than one category
    No overlap
  An "Other" category can be used

NOMINAL DATA: EXAMPLES
Individuals: religious affiliation
Dichotomous: only two options
  Gender, yes-no, presence-absence
Cities: primary industry
Countries: language family

OBJECTID  Cat_1        DataSource
1         Agriculture  USGS
2         Extractive   USGS
4         Extractive   USGS
5         Stormwater   USGS
7         Extractive   USGS
8         Extractive   USGS

ORDINAL SCALE
Quantitative distinctions can be made
Rank order: greater than or less than
Strongly ordered: each value or unit of data is given a particular position in a rank-order sequence
  Ex. 10 best college towns, countries' GNP
  Each is assigned a preference rank (1 to 10)
  Remember: the 2nd-ranked town is not twice as good as the 4th-ranked town

ORDINAL SCALE
Weakly ordered: values are placed in categories and the resulting categories are ranked
  Ex. choropleth map: % population change for US counties between 2000 and 2009
  5 categories (< 0%, 0 to 10%, 10.1% to 15%, 15.1% to 25%, greater than 25%)
  Generate frequency counts for the number of counties in each category (weakly ordered)
  Two counties may have different values but fall in the same category
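A weakly ordered classification like the one on this slide can be sketched as a small binning function. The five categories come from the slide; the county names and percent-change values are hypothetical, and the function name `classify_change` is only illustrative. Values are assumed to be rounded to one decimal place, as the class limits imply.

```python
from collections import Counter

# Ranked categories from the slide, lowest to highest.
CLASS_LABELS = ["< 0%", "0 to 10%", "10.1% to 15%", "15.1% to 25%", "> 25%"]
UPPER_BOUNDS = [10.0, 15.0, 25.0]  # inclusive upper limits of the three middle classes

def classify_change(pct):
    """Return the rank (0 = lowest) of the category that holds pct."""
    if pct < 0:
        return 0
    for rank, upper in enumerate(UPPER_BOUNDS, start=1):
        if pct <= upper:
            return rank
    return len(CLASS_LABELS) - 1

# Hypothetical county values (% population change).
county_change = {"County A": -2.5, "County B": 3.2, "County C": 9.8,
                 "County D": 12.0, "County E": 40.0}

# Frequency count of counties per category.
frequency = Counter(CLASS_LABELS[classify_change(v)] for v in county_change.values())
print(frequency)
```

Counties B and C have different values (3.2 and 9.8) but land in the same category, which is exactly the information a weakly ordered scale gives up.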

Best College Towns: American Institute for Economic Research
9. Charlottesville, VA
8. Blacksburg, VA
7. Champaign, IL
6. Corvallis, OR
5. Iowa City, IA
4. Crestview, FL
3. State College, PA
2. Ames, IA
1. Ithaca, NY
Often based on composite variables

INTERVAL AND RATIO SCALES
Magnitude of differences between values can be determined
Length of interval between any two units of data can be measured on a scale
Interval scale: origin or zero starting point is arbitrary
  Ex. Fahrenheit and Celsius temperature scales
Ratio scale: a natural zero is used, so ratios between values can be determined
  Ex. rainfall: 40 inches in Montreal, 10 inches in Chihuahua; 40/10 = 4, four times as much rainfall
  Examples: Kelvin, distance, area, median income, etc.
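The difference between the two scales can be checked numerically. In this sketch the rainfall figures are the slide's; the two temperatures (20 and 10 degrees Celsius) are hypothetical values chosen for illustration. A ratio on a ratio scale survives a change of units, while a "ratio" on an interval scale depends on where the arbitrary zero sits.

```python
import math

# Ratio scale: rainfall (values from the slide).
montreal_in, chihuahua_in = 40.0, 10.0
IN_TO_CM = 2.54

ratio_inches = montreal_in / chihuahua_in
ratio_cm = (montreal_in * IN_TO_CM) / (chihuahua_in * IN_TO_CM)
print(ratio_inches, math.isclose(ratio_inches, ratio_cm))  # same ratio in any unit

# Interval scale: temperature (hypothetical values).
def c_to_f(celsius):
    return celsius * 9 / 5 + 32

def c_to_k(celsius):
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0
ratio_c = warm_c / cool_c                    # looks like "twice as warm"
ratio_f = c_to_f(warm_c) / c_to_f(cool_c)    # 68 / 50, a different answer
ratio_k = c_to_k(warm_c) / c_to_k(cool_c)    # Kelvin has a natural zero
print(ratio_c, ratio_f, ratio_k)
```

Only the Kelvin ratio is physically meaningful; the Celsius and Fahrenheit ratios disagree with each other because each scale places zero somewhere different.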

MEASUREMENT SCALE
Observations from the same variable can be expressed at different measurement scales depending on how they are measured, organized, and displayed
Ex. resource planner: type of energy use in homes
  Individual households
    Nominal: primary type of energy (coal, gas, oil, wood, etc.)
  County-level summaries
    Strongly ordered ordinal: % of households using natural gas, by county
    Weakly ordered ordinal: choropleth map of % of households using natural gas
    Ratio scale: number of households using natural gas, by county

MEASUREMENT CONCEPTS
Measurement error: precision and accuracy
Precision: level of exactness associated with a measurement
  Ex. rain gauge: tipping-bucket calibration every 0.10 inch vs. 0.01 inch
    1.2 to 1.3 inches vs. 1.21 to 1.22 inches
Spurious precision: a computer or calculator produces many decimal places. Real? Meaningful?
  5.2/3 = 1.7333333333333333333333
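The spurious-precision example can be reproduced directly. This is a minimal sketch that assumes the input 5.2 was measured to one decimal place, so the result is reported to one decimal place as well; that reporting rule is an assumption for illustration, not something the slide prescribes.

```python
# The slide's division: the machine returns far more digits than were measured.
raw = 5.2 / 3
print(raw)

# Assumption: 5.2 is known to one decimal place, so report the quotient likewise.
reported = round(raw, 1)
print(reported)
```

The extra digits in `raw` come from arithmetic, not from the instrument, so they add no real information.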

MEASUREMENT CONCEPTS
Accuracy: the extent of system-wide bias in the measurement process
  Ex. rain gauge: precise instrument, calibrated badly
    1.19 inches recorded, 1.27 inches actually fell
  How do you know? Difficult.

MEASUREMENT CONCEPTS
Validity: measurement issues related to the nature, meaning, or definition of a concept or variable
  Assigning true or appropriate meaning to concepts through measurement of a simple variable or set of variables
  Complex concepts: ex. quality of life, economic well-being
  Operational definitions: capturing the true meaning is not possible
    Indirect or surrogate method to best define a complex concept
    Ex. quality of education: evaluate by average student score on the California Achievement Exam in elementary school, or by percent of graduates who subsequently go to college in high school
  Degree of validity is difficult to assess and often ignored
  In good research it MUST be addressed!

MEASUREMENT CONCEPTS
Reliability: consistency and stability of a measure or data
Geographic data: temporally and spatially varying
Consistent data collection methods?
  Ex. water quality: same depth, time since rainfall event, etc.
Consistent classification/categorization methods?
  Ex. poverty: same definition in 2010, 2000, 1970?
  Problematic: developing countries
Assess reliability: test-retest procedure
  Behavioral geography: survey or questionnaire
  Collect data from respondents twice!

BASIC CLASSIFICATION METHODS
Why and how do we classify or categorize data?
Classification:
  Organizes, simplifies, and generalizes large amounts of information into effective or meaningful categories
  Clarifies communication, reveals spatial patterns
  Organized according to degree of similarity
  Minimizes within-group dispersion and maximizes between-group differences
Categories must be mutually exclusive and exhaustive!

BASIC CLASSIFICATION METHODS
Result: information lost
  Generalization and simplification
  Individual-level values aggregated: spatial units, classes

CLASSIFICATION
Conceptual strategies
Subdivision (logical subdivision): all units of data in a population are grouped together, and then individual values are allocated to an appropriate subdivision using carefully defined criteria
  Clear, consistent set of rules used to assign values to the proper class
  Top-down, hierarchical approach
  Characteristics of each category predetermined

CLASSIFICATION: LOGICAL SUBDIVISION
Ex. USGS National Land Cover Dataset (NLCD)
  Landsat-based, 30 m pixels
  Level I and II

CLASSIFICATION: LOGICAL SUBDIVISION

CLASSIFICATION: LOGICAL SUBDIVISION
Ex. North American Industry Classification System (NAICS)

CLASSIFICATION
Agglomeration: each observation in a population or data set is separate and distinct from the others at the start of classification
  Examine each value and allocate it to a class using well-defined grouping criteria
  Combine like, separate unlike
  Bottom-up approach
  Frequently used in geography: values are numerically or graphically aggregated

CLASSIFICATION: AGGLOMERATION

CLASSIFICATION: OPERATIONAL PROCEDURES
Practical application: often a mixture of subdivision and agglomeration

SINGLE-VARIABLE CLASSIFICATION METHODS
Equal intervals based on range
  Range: difference in magnitude between the largest and smallest values in an interval-ratio data set
  Class breaks: the values that separate one class from another
  Procedure: the range is divided into the desired number of equal-width class intervals
  Ex. High = 1856, Low = 213, Range = 1643, 4 classes (width 410.75)
    Classes: 213 to 623.75, 623.76 to 1034.5, 1034.51 to 1445.25, 1445.26 to 1856
  Considerations: based on extreme values, break values precise, unequal number of values in each category
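The equal-interval procedure can be sketched as below, using the slide's High = 1856 and Low = 213 with 4 classes. The function names are illustrative, and the convention that a value equal to a break belongs to the lower class is an assumption taken from the way the slide lists its class limits.

```python
from bisect import bisect_left

def equal_interval_breaks(low, high, n_classes):
    """Divide the range (high - low) into n_classes equal-width intervals."""
    width = (high - low) / n_classes
    return [low + i * width for i in range(n_classes + 1)]

def assign_class(value, breaks):
    """Return the 0-based class index; a value on a break goes to the lower class."""
    interior = breaks[1:-1]
    return bisect_left(interior, value)

breaks = equal_interval_breaks(213, 1856, 4)
print(breaks)  # low, three interior class breaks, high

for v in (213, 623.75, 623.76, 1856):
    print(v, "-> class", assign_class(v, breaks) + 1)
```

Because the breaks depend only on the two extreme values, a single outlier would shift every class limit, which is the "based on extreme values" consideration noted on the slide.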

SINGLE-VARIABLE CLASSIFICATION METHODS
Equal intervals not based on range
  Same equal-interval class breaks, but based on practical or convenient values rather than the range
  Often rounded
  Preferred for constructing a frequency distribution, histogram, or ogive (graphical representations)
  Frequently used by government agencies
  Ex. High = 1856, Low = 213, Range = 1643, 4 classes (410.75)
    Classes: 213 to 623.9, 624 to 1034.9, 1035 to 1445.9, 1446 to 1856.9
  Considerations: easy to understand and interpret; number of values in each category varies widely

SINGLE-VARIABLE CLASSIFICATION METHODS
Quantile
  Total number of values is divided as equally as possible into the desired number of classes
  Equalizes the number of values in each class
  Quartiles (four classes), quintiles (five classes)
  Considerations:
    In choropleth mapping, produces an even distribution of areas among classes across the map
    Class breaks not rounded, uneven interval widths
    Clustered data may be split unnaturally
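Quantile classification can be sketched by ranking the values and cutting the ranks into equal-sized groups. The ten data values below are hypothetical (only 213 and 1856 echo the slides' high and low), and this rank-based rule is one simple way to divide values "as equally as possible", not necessarily the rule a given GIS package applies.

```python
from collections import Counter

def quantile_classes(values, n_classes):
    """Assign each value a 0-based class so class sizes are as equal as possible."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values)
    return classes

# Hypothetical data: a tight low cluster, a middle group, and two high values.
values = [213, 250, 260, 300, 310, 900, 950, 1000, 1800, 1856]

quartiles = quantile_classes(values, 4)
sizes = Counter(quartiles)
print(quartiles)
print(sorted(sizes.items()))
```

With ten values and four classes the groups hold 3, 2, 3, and 2 values. The tight cluster 213 to 310 is split across two classes, which illustrates the slide's warning about clustered data being split unnaturally.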

SINGLE-VARIABLE CLASSIFICATION METHODS
Natural breaks
  Single linkage: identify natural breaks in the data and separate values into different classes based on these breaks
  Iterative: identify the largest gap between values on the number line, then the second largest gap, and so on until the desired number of class breaks is reached
  Groups similar values, highlights extreme values
  Can cluster a large number of values in one or two categories
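The gap-based procedure described on this slide can be sketched as follows. This implements only the "largest gap first" rule stated here; it is not the Jenks optimization that many GIS packages label "natural breaks". The data values are hypothetical, and when two gaps are tied this sketch arbitrarily prefers the one higher on the number line.

```python
def natural_breaks_by_gaps(values, n_classes):
    """Split sorted values at the (n_classes - 1) largest gaps between neighbors."""
    ordered = sorted(values)
    # Each gap is stored with the index of the value just below it.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    largest = sorted(gaps, reverse=True)[: n_classes - 1]
    cut_positions = sorted(i for _, i in largest)

    classes, start = [], 0
    for pos in cut_positions:
        classes.append(ordered[start : pos + 1])
        start = pos + 1
    classes.append(ordered[start:])
    return classes

# Hypothetical data with two obvious gaps.
values = [213, 250, 260, 300, 310, 900, 950, 1000, 1800, 1856]
groups = natural_breaks_by_gaps(values, 3)
print(groups)
```

Half of the values end up in the first class, and the two extreme values form a class of their own, matching the slide's notes that the method highlights extremes and can pile many values into one or two categories.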

CHOROPLETH MAPPING
Ideal number of classes?
  Trade-off between generalization and sufficient detail
  4 to 7 tend to be effective
  5 classes in the following examples: 4 class breaks

Dendrogram: graphically depicts the step-by-step, single-linkage natural breaks process
Outliers: extreme values in a data set
  Adversely affect the natural breaks classification method

CLASSIFICATION RESULTS
All portray the same data differently!
Starkest contrasts?
Considerations
  Do three states really have an obesity rate of exactly 24.6%?