Sampling Populations

David Tenenbaum GEOG 090 UNC-CH Spring 2005

Sampling Populations
Typically, when we collect data, we are somewhat limited in the scope of what information we can reasonably collect.
Ideally, we would enumerate each and every member of a population so that we could know its parameters perfectly.
In most cases this is not possible, because of the size of the population (infinite populations?) and the associated costs (time, money, etc.).
Usually it is not necessary either, because by collecting data on an appropriate subset of the population we can create statistics that are adequate estimates of the population parameters.
Instead, we sample a population, trying to get information about a representative subset of it.

Sampling Concepts
We must define the sampling unit: the smallest subdivision of the population that becomes part of our sample.
We want to minimize sampling error when we design how we will collect data. Typically the sampling error decreases as the sample size increases, because larger samples make up a larger proportion of the population (and a complete census, for example, theoretically has no sampling error).
We want to try to avoid sampling bias when we design how we will collect data. Bias here refers to a systematic tendency in the selection of members of a population to be included in a sample. To avoid it, any given member of a population should have an equal chance of being included in the sample (for random sampling).
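The link between sample size and sampling error can be checked with a small simulation. This is only a sketch: the population below is made up (normally distributed values, not data from the lecture), and the names `spread_of_sample_means` and `spreads` are illustrative.

```python
import random
import statistics

def spread_of_sample_means(population, n, trials, rng):
    """Standard deviation of the sample mean over repeated random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values (an assumption for illustration)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# The spread of the sample mean shrinks as the sample size grows
spreads = {n: spread_of_sample_means(population, n, 200, rng) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(n, round(spread, 3))

# A complete census reproduces the population mean, i.e. no sampling error
census_mean = statistics.mean(rng.sample(population, len(population)))
```

The spread of the sample means is one way to express sampling error; other measures would show the same downward trend.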

Probability Sampling Designs - Random
Random sampling: In general, we need some degree of randomness in the selection of a sample to be able to draw any meaningful inferences about a population, but in some cases this may conflict with representativeness.
Random samples are drawn in such a way that every unit of a population has an equal chance of being chosen, and the selection of one unit has no impact on whether or not another unit will be selected (independence).
This can be done with or without replacement (which determines whether the same unit can be drawn twice).
We can generate random numbers using a table of random numerals or a computer, and we can scale them to any required range of values.
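As a minimal sketch of these ideas, Python's standard `random` module can draw samples with or without replacement and scale a random number to a chosen range. The population of 200 numbered units and the range 250 to 750 are assumptions made up for the example.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

units = list(range(1, 201))  # hypothetical population of 200 numbered units

without_replacement = rng.sample(units, 10)   # no unit can appear twice
with_replacement = rng.choices(units, k=10)   # the same unit may be drawn again

# Scale a uniform random number in [0, 1) to a required range [low, high)
low, high = 250.0, 750.0
scaled = low + (high - low) * rng.random()

print(without_replacement)
print(with_replacement)
print(scaled)
```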

Transect Placement
Software selects a random starting position for each transect, applying criteria.
Software assigns a random direction to each transect.

Probability Sampling Designs - Systematic
Representative approaches place restrictions on selection.
Systematic sampling: This approach uses every k-th element of the sampling frame, beginning at a randomly chosen point in the frame. E.g. given a sampling frame of size 200, to create a sample of size n = 10 from such a frame, select a random point to begin within the frame and then include every 20th value in the systematic sample.
This approach assumes that the assignment of the individuals in the sampling frame is random (i.e. they have not been placed in the frame in some order or grouping), and this should be checked before systematically sampling from a frame.
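The frame-of-200 example can be sketched as follows. One assumption beyond the slide: the random start is drawn from the first interval (positions 0 to k-1), a common variant that avoids wrapping around the end of the frame. The function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(frame, n, rng):
    """Take every k-th element of the frame, starting at a random point in the first interval."""
    k = len(frame) // n          # sampling interval (20 for a frame of 200 and n = 10)
    start = rng.randrange(k)     # random start position in 0..k-1
    return frame[start::k][:n]

rng = random.Random(7)           # fixed seed so the run is repeatable
frame = list(range(200))         # hypothetical frame: unit IDs 0..199
sample = systematic_sample(frame, 10, rng)
print(sample)
```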

Probability Sampling Designs - Systematic
Some problems with systematic sampling:
The possible values of the sample size n are somewhat restricted by the size of the sampling frame, since the interval should divide evenly into the size of the sampling frame.
If the population itself exhibits some periodicity, then a systematic sample is likely not to be representative.
In geographic applications, this could be applied in 2 dimensions in (x, y) space, with Δx and Δy (which are not necessarily the same) specifying a systematic grid, but the sample size is still restricted by the extent of the study area (since the grid must fit evenly).
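The periodicity problem for systematic sampling is easiest to see in an extreme case. In this sketch the population is invented so that its values repeat every 20 units, and the sampling interval happens to equal that period.

```python
# Hypothetical periodic population: the value repeats every 20 units
population = [i % 20 for i in range(200)]
population_mean = sum(population) / len(population)

k = 20                          # sampling interval equal to the period
sample = population[3::k]       # start at index 3, then take every 20th unit
sample_mean = sum(sample) / len(sample)

print(population_mean)          # mean over the whole population
print(sample)                   # every sampled unit has the same value
print(sample_mean)
```

Every sampled value is identical, so the sample says nothing about the variation in the population, and its mean depends entirely on the starting position.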

Probability Sampling Designs - Stratified
We may need to place restrictions on how we select units for inclusion in a sample to ensure a representative sample.
Stratified sampling: Divide the population into categories and select a random sample from each of these.
This approach can be used to decrease the likelihood of an unrepresentative sample if the classes/categories/strata are selected carefully (the individuals within a stratum must be very much alike, which means that it must be possible to divide the population into relatively homogeneous groups).
We need to know something about the population in order to make good decisions about stratification.

Probability Sampling Designs - Stratified
We can take a stratified sample that is:
Proportional: the random sample drawn from each class/category/stratum is the same fraction of that stratum, so the sample size for each stratum is chosen on the basis of the size of that sub-population, OR
Disproportional: the random samples drawn from the classes/categories/strata are different fractions of their sub-populations.
The disproportional approach is best used when the sizes of the categories are significantly different, although it can also be applied to mitigate cost issues (e.g. it may be more costly to sample in a swamp than in a grassy field, so we might choose to take fewer samples in the swamp, although this clearly would do nothing to enhance representativeness in our sample).
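A stratified draw can be sketched as below, assuming the usual meaning of proportional allocation, in which each stratum contributes to the sample in proportion to its size. The strata names and sizes are invented for illustration, and `proportional_stratified_sample` is a hypothetical helper, not a library function.

```python
import random

def proportional_stratified_sample(strata, total_n, rng):
    """Draw a simple random sample from each stratum, sized by the stratum's share."""
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(total_n * len(units) / population_size)  # stratum sample size
        sample[name] = rng.sample(units, n_h)
    return sample

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical strata (names and sizes are illustrative only)
strata = {
    "field": [f"field-{i}" for i in range(600)],
    "forest": [f"forest-{i}" for i in range(300)],
    "swamp": [f"swamp-{i}" for i in range(100)],
}

sample = proportional_stratified_sample(strata, 50, rng)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)
```

Because each stratum size is rounded independently, the stratum samples may not add up exactly to `total_n` for other inputs. A disproportional design would replace the `n_h` line with sizes chosen on some other basis, such as cost.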

[Figure: Pond Branch Catchment (Control), Color Infrared Digital Orthophotography]

[Figure: Pond Branch Catchment, Stratified TMI Sampling. Two panels, each with Topographic Moisture Index on the x-axis: "Pond Branch TMI Histogram" (y-axis: percent of cells in catchment) and "TMI Values at Soil Moisture Sampling Locations using 11.25m PG DEM" (labels: Pond Branch, Glyndon).]

Probability Sampling Designs - Stratified
WARNING: A class/category/stratum that is homogeneous with respect to one variable may have high variation with respect to another variable!
Thus, stratification must be performed with some foreknowledge of how the sample will be analyzed. If the sampling is being performed in a preliminary fashion (still seeking the relationships), there is a danger that the stratification will be found to be inappropriate after the fact.
E.g. my soils sampling may have been stratified with respect to TMI, but if I want to check whether upstream land use is a factor in Glyndon, I may find my samples are not representatively distributed with respect to land use.

Random Spatial Sampling
We can choose a random point in (x, y) space by choosing pairs of random numbers. This produces a Poisson distribution if we divide the area into quadrats and count the points in each.
This is easy with rectangular study areas. Otherwise we also need to reject any points outside the study area (e.g. my method for selecting the beginning of a transect).
We can also produce stratified and systematic point samples by dividing the area into a group of mutually exclusive and collectively exhaustive strata.
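A minimal sketch of random spatial sampling with rejection follows. The study area is assumed to be a circle inside a 100 x 100 bounding box purely for illustration; `random_points_in_area` and `in_circle` are hypothetical helpers, and a real study area would supply its own point-in-area test.

```python
import random

def random_points_in_area(n, bbox, inside, rng):
    """Draw random (x, y) pairs in the bounding box, rejecting points outside the area."""
    xmin, ymin, xmax, ymax = bbox
    points = []
    while len(points) < n:
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if inside(x, y):            # keep only points that fall in the study area
            points.append((x, y))
    return points

def in_circle(x, y):
    """Hypothetical study area: a circle of radius 50 centered at (50, 50)."""
    return (x - 50) ** 2 + (y - 50) ** 2 <= 50 ** 2

rng = random.Random(11)             # fixed seed so the run is repeatable
points = random_points_in_area(100, (0, 0, 100, 100), in_circle, rng)

# Count points per 25 x 25 quadrat (a 4 x 4 grid over the bounding box)
quadrat_counts = {}
for x, y in points:
    key = (min(int(x // 25), 3), min(int(y // 25), 3))
    quadrat_counts[key] = quadrat_counts.get(key, 0) + 1

print(quadrat_counts)
```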

Data Portrayal
Once we have sampled some geographic phenomenon, it is often useful to portray it in some fashion that allows us to get a sense of the values in the dataset.
Many portrayal approaches still involve reducing the volume of data (and information content), but if applied properly, they can help us see the interesting characteristics of the data.
For the various scales of measurement, there are different approaches that are applicable.

Scales of Measurement
Thematic data can be divided into four types:
1. The Nominal Scale
2. The Ordinal Scale
3. The Interval Scale
4. The Ratio Scale
As we progress through these scales, the types of data they describe have increasing information content.

Nominal Data
From one of my dissertation transect samples, the segment types are nominal data:

Class         Frequency   % of Total
Woody               105        32.92
Herbaceous          151        47.34
Water                 1         0.31
Ground                6         1.88
Road                 23         7.21
Pavement             22         6.90
Structures           11         3.45

The % of Total column normalizes the data, expressing it relative to the total (some caveats here).

Nominal Data
A tabular presentation of the data, like the frequency table above, has the advantage of giving the exact quantities, but it can be busy, especially in larger tables.
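The % of Total column can be reproduced from the frequencies by dividing each count by the total of all counts. A short sketch using the segment-type counts from the table:

```python
# Segment-type frequencies from the transect sample table
frequency = {
    "Woody": 105,
    "Herbaceous": 151,
    "Water": 1,
    "Ground": 6,
    "Road": 23,
    "Pavement": 22,
    "Structures": 11,
}

total = sum(frequency.values())
percent = {cls: round(100 * n / total, 2) for cls, n in frequency.items()}

for cls, p in percent.items():
    print(f"{cls:<12}{frequency[cls]:>5}{p:>8.2f}")
```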

Nominal Data
The frequency of nominal data classes can be well displayed by a bar graph.
[Figure: bar graph "Segment Type Frequency". x-axis: Segment Types (Woody, Herbaceous, Water, Ground, Road, Pavement, Structures); y-axis: Frequency.]

Nominal Data
Once normalized, the values are well displayed in a pie chart, which emphasizes each category's portion of the whole.
[Figure: pie chart "Segment Types" showing % of Total for Woody, Herbaceous, Water, Ground, Road, Pavement, and Structures.]

Ordinal, Interval, & Ratio Data
From my dissertation, the set of all topographic moisture index (TMI) values drawn from a raster data layer is an example of an interval dataset.

Ordinal, Interval, & Ratio Data
Pond Branch is a 37.55 hectare watershed, which is equivalent to 375,500 m² (1 hectare = 10,000 m²).
Using 11.25 m x 11.25 m pixels (126.5625 m² each), there are ~2966 pixels from which we can draw TMI values.
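The pixel count follows from dividing the watershed area by the area of one pixel. A quick check of the arithmetic (variable names are illustrative):

```python
area_m2 = 37.55 * 10_000      # 37.55 hectares at 10,000 m^2 per hectare
pixel_m2 = 11.25 * 11.25      # area of one 11.25 m x 11.25 m pixel
pixels = area_m2 / pixel_m2   # number of pixels that fit in the watershed area

print(pixel_m2)
print(pixels)
```

The quotient is just under 2967, consistent with the approximate count of 2966 whole pixels.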

Ordinal, Interval, & Ratio Data
It would clearly be impractical to try to get a sense of the distribution of TMI values in Pond Branch by looking at a table of 2966 values.
We need a data reduction approach by which we can reduce the number of values to a manageable amount, which in turn lends itself to some sort of graphical display.
For ordinal, interval, and ratio scale data, we can make use of histograms for this purpose. Building a histogram involves following a multistep procedure.

Building a Histogram
1. Develop an ungrouped frequency table
That is, we build a table that counts the number of occurrences of each variable value from lowest to highest:

TMI Value   Ungrouped Freq.
4.16        2
4.17        4
4.18        0
...         ...
13.71       1

We could attempt to construct a bar chart from this table, but it would have too many bars to really be useful.
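An ungrouped frequency table is a count of occurrences per distinct value. A minimal sketch with `collections.Counter`, using a handful of invented TMI-like values rather than the Pond Branch raster:

```python
from collections import Counter

# Hypothetical TMI values (illustrative, not the Pond Branch raster)
tmi_values = [4.16, 4.17, 4.16, 4.17, 4.19, 4.17, 4.17, 13.71]

counts = Counter(tmi_values)
ungrouped = sorted(counts.items())   # (value, frequency) pairs, lowest to highest

for value, freq in ungrouped:
    print(f"{value:<8}{freq}")
```

Values that never occur (such as 4.18 here) do not get a row in `ungrouped`, although looking them up in the `Counter` returns 0.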

Building a Histogram
2. Construct a grouped frequency table
This table has classes of values (in a sense we are reducing our data back to the ordinal scale for display purposes).
The decision on how to perform the grouping is a subjective one, but there are some common guidelines:
  Use class intervals with simple bounds and a common width (i.e. categories have the same range).
  Adjacent intervals should not overlap (each datum should fit into exactly one class).

Building a Histogram
3. Select an appropriate number of classes
There are formulae available to make this decision objectively, but in reality it is a somewhat subjective decision.
If you have more observations, you usually need more classes, because when you put observations together in a class, you are considering them to have the same value for display purposes.
There is a trade-off here between simplicity and loss of information (e.g. Pond Branch TMI: 2966 observations grouped into 10 classes).
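The lecture does not say which formulae it has in mind. One widely used example is Sturges' rule, which suggests ceil(log2(n)) + 1 classes for n observations. It is shown here only as an illustration of such a formula, and `sturges_classes` is a hypothetical helper name.

```python
import math

def sturges_classes(n_observations):
    """Number of classes suggested by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n_observations)) + 1

k = sturges_classes(2966)   # the number of Pond Branch TMI observations
print(k)
```

For 2966 observations the rule suggests a few more classes than the 10 used in the lecture, which fits the point that the final choice remains somewhat subjective.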

Building a Histogram
3. Select an appropriate number of classes (cont.)

Class          Frequency
4.00-4.99            120
5.00-5.99            807
6.00-6.99           1411
7.00-7.99            407
8.00-8.99             87
9.00-9.99             33
10.00-10.99           17
11.00-11.99           22
12.00-12.99           43
13.00-13.99           19
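Grouping values into classes of a common width can be sketched as follows. The input values are invented for illustration, and each class is assumed to be the half-open interval [lower, lower + width), so that adjacent classes never overlap.

```python
import math
from collections import Counter

def grouped_frequencies(values, width=1.0):
    """Count values per class [lower, lower + width), keyed by the lower bound."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical TMI-like values (not the Pond Branch raster)
values = [4.16, 4.17, 4.99, 5.00, 5.42, 6.31, 6.99, 13.71]
grouped = grouped_frequencies(values)

for lower, freq in grouped.items():
    print(f"{lower:.2f}-{lower + 0.99:.2f}  {freq}")
```

Classes that contain no values are simply absent from the result, so an application that needs every class listed would have to add the empty ones.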

Building a Histogram
4. Plot the frequencies of each class
All that remains is to create the plot.
[Figure: Pond Branch TMI Histogram. x-axis: Topographic Moisture Index; y-axis: percent of cells in catchment.]
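As a rough stand-in for the plot, the grouped Pond Branch frequencies can be converted to percent of cells and drawn as a text histogram. This is a sketch using only the standard library; a plotting package would normally be used for the real figure.

```python
# Grouped TMI frequencies for Pond Branch, from the grouped frequency table
class_frequency = {
    "4.00-4.99": 120,
    "5.00-5.99": 807,
    "6.00-6.99": 1411,
    "7.00-7.99": 407,
    "8.00-8.99": 87,
    "9.00-9.99": 33,
    "10.00-10.99": 17,
    "11.00-11.99": 22,
    "12.00-12.99": 43,
    "13.00-13.99": 19,
}

total = sum(class_frequency.values())
percent = {cls: 100 * freq / total for cls, freq in class_frequency.items()}

# One '#' per percentage point, rounded to the nearest whole percent
for cls, p in percent.items():
    print(f"{cls:>11} {'#' * round(p)} {p:.1f}%")
```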

Frequencies & Distributions
A histogram is one way to depict a frequency distribution.
A loose definition of a frequency: the number of times a variable takes on a particular value (note that any variable has a frequency distribution).
E.g. roll a pair of dice several times and record the resulting values (constrained to being between 2 and 12), count the number of times any given value occurs (the frequency of that value occurring), and take these all together to form a frequency distribution.
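The dice example can be simulated in a few lines. The number of rolls (1000) and the seed are arbitrary choices for this sketch.

```python
import random
from collections import Counter

rng = random.Random(2005)   # fixed seed so the run is repeatable

# Roll a pair of dice 1000 times and record the sum of each roll
rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(1000)]
frequency = Counter(rolls)

# The frequency distribution: how often each possible value occurred
for value in range(2, 13):
    print(value, frequency[value])
```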

Frequencies & Distributions
Frequencies can be absolute (when the frequency provided is the actual count of the occurrences of that particular value) or relative (when they are normalized by dividing the absolute frequency by the total number of observations, yielding a relative frequency between 0 and 1).
Relative frequencies are particularly useful if you want to compare distributions drawn from two different sources: while the number of observations from each source may be different, once normalized they can be reasonably compared.

[Figure: Segment Length Distributions, with one panel for Glyndon and one for Upper Baismans Run. x-axis: Segment length (meters), 0 to 100; y-axis: Percent of all segments in class, 0 to 100; classes: Woody, Herbaceous, Pavement, Roads, Structures.]

Frequencies & Distributions
In addition to the conventional frequencies described thus far, there is another type of frequency known as a cumulative frequency.
Cumulative frequencies are calculated by starting with the lowest class of an observed variable and its frequency, and then adding the frequency of each successive class to the preceding sum.
Cumulative frequencies are desirable when we want to know what proportion of observations have a value less than some threshold.

Frequencies & Distributions
For example, here's some frequency data for the woody vegetation class segments' distance from streams in Upper Baisman's Run:

CLASS   MIN. VALUE   FREQ.   CUM. FREQ.
1          0.00000    9.30         9.30
2         23.31757    7.73        17.03
3         46.63514    7.08        24.11
4         69.95271    5.71        29.82
5         93.27028    4.70        34.52
6        116.58785    3.67        38.19
7        139.90542    3.17        41.36
8        163.22300    2.73        44.09
9        186.54057    5.36        49.45
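The CUM. FREQ. column is a running sum of the FREQ. column. A short sketch with `itertools.accumulate`, using the values from the table:

```python
from itertools import accumulate

# FREQ. column for classes 1 through 9
freq = [9.30, 7.73, 7.08, 5.71, 4.70, 3.67, 3.17, 2.73, 5.36]

# Running sum, rounded to two decimals to match the table
cum_freq = [round(c, 2) for c in accumulate(freq)]

for cls, (f, c) in enumerate(zip(freq, cum_freq), start=1):
    print(cls, f, c)
```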

[Figure: Baismans Run Primary Class Distance from Stream Distributions, with a conventional panel and a cumulative panel. x-axis: Distance to stream along D8 flow paths (meters), 0 to 1200; y-axis: Percent of all cells in class (conventional) and cumulative percent of all cells in class (cumulative); classes: Woody, Herbaceous, Pavement and Road, Structures, Ground.]

Frequencies & Distributions
By examining the shape of frequency distribution curves, we can gain some sense of the distribution through some general characteristics:
1. Modality: Most distributions are unimodal, but we might also see bimodal or multi-modal distributions. If the distribution is unimodal, we can also consider:
2. Symmetry, a.k.a. the skewness of the distribution: Is it positively or negatively skewed?
3. Kurtosis: Describes the degree of peakedness or flatness of the curve.
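Skewness and kurtosis can be put into numbers using moments about the mean. The slides give no formulas, so the sketch below assumes the common moment-based definitions (third and fourth central moments scaled by powers of the variance, with 3 subtracted so that a normal curve has kurtosis 0). The data are invented.

```python
def moment(values, k):
    """k-th central moment: the mean of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Positive for a long right tail, negative for a long left tail."""
    return moment(values, 3) / moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Positive for a peaked curve, negative for a flat one (normal curve = 0)."""
    return moment(values, 4) / moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical positively skewed data

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```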

Shapes of Histograms
[Figure: example histogram shapes: Bell Shaped, Bimodal, Skewed, Random]
Mode: the value with the highest frequency.
Range: the largest value minus the smallest value.
Developing a histogram from attribute data is one level of data reduction; we can describe bell-shaped distributions using parameters that provide a more concise summary.
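The two definitions above translate directly into code. The values in this sketch are invented for illustration.

```python
import statistics

values = [4, 6, 6, 7, 6, 5, 9, 12]        # hypothetical observations

mode = statistics.mode(values)            # value with the highest frequency
value_range = max(values) - min(values)   # largest value minus smallest value

print(mode, value_range)
```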