SAMPLING BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :55. BIOS Sampling

Similar documents
SAMPLING II BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :37. BIOS Sampling II

SAMPLING III BIOS 662

Point and Interval Estimation II Bios 662

You are allowed 3? sheets of notes and a calculator.

Lesson 6 Population & Sampling

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

7.1 Basic Properties of Confidence Intervals

Power and Sample Size Bios 662

We would like to describe this population. Central tendency (mean) Variability (standard deviation) ( X X ) 2 N

Figure Figure

Sample size and Sampling strategy

WISE International Masters

Special Topic: Bayesian Finite Population Survey Sampling

MN 400: Research Methods. CHAPTER 7 Sample Design

Analysis of Variance II Bios 662

Math 181B Homework 1 Solution

Cluster Sampling 2. Chapter Introduction

Unequal Probability Designs

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.

STAT 512 sp 2018 Summary Sheet

Variance estimation on SILC based indicators

ACMS Statistics for Life Sciences. Chapter 13: Sampling Distributions

Sampling: What you don t know can hurt you. Juan Muñoz

Outline. PubH 5450 Biostatistics I Prof. Carlin. Confidence Interval for the Mean. Part I. Reviews

Session 2.1: Terminology, Concepts and Definitions

Variance reduction. Timo Tiihonen

Cross-sectional variance estimation for the French Labour Force Survey

Topic 12 Overview of Estimation

Introduction to Statistical Data Analysis Lecture 4: Sampling

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy.

VOL. 3, NO.6, June 2013 ISSN ARPN Journal of Science and Technology All rights reserved.

Estimation of uncertainties using the Guide to the expression of uncertainty (GUM)

Tribhuvan University Institute of Science and Technology 2065

The Central Limit Theorem

BIOS 625 Fall 2015 Homework Set 3 Solutions

Day 8: Sampling. Daniel J. Mallinson. School of Public Affairs Penn State Harrisburg PADM-HADM 503

Estimation Techniques in the German Labor Force Survey (LFS)

1 Outline. Introduction: Econometricians assume data is from a simple random survey. This is never the case.

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

One-sample categorical data: approximate inference

What if we want to estimate the mean of w from an SS sample? Let non-overlapping, exhaustive groups, W g : g 1,...G. Random

IP WEIGHTING AND MARGINAL STRUCTURAL MODELS (CHAPTER 12) BIOS IPW and MSM

STA304H1F/1003HF Summer 2015: Lecture 11

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Petter Mostad Mathematical Statistics Chalmers and GU

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2012

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

3. When a researcher wants to identify particular types of cases for in-depth investigation; purpose less to generalize to larger population than to g

STAT 830 Bayesian Estimation

Lecture 5: Sampling Methods

New Developments in Econometrics Lecture 9: Stratified Sampling

Simple linear regression

Mathematical statistics

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2013

Statistics. Statistics

Non-uniform coverage estimators for distance sampling

Estimation of Parameters and Variance

The ESS Sample Design Data File (SDDF)

Discrete Random Variables

Inference for Regression

Data Integration for Big Data Analysis for finite population inference

Main sampling techniques

Review of the Normal Distribution

Theory of Statistics.

Last few slides from last time

Business Statistics. Lecture 5: Confidence Intervals

Statistics for Business and Economics

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

X i. X(n) = 1 n. (X i X(n)) 2. S(n) n

ECEN 689 Special Topics in Data Science for Communications Networks

multilevel modeling: concepts, applications and interpretations

Survey Sample Methods

Survey of Smoking Behavior. Samples and Elements. Survey of Smoking Behavior. Samples and Elements

Problem 1 (20) Log-normal. f(x) Cauchy

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

Econ 325: Introduction to Empirical Economics

arxiv: v2 [math.st] 20 Jun 2014

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

SDS 321: Practice questions

Overview Scatter Plot Example

HT Introduction. P(X i = x i ) = e λ λ x i

Problems for Chapter 15: Sampling Distributions. STAT Fall 2015.

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator

1 Basic continuous random variable problems

STAT 285: Fall Semester Final Examination Solutions

Analysis of Variance Bios 662

1 Statistical inference for a population mean

Lecture 5: Comparing Treatment Means Montgomery: Section 3-5

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

13 Simple Linear Regression

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E

Survey of Smoking Behavior. Survey of Smoking Behavior. Survey of Smoking Behavior

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Estimation and Confidence Intervals

Lecture 01: Introduction

Review of probability and statistics 1 / 31

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Transcription:

SAMPLIG BIOS 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-11-14 15:55 BIOS 662 1 Sampling

Outline Preliminaries Simple random sampling Population mean Population total Sample size Proportion BIOS 662 2 Sampling

Preliminaries: References SK Thompson. Sampling. John Wiley and Sons, 1992 L Kish. Survey sampling. Wiley, ew York, 1965 WG Cochran. Sampling Techniques. John Wiley and Sons, 1977 BIOS 662 3 Sampling

Preliminaries: What is (Survey) Sampling? Sampling study: selecting some part of a population to be observed so that one may estimate something about the whole of the population Eg, to estimate the amount of lichen in a well-defined area, a biologist collects lichen from selected small plots within the study area Typically want to estimate total or mean Observational - does not intentionally perturb or disturb population (i.e., not experimental) However one does have control over how the sample is selected BIOS 662 4 Sampling

Preliminaries: Terminology Population: The group of units (e.g., people) we are sampling and studying. Assumed to be of known, finite size. Sampling Design: The strategy followed in selecting a sample from a population Sampling Unit: Unit designated for listing and selection in a sample survey (e.g., persons, dwellings, households, area units, pharmacies) Sampling Frame: List of sampling units from which a sample is drawn Variable: Some measurement taken on members of the sample (e.g., number of children ever born to a woman aged 15-49 years); sometimes call this the y-variable or x-variable BIOS 662 5 Sampling

Preliminaries: Terminology Selection probability: Likelihood over repeated applications of sampling design that a particular unit will be chosen for a sample Probability Sampling Sampling in which the design calls for using random methods to ultimately decide which units are chosen Every unit has a known, nonzero selection probability Equal-Probability Sampling Probability sampling in which all units in the population have the same selection probability AKA self-weighted sampling or epsem (equal probability of selection method) sampling BIOS 662 6 Sampling

Preliminaries: Terminology on-probability Sampling Sampling in which subjective judgment (usually by interviewers) is used to ultimately decide who is chosen in the sample Selection probabilities cannot be determined Difficult to determine if sample is representative (i.e., includes members from all relevant segments of the population) BIOS 662 7 Sampling

Preliminaries: Terminology Unbiased estimator: An estimator which, if repeated over all possible samples that might be selected using a sampling design, would yield estimates which on average equal the parameter being estimated (e.g., sample mean from a simple random sample is an unbiased estimator of the population mean) AKA design-unbiased Key idea: the randomness in the estimator is induced by the sampling design BIOS 662 8 Sampling

Preliminaries: Software SAS: Proc Surveymeans, Surveyfreq,... R: survey package BIOS 662 9 Sampling

Preliminaries: Sampling Designs Simple Random Sampling Stratified Sampling Cluster Sampling BIOS 662 10 Sampling

Simple Random Sampling (SRS) Let denote the number of units in the population Simple random sampling, or random sampling without replacement, is the sampling design in which n distinct units are selected from the units in the population in such a way that every possible combination of the n units is equally likely to be the sample selected (SRSWOR) SRS sample can be obtained through a sequence of independent selections from the whole population where each unit has an equal probability of selection at each step, discarding repeat selections and continuing until n distinct units are obtained f n/ sampling rate BIOS 662 11 Sampling

Obtaining an SRS sample A. umber the units in the population (i.e., sampling frame) from 1 to. B. Select and record a random number between 1 and. C. Select a second random number between 1 and. If this number is the same as the first selected number, discard it. Otherwise, record it. D. Select another random number between 1 and. If this number is the same as a previously selected number, discard it. Otherwise, record it. E. Continue in this manner until n different numbers between 1 and have been chosen. F. Population units corresponding to selected numbers are an SRS sample of size n. BIOS 662 12 Sampling

Key properties of SRS All possible SRS samples have the same chance of being selected. The probability that any one population unit will be chosen is n/. Selection probabilities in an SRS are not statistically independent. BIOS 662 13 Sampling

SRS: Estimating population mean Denote (finite) population mean by µ = 1 y i i=1 Denote (finite) population variance by σ 2 = 1 1 i=1 (y i µ) 2 Let Z i indicate whether unit i is in sample with Z i = 1 if sampled, Z i = 0 otherwise Key: y i s are fixed, Z i s are random BIOS 662 14 Sampling

SRS: Estimating population mean Sample mean Sample variance ȳ = 1 n s 2 = 1 n 1 y i Z i i=1 i=1 (y i ȳ) 2 Z i Sample mean unbiased: Each Z i is Bernoulli with E(Z i ) = n/, thus E(ȳ) = 1 n i E(Z i ) = i=1y 1 y i = µ i=1 BIOS 662 15 Sampling

SRS: Estimating population mean To derive variance of sample mean, { Var(ȳ) = 1 n i=1y 2 2 i Var(Z i ) + y i y j Cov(Z i,z j ) i j we need var and cov terms } Variance easy since Z i Bernoulli Var(Z i ) = n ( 1 n ) For SRS, Z i s not independent Cov(Z i,z j ) = E(Z i Z j ) E(Z i )E(Z j ) = n = n ( 1 n ) 1 1 (n 1) ( n ) 2 ( 1) BIOS 662 16 Sampling

SRS: Estimating population mean Thus Var(ȳ) = 1 n 2 ( n )( 1 n ) { } y 2 i 1 i=1 1 y i y j i j Using the identity i=1 we get (y i µ) 2 = Var(ȳ) = 1 n y 2 i ( y i ) 2 i=1 = 1 { ( 1 n ) (y i µ) 2 1 ( 1) = i=1 } y 2 i y i y j i j ( 1 n ) σ 2 n BIOS 662 17 Sampling

SRS: Estimating population mean The quantity 1 n = n = 1 f is called the finite population correction factor If the population is large relative to the sample size, n/ will be small, such that Var(ȳ) σ 2 n On the other hand, Var(ȳ) 0 as n BIOS 662 18 Sampling

SRS: Estimating population variance Homework problem? Show E(s 2 ) = σ 2, i.e., the sample variance is an unbiased estimator of the finite population variance From this fact, it follows that an unbiased estimator of Var(ȳ) is given by ( Var(ȳ) = 1 n ) s 2 n BIOS 662 19 Sampling

SRS: Estimating population total Define population total Unbiased estimator with variance τ = y i = µ i=1 ˆτ = ȳ = n Unbiased estimator of variance y i Z i i=1 Var( ˆτ) = 2 Var(ȳ) = ( n) σ 2 Var( ˆτ) = 2 Var(ȳ) = ( n) s2 n n BIOS 662 20 Sampling

SRS: Estimating population total Often estimator written as where w 1 i ˆτ = w i y i Z i = i=1 y i Z i i=1 π i = π i = n/ = f is the selection probability Inverse probability weighting This is the formulation SAS uses (more below) Special case of the Horvitz-Thompson estimator BIOS 662 21 Sampling

SRS: Finite-population CLT Usual CLT requires independence Imagine a sequence of populations with population size becoming large along with sample size n. Let µ be the population mean and ȳ be the sample mean for a SRS from that population. According to the finite-population CLT as both n and n ȳ µ Var(ȳ ) Z (0,1) BIOS 662 22 Sampling

SRS: Finite-population CIs This leads to approximate (1 α) 100% CIs for the population mean µ ( ȳ ±t n 1,1 α/2 1 n ) s 2 n Likewise for population total τ ˆτ ±t n 1,1 α/2 ( n) s2 n BIOS 662 23 Sampling

SRS: Example Example (Section 2.3 Thompson 1992) Survey of caribou population via aircraft. A 286 mile wide study region divided into exhaustive and mutually exclusive one-mile strips ( = 286). A SRS of n = 15 yielded counts of 1, 2, 4, 4, 5, 7, 10, 15, 21, 21, 29, 36, 50, 86, 98 Sample mean ȳ = 25.933; sample variance s 2 = 919.067 Thus Var(ȳ) = yielding 95% CI for µ ( 1 15 ) 919.067 286 15 = 58.058 25.933 ± 2.145 58.058 = (9.59,42.28) BIOS 662 24 Sampling

SRS: Example using SAS proc surveymeans total=286; run; var counts; Std Error Variable Mean of Mean -------------------------------------------------------- counts 15 25.933333 7.619553 -------------------------------------------------------- Statistics Variable 95% CL for Mean --------------------------------- counts 9.59101702 42.2756497 --------------------------------- BIOS 662 25 Sampling

SRS: Example with SAS For population total, ˆτ = ȳ = 7417 with 95% CI 7417 ± 2.145 286(286 15) 919.067 = (2743, 12091) 15 SAS proc surveymeans total=286 sum clsum; var counts; weight wt; * equals /n=286/15; run; Variable Sum Std Dev 95% CL for Sum -------------------------------------------------------------- counts 7416.933333 2179.192221 2743.03087 12090.8358 -------------------------------------------------------------- BIOS 662 26 Sampling

SRS: Sample Size Suppose we want to choose the smallest sample size n such that Pr[ ˆθ θ > d] α Assuming choose n such that ˆθ θ Var( ˆθ) (0,1) z 1 α/2 Var( ˆθ) = d BIOS 662 27 Sampling

SRS: Sample Size For example, if θ = µ, choose n such that This implies n = z 1 α/2 ( 1 n ) σ 2 n = d 1 1/n 0 + 1/ where n 0 = z2 1 α/2 σ 2 d 2 ote if >> n, then n n 0 BIOS 662 28 Sampling

SRS: Sample Size If θ = τ, choose n such that z 1 α/2 implying n = ( n) σ 2 n = d 1 1/n 0 + 1/ where n 0 = 2 z 2 1 α/2 σ 2 d 2 Example: Find n necessary to estimate caribou population total to within 2000 animals of true total with 90% confidence. σ 2 = 919, d = 2000, α =.1 n 0 = 50.9, n = 43.2 BIOS 662 29 Sampling

SRS: Estimating a proportion Suppose responses are binary, e.g., want to estimate the proportion of voters favoring a candidate for elected office Let y i = 1 if unit i has attribute of interest, y i = 0 otherwise Then µ is the proportion of units in the populations with the attribute Thus can use methods from before. However, certain special features: Formulae simplify considerably Exact confidence intervals are possible Sample size does not require information about population parameters BIOS 662 30 Sampling

SRS: Estimating a proportion Let proportion of population with attribute be Finite population variance σ 2 = i=1(y i p) 2 1 p = 1 y i = µ i=1 = i=1 y 2 i p 2 1 Proportion in sample with attribute Sample variance s 2 = i=1(y i ȳ) 2 Z i n 1 ˆp = 1 n y i Z i = y i=1 = p p2 1 = i=1 y 2 i Z i n ˆp 2 n 1 = = n ˆp(1 ˆp) n 1 p(1 p) 1 BIOS 662 31 Sampling

SRS: Estimating a proportion Since sample proportion is the sample mean of a SRS, all previous results hold. In particular: E( ˆp) = p Var( ˆp) = Var( ˆp) = ( ) n p(1 p) 1 n ( ) n ˆp(1 ˆp) n 1 E( Var( ˆp)) = Var( ˆp) BIOS 662 32 Sampling

SRS: CI for a proportion Approximate (1 α)% CI ˆp ±t 1 α/2,n 1 Var( ˆp) approximation improves as n increases and the closer p is to 0.5 Exact CI can be computed based on inverting a test and the hypergeometric distribution BIOS 662 33 Sampling

SRS: CI for a proportion Suppose ν units in the population have the attribute of interest. Let X = i=1 y i Z i. Then ( ν ν ) j)( Pr[X = j ν] = n j ( n) Suppose we observe X = x for one particular SRS (such that ˆp = x/n) Let ν L be the smallest integer such that Pr[X x ν L ] > α/2 and let ν U be the largest integer such that Pr[X x ν U ] > α/2 Then exact (1 α)% CI given by (ν L /,ν U /) BIOS 662 34 Sampling

SRS: SS for a proportion To obtain an estimator ˆp having probability at least 1 α of being no farther than d from the population proportion n = p(1 p) ( 1)d 2 /z 2 1 α/2 + p(1 p) If >> n n z2 1 α/2p(1 p) d 2 If no a-priori knowledge of p, conservatively assume p =.5 BIOS 662 35 Sampling

SRS: Estimating a ratio Example: Biologist studying animal population selects a SRS of plots in the study region. In each selected plot, she counts the number y i of young animals and the number x i of adult females, with the object of estimating the ratio of young to adult females in the population Example: In a household survey to estimate the number of television sets per person in the region, an SRS of households is conducted. For each selected household the number y i of television sets and the number x i of people is recorded Ratio estimator r = i=1 y i Z i = ȳ i=1 x i Z i x ote denominator of estimator is a random variable BIOS 662 36 Sampling

SRS: Concluding remarks SRS is the simplest probability sampling method Only rarely used in practice. Exception: sample size and population small, stratified sampling not possible BIOS 662 37 Sampling