Chapter 3: Element sampling design: Part 1

Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014

Outline
1. Simple random sampling
2. SRS with replacement
3. Systematic sampling

Simple random sampling

Motivation: choose n units from N units without replacement.
1. Each subset of n distinct units is equally likely to be selected.
2. There are $\binom{N}{n}$ samples of size n from N.
3. Give equal probability of selection to each subset with n units.

Definition (sampling design for SRS):
$$P(A) = \begin{cases} 1 / \binom{N}{n} & \text{if } |A| = n \\ 0 & \text{otherwise.} \end{cases}$$

Simple random sampling

Lemma. Under SRS, the inclusion probabilities are
$$\pi_i = \frac{n}{N}, \qquad \pi_{ij} = \frac{n(n-1)}{N(N-1)} \quad \text{for } i \neq j.$$
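The lemma can be checked by brute force on a small population. The sketch below enumerates every subset of size n, gives each probability $1/\binom{N}{n}$, and sums the probabilities of the subsets containing unit 0 (and the pair 0, 1). The function name and the values N = 6, n = 3 are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction
from itertools import combinations

def srs_inclusion_probabilities(N, n):
    """Return (pi_0, pi_01) under SRS by enumerating all size-n subsets of {0, ..., N-1}."""
    samples = list(combinations(range(N), n))
    p = Fraction(1, len(samples))            # each subset is equally likely
    pi_0 = sum(p for A in samples if 0 in A)
    pi_01 = sum(p for A in samples if 0 in A and 1 in A)
    return pi_0, pi_01

N, n = 6, 3                                   # hypothetical sizes
pi_i, pi_ij = srs_inclusion_probabilities(N, n)
print(pi_i == Fraction(n, N))
print(pi_ij == Fraction(n * (n - 1), N * (N - 1)))
```

By symmetry the same values hold for every unit and every pair, so checking unit 0 and the pair (0, 1) is enough.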

Simple random sampling

Theorem. Under the SRS design, the HT estimator
$$\hat{Y}_{HT} = \frac{N}{n} \sum_{i \in A} y_i = N \bar{y}$$
is unbiased for Y and has variance of the form
$$V(\hat{Y}_{HT}) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) S^2,$$
where
$$S^2 = \frac{1}{2} \frac{1}{N} \frac{1}{N-1} \sum_{i=1}^{N} \sum_{j=1}^{N} (y_i - y_j)^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2.$$

Simple random sampling

Theorem (cont'd). Also, the SYG variance estimator is
$$\hat{V}(\hat{Y}_{HT}) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) s^2,$$
where
$$s^2 = \frac{1}{n-1} \sum_{i \in A} (y_i - \bar{y})^2.$$
Thus, under SRS,
$$E(s^2) = S^2.$$
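The three claims of the theorem (unbiasedness, the variance formula, and $E(s^2) = S^2$) can be verified exactly on a toy population by enumerating all possible samples. This is a minimal sketch; the population values in `y` and the helper name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

y = [3, 7, 8, 12, 20]            # hypothetical population values
N, n = len(y), 2

def var_divisor_minus_one(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

samples = list(combinations(y, n))            # all C(N, n) equally likely samples
p = Fraction(1, len(samples))

ht = [Fraction(N, n) * sum(s) for s in samples]          # HT estimate for each sample
E_ht = sum(p * e for e in ht)                            # design expectation
V_ht = sum(p * (e - E_ht) ** 2 for e in ht)              # design variance
S2 = var_divisor_minus_one(y)
V_formula = Fraction(N ** 2, n) * (1 - Fraction(n, N)) * S2
E_s2 = sum(p * var_divisor_minus_one(s) for s in samples)

print(E_ht == sum(y), V_ht == V_formula, E_s2 == S2)
```

Exact rational arithmetic (`Fraction`) is used so the equalities hold without rounding error.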

Simple random sampling

Remark (under SRS)
$1 - n/N$ is often called the finite population correction (FPC) term.
The FPC term can be ignored (FPC $\doteq 1$) if the sampling rate $n/N$ is small ($\le 0.05$) or for conservative inference.
For n = 1, the variance of the sample mean is
$$\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{Y})^2 \equiv \sigma_Y^2.$$
Central limit theorem: under some conditions,
$$\left\{ \hat{V}(\hat{Y}_{HT}) \right\}^{-1/2} (\hat{Y}_{HT} - Y) = \frac{\bar{y} - \bar{Y}}{\sqrt{\frac{1}{n} \left(1 - \frac{n}{N}\right) s^2}} \to N(0, 1).$$

Simple random sampling

Remark (under SRS): sample size determination
1. Choose the target variance $V^*$ of $V(\bar{y})$.
2. Choose n as the smallest integer satisfying
$$\frac{1}{n} \left(1 - \frac{n}{N}\right) S^2 \le V^*.$$
For dichotomous y (taking 0 or 1), we may use $S^2 \doteq P(1-P) \le 1/4$.
A simple rule is $n \ge d^{-2}$, where d is the margin of error.
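The inequality in step 2 can be solved in closed form: with $n_0 = S^2 / V^*$, it is equivalent to $n \ge n_0 / (1 + n_0/N)$. The sketch below implements that rule; the function name is illustrative, and the example assumes the margin of error is taken as two standard errors, $d = 2\sqrt{V^*}$, which is one common reading of the $n \ge d^{-2}$ rule rather than something stated on the slide.

```python
import math
from fractions import Fraction

def srs_sample_size(S2, target_var, N=None):
    """Smallest n with (1/n)(1 - n/N) * S2 <= target_var.

    N=None ignores the finite population correction (FPC = 1).
    """
    n0 = S2 / target_var                     # required size when the FPC is ignored
    if N is None:
        return math.ceil(n0)
    return min(N, math.ceil(n0 / (1 + n0 / N)))

# Dichotomous y with S2 bounded by 1/4 and d = 0.05, so V* = d^2 / 4 = 1/1600
# (assumes d = 2 * sqrt(V*)). Fractions avoid floating-point rounding at the boundary.
S2 = Fraction(1, 4)
V_star = Fraction(1, 1600)
n_no_fpc = srs_sample_size(S2, V_star)            # equals d^{-2}
n_with_fpc = srs_sample_size(S2, V_star, N=2000)  # smaller once the FPC is used
print(n_no_fpc, n_with_fpc)
```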

Simple random sampling

How do we select a simple random sample of size n from the finite population?
  Draw-by-draw procedure
  Rejective Bernoulli sampling method
  Reservoir method
  Random sorting method

Simple random sampling: draw-by-draw procedure

For example, consider $U = \{1, 2, \ldots, N\}$ and n = 2.
In the first draw, select one element with equal probability.
In the second draw, select one element with equal probability from $U \setminus \{a_1\}$, where $a_1$ is the element selected in the first draw.
Let $a_2$ be the element selected in the second draw.

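Simple random sampling: draw-by-draw procedure (cont'd)

The unordered sample $\{a_1, a_2\}$ can be obtained in either order, so
$$P(\{a_1, a_2\}) = P(a_1) P(a_2 \mid U \setminus \{a_1\}) + P(a_2) P(a_1 \mid U \setminus \{a_2\}) = \frac{1}{N} \cdot \frac{1}{N-1} + \frac{1}{N} \cdot \frac{1}{N-1} = \frac{2}{N(N-1)},$$
which equals $1 / \binom{N}{2}$.
We can prove similar results for general n (use mathematical induction).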
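A minimal sketch of the draw-by-draw procedure for general n, together with an exact check that every unordered sample has probability $1/\binom{N}{n}$ (each of the n! orderings has probability $1/\{N(N-1)\cdots(N-n+1)\}$). Function names are illustrative assumptions.

```python
import random
from fractions import Fraction
from math import comb, factorial

def draw_by_draw(N, n, rng=None):
    """Select n distinct labels from {1, ..., N}, one equal-probability draw at a time."""
    rng = rng or random.Random()
    remaining = list(range(1, N + 1))
    sample = []
    for _ in range(n):
        j = rng.randrange(len(remaining))    # equal probability among the remaining units
        sample.append(remaining.pop(j))
    return sample

def prob_of_unordered_sample(N, n):
    """Probability that draw-by-draw selection yields one given set of n units."""
    p_ordered = Fraction(1)
    for j in range(n):
        p_ordered *= Fraction(1, N - j)      # 1/N, then 1/(N-1), ...
    return factorial(n) * p_ordered          # the n! orderings give the same set

print(draw_by_draw(10, 4, random.Random(0)))
print(prob_of_unordered_sample(10, 4) == Fraction(1, comb(10, 4)))
```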

Simple random sampling: rejective Bernoulli sampling method

1. Apply Bernoulli sampling of expected size n:
$$I_1, \ldots, I_N \overset{iid}{\sim} \text{Bernoulli}(f), \quad \text{where } f = n/N.$$
2. Check whether the realized sample size is n. If yes, accept the sample. Otherwise, go to Step 1.

Simple random sampling: rejective Bernoulli sampling method (cont'd)

Justification:
$$P\left(I_1, I_2, \ldots, I_N \,\Big|\, \sum_{i=1}^{N} I_i = n\right) = \frac{\prod_{i=1}^{N} f^{I_i} (1-f)^{1-I_i}}{\binom{N}{n} f^n (1-f)^{N-n}} = \frac{1}{\binom{N}{n}} \quad \text{if } \sum_{i=1}^{N} I_i = n.$$
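The two steps translate directly into a loop that redraws the Bernoulli indicators until exactly n units are selected. This is a sketch under the assumption 0 < n <= N; the function name and the seeded generator are illustrative.

```python
import random

def rejective_bernoulli_srs(N, n, rng=None):
    """Repeat Bernoulli(f = n/N) sampling until the realized sample size is exactly n."""
    rng = rng or random.Random()
    f = n / N
    while True:
        # Step 1: independent Bernoulli(f) indicators I_1, ..., I_N
        indicators = [rng.random() < f for _ in range(N)]
        # Step 2: accept only if exactly n units were selected
        if sum(indicators) == n:
            return [k for k, selected in enumerate(indicators, start=1) if selected]

sample = rejective_bernoulli_srs(20, 5, random.Random(1))
print(sample)
```

The number of attempts is random, so the method trades simplicity for some wasted draws.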

Simple random sampling: reservoir method (McLeod and Bellhouse, 1983)

1. The first n units are selected into the sample.
2. For each k = n+1, ..., N:
   (a) Select unit k with probability n/k.
   (b) If unit k is selected, remove one element from the current sample with equal probability.
   (c) Unit k takes the place of the removed unit.

Note that the population size does not need to be known in advance. The process can be stopped at any point, and the current sample is then a simple random sample from the finite population considered up to that point.
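The steps above can be sketched as a single pass over a stream of units whose length is not known beforehand. The function name and the use of a Python iterable as the "list" are assumptions made for illustration.

```python
import random

def reservoir_sample(stream, n, rng=None):
    """One-pass SRS of size n from an iterable of unknown length."""
    rng = rng or random.Random()
    sample = []
    for k, unit in enumerate(stream, start=1):
        if k <= n:
            sample.append(unit)              # step 1: the first n units enter the sample
        elif rng.random() < n / k:           # step 2(a): select unit k with probability n/k
            # steps 2(b)-(c): a uniformly chosen current member is replaced by unit k
            sample[rng.randrange(n)] = unit
    return sample

print(reservoir_sample(iter(range(1000)), 10, random.Random(0)))
```

If the stream has fewer than n units, the sketch simply returns all of them.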

Simple random sampling: random sorting method

1. A value of an independent uniform variable in [0, 1] is allocated to each unit of the population.
2. The population is sorted in ascending (or descending) order of these values.
3. The first n units of the sorted population are selected in the sample.
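A minimal sketch of the random sorting method, assuming the population is available as a Python sequence; the function name is illustrative.

```python
import random

def random_sort_srs(units, n, rng=None):
    """Attach an independent Uniform(0, 1) key to each unit, sort by key, keep the first n."""
    rng = rng or random.Random()
    keyed = [(rng.random(), u) for u in units]    # step 1: one uniform value per unit
    keyed.sort(key=lambda pair: pair[0])          # step 2: sort in ascending order of the key
    return [u for _, u in keyed[:n]]              # step 3: the first n units form the sample

print(random_sort_srs(list("ABCDEFGH"), 3, random.Random(2)))
```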

SRS with replacement

In with-replacement sampling, the order of the sample selection is important.
Ordered sample: $OS = (a_1, a_2, \ldots, a_n)$, where $a_i$ is the index of the element selected in the i-th with-replacement draw.
Sample: $A = \{k;\ k = a_i \text{ for some } i,\ i = 1, 2, \ldots, n\}$.
SRS with replacement: at each draw i, $a_i = k$ with probability $1/N$, $k = 1, \ldots, N$.
The sample size $|A|$ (the number of distinct units) is a random variable. Note that
$$\pi_k = \Pr(k \in A) = 1 - \Pr(k \notin A) = 1 - \left(1 - \frac{1}{N}\right)^n.$$
Thus, the expected sample size is
$$n_0 = \sum_{k=1}^{N} \pi_k = N - N \left(1 - N^{-1}\right)^n < n \quad \text{for } n > 2.$$
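The expected number of distinct units can be checked exactly for small N and n by listing all $N^n$ equally likely ordered samples. The function names and the values N = 4, n = 3 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def expected_distinct_units(N, n):
    """Average number of distinct labels over all N**n equally likely ordered samples."""
    total = sum(len(set(draws)) for draws in product(range(N), repeat=n))
    return Fraction(total, N ** n)

def n0_formula(N, n):
    """Closed form: sum of pi_k = N - N (1 - 1/N)^n."""
    return N - N * (1 - Fraction(1, N)) ** n

N, n = 4, 3                                   # hypothetical sizes
print(expected_distinct_units(N, n), n0_formula(N, n))
```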

SRS with replacement

1. First, define
$$Z_i = y_{a_i} = \sum_{k=1}^{N} y_k I(a_i = k).$$
Note that $Z_1, \ldots, Z_n$ are independent random variables since the n draws are independent.
2. $Z_1, \ldots, Z_n$ are identically distributed since the same probabilities are used at each draw, where
$$E(Z_i) = \bar{Y} \quad \text{and} \quad V(Z_i) = N^{-1} \sum_{k=1}^{N} (y_k - \bar{Y})^2 \equiv \sigma_y^2.$$
3. Thus, $Z_1, \ldots, Z_n$ are IID with mean $\bar{Y}$ and variance $\sigma_y^2$. Use $\bar{z} = \sum_{k=1}^{n} Z_k / n$ to estimate $\bar{Y}$.

SRS with replacement: estimation of the total

Unbiased estimator of Y:
$$\hat{Y}_{SRSWR} = \frac{N}{n} \sum_{i=1}^{n} y_{a_i} = N \bar{y}_n.$$
Variance:
$$V(\hat{Y}_{SRSWR}) = \frac{N^2}{n} \left(1 - \frac{1}{N}\right) S^2 = \frac{N^2}{n} \sigma_y^2 \ge V(\hat{Y}_{SRS}),$$
where $S^2 = (N-1)^{-1} \sum_{i=1}^{N} (y_i - \bar{Y}_N)^2 = N (N-1)^{-1} \sigma_y^2$.
Variance estimation:
$$\hat{V}(\hat{Y}_{SRSWR}) = \frac{N^2}{n} s^2,$$
where $s^2 = (n-1)^{-1} \sum_{i=1}^{n} (y_{a_i} - \bar{y}_n)^2$. Note that $E(s^2) = \sigma_y^2$.
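These with-replacement results can also be confirmed exactly by enumerating all $N^n$ ordered samples of a toy population. The values in `y` are hypothetical, and the variable names are chosen only for this sketch.

```python
from fractions import Fraction
from itertools import product

y = [3, 7, 8, 12, 20]            # hypothetical population values
N, n = len(y), 3
Ybar = Fraction(sum(y), N)
sigma2 = sum((v - Ybar) ** 2 for v in y) / N      # sigma_y^2 (divisor N)
S2 = sigma2 * N / (N - 1)                         # S^2 (divisor N - 1)

draws = list(product(y, repeat=n))                # all N**n equally likely ordered samples
p = Fraction(1, len(draws))

est = [Fraction(N, n) * sum(d) for d in draws]    # N * ybar_n for each ordered sample
E_est = sum(p * e for e in est)
V_est = sum(p * (e - E_est) ** 2 for e in est)

def sample_s2(d):
    """Sample variance of the n drawn values (divisor n - 1)."""
    m = Fraction(sum(d), len(d))
    return sum((v - m) ** 2 for v in d) / (len(d) - 1)

E_s2 = sum(p * sample_s2(d) for d in draws)
V_srs = Fraction(N ** 2, n) * (1 - Fraction(n, N)) * S2   # without-replacement variance

print(E_est == sum(y), V_est == Fraction(N ** 2, n) * sigma2, E_s2 == sigma2, V_est >= V_srs)
```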

Systematic sampling: setup

1. Have N elements in a list.
2. Choose a positive integer, a, called the sampling interval. Let $n = \lfloor N/a \rfloor$. That is, $N = na + c$, where c is an integer with $0 \le c < a$.
3. Select a random start, r, from $\{1, 2, \ldots, a\}$ with equal probability.
4. The final sample is
$$A = \begin{cases} \{r, r+a, r+2a, \ldots, r+(n-1)a\} & \text{if } c < r \le a \\ \{r, r+a, r+2a, \ldots, r+na\} & \text{if } 1 \le r \le c. \end{cases}$$

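Systematic sampling

The sample size can be random:
$$n_A = \begin{cases} n & \text{if } c < r \le a \\ n + 1 & \text{if } r \le c. \end{cases}$$
Inclusion probabilities:
$$\pi_k = \frac{1}{a}, \qquad \pi_{kl} = \begin{cases} 1/a & \text{if } k \text{ and } l \text{ belong to the same systematic sample} \\ 0 & \text{otherwise.} \end{cases}$$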
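A short sketch of the selection step. It relies on the fact that the labels r, r+a, r+2a, ... not exceeding N are exactly the sample in the setup, for either case of r. Function names and the values N = 23, a = 5 are illustrative assumptions.

```python
import random

def systematic_sample_from_start(N, a, r):
    """Labels r, r + a, r + 2a, ... that do not exceed N (r is the start in {1, ..., a})."""
    return list(range(r, N + 1, a))

def systematic_sample(N, a, rng=None):
    """Draw the random start uniformly from {1, ..., a} and return the systematic sample."""
    rng = rng or random.Random()
    r = rng.randint(1, a)                    # randint includes both endpoints
    return systematic_sample_from_start(N, a, r)

N, a = 23, 5                                 # hypothetical: n = 4, c = 3
all_samples = [systematic_sample_from_start(N, a, r) for r in range(1, a + 1)]
sizes = [len(s) for s in all_samples]        # n + 1 for r <= c, n otherwise
print(sizes)
print(systematic_sample(N, a, random.Random(0)))
```

Every unit belongs to exactly one of the a possible samples, which is why each unit is included with probability 1/a.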

Systematic sampling: remark

This is very easy to do.
This is a probability sampling design.
This is not a measurable sampling design: there is no design-unbiased estimator of the variance (because there is only one random draw).
Pick one set of elements (which always go together) and measure each one: later, we will call this cluster sampling.
Divide the population into non-overlapping groups and choose an element in each group: this is closely related to stratification.

Systematic sampling: estimation

Partition the population into a groups,
$$U = U_1 \cup U_2 \cup \cdots \cup U_a,$$
where the $U_i$ are disjoint.
Population total:
$$Y = \sum_{i \in U} y_i = \sum_{r=1}^{a} \sum_{k \in U_r} y_k = \sum_{r=1}^{a} t_r,$$
where $t_r = \sum_{k \in U_r} y_k$.
Think of a finite population with a elements with measurements $t_1, \ldots, t_a$.

Systematic sampling: estimation (cont'd)

HT estimator:
$$\hat{Y}_{HT} = \frac{t_r}{1/a}, \quad \text{if } A = U_r.$$
Variance: note that we are doing SRS of size 1 from the population of a elements $\{t_1, \ldots, t_a\}$, so
$$\mathrm{Var}(\hat{Y}_{HT}) = \frac{a^2}{1} \left(1 - \frac{1}{a}\right) S_t^2,$$
where
$$S_t^2 = \frac{1}{a-1} \sum_{r=1}^{a} (t_r - \bar{t})^2$$
and $\bar{t} = \sum_{r=1}^{a} t_r / a$.
When is the variance small?

Systematic sampling: estimation (cont'd)

Now, assuming N = na,
$$V(\hat{Y}_{HT}) = a(a-1) S_t^2 = a \sum_{r=1}^{a} (t_r - \bar{t})^2 = n^2 a \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2,$$
where $\bar{y}_r = t_r / n$ and $\bar{y}_u = \bar{t} / n$.
ANOVA: with $U = \cup_{r=1}^{a} U_r$,
$$SST = \sum_{k \in U} (y_k - \bar{y}_u)^2 = \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_u)^2 = \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_r)^2 + n \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2 = SSW + SSB.$$

Systematic sampling

$$V(\hat{Y}_{HT}) = na \cdot SSB = N \cdot SSB = N (SST - SSW).$$
If SSB is small, then the $\bar{y}_r$ are more alike and $V(\hat{Y}_{HT})$ is small.
If SSW is small, then $V(\hat{Y}_{HT})$ is large.
The intraclass correlation coefficient $\rho$ measures the homogeneity of clusters:
$$\rho = 1 - \frac{n}{n-1} \frac{SSW}{SST}.$$
More details about $\rho$ will be covered in cluster sampling (Chapter 6).
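The decomposition and the identity $V(\hat{Y}_{HT}) = N \cdot SSB$ can be checked numerically. In this sketch the design variance is computed directly from the a equally likely estimates $a\,t_r$ and then compared with $N \cdot SSB$. The function name, the returned dictionary, and the population 1, ..., 12 with a = 3 are illustrative assumptions; N = na and n > 1 are required.

```python
from fractions import Fraction

def systematic_anova(y, a):
    """SST, SSW, SSB, rho and the design variance of the HT estimator (assumes N = n*a, n > 1)."""
    N = len(y)
    assert N % a == 0, "this sketch assumes N = n * a"
    n = N // a
    groups = [y[r::a] for r in range(a)]             # U_1, ..., U_a
    ybar_u = Fraction(sum(y), N)
    means = [Fraction(sum(g), n) for g in groups]    # ybar_r = t_r / n
    SST = sum((v - ybar_u) ** 2 for v in y)
    SSW = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    SSB = n * sum((m - ybar_u) ** 2 for m in means)
    # Design variance from first principles: each estimate a * t_r has probability 1/a
    Y = sum(y)
    V_SY = sum((a * sum(g) - Y) ** 2 for g in groups) / Fraction(a)
    rho = 1 - Fraction(n, n - 1) * SSW / SST
    return {"SST": SST, "SSW": SSW, "SSB": SSB, "V_SY": V_SY, "rho": rho}

res = systematic_anova(list(range(1, 13)), a=3)
print(res)
```

For this linearly ordered population most of the variation is within groups, so SSB (and hence the variance) is small and $\rho$ is negative.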

Systematic sampling: comparison between systematic sampling (SY) and SRS

How does SY compare to SRS when the population is ordered in one of the following ways?
1. Random ordering: intuitively, the two should be the same.
2. Linear ordering: SY should be better than SRS.
3. Periodic ordering: if the period equals a, SY can be terrible.
4. Autocorrelated ordering: successive $y_k$'s tend to lie on the same side of $\bar{y}_u$. Thus, SY should be better than SRS.

Systematic sampling

How to quantify?
$$V_{SRS}(\hat{Y}_{HT}) = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \frac{1}{N-1} \sum_{k=1}^{N} (y_k - \bar{Y}_N)^2$$
$$V_{SY}(\hat{Y}_{HT}) = n^2 a \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2$$
Cochran (1946) introduced the superpopulation model to deal with this problem (treat $y_k$ as a random variable).
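Before turning to the model-based comparison, the two design variances can be evaluated directly for fixed populations. The sketch below does this for a linearly ordered list and for a periodic list whose period equals the sampling interval; both populations and all names are hypothetical, and N = na is assumed.

```python
from fractions import Fraction

def v_srs(y, n):
    """Design variance of the HT estimator under SRS of size n."""
    N = len(y)
    Ybar = Fraction(sum(y), N)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    return Fraction(N ** 2, n) * (1 - Fraction(n, N)) * S2

def v_sy(y, a):
    """Design variance of the HT estimator under systematic sampling (assumes N = n*a)."""
    N = len(y)
    n = N // a
    ybar_u = Fraction(sum(y), N)
    return n ** 2 * a * sum((Fraction(sum(y[r::a]), n) - ybar_u) ** 2 for r in range(a))

a, n = 3, 4
linear = list(range(1, 13))      # hypothetical linearly ordered population
periodic = [1, 2, 3] * 4         # hypothetical population with period equal to a

print("linear:  ", v_sy(linear, a), "vs SRS", v_srs(linear, n))
print("periodic:", v_sy(periodic, a), "vs SRS", v_srs(periodic, n))
```

In the linear case SY has the smaller variance, while in the periodic case every systematic sample contains a single repeated value and SY is far worse than SRS.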

Systematic sampling

Example: superpopulation model for a population in random order. Denote the model by $\zeta$:
$$\{y_k\} \overset{iid}{\sim} (\mu, \sigma^2).$$
Then
$$E_\zeta\{V_{SRS}(\hat{Y}_{HT})\} = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \sigma^2,$$
$$E_\zeta\{V_{SY}(\hat{Y}_{HT})\} = \frac{N^2}{n} \left(1 - \frac{n}{N}\right) \sigma^2.$$
Thus, the model expectations of the design variances are the same under the IID model.
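A short derivation of the two model expectations, written as a sketch under the assumption N = na used in the earlier variance formula for systematic sampling. It uses only the fact that the expected sum of squared deviations of m iid variables about their own mean is (m - 1) times their variance.

```latex
% Under \zeta, the group means \bar{y}_1, \ldots, \bar{y}_a are iid with mean \mu and
% variance \sigma^2 / n, and \bar{y}_u is their average (N = na).
E_\zeta\Big\{ \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2 \Big\} = (a-1)\,\frac{\sigma^2}{n}
\quad\Longrightarrow\quad
E_\zeta\{ V_{SY}(\hat{Y}_{HT}) \} = n^2 a\,(a-1)\,\frac{\sigma^2}{n} = N(a-1)\,\sigma^2 .

% For SRS, E_\zeta(S^2) = \sigma^2, so
E_\zeta\{ V_{SRS}(\hat{Y}_{HT}) \}
  = \frac{N^2}{n}\Big(1 - \frac{n}{N}\Big)\sigma^2
  = \frac{N(N-n)}{n}\,\sigma^2
  = N(a-1)\,\sigma^2 .
```

Both expressions reduce to $N(a-1)\sigma^2$, which is the equality stated on the slide.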