Formalizing the Concepts: Simple Random Sampling Juan Muñoz Kristen Himelein March 2013
Purpose of sampling To study a portion of the population through observations at the level of the units selected, such as households, persons, institutions or physical objects and make quantitative statements about the entire population
Why sampling? Saves cost compared to full enumeration. Easier to control quality of sample. More timely results from sample data. Measurement can be destructive.
Sampling Concepts and Definitions Unit of analysis The level at which a measurement is taken Most common units of analysis are persons, households, farms, and economic establishments
Sampling Concepts and Definitions Target population or universe The complete collection of all the units of analysis to study. Examples: population living in households in a country; students in primary schools
Sampling Concepts and Definitions Sampling frame List of all the units of analysis whose characteristics are to be measured Comprehensive, non-overlapping and must not contain irrelevant elements Units must be identifiable (often linked to cartography) Should be updated to ensure complete coverage Examples: list of establishments; census; civil registration
Sampling Concepts and Definitions Parameter / Estimate Objective of sampling is to estimate parameters of a population Quantity computed from all N values in a population set Typically, a descriptive measure of a population, such as mean, variance Poverty rate, average income, etc.
Sampling Concepts and Definitions Unbiased Estimator Estimator: a mathematical formula or function using sample results to produce an estimate for the entire population, $\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)$. When the mean of individual sample estimates equals the population parameter, then the estimator is unbiased. Formally, an estimator is unbiased if the expected value of the (sample) estimates is equal to the (population) parameter being estimated: $\frac{\hat{\theta}_1 + \hat{\theta}_2 + \cdots + \hat{\theta}_k}{k} \to \theta$ (where k is the number of experiments).
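The idea of averaging estimates over k experiments can be illustrated with a small simulation. This is only a sketch: the population (the integers 1 to 100), the sample size, the number of experiments, and the function name `mean_of_sample_means` are all assumptions chosen for illustration, not part of the original slides.

```python
import random
import statistics

def mean_of_sample_means(population, n, k, seed=0):
    """Average the sample mean over k independent simple random samples of size n."""
    rng = random.Random(seed)
    estimates = [statistics.fmean(rng.sample(population, n)) for _ in range(k)]
    return statistics.fmean(estimates)

# Hypothetical population: the integers 1..100, whose true mean is 50.5.
population = list(range(1, 101))
true_mean = statistics.fmean(population)

# With many experiments, the average of the estimates should sit close to the parameter.
average_estimate = mean_of_sample_means(population, n=10, k=5000)
print(true_mean, average_estimate)
```

Any single sample mean can be far from 50.5; it is the average across experiments that settles near the population value, which is what unbiasedness describes.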
Random sampling Also known as scientific sampling or probability sampling Each unit has a non-zero and known probability of selection Mathematical theory is available to predict the probability distribution of the sampling error (the error caused by observing a sample instead of the whole population).
Random sampling techniques Single stage, equal probability sampling: Simple Random Sampling (SRS); Systematic sampling with equal probability. Stratified sampling. Multi-stage sampling. In real life those techniques are usually combined in various ways: most sampling designs are complex.
Techniques in Random Sampling Single stage, equal probability sampling Random selection of n units (the sample) from a population of N units, so that each unit has an equal probability of selection. Probability of selection (sampling fraction) = f = n/N. It is the most basic form of probability sampling and provides the theoretical basis for more complicated techniques.
Techniques in Random Sampling Single stage, equal probability sampling (continued) 1. Simple Random Sampling. The investigator mixes up the whole target population before grabbing n units. 2. Systematic Random Sampling. The N units in the population are ranked 1 to N in some order (e.g., alphabetic). To select a sample of n units, calculate the step k (k = N/n), take a unit at random from the first k units, and then take every kth unit after it.
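The two selection procedures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the frame is a hypothetical list of 100 unit IDs, N is assumed to be an exact multiple of n so that the step k is an integer, and the function names are invented for this example.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every unit has probability n/N."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Take a random start among the first k units, then every k-th unit."""
    k = len(units) // n        # step k = N/n (assumes N is a multiple of n)
    start = rng.randrange(k)   # 0-based position of the random start
    return units[start::k][:n]

rng = random.Random(42)
frame = list(range(1, 101))    # hypothetical frame of N = 100 unit IDs

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
print(sorted(srs))
print(sys_sample)
```

In both cases the sampling fraction is f = n/N = 0.1; the systematic sample differs only in that its units are evenly spaced through the ordered list.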
Techniques in Random Sampling Single stage, equal probability sampling Advantage: self-weighting (simplifies the calculation of estimates and variances). Disadvantages: a sampling frame may not be available; may entail high transportation costs.
Techniques in Random Sampling Stratified sampling The population is divided into mutually exclusive subgroups called strata. Then a random sample is selected from each stratum. Common examples : Urban / Rural, Provinces, Male / Female
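As a rough sketch of the procedure, the snippet below draws an independent simple random sample inside each stratum. The urban/rural frames, the per-stratum sample sizes, and the function name `stratified_sample` are hypothetical; the slides do not say how the sample should be allocated across strata.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw an independent simple random sample of sizes[name] units from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

rng = random.Random(7)

# Hypothetical frame, already split into mutually exclusive strata.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}

# Assumed allocation: 6 urban and 4 rural units.
sample = stratified_sample(strata, {"urban": 6, "rural": 4}, rng)
print(sample)
```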
Techniques in Random Sampling Two-stage sampling Units of analysis are divided into groups called Primary Sampling Units (PSUs) A sample of PSUs is selected first Then a sample of units is chosen in each of the selected PSUs This technique can be generalized (multi-stage sampling)
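The two stages can be sketched as follows. This example assumes equal-probability selection at both stages, which the slides do not specify; the village and household identifiers and the function name `two_stage_sample` are invented for illustration.

```python
import random

def two_stage_sample(psus, n_psus, n_units, rng):
    """Select n_psus PSUs at random, then n_units units within each selected PSU."""
    selected_psus = rng.sample(sorted(psus), n_psus)   # first stage: sample of PSUs
    # Second stage: a simple random sample of units inside each selected PSU.
    return {psu: rng.sample(psus[psu], n_units) for psu in selected_psus}

rng = random.Random(3)

# Hypothetical frame: 20 villages (PSUs) with 50 households each.
psus = {
    f"village_{v:02d}": [f"v{v:02d}_hh{h:02d}" for h in range(50)]
    for v in range(20)
}

sample = two_stage_sample(psus, n_psus=4, n_units=5, rng=rng)
print(sample)
```

Only the selected PSUs need a full list of units, which is one practical reason this design is used when no complete frame of households exists.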
Sample variance & standard error Uncertainty is measured by the standard error ($\hat{e}$). Variance of the sample mean of an SRS of n units for a population of size N: $\hat{e}^2 = \mathrm{Var}(\bar{x}) = \left(1 - \frac{n}{N}\right) \frac{\mathrm{Var}(X)}{n}$. Measure of sampling error. Depends on 3 factors: (1 - n/N) = Finite Population Correction (fpc); n = sample size; Var(X) = population variance. The population variance is unknown, but can be estimated without bias by: $\hat{s}_x^2 = \widehat{\mathrm{Var}}(X) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
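A small worked example may help tie the three factors together. The five sample values and the population size of 1,000 are made up for illustration, and `srs_standard_error` is a hypothetical helper name; the calculation itself follows the formula on this slide, with Var(X) replaced by its sample estimate.

```python
import math
import statistics

def srs_standard_error(sample, population_size):
    """Standard error of the mean of an SRS, using the finite population correction."""
    n = len(sample)
    s2 = statistics.variance(sample)   # estimate of Var(X); divides by n - 1
    fpc = 1 - n / population_size      # finite population correction (1 - n/N)
    return math.sqrt(fpc * s2 / n)

# Hypothetical sample of n = 5 values from a population of N = 1000.
sample = [12.0, 15.0, 9.0, 14.0, 10.0]
se = srs_standard_error(sample, population_size=1000)
print(round(se, 4))   # s^2 = 6.5, fpc = 0.995, so se = sqrt(0.995 * 6.5 / 5)
```

With such a small sampling fraction the fpc is close to 1 and has almost no effect; it matters only when n is a sizeable share of N.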
Sample Variance in Proportions A proportion P (or prevalence) is equal to the mean of a dummy variable. In this case the population variance is Var(X) = P(1 - P), and $\mathrm{Var}(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{n - 1}$
Standard deviation vs standard error Population: $\sigma^2$ = variance of the population; $\sigma$ = standard deviation around the mean. Sample: $s^2$ = variance of the sample; $s / \sqrt{n}$ = standard error. Difference: The standard deviation is a descriptive statistic. It is the degree to which individuals in the population differ from the mean of the population. The standard error is an estimate of how close to the population mean your sample mean is likely to be. Standard errors decrease with sample size. Standard deviations are left unchanged.
Sample Standard Error [Figure: distribution of the sample mean for n = 100 and for n = 750] Bigger samples have smaller standard errors around the mean.
Confidence intervals Estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate, called confidence intervals. It is not sufficient to simply report the sample proportion obtained by a candidate in the sample survey; we also need to give an indication of how accurate the estimate is.
Confidence intervals for averages $\bar{x} \pm t_\alpha \, \hat{e}(\bar{x})$ where: $t_\alpha$ = 1.28 for confidence level α = 80%; $t_\alpha$ = 1.64 for confidence level α = 90%; $t_\alpha$ = 1.96 for confidence level α = 95%; $t_\alpha$ = 2.58 for confidence level α = 99%
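Confidence intervals for proportions In a sample of 1,000 electors, 280 of them (28 percent) say they will vote Green. $\hat{e} = \sqrt{\mathrm{Var}(\hat{p})} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n - 1}} = \sqrt{\frac{0.28 \times 0.72}{999}} \approx 0.0142$ Standard error is 1.42 percent.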
Confidence intervals In a sample of 1,000 electors, 280 of them (28 percent) say they will vote Green. Standard error is 1.42 percent. [Figure: sampling distribution of the estimate, horizontal axis from 24 to 32 percent] 95 percent confidence interval: 28 ± 1.42 × 1.96, about 25.2 to 30.8 percent. 99 percent confidence interval: 28 ± 1.42 × 2.58, about 24.3 to 31.7 percent.
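The Green vote example can be reproduced with a short calculation. This is a sketch only: the function name `proportion_ci` and the lookup table `T_COEFFICIENTS` are assumptions made for this example, with the coefficients copied from the confidence interval slide for averages.

```python
import math

# Coefficients t for each confidence level, as listed on the slide for averages.
T_COEFFICIENTS = {80: 1.28, 90: 1.64, 95: 1.96, 99: 2.58}

def proportion_ci(successes, n, level):
    """Confidence interval for a proportion, using Var(p) = p(1 - p) / (n - 1)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / (n - 1))   # standard error of the proportion
    half_width = T_COEFFICIENTS[level] * se
    return p - half_width, p + half_width

# 280 of 1,000 electors say they will vote Green.
low95, high95 = proportion_ci(280, 1000, 95)
low99, high99 = proportion_ci(280, 1000, 99)
print(f"95%: {low95:.3f} to {high95:.3f}")
print(f"99%: {low99:.3f} to {high99:.3f}")
```

The higher the confidence level, the wider the interval: being more certain of covering the true proportion costs precision.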
Sample Size The required sample size n is determined by: the variability of the parameter, Var(X) (though this is unknown); the maximum margin of error E we are willing to accept; how confident we want to be that the error of our estimation will not exceed that maximum (for each confidence level α there is a coefficient $t_\alpha$); the size of the population (not very important). For averages: $n = \frac{t_\alpha^2 \, \mathrm{Var}(X)}{E^2}$. For proportions: $n = \frac{t_\alpha^2 \, P(1 - P)}{E^2}$. Adjusted for a population of size N: $n_N = \frac{n}{1 + n/N}$
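A worked example of the sample size calculation for a proportion is sketched below. The inputs (P = 0.5, a 5 percentage point margin of error, a 95 percent confidence level, and a population of 2,000) are assumptions chosen for illustration, as is the function name `sample_size_proportion`; rounding up to the next whole unit is also a choice made here, not something stated on the slide.

```python
import math

def sample_size_proportion(p, margin, t=1.96, population_size=None):
    """Required n for a proportion: t^2 * P(1 - P) / E^2, optionally adjusted for N."""
    n = t**2 * p * (1 - p) / margin**2
    if population_size is not None:
        n = n / (1 + n / population_size)   # finite population adjustment
    return math.ceil(n)                     # round up to a whole number of units

# Assumed inputs: P = 0.5 (the most variable case), E = 0.05, 95 percent confidence.
n_large_population = sample_size_proportion(0.5, 0.05)
n_small_population = sample_size_proportion(0.5, 0.05, population_size=2000)
print(n_large_population, n_small_population)
```

Since P is unknown before the survey, P = 0.5 is used here because it maximizes P(1 - P) and therefore gives the most cautious sample size. The adjustment for N only lowers the requirement noticeably when the population is small, which is consistent with the slide's remark that population size is not very important.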