Unequal Probability Designs

Department of Statistics, University of British Columbia
Prepared for Stat 344, 2014

Sections 7.11 and 7.12 of the textbook

Probability Sampling Designs: A quick review
A probability sampling design defines a random mechanism that decides which subset of a finite population is included in the sample. By contrast, a "representative" sample can be taken by the judgement of the samplers, who choose units they regard as typical of the population. When appropriately used, a probability sampling plan avoids human bias and allows us to assess the error of the estimators.

Is a sampling plan biased?
Suppose I want to get an idea of how well my students are prepared for the final exam, so I randomly pick three students attending class and check their readiness. This sample is biased because less prepared students are likely to miss more classes. More precisely, it is biased because the inclusion probabilities of the individual students are unequal.

Is a biased sampling plan evil?
Textbooks often tell students that because a plan is biased, the conclusion will be biased. I must declare that this is only partly true. A more accurate statement is: the conclusion will be biased if the statistician fails to quantify the effect of the bias in the sampling and to accommodate it in the analysis. We often contradict ourselves unknowingly.

Examples of biased and unbiased sampling plans
SRSWOR is an unbiased sampling plan: all units in the population have equal probability of being included. The systematic sampling plan is unbiased. A cluster sampling plan on a population with equal cluster sizes is unbiased (some details will be given).

Most artificial probability sampling plans we used for illustration purposes are biased. A cluster sampling plan on a population with unequal cluster sizes is biased (some details must be added). Stratified SRSWOR is usually biased unless proportional allocation is used.

What do we mean by biased?
A probability sampling plan is biased when the inclusion probabilities $\pi_i$, $i = 1, 2, \ldots, N$, are not all equal. Do you still remember $\pi_i$, the probability that unit $i$ is in the sample? Do you remember $\pi_{ij}$, the probability that units $i$ and $j$ are both in the sample? Even if a plan (sampling design) is unbiased, the joint inclusion probabilities $\pi_{ij}$ are not required to be equal.

Stratified SRSWOR as an unequal probability plan
Suppose a finite population is stratified with stratum sizes $N_1, N_2, \ldots, N_G$. An SRSWOR of size $n_g$ is taken from the $g$th stratum, independently across strata. For a unit $i$ in the $g$th stratum, the inclusion probability is $\pi_{g,i} = n_g/N_g$. Unless $n_1/N_1 = n_2/N_2 = \cdots = n_G/N_G$, this plan is biased.
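
As a small illustration, the Python sketch below computes $n_g/N_g$ for each stratum and checks whether the plan has equal inclusion probabilities. The stratum sizes and the helper names `stratified_inclusion_probs` and `is_equal_probability` are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def stratified_inclusion_probs(N_sizes, n_sizes):
    """Inclusion probability n_g / N_g shared by every unit of stratum g under stratified SRSWOR."""
    return [Fraction(n_g, N_g) for N_g, n_g in zip(N_sizes, n_sizes)]


def is_equal_probability(N_sizes, n_sizes):
    """True when all strata share the same sampling fraction, i.e. the plan is unbiased."""
    return len(set(stratified_inclusion_probs(N_sizes, n_sizes))) == 1


# Hypothetical strata of sizes 100, 200 and 300.
print(is_equal_probability([100, 200, 300], [10, 20, 30]))  # True: proportional allocation
print(is_equal_probability([100, 200, 300], [20, 20, 20]))  # False: equal n_g
```

Exact fractions are used so that equal sampling fractions compare as exactly equal.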

Some properties of unequal probability sampling plans
Suppose we have somehow managed to set up a probability sampling design (without replacement) with unequal probabilities of selection (unequal $\pi_i$). Consider a plan in which the sample size $n$ is non-random.

Let $\delta_i$ be the indicator of whether unit $i$ is in the sample. This $\delta_i$ is random, and
$$\sum_{i=1}^{N} \delta_i = n,$$
because the sum on the left counts how many sampling units are in the sample. Since $E(\delta_i) = P(\delta_i = 1) = \pi_i$, taking expectations on both sides gives
$$\sum_{i=1}^{N} \pi_i = n.$$
That is, the inclusion probabilities of all sampling units in the population add up to the fixed sample size $n$.

What changes are needed in $\sum_{i=1}^{N} \pi_i = n$ when the sample size $n$ is random in a sampling plan?

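Similarly, with the obvious notation $\delta_{ij} = \delta_i \delta_j$, we should have
$$\sum_{i=1}^{N} \sum_{j \ne i} \delta_{ij} = n(n-1),$$
because the left side counts the ordered pairs of distinct units in the sample. This leads to
$$\sum_{i=1}^{N} \sum_{j \ne i} \pi_{ij} = n(n-1).$$
Try to show the other identities in (7.33) of the textbook for yourselves.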
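
Both identities can be checked by brute force on a toy design. The sketch below assumes a made-up fixed-size design with $N = 4$ and $n = 2$, in which every possible sample is given a selection probability. It computes $\pi_i$ and $\pi_{ij}$ by summing over the samples that contain the units, then verifies the two sums.

```python
from fractions import Fraction

# Hypothetical fixed-size design with N = 4 units and n = 2:
# each possible sample (a pair of unit labels) gets a made-up selection probability.
design = {
    (0, 1): Fraction(3, 10), (0, 2): Fraction(2, 10), (0, 3): Fraction(1, 10),
    (1, 2): Fraction(2, 10), (1, 3): Fraction(1, 10), (2, 3): Fraction(1, 10),
}
N, n = 4, 2

# pi_i = P(unit i in sample); pi_ij = P(units i and j both in sample).
pi = [sum(p for s, p in design.items() if i in s) for i in range(N)]
pij = [[sum(p for s, p in design.items() if i in s and j in s) for j in range(N)]
       for i in range(N)]

# pi = [3/5, 3/5, 1/2, 3/10]: unequal inclusion probabilities
print(sum(pi) == n)  # True
print(sum(pij[i][j] for i in range(N) for j in range(N) if j != i) == n * (n - 1))  # True
```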

Horvitz-Thompson estimator (Section 7.12)
Let us change our general setting slightly. We consider a finite population of $N$ sampling units (listed in a sampling frame). A probability sampling plan has been carried out, and the sample size $n$ is allowed to be random. The inclusion probability of unit $i$ under this plan is still denoted $\pi_i$.

Recall that the sampling weight of a sampling unit is $w_i = 1/\pi_i$. It is the number of units in the population that this unit represents. From this point of view, the population total $Y$ is sensibly estimated by
$$\hat{Y}_{HT} = \sum_{i=1}^{n} w_i y_i = \sum_{i=1}^{n} \frac{y_i}{\pi_i}.$$
This estimator is called the Horvitz-Thompson (HT) estimator.
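
A minimal Python version of the formula follows. It assumes the sampled $y_i$ and their inclusion probabilities $\pi_i$ are supplied as two lists. The function name and the sample values are illustrative only.

```python
def horvitz_thompson_total(y, pi):
    """HT estimate of the population total: sum of w_i * y_i with weights w_i = 1 / pi_i."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0 < p <= 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y_i / p_i for y_i, p_i in zip(y, pi))


# Hypothetical sample of three units and their inclusion probabilities.
y_sample = [12.0, 30.0, 7.5]
pi_sample = [0.2, 0.5, 0.25]
print(horvitz_thompson_total(y_sample, pi_sample))  # about 150 (60 + 60 + 30)
```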

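Bias of the HT estimator
Let $\delta_i = 1$ if the $i$th unit in the population is included in the sample and $\delta_i = 0$ otherwise. The HT estimator can also be written as
$$\sum_{i=1}^{n} w_i y_i = \sum_{i=1}^{N} \delta_i w_i y_i.$$
What is random in this estimator? Only $\delta_i$ is random; $w_i$ and $y_i$ are fixed. How should we compute $E(\delta_i)$, and what does it equal? Since $\delta_i$ is an indicator, $E(\delta_i) = P(\delta_i = 1) = \pi_i = 1/w_i$.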

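It is therefore seen that
$$E\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{i=1}^{N} E(\delta_i)\, w_i y_i = \sum_{i=1}^{N} \pi_i \cdot \frac{y_i}{\pi_i} = Y.$$
What have we assumed here? That $\pi_i > 0$ for all $i$ in the population; otherwise $w_i = 1/\pi_i$ is undefined and unit $i$ never contributes. When $\pi_i > 0$ for all $i$, the HT estimator is unbiased for the population total $Y$.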
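
A small design can be enumerated completely, so the expectation can be computed exactly with fractions. The sketch assumes a made-up design with $N = 4$, $n = 2$ and made-up $y$ values. The helper `expected_ht` is hypothetical. The second design never selects the last unit, and it shows what goes wrong when some $\pi_i = 0$.

```python
from fractions import Fraction

# Made-up fixed-size design (N = 4, n = 2): selection probability of each sample.
design = {
    (0, 1): Fraction(3, 10), (0, 2): Fraction(2, 10), (0, 3): Fraction(1, 10),
    (1, 2): Fraction(2, 10), (1, 3): Fraction(1, 10), (2, 3): Fraction(1, 10),
}
y = [Fraction(10), Fraction(4), Fraction(7), Fraction(1)]   # made-up population values


def expected_ht(design, y):
    """Exact E(Y_HT): average the HT estimate over every possible sample."""
    pi = [sum(p for s, p in design.items() if i in s) for i in range(len(y))]
    # Units that appear in some sample have pi_i > 0, so no division by zero occurs.
    return sum(p * sum(y[i] / pi[i] for i in s) for s, p in design.items())


expected = expected_ht(design, y)
print(expected == sum(y))  # True: unbiased, every pi_i > 0

# A design that never selects unit 3 (pi_3 = 0): the estimate misses y[3] on average.
design_zero = {(0, 1): Fraction(1, 3), (0, 2): Fraction(1, 3), (1, 2): Fraction(1, 3)}
print(expected_ht(design_zero, y), sum(y))  # 21 22
```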

HT estimator under stratified SRSWOR
Under stratified SRSWOR, $\pi_{g,i} = n_g/N_g$, hence $w_{g,i} = N_g/n_g$. Therefore the HT estimator is
$$\hat{Y}_{HT} = \sum_{g=1}^{G} \sum_{i=1}^{n_g} \frac{N_g}{n_g}\, y_{gi} = N \bar{y}_{st},$$
where $\bar{y}_{st} = \sum_{g} W_g \bar{y}_g$ and $W_g = N_g/N$. That is, translated into an estimator of $\bar{Y}$, the HT estimator is simply the stratified sample mean under this design. The moral: we often find traces of a more advanced method in commonly used methods.
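
A quick numerical check of $\hat{Y}_{HT} = N \bar{y}_{st}$ follows. The stratum sizes and sampled values below are made up for illustration.

```python
import math

# Hypothetical stratified SRSWOR: stratum sizes N_g and the y-values sampled in each stratum.
N_g = [50, 30, 20]
samples = [[3.0, 5.0, 4.0, 6.0], [10.0, 12.0], [1.0, 2.0, 3.0]]
N = sum(N_g)

# HT estimator: every sampled unit of stratum g carries weight w = N_g / n_g.
y_ht = sum(Ng / len(s) * y for Ng, s in zip(N_g, samples) for y in s)

# Stratified sample mean: sum over strata of W_g * ybar_g with W_g = N_g / N.
ybar_st = sum(Ng / N * (sum(s) / len(s)) for Ng, s in zip(N_g, samples))

print(math.isclose(y_ht, N * ybar_st))  # True
```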

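Variance of the HT estimator
We clearly have
$$\mathrm{Var}\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{Cov}(\delta_i, \delta_j)\, w_i w_j y_i y_j.$$
Do not be scared by this double summation. There are two cases for $\mathrm{Cov}(\delta_i, \delta_j)$. When $i = j$, we have $\mathrm{Cov}(\delta_i, \delta_i) = \pi_i(1 - \pi_i)$. When $i \ne j$, we have $\mathrm{Cov}(\delta_i, \delta_j) = \pi_{ij} - \pi_i\pi_j$. Note that $\pi_{ii} = \pi_i$, so $\mathrm{Cov}(\delta_i, \delta_j) = \pi_{ij} - \pi_i\pi_j$ is correct in both cases.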

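Substituting $\mathrm{Cov}(\delta_i, \delta_j) = \pi_{ij} - \pi_i\pi_j$ into
$$\mathrm{Var}\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{Cov}(\delta_i, \delta_j)\, w_i w_j y_i y_j,$$
we get (7.35):
$$\mathrm{Var}\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{i=1}^{N}\sum_{j=1}^{N} (\pi_{ij} - \pi_i\pi_j)\,\frac{y_i}{\pi_i}\,\frac{y_j}{\pi_j}.$$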

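When the sample size $n$ is fixed, we may equivalently write it as
$$\mathrm{Var}\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{i<j} (\pi_i\pi_j - \pi_{ij})\Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2,$$
where the sum is over all pairs $i < j$ in the population. Let us derive it in class with your help.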
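
The two forms can be checked against the definition of variance on a toy design. The sketch assumes the same kind of made-up fixed-size design ($N = 4$, $n = 2$) and made-up $y$ values. It computes the variance three ways in exact arithmetic: directly over all samples, by (7.35), and by the pairwise form.

```python
from fractions import Fraction
from itertools import combinations

# Made-up fixed-size design (N = 4, n = 2) and made-up population values.
design = {
    (0, 1): Fraction(3, 10), (0, 2): Fraction(2, 10), (0, 3): Fraction(1, 10),
    (1, 2): Fraction(2, 10), (1, 3): Fraction(1, 10), (2, 3): Fraction(1, 10),
}
y = [Fraction(10), Fraction(4), Fraction(7), Fraction(1)]
N = len(y)
Y = sum(y)
pi = [sum(p for s, p in design.items() if i in s) for i in range(N)]
pij = [[sum(p for s, p in design.items() if i in s and j in s) for j in range(N)]
       for i in range(N)]


def ht_total(sample):
    """HT estimate from the units in one sample."""
    return sum(y[i] / pi[i] for i in sample)


# Variance computed directly from its definition over all possible samples.
var_direct = sum(p * (ht_total(s) - Y) ** 2 for s, p in design.items())

# (7.35): double sum over all i, j, with pi_ii = pi_i on the diagonal.
var_735 = sum((pij[i][j] - pi[i] * pi[j]) * (y[i] / pi[i]) * (y[j] / pi[j])
              for i in range(N) for j in range(N))

# Fixed-size pairwise form: sum over pairs i < j.
var_pairs = sum((pi[i] * pi[j] - pij[i][j]) * (y[i] / pi[i] - y[j] / pi[j]) ** 2
                for i, j in combinations(range(N), 2))

print(var_direct == var_735 == var_pairs)  # True
```

The pairwise form relies on the sample size being fixed. For a random-size design, only the first two computations are guaranteed to agree.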

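How do we estimate this variance? There are two suggestions. One is
$$v_1(\hat{Y}_{HT}) = \sum_{i=1}^{n} \frac{1 - \pi_i}{\pi_i^2}\, y_i^2 + \sum_{i \ne j} \frac{\pi_{ij} - \pi_i\pi_j}{\pi_{ij}}\,\frac{y_i y_j}{\pi_i \pi_j},$$
where both sums run over the units in the sample.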

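The other is
$$v_2(\hat{Y}_{HT}) = \sum_{1 \le i < j \le n} \frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2.$$
Both are unbiased estimators of $\mathrm{Var}(\hat{Y}_{HT})$, provided all $\pi_{ij} > 0$ (the second one for fixed sample size designs). Both are mathematically imperfect because they may take negative values. If you work hard enough, you will find that this imperfection is not a problem under stratified SRSWOR.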
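
Unbiasedness of both estimators can be verified exactly on a toy design by averaging them over all possible samples. The design and $y$ values below are made up, and `v1` and `v2` are hypothetical names for the two formulas.

```python
from fractions import Fraction
from itertools import combinations

# Made-up fixed-size design (N = 4, n = 2) and made-up population values.
design = {
    (0, 1): Fraction(3, 10), (0, 2): Fraction(2, 10), (0, 3): Fraction(1, 10),
    (1, 2): Fraction(2, 10), (1, 3): Fraction(1, 10), (2, 3): Fraction(1, 10),
}
y = [Fraction(10), Fraction(4), Fraction(7), Fraction(1)]
N = len(y)
pi = [sum(p for s, p in design.items() if i in s) for i in range(N)]
pij = [[sum(p for s, p in design.items() if i in s and j in s) for j in range(N)]
       for i in range(N)]


def v1(sample):
    """First estimator: uses only sampled units; needs pi_ij > 0 for sampled pairs."""
    diag = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in sample)
    cross = sum((pij[i][j] - pi[i] * pi[j]) / pij[i][j] * y[i] * y[j] / (pi[i] * pi[j])
                for i in sample for j in sample if i != j)
    return diag + cross


def v2(sample):
    """Second estimator: pairwise form over sampled pairs i < j."""
    return sum((pi[i] * pi[j] - pij[i][j]) / pij[i][j] * (y[i] / pi[i] - y[j] / pi[j]) ** 2
               for i, j in combinations(sorted(sample), 2))


# True variance (fixed-size pairwise form over the whole population).
var_true = sum((pi[i] * pi[j] - pij[i][j]) * (y[i] / pi[i] - y[j] / pi[j]) ** 2
               for i, j in combinations(range(N), 2))

# Exact expectations over all possible samples.
E_v1 = sum(p * v1(s) for s, p in design.items())
E_v2 = sum(p * v2(s) for s, p in design.items())
print(E_v1 == var_true, E_v2 == var_true)  # True True
```

Printing `v1(s)` and `v2(s)` for each sample is an easy way to look for negative values in a given design.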

Variance estimators of the HT estimate under stratified SRSWOR
Using conventional notation, under stratified SRSWOR we have $\pi_{ij}^{-1}(\pi_i\pi_j - \pi_{ij}) = 0$ when the two units are in different strata, because the strata are sampled independently. For two units in the same stratum $g$,
$$\pi_{ij}^{-1}(\pi_i\pi_j - \pi_{ij}) = \frac{N_g - n_g}{N_g(n_g - 1)} = \frac{1 - f_g}{n_g - 1},$$
where $f_g = n_g/N_g$. Let us use them in (7.36).

Recall (7.36):
$$v\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{i<j} \pi_{ij}^{-1}(\pi_i\pi_j - \pi_{ij})\Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2.$$
Hence, under stratified SRSWOR, using $\sum_{i<j} (y_{gi} - y_{gj})^2 = n_g(n_g - 1)s_g^2$ within each stratum, we have
$$v\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{g=1}^{G} \frac{1 - f_g}{n_g - 1}\,\frac{N_g^2}{n_g^2} \sum_{i<j} (y_{gi} - y_{gj})^2 = N^2 \sum_{g} W_g^2 (1 - f_g)\frac{s_g^2}{n_g}.$$
I have abused notation quite badly.

You should notice that
$$v\Big(\sum_{i=1}^{n} w_i y_i\Big) = N^2 \sum_{g} W_g^2 (1 - f_g)\frac{s_g^2}{n_g} = N^2 v(\bar{y}_{st}).$$
I have skipped many details. You will not get full marks in the final exam unless these details are included and explained.
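
The identity can be confirmed numerically. The sketch assumes a made-up stratified SRSWOR sample. It applies the general estimator (7.36) with the stratified $\pi_i$ and $\pi_{ij}$ and compares the result with $N^2 v(\bar{y}_{st})$.

```python
import math
import statistics
from itertools import combinations

# Hypothetical stratified SRSWOR sample: stratum sizes N_g and sampled y-values.
N_g = [50, 30, 20]
samples = [[3.0, 5.0, 4.0, 6.0], [10.0, 12.0, 9.0], [1.0, 2.0, 3.0]]
N = sum(N_g)

# General estimator (7.36): only pairs inside the same stratum contribute,
# because pi_ij = pi_i * pi_j for units in different strata.
v_general = 0.0
for Ng, s in zip(N_g, samples):
    ng = len(s)
    pi = ng / Ng                                  # pi_i for every unit in the stratum
    pi_pair = ng * (ng - 1) / (Ng * (Ng - 1))     # pi_ij for two units in the stratum
    factor = (pi * pi - pi_pair) / pi_pair        # equals (1 - f_g) / (n_g - 1)
    for yi, yj in combinations(s, 2):
        v_general += factor * (yi / pi - yj / pi) ** 2

# Textbook form: N^2 * sum of W_g^2 (1 - f_g) s_g^2 / n_g (s_g^2 uses divisor n_g - 1).
v_ybar_st = sum((Ng / N) ** 2 * (1 - len(s) / Ng) * statistics.variance(s) / len(s)
                for Ng, s in zip(N_g, samples))

print(math.isclose(v_general, N ** 2 * v_ybar_st))  # True
```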

Estimating the population mean
If $\hat{Y}_{HT}$ is a good estimator of the population total, it makes sense to estimate the population mean by $\hat{\bar{Y}}_{HT} = \hat{Y}_{HT}/N$. We could instead estimate the mean by
$$\tilde{\bar{Y}}_{HT} = \frac{\hat{Y}_{HT}}{\sum_{i=1}^{n} w_i}.$$

The second estimator makes sense when $N$ is unknown, as can happen in cluster sampling, particularly when the cluster sizes are unequal. It also works better when $\sum_{i=1}^{n} w_i$ differs a lot from $N$.
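
The two mean estimators can be sketched side by side in Python. The second one is often called the Hájek estimator. The sample values and the population size $N = 12$ are assumptions made for illustration.

```python
def ht_mean(y, pi, N):
    """HT total divided by the known population size N."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi)) / N


def weighted_mean(y, pi):
    """HT total divided by the sum of the weights w_i = 1 / pi_i; N is not needed."""
    w = [1 / p_i for p_i in pi]
    return sum(w_i * y_i for w_i, y_i in zip(w, y)) / sum(w)


# Hypothetical sample; suppose the population size is N = 12.
y_sample = [12.0, 30.0, 7.5]
pi_sample = [0.2, 0.5, 0.25]
print(ht_mean(y_sample, pi_sample, 12))    # about 150 / 12 = 12.5
print(weighted_mean(y_sample, pi_sample))  # about 150 / 11, since the weights sum to 11
```

If every $y_i$ equals the same constant $c$, `weighted_mean` returns $c$ exactly, while `ht_mean` returns $c \sum w_i / N$. This is one way to see why the second estimator helps when $\sum w_i$ is far from $N$.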

PPS design motivated by the variance of the HT estimator
Staring a bit longer at the following formula (for a fixed sample size design),
$$\mathrm{Var}\Big(\sum_{i=1}^{n} w_i y_i\Big) = \sum_{i<j} (\pi_i\pi_j - \pi_{ij})\Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2,$$
we may conclude that if $y_i/\pi_i = c$ for all units in the population, then
$$\mathrm{Var}(\hat{Y}_{HT}) = \mathrm{Var}\Big(\sum_{i=1}^{n} w_i y_i\Big) = 0.$$
This is wonderful, except that the idea is not feasible in applications: we would have to know all $y_i$ values to design such a plan.

PPS based on auxiliary information
Knowing all $y_i$ values is not realistic. Yet consider the example where the finite population is all farms in Canada. The acreages ($z$) of these farms are often known, while we might be interested in their total corn production ($y$). Creating a design with $\pi_i \propto z_i$ is, at least conceptually, possible in this case. Since $y_i$ is approximately proportional to $z_i$, this design is likely to be efficient.

PPS design
Suppose size information $z_i$ is known for all units in the finite population. A probability sampling design with the property $\pi_i \propto z_i$ is called a PPS (probability proportional to size) design. A PPS design is often referred to as an optimal design. I do not like an unqualified claim of optimality.

PPS design: final hurdle
Let us now try to design an optimal PPS plan, that is, one with $\pi_i \propto z_i$. How do we do it? Suppose $z_i = i$ for $i = 1, 2, \ldots, 9$ and $n = 6$.

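Some math behind $\pi_i$
Recall the definition of $\delta_i$ and that $\sum_{i=1}^{N} \delta_i = n$. Taking expectations on both sides, we get $\sum_{i} \pi_i = n$. Requiring $\pi_i \propto i$ together with $n = 6$ gives $\pi_i = 6i/45$, since $1 + 2 + \cdots + 9 = 45$. We end up with $\pi_9 = 54/45 = 6/5$.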

Conclusion from the previous calculation: because an inclusion probability can be at most 1, a PPS plan is not always feasible. In general, it is not simple to design a sampling plan with $\pi_i \propto z_i$ for pre-specified $z_i$.
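
The calculation can be reproduced with the sketch below, which uses $\pi_i = n z_i / \sum_k z_k$ and flags any value above 1. The helper names are hypothetical.

```python
from fractions import Fraction


def pps_inclusion_probs(z, n):
    """pi_i = n * z_i / sum(z): the only choice with pi_i proportional to z_i and sum(pi) = n."""
    total = sum(z)
    return [Fraction(n * zi, total) for zi in z]


def pps_feasible(z, n):
    """A PPS plan is possible only if no required pi_i exceeds 1."""
    return all(p <= 1 for p in pps_inclusion_probs(z, n))


z = list(range(1, 10))     # z_i = i for i = 1, ..., 9
pi = pps_inclusion_probs(z, 6)
print(pi[-1])              # 6/5
print(pps_feasible(z, 6))  # False
print(pps_feasible(z, 5))  # True: 5 * 9 / 45 = 1
```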

One more slide
Much of the discussion of unequal probability sampling plans is conceptual. Try to understand the concepts; you may more or less ignore these formulas.

Implementation of a probability sampling plan
Most toy examples of probability sampling plans can be easily implemented using cards, dice and so on. SRSWOR can be easily implemented: make $N$ cards representing the $N$ units in a finite population, shuffle them thoroughly, and take the units represented by the first $n$ cards.
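
The card method translates directly into a few lines of Python using the standard `random` module. The function name and the seed are arbitrary choices for this sketch. In practice, `random.sample(range(N), n)` draws an SRSWOR in a single call.

```python
import random


def srswor_by_shuffling(N, n, rng=None):
    """Card method: label units 0..N-1, shuffle the 'deck', keep the first n cards."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    if rng is None:
        rng = random.Random()
    cards = list(range(N))
    rng.shuffle(cards)
    return sorted(cards[:n])


# N = 10,000 is no problem for a computer; the seed is arbitrary.
sample = srswor_by_shuffling(10_000, 5, random.Random(344))
print(sample)
```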

SRSWOR
If $N = 10{,}000$, making 10,000 cards is not practical. However, there is computer software that can generate pseudo-random numbers for large enough $N$. To our naked eyes, these outcomes are random enough. Even if you do not trust computer software, there are ways to generate genuinely random numbers (as in lotteries).

Systematic plan
This one is the easiest to use in realistic applications. Suppose we wish to sample every 100th unit in a population. Create 100 cards numbered 1 to 100, shuffle them thoroughly and pick one. Use this number as the first unit to be sampled, and then take every 100th unit after it.
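
A sketch of the systematic plan follows, assuming units are labelled $1, \ldots, N$. The random start plays the role of the card drawn from the shuffled deck of $k$ cards. The function name is hypothetical.

```python
import random


def systematic_sample(N, k, rng=None):
    """Units labelled 1..N: draw a random start from 1..k (the card), then take every k-th unit."""
    if rng is None:
        rng = random.Random()
    start = rng.randint(1, k)   # like picking one of k shuffled cards
    return list(range(start, N + 1, k))


s = systematic_sample(1000, 100, random.Random(1))
print(s)  # 10 units, 100 apart
```

When $N$ is not a multiple of $k$, the sample size depends on the start. For example, with $N = 1050$ and $k = 100$, starts 1 to 50 give 11 units and starts 51 to 100 give 10.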

Stratified SRSWOR
Implementing a stratified SRSWOR is no harder than implementing an SRSWOR: we simply carry out SRSWOR stratum by stratum. Stratum sample sizes are design issues, not implementation issues.
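
Stratum by stratum, this can be sketched with `random.Random.sample`. The strata, the allocation and the function name are made up for illustration. The allocation is simply passed in, which reflects the point that sample sizes are a design decision.

```python
import random


def stratified_srswor(strata, n_sizes, rng=None):
    """Independent SRSWOR of size n_sizes[g] within each stratum g.

    strata: dict mapping a stratum name to the list of unit labels in that stratum.
    """
    if rng is None:
        rng = random.Random()
    return {g: sorted(rng.sample(units, n_sizes[g])) for g, units in strata.items()}


# Hypothetical strata of sizes 50, 30 and 20 with allocation 5, 3, 2.
strata = {"A": list(range(0, 50)), "B": list(range(50, 80)), "C": list(range(80, 100))}
sample = stratified_srswor(strata, {"A": 5, "B": 3, "C": 2}, random.Random(7))
print(sample)
```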

Poisson sampling plan
It is generally too complex to create a sampling plan with pre-specified inclusion probabilities $\pi_i$. The problem is even harder if one also wants specific joint inclusion probabilities $\pi_{ij}$. We usually create a plan using some common sense and end up with whatever $\pi_i$ and $\pi_{ij}$ it produces. The Poisson sampling plan is one that allows us to control the $\pi_i$, at the cost of a compromise.

Suppose we have pre-specified inclusion probabilities $\pi_i$ for every sampling unit in the finite population. We toss $N$ coins such that the $i$th coin shows heads with probability $\pi_i$, and we sample all units whose coins show heads. Physically making and tossing $N$ coins is not sensible, yet we can cheat with computer software.
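
The "cheat with computer software" can look like the sketch below: one uniform random number per unit replaces each biased coin. Because the coins are independent, $\pi_{ij} = \pi_i\pi_j$ under this plan. The inclusion probabilities and the function name are made up. The repeated draws show that the sample size varies around $\sum_i \pi_i$.

```python
import random


def poisson_sample(pi, rng=None):
    """Toss one coin per unit: unit i is included independently with probability pi[i]."""
    if rng is None:
        rng = random.Random()
    return [i for i, p in enumerate(pi) if rng.random() < p]


# Hypothetical pre-specified inclusion probabilities for N = 5 units.
pi = [0.1, 0.4, 0.7, 0.9, 0.5]
rng = random.Random(2014)
sizes = [len(poisson_sample(pi, rng)) for _ in range(2000)]
print(min(sizes), max(sizes), sum(sizes) / len(sizes))  # sizes vary; mean near sum(pi) = 2.6
```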

Undesirable properties of the Poisson sampling plan
This plan does not have a pre-specified sample size: the sample size is random, with expected value $\sum_{i=1}^{N} \pi_i$.

Designs with arbitrary pre-specified unequal inclusion probabilities
There are many mathematically elegant solutions, but none of them is simple enough to be discussed here. In applications, we do not really like them even when they are implementable.

Rao-Hartley-Cochran design
The Rao-Hartley-Cochran (RHC) design is the only one that is actually used in applications with good statistical efficiency. It does not achieve pre-specified inclusion probabilities, yet it achieves high efficiency in an elegant way.

Suppose we have a surrogate (auxiliary) size variable $z_i > 0$ for all units in the finite population. It is desirable to have the inclusion probability $\pi_i$ positively correlated with $z_i$. Let the population size be $N$ and the pre-specified sample size be $n$.

We divide the sampling units in the finite population at random into $n$ groups of pre-specified sizes $N_1, N_2, \ldots, N_n$, as even as possible. Let $Z_g = \sum_{i \in g} z_i$ for $g = 1, 2, \ldots, n$, where $i \in g$ means that unit $i$ is in the $g$th group. From the $g$th group, select one unit, choosing unit $j$ with probability $p_j = z_j/Z_g$. With one unit from each of the $n$ groups, we get $n$ units in the sample. Note that the sample size is not random.
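
A sketch of the RHC selection step follows. It assumes that "as even as possible" means group sizes differing by at most one, and it uses `random.Random.choices` for the within-group draw with probability $z_j/Z_g$. The function name, the size values and the returned tuple layout are hypothetical.

```python
import random


def rhc_sample(z, n, rng=None):
    """Rao-Hartley-Cochran selection (a sketch).

    Randomly split units 0..N-1 into n groups whose sizes differ by at most one,
    then draw one unit j from each group g with probability z[j] / Z_g.
    Returns a list of (unit j, p_j = z[j] / Z_g, Z_g), one entry per group.
    """
    N = len(z)
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if rng is None:
        rng = random.Random()
    units = list(range(N))
    rng.shuffle(units)                      # random grouping
    base, extra = divmod(N, n)              # the first `extra` groups get one extra unit
    sizes = [base + 1 if g < extra else base for g in range(n)]
    selected, start = [], 0
    for size in sizes:
        group = units[start:start + size]
        start += size
        Z_g = sum(z[j] for j in group)
        j = rng.choices(group, weights=[z[j] for j in group], k=1)[0]
        selected.append((j, z[j] / Z_g, Z_g))
    return selected


# Hypothetical sizes z_i > 0 for N = 7 units, with n = 3.
z = [1.0, 3.0, 2.0, 4.0, 5.0, 1.5, 2.5]
print(rhc_sample(z, 3, random.Random(344)))
```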

Inclusion probability of the RHC design
If the $n$ groups were not formed randomly (that is, for a fixed grouping), we would have $\pi_j = z_j/Z_g$ for unit $j$ in group $g$. Given the outcome of the random grouping, the conditional inclusion probability is $p_j$. Unconditionally, $\pi_j = E(p_j)$, yet there is no simple algebraic expression for this inclusion probability.

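Estimating $Y$ under the RHC design
In the spirit of the HT estimator, RHC recommend estimating the population total by
$$\hat{Y}_{RHC} = \sum_{g=1}^{n} \frac{y_g}{p_g},$$
where $y_g$ is the $y$-value of the unit selected from group $g$ and $p_g = z_g/Z_g$ is its selection probability. When $N_g = N/n$ is an integer and the sizes are scaled so that $\sum_{i=1}^{N} z_i = 1$, its variance is given by
$$\mathrm{Var}(\hat{Y}_{RHC}) = \frac{N - n}{(N - 1)n} \sum_{i=1}^{N} z_i \Big(\frac{y_i}{z_i} - Y\Big)^2,$$
and it is well estimated by
$$v(\hat{Y}_{RHC}) = \frac{N - n}{N(n - 1)} \sum_{g=1}^{n} Z_g \Big(\frac{y_g}{z_g} - \hat{Y}_{RHC}\Big)^2.$$
I will provide no math here. A numerical illustration will be given when time permits.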
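
As a small numerical illustration, the sketch below computes $\hat{Y}_{RHC}$ and $v(\hat{Y}_{RHC})$ from the selected units. It then checks them exactly on a made-up population with $N = 4$ and $n = 2$, where all three equally likely splits into two groups of size 2 can be enumerated. The population values and helper names are assumptions, and the sizes are scaled to sum to 1 as the formulas require.

```python
from fractions import Fraction


def rhc_estimate(selected):
    """Y_RHC = sum over groups of y_g / p_g, with p_g = z_g / Z_g.

    selected: list of (y_g, z_g, Z_g) for the unit drawn from each group.
    """
    return sum(y_g * Z_g / z_g for y_g, z_g, Z_g in selected)


def rhc_variance_estimate(selected, N):
    """v(Y_RHC) for equal group sizes N / n (n >= 2), with the z_i scaled to sum to 1."""
    n = len(selected)
    y_hat = rhc_estimate(selected)
    return Fraction(N - n, N * (n - 1)) * sum(Z_g * (y_g / z_g - y_hat) ** 2
                                              for y_g, z_g, Z_g in selected)


# Exact check on a made-up population: N = 4 units, n = 2 groups of size 2.
y = [Fraction(v) for v in (3, 8, 5, 12)]
raw = [1, 3, 2, 4]
z = [Fraction(r, sum(raw)) for r in raw]     # sizes scaled so that sum(z) == 1
N, n, Y = 4, 2, sum(y)
groupings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]  # equally likely splits

E_est = E_sq_err = E_v = Fraction(0)
for g1, g2 in groupings:
    Z1, Z2 = sum(z[i] for i in g1), sum(z[j] for j in g2)
    for i in g1:
        for j in g2:
            prob = Fraction(1, 3) * (z[i] / Z1) * (z[j] / Z2)
            sel = [(y[i], z[i], Z1), (y[j], z[j], Z2)]
            est = rhc_estimate(sel)
            E_est += prob * est
            E_sq_err += prob * (est - Y) ** 2
            E_v += prob * rhc_variance_estimate(sel, N)

var_formula = Fraction(N - n, (N - 1) * n) * sum(z_i * (y_i / z_i - Y) ** 2 for y_i, z_i in zip(y, z))
print(E_est == Y, E_sq_err == var_formula, E_v == var_formula)  # True True True
```

The enumeration confirms, for this toy case, that the estimator is unbiased, that its variance matches the formula, and that the variance estimator is unbiased for it.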

Complex designs
While very complex designs are used in the real world, they are usually assembled from simple designs. The population may first be divided into strata, and each stratum may be made up of clusters of unequal sizes. A cluster may itself have some structure. A complex design may use a stratified plan at the top level, a systematic plan for selecting clusters, and an unequal probability plan within each selected cluster. Such designs are called multi-stage designs.

No matter how complex a design might be, it is made of simple ones. No matter how complex a building may appear, it is made materially of bricks, glass and steel, and structurally of simple triangles, rectangles or at most some curves.

What notions should you retain?
Be able to give a clear description of how to implement the SRSWOR plan, the Poisson plan, the systematic plan, the RHC plan, and the stratified x-plan.
