Cluster Sampling 2. Chapter Introduction


Chapter 7 Cluster Sampling

7.1 Introduction

In this chapter, we consider two-stage cluster sampling, where the sample clusters are selected in the first stage and the sample elements are selected in the second stage. To state this formally, two-stage sampling can be described as follows:

1. Stage 1: Draw $A_I \subset U_I$ via $p_I(\cdot)$.
2. Stage 2: For every $i \in A_I$, draw $A_i \subset U_i$ via $p_i(\cdot \mid A_I)$.

The sample of elements is then given by $A = \cup_{i \in A_I} A_i$. In two-stage sampling, we have two simplifying assumptions about the second-stage sampling design $p_i(\cdot \mid A_I)$:

1. Invariance of the second-stage design: $p_i(\cdot \mid A_I) = p_i(\cdot)$ for every $i \in U_I$ and for every $A_I$ such that $i \in A_I$.
2. Independence of the second-stage design: $\Pr\left(\cup_{i \in A_I} A_i \mid A_I\right) = \prod_{i \in A_I} \Pr(A_i \mid A_I)$.

Under independence, we have
$$\Pr(k \in A_i,\, l \in A_j) = \Pr(k \in A_i)\,\Pr(l \in A_j), \quad i \neq j,$$
where the probabilities refer to the second-stage selection given the first-stage sample. If the invariance assumption does not hold, the sampling design is called a two-phase sampling design. Two-phase sampling will be covered in a later chapter.

In two-stage sampling, we use $n_I$ to denote the cluster sample size in the first-stage sampling and $m_i = |A_i|$ to denote the sample size in cluster $i$. The number of sampled elements is equal to $\sum_{i \in A_I} m_i = |A|$. The first-order inclusion probability of element $k$ in cluster $i$ is the product of the cluster-level inclusion probability and the conditional inclusion probability given the cluster:
$$\pi_{ik} = \Pr\{(ik) \in A\} = \Pr(k \in A_i \mid i \in A_I)\,\Pr(i \in A_I) = \pi_{k|i}\,\pi_{Ii},$$
where $\pi_{Ii}$ is the cluster-level inclusion probability and $\pi_{k|i} = \Pr[k \in A_i \mid i \in A_I]$ is the element-level conditional inclusion probability. In general, $\pi_{k|i}$ is a random variable in the sense that it is a function of $A_I$. Under invariance, it is fixed.

The second-order inclusion probability between two elements can be expressed as
$$\pi_{ik,jl} = \begin{cases} \pi_{Ii}\,\pi_{k|i} & \text{if } i = j \text{ and } k = l, \\ \pi_{Ii}\,\pi_{kl|i} & \text{if } i = j \text{ and } k \neq l, \\ \pi_{Iij}\,\pi_{k|i}\,\pi_{l|j} & \text{if } i \neq j, \end{cases}$$
where $\pi_{Iij}$ is the cluster-level joint inclusion probability and $\pi_{kl|i} = \Pr[k, l \in A_i \mid i \in A_I]$.

7.2 Estimation

In two-stage cluster sampling, we do not observe $Y_i$. Instead, we obtain $\hat{Y}_i$ from the second-stage sampling such that $E(\hat{Y}_i \mid A_I) = Y_i$, where the conditional expectation is taken with respect to the second-stage sampling. For simplicity, we write $E_2(\hat{Y}_i) = E(\hat{Y}_i \mid A_I)$. The HT estimator of $Y = \sum_{i \in U_I} Y_i = \sum_{i \in U_I}\sum_{k \in U_i} y_{ik}$ is given by
$$\hat{Y}_{HT} = \sum_{i \in A_I} \frac{\hat{Y}_i}{\pi_{Ii}} = \sum_{i \in A_I}\sum_{k \in A_i} \frac{y_{ik}}{\pi_{k|i}\,\pi_{Ii}}. \tag{7.1}$$
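The estimator in (7.1) can be illustrated with a minimal Python sketch. It assumes simple random sampling without replacement at both stages, so that $\pi_{Ii} = n_I/N_I$ and $\pi_{k|i} = m_i/M_i$. The function names and the toy population are illustrative assumptions, not part of the chapter.

```python
import random

def draw_two_stage_sample(cluster_sizes, n_I, m, rng):
    """Stage 1: SRS of n_I clusters. Stage 2: SRS of up to m elements per sampled cluster."""
    sampled = rng.sample(range(len(cluster_sizes)), n_I)
    return {i: rng.sample(range(cluster_sizes[i]), min(m, cluster_sizes[i]))
            for i in sampled}

def ht_total(sample, y, n_clusters_pop):
    """HT estimator (7.1), assuming pi_Ii = n_I / N_I and pi_{k|i} = m_i / M_i."""
    pi_I = len(sample) / n_clusters_pop
    total = 0.0
    for i, elements in sample.items():
        pi_within = len(elements) / len(y[i])   # conditional inclusion probability
        total += sum(y[i][k] for k in elements) / (pi_within * pi_I)
    return total

# Toy population (assumed): four clusters of unequal size.
rng = random.Random(0)
y = [[rng.gauss(10, 2) for _ in range(M)] for M in (5, 8, 6, 10)]
sizes = [len(c) for c in y]
true_total = sum(sum(c) for c in y)

sample = draw_two_stage_sample(sizes, n_I=2, m=3, rng=rng)
estimate = ht_total(sample, y, n_clusters_pop=len(y))
print(f"HT estimate: {estimate:.2f}, true total: {true_total:.2f}")
```

A single estimate will generally differ from the true total. When every cluster and every element is selected, both inclusion probabilities equal one and the estimator reproduces the total exactly.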

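The HT estimator in (7.1) is unbiased and its variance can be computed by
$$V(\hat{Y}_{HT}) = V\{E(\hat{Y}_{HT} \mid A_I)\} + E\{V(\hat{Y}_{HT} \mid A_I)\}. \tag{7.2}$$
The first term is the variance due to the first-stage sampling (sampling of PSUs) and the second term is the variance due to the second-stage sampling (sampling of SSUs). Thus, we can write
$$V(\hat{Y}_{HT}) = V_{PSU} + V_{SSU}, \tag{7.3}$$
where
$$V_{PSU} = V\left\{\sum_{i \in A_I} \frac{Y_i}{\pi_{Ii}}\right\} = \sum_{i \in U_I}\sum_{j \in U_I} (\pi_{Iij} - \pi_{Ii}\pi_{Ij})\,\frac{Y_i}{\pi_{Ii}}\,\frac{Y_j}{\pi_{Ij}},$$
$$V_{SSU} = E\left\{\sum_{i \in A_I} \frac{V_i}{\pi_{Ii}^2}\right\} = \sum_{i \in U_I} \frac{V_i}{\pi_{Ii}},$$
and
$$V_i = V(\hat{Y}_i \mid A_I) = \sum_{k \in U_i}\sum_{l \in U_i} (\pi_{kl|i} - \pi_{k|i}\pi_{l|i})\,\frac{y_{ik}}{\pi_{k|i}}\,\frac{y_{il}}{\pi_{l|i}}.$$

Example 7.1. Consider the following two-stage sampling design.

1. Stage One: Select $n_I$ sample clusters from $N_I$ population clusters by simple random cluster sampling.
2. Stage Two: Within sampled cluster $i$, select $m_i$ sample elements from $M_i$ population elements by simple random sampling, independently across clusters.

Under this two-stage sampling, we have
$$\hat{Y}_{HT} = \frac{N_I}{n_I}\sum_{i \in A_I} \hat{Y}_i = \frac{N_I}{n_I}\sum_{i \in A_I}\sum_{j \in A_i} \frac{M_i}{m_i}\, y_{ij}$$
and its variance is
$$V(\hat{Y}_{HT}) = \frac{N_I^2}{n_I}\left(1 - \frac{n_I}{N_I}\right) S_I^2 + \frac{N_I}{n_I}\sum_{i=1}^{N_I} \frac{M_i^2}{m_i}\left(1 - \frac{m_i}{M_i}\right) S_i^2, \tag{7.4}$$
where
$$S_I^2 = \frac{1}{N_I - 1}\sum_{i=1}^{N_I} (Y_i - \bar{Y}_I)^2, \qquad S_i^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i} (y_{ij} - \bar{Y}_i)^2,$$
with $\bar{Y}_I = N_I^{-1}\sum_{i=1}^{N_I} Y_i$ and $\bar{Y}_i = Y_i / M_i$.

Now, consider estimation of the population mean $\bar{Y} = N^{-1}\sum_{i=1}^{N_I}\sum_{j=1}^{M_i} y_{ij}$, where $N = \sum_{i=1}^{N_I} M_i$ is assumed to be known. In this case,
$$\hat{\bar{Y}}_{HT} = \frac{\hat{Y}_{HT}}{N} = \frac{N_I}{N n_I}\sum_{i \in A_I} \hat{Y}_i = \frac{1}{n_I}\sum_{i \in A_I} \frac{\hat{Y}_i}{\bar{M}},$$
where $\bar{M} = N_I^{-1}\sum_{i=1}^{N_I} M_i$. Its variance is, using (7.4),
$$V\{\hat{\bar{Y}}_{HT}\} = \frac{1}{n_I}\left(1 - \frac{n_I}{N_I}\right) S_{q1}^2 + \frac{1}{n_I N_I \bar{M}^2}\sum_{i=1}^{N_I} \frac{M_i^2}{m_i}\left(1 - \frac{m_i}{M_i}\right) S_i^2,$$
where $S_{q1}^2 = (N_I - 1)^{-1}\sum_{i=1}^{N_I} (q_i - \bar{q}_1)^2$ with $q_i = Y_i/\bar{M}$, $\bar{q}_1 = N_I^{-1}\sum_{i=1}^{N_I} q_i$, and $S_i^2 = (M_i - 1)^{-1}\sum_{k \in U_i} (y_{ik} - \bar{Y}_i)^2$. If the sampling rate for the second-stage sampling is constant, such that $m_i/M_i = f_2$, then we can write
$$V\{\hat{\bar{Y}}_{HT}\} = \frac{1}{n_I}(1 - f_1)\, S_{q1}^2 + \frac{1}{n_I \bar{m}}(1 - f_2)\,\frac{1}{N}\sum_{i=1}^{N_I} M_i S_i^2 = \frac{1}{n_I}(1 - f_1)\, B^2 + \frac{1}{n_I \bar{m}}(1 - f_2)\, W^2,$$
where $f_1 = n_I/N_I$, $\bar{m} = N_I^{-1}\sum_{i=1}^{N_I} m_i$, $B^2 = S_{q1}^2$, and $W^2 = N^{-1}\sum_{i=1}^{N_I} M_i S_i^2$.

Example 7.2. We now consider a special case of Example 7.1 where $M_i = M$ and $m_i = m$. In this case, (7.4) simplifies further to
$$V(\hat{Y}_{HT}) = \frac{N_I^2}{n_I}\left(1 - \frac{n_I}{N_I}\right) M\,\frac{SSB}{N_I - 1} + \frac{N_I^2}{n_I}\left(1 - \frac{m}{M}\right)\frac{M^2}{m}\,\frac{SSW}{N_I (M - 1)} = \frac{N_I^2}{n_I}\left(1 - \frac{n_I}{N_I}\right) M S_b^2 + \frac{N_I^2 M^2}{n_I m}\left(1 - \frac{m}{M}\right) S_w^2, \tag{7.5}$$
where $S_b^2 = SSB/(N_I - 1)$ and $S_w^2 = SSW/\{N_I(M - 1)\}$.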
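The variance formula (7.4) can be checked on a very small population by listing every possible two-stage sample. The following Python sketch does this for equal cluster sizes, the setting of Example 7.2. The population values and function names are hypothetical, chosen only to keep the enumeration short.

```python
from itertools import combinations, product
from statistics import mean, variance

def formula_variance(y, n_I, m):
    """Variance (7.4) for equal cluster sizes M_i = M and subsample sizes m_i = m."""
    N_I, M = len(y), len(y[0])
    totals = [sum(c) for c in y]
    S_I2 = variance(totals)              # (N_I - 1)^{-1} * sum (Y_i - Ybar_I)^2
    S_i2 = [variance(c) for c in y]      # within-cluster variances S_i^2
    first = N_I**2 / n_I * (1 - n_I / N_I) * S_I2
    second = N_I / n_I * sum(M**2 / m * (1 - m / M) * s for s in S_i2)
    return first + second

def enumerated_variance(y, n_I, m):
    """Exact design variance of the HT estimator over all equally likely samples."""
    N_I, M = len(y), len(y[0])
    estimates = []
    for clusters in combinations(range(N_I), n_I):
        within = [list(combinations(range(M), m)) for _ in clusters]
        for choice in product(*within):
            est = sum(N_I / n_I * M / m * sum(y[i][k] for k in ks)
                      for i, ks in zip(clusters, choice))
            estimates.append(est)
    mu = mean(estimates)
    return sum((e - mu) ** 2 for e in estimates) / len(estimates)

# Hypothetical population: 3 clusters of 3 elements each.
y = [[1.0, 4.0, 2.0], [6.0, 5.0, 9.0], [3.0, 3.0, 8.0]]
v_formula = formula_variance(y, n_I=2, m=2)
v_exact = enumerated_variance(y, n_I=2, m=2)
print(v_formula, v_exact)
```

Because simple random sampling makes every combination of clusters and within-cluster subsets equally likely, the plain average over the enumerated samples is the exact design variance, and it should agree with the formula up to floating-point error.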

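For the case of mean estimation, we can simply divide (7.5) by $N^2 = N_I^2 M^2$ to get
$$V(\hat{\bar{Y}}) = \frac{1}{n_I}\left(1 - \frac{n_I}{N_I}\right)\frac{S_b^2}{M} + \frac{1}{n_I}\left(1 - \frac{m}{M}\right)\frac{S_w^2}{m}. \tag{7.6}$$
Note that the sample size associated with the first term (the $V_{PSU}$ term) is $n_I M$, while the sample size associated with the second term (the $V_{SSU}$ term) is $n_I m$.

Now, we can express the variance in (7.6) in terms of the intracluster correlation coefficient $\rho$. Using Table 6.1 and the property of $\rho$ given by 6.4, we have
$$S_b^2 = (N_I - 1)^{-1}\, SSB = (N_I - 1)^{-1} M^{-1}\,[1 + \rho(M - 1)]\, SST$$
and
$$S_w^2 = N_I^{-1}(M - 1)^{-1}\, SSW = N_I^{-1} M^{-1}(1 - \rho)\, SST.$$
Thus, writing $S^2$ for the population element variance, so that $SST \doteq N_I M S^2$, and ignoring the $n_I/N_I$ term, the variance in (7.6) reduces to
$$V(\hat{\bar{Y}}) \doteq \frac{1}{n_I M}\,[1 + (M - 1)\rho]\, S^2 + \frac{1}{n_I m}\left(1 - \frac{m}{M}\right)(1 - \rho)\, S^2 = \frac{S^2}{n_I m}\{1 + (m - 1)\rho\}. \tag{7.7}$$
Thus, the design effect becomes $1 + (m - 1)\rho$.

In this case of $M_i = M$ and $m_i = m$, the problem of finding the optimal choice of $m$ given the cost function $C = c_0 + c_1 n_I + c_2 n_I m$ can be formulated as minimizing
$$V(\hat{\bar{Y}}_{HT}) = \frac{1}{n_I}\,\frac{S_b^2}{M} + \frac{1}{n_I}\left(1 - \frac{m}{M}\right)\frac{S_w^2}{m} = \frac{1}{n_I}\left\{\frac{1}{M}\left(S_b^2 - S_w^2\right) + \frac{1}{m}\, S_w^2\right\}$$
subject to
$$C = c_0 + c_1 n_I + c_2 n_I m.$$
When the total cost $C$ is fixed, we have
$$n_I = \frac{C - c_0}{c_1 + c_2 m}$$
and the optimal choice is given by
$$m^* = \sqrt{\frac{c_1}{c_2}\cdot\frac{M S_w^2}{S_b^2 - S_w^2}}. \tag{7.8}$$
The optimal solution (7.8) is obtained by applying
$$\frac{a}{m} + b m \geq 2\sqrt{ab},$$
with equality if and only if $m = \sqrt{a/b}$. That is, since
$$V(\hat{\bar{Y}}_{HT}) \times (C - c_0) = \left\{\frac{1}{M}\left(S_b^2 - S_w^2\right) + \frac{1}{m}\, S_w^2\right\}(c_1 + c_2 m) = \text{const.} + c_1\,\frac{1}{m}\, S_w^2 + c_2\,\frac{1}{M}\left(S_b^2 - S_w^2\right) m,$$
the lower bound is achieved when
$$m = \frac{\{c_1 S_w^2\}^{1/2}}{\{c_2 M^{-1}(S_b^2 - S_w^2)\}^{1/2}},$$
which equals (7.8). For sufficiently large $M$, the optimal solution becomes
$$m^* = \sqrt{\frac{c_1}{c_2}\left(\frac{1}{\rho} - 1\right)}.$$
More generally, the objective function can be written as
$$V(\hat{\bar{Y}}_{HT}) = \frac{1}{n_I}\, B^2 + \frac{1}{n_I m}(1 - f_2)\, W^2.$$
In this case, the optimal solution becomes
$$m^* = \sqrt{\frac{c_1}{c_2}\cdot\frac{W^2}{B^2 - W^2/M}}. \tag{7.9}$$

We now discuss variance estimation under two-stage cluster sampling.

Theorem 7.1. An unbiased estimator for the variance of the HT estimator in (7.1) under a two-stage sampling design is
$$\hat{V}(\hat{Y}_{HT}) = \sum_{i \in A_I}\sum_{j \in A_I} \frac{\pi_{Iij} - \pi_{Ii}\pi_{Ij}}{\pi_{Iij}}\,\frac{\hat{Y}_i}{\pi_{Ii}}\,\frac{\hat{Y}_j}{\pi_{Ij}} + \sum_{i \in A_I} \frac{\hat{V}_i}{\pi_{Ii}}, \tag{7.10}$$
where $\hat{V}_i$ satisfies $E(\hat{V}_i \mid i \in A_I) = Var(\hat{Y}_i \mid i \in A_I)$.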
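As a numerical companion to the statement of Theorem 7.1, the following Python sketch evaluates (7.10) for given first-stage inclusion probabilities. The inputs are hypothetical: they assume simple random sampling of $n_I = 3$ clusters out of $N_I = 10$, so that $\pi_{Ii} = n_I/N_I$ and $\pi_{Iij} = n_I(n_I - 1)/\{N_I(N_I - 1)\}$ for $i \neq j$. The function name and the numbers are illustrative only.

```python
def variance_estimate_7_10(Y_hat, V_hat, pi, pi_joint):
    """Evaluate (7.10) over the sampled clusters.

    Y_hat[i], V_hat[i]: estimated cluster total and its estimated second-stage variance.
    pi[i]: first-stage inclusion probability of sampled cluster i.
    pi_joint[i][j]: joint first-stage inclusion probability, with pi_joint[i][i] == pi[i].
    """
    n = len(Y_hat)
    first = sum(
        (pi_joint[i][j] - pi[i] * pi[j]) / pi_joint[i][j]
        * (Y_hat[i] / pi[i]) * (Y_hat[j] / pi[j])
        for i in range(n) for j in range(n)
    )
    second = sum(V_hat[i] / pi[i] for i in range(n))
    return first + second

# Hypothetical inputs: SRS of n_I = 3 clusters out of N_I = 10.
N_I, n_I = 10, 3
Y_hat = [120.0, 95.0, 143.0]
V_hat = [16.0, 9.0, 25.0]
pi = [n_I / N_I] * n_I
off_diagonal = n_I * (n_I - 1) / (N_I * (N_I - 1))
pi_joint = [[pi[i] if i == j else off_diagonal for j in range(n_I)]
            for i in range(n_I)]

v_general = variance_estimate_7_10(Y_hat, V_hat, pi, pi_joint)
print(v_general)
```

Under simple random sampling of clusters, the double sum collapses to $(N_I^2/n_I)(1 - n_I/N_I)$ times the sample variance of the $\hat{Y}_i$, and the second term becomes $(N_I/n_I)\sum_{i \in A_I}\hat{V}_i$. Comparing the two forms is a convenient check on an implementation.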

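Proof. By (7.3),
$$V(\hat{Y}_{HT}) = \sum_{i=1}^{N_I}\sum_{j=1}^{N_I} (\pi_{Iij} - \pi_{Ii}\pi_{Ij})\,\frac{Y_i}{\pi_{Ii}}\,\frac{Y_j}{\pi_{Ij}} + \sum_{i=1}^{N_I} \frac{V_i}{\pi_{Ii}}.$$
By the independence assumption,
$$E_2(\hat{Y}_i \hat{Y}_j) = \begin{cases} Y_i^2 + V_i & \text{if } i = j, \\ Y_i Y_j & \text{if } i \neq j, \end{cases}$$
where $E_2$ denotes the expectation with respect to the second-stage sampling. Thus,
$$E_2\left\{\sum_{i \in A_I}\sum_{j \in A_I} \frac{\pi_{Iij} - \pi_{Ii}\pi_{Ij}}{\pi_{Iij}}\,\frac{\hat{Y}_i}{\pi_{Ii}}\,\frac{\hat{Y}_j}{\pi_{Ij}}\right\} = \sum_{i \in A_I}\sum_{j \in A_I} \frac{\pi_{Iij} - \pi_{Ii}\pi_{Ij}}{\pi_{Iij}}\,\frac{Y_i}{\pi_{Ii}}\,\frac{Y_j}{\pi_{Ij}} + \sum_{i \in A_I} \frac{1 - \pi_{Ii}}{\pi_{Ii}^2}\, V_i$$
and, since $E_2(\hat{V}_i) = V_i$, we have
$$E_2\{\hat{V}(\hat{Y}_{HT})\} = \sum_{i \in A_I}\sum_{j \in A_I} \frac{\pi_{Iij} - \pi_{Ii}\pi_{Ij}}{\pi_{Iij}}\,\frac{Y_i}{\pi_{Ii}}\,\frac{Y_j}{\pi_{Ij}} + \sum_{i \in A_I} \frac{V_i}{\pi_{Ii}^2}.$$
Taking the expectation of the above expression with respect to the first-stage sampling design gives the variance expression at the start of the proof.

The variance estimation formula in (7.10) is the sum of two terms. The first term is the variance estimation formula for the first-stage sampling applied to $\hat{Y}_i$, and the second term is the point estimator for the first-stage sampling applied to $\hat{V}_i$. The validity of the variance estimation formula (7.10) continues to hold when $\hat{Y}_i$ and $\hat{V}_i$ are obtained from multi-stage sampling. That is, as long as $E(\hat{Y}_i \mid A_I) = Y_i$ and $E(\hat{V}_i \mid A_I) = V(\hat{Y}_i \mid A_I)$ hold, the variance estimation formula in (7.10) remains unbiased. This phenomenon was first discovered by Raj.

If we use only the first term of (7.10),
$$\hat{V}_1 = \sum_{i \in A_I}\sum_{j \in A_I} \frac{\pi_{Iij} - \pi_{Ii}\pi_{Ij}}{\pi_{Iij}}\,\frac{\hat{Y}_i}{\pi_{Ii}}\,\frac{\hat{Y}_j}{\pi_{Ij}},$$
to estimate the total variance, the bias can be written as
$$Bias(\hat{V}_1) = -\sum_{i=1}^{N_I} V_i, \tag{7.12}$$
and the bias term is of order $O(N_I)$. Since $Var(\hat{Y}_{HT})$ is of order $O(n_I^{-1} N_I^2)$, the bias term is negligible when $n_I/N_I = o(1)$.

Under the setup of Example 7.1 with $M_i = M$ and $m_i = m$, the variance estimation formula in (7.10) reduces to
$$\hat{V}(\hat{Y}_{HT}) = \frac{N_I^2}{n_I}\left(1 - \frac{n_I}{N_I}\right)\frac{1}{n_I - 1}\sum_{i \in A_I}\left(\hat{Y}_i - \frac{1}{n_I}\sum_{j \in A_I}\hat{Y}_j\right)^2 + \frac{N_I}{n_I}\sum_{i \in A_I}\hat{V}_i, \tag{7.13}$$
where $\hat{Y}_i = M\bar{y}_i$ and
$$\hat{V}_i = \frac{M^2}{m}\left(1 - \frac{m}{M}\right)\frac{1}{m - 1}\sum_{j \in A_i} (y_{ij} - \bar{y}_i)^2.$$
In the case of mean estimation, we can divide (7.13) by $N^2 = N_I^2 M^2$ to get
$$\hat{V}(\hat{\bar{Y}}) = \frac{1}{n_I}\left(1 - \frac{n_I}{N_I}\right) s_b^2 + \frac{1}{N_I}\left(1 - \frac{m}{M}\right)\frac{s_w^2}{m},$$
where
$$s_b^2 = \frac{1}{n_I - 1}\sum_{i \in A_I} (\bar{y}_i - \hat{\bar{Y}})^2, \qquad s_w^2 = \frac{1}{n_I (m - 1)}\sum_{i \in A_I}\sum_{j \in A_i} (y_{ij} - \bar{y}_i)^2.$$
If $f_1 = n_I/N_I$ is negligible, then we can use $\hat{V}(\hat{\bar{Y}}) = s_b^2/n_I$, which is the variance estimator for a sample mean under simple random sampling applied to the cluster means.

When the cluster sizes are unequal, simple random sampling in the first stage is not preferable. The following example is a very popular method of two-stage sampling for the case of unequal cluster sizes.

Example 7.3. Consider the following two-stage sampling design.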

1. Stage One: Select a sample of clusters of size $n_I$ by PPS sampling with size measure $M_i$.
2. Stage Two: Select elements by SRS of size $m$ from the $M_i$ elements in sample cluster $i$.

We first consider estimation of the population total $Y = \sum_{i=1}^{N_I}\sum_{j=1}^{M_i} y_{ij}$. Under single-stage cluster sampling, we would have observed $Y_i = \sum_{j=1}^{M_i} y_{ij}$. In this case, an unbiased estimator of $Y$ is given by
$$\hat{Y}_{PPS} = \frac{N}{n_I}\sum_{k=1}^{n_I} \frac{Y_{a_k}}{M_{a_k}}, \tag{7.15}$$
where $a_k$ is the index of the population cluster selected in the $k$-th draw of the PPS sampling. In two-stage sampling, we do not observe $Y_i$ but obtain $\hat{Y}_i = M_i \bar{y}_i$, where $\bar{y}_i$ is the sample mean of the elements in cluster $i$. Thus, we can use
$$\hat{Y}_{PPS} = \frac{N}{n_I}\sum_{k=1}^{n_I} \frac{\hat{Y}_{a_k}}{M_{a_k}} = \frac{N}{n_I}\sum_{k=1}^{n_I} \bar{y}_{a_k} \tag{7.16}$$
to estimate the total $Y$. Assuming that there is no duplication among the selected clusters, the sampling weights are all equal to $N/(n_I m)$, which implies that every element in the population has the same probability of selection. A sampling design that leads to equal sampling weights is called a self-weighting design. For estimation of the population mean $\bar{Y} = Y/N$, we have
$$\hat{\bar{Y}}_{PPS} = \frac{1}{n_I}\sum_{k=1}^{n_I} \bar{y}_{a_k}, \tag{7.17}$$
which is the sample mean of the cluster means.

To discuss variance estimation, note that the point estimator (7.16) can be written as the sample mean of $z_1, \ldots, z_{n_I}$, where the $z_k$ are independently and identically distributed with the following discrete distribution:
$$z_1 = \hat{Y}_i / p_i \quad \text{with probability } p_i = M_i/N, \quad i = 1, \ldots, N_I.$$
Note that the expectation of $z_1$ with respect to the PPS draw is $\sum_{i=1}^{N_I} \hat{Y}_i$, which is unbiased for $Y = \sum_{i=1}^{N_I} Y_i$ because $E_2(\hat{Y}_i) = Y_i$. For variance estimation, since $\hat{Y}_{PPS}$ in (7.16) can be written as the sample mean of independent $z_k$, we have
$$\hat{V}_{PPS}(\hat{Y}_{PPS}) = \frac{1}{n_I}\, S_z^2 = \frac{1}{n_I}\,\frac{1}{n_I - 1}\sum_{k=1}^{n_I} (z_k - \bar{z})^2.$$
Variance estimation for the mean estimator (7.17) can be constructed similarly. Specifically, we can use
$$\hat{V}_{PPS}(\hat{\bar{Y}}_{PPS}) = \frac{1}{n_I}\,\frac{1}{n_I - 1}\sum_{k=1}^{n_I} (\bar{y}_{a_k} - \hat{\bar{Y}}_{PPS})^2 \tag{7.19}$$
as an unbiased estimator for the variance of the mean estimator in (7.17).

To illustrate the use of two-stage sampling in Example 7.3, consider a finite population of households in a city. The city consists of $N_I$ clusters of houses and cluster $i$ consists of $M_i$ houses. We use the following two-stage cluster sampling.

[Stage 1] Select $n_I = 3$ sample clusters by PPS sampling with the measure of size equal to $M_i$.
[Stage 2] Within each selected cluster $i$, select $m_i = 4$ sample houses by simple random sampling.

Once the sample households are selected, we obtain two pieces of information. One is the number of household members in the house ($t_{ij}$) and the other is the number of household members with age under 6 ($y_{ij}$). We are interested in estimating the proportion of the population with age under 6 in the city. That is, the parameter of interest is
$$P = \frac{\sum_{i=1}^{N_I}\sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N_I}\sum_{j=1}^{M_i} t_{ij}} := \frac{Y}{T}.$$
The following table gives the summary of the realized sample households from the above two-stage sampling.

[Table: sample cluster ID, sample household ID, $t_{ij}$, $y_{ij}$]

The proportion of the population with age under 6 in the city is estimated by
$$\hat{P} = \frac{\hat{Y}}{\hat{T}} = \frac{N n_I^{-1}\sum_{k=1}^{n_I} \bar{y}_k}{N n_I^{-1}\sum_{k=1}^{n_I} \bar{t}_k} = \frac{6/4 + 5/4 + 8/4}{28/4 + 41/4 + 20/4} \doteq 0.213,$$
where the second equality follows from (7.16). To estimate the variance of $\hat{P}$, we use
$$\hat{V}(\hat{P}) = \frac{1}{\bar{t}^{\,2}}\,\frac{1}{n_I (n_I - 1)}\sum_{i=1}^{n_I} (\bar{y}_i - \hat{P}\,\bar{t}_i)^2,$$
where $\bar{t} = n_I^{-1}\sum_{i=1}^{n_I} \bar{t}_i$. The design effect can be computed as the ratio of $\hat{V}(\hat{P})$ under the current sampling design to the variance of $\hat{P}$ under simple random sampling, which is computed by
$$\hat{V}_{SRS}(\hat{P}) = \frac{1}{n}\,\hat{P}(1 - \hat{P}) = \left(\sum_{i=1}^{n_I}\sum_{j=1}^{m_i} t_{ij}\right)^{-1}\hat{P}(1 - \hat{P}).$$
Thus, the estimated design effect is $\hat{V}(\hat{P}) / \hat{V}_{SRS}(\hat{P}) \doteq 2.8105$.
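The arithmetic of the household illustration can be retraced with a short Python sketch. The household-level table is not available here, so the sketch works from cluster totals only and assumes them to be 6, 5, 8 for $y$ and 28, 41, 20 for $t$, with $m = 4$ houses per cluster. These inputs and the function name are assumptions made for illustration.

```python
def ratio_estimate_pps(y_totals, t_totals, m):
    """Ratio estimate, linearized variance, SRS variance and design effect.

    y_totals[i], t_totals[i]: sample totals of y and t in sampled cluster i.
    m: number of sampled houses per cluster (self-weighting PPS design).
    """
    n_I = len(y_totals)
    y_bar = [y / m for y in y_totals]          # cluster sample means of y
    t_bar = [t / m for t in t_totals]          # cluster sample means of t
    p_hat = sum(y_bar) / sum(t_bar)            # ratio of the two mean estimators
    t_mean = sum(t_bar) / n_I
    resid_ss = sum((yb - p_hat * tb) ** 2 for yb, tb in zip(y_bar, t_bar))
    v_hat = resid_ss / (n_I * (n_I - 1) * t_mean ** 2)
    n_persons = sum(t_totals)                  # persons in the sampled houses
    v_srs = p_hat * (1 - p_hat) / n_persons
    return p_hat, v_hat, v_srs, v_hat / v_srs

# Assumed cluster totals for the three sampled clusters.
p_hat, v_hat, v_srs, deff = ratio_estimate_pps([6, 5, 8], [28, 41, 20], m=4)
print(f"P_hat = {p_hat:.3f}, V_hat = {v_hat:.5f}, V_SRS = {v_srs:.5f}, deff = {deff:.4f}")
```

With these assumed totals the ratio comes out near 0.213 and the design effect near 2.81, so the clustered design is estimated to be almost three times as variable as a simple random sample of the same number of persons.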

Reference

Raj, D. Some remarks on a simple procedure of sampling without replacement. Journal of the American Statistical Association 61.

Chapter 3: Element sampling design: Part 1

Chapter 3: Element sampling design: Part 1 Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014 Simple random sampling 1 Simple random sampling 2 SRS with replacement 3 Systematic sampling Kim Ch. 3: Element sampling design: Part

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

Stat472/572 Sampling: Theory and Practice Instructor: Yan Lu

Stat472/572 Sampling: Theory and Practice Instructor: Yan Lu Stat472/572 Sampling: Theory and Practice Instructor: Yan Lu 1 Chapter 5 Cluster Sampling with Equal Probability Example: Sampling students in high school. Take a random sample of n classes (The classes

More information

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy.

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy. 5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability

More information

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:

More information

STA304H1F/1003HF Summer 2015: Lecture 11

STA304H1F/1003HF Summer 2015: Lecture 11 STA304H1F/1003HF Summer 2015: Lecture 11 You should know... What is one-stage vs two-stage cluster sampling? What are primary and secondary sampling units? What are the two types of estimation in cluster

More information

6. Fractional Imputation in Survey Sampling

6. Fractional Imputation in Survey Sampling 6. Fractional Imputation in Survey Sampling 1 Introduction Consider a finite population of N units identified by a set of indices U = {1, 2,, N} with N known. Associated with each unit i in the population

More information

A two-phase sampling scheme and πps designs

A two-phase sampling scheme and πps designs p. 1/2 A two-phase sampling scheme and πps designs Thomas Laitila 1 and Jens Olofsson 2 thomas.laitila@esi.oru.se and jens.olofsson@oru.se 1 Statistics Sweden and Örebro University, Sweden 2 Örebro University,

More information

Unequal Probability Designs

Unequal Probability Designs Unequal Probability Designs Department of Statistics University of British Columbia This is prepares for Stat 344, 2014 Section 7.11 and 7.12 Probability Sampling Designs: A quick review A probability

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n

More information

Chapter 6: Sampling with Unequal Probability

Chapter 6: Sampling with Unequal Probability Chapter 6: Sampling with Unequal Probability 1 Introduction This chapter is the concluding chapter to an introduction to Survey Sampling and Design. Topics beyond this one are, for the most part standalone.

More information

Main sampling techniques

Main sampling techniques Main sampling techniques ELSTAT Training Course January 23-24 2017 Martin Chevalier Department of Statistical Methods Insee 1 / 187 Main sampling techniques Outline Sampling theory Simple random sampling

More information

EC969: Introduction to Survey Methodology

EC969: Introduction to Survey Methodology EC969: Introduction to Survey Methodology Peter Lynn Tues 1 st : Sample Design Wed nd : Non-response & attrition Tues 8 th : Weighting Focus on implications for analysis What is Sampling? Identify the

More information

SAMPLING II BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :37. BIOS Sampling II

SAMPLING II BIOS 662. Michael G. Hudgens, Ph.D.  mhudgens :37. BIOS Sampling II SAMPLING II BIOS 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-11-17 14:37 BIOS 662 1 Sampling II Outline Stratified sampling Introduction Notation and Estimands

More information

SAMPLING BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :55. BIOS Sampling

SAMPLING BIOS 662. Michael G. Hudgens, Ph.D.   mhudgens :55. BIOS Sampling SAMPLIG BIOS 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-11-14 15:55 BIOS 662 1 Sampling Outline Preliminaries Simple random sampling Population mean Population

More information

The ESS Sample Design Data File (SDDF)

The ESS Sample Design Data File (SDDF) The ESS Sample Design Data File (SDDF) Documentation Version 1.0 Matthias Ganninger Tel: +49 (0)621 1246 282 E-Mail: matthias.ganninger@gesis.org April 8, 2008 Summary: This document reports on the creation

More information

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance

More information

SAMPLING III BIOS 662

SAMPLING III BIOS 662 SAMPLIG III BIOS 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2009-08-11 09:52 BIOS 662 1 Sampling III Outline One-stage cluster sampling Systematic sampling Multi-stage

More information

Estimation under cross classified sampling with application to a childhood survey

Estimation under cross classified sampling with application to a childhood survey TSE 659 April 2016 Estimation under cross classified sampling with application to a childhood survey Hélène Juillard, Guillaume Chauvet and Anne Ruiz Gazen Estimation under cross-classified sampling with

More information

Imputation for Missing Data under PPSWR Sampling

Imputation for Missing Data under PPSWR Sampling July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR

More information

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator Linearization Method 141 properties that cover the most common types of complex sampling designs nonlinear estimators Approximative variance estimators can be used for variance estimation of a nonlinear

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Markus Haas LMU München Summer term 2011 15. Mai 2011 The Simple Linear Regression Model Considering variables x and y in a specific population (e.g., years of education and wage

More information

Cross-sectional variance estimation for the French Labour Force Survey

Cross-sectional variance estimation for the French Labour Force Survey Survey Research Methods (007 Vol., o., pp. 75-83 ISS 864-336 http://www.surveymethods.org c European Survey Research Association Cross-sectional variance estimation for the French Labour Force Survey Pascal

More information

Estimation under cross-classified sampling with application to a childhood survey

Estimation under cross-classified sampling with application to a childhood survey Estimation under cross-classified sampling with application to a childhood survey arxiv:1511.00507v1 [math.st] 2 Nov 2015 Hélène Juillard Guillaume Chauvet Anne Ruiz-Gazen January 11, 2018 Abstract The

More information

Simulation of stationary processes. Timo Tiihonen

Simulation of stationary processes. Timo Tiihonen Simulation of stationary processes Timo Tiihonen 2014 Tactical aspects of simulation Simulation has always a goal. How to organize simulation to reach the goal sufficiently well and without unneccessary

More information

A Box-Type Approximation for General Two-Sample Repeated Measures - Technical Report -

A Box-Type Approximation for General Two-Sample Repeated Measures - Technical Report - A Box-Type Approximation for General Two-Sample Repeated Measures - Technical Report - Edgar Brunner and Marius Placzek University of Göttingen, Germany 3. August 0 . Statistical Model and Hypotheses Throughout

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

1 One-way analysis of variance

1 One-way analysis of variance LIST OF FORMULAS (Version from 21. November 2014) STK2120 1 One-way analysis of variance Assume X ij = µ+α i +ɛ ij ; j = 1, 2,..., J i ; i = 1, 2,..., I ; where ɛ ij -s are independent and N(0, σ 2 ) distributed.

More information

Advanced Survey Sampling

Advanced Survey Sampling Lecture materials Advanced Survey Sampling Statistical methods for sample surveys Imbi Traat niversity of Tartu 2007 Statistical methods for sample surveys Lecture 1, Imbi Traat 2 1 Introduction Sample

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2012

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2012 Formalizing the Concepts: Simple Random Sampling Juan Muñoz Kristen Himelein March 2012 Purpose of sampling To study a portion of the population through observations at the level of the units selected,

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Weighting in survey analysis under informative sampling

Weighting in survey analysis under informative sampling Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting

More information

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11 Econometrics A Keio University, Faculty of Economics Simple linear model (2) Simon Clinet (Keio University) Econometrics A October 16, 2018 1 / 11 Estimation of the noise variance σ 2 In practice σ 2 too

More information

REPEATED MEASURES. Copyright c 2012 (Iowa State University) Statistics / 29

REPEATED MEASURES. Copyright c 2012 (Iowa State University) Statistics / 29 REPEATED MEASURES Copyright c 2012 (Iowa State University) Statistics 511 1 / 29 Repeated Measures Example In an exercise therapy study, subjects were assigned to one of three weightlifting programs i=1:

More information

The New sampling Procedure for Unequal Probability Sampling of Sample Size 2.

The New sampling Procedure for Unequal Probability Sampling of Sample Size 2. . The New sampling Procedure for Unequal Probability Sampling of Sample Size. Introduction :- It is a well known fact that in simple random sampling, the probability selecting the unit at any given draw

More information

Partial systematic adaptive cluster sampling

Partial systematic adaptive cluster sampling University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln USGS Staff -- Published Research US Geological Survey 2012 Partial systematic adaptive cluster sampling Arthur L. Dryver

More information

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2013

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2013 Formalizing the Concepts: Simple Random Sampling Juan Muñoz Kristen Himelein March 2013 Purpose of sampling To study a portion of the population through observations at the level of the units selected,

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Paired comparisons. We assume that

Paired comparisons. We assume that To compare to methods, A and B, one can collect a sample of n pairs of observations. Pair i provides two measurements, Y Ai and Y Bi, one for each method: If we want to compare a reaction of patients to

More information

ESTIMATION AND OUTPUT ANALYSIS (L&K Chapters 9, 10) Review performance measures (means, probabilities and quantiles).

ESTIMATION AND OUTPUT ANALYSIS (L&K Chapters 9, 10) Review performance measures (means, probabilities and quantiles). ESTIMATION AND OUTPUT ANALYSIS (L&K Chapters 9, 10) Set up standard example and notation. Review performance measures (means, probabilities and quantiles). A framework for conducting simulation experiments

More information

Review of probability and statistics 1 / 31

Review of probability and statistics 1 / 31 Review of probability and statistics 1 / 31 2 / 31 Why? This chapter follows Stock and Watson (all graphs are from Stock and Watson). You may as well refer to the appendix in Wooldridge or any other introduction

More information

Variance reduction. Timo Tiihonen

Variance reduction. Timo Tiihonen Variance reduction Timo Tiihonen 2014 Variance reduction techniques The most efficient way to improve the accuracy and confidence of simulation is to try to reduce the variance of simulation results. We

More information

Sampling and Estimation in Network Graphs

Sampling and Estimation in Network Graphs Sampling and Estimation in Network Graphs Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ March

More information

Variance Estimation for Calibration to Estimated Control Totals

Variance Estimation for Calibration to Estimated Control Totals Variance Estimation for Calibration to Estimated Control Totals Siyu Qing Coauthor with Michael D. Larsen Associate Professor of Statistics Tuesday, 11/05/2013 2 Outline A. Background B. Calibration Technique

More information

C. J. Skinner Cross-classified sampling: some estimation theory

C. J. Skinner Cross-classified sampling: some estimation theory C. J. Skinner Cross-classified sampling: some estimation theory Article (Accepted version) (Refereed) Original citation: Skinner, C. J. (205) Cross-classified sampling: some estimation theory. Statistics

More information

Estimation of Parameters and Variance

Estimation of Parameters and Variance Estimation of Parameters and Variance Dr. A.C. Kulshreshtha U.N. Statistical Institute for Asia and the Pacific (SIAP) Second RAP Regional Workshop on Building Training Resources for Improving Agricultural

More information

Two-Way Analysis of Variance - no interaction

Two-Way Analysis of Variance - no interaction 1 Two-Way Analysis of Variance - no interaction Example: Tests were conducted to assess the effects of two factors, engine type, and propellant type, on propellant burn rate in fired missiles. Three engine

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Introduction to Survey Data Integration

Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Ji-Yeon Kim Iowa State University F. Jay Breidt Colorado State University Jean D. Opsomer Colorado State University

Chapter 2 Inferences in Simple Linear Regression

STAT 525 SPRING 2018 Professor Min Zhang Testing for Linear Relationship Term $\beta_1 X_i$ defines linear relationship Will then test $H_0 : \beta_1 = 0$ Test requires

Chapter 1 Linear Regression with One Predictor

STAT 525 FALL 2018 Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

Overview Scatter Plot Example

Overview Topic 22 - Linear Regression and Correlation STAT 5 Professor Bruce Craig Consider one population but two variables For each sampling unit observe X and Y Assume linear relationship between variables

3. Linear Regression With a Single Regressor

Econometrics: (I) Application of statistical methods in empirical research Testing economic theory with real-world data (data analysis) 56 Econometrics: (II)

Disadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means

Stat 529 (Winter 2011) Analysis of Variance (ANOVA) Reading: Sections 5.1-5.3. Introduction and notation Birthweight example Disadvantages of using many pooled t procedures The analysis of variance procedure

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.

Chapter 10 Variance Estimation 10.1 Introduction Variance estimation is an important practical problem in survey sampling. Variance estimates are used for two purposes. One is the analytic purpose such as constructing

Chapter 2: Simple Random Sampling and a Brief Review of Probability

Forest Before the Trees Chapters 2-6 primarily investigate survey analysis. We begin with the basic analyses: Those that differ according

Asymptotic Normality under Two-Phase Sampling Designs

Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

You are allowed 3? sheets of notes and a calculator.

Exam 1 is Wed Sept You are allowed 3? sheets of notes and a calculator. The exam covers survey sampling. Numbers refer to types of problems on exam. A population is the entire set of (potential) measurements

On the bias of the multiple-imputation variance estimator in survey sampling

J. R. Statist. Soc. B (2006) 68, Part 3, pp. 509-521 Jae Kwang Kim, Yonsei University, Seoul, Korea J. Michael Brick, Westat,

Indirect Sampling in Case of Asymmetrical Link Structures

Torsten Harms Abstract Estimation in case of indirect sampling as introduced by Huang (1984) and Ernst (1989) and developed by Lavalle (1995) Deville

Deriving indicators from representative samples for the ESF

Brussels, June 17, 2014 Ralf Münnich and Stefan Zins Lisa Borsi and Jan-Philipp Kolb GESIS Mannheim and University of Trier Outline 1 Choosing

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

Adaptive two-stage sequential double sampling

Bardia Panahbehagh Afshin Parvardeh Babak Mohammadi March 4, 208 arxiv:803.04484v [math.st] 2 Mar 208 Abstract In many surveys inexpensive auxiliary variables

Reliability of inference (1 of 2 lectures)

Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

1 Outline. Introduction: Econometricians assume data is from a simple random survey. This is never the case.

24. Stratification © A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 2002. They can be used as an adjunct to Chapter 24 (sections 24.2-24.4 only) of our subsequent book Microeconometrics:

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection

SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28

Simple Linear Regression

MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

7 Variance Reduction Techniques

In a simulation study, we are interested in one or more performance measures for some stochastic model. For example, we want to determine the long-run average waiting time,

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample


Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Xiaolu Tan tan@ceremade.dauphine.fr September 2015 Contents 1 Introduction 1.1 The principle 1.2 The error analysis

Special Topic: Bayesian Finite Population Survey Sampling

Sudipto Banerjee Division of Biostatistics School of Public Health University of Minnesota April 2, 2008 1 Special Topic Overview Scientific survey

Measuring the fit of the model - SSR

Once we've determined our estimated regression line, we'd like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

Experimental variogram of the residuals in the universal kriging (UK) model

Nicolas DESASSIS Didier RENARD Technical report R141210NDES MINES ParisTech Centre de Géosciences Equipe Géostatistique 35, rue

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

BNAD 276 Lecture 10 Simple Linear Regression Model

Phuong Ho May 30, 2017 Outline 1 Introduction 2 Simple Linear Regression Model Managerial decisions

arxiv: v2 [math.st] 20 Jun 2014

A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

Errata and Updates for ASM Exam MAS-I (First Edition) Sorted by Date

Errata for ASM Exam MAS-I Study Manual (First Edition) Sorted by Date. Practice Exam 5 Question 6 is defective. See the correction

Errata and Updates for ASM Exam MAS-I (First Edition) Sorted by Page

Errata for ASM Exam MAS-I Study Manual (First Edition) Sorted by Page. Practice Exam 5 Question 6 is defective. See the correction

From the help desk: It's all about the sampling

The Stata Journal (2002) 2, Number 2, pp. 90 20 Allen McDowell Stata Corporation amcdowell@stata.com Jeff Pitblado Stata Corporation jsp@stata.com Abstract.

Figure Figure

Figure 4-12. Equal probability of selection with simple random sampling of equal-sized clusters at first stage and simple random sampling of equal number at second stage. The next sampling approach, shown

Lecture 5: ANOVA and Correlation

Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continuous data: comparing means Analysis of variance Binary data: comparing proportions

HT Introduction. P(X i = x i ) = e λ λ x i

MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

Chapter 4: Imputation

Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Basic Theory for imputation 3 Variance estimation after imputation 4 Replication variance estimation

A note on multiple imputation for general purpose estimation

Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Introduction Basic Setup Assume

Coupled-Cluster Theory. Nuclear Structure

Coupled-Cluster Theory for Nuclear Structure Sven Binder INSTITUT FÜR KERNPHYSIK Nuclear Interactions from Chiral EFT (figure: NN, 3N and 4N contributions at LO, NLO, N2LO, N3LO)

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Fall 2016 Instructor: Martin Farnham

Last name (family name): First name (given name):

Modelling Non-linear and Non-stationary Time Series

Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

WISE International Masters

ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are