A Bound for the Relative Bias of the Design Effect


Alberto Padilla
Banco de México

Abstract
Design effects are typically used to compute sample sizes or standard errors from complex surveys. In this paper, we show that the design effect estimator is biased, and an upper bound for the relative bias is presented. A simulation study was conducted to assess the size of the bound for the relative bias with samples drawn from two artificially generated populations using stratified and two-stage random sampling.

Key Words: Variance of variances, Confidence interval, Coefficient of variation

1. Introduction
The design effect, deff (Kish, 1965), is defined as the ratio of the variance of an estimator under a specific design to the variance of the estimator under simple random sampling without replacement, srswor. The estimator of the design effect is used, for example, to compute sample sizes for complex sample designs and to build confidence intervals. In this article we exhibit the bias of the design effect estimator and give a bound for its relative bias, that is, for the ratio of its bias to its standard error.

2. Design Effect Estimation and Bias
2.1 Definition
It is worth mentioning that all results are based on the design-based approach, for the planning stage of a survey. The design effect, deff (Kish, 1965), is defined as the ratio of the variance of an estimator under a specific design $p$ different from simple random sampling, $v_p(\hat y)$, to the variance of the estimator under simple random sampling without replacement, $v_{wor}(\hat y)$:

$$deff(\hat y) = \frac{v_p(\hat y)}{v_{wor}(\hat y)}$$

The design effect estimator, $\widehat{deff}$ (Kish, 1965), is computed by plugging estimators of the variances into both the numerator and the denominator of this formula. In particular, the variance estimator under srswor is generally obtained with the formula:

$$\hat v_{wor}(\hat y) = \left(1 - \frac{n}{N}\right)\frac{1}{n}\,\frac{\sum_{i=1}^{n}(y_i - \bar y)^2}{n-1}$$

This formula treats the sample as if it had been drawn by srswor, which does not guarantee an unbiased estimate of the population variance under srswor when the sample was actually selected with design $p$. An example with a small population will shed light on this point.
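The plug-in srswor estimator above can be written as a short Python function. This is a minimal sketch with a hypothetical function name, `v_wor_hat`; it deliberately pools the sample and treats it as srswor, which is precisely the assumption that makes it biased under other designs. The four values in the usage line are illustrative only.

```python
from statistics import mean, variance


def v_wor_hat(sample, N):
    """Plug-in srswor variance estimator of the sample mean:
    (1 - n/N) * (1/n) * sum((y_i - ybar)^2) / (n - 1).

    The sample is treated as an srswor sample of size n from N units,
    whatever design actually produced it.
    """
    n = len(sample)
    s2 = variance(sample)  # sample variance with divisor n - 1
    return (1 - n / N) * s2 / n


# Illustrative pooled sample of n = 4 values from a population of N = 8 units
example = [2, 3, 11, 14]
print(mean(example), v_wor_hat(example, N=8))
```

Here the sample mean is 7.5, the sum of squared deviations is 105, and the estimate is $0.5 \times (105/3)/4 = 4.375$.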

Remark: we will use the term relative bias to refer to the upper bound of the ratio of the bias to the standard error.

2.1.1 Example 1
Table 1: Small Stratified Population
Stratum | Values $y_h$ | Population mean $\bar y_h$ | $S^2_{hu}$
1 | {2, 3, 4.5, 2.5, 3.4} | 3.08 | 0.91
2 | {11, 14, 18} | 14.33 | 12.33
Population | | 7.30 | 37.96

A simple random sample without replacement, srswor, of size 2 was drawn from each stratum. There are 30 possible samples under stratified random sampling, strs, and 70 under srswor. The population variances of the mean under the two designs are $v_{wor}(\hat y) = 4.75$ and $v_{st}(\hat y) = 0.395$, respectively. The design effect for this population is:

$$deff(\hat y) = \frac{v_{st}(\hat y)}{v_{wor}(\hat y)} = 0.083$$

As mentioned above, Kish (1965) defined the estimator of deff for each sample as:

$$\widehat{deff}_K(\hat y)_k = \frac{\hat v_{st}(\hat y)_k}{\hat v_{wor,biased}(\hat y)_k}, \qquad k = 1, \dots, 30,$$

where

$$\hat v_{wor,biased}(\hat y)_k = \left(1 - \frac{4}{8}\right)\frac{\sum_{h}\sum_{j}(y_{hj} - \bar y)^2}{(4-1)\,4}$$

and $\bar y$ is the mean of the four sampled values. In this case $\frac{1}{30}\sum_{k=1}^{30}\hat v_{wor,biased}(\hat y)_k = 5.93$, which differs from $v_{wor}(\hat y) = 4.75$. The average over all possible samples under stratified random sampling is:

$$\frac{1}{30}\sum_{k=1}^{30}\widehat{deff}_K(\hat y)_k = 0.067 \neq deff(\hat y) = 0.083$$

This result is not surprising, since $\widehat{deff}_K$ is a ratio estimator, which is known to be biased (Cochran, 1977). It also shows the need to modify the estimator used in the denominator. Gambino (2009) proposed the following unbiased estimator:

$$\hat v_{wor,unbiased}(\hat y) = \left(1 - \frac{n}{N}\right)\frac{1}{n}\left[\frac{\hat Y_{sq}}{N-1} - \frac{\hat Y^2 - \hat v(\hat Y)}{N(N-1)}\right]$$

where $\hat Y_{sq}$, $\hat Y$ and $\hat v(\hat Y)$ are unbiased estimators, under the design different from srswor, of the population sum of squares, the population total and the variance of the estimated total; hence $\hat Y^2 - \hat v(\hat Y)$ is unbiased for the squared total. Hereinafter, $\widehat{deff}_G$ will denote the deff estimator using Gambino's correction. With this correction we obtain:

$$\frac{1}{30}\sum_{k=1}^{30}\widehat{deff}_G(\hat y)_k = 0.084 \neq deff(\hat y) = 0.083$$

This estimator remains biased, but the bias now stems only from the use of a ratio estimator.
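The Example 1 calculations can be reproduced by enumerating all 30 stratified samples. The sketch below is our reading of the formulas above, not the author's code; names such as `STRATA` and `estimates` are hypothetical. Because $\hat v_{st}$ and the Gambino-type estimator are exactly unbiased, their averages over the 30 samples equal the population values 0.395 and 4.75, while the average of the Kish denominator is about 5.93. If this reading matches the author's, the printed deff averages should be close to the 0.067 and 0.084 reported above.

```python
from itertools import combinations, product
from statistics import mean, variance

# Table 1: small stratified population (N = 8); srswor of size 2 in each stratum
STRATA = [[2, 3, 4.5, 2.5, 3.4], [11, 14, 18]]
N_H = [len(s) for s in STRATA]
n_h = [2, 2]
N, n = sum(N_H), sum(n_h)

# Population (design) variances of the estimated mean and the true deff
V_WOR = (1 - n / N) * variance([y for s in STRATA for y in s]) / n
V_ST = sum((Nh / N) ** 2 * (1 - k / Nh) * variance(s) / k
           for s, Nh, k in zip(STRATA, N_H, n_h))
DEFF = V_ST / V_WOR


def estimates(sample):
    """Return (v_st_hat, v_wor_biased, v_wor_unbiased) for one strs sample."""
    parts = list(zip(sample, N_H, n_h))
    pooled = [y for part in sample for y in part]
    # Unbiased stratified variance estimator of the mean
    v_st = sum((Nh / N) ** 2 * (1 - k / Nh) * variance(p) / k for p, Nh, k in parts)
    # Kish: pooled sample treated as srswor (biased under strs)
    v_biased = (1 - n / N) * variance(pooled) / n
    # Gambino-type correction built from unbiased estimators under strs
    y_sq = sum(Nh / k * sum(y * y for y in p) for p, Nh, k in parts)
    y_tot = sum(Nh / k * sum(p) for p, Nh, k in parts)
    v_tot = sum(Nh ** 2 * (1 - k / Nh) * variance(p) / k for p, Nh, k in parts)
    v_unbiased = (1 - n / N) / n * (y_sq / (N - 1) - (y_tot ** 2 - v_tot) / (N * (N - 1)))
    return v_st, v_biased, v_unbiased


# All 10 x 3 = 30 equally likely stratified samples
samples = list(product(*(combinations(s, k) for s, k in zip(STRATA, n_h))))
est = [estimates(s) for s in samples]
avg_v_biased = mean(e[1] for e in est)
avg_v_unbiased = mean(e[2] for e in est)
avg_deff_k = mean(a / b for a, b, _ in est)
avg_deff_g = mean(a / c for a, _, c in est)

print(f"deff = {DEFF:.3f}")
print(f"avg v_biased = {avg_v_biased:.2f}, avg v_unbiased = {avg_v_unbiased:.2f}")
print(f"avg deff_K = {avg_deff_k:.3f}, avg deff_G = {avg_deff_g:.3f}")
```

Full enumeration is used instead of simulation because the population is tiny, so every design expectation is an exact average over the 30 samples.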

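2.1.2 Bounds for the Bias and Relative Bias of the Deff
Theorem: for a sample design $p$ different from srswor, with variance estimators $\hat v_p$ and $\widehat{mse}$, where $\widehat{mse}$ estimates the variance under srswor and $\widehat{mse} \neq 0$, we have:

$$bias(\widehat{deff}) = E(\widehat{deff}) - deff = \frac{E(\hat v_p) - cov(\widehat{mse}, \widehat{deff})}{E(\widehat{mse})} - \frac{v_p}{v_{wor}}$$

This follows because $\widehat{deff}\,\widehat{mse} = \hat v_p$, so that $cov(\widehat{mse}, \widehat{deff}) = E(\hat v_p) - E(\widehat{mse})\,E(\widehat{deff})$.

Corollary 1: under a sample design $p$ different from srswor, with unbiased estimators of the population variances under $p$ and under srswor, and using $\widehat{deff}_G$, the bias is

$$bias(\widehat{deff}_G) = -\frac{cov(\widehat{deff}_G, \hat v_{unbiased})}{v_{wor}}$$

and the relative bias is this quantity divided by the standard error of $\widehat{deff}_G$.
Remark: in this expression, $\hat v_{unbiased}$ is computed with Gambino's (2009) formula.

Corollary 2: a bound for the relative bias of $\widehat{deff}_G$ is given by $cv(\hat v_{unbiased})$, i.e., the coefficient of variation of the unbiased estimator of the population variance under srswor:

$$\frac{|bias(\widehat{deff}_G)|}{\sqrt{V(\widehat{deff}_G)}} \le \frac{\sqrt{V(\hat v_{unbiased})}}{v_{wor}} = cv(\hat v_{unbiased})$$

This follows from Corollary 1 and the Cauchy-Schwarz inequality, $|cov(\widehat{deff}_G, \hat v_{unbiased})| \le \sqrt{V(\widehat{deff}_G)\,V(\hat v_{unbiased})}$.
Remark: we are working with $\widehat{deff}_G$, which is different from $\widehat{deff}_K$. The latter is the quantity routinely employed in practice.

2.1.3 Example 2. Stratified random sampling
Based on the example in Cochran (1977), page 137, we simulated a small population with 3 strata and 57 elements.

Table 2: Simulated Stratified Population
Stratum | $N_h$ | $n_h$ | $W_h$ | $\bar y_h$ | $S^2_{hu}$
1 | 13 | 9 | 0.23 | 2.33 | 1.62
2 | 18 | 7 | 0.32 | 1.61 | 0.08
3 | 26 | 6 | 0.46 | 5.04 | 1.18
Population | 57 | 22 | | | 3.44

Population quantities: $v_{wor} = 0.096$, $v_{strs} = 0.035$, $deff = 0.364$.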
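The quantities in Corollaries 1 and 2 can be computed from per-sample estimates, whether these come from full enumeration or from a Monte Carlo simulation such as the one in Example 2. The sketch below illustrates the corollaries under stated assumptions and is not the author's code: the function name `deff_bias_summary` is hypothetical, every sample is treated as equally likely, and the coefficient of variation uses the average of $\hat v_{unbiased}$ over the samples as its denominator. With full enumeration and exactly unbiased variance estimators, `bias` and `bias_cov` coincide; with Monte Carlo draws they agree only approximately.

```python
from statistics import mean, pstdev


def deff_bias_summary(v_p_hat, v_unb, deff_true, v_wor_true):
    """Summarise the bias of deff_G = v_p_hat / v_unb over equally likely samples.

    v_p_hat: per-sample unbiased variance estimates under design p
    v_unb:   per-sample unbiased srswor variance estimates (Gambino-type)
    Returns (bias, bias_cov, relative_bias, cv_bound), where
      bias          = mean(deff_G) - deff_true
      bias_cov      = -cov(deff_G, v_unb) / v_wor_true   (Corollary 1)
      relative_bias = bias / sd(deff_G)
      cv_bound      = sd(v_unb) / mean(v_unb)            (Corollary 2)
    """
    deff_g = [a / b for a, b in zip(v_p_hat, v_unb)]
    m_d, m_v = mean(deff_g), mean(v_unb)
    bias = m_d - deff_true
    # Covariance over the equally likely samples: E(XY) - E(X)E(Y)
    cov = mean(d * v for d, v in zip(deff_g, v_unb)) - m_d * m_v
    bias_cov = -cov / v_wor_true
    relative_bias = bias / pstdev(deff_g)
    cv_bound = pstdev(v_unb) / m_v
    return bias, bias_cov, relative_bias, cv_bound


# Toy check: two equally likely samples, true v_p = 1, v_wor = 2, deff = 0.5
print(deff_bias_summary([1.0, 1.0], [1.0, 3.0], deff_true=0.5, v_wor_true=2.0))
```

In this two-sample toy case $\widehat{deff}_G$ and $\hat v_{unbiased}$ are perfectly negatively correlated, so the Cauchy-Schwarz step is tight and the relative bias equals the bound (both 0.5).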

In Table 2, the labels of columns 2 to 6 refer to population size, sample size, relative size, stratum mean and element variance within strata. From the population defined above, we simulated the extraction of 5,000 samples of size 22 under strs, and for each sample we computed the following estimators:
- Unbiased estimator of the variance under strs, $\hat v_{strs}$
- Biased estimator of the variance under srswor, using Kish's definition, $\hat v_{biased}$
- Unbiased estimator of the variance under srswor, using Gambino's correction, $\hat v_{unbiased}$
- Deff estimator using Kish's formula, $\widehat{deff}_K$
- Deff estimator using Gambino's correction, $\widehat{deff}_G$

Results for the 5,000 samples:
Estimator | Average | Relative bias (%)
$\hat v_{unbiased}$ | 0.0964 | ---
$\hat v_{biased}$ | 0.083 | -13.4%
$\hat v_{strs}$ | 0.035 | ---
$\widehat{deff}_K$ | 0.4318 | 18.7%
$\widehat{deff}_G$ | 0.374 | 2.4%

The bound for the relative bias, $cv(\hat v_{unbiased})$, computed with $\hat v_{unbiased}$ from the simulation, was 18.4%.

2.1.4 Example 3. Two-stage cluster sampling
We have a small population with 8 clusters, or primary sampling units (PSUs), each containing 8 elements, or secondary sampling units (SSUs). At the first stage we draw a = 3 PSUs and at the second stage b = 4 SSUs from each selected PSU, so the sample size is n = ab = 12 elements. The values within each cluster were simulated using uniform random variables; the minimum and maximum employed in the simulation are shown in the following table.

PSU | $\bar y_i$ | $s^2_i$ | Min and max
1 | 0.33 | 0.0058 | 0.2 and 0.5
2 | 0.444 | 0.0009 | 0.4 and 0.5
3 | 1.37 | 0.0094 | 1.1 and 1.4
4 | 0.919 | 0.0037 | 0.8 and 1.0
5 | 0.3 | 0.0064 | 0.1 and 0.35
6 | 0.610 | 0.0030 | 0.5 and 0.7
7 | 0.970 | 0.0044 | 0.9 and 1.1
8 | 0.461 | 0.0077 | 0.3 and 0.6
Population | 0.650 | 0.1166 |

The values in columns 2 and 3 refer to the within-cluster mean and variance.

Population quantities: $v_{wor} = 0.0079$, $v_{clus} = 0.0212$, $deff = 2.6877$.

Here $v_{clus}$ is the population variance under two-stage random sampling. The intraclass correlation for this population is 0.95 and was computed using the result in Cochran (1977), page 91. We simulated the extraction of 3,500 samples of size 12, with a = 3 PSUs selected at the first stage and b = 4 SSUs selected at the second stage. For each sample we computed the following estimators:
- Unbiased estimator of the variance under two-stage sampling, $\hat v_{clus}$
- Biased estimator of the variance under srswor, using Kish's definition, $\hat v_{biased}$
- Unbiased estimator of the variance under srswor, using Gambino's correction, $\hat v_{unbiased}$
- Deff estimator using Kish's formula, $\widehat{deff}_K$
- Deff estimator using Gambino's correction, $\widehat{deff}_G$

The results for the 5,000 samples for each estimator are shown below:
Estimator | Average of estimators | Relative bias
$\hat v_{unbiased}$ | 0.0080 | ---
$\hat v_{biased}$ | 0.0066 | -16.13%
$\hat v_{clus}$ | 0.069 | ---
$\widehat{deff}_K$ | 3.9037 | 45.24%
$\widehat{deff}_G$ | 3.2467 | 20.80%

The bound for the relative bias, $cv(\hat v_{unbiased})$, computed with $\hat v_{unbiased}$ from the simulation, was 63.44%.

2.1.5 Example 4. Two-stage cluster sampling with several roh values
Using the same population as in Example 3 and exchanging elements between clusters, we repeated the simulations of Example 3 in order to obtain different values of the bound for the relative bias, and to compare the formula used in practice, $deff \approx 1 + roh(b-1)$, with the population deff. Exchanging elements between clusters left the population mean and the element variance unaffected, but changed roh and deff.
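The practical approximation compared with the population deff in Example 4, $deff \approx 1 + roh(b-1)$, only needs roh and the number of SSUs sampled per PSU. A minimal sketch; the function name `deff_roh` is hypothetical, and it omits the finite-population factors such as $(A-1)/A$ and $(N-1)/N$ under which the approximation is exact.

```python
def deff_roh(roh, b):
    """Approximation deff ~= 1 + roh * (b - 1), with b SSUs sampled per PSU."""
    return 1 + roh * (b - 1)


# Example 4 samples b = 4 SSUs in each selected PSU
for roh in (-0.14, -0.05, 0.01, 0.26):
    print(f"roh = {roh:5.2f} -> 1 + roh(b - 1) = {deff_roh(roh, 4):.2f}")
```

For b = 4 the approximation is simply $1 + 3\,roh$; for other roh values, small differences from the reported column can come from roh itself being rounded to two decimals.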

roh | deff | Bound for the relative bias | $1 + roh(b-1)$
-0.14 | 0.70 | 0.5 | 0.58
-0.05 | 0.9 | 0.34 | 0.85
0.01 | 1.07 | 0.34 | 1.03
0.14 | 1.38 | 0.36 | 1.42
0.26 | 1.66 | 0.37 | 1.78
0.38 | 1.97 | 0.43 | 2.15
0.50 | 2.25 | 0.45 | 2.50
0.63 | 2.50 | 0.54 | 2.88
0.76 | 2.87 | 0.61 | 3.27
0.86 | 3.1 | 0.6 | 3.57
0.96 | 3.35 | 0.64 | 3.87

In the table, $1 + roh(b-1)$ is a good approximation to deff (the population value) whenever $(A-1)/A$ and $(N-1)/N$ are equal to unity; see Kish (1965), chapter 5. The table shows that $1 + roh(b-1)$ overestimates the population deff for roh values from 0.26 to 0.96, and underestimates it when roh is negative.

3. Conclusions
An exact expression for the bias of the design effect estimator, and an upper bound for the ratio of that bias to the estimator's standard error, were given. The upper bound is the coefficient of variation of the unbiased estimator of the variance under simple random sampling. Based on the simulations, and given the extensive use of the design effect in practice, it is advisable to analyse the stability of the variance estimator under simple random sampling whenever possible. It is also advisable to work with an unbiased estimator of the variance under simple random sampling. More simulations are needed to assess the usefulness of the formula $deff \approx 1 + roh(b-1)$, which seems to over- or underestimate the true deff value.

References
Cochran, W. G. (1977). Sampling Techniques, 3rd ed. New York: Wiley.
Gambino, J. G. (2009). Design effect caveats. The American Statistician, pp. 141-145.
Kish, L. (1965). Survey Sampling. New York: Wiley.

Padilla, A. M. (2011). Una cota para el sesgo relativo del efecto del diseño [A bound for the relative bias of the design effect]. Memorias electrónicas en extenso de la 4ª Semana Internacional de la Estadística y la Probabilidad, July 2011. CD ISBN: 978-607-487-34-5.
Rao, J. N. K. (1962). On the estimation of the relative efficiency of sampling procedures. Annals of the Institute of Statistical Mathematics, pp. 143-150.