Two phase stratified sampling with ratio and regression methods of estimation


CHAPTER IV

4.1 Introduction

In a sample survey, a survey sampler might like to use a size variable $x$ either (i) for stratification, (ii) for incorporation in the estimation procedure, or (iii) for selecting a sample. Sometimes one might think of using $x$ both for (i) and (ii), or for (i) and (iii). In this chapter we consider the situation where an auxiliary variable $x$ is used both for stratification and for the ratio or regression method of estimation.

Let the finite population of size $N$ consist of $L$ strata of sizes $N_1, N_2, \ldots, N_L$, with $N_h$ units belonging to the $h$-th stratum. When the sizes of the strata are not known, an initial SRSWOR sample $s_1$ of fixed size $n'$ is selected and then classified into the different strata, with $n'_h$ units falling in the $h$-th stratum, i.e. in $s_{1h}$ ($h = 1, 2, \ldots, L$), with $\sum_{h=1}^{L} n'_h = n'$. In the second phase, an SRSWOR sample of size $n''_h$ is drawn from $s_{1h}$ of size $n'_h$, independently for each $h$, to observe the main variable $y$. We assume that $n'$ is so large that $n'_h > 0$ for each $h$. We also assume that at the second phase a constant proportion $g_h = n''_h / n'_h$ of units is sampled from the $h$-th stratum of the initial sample.

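4.2 Ratio method of estimation

Let us define a ratio estimator under two phase stratified sampling:

$$\bar{y}_{Rst} = \sum_{h=1}^{L} w_h \, \frac{\bar{y}''_h}{\bar{x}''_h} \, \bar{x}'_h \qquad (4.2.1)$$

where $\bar{y}''_h$, $\bar{x}''_h$ are the sample means of the $h$-th stratum based on a sample of size $n''_h$; $\bar{x}'_h$ is the sample mean of the $h$-th stratum based on a sample of size $n'_h$; and $w_h = n'_h / n'$.

Theorem 4.1 Under two phase stratified random sampling, $\bar{y}_{Rst}$ is approximately an unbiased estimator of $\bar{Y}$ for large values of $n''_h$.

Proof:

$$E(\bar{y}_{Rst}) = E_1\left[\sum_{h=1}^{L} w_h \, \bar{x}'_h \, E_2\!\left(\frac{\bar{y}''_h}{\bar{x}''_h}\right)\right] \approx E_1\left[\sum_{h=1}^{L} w_h \, \bar{x}'_h \, \frac{\bar{y}'_h}{\bar{x}'_h}\right] = \sum_{h=1}^{L} W_h \bar{Y}_h = \bar{Y} \qquad (4.2.2)$$

where $W_h = N_h / N = E(w_h)$ and $\bar{y}'_h$ is based on $n'_h$ units, with $E(\bar{y}'_h) = \bar{Y}_h$.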
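The estimator (4.2.1) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original chapter: the function name, the dictionary keys and the toy data are all hypothetical, and each stratum is assumed to supply the first-phase $x$ values together with the $(x, y)$ values of its second-phase subsample.

```python
from statistics import mean

def ratio_estimator_two_phase(strata):
    """Two-phase stratified ratio estimate of the population mean of y, as in (4.2.1).

    `strata` is a list of dicts, one per stratum, with hypothetical keys:
      "x_first":  x values of all n'_h first-phase units in the stratum
      "x_second": x values of the n''_h second-phase units
      "y_second": y values of the same n''_h second-phase units
    """
    n_first = sum(len(s["x_first"]) for s in strata)          # n'
    estimate = 0.0
    for s in strata:
        w_h = len(s["x_first"]) / n_first                     # w_h = n'_h / n'
        ratio_h = mean(s["y_second"]) / mean(s["x_second"])   # ybar''_h / xbar''_h
        estimate += w_h * ratio_h * mean(s["x_first"])        # scaled by xbar'_h
    return estimate

# Toy data (invented for illustration only).
example = [
    {"x_first": [10, 12, 14, 16], "x_second": [10, 14], "y_second": [20, 28]},
    {"x_first": [30, 35, 40, 40, 45, 50], "x_second": [30, 40, 50], "y_second": [33, 44, 55]},
]
print(ratio_estimator_two_phase(example))
```

For the toy data the stratum contributions are 0.4 × 2 × 13 and 0.6 × 1.1 × 40, so the estimate is about 36.8.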

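Theorem 4.2 If the first sample is a random sample of size $n'$ and the second sample is a random stratified sample from the first, with fixed $g_h$ ($0 < g_h < 1$), then

$$V(\bar{y}_{Rst}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{W_h S_{rh}^2}{n'} \qquad (4.2.3)$$

where $S_{rh}^2 = S_{yh}^2 + R_h^2 S_{xh}^2 - 2 R_h S_{yxh}$, $R_h = \bar{Y}_h / \bar{X}_h$, and $S_y^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2$; $S_{yh}^2$ and $S_{xh}^2$ are the population variances of $y$ and $x$ for the $h$-th stratum respectively, and $S_{yxh}$ is the population covariance between $x$ and $y$ for the $h$-th stratum.

Proof:

$$V(\bar{y}_{Rst}) = E_1 V_2(\bar{y}_{Rst}) + V_1 E_2(\bar{y}_{Rst}) \qquad (4.2.4)$$

Now, to the first order of approximation,

$$V_2(\bar{y}_{Rst}) = \sum_{h=1}^{L} w_h^2 \, V_2\!\left(\frac{\bar{y}''_h}{\bar{x}''_h} \, \bar{x}'_h\right) \approx \sum_{h=1}^{L} w_h^2 \left(\frac{1}{n''_h} - \frac{1}{n'_h}\right) s'^2_{rh} = \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{w_h \, s'^2_{rh}}{n'}$$

where $s'^2_{rh} = s'^2_{yh} + r'^2_h s'^2_{xh} - 2 r'_h s'_{yxh}$; $s'^2_{yh}$, $s'^2_{xh}$ are the variances based on the $n'_h$ units in the initial sample of the $h$-th stratum, $s'_{yxh}$ is the covariance based on the $n'_h$ units in the initial sample of the $h$-th stratum, and $r'_h = \bar{y}'_h / \bar{x}'_h$. Therefore

$$E_1 V_2(\bar{y}_{Rst}) = \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{W_h S_{rh}^2}{n'} \qquad (4.2.5)$$

and

$$V_1 E_2(\bar{y}_{Rst}) = V_1\!\left(\sum_{h=1}^{L} w_h \, \frac{\bar{y}'_h}{\bar{x}'_h} \, \bar{x}'_h\right) = V_1\!\left(\sum_{h=1}^{L} w_h \bar{y}'_h\right) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 \qquad (4.2.6)$$

From (4.2.4), (4.2.5) and (4.2.6) we find

$$V(\bar{y}_{Rst}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{W_h S_{rh}^2}{n'}$$

Theorem 4.3 An unbiased estimator of $V(\bar{y}_{Rst})$ is

$$v(\bar{y}_{Rst}) = \frac{n'(N-1)}{N(n'-1)} \left[\sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{w_h s_{rh}^2}{n'} + \frac{N - n'}{n'(N-1)} \left(\frac{1}{n'} \sum_{h=1}^{L} \frac{1}{g_h} \sum_{j=1}^{n''_h} y_{hj}^2 - \bar{y}_{Rst}^2\right)\right] \qquad (4.2.7)$$

where $s_{rh}^2 = s_{yh}^2 + R''^2_h s_{xh}^2 - 2 R''_h s_{yxh}$ is computed from the $n''_h$ second-phase units, $y_{hj}$ is the $j$-th observation of the $h$-th stratum, and $R''_h = \bar{y}''_h / \bar{x}''_h$.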
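The variance expression (4.2.3) of Theorem 4.2 is easy to evaluate numerically. The following is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, the inputs are population quantities supplied by the user, and the numbers in the example are invented purely to exercise the formula.

```python
def var_ratio_two_phase(n_first, N, S_y2, W, S_r2, g):
    """Approximate variance (4.2.3) of the two-phase stratified ratio estimator.

    n_first : first-phase sample size n'
    N       : population size
    S_y2    : population variance of y
    W, S_r2, g : per-stratum lists of W_h, S_rh^2 and sampling fractions g_h
    """
    first_phase = (1.0 / n_first - 1.0 / N) * S_y2
    second_phase = sum(
        (1.0 / g_h - 1.0) * W_h * S_rh2 for W_h, S_rh2, g_h in zip(W, S_r2, g)
    ) / n_first
    return first_phase + second_phase

# Invented inputs: two strata, n' = 100 out of N = 1000.
v = var_ratio_two_phase(100, 1000, 50.0, [0.4, 0.6], [4.0, 9.0], [0.5, 0.25])
print(v)
```

With $g_h = 1$ for every stratum the second term vanishes and the expression reduces to the simple random sampling variance $(1/n' - 1/N) S_y^2$, which is a convenient sanity check.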

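Proof:

$$\text{Est.}\,(N-1) S_y^2 = \text{Est.}\left[\sum_{h=1}^{L} \sum_{j=1}^{N_h} y_{hj}^2 - N \bar{Y}^2\right] \qquad (4.2.8)$$

and

$$\text{Est.} \sum_{h=1}^{L} \sum_{j=1}^{N_h} y_{hj}^2 = \frac{N}{n'} \sum_{h=1}^{L} \frac{1}{g_h} \sum_{j=1}^{n''_h} y_{hj}^2 \qquad (4.2.9)$$

From (4.2.8) and (4.2.9),

$$\text{Est.}\,(N-1) S_y^2 = \frac{N}{n'} \sum_{h=1}^{L} \frac{1}{g_h} \sum_{j=1}^{n''_h} y_{hj}^2 - N\left[\bar{y}_{Rst}^2 - v(\bar{y}_{Rst})\right] \qquad (4.2.10)$$

It can easily be seen that

$$\text{Est.} \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{W_h S_{rh}^2}{n'} = \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{w_h s_{rh}^2}{n'} \qquad (4.2.11)$$

From (4.2.10) and (4.2.11), we write

$$v(\bar{y}_{Rst}) = \frac{N - n'}{n'(N-1)} \left[\frac{1}{n'} \sum_{h=1}^{L} \frac{1}{g_h} \sum_{j=1}^{n''_h} y_{hj}^2 - \bar{y}_{Rst}^2 + v(\bar{y}_{Rst})\right] + \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{w_h s_{rh}^2}{n'} \qquad (4.2.12)$$

Solving for $v(\bar{y}_{Rst})$ gives (4.2.7). Hence the result.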

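4.2.1 Optimum allocation

Consider the cost function

$$C = c' n' + \sum_{h=1}^{L} c''_h n''_h \qquad (4.2.1.1)$$

where $c'$ = cost per unit in the first phase sample and $c''_h$ = cost per unit in the second phase sample. Since $n''_h$ is a random variable, the expected cost is

$$E(C) = C^* = c' n' + n' \sum_{h=1}^{L} c''_h g_h W_h \qquad (4.2.1.2)$$

because $E(n''_h) = g_h E(n'_h) = n' W_h g_h$. The product

$$C^* \left[V(\bar{y}_{Rst}) + \frac{S_y^2}{N}\right] = \left(c' + \sum_{h=1}^{L} c''_h g_h W_h\right) \left[S_y^2 + \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) W_h S_{rh}^2\right]$$

is minimised if and only if

$$\frac{c'}{S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2} = \frac{c''_h \, g_h^2 \, W_h}{W_h S_{rh}^2} \qquad (4.2.1.3)$$

This gives the optimum value of $g_h$ as

$$g_h = S_{rh} \sqrt{\frac{c'}{c''_h \left(S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2\right)}} \qquad (4.2.1.4)$$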
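The minimisation step rests on the Cauchy-Schwarz inequality. A brief sketch of the argument follows; the shorthand $A$ and $B_h$ is introduced here only for illustration and does not appear in the original, and $A > 0$ is assumed.

```latex
\begin{align*}
A &= S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2, \qquad B_h = W_h S_{rh}^2, \\
\Big(c' + \sum_{h=1}^{L} c''_h g_h W_h\Big)\Big(A + \sum_{h=1}^{L} \frac{B_h}{g_h}\Big)
  &\ge \Big(\sqrt{c' A} + \sum_{h=1}^{L} \sqrt{c''_h W_h B_h}\Big)^2
   = \Big(\sqrt{c' A} + \sum_{h=1}^{L} W_h S_{rh} \sqrt{c''_h}\Big)^2 .
\end{align*}
% Equality holds if and only if the paired terms are proportional:
%   c' / A = (c''_h g_h W_h) / (B_h / g_h) = c''_h g_h^2 / S_{rh}^2   for every h,
% which is condition (4.2.1.3) and yields g_h as in (4.2.1.4).
```

The right-hand side does not depend on $g_h$, so it is the minimum of the product; dividing it by $C^*$ and subtracting $S_y^2 / N$ gives the optimum variance.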

80 Hece, the optimum variace is 2 v( yjjopt = c* ^ (4.2.1.5) N 4.3. Regressio method of estimatio samplig. Let us defie a regressio estimator uder two phase stratified radom L h=1 (4.3.1) where /3h is the kow populatio regressio coefficiet for the /i-th stratum. Theorem 4.4 Uder two phase stratified radom samplig yreg_st is a ubiased estimator of Y. Proof L L (4.3.2) where is the sample mea of the h th stratum based o a sample of size ^.

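Theorem 4.5 If the first sample is a random sample of size $n'$ and the second sample is a random stratified sample from the first, with fixed $g_h$ ($0 < g_h < 1$), then

$$V(\bar{y}_{Reg\text{-}st}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{W_h S_{yh}^2 \left(1 - \rho_h^2\right)}{n'} \qquad (4.3.3)$$

where $S_{yh}^2$ is the population variance of $y$ for the $h$-th stratum and $\rho_h$ is the population correlation coefficient between $x$ and $y$ for the $h$-th stratum.

Proof:

$$V(\bar{y}_{Reg\text{-}st}) = E_1 V_2(\bar{y}_{Reg\text{-}st}) + V_1 E_2(\bar{y}_{Reg\text{-}st}) \qquad (4.3.4)$$

Now

$$V_2(\bar{y}_{Reg\text{-}st}) = \sum_{h=1}^{L} w_h^2 \, V_2\!\left(\bar{y}''_h - \beta_h \bar{x}''_h\right) = \sum_{h=1}^{L} w_h^2 \left(\frac{1}{n''_h} - \frac{1}{n'_h}\right) \left(s'^2_{yh} + \beta_h^2 s'^2_{xh} - 2 \beta_h \rho'_h s'_{yh} s'_{xh}\right)$$

where the primed quantities are based on the $n'_h$ units in the initial sample of the $h$-th stratum, so that

$$E_1 V_2(\bar{y}_{Reg\text{-}st}) = \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{W_h S_{yh}^2 \left(1 - \rho_h^2\right)}{n'} \qquad (4.3.5)$$

and

$$V_1 E_2(\bar{y}_{Reg\text{-}st}) = V_1 \left[\sum_{h=1}^{L} w_h \left\{\bar{y}'_h + \beta_h \left(\bar{x}'_h - \bar{x}'_h\right)\right\}\right] = V_1 \left(\sum_{h=1}^{L} w_h \bar{y}'_h\right) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 \qquad (4.3.6)$$

That completes the proof.

Theorem 4.6 An estimator of $V(\bar{y}_{Reg\text{-}st})$ is

$$v(\bar{y}_{Reg\text{-}st}) = \frac{n'(N-1)}{N(n'-1)} \left[\sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{w_h s_{yh}^2 \left(1 - \hat{\rho}_h^2\right)}{n'} + \frac{N - n'}{n'(N-1)} \left(\frac{1}{n'} \sum_{h=1}^{L} \frac{1}{g_h} \sum_{j=1}^{n''_h} y_{hj}^2 - \bar{y}_{Reg\text{-}st}^2\right)\right] \qquad (4.3.7)$$

where $\hat{\rho}_h$ is the estimated value of $\rho_h$ based on the $n''_h$ second-phase units.

Proof: Writing

$$(N-1) S_y^2 = \sum_{h=1}^{L} \sum_{j=1}^{N_h} y_{hj}^2 - N \bar{Y}^2$$

we can see that it has an unbiased estimate

$$\frac{N}{n'} \sum_{h=1}^{L} \frac{1}{g_h} \sum_{j=1}^{n''_h} y_{hj}^2 - N \left[\bar{y}_{Reg\text{-}st}^2 - v(\bar{y}_{Reg\text{-}st})\right] \qquad (4.3.8)$$

Also

$$\text{Est.} \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{W_h S_{yh}^2 \left(1 - \rho_h^2\right)}{n'} = \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) \frac{w_h s_{yh}^2 \left(1 - \hat{\rho}_h^2\right)}{n'} \qquad (4.3.9)$$

The result (4.3.7) follows from (4.3.8) and (4.3.9).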

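4.3.1 Optimum allocation

Considering the cost function in (4.2.1.1), the optimum value of the variance is obtained by minimising

$$C^* \left[V(\bar{y}_{Reg\text{-}st}) + \frac{S_y^2}{N}\right] = \left(c' + \sum_{h=1}^{L} c''_h g_h W_h\right) \left[S_y^2 + \sum_{h=1}^{L} \left(\frac{1}{g_h} - 1\right) W_h S_{yh}^2 \left(1 - \rho_h^2\right)\right]$$

with respect to $g_h$, and the minimum exists if and only if

$$\frac{c'}{S_y^2 - \sum_{h=1}^{L} W_h S_{yh}^2 \left(1 - \rho_h^2\right)} = \frac{c''_h \, g_h^2 \, W_h}{W_h S_{yh}^2 \left(1 - \rho_h^2\right)}$$

Hence the optimum value of $g_h$ is

$$g_h = S_{yh} \sqrt{\frac{\left(1 - \rho_h^2\right) c'}{c''_h \left(S_y^2 - \sum_{h=1}^{L} W_h S_{yh}^2 \left(1 - \rho_h^2\right)\right)}} \qquad (4.3.1.1)$$

and the optimum value of $n'$ can then be obtained from the expected cost. Substituting the optimum values of $n'$ and $g_h$, the optimum value of the variance is obtained as

$$V(\bar{y}_{Reg\text{-}st})_{opt} = \frac{1}{C^*} \left[\sqrt{c' \left(S_y^2 - \sum_{h=1}^{L} W_h S_{yh}^2 \left(1 - \rho_h^2\right)\right)} + \sum_{h=1}^{L} W_h S_{yh} \sqrt{\left(1 - \rho_h^2\right) c''_h}\right]^2 - \frac{S_y^2}{N} \qquad (4.3.1.2)$$

4.3.2 Numerical illustration

Consider data collected in a complete enumeration of 256 commercial peach orchards in North Carolina in June 1946 (Finkner, 1950). Here the area is divided geographically into three strata. The number of peach trees in an orchard is denoted by $x_{hi}$ and the estimated production in bushels of peaches by $y_{hi}$.

| Stratum | $W_h$ | $S_{yh}^2$ | $S_{xh}^2$ | $S_{yxh}$ | $\bar{X}_h$ | $\bar{Y}_h$ | $R_h$ | $S_{rh}^2$ | $\rho_h$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.184 | 8699 | 5186 | 6462 | 53.80 | 69.48 | 1.29133 | 658 | 0.962 |
| 2 | 0.461 | 4614 | 2367 | 3100 | 31.07 | 43.64 | 1.40475 | 573 | 0.938 |
| 3 | 0.355 | 7311 | 4877 | 4817 | 56.97 | 66.39 | 1.16547 | 2706 | 0.807 |

Let the expected cost of the experiment be $C^* = 50$ and the cost for each unit of the sample at the second phase be $c''_h = 0.5$. Hence for SRS, a sample of size $n = 100$ is permissible. Now,

$$S_y^2 \approx \sum_{h=1}^{3} W_h S_{yh}^2 + \sum_{h=1}^{3} W_h \left(\bar{Y}_h - \bar{Y}\right)^2 = 6465.0378$$

Hence,

$$V(\bar{y}_{ran}) = \left(\frac{1}{n} - \frac{1}{N}\right) S_y^2 = 39.3963, \quad \text{since } N = 256.$$

From (4.2.1.5) we have

$$V(\bar{y}_{Rst})_{opt} = \frac{\left\{71.5485 \sqrt{c'} + 24.1985\right\}^2}{50} - 25.2541$$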

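We find $V(\bar{y}_{Rst})_{opt} < V(\bar{y}_{ran})$ if $c' < 0.2083$. Hence, further taking the cost for each unit of the sample at the initial stage as $c' = 0.15$, we find

$$V(\bar{y}_{Rst})_{opt} = 28.6370$$

Also, from (4.3.1.2) we have

$$V(\bar{y}_{Reg\text{-}st})_{opt} = \frac{1}{C^*} \left[\sqrt{c' \left(S_y^2 - \sum_{h=1}^{3} W_h S_{yh}^2 \left(1 - \rho_h^2\right)\right)} + \sum_{h=1}^{3} W_h S_{yh} \sqrt{\left(1 - \rho_h^2\right) c''_h}\right]^2 - \frac{S_y^2}{N} = \frac{\left\{72.0068 \sqrt{c'} + 33.4661 \sqrt{c''}\right\}^2}{50} - 25.2541 = 27.8985$$

The relative precision of the various methods can be summarised as follows:

Table 4-1

| | Sampling method | Method of estimation | Relative precision (%) |
|---|---|---|---|
| 1. | Simple random | Mean per unit | 100.00 |
| 2. | Stratified random | Two phase | 100.96 |
| 3. | Stratified random | Two phase ratio | 377.62 |
| 4. | Stratified random | Two phase regression | 390.92 |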
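The three variances quoted above can be recomputed from the tabulated stratum values. The sketch below is illustrative rather than the author's computation: the variable and function names are hypothetical, the optimum-variance expressions are taken in the form of (4.2.1.5) and (4.3.1.2) with a common second-phase cost $c'' = 0.5$, and small differences in the last decimal place are to be expected because the table entries are rounded.

```python
from math import sqrt

# Values taken from the numerical illustration (peach orchard data).
N = 256
C_star = 50.0            # expected total cost C*
c_second = 0.5           # second-phase cost per unit, common to all strata
S_y2 = 6465.0378         # population variance of y
W = [0.184, 0.461, 0.355]
S_yh2 = [8699, 4614, 7311]
S_rh2 = [658, 573, 2706]
rho = [0.962, 0.938, 0.807]

def optimum_variance(c_first, resid_var):
    """Optimum variance for per-stratum residual variances `resid_var`:
    S_rh^2 for the ratio estimator, S_yh^2 (1 - rho_h^2) for regression."""
    a = sqrt(S_y2 - sum(w * v for w, v in zip(W, resid_var)))
    b = sum(w * sqrt(v) for w, v in zip(W, resid_var)) * sqrt(c_second)
    return (a * sqrt(c_first) + b) ** 2 / C_star - S_y2 / N

resid_reg = [s * (1 - r * r) for s, r in zip(S_yh2, rho)]

n_srs = C_star / c_second                      # 100 units affordable under SRS
v_ran = (1 / n_srs - 1 / N) * S_y2             # mean per unit, SRS
v_ratio = optimum_variance(0.15, S_rh2)        # two-phase ratio
v_reg = optimum_variance(0.15, resid_reg)      # two-phase regression
print(round(v_ran, 4), round(v_ratio, 3), round(v_reg, 3))
```

Passing the residual variances as an argument lets one function cover both estimators, since (4.2.1.5) and (4.3.1.2) differ only in whether $S_{rh}^2$ or $S_{yh}^2(1 - \rho_h^2)$ plays that role.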

4.3.2.1 Determination of sample size

Further, from (4.2.1.4) the optimum values of the sampling fractions are $g_1 = 0.1964$, $g_2 = 0.1832$ and $g_3 = 0.3982$, and hence from the expected cost given by (4.2.1.2) we find

$$n' = 178, \quad E(n''_1) = 6, \quad E(n''_2) = 15, \quad E(n''_3) = 25$$

Also, from (4.3.1.1), the optimum values of the sampling fractions are $g_1 = 0.1937$, $g_2 = 0.1791$ and $g_3 = 0.3841$, and hence from the expected cost given by (4.2.1.2) we find

$$n' = 180, \quad E(n''_1) = 6, \quad E(n''_2) = 15, \quad E(n''_3) = 25$$

Summary and Conclusion

In this chapter an attempt has been made to construct ratio and regression estimators under two phase stratified random sampling in the presence of one auxiliary variable. The numerical illustration shows that the regression estimator under two phase stratified sampling performs better in terms of efficiency than the other competing estimators.
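The sampling fractions and sample sizes above can be reproduced with a short script. This is a sketch under stated assumptions: the names are hypothetical, $g_h$ is computed in the form of (4.2.1.4) and (4.3.1.1) with $c' = 0.15$ and a common $c'' = 0.5$, and $n'$ is obtained by solving the expected-cost equation (4.2.1.2) for $n'$.

```python
from math import sqrt

C_star, c_first, c_second = 50.0, 0.15, 0.5
S_y2 = 6465.0378
W = [0.184, 0.461, 0.355]
S_yh2 = [8699, 4614, 7311]
S_rh2 = [658, 573, 2706]
rho = [0.962, 0.938, 0.807]

def optimum_design(resid_var):
    """Return (g_h list, n', expected second-phase sizes) for the given
    per-stratum residual variances (S_rh^2 or S_yh^2 (1 - rho_h^2))."""
    denom = sqrt(c_second * (S_y2 - sum(w * v for w, v in zip(W, resid_var))))
    g = [sqrt(v * c_first) / denom for v in resid_var]
    # Expected cost: C* = n' (c' + c'' * sum_h g_h W_h), solved for n'.
    n_first = C_star / (c_first + c_second * sum(w * gh for w, gh in zip(W, g)))
    expected_second = [n_first * w * gh for w, gh in zip(W, g)]   # E(n''_h)
    return g, n_first, expected_second

g_ratio, n_ratio, m_ratio = optimum_design(S_rh2)
g_reg, n_reg, m_reg = optimum_design([s * (1 - r * r) for s, r in zip(S_yh2, rho)])

print([round(x, 4) for x in g_ratio], round(n_ratio), [round(m) for m in m_ratio])
print([round(x, 4) for x in g_reg], round(n_reg), [round(m) for m in m_reg])
```

Rounding the computed $n'$ and $E(n''_h)$ to whole units gives the sample sizes reported in the text for both estimators.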