Chapter 2, Section 2.4 - Section 2.9. J. Kim (ISU)

2.4 Regression and stratification

Design-optimal estimator under stratified random sampling:
$$\bar{y}_{reg} = \bar{y}_{st} + (\bar{x}_N - \bar{x}_{st})\,\hat{\beta}_{opt},$$
where
$$\hat{\beta}_{opt} = \left\{ \sum_{h=1}^{H} W_h^2 (1-f_h)\, n_h^{-1} \hat{S}_{xxh} \right\}^{-1} \sum_{h=1}^{H} W_h^2 (1-f_h)\, n_h^{-1} \hat{S}_{xyh},$$
$$(\hat{S}_{xxh}, \hat{S}_{xyh}) = (n_h - 1)^{-1} \sum_{j=1}^{n_h} (x_{hj} - \bar{x}_h)'\,(x_{hj} - \bar{x}_h,\; y_{hj} - \bar{y}_h),$$
$$(\bar{x}_{st}, \bar{y}_{st}) = \sum_{h=1}^{H} W_h (\bar{x}_h, \bar{y}_h).$$
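As a quick illustration of how the pieces fit together, here is a minimal numerical sketch (not from the slides; the data and stratum settings are made up) that computes $\hat{\beta}_{opt}$ and $\bar{y}_{reg}$ for a stratified simple random sample with a scalar auxiliary variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stratified SRS with H = 2 strata: W_h = N_h/N, f_h = n_h/N_h
samples = {
    1: {"W": 0.3, "f": 0.05, "x": rng.normal(2.0, 1.0, 40)},
    2: {"W": 0.7, "f": 0.02, "x": rng.normal(5.0, 2.0, 60)},
}
for s in samples.values():
    s["y"] = 1.5 * s["x"] + rng.normal(0.0, 1.0, s["x"].size)

xbar_N = 0.3 * 2.0 + 0.7 * 5.0          # assumed known population mean of x

num = den = xbar_st = ybar_st = 0.0
for s in samples.values():
    n_h = s["x"].size
    S_xxh = np.var(s["x"], ddof=1)                   # \hat S_{xxh}
    S_xyh = np.cov(s["x"], s["y"], ddof=1)[0, 1]     # \hat S_{xyh}
    c_h = s["W"] ** 2 * (1.0 - s["f"]) / n_h         # W_h^2 (1 - f_h) / n_h
    den += c_h * S_xxh
    num += c_h * S_xyh
    xbar_st += s["W"] * s["x"].mean()
    ybar_st += s["W"] * s["y"].mean()

beta_opt = num / den
ybar_reg = ybar_st + (xbar_N - xbar_st) * beta_opt
print(beta_opt, ybar_reg)
```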

Note that
$$\hat{\beta}_{opt} = \left\{ \sum_{h=1}^{H} \sum_{j=1}^{n_h} K_h (x_{hj} - \bar{x}_h)'(x_{hj} - \bar{x}_h) \right\}^{-1} \sum_{h=1}^{H} \sum_{j=1}^{n_h} K_h (x_{hj} - \bar{x}_h)'\, y_{hj},$$
where
$$K_h = W_h^2 (1-f_h)\, n_h^{-1}(n_h-1)^{-1} \doteq W_h^2 (1-f_h)\, n_h^{-2} \propto (1-f_h)\,\pi_{hi}^{-2}.$$
On the other hand,
$$\hat{\beta}_{GREG} = \left\{ \sum_{h=1}^{H} \sum_{j=1}^{n_h} \pi_{hj}^{-1} (x_{hj} - \bar{x}_h)'(x_{hj} - \bar{x}_h) \right\}^{-1} \sum_{h=1}^{H} \sum_{j=1}^{n_h} \pi_{hj}^{-1} (x_{hj} - \bar{x}_h)'\, y_{hj}.$$
Roughly speaking, $\hat{\beta}_{opt}$ is the first part of the slope for the regression of $\pi_{hi}^{-1} y_i$ on $\pi_{hi}^{-1} x_i$ and $z_i$, where $z_i$ is a vector of stratum indicator functions.

Given $\hat{\beta}$, consider a regression estimator under stratified sampling
$$\bar{y}_{st,reg} = \bar{y}_{st} + (\bar{x}_N - \bar{x}_{st})\,\hat{\beta}.$$
Write $y_{hi} = x_{hi}\beta_h + e_{hi}$ with $e_{hi} \sim (0, \sigma^2_{e,h})$. The large-sample variance of the regression estimator is
$$V(\bar{y}_{st,reg}) = \sum_{h=1}^{H} W_h^2 (1-f_h)\, n_h^{-1} \sigma^2_{a,h},$$
where $\sigma^2_{a,h} = \sigma^2_{e,h} + (\beta_h - \beta_N)'\, \Sigma_{xx,h}\, (\beta_h - \beta_N)$, $\Sigma_{xx,h} = V\{x_{hi}\}$, and $\beta_N$ is the probability limit of $\hat{\beta}$.

Example 2.4.1

Two estimators of $\beta$:
$$\hat{\beta}_{wls} = (X' D_w X)^{-1} X' D_w y, \qquad \hat{\beta}_{opt} = (X' D_w^2 X)^{-1} X' D_w^2 y,$$
where $D_w$ is a diagonal matrix with diagonal elements equal to $W_h n_h^{-1}$ for units in stratum $h$. Probability limits:
$$\beta_{ols,N} = \operatorname{plim} \hat{\beta}_{wls} = (X_N' X_N)^{-1} X_N' y_N, \qquad \beta_{opt,N} = \operatorname{plim} \hat{\beta}_{opt} = (X_N' D_{w,N} X_N)^{-1} X_N' D_{w,N} y_N.$$

Example 2.4.1 (Cont'd)

For example, assume $H = 2$ with $W_1 = 0.15$ and $W_2 = 0.85$. Stratum parameters:
$$\sigma^2_{x,h} = \begin{cases} 4.3 & \text{if } h = 1 \\ 0.6 & \text{if } h = 2, \end{cases} \qquad \beta_{1,h} = \begin{cases} 3.0 & \text{if } h = 1 \\ 1.0 & \text{if } h = 2. \end{cases}$$
Population regression coefficients (under $n_1 = n_2$):
$$\beta_{ols,N} = \frac{\sum_{h=1}^{H} W_h \sigma^2_{xh} \beta_{1h}}{\sum_{h=1}^{H} W_h \sigma^2_{xh}} = 2.1169, \qquad \beta_{opt,N} = \frac{\sum_{h=1}^{H} W_h^2 \sigma^2_{xh} \beta_{1h}}{\sum_{h=1}^{H} W_h^2 \sigma^2_{xh}} = 1.3649.$$
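The two population coefficients can be checked with a few lines of arithmetic; this is only a sketch of the computation, with the stratum inputs taken from the values above.

```python
W = [0.15, 0.85]       # stratum weights W_h
s2x = [4.3, 0.6]       # sigma^2_{x,h}
beta1 = [3.0, 1.0]     # stratum slopes beta_{1,h}

beta_ols_N = (sum(w * s * b for w, s, b in zip(W, s2x, beta1))
              / sum(w * s for w, s in zip(W, s2x)))
beta_opt_N = (sum(w**2 * s * b for w, s, b in zip(W, s2x, beta1))
              / sum(w**2 * s for w, s in zip(W, s2x)))
print(round(beta_ols_N, 4), round(beta_opt_N, 4))   # 2.1169 1.3649
```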

Example 2.4.1 (Cont'd)

To compare the variances, assume that
$$\sigma^2_{e,h} = \begin{cases} 24 & \text{if } h = 1 \\ 0.8 & \text{if } h = 2. \end{cases}$$
Stratum variances of the residuals from $\hat{\beta}_{ols}$:
$$\sigma^2_{a,h} = \begin{cases} (3 - 2.1169)^2 (4.3) + 24 = 27.3537 & \text{if } h = 1 \\ (1 - 2.1169)^2 (0.6) + 0.8 = 1.5485 & \text{if } h = 2. \end{cases}$$
Stratum variances of the residuals from $\hat{\beta}_{opt}$:
$$\sigma^2_{a,h} = \begin{cases} (3 - 1.3649)^2 (4.3) + 24 = 35.4960 & \text{if } h = 1 \\ (1 - 1.3649)^2 (0.6) + 0.8 = 0.8106 & \text{if } h = 2. \end{cases}$$
(Under $n_h$ constant,) the large-sample variances of the regression estimator satisfy
$$n_h V\{\bar{y}_{st,reg,wls}\} = (0.15)^2 (27.3537) + (0.85)^2 (1.5485) = 1.7345$$
and
$$n_h V\{\bar{y}_{st,reg,opt}\} = (0.15)^2 (35.4960) + (0.85)^2 (0.8106) = 1.3845.$$
Roughly speaking, $\beta_{ols,N}$ minimizes $\sum_h W_h \sigma^2_{a,h}$, while $\beta_{opt,N}$ minimizes $\sum_h W_h^2 n_h^{-1} \sigma^2_{a,h}$, where $\sigma^2_{ah} = E\{(y_{hi} - x_{hi}\beta)^2\}$.
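Continuing the same check, the stratum residual variances and the scaled variance of the regression estimator based on $\beta_{ols,N}$ can be reproduced as follows (a sketch; small differences from the displayed values are rounding).

```python
W = [0.15, 0.85]
s2x, s2e, beta1 = [4.3, 0.6], [24.0, 0.8], [3.0, 1.0]
beta_ols_N = 2.445 / 1.155               # = 2.1169 from the previous slide

s2a = [(b - beta_ols_N) ** 2 * sx + se for b, sx, se in zip(beta1, s2x, s2e)]
nhV = sum(w**2 * s for w, s in zip(W, s2a))
print(s2a, nhV)                          # approx. [27.35, 1.55] and 1.73
```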

2.4 Regression and stratification

If the stratum means $\bar{x}_{h,N} = N_h^{-1} \sum_{i=1}^{N_h} x_{hi}$ are available, then we can construct a separate regression estimator
$$\bar{y}_{s,reg} = \sum_{h=1}^{H} W_h \left\{ \bar{y}_h + (\bar{x}_{h,N} - \bar{x}_h)\,\hat{\beta}_h \right\},$$
where
$$\hat{\beta}_h = \left\{ \sum_{i=1}^{n_h} (x_{hi} - \bar{x}_{hN})'(x_{hi} - \bar{x}_{hN}) \right\}^{-1} \sum_{i=1}^{n_h} (x_{hi} - \bar{x}_{hN})'\, y_{hi}.$$
Because the weights are the same within each stratum, the GREG-type estimator is the same as the design-optimal estimator when separate regression estimation is used. Bias can be sizable if the $n_h$ are small in some strata.

2.6 Regression for two-stage samples: basic setup

Two-stage cluster sampling:
1. Stage one: select $n$ clusters.
2. Stage two: within the selected cluster $i$, select $m_i$ second-stage units (from the $M_i$ units).
$\pi_{(ij)}$ is the inclusion probability of element $j$ in primary sampling unit $i$, with $\pi_{(ij)} = \pi_{1i}\,\pi_{2j|i}$. The analysis unit is the element, not the cluster. Thus, we want to construct weights for the sample elements.
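A toy illustration of the element-level weights in this setup (hypothetical design: simple random sampling at both stages): the basic weight is $w_{ij} = \pi_{(ij)}^{-1} = (\pi_{1i}\pi_{2j|i})^{-1}$.

```python
N_I, n_I = 100, 10               # population and sample number of clusters
M = {1: 50, 2: 80}               # cluster sizes M_i for two sampled PSUs
m = {1: 5, 2: 8}                 # within-cluster sample sizes m_i

pi_1 = n_I / N_I                                  # pi_{1i} under SRS of clusters
w = {(i, j): 1.0 / (pi_1 * m[i] / M[i])           # w_ij = 1 / (pi_{1i} pi_{2j|i})
     for i in M for j in range(1, m[i] + 1)}
print(w[(1, 1)], w[(2, 1)])                       # 100.0 in this self-weighting toy
```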

2.6 Regression for two-stage samples: basic setup (cont'd)

Two types of auxiliary information:
- $x_{ij}$: element-level auxiliary information
- $z_i$: cluster-level auxiliary information

We want to incorporate the auxiliary information through the constraints
$$\sum_{i \in A_I} \sum_{j \in A_i} w_{ij} x_{ij} = \sum_{i \in U_I} \sum_{j=1}^{M_i} x_{ij}, \qquad \sum_{i \in A_I} \sum_{j \in A_i} w_{ij} z_i = \sum_{i \in U_I} z_i.$$

Approach 1

Construct $z_{ij}$ from $z_i$ and apply the regression weighting method using $(x_{ij}, z_{ij})$ in the sample. Use $z_{ij} = z_i m_i^{-1} \pi_{2j|i}$. Note that $\sum_{j \in A_i} \pi_{2j|i}^{-1} z_{ij} = z_i$, and so
$$E\left\{ \sum_{i \in A_I} \sum_{j \in A_i} \pi_{1i}^{-1} \pi_{2j|i}^{-1} z_{ij} \right\} = E\left\{ \sum_{i \in A_I} \pi_{1i}^{-1} z_i \right\} = \sum_{i \in U_I} z_i.$$
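A one-cluster numerical check of the claim above (hypothetical numbers): with $z_{ij} = z_i m_i^{-1}\pi_{2j|i}$, the within-cluster weighted sum returns $z_i$.

```python
import numpy as np

z_i = 7.0
pi_2 = np.array([0.10, 0.25, 0.50])    # pi_{2j|i} for the m_i sampled elements
m_i = pi_2.size

z_ij = z_i * pi_2 / m_i                # z_ij = z_i m_i^{-1} pi_{2j|i}
print(np.sum(z_ij / pi_2))             # 7.0 = z_i
```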

Approach 2: design-consistent model-based approach

Model for the two-stage sample:
$$y_{ij} = x_{ij}\beta + u_{ij}, \qquad u_{ij} = b_i + e_{ij},$$
where $b_i \sim \mathrm{iid}(0, \sigma^2_b)$, $e_{ij} \sim \mathrm{iid}(0, \sigma^2_e)$, and $e_{ij}$ is independent of $b_k$ for all $i, j, k$. Writing $u_i = (u_{i1}, \ldots, u_{i m_i})'$, we have $u_i \sim (0, \Sigma_{uu})$, where $\Sigma_{uu} = I_{m_i} \sigma^2_e + J_{m_i} J_{m_i}' \sigma^2_b$. For illustration, see Example 2.6.1.

2.7 Calibration

Minimize $\omega' V \omega$ subject to $\omega' X = \bar{x}_N$. By the Cauchy-Schwarz inequality,
$$(\omega' V \omega)(a X' V^{-1} X a') \ge (\omega' X a')^2,$$
with equality if and only if $\omega' V^{1/2} \propto a X' V^{-1/2}$, that is, $\omega' \propto a X' V^{-1}$. Writing $\omega' = k\, a X' V^{-1}$ for a constant $k$, the constraint gives
$$\omega' X = k\, a X' V^{-1} X = \bar{x}_N \;\Longrightarrow\; k a = \bar{x}_N (X' V^{-1} X)^{-1},$$
so that
$$\omega' = \bar{x}_N (X' V^{-1} X)^{-1} X' V^{-1}, \qquad \omega' V \omega \ge \bar{x}_N (X' V^{-1} X)^{-1} \bar{x}_N'.$$
Note: minimize $V_\xi(\omega' y)$ subject to $E_\xi(\omega' y) = E(\bar{y}_N)$.
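A small numerical sketch of the closed-form weights just derived (toy $X$, $V$, and control vector, all made up): $\omega' = \bar{x}_N (X'V^{-1}X)^{-1} X'V^{-1}$ satisfies the calibration constraint by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
V = np.diag(rng.uniform(0.5, 2.0, n))                    # assumed diagonal V
xbar_N = np.array([1.0, 0.3])                            # assumed control means

XtVinv = X.T @ np.linalg.inv(V)
omega = xbar_N @ np.linalg.solve(XtVinv @ X, XtVinv)     # xbar_N (X'V^-1 X)^-1 X'V^-1
print(omega @ X)                                         # reproduces xbar_N
```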

Alternative minimization lemma

Let $\alpha$ be a given $n$-dimensional vector, and let
$$\omega_a = \arg\min_{\omega}\; \omega' V \omega \;\; \text{s.t. } \omega' X = \bar{x}_N, \qquad \omega_b = \arg\min_{\omega}\; (\omega - \alpha)' V (\omega - \alpha) \;\; \text{s.t. } \omega' X = \bar{x}_N.$$
If $V\alpha \in \mathcal{C}(X)$, then $\omega_a = \omega_b$.

Proof: writing $V\alpha = X\lambda$ and using $\omega' X = \bar{x}_N$,
$$(\omega - \alpha)' V (\omega - \alpha) = \omega' V \omega - \alpha' V \omega - \omega' V \alpha + \alpha' V \alpha = \omega' V \omega - \lambda' X' \omega - \omega' X \lambda + \alpha' V \alpha = \omega' V \omega - 2\, \bar{x}_N \lambda + \alpha' V \alpha.$$
The $\omega$-dependent part is the same in both problems, so the minimizers coincide. If $\alpha = D_\pi^{-1} J_n$, then $V\alpha \in \mathcal{C}(X)$ is the condition for design consistency in Corollary 2.2.3.1.

General objective function

$$\min\; \sum_i G(\omega_i, \alpha_i) \quad \text{s.t. } \sum_i \omega_i x_i = \bar{x}_N.$$
Lagrange multiplier method:
$$g(\omega_i, \alpha_i) - x_i\lambda = 0, \quad \text{where } g(\omega_i, \alpha_i) = \frac{\partial G}{\partial \omega_i},$$
so that $\omega_i = g^{-1}(x_i\lambda, \alpha_i)$, where $\lambda$ is obtained from $\sum_i g^{-1}(x_i\lambda, \alpha_i)\, x_i = \bar{x}_N$.
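As a concrete instance of this recipe (a sketch with made-up data, stated for totals rather than means): taking $G(\omega,\alpha) = \omega\log(\omega/\alpha) - \omega + \alpha$ gives $g^{-1}(u,\alpha) = \alpha\exp(u)$, i.e. exponentially tilted weights $\omega_i = \alpha_i\exp(x_i\lambda)$, and $\lambda$ can be found by Newton's method from the calibration equation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = np.column_stack([np.ones(n), rng.normal(2.0, 1.0, n)])
alpha = rng.uniform(5.0, 15.0, n)              # initial weights alpha_i (e.g. d_i)
tx_N = np.array([1.02, 2.05]) * alpha.sum()    # assumed x control totals

lam = np.zeros(2)
for _ in range(25):                            # Newton iterations for lambda
    w = alpha * np.exp(x @ lam)                # omega_i = g^{-1}(x_i lambda, alpha_i)
    F = x.T @ w - tx_N                         # calibration residual
    J = x.T @ (w[:, None] * x)                 # Jacobian: sum_i omega_i x_i' x_i
    lam -= np.linalg.solve(J, F)

print(x.T @ (alpha * np.exp(x @ lam)) - tx_N)  # ~ 0: constraint satisfied
```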

GREG estimator

$$\min\; Q(\omega, d) = \sum_i d_i \left( \frac{\omega_i}{d_i} - 1 \right)^2 q_i \quad \text{s.t. } \sum_i \omega_i x_i = \bar{x}_N.$$
The Lagrange multiplier method gives
$$d_i^{-1}(\omega_i - d_i)\, q_i = x_i\lambda \;\Longrightarrow\; \omega_i = d_i + d_i x_i\lambda / q_i.$$
Substituting into the constraint,
$$\sum_i \omega_i x_i = \sum_i d_i x_i + \lambda' \sum_i d_i x_i' x_i / q_i = \bar{x}_N,$$
which yields
$$\lambda' = (\bar{x}_N - \bar{x}_{HT}) \left( \sum_i d_i x_i' x_i / q_i \right)^{-1}, \qquad w_i = d_i + (\bar{x}_N - \bar{x}_{HT}) \left( \sum_i d_i x_i' x_i / q_i \right)^{-1} d_i x_i' / q_i$$
(a numerical sketch is given below).

Other objective functions

Pseudo empirical likelihood:
$$Q(\omega, d) = -\sum_i d_i \log\left( \frac{\omega_i}{d_i} \right), \qquad \omega_i = \frac{d_i}{1 + x_i\lambda}.$$
Kullback-Leibler distance:
$$Q(\omega, d) = \sum_i \omega_i \log\left( \frac{\omega_i}{d_i} \right), \qquad \omega_i = d_i \exp(x_i\lambda).$$
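Here is a short sketch of the GREG weights in closed form (toy data; the slide states the constraint in terms of means, and the same algebra applies to the totals used here).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
x = np.column_stack([np.ones(n), rng.normal(2.0, 1.0, n)])
d = rng.uniform(5.0, 15.0, n)            # design weights d_i = 1/pi_i
q = np.ones(n)                           # q_i
tx_N = np.array([100.0, 205.0])          # assumed known population x-totals

tx_HT = d @ x                                     # HT estimate of the x-total
T = (d / q)[:, None] * x                          # rows d_i x_i / q_i
M = x.T @ T                                       # sum_i d_i x_i' x_i / q_i
w = d + T @ np.linalg.solve(M, tx_N - tx_HT)      # calibrated GREG weights
print(w @ x - tx_N)                               # ~ 0: calibration constraint holds
```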

Theorem 2.7.1 (Deville and Särndal, 1992)

Let $G(\omega, \alpha)$ be a continuous convex function with a first derivative that is zero at $\omega = \alpha$. Under some regularity conditions, the solution $\omega_i$ that minimizes $\sum_i G(\omega_i, \alpha_i)$ subject to $\sum_i \omega_i x_i = \bar{x}_N$ satisfies
$$\sum_i \omega_i y_i = \sum_i \alpha_i y_i + (\bar{x}_N - \bar{x}_\alpha)\,\hat{\beta} + O_p(n^{-1}),$$
where $\hat{\beta} = \left( \sum_i x_i' x_i / \phi_{ii} \right)^{-1} \sum_i x_i' y_i / \phi_{ii}$ and $\phi_{ii} = \partial^2 G(\alpha_i, \alpha_i) / \partial \omega_i^2$.

Proof of Theorem 2.7.1

Using the Lagrange multiplier method and Taylor linearization, $\omega_i = \omega_i(\lambda) = g^{-1}(x_i\lambda, \alpha_i)$, where $g(\omega_i, \alpha_i) = \partial G / \partial \omega_i$. By assumption, $g^{-1}(0, \alpha_i) = \alpha_i$. Define $\hat{U}(\lambda) = \sum_i \omega_i(\lambda)\, x_i - \bar{x}_N$ and let $\hat{\lambda}$ satisfy $\hat{U}(\hat{\lambda}) = 0$. By a Taylor expansion,
$$0 = \hat{U}(\hat{\lambda}) = \hat{U}(0) + \frac{\partial \hat{U}(0)}{\partial \lambda} (\hat{\lambda} - 0) + O_p(n^{-1}),$$
where $\hat{U}(0) = \sum_i \alpha_i x_i - \bar{x}_N = \bar{x}_\alpha - \bar{x}_N$ and
$$\frac{\partial \hat{U}(0)}{\partial \lambda} = \sum_i \frac{1}{g'(\alpha_i, \alpha_i)}\, x_i' x_i = \sum_i x_i' x_i / \phi_{ii}, \qquad g'(\alpha_i, \alpha_i) = \left. \frac{\partial^2 G(\omega_i, \alpha_i)}{\partial \omega_i^2} \right|_{\omega_i = \alpha_i} = \phi_{ii}.$$

Proof of Theorem 2.7.1 (continued)

$$\bar{y}_{cal}(\hat{\lambda}) = \sum_i \omega_i(\hat{\lambda})\, y_i = \bar{y}_{cal}(0) + \frac{\partial \bar{y}_{cal}(0)}{\partial \lambda}(\hat{\lambda} - 0) + O_p(n^{-1}) = \sum_i \alpha_i y_i + (\bar{x}_N - \bar{x}_\alpha) \left( \sum_i x_i' x_i / \phi_{ii} \right)^{-1} \sum_i x_i' y_i / \phi_{ii} + O_p(n^{-1}).$$

2.8 Weight bounds

The calibrated weights $\omega_i = d_i + d_i x_i\lambda / c_i$ can take negative values (or very large values). Add $L_1 \le \omega_i \le L_2$ to the constraint $\sum_i \omega_i x_i = \bar{x}_N$. Approaches:
1. Huang and Fuller: $Q(w_i, d_i) = \sum_i d_i \Psi(w_i / d_i)$, $\Psi$: Huber function.
2. Husain (1969): $\min\; \omega'\omega + \gamma\, (\omega' X - \bar{x}_N)\, \Sigma_{xx}^{-1} (\omega' X - \bar{x}_N)'$ for some $\gamma$.
3. Other methods, e.g. quadratic programming.
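One practical route to the bounded-weights problem (not the Huang-Fuller or Husain method; just a sketch using an off-the-shelf solver with made-up data) is to minimize the chi-square distance subject to the calibration equation and $L_1 \le \omega_i \le L_2$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 30
x = np.column_stack([np.ones(n), rng.normal(2.0, 1.0, n)])
d = rng.uniform(5.0, 15.0, n)
tx_N = np.array([1.05, 2.10]) * d.sum()      # assumed control totals
L1, L2 = 0.2 * d, 3.0 * d                    # lower/upper weight bounds

res = minimize(
    lambda w: np.sum((w - d) ** 2 / d),      # chi-square distance to d
    x0=d,
    method="SLSQP",
    bounds=list(zip(L1, L2)),
    constraints=[{"type": "eq", "fun": lambda w: x.T @ w - tx_N}],
)
print(res.success, x.T @ res.x - tx_N)       # feasible here; may fail if bounds are too tight
```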

2.9 Maximum likelihood and raking ratio

Basic setup: two-way ($r \times c$) categorical data with
$$a_{km} = \frac{n_{km}}{n}, \qquad p_{km} = E\left[ \frac{n_{km}}{n} \right], \qquad k = 1, \ldots, r, \; m = 1, \ldots, c,$$
where the margins $p_{k\cdot}$ and $p_{\cdot m}$ are known. We are interested in estimating $p_{km}$ subject to the constraints
$$\sum_{m} \hat{p}_{km} = p_{k\cdot}, \qquad \sum_{k} \hat{p}_{km} = p_{\cdot m}.$$

Maximum likelihood approach

Multinomial log-likelihood:
$$\sum_{k=1}^{r} \sum_{m=1}^{c} a_{km} \log(p_{km}).$$
Lagrange multiplier method:
$$\sum_{k=1}^{r} \sum_{m=1}^{c} a_{km} \log(p_{km}) + \sum_{k=1}^{r} \lambda_k \left( \sum_{m=1}^{c} p_{km} - p_{k\cdot} \right) + \sum_{m=1}^{c} \lambda_{r+m} \left( \sum_{k=1}^{r} p_{km} - p_{\cdot m} \right),$$
which gives $p_{km} = a_{km} / (\lambda_k + \lambda_{r+m})$ with the multipliers determined by the constraints.

Raking ratio method

Deming and Stephan (1940) idea: approximate
$$\sum_{k=1}^{r} \sum_{m=1}^{c} a_{km} \log(p_{km}) \doteq \sum_{k=1}^{r} \sum_{m=1}^{c} \left\{ a_{km} \log(a_{km}) + (p_{km} - a_{km}) - \frac{1}{2} a_{km}^{-1} (p_{km} - a_{km})^2 \right\}.$$
Thus, maximizing $\sum_{k} \sum_{m} a_{km} \log(p_{km})$ is asymptotically equivalent to minimizing $\sum_{k} \sum_{m} a_{km}^{-1} (p_{km} - a_{km})^2$. If there is only one set of constraints,
$$\sum_{m=1}^{c} p_{km} = p_{k\cdot}, \qquad k = 1, \ldots, r,$$
then the solution to minimizing $\sum_{m=1}^{c} a_{km}^{-1} (p_{km} - a_{km})^2$ subject to the constraint is
$$p_{km} = a_{km}\, \frac{p_{k\cdot}}{\sum_{m=1}^{c} a_{km}}.$$
For the two sets of constraints,
$$\sum_{m=1}^{c} p_{km} = p_{k\cdot}, \; k = 1, \ldots, r, \qquad \sum_{k=1}^{r} p_{km} = p_{\cdot m}, \; m = 1, \ldots, c,$$
iterate between the two ratio adjustments:
$$p_{km}^{(t+1)} = p_{km}^{(t)}\, \frac{p_{k\cdot}}{\sum_{m=1}^{c} p_{km}^{(t)}}, \qquad p_{km}^{(t+2)} = p_{km}^{(t+1)}\, \frac{p_{\cdot m}}{\sum_{k=1}^{r} p_{km}^{(t+1)}}.$$
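The two-step updates above are the classical iterative proportional fitting algorithm; here is a minimal sketch on a hypothetical $2 \times 3$ table.

```python
import numpy as np

a = np.array([[0.10, 0.15, 0.05],
              [0.20, 0.30, 0.20]])        # observed cell proportions a_km
p_row = np.array([0.25, 0.75])            # known margins p_{k.}
p_col = np.array([0.35, 0.40, 0.25])      # known margins p_{.m}

p = a.copy()
for _ in range(100):
    p *= (p_row / p.sum(axis=1))[:, None]   # p^(t+1)_km = p^(t)_km p_k. / sum_m p^(t)_km
    p *= p_col / p.sum(axis=0)              # p^(t+2)_km = p^(t+1)_km p_.m / sum_k p^(t+1)_km
print(p.sum(axis=1), p.sum(axis=0))         # margins now match p_row and p_col
```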