Bias-correction under a semi-parametric model for small area estimation


Bias-correction under a semi-parametric model for small area estimation
Laura Dumitrescu, Victoria University of Wellington
Joint work with J. N. K. Rao, Carleton University
ICORS 2017 Workshop on Robust Inference for Sample Surveys, Wollongong, July 7th, 2017

Table of contents
1 Small area estimation
2 Robust methods
3 Semi-parametric mixture model
4 Simulation study

Small area estimation
An area or domain is a geographical area (county, province, administrative area) or a socio-demographic area.
The design-based approach uses domain-specific direct estimators for domains with large enough sample sizes:
$$\hat{Y}_i = \sum_{k \in s} w_k (a_{ik} y_k)$$
where $a_{ik}$ is the domain indicator variable and $w_k$ are the design weights.
Direct domain estimators are not reliable for small domains.
Indirect estimators are employed, typically based on linear models.
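The direct estimator above can be sketched in a few lines. This is only an illustration: the function name `direct_domain_total` and the toy numbers are assumptions, not part of the talk.

```python
import numpy as np

def direct_domain_total(y, w, domain, i):
    """Direct estimator for domain i: sum over the sample of w_k * a_ik * y_k."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    a = (np.asarray(domain) == i).astype(float)  # domain indicator a_ik
    return float(np.sum(w * a * y))

# Toy sample (made-up values): five units spread over two domains
y = [10.0, 12.0, 9.0, 20.0, 22.0]
w = [5.0, 5.0, 5.0, 10.0, 10.0]
domain = [1, 1, 1, 2, 2]
print(direct_domain_total(y, w, domain, 1))  # 155.0
```

With only three sampled units in domain 1, the estimate rests on very little data, which is the motivation for the indirect estimators that follow.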

Framework
The population of interest is divided into $m$ areas, $U_i$, $i = 1, \ldots, m$.
Let $s_i = s \cap U_i$, $i = 1, \ldots, m$.
The variable $y$ is observed in area $i$.
The auxiliary vector $x$ in area $i$ is known at population level.
Scope: predict $\bar{Y}_i$.
Models are employed at area level or at unit level.
Focus on mixed models, which include the area random effect.
All non-sampled values follow exactly the assumed working model.

Mixed models
Area level (Fay-Herriot, 1979): relates small area direct estimators to area-specific covariates
$$\bar{y}_i = x_i^t \beta + v_i + e_i$$
Unit level (Battese, Harter and Fuller, 1988): a nested error regression model
$$y_{ij} = x_{ij}^t \beta + v_i + e_{ij}$$
P-spline model (Opsomer, Claeskens, Ranalli, Kauermann and Breidt, 2008): avoids a parametric specification of the mean function
$$y_{ij} = m_K(x_{ij}) + v_i + e_{ij}$$

Semiparametric model
The penalized spline approach allows the use of mixed model theory:
$$m_K(x_{ij}; \beta, u) = \beta_0 + \beta_1 x_{ij} + \ldots + \beta_p x_{ij}^p + \sum_{k=1}^{K} u_k (x_{ij} - q_k)_+^p := x_{ij}^t \beta + w_{ij}^t u$$
Minimize the penalized sum of squares
$$\min_{\beta, u} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left(y_{ij} - m_K(x_{ij}; \beta, u)\right)^2 + \lambda \sum_{k=1}^{K} u_k^2$$
Choice of $\lambda$:
Cross-validation
Mixed model approach
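A minimal sketch of the truncated power basis and the penalized least squares fit, for a fixed $\lambda$ and with the area random effect $v_i$ left out for brevity. The function names and the ridge-type closed-form solution are illustrative assumptions; the talk itself treats $\lambda$ through cross-validation or the mixed model representation.

```python
import numpy as np

def truncated_power_basis(x, knots, p=1):
    """Return X (columns 1, x, ..., x^p) and W (columns (x - q_k)_+^p), for p >= 1."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.vander(x, p + 1, increasing=True)
    W = np.maximum(x[:, None] - knots[None, :], 0.0) ** p
    return X, W

def fit_penalized_spline(x, y, knots, lam, p=1):
    """Minimize ||y - X beta - W u||^2 + lam * ||u||^2 for a fixed lam > 0."""
    X, W = truncated_power_basis(x, knots, p)
    C = np.hstack([X, W])
    # Penalize only the spline coefficients u, not the polynomial part beta
    D = np.diag([0.0] * X.shape[1] + [1.0] * W.shape[1])
    theta = np.linalg.solve(C.T @ C + lam * D, C.T @ np.asarray(y, dtype=float))
    return theta[: X.shape[1]], theta[X.shape[1]:]

# Exactly linear data: the penalty should shrink all spline coefficients to zero
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x
beta, u = fit_penalized_spline(x, y, knots=np.linspace(0.1, 0.9, 5), lam=1.0)
```

Larger values of `lam` pull the fit toward the degree-$p$ polynomial, while `lam` close to zero lets the fit bend at every knot.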

Outliers
Definition: outliers are values which deviate from the pattern set by the majority of the data.
Outlying observations can occur due to measurement errors or can be generated by heavy-tailed distributions.
Part of the data may not fit the same model: mixture models.
Main focus is on distributional robustness.
General form of a mixture distribution:
$$F = (1 - \varepsilon) G + \varepsilon H$$

Estimation
Goals
1 Optimal or nearly optimal efficiency when the model is correct
2 Small deviations from the model assumptions should only slightly affect performance
Robust estimators
M-estimator: generalizes the maximum likelihood estimator; solved by numerical methods
$$\sum_{i=1}^{n} \psi(x_i - \theta) = 0$$
L-estimator: linear combination of a function of order statistics
$$\hat{\theta} = \sum_{i=1}^{n} \alpha_{ni} f(X_{(i)})$$
R-estimator: obtained by inverting a rank test
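As an illustration of solving an M-estimating equation numerically, here is a sketch of a Huber location estimate. Two choices are assumptions made for this example and are not stated on the slide: residuals are standardized by a fixed MAD scale, and the equation is solved by a simple fixed-point iteration.

```python
import numpy as np

def huber_psi(s, b=1.345):
    """Huber function: s * min(1, b/|s|), i.e. s clipped to [-b, b]."""
    return np.clip(np.asarray(s, dtype=float), -b, b)

def huber_location(x, b=1.345, tol=1e-10, max_iter=200):
    """Solve sum_i psi((x_i - theta) / scale) = 0 with the scale held fixed."""
    x = np.asarray(x, dtype=float)
    theta = float(np.median(x))
    scale = np.median(np.abs(x - theta)) / 0.6745  # MAD, consistent at the normal
    if scale == 0.0:
        scale = 1.0
    for _ in range(max_iter):
        step = scale * np.mean(huber_psi((x - theta) / scale, b))
        theta += step
        if abs(step) < tol:
            break
    return theta

sample = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
estimate = huber_location(sample)
```

The sample mean of this toy data is 22, while the Huber estimate stays near the bulk of the observations.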

Robust methods in small area estimation
Chambers, Chandra, Salvati and Tzavidis (2014): robust projective and robust predictive approaches.

Robust plug-in methods are robust projective
Assumption: no non-sampled value is an outlier.
The approach is projective because the working model is projected onto the whole non-sampled part of the population.
Examples:
M-quantile methods (Chambers and Tzavidis, 2006)
Robust EBLUP (Sinha and Rao, 2009)
Semiparametric robust EBLUP (Rao, Sinha and Dumitrescu, 2014)

Bias-corrected robust methods are robust predictive
Assumption: some non-sampled units are outliers.
The approach is predictive because the sample outlier information is used to predict contamination in the variable of interest.
Examples:
Local bias-correction (Chambers, Chandra, Salvati and Tzavidis, 2014)
Full bias-correction (Dongmo-Jiongo, Haziza and Duchesne, 2013)

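Robust mixed model estimation
Linear mixed model
$$y = X\beta + Wu + e$$
If the variance components are known:
$$\hat{\beta} = (X^t V^{-1} X)^{-1} X^t V^{-1} y, \qquad \hat{u} = \Sigma_u W^t V^{-1} (y - X\hat{\beta})$$
Robust estimators and predictors solve
$$X^t \Sigma_\varepsilon^{-1/2} \Psi[\Sigma_\varepsilon^{-1/2} (y - X\hat{\beta} - W\hat{u})] = 0$$
$$W^t \Sigma_\varepsilon^{-1/2} \Psi[\Sigma_\varepsilon^{-1/2} (y - X\hat{\beta} - W\hat{u})] - \Sigma_u^{-1/2} \Psi(\Sigma_u^{-1/2} \hat{u}) = 0$$
$$\Psi[\Sigma_\varepsilon^{-1/2} (y - X\hat{\beta} - W\hat{u})]^t \, \Sigma_\varepsilon^{-1/2} \frac{\partial V}{\partial \theta_k} \Sigma_\varepsilon^{-1/2} \, \Psi[\Sigma_\varepsilon^{-1/2} (y - X\hat{\beta} - W\hat{u})] - \mathrm{tr}\left(V^{-1} \frac{\partial V}{\partial \theta_k}\right) = 0$$
where
$$\Psi(s) = (\psi_b(s_1), \psi_b(s_2), \ldots)^t, \qquad \psi_b(s) = s \min(1, b/|s|), \quad b > 0$$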
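For the known-variance case, the closed forms for $\hat{\beta}$ and $\hat{u}$ can be sketched as follows. The simplifications $\Sigma_u = \sigma_u^2 I$ and $\Sigma_\varepsilon = \sigma_e^2 I$, the function name and the random test data are assumptions made for this illustration.

```python
import numpy as np

def blup_known_variances(y, X, W, sigma_u2, sigma_e2):
    """GLS estimate of beta and BLUP of u when V = sigma_u2 * W W^t + sigma_e2 * I."""
    n = len(y)
    Sigma_u = sigma_u2 * np.eye(W.shape[1])
    V = W @ Sigma_u @ W.T + sigma_e2 * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    u = Sigma_u @ W.T @ np.linalg.solve(V, y - X @ beta)
    return beta, u

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
W = rng.normal(size=(n, 4))
y = rng.normal(size=n)
beta_hat, u_hat = blup_known_variances(y, X, W, sigma_u2=2.0, sigma_e2=0.5)
resid = y - X @ beta_hat - W @ u_hat
```

With these diagonal covariances and $\Psi$ taken as the identity, the first two robust estimating equations reduce to the usual mixed model equations, which `beta_hat` and `u_hat` satisfy: `X.T @ resid` is zero and `W.T @ resid / 0.5` equals `u_hat / 2.0` up to rounding.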

Robust predictor
Approximating model
$$y = X\beta + Wu + Zv + e$$
The (R)EBLUP of $\bar{Y}_i$ is taken as
$$\bar{X}_i^t \hat{\beta} + \bar{W}_i^t \hat{u} + \hat{v}_i$$
If the sampling fraction $n_i/N_i$ is not negligible, the best linear unbiased predictor of $\bar{Y}_i$ is
$$\hat{\mu}_i = \frac{1}{N_i} \left( \sum_{j \in s_i} y_{ij} + \sum_{j \notin s_i} \hat{y}_{ij} \right), \qquad \hat{y}_{ij} = x_{ij}^t \hat{\beta} + w_{ij}^t \hat{u} + \hat{v}_i$$
Robust methods are known to perform well if the distribution is symmetric, but may involve a large bias otherwise.
The case of a mixture between two semiparametric models with different means leads to a larger bias when fixed parameters and random effects are estimated or predicted using robust ML or robust MME methods.
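The predictor for a non-negligible sampling fraction simply combines observed and predicted values within the area. A small sketch, with a hypothetical function name and made-up numbers:

```python
import numpy as np

def area_mean_predictor(y_sampled, yhat_nonsampled):
    """(sum of observed y over s_i + sum of predicted y over the rest of U_i) / N_i."""
    y_s = np.asarray(y_sampled, dtype=float)
    y_r = np.asarray(yhat_nonsampled, dtype=float)
    N_i = y_s.size + y_r.size
    return float((y_s.sum() + y_r.sum()) / N_i)

# Area with N_i = 7 units, of which n_i = 3 are sampled
print(area_mean_predictor([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]))  # 4.0
```

Here `yhat_nonsampled` would hold the model predictions for the non-sampled units, so any bias in the fitted model enters only through that second sum.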

Mixture model
Nonparametric model
$$y_{ij} = m(x_{ij}) + v_i + \varepsilon_{ij}$$
with possible outliers in $\varepsilon$ and $v$.
Mixture model
$$\zeta_m: \quad y_{ij} = (1 - A_{ij}) y_{0ij} + A_{ij} y_{1ij}, \qquad A_{ij} \sim \mathrm{Bernoulli}(p)$$
$$\zeta_0: \quad y_{0ij} = m_0(x_{ij}) + v_{0i} + \varepsilon_{0ij}$$
$$\zeta_1: \quad y_{1ij} = m_1(x_{ij}) + v_{1i} + \varepsilon_{1ij}$$
Bias correction methods
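To make the mixture mechanism concrete, the sketch below draws data from $\zeta_m$. Normal random effects and errors, the function name and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def simulate_mixture(x, area, m0, m1, p, rng, sigma_v=1.0, sigma_e=1.0):
    """Draw y_ij = (1 - A_ij) * y0_ij + A_ij * y1_ij with A_ij ~ Bernoulli(p)."""
    x = np.asarray(x, dtype=float)
    area = np.asarray(area)
    n_areas = int(area.max()) + 1
    # Separate area effects for the two component models (assumed normal here)
    v0 = rng.normal(0.0, sigma_v, n_areas)
    v1 = rng.normal(0.0, sigma_v, n_areas)
    y0 = m0(x) + v0[area] + rng.normal(0.0, sigma_e, x.size)
    y1 = m1(x) + v1[area] + rng.normal(0.0, sigma_e, x.size)
    A = rng.binomial(1, p, x.size)
    return (1 - A) * y0 + A * y1, A

rng = np.random.default_rng(1)
area = np.repeat(np.arange(3), 4)            # 3 areas, 4 units each
x = rng.uniform(0.0, 1.0, area.size)
y, A = simulate_mixture(x, area, lambda t: 1 + t, lambda t: 1 + t + 4 * t**2, 0.1, rng)
```

Units with `A == 1` follow the second mean function, so they look like outliers relative to a model fitted to the majority.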

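Bias correction I
Due to Chambers (1986).
$$\hat{\mu}_i^{EBLUP} = (\omega_s^{(i)})^t y_s, \qquad (\omega_s^{(i)})^t x_s = \sum_{j \in U_i} x_{ij}^t$$
$$\hat{\mu}_i^{EBLUP} = \hat{\mu}_i^{REBLUP} + \text{correction terms}$$
The weights are
$$\omega_{hj}^{(i)} = \begin{cases} 1 + 1_{N_i - n_i}^t M_{ij}^{(i)}, & h = i, \; j \in s_i \\ 1_{N_i - n_i}^t M_{hj}^{(i)}, & h \neq i, \; j \in s_h \end{cases}$$
$$M^{(i)} = N^{(i)} [I - X (X^t V^{-1} X)^{-1} X^t V^{-1}] + \dot{X}_i (X^t V^{-1} X)^{-1} X^t V^{-1}$$
$$N^{(i)} = (\sigma_u^2 \dot{W}_i W^t + \sigma_v^2 \dot{Z}_i Z^t) V^{-1}$$
where
$$W_h^{(i)} = \begin{cases} \sum_{j \in s_i} \omega_{ij}^{(i)} - N_i, & h = i \\ \sum_{j \in s_h} \omega_{hj}^{(i)}, & h \neq i \end{cases}$$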

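Then, under the P-spline model
$$\hat{\mu}_i^{robust} = \hat{\mu}_i^{REBLUP} + N_i^{-1} \sum_{h=1}^{m} \sum_{j \in s_h} \Psi_{c_1}[\omega_{hj}^{(i)} (y_{hj} - x_{hj}^t \hat{\beta}_R - w_{hj}^t \hat{u}_R - \hat{v}_{h,R})] + N_i^{-1} \sum_{h=1}^{m} \Psi_{c_2}(W_h^{(i)} \hat{v}_{h,R}) + N_i^{-1} \left( \sum_{h=1}^{m} \sum_{j \in s_h} \omega_{hj}^{(i)} w_{hj}^t - \sum_{j \in U_i} w_{ij}^t \right) \hat{u}_R$$
Calibration does not hold for $w$.
Choice of tuning constants:
$$c_1 = k \, \mathrm{median}_{hj}(|\omega_{hj}^{(i)}|) \, \hat{\sigma}_{eR}, \qquad c_2 = k \, \mathrm{median}_{h}(|W_h^{(i)}|) \, \hat{\sigma}_{vR}$$
The predictor alternates between the EBLUP and the robust predictor.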
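The tuning constants and the clipping functions $\Psi_{c_1}$, $\Psi_{c_2}$ can be sketched as below, assuming that $\Psi_c$ is the Huber function with constant $c$ and that the medians are taken over absolute values. The function names and inputs are hypothetical.

```python
import numpy as np

def huber_clip(s, c):
    """Huber function with tuning constant c: values outside [-c, c] are clipped."""
    return np.clip(np.asarray(s, dtype=float), -c, c)

def tuning_constants(omega, W_h, sigma_e_R, sigma_v_R, k):
    """c1 = k * median|omega_hj| * sigma_eR and c2 = k * median|W_h| * sigma_vR."""
    c1 = k * np.median(np.abs(omega)) * sigma_e_R
    c2 = k * np.median(np.abs(W_h)) * sigma_v_R
    return float(c1), float(c2)

# Made-up weights for one target area
c1, c2 = tuning_constants([1.0, -2.0, 3.0], [-4.0, 2.0], 1.0, 1.0, k=1.345)
```

As $k$ grows, nothing is clipped and the correction terms are added in full; as $k$ shrinks toward zero, the correction vanishes and the robust (REBLUP) prediction is returned.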

Bias correction II
Beaumont, Haziza and Ruiz-Gazen (2013); Dongmo Jiongo, Haziza and Duchesne (2013).
Define the conditional bias as a measure of the influence of unit $j$ in area $h$ for predicting the mean of area $i$:
$$B_{hj}(y_{hj}, v_h, u) := E(\hat{\mu}_i^{BLUP} - \bar{Y}_i \mid y_{hj}, v_h, u)$$
With
$$T_i := N_i^{-1} \left( \sum_{l=1}^{m} \sum_{r \in s_l} \omega_{lr}^{(i)} w_{lr}^t - \sum_{r \in U_i} w_{ir}^t \right) u, \qquad r_{hj} = y_{hj} - x_{hj}^t \beta - w_{hj}^t u - v_h$$
the conditional bias is
$$B_{hj}(y_{hj}, v_h, u) = \begin{cases} N_i^{-1} \omega_{hj}^{(i)} r_{hj} + N_i^{-1} W_h^{(i)} v_h + T_i, & h \neq i, \; j \in s_h \\ N_i^{-1} W_h^{(i)} v_h + T_i, & h \neq i, \; j \in U_h \setminus s_h \\ N_i^{-1} (\omega_{ij}^{(i)} - 1) r_{ij} + N_i^{-1} W_i^{(i)} v_i + T_i, & h = i, \; j \in s_i \\ -N_i^{-1} r_{ij} + N_i^{-1} W_i^{(i)} v_i + T_i, & h = i, \; j \in U_i \setminus s_i \end{cases}$$

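$$\hat{\mu}_i^{robust} = \hat{\mu}_i^{EBLUP} - \sum_{h=1}^{m} \sum_{j \in s_h} \hat{B}_{hj} + \sum_{h=1}^{m} \sum_{j \in s_h} \psi_d(\hat{B}_{hj})$$
with
$$\psi_d(\hat{B}_{hj}) = \begin{cases} N_i^{-1} \psi_d(\hat{\omega}_{hj}^{(i)} \hat{r}_{hj}) + N_i^{-1} \hat{W}_h^{(i)} \psi_d(\hat{v}_h) + \hat{T}_i, & h \neq i, \; j \in s_h \\ N_i^{-1} \psi_d[(\hat{\omega}_{ij}^{(i)} - 1) \hat{r}_{ij}] + N_i^{-1} \hat{W}_i^{(i)} \psi_d(\hat{v}_i) + \hat{T}_i, & h = i, \; j \in s_i \end{cases}$$
where fixed effects and random components are estimated by a robust ML method.
Consider the class of robust predictors
$$\hat{\mu}_i^{R}(c) = \hat{\mu}_i^{EBLUP} + \Delta(c)$$
Within this class, we search for the value of $c$ which minimizes the maximum absolute estimated conditional bias of $\hat{\mu}_i^{R}(c)$. Then
$$\hat{\mu}_i^{R}(c) = \hat{\mu}_i^{EBLUP} - \frac{1}{2} (\hat{B}^{min} + \hat{B}^{max})$$
where $\hat{B}^{min} = \min_{h, j \in s_h} \{\hat{B}_{hj}(y_{hj}, \hat{v}_h, \hat{u})\}$ and $\hat{B}^{max} = \max_{h, j \in s_h} \{\hat{B}_{hj}(y_{hj}, \hat{v}_h, \hat{u})\}$.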
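Once the estimated conditional biases are available, the minimax correction is a one-line adjustment. The sketch below takes the $\hat{B}_{hj}$ values as given; the function name and the numbers are hypothetical, and computing the biases themselves (weights, residuals, $\hat{T}_i$) is not shown.

```python
import numpy as np

def minimax_bias_corrected(mu_eblup, B_hat):
    """EBLUP minus half the sum of the smallest and largest estimated conditional bias."""
    B = np.asarray(B_hat, dtype=float)
    return float(mu_eblup - 0.5 * (B.min() + B.max()))

# Made-up estimated conditional biases for the sampled units
print(minimax_bias_corrected(10.0, [-1.0, 0.5, 3.0]))  # 9.0
```

Only the two most extreme estimated influences enter the correction, so when they are symmetric around zero the EBLUP is left unchanged.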

Setup
True semi-parametric nested error model
$$y_{ij} = m(x_{ij}) + v_i + e_{ij}, \qquad i = 1, \ldots, 40, \quad j = 1, \ldots, 40$$
Random effects $v_i$ and $e_{ij}$ are generated from contaminated normal distributions
$$v_i \sim (1 - \gamma_1) N(0, \sigma_v^2) + \gamma_1 N(0, \sigma_{v1}^2), \qquad e_{ij} \sim (1 - \gamma_2) N(0, \sigma_e^2) + \gamma_2 N(0, \sigma_{e1}^2)$$
where $\sigma_v^2 = \sigma_e^2 = 1$ and $\sigma_{v1}^2 = \sigma_{e1}^2 = 25$.
Proportions of outliers: $\gamma = \gamma_1 = \gamma_2 = 0.1$.
SRS from each area with $n_i = 4$.
Mixture of the two means: $m(x) = (1 - \gamma) m_0(x) + \gamma m_1(x)$, with linear and quadratic choices.
$k_1 = 1.345$ and $k_2 = 9$.
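The contaminated normal draws in this setup can be sketched as follows. The function name and the seed are assumptions; the variances, the outlier proportion and the 40 by 40 layout are taken from the slide.

```python
import numpy as np

def contaminated_normal(size, gamma, sigma2, sigma2_out, rng):
    """Draw from (1 - gamma) * N(0, sigma2) + gamma * N(0, sigma2_out)."""
    is_outlier = rng.random(size) < gamma
    sd = np.where(is_outlier, np.sqrt(sigma2_out), np.sqrt(sigma2))
    return rng.normal(0.0, 1.0, size) * sd, is_outlier

rng = np.random.default_rng(2017)
m, N_i = 40, 40
v, v_out = contaminated_normal(m, 0.1, 1.0, 25.0, rng)         # area effects v_i
e, e_out = contaminated_normal((m, N_i), 0.1, 1.0, 25.0, rng)  # unit errors e_ij
```

The variance of the mixture is $0.9 \cdot 1 + 0.1 \cdot 25 = 3.4$, so a tenth of the draws accounts for most of the spread.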

Simulated absolute biases and mean squared prediction errors (MSPE)

True model  Contamination  Method   K = 0             K = 20            K = 30
                                    Bias     MSPE     Bias     MSPE     Bias     MSPE
Linear      (0, 0)         EBLUP    0.0193   0.1878   0.0127   0.1915   0.0136   0.1886
                           REBLUP   0.0193   0.1940   0.0120   0.1968   0.0142   0.1936
            (e, 0)         EBLUP    0.0246   0.4732   0.0273   0.4682   0.0242   0.4707
                           REBLUP   0.0208   0.3168   0.0192   0.3118   0.0210   0.3165
            (0, v)         EBLUP    0.0142   0.2115   0.0175   0.2138   0.0148   0.2141
                           REBLUP   0.0140   0.2044   0.0173   0.2076   0.0148   0.2079
            (e, v)         EBLUP    0.0256   0.6201   0.0245   0.6310   0.0286   0.6220
                           REBLUP   0.0203   0.3447   0.0206   0.3433   0.0226   0.3430
Quadratic   (0, 0)         EBLUP    0.0543   0.3997   0.0174   0.1921   0.0166   0.1969
                           REBLUP   0.1017   0.3756   0.0179   0.1976   0.0164   0.2022
            (e, 0)         EBLUP    0.0643   0.6262   0.0287   0.4792   0.0288   0.4756
                           REBLUP   0.1030   0.5251   0.0235   0.3201   0.0212   0.3210
            (0, v)         EBLUP    0.0521   0.5556   0.0156   0.2183   0.0198   0.2202
                           REBLUP   0.1156   0.4535   0.0143   0.2112   0.0200   0.2130
            (e, v)         EBLUP    0.0681   0.9451   0.0260   0.6250   0.0283   0.6404
                           REBLUP   0.1336   0.6631   0.0217   0.3560   0.0220   0.3529
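The slides do not spell out how the two summary measures are computed. A common convention, assumed here, averages over areas the absolute value of the mean prediction error across replicates (absolute bias) and averages the squared prediction errors over replicates and areas (MSPE). The function name and array layout are hypothetical.

```python
import numpy as np

def abs_bias_and_mspe(pred, truth):
    """pred and truth have shape (replicates, areas); returns (absolute bias, MSPE)."""
    err = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    abs_bias = float(np.mean(np.abs(err.mean(axis=0))))  # average over areas of |mean error|
    mspe = float(np.mean(err ** 2))                      # average squared error
    return abs_bias, mspe

# Two replicates, two areas (made-up numbers)
bias, mspe = abs_bias_and_mspe([[1.0, 2.0], [3.0, 2.0]], [[0.0, 2.0], [2.0, 2.0]])
```

Under this convention a predictor can have a small absolute bias and still a large MSPE, which is the pattern the EBLUP shows under contamination in the table.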

Mixture of two linear models
$m_0(x) = 100 + 3x$, $m_1(x) = 150 + x$

Simulated absolute biases and mean squared prediction errors (MSPE)
Models: $m_0(x) = 100 + 3x$ and $m_1(x) = 150 + x$

Mixture     Method       K = 0               K = 20
                         Bias     MSPE       Bias     MSPE
(0, 0, b)   EBLUP        0.0937   7.4464     0.0882   7.4957
            REBLUP       4.0219   21.8341    3.7152   23.8805
            BCI (k_1)    3.4322   18.0372    3.1751   20.8888
            BCI (k_2)    1.5730   10.023     1.2699   13.9493
            BCII         0.1981   7.2601     0.1208   8.9915
(e, v, 0)   EBLUP        0.0217   0.6168     0.0217   0.6197
            REBLUP       0.0180   0.3421     0.0178   0.3440
            BCI (k_1)    0.0205   0.3652     0.0204   0.3697
            BCI (k_2)    0.0233   0.5828     0.0230   0.5893
            BCII         0.0205   0.4578     0.0201   0.4620
(e, v, b)   EBLUP        0.0886   9.4479     0.0858   9.49290
            REBLUP       3.9278   22.2256    3.7386   24.8069
            BCI (k_1)    3.1015   17.4217    2.9441   20.6463
            BCI (k_2)    1.1727   10.2148    1.0513   13.7787
            BCII         0.1973   8.7437     0.1502   10.0789

Mixture of two quadratics
$m_0(x) = 1 + x + x^2$, $m_1(x) = 2 - x - 3x^2$

Simulated absolute biases and mean squared prediction errors (MSPE)
Models: $m_0(x) = 1 + x + x^2$ and $m_1(x) = 2 - x - 3x^2$, 20 knots

Mixture     Method       Bias     MSPE
(0, 0, b)   EBLUP        0.0538   1.4086
            REBLUP       0.6221   1.2053
            BCI (k_1)    0.3849   0.9991
            BCI (k_2)    0.1472   1.0375
            BCII         0.0781   1.1184
(e, v, 0)   EBLUP        0.0236   0.6305
            REBLUP       0.0209   0.3553
            BCI (k_1)    0.0211   0.3860
            BCI (k_2)    0.0233   0.6004
            BCII         0.0212   0.4765
(e, v, b)   EBLUP        0.0474   2.2603
            REBLUP       0.5844   1.4148
            BCI (k_1)    0.3261   1.2446
            BCI (k_2)    0.1455   1.5596
            BCII         0.1040   1.6208

Simulated absolute biases and mean squared prediction errors (MSPE)
Models: $m_0(x) = 1 + x$ and $m_1(x) = 1 + x + 4x^2$

Mixture     Method        Bias     MSPE
(0, 0, b)   EBLUP         0.0409   1.0841
            REBLUP        0.5136   0.9073
            BCI (k = 3)   0.2037   0.9991
            BCII          0.0771   0.8640
(e, v, 0)   EBLUP         0.0328   0.6179
            REBLUP        0.0242   0.3422
            BCI (k = 3)   0.0248   0.4384
            BCII          0.0275   0.4569
(e, v, b)   EBLUP         0.0508   2.1058
            REBLUP        0.5420   1.2334
            BCI (k = 3)   0.1897   1.1625
            BCII          0.0939   1.4690

Further research interests
Inference: bootstrap MSPE estimation; analytic MSPE approximations
Generalized Linear Mixed Model
Informative sampling

Thank You!