Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Outline
- Introduction
- Proposed method: fractional imputation
- Approximation
- Variance estimation
- Multiple imputation
- Simulation study
- Conclusion

Introduction
Let $Z$ be a vector of random variables with distribution $F(z;\theta)$, and let $z_1, \dots, z_n$ be $n$ independent realizations of $Z$. There are two types of parameters of interest:
1. $\theta$: the parameter in the model $F(z;\theta)$.
2. $\eta_g = E_\theta\{g(Z)\}$: a parameter induced from $\theta$.
$\eta_g$ can be estimated in two ways:
1. Maximum likelihood method: $\hat{\eta}_g = \eta_g(\hat{\theta}_{\mathrm{MLE}})$.
2. Simple method: $\hat{\eta}_g = n^{-1}\sum_{i=1}^{n} g(z_i)$.
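As a small illustration of the two estimators (not taken from the talk), the sketch below assumes $Z \sim N(\mu, \sigma^2)$ and $g(z) = I(z < 3)$, so that $\eta_g = \Phi\{(3 - \mu)/\sigma\}$. It compares the Monte Carlo variances of the maximum likelihood plug-in and the simple mean of $g(z_i)$. The helper names and the values $\mu = 2.4$, $\sigma = 1.2$ are arbitrary choices.

```python
import math

import numpy as np


def normal_cdf(v, mu, sigma):
    # Phi((v - mu) / sigma), written with the error function
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def eta_mle(z, c=3.0):
    # Maximum likelihood method: eta_g(theta_hat) with theta_hat = (mean, MLE of sd)
    mu_hat = float(np.mean(z))
    sigma_hat = float(np.std(z))  # ddof=0 gives the MLE of sigma
    return normal_cdf(c, mu_hat, sigma_hat)


def eta_simple(z, c=3.0):
    # Simple method: n^{-1} sum_i g(z_i) with g(z) = I(z < c)
    return float(np.mean(z < c))


def compare(reps=2000, n=200, mu=2.4, sigma=1.2, seed=1):
    # Monte Carlo variances of the two estimators over repeated samples
    rng = np.random.default_rng(seed)
    mle = np.empty(reps)
    simple = np.empty(reps)
    for r in range(reps):
        z = rng.normal(mu, sigma, size=n)
        mle[r] = eta_mle(z)
        simple[r] = eta_simple(z)
    return mle.var(), simple.var()


var_mle, var_simple = compare()
true_eta = normal_cdf(3.0, 2.4, 1.2)  # Phi(0.5)
print(true_eta, var_mle, var_simple)
```

Under the assumed normal model the plug-in estimator should show the smaller variance; the simple estimator, in exchange, stays consistent even if the normal model is wrong.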

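Introduction
Suppose that $z_i$ is not fully observed, and write $z_i = (z_{i,\mathrm{obs}}, z_{i,\mathrm{mis}})$ for its observed and missing parts. Under the missing-at-random assumption, we can use the following estimators: $\hat{\theta}$, the solution to
$$\sum_{i=1}^{n} E\{S(\theta; z_i) \mid z_{i,\mathrm{obs}}\} = 0, \qquad (*)$$
and
$$\hat{\eta}_g = n^{-1}\sum_{i=1}^{n} E\{g(z_i) \mid z_{i,\mathrm{obs}}\},$$
where $S(\theta; z_i) = \partial \ln f(z_i;\theta)/\partial\theta$ is the score function of $\theta$ under complete response. Equation (*) is called the mean score equation.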

Introduction
Under some regularity conditions, the solution $\hat{\theta}$ to the mean score equation maximizes the observed likelihood (the likelihood associated with the marginal distribution of $z_{i,\mathrm{obs}}$). Computing the conditional expectation can be challenging:
1. $\theta$ is unknown in $E\{g(z_i) \mid z_{i,\mathrm{obs}}\} = E\{g(z_i) \mid z_{i,\mathrm{obs}};\theta\}$.
2. Even if $\theta$ is known, computing the conditional expectation can be numerically difficult.

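Introduction
Imputation is a Monte Carlo approximation of the conditional expectation:
$$E\{S(\theta; z_i) \mid z_{i,\mathrm{obs}}; \hat{\theta}\} \cong \frac{1}{M}\sum_{j=1}^{M} S\left(\theta; z_{i,\mathrm{obs}}, z^{*(j)}_{i,\mathrm{mis}}\right),$$
$$E\{g(z_i) \mid z_{i,\mathrm{obs}}; \hat{\theta}\} \cong \frac{1}{M}\sum_{j=1}^{M} g\left(z_{i,\mathrm{obs}}, z^{*(j)}_{i,\mathrm{mis}}\right),$$
where $z^{*(j)}_{i,\mathrm{mis}} \sim f(z_{i,\mathrm{mis}} \mid z_{i,\mathrm{obs}}; \hat{\theta})$. Once imputed data are provided, the estimates are easy to compute and different users obtain consistent results. This is very attractive when $\eta_g$ is unknown at the time of imputation (e.g., public-access data in survey sampling).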
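A minimal sketch of this Monte Carlo approximation for a single unit, assuming $z_i = (x_i, y_i)$ with $x_i$ observed and $y_i$ missing, $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, and $g(z_i) = I(y_i < 3)$. The parameter values standing in for $\hat\theta$ and the function names are illustrative; under this model the exact conditional expectation is available for comparison.

```python
import math

import numpy as np


def exact_cond_prob(x, beta0, beta1, sigma, c=3.0):
    # E{I(y < c) | x; theta} = Phi((c - beta0 - beta1 * x) / sigma) under the normal model
    return 0.5 * (1.0 + math.erf((c - beta0 - beta1 * x) / (sigma * math.sqrt(2.0))))


def mc_cond_expectation(g, x, beta0, beta1, sigma, M, rng):
    # Imputation: draw y*(1), ..., y*(M) from f(y | x; theta_hat), then average g
    y_star = rng.normal(beta0 + beta1 * x, sigma, size=M)
    return float(np.mean(g(y_star)))


def indicator_below_3(y):
    return (y < 3.0).astype(float)


rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 0.7, 1.0  # plays the role of theta_hat
x_i = 2.0
exact = exact_cond_prob(x_i, beta0, beta1, sigma)
for M in (10, 100, 100_000):
    approx = mc_cond_expectation(indicator_below_3, x_i, beta0, beta1, sigma, M, rng)
    print(M, round(approx, 4), round(exact, 4))
```

The approximation error shrinks roughly like $M^{-1/2}$, which is why small $M$ can give noticeably different answers from run to run.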

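Introduction
To compute $\hat{\theta}$, the EM algorithm can be used: $\hat{\theta}^{(t+1)}$ is the solution to
$$M^{-1}\sum_{i=1}^{n}\sum_{j=1}^{M} S\left(\theta; z^*_{ij(t)}\right) = 0,$$
where $z^*_{ij(t)} = \left(z_{i,\mathrm{obs}}, z^{*(j)}_{i,\mathrm{mis}(t)}\right)$ with
$$z^{*(j)}_{i,\mathrm{mis}(t)} \sim f\left(z_{i,\mathrm{mis}} \mid z_{i,\mathrm{obs}}; \hat{\theta}^{(t)}\right). \qquad (1)$$
This is computationally heavy if (1) requires MCMC at each $t$. Convergence is hard to achieve (unless $M$ is increased) since the imputed values are re-generated at each iteration.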
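The sketch below illustrates this Monte Carlo EM on an assumed toy problem: $x_i$ always observed, $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, and $y_i$ missing at random given $x_i$. For this model the observed-data MLE of $(\beta_0, \beta_1, \sigma^2)$ is the complete-case fit, which gives a reference point. Function names are hypothetical. Because the imputed values are redrawn at every iteration, the iterates keep jittering around the MLE instead of settling exactly.

```python
import numpy as np


def weighted_normal_mle(x, y, w):
    # Solves the weighted score equations of y | x ~ N(b0 + b1 x, s2):
    # weighted least squares for (b0, b1), weighted mean squared residual for s2
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    sigma2 = np.sum(w * (y - X @ beta) ** 2) / np.sum(w)
    return beta, sigma2


def mcem(x, y, observed, M=200, iters=30, seed=1):
    rng = np.random.default_rng(seed)
    x_obs, y_obs = x[observed], y[observed]
    x_mis = x[~observed]
    beta, sigma2 = np.zeros(2), 1.0  # crude starting value theta^(0)
    history = []
    for _ in range(iters):
        # Monte Carlo E-step: fresh draws y* ~ f(y | x; theta^(t)) at every iteration
        mean = np.repeat(beta[0] + beta[1] * x_mis, M)
        y_star = rng.normal(mean, np.sqrt(sigma2))
        xs = np.concatenate([x_obs, np.repeat(x_mis, M)])
        ys = np.concatenate([y_obs, y_star])
        ws = np.concatenate([np.ones(x_obs.size), np.full(x_mis.size * M, 1.0 / M)])
        # M-step: M^{-1} sum_i sum_j S(theta; z*_ij(t)) = 0 (respondents count once)
        beta, sigma2 = weighted_normal_mle(xs, ys, ws)
        history.append(beta.copy())
    return beta, sigma2, np.array(history)


rng = np.random.default_rng(7)
n = 200
x = rng.normal(2.0, 1.0, n)
y = 1.0 + 0.7 * x + rng.normal(0.0, 1.0, n)
observed = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-0.5 * x))  # MAR given x
beta, sigma2, history = mcem(x, y, observed)
beta_cc, sigma2_cc = weighted_normal_mle(x[observed], y[observed], np.ones(observed.sum()))
print(beta, beta_cc)
print(history[-3:])  # still moving: the imputed values change every iteration
```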

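Proposed method: fractional imputation
1. Generate more than one (say $M$) imputed values of $z_{i,\mathrm{mis}}$, namely $z^{*(1)}_{i,\mathrm{mis}}, \dots, z^{*(M)}_{i,\mathrm{mis}}$, from some (initial) density $h(z_{i,\mathrm{mis}})$.
2. Create the weighted data set $\{(w^*_{ij}, z^*_{ij});\ j = 1, \dots, M;\ i = 1, \dots, n\}$, where $\sum_{j=1}^{M} w^*_{ij} = 1$, $z^*_{ij} = (z_{i,\mathrm{obs}}, z^{*(j)}_{i,\mathrm{mis}})$,
$$w^*_{ij} \propto f(z^*_{ij}; \hat{\theta}) / h(z^{*(j)}_{i,\mathrm{mis}}),$$
$\hat{\theta}$ is the maximum likelihood estimator of $\theta$, and $f(z;\theta)$ is the joint density of $z$.
3. The weights $w^*_{ij}$ are normalized importance weights and can be called fractional weights.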

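Proposed method: fractional imputation
Product: a fractionally imputed data set of size $nM$, $\{(w^*_{ij}, z^*_{ij});\ j = 1, \dots, M;\ i = 1, \dots, n\}$.
Property: for sufficiently large $M$,
$$\sum_{j=1}^{M} w^*_{ij}\, g(z^*_{ij}) \cong \frac{\int g(z_i)\,\frac{f(z_i;\hat{\theta})}{h(z_{i,\mathrm{mis}})}\,h(z_{i,\mathrm{mis}})\,dz_{i,\mathrm{mis}}}{\int \frac{f(z_i;\hat{\theta})}{h(z_{i,\mathrm{mis}})}\,h(z_{i,\mathrm{mis}})\,dz_{i,\mathrm{mis}}} = E\{g(z_i) \mid z_{i,\mathrm{obs}}; \hat{\theta}\}$$
for any $g$ such that the expectation exists. If we choose $h(z_{i,\mathrm{mis}}) = f(z_{i,\mathrm{mis}} \mid z_{i,\mathrm{obs}}; \hat{\theta})$, where $\hat{\theta}$ is the MLE, then fractional imputation reduces to the usual Monte Carlo imputation for maximum likelihood estimation.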
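The property can be checked numerically. The sketch below, for one unit of an assumed normal regression model ($x_i$ observed, $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ missing), draws from a fixed proposal $h = N(0, 3^2)$ that does not involve $\theta$, forms the normalized importance weights, and compares $\sum_j w^*_{ij} g(z^*_{ij})$ with the exact conditional expectation for $g = I(y < 3)$. Because the factor $f(x_i)$ in $f(z^*_{ij}; \hat\theta)$ is common to all $j$, it cancels in the normalization, so only $f(y \mid x_i)$ is needed. The proposal, parameter values and names are assumptions.

```python
import math

import numpy as np


def normal_logpdf(v, mean, sd):
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((v - mean) / sd) ** 2


def fractional_weights(log_f, log_h):
    # Normalized importance weights: w*_ij proportional to f(z*_ij; theta_hat) / h(z*_ij,mis)
    log_ratio = log_f - log_h
    log_ratio = log_ratio - log_ratio.max()  # stabilize before exponentiating
    w = np.exp(log_ratio)
    return w / w.sum()


rng = np.random.default_rng(3)
M = 20_000
x_i = 2.0
b0, b1, sigma = 1.0, 0.7, 1.0  # theta_hat (illustrative values)
y_star = rng.normal(0.0, 3.0, size=M)  # draws from h = N(0, 3^2), free of theta
w = fractional_weights(normal_logpdf(y_star, b0 + b1 * x_i, sigma),
                       normal_logpdf(y_star, 0.0, 3.0))
approx = float(np.sum(w * (y_star < 3.0)))  # sum_j w*_ij g(z*_ij) with g = I(y < 3)
exact = 0.5 * (1.0 + math.erf((3.0 - b0 - b1 * x_i) / (sigma * math.sqrt(2.0))))
print(round(approx, 3), round(exact, 3))
```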

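Proposed method: fractional imputation
EM algorithm by fractional imputation:
1. Initial imputation: generate $z^{*(j)}_{i,\mathrm{mis}} \sim h(z_{i,\mathrm{mis}})$.
2. E-step: compute
$$w^*_{ij(t)} \propto f(z^*_{ij}; \hat{\theta}^{(t)}) / h(z^{*(j)}_{i,\mathrm{mis}}),$$
where $\sum_{j=1}^{M} w^*_{ij(t)} = 1$.
3. M-step: update $\hat{\theta}^{(t+1)}$ as the solution to
$$\sum_{i=1}^{n}\sum_{j=1}^{M} w^*_{ij(t)}\, S(\theta; z^*_{ij}) = 0.$$
4. Repeat Steps 2 and 3 until convergence.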
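A compact sketch of the four steps for an assumed toy regression model ($x_i$ observed, $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, $y_i$ missing at random given $x_i$). The initial density $h$ is deliberately crude (a wide normal centred at the respondents' mean of $y$ that ignores $x$); this choice, the starting value and all names are illustrative assumptions. The imputed values are generated once and only the weights are updated, so the iteration is deterministic; its limit should sit close to the complete-case fit, which is the observed-data MLE for this model.

```python
import numpy as np


def normal_logpdf(v, mean, sd):
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((v - mean) / sd) ** 2


def wls(x, y, w):
    # Solves sum_i sum_j w_ij S(theta; z_ij) = 0 for y | x ~ N(b0 + b1 x, s2)
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    sigma2 = np.sum(w * (y - X @ beta) ** 2) / np.sum(w)
    return beta, sigma2


def fi_em(x, y, observed, M=200, tol=1e-10, max_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    x_obs, y_obs, x_mis = x[observed], y[observed], x[~observed]
    beta = np.array([y_obs.mean(), 0.0])  # theta^(0): ignores x on purpose
    sigma2 = y_obs.var()
    # Step 1, initial imputation (done once): h = N(mean of y_obs, (2 * sd)^2), free of theta
    h_mean, h_sd = beta[0], 2.0 * np.sqrt(sigma2)
    y_star = rng.normal(h_mean, h_sd, size=(x_mis.size, M))
    log_h = normal_logpdf(y_star, h_mean, h_sd)
    xs = np.concatenate([x_obs, np.repeat(x_mis, M)])
    ys = np.concatenate([y_obs, y_star.ravel()])
    for it in range(1, max_iter + 1):
        # Step 2, E-step: only the fractional weights change; y_star is never redrawn
        mean_t = (beta[0] + beta[1] * x_mis)[:, None]
        log_r = normal_logpdf(y_star, mean_t, np.sqrt(sigma2)) - log_h
        w = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # Step 3, M-step: weighted score equation (respondents get weight 1)
        ws = np.concatenate([np.ones(x_obs.size), w.ravel()])
        new_beta, new_sigma2 = wls(xs, ys, ws)
        change = np.max(np.abs(new_beta - beta)) + abs(new_sigma2 - sigma2)
        beta, sigma2 = new_beta, new_sigma2
        # Step 4: repeat until the deterministic iteration settles
        if change < tol:
            break
    return beta, sigma2, it


rng = np.random.default_rng(11)
n = 200
x = rng.normal(2.0, 1.0, n)
y = 1.0 + 0.7 * x + rng.normal(0.0, 1.0, n)
observed = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-0.5 * x))  # MAR given x
beta, sigma2, iters = fi_em(x, y, observed)
beta_cc, sigma2_cc = wls(x[observed], y[observed], np.ones(observed.sum()))
print(iters, beta, beta_cc)
```

The loop reaches a tight tolerance because the weighted data set is fixed; increasing `M` moves the limit closer to the exact MLE.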

Proposed method: fractional imputation
If we choose $h(\cdot)$ independent of $\theta$, the imputed values do not change across iterations; only the fractional weights change.
1. Computationally efficient (because importance sampling is used only once).
2. Convergence is achieved (because the imputed values do not change).
For sufficiently large $t$, $\hat{\theta}^{(t)} \to \hat{\theta}^*$. Also, for sufficiently large $M$, $\hat{\theta}^* \to \hat{\theta}_{\mathrm{MLE}}$. Thus, we need a large $M$ for a satisfactory approximation.

Approximation: calibration fractional imputation
In large-scale survey sampling, we prefer a smaller $M$. Two-step method for fractional imputation:
1. Create a set of fractionally imputed data of size $nM_1$ (say $M_1 = 500$).
2. Use an efficient sampling and weighting method to get a final set of fractionally imputed data of size $nM_2$ (say $M_2 = 10$).
That is, we treat the step-one imputed data as a finite population and the step-two imputed data as a sample. We can use an efficient sampling technique (such as systematic sampling or stratification) to get the final imputed data, and a calibration technique for the fractional weighting.
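One way to carry out the sampling in step 2 is systematic sampling, within each unit, with probability proportional to the step-one fractional weights. The sketch below assumes that design, which is only one reading of "systematic sampling" here; `systematic_pps` is a hypothetical helper and the step-one weights are random stand-ins. The selected values start with equal weights $1/M_2$, which a calibration step would then adjust.

```python
import numpy as np


def systematic_pps(weights, m2, rng):
    # Systematic sampling of m2 of the M1 step-one values with probability
    # proportional to their fractional weights (assumed to sum to 1)
    cum = np.cumsum(weights)
    cum[-1] = 1.0  # absorb rounding error in the last cumulative sum
    points = rng.uniform(0.0, 1.0 / m2) + np.arange(m2) / m2
    # Point p selects the value k with cum[k-1] <= p < cum[k]
    return np.searchsorted(cum, points, side="right")


rng = np.random.default_rng(4)
M1, M2 = 500, 10
w_star = rng.dirichlet(np.ones(M1))  # stand-in step-one fractional weights for one unit
idx = systematic_pps(w_star, M2, rng)
w_init = np.full(M2, 1.0 / M2)  # equal starting weights for the M2 selected values
counts = np.bincount(idx, minlength=M1)
print(idx, counts.sum())
```

A useful feature of systematic selection is that each step-one value is chosen either $\lfloor M_2 w^*_{ij} \rfloor$ or $\lceil M_2 w^*_{ij} \rceil$ times, which keeps the step-two sample close to the step-one distribution.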

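Approximation: calibration fractional imputation
The step-one data set (of size $nM_1$) is $\{(w^*_{ij}, z^*_{ij});\ j = 1, \dots, M_1;\ i = 1, \dots, n\}$, and its fractional weights satisfy
$$\sum_{j=1}^{M_1} w^*_{ij} = 1 \quad\text{and}\quad \sum_{i=1}^{n}\sum_{j=1}^{M_1} w^*_{ij}\, S(\hat{\theta}; z^*_{ij}) = 0,$$
where $\hat{\theta}$ is obtained from the EM algorithm after convergence. The final fractionally imputed data set can be written as $\{(w^{**}_{ij}, z^{**}_{ij});\ j = 1, \dots, M_2;\ i = 1, \dots, n\}$, and its fractional weights satisfy
$$\sum_{j=1}^{M_2} w^{**}_{ij} = 1 \quad\text{and}\quad \sum_{i=1}^{n}\sum_{j=1}^{M_2} w^{**}_{ij}\, S(\hat{\theta}; z^{**}_{ij}) = 0.$$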

Approximation: calibration fractional imputation
Thus, the fractional weights can be constructed by the calibration techniques of survey sampling. If the distribution belongs to the exponential family, the calibration constraints can be simplified to
$$\sum_{i=1}^{n}\sum_{j=1}^{M_2} w^{**}_{ij}\, T(z^{**}_{ij}) = \sum_{i=1}^{n}\sum_{j=1}^{M_1} w^*_{ij}\, T(z^*_{ij}),$$
where $T(z)$ is the complete sufficient statistic for $\theta$.
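A minimal sketch of one calibration choice: a linear (chi-square distance) adjustment $w^{**}_{ij} = w^{(0)}_{ij}\{1 + (T^{**}_{ij} - \bar{T}_i)'\lambda\}$, where $\bar{T}_i$ is the within-unit weighted mean of $T$. Centring within each unit keeps $\sum_j w^{**}_{ij} = 1$, and $\lambda$ is solved so that the exponential-family constraint holds exactly. This distance function is an assumption, not necessarily the one used in the talk, and like any linear calibration it can produce negative weights. The demo assumes a normal regression of $y$ on $x$, for which $T$ can be taken as $(y, xy, y^2)$; units without imputed values are left out because they contribute identical terms to both sides. The step-two values are a simple random subsample here for brevity.

```python
import numpy as np


def calibrate(w0, T, target):
    """Linear calibration of step-two fractional weights.

    w0: (n, M2) initial weights, each row summing to 1
    T: (n, M2, p) values T(z**_ij)
    target: (p,) step-one total sum_i sum_j w*_ij T(z*_ij)
    """
    T_bar = np.einsum("ij,ijk->ik", w0, T)  # within-unit weighted means
    D = T - T_bar[:, None, :]  # deviations; sum_j w0_ij D_ij = 0 in every unit
    A = np.einsum("ij,ijk,ijl->kl", w0, D, D)
    current = np.einsum("ij,ijk->k", w0, T)
    lam = np.linalg.solve(A, target - current)
    return w0 * (1.0 + D @ lam)


rng = np.random.default_rng(5)
n, M1, M2 = 30, 100, 10
x = rng.normal(2.0, 1.0, n)
y1 = 1.0 + 0.7 * x[:, None] + rng.standard_normal((n, M1))  # step-one imputed y
w1 = np.full((n, M1), 1.0 / M1)  # step-one fractional weights
T1 = np.stack([y1, x[:, None] * y1, y1**2], axis=-1)
target = np.einsum("ij,ijk->k", w1, T1)
keep = np.stack([rng.choice(M1, size=M2, replace=False) for _ in range(n)])
y2 = np.take_along_axis(y1, keep, axis=1)  # step-two imputed y
T2 = np.stack([y2, x[:, None] * y2, y2**2], axis=-1)
w2 = calibrate(np.full((n, M2), 1.0 / M2), T2, target)
print(w2.sum(axis=1)[:3], np.einsum("ij,ijk->k", w2, T2), target)
```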

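Variance estimation for fractional imputation
Write
$$\hat{\eta}_{g,\mathrm{FI}} = \hat{\eta}_{g,\mathrm{FI}}(\hat{\theta}) \equiv n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{M} w^*_{ij}(\hat{\theta})\, g(z^*_{ij}),$$
where $\hat{\theta}$ solves
$$S(\hat{\theta}) \equiv n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{M} w^*_{ij}(\hat{\theta})\, S(\hat{\theta}; z^*_{ij}) = 0.$$
Taylor linearization gives
$$\hat{\eta}_{g,\mathrm{FI}}(\hat{\theta}) \cong \hat{\eta}_{g,\mathrm{FI}}(\theta_0) + \left\{\frac{\partial}{\partial\theta'}\hat{\eta}_{g,\mathrm{FI}}(\theta_0)\right\}(\hat{\theta} - \theta_0),$$
$$0 = S(\hat{\theta}) \cong S(\theta_0) + \left\{\frac{\partial}{\partial\theta'}S(\theta_0)\right\}(\hat{\theta} - \theta_0).$$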

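Variance estimation for fractional imputation
Combining the two expansions,
$$\hat{\eta}_{g,\mathrm{FI}}(\hat{\theta}) \cong \hat{\eta}_{g,\mathrm{FI}}(\theta_0) - \left\{\frac{\partial}{\partial\theta'}\hat{\eta}_{g,\mathrm{FI}}(\theta_0)\right\}\left\{\frac{\partial}{\partial\theta'}S(\theta_0)\right\}^{-1} S(\theta_0)$$
$$= n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{M} w^*_{ij}(\theta_0)\left\{g(z^*_{ij}) - K\, S(\theta_0; z^*_{ij})\right\} = n^{-1}\sum_{i=1}^{n}\bar{e}_i(\theta_0),$$
where
$$K = \left\{\frac{\partial}{\partial\theta'}\hat{\eta}_{g,\mathrm{FI}}(\theta_0)\right\}\left\{\frac{\partial}{\partial\theta'}S(\theta_0)\right\}^{-1},$$
$\bar{e}_i(\theta_0) = \sum_{j=1}^{M} w^*_{ij}(\theta_0)\{g(z^*_{ij}) - K S(\theta_0; z^*_{ij})\}$, and the $\bar{e}_i(\theta_0)$ are IID random variables. Hence a plug-in estimator of the linearized variance can be used.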
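The linearization can be coded once the model supplies the scores and their derivatives. The sketch below assumes $h$ does not depend on $\theta$, in which case differentiating the normalized weights gives $\partial w^*_{ij}/\partial\theta = w^*_{ij}\{S(\theta; z^*_{ij}) - \bar{S}_i\}$ with $\bar{S}_i = \sum_j w^*_{ij} S(\theta; z^*_{ij})$; this feeds both $\partial\hat{\eta}_{g,\mathrm{FI}}/\partial\theta'$ and $\partial S/\partial\theta'$. The plug-in variance is taken as the sample variance of the $\bar{e}_i$ divided by $n$. The demo is an assumed toy case (normal mean with known unit variance, responses missing completely at random, $g(z) = z$), where the result should be close to the familiar $s_r^2/n_r$ from the $n_r$ respondents; names and values are illustrative.

```python
import numpy as np


def linearized_variance(w, g, S, S_dot):
    """Plug-in linearized variance of eta_hat_FI = n^{-1} sum_i sum_j w_ij g(z*_ij).

    w: (n, M) fractional weights at theta_hat, rows summing to 1
    g: (n, M) values g(z*_ij)
    S: (n, M, p) scores S(theta_hat; z*_ij)
    S_dot: (n, M, p, p) derivatives dS(theta; z*_ij)/dtheta' at theta_hat
    """
    n = w.shape[0]
    S_bar = np.einsum("ij,ijk->ik", w, S)
    g_bar = np.einsum("ij,ij->i", w, g)
    D = S - S_bar[:, None, :]
    # d eta / d theta' uses dw_ij/dtheta = w_ij (S_ij - S_bar_i)
    d_eta = np.einsum("ij,ij,ijk->k", w, g - g_bar[:, None], S) / n
    d_S = (np.einsum("ij,ijkl->kl", w, S_dot) + np.einsum("ij,ijk,ijl->kl", w, S, D)) / n
    K = np.linalg.solve(d_S.T, d_eta)  # row vector K = d_eta' (d_S)^{-1}
    e_bar = np.einsum("ij,ij->i", w, g) - np.einsum("ij,ijk,k->i", w, S, K)
    return np.var(e_bar, ddof=1) / n, K


rng = np.random.default_rng(9)
n, M = 400, 500
z = rng.normal(1.0, 1.0, n)
resp = rng.uniform(size=n) < 0.6
mu_hat = z[resp].mean()  # solves the fractional score equation up to Monte Carlo error
draws = rng.normal(mu_hat, 2.0, size=(n, M))  # h = N(mu_hat, 2^2), fixed
log_r = -0.5 * (draws - mu_hat) ** 2 + 0.5 * ((draws - mu_hat) / 2.0) ** 2  # log f - log h
w = np.exp(log_r - log_r.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)
zz = np.where(resp[:, None], z[:, None], draws)  # respondents keep their own value
w = np.where(resp[:, None], 1.0 / M, w)
V, K = linearized_variance(w, zz, (zz - mu_hat)[..., None], -np.ones((n, M, 1, 1)))
print(V, z[resp].var(ddof=1) / resp.sum(), K)
```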

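Multiple imputation
Generate $M$ imputed values (with equal weights). Features:
1. Imputed values are generated from $z^{*(j)}_{i,\mathrm{mis}} \sim f(z_{i,\mathrm{mis}} \mid z_{i,\mathrm{obs}}, \theta^*)$, where $\theta^*$ is generated from the posterior distribution $\pi(\theta \mid z_{\mathrm{obs}})$.
2. The variance estimation formula is simple:
$$\hat{V}_{\mathrm{MI}}(\bar{\eta}_{g,M}) = \frac{1}{M}\sum_{m=1}^{M}\hat{V}_{I(m)} + \left(1 + \frac{1}{M}\right)\frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\eta}_{g(m)} - \bar{\eta}_{g,M}\right)^2,$$
where $\bar{\eta}_{g,M} = M^{-1}\sum_{m=1}^{M}\hat{\eta}_{g(m)}$ is the average of the $M$ imputed estimators and $\hat{V}_{I(m)}$ is the imputed version of the variance estimator of $\hat{\eta}_g$ under complete response.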
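The combining rule (Rubin's rule) is short enough to write directly. A minimal sketch with a hypothetical function name; the estimates and within-imputation variances in the usage line are made-up numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np


def mi_combine(estimates, within_variances):
    # Combine M completed-data analyses: average estimate and V_MI
    est = np.asarray(estimates, dtype=float)
    within = np.asarray(within_variances, dtype=float)
    M = est.size
    eta_bar = est.mean()  # bar eta_{g,M}
    between = np.sum((est - eta_bar) ** 2) / (M - 1)
    V_mi = within.mean() + (1.0 + 1.0 / M) * between
    return eta_bar, V_mi


eta_bar, V = mi_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.01] * 5)
print(eta_bar, V)  # approximately 1.0 and 0.04
```

Here the between-imputation variance is $0.10/4 = 0.025$, inflated by $1 + 1/5$ to $0.03$ and added to the average within-imputation variance $0.01$.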

Multiple imputation
The computation for Bayesian imputation can be implemented by the data augmentation technique (Tanner and Wong, 1987), a special application of Gibbs sampling:
1. I-step: generate $z^*_{\mathrm{mis}} \sim f(z_{\mathrm{mis}} \mid z_{\mathrm{obs}}, \theta^*)$.
2. P-step: generate $\theta^* \sim \pi(\theta \mid z_{\mathrm{obs}}, z^*_{\mathrm{mis}})$.
Consistency of the variance estimator is questionable (Kim et al., 2006, JRSSB):
1. If $\eta_g = \eta_g(\theta)$, the variance estimator with large $M$ is consistent.
2. If $\eta_g \neq \eta_g(\theta)$, the variance estimator is not consistent.
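A minimal data augmentation sketch for an assumed toy regression model ($y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, $y_i$ missing at random given $x_i$), using the noninformative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$ so that the P-step has closed-form draws from the completed data: $\sigma^2 \sim \mathrm{SSE}/\chi^2_{n-2}$ and $\beta \mid \sigma^2 \sim N\{\hat{\beta}, \sigma^2 (X'X)^{-1}\}$. The prior, chain length and function names are assumptions. For this model the posterior mean of $\beta$ given the observed data equals the complete-case least squares fit, which the chain average should approach; well-separated draws of the completed $y$ from such a chain would serve as the $M$ imputed data sets.

```python
import numpy as np


def p_step(X, y, rng):
    # P-step: draw theta* = (beta, sigma2) from the completed-data posterior
    # under the assumed prior p(beta, sigma2) proportional to 1 / sigma2
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    sse = float(np.sum((y - X @ beta_hat) ** 2))
    sigma2 = sse / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    return beta, sigma2


def data_augmentation(x, y, observed, n_iter=3000, burn_in=500, seed=0):
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    y_work = y.astype(float).copy()
    beta, sigma2 = p_step(X[observed], y[observed], rng)  # start from respondents only
    kept = []
    for t in range(n_iter):
        # I-step: y*_mis ~ f(y_mis | x, theta*)
        y_work[~observed] = rng.normal(X[~observed] @ beta, np.sqrt(sigma2))
        # P-step: theta* ~ pi(theta | y_obs, y*_mis)
        beta, sigma2 = p_step(X, y_work, rng)
        if t >= burn_in:
            kept.append(np.append(beta, sigma2))
    return np.array(kept)


rng = np.random.default_rng(11)
n = 200
x = rng.normal(2.0, 1.0, n)
y = 1.0 + 0.7 * x + rng.normal(0.0, 1.0, n)
observed = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-0.5 * x))  # MAR given x
draws = data_augmentation(x, y, observed)
X_r = np.column_stack([np.ones(observed.sum()), x[observed]])
beta_cc = np.linalg.lstsq(X_r, y[observed], rcond=None)[0]
print(draws.mean(axis=0), beta_cc)
```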

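Simulation study: simulation setup
$B = 2{,}000$ simulation samples of size $n = 200$ are generated with
$$x_i \sim N(2, 1), \qquad y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma_{ee}), \qquad z_i \mid (x_i, y_i) \sim \mathrm{Bernoulli}(p_i),$$
where $(\beta_0, \beta_1) = (1, 0.7)$, $\sigma_{ee} = 1$,
$$p_i = \frac{\exp(\psi_0 + \psi_1 x_i + \psi_2 y_i)}{1 + \exp(\psi_0 + \psi_1 x_i + \psi_2 y_i)},$$
and $(\psi_0, \psi_1, \psi_2) = (-3, 0.5, 0.7)$.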

Simulation study: simulation setup
$x_i$ is always observed. $y_i$ is observed if $\delta_{1i} = 1$ and missing if $\delta_{1i} = 0$, where $\delta_{1i} \sim \mathrm{Bernoulli}(\pi_{1i})$ and
$$\pi_{1i} = \frac{\exp(\phi_0 + \phi_1 x_i)}{1 + \exp(\phi_0 + \phi_1 x_i)}$$
with $(\phi_0, \phi_1) = (0, 0.5)$. $z_i$ is observed if $\delta_{2i} = 1$ and missing if $\delta_{2i} = 0$, where $\delta_{2i} \sim \mathrm{Bernoulli}(0.7)$. The response mechanism is missing at random.
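To make the design concrete, a sketch that draws one simulated sample; repeating it $B$ times would give the Monte Carlo replicates. It assumes $\psi_0 = -3$ and treats $\sigma_{ee}$ as a variance (it equals 1 here, so the distinction does not matter); the function names are illustrative. The full $(y_i, z_i)$ are returned together with the response indicators, so the observed data are $y_i$ where `delta1` is true and $z_i$ where `delta2` is true.

```python
import numpy as np


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def generate_sample(n=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    beta0, beta1, sigma_ee = 1.0, 0.7, 1.0
    psi = (-3.0, 0.5, 0.7)  # assumed sign of psi_0
    phi = (0.0, 0.5)
    x = rng.normal(2.0, 1.0, size=n)
    y = rng.normal(beta0 + beta1 * x, np.sqrt(sigma_ee))  # sigma_ee used as a variance
    z = (rng.uniform(size=n) < expit(psi[0] + psi[1] * x + psi[2] * y)).astype(int)
    delta1 = rng.uniform(size=n) < expit(phi[0] + phi[1] * x)  # y observed if True
    delta2 = rng.uniform(size=n) < 0.7  # z observed if True (MCAR)
    return x, y, z, delta1, delta2


x, y, z, delta1, delta2 = generate_sample(200, np.random.default_rng(2010))
print(delta1.mean(), delta2.mean(), z.mean())
```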

Simulation study: fractional imputation
1. Initial imputation:
(a) Fit a parametric model $f_1(y \mid x; \theta_1)$ for the conditional distribution of $y$ given $x$ among the respondents of $y$. In this simulation setup, we use the model
$$y_i \mid (x_i, \delta_{1i} = 1) \sim N(\beta_0 + \beta_1 x_i, \sigma_{ee})$$
with $\theta_1 = (\beta_0, \beta_1, \sigma_{ee})$.
(b) Estimate $\theta_1$ using the sample with $\delta_{1i} = 1$ only.
(c) For each unit with $\delta_{1i} = 0$, generate $M$ imputed values of $y_i$, say $y_i^{*(1)}, \dots, y_i^{*(M)}$, from the estimated density $f_1(y \mid x_i; \hat{\theta}_1)$.

Simulation study: fractional imputation
1. Initial imputation (cont'd):
(d) Similarly, fit a parametric model $f_2(z \mid x, y; \theta_2)$ for the conditional distribution of $z$ given $x$ and $y$ among the $z$-respondents.
(e) Estimate $\theta_2$ using the sample with $\delta_{2i} = 1$ only.
(f) For each unit with $\delta_{2i} = 0$, generate $M$ imputed values of $z_i$ by $z_i^{*(j)} \sim f_2(z \mid x_i, y_i^{*(j)}; \hat{\theta}_2)$.

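Simulation study: fractional imputation
2. Fractional weighting (E-step): for the current parameter estimate $\hat{\theta}^{(t)} = (\hat{\theta}_{1(t)}, \hat{\theta}_{2(t)})$, where $\hat{\theta}_{1(t)} = (\hat{\beta}_{0(t)}, \hat{\beta}_{1(t)}, \hat{\sigma}_{ee(t)})$ and $\hat{\theta}_{2(t)} = (\hat{\psi}_{0(t)}, \hat{\psi}_{1(t)}, \hat{\psi}_{2(t)})$, compute the fractional weights associated with the imputed values. The fractional weight associated with $(y_i^{*(j)}, z_i^{*(j)})$ is
$$w^*_{ij(t)} \propto \frac{f_1\left(y_i^{*(j)} \mid x_i; \hat{\theta}_{1(t)}\right) f_2\left(z_i^{*(j)} \mid x_i, y_i^{*(j)}; \hat{\theta}_{2(t)}\right)}{f_1\left(y_i^{*(j)} \mid x_i; \hat{\theta}_{1(0)}\right) f_2\left(z_i^{*(j)} \mid x_i, y_i^{*(j)}; \hat{\theta}_{2(0)}\right)}, \qquad (2)$$
normalized so that $\sum_{j=1}^{M} w^*_{ij(t)} = 1$, where $f_1(y \mid x; \theta_1) f_2(z \mid x, y; \theta_2)$ is the joint density of $(y, z)$ given $x$. In (2), it is understood that $y_i^{*(j)} = y_i$ if $\delta_{1i} = 1$ and $z_i^{*(j)} = z_i$ if $\delta_{2i} = 1$.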
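A sketch of weight formula (2) for this model, computed on the log scale for numerical stability. The array layout is an assumption: `y_star` and `z_star` hold $M$ columns per unit, with the observed value repeated when the item is observed, so fully observed units automatically receive weights $1/M$. The parameter tuples and names are illustrative; the last lines compute $\hat{\eta}_1 = n^{-1}\sum_i\sum_j w^*_{ij} y_i^{*(j)}$ from the weights.

```python
import numpy as np


def log_f1(y, x, beta0, beta1, sigma_ee):
    # log N(y; beta0 + beta1 x, sigma_ee), with sigma_ee treated as a variance
    return -0.5 * np.log(2 * np.pi * sigma_ee) - (y - beta0 - beta1 * x) ** 2 / (2 * sigma_ee)


def log_f2(z, x, y, psi0, psi1, psi2):
    # Bernoulli log-likelihood with a logit link; logaddexp avoids overflow
    eta = psi0 + psi1 * x + psi2 * y
    return z * eta - np.logaddexp(0.0, eta)


def fractional_weights_eq2(x, y_star, z_star, theta_t, theta_0):
    """x: (n,); y_star, z_star: (n, M); theta = ((beta0, beta1, sigma_ee), (psi0, psi1, psi2))."""
    (th1_t, th2_t), (th1_0, th2_0) = theta_t, theta_0
    xx = x[:, None]
    log_num = log_f1(y_star, xx, *th1_t) + log_f2(z_star, xx, y_star, *th2_t)
    log_den = log_f1(y_star, xx, *th1_0) + log_f2(z_star, xx, y_star, *th2_0)
    r = log_num - log_den
    w = np.exp(r - r.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n, M = 5, 4
x = rng.normal(2.0, 1.0, n)
y_star = rng.normal(2.4, 1.0, (n, M))
y_star[0] = 1.5  # unit 0: y observed, repeated across j
z_star = rng.integers(0, 2, (n, M))
z_star[0] = 1  # unit 0: z observed, repeated across j
th0 = ((1.0, 0.7, 1.0), (-3.0, 0.5, 0.7))
tht = ((1.1, 0.6, 1.2), (-2.5, 0.4, 0.8))
w = fractional_weights_eq2(x, y_star, z_star, tht, th0)
eta1_hat = float(np.mean(np.sum(w * y_star, axis=1)))
print(w.round(3), eta1_hat)
```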

Simulation study: fractional imputation
3. Update parameter estimates (M-step): using the current fractional weights, compute the maximum likelihood estimator $\hat{\theta}^{(t+1)}$ by solving the imputed score equation
$$S^*_{(t)}(\theta) \equiv \sum_{i=1}^{n}\sum_{j=1}^{M} w^*_{ij(t)}\, S\left(\theta; x_i, y_i^{*(j)}, z_i^{*(j)}\right) = 0.$$
The solution to this equation can be obtained with existing software (e.g., weighted linear and logistic regression routines).

Simulation study: fractional imputation
Using the final fractional weights after convergence, we can estimate $\eta_g = E\{g(x, y, z)\}$ by
$$\hat{\eta}_g = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{M} w^*_{ij}\, g\left(x_i, y_i^{*(j)}, z_i^{*(j)}\right),$$
where $y_i^{*(j)} = y_i$ if $\delta_{1i} = 1$ and $z_i^{*(j)} = z_i$ if $\delta_{2i} = 1$. For fractional imputation with $M = 10$, the calibration method is used with an initial imputation of size $M_1 = 100$. The following parameters were considered in the simulation study:
1. $\eta_1 = E(y)$
2. $\eta_2 = \Pr(y < 3)$

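Simulation study: result
Table 1. Monte Carlo bias and variance of the point estimators.

| Parameter | Estimator | Bias | Variance | Standardized variance (complete sample = 100) |
|---|---|---|---|---|
| $\eta_1$ | Complete sample | 0.00 | 0.00739 | 100 |
| | FI ($M = 100$) | 0.00 | 0.00925 | 125 |
| | MI ($M = 100$) | 0.00 | 0.01058 | 143 |
| | FI ($M = 10$) | 0.00 | 0.00952 | 129 |
| | MI ($M = 10$) | 0.00 | 0.01081 | 146 |
| $\eta_2$ | Complete sample | 0.00 | 0.00108 | 100 |
| | FI ($M = 100$) | 0.00 | 0.00117 | 108 |
| | MI ($M = 100$) | 0.00 | 0.00114 | 106 |
| | FI ($M = 10$) | 0.00 | 0.00117 | 108 |
| | MI ($M = 10$) | 0.00 | 0.00116 | 107 |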

Simulation study: result
Table 2. Monte Carlo relative bias of the variance estimator.

| Parameter | Imputation | Relative bias (%) |
|---|---|---|
| $V(\hat{\eta}_1)$ | FI ($M = 100$) | 3.7 |
| | MI ($M = 100$) | 4.3 |
| | FI ($M = 10$) | 2.7 |
| | MI ($M = 10$) | 4.2 |
| $V(\hat{\eta}_2)$ | FI ($M = 100$) | 3.9 |
| | MI ($M = 100$) | 15.6 |
| | FI ($M = 10$) | 2.3 |
| | MI ($M = 10$) | 15.5 |

Simulation study: result
For the estimation of $E(y)$, fractional imputation is more efficient. For the estimation of proportions, multiple imputation is slightly more efficient; however, multiple imputation provides biased variance estimates for proportions. Note that we are not using the MLE for $\eta_2$: under complete response,
$$\hat{\eta}_2 = \frac{1}{n}\sum_{i=1}^{n} I(y_i < 3)$$
is less efficient than the maximum likelihood estimator. The MI variance estimator is justified only for MLEs.

Conclusion
Fractional imputation is proposed as a tool for computing the conditional expectation of any function of the original data given the observed data. Fractional imputation can be used to implement the Monte Carlo EM algorithm in a computationally efficient manner. Variance estimation based on Taylor linearization is also covered. Further details can be found in Kim (2010, Biometrika, tentatively accepted).

Future research
- Survey sampling applications (with multivariate missing data)
- Nonparametric or semi-parametric fractional imputation
- Measurement error models
- Random effect models (with possible applications in small area estimation)
- More theoretical work on inference (Wilks's theorem?)