Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics and Big Data Renmin University of China August 28, 2018 Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 1 / 39

Introduction Copula Model -- Flexible modelling of multivariate distributions -- Popular in business, economics, finance, etc. Missing Data We unitethem for the first time in the literature. Treatment Effect -- Common problem in all fields -- Deep literature in statistics -- Binary phenomenon (to observe or not to observe) -- Estimate unobserved outcome -- Central topic in econometrics Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 2 / 39

Introduction How to fit copula models when there are missing data? A naïve approach is listwise deletion (LD): 1 Keep individuals with all d components being observed, and discard all other individuals. 2 Treat the individuals with complete data in an equal way. i= 1 i= 2 i= 3 i= 4 i= N Y 11 Y 12 Y 13 Y 14 Y 1N Y 21 Y 22 Y 23 Y 24 Y 2N Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 3 / 39

Introduction LD leads to a consistent estimator for the copula parameter of interest if the missing mechanism is missing completely at random (MCAR). LD leads to an inconsistent estimator if the missing mechanism is missing at random (MAR). Under MAR, target variables Y i = [Y 1i,..., Y di ] and their missing status are independent of each other given observed covariates X i = [X 1i,..., X ri ]. LD treats individuals with complete data all equally, and it does not use the information of X i. That can cause substantial bias under MAR. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 4 / 39

Introduction How to obtain a consistent estimator for the copula parameter when the missing mechanism is MAR? A key step is the estimation of propensity score function (i.e. conditional probability of observing data given covariates). Direct estimation of propensity score is notoriously challenging, whether it is performed parametrically or nonparametrically. Parametric approaches are haunted by misspecification problems, while nonparametric approaches are often unstable. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 5 / 39

Introduction We apply Chan, Yam, and Zhang s (2016) calibration estimation to a missing data problem for the first time in the literature. The calibration estimator for the copula parameter satisfies consistency and asymptotic normality under some assumptions including i.i.d. data and the MAR condition. We also derive a consistent estimator for the asymptotic covariance matrix. Our simulation results indicate that the calibration estimator dominates listwise deletion, parametric approach, and nonparametric approach. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 6 / 39

Table of Contents 1) Introduction 2) Review of Copula Models 3) Review of Missing Data 4) Set-up of Main Problem 5) Theory of Calibration Estimation 6) Data-Driven Selection of Tuning Parameter K 7) Monte Carlo Simulations 8) Conclusions Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 7 / 39

Review of Copula Models Suppose that there are N individuals and d components: Y i = [Y 1i, Y 2i,..., Y di ] (i = 1,..., N). Suppose that we want to estimate the d-dimensional joint distribution of Y i, assuming i.i.d. What can we do? There are potential problems about estimating the joint distribution directly. d = small d = large Parametric Misspecification Misspecification Parameter proliferation Nonparametric (Curse of dimensionality) Curse of dimensionality Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 8 / 39

Review of Copula Models Copula models accomplish flexible specification with a small number of parameters. Copula models follow a two-step procedure. Step 1: Model the marginal distribution of each of the d components separately. Step 2: Combine the d marginal distributions to recover a joint distribution. Step 1 Parametric Nonparametric Nonparametric Step 2 Parametric Parametric Nonparametric Name of model Parametric copula model Semiparametric copula model (Target of our paper) Nonparametric copula model Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 9 / 39

Review of Copula Models The copula approach is justified by Sklar s (1959) theorem. Sklar s theorem ensures the existence of a unique copula function C : (0, 1) d (0, 1) that recovers a true joint distribution. Theorem (Sklar, 1959) Let {Y i } be i.i.d. random vectors with a joint distribution F : R d (0, 1). Assume that the marginal distribution of Y ji, written as F j : R (0, 1), is continuous for j {1,..., d}. Then, there exists a unique function C : (0, 1) d (0, 1) such that F (y 1,..., y d ) = C (F 1 (y 1 ),..., F d (y d )), or in terms of probability density functions, f(y 1,..., y d ) = c (f 1 (y 1 ),..., f d (y d )). Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 10 / 39

Review of Missing Data Suppose that {Y 1i,..., Y di } N i=1 are target variables. Define the missing indicator: { 1 if Y ji is observed, T ji = 0 if Y ji is missing. Suppose that {X 1i,..., X mi } N i=1 are observable covariates. Missing Completely at Random (MCAR): Missing at Random (MAR): {T 1i,..., T di } {Y 1i,..., Y di }. {T 1i,..., T di } {Y 1i,..., Y di } {X 1i,..., X ri }. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 11 / 39

Review of Missing Data An illustrative example on health survey (d = r = 1): Y 1i = weight of individual i; X 1i = I(Individual i is female). MCAR requires that P[Individual i reports his/her weight] should be independent of both weight and gender of individual i. MAR requires that: P[A man reports his weight] should be independent of his weight. P[A woman reports her weight] should be independent of her weight. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 12 / 39

Review of Missing Data Probably too restrictive, since men and women may have different willingness to report their weights. MCAR MNAR MAR More plausible than MCAR, since MAR controls for gender. MAR may be still restrictive, since men (or women) with different weights may have different willingness to report their weights. MNAR is most general since it controls for both gender and weight, but MNAR is hard to handle technically. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 13 / 39

Review of Missing Data It is well known that listwise deletion (LD) leads to consistent inference under MCAR. It is also well known that LD leads to inconsistent inference under MAR. Correct inference under MAR has been extensively studied since the seminal work of Rubin (1976). The present paper assumes MAR and elaborates the estimation of semiparametric copula models, which has not been done in the literature. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 14 / 39

Set-up of Main Problem Semiparametric copula models are estimated in two steps: Step 1: Estimate the marginal distributions {F 1,..., F d } nonparametrically via F j (y) = P(Y ji y) = E[I(Y ji y)]. Step 2: Estimate the true copula parameter θ 0 via θ 0 = arg max θ Θ E [log c(f 1(Y 1i ),..., F d (Y di ); θ)]. If data were all observed, then we could simply replace the population quantities with sample counterparts. When data are Missing at Random, we need to assign some weights based on propensity score functions. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 15 / 39

Set-up of Main Problem Define propensity score functions: π j (x) = P(T ji = 1 X i = x), Define p j (x) = 1/π j (x). Step 1 is rewritten as j {1,..., d}. F j (y) = E[I(Y ji y)] = E[E[I(Y ji y) X i ]] ( LIE) [ [ ] ] Tji = E E π j (X i ) X i E [I(Y ji y) X i ] [ [ ]] Tji = E E π j (X i ) I(Y ji y) X i ( MAR) [ ] Tji = E π j (X i ) I(Y ji y) ( LIE) = E[T ji p j (X i ) I(Y ji y)]. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 16 / 39

Set-up of Main Problem We have derived: F j (y) = E[I(T ji = 1) p j (X i ) I(Y ji y)]. Horvitz and Thompson s (1952) inverse probability weighting (IPW) estimator for F j is written as F j (y) = 1 N N I(T ji = 1)p j (X i )I(Y ji y). i=1 If p j (x) were known, then it would be straightforward to compute the IPW estimator. p j (x) is unknown in reality. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 17 / 39

Set-up of Main Problem Define a propensity score function: Define q(x) = 1/η(x). η(x) = P(T 1i = 1,..., T di = 1 X i = x). Using MAR and LIE, Step 2 is rewritten as θ 0 = arg max θ Θ E [I(T 1i = 1,..., T di = 1)q(X i ) log c(f 1 (Y 1i ),..., F d (Y di ); θ)]. The IPW estimator for θ 0 is given by 1 θ = arg max θ Θ N N I(T 1i = 1,..., T di = 1)q(X i ) log c( F 1 (Y 1i ),..., F d (Y di ); θ). i=1 q(x) is unknown in reality. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 18 / 39

Set-up of Main Problem Direct estimation of propensity score functions is challenging, whether it is performed parametrically or nonparametrically. In the treatment effect literature, Chan, Yam, and Zhang (2016) propose an alternative approach that bypasses a direct estimation of propensity score. They construct calibration weights by balancing the moments of observed covariates among treatment, control, and whole groups. The present paper applies their method to a missing data problem for the first time. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 19 / 39

Theory of Calibration Estimation Under MAR, the moment matching condition holds: E [ I(T ji = 1)p j,kj (X i )u Kj (X i ) ] = E[u Kj (X i )], j {1,... d} for any integrable function u Kj : R r R K j called an approximation sieve. A common choice is, say, u Kj (X i ) = [1, X i, X 2 i, X 3 i ] (r = 1, K j = 4). A sample counterpart is written as 1 N N I(T ji = 1) p j,kj (X i ) u Kj (X i ) = 1 N i=1 N u Kj (X i ). Multiple values of {p j,kj (X 1 ),..., p j,kj (X N )} satisfy the moment matching condition. Among them, we choose the one closest to a uniform weight given some distance measure. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 20 / 39 i=1

Theory of Calibration Estimation Why do we want the uniform weight? 1 If there are no missing data, then the uniform weight leads to a natural estimator ˆF j (y) = (N + 1) 1 N i=1 I(Y ji y). 2 It is well known that volatile weights cause instability in the Horvitz-Thompson IPW estimator. Let ρ : R R be any strictly concave function. Define a concave objective function: G j,kj (λ) = 1 N Compute N [ I(Tji = 1)ρ(λ u Kj (X i )) λ u Kj (X i ) ], λ R Kj. i=1 ˆλ j,kj = arg max λ G j,k j (λ). Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 21 / 39

Theory of Calibration Estimation Compute calibration weights for marginal distributions: ˆp j,kj (X i ) = ρ (ˆλ j,k j u Kj (X i )). Estimate the marginal distribution of the j-th component by ˆF j (y) = 1 N N I(T ji = 1)ˆp j,kj (X i )I(Y ji y). i=1 Step 2 (maximum likelihood) can be handled in the same way. We put calibration weights ˆq K (X i ) on individual log-likelihoods. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 22 / 39

Theory of Calibration Estimation Theorem (Consistency and Asymptotic Normality) Impose a set of assumptions including what follows: The missing mechanism is missing at random (MAR). {Y i, T i, X i } are i.i.d. across individuals i {1,..., N}. π 1 ( ),..., π d ( ), and η( ) are sufficiently smooth. K j (N) as N, and the rate of divergence is sufficiently slow. (The same assumption applies for K(N).) Then, consistency and asymptotic normality follow. p 1 ˆθ θ0 as N. 2 N( ˆθ θ0 ) d N(0, V ) as N. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 23 / 39

Theory of Calibration Estimation The asymptotic covariance matrix V is expressed as V = B 1 ΣB 1. We can construct consistent estimators for B and Σ, and hence ˆV = ˆB 1 ˆΣ ˆB 1 p V. See the main paper for a complete set of assumptions, proofs of the consistency and asymptotic normality, and the construction of V and ˆV. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 24 / 39

Data-Driven Selection of K A practical question is how to select {K 1,..., K d, K}, the dimensions of approximation sieves. Focus on K j for concreteness. We select an optimal Kj from a viewpoint of covariate balancing. A natural estimator of the joint distribution of X, based on the whole group of individuals, is ˆF X (x) = 1 N N I(X 1i x 1,..., X ri x r ). i=1 An alternative estimator based on the observed group of individuals is N ˆF X,Kj (x) = T jiˆp j,kj (X i ) I(X 1i x,..., X ri x r ). i=1 Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 25 / 39

Data-Driven Selection of K On one hand, we should select K j that makes ˆF X and ˆF X,Kj as close as possible. On the other hand, we should impose a penalty against raising K j in order to keep parsimony. Hence we propose to select K j {1, 2,..., K} that minimizes a penalized L 2 -distance: d Kj ( ˆF X, ˆF X,Kj ) = ] 2 [ 1 N N i=1 ˆFX (X i ) ˆF (X X,Kj i) ( 1 K 2 j /N ) 2. The proposed approach is analogous to the generalized cross-validation of Li and Racine (2007, Sec. 15.2). Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 26 / 39

Monte Carlo Simulations: DGP Target variables are Y i = [Y 1i, Y 2i ] (d = 2). Consider a scalar covariate X i. Define U i = [U 1i, U 2i, U 3i ] = [F 1 (Y 1i ), F 2 (Y 2i ), F X (X i )]. F 1 ( ) is the marginal distribution of Y 1i, and we use N(0, 1). F 2 ( ) is the marginal distribution of Y 2i, and we use N(0, 1). F X ( ) is the marginal distribution of X i, and we use N(0, 1). The inverse distribution functions F 1 1 ( ), F 1 2 ( ), and F 1 X ( ) are known and tractable. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 27 / 39

Monte Carlo Simulations: DGP Step 1: Draw U i i.i.d. Gumbel 3 (γ 0 ) with γ 0 = 4 (Kendall s τ = 0.75). Step 2: Recover Y 1i = F1 1 (U 1i ), Y 2i = F2 1 (U 2i ), and X i = F 1 X (U 3i). Step 3: Assume that {Y 11,..., Y 1N } are all observed. Make some of {Y 21,..., Y 2N } missing according to P(T 2i = 1 X i = x i ) = 1 1 + exp[a + bx i ]. We use (a, b) = ( 0.420, 0.400), which implies MAR with E[T 2i ] = 0.6. Step 4: Repeat Steps 1-3 J = 1000 times with sample size N {250, 500}. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 28 / 39

Monte Carlo Simulations: Estimation Approach #1: Listwise deletion. Step 1: Estimate the marginal distribution of the j-th component by ˆF j (y) = 1 N + 1 N I(T 1i = 1, T 2i = 1)I(Y ji < y), i=1 where N = N i=1 I(T 1i = 1, T 2i = 1) is the number of individuals with complete data. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 29 / 39

Monte Carlo Simulations: Estimation Step 2: Compute the maximum likelihood estimator ˆγ by max γ (1, ) where 1 N N i=1 ( I(T 1i = 1, T 2i = 1) log c 2 ˆF1 (Y 1i ), ˆF ) 2 (Y 2i ); γ, { 2 } 1 γ c 2 (u 1, u 2 ; γ) = exp ( log u k ) γ k=1 is the probability density function of Gumbel 2 (γ). This likelihood function is correctly specified since any bivariate marginal distribution of Gumbel 3 (γ) is Gumbel 2 (γ). Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 30 / 39

Monte Carlo Simulations: Estimation Approach #2: Parametric estimation. Consider a correctly specified model for the propensity score: We estimate (a, b) via π 2 (x; a, b) = 1 1 + exp(a + bx). max N [T 2i log π 2 (X i ; a, b) + (1 T 2i ) log (1 π 2 (X i ; a, b))]. i=1 Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 31 / 39

Monte Carlo Simulations: Estimation Compute ˆp 2 (X i ) = ˆq(X i ) = Estimate marginal distributions by 1 π 2 (X i ; â, ˆb). ˆF j (y) = 1 N N I(T ji = 1)ˆp j (X i )I(Y ji < y). i=1 Estimate the copula parameter γ via max γ (1, ) 1 N N i=1 ( I(T 1i = 1, T 2i = 1)ˆq(X i ) log c 2 ˆF1 (Y 1i ), ˆF ) 2 (Y 2i ); γ. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 32 / 39

Monte Carlo Simulations: Estimation For comparison, consider a misspecified model: π 2 (x; b) = 1 1 + exp(bx). This model is misspecified since the true value is a = 0.420 0. The remaining procedure is the same. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 33 / 39

Monte Carlo Simulations: Estimation Approach #3: Nonparametric estimation of Hirano, Imbens, and Ridder (2003). Define the approximation sieve: u K (X i ) = [1, X i,..., X K 1 i ]. Define 1 π 2K (X i ; λ) = 1 + exp[ λ u K (X i )]. Estimate λ via max N [T 2i log π 2K (X i ; λ) + (1 T 2i ) log (1 π 2K (X i ; λ))]. i=1 Compute ˆp 2K (X i ) = ˆq K (X i ) = 1/π 2K (X i ; ˆλ). We use K {3, 4} and the data-driven selection of K with upper bound K = 5. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 34 / 39

Monte Carlo Simulations: Estimation Approach #4: Calibration estimation. Calibration weights are computed with Exponential Tilting: ρ(v) = exp( v). We use K {3, 4} and the data-driven selection of K with upper bound K = 5. The procedure is as explained. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 35 / 39

Monte Carlo Simulations: Results N = 250 N = 500 Truth: γ 0 = 4.000 Bias, Stdev, RMSE Bias, Stdev, RMSE Listwise Deletion -0.243, 0.348, 0.424-0.256, 0.244, 0.353 Param (Correct) -0.313, 0.323, 0.450-0.153, 0.223, 0.271 Param (Misspec) -2.610, 0.204, 2.618-2.620, 0.124, 2.623 Nonparam (K = 3) -0.318, 0.315, 0.448-0.165, 0.233, 0.285 Nonparam (K = 4) -0.645, 0.923, 1.126-0.682, 1.081, 1.278 Nonparam (CB) -0.323, 0.308, 0.446-0.140, 0.224, 0.264 Calibration (K = 3) -0.140, 0.324, 0.353-0.093, 0.234, 0.252 Calibration (K = 4) -0.129, 0.342, 0.366-0.094, 0.243, 0.260 Calibration (CB) -0.142, 0.333, 0.362-0.100, 0.240, 0.260 Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 36 / 39

Monte Carlo Simulations: Results Listwise deletion produces bias under MAR as expected. The parametric approach produces enormous bias if a propensity score model is misspecified. The nonparametric approach exhibits considerable instability across K. The proposed calibration approach achieves strikingly sharp and stable performance. The data-driven selection of K performs well for both the nonparametric and calibration estimators. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 37 / 39

Conclusions We investigate the estimation of semiparametric copula models under missing data for the first time in the literature. There is analogy between missing data and average treatment effects since both of them are binary in nature. We apply Chan, Yam, and Zhang s (2016) calibration estimation to missing data for the first time in the literature. The calibration estimator satisfies consistency and asymptotic normality under the i.i.d. assumption and the Missing at Random (MAR) condition. Our simulation results indicate that the calibration estimator dominates listwise deletion, parametric approach, and nonparametric approach. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 38 / 39

References Chan, K. C. G., S. C. P. Yam, and Z. Zhang (2016). Globally Efficient Non-Parametric Inference of Average Treatment Effects by Empirical Balancing Calibration Weighting. Journal of the Royal Statistical Society, Series B, 78, 673-700. Hirano, K., G. W. Imbens, and G. Ridder (2003). Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score. Econometrica, 71, 1161-1189. Horvitz, D. G. and D. J. Thompson (1952). A Generalization of Sampling without Replacement from a Finite Universe. Journal of the American Statistical Association, 47, 663-685. Li, Q. and J. S. Racine (2007). Nonparametric Econometrics: Theory and Practice. Princeton University Press. Rubin, D. B. (1976). Inference and Missing Data. Biometrika, 63, 581-592. Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l Institut de Statistique de L Université de Paris, 8, 229-231. Hamori, Motegi & Zhang (Kobe & RUC) Copula Models with Data MAR August 28, 2018 39 / 39