Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach

Similar documents
Dependence Calibration in Conditional Copulas: A Nonparametric Approach

Statistical testing of covariate effects in conditional copula models

Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification. Jialin Zou

Conditional Copula Models for Right-Censored Clustered Event Time Data

Bayesian Inference for Conditional Copula models with Continuous and Binary Responses

Bayesian Inference for Conditional Copula models

arxiv: v3 [stat.me] 25 May 2017

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

On the Choice of Parametric Families of Copulas

STAT Section 2.1: Basic Inference. Basic Definitions

Efficient estimation of a semiparametric dynamic copula model

Chapter 2: Resampling Maarten Jansen

Gaussian Process Vine Copulas for Multivariate Dependence

Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories

Estimation of multivariate critical layers: Applications to rainfall data

Nonparametric Econometrics

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA

Economics 583: Econometric Theory I A Primer on Asymptotics

STAT5044: Regression and Anova

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET

Confidence intervals for kernel density estimation

Chapter 4: Asymptotic Properties of the MLE (Part 2)

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

Nonparametric Covariate Adjustment for Receiver Operating Characteristic Curves

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

A Goodness-of-fit Test for Copulas

Simulating Exchangeable Multivariate Archimedean Copulas and its Applications. Authors: Florence Wu Emiliano A. Valdez Michael Sherris

Statistics: Learning models from data

UNIVERSITÄT POTSDAM Institut für Mathematik

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Using Estimating Equations for Spatially Correlated A

Balancing Covariates via Propensity Score Weighting

Nonparametric Regression

BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1

Econometrics I, Estimation

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Estimation of Treatment Effects under Essential Heterogeneity

Empirical Likelihood Ratio Confidence Interval Estimation of Best Linear Combinations of Biomarkers

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

A nonparametric method of multi-step ahead forecasting in diffusion processes

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

Function of Longitudinal Data

University of California, Berkeley

Outline of GLMs. Definitions

Interval Estimation for AR(1) and GARCH(1,1) Models

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

Tests of independence for censored bivariate failure time data

New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data

The Nonparametric Bootstrap

Better Bootstrap Confidence Intervals

Statistics 3858 : Maximum Likelihood Estimators

Local Polynomial Modelling and Its Applications

Introduction to Estimation Methods for Time Series models. Lecture 1

Package CopulaRegression

A better way to bootstrap pairs

Statistica Sinica Preprint No: SS

Lecture 3 September 1

Estimation of cumulative distribution function with spline functions

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Nonparametric Modal Regression

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Balancing Covariates via Propensity Score Weighting: The Overlap Weights

Tobit and Interval Censored Regression Model

NONPARAMETRIC MIXTURE OF REGRESSION MODELS

Calibration Estimation for Semiparametric Copula Models under Missing Data

Effective Computation for Odds Ratio Estimation in Nonparametric Logistic Regression

0.1 The plug-in principle for finding estimators

1 One-way analysis of variance

A Bootstrap Test for Conditional Symmetry

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Econometric Analysis of Cross Section and Panel Data

POLI 8501 Introduction to Maximum Likelihood Estimation

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Introductory Econometrics

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Nonparametric Estimation of Distributions in a Large-p, Small-n Setting

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

The EM Algorithm for the Finite Mixture of Exponential Distribution Models

First steps of multivariate data analysis

interval forecasting

On a Nonparametric Notion of Residual and its Applications

ON USING BOOTSTRAP APPROACH FOR UNCERTAINTY ESTIMATION

Problem Selected Scores

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich

ECO Class 6 Nonparametric Econometrics

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

3 Joint Distributions 71

Estimation of Quantiles

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes

Financial Econometrics and Volatility Models Copulas

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

Chapter 10. U-statistics Statistical Functionals and V-Statistics

Transcription:

1 Web-based Supplementary Material for Dependence Calibration in Conditional Copulas: A Nonparametric Approach Elif F. Acar, Radu V. Craiu, and Fang Yao Web Appendix A: Technical Details The score and hessian functions The score vector L(β,x) and hessian matrix 2 L(β,x) used in the Newton-Raphson algorithm are given by, denoting x i,x = (1,X i x,...,(x i x) p ) T, L(β,x) = L(β,x,p,h) β n { } = l(g 1 (x T i,x β β),u 1i,U 2i ) K h (X i x) i n = l (g 1 (x T i,x β),u 1i,U 2i ) [(g 1 ) (x T i,x β)] x i,x K h (X i x), i 2 L(β,x) = 2 L(β,x,p,h) β 2 n = {l (g 1 (x Ti,xβ),U 1i,U 2i )[(g 1 ) (x Ti,xβ)] } x i,x K h (X i x) β i n { = l (g 1 (x T i,x β),u 1i,U 2i ) [(g 1 ) (x T i,x β)]2 i=1 (1) + l (g 1 (x T i,x β),u 1i,U 2i ) (g 1 ) (x T i,x β) } x i,x x T i,x K h(x i x). (2)

2 Biometrics, 000 0000 Derivations of Asymptotic Bias and Variance Let H = diag{1,h,...,h p }, W = diag{k h (X 1 x),...,k h (X n x)}, 1 X 1 x (X 1 x) p 1 X 2 x (X 2 x) p X x =.... 1 X n x (X n x) p Define(p+1) (p+1)matricess n = n i=1 x i,xx T i,x K h(x i x)ands n = n i=1 x i,xx T i,x K2 h (X i x), with entries S n,j = n i=1 (X i x) j K h (X i x) and S n,j = n i=1 (X i x) j K 2 h (X i x). The proofs of Theorem 1 and Corollary 1 rely on the following Taylor expansion 0 = L(ˆβ,x) L(β,x)+ 2 L(β,x){ˆβ β}, where β = (β 0,β 1,...,β p ) T is the vector of true local parameters and ˆβ = (ˆβ 0, ˆβ 1,..., ˆβ p ) T is the local likelihood estimator, yielding ˆβ β ( 2 L(β,x) ) 1 L(β,x). Hence, the leading terms in the asymptotic conditional bias and variance can be written as E(ˆβ X) β E ( 2 L(β,x) X ) 1 E( L(β,x) X), (3) Var(ˆβ X) E ( 2 L(β,x) X ) 1 Var( L(β,x) X) E ( 2 L(β,x) X ) 1. (4) In the following we approximate three terms appearing in (3) and (4). The expectation of the kernel weighted local score function is n E( L(β,x) X) = E{ l (g 1 (x T i,x β),u 1i,U 2i ) X} [(g 1 ) (x T i,x β)] x i,x K h (X i x). i=1 Since, for each i = 1,2,...,n, we have (U 1i,U 2i ) X i C(u 1,u 2 g 1 (η(x i ))), from the first order Bartlett s identity we have 0 = E{ l (g 1 (η(x i )),U 1i,U 2i ) X} = E{ l (g 1 (x T i,x β +r i),u 1i,U 2i ) X}, where r i is the approximation error made by replacing the calibration function η(x i ) locally

3 by a p th degree polynomial, r i = β j (X i x) j = β p+1 (X i x) p+1 +o p {(X i x) p+1 }. j=p+1 If g 1 is continuous and twice differentiable, then l (g 1 (η(x i )),U 1i,U 2i ) l (g 1 (x T i,x β),u 1i,U 2i ) + [ r i (g 1 ) (x T i,x β) ] l (g 1 (x T i,x β),u 1i,U 2i ), and, after taking conditional expectations, E{l (g 1 (x T i,x β),u 1i,U 2i ) X} [ r i (g 1 ) (x T i,x β) ] E{ l (g 1 (x T i,x β),u 1i,U 2i ) X}. Hence, we obtain E( L(β,x) X) E{ l (g 1 (η(x)),u 1,U 2 ) x} [(g 1 ) (η(x))] 2 X T x Wr, where r = (r 1,r 2,...,r n ) T. Taking conditional expectation of the hessian matrix in (2) yields E ( 2 L(β,x) X ) E ( l (g 1 (η(x)),u 1,U 2 ) x ) [(g 1 ) (η(x))] 2 S n. The variance of the weighted local score function is Var( L(β,x) X) [(g 1 ) (η(x))] 2 Var{ l (g 1 (η(x)),u 1,U 2 ) x} S n. The second order Bartlett s identity implies σ 2 (x) = E ( l (g 1 (η(x)),u 1,U 2 ) x ) = Var{ l (g 1 (η(x)),u 1,U 2 ) x}. Hence, we obtain E(ˆβ X) β S 1 n XT x Wr, Var(ˆβ X) 1 [(g 1 ) (η(x))] 2 σ 2 (x) S 1 n SnS n 1. The approximations S n = nf(x)h S H {1 + o p (1)} and S n = nh 1 f(x)hs H{1 + o p (1)} are outlined in Fan and Gijbels (1996). The entries of X T x Wr can be written as

4 Biometrics, 000 0000 k j = = n r i (X i x) j K h (X i x) i=1 n { i=1 m=p+1 } β m (X i x) m (X i x) j K h (X i x) = β p+1 S n,j+p+1 + o p {nh j+p+1 }. Letting s n = (S n,p+1,s n,p+2,...,s n,2p+1 ) T, we obtain X T x Wr = β p+1 s n + o p {nh p+1 }. Hence, the bias and variance expressions become E(ˆβ X) β H 1 S 1 s p β p+1 h p+1 {1+o p (1)}, Var(ˆβ X) 1 nh f(x) [(g 1 ) (η(x))] 2 σ 2 (x) H 1 S 1 S S 1 H 1 {1+o p (1)}. The result of Theorem 1 is obtained by considering the first entry in the above expressions. Since ˆη(x) η(x) = g(ˆθ(x)) g(θ(x)) g (θ(x)) {ˆθ(x) θ(x)}, we reach the result of Corollary 1 by dividing the asymptotic bias and variance of the local calibration estimator by g (θ(x)) and [g (θ(x))] 2, respectively. Web Appendix B: Simulations Study under the Frank Copula This additional simulation study presents the results for the data {(U 1i,U 2i X i ) : i = 1,2,...,n} generated from the Frank copula under each of the following models: (1) Linear calibration function : η(x) = 25 4.2X (U 1,U 2 ) X C(u 1,u 2 θ = 25 4.2X ) where X Uniform(2,5), (2) Nonlinear calibration function: η(x) = 12+8sin(0.4X 2 ) (U 1,U 2 ) X C(u 1,u 2 θ = 12+8sin(0.4X 2 )) where X Uniform(2,5). The true copula parameter varies from 4 to 16.6 in the linear calibration model, and from 4 to 20 in the nonlinear one. Under each calibration model, we conduct m = 100 experiments with sample size n = 200. The local linear (p = 1) and parametric linear

5 estimation are performed to estimate the copula parameter under the Clayton, Frank and Gumbel copulas, and the results are converted to the Kendall s tau scale. Web Table 1 summarizes the performance of the Kendall s tau estimates in terms of integrated mean square error, integrated square bias and integrated variance. [Web Table 1 about here.] We construct approximate 90% pointwise condence bands for the copula parameter under the correctly selected Frank family, using half of the optimum bandwidth. Web Figure 1 displays the results in the Kendalls tau scale, averaged over 100 Monte Carlo samples. We also present the Monte Carlo based condence bands obtained from 100 estimates of the Kendalls tau. [Web Figure 1 about here.] Our copula selection method successfully identifies the Frank family 95% and 97% of the times under the linear and nonlinear calibration model, respectively. Web Appendix C: Framingham Heart Study We analyze a subset of the Framingham Heart study to illustrate our proposed methodology. The data consist of 348 participants who had stroke incidence during the follow-up study and the clinical data was collected on each participant during three examination periods, approximately 6 years apart. Of interest is the dependence structure between the log-pulse pressures of the first two examination periods, log(pp 1 ) and log(pp 2 ). It is known that chronic malnutrition may occur after stroke, particularly in severe cases that result in eating difficulties. Conversely, some patients may gain weight as a side effect of medication. Therefore, as an indicator of health status, we consider the change in the body mass index ( BMI) as the covariate. For the initial investigation, we categorize the change in BMI, and focus on the three

6 Biometrics, 000 0000 extreme cases, namely quite stable ( 0.2 BMI 0.2), highly increased ( BMI > 2) andhighlydecreased( BMI< 2).FromWebFigure2,whenthereisasignificantchangein BMI in either direction, the dependence between the pulse pressures seems weaker compared to the case when BMI remains stable. [Web Figure 2 about here.] The scatterplot and histograms of the log-pulse pressures are given in Web Figure 3(a), from which the marginals are seen to be well-modeled by linear regression models assuming normal errors. We consider U j = Φ((log(PP j ) ˆµ j ( BMI))/ˆσ j ) to transform the response variables to uniform scale, where ˆµ j ( BMI)) are obtained from fitting linear models, ˆσ j are the estimated standard errors in the linear regression models and Φ is the c.d.f. of N(0,1). Web Figure 3(b) gives the scatterplot and histograms of the transformed random variables U j, j = 1,2. [Web Figure 3 about here.] For comparison, the calibration function is estimated using both a parametric linear model and the proposed nonparametric model with local linear (p = 1) specification under the Clayton, Frank and Gumbel families. In the local linear estimation, the optimum bandwidths are chosen as 9.47, 6.72, and 4.25, for the Clayton, Frank and Gumbel copulas, respectively. We constructed approximate 90% confidence intervals under each family using half of these bandwidth values. Web Figure 4 displays the results converted to the Kendall s tau scale. [Web Figure 4 about here.] A natural interest in modeling the conditional dependence of log-pulse pressures at different periods is to predict the latter from the previous measurement. Therefore, in copula selection, we use the cross-validated prediction errors only for the second log-pulse pressure as the selection criterion which chooses the Frank copula family. Since, at a given change in BMI,

7 observing two high pulse pressures together or two low pulse pressures together would be equally likely, our choice of Frank copula model seems practically sensible. The results obtained using the nonparametric approach under the Frank copula indicates that the dependence between two log-pulse pressures is stronger when BMI remains stable, as reflected by a steady health status. However, the nonlinear pattern does not seem to be significant as the parametric linear fit in fact falls within the approximate 90% pointwise confidence bands, suggesting that the linear model may be already adequate. Web Appendix D: Birth weight dependence in twins - Revisited The results presented in Section 4 are based on parametrically estimated conditional marginals. Here, we demonstrate the use of bootstrapping the raw data to construct confidence bands for Kendall s tau. Specifically, for each bootstrap sample we use kernel based conditional distribution estimation (Hall et al., 1999; Li and Racine, 2008) to estimate the marginals and then transform the responses to the uniform scale, as shown in Web Figure 5. Subsequently, we obtain the local linear estimate of the calibration function under the selected Frank copula model and construct the 90% quantile-based bootstrap interval based on 200 resamples. For comparison, we also plot in Web Figure 6 the asymptotic 90% confidence interval described in Section 2.3. [Web Figure 5 about here.] [Web Figure 6 about here.] References Hall, P., Wolff, R. C. L., and Yao, Q. (1999). Methods for estimating a conditional distribution function. Journal of the American Statistical Association 94, 154 163. Li, Q. and Racine, J. S. (2008). Nonparametric estimation of conditional cdf and quantile functions with mixed categorical and continuous data. Journal of Business & Economic Statistics 26, 423 434.

8 Biometrics, 000 0000 τ(x) 0.0 0.2 0.4 0.6 0.8 1.0 τ(x) 0.0 0.2 0.4 0.6 0.8 1.0 2.0 2.5 3.0 3.5 4.0 4.5 5.0 x 2.0 2.5 3.0 3.5 4.0 4.5 5.0 x Web Figure 1. Simulation study: 90% confidence intervals for the Kendall s tau under the Frank copula with the linear(left panel) and quadratic(right panel) calibration models: truth (solid line), averaged local linear estimates (dashed line), approximate confidence intervals (dotted line), and Monte Carlo confidence intervals (dotdashed line)

9 BMI < 2 0.2 BMI 0.2 BMI > 2 log(pp2) 3.5 4.0 4.5 5.0 log(pp2) 3.5 4.0 4.5 5.0 log(pp2) 3.5 4.0 4.5 5.0 3.0 3.5 4.0 4.5 5.0 log(pp1) 3.0 3.5 4.0 4.5 5.0 log(pp1) 3.0 3.5 4.0 4.5 5.0 log(pp1) Web Figure 2. Framingham data: scatterplots of the log-pulse pressures at different covariate ranges.

10 Biometrics, 000 0000 log(pp2) 3.5 4.0 4.5 U2 0.0 0.2 0.4 0.6 0.8 1.0 3.5 4.0 4.5 log(pp1) (a) 0.0 0.2 0.4 0.6 0.8 1.0 U1 (b) Web Figure 3. Framingham data: histograms and scatterplots of (a) the log-pulse pressures at period 1 and period 2;(b) the marginal distributions of the transformed variables (U 1,U 2 ).

11 Clayton Copula Frank Copula Gumbel Copula τ(x) 0.0 0.2 0.4 0.6 τ(x) 0.0 0.2 0.4 0.6 τ(x) 0.0 0.2 0.4 0.6 4 2 0 2 4 x 4 2 0 2 4 x 4 2 0 2 4 x Web Figure 4. Framingham data: the Kendalls tau estimates under three copula families: global linear estimates (dotdashed line), local linear estimates at the optimum bandwidth(longdashed line), and an approximate 90% pointwise condence bands (dotted lines).

12 Biometrics, 000 0000 U2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 U1 Web Figure 5. Twin data: histograms and scatterplot of the nonparametric estimates of the conditional marginal distributions.

13 τ(x) 0.3 0.4 0.5 0.6 0.7 0.8 30 35 40 x Web Figure 6. Twin data: the Kendall s tau estimates under the Frank copula: local linear estimates at the optimum bandwidth (longdashed line), an approximate 90% asymptotic confidence interval (dotted lines), a 90% bootstrap confidence interval (dotdashed line) based on 200 resamples.

14 Biometrics, 000 0000 Web Table 1 Integrated Squared Bias, Integrated Variance and Integrated Mean Square Error of the Kendall s tau estimator (multiplied by 100) and the averages of the selected bandwidths h. Linear Calibration Model Parametric estimation Local estimation IBIAS 2 IVAR IMSE IBIAS 2 IVAR IMSE h Clayton 8.611 1.125 9.736 8.334 1.514 9.848 2.056 Frank 0.006 0.316 0.322 0.006 0.444 0.450 2.206 Gumbel 1.887 0.478 2.365 1.793 0.746 2.539 2.133 Nonlinear Calibration Model Parametric estimation Local estimation IBIAS 2 IVAR IMSE IBIAS 2 IVAR IMSE h Clayton 18.192 3.470 21.662 10.593 2.833 13.426 0.822 Frank 6.162 0.673 6.835 0.127 1.280 1.407 0.468 Gumbel 9.597 1.183 10.780 3.504 2.067 5.571 0.569