Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification
Jialin Zou


Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification

by

Jialin Zou

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Statistical Science, University of Toronto.

© Copyright 2015 by Jialin Zou

Abstract

Nonparametric Methods for Interpretable Copula Calibration and Sparse Functional Classification

Jialin Zou
Doctor of Philosophy
Graduate Department of Statistical Science
University of Toronto
2015

Nonparametric estimation is a statistical methodology that, in contrast to parametric estimation, relaxes distributional assumptions about the relationship between response and covariates. It has been applied to many objects of interest, including density functions, regression models and derivative functions. One important application of nonparametric estimation is modelling dependence among random variables via copulas, which has attracted considerable research attention. With advances in data collection, the strength of dependence often varies with some covariate, which motivates dependence calibration using conditional copulas. We propose a penalized estimation framework for the copula parameter function that inherits the flexibility of a nonparametric method and, at the same time, yields a parsimonious and interpretable dependence structure. The theoretical analysis guarantees that the penalized estimators enjoy the oracle properties and behave asymptotically as well as their nonparametric counterparts, while numerical experiments demonstrate improved empirical performance. We then apply the proposed method to a twin birth weights dataset.

Another important application of nonparametric estimation is classifying functional data. We consider the classification of sparse functional data that are often encountered in longitudinal studies and other scientific experiments. To utilize the information from not only the functional trajectories but also the observed class labels, we propose a probability-enhanced method, achieved by a weighted support vector machine

based on its Fisher consistency property, to estimate the effective dimension reduction space. Since only a few measurements are available for some, or even all, individuals, a cumulative slicing approach is suggested to borrow information across individuals. We provide justification for the validity of the probability-based effective dimension reduction space, and a straightforward implementation that yields a low-dimensional projection space ready for applying standard classifiers. The empirical performance is illustrated through simulated and real examples, particularly in contrast to classification results based on the prominent functional principal component analysis.

Dedication

This thesis is dedicated to my parents.

Acknowledgements

First, I would like to express my deepest thanks to my supervisor, Professor Fang Yao, for his guidance, enthusiasm and patience during my research. Without his help, finishing this thesis would have been impossible. Secondly, I thank my thesis committee members, especially Professor Radu V. Craiu from the University of Toronto and Professor Yichao Wu from North Carolina State University, for their feedback and comments on my thesis. Thirdly, I am grateful to the faculty members, Professor Sheldon Lin, Professor Radford Neal, Professor Nancy Reid, Professor Jeffrey S. Rosenthal, Professor Mike Evans and Professor Lawrence J. Brunner, for teaching my statistics courses. I also express many thanks to all the staff at the Department of Statistical Science, especially Andrea Carter, Christine Bulguryemez and Dermot Whelan, for their support and help during my PhD program. Furthermore, I thank the graduate students at the Department of Statistical Science for their support and help. Finally, I would like to thank my father and mother for their encouragement and love.

Contents

1 Interpretable Dependence Calibration in Conditional Copulas
  1.1 Introduction
  1.2 Proposed Methodology
  1.3 Asymptotic Properties
  1.4 Simulation Study
  1.5 Application to Twin Birth Data
  1.6 Proofs of Main Theorems

2 PEFCS for Classifying Sparse Functional Data
  2.1 Introduction
  2.2 Proposed Methodology
  2.3 Simulations
  2.4 Data Examples
  2.5 Concluding Remarks

Bibliography

List of Tables

1.1 Comparisons between the proposed penalized estimation and the unpenalized local linear estimation, using the true and nonparametrically estimated conditional marginals, respectively, for the Clayton family. The underlying models correspond to $\eta_j$ defined in (1.11), where the IBIAS², IVAR and IMSE with their standard errors in parentheses are with respect to the Kendall's tau functions $\tau_j$ (multiplied by 100 for visualization), $j = 1, 2, 3$.
1.2 Proportion (%) of correctly identified copulas in each family under the calibration functions $\eta_1(x)$, $\eta_2(x)$ and $\eta_3(x)$.
2.1 The average classification error (×100%), with its standard error in parentheses, obtained from 100 Monte Carlo repetitions in Simulation I.
2.2 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the Berkeley growth data.
2.3 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the spinal bone density data.
2.4 The average classification error (×100%), with its standard error in parentheses, obtained from 20 random partitions of the primary biliary cirrhosis follow-up data.

List of Figures

1.1 The Kendall's tau of the conditional copula estimates: local linear unpenalized estimate (dashed line).
1.2 Kendall's tau functions $\tau_j(x)$ that correspond to $\eta_j(x)$ in (1.11) for $j = 1, 2, 3$ (from left to right).
1.3 Scatterplots and histograms for the original twin birth weights (left) and the transformed responses using the parametric (middle) and nonparametric (right) marginal estimates, respectively.
1.4 The Kendall's tau of the estimated copula parameter functions under three Archimedean copulas: Clayton, Frank and Gumbel. The top panels correspond to parametric marginals, and the bottom panels to nonparametric marginals. In each panel, shown are the penalized estimate (solid) with 95% bootstrap confidence bands (dotted), as well as the unpenalized estimate (dot-dashed).
2.1 Height trajectories of 39 boys (top) and 54 girls (bottom) from the Berkeley growth data.
2.2 Spinal bone density data for Hispanic females (top) and males (bottom).
2.3 Logarithm-transformed measurements of serum bilirubin for the patients that are alive (top) or dead (bottom) beyond ten years, from the primary biliary cirrhosis data.

Chapter 1

Interpretable Dependence Calibration in Conditional Copulas: A Penalized Approach

1.1 Introduction

One of the challenging problems in statistics is how to characterize the dependence structure among random variables. Although correlation is a common and easily computed measure of linear association, a full characterization of dependence among random variables is desirable but more difficult. Pioneered by Sklar's theorem (Sklar, 1959), the copula has become a powerful tool for modelling dependence structure. To be specific, denote the marginal distributions of random variables $Y_1$ and $Y_2$ by $F_1$ and $F_2$, and their joint distribution by $H$; then the existence and uniqueness of the copula function $C$ satisfying
$$H(y_1, y_2) = C\{F_1(y_1), F_2(y_2)\}$$
are guaranteed by Sklar's theorem. Along with theoretical developments, the applications of copulas have also flourished, e.g., in finance and insurance (Frees and Valdez, 1998; Embrechts and Straumann, 2002; Cherubini et al., 2004) and survival analysis (Clayton, 1978; Shih and Louis, 1995; Hougaard, 2000; Wang and Wells, 2000), among others. Although the ordinary copula has been widely studied, it cannot incorporate additional information from covariates. The conditional copula was recently proposed by introducing a covariate into the copula model. By extending

Sklar's theorem (Patton, 2006), the existence and uniqueness of a conditional copula are also guaranteed, and covariate adjustment is brought into the conditional distributions to improve estimation of the copula parameter. For instance, based on information from a covariate $X$, the dependence between $Y_1$ and $Y_2$ can be modelled by $C(U_1, U_2; X)$, where $U_1 = F_{1|X}(Y_1|X)$ and $U_2 = F_{2|X}(Y_2|X)$, with $F_{j|X}$ being the conditional marginal cumulative distribution functions (c.d.f.). Patton (2006) showed that the conditional joint distribution for each $X = x$ is uniquely defined by $H(y_1, y_2; x) = C(U_1, U_2; x)$ for $(y_1, y_2)$ in the support of $(Y_1, Y_2)$.

A copula family defined by $C$ is often indexed by a parameter $\theta$ that plays a critical role in determining the dependence structure. As a consequence, the estimation of the copula parameter is of particular interest. Common approaches for estimating a single copula parameter include the maximum likelihood method (Genest and Rivest, 1993; Joe, 1997) and the nonparametric kernel method (Fermanian and Scaillet, 2003; Chen and Huang, 2007). For parametric estimation of the conditional copula parameter, we refer readers to Bartram et al. (2007), Jondeau and Rockinger (2006) and Patton (2006). Owing to the limitations of a priori parametric assumptions, nonparametric estimation of the functional relationship between the conditional copula parameter $\theta$ and a covariate $X$ has been called for. Gijbels et al. (2011) proposed empirical estimators that are fully nonparametric, whose asymptotic properties were studied in Veraverbeke et al. (2011). Acar et al. (2011) modelled the conditional copula parameter as an unknown function of $X$ and expanded it around each point using the local polynomial technique (Fan and Gijbels, 1996), while treating the conditional marginal distributions as known. Abegaz et al. (2012) extended this framework with a nonparametric kernel estimator for the conditional marginals. Although nonparametric estimation of the conditional copula is flexible, it can

sometimes be hazardous, producing an overly fluctuating dependence over some range of the covariate. This behaviour is inherited from nonparametric regression and might not respect the underlying relationship. This consideration motivates us to inspect the Kendall's tau of the conditional copula estimates obtained by local linear modelling for the twin birth weight data from the Matched Multiple Birth Dataset of the National Centre for Health Statistics (Acar et al., 2011), shown in Figure 1.1. A careful inspection raises the question of whether the dependence strength actually changes in the middle region. Moreover, should the dependence in the middle be constant, the copula model would be more parsimonious with an enhanced interpretation.

Figure 1.1: The Kendall's tau of the conditional copula estimates: local linear unpenalized estimate (dashed line).

We tackle this problem by introducing a penalized estimation framework that detects the region over which the dependence structure is potentially constant, removing undesirable fluctuation (Yao and Zou, 2015). Extensive penalty approaches have emerged in the high-dimensional literature, such as the nonnegative garrote (Breiman,

1995), the (adaptive) LASSO (Tibshirani, 1996; Zou, 2006) and the Smoothly Clipped Absolute Deviation (SCAD) penalty (Fan and Li, 2001), among others. Some of these approaches have also been adapted to nonparametric estimation for identifying regions of particular interest; a relevant work is Kong et al. (2013), which coupled local polynomial regression with the SCAD penalty in a varying coefficient model to identify nonzero regions. In this chapter, we introduce a specially designed penalty in the context of local likelihood for conditional copulas, so that the resulting estimator respects the underlying dependence relationship. Our proposal enjoys the flexibility of nonparametric estimation without suffering from unnecessary fluctuations. A main contribution is the theoretical analysis, which guarantees that the proposed method enjoys the oracle properties and behaves asymptotically as well as its nonparametric counterparts, while the numerical study illustrates its superior finite sample performance. In the following, we briefly review the relevant topics that are involved in our proposed methodology for interpretable dependence calibration in conditional copulas.

1.1.1 Introduction to Copula Modelling

A copula describes the association of two or more random variables from any joint distribution. It can be defined as follows.

Definition 1. A bivariate copula is a joint distribution of two uniform random variables, i.e.,
$$C(u_1, v_1) = P(U_1 \le u_1, V_1 \le v_1), \qquad (1.1)$$
where $U_1 \sim \mathrm{Uniform}(0, 1)$ and $V_1 \sim \mathrm{Uniform}(0, 1)$. Equivalently, a copula has the following properties.

Proposition 1. A bivariate copula is a two-dimensional function $C$ with support $[0, 1]^2$ and range $[0, 1]$ that has the following properties:
1. $C(0, u) = C(u, 0) = 0$ and $C(1, u) = C(u, 1) = u$ for all $u \in [0, 1]$.
2. $C(u_1, v_1) + C(u_2, v_2) - C(u_1, v_2) - C(u_2, v_1) \ge 0$ for all $u_1, u_2, v_1, v_2 \in [0, 1]$ such that $u_1 \le u_2$ and $v_1 \le v_2$.

The central result about copulas is Sklar's theorem (1959), which establishes the relationship among the copula, the joint distribution and the marginal distributions.

Theorem 1 (Sklar's Theorem). Suppose that $H$ is the joint distribution of continuous random variables $Y_1$ and $Y_2$ with marginal distributions $F_1$ and $F_2$. Then there exists a unique copula $C$ such that
$$H(y_1, y_2) = C\{F_1(y_1), F_2(y_2)\}, \quad \text{for all } (y_1, y_2) \in \mathbb{R}^2. \qquad (1.2)$$
Conversely, if $F_1$ and $F_2$ are distribution functions and $C$ is a copula, then the function $H$ defined by (1.2) is a joint distribution function with marginal distributions $F_1$ and $F_2$.

The detailed proof can be found in Schweizer and Sklar (1983). Sklar's theorem guarantees the existence and uniqueness of the so-called copula function $C$. After the ordinary copula had been widely applied, the conditional copula was proposed by introducing a covariate into the copula function.

Definition 2 (Conditional copula). The conditional copula $C(\cdot, \cdot | X)$ is the joint distribution of $U_1 = F_{1|X}(Y_1|X)$ and $V_1 = F_{2|X}(Y_2|X)$ given $X$, where $Y_1 | X \sim F_{1|X}(\cdot|X)$ and $Y_2 | X \sim F_{2|X}(\cdot|X)$.

Patton (2006) extended Sklar's theorem to guarantee the existence and uniqueness of the conditional copula.
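To make the constructive direction of Sklar's theorem concrete, the following minimal sketch builds a joint c.d.f. from a copula and two marginals, and checks the identity (1.2) by simulation. The Clayton family, the standard normal marginals and all names here are illustrative assumptions, not part of the theorem.

```python
import numpy as np
from scipy.stats import norm

def clayton_cdf(u1, u2, theta):
    # Clayton copula C(u1, u2; theta) = (u1^{-theta} + u2^{-theta} - 1)^{-1/theta}, theta > 0
    return (u1**(-theta) + u2**(-theta) - 1.0)**(-1.0/theta)

def joint_cdf(y1, y2, theta):
    # Sklar's theorem: H(y1, y2) = C{F1(y1), F2(y2)}, here with F1 = F2 = Phi
    return clayton_cdf(norm.cdf(y1), norm.cdf(y2), theta)

# Monte Carlo check: sample (U1, U2) from the Clayton copula by inverting the
# conditional c.d.f. of U2 given U1, transform to the marginals, compare probabilities.
rng = np.random.default_rng(1)
theta = 2.0
u1 = rng.uniform(size=200_000)
w = rng.uniform(size=200_000)
u2 = ((w**(-theta/(1.0 + theta)) - 1.0)*u1**(-theta) + 1.0)**(-1.0/theta)
y1, y2 = norm.ppf(u1), norm.ppf(u2)
print(np.mean((y1 <= 0.5) & (y2 <= -0.2)), joint_cdf(0.5, -0.2, theta))  # approximately equal
```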

Theorem 2 (Sklar's theorem for conditional distributions). Suppose that $Y_1 | X = x$ and $Y_2 | X = x$ have conditional distributions $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$, respectively, and that $H_X(\cdot, \cdot; x)$ is the joint distribution of $Y_1$ and $Y_2$ given $X = x$, where the support of $X$ is $\chi$. Then there exists a unique copula $C(\cdot, \cdot | x)$ such that
$$H_X(y_1, y_2; x) = C(F_{1|X}(y_1|x), F_{2|X}(y_2|x); x), \qquad (1.3)$$
provided $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$ are continuous in $y_1$ and $y_2$. Conversely, if $Y_1 | X = x$ and $Y_2 | X = x$ have conditional distributions $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$, respectively, and $C(\cdot, \cdot | x)$ is a conditional copula which is measurable in $x$, then the function $H_X(\cdot, \cdot | x)$ defined by (1.3) is a conditional joint distribution function with conditional marginal distributions $F_{1|X}(\cdot|x)$ and $F_{2|X}(\cdot|x)$.

Details of the proof can be found in Patton (2002). Sklar's theorem for conditional distributions guarantees the existence and uniqueness of the copula function $C$ after bringing in the covariate, which extends the flexibility of the copula. Note that, by the theorem, all of $F_{1|X}(\cdot|x)$, $F_{2|X}(\cdot|x)$ and $C(\cdot, \cdot; x)$ must be conditioned on the same covariate value $x$; otherwise the theorem fails.

1.1.2 Local Polynomial Regression

Smoothing methods are a powerful approach for describing complex data structures without stringent assumptions. Many advanced techniques have been studied extensively in the regression setting and in more complicated frameworks. In the simple nonparametric regression framework, we have the model
$$Y = \eta(X) + \varepsilon, \quad \varepsilon \sim (0, \sigma^2), \qquad (1.4)$$
where $\eta$ is the smooth function of interest and $\varepsilon$ is the noise with mean zero and variance $\sigma^2$.

Local polynomial regression (Fan and Gijbels, 1996) is a popular and simple approach to (1.4). By a Taylor expansion, we approximate $\eta$ near $x$ by its $p$th order expansion,
$$\eta(X_i) \approx \sum_{j=0}^{p} \frac{\eta^{(j)}(x)}{j!}(X_i - x)^j \equiv \sum_{j=0}^{p} \beta_{j,x}(X_i - x)^j. \qquad (1.5)$$
To account for the contribution from $X_i$ in the neighbourhood of $x$, we minimize the local mean squared error
$$\sum_{i=1}^{n}\Big\{Y_i - \sum_{j=0}^{p} \beta_{j,x}(X_i - x)^j\Big\}^2 K\Big(\frac{X_i - x}{h}\Big), \qquad (1.6)$$
where $K$ is a one-dimensional symmetric kernel function and the bandwidth $h$ determines the width of the local window. From (1.5), we obtain the estimators $\hat\eta^{(j)}(x) = j!\,\hat\beta_{j,x}$, $j = 0, \ldots, p$.

Adopting the idea of local polynomials, one can generalize this framework to settings where the least squares loss is not appropriate, which sheds light on further development of likelihood-based techniques. Suppose that the observation $(X_i, Y_i)$ has log-likelihood $l\{\eta(X_i), Y_i\}$, where $\eta(X_i)$ is to be estimated; the log-likelihood, or loss function, for the entire $n$ data points can be written as $\sum_{i=1}^{n} l\{\eta(X_i), Y_i\}$. Accounting for the local contributions via the expansion (1.5), the value of $\hat\eta(x)$ at a grid point $x$ is given by maximizing the local log-likelihood
$$\sum_{i=1}^{n} l\Big\{\sum_{j=0}^{p} \beta_{j,x}(X_i - x)^j, Y_i\Big\} K\Big(\frac{X_i - x}{h}\Big)$$
over $\beta_{0,x}, \ldots, \beta_{p,x}$. Similarly, the estimators are given by $\hat\eta^{(j)}(x) = j!\,\hat\beta_{j,x}$, $j = 0, \ldots, p$.

The local likelihood approach has been extended to a large number of problems: generalized additive models (Hastie and Tibshirani, 1990), generalized linear models (Fan et al., 1995) and varying coefficient models (Cai et al., 2000) have been developed under this framework. In multiparameter regression, Aerts and Claeskens (2000) proposed multiparameter likelihood models, among others.
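As a concrete illustration, a minimal sketch of the local linear fit ($p = 1$): it solves the weighted least squares problem (1.6) at a single point $x_0$ with the Epanechnikov kernel. The test function and all names are illustrative.

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear fit at x0: returns (beta0, beta1) = (eta_hat(x0), eta_hat'(x0)),
    minimizing the locally weighted least squares criterion (1.6) with p = 1."""
    d = (X - x0) / h
    w = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1)      # Epanechnikov kernel weights
    Z = np.column_stack([np.ones_like(X), X - x0])  # local design matrix
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)      # (Z'WZ)^{-1} Z'WY
    return beta[0], beta[1]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 400)
Y = np.sin(2*np.pi*X) + 0.3*rng.standard_normal(400)
print(local_linear(0.5, X, Y, h=0.1))  # roughly (0, -2*pi): value and slope of sin(2*pi*x) at 0.5
```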

1.1.3 Smoothly Clipped Absolute Deviation

Fan and Li (2001) proposed the Smoothly Clipped Absolute Deviation (SCAD) penalty, which yields estimators with three desired properties: sparsity, continuity and unbiasedness. To be specific, sparsity means that small estimates are set to zero to achieve variable selection; continuity means that the penalty leads to an estimator that is continuous in the data, so that model prediction is stable; and unbiasedness means that the estimator is nearly unbiased when the true parameter is large. As a consequence, with the help of SCAD, Fan and Li (2001) provided a novel approach that achieves dimension reduction and variable selection simultaneously. The derivative of the penalty is
$$P'_\lambda(t) = \lambda\Big\{I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a - 1)\lambda}I(t > \lambda)\Big\}, \quad t > 0, \text{ for some } a > 2,$$
and the penalty itself is
$$P_\lambda(t) = \begin{cases} \lambda t, & \text{if } t \le \lambda, \\ -\dfrac{t^2 - 2a\lambda t + \lambda^2}{2(a - 1)}, & \text{if } \lambda < t \le a\lambda, \\ \dfrac{(a + 1)\lambda^2}{2}, & \text{if } t > a\lambda, \end{cases}$$
where $a = 3.7$ is often used and $\lambda$ is the tuning parameter that controls the penalty strength.
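The SCAD penalty and its derivative are straightforward to code directly from the displays above; a minimal sketch, vectorized over $t \ge 0$:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty P_lambda(t) of Fan and Li (2001), evaluated for t >= 0."""
    t = np.abs(t)
    quad = -(t**2 - 2.0*a*lam*t + lam**2) / (2.0*(a - 1.0))  # middle quadratic piece
    return np.where(t <= lam, lam*t,
                    np.where(t <= a*lam, quad, (a + 1.0)*lam**2/2.0))

def scad_deriv(t, lam, a=3.7):
    """Derivative P'_lambda(t) = lam{ I(t <= lam) + (a*lam - t)_+ / ((a-1)lam) I(t > lam) }."""
    t = np.abs(t)
    return lam*((t <= lam) + np.maximum(a*lam - t, 0.0)/((a - 1.0)*lam)*(t > lam))
```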

From these formulas, one can see that the SCAD penalty is continuously differentiable on $\mathbb{R}$ except at 0, with derivative zero on $(-\infty, -a\lambda)$ and $(a\lambda, \infty)$.

In the multivariate linear regression framework, suppose $Y$ is the response and $\beta$ is a $d \times 1$ coefficient vector associated with the covariate vector $X$. We consider the model $Y = X^\top\beta + \epsilon$, where $\epsilon$ is normally distributed with mean zero and constant variance. With the SCAD penalty, the resulting estimators are obtained by solving
$$\min_{\beta_1, \ldots, \beta_d}\ \sum_{i=1}^{n}(Y_i - X_i^\top\beta)^2 + n\sum_{j=1}^{d} P_\lambda(|\beta_j|).$$
Note that when the design matrix is orthonormal, the resulting estimator has the explicit form
$$\hat\beta_{SCAD} = \begin{cases} \mathrm{sign}(\hat\beta)(|\hat\beta| - \lambda)_+, & \text{if } |\hat\beta| \le 2\lambda, \\ \dfrac{(a - 1)\hat\beta - \mathrm{sign}(\hat\beta)a\lambda}{a - 2}, & \text{if } 2\lambda < |\hat\beta| \le a\lambda, \\ \hat\beta, & \text{if } |\hat\beta| > a\lambda, \end{cases}$$
applied coordinatewise, where $\hat\beta$ is the unpenalized estimator, i.e., the least squares estimator. The SCAD penalty has been studied extensively: Fan and Peng (2004) developed the oracle properties with a diverging number of covariates, while Kim et al. (2008) studied the sparsity property in the high-dimensional case.
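The closed-form estimator above is easy to apply coordinatewise; a minimal sketch under the orthonormal-design assumption, with `beta_ols` denoting the least squares estimate:

```python
import numpy as np

def scad_threshold(beta_ols, lam, a=3.7):
    """Closed-form SCAD estimator for an orthonormal design (Fan and Li, 2001),
    applied coordinatewise to the least squares estimate."""
    b, s = np.abs(beta_ols), np.sign(beta_ols)
    soft = s * np.maximum(b - lam, 0.0)                    # soft-thresholding for |b| <= 2*lam
    firm = ((a - 1.0)*beta_ols - s*a*lam) / (a - 2.0)      # for 2*lam < |b| <= a*lam
    return np.where(b <= 2.0*lam, soft,
                    np.where(b <= a*lam, firm, beta_ols))  # unchanged for |b| > a*lam

print(scad_threshold(np.array([0.1, 0.5, 1.2, 3.0]), lam=0.3))
```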

The rest of the chapter is organized as follows. In Section 1.2, we describe the proposed methodology, along with the algorithm and the selection of tuning parameters. Asymptotic properties are presented in Section 1.3, where both parametrically and nonparametrically estimated marginals are considered. We illustrate the empirical performance through a simulation study in Section 1.4, and apply the proposed method to the twin birth weights data in Section 1.5. Technical proofs are deferred to Section 1.6.

1.2 Proposed Methodology

1.2.1 Penalized local likelihood for interpretable dependence

Let $X$ be a continuous covariate on which a pair of continuous responses $(Y_1, Y_2)$ is conditioned, and recall that the marginal c.d.f. given $X$ are $F_{1|X}$ and $F_{2|X}$. We first focus on the estimation of the copula parameter function, treating the marginals as known, and then extend to the case of estimated marginals. There exists a unique copula function $C$ such that the joint conditional distribution of $(Y_1, Y_2)$ given $X$ can be expressed as
$$H\{y_1, y_2; \theta(x)\} = C\{F_{1|X}(y_1|x), F_{2|X}(y_2|x); \theta(x)\},$$
or, in terms of densities,
$$h\{y_1, y_2; \theta(x)\} = c\{F_{1|X}(y_1|x), F_{2|X}(y_2|x); \theta(x)\}\, f_{1|X}(y_1|x)\, f_{2|X}(y_2|x),$$
where $f_{1|X}(y_1|x)$ and $f_{2|X}(y_2|x)$ are the conditional density functions. Letting $U_{1,x} = F_{1|X}(Y_1|x)$ and $U_{2,x} = F_{2|X}(Y_2|x)$, which are uniformly distributed on $[0, 1]$, we have
$$(U_{1i}, U_{2i}) \mid X_i \sim C\{u_{1i}, u_{2i}; \theta(X_i)\},$$
where $\theta(X_i) = g^{-1}\{\eta(X_i)\}$ and $g$ is a known monotone link function that keeps $\theta$ in its proper range, $i = 1, 2, \ldots, n$. We begin with the local polynomial expansion around a fixed $x$ in the support of $X$,
$$\eta(X_i) \approx \eta(x) + \eta'(x)(X_i - x) + \cdots + \eta^{(p)}(x)(X_i - x)^p/p!.$$

Denoting $\beta_{k,x} = \eta^{(k)}(x)/k!$, the copula parameter function is approximated by
$$\theta(z) \approx g^{-1}\{\beta_{0,x} + \beta_{1,x}(z - x) + \cdots + \beta_{p,x}(z - x)^p\}$$
for $z$ in some neighbourhood of $x$, with $\beta_{0,x} = g\{\theta(x)\}$. For brevity we suppress the dependence of $\beta_{k,x}$ on $x$. It is known that estimating higher degree coefficients leads to larger variability and computational complexity, and a customary choice is the local linear smoother with $p = 1$ (Fan and Gijbels, 1996). Denoting $\beta = (\beta_0, \beta_1)^\top$, the local log-likelihood at $x$ is
$$l(\beta; x) = \sum_{i=1}^{n} \log c[U_{1i}, U_{2i}; g^{-1}\{\beta_0 + \beta_1(X_i - x)\}] K_h(X_i - x), \qquad (1.7)$$
where $K_h(\cdot) = h^{-1}K(\cdot/h)$, $K$ is a compactly supported kernel density, and $h$ is the bandwidth that controls the amount of smoothing. Common choices of $K$ include the Epanechnikov kernel, $K(u) = \frac{3}{4}(1 - u^2)I(|u| \le 1)$, where $I(\cdot)$ is the indicator function, as well as the triweight or Gaussian kernels.

Our goal is to encourage the nonparametric estimate to stay constant whenever the underlying relationship is indeed so. Note that the local coefficients of higher degrees regulate how the dependence structure varies over the neighbourhood of $x$; specifically, the local slope $\beta_1$ represents the rate of smooth change of the copula parameter at $x$. To identify the constant region of the dependence, we use a sufficiently dense grid over the domain of $X$, say $\{x_1, \ldots, x_N\}$, on which the estimates will be obtained. If the local slope parameters are zero for a set of consecutive grid points, say $\{x_j, \ldots, x_{j+l}\}$, we will

regard $\theta^{(1)}(x) = 0$ for $x \in (x_j, x_{j+l})$. Given the smooth nature of the local linear fit, the resulting estimator of $\eta(\cdot)$ will appear constant over the region $(x_j, x_{j+l})$.

The above consideration suggests imposing the penalization on the local slope parameters at each grid point. To properly scale the penalty function when coupled with the local log-likelihood $l(\beta; x)$ in (1.7), we divide $l(\beta; x)$ by $K_h(0)$ so that $K_h(X_i - x)/K_h(0) = O(1)$. At any fixed $x$, the data contributing to the estimation of $\eta(x)$ include only those in its local window; thus we define the effective sample size at $x$ by $m_x = \sum_{i=1}^{n} K_h(X_i - x)/K_h(0)$. Lastly, we standardize each column of the design matrix, as in the traditional linear model, before coupling with the penalty. Denote the standard deviation of $\{K_h^{1/2}(X_i - x)\}_{i=1,\ldots,n}$ by $s_x$, and that of $\{(X_i - x)K_h^{1/2}(X_i - x)\}_{i=1,\ldots,n}$ by $r_x$. The local coefficients are then scaled as $\tilde\beta_0 = s_x\beta_0$ and $\tilde\beta_1 = r_x\beta_1$. This scaling facilitates our asymptotic analysis, thanks to the same convergence rates for the estimates of $\tilde\beta_0$ and $\tilde\beta_1$. We now aim to maximize the following penalized local log-likelihood with respect to (w.r.t.) $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$,
$$Q(\tilde\beta; x) = \sum_{i=1}^{n} \log c[U_{1i}, U_{2i}; g^{-1}\{s_x^{-1}\tilde\beta_0 + r_x^{-1}\tilde\beta_1(X_i - x)\}]\, \frac{K_h(X_i - x)}{K_h(0)} - m_x P_{\lambda_x}(|\tilde\beta_1|), \qquad (1.8)$$
where $m_x = \sum_{i=1}^{n} K_h(X_i - x)/K_h(0)$, $P_{\lambda_x}(\cdot)$ is the penalty function that tends to shrink the local slope $\tilde\beta_1$ to zero if the true value is zero, and $\lambda_x$ is the shrinkage parameter. We employ the SCAD penalty, which yields estimators with the desired consistency and sparsity,
$$P'_{\lambda_x}(t) = \lambda_x\Big\{I(t \le \lambda_x) + \frac{(a\lambda_x - t)_+}{(a - 1)\lambda_x}I(t > \lambda_x)\Big\}, \quad t > 0, \text{ for some } a > 2,$$
where $a = 3.7$ is suggested by Fan and Li (2001). Other choices of $P_{\lambda_x}(\cdot)$ are available, such as the MCP (Zhang, 2010) and the (adaptive) LASSO (Tibshirani, 1996; Zou, 2006).
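A numerical sketch of the objective (1.8) for the Clayton family, assuming for illustration the identity inverse link $\theta = \eta$ (so $\eta > 0$) and the Epanechnikov kernel; `penalty` can be the `scad_penalty` sketch from Section 1.1.3, and all names are hypothetical:

```python
import numpy as np

def clayton_logdens(u1, u2, theta):
    # log copula density of the Clayton family, theta > 0:
    # c(u1,u2;theta) = (1+theta)(u1*u2)^{-(theta+1)} (u1^{-theta}+u2^{-theta}-1)^{-(2theta+1)/theta}
    return (np.log1p(theta) - (theta + 1.0)*(np.log(u1) + np.log(u2))
            - (2.0 + 1.0/theta)*np.log(u1**(-theta) + u2**(-theta) - 1.0))

def penalized_local_loglik(beta_t, x, X, U1, U2, h, lam, s_x, r_x, penalty):
    """Q(beta~; x) in (1.8), a minimal sketch: Clayton copula, identity inverse
    link theta = eta (illustrative choices); `penalty` is, e.g., scad_penalty."""
    d = (X - x) / h
    Kh = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1) / h  # Epanechnikov K_h(X_i - x)
    w = Kh / (0.75 / h)                              # K_h(X_i - x) / K_h(0)
    m_x = w.sum()                                    # effective sample size at x
    eta = beta_t[0]/s_x + (beta_t[1]/r_x)*(X - x)    # local linear calibration
    theta = np.maximum(eta, 1e-8)                    # keep the parameter in range
    return np.sum(clayton_logdens(U1, U2, theta) * w) - m_x * penalty(abs(beta_t[1]), lam)
```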

The adaptive LASSO is a convex penalty function, while the MCP and SCAD are non-convex; all of them produce continuous and nearly unbiased solutions.

For practical implementation, to emphasize the constant pattern and remedy the boundary effect, one can make a numerical adjustment to the final estimator of $\eta(x)$ after obtaining $\hat{\tilde\beta}$ on all grid points. For instance, we take $\hat\beta_0(x_{[N/2]})$ at the central grid point $x_{[N/2]}$, where $[N/2]$ denotes the nearest integer, and use the numerical approximation
$$\hat\eta^a(x) = \hat\beta_0^a(x) = \hat\beta_0(x_{[N/2]}) + \sum_{j=[N/2]}^{k-1} \hat\beta_1(x_j)(x_{j+\delta} - x_j) + \hat\beta_1(x_k)(x - x_k),$$
where $x_k$ is the grid point nearest to $x$, $\delta = 1$ if $x > x_{[N/2]}$, and $\delta = -1$ otherwise. It is easy to verify that this adjusted estimator is asymptotically equivalent, $\hat\eta^a(x) - \hat\eta(x) = o_p(1)$ for any $x$, given a sufficiently dense grid.

The assumption of known conditional marginals, $F_{1|X}$ and $F_{2|X}$, may be relaxed. If $F_{1|X}$ and $F_{2|X}$ can be estimated from a parametric model, the estimated marginals are root-$n$ consistent and the additional error is negligible relative to that from estimating the copula function. If no such prior knowledge of the marginals is available, one can estimate the conditional marginals using a nonparametric approach and plug the estimates into the above penalized estimation. This inflates the error of the estimated copula function, as characterized in Section 1.3. For specificity, we use the Nadaraya-Watson estimator suggested by Abegaz et al. (2012): for $j = 1, 2$,
$$\hat F_{j|X}(y|x) = \sum_{i=1}^{n} \omega_{ni}(x, h_j)\, I(Y_{ji} \le y), \qquad \omega_{ni}(x, h_j) = \frac{K_{h_j}(X_i - x)}{\sum_{k=1}^{n} K_{h_j}(X_k - x)}, \qquad (1.9)$$
where the $h_j$'s are bandwidths controlling the smoothness.
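The estimator (1.9) is a one-liner in practice; a minimal sketch, with the Epanechnikov kernel as an illustrative choice:

```python
import numpy as np

def cond_marginal_cdf(y, x, Yj, X, h):
    """Nadaraya-Watson estimate of the conditional marginal c.d.f. F_{j|X}(y|x) in (1.9)."""
    d = (X - x) / h
    K = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1)  # the h^{-1} factors cancel in the weights
    w = K / K.sum()                             # omega_{ni}(x, h_j)
    return np.sum(w * (Yj <= y))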

1.2.2 Optimization algorithm and selection of parameters

To maximize the penalized log-likelihood $Q(\tilde\beta; x)$ w.r.t. $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$, we propose an iterative procedure that modifies the local linear approximation (LLA) algorithm of Zou and Li (2008). Denote the estimates obtained in the $m$th iteration by $\tilde\beta_0^{(m)} = s_x\beta_0^{(m)}$, $\tilde\beta_1^{(m)} = r_x\beta_1^{(m)}$ and $\tilde\beta^{(m)} = (\tilde\beta_0^{(m)}, \tilde\beta_1^{(m)})^\top$, with $\eta^{(m)}(x, X_i) = s_x^{-1}\tilde\beta_0^{(m)} + r_x^{-1}\tilde\beta_1^{(m)}(X_i - x)$, and regard $l(\beta; x)$ in (1.7) as a function of $\tilde\beta$, i.e., $l(\tilde\beta) = \sum_{i=1}^{n} l_i(\tilde\beta)K_h(X_i - x)/K_h(0)$, where the dependence on $x$ is suppressed. We use the unpenalized local linear estimator (Acar et al., 2011) as the initial estimate; in our numerical experience the algorithm usually converges within a few iterations.

Step 1. Update the local intercept $\tilde\beta_0^{(m)}$ by a Newton step, using the gradient $\partial l(\tilde\beta)/\partial\tilde\beta_0$ and the Hessian $\partial^2 l(\tilde\beta)/\partial\tilde\beta_0^2$:
$$\tilde\beta_0^{(m+1)} = \tilde\beta_0^{(m)} - \Big\{\frac{\partial^2 l(\tilde\beta^{(m)})}{\partial\tilde\beta_0^2}\Big\}^{-1} \frac{\partial l(\tilde\beta^{(m)})}{\partial\tilde\beta_0}.$$

Step 2. Update the local slope $\tilde\beta_1^{(m)}$ by the modified LLA algorithm, as follows.

Denote $X_x = (X_1 - x, \ldots, X_n - x)^\top$, $\mu_i = (X_i - x)\tilde\beta_1$, and $D = \mathrm{diag}(D_{11}, \ldots, D_{nn})$, where
$$-\frac{\partial^2 l(\tilde\beta_0^{(m+1)}, \tilde\beta_1^{(m)})}{\partial\tilde\beta_1^2} = X_x^\top D X_x, \qquad D_{ii} = -\frac{\partial^2 l_i(\tilde\beta_0^{(m+1)}, \tilde\beta_1^{(m)})}{\partial\mu_i^2}\Big|_{\mu_i = \hat\mu_i^{(m)}}.$$
Define the working data $y = (D_{11}^{1/2}\hat\mu_1^{(m)}, \ldots, D_{nn}^{1/2}\hat\mu_n^{(m)})^\top$ and $X_x^* = D^{1/2}X_x$. We compute $\tilde\beta_1^{(m+1)}$ as follows.

(a) If $P'_{\lambda_x}(|\tilde\beta_1^{(m)}|) = 0$, then $\tilde\beta_1^{(m+1)} = (X_x^{*\top}X_x^*)^{-1}X_x^{*\top}y$.

(b) If $P'_{\lambda_x}(|\tilde\beta_1^{(m)}|) > 0$, take the further transform $X_x^{**} = \lambda_x X_x^*/P'_{\lambda_x}(|\tilde\beta_1^{(m)}|)$ and apply the coordinate descent algorithm (Friedman et al., 2007),
$$\tilde\beta_1^{**} = \arg\min_{\tilde\beta_1}\Big\{\frac{1}{2}\|y - X_x^{**}\tilde\beta_1\|^2 + m_x\lambda_x|\tilde\beta_1|\Big\}.$$
The local slope is then given by $\tilde\beta_1^{(m+1)} = \lambda_x\tilde\beta_1^{**}/P'_{\lambda_x}(|\tilde\beta_1^{(m)}|)$.

It is important to tune the shrinkage parameter $\lambda_x$, which controls the magnitude of $\tilde\beta_1$ and thus the dependence strength. As suggested by Wang and Leng (2009), we adopt the Bayesian information criterion (BIC),
$$\mathrm{BIC}(\lambda_x) = -2\sum_{i=1}^{n}\log c[U_{1i}, U_{2i}; g^{-1}\{\hat\eta_{\lambda_x}(X_i)\}]\,\frac{K_h(X_i - x)}{K_h(0)} + \mathrm{df}\,\log m_x,$$
where $m_x = \sum_{i=1}^{n}K_h(X_i - x)/K_h(0)$, $\hat\eta_{\lambda_x}(X_i) = \hat\beta_{\lambda_x,0} + \hat\beta_{\lambda_x,1}(X_i - x)$ with the subscript $\lambda_x$ emphasizing the dependence on $\lambda_x$, and $\mathrm{df} = 1 + I(|\hat\beta_{\lambda_x,1}| > 0)$.
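A sketch of the BIC criterion above, reusing `clayton_logdens` from the earlier sketch and again assuming the identity link; `fit(lam)`, which returns the penalized estimates $(\hat\beta_0, \hat\beta_1)$ at $x$ for a given $\lambda_x$, is a hypothetical routine:

```python
import numpy as np

def bic_lambda(lam, x, X, U1, U2, h, fit):
    """BIC(lambda_x), a sketch: -2 * local log-likelihood at the penalized fit
    plus df * log(m_x). `fit(lam) -> (beta0_hat, beta1_hat)` is hypothetical."""
    b0, b1 = fit(lam)
    d = (X - x) / h
    w = 0.75 * (1.0 - d**2) * (np.abs(d) <= 1)  # K_h(X_i - x)/K_h(0); h cancels
    m_x = w.sum()
    eta = b0 + b1*(X - x)
    loglik = np.sum(clayton_logdens(U1, U2, np.maximum(eta, 1e-8)) * w)
    df = 1 + (abs(b1) > 1e-12)                  # intercept, plus one if the slope is nonzero
    return -2.0*loglik + df*np.log(m_x)
```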

For the bandwidth $h$ in the copula estimation (1.8), we maximize a two-fold cross-validated likelihood (CVL; Acar et al., 2011). With a slight abuse of notation, denote the estimate based on the training data by $\hat\beta_{0,h}$, indicating the dependence on $h$, and the testing set by $\{X_1^*, \ldots, X_{[n/2]}^*\}$. Maximizing the objective function w.r.t. $h$,
$$\mathrm{CVL}(h) = \sum_{i=1}^{[n/2]} \log c[U_{1i}^*, U_{2i}^*; g^{-1}\{\hat\beta_{0,h}(X_i^*)\}], \qquad (1.10)$$
yields a data-driven choice of $h$, where the $\hat\beta_{0,h}(X_i^*)$'s are assessed on the testing set. When the marginal distributions are nonparametrically estimated by the kernel method (1.9), the bandwidths $h_1$ and $h_2$ can also be included in the criterion (1.10). Lastly, for choosing an appropriate copula family, since the likelihoods are on different scales, we adopt a two-fold cross-validated prediction error (CVPE). Details can be found in Acar et al. (2011) and are omitted for conciseness.
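The two-fold CVL is likewise simple to sketch; here `eta_fit`, which computes the (penalized or unpenalized) local estimate $\hat\eta$ from a training half, is hypothetical, and `clayton_logdens` is from the earlier sketch. A data-driven $h$ maximizes this criterion over a candidate grid.

```python
import numpy as np

def cvl(h, X, U1, U2, eta_fit):
    """Two-fold cross-validated likelihood (1.10), a sketch: estimate the calibration
    function on one half, evaluate the log copula density on the other half."""
    n = len(X)
    idx = np.random.default_rng(0).permutation(n)
    tr, te = idx[: n // 2], idx[n // 2 :]
    eta = np.array([eta_fit(X[tr], U1[tr], U2[tr], h, x0) for x0 in X[te]])
    return np.sum(clayton_logdens(U1[te], U2[te], np.maximum(eta, 1e-8)))
```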

1.3 Asymptotic Properties

In this section, we show that the proposed penalized estimator enjoys the oracle properties, including estimation consistency, sparsity and asymptotic normality. Here sparsity is in the sense that, if the underlying dependence is constant at $x$, the local slope will be estimated as exactly zero. Recall that $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$ is the scaled version of $\beta = (\beta_0, \beta_1)^\top$, i.e., $\tilde\beta_0 = s_x\beta_0$ and $\tilde\beta_1 = r_x\beta_1$. For convenience, we drop the subscript $x$ in $\lambda_x$, and denote the true values of $\beta = (\beta_0, \beta_1)^\top$ and $\tilde\beta = (\tilde\beta_0, \tilde\beta_1)^\top$ by $\beta_0 = (\beta_{00}, \beta_{01})^\top$ and $\tilde\beta_0 = (\tilde\beta_{00}, \tilde\beta_{01})^\top$, respectively. We present the asymptotic properties in terms of the scaled penalized estimators $\tilde\beta_\lambda = (\tilde\beta_{\lambda 0}, \tilde\beta_{\lambda 1})^\top$ that maximize (1.8), and $\tilde\beta_\lambda^N = (\tilde\beta_{\lambda 0}^N, \tilde\beta_{\lambda 1}^N)^\top$ when using the nonparametrically estimated marginals (1.9). Without loss of generality, we assume that the bandwidths $h_j$ for estimating $\hat F_{j|X}$ in (1.9) are of the same order as the bandwidth $h$ for estimating the copula function in (1.8), and use a common kernel density $K$ for both marginal and copula estimation.

We now present the regularity conditions (A1)-(A5) on the conditional copula density, collectively referred to as Conditions (A), which are needed for establishing the asymptotic results and are analogous to those in Abegaz et al. (2012). With a slight abuse of notation, denote $l\{g^{-1}(\eta); u_1, u_2\} = \log c\{u_1, u_2; g^{-1}(\eta)\}$ and $l'(\eta; u_1, u_2) = (\partial/\partial\eta)\,l\{g^{-1}(\eta); u_1, u_2\}$, and similarly for $l''(\eta; u_1, u_2)$ and $l'''(\eta; u_1, u_2)$. Define $l'_{1,s}(\eta; u_1, u_2) = (\partial^2/\partial\eta\,\partial u_s)\,l\{g^{-1}(\eta); u_1, u_2\}$ for $s = 1, 2$.

(A1) The conditional copula density $c\{u_1, u_2; \theta(x)\}$ has a common support in $[0, 1]^2$. There exists an open set $\Theta$ containing the true parameter $\theta(x)$ such that, for almost all $(u_1, u_2)$, $c(u_1, u_2; \theta)$ has third derivatives w.r.t. $u_1$, $u_2$ and $\theta$ for all $\theta \in \Theta$.

(A2) The functions $l$, $l'$, $l''$, $l'''$ and $l'_{1,s}$, $s = 1, 2$, are bounded and continuous. Moreover, $l'''$ is a Lipschitz continuous trivariate function.

(A3) $E_\theta[l'\{g(\theta); U_1, U_2\}] = 0$ for all $\theta \in \Theta$. Moreover, $I(\theta) = E_\theta[l'\{g(\theta); U_1, U_2\}^2] = -E_\theta[l''\{g(\theta); U_1, U_2\}]$ is positive and continuously differentiable on $\Theta$.

(A4) There exist functions $Q_1$ and $Q_2$ such that $|l''\{g(\theta); u_1, u_2\}| \le Q_1(u_1, u_2)$ and $|l'''\{g(\theta); u_1, u_2\}| \le Q_2(u_1, u_2)$ for all $\theta \in \Theta$, and $E_\theta\{Q_1^2(U_1, U_2)\}$ and $E_\theta\{Q_2^2(U_1, U_2)\}$ are uniformly bounded on $\Theta$.

(A5) For some $a_j \ge 0$ and $c_1 > 0$, $|l'\{g(\theta); u_1, u_2\}| \le c_1\prod_{j=1}^{2}\{u_j(1 - u_j)\}^{-a_j}$, such that $E[\prod_{j=1}^{2}\{U_j(1 - U_j)\}^{-a_j}] < \infty$. Moreover, for some $b_j \ge a_j$, $1 \le j \ne k \le 2$, and $c_2 > 0$, $|l'_{1,j}\{g(\theta); u_1, u_2\}| \le c_2\{u_k(1 - u_k)\}^{-a_k}\{u_j(1 - u_j)\}^{-b_j}$, such that $E[\{U_k(1 - U_k)\}^{-a_k}\{U_j(1 - U_j)\}^{\epsilon_j - b_j}] < \infty$ for some $\epsilon_j \in (0, 1/2)$.

Conditions (A1)-(A4) are standard; (A5) allows the score and its partial derivatives w.r.t. $u_1$ and $u_2$ to possibly diverge at the boundaries. This makes the results applicable to some commonly used copula models, such as the Gaussian, Student-t, Clayton or Gumbel copulas. The conditions on the bandwidth $h$, the penalty function $P_{\lambda_x}(\cdot)$ and the shrinkage parameter $\lambda$ are summarized in Conditions (B).

We suppress the dependence of $h$ and $\lambda$ on $n$, and denote $a_n = P'_\lambda(|\tilde\beta_{01}|)$ and $b_n = (nh)^{-1/2}$.

(B1) $nh^{2+\delta} \to \infty$ for some $\delta > 0$, and $\limsup_n n^{1/5}h < \infty$.

(B2) $a_n/h^2 \to 0$ and $a_n^2 nh \to 0$, i.e., $a_n = o(b_n)$.

(B3) $P''_\lambda(|\tilde\beta_{01}|) \to 0$, and $\liminf_{n\to\infty}\liminf_{\theta\to 0^+} P'_\lambda(\theta)/\lambda > 0$.

Condition (B1) ensures that the bias dominates the variance from the copula and marginal estimation; (B2) guarantees that the strength of the true signal dominates the bias and the variance, and thus the existence of a $b_n$-consistent penalized estimator. Condition (B3) makes the penalty negligible relative to the likelihood while retaining the singularity at the origin needed for achieving a sparse solution, and is fulfilled by the SCAD penalty (Fan and Li, 2001). Lastly, Conditions (C) collect the standard requirements on other relevant quantities.

(C) The parameter function $\eta(\cdot)$ has a uniformly bounded second derivative. The monotone link function $g$ is invertible with $g' \ne 0$, and $g^{-1}$ has a continuous third derivative. The density $f$ of $X$ has a continuous first derivative, and for each $x$ in the domain of $X$ there exists some neighbourhood $R_x$ such that $\inf_{x' \in R_x} f(x') > 0$. The kernel density $K$ is symmetric and bounded with compact support on $[-1, 1]$.

Define $\gamma_r = \int x^r K(x)\,dx$, and let $N_2$ and $S_2$ be the $2 \times 2$ matrices with $\gamma_{i+j-2}$ and $\gamma_{i+j-1}$ as their $(i, j)$th entries, respectively. Denote $\eta(x, X) = \eta(x) + \eta'(x)(X - x) = \beta_0 + \beta_1(X - x)$, and write $\eta_0(x, X) = \beta_{00} + \beta_{01}(X - x)$ when evaluated at the true $\beta_0$. Let
$$\Sigma_x = I\{\theta(x)\}f(x)N_2, \qquad \Lambda_x = I\{\theta(x)\}f(x)S_2,$$

$$Z_i = \Big(1, \frac{X_i - x}{h}\Big)^\top, \qquad M_0(Y_{1i}, Y_{2i}, X_i) = h\,l'\{\eta_0(x, X_i); F_{1|X}(Y_{1i}|x), F_{2|X}(Y_{2i}|x)\}\,Z_i\, K_h(X_i - x).$$

Theorem 3 concerns the asymptotic properties of the copula parameter estimation when the true marginals are used. It states that the penalized method will estimate a local slope as exactly zero if the underlying value is indeed zero, and performs asymptotically as well as the unpenalized local linear estimator considered in Acar et al. (2011).

Theorem 3. Assume that Conditions (A), (B) and (C) hold.

(Consistency) If $\lambda \to 0$ as $n \to \infty$, then $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(b_n)$, where $b_n = (nh)^{-1/2}$.

(Sparsity) If $\lambda \to 0$ and $\sqrt{nh}\,\lambda \to \infty$ as $n \to \infty$, then, for the local maximizer $\tilde\beta_\lambda$ of $Q(\tilde\beta; x)$ satisfying $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(b_n)$, $P(\tilde\beta_{\lambda 1} = 0 \mid \tilde\beta_{01} = 0) \to 1$.

(Normality) If $nh^5/\log n = O(1)$ and $nh^3/\log^2 n \to \infty$, then
$$(\Sigma_x^{-1}\Gamma_x\Sigma_x^{-1})^{-1/2}\Big[\sqrt{nh}\,(\tilde\beta_\lambda - \tilde\beta_0) - (\Sigma_x^{-1} - h\Sigma_x^{-1}\Lambda_x\Sigma_x^{-1})(nh)^{-1/2}\sum_{i=1}^{n} E\{M_0(Y_1, Y_2, X)\}\Big] \stackrel{D}{\to} N(0, I_2),$$
where $\Gamma_x$ is the $2 \times 2$ matrix with $(\Gamma_x)_{rs} = I\{\theta(x)\}f(x)\int x^{r+s-2}K^2(x)\,dx$, and $I_2$ is the $2 \times 2$ identity matrix.

The next theorem considers the case when the marginals are nonparametrically estimated by (1.9). Denote $z = (1, (w - x)/h)^\top$, and define

$$M_1(Y_{1i}, X_i) = h\int l'_{1,1}\{\eta_0(x, w); F_{1|X}(y_1|x), F_{2|X}(y_2|x)\}\,\frac{K_h(X_i - x)}{E\{K_h(X - x)\}}\,\{I(Y_{1i} \le y_1) - F_{1|X}(y_1|x)\}\,z\,K_h(w - x)\,dH_X(y_1, y_2; w),$$
$$M_2(Y_{2i}, X_i) = h\int l'_{1,2}\{\eta_0(x, w); F_{1|X}(y_1|x), F_{2|X}(y_2|x)\}\,\frac{K_h(X_i - x)}{E\{K_h(X - x)\}}\,\{I(Y_{2i} \le y_2) - F_{2|X}(y_2|x)\}\,z\,K_h(w - x)\,dH_X(y_1, y_2; w).$$

Theorem 4. Assume that Conditions (A), (B) and (C) hold and, in addition, that the conditional marginal c.d.f. $F_{j|X}$ satisfy conditions (R1) and (R3) in Veraverbeke et al. (2011).

(Consistency) If $\lambda \to 0$ as $n \to \infty$, then $\|\tilde\beta_\lambda^N - \tilde\beta_0\| = O_p(b_n)$, where $b_n = (nh)^{-1/2}$.

(Sparsity) If $\lambda \to 0$ and $\sqrt{nh}\,\lambda \to \infty$ as $n \to \infty$, then, for the local maximizer $\tilde\beta_\lambda^N$ satisfying $\|\tilde\beta_\lambda^N - \tilde\beta_0\| = O_p(b_n)$, $P(\tilde\beta_{\lambda 1}^N = 0 \mid \tilde\beta_{01} = 0) \to 1$.

(Normality) If $nh^5/\log n = O(1)$ and $nh^3/\log^2 n \to \infty$, then
$$(\Sigma_x^{-1}\Gamma_x\Sigma_x^{-1})^{-1/2}\Big[\sqrt{nh}\,(\tilde\beta_\lambda^N - \tilde\beta_0) - (\Sigma_x^{-1} - h\Sigma_x^{-1}\Lambda_x\Sigma_x^{-1})(nh)^{-1/2}\sum_{i=1}^{n} E\Big\{M_0(Y_1, Y_2, X) + n^{-1}\sum_{i=1}^{n} M_1(Y_1, X) + n^{-1}\sum_{i=1}^{n} M_2(Y_2, X)\Big\}\Big] \stackrel{D}{\to} N(0, I_2).$$

We see from Theorem 4 that the asymptotic covariance originating from the copula parameter estimation dominates those from the nonparametric marginals, and is thus the same as in Theorem 3. The bias is inflated by $M_1(Y_1, X)$ and $M_2(Y_2, X)$ due to the nonparametric estimation of the conditional marginals. Similar to Theorem 3, it is not surprising that this penalized estimator has the same asymptotic behaviour as its unpenalized nonparametric counterpart considered in Abegaz et al. (2012).

1.4 Simulation Study

In this section, we examine the performance of the proposed penalized estimation for various types of conditional copula parameter functions. We present results using data generated from the Clayton family; the Gumbel and Frank families lead to similar conclusions and are omitted for conciseness. The inverse link $\theta = g^{-1}(\eta)$, with $\eta > 0$, is taken so that the copula parameter stays in its proper range. To assess the performance under different scenarios, we use three copula parameter functions: $\eta_1$ and $\eta_2$ are smoothly joined

piecewise linear and piecewise quadratic, respectively, while $\eta_3$ is globally quadratic:
$$\eta_1(x) = \tfrac{2}{9}\,I(x \le 3.8) + \big\{\tfrac{2}{9} + \tfrac{52}{9}(x - 3.8)^2\big\}I(3.8 < x \le 3.9) + (x - 3.62)\,I(x > 3.9),$$
$$\eta_2(x) = \tfrac{2}{9}\,I(x \le 2.9) + \big\{\tfrac{2}{9} + \tfrac{1240}{9}(x - 2.9)^2\big\}I(2.9 < x \le 3) + \{4.1 - 10(x - 3.5)^2\}I(3 < x \le 4) + \big\{\tfrac{2}{9} + \tfrac{1240}{9}(x - 4.1)^2\big\}I(4 < x \le 4.1) + \tfrac{2}{9}\,I(x > 4.1), \qquad (1.11)$$
$$\eta_3(x) = 1 + 5(x - 3.5)^2,$$
where the coefficients of the quadratic bridging pieces make $\eta_1$ and $\eta_2$ continuous. To visualize the strength of the dependence, we display these functions in Figure 1.2 on a common scale using the Kendall's tau (Trivedi and Zimmer, 2007),
$$\tau(x) = 4\int_{[0,1]^2} C(u_1, u_2; x)\,dC(u_1, u_2; x) - 1,$$
and a simple calculation yields $\tau(x) = \theta(x)/\{\theta(x) + 2\}$ for the Clayton copula. We first generate $n = 1000$ independent copies of the covariate $X_i$ from $U[2, 5]$, then generate $(U_{1i}, U_{2i})$ from the Clayton copula given $\theta(X_i) = \eta(X_i)$, and further obtain $Y_{ki} = F_{k|X_i}^{-1}(U_{ki}|X_i)$ using $F_{k|X} = \Phi$, the c.d.f. of $N(0, 1)$. When using the estimated marginals, we calculate $\hat U_{ki} = \hat F_{k|X_i}(Y_{ki}|X_i)$, with $\hat F_{k|X}$ obtained by (1.9), $k = 1, 2$, $i = 1, \ldots, n$. The Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2)I(|u| \le 1)$ is used, and the bandwidths $h_k$ are selected together with the copula bandwidth by slightly modifying the cross-validated likelihood (1.10).
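The data-generating step described above can be reproduced with standard inverse-c.d.f. conditional sampling of the Clayton copula; a minimal sketch under the $\eta_3$ calibration (the seed and names are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2015)
n = 1000
X = rng.uniform(2.0, 5.0, n)
theta = 1.0 + 5.0*(X - 3.5)**2        # eta_3(x): the global quadratic calibration
# conditional (inverse c.d.f.) sampling from the Clayton copula given theta(X_i)
u1 = rng.uniform(size=n)
w = rng.uniform(size=n)
u2 = ((w**(-theta/(1.0 + theta)) - 1.0)*u1**(-theta) + 1.0)**(-1.0/theta)
Y1, Y2 = norm.ppf(u1), norm.ppf(u2)   # marginals F_{k|X} = Phi, the c.d.f. of N(0, 1)
tau = theta/(theta + 2.0)             # Kendall's tau of the Clayton family
```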

Figure 1.2: Kendall's tau functions $\tau_j(x)$ that correspond to $\eta_j(x)$ in (1.11) for $j = 1, 2, 3$ (from left to right).

For assessment, we use a dense grid of 100 equally spaced points on $[2, 5]$ and apply the local penalized estimation at each point. We examine the estimated Kendall's tau functions $\hat\tau(x)$ as the dependence measure, and define the integrated squared bias (IBIAS²), variance (IVAR) and mean squared error (IMSE),
$$\mathrm{IBIAS}^2(\hat\tau) = \int_{\chi}[E\{\hat\tau(x)\} - \tau(x)]^2\,dx,$$
$$\mathrm{IVAR}(\hat\tau) = \int_{\chi} E([\hat\tau(x) - E\{\hat\tau(x)\}]^2)\,dx,$$
$$\mathrm{IMSE}(\hat\tau) = \int_{\chi} E[\{\hat\tau(x) - \tau(x)\}^2]\,dx = \mathrm{IBIAS}^2 + \mathrm{IVAR},$$
which are approximated with 200 Monte Carlo runs. To evaluate the detection of zero slopes, we define the correct zero coverage (CZ) as the proportion of true zero slopes that are correctly identified on the grid, and the correct nonzero coverage (CNZ) as the proportion of true nonzero slopes that are correctly identified. For comparison, we also perform the unpenalized local linear estimation using the true and estimated marginals, respectively (see Acar et al., 2011, and Abegaz et al., 2012, for detailed procedures). The results are summarized in Table 1.1.
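These integrated error measures are approximated from Monte Carlo output by numerical integration over the assessment grid; a minimal sketch with hypothetical array names:

```python
import numpy as np

def integrated_errors(tau_hats, tau_true, grid):
    """Approximate IBIAS^2, IVAR and IMSE from Monte Carlo output.
    tau_hats: (n_runs, n_grid) array of estimated Kendall's tau curves on `grid`;
    tau_true: the true curve evaluated on the same grid."""
    mean_hat = tau_hats.mean(axis=0)
    ibias2 = np.trapz((mean_hat - tau_true)**2, grid)               # integrate [E tau_hat - tau]^2
    ivar = np.trapz(((tau_hats - mean_hat)**2).mean(axis=0), grid)  # integrate E[tau_hat - E tau_hat]^2
    return ibias2, ivar, ibias2 + ivar                              # IMSE = IBIAS^2 + IVAR
```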

Table 1.1: Comparisons between the proposed penalized estimation and the unpenalized local linear estimation, using the true and nonparametrically estimated conditional marginals, respectively, for the Clayton family. The underlying models correspond to $\eta_j$ defined in (1.11); the IBIAS², IVAR and IMSE with their standard errors in parentheses are with respect to the Kendall's tau functions $\tau_j$ (multiplied by 100 for visualization), $j = 1, 2, 3$.

Marginal:              True                          Nonparametric
Model                  Penalized      Unpenalized    Penalized      Unpenalized
η1     CZ        81.1%(.0036)    -              (.0032)        -
       CNZ       95.6%(.0040)    -              (.0045)        -
       IBIAS²    .2600(.0603)    .8943(.1809)   .3602(.0625)   1.028(.2520)
       IVAR      .5107(.0201)    8.347(.1485)   .5602(.0249)   8.288(.1819)
       IMSE      .7707(.0695)    9.242(.3269)   .9204(.0751)   9.317(.4293)
η2     CZ        91.0%(.0128)    -              (.0131)        -
       CNZ       85.1%(.0089)    -              (.0086)        -
       IBIAS²    1.049(.0773)    15.38(1.885)   1.211(.0995)   15.37(1.938)
       IVAR      1.120(.0534)    .2387(1.632)   1.565(.0642)   .2610(1.665)
       IMSE      2.169(.1208)    15.61(3.515)   2.777(.1517)   15.64(3.602)
η3     CZ        -               -              -              -
       CNZ       100%(.0000)     -              100%(.0000)    -
       IBIAS²    1.452(.0340)    .6527(.1006)   1.314(.0391)   .9142(.1220)
       IVAR      .4462(.0146)    2.040(.0728)   .6506(.0189)   1.937(.0903)
       IMSE      1.898(.0395)    2.693(.1719)   1.964(.0475)   2.851(.2113)

We can see that the proposed penalized estimation correctly identifies the majority of both zero and nonzero slopes in all three cases. Regarding estimation, the penalized estimators improve the integrated mean squared error in all cases, and the gains for $\eta_1$ and $\eta_2$ are more pronounced. Therefore, although the penalized estimators behave asymptotically as well as the unpenalized ones, they in fact achieve more favourable finite sample performance in our simulations. One also observes a slightly increased error from using the nonparametrically estimated marginals, for both the penalized and unpenalized methods. We conduct copula selection using a two-fold CVPE (Acar et al., 2011) among three Archimedean families, Clayton, Gumbel and Frank, and observe that the Clayton copula is correctly chosen in over 95% of all Monte Carlo runs. The results are summarized in Table 1.2.

Table 1.2: Proportion (%) of correctly identified copulas in each family under the calibration functions $\eta_1(x)$, $\eta_2(x)$ and $\eta_3(x)$.

Marginal:          True                       Nonparametric
Calibration        Clayton  Frank  Gumbel     Clayton  Frank  Gumbel
η1(x)
η2(x)
η3(x)

1.5 Application to Twin Birth Data

In this section, we consider the Matched Multiple Birth Dataset, which contains all US multiple births from 1995 to 2000. To be specific, we include live twin births with babies who survived beyond the first year and mothers of age between 18 and 40, and take a random subset of 30 pairs of births at each week of gestational age ranging from 28 to 42 weeks. Of interest is the dependence between the twin birth weights (in grams), denoted by $BW_1$ and $BW_2$, conditional on the gestational age (in weeks), denoted by $GA$. For completeness, we treat the unknown conditional marginals with both parametric and nonparametric estimation. For the former, we follow the suggestion of Acar et al. (2011), fitting a cubic polynomial model with response $BW_{ki}$ and covariate $GA_i$, $k = 1, 2$, respectively. Denoting the fitted values of $BW_{ki}$ by $\hat\mu_k(GA_i)$ and the error variance by $\hat\sigma_k^2$, we calculate $\hat U_{ki} = \Phi[\hat\sigma_k^{-1}\{BW_{ki} - \hat\mu_k(GA_i)\}]$, $k = 1, 2$, $i = 1, \ldots, n$; a sketch of this transform is given below. For the nonparametric case, we compute $\hat U_{ki} = \hat F_{k|X}(BW_{ki}|GA_i)$, with $\hat F_{k|X}$ obtained by (1.9). The cross-validated likelihood (1.10) is used to tune the bandwidths for estimating the copula as well as the marginals, and the tuning parameter $\lambda_x$ is chosen by the BIC of Section 1.2.2. To visualize the marginal transforms, scatterplots with marginal histograms of $BW_{ki}$ are shown in Figure 1.3(a); the transformed data $\hat U_{ki}$ using the parametric and nonparametric marginals are shown in Figures 1.3(b) and 1.3(c), respectively. We perform the proposed penalized estimation under three common Archimedean copula families: Clayton, Frank and Gumbel. For comparison, we also obtain unpenalized local linear estimates using the parametric and nonparametric marginals, respectively. The copula parameter function is expressed in the form of Kendall's tau to put all copula families on the same scale, shown in Figure 1.4, along with 95% bootstrap confidence bands.
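A minimal sketch of the parametric marginal transform just described (cubic polynomial mean fit plus a probability integral transform); the function name and the homoscedastic-error assumption are illustrative:

```python
import numpy as np
from scipy.stats import norm

def parametric_uniforms(BW, GA):
    """Cubic polynomial marginal fit and probability transform, a sketch of the
    parametric option: U_hat = Phi[(BW - mu_hat(GA)) / sigma_hat]."""
    Z = np.vander(GA, 4)                      # columns GA^3, GA^2, GA, 1
    coef, *_ = np.linalg.lstsq(Z, BW, rcond=None)
    resid = BW - Z @ coef                     # BW - mu_hat(GA)
    sigma = resid.std(ddof=4)                 # error s.d. after fitting 4 coefficients
    return norm.cdf(resid / sigma)
```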

Figure 1.3: Scatterplots and histograms for the original twin birth weights (left) and the transformed responses using the parametric (middle) and nonparametric (right) marginal estimates, respectively.

It is interesting to see that, in all cases, the penalized estimation features a constant dependence between the twin birth weights for gestational ages of roughly 34 to 36 weeks, a phenomenon that has not been revealed by other methods in previous studies. The copula selection is conducted using a two-fold CVPE and favours the Clayton family under both marginal settings.

Figure 1.4: The Kendall's tau of the estimated copula parameter functions under three Archimedean copulas: Clayton, Frank and Gumbel. The top panels correspond to parametric marginals, and the bottom panels to nonparametric marginals. In each panel, shown are the penalized estimate (solid) with 95% bootstrap confidence bands (dotted), as well as the unpenalized estimate (dot-dashed).

It is also noted that, based on 5-fold cross-validation with 20 random repetitions, the penalized estimates achieve higher likelihoods than the unpenalized estimates in all three families.

1.6 Proofs of Main Theorems

Proof of Theorem 3. Recall $a_n = P'_\lambda(|\tilde\beta_{01}|)$ and $b_n = (nh)^{-1/2}$, and let $\alpha_n = b_n + a_n$. With a slight abuse of notation, let $l_i(\tilde\beta) = l(g^{-1}\{\eta(x, X_i)\}; u_1, u_2)|_{u_1 = U_{1i},\, u_2 = U_{2i}}$, and similarly for $l'_i(\tilde\beta)$, $l''_i(\tilde\beta)$, $l'''_i(\tilde\beta)$ and $l'_{i1,s}(\tilde\beta)$, $s = 1, 2$, i.e., we suppress the last two arguments when no confusion arises, where $\eta(x, X_i) = s_x^{-1}\tilde\beta_0 + r_x^{-1}\tilde\beta_1(X_i - x)$. Denoting $Q(\tilde\beta) = Q(\tilde\beta; x)$, we aim to show that for any $\epsilon > 0$ there exists a sufficiently large constant $C$ such that
$$P\Big\{\sup_{\|v\| = C} Q(\tilde\beta_0 + \alpha_n v) < Q(\tilde\beta_0)\Big\} \ge 1 - \epsilon, \qquad (1.12)$$
which implies $\|\tilde\beta_\lambda - \tilde\beta_0\| = O_p(\alpha_n)$. Denote $L(\tilde\beta) = \sum_{i=1}^{n} l_i(\tilde\beta)K_h(X_i - x)/K_h(0)$ and, for $r = 0, 1$, write $w_{i,r} = (X_i - x)^r/\{r_x I(r = 1) + s_x I(r = 0)\}$, and analogously for the indices $s$ and $t$. By a third-order Taylor expansion,
$$Q(\tilde\beta_0 + \alpha_n v) - Q(\tilde\beta_0) \le L(\tilde\beta_0 + \alpha_n v) - L(\tilde\beta_0) + m_x\{P_\lambda(|\tilde\beta_{01} + \alpha_n v_1|) - P_\lambda(|\tilde\beta_{01}|)\} = A_1 + A_2 + A_3 + A_4,$$
where
$$A_1 = \sum_{i=1}^{n}\sum_{r=0}^{1} l'_i(\tilde\beta_0)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,\alpha_n v_r,$$
$$A_2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{r=0}^{1}\sum_{s=0}^{1} l''_i(\tilde\beta_0)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}\,\alpha_n^2 v_r v_s,$$
$$A_3 = \frac{1}{6}\sum_{i=1}^{n}\sum_{r=0}^{1}\sum_{s=0}^{1}\sum_{t=0}^{1} l'''_i(\tilde\beta^*)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}\,w_{i,t}\,\alpha_n^3 v_r v_s v_t,$$
$$A_4 = m_x\{P_\lambda(|\tilde\beta_{01} + \alpha_n v_1|) - P_\lambda(|\tilde\beta_{01}|)\},$$
and $\tilde\beta^*$ lies between $\tilde\beta_0$ and $\tilde\beta_0 + \alpha_n v$. We now show that $A_1 = O_p(nh\alpha_n^2)\|v\|$, $A_2 \le -O_p(nh\alpha_n^2)\|v\|^2$, $A_3 \le o_p(nh\alpha_n^2)\|v\|^2$ and $A_4 \le o_p(nh\alpha_n^2)\|v\|^2$, so that $A_2$ dominates the other terms and (1.12) holds.

One has
$$A_1 = \Big[\sum_{i=1}^{n}\sum_{r=0}^{1} l'_i(\tilde\beta_u)\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,\alpha_n v_r\Big]\{1 + o_p(1)\} + \frac{1}{2}\sum_{i=1}^{n}\sum_{r=0}^{1}\sum_{s=0}^{1} l''_i(\tilde\beta^{**})\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}\,(\tilde\beta_{us} - \tilde\beta_{0s})\,\alpha_n v_r = A_5 + A_6,$$
where $\tilde\beta^{**}$ lies between $\tilde\beta_0$ and $\tilde\beta_u$. It suffices to show that $A_5 = 0$ and $A_6 \le O_p(nh\alpha_n^2)\|v\|$, so that $A_1 \le O_p(nh\alpha_n^2)\|v\|$. As $\tilde\beta_u$ is the maximizer of $L(\tilde\beta)$, we have $\partial L(\tilde\beta)/\partial\tilde\beta|_{\tilde\beta = \tilde\beta_u} = 0$, i.e., $A_5 = 0$. For the term $A_6$, denote
$$M_{irs} = l''_i(\tilde\beta^{**})\,\frac{K_h(X_i - x)}{K_h(0)}\,w_{i,r}\,w_{i,s}.$$
Using a standard Taylor expansion, one can show that
$$m_x = O_p(nh), \quad s_x = O_p(1), \quad r_x = O_p(h), \qquad (1.13)$$
and, for $r = 0, 1$,
$$\frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x)\,|X_i - x|^r = O_p(h^r), \qquad \frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x)\,|X_i - x|^{r+2} = O_p(h^{r+2}). \qquad (1.14)$$
Furthermore, using Theorem 1 in Acar et al. (2011), we have $\tilde\beta_{u0} - \tilde\beta_{00} = \tilde\beta_{u1} - \tilde\beta_{01} = O_p(b_n)$, where $\tilde\beta_u = (\tilde\beta_{u0}, \tilde\beta_{u1})^\top$ denotes the unpenalized local linear estimator of $\tilde\beta$. Then
$$A_6 = \sum_{r=0}^{1}\sum_{s=0}^{1}\sum_{i=1}^{n} M_{irs}\,\alpha_n(\tilde\beta_{us} - \tilde\beta_{0s})\,v_r$$


Bi-level feature selection with applications to genetic association Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Theoretical results for lasso, MCP, and SCAD

Theoretical results for lasso, MCP, and SCAD Theoretical results for lasso, MCP, and SCAD Patrick Breheny March 2 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/23 Introduction There is an enormous body of literature concerning theoretical

More information

A nonparametric method of multi-step ahead forecasting in diffusion processes

A nonparametric method of multi-step ahead forecasting in diffusion processes A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Recap from previous lecture

Recap from previous lecture Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University

STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL A Thesis Presented to the Faculty of San Diego State University In Partial Fulfillment of the Requirements for the Degree

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression Local regression I Patrick Breheny November 1 Patrick Breheny STA 621: Nonparametric Statistics 1/27 Simple local models Kernel weighted averages The Nadaraya-Watson estimator Expected loss and prediction

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates

More information

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly Robust Variable Selection Methods for Grouped Data by Kristin Lee Seamon Lilly A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree

More information

Copulas. MOU Lili. December, 2014

Copulas. MOU Lili. December, 2014 Copulas MOU Lili December, 2014 Outline Preliminary Introduction Formal Definition Copula Functions Estimating the Parameters Example Conclusion and Discussion Preliminary MOU Lili SEKE Team 3/30 Probability

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Statistics for high-dimensional data: Group Lasso and additive models

Statistics for high-dimensional data: Group Lasso and additive models Statistics for high-dimensional data: Group Lasso and additive models Peter Bühlmann and Sara van de Geer Seminar für Statistik, ETH Zürich May 2012 The Group Lasso (Yuan & Lin, 2006) high-dimensional

More information

Comparisons of penalized least squares. methods by simulations

Comparisons of penalized least squares. methods by simulations Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy

More information

PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION. Alireza Bayestehtashk and Izhak Shafran

PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION. Alireza Bayestehtashk and Izhak Shafran PARSIMONIOUS MULTIVARIATE COPULA MODEL FOR DENSITY ESTIMATION Alireza Bayestehtashk and Izhak Shafran Center for Spoken Language Understanding, Oregon Health & Science University, Portland, Oregon, USA

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

arxiv: v3 [stat.me] 25 May 2017

arxiv: v3 [stat.me] 25 May 2017 Bayesian Inference for Conditional Copulas using Gaussian Process Single Index Models Evgeny Levi Radu V. Craiu arxiv:1603.0308v3 [stat.me] 5 May 017 Department of Statistical Sciences, University of Toronto

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Divide-and-combine Strategies in Statistical Modeling for Massive Data

Divide-and-combine Strategies in Statistical Modeling for Massive Data Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

Financial Econometrics and Volatility Models Copulas

Financial Econometrics and Volatility Models Copulas Financial Econometrics and Volatility Models Copulas Eric Zivot Updated: May 10, 2010 Reading MFTS, chapter 19 FMUND, chapters 6 and 7 Introduction Capturing co-movement between financial asset returns

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Building a Prognostic Biomarker

Building a Prognostic Biomarker Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Econ 582 Nonparametric Regression

Econ 582 Nonparametric Regression Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume

More information

Estimation of multivariate critical layers: Applications to rainfall data

Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino, ICRA 6 / RISK 2015 () Estimation of Multivariate critical layers Barcelona, May 26-29, 2015 Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino,

More information

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite

More information

On the Choice of Parametric Families of Copulas

On the Choice of Parametric Families of Copulas On the Choice of Parametric Families of Copulas Radu Craiu Department of Statistics University of Toronto Collaborators: Mariana Craiu, University Politehnica, Bucharest Vienna, July 2008 Outline 1 Brief

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

arxiv: v2 [stat.me] 4 Jun 2016

arxiv: v2 [stat.me] 4 Jun 2016 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates 1 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates Ben Sherwood arxiv:1510.00094v2

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning Practice Page 2 of 2 10/28/13 Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify

More information